Main

October 20, 2017

Citi Bike shares bikeshare data

Rides, when, where: Citi Bike shares its bikeshare data for NYC.

March 29, 2017

Adsense for the masses

Major publishers admit to 'advertiser-friendly' skew:

If you want to take something good and make it less good, there's no more reliable method than to chop it up into tiny bits and then recombine them. A door made of particleboard isn't as strong as one made of solid pine. An MP3 of a song lacks the sonic richness of a high-fidelity record. A hamburger may or may not be as delicious as a rib-eye, depending on your personal taste, but it's definitely likelier to contain fecal bacteria and pink slime.

The global advertising industry is currently experiencing its own version of food poisoning from tainted ground beef. Johnson & Johnson, Verizon, and AT&T are among the giant marketers that have stopped buying ad space on Google's ad network and on YouTube in response to reports of ads appearing alongside hate speech, ISIS recruiting propaganda, and other objectionable content. Racing to contain the boycott, Google issued an apology on Tuesday and said it is taking steps to ensure greater "brand safety" in the future. Those steps include "taking a tougher stance on hateful, offensive and derogatory content," changing the default settings for ad campaigns, and giving marketers new controls allowing them to exclude specific websites or types of content from their campaigns.

January 30, 2017

ProPublica: Breaking the Black Box, what Facebook knows about you

ProPublica's "Breaking the Black Box: What Facebook Knows About You."

August 28, 2016

Canadian long-form census

Any Canadian who finds a long-form census on their doorstep in 2016 but fails to complete it could be hit with a fine of as much as $500 and a jail term of up to three months. The law requiring those penalties for non-compliance was never changed - it simply did not apply to the 2011 national household survey.

But Mr. Bains and Mr. Duclos either did not know, or did not want to discuss, what consequences would befall those who do not co-operate.

Reporters asked the ministers seven times to say whether there would be penalties for non-compliance and, each time, the ministers responded by discussing the importance of persuading Canadians to take part - or the fact that most people take it upon themselves to complete the forms. "If you speak to Canadians and you get them engaged in the process, they will fill out the information, and that's what we are focusing on because we need good, reliable data," Mr. Bains said.

Ather Akbari, an economics professor at St. Mary's University in Halifax, relies on census data dating back to 1981 for his studies of immigrants in the labour market. He said restoring the mandatory questionnaire "will help to set up evidence-based policy." And in his own work, he added, "it will help me draw meaningful comparisons from the past."

Alain Bélanger, Statscan's former assistant director, said the findings of the voluntary household survey distributed in 2011 were clearly skewed. If the government had opted for a voluntary survey again in 2016, he said, the results would have been even worse because bad data would have built upon bad data.

-- Gloria Galloway

August 19, 2016

Census.gov on people and wealth

The US Census is the original big data project.

census.gov/people/wealth

Pew Research on what some demographers are studying now: multiracial children, gender identity.

Public Use Microdata Area: PUMA1, based on the American Community Survey (ACS).

July 31, 2016

Future is mining cloud data

The next big competition in cloud computing also involves artificial intelligence, fed by loads of data. Soon, Mr. Kurian said, Oracle will offer applications that draw from what it knows about the people whose actions are recorded in Oracle databases. The company has anonymized data from 1,500 companies, including three billion consumers and 400 million business profiles, representing $3 trillion in consumer purchases.

"Most of the world's data is already inside Oracle databases," said Thomas Kurian, Oracle's president of product development.

That's the kind of hold on people's information that perhaps only Facebook can match. But Mark Zuckerberg doesn't sell business software. At least, not yet.

May 24, 2016

Describe young people as a broad collective or use data?

What's most bizarre about efforts to describe young people as a broad collective is that technology has rendered such generalizations mostly unnecessary. Thanks to social media, smartphones and reams of searchable data, companies can now track their customers and workers in far more precise ways than simply noting their age cohort. They have your purchase and employment histories, your social media musings, your educational history, your credit report. Companies can break you down analytically, psychographically, financially and in just about every other way short of physically.

Joan Kuhl, one of the aforementioned army of millennial consultants, told me that one of her primary jobs these days was to undo companies' preconceived notions about millennials. (Oh, I should do that, too: It's not true that millennials don't read the news, as I implied above. Hi, millennials -- thanks for reading!)

"It's unbelievable the stories we hear," said Ms. Kuhl, 36, who runs Why Millennials Matter. "They all have stories about managers underestimating them, or recruiters having an impression that they can't live up to the demands of the job, or that they were a flight risk. People are perceiving them as the stereotype of their generation."

May 20, 2016

Identity recognition via social media pictures and sound

Biometric information like face and voice recognition has become a huge and potentially costly area as tech companies plow tons of money and powerful artificial intelligence technologies into photo and video applications that can identify people with an accuracy that would have seemed like science fiction only a few years ago.

Laws curbing these programs can be a huge burden for tech companies, because following the laws could mean slowing the arrival of new features or creating a patchwork of features that could be turned off and on in various states.

While the state violations are often small -- the Illinois act gives citizens the right to sue for up to $5,000 per violation -- potential liability can run to billions or even trillions of dollars once multiplied across hundreds of millions of individual users. That has made privacy law a lucrative new area for class-action lawyers who can extract multimillion-dollar settlements just by bringing a case.

Nevertheless, questions of how much tech companies should be allowed to do without notifying their users will multiply, especially as people adopt more live video and voice technologies that have already made it possible for tech companies to identify people who might not even use their services.

"It's basically a company creating the ability to capture someone's identity when they might not want to reveal their identity," Marc Rotenberg said.

The Biometric Information Privacy Act was passed in Illinois in 2008 and has quickly become the bane of social media companies. Under the current law, companies have to get a user's consent before turning on features that scan and store faces for identification.
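The liability arithmetic quoted above can be checked with a few lines of Python. This is only a rough sketch: the $5,000 figure is the per-violation ceiling cited in the excerpt, while the user counts, the function name `potential_liability`, and the assumption of one claim per user are made up for illustration.

```python
# Hypothetical illustration: statutory damages multiplied across many users.
PER_VIOLATION_USD = 5_000  # upper figure cited in the excerpt for the Illinois act


def potential_liability(users: int, violations_per_user: int = 1) -> int:
    """Worst-case exposure in dollars, assuming every user has a valid claim."""
    return PER_VIOLATION_USD * users * violations_per_user


# Assumed user counts, chosen only to show the scale.
for users in (1_000_000, 100_000_000, 300_000_000):
    print(f"{users:>11,} users -> ${potential_liability(users):,}")
```

At one million users the ceiling is already $5 billion, and at 300 million it passes $1 trillion, which is how "billions or even trillions" arises from a modest per-violation amount.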

May 13, 2016

Evaluating men and women on different traits, Rate My Professor

Benjamin Schmidt, a professor at Northeastern University, created a searchable database of roughly 14 million reviews from the Rate My Professor site.

Among the words more likely to be used to describe men: smart, idiot, interesting, boring, cool, creepy. And for women: sweet, shrill, warm, cold, beautiful, evil. "Funny" and "corny" were also used more often to describe men, while "organized" and "disorganized" showed up more for women.

In short, Schmidt says, men are more likely to be judged on an intelligence scale, while women are more likely to be judged on a nurturing scale.

"We're evaluating men and women on different traits or having different expectations for individuals who are doing the same job," says Erin Davis, who teaches gender studies at Cornell College.
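A comparison like the one described above can be sketched by counting how often a word appears per word of review text for each group. The snippet below is a toy version under stated assumptions: the four reviews are invented, the `word_rates` helper is hypothetical, and nothing here is claimed to be how Schmidt's database is actually built.

```python
import re
from collections import Counter

# Invented sample data: (professor gender, review text).
reviews = [
    ("M", "Smart and funny, but sometimes boring."),
    ("M", "Funny lecturer, interesting class."),
    ("F", "Very organized and warm."),
    ("F", "Organized notes, sweet person, interesting class."),
]


def word_rates(rows):
    """Return, per gender, each word's share of all words written about that gender."""
    counts = {}
    totals = Counter()
    for gender, text in rows:
        words = re.findall(r"[a-z']+", text.lower())
        counts.setdefault(gender, Counter()).update(words)
        totals[gender] += len(words)
    return {g: {w: n / totals[g] for w, n in c.items()} for g, c in counts.items()}


rates = word_rates(reviews)
for word in ("funny", "organized", "interesting"):
    print(word, {g: round(r.get(word, 0.0), 3) for g, r in rates.items()})
```

Normalizing by the total number of words per group matters because one group may simply have more reviews; raw counts alone would exaggerate the difference.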

April 11, 2016

Track businesses' sales to aid investors

Second Measure, by Mike Babineau and Lillian Chou, tracks businesses' sales for investors.

Second Measure takes billions of anonymized credit card transactions and analyzes them so investors can see where consumers are voting with their dollars before a company's quarterly earnings come out.
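The kind of roll-up described here, transactions summed per company per quarter, can be sketched in plain Python. Everything in the snippet is an assumption for illustration: the merchants, card ids, and amounts are invented, and `quarterly_sales` is a hypothetical helper, not Second Measure's method.

```python
from collections import defaultdict
from datetime import date

# Invented rows: (anonymized card id, merchant, purchase date, amount in USD).
transactions = [
    ("card_a", "Acme Coffee", date(2016, 1, 15), 4.50),
    ("card_b", "Acme Coffee", date(2016, 2, 3), 12.00),
    ("card_a", "Acme Coffee", date(2016, 4, 9), 6.25),
    ("card_c", "Bolt Rides", date(2016, 1, 20), 18.00),
    ("card_c", "Bolt Rides", date(2016, 5, 2), 22.50),
]


def quarterly_sales(rows):
    """Sum spending per (merchant, year, calendar quarter)."""
    totals = defaultdict(float)
    for _card, merchant, day, amount in rows:
        quarter = (day.month - 1) // 3 + 1  # Jan-Mar -> 1, Apr-Jun -> 2, ...
        totals[(merchant, day.year, quarter)] += amount
    return dict(totals)


for (merchant, year, quarter), total in sorted(quarterly_sales(transactions).items()):
    print(f"{merchant} {year} Q{quarter}: ${total:.2f}")
```

Because card data arrives continuously, totals like these can be computed while a quarter is still in progress, which is the timing advantage over waiting for an earnings report.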

More in data.

April 7, 2016

Data USA: government data

Hal R. Varian, chief economist of Google, who has no connection to Data USA, called the site "very informative and aesthetically pleasing." The fact that the government is making so much data publicly available, he added, is fueling creative work like Data USA.

Data USA embodies an approach to data analysis that will most likely become increasingly common, said Kris Hammond, a computer science professor at Northwestern University. The site makes assumptions about its users and programs those assumptions into its software, he said.

"It is driven by the idea that we can actually figure out what a user is going to want to know when they are looking at a data set," Mr. Hammond said.

Data scientists, he said, often bristle when such limitations are put into the tools they use. But they are the data world's power users, and power users are a limited market, said Mr. Hammond, who is also chief scientist at Narrative Science, a start-up that makes software to interpret data.