Startups mine hyperlocal information for global insights
Big Data companies are learning some new tricks, tapping unusual sources of information and analyzing it more quickly to find unexpected patterns and insights.
The New York Times
SAN FRANCISCO — David Soloff is recruiting an army of “hyperdata” collectors.
The company he co-founded, Premise, created a smartphone application now used by 700 people in 25 developing countries.
Using guidance from Soloff and his co-workers, these people, mostly college students and homemakers, photograph food and goods in public markets.
By analyzing the photos of prices and the placement of everyday items like piles of tomatoes and bottles of shampoo, and by matching that to other data, Premise is building a real-time inflation index to sell to companies and Wall Street traders, who are hungry for insightful data.
“Within five years, I’d like to have 3,000 or 4,000 people doing this,” said Soloff, who is also CEO of Premise. “It’s a useful global inflation monitor, a way of looking at food security, or a way a manufacturer can judge what kind of shelf space he is getting.”
Collecting data from all sorts of odd places — and analyzing it much faster than was possible even a couple of years ago — has become one of the hottest areas of the technology industry.
The idea is simple: With all that processing power and a little creativity, researchers should be able to find novel patterns and relationships among different kinds of information.
For the last few years, insiders have been calling this analysis Big Data.
Now Big Data is evolving, becoming more “hyper” and including all sorts of sources.
Startups such as Premise and ClearStory Data, as well as larger companies like General Electric, are getting into the act.
A picture of a pile of tomatoes in Asia may not lead to a great conclusion, other than how tasty those tomatoes might or might not look.
But connect pictures of food piles around the world to weather forecasts and rainfall totals and you have meaningful information that people like stockbrokers or buyers for grocery chains could use.
And the faster that happens, the better, so people can make smart — and quick — decisions.
“Hyperdata comes to you on the spot, and you can analyze it and act on it on the spot,” said Bernt Wahl, an industry fellow at the Center for Entrepreneurship and Technology at the University of California, Berkeley. “It will be in regular business soon, with everyone predicting and acting the way Amazon instantaneously changes its prices around.”
Standard statistics might project next summer’s ice-cream sales.
The aim of people working on newer Big Data systems is to collect seemingly unconnected information, such as today’s heat and cloud cover and a hometown team’s victory over the weekend, compare that with past weather and sports outcomes, and figure out how much mint-chip ice cream mothers would buy today.
At least, that is the hope, and there are early signs it could work.
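The ice-cream scenario can be sketched in a few lines of Python. This is only an illustration of the general idea of comparing today's conditions with similar past days. It is not a method attributed to any company in this article. The history table, the `estimate_sales` function, the distance weights and the choice of two neighbors are all hypothetical.

```python
# Hypothetical history, invented for illustration:
# (high_temp_f, cloud_cover_pct, home_team_won, pints_sold)
history = [
    (88, 10, True, 140),
    (90, 20, True, 150),
    (85, 60, False, 90),
    (70, 80, False, 50),
    (72, 30, True, 80),
]


def estimate_sales(temp, cloud, won, past, k=2):
    """Average the sales of the k past days most similar to today."""

    def distance(day):
        d_temp, d_cloud, d_won, _sales = day
        # Crude, hand-picked weights (an assumption): one degree counts as
        # much as ten points of cloud cover, and a win/loss mismatch adds 20.
        mismatch = 0 if d_won == won else 20
        return abs(d_temp - temp) + abs(d_cloud - cloud) / 10 + mismatch

    nearest = sorted(past, key=distance)[:k]
    return sum(day[3] for day in nearest) / len(nearest)


# Today: 89 degrees, 15 percent cloud cover, and the home team won.
today_estimate = estimate_sales(89, 15, True, history)
```

A production system would presumably use far more history and a fitted statistical model rather than fixed weights. The sketch shows only the step of matching today's conditions against past outcomes.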
Premise claims to have identified broad national inflation in India months ahead of the government by looking at onion prices in a couple of markets.
The photographers working for Premise are recruited by country managers, and they receive 8 to 10 cents a picture.
Premise also gathers time and location information from the phones, plus a few notes on things like whether the market was crowded. The real insight comes from knowing how to mix it all together, quickly.
Price data from the photos gets blended with prices Premise receives from 30,000 websites. The company then builds national inflation indexes and price maps for markets in places such as Kolkata, India; Shanghai; and Rio de Janeiro.
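How prices from two kinds of sources might be blended into an index can be sketched as follows. Premise's actual method is not described here, so this is a sketch under stated assumptions. Photo and website prices are averaged per item, and the index is a Jevons-style geometric mean of price changes against a base period. The observations, item names and period labels are invented for illustration.

```python
from collections import defaultdict
from math import exp, log

# Hypothetical observations: (period, item, source, price in local currency).
observations = [
    ("month_1", "onions_kg", "photo", 20.0),
    ("month_1", "onions_kg", "web", 22.0),
    ("month_1", "tomatoes_kg", "photo", 30.0),
    ("month_2", "onions_kg", "photo", 26.0),
    ("month_2", "onions_kg", "web", 24.0),
    ("month_2", "tomatoes_kg", "photo", 33.0),
]


def average_prices(rows):
    """Blend all sources into one mean price per (period, item)."""
    buckets = defaultdict(list)
    for period, item, _source, price in rows:
        buckets[(period, item)].append(price)
    return {key: sum(prices) / len(prices) for key, prices in buckets.items()}


def price_index(rows, base_period, period):
    """Geometric mean of per-item price relatives, with the base period = 100."""
    avg = average_prices(rows)
    # Only items observed in both periods can be compared.
    items = sorted(
        item for (p, item) in avg if p == base_period and (period, item) in avg
    )
    if not items:
        raise ValueError("no items observed in both periods")
    log_relatives = [log(avg[(period, i)] / avg[(base_period, i)]) for i in items]
    return 100.0 * exp(sum(log_relatives) / len(log_relatives))


index_month_2 = price_index(observations, "month_1", "month_2")
```

With these made-up numbers, onions rise from an average of 21 to 25 and tomatoes from 30 to 33, so the index for the second month lands a little above 114. A real index would also weight items by their share of household spending, which this sketch leaves out.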
Premise’s subscribers include Wall Street hedge funds and Procter & Gamble, a company known for using lots of data.
None of them would comment for this article. Subscriptions to the service range from $1,500 to more than $15,000 a month, though there is also a version that offers free data to schools and nonprofit groups.
The new Big Data connections are also benefiting from the increasing amount of public information that is available.
According to research from the McKinsey Global Institute, 40 national governments offer data on such matters as population and land use. The U.S. government alone has 90,000 sets of open data.
“There is over $3 trillion of potential benefit from open government economic data, from things like price transparency, competition and benchmarking,” said Michael Chui, one of the authors of the McKinsey report. “Sometimes you have to be careful of the quality, but it is valuable.”
That government data can be matched with readings from sensors on smartphones, jet engines and even bicycle stations, which are uploading data from across the physical world into the supercomputers of cloud-computing systems.
Until a few years ago, much government and private data could not be collected particularly fast or well. It was expensive to get and hard to load into computers. As sensor prices have dropped, however, and things like Wi-Fi have enabled connectivity, that has changed.
In the world of computer hardware, in-memory computing, an advance that allows data to be crunched without being stored in a different location, has increased computing speeds immensely. That has allowed for some real-time data crunching.
Traditional data analysis was built on looking at regular information, such as payroll stubs, that could be loaded into the regular rows and columns of a spreadsheet.
With the explosion of the Web, however, companies like Google, Facebook and Yahoo were faced with unprecedented volumes of “unstructured” data, like how people cruised the Web or comments they made to their friends.
New hardware and software have also been created that sharply cut the time it takes to analyze this information, fetching it as fast as an iPhone fetches a song.
Last month, creators of the Spark open-source software, which speeds data analysis by 100 times compared with existing systems, received $14 million to start a company that would offer a commercial version of that software.
ClearStory Data, a startup in Palo Alto, Calif., has introduced a product that can look at data on the fly from various sources.
With ClearStory, data on movie-ticket sales, for example, might be mixed with information on weather, even Twitter messages, and presented as a shifting bar chart or a map, depending on what the customer is trying to figure out.
There is even a “data you might like” feature, which suggests new sources of information.
One trick, said Sharmila Shahani-Mulligan, ClearStory’s co-founder and chief executive, was developing a way to quickly and accurately find all of the data sources available. Another was figuring out how to present data on, say, typical weather in a community, in a way that was useful.
“That way,” Shahani-Mulligan said, “a coffee shop can tell if customers will drink Red Bull or hot chocolate.”