I’ve been spinning some cycles lately looking for free data. Why? Here’s my thesis: As our use of the cloud evolves, we will come to understand that, to do powerful computational things, we not only don’t need to own massive amounts of IT infrastructure, we don’t even have to own the data. The cloud will offer us both. Think of it as big free data.
At the dawn of modern computing history, one needed millions of dollars to buy big iron to do serious computational things. As time has gone on, however, the barrier to entry has steadily declined to the point where we can now access big computing from our living rooms and rent its power with a credit card. And now we don’t even have to own the data to produce serious computational results. Large and powerful data sets that are “open” can increasingly be used for free via the Internet. Here are a few of the many big free data sites:
Open Knowledge Foundation–contains links to many large and open data sets and includes a registry of free textbooks. OKF’s mission is to build tools and communities to create, use, and share open knowledge.
The Data Hub–currently has links to 2,292 datasets available for download.
NASA EOSDIS–NASA’s Earth Observing System Data and Information System. The EOSDIS site offers earth science data gathered by satellites, aircraft, field measurements, and other sources.
And, if you don’t know where to look for the data you need or want to fast-track your access to it, try these sites:
InfoChimps–a place to find, sell, and share data with others. Data is available in two forms: (1) data sets available for download, some free and some for sale, and (2) data APIs available with a subscription and an API key. The APIs give access to very large or frequently updated data sets.
ScraperWiki–an online tool that simplifies the process of acquiring and aggregating data available on the Web in many forms (Web pages, PDFs, spreadsheets, reports).
Finally, if you’re at all unsure about whether or how the data you want to use is “open,” you can consult the Open Knowledge Foundation or Open Data Commons.
Why are people now making large and powerful data sets freely available on a global basis? What drives the people working in this space is generally not a profit motive, although they understand that they need a funding source to remain viable. Rather, they are energized by the belief that by making data easily accessible and converting it into an easily usable form, they are advancing civilization. In the words of the organizers of the OKF, “we promote open knowledge because of its potential to deliver far-reaching societal benefits.”
Can big free data actually make the world a better place to live? Last week, I came across an amazing article published in the online edition of Nature. Written by Steven Pinker, a Harvard University psychology professor, the article is titled “Decline of violence: Taming the devil within us.” It is adapted from his new book, “The Better Angels of Our Nature: The Decline of Violence in History and its Causes,” published by Allen Lane/Penguin Books. Pinker’s thesis is that mankind, on average, is getting smarter. And, as we get smarter, we also get kinder and gentler. His book presents reams of statistical data to make his case. He even looks into old fairy tales (they were called grim, right?) for historical patterns of brutality.
Yes, I admit it’s a bit of a leap to go from big free data to world peace. But our power as individuals to acquire knowledge, using nothing more than a laptop and an Internet connection, grows with each passing day. Pinker might say that knowledge is power…and peace.