Google has begun a campaign to organize online information for the scientific community with the launch of Google Dataset Search. The search engine allows scientists, journalists, academics and the public to find datasets from institutions, such as universities and governments, that publish their data online.
What is Google Dataset Search?
Created partly as a companion to Google Scholar, the company’s search engine for academic reports and studies, Google Dataset Search indexes the metadata tags that data-publishing institutions must provide when they upload their data online. It combines those tags with Google’s Knowledge Graph to return results that are relevant to Dataset Search users. The beta version launched last week is available in multiple languages, with support for further languages on the way, according to a Google blog post.
This initial release will not immediately contain every dataset available on the internet. Google has said that, for now, Dataset Search will focus on data for environmental and social sciences, government data, and the datasets available from news sources like ProPublica. It plans to expand in the future if the service becomes popular and scientists and institutions start to release and label their data in a way that makes it accessible to Dataset Search users.
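As an illustration of the metadata tags described above, the sketch below builds a minimal dataset description using the schema.org Dataset vocabulary, serialized as JSON-LD and wrapped in the script tag a publisher might embed in a dataset's web page. The helper names (build_dataset_jsonld, wrap_in_script_tag), the example dataset, and the example.org URL are hypothetical and chosen only for illustration. The exact set of fields Google requires is not stated in this article, so treat the selection here as an assumption rather than a definitive template.

```python
import json


def build_dataset_jsonld(name, description, url, creator_name, keywords):
    """Return a schema.org Dataset description as a JSON-LD string.

    The field selection is an assumption for illustration; consult the
    publisher guidelines for the properties that are actually required.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
        "creator": {"@type": "Organization", "name": creator_name},
    }
    return json.dumps(record, indent=2)


def wrap_in_script_tag(jsonld):
    """Wrap a JSON-LD string in the script element used to embed it in HTML."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


# Hypothetical dataset used only to show the shape of the markup.
example = build_dataset_jsonld(
    name="Example river temperature readings",
    description="Daily water temperature readings from a hypothetical monitoring station.",
    url="https://example.org/datasets/river-temperature",
    creator_name="Example University",
    keywords=["environmental science", "hydrology"],
)

if __name__ == "__main__":
    print(wrap_in_script_tag(example))
```

A publisher would place the printed script element in the HTML of the page that describes the dataset, so that a crawler can read the description alongside the human-readable content.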
What is Google’s aim with Dataset Search?
In conversation with The Verge, Google AI research scientist Natasha Noy said that the aim of Dataset Search is to unify the wide range of dataset repositories online. At present, each scientific domain has its own preferred repositories, and there is no centralized resource that researchers can turn to for access to datasets from other governments, authorities and institutions.
Google’s expansion into datasets isn’t unprecedented, as the company recently made it much easier to access tabular data in standard searches. The same metadata is now required for all tabular data and datasets to appear in search results. While that earlier update was focussed on making it easier for news organisations and data journalists to access data, the new search engine is aimed at a much broader audience and is meant to let any internet user find and learn about datasets with little difficulty.