
Thursday, September 06, 2018

Google Announces Dataset Search

Looks useful, and would have helped in a number of past cases. It will be interesting to see how well the search works; note the mention that it is similar to Google Scholar. There is also an invitation to publish your own data so that it is indexed this way. Could such a system be used to scope what data exists and what does not? Or to determine what a new data asset might be worth?

Making it Easier to Discover Datasets, by Natasha Noy, Research Scientist, Google AI

In today's world, scientists in many disciplines and a growing number of journalists live and breathe data. There are many thousands of data repositories on the web, providing access to millions of datasets; and local and national governments around the world publish their data as well. To enable easy access to this data, we launched Dataset Search, so that scientists, data journalists, data geeks, or anyone else can find the data required for their work and their stories, or simply to satisfy their intellectual curiosity.

There's a sea of open research data available on the web, but it can be time-consuming to sift through those sites to get at the data -- and it's not always presented in an easy-to-parse format. Google hopes it can make that information more accessible to scientists, journalists and plain old data junkies with its new Dataset Search feature. The tool provides more direct access to data presented in an open standard that makes it clear who created the info, how it was collected and how you're allowed to use it. You could not only track down climate data for a report, but make sure that it's relevant and legal to use.

Similar to how Google Scholar works, Dataset Search lets you find datasets wherever they’re hosted, whether it’s a publisher's site, a digital library, or an author's personal web page. To create Dataset Search, we developed guidelines for dataset providers to describe their data in a way that Google (and other search engines) can better understand the content of their pages. These guidelines include salient information about datasets: who created the dataset, when it was published, how the data was collected, what the terms are for using the data, etc. We then collect and link this information, analyze where different versions of the same dataset might be, and find publications that may be describing or discussing the dataset. Our approach is based on an open standard for describing this information (schema.org) and anybody who publishes data can describe their dataset this way. We encourage dataset providers, large and small, to adopt this common standard so that all datasets are part of this robust ecosystem. ...
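To make the "describe your dataset" invitation concrete, here is a rough sketch of what such a description might look like, built in Python as schema.org `Dataset` metadata in JSON-LD form. The property names (`creator`, `datePublished`, `measurementTechnique`, `license`) are from the schema.org vocabulary and map onto the items listed above; the helper functions, the dataset, the organization, and the URLs are all hypothetical, and this is not taken from Google's own guidelines.

```python
import json


def build_dataset_jsonld(name, description, creator, date_published,
                         collection_method, license_url, url):
    """Return a schema.org Dataset description as a plain dict.

    Hypothetical helper for illustration; the keys are schema.org
    property names, the function itself is not part of any library.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # who created the dataset
        "creator": {"@type": "Organization", "name": creator},
        # when it was published
        "datePublished": date_published,
        # how the data was collected
        "measurementTechnique": collection_method,
        # what the terms are for using the data
        "license": license_url,
        "url": url,
    }


def to_script_tag(metadata):
    """Wrap the metadata in a JSON-LD script element for a web page."""
    body = json.dumps(metadata, indent=2)
    return '<script type="application/ld+json">\n' + body + '\n</script>'


# Every value below is made up for the example.
example = build_dataset_jsonld(
    name="Example City Rainfall Records",
    description="Daily rainfall totals from a hypothetical city gauge network.",
    creator="Example City Weather Office",
    date_published="2018-09-06",
    collection_method="Automated tipping-bucket rain gauges",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    url="https://data.example.org/rainfall",
)

print(to_script_tag(example))
```

The printed block would be pasted into the page that hosts the dataset, which is where a crawler could pick it up. Whether a given search engine actually indexes it depends on that engine's own requirements, which this sketch does not try to cover.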

Also in Engadget.
