Tuesday, October 02, 2018

Diffbot: Foundation of Knowledge Graphs

A massive public database and graph.  How might it be integrated with private data? 

In Datanami, a description of the Diffbot Graph:
The Graph That Knows the World, by Alex Woodie

Somewhere in a data center in Fremont, California, exists a large computer cluster that’s hoovering up every piece of data it can find from the Web and using machine learning algorithms to find connections among them. It’s arguably the largest known graph database in existence, encompassing 10 billion entities and 10 trillion edges.

No, it’s not some secret government project to catalog the world’s information. In fact, the graph was created and is run by a private company called Diffbot, and you can get access to it for as little as $300 per month.

You can’t accuse Mike Tung, the founder and CEO of Diffbot, of thinking small, or beating around the bush for that matter. During an interview last week, he got right to the point. “The purpose of our company,” he tells Datanami, “is to build the first comprehensive map of all human knowledge.”

That might sound like a crazy thing to do in 2018, a quarter century after the Web went mainstream, after the first dot-com crash, the rise of Web 2.0, the emergence of e-commerce 3.0, and the forthcoming industry 4.0 wave that’s projected to shake it all loose again. Haven’t we done this already? And isn’t that what Google and Wikipedia are for?

Diffbot CEO and founder Mike Tung graduated from Stanford University with a master’s degree in AI

Not according to Tung, who started work on the Diffbot graph while at Stanford University in 2008 and then started the Diffbot company in 2011. While it’s true that Google and Wikipedia are creating large knowledge graphs, they’re not as useful as one might think, Tung says.

“Our knowledge base is not only larger, deeper and more accurate [than Google’s and Wikipedia’s] but it’s accessible and more useful,” Tung says. “We hope that this is the first step in creating a future where…you have almost infinite access to knowledge.”

AI Crawlers

Tung says that what makes Diffbot unique, apart from its size and public nature, is how it’s assembled. While Google and Wikipedia rely largely on human labor to curate the information that goes into their graphs (and Facebook relies on its 2 billion users to create its knowledge graph), the Diffbot graph is created automatically (autonomously, really) through a variety of machine learning techniques, including computer vision, natural language processing (NLP), and others.

The Diffbot knowledge base currently has 10 billion vertices, which correspond to entities, including people, places and things. Connecting those 10 billion entities are 10 trillion edges, which are facts that can be searched through an API or DQL, the SQL-like Diffbot Query Language. ... " 
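
To make the vertex-and-edge description above concrete, here is a minimal Python sketch of a toy knowledge graph, with entities as vertices and facts as labeled edges. It is not Diffbot's API and it does not imitate DQL syntax: the class ToyKnowledgeGraph, the Fact tuple, the relation labels, and the query method are all hypothetical names made up for illustration. The sample facts are taken from the article itself.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Fact(NamedTuple):
    """One edge of the graph: a relation linking two entities."""
    subject: str
    relation: str
    obj: str


class ToyKnowledgeGraph:
    """Hypothetical in-memory graph; an assumption, not Diffbot's API or DQL."""

    def __init__(self) -> None:
        # Vertices: entity name -> entity type (person, organization, ...).
        self.entity_types: Dict[str, str] = {}
        # Edges: facts grouped by the entity they start from.
        self.facts_by_subject: Dict[str, List[Fact]] = defaultdict(list)

    def add_entity(self, name: str, entity_type: str) -> None:
        self.entity_types[name] = entity_type

    def add_fact(self, subject: str, relation: str, obj: str) -> None:
        # Both endpoints must already exist as vertices.
        for name in (subject, obj):
            if name not in self.entity_types:
                raise KeyError(f"unknown entity: {name}")
        self.facts_by_subject[subject].append(Fact(subject, relation, obj))

    def query(
        self,
        subject: Optional[str] = None,
        relation: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Fact]:
        """Return the facts that match every field that is not None."""
        if subject is not None:
            candidates = self.facts_by_subject.get(subject, [])
        else:
            candidates = [
                fact
                for facts in self.facts_by_subject.values()
                for fact in facts
            ]
        return [
            fact
            for fact in candidates
            if (relation is None or fact.relation == relation)
            and (obj is None or fact.obj == obj)
        ]


def build_example_graph() -> ToyKnowledgeGraph:
    """Load a few facts stated in the article; relation labels are made up."""
    graph = ToyKnowledgeGraph()
    graph.add_entity("Mike Tung", "person")
    graph.add_entity("Diffbot", "organization")
    graph.add_entity("Stanford University", "organization")
    graph.add_fact("Mike Tung", "founder_of", "Diffbot")
    graph.add_fact("Mike Tung", "ceo_of", "Diffbot")
    graph.add_fact("Mike Tung", "studied_at", "Stanford University")
    return graph


if __name__ == "__main__":
    graph = build_example_graph()
    for fact in graph.query(subject="Mike Tung"):
        print(fact.subject, fact.relation, fact.obj)
```

The sketch only shows the shape of the data. At the scale quoted in the article, 10 trillion edges over 10 billion entities works out to an average of about 1,000 facts per entity, which is far beyond what a single in-memory dictionary like this could hold.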

