Getting Started with Knowledge Graph

Access a graph database of over 10 billion entities (news organizations, people, and more) crawled and structured from all over the public web.

The Diffbot Knowledge Graph is a self-updating graph database of the public web.

Instead of websites, the Knowledge Graph represents web data as real entities, such as articles, organizations, and people. Entities hold meaning through their attributes. For example, articles are articles because they have a title and, usually, an author (though not always!). Organizations tend to have a business location and an employee headcount (nbEmployees).

Each entity is the result of crawling, structuring, and processing billions of pages on the public web, generating trillions of facts, and linking those facts to each entity in the Knowledge Graph.

A single entity is generally a fusion of data from several different websites.

Here's a peek at the origins field of the organization entity record of Apple, the American technology company.

"origins": [
  "seekingalpha.com/article/4501445-apple-losses-mount-in-a-big-money-pit#efficiens",
  "appleinsider.com/articles/22/02/28/qualcomm-promises-lossless-bluetooth-audio-with-new-chipset#efficiens",
  "lmd.lk/schoolchildren-in-china-work-overnight-to-produce-amazon-alexa-devices#efficiens",
  "guardian.co.uk/media/2010/nov/18/apple-iad-mobile-advertising#efficiens",
  "theguardian.com/world/2022/mar/13/china-shuts-down-business-centres-in-bid-to-halt-covid-outbreak#efficiens",
  ...
]

This list shows just 5 of the 560 total origins from which all facts for Apple in the Knowledge Graph are derived. Learn more about where and how data in the KG is sourced.
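To get a feel for how many distinct websites contribute to one entity, the origins can be grouped by domain. The following is a minimal sketch, assuming the origins field has already been loaded as a Python list of scheme-less strings like the five shown above; the helper name source_domain is hypothetical and not part of any Diffbot library.

```python
from collections import Counter

# The five sample origins from the Apple record, copied verbatim.
origins = [
    "seekingalpha.com/article/4501445-apple-losses-mount-in-a-big-money-pit#efficiens",
    "appleinsider.com/articles/22/02/28/qualcomm-promises-lossless-bluetooth-audio-with-new-chipset#efficiens",
    "lmd.lk/schoolchildren-in-china-work-overnight-to-produce-amazon-alexa-devices#efficiens",
    "guardian.co.uk/media/2010/nov/18/apple-iad-mobile-advertising#efficiens",
    "theguardian.com/world/2022/mar/13/china-shuts-down-business-centres-in-bid-to-halt-covid-outbreak#efficiens",
]


def source_domain(origin: str) -> str:
    """Return the host part of a scheme-less origin string (hypothetical helper)."""
    # Assumption: each origin starts with the host, followed by "/" and the path.
    return origin.split("/", 1)[0].lower()


# Count how many origins come from each domain.
domain_counts = Counter(source_domain(o) for o in origins)

for domain, count in domain_counts.most_common():
    print(f"{domain}: {count}")
```

On the full list of origins, the same grouping would show which sites contribute the most source pages to the entity.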

Another attribute of entities in the Knowledge Graph is that they are linked with each other. Whereas a website simply links to another site by reference, links between entities in the Knowledge Graph have meaning.

Apple's entity in the Knowledge Graph is linked to hundreds of thousands of people. The meaning behind this link can be described as employer/employee.
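The idea of a link that carries meaning can be illustrated with a toy in-memory model. This is only a sketch of the concept: the Entity class, the relationship labels, and the placeholder person are all hypothetical and do not reflect the actual Knowledge Graph schema.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Toy entity with labeled links (hypothetical, not the Diffbot schema)."""
    name: str
    entity_type: str
    # Each link is a (relationship label, target entity) pair.
    links: list = field(default_factory=list)

    def link(self, relationship: str, target: "Entity") -> None:
        """Attach a labeled link from this entity to another one."""
        self.links.append((relationship, target))

    def related(self, relationship: str) -> list:
        """Return all entities linked with the given relationship label."""
        return [target for rel, target in self.links if rel == relationship]


apple = Entity("Apple", "Organization")
person = Entity("Example Person", "Person")  # placeholder, not a real record

# Unlike a bare hyperlink, each link states what the connection means.
apple.link("employer_of", person)
person.link("employee_of", apple)

print([e.name for e in apple.related("employer_of")])
print([e.name for e in person.related("employee_of")])
```

A plain hyperlink would record only that two pages are connected; the label is what lets a query ask for "everyone employed by Apple" rather than "everything that mentions Apple".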

What does this all mean? In its simplest interpretation, the Diffbot Knowledge Graph can be:

  • A company database that never goes out of date
  • An RSS feed for every press release, blog, newspaper, or magazine on the web

With a little more imagination, the Knowledge Graph can be (and has been) a powerful web data source powering:

  • M&A target research
  • Sales intelligence and lead sourcing
  • Market monitoring

How do I access the Knowledge Graph?

There are two methods to access data in the Knowledge Graph. You can search it using a structured query language, or you can upload your existing data and enhance it with additional data from the KG.

Search

Search is the most widely applicable interface for the Diffbot Knowledge Graph. With just a few keystrokes, you will be well on your way to gathering the data you need for your application. Start here to learn more about Search.
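As a rough illustration of what a Search request might look like, the sketch below builds a request URL from a structured query. The endpoint, the parameter names (type, token, query, size), and the query string itself are assumptions made for this example and should be checked against the Search documentation; YOUR_TOKEN is a placeholder, and no network request is made.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against the Search documentation before use.
DQL_ENDPOINT = "https://kg.diffbot.com/kg/v3/dql"


def build_search_url(token: str, query: str, size: int = 10) -> str:
    """Build a Search request URL (parameter names are assumptions)."""
    params = {
        "type": "query",
        "token": token,
        "query": query,
        "size": size,
    }
    # urlencode percent-encodes the query so spaces and quotes are URL-safe.
    return f"{DQL_ENDPOINT}?{urlencode(params)}"


# Hypothetical query: organizations named "Apple", limited to one result.
url = build_search_url("YOUR_TOKEN", 'type:Organization name:"Apple"', size=1)
print(url)
```

Building the URL separately from sending it keeps the query easy to inspect and log before any API credits are spent.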

Enhance

Enhance works differently. No query language is required. Simply upload a dataset containing some identifying parameters, such as name or url. Enhance matches the values in each record to a corresponding entity in the Knowledge Graph and returns all the data it has on that entity. Start here to learn more about Enhance.
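The matching step can be pictured as one lookup per record, keyed on whatever identifying parameters the record has. The sketch below shows that idea by building one request URL per record rather than uploading a dataset; the endpoint and parameter names are assumptions to verify against the Enhance documentation, the sample records are made up for illustration, and no network request is made.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against the Enhance documentation before use.
ENHANCE_ENDPOINT = "https://kg.diffbot.com/kg/v3/enhance"

# Identifying parameters mentioned in the text above.
IDENTIFYING_FIELDS = ("name", "url")


def build_enhance_url(token: str, entity_type: str, record: dict) -> str:
    """Build an Enhance request URL for one record (parameter names are assumptions)."""
    params = {"type": entity_type, "token": token}
    # Keep only the identifying fields that actually have a value.
    for key in IDENTIFYING_FIELDS:
        if record.get(key):
            params[key] = record[key]
    return f"{ENHANCE_ENDPOINT}?{urlencode(params)}"


# Made-up sample dataset; the second record has no url value.
records = [
    {"name": "Apple", "url": "apple.com"},
    {"name": "Diffbot", "url": ""},
]

urls = [build_enhance_url("YOUR_TOKEN", "Organization", r) for r in records]
for u in urls:
    print(u)
```

Records with more identifying fields give the matcher more to work with, which is why empty values are dropped rather than sent as blank parameters.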