docs.diffbot.com got a HUGE makeover recently. We've migrated more than 250 pages from three separate documentation sites and organized multiple API specifications into a single view. Our favorite new feature is the ability to test APIs directly in the docs: enter some input parameters, submit, and get a response immediately, then copy the request code into your application. Try it out with the DQL Search API.

We've also added some brand-new pages on frequently asked topics:
• A complete Diffbot product overview.
• An explainer on how Credits work at Diffbot.
• An authentication section with its own endpoint.
• A subscribable changelog integrated into the docs.

We've added a new Organization attribute, "diffbotClassification", that expands industry classification to three or more levels. For example:

"diffbotClassification": [
  {
    "level": 3,
    "isPrimary": true,
    "name": "Display Technology Companies",
    "diffbotUri": "https://diffbot.com/entity/IC_qvY0Oloiyj"
  },
  {
    "level": 2,
    "isPrimary": true,
    "name": "Computer Hardware Companies",
    "diffbotUri": "https://diffbot.com/entity/IC_D6llNR8xOo"
  },
  {
    "level": 1,
    "isPrimary": true,
    "name": "Software Companies",
    "diffbotUri": "https://diffbot.com/entity/IC_H04NbzO6L8"
  },
  ...
]

For NAICS and SIC classifications, we nominate a "primary" classification tier in the array of matching classification codes/labels. Documentation of these fields can be found in the Ontology section of docs.diffbot.com.
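As a rough illustration of working with the "diffbotClassification" array shown above, here is a minimal Python sketch that picks the most specific primary classification. It assumes that a higher "level" value means a more specific category, which the example suggests but this changelog does not state. The helper name `deepest_primary_classification` is hypothetical and not part of any Diffbot client.

```python
from typing import Dict, List, Optional


def deepest_primary_classification(classifications: List[Dict]) -> Optional[Dict]:
    """Return the primary classification with the highest level, or None.

    Assumption: a higher "level" means a more specific category.
    """
    primary = [c for c in classifications if c.get("isPrimary")]
    if not primary:
        return None
    return max(primary, key=lambda c: c.get("level", 0))


# Sample data trimmed from the excerpt above.
sample = [
    {"level": 3, "isPrimary": True, "name": "Display Technology Companies"},
    {"level": 2, "isPrimary": True, "name": "Computer Hardware Companies"},
    {"level": 1, "isPrimary": True, "name": "Software Companies"},
]

most_specific = deepest_primary_classification(sample)
print(most_specific["name"])
```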

We are gradually rolling out support for 'technographics' as an attribute of Organizations. Ultimately, this data will come from many sources and cover all Organizations in the graph. To start, more than 8.5 million companies include some level of technographic data. Below is an example of how a technology is represented in the default JSON output (excerpted from the IBM entity).

"technographics": [
  {
    "technology": {
      "recordId": "EPdsrDmLiMQCskvBLp_dloQ@2275",
      "name": "React",
      "websiteUris": [
        "reactjs.org"
      ],
      "surfaceForm": "React",
      "position": "companyTechnographicsTechnology",
      "type": "DiffbotEntity"
    },
    "categories": [
      "JavaScript frameworks"
    ]
  },
  {
    "technology": {
      "recordId": "EPdsrDmLiMQCskvBLp_dloQ@2276",
      "name": "TrustArc",
      "websiteUris": [
        "trustarc.com"
      ],
      "surfaceForm": "TrustArc",
      "position": "companyTechnographicsTechnology",
      "type": "DiffbotEntity"
    },
    "categories": [
      "Cookie compliance"
    ]
  },
  ...
]
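To make the shape of this output concrete, the following Python sketch groups technology names by category. It relies only on the fields visible in the excerpt above ("technology.name" and "categories"). The function name `technologies_by_category` is hypothetical, and entries missing either field are simply skipped.

```python
from collections import defaultdict
from typing import Dict, List


def technologies_by_category(technographics: List[Dict]) -> Dict[str, List[str]]:
    """Map each category label to the technology names listed under it."""
    grouped = defaultdict(list)
    for entry in technographics:
        name = entry.get("technology", {}).get("name")
        if name is None:
            continue  # skip entries without a technology name
        for category in entry.get("categories", []):
            grouped[category].append(name)
    return dict(grouped)


# Sample data trimmed from the excerpt above.
sample = [
    {"technology": {"name": "React"}, "categories": ["JavaScript frameworks"]},
    {"technology": {"name": "TrustArc"}, "categories": ["Cookie compliance"]},
]

grouped = technologies_by_category(sample)
print(grouped)
```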

To browse organizations with technographic data, try this query:

type:Organization has:technographics.technology.name
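If you prefer to run this query from code rather than the docs, one possible approach is to URL-encode it into a DQL request. The sketch below only builds the request URL and makes no network call. The endpoint path and the parameter names ("type", "token", "query", "size") are assumptions here, so please confirm them against the DQL Search API documentation. `build_dql_url` and `YOUR_TOKEN` are placeholders.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the DQL Search API documentation.
DQL_ENDPOINT = "https://kg.diffbot.com/kg/v3/dql"


def build_dql_url(query: str, token: str, size: int = 10) -> str:
    """Build a GET URL for a DQL query (parameter names are assumptions)."""
    params = {"type": "query", "token": token, "query": query, "size": size}
    return f"{DQL_ENDPOINT}?{urlencode(params)}"


url = build_dql_url(
    "type:Organization has:technographics.technology.name",
    token="YOUR_TOKEN",  # placeholder, not a real token
)
print(url)
```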

We recently improved the accuracy of the estimate populating the 'nbEmployees' attribute for publicly traded US organizations in the Knowledge Graph by expanding our analysis of data from sec.gov reports. More than 70% of SEC 10-K documents now include nbEmployees for each reporting period.

We also added 'secForms' as an attribute of Organizations. Below is a JSON excerpt from IBM featuring filings from 2019:

"secForms": [
  {
    "formType": "8-K/A",
    "periodOfReport": {
      "str": "d2019-06-30",
      "precision": 3,
      "timestamp": 1561852800000
    },
    "filingDate": {
      "str": "d2019-09-20",
      "precision": 3,
      "timestamp": 1568937600000
    },
    "documentUrl": "https://www.sec.gov/ix?doc=/Archives/edgar/data/51143/000155837019008675/ibm-20190709x8ka.htm",
    "filingUrl": "https://www.sec.gov/Archives/edgar/data/51143/000155837019008675/0001558370-19-008675-index.htm"
  },
  {
    "formType": "10-Q",
    "periodOfReport": {
      "str": "d2019-06-30",
      "precision": 3,
      "timestamp": 1561852800000
    },
    "filingDate": {
      "str": "d2019-07-30",
      "precision": 3,
      "timestamp": 1564444800000
    },
    "documentUrl": "https://www.sec.gov/ix?doc=/Archives/edgar/data/51143/000155837019006560/ibm-20190630x10q.htm",
    "filingUrl": "https://www.sec.gov/Archives/edgar/data/51143/000155837019006560/0001558370-19-006560-index.htm"
  },
  ...
]
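The date objects in this excerpt carry both a string and a timestamp. As a hedged sketch, the Python below parses them and computes the number of days between the end of the reporting period and the filing date. It assumes the "str" value is an ISO date with a leading "d" and that "timestamp" is milliseconds since the Unix epoch in UTC. Both hold for the sample values above, but this changelog does not document them, and dates with a different "precision" may need other handling. The function names are hypothetical.

```python
from datetime import date, datetime, timezone
from typing import Dict


def parse_diffbot_date(value: Dict) -> date:
    """Parse the "str" field, assuming a "d" prefix followed by YYYY-MM-DD."""
    text = value["str"]
    if text.startswith("d"):
        text = text[1:]
    return date.fromisoformat(text)


def timestamp_to_date(value: Dict) -> date:
    """Convert the "timestamp" field, assuming milliseconds since epoch (UTC)."""
    return datetime.fromtimestamp(value["timestamp"] / 1000, tz=timezone.utc).date()


def filing_lag_days(form: Dict) -> int:
    """Days between the end of the reporting period and the filing date."""
    period_end = parse_diffbot_date(form["periodOfReport"])
    filed = parse_diffbot_date(form["filingDate"])
    return (filed - period_end).days


# Sample data trimmed from the excerpt above.
sec_forms = [
    {
        "formType": "8-K/A",
        "periodOfReport": {"str": "d2019-06-30", "precision": 3, "timestamp": 1561852800000},
        "filingDate": {"str": "d2019-09-20", "precision": 3, "timestamp": 1568937600000},
    },
    {
        "formType": "10-Q",
        "periodOfReport": {"str": "d2019-06-30", "precision": 3, "timestamp": 1561852800000},
        "filingDate": {"str": "d2019-07-30", "precision": 3, "timestamp": 1564444800000},
    },
]

lags = {form["formType"]: filing_lag_days(form) for form in sec_forms}
print(lags)
```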

We recently added sources to the Knowledge Graph that provide insights into 'service provider <> customer' relationships. The 'customers' attribute is populated for over 50k companies. Here is an example of two of IBM's customers represented in JSON:

"customers": [
  {
    "summary": "Major airline of the United States",
    "image": "https://kg.diffbot.com/image/api/get?fetch=yes&url=g%3Cj7P0St0DnBJf.x0KwLZrUn.%5B%3CR0Aa4Hh%3B%5Bv738ZqOr7U%3FivHo%3Ef%5B1_l%2F%5B9%40%3BAk%7BAt%3C%60e%5Bt%3DH%2FDk%7DO%28YpOr%29.oCJ",
    "types": [
      "Organization",
      "Corporation"
    ],
    "name": "American Airlines",
    "diffbotUri": "http://diffbot.com/entity/ECm88VZ0TMyq8glAXFBMYaA",
    "targetDiffbotUri": "http://diffbot.com/entity/ECm88VZ0TMyq8glAXFBMYaA",
    "targetDiffbotId": "ECm88VZ0TMyq8glAXFBMYaA",
    "surfaceForm": "American Airlines",
    "type": "Corporation"
  },
  {
    "summary": "Telecommunications company",
    "image": "https://kg.diffbot.com/image/api/get?fetch=yes&url=g%3Cj7P0St0DnBJf.x0KwLZrUn.%5B%3CR0Aa4Hh%3B%5Bv738ZqOr7U%3FDvsR%3EgB%7BYr3izczY2.K%7C%3E",
    "types": [
      "Organization",
      "Corporation"
    ],
    "name": "BT Italia",
    "diffbotUri": "http://diffbot.com/entity/EZK7tKHZCNGCUhTjGW7SaKg",
    "targetDiffbotUri": "http://diffbot.com/entity/EZK7tKHZCNGCUhTjGW7SaKg",
    "targetDiffbotId": "EZK7tKHZCNGCUhTjGW7SaKg",
    "surfaceForm": "BT Italia",
    "type": "Corporation"
  },
  ...
]

We will continue to scale up coverage over time.
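For readers consuming the "customers" array, a small Python sketch like the one below can reduce each entry to an ID-to-name lookup. It uses only the "targetDiffbotId" and "name" fields visible in the excerpt above. The helper `customer_index` is hypothetical, and entries missing either field are skipped.

```python
from typing import Dict, List


def customer_index(customers: List[Dict]) -> Dict[str, str]:
    """Map each customer's targetDiffbotId to its name."""
    index = {}
    for customer in customers:
        entity_id = customer.get("targetDiffbotId")
        name = customer.get("name")
        if entity_id and name:
            index[entity_id] = name
    return index


# Sample data trimmed from the excerpt above.
customers = [
    {"name": "American Airlines", "targetDiffbotId": "ECm88VZ0TMyq8glAXFBMYaA"},
    {"name": "BT Italia", "targetDiffbotId": "EZK7tKHZCNGCUhTjGW7SaKg"},
]

index = customer_index(customers)
print(index)
```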

List API - NEW!

by Kris Negulescu

Today we've added the List API to Diffbot's suite of extraction APIs. The List API automatically extracts data from any single web page that contains a primary list of items, such as news index pages, product listing pages, and search engine results pages. It is also embedded in the Diffbot Analyze API, which will now attempt to identify and extract pages that match the criteria for a List page. Please refer to the List API documentation for more information.
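As a rough sketch of what a List API request might look like, the Python below builds a request URL for a single page without sending it. The endpoint path and the "token" and "url" parameter names are assumptions based on the pattern of Diffbot's other extraction APIs, so check the List API documentation before relying on them. `build_list_api_url`, the example page URL, and `YOUR_TOKEN` are placeholders.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the List API documentation.
LIST_API_ENDPOINT = "https://api.diffbot.com/v3/list"


def build_list_api_url(page_url: str, token: str) -> str:
    """Build a GET URL asking the List API to extract one page (assumed params)."""
    return f"{LIST_API_ENDPOINT}?{urlencode({'token': token, 'url': page_url})}"


request_url = build_list_api_url(
    "https://example.com/news",  # hypothetical list page
    token="YOUR_TOKEN",  # placeholder, not a real token
)
print(request_url)
```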

similarTo 'id1 and id2'

by Kris Negulescu

We've added the ability to query on look-alikes using two or more Diffbot Org IDs as inputs.

similarTo(id:or(id1, id2, ...)): e.g. 'Organizations similar to Target and Walmart'

type:Organization similarTo(id:or("ExADb18D6MAmunRrlVELe8A", "EOU1WEvHYN6K83Etm91H9fQ"))
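When the list of IDs comes from code, the query string can be assembled programmatically. The Python sketch below reproduces the syntax shown above for two or more IDs. The function name `similar_to_query` is hypothetical, and it deliberately rejects fewer than two IDs because the single-ID form is not covered in this entry.

```python
from typing import Sequence


def similar_to_query(org_ids: Sequence[str]) -> str:
    """Build a DQL look-alike query from two or more Diffbot Org IDs."""
    if len(org_ids) < 2:
        raise ValueError("this sketch expects two or more Diffbot Org IDs")
    quoted = ", ".join(f'"{org_id}"' for org_id in org_ids)
    return f"type:Organization similarTo(id:or({quoted}))"


# The two IDs from the example above.
query = similar_to_query(["ExADb18D6MAmunRrlVELe8A", "EOU1WEvHYN6K83Etm91H9fQ"])
print(query)
```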

We have introduced a new model to predict revenue for private companies. Almost every company in the KG (over 243M) now has revenue either extracted from the web or estimated with this model.