We pushed several breaking changes to Article data, Job Posts, Discussion Threads, and Product Offers in the KG.

FIRST:

The 'types' array was previously sorted with the most specific type first. We reversed the order, so the most specific type now comes last, to make the array consistent with 'types' in Organization, Person, and Place entities in the graph.

SECOND:
The 'types' array previously always contained GlobalIndexDiffbotEntity for Articles, Job Posts, Discussions, and Product Offers. We removed that unnecessary 'types' designation, so that

"types": [
"Article",
"GlobalIndexDiffbotEntity"
]

became

"types": [
"Article"
]

similar to

"types": [
"Organization",
"Corporation",
"Company"
]

FINALLY:
Some 'types' arrays previously contained GlobalIndexComplexTypeWrapper as well. We removed that unnecessary 'types' designation too, so that

"types": [
"Discussion",
"GlobalIndexComplexTypeWrapper",
"GlobalIndexDiffbotEntity"
]

became

"types": [
"Discussion"
]
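
For client code that reads 'types', the changes above suggest two habits: test for membership rather than position, and tolerate both old and new payloads during a transition. Below is a minimal Python sketch, assuming the record has already been parsed into a dict. The helper names clean_types and has_type are hypothetical and not part of any Diffbot client.

```python
# Internal designations removed from 'types' (names taken from the notes above).
REMOVED_TYPES = {"GlobalIndexDiffbotEntity", "GlobalIndexComplexTypeWrapper"}


def clean_types(entity):
    """Return the entity's 'types' without the removed internal designations."""
    return [t for t in entity.get("types", []) if t not in REMOVED_TYPES]


def has_type(entity, type_name):
    """Membership test that does not depend on the order of 'types'."""
    return type_name in clean_types(entity)


# The old and new payload shapes from the examples above.
old_payload = {
    "types": ["Discussion", "GlobalIndexComplexTypeWrapper", "GlobalIndexDiffbotEntity"]
}
new_payload = {"types": ["Discussion"]}

print(clean_types(old_payload) == clean_types(new_payload))
print(has_type(new_payload, "Discussion"))
```

Code written this way should behave the same on records fetched before and after the change.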

In March, we rolled out to production two key service-layer optimizations to the Diffbot Knowledge Graph architecture. Both radically reduce response times for queries and enrichment requests. The first is the KG Engine, a look-up service that lets us improve the performance and scalability of both Enhance and DQL and update the graph between builds for Organization and Person data. The second is a new search indexing layer that aggregates all Articles in the KG that were crawled and/or published in the last six months, which reduces response times when querying recent news.

The Diffbot Knowledge Graph now supports NAICS 2022 Classifications (labels and codes) in addition to NAICS 2017 Classifications. See NAICS docs for more details.

The following is a list of arguments you can add to a Diffbot API call in order to reduce processing time when calling the Article or Analyze APIs:

  • Turn off natural language processing, which provides sentiment analysis (natural language processing adds the most time to an on-demand extraction call)
    &nl=false
    
  • Turn off entity tagging and linking in text
    &notags
    
  • Skip rendering the page. Many article publishers provide their text without the page having to be rendered. If you need full HTML output for each page, do not use this argument.
    &norender
    

Added Organization.suppliers to the graph, e.g. type:Organization suppliers.name:"Diffbot"
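
A query of this shape can be assembled and percent-encoded programmatically before it is placed in a request URL. The sketch below only builds strings. The helper names dql_term and suppliers_query are hypothetical, and because quoting rules for values that contain double quotes are not covered in these notes, the sketch rejects such values rather than guessing at an escape syntax.

```python
from urllib.parse import quote


def dql_term(field, value):
    """Format one field:"value" term in the style of the example above."""
    if '"' in value:
        # Escaping rules are not covered here, so refuse instead of guessing.
        raise ValueError("values containing double quotes are not handled in this sketch")
    return f'{field}:"{value}"'


def suppliers_query(supplier_name):
    """Build a query for organizations that list the given supplier."""
    return "type:Organization " + dql_term("suppliers.name", supplier_name)


query = suppliers_query("Diffbot")
print(query)         # type:Organization suppliers.name:"Diffbot"
print(quote(query))  # percent-encoded form, suitable for a URL query string
```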

Added support for non-software technologies such as:

Added new organization industries to the graph including:

  • Real Estate > Real Estate Investment Trusts (naics: 525930)
  • Financial Services > Banks > Central Banks (naics: 521110)
  • Financial Services > Credit Unions (naics: 522130)
  • Financial Services > Money Exchange Providers (includes foreign exchange companies and online remittance providers)
  • Services > Laundry Companies (naics: 812320)
  • Food > Dairy Companies (naics: 112120)
  • Food > Cocoa Companies (naics: 311351)
  • Medical Organizations > Cannabis Companies
  • Construction Companies > Landscaping Services (naics: 561730)
  • Environmental Organizations > Recyclable Material Companies (naics: 423930) (we separated Waste Organizations And Recycling Facilities into "Waste Organizations" and "Recyclable Material Companies")
  • Consumer Service Companies > Parking Companies (naics: 812930)
  • Hospitality Companies > Adult Entertainment Clubs
  • Retailers > Vending Machine Operators (naics: 454210)
  • Retailers > Used Merchandise Retailers (naics: 459510)
  • Educational Organizations > | Educational Institutions > K12 Schools (to check the differences with Schools and clean in case)

Coverage Reports for Bulk Enhance Jobs (CSV)
The Coverage Report is a detailed summary of attribute coverage per entity, and includes the % coverage per field across the dataset overall. The report is downloadable as a CSV from the Report API: https://kg.diffbot.com/kg/v3/enhance/bulk//. The report information is also included in the Enhance Bulk job Status API response: https://kg.diffbot.com/kg/v3/enhance/bulk//status, and can be specified for inclusion with the filter or exportspec parameters.