Use the optional '&mode=llm' param to return the extracted data as LLM-ready markdown, including interactive elements. Try it here: Website example or GitHub example.
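Below is a minimal Python sketch of how the param might be added to an extraction request. It assumes the v3 Analyze endpoint (`https://api.diffbot.com/v3/analyze`); swap in whichever Extract API endpoint you use. `build_llm_extract_url` and the `YOUR_TOKEN` placeholder are illustrative and are not part of any Diffbot SDK. The sketch only builds the URL and makes no network call.

```python
from urllib.parse import urlencode

# Assumption: the v3 Analyze endpoint; replace it with the Extract API endpoint you use.
EXTRACT_ENDPOINT = "https://api.diffbot.com/v3/analyze"


def build_llm_extract_url(token: str, page_url: str) -> str:
    """Return an extraction request URL that asks for LLM-ready markdown (hypothetical helper)."""
    params = {
        "token": token,    # your Diffbot API token
        "url": page_url,   # the page to extract
        "mode": "llm",     # the optional param described above
    }
    # urlencode percent-encodes the target page URL so it survives as a query value.
    return f"{EXTRACT_ENDPOINT}?{urlencode(params)}"


print(build_llm_extract_url("YOUR_TOKEN", "https://github.com/diffbot"))
```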
KG DATA CHANGE NOTIFICATION - Organization.naceClassification
We will be updating Organization.naceClassification to NACE Rev. 2.1 in build v437 of the Diffbot Knowledge Graph, targeted to go live in about two weeks. Please read on for more details.
Ordinarily, we take extraordinary measures to avoid breaking changes in the Diffbot Knowledge Graph ontologies. In some cases, however, there is no benefit in retaining the prior version of the data, so we replace an existing attribute with a new data format. The Organization.naceClassification field is one such case. The current NACE codes in the KG lack level information and ancestor codes, and they do not reliably mark primary codes. In addition, some of the codes are no longer valid in the latest version, NACE Rev. 2.1.
In Rev 2.1 of the NACE codes:
NACE codes are no longer strictly 4-digit numbers.
NACE codes are structured into: Sections (letters A–V, level 1) → Divisions (2 digits, level 2) → Groups (3 digits with dot, level 3) → Classes (4 digits with dot, level 4).
Codes are unique. For example, divisions 28 and 29 share the same parent section, C, but C is not repeated after 28 because it already appears earlier in the primary chain.
There is at most one primary code per level.
Primary codes are listed before non-primaries.
Specific codes (e.g., 29.10) are listed before broader ones (e.g., 29.1).
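To make the ordering rules concrete, here is a minimal Python sketch that expands class codes into their ancestor chains and orders them as described. It is not Diffbot's implementation. `DIVISION_TO_SECTION` is a hypothetical, deliberately partial lookup that covers only divisions 28 and 29; a real version would load the full Rev. 2.1 structure. Names and the version field are omitted for brevity.

```python
from typing import Dict, List

# Hypothetical, partial division -> section lookup (only what the example needs).
DIVISION_TO_SECTION: Dict[str, str] = {"28": "C", "29": "C"}


def ancestor_chain(class_code: str) -> List[str]:
    """Return [class, group, division, section] for a class such as '29.10'."""
    division, rest = class_code.split(".")
    group = f"{division}.{rest[0]}"  # '29.10' -> '29.1'
    return [class_code, group, division, DIVISION_TO_SECTION[division]]


def build_classification(primary: str, others: List[str]) -> List[dict]:
    """Primary chain first, specific before broad, each code listed only once."""
    seen = set()
    entries = []
    chains = [(primary, True)] + [(code, False) for code in others]
    for class_code, is_primary in chains:
        for level, code in zip((4, 3, 2, 1), ancestor_chain(class_code)):
            if code in seen:
                continue  # e.g. section C is already listed in the primary chain
            seen.add(code)
            entries.append({"code": code, "level": level, "isPrimary": is_primary})
    return entries


for entry in build_classification("29.10", ["28.11"]):
    print(entry)
```

Because only one chain is marked primary, the "at most one primary code per level" rule holds by construction.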
For a comparison of the existing code format versus the new Rev 2.1 format, see below.
CURRENT DATA FORMAT: NACE codes - Organization.naceClassification
Volkswagen's current NACE classification in the KG appears as follows:
[
  {
    "code": "2910",
    "isPrimary": false,
    "name": "Manufacture of motor vehicles"
  },
  {
    "code": "7022",
    "isPrimary": false,
    "name": "Business and other management consultancy activities"
  },
  {
    "code": "7021",
    "isPrimary": false,
    "name": "Public relations and communication activities"
  }
]
Issues with this data:
Missing level information
All codes are marked as non-primary
Parent codes are missing
Codes 7022 and 7021 are no longer valid in NACE Rev. 2.1
Volkswagen should not be classified under the industries that 7022 and 7021 describe
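If you need to tell old-format records apart from new ones during the transition, a check along these lines may help. This is a hypothetical helper, not part of any Diffbot API. It only flags the issues that can be detected from the record itself; spotting missing parents or codes that are invalid in Rev. 2.1 would require the official Rev. 2.1 code list, which this sketch does not include.

```python
from typing import Dict, List


def legacy_issues(entries: List[Dict]) -> List[str]:
    """Flag old-format problems in a naceClassification list (empty list = none found)."""
    issues = []
    if any("level" not in e for e in entries):
        issues.append("missing level information")
    if entries and not any(e.get("isPrimary") for e in entries):
        issues.append("all codes marked as non-primary")
    # Old-format classes are bare 4-digit strings such as "2910"; Rev. 2.1 writes
    # "29.10". Two-digit division codes like "29" are valid, so only 4 digits are flagged.
    if any(e["code"].isdigit() and len(e["code"]) == 4 for e in entries):
        issues.append("4-digit codes without Rev. 2.1 dot notation")
    return issues


current = [
    {"code": "2910", "isPrimary": False, "name": "Manufacture of motor vehicles"},
    {"code": "7022", "isPrimary": False, "name": "Business and other management consultancy activities"},
    {"code": "7021", "isPrimary": False, "name": "Public relations and communication activities"},
]
print(legacy_issues(current))
```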
NEW DATA FORMAT: NACE Rev 2.1 Codes
When the updates deploy, Volkswagen's Organization.naceClassification NACE codes will look like this:
[
  {
    "code": "29.10",
    "level": 4,
    "isPrimary": true,
    "name": "Manufacture of motor vehicles",
    "version": "Rev 2.1"
  },
  {
    "code": "29.1",
    "level": 3,
    "isPrimary": true,
    "name": "Manufacture of motor vehicles",
    "version": "Rev 2.1"
  },
  {
    "code": "29",
    "level": 2,
    "isPrimary": true,
    "name": "Manufacture of motor vehicles, trailers and semi-trailers",
    "version": "Rev 2.1"
  },
  {
    "code": "C",
    "level": 1,
    "isPrimary": true,
    "name": "MANUFACTURING",
    "version": "Rev 2.1"
  },
  {
    "code": "28.11",
    "level": 4,
    "isPrimary": false,
    "name": "Manufacture of engines and turbines, except aircraft, vehicle and cycle engines",
    "version": "Rev 2.1"
  },
  {
    "code": "28.1",
    "level": 3,
    "isPrimary": false,
    "name": "Manufacture of general-purpose machinery",
    "version": "Rev 2.1"
  },
  {
    "code": "28",
    "level": 2,
    "isPrimary": false,
    "name": "Manufacture of machinery and equipment n.e.c.",
    "version": "Rev 2.1"
  }
]
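A consumer could sanity-check records in the new format against the Rev. 2.1 rules with something like the following Python sketch. The `validate` helper is hypothetical. It treats a numeric code as an ancestor of another when it is a string prefix of it (e.g. "29.1" of "29.10"), which holds for the fixed-width code formats described above. Section letters are skipped in that check because their children are not string-prefixed. Names and versions are trimmed from the sample for brevity.

```python
from collections import Counter
from typing import Dict, List


def validate(entries: List[Dict]) -> List[str]:
    """Return rule violations found in a Rev. 2.1 naceClassification list."""
    problems = []
    codes = [e["code"] for e in entries]
    # Codes are unique.
    dupes = [c for c, n in Counter(codes).items() if n > 1]
    if dupes:
        problems.append(f"duplicate codes: {dupes}")
    # At most one primary code per level.
    primaries = Counter(e["level"] for e in entries if e["isPrimary"])
    for level, n in sorted(primaries.items()):
        if n > 1:
            problems.append(f"{n} primary codes at level {level}")
    # Primary codes are listed before non-primaries.
    flags = [e["isPrimary"] for e in entries]
    if flags != sorted(flags, reverse=True):
        problems.append("a primary code follows a non-primary code")
    # Specific codes are listed before broader ones (numeric codes only).
    for i, broad in enumerate(codes):
        for j, specific in enumerate(codes):
            if broad[0].isdigit() and specific != broad and specific.startswith(broad) and j > i:
                problems.append(f"{specific} is listed after its ancestor {broad}")
    return problems


volkswagen = [
    {"code": "29.10", "level": 4, "isPrimary": True},
    {"code": "29.1", "level": 3, "isPrimary": True},
    {"code": "29", "level": 2, "isPrimary": True},
    {"code": "C", "level": 1, "isPrimary": True},
    {"code": "28.11", "level": 4, "isPrimary": False},
    {"code": "28.1", "level": 3, "isPrimary": False},
    {"code": "28", "level": 2, "isPrimary": False},
]
print(validate(volkswagen))
```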
diffbot-small-xl is finally live on https://lmarena.ai/! Check it out on the leaderboard: https://lmarena.ai/leaderboard/search. LMArena has open-sourced the world's largest repository of organic human preferences on generative models, and these datasets are free and open to access.
You can now label and disable tokens from the Dashboard. We also recommend using a child token in production environments, since child tokens are much easier to replace if compromised.
You can now find us on Postman! We're starting with the Extract API and moving quickly to bring the rest of our APIs to Postman as well.
Postman is an API testing platform that eliminates the need to write cURL commands by hand. Its API testing UI is similar to the one in our docs, with additional features for setting up your environment, writing test scripts, and more.
Note that our primary documentation platform will continue to live on docs.diffbot.com. Postman is an extension of our docs presence to make it easier for Postman users to test Diffbot APIs on their preferred platform.
Investment Transactions are now searchable on LeadGraph! This makes it possible to:
⁃ Stay on top of recent funding rounds
⁃ Find investors that have invested in companies with particular industries, keywords, company size, etc.
⁃ See funding insights for investors, industries, funding rounds, and more.
We now offer additional troubleshooting tools in the Extract UI: a DOM view of the rendered page for all data types and an HTML button for article extractions.
You can now use the DQL search API to get company reports, such as 10-Ks, 10-Qs, and 8-Ks, for Organizations in your database at scale. To date, we have exported ~3M SEC EDGAR reports. We have also started downloading reports from Forbes Global 2000 company websites, with ~400K reports downloaded so far. This data is still a work in progress, so review outputs carefully; for example, we are still working to improve the report titles extracted from PDFs.
When exporting data from collections via DQL, you have always had the option of specifying ONLY the fields you want returned in the JSON output by using the '&filter=' param (e.g. &filter=id%20name%20homepageUri appended to a query).
But this approach can be unwieldy if you have a long list of attributes to include or if you only want to exclude a few attributes per entity in the output.
Now, instead of specifying the fields you want to include, you can exclude the fields you do not want returned when exporting data by using the &filterExclude= param (e.g. &filterExclude=subsidiaries%20technologies%20customers).
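As an illustration, the Python sketch below builds such an export request. The endpoint `https://kg.diffbot.com/kg/v3/dql` is an assumption, so confirm it in the DQL docs. `build_dql_export_url`, the sample query, and `YOUR_TOKEN` are hypothetical. The sketch only builds the URL. Passing `quote_via=quote` makes spaces encode as `%20`, matching the examples above, rather than as `urlencode`'s default `+`.

```python
from typing import List
from urllib.parse import quote, urlencode

# Assumption: the Knowledge Graph DQL endpoint; confirm the exact URL in the DQL docs.
DQL_ENDPOINT = "https://kg.diffbot.com/kg/v3/dql"


def build_dql_export_url(token: str, query: str, exclude: List[str]) -> str:
    """Build a DQL request that drops the listed fields from each returned entity."""
    params = {
        "token": token,
        "query": query,
        # Field names are space-separated, as in the &filterExclude= example above.
        "filterExclude": " ".join(exclude),
    }
    # quote_via=quote encodes spaces as %20 instead of urlencode's default "+".
    return f"{DQL_ENDPOINT}?{urlencode(params, quote_via=quote)}"


url = build_dql_export_url(
    "YOUR_TOKEN",
    'type:Organization name:"Volkswagen"',
    ["subsidiaries", "technologies", "customers"],
)
print(url)
```

The same pattern works for the include-only '&filter=' param: swap the parameter name and pass the fields you want to keep.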