The Diffbot API now supports link extraction from an XML sitemap using the Analyze API and the 'fields=allLinks' param. Sample API call: https://api.diffbot.com/v3/analyze?url=https://www.alwihdainfo.com/sitemap-posts.xml&fields=allLinks&token=YOUR-TOKEN
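As a minimal sketch, the sample call above can be assembled in Python with the standard library. The helper name build_sitemap_links_url is hypothetical, and the sketch only builds the request URL; it makes no assumption about the shape of the response.

```python
from urllib.parse import urlencode

# Endpoint taken from the sample API call above.
ANALYZE_ENDPOINT = "https://api.diffbot.com/v3/analyze"


def build_sitemap_links_url(sitemap_url: str, token: str) -> str:
    """Return an Analyze API URL that requests allLinks for a sitemap.

    Hypothetical helper: it percent-encodes the sitemap URL so that any
    query string inside it cannot clash with the API's own parameters.
    """
    query = urlencode({"url": sitemap_url, "fields": "allLinks", "token": token})
    return f"{ANALYZE_ENDPOINT}?{query}"


if __name__ == "__main__":
    print(build_sitemap_links_url(
        "https://www.alwihdainfo.com/sitemap-posts.xml", "YOUR-TOKEN"))
```

Percent-encoding the url value is a defensive choice here, not something the sample call requires.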
We have added support for the optional query parameter noredirects to the Diffbot API. This parameter prevents the Diffbot API from automatically following HTTP redirects for the submitted URL, giving you more control over the extraction process.
Usage
To use this parameter, append noredirects to your API call URL.
Example Request:
https://api.diffbot.com/v3/article?token=<YOUR_TOKEN>&url=<ARTICLE_URL>&noredirects
Error Handling
When the noredirects parameter is used, the API will not follow a redirect. Instead, if a redirect is required to access the page content, the API will return an HTTP 500 Internal Server Error with a specific JSON response body. The final, redirected URL is not included in the response.
Example Error Response (HTTP 500):
{ "errorCode": 500, "error": "This page requires a redirect. Please retry with redirects enabled if this url needs to be extracted." }
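The request and error handling described above could be sketched in Python as follows. This is an illustration under stated assumptions, not an official client: the helper names are hypothetical, the v3 Article endpoint is assumed as the target, and matching on the message text "requires a redirect" is a heuristic based only on the example response shown above.

```python
import json
from urllib.parse import urlencode

# Assumption: the v3 Article endpoint; any extraction API could be substituted.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_noredirects_url(page_url: str, token: str) -> str:
    """Build a request URL with the valueless noredirects flag appended."""
    query = urlencode({"token": token, "url": page_url})
    return f"{ARTICLE_ENDPOINT}?{query}&noredirects"


def is_redirect_refusal(status_code: int, body: str) -> bool:
    """Return True if a response looks like the noredirects error above.

    Heuristic: HTTP 500, a JSON object with errorCode 500, and an error
    message mentioning that the page requires a redirect.
    """
    if status_code != 500:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    message = str(payload.get("error", ""))
    return payload.get("errorCode") == 500 and "requires a redirect" in message
```

A caller might treat a True result as "the original content has moved" and skip the URL, while treating any other HTTP 500 as a genuine server error worth retrying.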
Primary Use Cases & Benefits
This parameter is most beneficial when using specific extraction APIs, such as the Article API or Product API, rather than the Diffbot Analyze API. Use it to:
Prevent unwanted extractions: for example, keep an outdated article or product offer URL from silently redirecting to a general index page or homepage when the original content is no longer available. This avoids unintentionally extracting the first item from a list on the index page.
Control the extraction source: ensure that the extraction is performed only on the exact URL submitted, so developers can be certain of the data source.
Use the optional mode param, i.e. '&mode=llm', to return the extracted data as LLM-ready markdown, including interactive elements. Try it here: Website example or Github example.
KG DATA CHANGE NOTIFICATION - Organization.naceClassification
We will be updating Organization.naceClassification to NACE Rev. 2.1 in build v437 of the Diffbot Knowledge Graph, targeted to go live in about two weeks. Please read on for more details.
Ordinarily, we take extraordinary measures to avoid breaking changes in the Diffbot Knowledge Graph ontologies. However, in some cases, there is no benefit in retaining a prior version of the data, so we replace an existing attribute with a new data format. The Organization.naceClassification field is one such case. The current version of the NACE codes in the KG lacks level information, meaningful isPrimary values, and ancestor codes. In addition, some of the codes are no longer valid in the latest NACE Rev. 2.1 version.
In Rev 2.1 of the NACE codes:
NACE codes are no longer strictly 4-digit numbers.
NACE codes are structured into: Sections (letters A–V, level 1) → Divisions (2 digits, level 2) → Groups (3 digits with dot, level 3) → Classes (4 digits with dot, level 4).
Codes are unique within the list. For example, 28 and 29 share the same parent, C, but C is not repeated after 28 because it already appears earlier in the primary chain.
There is at most one primary code per level.
Primary codes are listed before non-primaries.
Specific codes (e.g., 29.10) are listed before broader ones (e.g., 29.1).
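The level structure above can be illustrated with a small Python sketch that infers a code's level from its shape alone. The function name nace_level and the regular expressions are assumptions for illustration; the sketch checks only the format described above, not whether a code actually exists in NACE Rev. 2.1.

```python
import re

# Shape of a code at each level, following the structure described above.
_LEVEL_PATTERNS = [
    (1, re.compile(r"[A-V]")),          # Section: one letter A-V
    (2, re.compile(r"\d{2}")),          # Division: 2 digits
    (3, re.compile(r"\d{2}\.\d")),      # Group: 3 digits with dot
    (4, re.compile(r"\d{2}\.\d{2}")),   # Class: 4 digits with dot
]


def nace_level(code: str) -> int:
    """Return the level (1-4) implied by the format of a Rev 2.1 style code.

    Raises ValueError for shapes that match no level, such as the old
    undotted 4-digit form "2910".
    """
    for level, pattern in _LEVEL_PATTERNS:
        if pattern.fullmatch(code):
            return level
    raise ValueError(f"not a recognized NACE Rev 2.1 code shape: {code!r}")
```

A check like this can help flag records that still carry the old 4-digit format after the update.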
For a comparison of the existing code format versus the new Rev 2.1 format, see below.
CURRENT DATA FORMAT: NACE codes - Organization.naceClassification
Volkswagen's current NACE classification in the KG appears as follows:
[
{
"code": "2910",
"isPrimary": false,
"name": "Manufacture of motor vehicles"
},
{
"code": "7022",
"isPrimary": false,
"name": "Business and other management consultancy activities"
},
{
"code": "7021",
"isPrimary": false,
"name": "Public relations and communication activities"
}
]
Issues with this data:
Missing level information
All codes are marked as non-primary
Parent codes are missing
Codes 7022 and 7021 are no longer valid in the Rev. 2.1 version of the codes
Volkswagen should not be classified under the industries that 7022 and 7021 denote.
NEW DATA FORMAT: NACE Rev 2.1 Codes
When the updates deploy, Volkswagen's Organization.naceClassification NACE codes will look like this:
[
{
"code": "29.10",
"level": 4,
"isPrimary": true,
"name": "Manufacture of motor vehicles",
"version": "Rev 2.1"
},
{
"code": "29.1",
"level": 3,
"isPrimary": true,
"name": "Manufacture of motor vehicles",
"version": "Rev 2.1"
},
{
"code": "29",
"level": 2,
"isPrimary": true,
"name": "Manufacture of motor vehicles, trailers and semi-trailers",
"version": "Rev 2.1"
},
{
"code": "C",
"level": 1,
"isPrimary": true,
"name": "MANUFACTURING",
"version": "Rev 2.1"
},
{
"code": "28.11",
"level": 4,
"isPrimary": false,
"name": "Manufacture of engines and turbines, except aircraft, vehicle and cycle engines",
"version": "Rev 2.1"
},
{
"code": "28.1",
"level": 3,
"isPrimary": false,
"name": "Manufacture of general-purpose machinery",
"version": "Rev 2.1"
},
{
"code": "28",
"level": 2,
"isPrimary": false,
"name": "Manufacture of machinery and equipment n.e.c.",
"version": "Rev 2.1"
}
]
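For consumers of the new format, the following Python sketch shows one way to read the primary chain and sanity-check the listing rules described earlier. The function names are hypothetical, and the sample data is the Volkswagen example above reduced to the fields the sketch uses.

```python
from collections import Counter

# Volkswagen example from above, trimmed to the fields used below.
VOLKSWAGEN = [
    {"code": "29.10", "level": 4, "isPrimary": True},
    {"code": "29.1", "level": 3, "isPrimary": True},
    {"code": "29", "level": 2, "isPrimary": True},
    {"code": "C", "level": 1, "isPrimary": True},
    {"code": "28.11", "level": 4, "isPrimary": False},
    {"code": "28.1", "level": 3, "isPrimary": False},
    {"code": "28", "level": 2, "isPrimary": False},
]


def primary_chain(codes):
    """Return the primary codes in listed order (most specific first)."""
    return [entry["code"] for entry in codes if entry["isPrimary"]]


def check_primary_rules(codes):
    """Return a list of rule violations; an empty list means none found."""
    problems = []
    # Rule: at most one primary code per level.
    per_level = Counter(e["level"] for e in codes if e["isPrimary"])
    for level, count in sorted(per_level.items()):
        if count > 1:
            problems.append(f"level {level} has {count} primary codes")
    # Rule: primary codes are listed before non-primaries.
    flags = [e["isPrimary"] for e in codes]
    if flags != sorted(flags, reverse=True):
        problems.append("a non-primary code is listed before a primary code")
    # Rule: codes are unique within the list.
    if len({e["code"] for e in codes}) != len(codes):
        problems.append("duplicate codes found")
    return problems
```

With the sample data, the first element of primary_chain(VOLKSWAGEN) is the most specific primary code, which is often the only value a downstream filter needs.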
diffbot-small-xl is finally live on https://lmarena.ai/! Check out: https://lmarena.ai/leaderboard/search. LMArena has open-sourced the largest repository of organic human preferences on generative models in the world. These datasets are free and open to access.
You can now label and disable tokens from the Dashboard. Be sure to use a child token in production environments: child tokens are much easier to replace if compromised.
You can now find us on Postman! We're starting with Extract API, and moving quickly to get the rest of our APIs on Postman as well.
Postman is an API testing platform that eliminates the need to manually write cURL commands. The API testing UI is quite similar to what we have in the docs, with even more features to set up your environment, testing scripts, and more.
Note that our primary documentation platform will continue to live on docs.diffbot.com. Postman is an extension of our docs presence to make it easier for Postman users to test Diffbot APIs on their preferred platform.
Investment Transactions are now searchable on LeadGraph! This makes it possible to:
⁃ Stay on top of recent funding rounds
⁃ Find investors that have invested in companies with particular industries, keywords, company size, etc.
⁃ See funding insights for investors, industries, funding rounds, and more.