Custom Headers

Diffbot supports sending the following custom headers with the Extract API, Bulk Extract API, and Crawl API. Diffbot uses these headers when it requests content from third-party sites:

User-Agent
Referer
Cookie
Accept-Language
X-Evaluate

User-Agent, Referer, Accept-Language

Create a new RequestBin. We'll use this to test that our custom headers are coming through.

There are several ways to attach custom headers to API requests.

Direct

Prefix any header you want forwarded with X-Forward-. For example, to send the User-Agent foobar, use the header X-Forward-User-Agent: foobar.

Here's an example as a cURL request:

curl --location --request GET 'https://api.diffbot.com/v3/article?token=MY_TOKEN&url=https%3A%2F%2Fen17uofqrlcgv.x.pipedream.net%2F' \
--header 'X-Forward-User-Agent: foobar' \
--header 'X-Forward-Referer: Diffbot.com' \
--header 'X-Forward-Accept-Language: hr'

These headers apply only to this call and are discarded afterwards, so they must be added again on every subsequent call.
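The same request can be assembled programmatically. The following is a minimal Python sketch of the prefixing convention described above, not an official client: the helper names forward_headers and build_request are hypothetical, the https:// scheme on the endpoint is an assumption, and MY_TOKEN is a placeholder. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the article endpoint from the cURL example, reached over HTTPS.
API_ENDPOINT = "https://api.diffbot.com/v3/article"


def forward_headers(headers):
    """Prefix each header name with X-Forward- so it is forwarded to the target site."""
    return {f"X-Forward-{name}": value for name, value in headers.items()}


def build_request(token, target_url, headers):
    """Build (but do not send) an API request carrying the forwarded headers."""
    query = urlencode({"token": token, "url": target_url})
    return Request(f"{API_ENDPOINT}?{query}", headers=forward_headers(headers))


request = build_request(
    "MY_TOKEN",  # placeholder token
    "https://en17uofqrlcgv.x.pipedream.net/",
    {"User-Agent": "foobar", "Referer": "Diffbot.com", "Accept-Language": "hr"},
)
print(request.full_url)
```

To actually issue the call, pass the request to urllib.request.urlopen after substituting a real token. Because the headers live on the request object, they have to be supplied again for each new request, matching the per-call behavior described above.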

Rule-based

To permanently attach headers to a rule for an API, you can download raw rule data, modify it, and upload it back to your token, replacing the old rule setup.

A rule with permanently attached headers might look like this:

{
    "xForwardHeaders": {
        "User-Agent": [
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Safari/604.1.38"
        ],
        "X-Evaluate": "function() { start(); setTimeout(function() { setTimeout(function() { end(); }, 5000); }, 25000);}",
        "Cookie": "foo=bar"
    },
    "rules": [],
    "api": "all",
    "urlPattern": "(http(s)?://)?(.*\\.)?mysite.com.*",
    "testUrl": ""
}

Re-uploading this JSON adds these headers to all calls issued to URLs matching mysite.com. Notice that User-Agent is an array. If you supply User-Agent as an array (only possible through this method), Diffbot randomly picks one entry from the list when accessing a site. This helps throw off bot-detection algorithms that look for a single repeated User-Agent.
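Editing the downloaded rule data by hand is error-prone, so the modification step can be scripted. The sketch below assumes the rule has already been downloaded and parsed into a Python dict; the helper name attach_headers and the placeholder User-Agent strings are hypothetical, and downloading and re-uploading the rule data are left out.

```python
import copy
import json


def attach_headers(rule, headers):
    """Return a copy of a rule with the given headers merged into xForwardHeaders."""
    updated = copy.deepcopy(rule)
    updated.setdefault("xForwardHeaders", {}).update(headers)
    return updated


# A rule without headers, shaped like the example above.
rule = {
    "rules": [],
    "api": "all",
    "urlPattern": r"(http(s)?://)?(.*\.)?mysite.com.*",
    "testUrl": "",
}

updated_rule = attach_headers(rule, {
    # Placeholder values: a list of User-Agent strings and a single cookie.
    "User-Agent": ["example-agent-one", "example-agent-two"],
    "Cookie": "foo=bar",
})

# json.dumps quotes the keys and escapes the backslash in urlPattern.
print(json.dumps(updated_rule, indent=4))
```

Working on a deep copy leaves the downloaded rule untouched, which makes it easy to compare the old and new versions before uploading.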

Note: X-Evaluate is explained below.

Dashboard

The Dashboard also allows you to permanently add some headers to a rule. When creating a new custom rule, use the Custom Headers section to enter any headers you wish to add. This will save the headers in the same way as the JSON approach above.

Cookie

The Cookie header allows you to:

simulate a login session
remove annoying GDPR and newsletter popups
ignore ads and content you don't want to extract

For a comprehensive guide on using the Cookie header to simulate a login session, please see How do I extract content behind logins?
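A Cookie header value is a list of name=value pairs separated by "; ". The following minimal sketch builds such a value and attaches it using the X-Forward- prefix from the Direct method. The helper name cookie_header and the cookie names and values (sessionid, cookie_consent) are hypothetical; the cookies a real site expects must be taken from that site.

```python
def cookie_header(cookies):
    """Join name/value pairs into a single Cookie header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Hypothetical cookies: one for a login session, one to dismiss a consent popup.
cookies = {"sessionid": "abc123", "cookie_consent": "accepted"}

headers = {"X-Forward-Cookie": cookie_header(cookies)}
print(headers)
```

The same string can also be stored as the Cookie value in a rule's xForwardHeaders, as in the rule-based example.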

X-Evaluate

See Custom Javascript.