How to Use Querystrings in Crawl and Bulk Extract

A querystring is an optional setting that lets you pass additional parameters to your crawls and bulk extractions.

These parameters generally include the query parameters supported by the selected Extract API. For example, the Analyze API supports a query parameter called mode, which restricts extraction to pages classified as the specified type. This query parameter can be passed to Crawl as a querystring.

Supported Querystring Options

Supported query parameters change depending on the chosen Extract API. Review the respective API reference pages for a full list of available Extract API parameters.

Crawl and Bulk Extract also support a few utility querystring options, listed below.

Parameter    Value            Description
&links       <leave blank>    Extracts JavaScript-generated links in addition to links in the raw HTML source.

Using Querystrings in the Dashboard

When creating a Crawl or Bulk Extract, you have the option to include a querystring with your Extract API. In the Diffbot Dashboard, this option is labeled "Querystring".

Simply add the desired parameters to the form field following standard querystring convention.

Screenshot of a New Crawl in the Diffbot Dashboard. The Querystring form field has its value set to "mode=article&links".

In the example above, with the Querystring field set to ?mode=article&links and the API set to analyze, your crawl will look for JavaScript-generated links on each page and extract only pages classified as articles.

📘

Pro Tip

Setting the querystring field value to ?links&mode=article is equivalent to ?mode=article&links.
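The equivalence above follows from standard querystring convention, where parameters form an unordered set joined by "&". As a minimal illustration only (normalize_qs is a hypothetical helper, and this is not a description of how Diffbot parses the field), the sketch below strips the leading "?", splits on "&", and sorts the parts, so both orderings normalize to the same string.

```shell
#!/usr/bin/env bash
# normalize_qs is a hypothetical helper for illustration only.
# It strips a leading "?", splits the querystring on "&",
# sorts the parameters, and joins them back together with "&".
normalize_qs() {
  local qs="${1#\?}"
  printf '%s\n' "$qs" | tr '&' '\n' | sort | paste -sd '&' -
}

# Compare two orderings of the same parameters:
#   a="$(normalize_qs '?links&mode=article')"
#   b="$(normalize_qs '?mode=article&links')"
#   [ "$a" = "$b" ] && echo "equivalent"
```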

Using Querystrings with Crawl or Bulk Extract API

When creating a Crawl or Bulk Extract through the API, you have the option to include a querystring with your apiUrl. Query parameters are appended to the apiUrl the same way you would append them in a typical GET request. The example below creates a crawl this way.

curl --location 'https://api.diffbot.com/v3/crawl' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data-urlencode 'token=<YOUR DIFFBOT TOKEN>' \
--data-urlencode 'name=diffbot-querystring-demo' \
--data-urlencode 'seeds=https://example.com/' \
--data-urlencode 'apiUrl=https://api.diffbot.com/v3/analyze?mode=article&links'
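If you create crawls from a script, it can help to assemble the apiUrl from its parts instead of hand-writing it. The sketch below is one possible approach, not an official client: build_api_url and create_crawl are hypothetical helper names, DIFFBOT_TOKEN is an assumed environment variable, and the parameter values are assumed to need no extra escaping. Valueless options such as links are passed as bare flags.

```shell
#!/usr/bin/env bash
# build_api_url (hypothetical helper): append querystring parameters
# to an Extract API endpoint. Arguments after the endpoint are either
# "key=value" pairs or bare flags such as "links".
# Assumption: values contain no characters that need URL escaping.
build_api_url() {
  local endpoint="$1"
  shift
  local qs=""
  local param
  for param in "$@"; do
    if [ -z "$qs" ]; then
      qs="$param"
    else
      qs="$qs&$param"
    fi
  done
  if [ -n "$qs" ]; then
    printf '%s?%s\n' "$endpoint" "$qs"
  else
    printf '%s\n' "$endpoint"
  fi
}

# create_crawl (hypothetical helper): send the same form fields as the
# curl example above. DIFFBOT_TOKEN is an assumed environment variable.
create_crawl() {
  local name="$1" seed="$2" api_url="$3"
  curl --location 'https://api.diffbot.com/v3/crawl' \
    --header 'Content-Type: application/x-www-form-urlencoded' \
    --data-urlencode "token=${DIFFBOT_TOKEN}" \
    --data-urlencode "name=${name}" \
    --data-urlencode "seeds=${seed}" \
    --data-urlencode "apiUrl=${api_url}"
}

# Usage (commented out so that sourcing this file sends no request):
#   api_url="$(build_api_url 'https://api.diffbot.com/v3/analyze' 'mode=article' 'links')"
#   create_crawl 'diffbot-querystring-demo' 'https://example.com/' "$api_url"
```

Because --data-urlencode encodes the whole apiUrl value, the "?" and "&" inside it reach the Crawl API as part of a single form field, just as in the hand-written example.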