How to Use Querystrings in Crawl and Bulk Extract
Querystrings are an optional setting that lets you pass additional parameters to your crawls and bulk extractions.
These parameters generally include the query parameters supported by the selected Extract API. For example, the Analyze API supports a query parameter called mode that extracts only pages classified as the specified type. This query parameter can be passed to Crawl as a querystring.
Supported Querystring Options
Supported query parameters change depending on the chosen Extract API. Review the respective API reference pages for a full list of available Extract API parameters.
Crawl and Bulk Extract also support a few utility querystring options. See below for a list.
Parameter | Value | Description |
---|---|---|
&links | <leave blank> | Extracts JavaScript-generated links in addition to links in the raw HTML source. |
Using Querystrings in the Dashboard
When creating a Crawl or Bulk Extract, you have the option to include a querystring with your Extract API. In the Diffbot Dashboard, this option is labeled "Querystring".
Simply add the desired parameters to the form field following standard querystring convention.
For example, with the Querystring field set to ?mode=article&links and the API set to analyze, your crawl will look for JavaScript-generated links on the page and extract only pages classified as articles.
Pro Tip
Setting the Querystring field value to ?links&mode=article is equivalent to ?mode=article&links.
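As a quick local check of the tip above, the sketch below compares two querystrings by their parameter sets, ignoring order. The helper name normalize_querystring is hypothetical and not part of any Diffbot tooling. It only confirms that both strings carry the same parameters; that Diffbot treats them identically is stated by the tip itself, not verified here.

```shell
# Hypothetical helper (an assumption for illustration, not a Diffbot tool):
# strips a leading "?" and sorts the "&"-separated parameters so that two
# orderings of the same querystring can be compared as plain strings.
normalize_querystring() {
  printf '%s' "${1#\?}" | tr '&' '\n' | sort | paste -sd '&' -
}

a="$(normalize_querystring '?links&mode=article')"
b="$(normalize_querystring '?mode=article&links')"

# Both orderings normalize to the same string.
if [ "$a" = "$b" ]; then
  echo "equivalent"
fi
```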
Using Querystrings with Crawl or Bulk Extract API
When creating a Crawl or Bulk Extract, you have the option to include a querystring with your apiUrl. Query parameters can be appended to the apiUrl the same way you would append them in a typical GET request. An example crawl is shown below.
curl --location 'https://api.diffbot.com/v3/crawl' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data-urlencode 'token=<YOUR DIFFBOT TOKEN>' \
--data-urlencode 'name=diffbot-querystring-demo' \
--data-urlencode 'seeds=https://example.com/' \
--data-urlencode 'apiUrl=https://api.diffbot.com/v3/analyze?mode=article&links'
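If you assemble the apiUrl value in a script, the appending step can be sketched as a small shell function. The name build_api_url is hypothetical, and the sketch assumes each Extract API lives under https://api.diffbot.com/v3/ as in the example above. It makes no network call; it only builds the string.

```shell
# Hypothetical helper (an assumption for illustration, not a Diffbot tool).
# $1 is the Extract API name (e.g. analyze); each remaining argument is one
# query parameter, either "key=value" or a bare key such as "links".
build_api_url() {
  api="$1"
  shift
  url="https://api.diffbot.com/v3/${api}"
  sep='?'
  for param in "$@"; do
    # The first parameter is joined with "?", every later one with "&".
    url="${url}${sep}${param}"
    sep='&'
  done
  printf '%s\n' "$url"
}

# Reproduces the apiUrl used in the crawl example.
build_api_url analyze mode=article links
```

The result can then be passed to curl in place of the hard-coded value, for example with --data-urlencode "apiUrl=$(build_api_url analyze mode=article links)".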