Create a Crawl

Create and start a job that spiders a site and extracts its pages.

Form Data
string, required, defaults to test-crawl

Job name. This should be a unique identifier and can be used to modify your crawl or retrieve its output.

string, required, defaults to https://example.com

Seed URL(s). Must be URL encoded. Separate multiple URLs with whitespace to spider multiple sites within the same crawl. If a seed contains a non-www subdomain (for example, "https://blog.diffbot.com" or "https://docs.diffbot.com"), Crawl restricts spidering to that subdomain.
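As a rough illustration of the encoding rule for seeds, the sketch below joins several seed URLs with a single space and then URL-encodes the combined string using Python's standard library. The helper name encode_seeds is hypothetical, and encoding the separating space as "+" (the form-encoding convention) is an assumption rather than something this page states.

```python
from urllib.parse import quote_plus, unquote_plus


def encode_seeds(seed_urls):
    """Join seed URLs with single spaces, then URL-encode the whole string.

    Hypothetical helper: the whitespace separator comes from the parameter
    description; encoding the space as '+' is an assumption.
    """
    joined = " ".join(url.strip() for url in seed_urls)
    return quote_plus(joined)


seeds_value = encode_seeds(
    ["https://blog.diffbot.com", "https://docs.diffbot.com"]
)
print(seeds_value)

# Decoding recovers the original whitespace-separated list.
print(unquote_plus(seeds_value).split())
```

If the form body is built with a library that encodes values automatically, pass the space-joined string unencoded so it is not encoded twice.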

string, required, defaults to https://api.diffbot.com/v3/analyze

Full Extract API URL through which to process pages. For example, &apiUrl=https://api.diffbot.com/v3/analyze processes matching links via the Analyze API, which automatically determines each page's type before extracting it. The Extract API URL can include querystring parameters to tailor the output.

For example, &apiUrl=https://api.diffbot.com/v3/product?fields=querystring,meta will process matching links using the Product API, and also return the querystring and meta fields.
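The three fields above could be assembled into a form-encoded request body along the following lines. This is a minimal sketch, not the official client: only apiUrl appears by name on this page, so the field names name and seeds are assumptions, as is the helper build_crawl_form. The sketch only builds the body; the endpoint and authentication are not covered here and are left out.

```python
from urllib.parse import urlencode, parse_qs


def build_crawl_form(name, seeds, api_url):
    """Return a form-encoded body for a crawl job (hypothetical helper)."""
    fields = {
        "name": name,               # assumed field name for the job name
        "seeds": " ".join(seeds),   # assumed field name; whitespace-separated
        "apiUrl": api_url,          # field name taken from the text above
    }
    # urlencode percent-encodes each value, so the "?", "=" and "," inside
    # the Extract API URL cannot be mistaken for top-level form parameters.
    return urlencode(fields)


body = build_crawl_form(
    name="test-crawl",
    seeds=["https://example.com"],
    api_url="https://api.diffbot.com/v3/product?fields=querystring,meta",
)
print(body)

# Round trip: the nested querystring survives intact inside apiUrl.
print(parse_qs(body)["apiUrl"][0])
```

Encoding the whole apiUrl value matters most when it carries its own querystring, as in the Product API example: left unencoded, a second parameter inside that querystring would be read as a separate form field.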
