Crawlbot API
The Crawlbot API allows you to programmatically manage Crawlbot crawls and retrieve output. Crawlbot API responses are in JSON format.
Creating or Updating a Crawl
Note that the limit of active crawls on a single token is 1000. More information here.
To create a crawl, make a POST request to https://api.diffbot.com/v3/crawl.
Provide the following data:
Argument | Description |
---|---|
token | Developer token. |
name | Job name. This should be a unique identifier and can be used to modify your crawl or retrieve its output. |
seeds | Seed URL(s). Must be URL encoded. Separate multiple URLs with whitespace to spider multiple sites within the same crawl. If the seed contains a non-www subdomain ("https://blog.diffbot.com" or "https://docs.diffbot.com"), Crawlbot will restrict spidering to the specified subdomain. |
apiUrl | Full Diffbot API URL through which to process pages. E.g., &apiUrl=https://api.diffbot.com/v3/article to process matching links via the Article API. The Diffbot API URL can include querystring parameters to tailor the output. For example, &apiUrl=https://api.diffbot.com/v3/product?fields=querystring,meta will process matching links using the Product API, and also return the querystring and meta fields. To automatically identify and process content using our Analyze API (Smart Processing), pass apiUrl=https://api.diffbot.com/v3/analyze?mode=auto to return all page-types. See the full Analyze API documentation under the Automatic APIs documentation. Be sure to URL encode your Diffbot API actions. |
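As a minimal sketch, a crawl-creation request might look like the following (Python with the requests library; the token value, job name, and seed URL are placeholders, and the arguments are passed as form data, which requests URL-encodes automatically):

```python
import requests

TOKEN = "YOUR_DIFFBOT_TOKEN"  # placeholder developer token

params = {
    "token": TOKEN,
    "name": "sampleCrawl",                           # hypothetical job name
    "seeds": "https://blog.diffbot.com",             # seed URL(s), whitespace-separated
    "apiUrl": "https://api.diffbot.com/v3/article",  # process matching pages with the Article API
}

# requests URL-encodes the form values automatically.
resp = requests.post("https://api.diffbot.com/v3/crawl", data=params)
print(resp.json().get("response"))  # e.g. "Successfully added urls for spidering."
```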
You can refine your crawl using the following optional controls. Read more on crawling versus processing.
Argument | Description |
---|---|
urlCrawlPattern | Specify ||-separated strings to limit pages crawled to those whose URLs contain any of the content strings. You can use the exclamation point to specify a negative string, e.g. !product to exclude URLs containing the string "product," and the ^ and $ characters to limit matches to the beginning or end of the URL. The use of a urlCrawlPattern will allow Crawlbot to spider outside of the seed domain; it will follow all matching URLs regardless of domain. |
urlCrawlRegEx | Specify a regular expression to limit pages crawled to those URLs that contain a match to your expression. This will override any urlCrawlPattern value. |
urlProcessPattern | Specify ||-separated strings to limit pages processed to those whose URLs contain any of the content strings. You can use the exclamation point to specify a negative string, e.g. !/category to exclude URLs containing the string "/category," and the ^ and $ characters to limit matches to the beginning or end of the URL. |
urlProcessRegEx | Specify a regular expression to limit pages processed to those URLs that contain a match to your expression. This will override any urlProcessPattern value. |
pageProcessPattern | Specify ||-separated strings to limit pages processed to those whose HTML contains any of the content strings. |
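As an illustrative sketch, these pattern controls are passed alongside the required arguments in the same creation request (the job name, seed, and pattern strings below are hypothetical, not drawn from the examples above):

```python
import requests

TOKEN = "YOUR_DIFFBOT_TOKEN"  # placeholder developer token

params = {
    "token": TOKEN,
    "name": "productCrawl",                          # hypothetical job name
    "seeds": "https://shop.example.com",             # hypothetical seed URL
    "apiUrl": "https://api.diffbot.com/v3/product",
    # Crawl only URLs containing "/category/" or "/item/", but skip any containing "/outlet/".
    "urlCrawlPattern": "/category/||/item/||!/outlet/",
    # Process only URLs ending in ".html".
    "urlProcessPattern": ".html$",
}

requests.post("https://api.diffbot.com/v3/crawl", data=params)
```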
Additional (optional) Parameters:
Argument | Description |
---|---|
customHeaders | Set multiple custom headers to be used while crawling and processing pages sent to Diffbot APIs. Each header should be sent in its own customHeaders argument, with a colon delimiting the header name and value, and should be URL-encoded. For example, &customHeaders=Accept-Language%3Aen-us . See more information on using this functionality. |
useCanonical | Pass useCanonical=0 to disable deduplication of pages based on a canonical link definition. See more. |
obeyRobots | Pass obeyRobots=0 to ignore a site's robots.txt instructions. |
restrictDomain | Pass restrictDomain=0 to allow limited crawling across subdomains/domains. See more. |
useProxies | Set value to 1 to force the use of proxy IPs for the crawl. This will utilize proxy servers for both crawling and processing of pages. |
maxHops | Specify the depth of your crawl. A maxHops=0 will limit processing to the seed URL(s) only -- no other links will be processed; maxHops=1 will process all (otherwise matching) pages whose links appear on seed URL(s); maxHops=2 will process pages whose links appear on those pages; and so on. By default (maxHops=-1) Crawlbot will crawl and process links at any depth. |
maxToCrawl | Specify max pages to spider. Default: 100,000. |
maxToProcess | Specify max pages to process through Diffbot APIs. Default: 100,000. |
maxToCrawlPerSubdomain | Specify max pages to spider per subdomain. Default: no limit (-1) |
maxToProcessPerSubdomain | Specify max pages to process per subdomain. Default: no limit (-1) |
notifyEmail | Send a message to this email address when the crawl hits the maxToCrawl or maxToProcess limit, or when the crawl completes. |
notifyWebhook | Pass a URL to be notified when the crawl hits the maxToCrawl or maxToProcess limit, or when the crawl completes. You will receive a POST with X-Crawl-Name and X-Crawl-Status in the headers, and the job's JSON metadata in the POST body. Note that in webhook POSTs the parent jobs will not be sent; only the individual job object will be returned. We've integrated with Zapier to make webhooks even more powerful; read more on what you can do with Zapier and Diffbot. |
crawlDelay | Wait this many seconds between each URL crawled from a single IP address. Specify the number of seconds as an integer or floating-point number (e.g., crawlDelay=0.25 ). |
repeat | Specify the number of days as a floating-point (e.g. repeat=7.0 ) to repeat this crawl. By default crawls will not be repeated. |
seedRecrawlFrequency | Useful for specifying a frequency, in number of days, to recrawl seed URLs, which is independent of the overall recrawl frequency given by repeat. Defaults to seedRecrawlFrequency=-1 to use the default frequency. |
onlyProcessIfNew | By default repeat crawls will only process new (previously unprocessed) pages. Set to 0 (onlyProcessIfNew=0 ) to process all content on repeat crawls. |
maxRounds | Specify the maximum number of crawl repeats. By default (maxRounds=0 ) repeating crawls will continue indefinitely. |
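As a sketch, several of these optional parameters can be combined with the required arguments in a single creation request (all values below are placeholders; the webhook URL is a hypothetical receiver, not a Diffbot endpoint):

```python
import requests

TOKEN = "YOUR_DIFFBOT_TOKEN"  # placeholder developer token

params = {
    "token": TOKEN,
    "name": "weeklyNewsCrawl",                       # hypothetical job name
    "seeds": "https://news.example.com",             # hypothetical seed URL
    "apiUrl": "https://api.diffbot.com/v3/article",
    "maxToCrawl": 50000,      # stop spidering after 50,000 pages
    "maxToProcess": 20000,    # stop processing after 20,000 pages
    "crawlDelay": 0.25,       # wait 0.25 seconds between URLs from a single IP
    "repeat": 7.0,            # repeat the crawl every 7 days
    "maxRounds": 4,           # stop after 4 rounds
    "notifyWebhook": "https://example.com/crawl-webhook",  # hypothetical webhook receiver
}

requests.post("https://api.diffbot.com/v3/crawl", data=params)
```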
Response
Upon adding a new crawl, you will receive a success message in the JSON response, in addition to full crawl details:
"response": "Successfully added urls for spidering."
Please note that if you receive the "Too Many Collections" error, you have reached our limit of 1000 active crawls per token. More information here.
Pausing, Restarting or Deleting Crawls
You can manage your crawls by making POST requests to the same endpoint, https://api.diffbot.com/v3/crawl.
Provide the following data:
Argument | Description |
---|---|
token | Developer token. |
name | Job name as defined when the crawl was created. |
Job-control arguments:
Argument | Description |
---|---|
roundStart | Pass roundStart=1 to force the start of a new crawl "round" (manually repeat the crawl). If onlyProcessIfNew is set to 1 (default), only newly-created pages will be processed. |
pause | Pass pause=1 to pause a crawl. Pass pause=0 to resume a paused crawl. |
restart | Restart removes all crawled data while maintaining crawl settings. Pass restart=1 to restart a crawl. |
delete | Pass delete=1 to delete a crawl, and all associated data, completely. |
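A sketch of these job-control calls (placeholder token and hypothetical job name):

```python
import requests

TOKEN = "YOUR_DIFFBOT_TOKEN"     # placeholder developer token
ENDPOINT = "https://api.diffbot.com/v3/crawl"
JOB = "crawlJob"                 # hypothetical job name

# Pause the crawl.
requests.post(ENDPOINT, data={"token": TOKEN, "name": JOB, "pause": 1})

# Resume the paused crawl.
requests.post(ENDPOINT, data={"token": TOKEN, "name": JOB, "pause": 0})

# Restart: discard crawled data but keep the crawl settings.
requests.post(ENDPOINT, data={"token": TOKEN, "name": JOB, "restart": 1})

# Delete the crawl and all associated data.
requests.post(ENDPOINT, data={"token": TOKEN, "name": JOB, "delete": 1})
```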
Retrieving Crawlbot API Data
To download results, please make a GET request to https://api.diffbot.com/v3/crawl/data. Provide the following arguments based on the data you need. By default the complete extracted JSON data will be downloaded.
Argument | Description |
---|---|
token | Developer token. |
name | Name of the crawl whose data you wish to download. |
format | Request format=csv to download the extracted data in CSV format (default: json). Note that CSV files will only contain top-level fields. |
For diagnostic data:
Argument | Description |
---|---|
type | Request type=urls to retrieve the crawl URL Report (CSV). |
num | Pass an integer value (e.g. num=100 ) to request a subset of URLs, most recently crawled first. |
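For example, a sketch of two common retrieval calls (placeholder token and hypothetical job name):

```python
import requests

TOKEN = "YOUR_DIFFBOT_TOKEN"   # placeholder developer token
DATA_ENDPOINT = "https://api.diffbot.com/v3/crawl/data"

# Download the complete extracted JSON for the job.
extracted = requests.get(
    DATA_ENDPOINT, params={"token": TOKEN, "name": "crawlJob"}
).json()

# Download the URL Report (CSV) covering the 100 most recently crawled URLs.
url_report_csv = requests.get(
    DATA_ENDPOINT,
    params={"token": TOKEN, "name": "crawlJob", "type": "urls", "num": 100},
).text
```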
Using the Search API to Retrieve Crawl Data
You can also use Diffbot's Search API to fine-tune the data retrieved from your Crawlbot or Bulk API jobs.
Viewing Crawl Details
Your crawls (along with any Bulk API jobs) will be returned in the jobs object in a request made to https://api.diffbot.com/v3/crawl.
To retrieve a single crawl's details, provide the crawl's name in your request:
Argument | Description |
---|---|
token | Developer token. |
name | Name of crawl to retrieve. |
To view all crawls (and bulk jobs), simply omit the name parameter: https://api.diffbot.com/v3/crawl?token={{token}}
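A brief sketch of both calls (placeholder token and hypothetical job name):

```python
import requests

TOKEN = "YOUR_DIFFBOT_TOKEN"   # placeholder developer token
ENDPOINT = "https://api.diffbot.com/v3/crawl"

# Details for a single named crawl.
job = requests.get(ENDPOINT, params={"token": TOKEN, "name": "crawlJob"}).json()["jobs"][0]

# All crawls and Bulk API jobs on the token (omit "name").
all_jobs = requests.get(ENDPOINT, params={"token": TOKEN}).json()["jobs"]
```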
Response
This will return a JSON response of your token's crawls (and Bulk API jobs) in the jobs object. If you have specified a single job name, only that job's details will be returned.
Sample response from a single crawl:
{
"jobs": [
{
"name": "crawlJob",
"type": "crawl",
"jobCreationTimeUTC": 1427410692,
"jobCompletionTimeUTC": 1427410798,
"jobStatus": {
"status": 9,
"message": "Job has completed and no repeat is scheduled."
},
"sentJobDoneNotification": 1,
"objectsFound": 177,
"urlsHarvested": 2152,
"pageCrawlAttempts": 367,
"pageCrawlSuccesses": 365,
"pageCrawlSuccessesThisRound": 365,
"pageProcessAttempts": 210,
"pageProcessSuccesses": 210,
"pageProcessSuccessesThisRound": 210,
"maxRounds": 0,
"repeat": 0.0,
"crawlDelay": 0.25,
"obeyRobots": 1,
"maxToCrawl": 100000,
"maxToProcess": 100000,
"onlyProcessIfNew": 1,
"seeds": "http://docs.diffbot.com",
"roundsCompleted": 0,
"roundStartTime": 0,
"currentTime": 1443822683,
"currentTimeUTC": 1443822683,
"apiUrl": "https://api.diffbot.com/v3/analyze",
"urlCrawlPattern": "",
"urlProcessPattern": "",
"pageProcessPattern": "",
"urlCrawlRegEx": "",
"urlProcessRegEx": "",
"maxHops": -1,
"downloadJson": "http://api.diffbot.com/v3/crawl/download/sampletoken-crawlJob_data.json",
"downloadUrls": "http://api.diffbot.com/v3/crawl/download/sampletoken-crawlJob_urls.csv",
"notifyEmail": "support@diffbot.com",
"notifyWebhook": "http://www.diffbot.com"
}
]
}
Status Codes
The jobStatus object will return the following status codes and associated messages:
Status | Message |
---|---|
0 | Job is initializing |
1 | Job has reached maxRounds limit |
2 | Job has reached maxToCrawl limit |
3 | Job has reached maxToProcess limit |
4 | Next round to start in _____ seconds |
5 | No URLs were added to the crawl |
6 | Job paused |
7 | Job in progress |
8 | All crawling temporarily paused by root administrator for maintenance |
9 | Job has completed and no repeat is scheduled |
10 | Failed to crawl any seed. Indicates a problem retrieving links from the seed URL(s). |
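For instance, a minimal polling sketch that waits for a crawl to finish by checking jobStatus (placeholder token, hypothetical job name, and an arbitrary polling interval):

```python
import time
import requests

TOKEN = "YOUR_DIFFBOT_TOKEN"   # placeholder developer token
ENDPOINT = "https://api.diffbot.com/v3/crawl"

while True:
    job = requests.get(ENDPOINT, params={"token": TOKEN, "name": "crawlJob"}).json()["jobs"][0]
    status = job["jobStatus"]
    print(status["status"], status["message"])
    if status["status"] == 9:  # "Job has completed and no repeat is scheduled"
        break
    time.sleep(60)             # arbitrary polling interval
```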