Can I spider multiple sites in the same crawl?
Yes, you can crawl multiple sites in a single Crawl job.
Simply add each seed URL, separated by a space, when creating or editing your job. Crawl will then apply any crawling and processing patterns (and/or regular expressions) to all of the domains/subdomains contained in your seeds.
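As a sketch, creating such a job programmatically might look like the following. The endpoint, parameter names, token, and job name below are assumptions based on Diffbot's v3 Crawl API; check the current API documentation before relying on them.

```python
# Sketch: creating one Crawl job with several space-separated seed URLs.
# Endpoint and parameter names are assumptions; the token and job name
# are hypothetical placeholders.
from urllib.parse import urlencode

params = {
    "token": "YOUR_DIFFBOT_TOKEN",   # hypothetical placeholder
    "name": "multi-site-crawl",      # hypothetical job name
    # Multiple seeds go in a single space-separated string:
    "seeds": "https://diffbot.com https://blog.diffbot.com https://example.com",
    "apiUrl": "https://api.diffbot.com/v3/analyze",
}

request_url = "https://api.diffbot.com/v3/crawl?" + urlencode(params)
print(request_url)
```

Crawling and processing patterns set on the job apply uniformly across all of the seeded domains.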
Extracted content will be indexed in the same Crawl collection; you can use a DQL API query to filter by site (e.g., `site:diffbot.com`) if you'd like to narrow results after crawling.
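A minimal sketch of such a post-crawl query, assuming a v3 Search API endpoint and query parameters that may differ from the current interface (the token and collection name are hypothetical placeholders):

```python
# Sketch: querying a Crawl collection with a DQL-style site filter.
# Endpoint and parameter names are assumptions; consult Diffbot's
# documentation for the exact interface.
from urllib.parse import urlencode

query = "type:article site:diffbot.com"  # narrow results to one domain
params = {
    "token": "YOUR_DIFFBOT_TOKEN",  # hypothetical placeholder
    "col": "multi-site-crawl",      # hypothetical collection name
    "query": query,
}

search_url = "https://api.diffbot.com/v3/search?" + urlencode(params)
print(search_url)
```

Swapping the `site:` value lets you inspect each seeded domain's results separately within the one shared collection.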
There is no hard limit to the number of seed URLs.