Can I spider multiple sites in the same crawl?
Yes, you can crawl multiple sites in a single Crawl job.
Simply add each seed URL, separated by a space, when creating or editing your job. Crawl will then apply any crawling and processing patterns (and/or regular expressions) to all of the domains/subdomains contained in your seeds.
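As a sketch, creating such a job programmatically might look like the following. The endpoint, parameter names, token, and job name below are assumptions based on Diffbot's v3 Crawl API; check the current API documentation before relying on them.

```python
# Sketch: creating one Crawl job with several space-separated seed URLs.
# Endpoint and parameter names are assumptions; the token and job name
# are hypothetical placeholders.
from urllib.parse import urlencode

params = {
    "token": "YOUR_DIFFBOT_TOKEN",   # hypothetical placeholder
    "name": "multi-site-crawl",      # hypothetical job name
    # Multiple seeds go in a single space-separated string:
    "seeds": "https://diffbot.com https://blog.diffbot.com https://example.com",
    "apiUrl": "https://api.diffbot.com/v3/analyze",
}

request_url = "https://api.diffbot.com/v3/crawl?" + urlencode(params)
print(request_url)
```

Crawling and processing patterns set on the job apply uniformly across all of the seeded domains.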
Extracted content will be indexed in the same Crawl collection; you can use a DQL API query to filter by site (e.g., `site:diffbot.com`) if you'd like to narrow results after crawling.
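A minimal sketch of such a post-crawl query, assuming a v3 Search API endpoint and query parameters that may differ from the current interface (the token and collection name are hypothetical placeholders):

```python
# Sketch: querying a Crawl collection with a DQL-style site filter.
# Endpoint and parameter names are assumptions; consult Diffbot's
# documentation for the exact interface.
from urllib.parse import urlencode

query = "type:article site:diffbot.com"  # narrow results to one domain
params = {
    "token": "YOUR_DIFFBOT_TOKEN",  # hypothetical placeholder
    "col": "multi-site-crawl",      # hypothetical collection name
    "query": query,
}

search_url = "https://api.diffbot.com/v3/search?" + urlencode(params)
print(search_url)
```

Swapping the `site:` value lets you inspect each seeded domain's results separately within the one shared collection.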
There is no hard limit to the number of seed URLs.