Can I spider multiple sites in the same crawl? Is there a limit to the number of seed URLs?

Yes, you can crawl multiple sites in the same Crawlbot job: add each seed URL, separated by a space, when creating or editing your job. Crawlbot will then apply any crawling and processing patterns (or regular expressions) to all of the domains and subdomains contained in your seeds.
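For illustration, here is a minimal sketch of creating a multi-seed job through the Crawlbot API from Python with the requests library. The token, job name, and seed URLs are placeholders; note that the seeds parameter is a single space-separated string carrying all of the sites:

```python
import requests

DIFFBOT_TOKEN = "YOUR_DIFFBOT_TOKEN"  # placeholder -- substitute your own token

# Create (or update) a Crawlbot job whose seeds span two sites.
resp = requests.post(
    "https://api.diffbot.com/v3/crawl",
    data={
        "token": DIFFBOT_TOKEN,
        "name": "multi-site-crawl",  # hypothetical job/collection name
        # Multiple sites: each seed URL separated by a space.
        "seeds": "https://blog.diffbot.com https://docs.diffbot.com",
        "apiUrl": "https://api.diffbot.com/v3/analyze",  # API used to process pages
    },
)
resp.raise_for_status()
print(resp.json())
```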

Extracted content will be indexed in the same Crawlbot collection; you can use a Search API query to filter by site (query=site:diffbot.com) if you’d like to narrow results after crawling.
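As a sketch of that post-crawl filtering, a site-restricted Search API query against the collection might look like the following; the collection name matches the hypothetical job above, and the token is again a placeholder:

```python
import requests

DIFFBOT_TOKEN = "YOUR_DIFFBOT_TOKEN"  # placeholder

# Query the Search API, restricting results to a single site.
resp = requests.get(
    "https://api.diffbot.com/v3/search",
    params={
        "token": DIFFBOT_TOKEN,
        "col": "multi-site-crawl",    # the Crawlbot collection to search
        "query": "site:diffbot.com",  # narrow results to one domain
    },
)
resp.raise_for_status()
for obj in resp.json().get("objects", []):
    print(obj.get("pageUrl"))
```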

There is no hard limit to the number of seed URLs.
