Yes, by default Crawlbot adheres to a site’s robots.txt instructions, including the
In specific cases, typically because of a partnership or agreement you have with the site to be crawled, the robots.txt instructions can be ignored or overridden. This is often faster than waiting for the third-party site to update its robots.txt file.
To whitelist Crawlbot for a site, specify the “Diffbot” user-agent in the site’s robots.txt:
User-agent: Diffbot
Disallow:
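Before publishing a robots.txt change, it can help to check locally that the rules do what you expect. The following is a minimal sketch using Python's standard urllib.robotparser module, not Crawlbot's own parser, so treat the result as a sanity check rather than a guarantee of how Crawlbot will behave. The sample file contents (including the catch-all block that disallows other crawlers), the helper name is_allowed, and the example.com URL are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Diffbot is whitelisted (an empty Disallow
# permits everything), while all other crawlers are blocked.
ROBOTS_TXT = """\
User-agent: Diffbot
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some/page"  # placeholder URL
    print("Diffbot allowed:", is_allowed(ROBOTS_TXT, "Diffbot", page))
    print("OtherBot allowed:", is_allowed(ROBOTS_TXT, "OtherBot", page))
```

To test a live site instead, the same parser can load the file with set_url() and read(), at the cost of a network request.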
Note that Crawlbot does not adhere to the