&links as a Diffbot Querystring Argument
Adding the argument
&links uses Diffbot's core API link-extracting functionality to return all links found on a page. Crawlbot will use these additional links, found within the rendered page, to augment those found in the raw source.
If you are using the Crawlbot API, simply append &links to the Diffbot API URL your crawl uses.
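As a rough illustration of adding the argument programmatically, here is a minimal Python sketch. The endpoint URL, the token and crawl name values, and the Crawlbot parameter names (token, name, seeds, apiUrl) are assumptions made for this example and should be checked against the Crawlbot API reference. The helper names with_links and build_crawl_query are hypothetical.

```python
from urllib.parse import urlencode, urlsplit


def with_links(api_url: str) -> str:
    """Append the bare `links` argument to a Diffbot API URL, once."""
    query = urlsplit(api_url).query
    if "links" in query.split("&"):
        return api_url  # argument already present, leave the URL unchanged
    # Use "?" when the URL has no querystring yet, "&" otherwise.
    separator = "&" if query else "?"
    return f"{api_url}{separator}links"


def build_crawl_query(token: str, name: str, seed: str, api_url: str) -> str:
    """Build a URL-encoded querystring for a crawl job (parameter names assumed)."""
    return urlencode({
        "token": token,
        "name": name,
        "seeds": seed,
        # The API URL is itself a parameter value, so urlencode escapes it.
        "apiUrl": with_links(api_url),
    })


if __name__ == "__main__":
    print(build_crawl_query(
        token="YOUR_TOKEN",                            # placeholder
        name="ajax-site-crawl",                        # hypothetical crawl name
        seed="https://shop.example.com/",              # hypothetical seed page
        api_url="https://api.diffbot.com/v3/analyze",  # assumed endpoint
    ))
```

Because the API URL is passed as a parameter value, it has to be URL-encoded; otherwise the links argument would be read as an argument of the crawl request rather than of the extraction API call.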
Include your seed page (and any other JS-requiring pages) in your processing pattern(s) or regular expression.
Make sure you broaden your processing patterns or processing regular expression, or remove them entirely.
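One way to sanity-check a processing regular expression before starting a crawl is to test it locally against the pages that need rendering. The sketch below is only an approximation: the URLs and both expressions are hypothetical, and it assumes the expression is applied as a search anywhere in the URL, which may differ from how Crawlbot's own regex engine evaluates it.

```python
import re


def unprocessed(urls, process_regex):
    """Return the URLs that the processing regex would NOT select."""
    pattern = re.compile(process_regex)
    return [url for url in urls if not pattern.search(url)]


# Hypothetical pages whose links only appear after JavaScript runs.
js_pages = [
    "https://shop.example.com/",
    "https://shop.example.com/category/shoes",
]

# A narrow expression that selects product pages only.
narrow = r"/product/\d+"
# A broadened expression that also selects the seed and category pages.
broad = r"^https://shop\.example\.com/($|category/|product/\d+)"

if __name__ == "__main__":
    print("skipped by narrow:", unprocessed(js_pages, narrow))
    print("skipped by broad: ", unprocessed(js_pages, broad))
```

Any page reported as skipped would never be sent through the API with &links, so the links it generates with JavaScript would not be added to the crawl.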
A note on deduplication
Additional note for recurring crawls: Do not enable “Only Process New Pages”
If “Only Process New Pages” is set to “on,” only brand-new URLs are processed in subsequent crawl rounds. Finding Ajax-generated links with the approach above, however, requires re-processing pages each round, because their new links only become visible when the page is rendered again.
If you are crawling an Ajax-heavy site regularly using the above method (e.g., for new products or new articles), please make sure you process all pages each round in order to find new URLs.
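The effect can be shown with a toy simulation. This is not Crawlbot's actual implementation: it models a site as a dictionary from each URL to the links that appear only after rendering, and assumes the raw source contains no links at all. The function name and site data are hypothetical.

```python
def crawl_round(site, seed, processed, only_new):
    """Simulate one crawl round and return the links discovered by processing.

    site      -- dict mapping a URL to its rendered (Ajax-generated) links
    processed -- set of URLs processed in earlier rounds (updated in place)
    only_new  -- if True, skip pages that were processed in an earlier round
    """
    discovered = set()
    seen = {seed}
    frontier = [seed]
    while frontier:
        url = frontier.pop()
        if only_new and url in processed:
            continue  # page is not processed, so its rendered links stay hidden
        processed.add(url)
        for link in site.get(url, []):
            discovered.add(link)
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return discovered


# The listing page gains a second product between rounds.
round_1 = {"/": ["/p/1"]}
round_2 = {"/": ["/p/1", "/p/2"]}

if __name__ == "__main__":
    for only_new in (True, False):
        processed = set()
        crawl_round(round_1, "/", processed, only_new)
        found = crawl_round(round_2, "/", processed, only_new)
        print(f"only_new={only_new}: round 2 discovers {sorted(found)}")
```

With only_new set to True, the second round skips the already-processed listing page and never sees /p/2. With it set to False, the listing page is processed again and the new product link is found.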