Generally, Diffbot processing and visual analysis take just a couple of seconds. Why, then, can an API response take longer than that?
The overwhelming majority of time in a Diffbot request is taken up by fetching content from the requested (third-party) site. The site’s physical location and responsiveness, along with overall Internet traffic, largely determine the request's latency.
That said, there are a few things you can try to make your Diffbot processing even faster:
The Article and Product APIs automatically identify and extract user comments and reviews from article and product pages. If you do not need comment or review data, you can disable this functionality with the argument discussion=false. Although comment extraction is typically very fast, disabling it can noticeably improve performance on pages with many comments or reviews.
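A minimal Python sketch of building an Article API request with comment extraction turned off. It assumes the v3 endpoint `https://api.diffbot.com/v3/article` with `token` and `url` query parameters; `article_url` is a hypothetical helper and `YOUR_TOKEN` is a placeholder.

```python
from urllib.parse import urlencode

# Assumed v3 Article API endpoint; confirm against your Diffbot documentation.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def article_url(token: str, page_url: str, discussion: bool = True) -> str:
    """Build an Article API request URL (hypothetical helper)."""
    params = {"token": token, "url": page_url}
    if not discussion:
        # discussion=false asks Diffbot to skip comment/review extraction.
        params["discussion"] = "false"
    # urlencode percent-escapes the target URL so it travels as a single parameter.
    return f"{ARTICLE_ENDPOINT}?{urlencode(params)}"


print(article_url("YOUR_TOKEN", "https://example.com/story", discussion=False))
```

Leaving `discussion` at its default omits the argument entirely, so Diffbot's normal comment extraction applies.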
Diffbot’s Article API automatically concatenates (strings together) multiple pages of an article. For articles with many pages, this results in several individual requests to the third-party server, each of which adds to the response time. If you do not need pages concatenated or the full article contents returned, pass the argument paging=false so that only the first page of content is returned.
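A sketch, assuming the same v3 Article API endpoint, of a request that sets paging=false so only the first page is extracted. `first_page_url` and `fetch_first_page` are hypothetical helpers; the second one performs a live network call and needs a valid token.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed v3 Article API endpoint.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def first_page_url(token: str, page_url: str) -> str:
    """Article API URL that turns off multi-page concatenation (hypothetical helper)."""
    params = {"token": token, "url": page_url, "paging": "false"}
    return f"{ARTICLE_ENDPOINT}?{urlencode(params)}"


def fetch_first_page(token: str, page_url: str) -> dict:
    """Call the API and decode its JSON reply (performs a real network request)."""
    with urlopen(first_page_url(token, page_url), timeout=60) as resp:
        return json.load(resp)
```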
Sometimes a site's robots.txt file specifies crawl delays, which Diffbot obeys by default. If you notice very slow crawls, try turning off the Obey Robots.txt feature in the Crawlbot settings UI.
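To see why a crawl delay matters, the sketch below reads a Crawl-delay value with Python's standard `urllib.robotparser` and estimates the minimum time a polite crawl would take. This is an illustration only, not how Crawlbot parses robots.txt; the robots.txt content and the `min_crawl_hours` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def crawl_delay_for(robots_txt: str, user_agent: str = "*"):
    """Return the Crawl-delay (seconds) robots.txt requests for this agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


def min_crawl_hours(num_pages: int, delay_seconds: float) -> float:
    """Lower bound on crawl time if every request must wait delay_seconds (hypothetical)."""
    return num_pages * delay_seconds / 3600


delay = crawl_delay_for(ROBOTS_TXT)
print(delay)                                      # 10
print(round(min_crawl_hours(10_000, delay), 1))   # 27.8
```

At 10 seconds per request, 10,000 pages need more than a day regardless of how fast extraction itself is.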
If the speed of your request matters much more than the overall quality of extraction, you can disable the full rendering engine using the advanced argument norender. Pass norender in your Article API requests for faster response times, albeit with reduced extraction quality. (Common issues include under-identified image captions and the over-inclusion of sharing and other extraneous elements.)
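A sketch of an Article API request with norender added. The text above does not say whether norender takes a value, so this assumes it is passed as a bare flag; the endpoint and `fast_article_url` helper are likewise assumptions.

```python
from urllib.parse import urlencode

# Assumed v3 Article API endpoint.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def fast_article_url(token: str, page_url: str) -> str:
    """Article API URL that skips the full rendering engine (hypothetical helper)."""
    query = urlencode({"token": token, "url": page_url})
    # Assumption: norender is accepted as a bare flag with no value.
    return f"{ARTICLE_ENDPOINT}?{query}&norender"
```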
If you are near your target content or have special access to it, you may get faster processing by downloading the content yourself and then POSTing the markup to our APIs (Automatic or Custom). Diffbot will return the structured content as usual, but without having to fetch the content first.
See more on uploading content for extraction.
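A sketch of POSTing already-downloaded HTML to the Article API. It assumes the original page address is still sent as the `url` parameter and that the body is accepted as `text/html`; `markup_request` and `extract_markup` are hypothetical helpers, and only `extract_markup` touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed v3 Article API endpoint.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def markup_request(token: str, page_url: str, html: str) -> Request:
    """Build a POST carrying local HTML to the Article API (hypothetical helper).

    Assumptions: `url` is still sent so Diffbot knows the page's original
    address, and the body is read as text/html.
    """
    query = urlencode({"token": token, "url": page_url})
    return Request(
        f"{ARTICLE_ENDPOINT}?{query}",
        data=html.encode("utf-8"),
        headers={"Content-Type": "text/html; charset=utf-8"},
        method="POST",
    )


def extract_markup(token: str, page_url: str, html: str) -> dict:
    """Send the request and decode the JSON reply (performs a real network call)."""
    with urlopen(markup_request(token, page_url, html), timeout=60) as resp:
        return json.load(resp)
```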
Our Bulk API is built specifically for processing large batches of URLs asynchronously. The Bulk service distributes your calls across a broad array of extraction servers for speedy processing and delivers the extracted data as a single JSON output. Alternatively, you can use our Search API to fine-tune the output from your structured data.
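A sketch of preparing a Bulk API job submission. The endpoint `https://api.diffbot.com/v3/bulk` and the form fields `token`, `name`, `urls` (space-separated) and `apiUrl` are assumptions to verify against the Bulk API documentation; `bulk_form` and `bulk_request` are hypothetical helpers, and the request is built but not sent.

```python
from typing import Sequence
from urllib.parse import urlencode
from urllib.request import Request

# Assumed Bulk API endpoint; verify against Diffbot's Bulk API docs.
BULK_ENDPOINT = "https://api.diffbot.com/v3/bulk"


def bulk_form(token: str, name: str, urls: Sequence[str],
              api_url: str = "https://api.diffbot.com/v3/article") -> dict:
    """Form fields for a new bulk job (hypothetical helper, assumed field names)."""
    return {
        "token": token,
        "name": name,              # a job name you choose
        "urls": " ".join(urls),    # assumed: space-separated list of URLs
        "apiUrl": api_url,         # which extraction API processes each URL
    }


def bulk_request(token: str, name: str, urls: Sequence[str]) -> Request:
    """Form-encoded POST that would create the job (not sent here)."""
    body = urlencode(bulk_form(token, name, urls)).encode("ascii")
    return Request(
        BULK_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Submitting one job instead of thousands of individual calls lets the service schedule the work across its servers, which is where the speedup comes from.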
Proxies add extra layers for the data to pass through, which introduces further delay. Disabling them should reduce response time.