Introduction to Bulk Extract API

The Bulk API lets you send a large number of URLs through any Diffbot Extract API for fast, asynchronous processing.

The Bulk API sends all submitted page URLs to an Extract API (either automatic or custom). All structured page results are then compiled into a single "collection," which can be downloaded in full or searched.

Note: The Bulk API is not a crawler: it does not spider a site for additional links. You must supply each URL you wish to process. For crawling/spidering, see the Crawl API.
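As a rough illustration of the workflow described above, the following Python sketch assembles the form body for creating a bulk job from an explicit list of URLs. The endpoint and the parameter names (`token`, `name`, `urls`, `apiUrl`) and the space-separated URL format are assumptions made for this example and should be checked against the Bulk API reference; `build_bulk_job` is a hypothetical helper, not part of any Diffbot library. The sketch only builds the request and does not send it.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration; confirm against the Bulk API reference.
BULK_ENDPOINT = "https://api.diffbot.com/v3/bulk"


def build_bulk_job(token, name, urls, api_url):
    """Return (endpoint, form-encoded body) for a bulk job request.

    Hypothetical helper: the parameter names below are assumptions.
    """
    # The Bulk API does not discover links, so every URL must be listed.
    # Strip whitespace, drop blanks, and remove duplicates while keeping order.
    unique_urls = list(dict.fromkeys(u.strip() for u in urls if u.strip()))
    if not unique_urls:
        raise ValueError("at least one URL is required")

    fields = {
        "token": token,                  # your API token
        "name": name,                    # job name
        "urls": " ".join(unique_urls),   # assumed: space-separated URL list
        "apiUrl": api_url,               # the Extract API each URL is sent to
    }
    return BULK_ENDPOINT, urlencode(fields)


endpoint, body = build_bulk_job(
    token="YOUR_TOKEN",
    name="example-job",
    urls=["https://example.com/a", "https://example.com/a", "https://example.com/b"],
    api_url="https://api.diffbot.com/v3/article",
)
```

The resulting `body` could then be sent as a form-encoded POST with any HTTP client. Deduplicating up front avoids submitting the same page twice, since the job processes exactly the URLs it is given.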

🚧

Access to Bulk API is Limited to Plus Plans and Up

Upgrade to a Plus plan anytime at diffbot.com/pricing, or contact Diffbot for more information.

Data Retention

Inactive bulk jobs will be deleted within ten days of completion. This includes the extracted data as well as the job meta information (name, settings, etc.).

“Active” jobs are those that are not in a permanently “paused” state. Active jobs will not be deleted or removed from your account. Once a job finishes, it is subject to the deletion policy described above.