Manage a Bulk Extract Job

Pause, delete, restart, or view the status of a bulk job.

A single endpoint handles both control and status requests for one or more active bulk jobs (and crawls) associated with a given token.
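Because every operation goes through the same endpoint, a client only needs to vary the query parameters. The following is a minimal Bash sketch of that idea, not an official client: bulk_url and DIFFBOT_TOKEN are hypothetical names chosen for illustration, and job names are assumed to contain only URL-safe characters, since no percent-encoding is applied.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the request URL for one bulk-job operation.
# DIFFBOT_TOKEN is an assumed environment variable holding your token.
# Job names are assumed to be URL-safe; they are not percent-encoded here.
BULK_ENDPOINT="https://api.diffbot.com/v3/bulk"

bulk_url() {
  local action="$1" name="${2:-}"
  local url="${BULK_ENDPOINT}?token=${DIFFBOT_TOKEN}"
  # Every action except listing all jobs targets a single job by name.
  if [ "$action" != "list" ] && [ -z "$name" ]; then
    echo "bulk_url: action '$action' needs a job name" >&2
    return 2
  fi
  case "$action" in
    list)    ;;                                     # token only: all active jobs
    status)  url="${url}&name=${name}" ;;
    pause)   url="${url}&name=${name}&pause=1" ;;
    resume)  url="${url}&name=${name}&pause=0" ;;
    delete)  url="${url}&name=${name}&delete=1" ;;  # irreversible
    restart) url="${url}&name=${name}&restart=1" ;; # erases processed data
    *) echo "bulk_url: unknown action '$action'" >&2; return 2 ;;
  esac
  printf '%s\n' "$url"
}
```

For example, curl --request GET --url "$(bulk_url pause bulkTest)" --header 'Accept: application/json' would pause the job named bulkTest. The URL is quoted so the shell does not interpret the & characters.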

View the Status of Bulk Jobs

To view your token's active bulk jobs (along with any active crawls), send a GET request to this endpoint supplying only the token parameter. The jobs are returned in the jobs array of the response.

Note that, when called with only a token parameter, this endpoint returns exactly the same output as its Crawl Job equivalent.

curl --request GET \
     --url 'https://api.diffbot.com/v3/bulk?token=YOURDIFFBOTTOKEN' \
     --header 'Accept: application/json'

To retrieve a single job's details, supply the job's name in the name parameter along with your token.
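A single-job status request can be wrapped in a small function. This is a minimal Bash sketch under stated assumptions: job_status, DIFFBOT_TOKEN, and the DRY_RUN switch (which prints the URL instead of sending it) are hypothetical, and the job name is assumed to be URL-safe.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch the status of a single bulk job by name.
# DIFFBOT_TOKEN is an assumed environment variable; the job name is
# assumed to be URL-safe. With DRY_RUN=1 the URL is printed, not requested.
job_status() {
  local name="$1"
  local url="https://api.diffbot.com/v3/bulk?token=${DIFFBOT_TOKEN}&name=${name}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$url"
    return 0
  fi
  # Quote the URL so the shell does not treat '&' as a background operator.
  curl --request GET --url "$url" --header 'Accept: application/json'
}
```

Usage: run export DIFFBOT_TOKEN=YOURDIFFBOTTOKEN, then job_status bulkTest.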

Pause a Bulk Job

To pause a bulk job, send a GET request to this endpoint supplying your token, the name of the bulk job to pause, and the pause parameter set to 1.

curl --request GET \
     --url 'https://api.diffbot.com/v3/bulk?token=YOURDIFFBOTTOKEN&name=bulkTest&pause=1' \
     --header 'Accept: application/json'

To resume a paused bulk job, send the same GET request with pause=0.
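Pausing and resuming differ only in the value of the pause parameter, so one function can cover both. The following Bash sketch is illustrative only: set_pause, DIFFBOT_TOKEN, and DRY_RUN are hypothetical names, the job name is assumed to be URL-safe, and any value other than 0 or 1 is rejected locally before a request is made.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pause (1) or resume (0) a bulk job by name.
# DIFFBOT_TOKEN is an assumed environment variable; the job name is
# assumed to be URL-safe. With DRY_RUN=1 the URL is printed, not requested.
set_pause() {
  local name="$1" state="$2"
  # Only 1 (pause) and 0 (resume) are meaningful values for pause.
  case "$state" in
    0|1) ;;
    *) echo "set_pause: state must be 0 or 1" >&2; return 2 ;;
  esac
  local url="https://api.diffbot.com/v3/bulk?token=${DIFFBOT_TOKEN}&name=${name}&pause=${state}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$url"
    return 0
  fi
  curl --request GET --url "$url" --header 'Accept: application/json'
}
```

For example, set_pause bulkTest 1 pauses the job and set_pause bulkTest 0 resumes it.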

Delete a Bulk Job

To delete a bulk job, send a GET request to this endpoint supplying your token, the name of the bulk job to delete, and the delete parameter set to 1. Job deletions are irreversible.

curl --request GET \
     --url 'https://api.diffbot.com/v3/bulk?token=YOURDIFFBOTTOKEN&name=bulkTest&delete=1' \
     --header 'Accept: application/json'
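Because deletions are irreversible, a script that deletes jobs may benefit from a safety check. The following minimal Bash sketch asks the caller to re-type the job name before sending the delete request. delete_job, DIFFBOT_TOKEN, and DRY_RUN are hypothetical and not part of the Diffbot API, and the job name is assumed to be URL-safe.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete a bulk job only after the caller re-types
# its name. DIFFBOT_TOKEN is an assumed environment variable; the job name
# is assumed to be URL-safe. With DRY_RUN=1 the URL is printed, not requested.
delete_job() {
  local name="$1" answer
  printf 'Type the job name "%s" to confirm deletion: ' "$name" >&2
  read -r answer
  # Abort on a mismatch or on end of input (answer stays empty).
  if [ "$answer" != "$name" ]; then
    echo "delete_job: confirmation did not match, aborting." >&2
    return 1
  fi
  local url="https://api.diffbot.com/v3/bulk?token=${DIFFBOT_TOKEN}&name=${name}&delete=1"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$url"
    return 0
  fi
  curl --request GET --url "$url" --header 'Accept: application/json'
}
```

Usage: run export DIFFBOT_TOKEN=YOURDIFFBOTTOKEN, then delete_job bulkTest, and type bulkTest at the prompt.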

Restart a Bulk Job

To restart a bulk job, send a GET request to this endpoint supplying your token, the name of the bulk job to restart, and the restart parameter set to 1. This will erase all previously processed data and re-process all of the submitted URLs.

curl --request GET \
     --url 'https://api.diffbot.com/v3/bulk?token=YOURDIFFBOTTOKEN&name=bulkTest&restart=1' \
     --header 'Accept: application/json'

Response

Every request returns a JSON response. The following is a sample response for a completed bulk job.

{
    "jobs": [
        {
            "jobStatus": {
                "message": "Job has completed and no repeat is scheduled.",
                "status": 9
            },
            "maxHops": -1,
            "downloadJson": "...json",
            "urlProcessPattern": "",
            "jobCompletionTimeUTC": 0,
            "maxRounds": -1,
            "type": "bulk",
            "pageCrawlSuccessesThisRound": 0,
            "urlCrawlRegEx": "",
            "pageProcessPattern": "",
            "apiUrl": "https://api.diffbot.com/v3/analyze",
            "useCanonical": 1,
            "jobCreationTimeUTC": 1649950325,
            "repeat": 0,
            "downloadUrls": "...csv",
            "obeyRobots": 1,
            "roundsCompleted": 0,
            "pageCrawlAttempts": 0,
            "notifyWebhook": "",
            "pageProcessSuccessesThisRound": 0,
            "customHeaders": {},
            "objectsFound": 0,
            "roundStartTime": 0,
            "urlCrawlPattern": "",
            "seedRecrawlFrequency": -1,
            "urlProcessRegEx": "",
            "pageProcessSuccesses": 0,
            "urlsHarvested": 0,
            "crawlDelay": -1,
            "currentTime": 1649950325,
            "useProxies": 0,
            "sentJobDoneNotification": 0,
            "currentTimeUTC": 1649950325,
            "name": "bulkTest",
            "notifyEmail": "",
            "pageCrawlSuccesses": 0,
            "pageProcessAttempts": 0
        }
    ]
}

Status Codes

The status field of the jobStatus object takes one of the following codes, each paired with the message shown:

Status  Message
0       Job is initializing
6       Job paused
7       Job in progress
8       All crawling temporarily paused by root administrator for maintenance.
9       Job has completed and no repeat is scheduled
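The status codes above can be checked in a script, for example to decide whether to keep polling a job. This is a minimal Bash sketch: status_message and job_finished are hypothetical helper names, and any code not listed above is reported as unknown rather than guessed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a jobStatus.status code to its documented message.
# Codes not listed in the documentation are reported as unknown.
status_message() {
  case "$1" in
    0) echo "Job is initializing" ;;
    6) echo "Job paused" ;;
    7) echo "Job in progress" ;;
    8) echo "All crawling temporarily paused by root administrator for maintenance." ;;
    9) echo "Job has completed and no repeat is scheduled" ;;
    *) echo "Unknown status code: $1"; return 1 ;;
  esac
}

# Treat only status 9 as finished; 6 and 8 are pauses, not completion.
job_finished() {
  [ "$1" = "9" ]
}
```

Because statuses 6 and 8 are pauses, a polling loop that stops only when job_finished succeeds would keep waiting while a job is paused.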