Pause, delete, restart, or view the status of a bulk job.
A single endpoint handles both control and status requests for one or more active bulk jobs (and crawls) belonging to a given token.
View the Status of Bulk Jobs
A GET request to this endpoint that supplies only the token parameter returns your token's active bulk jobs (along with any active crawls) in a jobs array.
Note that this endpoint, when called with no query parameters other than token, returns exactly the same output as its Crawl Job equivalent.
curl --request GET \
--url 'https://api.diffbot.com/v3/bulk?token=YOURDIFFBOTTOKEN' \
--header 'Accept: application/json'
To retrieve a single job's details, provide the job's name in addition to your token in your request.
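The two status requests above differ only in whether a name parameter is present. A minimal Python sketch of that difference follows; the helper names build_status_url and fetch_status are hypothetical (they are not part of any Diffbot client library), and only the endpoint URL, the token and name parameters, and the Accept header come from this page.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the curl examples on this page.
BULK_ENDPOINT = "https://api.diffbot.com/v3/bulk"


def build_status_url(token: str, name: Optional[str] = None) -> str:
    """Build the status URL for all jobs, or for one job when name is given."""
    params = {"token": token}
    if name is not None:
        params["name"] = name
    # urlencode escapes characters that are not safe in a query string.
    return f"{BULK_ENDPOINT}?{urlencode(params)}"


def fetch_status(token: str, name: Optional[str] = None) -> dict:
    """Send the GET request and decode the JSON body (performs a network call)."""
    request = Request(
        build_status_url(token, name),
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling build_status_url("YOURDIFFBOTTOKEN") yields the all-jobs URL, while build_status_url("YOURDIFFBOTTOKEN", name="bulkTest") targets a single job.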
Pause a Bulk Job
To pause a bulk job, send a GET request to this endpoint supplying your token, the name of the bulk job to pause, and the pause parameter set to 1.
curl --request GET \
--url 'https://api.diffbot.com/v3/bulk?token=YOURDIFFBOTTOKEN&name=bulkTest&pause=1' \
--header 'Accept: application/json'
To resume a paused bulk job, pass pause=0 in the same GET request.
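Because pausing and resuming use the same pause parameter with values 1 and 0, both can be expressed through a single boolean. The following is a small sketch under that reading; build_pause_url is a hypothetical helper name, and it only constructs the URL rather than sending the request.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl examples on this page.
BULK_ENDPOINT = "https://api.diffbot.com/v3/bulk"


def build_pause_url(token: str, name: str, paused: bool) -> str:
    """Return the URL that pauses (paused=True) or resumes (paused=False) a job."""
    params = {
        "token": token,
        "name": name,
        "pause": 1 if paused else 0,  # 1 pauses the job, 0 resumes it
    }
    return f"{BULK_ENDPOINT}?{urlencode(params)}"
```

Building the query string with urlencode also sidesteps shell quoting problems, since the & separators never pass through a command line.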
Delete a Bulk Job
To delete a bulk job, send a GET request to this endpoint supplying your token, the name of the bulk job to delete, and the delete parameter set to 1. Job deletions are irreversible.
curl --request GET \
--url 'https://api.diffbot.com/v3/bulk?token=YOURDIFFBOTTOKEN&name=bulkTest&delete=1' \
--header 'Accept: application/json'
Restart a Bulk Job
To restart a bulk job, send a GET request to this endpoint supplying your token, the name of the bulk job to restart, and the restart parameter set to 1. This will erase all previously processed data and re-process all of the submitted URLs.
curl --request GET \
--url 'https://api.diffbot.com/v3/bulk?token=YOURDIFFBOTTOKEN&name=bulkTest&restart=1' \
--header 'Accept: application/json'
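Deleting a job is irreversible and restarting one erases its processed data, so client code may want an explicit confirmation step before building either request. The sketch below illustrates one possible guard; the function name build_destructive_url and its confirm argument are assumptions made for illustration and are not part of the API itself.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl examples on this page.
BULK_ENDPOINT = "https://api.diffbot.com/v3/bulk"

# The two data-destroying control parameters described on this page.
DESTRUCTIVE_ACTIONS = ("delete", "restart")


def build_destructive_url(token: str, name: str, action: str, confirm: bool = False) -> str:
    """Build a delete or restart URL, refusing unless the caller confirms."""
    if action not in DESTRUCTIVE_ACTIONS:
        raise ValueError(f"action must be one of {DESTRUCTIVE_ACTIONS}, got {action!r}")
    if not confirm:
        # Client-side safety check only; the API has no such parameter.
        raise ValueError(f"{action} discards job data; pass confirm=True to proceed")
    params = {"token": token, "name": name, action: 1}
    return f"{BULK_ENDPOINT}?{urlencode(params)}"
```

The confirm flag never reaches the server; it exists only so that a destructive URL cannot be produced by accident.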
Response
All requests will return a JSON response. The following is a sample response.
{
  "jobs": [
    {
      "jobStatus": {
        "message": "Job has completed and no repeat is scheduled.",
        "status": 9
      },
      "maxHops": -1,
      "downloadJson": "...json",
      "urlProcessPattern": "",
      "jobCompletionTimeUTC": 0,
      "maxRounds": -1,
      "type": "bulk",
      "pageCrawlSuccessesThisRound": 0,
      "urlCrawlRegEx": "",
      "pageProcessPattern": "",
      "apiUrl": "https://api.diffbot.com/v3/analyze",
      "useCanonical": 1,
      "jobCreationTimeUTC": 1649950325,
      "repeat": 0,
      "downloadUrls": "...csv",
      "obeyRobots": 1,
      "roundsCompleted": 0,
      "pageCrawlAttempts": 0,
      "notifyWebhook": "",
      "pageProcessSuccessesThisRound": 0,
      "customHeaders": {},
      "objectsFound": 0,
      "roundStartTime": 0,
      "urlCrawlPattern": "",
      "seedRecrawlFrequency": -1,
      "urlProcessRegEx": "",
      "pageProcessSuccesses": 0,
      "urlsHarvested": 0,
      "crawlDelay": -1,
      "currentTime": 1649950325,
      "useProxies": 0,
      "sentJobDoneNotification": 0,
      "currentTimeUTC": 1649950325,
      "name": "bulkTest",
      "notifyEmail": "",
      "pageCrawlSuccesses": 0,
      "pageProcessAttempts": 0
    }
  ]
}
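Since the jobs array can contain crawls as well as bulk jobs, a client will often want to filter on the type field and keep only a few values per job. The following sketch shows one way to do that with the field names from the sample response; summarize_bulk_jobs is a hypothetical helper, and it assumes every job entry carries name, type, and jobStatus as the sample does.

```python
import json
from typing import Dict, List


def summarize_bulk_jobs(response_text: str) -> List[Dict[str, object]]:
    """Return name, status code, and message for each job whose type is "bulk"."""
    payload = json.loads(response_text)
    summaries = []
    for job in payload.get("jobs", []):
        if job.get("type") != "bulk":
            continue  # skip crawls, which share this endpoint's output
        status = job["jobStatus"]
        summaries.append(
            {
                "name": job["name"],
                "status": status["status"],
                "message": status["message"],
            }
        )
    return summaries
```

Applied to the sample response, this would return a single entry for bulkTest with status 9.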
Status Codes
The jobStatus object will return the following status codes and associated messages:
Status | Message |
---|---|
0 | Job is initializing |
6 | Job paused |
7 | Job in progress |
8 | All crawling temporarily paused by root administrator for maintenance. |
9 | Job has completed and no repeat is scheduled |
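The status codes in this table lend themselves to a named enumeration and a simple polling loop. The sketch below is illustrative only: the enum member names, the wait_until_complete function, and its polling defaults are assumptions, and fetch_job stands for any caller-supplied function that returns one job object from the jobs array. Codes are compared as plain integers so that a value not listed in the table does not raise an error.

```python
import time
from enum import IntEnum
from typing import Callable


class BulkJobStatus(IntEnum):
    """Status codes listed in the table above (member names are illustrative)."""
    INITIALIZING = 0
    PAUSED = 6
    IN_PROGRESS = 7
    MAINTENANCE_PAUSE = 8
    COMPLETED = 9


def wait_until_complete(
    fetch_job: Callable[[], dict],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll fetch_job until the job reports status 9; return False on timeout."""
    for attempt in range(max_polls):
        job = fetch_job()
        if job["jobStatus"]["status"] == BulkJobStatus.COMPLETED:
            return True
        if attempt < max_polls - 1:
            sleep(poll_seconds)  # wait before the next status request
    return False
```

Passing the sleep function as an argument keeps the loop testable without real delays; in normal use the default time.sleep applies.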