Search a Crawl/Bulk job using DQL

DQL allows you to search the extracted content of your Diffbot collections. A collection is a discrete Crawl or Bulk job output, and includes all of the web pages processed within that job.

In order to search a collection, you must first create that collection using either Crawl or the Bulk API. A collection can be searched before a crawl or bulk job is finished.

To search crawled collections, specify type=crawl and list one or more collections in the col parameter. The parameter col=all searches all your custom crawl collections. You can then query the collections using DQL.

An example API request looks like this:

<DIFFBOT-TOKEN>&type=crawl&col=winemore,bevmo&query=title:'Riesling'

The above API request has the following parameters:

Parameter  Value             Description
token      DIFFBOT-TOKEN     The Diffbot token that you used to create the custom crawl.
type       crawl             Specify type=crawl when searching a crawl collection.
col        winemore,bevmo    A comma-delimited list of collections to search. The parameter col=all searches all your custom crawl collections.
query      title:'Riesling'  DQL query. See Search (DQL) to learn how to write DQL queries.
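
The parameters above can be assembled into a request URL programmatically. The following is a minimal Python sketch, not an official client: the function name build_collection_query_url and the PLACEHOLDER_ENDPOINT value are hypothetical, and the real DQL endpoint must be substituted for the placeholder because it is not shown in this section. Only the parameter names (token, type, col, query) come from the table.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder only (assumption): replace with the real DQL endpoint for your account.
PLACEHOLDER_ENDPOINT = "https://dql.example.invalid/search"


def build_collection_query_url(token, collections, query,
                               endpoint=PLACEHOLDER_ENDPOINT):
    """Return a URL that searches one or more crawl collections with a DQL query.

    `collections` may be a single name (including "all") or a list of names.
    """
    if isinstance(collections, str):
        collections = [collections]
    params = {
        "token": token,                 # token used to create the custom crawl
        "type": "crawl",                # required when searching a crawl collection
        "col": ",".join(collections),   # comma-delimited collection names
        "query": query,                 # DQL query string
    }
    # urlencode percent-encodes reserved characters such as ',' ':' and "'".
    return f"{endpoint}?{urlencode(params)}"


url = build_collection_query_url(
    "DIFFBOT-TOKEN", ["winemore", "bevmo"], "title:'Riesling'"
)

# Decoding the query string recovers the original parameter values.
decoded = parse_qs(urlsplit(url).query)
```

Passing "all" as the collections argument yields col=all, which searches every custom crawl collection. The percent-encoded URL decodes to the same parameter values as the unencoded example request.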

Free-text search

You can query collections with standard DQL syntax. You can also use free-text queries, as follows:

Query                Matches
computer vision      All objects containing "computer" and "vision" anywhere in all Diffbot-extracted fields.
"web page analysis"  All objects containing the phrase "web page analysis" anywhere in all Diffbot-extracted fields.

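The two free-text forms differ only in whether the words are wrapped in double quotes. The following Python sketch shows one way to build each form and encode it as the query parameter. The helper name free_text_query is hypothetical, and the sketch assumes the service accepts standard form encoding, where spaces become "+" and double quotes become "%22".

```python
from urllib.parse import urlencode


def free_text_query(text, exact_phrase=False):
    """Return a free-text query value.

    With exact_phrase=True the text is wrapped in double quotes so the
    words must appear together as a phrase; otherwise each word only
    has to appear somewhere in the extracted fields.
    """
    text = text.strip()
    return f'"{text}"' if exact_phrase else text


all_words = free_text_query("computer vision")
phrase = free_text_query("web page analysis", exact_phrase=True)

# Encode the phrase query as a URL parameter.
encoded_phrase = urlencode({"query": phrase})
```

Here all_words is the plain string computer vision, while phrase keeps its surrounding double quotes so that it is treated as a phrase search.
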