Search a Crawl/Bulk job using DQL
DQL allows you to search the extracted content of your Diffbot collections. A collection is a discrete Crawl or Bulk job output, and includes all of the web pages processed within that job.
In order to search a collection, you must first create that collection using either Crawl or the Bulk API. A collection can be searched before a crawl or bulk job is finished.
To search crawled collections, specify type=crawl and list one or more collections in the col parameter. The parameter col=all searches all your custom crawl collections. You can then query the collections using DQL.
An example API request looks like this:
https://kg.diffbot.com/kg/v3/dql?token=<DIFFBOT-TOKEN>&type=crawl&col=winemore,bevmo&query=title:'Riesling'
The above API request has the following parameters:
Parameter | Value | Description |
---|---|---|
token | DIFFBOT-TOKEN | The Diffbot token that you used to create the custom crawl |
type | crawl | Specify type=crawl when searching a crawl collection |
col | winemore,bevmo | A comma-delimited list of collections to search. The parameter col=all searches all your custom crawl collections. |
query | title:'Riesling' | DQL query. See Search(DQL) to learn how to write DQL queries. |
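As a rough sketch, the example request could be assembled in Python with the standard library alone. The helper name build_dql_url is hypothetical and not part of any Diffbot client; the endpoint and parameter names are taken from the example above. Note that urlencode percent-encodes the comma, colon, and quotes, which a server should decode back to the original values.

```python
from urllib.parse import urlencode

# Endpoint copied from the example request above.
DQL_ENDPOINT = "https://kg.diffbot.com/kg/v3/dql"


def build_dql_url(token, collections, query):
    """Hypothetical helper: build a DQL search URL for crawl collections."""
    params = {
        "token": token,
        "type": "crawl",               # required when searching a crawl collection
        "col": ",".join(collections),  # comma-delimited list of collection names
        "query": query,                # DQL query string
    }
    return DQL_ENDPOINT + "?" + urlencode(params)


# Replace the placeholder with your own token before sending the request.
url = build_dql_url("<DIFFBOT-TOKEN>", ["winemore", "bevmo"], "title:'Riesling'")
print(url)
```

The resulting URL can be passed to any HTTP client. Passing ["all"] as the collection list yields col=all, which searches every custom crawl collection.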
Free-text search
You can query collections with standard DQL syntax. Collections also accept free-text queries, as follows:
query= | Returns... |
---|---|
computer vision | All objects containing "computer" and "vision" anywhere in all Diffbot-extracted fields. |
"web page analysis" | All objects containing the phrase "web page analysis" anywhere in all Diffbot-extracted fields. |
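The two free-text forms in the table differ only in whether the text is wrapped in double quotes. A minimal sketch, assuming the same endpoint as the example request on this page; free_text_url and its phrase flag are hypothetical names introduced for illustration:

```python
from urllib.parse import urlencode

# Endpoint copied from the example request earlier on this page.
DQL_ENDPOINT = "https://kg.diffbot.com/kg/v3/dql"


def free_text_url(token, text, phrase=False, col="all"):
    """Hypothetical helper: build a free-text search URL.

    phrase=True wraps the text in double quotes so it is sent as an
    exact phrase; otherwise the words are sent unquoted.
    """
    query = f'"{text}"' if phrase else text
    params = {"token": token, "type": "crawl", "col": col, "query": query}
    return DQL_ENDPOINT + "?" + urlencode(params)


# Objects containing both words anywhere in the extracted fields.
all_words_url = free_text_url("<DIFFBOT-TOKEN>", "computer vision")
# Objects containing the exact phrase.
phrase_url = free_text_url("<DIFFBOT-TOKEN>", "web page analysis", phrase=True)
print(all_words_url)
print(phrase_url)
```

Building the query string with urlencode keeps the spaces and double quotes intact through URL encoding, which is easy to get wrong when concatenating the URL by hand.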