Extract with Custom API

Extracts a page using a modified Extract API or a custom ruleset.

If you need just one more field from an Extract API, or if one or more field values are incorrect, you may use a Custom API to override or add those fields using rules.

Correcting a field’s output takes immediate effect for your account, and also serves to train our system, improving Diffbot extraction over the long run.

Extracting a page with a Custom API works just like all other Extract APIs. Simply pass a URL to your Custom API's unique endpoint.
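As a minimal sketch of what "passing a URL to your endpoint" can look like, the snippet below only builds the request URL and sends nothing. The base URL, the `token` and `url` query parameter names, and the API name `myCustomApi` are assumptions for illustration; confirm them against your own Custom API's endpoint.

```python
from urllib.parse import quote, urlencode

# Assumed base URL (not stated on this page); check your account's endpoint.
API_BASE = "https://api.diffbot.com/v3"


def build_custom_api_url(api_name: str, page_url: str, token: str) -> str:
    """Build a GET URL for a Custom API call (hypothetical layout)."""
    # urlencode percent-encodes the target URL so it survives as one value.
    query = urlencode({"token": token, "url": page_url})
    # The Custom API name is treated as a path segment.
    return f"{API_BASE}/{quote(api_name, safe='')}?{query}"


request_url = build_custom_api_url(
    "myCustomApi",                      # hypothetical Custom API name
    "https://example.com/a page?id=1",  # page to extract
    "YOUR_TOKEN",                       # placeholder credential
)
print(request_url)
```

Encoding the target URL matters because its own `?` and `&` characters would otherwise be read as part of the API request's query string.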

You may wish to start by creating a Custom API.

Response

The Custom API returns data in JSON format.

Each response includes a request object, which returns request-specific metadata, and an objects array, which contains the extracted information for all objects on the submitted page.

For Custom APIs, the objects array always contains a single object, which holds all custom fields and collections.
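The response shape described above can be read along these lines. The sample body is invented for illustration: the keys inside `request` and the custom field names (`title`, `customPrice`) are assumptions, since the real ones depend on your rules.

```python
import json

# Hypothetical response body; only "request" and "objects" come from the docs.
sample_body = """
{
  "request": {"pageUrl": "https://example.com/item", "api": "myCustomApi"},
  "objects": [{"title": "Example item", "customPrice": "19.99"}]
}
"""


def extract_single_object(body: str) -> dict:
    """Return the one object a Custom API response is expected to carry."""
    response = json.loads(body)
    objects = response.get("objects", [])
    # Custom APIs are documented to return exactly one object; fail loudly otherwise.
    if len(objects) != 1:
        raise ValueError(f"expected exactly one object, got {len(objects)}")
    return objects[0]


record = extract_single_object(sample_body)
print(record["customPrice"])
```

Checking the length explicitly, rather than indexing `objects[0]` directly, turns an empty or unexpected response into a clear error.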

Optional Fields

Custom API may also return optional fields if you specify them (comma-delimited) in the &fields= argument.
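A small sketch of assembling the comma-delimited `fields` value follows. The helper name `with_optional_fields` and the `url` parameter name are illustrative assumptions; the field names `querystring` and `links` are the ones this page uses as an example.

```python
from urllib.parse import urlencode


def with_optional_fields(base_query: dict, fields: list) -> str:
    """Return a query string with optional fields joined by commas."""
    query = dict(base_query)  # copy so the caller's dict is not modified
    if fields:
        query["fields"] = ",".join(fields)
    # safe="," keeps the comma delimiter literal instead of encoding it as %2C.
    return urlencode(query, safe=",")


query_string = with_optional_fields(
    {"url": "https://example.com/"},  # "url" parameter name is assumed
    ["querystring", "links"],
)
print(query_string)
```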

Already have the source HTML? POST it to Custom API.

Custom API supports a POST option that allows you to upload HTML or plain text for extraction. See Extract Content Not Available Online.
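The POST option might be prepared as sketched below. This only constructs the request object and does not send it. The `Content-Type` value, the presence of a `url` query parameter alongside the posted markup, the `token` parameter, and the endpoint are all assumptions here; the linked guide is the authority on the exact requirements.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_html_post(endpoint: str, page_url: str, html: str, token: str) -> Request:
    """Prepare (but do not send) a POST carrying raw HTML for extraction."""
    query = urlencode({"token": token, "url": page_url})
    return Request(
        f"{endpoint}?{query}",
        data=html.encode("utf-8"),              # request body: the markup itself
        headers={"Content-Type": "text/html"},  # assumed; plain text would differ
        method="POST",
    )


req = build_html_post(
    "https://api.diffbot.com/v3/myCustomApi",  # hypothetical endpoint
    "https://example.com/offline-page",
    "<html><body><h1>Hello</h1></body></html>",
    "YOUR_TOKEN",
)
# Passing req to urllib.request.urlopen would perform the call.
print(req.get_method(), req.full_url)
```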

Path Params

string, required: Name of your Custom API

Query Params

string, required: Target URL to extract (URL-encoded)

string: Specify optional fields to be returned from any fully-extracted pages (e.g. fields=querystring,links)

int32: Sets a value in milliseconds to wait for the retrieval/fetch of content from the requested URL. The default timeout for the third-party response is 30 seconds (30000).

string: Use for JSONP requests. Needed for cross-domain Ajax.

string: Leave the value empty to use default proxies, or specify the IP address of a custom proxy that will be used to fetch the target page instead of Diffbot's default IPs/proxies. (Ex: &proxy or &proxy=0.0.0.0)

string: Specifies the authentication parameters to use with a custom proxy specified in the &proxy parameter. (Ex: proxyAuth=username:password)

string: The value none instructs Extract not to use proxies, even if proxies have been enabled globally for this particular URL.
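The three proxy-related options interact, so a small helper can make the valid combinations explicit. This is a sketch only: `proxy` and `proxyAuth` appear in the examples above, but the parameter name `useProxy` for the "none" option is an assumption, as is the helper itself.

```python
from typing import Optional
from urllib.parse import urlencode


def proxy_query(proxy: Optional[str] = None,
                proxy_auth: Optional[str] = None,
                disable: bool = False) -> str:
    """Build the proxy portion of a query string (hypothetical helper)."""
    if disable:
        if proxy is not None or proxy_auth is not None:
            raise ValueError("cannot disable proxies and configure one at once")
        return urlencode({"useProxy": "none"})  # parameter name is assumed
    params = {}
    if proxy is not None:
        params["proxy"] = proxy  # empty string selects the default proxies
    if proxy_auth is not None:
        if not proxy:
            raise ValueError("proxyAuth only applies to a custom proxy")
        params["proxyAuth"] = proxy_auth
    # safe=":" keeps username:password readable in the output.
    return urlencode(params, safe=":")


print(proxy_query(proxy=""))
print(proxy_query("0.0.0.0", "username:password"))
print(proxy_query(disable=True))
```

Note that `urlencode` writes an empty value as `proxy=` rather than the bare `&proxy` form shown in the parameter description; whether the two are treated identically is not confirmed here.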
