Extracts a page using a modified Extract API or a custom ruleset.
If you need just one more field from an Extract API, or if one or more field values are incorrect, you may use a Custom API to override or add those fields using rules.
Correcting a field’s output takes immediate effect for your account, and also serves to train our system, improving Diffbot extraction over the long run.
Extracting a page with a Custom API works just like all other Extract APIs. Simply pass a URL to your Custom API's unique endpoint.
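As a minimal sketch of what such a request might look like, the Python snippet below assembles a request URL. The endpoint shown is a placeholder, not a real Custom API address, and the token and url query parameter names are assumptions here; substitute the unique endpoint and credentials for your own account.

```python
from urllib.parse import urlencode


def build_custom_api_url(endpoint, token, page_url):
    """Return a GET request URL for a Custom API call.

    `endpoint` is your Custom API's unique endpoint (placeholder in the
    example below). The `token` and `url` parameter names are assumptions.
    """
    # urlencode percent-encodes the page URL so it survives as a query value.
    query = urlencode({"token": token, "url": page_url})
    return f"{endpoint}?{query}"


# Hypothetical endpoint and token, for illustration only.
request_url = build_custom_api_url(
    "https://api.example.com/v3/custom/my-api",
    "YOUR_TOKEN",
    "https://example.com/some page",
)
```

The resulting URL can then be fetched with any HTTP client, for example urllib.request.urlopen(request_url).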
You may wish to start by creating a Custom API.
Response
The Custom API returns data in JSON format.
Each response includes a request object (which returns request-specific metadata) and an objects array, which will include the extracted information for all objects on a submitted page.
For Custom APIs the objects array will always contain a single object, and all custom fields and collections will be returned therein.
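The response shape described above can be unpacked along the following lines. This is a sketch only: the sample payload is invented for illustration, and field names such as pageUrl and price are hypothetical stand-ins for whatever custom fields your rules define.

```python
import json


def unpack_custom_response(response_text):
    """Split a Custom API JSON response into (request metadata, object).

    Relies on the documented shape: a `request` object plus an `objects`
    array that contains exactly one object for Custom APIs.
    """
    data = json.loads(response_text)
    objects = data.get("objects", [])
    if len(objects) != 1:
        raise ValueError(f"expected exactly one object, got {len(objects)}")
    return data.get("request", {}), objects[0]


# Invented sample payload; real responses will carry your own fields.
sample = json.dumps({
    "request": {"pageUrl": "https://example.com/item"},
    "objects": [{"title": "Example item", "price": "9.99"}],
})
request_meta, extracted = unpack_custom_response(sample)
```

Raising on an unexpected object count is a design choice for this sketch; it surfaces malformed or error responses early instead of silently returning partial data.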
Optional Fields
Custom API may also return optional fields if they are specified, comma-delimited, in the &fields= argument.
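One way to append the comma-delimited fields argument to an existing request URL is sketched below. The field names links and meta are used purely as examples and may not match the optional fields available to your API; the endpoint is a placeholder.

```python
from urllib.parse import urlencode


def with_optional_fields(request_url, field_names):
    """Append a comma-delimited `fields` argument to a request URL."""
    if not field_names:
        return request_url
    # safe="," keeps the commas literal so the list stays readable.
    query = urlencode({"fields": ",".join(field_names)}, safe=",")
    separator = "&" if "?" in request_url else "?"
    return f"{request_url}{separator}{query}"


# Hypothetical endpoint, token, and field names.
url_with_fields = with_optional_fields(
    "https://api.example.com/v3/custom/my-api?token=YOUR_TOKEN",
    ["links", "meta"],
)
```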
Already have the source HTML? POST it to Custom API.
Custom API supports a POST option that allows you to upload HTML or plain text for extraction. See Extract Content Not Available Online.
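A POST upload might be prepared roughly as follows. This sketch assumes the markup travels in the request body with a Content-Type of text/html (or text/plain for plain text) and that the token and url query parameters are still supplied; check those details against the linked page before relying on them. The endpoint and token are placeholders.

```python
import urllib.request
from urllib.parse import urlencode


def build_post_request(endpoint, token, page_url, markup, content_type="text/html"):
    """Prepare (but do not send) a POST request carrying markup to extract.

    Assumptions: body holds the raw markup, Content-Type describes it,
    and `token` / `url` are passed as query parameters.
    """
    query = urlencode({"token": token, "url": page_url})
    return urllib.request.Request(
        f"{endpoint}?{query}",
        data=markup.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="POST",
    )


# Hypothetical endpoint and token, for illustration only.
post_request = build_post_request(
    "https://api.example.com/v3/custom/my-api",
    "YOUR_TOKEN",
    "https://example.com/offline-page",
    "<html><body><h1>Hello</h1></body></html>",
)
```

To send it, pass the prepared object to urllib.request.urlopen(post_request) and read the JSON response body.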