Have you run into a site where Diffbot’s extraction is incorrect or needs adjusting? Our API Toolkit allows you not only to create entirely new APIs, but also to override or correct the output returned by our Automatic APIs.
Correcting a field’s output takes immediate effect for your account, and also serves to train our system, improving Diffbot extraction over the long run.
Here’s how to make a correction if you have a problem with a particular site:
Find a problematic URL
Start with a web page that is exhibiting the problem, then visit the API Toolkit in your Developer Dashboard.
Create a rule in the API Toolkit
Select the API you want to correct from the drop-down list, and then “Test” your sample URL’s output.
Optional: adjust the domain-matching for your rule
By default, your rule will apply to any pages whose URLs match the subdomain of the sample URL. In our case, the rule will affect all pages at support.diffbot.com.
To adjust this, click the “Change this” link. This reveals a regular expression that you can edit to narrow or broaden your matches. For example, to apply to all pages at diffbot.com:
To apply only to pages within the “/apitoolkit/” section:
Or to apply to all pages at any domain:
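As a rough sketch of how such expressions behave, the Python snippet below tests a few URLs against patterns for the default subdomain match and the three variations above. It assumes the rule uses standard regular-expression syntax matched from the start of the URL. The pattern strings, the sample URLs, and the rule_applies helper are hypothetical illustrations, not the exact expressions the API Toolkit displays.

```python
import re

# Hypothetical patterns; the expression shown in the API Toolkit may differ.
SUBDOMAIN_ONLY = r"^https?://support\.diffbot\.com/.*"            # default: one subdomain
WHOLE_DOMAIN = r"^https?://([^/]+\.)?diffbot\.com/.*"             # all pages at diffbot.com
SECTION_ONLY = r"^https?://support\.diffbot\.com/apitoolkit/.*"   # only the /apitoolkit/ section
ANY_DOMAIN = r".*"                                                # every page on every domain


def rule_applies(pattern: str, url: str) -> bool:
    """Return True if the URL matches the pattern from its first character."""
    return re.match(pattern, url) is not None


# Made-up sample URLs used only to exercise the patterns.
samples = [
    "http://support.diffbot.com/apitoolkit/some-article/",
    "http://support.diffbot.com/other/page/",
    "http://www.diffbot.com/products/",
    "http://example.com/",
]

for url in samples:
    print(
        url,
        rule_applies(SUBDOMAIN_ONLY, url),
        rule_applies(WHOLE_DOMAIN, url),
        rule_applies(SECTION_ONLY, url),
        rule_applies(ANY_DOMAIN, url),
    )
```

Escaping the dots (\.) matters here: an unescaped dot matches any character, which would broaden the rule further than intended.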
Edit the field you wish to correct
The API Toolkit will show a preview of the current API output. To correct a field, click “edit” next to it. In our example, we’ll edit the author field, which is hidden for Diffbot support posts.
In the resulting preview window, you can either manually enter a CSS selector, or point-and-click to choose the correct element. A preview of the output will be displayed at the top of the screen.
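To illustrate what a CSS selector does in this step, here is a minimal Python sketch that pulls the text of the first element matching a simple tag.class selector, using only the standard library. The sample markup, the span.author selector, and the select_text helper are hypothetical; they are not the selector or markup from this example, and the API Toolkit’s own selector engine supports far more than this toy matcher.

```python
from html.parser import HTMLParser
from typing import Optional


class FirstMatchText(HTMLParser):
    """Collect the text of the first element matching a 'tag.class' selector."""

    def __init__(self, selector: str):
        super().__init__()
        # Only the simple 'tag.class' form is handled in this sketch.
        self.want_tag, _, self.want_class = selector.partition(".")
        self.depth = 0      # greater than zero while inside the matched element
        self.done = False   # set once the first match has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            # Track nested elements with the same tag name so that the
            # correct closing tag ends the capture.
            if tag == self.want_tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.want_tag and self.want_class in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.want_tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def select_text(html: str, selector: str) -> Optional[str]:
    """Return the matched element's text, or None if nothing matches."""
    parser = FirstMatchText(selector)
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts).strip()
    return text or None


# Made-up markup with a visually hidden author element.
SAMPLE_HTML = """
<article>
  <h1>Correcting API output</h1>
  <span class="byline author" style="display:none">Jane <b>Doe</b></span>
  <p>Body text.</p>
</article>
"""

print(select_text(SAMPLE_HTML, "span.author"))  # Jane Doe
print(select_text(SAMPLE_HTML, "span.editor"))  # None
```

An element hidden by styling is still present in the markup, which is why a selector can reach it.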
In our example case, the CSS selector we want is
Click “Save” to save your rule
Once saved, your rule will take immediate effect for API calls (a) using the specified API and (b) matching the domain regular expression.
For any page that doesn’t contain the specified CSS selector, the API returns the default Diffbot output.
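The conditions above can be pictured with a small Python model. This is only a conceptual sketch of the behavior described here, not Diffbot’s implementation: the Rule class, apply_rule, find_author, the "article" and "product" API names, and the sample data are all hypothetical, and a plain function stands in for the CSS selector lookup.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Rule:
    api: str                                  # API the rule was created for
    url_pattern: str                          # domain-matching regular expression
    field: str                                # field being corrected
    extract: Callable[[str], Optional[str]]   # stand-in for the CSS selector lookup


def apply_rule(rule: Rule, api: str, url: str, html: str,
               default_output: Dict[str, str]) -> Dict[str, str]:
    """Override one field only when the API, URL, and selector all match."""
    output = dict(default_output)
    # (a) the call must use the rule's API, and (b) the URL must match its pattern.
    if api != rule.api or re.match(rule.url_pattern, url) is None:
        return output
    value = rule.extract(html)
    # If the selector finds nothing, the default output is returned unchanged.
    if value is not None:
        output[rule.field] = value
    return output


def find_author(html: str) -> Optional[str]:
    # Crude stand-in for a selector: look for a marker attribute in the markup.
    match = re.search(r'data-author="([^"]+)"', html)
    return match.group(1) if match else None


rule = Rule("article", r"^https?://support\.diffbot\.com/.*", "author", find_author)
default = {"title": "Sample post", "author": ""}
page = '<div data-author="Jane Doe">...</div>'
url = "http://support.diffbot.com/post/"

corrected = apply_rule(rule, "article", url, page, default)
untouched = apply_rule(rule, "article", url, "<div></div>", default)
other_api = apply_rule(rule, "product", url, page, default)
print(corrected)
print(untouched)
print(other_api)
```

Only the first call changes the author field; the other two fall back to the default output because either the selector stand-in finds nothing or the call uses a different API.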
For more advanced techniques, see the following resources: