Here’s how to make a correction with the API Customization toolkit if you have a problem with a particular site.
Find a problematic URL
Start with a web page that exhibits the problem. Visit the dashboard, open the Custom API section, and click Create New (or go there directly). Pick the type of API you want to process this page with, and enter the problematic URL.
The example above processes this wiki page, which has a clearly defined author at the bottom. Diffbot did not pick it up, as is evident from the results in the image below.
Edit the field you wish to correct.
Click “edit” next to the field you wish to adjust. In our example, we’ll edit the author field.
In the resulting preview window, you can either manually enter a CSS Selector, or point-and-click to choose the correct element. A preview of the output will be displayed at the top of the screen.
In our example case, the CSS selector we want is
Note: By default the Custom API Toolkit will retrieve all content matching your selector. You can select multiple items with different selectors if you wish, by comma-separating your selectors.
For example, if you are trying to extract two different types of images, one with the ID #featureImage and perhaps additional images with the class .inlineImage, simply use the selector:
#featureImage, .inlineImage
Your API results will include all matching images from either selector.
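To illustrate what a comma-separated selector list matches, here is a minimal sketch using only Python's standard library. It is not how the Custom API Toolkit is implemented; the sample HTML, the file names, and the class name SelectorGroupMatcher are made up for this illustration, and the matcher is hard-coded to the two selectors from the example above.

```python
from html.parser import HTMLParser


class SelectorGroupMatcher(HTMLParser):
    """Collect the src of <img> tags matching '#featureImage, .inlineImage'.

    Simplification: a real CSS engine would match any element type;
    this sketch only looks at <img> tags.
    """

    def __init__(self):
        super().__init__()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        # A selector list matches when ANY of its selectors matches.
        if attr.get("id") == "featureImage" or "inlineImage" in classes:
            self.matches.append(attr.get("src"))


# Hypothetical page fragment used only for this demonstration.
SAMPLE_HTML = """
<img id="featureImage" src="hero.jpg">
<p>Text <img class="inlineImage wide" src="chart.png"></p>
<img class="sidebarAd" src="ad.gif">
"""


def match_images(html):
    matcher = SelectorGroupMatcher()
    matcher.feed(html)
    return matcher.matches


print(match_images(SAMPLE_HTML))  # ['hero.jpg', 'chart.png']
```

The third image is skipped because it matches neither selector, while the first two are returned together in document order.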
We're getting some extra information! Diffbot's API customization allows us to apply regular expression filters to some output in order to tweak it to our very specific needs. Let's apply one and clear up the unneeded information.
Click on Filters and select Replace. Under value, put ^(.*)by\s(.*)$ and under Replace with, put a reference to the second capture group.
The above means "split the result into two groups separated by 'by', and then replace the whole output with the content of the second group".
Note: if you're not familiar with regular expressions, fear not - they aren't needed often. But if you're curious and want to learn more about this powerful string manipulation language, Regex101 is an excellent playground.
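If you want to try the Replace pattern outside the toolkit first, the following Python sketch applies the same regular expression to a couple of made-up strings. The sample inputs and the function name keep_after_by are assumptions for illustration, and Python writes the second-group reference as \2, which may differ from the syntax the toolkit expects.

```python
import re

# Same pattern as in the filter: anything, then "by" plus one whitespace
# character, then anything up to the end of the string.
PATTERN = re.compile(r"^(.*)by\s(.*)$")


def keep_after_by(text):
    """Replace the whole match with the content of the second group."""
    return PATTERN.sub(r"\2", text)


print(keep_after_by("Posted by Jane Doe"))      # Jane Doe
print(keep_after_by("Written by Abby Smith"))   # Smith
print(keep_after_by("Jane Doe"))                # Jane Doe (no match, unchanged)
```

The second call shows a caveat: the first group is greedy, so the split happens at the last "by" followed by whitespace, which here falls inside the name "Abby". If that matters for your site, a lazy first group such as ^(.*?)by\s(.*)$ splits at the first occurrence instead.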
Click Save to save and apply your rule.
Once saved, your rule will take immediate effect for API calls (a) using the specified API and (b) matching the domain regular expression.
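To check the saved rule, you can call the API again for the same page. The sketch below only builds the request URL; it assumes the rule was created for the Article API and that the v3 endpoint takes token and url query parameters. The token value and page URL are placeholders, and build_request_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_request_url(page_url, token, api="article"):
    """Build a request URL for the given API type (assumed v3 layout)."""
    # urlencode percent-encodes the page URL so it survives as one parameter.
    query = urlencode({"token": token, "url": page_url})
    return f"https://api.diffbot.com/v3/{api}?{query}"


# Placeholder values; substitute your own token and the problematic URL.
request_url = build_request_url("https://example.com/wiki/Some_Page", "YOUR_TOKEN")
print(request_url)
```

Fetching that URL with any HTTP client should return JSON in which the corrected field reflects the new rule, provided the page's domain matches the rule's domain regular expression.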