Getting Started with Custom API
Extend Extract API or create your own custom extraction API with rules
In most scraping use cases, Extract API does an excellent job extracting a complete ontology of attributes from the page. If you want to extract additional attributes, or if a particular field is not being extracted accurately, you can create a Custom API to close that gap.
A Custom API can be either an extended Extract API or a completely custom API. In both cases, a set of manually defined rules based on CSS selectors is used to extract data from the page, much like traditional web scraping.
Extended Extract API
A Custom API that modifies or appends more data to the output of Extract API with the use of manually defined rules.
Completely Custom API
A blank slate Custom API that extracts only the data manually defined by rules.
A Simple Example
Suppose you wish to extract data from this blog post. You pass this URL to Extract API and initiate the request, but the author field was not successfully extracted.

We can remedy this with a simple Custom API rule!
This example might not be reproducible
Extract API's AI model is constantly improving and may already be extracting author correctly by the time you're reading this guide. With that said, the fundamental steps still apply for creating a simple Custom API.
There are two ways to go about this. The easier method is via the Dashboard, but we would be remiss if we didn't offer an API option as well. Both options will be covered below.
Creating a Simple Custom API on the Dashboard
Head to the Custom API view on the Dashboard, and click "New Custom API" in the top right corner of the screen.
For this Custom API, we will be extending an Extract API (Article API) by adding a manually defined rule for author, rather than relying on Diffbot AI to extract it (which failed in this case). Select "Article" under the "API" dropdown and paste the URL of the blog post into the "Test URL" field. Then click "Create".

On the page that follows, notice that the author field is empty. This is the field that Article API failed to extract. We'll fix that by defining a manual rule for this field. Click Edit next to the author field.

A modal will open with a DOM preview of the page. You can either manually enter a CSS selector or point-and-click on the element you want extracted. A preview of the output will be displayed at the top of the screen.

Notice in the screenshot above that the "Selector/Value" field is now filled with .mb-1.text-dark strong. This is a CSS selector that identifies the element to extract. The preview on the right shows that we've correctly selected the author name. Click "Save" to continue.

Once saved, your new Custom API rule will take immediate effect for all Article API calls targeting pages that match the URL regex pattern saved to the Custom API (in this case, the www.diffbot.com domain).
Custom APIs are uniquely defined by an API name and a URL pattern
If you create another Custom API for the same URL pattern but based on the List API, that Custom List API will only take effect when you make a List API extract call to a URL matching that pattern.
Conversely, the Custom Article API we created earlier will only take effect when you make an Article API extract call to a URL matching that pattern.
Click the API Call button on the top right of the page to see the final JSON result.
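The same check can also be made outside the Dashboard by calling Article API directly. The sketch below only builds the request URL. It assumes the Article API endpoint is https://api.diffbot.com/v3/article and that it accepts token and url query parameters; neither is stated in this guide, and article_call_url is a hypothetical helper name.

```python
import urllib.parse

# Assumption: the Article API endpoint and its query parameters are not
# given in this guide; adjust them to match the Extract API reference.
ARTICLE_API_ENDPOINT = "https://api.diffbot.com/v3/article"


def article_call_url(token: str, page_url: str) -> str:
    """Build an Article API call URL for the given page (hypothetical helper)."""
    query = urllib.parse.urlencode({"token": token, "url": page_url})
    return f"{ARTICLE_API_ENDPOINT}?{query}"


call_url = article_call_url(
    "YOURDIFFBOTTOKEN",
    "https://www.diffbot.com/insights/build-a-sanctions-tracker/",
)
print(call_url)
```

Fetching this URL with a valid token should return the article JSON, where the author field would be expected to reflect the manual rule if the Custom API matched.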

Now let's try creating the same Custom API via API.
Creating a Simple Custom API via API
A Custom API is a ruleset defined in JSON. For this example, our final Custom API definition (also known as a Custom API ruleset) will look like this:
{
  "rules": [
    {
      "name": "author",
      "selector": ".mb-1.text-dark strong"
    }
  ],
  "api": "/api/article",
  "urlPattern": "(http(s)?://)?(.*\\.)?www.diffbot.com.*",
  "testUrl": "https://www.diffbot.com/insights/build-a-sanctions-tracker/"
}
Let's break this ruleset down so we can better understand what is going on here.
A Custom API starts with an api name and a urlPattern. Together, these attributes uniquely identify a Custom API.
The api name is not just a label but also a pointer. In this example we are extending the Article API by adding a manual rule for the author field, so we set the api name to /api/article. This is a reserved value that invokes the base article extraction model behind Article API. To define a completely custom API from scratch, use any value that is not reserved (e.g. my-custom-api).
The urlPattern is a regular expression that tells Diffbot which URLs this Custom API should apply to. It is usually used to scope a Custom API to a single domain, but it can also apply the rules of a Custom API to any set of websites you desire. For this example, we will define a regular expression that simply scopes the Custom API to the www.diffbot.com domain: (http(s)?://)?(.*\\.)?www.diffbot.com.*
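To see what a pattern like this accepts, it can be tried locally before saving it. The following is a minimal sketch using Python's re module as a stand-in for whatever regex engine Diffbot uses. It assumes the whole URL must match the pattern, which this guide does not specify. applies_to is a hypothetical helper name.

```python
import json
import re

# The pattern as a regex engine sees it: a single backslash before the dot.
URL_PATTERN = r"(http(s)?://)?(.*\.)?www.diffbot.com.*"


def applies_to(url: str, pattern: str = URL_PATTERN) -> bool:
    """Return True if the entire URL matches the pattern.

    Assumption: full-string matching. Diffbot's actual matching mode
    is not described in this guide.
    """
    return re.fullmatch(pattern, url) is not None


# Inside a JSON ruleset the backslash must be escaped, which is why the
# ruleset shows "\\." where the regex itself contains "\.".
json_encoded = json.dumps(URL_PATTERN)

print(applies_to("https://www.diffbot.com/insights/build-a-sanctions-tracker/"))
print(applies_to("https://example.com/"))
print(json_encoded)
```

Note that the dots in www.diffbot.com are unescaped in the pattern, so they match any character; escaping them would make the pattern stricter.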
The testUrl is not functionally necessary, but defining it helps you remember later which specific page this rule was designed to fix.
Now that we have an api, a urlPattern, and a testUrl defined, the next step is to define the rules that extract the data we want.
rules is a JSON array. Each JSON object in the array defines a single field and its extraction logic. We will define one rule in this example.
{
  "name": "author",
  "selector": ".mb-1.text-dark strong"
}
name can be any string value you wish, but if it matches an attribute that Article API normally extracts (i.e. one that exists in the Article ontology), that field will be overridden by your rule.
selector is a CSS selector that matches the target element you wish to extract from.
By default, a rule defined as such will extract the text content of the defined selector. This is appropriate for our example, but if you need to extract something more complicated (like an attribute value), see Using Rule Filters.
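If you generate rulesets programmatically, the structure described above can be assembled from plain dictionaries. This is a minimal sketch; make_rule and make_ruleset are hypothetical helper names, and only the keys shown in this guide (rules, name, selector, api, urlPattern, testUrl) are used.

```python
import json


def make_rule(name: str, selector: str) -> dict:
    """One rule: the output field name and the CSS selector to extract from."""
    return {"name": name, "selector": selector}


def make_ruleset(api: str, url_pattern: str, rules: list, test_url=None) -> dict:
    """Assemble a Custom API ruleset; testUrl is included only when given."""
    ruleset = {"rules": list(rules), "api": api, "urlPattern": url_pattern}
    if test_url is not None:
        ruleset["testUrl"] = test_url
    return ruleset


ruleset = make_ruleset(
    api="/api/article",
    url_pattern=r"(http(s)?://)?(.*\.)?www.diffbot.com.*",
    rules=[make_rule("author", ".mb-1.text-dark strong")],
    test_url="https://www.diffbot.com/insights/build-a-sanctions-tracker/",
)

# json.dumps takes care of escaping the backslash in urlPattern.
print(json.dumps(ruleset, indent=2))
```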
Our last step to actually create this Custom API is to POST this ruleset to the Create a Custom API endpoint.
curl --request POST \
  --url 'https://api.diffbot.com/v3/custom?token=YOURDIFFBOTTOKEN' \
  --header 'accept: application/json' \
  --header 'content-type: application/json' \
  --data '
{
  "rules": [
    {
      "name": "author",
      "selector": ".mb-1.text-dark strong"
    }
  ],
  "api": "/api/article",
  "urlPattern": "(http(s)?://)?(.*\\.)?www.diffbot.com.*",
  "testUrl": "https://www.diffbot.com/insights/build-a-sanctions-tracker/"
}
'
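For readers who prefer Python to curl, a rough equivalent of the request can be built with the standard library. This sketch only constructs the request, using the endpoint and headers from the curl call; build_create_request is a hypothetical helper name, and the commented-out send step assumes a valid token and network access.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl call in this guide.
CUSTOM_API_ENDPOINT = "https://api.diffbot.com/v3/custom"


def build_create_request(token: str, ruleset: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a Custom API."""
    url = f"{CUSTOM_API_ENDPOINT}?{urllib.parse.urlencode({'token': token})}"
    body = json.dumps(ruleset).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


ruleset = {
    "rules": [{"name": "author", "selector": ".mb-1.text-dark strong"}],
    "api": "/api/article",
    "urlPattern": r"(http(s)?://)?(.*\.)?www.diffbot.com.*",
    "testUrl": "https://www.diffbot.com/insights/build-a-sanctions-tracker/",
}
request = build_create_request("YOURDIFFBOTTOKEN", ruleset)

# To actually send it, replace the placeholder token and uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Serializing the ruleset with json.dumps avoids hand-written JSON mistakes such as a trailing comma, which would make the request body invalid.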
Pro Tip for Updating Custom APIs
The combination of api and urlPattern uniquely identifies a Custom API for your token. Should you wish to update any rules in your ruleset, simply providing the same combination of api and urlPattern will override the ruleset for that Custom API. Changing the value of either api or urlPattern will create a brand new Custom API.
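The update behavior described in this tip can be pictured as a dictionary keyed by the (api, urlPattern) pair. The sketch below is purely a local illustration of that semantics, not Diffbot's implementation; registry and upsert are hypothetical names.

```python
# Hypothetical in-memory stand-in for the Custom APIs stored under one token.
registry = {}


def upsert(ruleset: dict) -> str:
    """Store a ruleset under its (api, urlPattern) key and report what happened."""
    key = (ruleset["api"], ruleset["urlPattern"])
    action = "updated" if key in registry else "created"
    registry[key] = ruleset["rules"]  # same key: the old rules are replaced
    return action


pattern = r"(http(s)?://)?(.*\.)?www.diffbot.com.*"
author_rule = {"name": "author", "selector": ".mb-1.text-dark strong"}

first = upsert({"api": "/api/article", "urlPattern": pattern, "rules": [author_rule]})
# Same api and urlPattern with an empty rule list: overrides the existing entry.
second = upsert({"api": "/api/article", "urlPattern": pattern, "rules": []})
# Different api with the same urlPattern: a separate Custom API.
third = upsert({"api": "my-custom-api", "urlPattern": pattern, "rules": [author_rule]})

print(first, second, third, len(registry))
```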
That's it! You've created your first Custom API. While we didn't cover them here, more advanced Custom APIs can execute JavaScript, forward cookies, and even modify the effect of some Extract APIs. Keep reading to learn more!