Docs Suite

Custom API Basic Usage

Here’s how to make a correction with the API Customization toolkit if you have a problem with a particular site.

Find a problematic URL

Start with a web page that exhibits the problem. In the dashboard, open the Custom API section and click Create New (or go there directly). Pick the type of API you want to process the page with, and enter the problematic URL.

Creating a new API and entering a link into the Test URL field

The example above processes this wiki page, which has a clearly defined author at the bottom. Diffbot did not pick it up, as is evident from the results in the image below.

Author is missing from extraction

Edit the field you wish to correct

Click “edit” next to the field you wish to adjust. In our example, we’ll edit the author field.

Edit field popup with preview window

In the resulting preview window, you can either manually enter a CSS Selector, or point-and-click to choose the correct element. A preview of the output will be displayed at the top of the screen.

In our example case, the CSS selector we want is .docLastUpdate:

Selector is producing a result

Note: By default, the Custom API Toolkit retrieves all content matching your selector. To select multiple items with different selectors, separate the selectors with commas.

For example, if you are trying to extract two different types of images, one with the ID #featureImage and perhaps additional images with the class .inlineImage, simply use the selector: #featureImage, .inlineImage

Your API results will include all matching images from either selector.
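To illustrate how a comma-separated selector group behaves, here is a minimal Python sketch. It is not Diffbot's implementation: it handles only bare #id and .class selectors, and the sample HTML, the class name SelectorGroupMatcher, and the function select_sources are hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class SelectorGroupMatcher(HTMLParser):
    """Collect the src of every tag matching any selector in a group.

    Handles only bare "#id" and ".class" selectors (an assumption made to
    keep the sketch short); it is not a full CSS engine.
    """

    def __init__(self, selector_group):
        super().__init__()
        # "#featureImage, .inlineImage" -> ["#featureImage", ".inlineImage"]
        self.selectors = [s.strip() for s in selector_group.split(",") if s.strip()]
        self.sources = []

    def _matches(self, selector, attributes):
        if selector.startswith("#"):
            return attributes.get("id") == selector[1:]
        if selector.startswith("."):
            classes = (attributes.get("class") or "").split()
            return selector[1:] in classes
        return False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # An element is kept once if it matches at least one selector.
        if any(self._matches(sel, attributes) for sel in self.selectors):
            self.sources.append(attributes.get("src"))


def select_sources(html, selector_group):
    """Return the src attribute of each element matched by the group."""
    matcher = SelectorGroupMatcher(selector_group)
    matcher.feed(html)
    matcher.close()
    return matcher.sources


# Hypothetical page fragment, invented for this example.
SAMPLE_HTML = """
<img id="featureImage" src="hero.png">
<p><img class="inlineImage wide" src="a.png"> <img class="thumb" src="skip.png"></p>
<img class="inlineImage" src="b.png">
"""

print(select_sources(SAMPLE_HTML, "#featureImage, .inlineImage"))
# prints ['hero.png', 'a.png', 'b.png']
```

The image with class thumb is skipped because it matches neither selector in the group.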

The selector returns more text than we need. Diffbot's API customization lets us apply regular expression filters to the output and tailor it to our specific needs. Let's apply one to strip the unneeded text.

Click Filters, select Replace, enter ^(.*)by\s(.*)$ under Value, and enter $2 under Replace with.

Regular expression filter applied

The above means "split the result into two groups separated by the word 'by', and then replace the whole output with the content of the second group".
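The effect of this filter can be tried out locally. The sketch below uses Python's re module as a stand-in for the toolkit's filter, which is an assumption: Python writes the second group as \2 rather than $2, and the sample strings and the helper name apply_replace_filter are hypothetical.

```python
import re

# Pattern from the Replace filter above.
GREEDY_PATTERN = r"^(.*)by\s(.*)$"
# Variant with a lazy first group (an addition for comparison, not from the toolkit).
LAZY_PATTERN = r"^(.*?)by\s(.*)$"


def apply_replace_filter(value, pattern, replacement=r"\2"):
    """Replace the whole match with the second capture group."""
    return re.sub(pattern, replacement, value)


# Assumed raw selector output, modelled on the page's footer text.
print(apply_replace_filter("Last updated by dioro", GREEDY_PATTERN))
# prints dioro

# A value without "by " does not match and is returned unchanged.
print(apply_replace_filter("dioro", GREEDY_PATTERN))
# prints dioro

# The greedy first group splits at the LAST "by " in the string.
print(apply_replace_filter("Last updated by Bobby Smith", GREEDY_PATTERN))
# prints Smith
print(apply_replace_filter("Last updated by Bobby Smith", LAZY_PATTERN))
# prints Bobby Smith
```

In Python's engine the greedy (.*) backtracks to the last "by" followed by whitespace, so a name such as "Bobby Smith" would be cut short. If your values can contain that sequence, a lazy first group, (.*?), is worth testing in the preview window.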

Note: if you're not familiar with regular expressions, fear not - they aren't needed often. But if you're curious and want to learn more about this powerful string manipulation language, Regex101 is an excellent playground.

Click Save to save and apply your rule.

Once saved, your rule will take immediate effect for API calls (a) using the specified API and (b) matching the domain regular expression.

Accurate result given by modified API
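As a rough sketch of the two conditions above, the Python snippet below builds a request URL and checks a page's hostname against a domain expression. Several things here are assumptions: that the rule was created for the Article API at https://api.diffbot.com/v3/article with token and url query parameters, the placeholder token, the example.com URLs, the sample domain pattern, and the helper names. How Diffbot itself evaluates the domain expression is not described on this page.

```python
import re
from urllib.parse import urlencode, urlparse

# Assumed endpoint: the rule is taken to target the Article API.
API_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_request_url(token, page_url):
    """Build a GET URL for the assumed endpoint; the page URL is percent-encoded."""
    query = urlencode({"token": token, "url": page_url})
    return f"{API_ENDPOINT}?{query}"


def rule_applies(page_url, domain_regex):
    """Approximate condition (b): does the page's hostname match the expression?"""
    hostname = urlparse(page_url).hostname or ""
    return re.search(domain_regex, hostname) is not None


# Hypothetical pattern covering example.com and its subdomains.
DOMAIN_REGEX = r"(^|\.)example\.com$"

print(build_request_url("YOUR_TOKEN", "https://example.com/wiki/page"))
print(rule_applies("https://docs.example.com/wiki/page", DOMAIN_REGEX))
print(rule_applies("https://example.org/wiki/page", DOMAIN_REGEX))
```

No request is sent; the snippet only shows which calls would be candidates for the saved rule under these assumptions.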

Last updated by dioro
Copyright © 2021 Diffbot.com