Diffbot offers several APIs for automatic extraction of content from third-party websites. The content is returned as structured JSON or, in some cases, CSV, and is ready to be imported into your local database for further processing.
Because it is built on machine learning and improves over time, Diffbot excels at recognizing various types of content in several dozen human languages.
Please consult the following list of available APIs and their brief descriptions to find the one that matches your use case. Almost all of the APIs below can be used in tandem with our batch services and applied to a wide range of links at once.
- Product API allows you to extract information about products, including specifications, colors, availability, price, discount offers, shipping, description, reviews, and more.
- Article API allows you to extract information about news articles, blog posts, and other written content. Diffbot can recognize authors and their profile images and links, dates and locations of publication, sentiment, tags based on content, images in the article, comments, language the content is written in, and more.
- Image API allows you to extract detailed information about images, from dimensions and download URLs to what's in the image, determined through image recognition.
- Video API does the same as the Image API, but for videos.
- Discussion API is used for extracting threads of content. This can be a review section of a product (indeed, Product API uses the Discussion API internally when extracting comments to include them in the output), a forum or Reddit thread, or a comment section in a blog.
- Analyze API can be used when you're not sure of the type of page you're trying to parse, or when you're issuing batch calls and can't specify what Diffbot will encounter across the domain being spidered. Analyze API will automatically recognize the type of content and apply one of the above APIs for processing.
- Custom API comes in when all else fails. If an Automatic API fails to recognize some content or extracts the wrong thing, it can be modified with the Custom API toolkit. Furthermore, if the page being processed is in a completely different category from any of the above APIs, a Custom API can be created from scratch specifically for that resource. Custom APIs are very powerful and worth learning about.
- Account API is an account-management API which lets you programmatically find out information about your Diffbot account. This can be account type, requests issued to Diffbot, plan and billing, list of custom rules, and more.
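To make the list above more concrete, here is a minimal sketch of how a request to one of the automatic APIs might be assembled, including the fallback to Analyze API when the page type is unknown. The base URL `https://api.diffbot.com/v3`, the lowercase endpoint names, and the `token` and `url` query parameters are assumptions not stated in this overview, and `build_request_url` and `pick_api` are hypothetical helpers written for illustration. Check them against the reference for each API before relying on them.

```python
from urllib.parse import urlencode

# Assumed base URL; not stated in the overview above.
BASE_URL = "https://api.diffbot.com/v3"

# Automatic APIs from the list above, as assumed lowercase endpoint names.
AUTOMATIC_APIS = {"product", "article", "image", "video", "discussion", "analyze"}


def pick_api(page_type=None):
    """Return the API name for a known page type, or 'analyze' when unknown."""
    return page_type.lower() if page_type else "analyze"


def build_request_url(api, token, page_url):
    """Build a GET URL for one automatic API (hypothetical helper).

    The 'token' and 'url' parameter names are assumptions.
    """
    api = api.lower()
    if api not in AUTOMATIC_APIS:
        raise ValueError(f"unknown API: {api!r}")
    # urlencode percent-encodes the target page URL so it survives as a query value.
    query = urlencode({"token": token, "url": page_url})
    return f"{BASE_URL}/{api}?{query}"


if __name__ == "__main__":
    # Page type unknown, so fall back to Analyze API.
    print(build_request_url(pick_api(), "YOUR_TOKEN", "https://example.com/post"))
```

The resulting URL could then be fetched with any HTTP client and the response parsed as JSON. The network call is left out here so that the sketch runs without a Diffbot token.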