Docs Suite

Article API Introduction

The Article API extracts clean article text and other data from news articles, blog posts, and other text-heavy pages. It automatically retrieves the full text, cleaned and normalized HTML, related images and videos, author, normalized date, and tags from any article on any site.
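As a rough illustration of what a request might look like, here is a minimal Python sketch. The endpoint URL, the `token` and `url` query parameters, and the response field names (`objects`, `title`, `author`, `date`, `text`, `images`, `tags`) are assumptions that are not stated on this page, so confirm them against the API reference before relying on them. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; check the API reference for the current value.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_article_request(token: str, page_url: str) -> str:
    """Build the request URL for one article page.

    `token` and `url` are assumed parameter names.
    """
    query = urlencode({"token": token, "url": page_url})
    return f"{ARTICLE_ENDPOINT}?{query}"


def fetch_article(token: str, page_url: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (requires network access)."""
    with urlopen(build_article_request(token, page_url), timeout=timeout) as resp:
        return json.load(resp)


def summarize_article(response: dict) -> dict:
    """Pick a few fields from the first extracted object.

    Assumes the response holds a list under "objects"; missing
    fields fall back to None or zero instead of raising.
    """
    objects = response.get("objects") or [{}]
    article = objects[0]
    return {
        "title": article.get("title"),
        "author": article.get("author"),
        "date": article.get("date"),
        "text_length": len(article.get("text") or ""),
        "image_count": len(article.get("images") or []),
        "tag_count": len(article.get("tags") or []),
    }
```

With a valid token, `summarize_article(fetch_article(token, page_url))` would return a small dictionary describing the extracted article. Building the URL separately from sending it keeps the request easy to inspect and log before any API call is made.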

The full API reference for the Article API is on the Article Extraction API page.

Learn more about the Article API:

  • Do Diffbot APIs Follow Redirects?
  • Do Diffbot APIs Execute JavaScript?
  • How long can a single request take / what is the Diffbot API timeout?
  • Does Diffbot handle non-English pages?
  • Can Diffbot APIs Extract Content from PDFs or Other Documents?
  • Can I send HTML or text directly to Diffbot APIs?
  • Using Diffbot Proxy Servers / Proxy IPs
  • What counts as an API call?

Once you've gone through the basics above, proceed with learning about:

  • Usage Examples
  • API reference
Last updated by Bruno Skvorc
Copyright © 2021 Diffbot.com