Extract Content Not Available Online

POST HTML markup directly to any Extract API endpoint, or plain text to the Article API.

Extraction quality depends on several factors, including whether the page's assets (images, CSS) are accessible and how heavily the page layout relies on any assets that are unavailable.

The url argument is still required. It is used to resolve any relative links in the markup, and, like any query-string value, it should be URL-encoded, as in the examples below.
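Because the url argument travels inside the query string, characters such as ":" and "/" have to be percent-encoded. The sketch below is a minimal, assumption-laden bash helper for ASCII URLs; the function name urlencode is hypothetical and is not part of curl or any Diffbot tooling.

```bash
#!/usr/bin/env bash
# Sketch: percent-encode a page URL for the url query parameter.
# urlencode is a hypothetical helper, not part of curl or Diffbot.
urlencode() {
  local LC_ALL=C          # C locale so ranges like a-z match only ASCII letters
  local s="$1" out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      # RFC 3986 unreserved characters pass through unchanged
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      # everything else becomes %XX (uppercase hex of the character code)
      *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;
    esac
  done
  printf '%s' "$out"
}

encoded=$(urlencode 'http://store.diffbot.com')
echo "$encoded"
```

The result can then be substituted into the request, e.g. `--url "https://api.diffbot.com/v3/article?token=<YOURTOKEN>&url=$(urlencode 'http://store.diffbot.com')"`. Newer curl releases (7.87.0 and later) also provide `--url-query`, which can perform this encoding without a helper.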

Send the content to analyze as the POST body, and set the Content-Type header to text/html (for full markup) or text/plain (for text only).
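For content that lives in a local file rather than an inline string, a sketch like the following may help. It assumes bash, a token stored in a DIFFBOT_TOKEN environment variable, and an already URL-encoded url value; the helper names content_type_for and extract_file, and the extension-to-Content-Type mapping, are illustrative assumptions rather than anything Diffbot defines. It uses curl's --data-binary because `-d @file` strips carriage returns and newlines from the file, while --data-binary sends it unchanged.

```bash
#!/usr/bin/env bash
# Sketch: POST a local file to a Diffbot Extract endpoint.
# Assumptions (hypothetical): DIFFBOT_TOKEN holds your token; these
# helper functions are illustrative, not part of any Diffbot tool.

# Pick the Content-Type from the file extension (an assumed convention):
# text/html for markup, text/plain otherwise (plain text is Article API only).
content_type_for() {
  case "$1" in
    *.html|*.htm) printf 'text/html' ;;
    *)            printf 'text/plain' ;;
  esac
}

# POST the file unchanged. --data-binary preserves newlines, whereas
# -d @file would strip carriage returns and newlines from the file.
# Args: file, already URL-encoded page URL, API name (default: article).
extract_file() {
  local file="$1" encoded_url="$2" api="${3:-article}"
  curl --request POST \
       --header "Content-Type: $(content_type_for "$file")" \
       --url "https://api.diffbot.com/v3/${api}?token=${DIFFBOT_TOKEN}&url=${encoded_url}" \
       --data-binary "@${file}"
}
```

Usage, assuming page.html sits in the current directory: `extract_file page.html 'http%3A%2F%2Fstore.diffbot.com'`.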

Example Request for Extracting HTML Markup

curl --request POST \
     --header 'Content-Type: text/html' \
     --url 'https://api.diffbot.com/v3/article?token=<YOURTOKEN>&url=http%3A%2F%2Fstore.diffbot.com' \
     -d '<html><head><title>Diffbot Extract makes web data fun</title></head><body><h1>Make web data fun</h1><p>Do you know what makes working with web data fun? Diffbot! </p></body></html>'

Example Request for Extracting Plain Text

Plain-text extraction is available only with the Article API.

curl --request POST \
     --header 'Content-Type: text/plain' \
     --url 'https://api.diffbot.com/v3/article?token=<YOURTOKEN>&url=http%3A%2F%2Fstore.diffbot.com' \
     -d 'Now is the time for all good robots to come to the aid of their-- oh never mind, run!'