Extract Content Not Available Online

If a page is not accessible to Diffbot's servers but is accessible to you, send its content in a POST request to any Extract API to extract structured data from the HTML markup (or, with the Article API, plain text) you provide.

https://api.diffbot.com/v3/analyze?token=...&url=...

The url argument is still required; the API uses it to resolve any relative links in the markup.
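The examples on this page pass the url value percent-encoded (http%3A%2F%2Fstore.diffbot.com). If you build request URLs in a shell script, a small helper can do that encoding for you. The Bash sketch below is one way to do it; the function name urlencode is hypothetical, it assumes ASCII input, and it leaves only the RFC 3986 unreserved characters (letters, digits, and . ~ _ -) unescaped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percent-encode a string for use as a query-string value.
# Assumes ASCII input; multibyte characters are NOT encoded as UTF-8 bytes.
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      # RFC 3986 unreserved characters pass through unchanged.
      [[:alnum:].~_-]) out+="$c" ;;
      # Everything else becomes %XX; printf "'c" yields the character's code.
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s' "$out"
}
```

For example, urlencode 'http://store.diffbot.com' produces the same encoded value used in the requests on this page.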

Provide the content to analyze as your POST body, and set the Content-Type header to text/html (for full markup) or text/plain (for plain text only).
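To send a page you have saved locally, you can read the POST body from a file instead of pasting it inline. The Bash sketch below shows one way to do that; the function names content_type_for and diffbot_post_file and the DIFFBOT_TOKEN environment variable are hypothetical, and it assumes curl 7.87.0 or later for --url-query. It picks the Content-Type from the file extension and uses --data-binary, because curl's -d @file strips carriage returns and newlines from the file before sending it.

```shell
#!/usr/bin/env bash
# Sketch: POST a locally saved file to a Diffbot Extract API.
# Hypothetical names: content_type_for, diffbot_post_file, DIFFBOT_TOKEN.
# Assumes curl >= 7.87.0 (needed for --url-query).

# Map a file extension to a Content-Type; text/plain is only
# supported by the Article API according to this page.
content_type_for() {
  case "$1" in
    *.html|*.htm) printf 'text/html' ;;
    *.txt)        printf 'text/plain' ;;
    *)            return 1 ;;
  esac
}

# Usage: diffbot_post_file <api> <file> <original-page-url>
diffbot_post_file() {
  local api="$1" file="$2" page_url="$3" ctype
  ctype=$(content_type_for "$file") || {
    echo "unsupported file type: $file" >&2
    return 1
  }
  # --url-query URL-encodes each value and appends it to the URL;
  # --data-binary sends the file byte-for-byte as the POST body.
  curl --request POST \
       --header "Content-Type: $ctype" \
       --url "https://api.diffbot.com/v3/$api" \
       --url-query "token=$DIFFBOT_TOKEN" \
       --url-query "url=$page_url" \
       --data-binary "@$file"
}
```

For example, after export DIFFBOT_TOKEN='your-token', running diffbot_post_file article saved-page.html 'http://store.diffbot.com' would send saved-page.html to the Article API as text/html.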

Example Request for Extracting HTML Markup

curl --request POST \
     --header 'Content-Type: text/html' \
     --url 'https://api.diffbot.com/v3/article?token=<YOURTOKEN>&url=http%3A%2F%2Fstore.diffbot.com' \
     -d '<html><head><title>Diffbot Extract makes web data fun</title></head><body><h1>Make web data fun</h1><p>Do you know what makes working with web data fun? Diffbot! </p></body></html>'

Example Request for Extracting Plain Text

Plain text is only supported by the Article API.

curl --request POST \
     --header 'Content-Type: text/plain' \
     --url 'https://api.diffbot.com/v3/article?token=<YOURTOKEN>&url=http%3A%2F%2Fstore.diffbot.com' \
     -d 'Now is the time for all good robots to come to the aid of their-- oh never mind, run!'

Note that extraction quality depends on many factors, including whether page assets (images, CSS) are accessible and how heavily the page layout relies on assets that are unavailable.