Can Extract APIs Extract Content from PDFs or Other Documents?
Yes, in direct Extract API calls and Bulk Extract jobs, but not in Crawl.
As of September 2016, Diffbot’s Automatic Extract APIs can structure content from PDF files.
This functionality is in beta and is available only through direct API calls; it is not currently possible to process PDFs while using Crawl. PDF URLs submitted in Bulk Extract jobs, however, will be processed successfully.
The quality of PDF extraction varies and depends significantly on the underlying structure of the document itself.
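As a rough illustration of the direct API call described above, here is a minimal Python sketch that passes a PDF URL to an Extract API. The endpoint (`https://api.diffbot.com/v3/analyze`) and the `token` and `url` query parameters are assumptions based on Diffbot's v3 API and should be checked against the current API reference. The helper names `build_extract_url` and `extract_pdf`, the token, and the document URL are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint (Diffbot v3 Analyze API); confirm against the API reference.
API_ENDPOINT = "https://api.diffbot.com/v3/analyze"


def build_extract_url(token: str, pdf_url: str) -> str:
    """Build the request URL for one direct Extract API call.

    urlencode percent-encodes the document URL so it survives as a
    single query parameter.
    """
    query = urllib.parse.urlencode({"token": token, "url": pdf_url})
    return f"{API_ENDPOINT}?{query}"


def extract_pdf(token: str, pdf_url: str, timeout: float = 60.0) -> dict:
    """Send the direct API call and return the decoded JSON response."""
    request_url = build_extract_url(token, pdf_url)
    with urllib.request.urlopen(request_url, timeout=timeout) as response:
        return json.load(response)


# Example usage (placeholder token and document URL):
# result = extract_pdf("YOUR_TOKEN", "https://example.com/report.pdf")
# print(json.dumps(result, indent=2))
```

Because extraction quality depends on the structure of the document, it is worth inspecting the returned JSON for each PDF rather than assuming every field is populated.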