# Diffbot Docs Documentation

## Guides

- [How do I delete my Diffbot account details?](https://docs.diffbot.com/docs/how-do-i-delete-my-diffbot-account-details.md): Data Removal Requests
- [Does Diffbot respect robots.txt?](https://docs.diffbot.com/docs/does-crawl-respect-robotstxt.md): Yes.
- [How can I crawl (news) sites and monitor/extract only recent content?](https://docs.diffbot.com/docs/how-can-i-crawl-news-sites-and-monitorextract-only-recent-content.md)
- [How to find and access JavaScript-generated links while crawling](https://docs.diffbot.com/docs/how-to-find-and-access-javascript-generated-links-while-crawling.md)
- [How to Use Querystrings in Crawl and Bulk Extract](https://docs.diffbot.com/docs/how-to-use-querystrings-in-crawl-and-bulk-extract.md)
- [Is there a limit to the number of crawls/bulk jobs?](https://docs.diffbot.com/docs/is-there-a-limit-to-the-number-of-crawlsbulk-jobs.md): Yes, roughly 1,000 crawl or bulk jobs.
- [The Difference Between Crawling and Extraction](https://docs.diffbot.com/docs/the-difference-between-crawling-and-processing.md): Crawling discovers links; extraction pulls structured data from the pages those links lead to.
- [How to Read the URL Report](https://docs.diffbot.com/docs/url-report.md)
- [Crawl and Processing Patterns and Regexes](https://docs.diffbot.com/docs/crawl-and-processing-patterns-and-regexes.md)
- [Getting Started with Crawl](https://docs.diffbot.com/docs/getting-started-with-crawl.md): Spider a site for links and process them with Extract API
- [Tutorial: How to Search a Crawl/Bulk job using DQL](https://docs.diffbot.com/docs/search-a-crawlbulk-job.md): Query your crawl or bulk job collections for data with DQL. (⏲️ 10 Minutes)
- [How do I apply a Custom API to multiple domains?](https://docs.diffbot.com/docs/how-to-apply-a-custom-api-to-multiple-domains.md)
- [Does Extract API cache responses?](https://docs.diffbot.com/docs/does-extract-api-cache-responses.md): Sometimes.
- [Getting Started with Custom API](https://docs.diffbot.com/docs/introduction-to-custom-api.md): Extend Extract API or create your own custom extraction API with rules
- [Tutorial: How to backup and restore Custom API rulesets](https://docs.diffbot.com/docs/how-to-backup-and-restore-custom-api-rulesets.md): Sometimes you might want to back up your custom rules: maybe you’re freezing your token for a while, or maybe you want to copy them to another token. This short guide shows how to access all of your Custom API rules and back them up for safekeeping. (⏲️ 10 Minutes)
- [Tutorial: A Tiny, Zero Dependency Price Tracker](https://docs.diffbot.com/docs/tutorial-a-tiny-zero-dependency-price-tracker.md): Build a tiny personal price tracker with Diffbot Extract and Python (⏲️ 10 Minutes)
- [Tutorial: How to Pull Data From a Website to Google Sheets](https://docs.diffbot.com/docs/tutorial-how-to-pull-data-from-a-website-to-google-sheets.md): Pull content from a website to a spreadsheet without scraping (⏲️ 15 Minutes)
- [New to Diffbot?](https://docs.diffbot.com/docs/getting-started-with-diffbot.md): Let's help you get acquainted.
- [CData Integration](https://docs.diffbot.com/docs/cdata-diffbot.md)
- [Tables](https://docs.diffbot.com/docs/tables.md): Account is a metadata table. The other four (Organization, Person, Article, Place) are queryable entity tables. To filter any of them, pass a DQL string into the `dql` property in the WHERE clause. DQL syntax and operators are documented at https://docs.diffbot.com/docs/kg-search.
- [KnowledgeGraph Sources - Places](https://docs.diffbot.com/docs/knowledgegraph-sources.md)
- [Enhance](https://docs.diffbot.com/docs/kg-enhance.md): Enrich a person or organization with data from the public web.
- [Contains](https://docs.diffbot.com/docs/contains.md)
- [Greater - Less Than - or - Equal To](https://docs.diffbot.com/docs/greater-less-than-or-equal-to.md)
- [Comparison Operators](https://docs.diffbot.com/docs/comparison-operators-1.md)
- [Not Equals](https://docs.diffbot.com/docs/not-equals.md)
- [Strict](https://docs.diffbot.com/docs/strict.md)
- [Search (DQL)](https://docs.diffbot.com/docs/kg-search.md): Find and filter organizations, people, articles, and more from Diffbot Knowledge Graph
- [Query Types](https://docs.diffbot.com/docs/query-types-1.md)
- [Range Operator](https://docs.diffbot.com/docs/range-operator.md)
- [Simple & Nested Paths](https://docs.diffbot.com/docs/simple-nested-paths.md): a.k.a. AND queries
- [Ontology](https://docs.diffbot.com/docs/ontology.md): A complete reference of entity types and their attributes in the [Diffbot Knowledge Graph](doc:getting-started-with-diffbot-knowledge-graph)
- [LegalEntity](https://docs.diffbot.com/docs/legal-entity.md): Attributes available to legal entities in the Knowledge Graph.
- [All Entities](https://docs.diffbot.com/docs/ont-all-entities.md): Attributes common to all entities in the Knowledge Graph.
- [Article](https://docs.diffbot.com/docs/ont-article.md): Attributes available to article entities in the Knowledge Graph.
- [CreativeWork](https://docs.diffbot.com/docs/ont-creativework.md): Attributes available to creative work entities in the Knowledge Graph.
- [Discussion](https://docs.diffbot.com/docs/ont-dicussion.md)
- [Event](https://docs.diffbot.com/docs/ont-event.md): Attributes available to event entities in the Knowledge Graph.
- [Image](https://docs.diffbot.com/docs/ont-image.md): Attributes available to image entities in the Knowledge Graph.
- [JobPost](https://docs.diffbot.com/docs/ont-jobpost.md): Attributes available to JobPost entities in the Knowledge Graph.
- [Organization](https://docs.diffbot.com/docs/ont-organization.md): Attributes available to organization entities in the Knowledge Graph.
- [Person](https://docs.diffbot.com/docs/ont-person.md): Attributes available to person entities in the Knowledge Graph.
- [Place](https://docs.diffbot.com/docs/ont-place.md): Attributes available to place/location entities in the Knowledge Graph.
- [Product](https://docs.diffbot.com/docs/ont-product.md): Attributes available to retail product entities in the Knowledge Graph.
- [Video](https://docs.diffbot.com/docs/ont-video.md): Attributes available to video entities in the Knowledge Graph.
- [Research](https://docs.diffbot.com/docs/research.md): Attributes available to Research entities in the Knowledge Graph.
- [Article Categories](https://docs.diffbot.com/docs/article-categories.md): Structured context for the topics discussed in the text of articles.
- [Organization Categories](https://docs.diffbot.com/docs/organization-industries.md)
- [Organization Industries (Legacy)](https://docs.diffbot.com/docs/organization-industries-legacy.md)
- [Product Categories](https://docs.diffbot.com/docs/product-categories.md)
- [Technology Categories](https://docs.diffbot.com/docs/technology-categories.md)

## API Reference

- [Introduction to Crawl API](https://docs.diffbot.com/reference/crawl-introduction.md): Spider a site for links and process them with Extract API
- [Create a Crawl](https://docs.diffbot.com/reference/create-a-crawl.md): Create and start a job to spider and extract pages through a site.
- [Manage a Crawl Job](https://docs.diffbot.com/reference/manage-a-crawl-job.md): Pause, delete, restart, or view the status of a crawl job.
- [Retrieve Crawl Job Data](https://docs.diffbot.com/reference/retrieve-crawl-job-data.md): Download the extracted results of a crawl job
- [Search Crawl Job Data](https://docs.diffbot.com/reference/search-crawl-job-data.md): Query your crawl job collections for data with DQL
- [Download Results of Bulkjob](https://docs.diffbot.com/reference/bulkjobresultget.md): Download the result of a completed Enhance Bulkjob
- [Download Results of Bulkjob](https://docs.diffbot.com/reference/bulkjobresultpost.md): Download the result of a completed Enhance Bulkjob
- [Poll bulkjob status](https://docs.diffbot.com/reference/bulkjobstatus.md): Poll the status of an Enhance Bulkjob
- [List Bulkjobs for Token](https://docs.diffbot.com/reference/bulkjobstatusfortoken.md): Poll the status of all Enhance Bulkjobs for a token
- [Download Bulkjob Coverage Report](https://docs.diffbot.com/reference/coveragereportget-1.md): Download the coverage report of a completed Bulk Enhance job
- [Delete Bulkjob](https://docs.diffbot.com/reference/deletebulkjob.md): Delete a bulkjob
- [Bulk Enhance](https://docs.diffbot.com/reference/bulk-enhance.md)
- [Download Single Result of Bulkjob](https://docs.diffbot.com/reference/singleresult.md): Use this API to download the result of a single job within a bulkjob by specifying the index of the job.
- [Stop Bulkjob](https://docs.diffbot.com/reference/stopbulkjob.md): Stop an active Enhance Bulkjob
- [Create a Bulkjob](https://docs.diffbot.com/reference/submitbulkjob.md): Enhance multiple records in bulk asynchronously
- [Combine](https://docs.diffbot.com/reference/combine-1.md): Enrich a person record and return both person and current employer data
- [Enhance](https://docs.diffbot.com/reference/enhanceget.md): Enrich a person or organization record with partial data input
- [Enhance](https://docs.diffbot.com/reference/enhancepost.md): Enrich a person or organization record with partial data input (POST option)
- [Analyze](https://docs.diffbot.com/reference/extract-analyze.md): Automatically classify a page and extract data according to its type.
- [Article](https://docs.diffbot.com/reference/article.md): Automatically extract clean article text and other data from news articles, blog posts and other text-heavy pages.
- [Create or Update a Custom API](https://docs.diffbot.com/reference/create-a-custom-api.md): Create a new Custom API, or update the parameters and ruleset of an existing one
- [Custom API Rulesets](https://docs.diffbot.com/reference/custom-api-rulesets.md): A set of rules and parameters defining what a Custom API actually extracts.
- [Extract with Custom API](https://docs.diffbot.com/reference/custom.md): Extracts a page using a modified Extract API or a custom ruleset.
- [Delete a Custom API](https://docs.diffbot.com/reference/delete-a-custom-api.md): Delete definitions of existing Custom APIs for a given URL pattern and API on your token
- [Retrieve Custom APIs](https://docs.diffbot.com/reference/retrieve-a-custom-api.md): Get all the Custom APIs and their rules currently defined on your token
- [Discussion](https://docs.diffbot.com/reference/discussion.md): Automatically structure and extract entire threads of reviews/comments from articles, product pages, and forum threads.
- [Event](https://docs.diffbot.com/reference/event.md): Automatically extracts dates, location and address information, images and event descriptions from event pages.
- [Extract Content Not Available Online](https://docs.diffbot.com/reference/extract-content-not-available-online.md): **POST** markup or plain text directly to any Extract API endpoint
- [Custom Headers](https://docs.diffbot.com/reference/extract-custom-headers.md): Pass referrers, user-agents, and cookies to Extract APIs.
- [Custom Javascript](https://docs.diffbot.com/reference/extract-custom-javascript.md): Inject custom JavaScript code before an Extract API processes a page
- [Introduction to Extract API](https://docs.diffbot.com/reference/extract-introduction.md): Extract uses computer vision and natural language processing to automatically categorize web pages and extract their contents into clean, structured JSON.
- [Optional Fields](https://docs.diffbot.com/reference/extract-optional-fields.md): Available with Extract APIs using the `&fields=` parameter
- [Image](https://docs.diffbot.com/reference/image.md): Automatically identifies the primary image(s) on any web page and returns comprehensive information and metadata for each image.
- [Job](https://docs.diffbot.com/reference/job.md): Automatically extracts structured information from job postings.
- [List](https://docs.diffbot.com/reference/list.md): Automatically structures a list of items from news index pages, product listings pages, search engine results pages, and other "list-like" pages.
- [Product](https://docs.diffbot.com/reference/product.md): Automatically extract pricing, product specs, images, and more from an e-commerce product page.
- [Error 401: Not authorized API token](https://docs.diffbot.com/reference/error-401-not-authorized-api-token.md): The token you are supplying is not authorized to make this request
- [Error 429: Site has received too many requests. Please try again later.](https://docs.diffbot.com/reference/error-429-too-many-requests.md): The website is slow to load, completely down, or is blocking Diffbot's servers.
- [Error 457: Invalid API](https://docs.diffbot.com/reference/error-457-invalid-api.md): These are not the APIs you're looking for
- [Error 500: Site has received too many requests.](https://docs.diffbot.com/reference/error-500-too-many-requests.md): We are at risk of getting blocked from this website, so we've temporarily limited access.
- [Error 500: Unable to Apply Rules](https://docs.diffbot.com/reference/error-500-unable-to-apply-rules.md): Your custom rules do not match any element on the page.
- [Error 404: Could Not Download Page](https://docs.diffbot.com/reference/error-could-not-download-page.md): The website is slow to load, completely down, or is blocking Diffbot's servers.
- [Normalized HTML Fields for Article API](https://docs.diffbot.com/reference/normalized-html-fields.md): Diffbot's `html` field returns normalized HTML that maintains the structure and layout of the source article while standardizing its elements and attributes for reliable parsing and processing.
- [Normalized Specifications for Product API](https://docs.diffbot.com/reference/normalized-product-specifications.md): The `normalizedSpecs` field returns a product's standardized/sanitized specifications. Numeric values for many specifications are normalized into standard units.
- [Troubleshooting "Failed to fetch" errors](https://docs.diffbot.com/reference/troubleshooting-failed-to-fetch-errors.md): Also known as "Load failed" on iOS
- [Using Proxies](https://docs.diffbot.com/reference/using-proxies.md): Avoid rate limiting or throttling responses when extracting from certain websites.
- [Video](https://docs.diffbot.com/reference/video.md): The Video API automatically extracts detailed video information (most metadata, thumbnail images, the direct video URL, and embed code) from nearly any video page or video platform on the web.
- [Authentication](https://docs.diffbot.com/reference/authentication.md): All Diffbot APIs are authenticated via token.
- [Introduction to Diffbot APIs](https://docs.diffbot.com/reference/introduction-to-diffbot-apis.md): AI that reads websites and structures them into facts
- [Rate Limits](https://docs.diffbot.com/reference/rate-limits.md)
- [Introduction to Natural Language API](https://docs.diffbot.com/reference/introduction-to-natural-language-api.md): Extract entities (e.g., people, organizations, products) and data about them (e.g., sentiment, relationships) from raw text
- [Process Text](https://docs.diffbot.com/reference/nl-post.md)

## Changelog

- [Extract Links from a Sitemap](https://docs.diffbot.com/changelog/extract-links-from-a-sitemap.md)
- [Optional Param noredirects](https://docs.diffbot.com/changelog/optional-param-noredirects.md)
- [50 billion new facts Added to the KG in the last month](https://docs.diffbot.com/changelog/50-billion-new-facts-added-to-the-kg-in-the-last-month.md)
- [DiffbotAPI now returns Interactive Elements for 'mode=llm'](https://docs.diffbot.com/changelog/diffbotapi-now-returns-interactive-elements-for-modellm.md)
- [NACE Code Rev 2.1 Updates](https://docs.diffbot.com/changelog/nace-code-rev-21-updates.md)
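
The API Reference entries above all rely on token authentication (see Authentication). As a minimal sketch, assuming the v3 Article endpoint `https://api.diffbot.com/v3/article` (with `token` and `url` query parameters) and the Knowledge Graph DQL endpoint `https://kg.diffbot.com/kg/v3/dql` (with `type`, `token`, `query`, and `size` parameters), request URLs could be assembled in Python as follows. The helper names `article_url` and `dql_url` are hypothetical, and the endpoints and parameters should be confirmed against the Article and Search (DQL) reference pages. The sketch only builds URLs; it makes no network calls.

```python
from urllib.parse import urlencode

# Assumed base URLs; verify against the Article and Search (DQL) reference pages.
EXTRACT_BASE = "https://api.diffbot.com/v3"
KG_DQL_URL = "https://kg.diffbot.com/kg/v3/dql"


def article_url(token: str, page_url: str) -> str:
    """Hypothetical helper: build a GET URL for the Article Extract API.

    The token travels in the query string; the target page URL is
    percent-encoded so its own '?' and '=' characters survive intact.
    """
    params = {"token": token, "url": page_url}
    return f"{EXTRACT_BASE}/article?{urlencode(params)}"


def dql_url(token: str, query: str, size: int = 10) -> str:
    """Hypothetical helper: build a GET URL for a Knowledge Graph DQL search."""
    params = {"type": "query", "token": token, "query": query, "size": size}
    return f"{KG_DQL_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(article_url("YOUR_TOKEN", "https://example.com/post?id=1"))
    print(dql_url("YOUR_TOKEN", 'type:Organization name:"Diffbot"'))
```

Encoding parameters with `urlencode` rather than string concatenation matters most for the `url` parameter, since the target page's own query string would otherwise be parsed as extra parameters of the API request.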