Products Overview

A cheatsheet of all Diffbot products and how they are used

Extract

Automatically categorize and extract the contents of web pages into clean, structured JSON. Learn more about Extract.

Use Extract if...

  • You have the exact URL of the page you want data from
  • Your service requires synchronous extraction
  • Examples:
    • Getting product data when a user enters a product URL on a wedding registry site
    • Getting clean article text for Natural Language Processing projects
    • Extracting an HTML table into a CSV
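A synchronous Extract call is one HTTP request per known page. The snippet below is a minimal sketch, not Diffbot's client library. It only builds the request URL and makes no network call. The endpoint `https://api.diffbot.com/v3/analyze` and the `token`/`url` query parameters are assumptions based on Diffbot's v3 API, so check them against the current API reference. `build_extract_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed endpoint for Diffbot's v3 Analyze API, which categorizes the page
# and extracts it with the matching extractor. Verify in the API reference.
EXTRACT_ENDPOINT = "https://api.diffbot.com/v3/analyze"


def build_extract_url(token: str, page_url: str) -> str:
    """Return a GET URL for synchronously extracting one known page.

    The `token` and `url` parameter names are assumptions. urlencode()
    percent-encodes the target URL so its own query string (e.g. ?id=42)
    does not collide with the API's query string.
    """
    if not page_url.startswith(("http://", "https://")):
        raise ValueError(f"expected an absolute http(s) URL, got {page_url!r}")
    return EXTRACT_ENDPOINT + "?" + urlencode({"token": token, "url": page_url})


if __name__ == "__main__":
    # Hypothetical token and product page, for illustration only.
    print(build_extract_url("YOUR_TOKEN", "https://example.com/p?id=42"))
```

To send the request, a service would pass this URL to `urllib.request.urlopen` and parse the JSON response while the user waits, which is what makes Extract the fit for synchronous use cases.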

Bulk

An asynchronous bulk processor for Extract. Learn more about Bulk.

Use Bulk if...

  • You're extracting from large lists (100+) of exact URLs
  • Your service is compatible with asynchronous extraction
  • Examples:
    • Refreshing the metadata of an existing product database
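Instead of one request per URL, a Bulk job submits the whole list at once and names the Extract API that should process each entry. The sketch below is illustrative only. The `https://api.diffbot.com/v3/bulk` endpoint, the `name`/`urls`/`apiUrl` form fields, and the whitespace-separated URL list are assumptions to verify against Diffbot's Bulk API reference. The helpers `build_bulk_form` and `make_post_request` are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoints; check both against Diffbot's Bulk API reference.
BULK_ENDPOINT = "https://api.diffbot.com/v3/bulk"
EXTRACT_API_URL = "https://api.diffbot.com/v3/analyze"


def build_bulk_form(token: str, job_name: str, urls: list[str]) -> dict[str, str]:
    """Describe an asynchronous Bulk job that runs Extract on every URL.

    The field names (name, urls, apiUrl) are assumptions. The URL list is
    sent as a single whitespace-separated string.
    """
    if not urls:
        raise ValueError("a bulk job needs at least one URL")
    return {
        "token": token,
        "name": job_name,            # job label used to check on it later
        "urls": " ".join(urls),
        "apiUrl": EXTRACT_API_URL,   # which extractor processes each URL
    }


def make_post_request(endpoint: str, form: dict[str, str]) -> Request:
    """Wrap a form dict in a form-encoded POST request (not sent here)."""
    body = urlencode(form).encode("ascii")
    return Request(endpoint, data=body, method="POST")
```

The response to creating the job would not contain the extracted data; results arrive later, which is why Bulk suits services that can tolerate asynchronous extraction.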

Crawl

Spider a site for links and process them with Extract. Learn more about Crawl.

Use Crawl if...

  • You don't have exact URLs, but the data you want is somewhere in a known domain
  • Your service is compatible with asynchronous extraction
  • Examples:
    • Getting all the articles from a blog
    • Getting all the products from a category on an e-commerce website
    • Finding a specific piece of content somewhere on a domain (e.g. privacy policy, pricing, documentation)

Knowledge Graph

A graph database of over 10 billion entities (news articles, organizations, people, and more) crawled and structured from all over the public web. Learn more about searching it or enhancing your data with it.

Use Search if...

  • You know what data you need, but you're not sure where to find it
  • The data and fields you need fall into one of the primary KG data types: News & Articles, Organizations, or People
  • Examples:
    • Building a news feed on a niche topic
    • Sourcing a list of companies for sales/M&A
    • Market research on an industry
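Search means describing the data you want as a query rather than as a URL. The sketch below assumes Knowledge Graph search is exposed as a GET endpoint at `https://kg.diffbot.com/kg/v3/dql` that takes a DQL (Diffbot Query Language) string in `query` and a result cap in `size`. Those names, and the example query's field names, are assumptions to confirm in the DQL documentation. `build_dql_url` is a hypothetical helper and makes no network call.

```python
from urllib.parse import urlencode

# Assumed endpoint for Knowledge Graph search with DQL.
DQL_ENDPOINT = "https://kg.diffbot.com/kg/v3/dql"


def build_dql_url(token: str, query: str, size: int = 25) -> str:
    """Return a GET URL that runs a DQL query and caps the number of results.

    The `query` and `size` parameter names are assumptions. The query string
    is passed through unchanged; urlencode() percent-encodes its colons,
    quotes, and spaces.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return DQL_ENDPOINT + "?" + urlencode({"token": token, "query": query, "size": size})


if __name__ == "__main__":
    # Hypothetical query for a niche news feed; field names are illustrative.
    print(build_dql_url("YOUR_TOKEN", 'type:Article tags.label:"Robotics"', size=10))
```

The same helper would cover the other examples by changing only the query, e.g. a `type:Organization` query for sourcing companies.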

Use Enhance if...

  • You have some data on Organizations or People and want to enrich your existing dataset with more fields
  • You're trying to fuzzy match a name to an organization or person
  • Examples:
    • Cleaning up and normalizing dirty org/people data in a database
    • Filling in a "revenue" column in your spreadsheet of orgs/people without manual googling
    • Auto-filling lead gen forms on websites
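Enhance works in the opposite direction from Search: you send whatever partial fields you already have for one record, and the Knowledge Graph tries to match it to an entity. The sketch below assumes a GET endpoint at `https://kg.diffbot.com/kg/v3/enhance` with a `type` parameter of `Organization` or `Person` plus known fields such as `name` and `url`; verify these in Diffbot's Enhance reference. `build_enhance_url` is hypothetical and makes no network call.

```python
from urllib.parse import urlencode

# Assumed endpoint for Knowledge Graph Enhance; verify in the API reference.
ENHANCE_ENDPOINT = "https://kg.diffbot.com/kg/v3/enhance"
ENTITY_TYPES = ("Organization", "Person")


def build_enhance_url(token: str, entity_type: str, **known_fields) -> str:
    """Return a GET URL asking the Knowledge Graph to match and enrich one record.

    `known_fields` holds whatever the row already has (e.g. name=, url=);
    these parameter names are assumptions. Empty or None values are dropped
    so that partial, messy rows still produce a clean query.
    """
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"entity_type must be one of {ENTITY_TYPES}")
    params = {"token": token, "type": entity_type}
    params.update({k: v for k, v in known_fields.items() if v})
    return ENHANCE_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical spreadsheet rows with gaps, e.g. a missing website.
    rows = [{"name": "Acme Corp", "url": "acme.example"}, {"name": "Globex", "url": ""}]
    for row in rows:
        print(build_enhance_url("YOUR_TOKEN", "Organization", **row))
```

Filling a "revenue" column would then mean reading the matched entity's revenue field from each response and writing it back to the row.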

Natural Language Processing

Extract entities, classify, and understand the context of raw text programmatically. Learn more about NLP.

Use Natural Language Processing if...

  • You have raw text and want to:
    • Understand it programmatically
    • Find all the entities mentioned in it
    • Get its sentiment
    • Identify and extract the facts stated in it
    • Classify it into a category
  • Examples:
    • Programmatically tagging and filtering a publisher's content
    • Extracting facts from a Wikipedia entry, "About Me" page, or bio paragraph
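Unlike the other products, NLP takes text you already have instead of a URL, so the text travels in a request body. The sketch below assumes a POST endpoint at `https://nl.diffbot.com/v1/`, a `fields` query parameter selecting outputs (`entities`, `sentiment`, `facts`, `categories`), and a JSON body listing documents with `content`, `lang`, and `format` keys. All of these names are assumptions to check against Diffbot's Natural Language API reference. `build_nlp_request` is hypothetical and the request is built but not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint and output names; verify in the Natural Language API reference.
NLP_ENDPOINT = "https://nl.diffbot.com/v1/"
DEFAULT_FIELDS = ("entities", "sentiment", "facts", "categories")


def build_nlp_request(token: str, text: str,
                      fields: tuple[str, ...] = DEFAULT_FIELDS,
                      lang: str = "en") -> Request:
    """Build a POST request that analyzes one piece of raw text.

    `fields` picks which analyses to return (entities, sentiment, facts,
    categories), mirroring the use cases above. The JSON body shape is an
    assumption.
    """
    query = urlencode({"fields": ",".join(fields), "token": token})
    payload = [{"content": text, "lang": lang, "format": "plain text"}]
    return Request(
        NLP_ENDPOINT + "?" + query,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical bio sentence; only request entities and facts.
    req = build_nlp_request("YOUR_TOKEN", "Acme Corp opened an office in Berlin.",
                            fields=("entities", "facts"))
    print(req.full_url)
```

Narrowing `fields` to just what a use case needs, e.g. entities and facts for a bio paragraph, keeps responses smaller.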