Getting Started

Opening the Add-in

To start using the Add-in, we recommend that you first open a blank worksheet and confirm that the Add-in is available from the Home menu, in the right-hand corner of the sheet. If it is not, follow the steps listed above to insert the Add-in. Once it is available, tap the Diffbot icon to begin.

You will see an option to Sign-in or Register for a Diffbot Trial Account. Tap the Sign-in/Register link to continue.

If you have an account, enter your Diffbot token and tap the blue "OK" button.

If you are now logged in, skip the next section of this document, Register for a Trial Account, and continue with the section that follows it, Search the Knowledge Graph for Organizations. If you do not have a Diffbot account, tap the word "here" on the Sign-in/Register screen of the Add-In to register for an account.

Register for a Trial Account

To register for a trial account, fill out the form presented, complete the captcha, and click Submit.

A confirmation screen will appear if your information was submitted successfully.

Once you see the confirmation screen, check the email account you provided in the registration form for a Welcome Email from Diffbot. It will look like this:

You'll need the KG Trial Developer token to sign in to the Add-In. The token will appear in the Welcome Email like this:

"To get started, you'll just need your kgtrial developer token --> 908a2961e751610b34a0f8b564040575"

Once you receive your developer token, close the confirmation screen, tap the Sign-in/Register link, enter your token, and click OK.

You will now see the Home screen of the Diffbot Add-in. From here you can perform either an Enhance or Search (Knowledge Graph query) operation. There is also a link to some tutorial videos that can help you get started with this Add-in.

Search the Knowledge Graph for Organizations

To search for organizations in the Diffbot Knowledge Graph from the Add-In, begin by tapping the Search button in the home screen after you've logged in. Tap Create Query to enter your filter criteria.

You have the option to specify one or more constraints to filter the results and return organizations that best meet your business criteria. As you define each attribute, the number of results dynamically updates to display how many entity matches will be returned given the inputs you've provided so far. Please note: no more than 1000 results will be returned per query submitted regardless of how many entities match your criteria.

Here's an example of a query defined to return software companies in the United States, founded after January 2019, that currently report no more than 500 employees:

The number of entities matching this query was 2,571 when the query was run. Match counts for the same query will vary with each build of the Diffbot Knowledge Graph (DKG). Please note: new builds of the DKG are released every 3-5 days on average.

You can click the "Advanced" tab in the query window to view the DQL query you have generated using the visual editor. On this screen you can also edit your query manually, and specify the number of results you would like returned.

Click Done when you've finished defining your search query inputs. Then click the Next button to constrain the attributes returned for each matching organization entity.

By default, all Organization attributes will be populated with values for each matching Organization entity returned. You can specify which you do not want returned by unchecking the box next to that attribute in the Output details screen. Please note: there will be one row per Organization returned with additional values added as new columns unless you specify that the data be returned as multiple rows.
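The difference between the two layouts can be sketched in Python. This is only an illustration, not the Add-in's implementation: the function names, the numbered column headers, and the `max_values` parameter are assumptions, with the cap set to 3 to mirror the Add-in's default limit on values per attribute.

```python
from typing import Any


def to_wide(entity: dict[str, Any], max_values: int = 3) -> dict[str, Any]:
    """One row per entity: extra values of a multi-valued attribute become new columns."""
    row: dict[str, Any] = {}
    for key, value in entity.items():
        if isinstance(value, list):
            # Hypothetical header scheme: "subsidiaries 1", "subsidiaries 2", ...
            for i, item in enumerate(value[:max_values], start=1):
                row[f"{key} {i}"] = item
        else:
            row[key] = value
    return row


def to_long(entity: dict[str, Any], max_values: int = 3) -> list[dict[str, Any]]:
    """Multiple rows per entity: one row per value, single-valued attributes repeated."""
    capped = {k: v[:max_values] for k, v in entity.items() if isinstance(v, list)}
    depth = max((len(v) for v in capped.values()), default=0)
    depth = max(depth, 1)  # always emit at least one row per entity
    rows = []
    for i in range(depth):
        row: dict[str, Any] = {}
        for key, value in entity.items():
            if key in capped:
                row[key] = capped[key][i] if i < len(capped[key]) else ""
            else:
                row[key] = value
        rows.append(row)
    return rows


# Made-up sample entity with four subsidiaries; only three survive the cap.
sample = {"name": "Example Corp", "subsidiaries": ["A", "B", "C", "D"]}
wide_row = to_wide(sample)
long_rows = to_long(sample)
```

With the sample above, the wide layout yields a single row with three subsidiary columns, while the long layout yields three rows that each repeat the organization name.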

You can bound the number of values returned and will receive no more than 3 values per attribute by default. For example, see subsidiaries in the Add-in output configuration screen below:

You can also decide the name of the sheet the output will be written to. By default it will be written to a sheet called Organizations Output. After finalizing the search results output format and scope, scroll to the bottom of the Output Details screen and click the Execute button. The results of your search will begin to flow into the sheet called Organizations Output or whatever sheet name you specified in the Output details screen.

When the data finishes loading, the screen will still display "Please Wait" but will indicate "Done. Time Elapsed..." in the white section of the status box, as displayed below:

Click the 'x' next to Please Wait to return to the Add-In to take other actions. Or, close the Add-In by tapping the Diffy Icon in the upper right corner of the screen of the Excel worksheet (see above) to view the resulting output in the sheet.

Sample Organization Profile Data

Diffbot has supplied sample data to help users become familiar with the data enrichment features and functions of the Diffbot Add-in. The sample data is a historical list of the Fortune 1000 companies. We recommend including all attributes in your output in order to familiarize yourself with the abbreviated Diffbot Organization profile. For a more complete data profile, use the Diffbot URL provided to view the Diffbot Knowledge Graph Organization entity in the Diffbot Developer Dashboard. From there you can download a complete JSON profile record that contains a full set of known attributes for that Organization.

Enhance an Organization Profile with Sample Data

To enrich organization profiles, click the Enhance button on the Home screen of the Add-In, or tap Menu and select Enhance.

We recommend that you experiment with organization profile enrichment inputs and outputs before enriching your own data. To access Sample Data, scroll to the bottom of the Add-in Input screen and click the purple button labeled Create Sample Input Data.

You will see sample inputs flow into a sheet named Input sheet - example in three columns: ID, Organization Name, Organization URL (site root domain and top-level domain, only, e.g. mysite.com). Please note: the example input sheet is locked and is only intended for demonstration purposes. You will not be able to edit the inputs.

Note that the input columns in the example sheet are mapped to columns A-C in the Add-in Input screen, and that the data is marked as having a header row. (The Location attribute is not populated in the example sheet, nor is it mapped to the Add-In Input screen. Diffbot Entity ID is treated as Location in this example.) The sheet name used to submit data for enrichment is also identified in the Add-In Input screen. All of this information is populated automatically as part of the example data load. Click the Next button to customize the output returned. Please note: any data enrichment tasks you run on your own data require you to map your worksheet name and columns to the Add-In Enhance Input screen manually. They will not populate automatically.

By default, all Organization attributes listed in the Output Details screen will be returned for each matched entity (those that are populated with data in the Diffbot Knowledge Graph). You can specify which attributes you do not want returned by unchecking the box next to that attribute. You can also drag the attributes around to set the order in which the output columns are returned. Please note: there will be one row per Organization returned with additional values added as new columns unless you specify that the data be returned as multiple rows. You can bound the values you want returned and will receive no more than 3 values per attribute by default. For example, see subsidiaries in the Add-in output configuration screens below.

After configuring your desired output, scroll to the bottom of the Output details screen in the Add-In and click the Execute Sample Data button to view the Sample output. Please note: the request will not count against your account budget.

Note that while your Enhance job is running, you can click the "Cancel" button at any time to halt execution.

The output of the sample data execution flows into a new sheet called Organization Outputs - example. Please note: if you run this process more than once and try to use an existing sheet, you will be asked to confirm that you wish to overwrite it with a new output. You will have to confirm the overwrite or cancel and name a new sheet to launch the enrichment task.

The dialog box in the Add-in will advise you of the progress of the task. Each row will come back as it completes. You can re-sort your data output using the ID column once the job finishes.
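Because rows arrive in completion order, the output may not match the order of your input. In Excel you would normally sort on the ID column directly. For data exported from the sheet, the same re-sort might look like the Python sketch below. The function name is hypothetical, and the column header "ID" is assumed to match the example input sheet.

```python
def sort_rows_by_id(rows: list[dict], id_column: str = "ID") -> list[dict]:
    """Return rows ordered by ID: numeric IDs first (by value), then text IDs (A-Z)."""

    def sort_key(row: dict):
        raw = str(row[id_column]).strip()
        if raw.isdecimal():
            # Compare numeric IDs as numbers so that "2" sorts before "10".
            return (0, int(raw), "")
        return (1, 0, raw)

    return sorted(rows, key=sort_key)


# Made-up rows in the order they happened to complete.
completed = [
    {"ID": "10", "Organization Name": "Gamma"},
    {"ID": "2", "Organization Name": "Beta"},
    {"ID": "1", "Organization Name": "Alpha"},
]
restored = sort_rows_by_id(completed)
```

Sorting numerically matters when IDs are stored as text; a plain text sort would place "10" ahead of "2".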

You will need to click the 'x' in the Please Wait dialog box when the window indicates the task is Done. You can then tap the 'x' in the upper right corner of the Add-In screen to close the Add-In or tap the Diffy icon to close it and view the data in the output sheet.

Once your Enhance operation has completed, you can try clicking a Diffbot URL in the worksheet. A button will show up in the Add-in pane, which you can click to open that Knowledge Graph entity in the browser.

Enriching Your Own Organization Data

The process for enriching your own organization profile data is similar to enriching the Sample dataset provided. Below is an example of a custom flow.

Please note: any data enrichment tasks you run on your own data require you to map your Sheet Name and columns to the Add-In Enhance Input screen manually. They will not populate automatically.

By default the output is written to a sheet named Organization Output. Please note: If you run this process more than once and do not modify the output sheet name, you will be asked to confirm that you wish to overwrite an existing sheet with a new output. You will have to confirm the overwrite or cancel and name a new output sheet to launch the enrichment task.

Add-in inputs for Organizations

To use the Diffbot Excel Add-In for data enrichment, you must include

  • a unique identifier per row in your input worksheet (Required)

plus AT LEAST one of the following:

  • Organization Name, or
  • Website homepageUri (expressed as a site domain and top-level domain only, for example mysite.com or whisper.ai, NOT www.whisper.ai or blog.mysite.com, UNLESS the business is hosted on a single subdomain), or
  • Diffbot Entity ID/URI (e.g. Cgqq17ZsBPZupmJesRVCKIg or http://diffbot.com/entity/Cgqq17ZsBPZupmJesRVCKIg)

You may also choose to constrain your match further by including

  • Location of the headquarters or primary address (CITY, e.g. San Francisco; STATE or Province, e.g. CA)

In general, it is best to start with a unique ID and one attribute (Organization name, Organization website or Org Diffbot Entity ID) to maximize the match rate.
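The input rules above can be checked before you submit a sheet. The Python sketch below is one possible pre-flight check, not part of the Add-in: the function names and the column headers ("ID", "Organization Name", "Organization URL", "Diffbot Entity ID") are assumptions you would adapt to your own sheet. The domain helper only strips the scheme, path, and a leading "www."; it leaves other subdomains alone, because some businesses are legitimately hosted on a single subdomain.

```python
from urllib.parse import urlparse

ENTITY_PREFIX = "http://diffbot.com/entity/"
MATCH_COLUMNS = ("Organization Name", "Organization URL", "Diffbot Entity ID")


def normalize_domain(value: str) -> str:
    """Reduce a URL or host name to a bare host without scheme, path, or 'www.'."""
    value = value.strip().lower()
    if "://" not in value:
        value = "//" + value  # lets urlparse read the text as a network location
    host = urlparse(value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def entity_id(value: str) -> str:
    """Accept either a bare entity ID or a full entity URI and return the bare ID."""
    value = value.strip()
    return value[len(ENTITY_PREFIX):] if value.startswith(ENTITY_PREFIX) else value


def validate_rows(rows: list[dict]) -> list[tuple[int, str]]:
    """Return (row_number, problem) pairs for rows that break the input rules."""
    problems = []
    seen_ids = set()
    for number, row in enumerate(rows, start=1):
        row_id = str(row.get("ID", "")).strip()
        if not row_id:
            problems.append((number, "missing ID"))
        elif row_id in seen_ids:
            problems.append((number, "duplicate ID"))
        else:
            seen_ids.add(row_id)
        # At least one matching attribute must be present.
        if not any(str(row.get(col, "")).strip() for col in MATCH_COLUMNS):
            problems.append((number, "needs a name, URL, or entity ID"))
    return problems
```

For example, `normalize_domain("https://www.whisper.ai/about")` returns `whisper.ai`, and `validate_rows` reports a row that has an ID but no name, URL, or entity ID.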

Add-in outputs for Organizations

The Add-In will return the following organization attribute details by default (when available in the Graph):

  • Organization Name
  • Estimated Number of Employees (based on company filings)
  • Description of Organization
  • Founding Date
  • Social Media Profile URLs (LinkedIn, Crunchbase, Facebook)
  • Websites (main site, blog, Wikipedia page)
  • Location (organization headquarters location)
  • Financial Data (yearly revenues, quarterly revenues, stock symbol, stock exchange, total investments, IPO date)
  • List of Industries associated with that Organization
  • Key People currently linked to/accountable for the organization (CEOs, Founders, Board Members)
  • Links to in-depth Knowledge Graph Entity Pages (for organizations and key people)

Knowledge Graph Organization Search Query Constraints

When you seek companies that match your business criteria, a search of the graph can produce a list of matching profiles. You have the option to constrain your search using:

  • Estimated Number of Employees (based on company filings)
  • Founding Date ranges
  • Location (organization headquarters location) and/or
  • Industry labels

Support for financial data constraints (annual revenues, total investments) is planned for a future version of the Add-In.

Search the Knowledge Graph for Articles

To search for articles in the Diffbot Knowledge Graph from the Add-In, begin by tapping the Search button in the home screen after you've logged in. Then tap 'Create Query' and select 'Article' from the 'Data Type' Menu.

You have the option to specify one or more constraints to filter the results and return articles that best meet your business criteria. As you define each attribute, the number of results dynamically updates to display how many entity matches will be returned given the inputs you've provided so far. Please note: no more than 1000 results will be returned per query submitted regardless of how many entities match your criteria.

Here's an example of a query defined to return articles tagged 'COVID19', published on or after April 15, 2020, that currently reference UCSF:

The number of entities matching this query was 54 when the query was run. Match counts for the same query will vary over time as new articles that match the criteria are added to the Graph. Please note: complete builds of the DKG are released every 3-5 days on average, while article data is added to the Graph continuously, around the clock and year-round.

Click Done when you've finished defining your search query inputs. Then click the Next button to constrain the attributes returned for each matching article entity.

By default, all Article attributes will be populated with values for each matching Article entity returned. You can specify which you do not want returned by unchecking the box next to that attribute in the Output details screen. Please note: there will be one row per Article returned with additional values added as new columns unless you specify that the data be returned as multiple rows.

You can bound the number of values returned and will receive no more than 3 values per attribute by default. For example, see Quotes in the Add-in output configuration screen below:

You can also decide the name of the sheet the output will be written to. By default it will be written to a sheet called Articles Output. After finalizing the search results output format and scope, scroll to the bottom of the Output Details screen and click the Execute button. The results of your search will begin to flow into the sheet called Articles Output or whatever sheet name you specified in the Output details screen.

When the data finishes loading, the screen will still display "Please Wait" but will indicate "Done. Time Elapsed..." in the white section of the status box, as displayed below:

Click the 'x' next to Please Wait to return to the Add-In to take other actions. Or, close the Add-In by tapping the Diffy Icon in the upper right corner of the screen of the Excel worksheet (see above) to view the resulting output in the sheet.

Add-in outputs for Articles

The Add-In will return the following article attribute details by default (when available in the Graph):

  • Website (root publisher or host name and pageUrl)
  • Author (Name and Url)
  • Title
  • Dates (Publication date and Crawl date)
  • Publisher (region and country)
  • Language
  • Tags (Generated from analysis of the extracted text)
  • Quotes (Found in the article text and the person credited with each quote)
  • Links to in-depth Knowledge Graph Entity Pages (for articles, authors and publishers)

Knowledge Graph Article Search Query Constraints

When you seek articles that match your criteria, you have the option to constrain your search using:

  • Tag Labels (Generated from analysis of the extracted text and linked to Organizations, Persons, Locations, or Topics in the graph)
  • Keywords Found in the Article Title
  • Keywords Found in the Text of the Article
  • Language
  • Date or Date Range
  • Website (root publisher or host name)

Support for entityID tag constraints (DiffbotIDs for Organizations or other entities mentioned in an article) is planned for a future version of the Add-In.

Getting Help

If you have any questions or comments about this tool, or just need help, you can click the Intercom logo in the Diffbot Add-in pane to send us a message. You can also reach out to Diffbot Support by email.

Add-in pane with Intercom icon

