This guide explains how to download and use the Diffbot Excel Add-in, which lets you use the Diffbot Knowledge Graph data enrichment and search APIs from Excel on your desktop. You can also use the Add-in from an Office365 spreadsheet.
The Diffbot Excel Add-in is offered as a “production” beta service while we collect feedback on the features and functionality. This version of the Add-In provides support for organization profile search and data enrichment.
Please note: person profile search and data enrichment are not yet supported from within the Add-In, but they are planned for a future release.
Version 184.108.40.206 of the Add-in integrates with the Diffbot Enhance API to enrich organization firmographic profiles, and with the Diffbot Knowledge Graph Search API to discover companies matching size, industry, founding date, and/or financial criteria.
How much data are we talking about?
Enrich or search organizational data with access to over 180M profiles from Diffbot’s Knowledge Graph. Start with a list of entities or a set of criteria. End with a deep dive into organizational data. Pull data on one or a hundred organizations with the click of a button.
How does it work?
Diffbot’s Knowledge Graph is compiled by machine learning-enabled web scrapers that turn unstructured data from around the web into a structured, queryable database.
The Knowledge Graph contains over 20 billion entities (organizations, locations, articles, key people, brands, and more), and over 2 trillion facts (revenue, price, skills, and more). All entities are contextually linked and sourced from public-facing documents around the web. A new Knowledge Graph is compiled every 3-5 days, ensuring data freshness and accuracy.
You can use Diffbot’s Excel Add-In to search or enrich organizational entries without needing to leave your Excel workbook.
Define a search query using the Add-in query builder UI within an Excel spreadsheet on your desktop or in Office365. Or, provide one or more columns containing organization names or website domains and an additional column containing a unique identifier per row. Point Diffbot’s Excel Add-In to the location of your names and identifiers and choose which data fields you want returned. Upon execution, Diffbot’s Excel Add-In populates a new sheet that pairs the organizations and unique identifiers you provided with all available data attributes, far more quickly and comprehensively than the data could be compiled by hand.
You will find the Diffbot Excel Add-in in the Microsoft store here: https://appsource.microsoft.com/en-us/product/office/WA200001213.
Step-by-Step Set-up Guide
You will start on the Diffbot Add-In page. To continue, click the GET IT NOW button in the left-hand column.
Accept the Microsoft store terms to continue to the download page.
The Diffbot Add-in works in Excel 2016 for Mac, Excel 2016 or later, and Excel Online. Click the green button to Open in Excel on your desktop or, if you do not meet the criteria listed above, try using Office Online.
Microsoft will ask you to confirm that you want to open the Excel app on your device and download the Diffbot Add-in. To confirm, click the button that reads Open the Microsoft Excel App. You may be asked to log in to your Microsoft account to confirm the download.
Open any spreadsheet from the Excel app home screen and click ‘Insert’ in the spreadsheet menu. You will see options that include ‘Get Add-ins’ and ‘My Add-ins’.
First click ‘My Add-ins’. If you do not see the Diffbot Add-in listed, select recent add-ins from the ‘My Add-ins’ drop-down menu. You should see the Diffbot Add-in in the window that is displayed. Select it and click the Add button at the bottom of the window.
If successful, you will be taken to the spreadsheet Home where you can tap the Diffbot icon on the far right to begin.
If you do not see the Diffbot Add-in as an option under ‘My Add-ins’, select ‘Get Add-ins’. You will be taken to the Microsoft Add-in Marketplace, where you can search for Diffbot using the search box in the upper left corner.
Click the Add button next to the Diffbot Add-in.
Then confirm that you agree to the Terms and Conditions.
You will be taken to your desktop Excel sheet and will see a confirmation that the Add-in was added successfully. The Diffbot robot icon, Diffy, will appear in the right corner of the spreadsheet Home menu.
If your desktop does not support Excel 2016 for Mac or Excel 2016 or later, try using Office Online. You will follow a similar workflow but from within Office365 instead of from your desktop Excel app.
Getting Started Guide
Opening the Add-in
To start using the Add-in, we recommend that you first open a blank worksheet and confirm that the Add-in is available in the right-hand corner of the sheet’s Home menu. If it is not, follow the steps listed above to insert the Add-in. Once it is available, tap the Diffbot icon to begin.
You will see an option to Sign-in or Register for a Diffbot Trial Account. Tap the Sign-in/Register link to continue.
If you have an account, enter your Diffbot token and tap the blue ‘OK’ button.
If you are now logged in, skip the next section of this document, Register for a Trial Account, and continue with the section that follows it, Search the Knowledge Graph. If you do not have a Diffbot account, tap the word ‘here’ on the Sign-in/Register screen of the Add-In to register for an account.
Register for a Trial Account
To register for a trial account, fill out the form presented, complete the captcha, and click submit.
A confirmation screen will appear if your information was submitted successfully.
Once you see the confirmation screen, check the email account you provided in the registration form for a Welcome Email from Diffbot. It will look like this:
You’ll need the KG Trial Developer token to sign in to the Add-In. The token will appear in the Welcome Email like this:
"To get started, you'll just need your kgtrial developer token --> 908a2961e751610b34a0f8b564040575"
Once you receive your developer token, close the confirmation screen, tap the Sign-in/Register link, enter your token, and click OK.
You will now see the Home screen of the Diffbot Add-in.
Search the Knowledge Graph
To search for organizations in the Diffbot Knowledge Graph from the Add-In, begin by tapping the Search button in the home screen after you’ve logged in.
You have the option to specify one or more constraints to filter the results and return organizations that best meet your business criteria. As you define each attribute, the number of results dynamically updates to display how many entity matches will be returned given the inputs you’ve provided so far. Please note: no more than 1000 results will be returned per query submitted regardless of how many entities match your criteria.
Here’s an example of a query defined to return software companies in the United States, founded after Jan 2019 that currently report no more than 500 employees:
The number of entities matching this query was 441 when the query was run. Match counts for the same query will vary with each build of the Diffbot Knowledge Graph (DKG). Please note: new builds of the DKG are released every 3-5 days on average. Click Done when you’ve finished defining your search query inputs. Then click the Next button to constrain the attributes returned for each matching organization entity.
By default, all Organization attributes will be populated with values for each matching Organization entity returned. You can specify which you do not want returned by unchecking the box next to that attribute in the Output details screen. Please note: there will be one row per Organization returned with additional values added as new columns unless you specify that the data be returned as multiple rows.
You can bound the number of values returned and will receive no more than 3 values per attribute by default. For example, see subsidiaries in the Add-in output configuration screen below:
You can also decide the name of the sheet the output will be written to. By default it will be written to a sheet called Organizations Output. After finalizing the search results output format and scope, scroll to the bottom of the Output Details screen and click the Execute button. The results of your search will begin to flow into the sheet called Organizations Output or whatever sheet name you specified in the Output details screen.
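The output layout described above (one row per organization by default, extra values spread across new columns, and at most 3 values per attribute) can be sketched in a few lines of Python. This is only an illustration of the layout, not the Add-In's own code: the function names, the record shape, and the numbered column labels such as "subsidiaries 1" are all assumptions made for the example.

```python
from typing import Any

MAX_VALUES = 3  # default cap per attribute described in this guide


def to_single_row(org: dict[str, Any], max_values: int = MAX_VALUES) -> dict[str, Any]:
    """One row per organization: extra values become numbered columns.

    The "<attribute> <n>" column naming is hypothetical.
    """
    row: dict[str, Any] = {}
    for attr, value in org.items():
        if isinstance(value, list):
            # Keep only the first max_values entries of a multi-valued attribute.
            for i, item in enumerate(value[:max_values], start=1):
                row[f"{attr} {i}"] = item
        else:
            row[attr] = value
    return row


def to_multiple_rows(org: dict[str, Any], list_attr: str,
                     max_values: int = MAX_VALUES) -> list[dict[str, Any]]:
    """Multiple rows: one row per value of list_attr, single-valued fields repeated."""
    scalars = {k: v for k, v in org.items() if not isinstance(v, list)}
    values = org.get(list_attr, [])[:max_values]
    # Fall back to a single row when the attribute has no values.
    return [{**scalars, list_attr: v} for v in values] or [scalars]


# Made-up record for illustration only.
example = {"ID": "42", "name": "Example Corp", "subsidiaries": ["A", "B", "C", "D"]}
print(to_single_row(example))
print(to_multiple_rows(example, "subsidiaries"))
```

In both layouts the fourth subsidiary is dropped because of the cap; raising the bound in the Output Details screen corresponds to passing a larger `max_values` here.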
When the data finishes loading, the screen will still display ‘Please Wait’ but will indicate Done. Time Elapsed... in the white section of the status box, as displayed below:
Click the ‘x’ next to Please Wait to return to the Add-In to take other actions. Or, close the Add-In by tapping the Diffy Icon in the upper right corner of the screen of the Excel worksheet (see above) to view the resulting output in the sheet.
Diffbot supplies sample data to help users get familiar with the Add-in’s data enrichment features and functions. The sample data is a past snapshot of the Fortune 1000 company list. We recommend including all attributes in your output in order to familiarize yourself with the abbreviated Diffbot Organization profile. For a more complete data profile, use the Diffbot URL provided to view the Diffbot Knowledge Graph Organization entity in the Diffbot Developer Dashboard. From there you can download a complete JSON profile record that contains a full set of known attributes for that Organization.
Enhance a Profile with Sample Data
To enrich organization profiles, click the Enhance button on the Home screen of the Add-In, or tap Menu and select Enhance.
We recommend that you experiment with organization profile enrichment inputs and outputs before enriching your own data. To access Sample Data, scroll to the bottom of the Add-in Input screen and click the purple button labeled Create Sample Input Data.
You will see sample inputs flow into a sheet named Input sheet - example in three columns: ID, Organization Name, Organization URL (site root domain and top-level domain, only, e.g. mysite.com). Please note: the example input sheet is locked and is only intended for demonstration purposes. You will not be able to edit the inputs.
You’ll note that the input columns in the example sheet are mapped to columns A-C in the Add-in Input screen, and that we’ve indicated the data has a header row. (The Location attribute is not populated in the example sheet, nor is it mapped to the Add-In Input screen. Diffbot Entity ID is treated as Location in this example.) The sheet name used to submit data for enrichment is also identified in the Add-In Input screen. This information has all been automatically populated as part of the example data load. Click the Next button to customize the output returned. Please note: any data enrichment tasks you run on your own data require you to map your worksheet name and columns to the Add-In Enhance Input screen manually. They will not populate automatically.
By default, all Organization attributes listed in the Output Details screen will be returned for each matched entity (those that are populated with data in the Diffbot Knowledge Graph). You can specify which attributes you do not want returned by unchecking the box next to that attribute. Please note: there will be one row per Organization returned with additional values added as new columns unless you specify that the data be returned as multiple rows. You can bound the values you want returned and will receive no more than 3 values per attribute by default. For example, see subsidiaries in the Add-in output configuration screens below:
After configuring your desired output, scroll to the bottom of the Output details screen in the Add-In and click the Execute Sample Data button to view the Sample output. Please note: the request will not count against your account budget.
The output of the sample data execution flows into a new sheet called Organization Outputs - example. Please note: if you run this process more than once and try to use an existing sheet, you will be asked to confirm that you wish to overwrite it with the new output. You will have to either confirm the overwrite or cancel and name a new sheet to launch the enrichment task.
The dialog box in the Add-in will advise you of the progress of the task. Each row will come back as it completes. You can re-sort your data output using the ID column once the job finishes.
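Because rows come back as they complete, the output order may differ from the input order. If you export the output sheet and want to restore the order outside Excel, the re-sort on the ID column can be sketched as below. The rows and the "ID" key are made up for the example, and treating all-digit IDs as numbers (so that 2 sorts before 10) is an assumption about how you chose your identifiers.

```python
def id_sort_key(row: dict) -> tuple:
    """Sort all-digit IDs numerically, then any other IDs alphabetically."""
    raw = str(row["ID"]).strip()
    if raw.isdecimal():
        return (0, int(raw), "")
    return (1, 0, raw)


# Made-up output rows, in the order they happened to complete.
rows = [
    {"ID": "10", "Organization Name": "Example D"},
    {"ID": "2", "Organization Name": "Example B"},
    {"ID": "A-7", "Organization Name": "Example E"},
    {"ID": "1", "Organization Name": "Example A"},
]

rows_sorted = sorted(rows, key=id_sort_key)
print([row["ID"] for row in rows_sorted])
```

A plain text sort would place "10" before "2", which is why the key converts numeric IDs before comparing them.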
You will need to click the ‘x’ in the Please Wait dialog box when the window indicates the task is Done. You can then tap the ‘x’ in the upper right corner of the Add-In screen to close the Add-In, or tap the Diffy icon to close it and view the data in the output sheet.
Enriching Your Own Data
The process for enriching your own organization profile data is similar to enriching the Sample dataset provided. Below is an example of a custom flow.
Please note: any data enrichment tasks you run on your own data require you to map your Sheet Name and columns to the Add-In Enhance Input screen manually. They will not populate automatically.
By default, the output is written to a sheet named Organization Output. Please note: if you run this process more than once and do not modify the output sheet name, you will be asked to confirm that you wish to overwrite the existing sheet with the new output. You will have to either confirm the overwrite or cancel and name a new output sheet to launch the enrichment task.
To use the Diffbot Excel Add-In for data enrichment, you must include:
- a unique identifier per row in your input worksheet (Required)
plus AT LEAST one of the following:
- Organization Name, or
- Website homepageUri (expressed as a site domain and top-level domain only, for example mysite.com or whisper.ai, NOT www.whisper.ai or blog.mysite.com, UNLESS the business is hosted on a single subdomain), or
- Diffbot Entity ID/URI (e.g. Cgqq17ZsBPZupmJesRVCKIg or http://diffbot.com/entity/Cgqq17ZsBPZupmJesRVCKIg)
You may also choose to constrain your match further by including:
- Location (city, e.g. San Francisco, and state or province, e.g. CA, of the headquarters or primary address)
In general, it is best to start with a unique ID and one attribute (Organization name, Organization website or Org Diffbot Entity ID) to maximize the match rate.
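Before pointing the Add-In at your own sheet, it can help to check the input against the rules above. The sketch below is a hypothetical pre-check, not part of the Add-In: the column names are assumptions based on the example sheet described earlier, and the helper names are made up. It strips the scheme, path, and a leading "www." from a URL, and flags rows that lack a unique ID or have none of the three matching attributes. Other subdomains such as blog.mysite.com are left untouched, since only you can tell whether a business is genuinely hosted on a subdomain.

```python
from urllib.parse import urlparse

# Assumed column names; adjust them to match your own sheet.
ID_KEY = "ID"
MATCH_KEYS = ("Organization Name", "Organization URL", "Diffbot Entity ID")


def normalize_domain(url: str) -> str:
    """Reduce a URL to its host without scheme, path, port, or leading 'www.'."""
    url = url.strip()
    if not url:
        return ""
    # urlparse only recognises the host when a scheme or '//' prefix is present.
    parsed = urlparse(url if "://" in url else "//" + url)
    host = (parsed.hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def check_rows(rows: list[dict]) -> list[tuple[int, str]]:
    """Return (row_number, problem) pairs for rows that break the input rules."""
    problems = []
    seen_ids = set()
    for number, row in enumerate(rows, start=1):
        row_id = str(row.get(ID_KEY, "")).strip()
        if not row_id:
            problems.append((number, "missing ID"))
        elif row_id in seen_ids:
            problems.append((number, "duplicate ID"))
        seen_ids.add(row_id)
        if not any(str(row.get(key, "")).strip() for key in MATCH_KEYS):
            problems.append((number, "needs a name, URL, or Diffbot entity ID"))
    return problems


print(normalize_domain("https://www.whisper.ai/about"))
print(check_rows([
    {"ID": "1", "Organization Name": "Example Corp"},
    {"ID": "1", "Organization URL": "mysite.com"},
    {"ID": "", "Organization Name": ""},
]))
```

Rows that come back with no problems still may not match an entity in the Knowledge Graph; the check only covers the input requirements listed above.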
The Add-In will return the following organization attribute details by default (when available in the Graph):
- Organization Name
- Estimated Number of Employees (based on company filings)
- Description of Organization
- Founding Date
- Social Media Profile URLs (LinkedIn, Crunchbase, Facebook)
- Websites (main site, blog, Wikipedia page)
- Location (organization headquarters location)
- Financial Data (yearly revenues, quarterly revenues, stock symbol, stock exchange, total investments, IPO date)
- List of Industries associated with that Organization
- Key People currently linked to/accountable for the organization (CEOs, Founders, Board Members)
- Links to in-depth Knowledge Graph Entity Pages (for organizations and key people)
Knowledge Graph Search Query Constraints
When you seek companies that match your business criteria, a search of the graph can produce a list of matching profiles. You have the option to constrain your search using:
- Estimated Number of Employees (based on company filings)
- Founding Date ranges
- Location (organization headquarters location) and/or
- Industry labels
Support for financial data constraints (annual revenues, total investments) is planned for a future version of the Add-In.
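To make the constraint types listed above concrete, here is a small, self-contained Python sketch that filters a made-up list of organizations the way the earlier example query does (software companies in the United States, founded after Jan 2019, with no more than 500 employees). It does not call any Diffbot API. The `Org` record, its field names, and the `search` helper are hypothetical, "after Jan 2019" is read as after January 31, 2019, and combining the constraints with AND is an assumption about how the query builder behaves. The cap of 1000 results per query comes from this guide.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

MAX_RESULTS = 1000  # per-query cap stated in this guide


@dataclass
class Org:
    """Hypothetical local record; not the Knowledge Graph schema."""
    name: str
    employees: int
    founded: date
    country: str
    industries: tuple[str, ...]


def matches(org: Org, *, max_employees: Optional[int] = None,
            founded_after: Optional[date] = None,
            country: Optional[str] = None,
            industry: Optional[str] = None) -> bool:
    """True when the organization satisfies every constraint that was supplied."""
    if max_employees is not None and org.employees > max_employees:
        return False
    if founded_after is not None and org.founded <= founded_after:
        return False
    if country is not None and org.country != country:
        return False
    if industry is not None and industry not in org.industries:
        return False
    return True


def search(orgs: list[Org], **constraints) -> list[Org]:
    """Apply all constraints, then truncate to the result cap."""
    return [org for org in orgs if matches(org, **constraints)][:MAX_RESULTS]


# Made-up data for illustration only.
orgs = [
    Org("Example Soft", 120, date(2020, 5, 1), "United States", ("Software",)),
    Org("Old Soft", 300, date(2015, 3, 1), "United States", ("Software",)),
    Org("Big Soft", 9000, date(2021, 1, 1), "United States", ("Software",)),
    Org("Example Foods", 50, date(2022, 2, 2), "United States", ("Food",)),
]

hits = search(orgs, max_employees=500, founded_after=date(2019, 1, 31),
              country="United States", industry="Software")
print([org.name for org in hits])
```

Each added constraint can only narrow the result set, which matches the way the match count in the query builder updates as you define each attribute.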