January 2020

by Jerome Choo
  • Improved handling of tables and lists in Article data to better support Entity tagging and linking.
  • Optimized entity tagging in the Article Title. It now occurs when the same entity is mentioned in the title and text of the Article.
  • Improved location data extraction in the AnalyzeAPI for events.
  • Added a Diffbot Excel Plug-in so that clients can test-drive Diffbot's data enrichment API. The beta service currently supports enrichment of organization firmographic profile data.

December 2019

by Jerome Choo
  • Launched new Renderer architecture in support of Crawlbot and DiffbotAPI services.
  • Added descriptions to 90+ million Organization entities in the KG.
  • Added 300+ local US news sources to Article data.

November 2019

by Jerome Choo
  • Improved date/time handling.
  • Improved linking of Board Members to Organizations.
  • Added revisit/update frequency signals based on whether or not a profile was accessed in the last 30 days.

October 2019

by Jerome Choo
  • Added longitude and/or latitude data to 53 million Organizations.
  • Added the sicClassification attribute to Organizations.
  • Created a more robust employment category taxonomy and ML model in support of employment data.
  • Improved coverage of the parentCompany attribute for subsidiary Organization entities.
  • Normalized stock exchange labels to improve filtering and discoverability.
  • Deployed bug fixes to developer Dashboards.

September 2019

by Jerome Choo
  • Added support for the inclusion of RocketReach email contact data (in addition to LeadIQ).
  • Added support for extraction of the headquarters address from the headquarters building entity.
  • Began improvements to Record Linking for Organizations with emphasis on improving subsidiary data accuracy.
  • Improved coverage of Org and Person data records with a focus on: 'educated at', 'member of', 'owner of', and 'position held' data fields.
  • Improved Role Classifications by separating CEO and Director.
  • Enhanced and extended the Visual Query Builder Tool in the Developer Dashboard.

August 2019

by Jerome Choo
  • Size is now supported in facet queries for articles.
  • Enabled access to crawls and bulk jobs created on child tokens from the app.diffbot.com UI when logged in under the parent token.
  • Enabled cloning of a crawl from its crawl job page in the app.diffbot.com UI.
  • Made significant improvements to the performance of the app.diffbot.com UI.
  • Added location inference to the Natural Language API.
  • Improved how importance score is generated for spam profiles.
  • Improved deduplication on Organization Founders.
  • Now avoid linking to the same DiffbotURI in fields that must refer to distinct entities. For example, parent and subsidiary entities cannot link to the same unique identifier: Google and Alphabet must have unique IDs.
  • Removed bad descriptions from the allDescriptions field.
  • Improved age calculation/inference logic.
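
The distinct-identifier rule for parent and subsidiary links can be illustrated with a small check. This is only a sketch under stated assumptions, not Diffbot's implementation: the record layout, the `diffbotUri` key, the `subsidiaries` field name, and the example URIs are all hypothetical (only `parentCompany` appears in these notes).

```python
def self_linked_fields(entity, link_fields=("parentCompany", "subsidiaries")):
    """Return the names of link fields that point back at the entity itself.

    Assumes (hypothetically) that each entity is a dict with a "diffbotUri"
    key and that link fields hold either one linked dict or a list of them.
    """
    own_uri = entity["diffbotUri"]
    offending = []
    for field in link_fields:
        value = entity.get(field)
        if value is None:
            continue
        linked = value if isinstance(value, list) else [value]
        if any(item.get("diffbotUri") == own_uri for item in linked):
            offending.append(field)
    return offending


# Hypothetical records: a parent link that wrongly reuses the entity's own URI.
bad_record = {
    "diffbotUri": "https://example.org/entity/A",
    "parentCompany": {"diffbotUri": "https://example.org/entity/A"},
}
good_record = {
    "diffbotUri": "https://example.org/entity/A",
    "parentCompany": {"diffbotUri": "https://example.org/entity/B"},
}
```

Here `self_linked_fields(bad_record)` flags `parentCompany`, while `good_record` yields an empty list.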

July 2019

by Jerome Choo
  • Added support for multiple Headquarters locations for Organizations.
  • Added support for multiple stock exchange/symbol pairs.
  • Improved extraction of city from neighborhoods.
  • Added support for display of English tags for non-English taggers.
  • Trained a Dutch entity linker.
  • Improved RawDataSentinels supporting Organization data ingest, including subsidiary data.
  • Improved sub-record linking between Organizations and Founders.
  • Now force extraction of the headquarters address from the HQ building entity.
  • Now ensure countries are always classified as administrative areas.
  • Populated the missing address in location for 81 million Organizations.
  • Improved the error message returned for mismatched quotes in DQL queries.
  • Ensured users have the ability to stop or pause a crawl between crawl rounds from the Dashboard.
  • Forced the persistence of the assignment of a customAPI to a crawl job.
  • Set the article title in the field.
  • Now rank person images for Person profiles.
  • In the DKG, faceting on the parent key for enums now expands to .normalizedValue.
  • Now cache Person and Organization images, including logos.

June 2019

by Jerome Choo
  • Committed to delivering 100% accuracy for the core facts of 'Fortune 1000' Company entity profiles in the Diffbot KnowledgeGraph (DKG): name, headquarters location, website, CEO, founders, logo, isPublic, parent organization, year founded, stock ticker symbol and exchanges, Twitter handle, and size attributes (employee count and annual top-line revenue).
  • Enhanced isPublic field population in the DKG.
  • Enhanced stock ticker symbol extraction in the DKG.
  • Fixed rules for assigning min and max employees to an Organization in the DKG.
  • Enriched 3 million Organizations that had no revenue data in the DKG.
  • Improved selection of location for Organization.location in the DKG.
  • Improved evaluation of postal codes when an address has no street address in the DKG.
  • Enhanced age calculation/inference in the DKG.
  • Improved Candidate selection for email address and phone number in the DKG.
  • Added support for > and < for date/time fields in DQL.
  • Querying on a DiffbotURI is now strict by default in DQL.
  • Added support for type:Post (discussions) to DQL.
  • Added contextually embedded links to docs from the Crawlbot UI.
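
As an illustration of the new > and < date comparisons together with type:Post, the sketch below assembles a query string. It is a minimal example under assumptions: the helper name, the `date` field name, and the quoted ISO date format are hypothetical choices, so check the DQL documentation for the exact syntax.

```python
from datetime import date


def dql_date_range(entity_type: str, field: str, start: date, end: date) -> str:
    """Build a DQL-style query bounding `field` strictly between start and end.

    The field name and the quoted ISO-8601 date literals are assumptions made
    for illustration; they are not confirmed by these release notes.
    """
    return (
        f'type:{entity_type} '
        f'{field}>"{start.isoformat()}" '
        f'{field}<"{end.isoformat()}"'
    )


# Discussion posts dated strictly inside the first half of 2019.
query = dql_date_range("Post", "date", date(2019, 1, 1), date(2019, 6, 30))
print(query)
```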

May 2019

by Jerome Choo
  • We addressed missing revenues for over 80 million company entities in the Diffbot KnowledgeGraph (DKG).
  • Improved DKG entity postal code assignments.
  • Improved DKG entity Stock Exchange assignments.
  • We removed cookie disclaimer text from DKG entity descriptions.
  • We improved Organization entity classification in the DKG.
  • We added the ability to facet on Organization name tokens in DQL.
  • We expanded currency support in the Diffbot extraction APIs to include all currencies used in Europe, in addition to the euro used in the European Union.
  • We improved DQL error messages.
  • We lifted the limit on facet pagination.
  • Organization size attributes are now supported in facets.
  • We normalized Organization entity importance in the DKG to score between 1 and 100.
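
One common way to bring scores into a fixed 1 to 100 range is min-max rescaling. The sketch below shows that technique only as an illustration; these notes do not say how Diffbot actually normalizes importance, and the function name and sample inputs are hypothetical.

```python
def rescale_importance(raw_scores, low=1.0, high=100.0):
    """Linearly rescale raw scores so the minimum maps to `low` and the maximum to `high`."""
    smallest, largest = min(raw_scores), max(raw_scores)
    if largest == smallest:
        # All inputs are identical, so there is no spread to preserve;
        # place every score at the top of the range.
        return [high for _ in raw_scores]
    span = largest - smallest
    return [low + (score - smallest) * (high - low) / span for score in raw_scores]


# Hypothetical raw scores on a 0-to-1 scale.
scaled = rescale_importance([0.0, 0.5, 1.0])
print(scaled)  # [1.0, 50.5, 100.0]
```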

April 2019

by Jerome Choo
  • Improved Organization Data Quality (i.e. sub-record linking of CEOs and Founders) in the Diffbot KnowledgeGraph (DKG).
  • Added dedicated process to parse subsidiary entities in the DKG.
  • Added support for multiple Person/Organization descriptions in the DKG.
  • Fixed date/timestamp conversion bugs in DQL.
  • Optimized revenue.value and revenue.currency extractions for Organization profile data in the DKG.
  • Added support for pagination of facets in DQL.
  • Added support for querying by tags for type:Image in DQL.
  • Added facet count to the Diffbot KnowledgeGraph Search API response.