July 1st, 2019
Added support for multiple Headquarters locations for Organizations.
Added support for multiple stock exchange/symbol pairs.
Improved extraction of city from neighborhoods.
Added support for display of English tags for Non-English taggers.
Trained a Dutch entity linker.
Improved RawDataSentinels supporting Organization data ingest, including subsidiary data.
Improved sub-record linking between Organizations and Founders.
Now force extraction of the Headquarters address from the HQ building entity.
Now ensure countries are always classified as administrative areas.
Populated missing address in location for 81 million organizations.
Improved the error message returned for mismatched quotes in DQL queries.
Ensured users have the ability to stop or pause a crawl between crawl rounds from the Dashboard.
Forced the persistence of the assignment of a customAPI to a crawl job.
Set the article title in the field.
Now rank person images for Person profiles.
In DKG: faceting on the parent key for enums now expands to .normalizedValue
Now cache Person and Organization images, including logos.
June 1st, 2019
Committed to delivering 100% accuracy of 'Fortune 1000' Company entity profile core facts (name, headquarters location, website, CEO, founders, logo, isPublic, parent organization, year founded, stock ticker symbol and exchanges, twitter handle, size attributes - employee count & annual top-line revenues) in the Diffbot KnowledgeGraph (DKG).
Enhanced isPublic field population in the DKG.
Enhanced stock ticker symbol extraction in the DKG.
Fixed rules for assigning min and max employees to an Organization in the DKG.
Enriched 3 million organizations with no revenue data in the DKG.
Improved selection of location for Organization.location in the DKG.
Improved evaluation of postal codes when an address has no street address in the DKG.
Enhanced age calculation/inference in the DKG.
Improved candidate selection for email address and phone number in the DKG.
Added support for > and < for date/time fields in DQL.
Querying on a DiffbotURI is now strict by default in DQL.
Added support for type:Post (discussions) to DQL.
Added contextually embedded links to docs from the Crawlbot UI.
May 1st, 2019
We addressed missing revenues for over 80 million company entities in the Diffbot KnowledgeGraph (DKG).
Improved DKG entity postal code assignments.
Improved DKG entity Stock Exchange assignments.
We removed cookie disclaimer text from DKG entity descriptions.
We improved Organization entity classification in the DKG.
We added the ability to facet on Organization name tokens in DQL.
We expanded currency support in the Diffbot extraction APIs to include all currencies in Europe, not only the Euro used in the European Union.
We improved DQL error messages.
We lifted the limit on facet pagination.
Organization size attributes are now supported in facets.
We normalized Organization entity importance in the DKG to score between 1 and 100.
April 1st, 2019
Improved Organization Data Quality (i.e. sub-record linking of CEOs and Founders) in the Diffbot KnowledgeGraph (DKG).
Added dedicated process to parse subsidiary entities in the DKG.
Added support for multiple Person/Organization descriptions in the DKG.
Fixed date/timestamp conversion bugs in DQL.
Optimized revenue.value and revenue.currency extractions for Organization profile data in the DKG.
Added support for pagination of facets in DQL.
Added support for querying by tags for type:Image in DQL.
Added facet count to the Diffbot KnowledgeGraph Search API response.
February 1st, 2019
Extended Diffbot KnowledgeGraph coverage to Entities located or residing in Asia.
Added support for the strict operator to DQL.
December 1st, 2018
Improved date/time extraction and timezone support in Diffbot extraction APIs.
Added support for the 'has:' operator to DQL for Articles and Products.
October 1st, 2018
Added DQL support for type:Product has:breadcrumb.name
Added support for computation of total investment when individual investments have different currencies (Organization Profile).
Added support for svg image file type for Entity images.
Added indexing of Entity description fields.
Improved tokenization for Chinese/Japanese tagging.
Added hit count for facets.
August 1st, 2018
Launched the Diffbot Knowledge Graph including a new developer Dashboard, embedded ontology documentation, and an OpenAPI spec.
February 27th, 2018
URL Report downloads are now sorted in newest-first order.
Crawlbot now indexes the seed URL of each extracted object in the fromSeedUrl field.
January 5th, 2018
Crawlbot API: Added the useCanonical argument to allow disabling of canonical URL deduplication on specific crawls.
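
As an illustration of the useCanonical entry above, here is a minimal sketch of how a crawl-creation request might be assembled with canonical URL deduplication turned off. The endpoint URL, the other parameter names (token, name, seeds, apiUrl), the space-separated seed list, and the value 0 for "disabled" are assumptions not stated in these notes, and build_crawl_request is a hypothetical helper; check the Crawlbot API documentation before relying on any of them. The sketch only builds the request and does not send it.

```python
from urllib.parse import urlencode

# Assumed Crawlbot endpoint; not given in this changelog.
CRAWL_ENDPOINT = "https://api.diffbot.com/v3/crawl"


def build_crawl_request(token, name, seeds, api_url, use_canonical=True):
    """Return (url, form_body) for creating a crawl job.

    Hypothetical helper: parameter names other than useCanonical
    are assumptions for illustration.
    """
    params = {
        "token": token,
        "name": name,
        # Assumption: multiple seed URLs are separated by whitespace.
        "seeds": " ".join(seeds),
        "apiUrl": api_url,
    }
    if not use_canonical:
        # Assumption: 0 disables canonical URL deduplication for this crawl.
        params["useCanonical"] = 0
    return CRAWL_ENDPOINT, urlencode(params)


url, body = build_crawl_request(
    token="YOUR_TOKEN",
    name="example-crawl",
    seeds=["https://example.com"],
    api_url="https://api.diffbot.com/v3/analyze",
    use_canonical=False,
)
print(url)
print(body)
```

The argument is only added when deduplication is being disabled, so crawls that omit it keep the default behavior.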