2016-01-14

by Jerome Choo
  • Improved specification extraction in the Product API.
  • Fixed an issue where the estimatedDate field (Article API) was sometimes computed incorrectly.

2016-01-07

by Jerome Choo
  • Fixed an issue where the <base> element could be incorrectly used to calculate relative paths.
  • Added initial functionality to categorize articles in the Article API based on article text content. If you would like to test this beta feature, contact us.
  • Improved handling of media sources without a specified protocol (e.g. src="//www.youtube.com...). Media element URLs will now match the protocol of the analyzed page.
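
To illustrate the URL handling described in this release, here is a minimal sketch of how a protocol-relative media source and a <base> element can be resolved against the analyzed page using Python's standard library. The function name resolve_media_url and the example URLs are hypothetical; this is not Diffbot's implementation, only an illustration of the standard resolution rules.

```python
from urllib.parse import urljoin


def resolve_media_url(page_url, src, base_href=None):
    """Resolve a media src against the analyzed page (illustrative helper).

    page_url  -- URL of the page being analyzed
    src       -- raw src attribute, possibly relative or protocol-relative
    base_href -- href of the page's <base> element, if one is present
    """
    # A <base> href may itself be relative, so resolve it against the page first.
    effective_base = urljoin(page_url, base_href) if base_href else page_url
    # urljoin gives a protocol-relative src ("//host/path") the scheme of the base.
    return urljoin(effective_base, src)
```

With an https page, a source of "//www.youtube.com/embed/abc123" resolves to an https URL; with an http page, the same source resolves to an http URL.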

2015-12-21

by Jerome Choo
  • Crawlbot and Bulk jobs pending delete (per your Diffbot plan) are now identified in the Crawlbot and Bulk interfaces.
  • The API Toolkit now uses Diffbot's custom rendering engine for live web page previews. This should reduce inaccuracies when creating custom rules.

2015-12-18

by Jerome Choo
  • Fixed an issue where plain text POSTed to the Article API would not undergo text analysis (tags, sentiment, language detection).
  • Improved Crawlbot behavior on Ajax-heavy sites so that pages with the exact same HTML source are no longer deduplicated.
  • Fixed an issue within the Crawlbot and Bulk interfaces where the "Last 500" URL Report was incorrectly returning the first 500.
  • Improved author detection within the Article API.

2015-12-07

by Jerome Choo

The Analyze API now supports POSTed content.
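
As a rough sketch of what POSTing content might look like, the snippet below builds (but does not send) an HTTP request carrying page markup in the body. The endpoint URL, the token and url query parameters, and the text/html content type are assumptions made for illustration and should be checked against the current API documentation; build_analyze_post is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint; verify against the API documentation before use.
ANALYZE_ENDPOINT = "https://api.diffbot.com/v3/analyze"


def build_analyze_post(token, page_url, html):
    """Build a POST request that sends markup directly (hypothetical helper)."""
    # Assumed query parameters: the account token and the URL the content belongs to.
    query = urlencode({"token": token, "url": page_url})
    return Request(
        f"{ANALYZE_ENDPOINT}?{query}",
        data=html.encode("utf-8"),              # markup goes in the request body
        headers={"Content-Type": "text/html"},  # assumed content type for HTML
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen to actually send it; keeping construction separate from sending makes the request easy to inspect first.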

2015-11-27

by Jerome Choo

The Account API now returns a list of child or sub-tokens.

2015-11-19

by Jerome Choo
  • Fixed an issue in the Analyze API where products with an API-Toolkit-overridden price field would not reflect changes in the "details" field (offerPriceDetails, regularPriceDetails, etc.).
  • Fixed an Article API issue for certain top-level domains where articles dated in the near future (e.g., tomorrow) would incorrectly be returned with a date from the prior year.

2015-11-11

by Jerome Choo
  • Crawlbot will now successfully spider URLs that contain (invalid) UTF-8 characters.
  • Global Index API: search-by-tag can optionally be performed using a tag-match shorthand.

2015-10-16

by Jerome Choo
  • Fixed an issue where Crawlbot and Bulk API data downloads did not include a filename.
  • The breadcrumb element is now a default field in the Article API.

2015-10-22

by Jerome Choo
  • We now offer an Account API for tracking token API usage and billing history.
  • Global Index API: negative search queries (diffbot AND -"machine learning") are now functioning as documented.