2017-08-31

by Jerome Choo
  • Fixed an issue in the Video API where the url value would retain HTML escaping if present within the original page source.
  • Fixed a rare crawling issue that occasionally resulted in "Bad IP" status messages for individual pages.
  • Fixed an issue where empty <video> elements could be returned in the Article API.
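
To illustrate the kind of escaping the Video API fix above refers to: a URL copied verbatim from HTML source can still carry character references such as &amp;. The sketch below is not Diffbot's implementation. It only shows, with Python's standard html module, how such a value can be decoded. The function name and the sample URL are hypothetical.

```python
import html


def unescape_url(raw_url: str) -> str:
    """Decode HTML character references (e.g. &amp;) left in a URL taken from page source."""
    return html.unescape(raw_url)


# Hypothetical value as it might appear inside an HTML attribute.
escaped = "https://example.com/watch?v=abc123&amp;t=10"
print(unescape_url(escaped))  # https://example.com/watch?v=abc123&t=10
```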

2017-08-15

by Jerome Choo

  • Fixed an issue in the Global Index where complex Boolean (OR) queries would return no results.

2017-08-08

by Jerome Choo
  • Improved date normalization to include Hijri and Jalali dates.
  • Fixed support for Unicode characters in API Toolkit rules.

2017-05-22

by Jerome Choo
  • Many improvements to brand detection in the Product API.
  • Resolved an issue where humanLanguage could be misidentified on some Spanish-language pages.

2017-05-15

by Jerome Choo
  • Crawlbot: resolved an issue where IP-address-only webhooks would not receive notifications.
  • Crawlbot: improved link spidering/harvesting resilience to markup errors and other invalid HTML source.
  • Fixed an issue where custom APIs would not display in the Crawlbot and Bulk Processing dashboards.

2017-05-11

by Jerome Choo
  • Improved link detection when returning page links via the fields=links argument.
  • Improved support for and handling of the srcset (and sizes) image attributes in all APIs.
  • Added detection of Afrikaans (af) in the humanLanguage field.
  • Improved duplicate detection in the Diffbot Global Index.
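
As a rough illustration of the fields=links argument mentioned above, the sketch below builds a request URL with Python's standard library. The endpoint path, the token placeholder, and the helper name are assumptions made for the example rather than details taken from this changelog, so check the API documentation for the exact form.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration only; confirm against the API documentation.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_links_request(token: str, page_url: str) -> str:
    """Return a request URL that asks for the optional links field."""
    # urlencode percent-encodes the target page URL so it is safe inside the query string.
    query = urlencode({"token": token, "url": page_url, "fields": "links"})
    return f"{ARTICLE_ENDPOINT}?{query}"


print(build_links_request("YOUR_TOKEN", "https://example.com/post?id=1"))
```

No request is sent here; the resulting string can be passed to whichever HTTP client is already in use.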

2017-04-21

by Jerome Choo
  • The beta category field has been added to the Product API. See documentation.
  • All extraction APIs now support sending completely custom headers using the X-Forward- prefix. Previously, only four predefined headers were supported.
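
One way to picture the custom-header support above, assuming a forwarded header is formed by prepending X-Forward- to the original header name: the helper name and the sample header values below are hypothetical, and the sketch only prepares the header dictionary without sending anything.

```python
def to_forwarded_headers(custom_headers: dict) -> dict:
    """Prefix each header name with X-Forward- (assumed naming scheme)."""
    return {f"X-Forward-{name}": value for name, value in custom_headers.items()}


# Hypothetical headers intended for the target site.
headers = to_forwarded_headers({
    "User-Agent": "ExampleBot/1.0",
    "X-Custom-Header": "demo",
})

for name, value in headers.items():
    print(f"{name}: {value}")
```

The resulting dictionary would then be attached to the API request by the HTTP client of your choice.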

2017-04-10

by Jerome Choo
  • In the Article and Discussion APIs' tags element, DBpedia uri values are now properly URL-encoded.
  • Fixed an issue when sorting by date in the Search API.
  • Various improvements and fixes to the Global Index.
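
For readers unsure what URL-encoding a DBpedia uri involves, here is a minimal sketch using Python's urllib.parse.quote. It does not reproduce the API's exact encoding rules, which this changelog does not specify; the helper name and the sample resource are illustrative assumptions.

```python
from urllib.parse import quote


def encode_uri(uri: str) -> str:
    """Percent-encode characters outside the safe set, leaving ':' and '/' intact."""
    return quote(uri, safe=":/")


# Non-ASCII characters are encoded as UTF-8 bytes, each written as %XX.
print(encode_uri("http://dbpedia.org/resource/Beyoncé"))
# http://dbpedia.org/resource/Beyonc%C3%A9
```

Note that this helper would encode an existing % sign again, so it should only be applied to values that are not already percent-encoded.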

2017-01-12

by Jerome Choo
  • The Account API now tracks Global Index search calls/requests.
  • Improved SKU detection and extraction in the Product API.
  • Article API: Added support for the start attribute (ol elements) and data- attributes in normalized HTML.
  • In the Article API, identified image captions will no longer be returned in the text field content.
  • Various improvements to replacement rule regular expressions in Custom APIs.
  • PDF processing improvements.

2016-12-09

by Jerome Choo
  • Product API: overriding the sku, mpn or related fields using custom rules will now affect the productId field as well.
  • Crawls using the Analyze API will now correctly index video pages.
  • Improved the reliability of the fields=links argument in all Automatic APIs.