August 15th, 2017

by Jerome Choo

Fixed an issue in the Global Index in which complicated Boolean (OR) queries would return no results.

August 8th, 2017

by Jerome Choo

May 22nd, 2017

by Jerome Choo

Many improvements to brand detection in the Product API.
Resolved an issue where humanLanguage could be misidentified on some Spanish-language pages.

May 15th, 2017

by Jerome Choo

Crawlbot: resolved an issue where IP-address-only webhooks would not receive notifications.
Crawlbot: improved link spidering/harvesting resilience to markup errors and other invalid HTML source.
Fixed an issue where custom APIs would not display in the Crawlbot and Bulk Processing dashboards.

May 11th, 2017

by Jerome Choo

Improved link detection when returning page links via the fields=links argument.
Improved support for and handling of the srcset (and sizes) image attributes in all APIs.
Added detection of Afrikaans (af) in the humanLanguage field.
Improved duplicate detection in the Diffbot Global Index.
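
For readers unfamiliar with srcset, the sketch below shows how a srcset value breaks down into image candidates. It is only an illustration, not Diffbot's implementation: parse_srcset is a hypothetical helper, and it assumes the simple comma-separated form in which image URLs contain no commas.

```python
def parse_srcset(srcset):
    """Split a srcset value into (url, descriptor) pairs.

    Simplified on purpose: assumes candidate URLs contain no commas.
    A candidate without a descriptor is treated as "1x".
    """
    candidates = []
    for part in srcset.split(","):
        tokens = part.split()
        if not tokens:
            continue  # skip empty segments such as a trailing comma
        url = tokens[0]
        descriptor = tokens[1] if len(tokens) > 1 else "1x"
        candidates.append((url, descriptor))
    return candidates


print(parse_srcset("small.jpg 480w, large.jpg 1080w"))
```

Each pair is an image URL plus its width (w) or density (x) descriptor, which is the information an extractor needs to choose among the candidates.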

April 21st, 2017

by Jerome Choo

The beta category field has been added to the Product API. See documentation.
All extraction APIs now support sending fully custom headers using the X-Forward- prefix. Previously, only four predefined headers were supported.
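
A minimal sketch of preparing such headers on the client side, assuming the prefix is simply prepended to the name of each header the target site should receive. The helper name to_forwarded_headers and the example header are hypothetical.

```python
def to_forwarded_headers(custom_headers):
    """Return a copy of custom_headers with "X-Forward-" prepended to each name.

    Assumption: the API forwards a header named "X-Forward-<Name>" to the
    target site as "<Name>".
    """
    return {"X-Forward-" + name: value for name, value in custom_headers.items()}


print(to_forwarded_headers({"Authorization": "Bearer abc"}))
```

The resulting dictionary can be passed as the request headers of whichever HTTP client makes the API call.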

April 10th, 2017

by Jerome Choo

In the Article and Discussion APIs' tags element, DBPedia uri values are now properly URL-encoded.
Fixed an issue when sorting by date in the Search API.
Various improvements and fixes to the Global Index.
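
To illustrate what URL-encoding a DBPedia uri involves, here is a small sketch using Python's standard library. The helper encode_dbpedia_uri is hypothetical, and the exact set of characters Diffbot escapes is not stated above; this version percent-encodes everything outside the unreserved set.

```python
from urllib.parse import quote


def encode_dbpedia_uri(resource_name):
    """Build a DBPedia resource URI with the resource name percent-encoded.

    Assumption: every character outside letters, digits and "_.-~" is
    escaped as UTF-8 percent-encoding.
    """
    return "http://dbpedia.org/resource/" + quote(resource_name, safe="")


print(encode_dbpedia_uri("Beyonc\u00e9"))
```

Non-ASCII characters become UTF-8 percent escapes, so the value can be used directly as a link or request target.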

January 12th, 2017

by Jerome Choo

The Account API now tracks Global Index search calls/requests.
Improved SKU detection and extraction in the Product API.
Article API: Added support for the start attribute (ol elements) and data- attributes in normalized HTML.
In the Article API, identified image captions will no longer be returned in the text field content.
Various improvements to replacement rule regular expressions in Custom APIs.
PDF processing improvements.

December 9th, 2016

by Jerome Choo

Product API: overriding the sku, mpn or related fields using custom rules will now affect the productId field as well.
Crawls using the Analyze API will now correctly index video pages.
Improved the reliability of the fields=links argument in all Automatic APIs.
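
As an illustration of the fields=links argument, the sketch below builds a request URL for the Article API. The v3 endpoint path, the helper name build_article_request and the placeholder token are assumptions made for this example; no request is actually sent.

```python
from urllib.parse import urlencode


def build_article_request(token, page_url):
    """Return an Article API URL that asks for page links via fields=links.

    Assumption: the endpoint is https://api.diffbot.com/v3/article and takes
    "token" and "url" query parameters alongside "fields".
    """
    params = {"token": token, "url": page_url, "fields": "links"}
    return "https://api.diffbot.com/v3/article?" + urlencode(params)


print(build_article_request("YOUR_TOKEN", "https://example.com/"))
```

urlencode percent-encodes the target page URL, so it is passed safely as a single query parameter.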

December 1st, 2016

by Jerome Choo

Updated our rendering engine to properly support more Unicode scripts.