Automatically extract pricing, product specs, images, and more from an e-commerce product page.
The Product API automatically extracts complete data from any shopping or e-commerce product page. Retrieve full pricing information, product IDs (SKU, UPC, MPN), images, product specifications, brand and more.
Test drive Product API without a trial token at diffbot.com/testdrive.
Response
The Product API returns data in JSON format.
Each response includes a request object (which returns request-specific metadata) and an objects array, which includes the extracted information for all objects on a submitted page.
Objects in the Product API's objects array will include the following fields:
Field | Description |
---|---|
type | Type of object (always product). |
pageUrl | URL of submitted page / page from which the product is extracted. |
resolvedPageUrl | Returned if the pageUrl redirects to another URL. |
title | Title of the product. |
text | Text description, if available, of the product. |
brand | Item's brand name. |
offerPrice | Offer or actual/final price of the product. |
regularPrice | Regular or original price of the product, if available. |
shippingAmount | Shipping price. |
saveAmount | Discount or amount saved off the regular price. |
offerPriceDetails | offerPrice separated into its constituent parts: amount, symbol, and full text. |
regularPriceDetails | regularPrice separated into its constituent parts: amount, symbol, and full text. |
saveAmountDetails | saveAmount separated into its constituent parts: amount, symbol, full text, and whether or not it is a percentage value. |
productId | Diffbot-determined unique product ID. If upc, isbn, mpn or sku are identified on the page, productId takes the first of these values that is available, in that order. |
upc | Universal Product Code (UPC/EAN), if available. |
sku | Stock Keeping Unit -- store/vendor inventory number or identifier. |
mpn | Manufacturer's Product Number. |
isbn | International Standard Book Number (ISBN), if available. |
specs | If a specifications table or similar data is available on the product page, individual specifications will be returned in the specs object as name/value pairs. Names will be normalized to lowercase with spaces replaced by underscores, e.g. display_resolution. |
images | Array of images, if present within the product. |
↳url | Fully resolved link to image. If the image SRC is encoded as base64 data, the complete data URI will be returned. |
↳title | Description or caption of the image. |
↳height | Height of image as (re-)sized via browser/CSS. |
↳width | Width of image as (re-)sized via browser/CSS. |
↳naturalHeight | Raw image height, in pixels. |
↳naturalWidth | Raw image width, in pixels. |
↳primary | Returns true if image is identified as primary based on visual analysis. |
↳xpath | XPath expression identifying the image node. |
↳diffbotUri | Internal ID used for indexing. |
discussion | Product reviews, as extracted by the Diffbot Discussion API. |
prefixCode | Country of origin as identified by UPC/ISBN. |
productOrigin | If available, two-character ISO country code where the product was produced. |
humanLanguage | Returns the (spoken/human) language of the submitted page, using two-letter ISO 639-1 nomenclature. |
diffbotUri | Unique object ID. The diffbotUri is generated from the values of various Product fields and uniquely identifies the object. This can be used for deduplication. |
The following fields are in an early beta stage: | |
availability | Item's availability, either true or false. |
category | Returns an inferred category from Diffbot's product categorization taxonomy. |
colors | Returns array of product color options. |
normalizedSpecs | Returns normalized specifications if a specifications table (or similar element) is found on the product page. More on normalization. |
multipleProducts | Returns true if multiple products are distinctly available on the product page. |
priceRange | If the product is available in a range of prices, the minimum and maximum values will be returned. The lowest price will also be returned as the offerPrice. |
↳minPrice | The minimum price for the offered item. |
↳maxPrice | The maximum price for the offered item. |
quantityPrices | If the product is available with quantity-based discounts, all identifiable price points will be returned. The lowest price will also be returned as the offerPrice. |
↳minQuantity | The minimum quantity required to purchase for the associated price. |
↳price | Price of the specific quantity level. |
size | Size(s) available, if identified on the page. |
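
Two rules in the table above lend themselves to a short illustration: the productId precedence (upc, isbn, mpn, then sku) and the specs name normalization. The Python sketch below is only a client-side approximation of those stated rules, not Diffbot's implementation. The helper names `pick_product_id` and `normalize_spec_name` are hypothetical, and the real normalization may treat punctuation differently (the sample response contains a key such as mfr_part_).

```python
from typing import Any, Optional

# Precedence stated in the productId row: upc, isbn, mpn, then sku.
ID_PRECEDENCE = ("upc", "isbn", "mpn", "sku")


def pick_product_id(product: dict[str, Any]) -> Optional[str]:
    """Return the first identifier present, following the documented order.

    Hypothetical helper: it mirrors the rule in the table, nothing more.
    """
    for key in ID_PRECEDENCE:
        value = product.get(key)
        if value:
            return str(value)
    return None


def normalize_spec_name(name: str) -> str:
    """Lowercase a spec name and replace spaces with underscores.

    Only the rule stated in the specs row is applied here; handling of
    punctuation is an open assumption and is left untouched.
    """
    return name.strip().lower().replace(" ", "_")
```

For the sample product documented on this page, `pick_product_id` would pick the upc value, which matches the productId shown in the example response.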
The following is an example response from a successful extraction of a single product detail page on microcenter.com:
{
  "request": {
    "pageUrl": "https://www.microcenter.com/product/628738/evga-nvidia-geforce-rtx-3090-ftw3-ultra-triple-fan-24gb-gddr6x-pcie-40-graphics-card",
    "api": "product",
    "version": 3
  },
  "humanLanguage": "en",
  "objects": [
    {
      "images": [
        {
          "xpath": "/html[1]/body[1]/main[1]/article[1]/div[3]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/a[1]/img[1]",
          "naturalHeight": 200,
          "width": 260,
          "diffbotUri": "image|3|735477840",
          "title": "Product Image View 2",
          "url": "https://90a1c75758623581b3f8-5c119c3de181c9857fcb2784776b17ef.ssl.cf2.rackcdn.com/628738_177923_01_front_thumbnail.jpg",
          "naturalWidth": 200,
          "primary": true,
          "height": 260
        }
      ],
      "offerPrice": "$1,919.99",
      "productId": "843368067137",
      "diffbotUri": "product|3|-1003574237",
      "upc": "843368067137",
      "productOrigin": "us",
      "mpn": "24G-P5-3987-KR",
      "prefixCode": "U.S. and Canada",
      "multipleProducts": true,
      "availability": true,
      "type": "product",
      "title": "EVGA NVIDIA GeForce RTX 3090 FTW3 Ultra Triple-Fan 24GB GDDR6X PCIe 4.0 Graphics Card",
      "offerPriceDetails": {
        "symbol": "$",
        "amount": 1919.99,
        "text": "$1,919.99"
      },
      "specs": {
        "mfr_part_": "24G-P5-3987-KR",
        "upc": "843368067137",
        "sku": "177923"
      },
      "normalizedSpecs": {
        "sku": [
          {
            "cleanLiteral": "843368067137"
          }
        ]
      },
      "humanLanguage": "en",
      "pageUrl": "https://www.microcenter.com/product/628738/evga-nvidia-geforce-rtx-3090-ftw3-ultra-triple-fan-24gb-gddr6x-pcie-40-graphics-card",
      "text": "The EVGA GeForce RTX 3090 is colossally powerful in every way imaginable, giving you a whole new tier of performance at 8K resolution. It's powered by the NVIDIA Ampere architecture, which doubles down on ray tracing and AI performance with enhanced RT Cores, Tensor Cores, and new streaming multiprocessors. Combined with the next generation of design, cooling, and overclocking with EVGA Precision X1, the EVGA GeForce RTX 3090 Series redefines the definition of ultimate performance.",
      "category": "Computers",
      "sku": "177923",
      "brand": "EVGA"
    }
  ],
  "type": "product",
  "title": "EVGA NVIDIA GeForce RTX 3090 FTW3 Ultra Triple-Fan 24GB GDDR6X PCIe 4.0 Graphics Card - Micro Center"
}
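
A response shaped like the one above can be reduced to the handful of fields most clients need. The following Python sketch assumes the response has already been parsed into a dict (for example with `json.loads`). The function name `summarize_products` and the keys of the returned summary are illustrative choices, not part of the API.

```python
from typing import Any


def summarize_products(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect title, ID, price and primary image for each product object.

    Every field is read defensively because most fields are only
    returned "if available".
    """
    summaries = []
    for obj in response.get("objects", []):
        if obj.get("type") != "product":
            continue
        price = obj.get("offerPriceDetails") or {}
        images = obj.get("images") or []
        # First image flagged as primary, or None when no image is flagged.
        primary = next(
            (img.get("url") for img in images if img.get("primary")), None
        )
        summaries.append(
            {
                "title": obj.get("title"),
                "productId": obj.get("productId"),
                "amount": price.get("amount"),
                "symbol": price.get("symbol"),
                "primaryImage": primary,
            }
        )
    return summaries
```

Reading the numeric amount from offerPriceDetails avoids parsing the formatted offerPrice string by hand.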
Optional Fields
Product API may also return some optional fields if they are specified as a comma-delimited list in the &fields= argument.
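
As a rough illustration of the comma-delimited fields argument, the Python sketch below assembles a request URL. Several things here are assumptions not stated in this section: the endpoint `https://api.diffbot.com/v3/product`, the `token` and `url` parameter names, and the field names in the usage comment, which are placeholders only. The helper `build_product_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint; the sample response only confirms api "product", version 3.
PRODUCT_ENDPOINT = "https://api.diffbot.com/v3/product"


def build_product_url(token: str, page_url: str, fields: tuple = ()) -> str:
    """Build a Product API request URL with an optional fields list.

    `token` and `url` are assumed parameter names; `fields` is joined with
    commas as the text describes, then percent-encoded by urlencode.
    """
    params = {"token": token, "url": page_url}
    if fields:
        params["fields"] = ",".join(fields)
    return f"{PRODUCT_ENDPOINT}?{urlencode(params)}"


# Placeholder field names, for illustration only:
# build_product_url("YOUR_TOKEN", "https://example.com/item", ("links", "meta"))
```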
Already have the source HTML? POST it to Product API.
Product API supports a POST option that allows you to upload HTML or plain text for extraction. See Extract Content Not Available Online.
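
A POST upload might look like the following Python sketch, which only constructs the request and does not send it. The endpoint, the `token` and `url` query parameters, and the `text/html` content type are all assumptions here; the page linked above is the authority on the exact requirements. `build_post_request` is a hypothetical helper.

```python
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint, not confirmed by this section.
PRODUCT_ENDPOINT = "https://api.diffbot.com/v3/product"


def build_post_request(token: str, page_url: str, html: str) -> urllib.request.Request:
    """Prepare a POST request carrying raw HTML for extraction.

    The Content-Type value is an assumption; plain text uploads would
    presumably use a different one.
    """
    query = urlencode({"token": token, "url": page_url})
    return urllib.request.Request(
        f"{PRODUCT_ENDPOINT}?{query}",
        data=html.encode("utf-8"),
        headers={"Content-Type": "text/html"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit it; that step is left out so the sketch has no network side effects.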
Extracting Product Reviews
Product API will attempt to extract reviews from product pages by default. Using integrated functionality from the Discussion API, product review data will be returned in the discussion object (nested within the primary product object). The full syntax for discussion data is available in the Discussion API documentation.
Product review extraction can be disabled using the argument discussion=false. Note that if a page has recently been processed by Diffbot, cached reviews may be returned even if discussion=false is passed.
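
Because cached reviews can still appear, a client that never wants review data may prefer to strip it after the fact. The Python sketch below is one way to do that on a parsed response dict; `drop_discussion` is a hypothetical helper, and it relies only on the discussion object being nested inside each product object as described above.

```python
from typing import Any


def drop_discussion(response: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the response with any `discussion` object removed.

    The input is not modified; each product object is copied without
    its discussion key.
    """
    cleaned = dict(response)
    cleaned["objects"] = [
        {key: value for key, value in obj.items() if key != "discussion"}
        for obj in response.get("objects", [])
    ]
    return cleaned
```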