Product

Automatically extract pricing, product specs, images, and more from an e-commerce product page.

The Product API automatically extracts complete data from any shopping or e-commerce product page. Retrieve full pricing information, product IDs (SKU, UPC, MPN), images, product specifications, brand and more.

Test drive Product API without a trial token at diffbot.com/testdrive.

Response

The Product API returns data in JSON format.

Each response includes a request object, which holds request-specific metadata, and an objects array, which holds the extracted information for every object on the submitted page.

Objects in the Product API's objects array will include the following fields:

Field: Description

type: Type of object (always product).
pageUrl: URL of submitted page / page from which the product is extracted.
resolvedPageUrl: Returned if the pageUrl redirects to another URL.
title: Title of the product.
text: Text description, if available, of the product.
brand: Item's brand name.
offerPrice: Offer or actual/final price of the product.
regularPrice: Regular or original price of the product, if available.
shippingAmount: Shipping price.
saveAmount: Discount or amount saved off the regular price.
offerPriceDetails: offerPrice separated into its constituent parts: amount, symbol, and full text.
regularPriceDetails: regularPrice separated into its constituent parts: amount, symbol, and full text.
saveAmountDetails: saveAmount separated into its constituent parts: amount, symbol, full text, and whether or not it is a percentage value.
productId: Diffbot-determined unique product ID. If upc, isbn, mpn or sku are identified on the page, productId will select from these values in the above order.
upc: Universal Product Code (UPC/EAN), if available.
sku: Stock Keeping Unit -- store/vendor inventory number or identifier.
mpn: Manufacturer's Product Number.
isbn: International Standard Book Number (ISBN), if available.
specs: If a specifications table or similar data is available on the product page, individual specifications will be returned in the specs object as name/value pairs. Names will be normalized to lowercase with spaces replaced by underscores, e.g. display_resolution.
images: Array of images, if present within the product.
  url: Fully resolved link to image. If the image SRC is encoded as base64 data, the complete data URI will be returned.
  title: Description or caption of the image.
  height: Height of image as (re-)sized via browser/CSS.
  width: Width of image as (re-)sized via browser/CSS.
  naturalHeight: Raw image height, in pixels.
  naturalWidth: Raw image width, in pixels.
  primary: Returns true if image is identified as primary based on visual analysis.
  xpath: XPath expression identifying the image node.
  diffbotUri: Internal ID used for indexing.
discussion: Product reviews, as extracted by the Diffbot Discussion API.
prefixCode: Country of origin as identified by UPC/ISBN.
productOrigin: If available, two-character ISO country code where the product was produced.
humanLanguage: Returns the (spoken/human) language of the submitted page, using two-letter ISO 639-1 nomenclature.
diffbotUri: Unique object ID. The diffbotUri is generated from the values of various Product fields and uniquely identifies the object. This can be used for deduplication.

The following fields are in an early beta stage:

availability: Item's availability, either true or false.
category: Returns an inferred category from Diffbot's product categorization taxonomy.
colors: Returns array of product color options.
normalizedSpecs: Returns normalized specifications if a specifications table (or similar element) is found on the product page. More on normalization.
multipleProducts: Returns true if multiple products are distinctly available on the product page.
priceRange: If the product is available in a range of prices, the minimum and maximum values will be returned. The lowest price will also be returned as the offerPrice.
  minPrice: The minimum price for the offered item.
  maxPrice: The maximum price for the offered item.
quantityPrices: If the product is available with quantity-based discounts, all identifiable price points will be returned. The lowest price will also be returned as the offerPrice.
  minQuantity: The minimum quantity required to purchase for the associated price.
  price: Price of the specific quantity level.
size: Size(s) available, if identified on the page.
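
Two of the rules in the field descriptions above, the productId precedence (upc, isbn, mpn, sku) and the specs name normalization, can be illustrated with a short Python sketch. This is not Diffbot's implementation: the helper names pick_product_id and normalize_spec_name are hypothetical, and the normalization applies only the two documented rules, so names containing punctuation may come out differently in real responses.

```python
from typing import Optional

# Identifier fields in the precedence order documented for productId.
ID_PRECEDENCE = ("upc", "isbn", "mpn", "sku")


def pick_product_id(product: dict) -> Optional[str]:
    """Return the first identifier present, checking upc, isbn, mpn, sku in order."""
    for key in ID_PRECEDENCE:
        value = product.get(key)
        if value:
            return value
    return None


def normalize_spec_name(name: str) -> str:
    """Apply only the documented rules: lowercase, spaces become underscores."""
    return name.lower().replace(" ", "_")


# Identifier values taken from the example response on this page.
example = {"sku": "177923", "mpn": "24G-P5-3987-KR", "upc": "843368067137"}
print(pick_product_id(example))                   # 843368067137
print(normalize_spec_name("Display Resolution"))  # display_resolution
```

Here the UPC is chosen even though sku and mpn are also present, matching the productId value in the example response.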

The following is an example response from a successful extraction of a single product detail page on microcenter.com:

{
  "request": {
    "pageUrl": "https://www.microcenter.com/product/628738/evga-nvidia-geforce-rtx-3090-ftw3-ultra-triple-fan-24gb-gddr6x-pcie-40-graphics-card",
    "api": "product",
    "version": 3
  },
  "humanLanguage": "en",
  "objects": [
    {
      "images": [
        {
          "xpath": "/html[1]/body[1]/main[1]/article[1]/div[3]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/a[1]/img[1]",
          "naturalHeight": 200,
          "width": 260,
          "diffbotUri": "image|3|735477840",
          "title": "Product Image View 2",
          "url": "https://90a1c75758623581b3f8-5c119c3de181c9857fcb2784776b17ef.ssl.cf2.rackcdn.com/628738_177923_01_front_thumbnail.jpg",
          "naturalWidth": 200,
          "primary": true,
          "height": 260
        }
      ],
      "offerPrice": "$1,919.99",
      "productId": "843368067137",
      "diffbotUri": "product|3|-1003574237",
      "upc": "843368067137",
      "productOrigin": "us",
      "mpn": "24G-P5-3987-KR",
      "prefixCode": "U.S. and Canada",
      "multipleProducts": true,
      "availability": true,
      "type": "product",
      "title": "EVGA NVIDIA GeForce RTX 3090 FTW3 Ultra Triple-Fan 24GB GDDR6X PCIe 4.0 Graphics Card",
      "offerPriceDetails": {
        "symbol": "$",
        "amount": 1919.99,
        "text": "$1,919.99"
      },
      "specs": {
        "mfr_part_": "24G-P5-3987-KR",
        "upc": "843368067137",
        "sku": "177923"
      },
      "normalizedSpecs": {
        "sku": [
          {
            "cleanLiteral": "843368067137"
          }
        ]
      },
      "humanLanguage": "en",
      "pageUrl": "https://www.microcenter.com/product/628738/evga-nvidia-geforce-rtx-3090-ftw3-ultra-triple-fan-24gb-gddr6x-pcie-40-graphics-card",
      "text": "The EVGA GeForce RTX 3090 is colossally powerful in every way imaginable, giving you a whole new tier of performance at 8K resolution. It's powered by the NVIDIA Ampere architecture, which doubles down on ray tracing and AI performance with enhanced RT Cores, Tensor Cores, and new streaming multiprocessors. Combined with the next generation of design, cooling, and overclocking with EVGA Precision X1, the EVGA GeForce RTX 3090 Series redefines the definition of ultimate performance.",
      "category": "Computers",
      "sku": "177923",
      "brand": "EVGA"
    }
  ],
  "type": "product",
  "title": "EVGA NVIDIA GeForce RTX 3090 FTW3 Ultra Triple-Fan 24GB GDDR6X PCIe 4.0 Graphics Card - Micro Center"
}
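
A response shaped like the example above can be read with Python's standard json module. The sketch below works on a trimmed copy of that example. The helper name summarize_product is hypothetical, and it uses .get() throughout because many fields are returned only when they are available on the page.

```python
import json

# Trimmed copy of the example response above (only the fields read below).
raw = """
{
  "request": {"api": "product", "version": 3},
  "objects": [
    {
      "type": "product",
      "title": "EVGA NVIDIA GeForce RTX 3090 FTW3 Ultra Triple-Fan 24GB GDDR6X PCIe 4.0 Graphics Card",
      "offerPrice": "$1,919.99",
      "offerPriceDetails": {"symbol": "$", "amount": 1919.99, "text": "$1,919.99"},
      "images": [
        {
          "url": "https://90a1c75758623581b3f8-5c119c3de181c9857fcb2784776b17ef.ssl.cf2.rackcdn.com/628738_177923_01_front_thumbnail.jpg",
          "primary": true
        }
      ],
      "brand": "EVGA"
    }
  ]
}
"""


def summarize_product(product: dict) -> dict:
    """Collect a few commonly used fields, tolerating missing ones."""
    details = product.get("offerPriceDetails") or {}
    images = product.get("images") or []
    # Prefer the image flagged as primary; fall back to the first image.
    primary = next((img for img in images if img.get("primary")), None)
    if primary is None and images:
        primary = images[0]
    return {
        "title": product.get("title"),
        "brand": product.get("brand"),
        "amount": details.get("amount"),  # numeric price from offerPriceDetails
        "symbol": details.get("symbol"),
        "image_url": primary.get("url") if primary else None,
    }


response = json.loads(raw)
summaries = [summarize_product(obj) for obj in response.get("objects", [])]
print(summaries[0]["brand"], summaries[0]["symbol"], summaries[0]["amount"])
```

Reading the numeric amount from offerPriceDetails avoids parsing the formatted offerPrice string.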

Optional Fields

Product API may also return optional fields if they are specified (comma-delimited) in the &fields= argument.
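
As a rough sketch, a request URL with a fields argument could be assembled like this in Python. Several things here are assumptions not stated on this page: the endpoint URL, the token and url parameter names, and the field names links and meta, which stand in for whichever optional fields you need. The helper name build_product_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the v3 Product endpoint. This page does not state the URL.
PRODUCT_ENDPOINT = "https://api.diffbot.com/v3/product"


def build_product_url(token: str, page_url: str, fields=()) -> str:
    """Build a GET URL, adding fields= only when optional fields are requested."""
    # Assumption: "token" and "url" are the parameter names for credentials and page.
    params = {"token": token, "url": page_url}
    if fields:
        # Optional fields are passed as a single comma-delimited value.
        params["fields"] = ",".join(fields)
    # urlencode percent-encodes the values (the comma becomes %2C).
    return f"{PRODUCT_ENDPOINT}?{urlencode(params)}"


url = build_product_url(
    token="YOUR_TOKEN",  # placeholder
    page_url="https://www.microcenter.com/product/628738/evga-nvidia-geforce-rtx-3090-ftw3-ultra-triple-fan-24gb-gddr6x-pcie-40-graphics-card",
    fields=["links", "meta"],  # placeholder field names, not confirmed by this page
)
print(url)
```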

Already have the source HTML? POST it to Product API.

Product API supports a POST option that allows you to upload HTML or plain text for extraction. See Extract Content Not Available Online.
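
A minimal sketch of preparing such a POST with Python's standard library follows. It only builds the request and does not send it. The endpoint URL, the token and url query parameters, and the text/html content type are assumptions not stated on this page, and build_post_request is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the v3 Product endpoint. This page does not state the URL.
PRODUCT_ENDPOINT = "https://api.diffbot.com/v3/product"


def build_post_request(token: str, page_url: str, html: str) -> Request:
    """Prepare (but do not send) a POST that uploads HTML for extraction."""
    # Assumption: the token and page URL still travel in the query string.
    query = urlencode({"token": token, "url": page_url})
    return Request(
        f"{PRODUCT_ENDPOINT}?{query}",
        data=html.encode("utf-8"),
        # Assumption: text/html for markup; plain text would use text/plain.
        headers={"Content-Type": "text/html"},
        method="POST",
    )


req = build_post_request(
    "YOUR_TOKEN",                           # placeholder
    "https://www.example.com/product/123",  # hypothetical page URL
    "<html><body><h1>Example Widget</h1><p>$9.99</p></body></html>",
)
print(req.get_method(), req.full_url)
# Sending is left out of this sketch; urllib.request.urlopen(req) would perform it.
```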

Extracting Product Reviews

Product API will attempt to extract reviews from product pages by default. Using integrated functionality from the Discussion API, product review data will be returned in the discussion object (nested within the primary product object). The full syntax for discussion data is available in the Discussion API documentation.

Product review extraction can be disabled using the argument discussion=false. Note that if a page has recently been processed by Diffbot, cached reviews may be returned even if discussion=false is passed.
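
Because cached reviews may come back even when discussion=false is passed, a client that must not keep review data can drop the discussion object itself. The Python sketch below shows one way to do that. The helper name strip_discussion is hypothetical, and the discussion content in the sample is a placeholder, since the real structure is defined in the Discussion API documentation.

```python
def strip_discussion(response: dict) -> dict:
    """Return a copy of the response with any discussion objects removed."""
    cleaned = dict(response)
    cleaned["objects"] = [
        {key: value for key, value in obj.items() if key != "discussion"}
        for obj in response.get("objects", [])
    ]
    return cleaned


# Hypothetical response: discussion=false was passed, but cached reviews came back.
response = {
    "objects": [
        {
            "type": "product",
            "title": "Example Widget",
            "discussion": {"placeholder": True},  # stand-in for real review data
        }
    ]
}
cleaned = strip_discussion(response)
print("discussion" in cleaned["objects"][0])  # False
```

The original response is left unmodified, so the caller can still inspect it if needed.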

