Image

Automatically identifies the primary image(s) on any web page and returns comprehensive information and metadata for each image.

Test drive Image API without a trial token at diffbot.com/testdrive.

Response

The Image API returns data in JSON format.

Each response includes a request object, which holds request-specific metadata, and an objects array, which holds the extracted information for every object found on the submitted page.

Objects in the Image API's objects array will include the following fields:

| Field | Description |
| --- | --- |
| type | Type of object (always image). |
| url | Direct link to image file. |
| title | Title or caption of the image, if available. |
| naturalHeight | Raw image height, in pixels. |
| naturalWidth | Raw image width, in pixels. |
| humanLanguage | Returns the (spoken/human) language of the submitted page, using two-letter ISO 639-1 nomenclature. |
| anchorUrl | If the image is hyperlinked, returns the destination URL. |
| pageUrl | URL of submitted page / page from which the image is extracted. |
| resolvedPageUrl | Returned if the pageUrl redirects to another URL. |
| xpath | XPath expression identifying the image node. |
| diffbotUri | Unique object ID. The diffbotUri is generated from the values of various Image fields and uniquely identifies the object. This can be used for deduplication. |

Optional fields, available using fields= argument

| Field | Description |
| --- | --- |
| displayHeight | Height of image as presented in the browser (and as sized via browser/CSS, if resized). |
| displayWidth | Width of image as presented in the browser (and as sized via browser/CSS, if resized). |
| links | Returns a top-level object (links) containing all hyperlinks found on the page. |
| meta | Comma-separated list of image-embedded metadata (e.g., EXIF, XMP, ICC Profile), if available within the image file. |
| querystring | Returns any key/value pairs present in the URL querystring. Items without a discrete value will be returned as true. |
| breadcrumb | Returns a top-level array (breadcrumb) of URLs and link text from page breadcrumbs. |
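
Because diffbotUri uniquely identifies an object, it can serve as a deduplication key when image objects from several responses are merged. The following is a minimal sketch of that idea, not an official client; the helper name dedupe_images is hypothetical, and it assumes each object is an already-parsed dictionary from an objects array.

```python
def dedupe_images(objects):
    """Keep the first object seen for each diffbotUri, preserving order.

    `objects` is any iterable of dicts taken from one or more `objects` arrays.
    """
    seen = set()
    unique = []
    for obj in objects:
        uri = obj.get("diffbotUri")
        if uri is None:
            # No ID to compare on, so keep the object rather than guess.
            unique.append(obj)
            continue
        if uri not in seen:
            seen.add(uri)
            unique.append(obj)
    return unique


# Illustrative data only: two responses that share one image.
first_page = [{"diffbotUri": "image|3|666824882", "url": "a.png"}]
second_page = [
    {"diffbotUri": "image|3|666824882", "url": "a.png"},
    {"diffbotUri": "image|3|1865428953", "url": "b.png"},
]
merged = dedupe_images(first_page + second_page)
print([image["url"] for image in merged])
```

Objects without a diffbotUri are kept as-is in this sketch, since there is nothing reliable to compare them on.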

The following is an example response from a successful extraction of a product page on diffbot.com.

{
  "request": {
    "pageUrl": "https://www.diffbot.com/products/extract/",
    "api": "image",
    "version": 3
  },
  "objects": [
    {
      "xpath": "/HTML/BODY/MAIN/DIV[@id='slice-readslikehumans']/DIV[@class='container px-3 py-5 py-md-6 mx-auto']/DIV[@class='row justify-content-center']/DIV[@class='col-12  col-md  d-flex justify-content-center align-items-center  order-md-1']/IMG[@class='img-fluid mb-4 align-self-start']",
      "humanLanguage": "en",
      "naturalHeight": 1023,
      "diffbotUri": "image|3|666824882",
      "pageUrl": "https://www.diffbot.com/products/extract/",
      "type": "image",
      "url": "https://www.diffbot.com/assets/img/products/extract_screenshot.png",
      "naturalWidth": 897,
      "tags": [
        {
          "typeHierarchy": [
            "http://www.w3.org/2002/07/owl#Thing",
            "http://dbpedia.org/ontology/Work",
            "http://dbpedia.org/ontology/Website"
          ],
          "id": 33898,
          "label": "Website",
          "type": "http://dbpedia.org/ontology/Website",
          "uri": "https://www.diffbot.com/entity/Xd90vp_U4MJOoRHIja3quxg"
        },
        {
          "id": 1404579,
          "label": "Ring binder",
          "uri": "http://diffbot.com/entity/XYwPij6UgPdaDEY6fXSrrRA"
        }
      ]
    },
    {
      "xpath": "/HTML/BODY/MAIN/DIV[@id='slice-lookslikeahuman']/DIV[@class='container px-3 py-5 py-md-6 mx-auto']/DIV[@class='row justify-content-center']/DIV[@class='col-12  col-md  d-flex justify-content-center align-items-center  order-md-1']/IMG[@class='img-fluid mb-4']",
      "humanLanguage": "en",
      "naturalHeight": 270,
      "diffbotUri": "image|3|1865428953",
      "pageUrl": "https://www.diffbot.com/products/extract/",
      "type": "image",
      "url": "https://www.diffbot.com/assets/img/products/any_language.png",
      "naturalWidth": 554,
      "tags": [
        {
          "id": 5462349,
          "label": "Sachet",
          "uri": "http://diffbot.com/entity/X1_7MIKBoPpuRnAPVQXqQxA"
        },
        {
          "id": 479373,
          "label": "Eraser",
          "uri": "http://diffbot.com/entity/XGPnWNBUZPyuNyfK9rJr6vQ"
        },
        {
          "id": 57260,
          "label": "Envelope",
          "uri": "http://diffbot.com/entity/XK5gEV93iP6SqKMDjDr0YBQ"
        }
      ]
    },
    {
      "xpath": "/HTML/BODY/MAIN/DIV[@id='slice-one-click-crawling']/DIV[@class='container px-3 py-5 py-md-6 mx-auto']/DIV[@class='row justify-content-center']/DIV[@class='col-12  offset-md-1 col-md  d-flex justify-content-center align-items-center  order-md-2']/IMG[@class='img-fluid mb-4']",
      "humanLanguage": "en",
      "naturalHeight": 372,
      "diffbotUri": "image|3|1699007329",
      "pageUrl": "https://www.diffbot.com/products/extract/",
      "type": "image",
      "url": "https://www.diffbot.com/assets/img/products/analyze_two.png",
      "naturalWidth": 451,
      "tags": [
        {
          "id": 268267,
          "label": "Tray",
          "uri": "http://diffbot.com/entity/XwXgQ3A7sNmypX2VX3oZCgg"
        },
        {
          "id": 467731,
          "label": "Spatula",
          "uri": "http://diffbot.com/entity/XZohZtDoDPvq1wusJQCSZZA"
        },
        {
          "id": 2649730,
          "label": "Measuring cup",
          "uri": "http://diffbot.com/entity/XhwdQ4HnKOsCVrf9YU3hyWw"
        }
      ]
    }
  ]
}
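
To show how a response like the one above might be consumed, here is a small sketch that walks the objects array and collects each image's URL, raw dimensions, and tag labels. The function name summarize_images is hypothetical. The tags array appears in the example response but not in the field table, so the sketch treats it as optional.

```python
import json


def summarize_images(response_text):
    """Return url, raw pixel dimensions, and tag labels for each image object."""
    data = json.loads(response_text)
    summaries = []
    for obj in data.get("objects", []):
        if obj.get("type") != "image":
            continue  # defensive: the Image API documents type as always "image"
        summaries.append({
            "url": obj.get("url"),
            "width": obj.get("naturalWidth"),
            "height": obj.get("naturalHeight"),
            # `tags` is treated as optional; entries without a label are skipped.
            "tags": [t["label"] for t in obj.get("tags", []) if "label" in t],
        })
    return summaries


# A trimmed copy of the first object from the example response.
sample = json.dumps({
    "request": {
        "pageUrl": "https://www.diffbot.com/products/extract/",
        "api": "image",
        "version": 3,
    },
    "objects": [{
        "type": "image",
        "url": "https://www.diffbot.com/assets/img/products/extract_screenshot.png",
        "naturalHeight": 1023,
        "naturalWidth": 897,
        "tags": [{"id": 33898, "label": "Website"}, {"id": 1404579, "label": "Ring binder"}],
    }],
})

for image in summarize_images(sample):
    print(image["url"], image["width"], image["height"], image["tags"])
```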

Optional Fields

Specify each desired field, comma-delimited, in the &fields= argument. In addition to the fields listed below, more optional fields are available with all Extract APIs.

| Field | Description |
| --- | --- |
| displayHeight | Height of image as presented in the browser (and as sized via browser/CSS, if resized). |
| displayWidth | Width of image as presented in the browser (and as sized via browser/CSS, if resized). |

Already have the source HTML? POST it to Image API.

Image API supports a POST option that allows you to upload HTML or plain text for extraction. See Extract Content Not Available Online.
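
A rough sketch of the POST option follows. It only constructs the request and does not send it. The endpoint, the token and url query parameters, and the text/html Content-Type header are all assumptions not confirmed by this section, and build_html_post_request is a hypothetical helper; see Extract Content Not Available Online for the authoritative details.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: this section does not give the endpoint; this is a placeholder for it.
IMAGE_API_ENDPOINT = "https://api.diffbot.com/v3/image"


def build_html_post_request(token, page_url, html):
    """Prepare a POST request carrying the HTML to extract in the body."""
    query = urlencode({"token": token, "url": page_url})  # names are assumed
    return Request(
        IMAGE_API_ENDPOINT + "?" + query,
        data=html.encode("utf-8"),
        headers={"Content-Type": "text/html; charset=utf-8"},  # assumed header
        method="POST",
    )


markup = '<html><body><img src="/photo.png" alt="Example"></body></html>'
post_request = build_html_post_request("YOUR_TOKEN", "https://example.com/gallery", markup)
print(post_request.get_method(), post_request.full_url)

# Sending is intentionally left out; urllib.request.urlopen(post_request)
# would perform the call and return the JSON response.
```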
