Image

Automatically identifies the primary image(s) on any web page and returns comprehensive information and metadata for each image.

Test-drive the Image API without a token at diffbot.com/testdrive.

Response

The Image API returns data in JSON format.

Each response includes a request object, which contains request-specific metadata, and an objects array, which contains the extracted information for every object on the submitted page.

Objects in the Image API's objects array will include the following fields:

Field

Description

type

Type of object (always image).

url

Direct link to image file.

title

Title or caption of the image, if available.

naturalHeight

Raw image height, in pixels.

naturalWidth

Raw image width, in pixels.

humanLanguage

Returns the (spoken/human) language of the submitted page, using two-letter ISO 639-1 nomenclature.

anchorUrl

If the image is hyperlinked, returns the destination URL.

pageUrl

URL of submitted page / page from which the image is extracted.

resolvedPageUrl

Returned if the pageUrl redirects to another URL.

xpath

XPath expression identifying the image node.

diffbotUri

Unique object ID. The diffbotUri is generated from the values of various Image fields and uniquely identifies the object. This can be used for deduplication.

Optional fields, available using the fields= argument

displayHeight

Height of image as presented in the browser (and as sized via browser/CSS, if resized).

displayWidth

Width of image as presented in the browser (and as sized via browser/CSS, if resized).

links

Returns a top-level object (links) containing all hyperlinks found on the page.

meta

Comma-separated list of image-embedded metadata (e.g., EXIF, XMP, ICC Profile), if available within the image file.

querystring

Returns any key/value pairs present in the URL querystring. Items without a discrete value will be returned as true.

breadcrumb

Returns a top-level array (breadcrumb) of URLs and link text from page breadcrumbs.
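The diffbotUri description above mentions deduplication. A minimal sketch of that idea follows, assuming the objects have already been parsed into Python dictionaries; the helper name dedupe_images and the sample list are hypothetical and exist only for illustration.

```python
def dedupe_images(objects):
    """Keep the first image seen for each diffbotUri, preserving order."""
    seen = set()
    unique = []
    for obj in objects:
        uri = obj.get("diffbotUri")
        # Objects without a diffbotUri cannot be compared, so keep them all.
        if uri is None:
            unique.append(obj)
        elif uri not in seen:
            seen.add(uri)
            unique.append(obj)
    return unique


# Hypothetical input: the same image collected twice, plus one other image.
images = [
    {"diffbotUri": "image|3|666824882",
     "url": "https://www.diffbot.com/assets/img/products/extract_screenshot.png"},
    {"diffbotUri": "image|3|1865428953",
     "url": "https://www.diffbot.com/assets/img/products/any_language.png"},
    {"diffbotUri": "image|3|666824882",
     "url": "https://www.diffbot.com/assets/img/products/extract_screenshot.png"},
]

unique_images = dedupe_images(images)
print([img["diffbotUri"] for img in unique_images])
```

Keeping the first occurrence preserves the order in which images were returned, which may matter if earlier objects are treated as more prominent.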

The following is an example response from a successful extraction of a product page on diffbot.com.

{
  "request": {
    "pageUrl": "https://www.diffbot.com/products/extract/",
    "api": "image",
    "version": 3
  },
  "objects": [
    {
      "xpath": "/HTML/BODY/MAIN/DIV[@id='slice-readslikehumans']/DIV[@class='container px-3 py-5 py-md-6 mx-auto']/DIV[@class='row justify-content-center']/DIV[@class='col-12  col-md  d-flex justify-content-center align-items-center  order-md-1']/IMG[@class='img-fluid mb-4 align-self-start']",
      "humanLanguage": "en",
      "naturalHeight": 1023,
      "diffbotUri": "image|3|666824882",
      "pageUrl": "https://www.diffbot.com/products/extract/",
      "type": "image",
      "url": "https://www.diffbot.com/assets/img/products/extract_screenshot.png",
      "naturalWidth": 897,
      "tags": [
        {
          "typeHierarchy": [
            "http://www.w3.org/2002/07/owl#Thing",
            "http://dbpedia.org/ontology/Work",
            "http://dbpedia.org/ontology/Website"
          ],
          "id": 33898,
          "label": "Website",
          "type": "http://dbpedia.org/ontology/Website",
          "uri": "https://www.diffbot.com/entity/Xd90vp_U4MJOoRHIja3quxg"
        },
        {
          "id": 1404579,
          "label": "Ring binder",
          "uri": "http://diffbot.com/entity/XYwPij6UgPdaDEY6fXSrrRA"
        }
      ]
    },
    {
      "xpath": "/HTML/BODY/MAIN/DIV[@id='slice-lookslikeahuman']/DIV[@class='container px-3 py-5 py-md-6 mx-auto']/DIV[@class='row justify-content-center']/DIV[@class='col-12  col-md  d-flex justify-content-center align-items-center  order-md-1']/IMG[@class='img-fluid mb-4']",
      "humanLanguage": "en",
      "naturalHeight": 270,
      "diffbotUri": "image|3|1865428953",
      "pageUrl": "https://www.diffbot.com/products/extract/",
      "type": "image",
      "url": "https://www.diffbot.com/assets/img/products/any_language.png",
      "naturalWidth": 554,
      "tags": [
        {
          "id": 5462349,
          "label": "Sachet",
          "uri": "http://diffbot.com/entity/X1_7MIKBoPpuRnAPVQXqQxA"
        },
        {
          "id": 479373,
          "label": "Eraser",
          "uri": "http://diffbot.com/entity/XGPnWNBUZPyuNyfK9rJr6vQ"
        },
        {
          "id": 57260,
          "label": "Envelope",
          "uri": "http://diffbot.com/entity/XK5gEV93iP6SqKMDjDr0YBQ"
        }
      ]
    },
    {
      "xpath": "/HTML/BODY/MAIN/DIV[@id='slice-one-click-crawling']/DIV[@class='container px-3 py-5 py-md-6 mx-auto']/DIV[@class='row justify-content-center']/DIV[@class='col-12  offset-md-1 col-md  d-flex justify-content-center align-items-center  order-md-2']/IMG[@class='img-fluid mb-4']",
      "humanLanguage": "en",
      "naturalHeight": 372,
      "diffbotUri": "image|3|1699007329",
      "pageUrl": "https://www.diffbot.com/products/extract/",
      "type": "image",
      "url": "https://www.diffbot.com/assets/img/products/analyze_two.png",
      "naturalWidth": 451,
      "tags": [
        {
          "id": 268267,
          "label": "Tray",
          "uri": "http://diffbot.com/entity/XwXgQ3A7sNmypX2VX3oZCgg"
        },
        {
          "id": 467731,
          "label": "Spatula",
          "uri": "http://diffbot.com/entity/XZohZtDoDPvq1wusJQCSZZA"
        },
        {
          "id": 2649730,
          "label": "Measuring cup",
          "uri": "http://diffbot.com/entity/XhwdQ4HnKOsCVrf9YU3hyWw"
        }
      ]
    }
  ]
}
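A response shaped like the one above can be consumed with any JSON parser. The sketch below is one possible approach, not an official client: it reads the request object and the objects array, then picks the largest image by raw pixel area. The function name summarize_images and the abbreviated sample payload are assumptions made for illustration.

```python
import json


def summarize_images(response_text):
    """Return the request metadata and a compact summary of each image object."""
    data = json.loads(response_text)
    summaries = []
    for obj in data.get("objects", []):
        # The Image API documents type as always "image"; skip anything else defensively.
        if obj.get("type") != "image":
            continue
        summaries.append({
            "url": obj.get("url"),
            "width": obj.get("naturalWidth"),
            "height": obj.get("naturalHeight"),
            "title": obj.get("title"),  # only present if the page supplies one
        })
    return data.get("request", {}), summaries


# Abbreviated copy of the example response (xpath and tags omitted).
sample = json.dumps({
    "request": {"pageUrl": "https://www.diffbot.com/products/extract/",
                "api": "image", "version": 3},
    "objects": [
        {"type": "image", "naturalHeight": 1023, "naturalWidth": 897,
         "url": "https://www.diffbot.com/assets/img/products/extract_screenshot.png"},
        {"type": "image", "naturalHeight": 270, "naturalWidth": 554,
         "url": "https://www.diffbot.com/assets/img/products/any_language.png"},
    ],
})

request_info, images = summarize_images(sample)
# Choose the image with the largest raw pixel area.
largest = max(images, key=lambda img: (img["width"] or 0) * (img["height"] or 0))
print(request_info["api"], largest["url"])
```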

Optional Fields

Specify each desired field, comma-delimited, in the &fields= argument. In addition to the fields listed below, more fields are available with all Extract APIs.
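As a rough illustration of the comma-delimited fields= argument, the sketch below assembles a request URL. The endpoint https://api.diffbot.com/v3/image and the token and url parameter names are assumptions not stated on this page, and YOUR_TOKEN is a placeholder; confirm all of them against your account's API reference before use.

```python
from urllib.parse import urlencode

# Assumption: v3 Image API endpoint; not stated in this section.
IMAGE_API_ENDPOINT = "https://api.diffbot.com/v3/image"


def build_image_request_url(token, page_url, fields=()):
    """Build a GET URL, joining any optional fields with commas."""
    params = {"token": token, "url": page_url}  # assumed parameter names
    if fields:
        params["fields"] = ",".join(fields)
    # urlencode percent-encodes the page URL and the commas in the field list.
    return IMAGE_API_ENDPOINT + "?" + urlencode(params)


request_url = build_image_request_url(
    "YOUR_TOKEN",  # placeholder token
    "https://www.diffbot.com/products/extract/",
    fields=["displayHeight", "displayWidth"],
)
print(request_url)
```

The URL is only constructed here, not requested, so the sketch runs without network access or a valid token.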

Field

Description

displayHeight

Height of image as presented in the browser (and as sized via browser/CSS, if resized).

displayWidth

Width of image as presented in the browser (and as sized via browser/CSS, if resized).

Already have the source HTML? POST it to Image API.

Image API supports a POST option that allows you to upload HTML or plain text for extraction. See Extract Content Not Available Online.
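The POST option might be exercised along the following lines. This is a sketch under several assumptions that this page does not confirm: the endpoint, the token and url query parameters, and the use of a Content-Type: text/html header to mark the body as markup. The helper name build_post_request is hypothetical, and the request is built but deliberately not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: v3 Image API endpoint; not stated in this section.
IMAGE_API_ENDPOINT = "https://api.diffbot.com/v3/image"


def build_post_request(token, page_url, html):
    """Prepare a POST request carrying the page's HTML as the request body."""
    query = urlencode({"token": token, "url": page_url})  # assumed parameter names
    return Request(
        IMAGE_API_ENDPOINT + "?" + query,
        data=html.encode("utf-8"),
        headers={"Content-Type": "text/html"},  # assumed; text/plain for plain text
        method="POST",
    )


html = "<html><body><img src='/assets/img/example.png' alt='Example'></body></html>"
req = build_post_request("YOUR_TOKEN", "https://www.diffbot.com/products/extract/", html)
print(req.get_method(), req.full_url)
# Passing req to urllib.request.urlopen would send it; omitted so the sketch runs offline.
```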
