Image

Automatically identifies the primary image(s) on any web page and returns comprehensive information and metadata for each image.

Test drive Image API without a token at diffbot.com/testdrive.

Response

The Image API returns data in JSON format.

Each response includes a request object, which holds request-specific metadata, and an objects array, which holds the extracted information for all objects on the submitted page.
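As a minimal sketch of the response shape described above, the following Python separates the request metadata from the objects array. The helper name `split_response` and the trimmed payload are illustrative assumptions, not part of the API.

```python
from typing import Any


def split_response(payload: dict[str, Any]) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Return (request metadata, extracted objects) from a decoded JSON response."""
    request_meta = payload.get("request", {})
    objects = payload.get("objects", [])
    return request_meta, objects


# Hypothetical, trimmed-down payload used only for illustration.
sample = {
    "request": {
        "pageUrl": "https://www.diffbot.com/products/extract/",
        "api": "image",
        "version": 3,
    },
    "objects": [
        {
            "type": "image",
            "url": "https://www.diffbot.com/assets/img/products/extract_screenshot.png",
        }
    ],
}

request_meta, objects = split_response(sample)
print(request_meta["api"], len(objects))
```

Falling back to an empty dict and an empty list keeps calling code from raising on a payload that lacks either key.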

Objects in the Image API's objects array will include the following fields:

| Field | Description |
|---|---|
| type | Type of object (always image). |
| url | Direct link to image file. |
| title | Title or caption of the image, if available. |
| naturalHeight | Raw image height, in pixels. |
| naturalWidth | Raw image width, in pixels. |
| humanLanguage | Returns the (spoken/human) language of the submitted page, using two-letter ISO 639-1 nomenclature. |
| anchorUrl | If the image is hyperlinked, returns the destination URL. |
| pageUrl | URL of submitted page / page from which the image is extracted. |
| resolvedPageUrl | Returned if the pageUrl redirects to another URL. |
| xpath | XPath expression identifying the image node. |
| diffbotUri | Unique object ID. The diffbotUri is generated from the values of various Image fields and uniquely identifies the object. This can be used for deduplication. |

Optional fields, available using the fields= argument:

| Field | Description |
|---|---|
| displayHeight | Height of image as presented in the browser (and as sized via browser/CSS, if resized). |
| displayWidth | Width of image as presented in the browser (and as sized via browser/CSS, if resized). |
| links | Returns a top-level object (links) containing all hyperlinks found on the page. |
| meta | Comma-separated list of image-embedded metadata (e.g., EXIF, XMP, ICC Profile), if available within the image file. |
| querystring | Returns any key/value pairs present in the URL querystring. Items without a discrete value will be returned as true. |
| breadcrumb | Returns a top-level array (breadcrumb) of URLs and link text from page breadcrumbs. |
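The diffbotUri field is described as usable for deduplication. A minimal sketch of that idea follows, assuming image objects collected from several responses are held as plain dictionaries. The function name `dedupe_images` is hypothetical.

```python
def dedupe_images(objects):
    """Keep the first occurrence of each diffbotUri, preserving input order.

    Objects without a diffbotUri are kept as-is, since there is no key to compare.
    """
    seen = set()
    unique = []
    for obj in objects:
        key = obj.get("diffbotUri")
        if key is None:
            unique.append(obj)
        elif key not in seen:
            seen.add(key)
            unique.append(obj)
    return unique


# Illustrative input: the second entry repeats the first object's diffbotUri.
collected = [
    {"diffbotUri": "image|3|666824882", "naturalWidth": 897},
    {"diffbotUri": "image|3|666824882", "naturalWidth": 897},
    {"diffbotUri": "image|3|1865428953", "naturalWidth": 554},
]

unique_images = dedupe_images(collected)
print([obj["diffbotUri"] for obj in unique_images])
```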

The following is an example response from a successful extraction of a product page on diffbot.com.

{
  "request": {
    "pageUrl": "https://www.diffbot.com/products/extract/",
    "api": "image",
    "version": 3
  },
  "objects": [
    {
      "xpath": "/HTML/BODY/MAIN/DIV[@id='slice-readslikehumans']/DIV[@class='container px-3 py-5 py-md-6 mx-auto']/DIV[@class='row justify-content-center']/DIV[@class='col-12  col-md  d-flex justify-content-center align-items-center  order-md-1']/IMG[@class='img-fluid mb-4 align-self-start']",
      "humanLanguage": "en",
      "naturalHeight": 1023,
      "diffbotUri": "image|3|666824882",
      "pageUrl": "https://www.diffbot.com/products/extract/",
      "type": "image",
      "url": "https://www.diffbot.com/assets/img/products/extract_screenshot.png",
      "naturalWidth": 897,
      "tags": [
        {
          "typeHierarchy": [
            "http://www.w3.org/2002/07/owl#Thing",
            "http://dbpedia.org/ontology/Work",
            "http://dbpedia.org/ontology/Website"
          ],
          "id": 33898,
          "label": "Website",
          "type": "http://dbpedia.org/ontology/Website",
          "uri": "https://www.diffbot.com/entity/Xd90vp_U4MJOoRHIja3quxg"
        },
        {
          "id": 1404579,
          "label": "Ring binder",
          "uri": "http://diffbot.com/entity/XYwPij6UgPdaDEY6fXSrrRA"
        }
      ]
    },
    {
      "xpath": "/HTML/BODY/MAIN/DIV[@id='slice-lookslikeahuman']/DIV[@class='container px-3 py-5 py-md-6 mx-auto']/DIV[@class='row justify-content-center']/DIV[@class='col-12  col-md  d-flex justify-content-center align-items-center  order-md-1']/IMG[@class='img-fluid mb-4']",
      "humanLanguage": "en",
      "naturalHeight": 270,
      "diffbotUri": "image|3|1865428953",
      "pageUrl": "https://www.diffbot.com/products/extract/",
      "type": "image",
      "url": "https://www.diffbot.com/assets/img/products/any_language.png",
      "naturalWidth": 554,
      "tags": [
        {
          "id": 5462349,
          "label": "Sachet",
          "uri": "http://diffbot.com/entity/X1_7MIKBoPpuRnAPVQXqQxA"
        },
        {
          "id": 479373,
          "label": "Eraser",
          "uri": "http://diffbot.com/entity/XGPnWNBUZPyuNyfK9rJr6vQ"
        },
        {
          "id": 57260,
          "label": "Envelope",
          "uri": "http://diffbot.com/entity/XK5gEV93iP6SqKMDjDr0YBQ"
        }
      ]
    },
    {
      "xpath": "/HTML/BODY/MAIN/DIV[@id='slice-one-click-crawling']/DIV[@class='container px-3 py-5 py-md-6 mx-auto']/DIV[@class='row justify-content-center']/DIV[@class='col-12  offset-md-1 col-md  d-flex justify-content-center align-items-center  order-md-2']/IMG[@class='img-fluid mb-4']",
      "humanLanguage": "en",
      "naturalHeight": 372,
      "diffbotUri": "image|3|1699007329",
      "pageUrl": "https://www.diffbot.com/products/extract/",
      "type": "image",
      "url": "https://www.diffbot.com/assets/img/products/analyze_two.png",
      "naturalWidth": 451,
      "tags": [
        {
          "id": 268267,
          "label": "Tray",
          "uri": "http://diffbot.com/entity/XwXgQ3A7sNmypX2VX3oZCgg"
        },
        {
          "id": 467731,
          "label": "Spatula",
          "uri": "http://diffbot.com/entity/XZohZtDoDPvq1wusJQCSZZA"
        },
        {
          "id": 2649730,
          "label": "Measuring cup",
          "uri": "http://diffbot.com/entity/XhwdQ4HnKOsCVrf9YU3hyWw"
        }
      ]
    }
  ]
}
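
To show how a response like the one above might be consumed, this sketch pulls the URL, natural dimensions, and tag labels out of each object. The payload is a trimmed copy of the first object in the example, and `summarize_images` is a hypothetical helper name.

```python
import json


def summarize_images(payload):
    """Return one summary dict per image object in a decoded response."""
    summaries = []
    for obj in payload.get("objects", []):
        summaries.append(
            {
                "url": obj.get("url"),
                # (width, height) in pixels, taken from the natural* fields.
                "size": (obj.get("naturalWidth"), obj.get("naturalHeight")),
                # Not every tag is guaranteed to carry a label, so filter defensively.
                "labels": [tag["label"] for tag in obj.get("tags", []) if "label" in tag],
            }
        )
    return summaries


# Trimmed copy of the first object from the example response.
EXAMPLE = json.loads(
    """
    {
      "request": {"pageUrl": "https://www.diffbot.com/products/extract/", "api": "image", "version": 3},
      "objects": [
        {
          "naturalHeight": 1023,
          "naturalWidth": 897,
          "type": "image",
          "url": "https://www.diffbot.com/assets/img/products/extract_screenshot.png",
          "tags": [
            {"id": 33898, "label": "Website"},
            {"id": 1404579, "label": "Ring binder"}
          ]
        }
      ]
    }
    """
)

summaries = summarize_images(EXAMPLE)
print(summaries[0]["size"], summaries[0]["labels"])
```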

Optional Fields

Specify each desired field (comma-delimited) in the &fields= argument. In addition to the fields listed below, more fields are available with all Extract APIs.

| Field | Description |
|---|---|
| displayHeight | Height of image as presented in the browser (and as sized via browser/CSS, if resized). |
| displayWidth | Width of image as presented in the browser (and as sized via browser/CSS, if resized). |
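
A sketch of building a request URL with a comma-delimited fields argument is shown below. The endpoint `https://api.diffbot.com/v3/image` and the `token` and `url` parameter names are assumptions not stated in this section, and `YOUR_TOKEN` is a placeholder.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint; this section does not state it.
API_ENDPOINT = "https://api.diffbot.com/v3/image"


def build_image_request_url(token, page_url, fields=()):
    """Build a GET URL, joining any optional fields with commas."""
    params = {"token": token, "url": page_url}  # parameter names are assumptions
    if fields:
        params["fields"] = ",".join(fields)
    return f"{API_ENDPOINT}?{urlencode(params)}"


request_url = build_image_request_url(
    "YOUR_TOKEN",
    "https://www.diffbot.com/products/extract/",
    fields=["displayHeight", "displayWidth"],
)

# Decode the query string again to confirm what the server would receive.
decoded = parse_qs(urlsplit(request_url).query)
print(decoded["fields"])
```

`urlencode` percent-encodes the comma and the target URL; decoding the query string recovers the original comma-delimited value.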

Already have the source HTML? POST it to Image API.

Image API supports a POST option that allows you to upload HTML or plain text for extraction. See Extract Content Not Available Online.
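
As a rough sketch of the POST option, the following prepares (but does not send) a request whose body is the HTML to extract. The endpoint, the `token` and `url` parameter names, and the `text/html` content type are assumptions to be checked against the linked guide.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint; this section does not state it.
API_ENDPOINT = "https://api.diffbot.com/v3/image"


def build_html_post(token, page_url, html):
    """Prepare a POST request carrying the page markup as the request body."""
    query = urlencode({"token": token, "url": page_url})  # names are assumptions
    return Request(
        f"{API_ENDPOINT}?{query}",
        data=html.encode("utf-8"),
        headers={"Content-Type": "text/html"},  # assumed; plain text would use text/plain
        method="POST",
    )


post_request = build_html_post(
    "YOUR_TOKEN",
    "https://www.diffbot.com/products/extract/",
    "<html><body><img src='/assets/img/example.png'></body></html>",
)
print(post_request.get_method(), len(post_request.data))

# To actually send it: urllib.request.urlopen(post_request), then json.load() the result.
```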

Query Params
string
required
Defaults to https://www.diffbot.com/products/extract/

Target URL to extract

string
enum

Specify optional fields to be returned from any fully-extracted pages (e.g. fields=querystring,links)

int32

Sets a value in milliseconds to wait for the retrieval/fetch of content from the requested URL. The default timeout for the third-party response is 30 seconds (30000).

string

Use for JSONP requests. Needed for cross-domain AJAX.

string

Specify an IP address of a custom proxy that will be used to fetch the target page. (Ex: &proxy or &proxy=0.0.0.0)

string

Used to specify the authentication parameters that will be used with a custom proxy specified in the &proxy parameter. (Ex: proxyAuth=username:password)
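
The two proxy parameters above can be assembled as in this sketch. The `proxy` and `proxyAuth` names come from the examples in the text; the helper name `proxy_params` and the sample credentials are illustrative.

```python
from urllib.parse import urlencode


def proxy_params(proxy_ip, username=None, password=None):
    """Return query parameters for a custom proxy, with optional credentials."""
    params = {"proxy": proxy_ip}
    # proxyAuth only makes sense alongside a proxy, so it is added here when
    # both halves of the credential pair are supplied.
    if username is not None and password is not None:
        params["proxyAuth"] = f"{username}:{password}"
    return params


proxy_query = urlencode(proxy_params("0.0.0.0", "username", "password"))
print(proxy_query)
```

The colon in the credential pair is percent-encoded by `urlencode` and decoded again on receipt.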

string

Set to default to use Diffbot's datacenter proxy for this request. Set to none to instruct Extract not to use proxies, even if proxies have been enabled for this particular URL globally.

integer
≤ 180000

Add additional time for rendering before the page is closed and the DOM is extracted. This can cause page timeouts, so a timeout parameter may be needed to extend the timeout. Note that the renderer closes automatically at 180 seconds.
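
One way to keep the extra rendering time and the fetch timeout consistent is sketched below. Adding the extra wait to the 30000 ms default is an assumed policy, not documented behaviour, and the name `extra_wait_ms` is hypothetical because this section does not name the parameter.

```python
DEFAULT_TIMEOUT_MS = 30_000  # default timeout stated for the third-party response
MAX_RENDER_MS = 180_000      # the renderer closes automatically at 180 seconds


def timeout_for_wait(extra_wait_ms, base_timeout_ms=DEFAULT_TIMEOUT_MS):
    """Suggest a timeout (ms) that leaves room for extra rendering time.

    Assumed policy: extend the base timeout by the requested wait.
    """
    if not 0 <= extra_wait_ms <= MAX_RENDER_MS:
        raise ValueError(f"extra_wait_ms must be between 0 and {MAX_RENDER_MS}")
    return base_timeout_ms + extra_wait_ms


suggested_timeout = timeout_for_wait(5_000)
print(suggested_timeout)
```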

string
enum

Direct the browser to scroll down the page, to trigger lazy-loaded content.
