The Image API identifies the primary image(s) of a submitted web page and returns comprehensive information and metadata for each image.
Test drive Image API without a trial token at diffbot.com/testdrive.
Response
The Image API returns data in JSON format.
Each response includes a request object (which returns request-specific metadata) and an objects array, which includes the extracted information for all objects on a submitted page.
Objects in the Image API's objects array will include the following fields:
Field | Description |
---|---|
type | Type of object (always image). |
url | Direct link to image file. |
title | Title or caption of the image, if available. |
naturalHeight | Raw image height, in pixels. |
naturalWidth | Raw image width, in pixels. |
humanLanguage | Returns the (spoken/human) language of the submitted page, using two-letter ISO 639-1 nomenclature. |
anchorUrl | If the image is hyperlinked, returns the destination URL. |
pageUrl | URL of submitted page / page from which the image is extracted. |
resolvedPageUrl | Returned if the pageUrl redirects to another URL. |
xpath | XPath expression identifying the image node. |
diffbotUri | Unique object ID. The diffbotUri is generated from the values of various Image fields and uniquely identifies the object. This can be used for deduplication. |
Optional fields, available using the fields= argument | |
displayHeight | Height of image as presented in the browser (and as sized via browser/CSS, if resized). |
displayWidth | Width of image as presented in the browser (and as sized via browser/CSS, if resized). |
links | Returns a top-level object (links) containing all hyperlinks found on the page. |
meta | Comma-separated list of image-embedded metadata (e.g., EXIF, XMP, ICC Profile), if available within the image file. |
querystring | Returns any key/value pairs present in the URL querystring. Items without a discrete value will be returned as true. |
breadcrumb | Returns a top-level array (breadcrumb) of URLs and link text from page breadcrumbs. |
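The querystring behavior described in the table can be illustrated locally. The following is a minimal sketch of that rule in Python, not Diffbot's implementation: the function name `querystring_fields` and the sample URL are hypothetical, and the handling of a key with an empty value (`key=`) is an assumption, since the table does not describe it.

```python
from urllib.parse import urlsplit, unquote_plus


def querystring_fields(url: str) -> dict:
    """Mimic the documented rule: key/value pairs are kept, bare keys become True."""
    result = {}
    query = urlsplit(url).query
    for part in query.split("&"):
        if not part:
            continue  # skip empty segments such as a trailing "&"
        if "=" in part:
            key, value = part.split("=", 1)
            # Assumption: "key=" keeps an empty string rather than True.
            result[unquote_plus(key)] = unquote_plus(value)
        else:
            # An item without a discrete value is reported as True.
            result[unquote_plus(part)] = True
    return result


# Hypothetical URL, used only for illustration.
parsed = querystring_fields("https://example.com/gallery?id=42&size=large&preview")
print(parsed)
```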
The following is an example response from a successful extraction of a product page on diffbot.com.
{
  "request": {
    "pageUrl": "https://www.diffbot.com/products/extract/",
    "api": "image",
    "version": 3
  },
  "objects": [
    {
      "xpath": "/HTML/BODY/MAIN/DIV[@id='slice-readslikehumans']/DIV[@class='container px-3 py-5 py-md-6 mx-auto']/DIV[@class='row justify-content-center']/DIV[@class='col-12 col-md d-flex justify-content-center align-items-center order-md-1']/IMG[@class='img-fluid mb-4 align-self-start']",
      "humanLanguage": "en",
      "naturalHeight": 1023,
      "diffbotUri": "image|3|666824882",
      "pageUrl": "https://www.diffbot.com/products/extract/",
      "type": "image",
      "url": "https://www.diffbot.com/assets/img/products/extract_screenshot.png",
      "naturalWidth": 897,
      "tags": [
        {
          "typeHierarchy": [
            "http://www.w3.org/2002/07/owl#Thing",
            "http://dbpedia.org/ontology/Work",
            "http://dbpedia.org/ontology/Website"
          ],
          "id": 33898,
          "label": "Website",
          "type": "http://dbpedia.org/ontology/Website",
          "uri": "https://www.diffbot.com/entity/Xd90vp_U4MJOoRHIja3quxg"
        },
        {
          "id": 1404579,
          "label": "Ring binder",
          "uri": "http://diffbot.com/entity/XYwPij6UgPdaDEY6fXSrrRA"
        }
      ]
    },
    {
      "xpath": "/HTML/BODY/MAIN/DIV[@id='slice-lookslikeahuman']/DIV[@class='container px-3 py-5 py-md-6 mx-auto']/DIV[@class='row justify-content-center']/DIV[@class='col-12 col-md d-flex justify-content-center align-items-center order-md-1']/IMG[@class='img-fluid mb-4']",
      "humanLanguage": "en",
      "naturalHeight": 270,
      "diffbotUri": "image|3|1865428953",
      "pageUrl": "https://www.diffbot.com/products/extract/",
      "type": "image",
      "url": "https://www.diffbot.com/assets/img/products/any_language.png",
      "naturalWidth": 554,
      "tags": [
        {
          "id": 5462349,
          "label": "Sachet",
          "uri": "http://diffbot.com/entity/X1_7MIKBoPpuRnAPVQXqQxA"
        },
        {
          "id": 479373,
          "label": "Eraser",
          "uri": "http://diffbot.com/entity/XGPnWNBUZPyuNyfK9rJr6vQ"
        },
        {
          "id": 57260,
          "label": "Envelope",
          "uri": "http://diffbot.com/entity/XK5gEV93iP6SqKMDjDr0YBQ"
        }
      ]
    },
    {
      "xpath": "/HTML/BODY/MAIN/DIV[@id='slice-one-click-crawling']/DIV[@class='container px-3 py-5 py-md-6 mx-auto']/DIV[@class='row justify-content-center']/DIV[@class='col-12 offset-md-1 col-md d-flex justify-content-center align-items-center order-md-2']/IMG[@class='img-fluid mb-4']",
      "humanLanguage": "en",
      "naturalHeight": 372,
      "diffbotUri": "image|3|1699007329",
      "pageUrl": "https://www.diffbot.com/products/extract/",
      "type": "image",
      "url": "https://www.diffbot.com/assets/img/products/analyze_two.png",
      "naturalWidth": 451,
      "tags": [
        {
          "id": 268267,
          "label": "Tray",
          "uri": "http://diffbot.com/entity/XwXgQ3A7sNmypX2VX3oZCgg"
        },
        {
          "id": 467731,
          "label": "Spatula",
          "uri": "http://diffbot.com/entity/XZohZtDoDPvq1wusJQCSZZA"
        },
        {
          "id": 2649730,
          "label": "Measuring cup",
          "uri": "http://diffbot.com/entity/XhwdQ4HnKOsCVrf9YU3hyWw"
        }
      ]
    }
  ]
}
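A response shaped like the example above can be consumed by walking the objects array. The sketch below is one possible approach in Python, not an official client: it parses a trimmed copy of the example response, uses diffbotUri for deduplication as the field table suggests, and prints each image URL with its natural dimensions. The helper names `unique_images` and `summarize` are hypothetical, and the third object is a repeat added deliberately to show the deduplication step.

```python
import json

# Trimmed copy of the example response (xpath and tags omitted for brevity).
# The third object repeats the first on purpose to illustrate deduplication.
RESPONSE_TEXT = """
{
  "request": {"pageUrl": "https://www.diffbot.com/products/extract/", "api": "image", "version": 3},
  "objects": [
    {"type": "image", "diffbotUri": "image|3|666824882",
     "url": "https://www.diffbot.com/assets/img/products/extract_screenshot.png",
     "naturalWidth": 897, "naturalHeight": 1023},
    {"type": "image", "diffbotUri": "image|3|1865428953",
     "url": "https://www.diffbot.com/assets/img/products/any_language.png",
     "naturalWidth": 554, "naturalHeight": 270},
    {"type": "image", "diffbotUri": "image|3|666824882",
     "url": "https://www.diffbot.com/assets/img/products/extract_screenshot.png",
     "naturalWidth": 897, "naturalHeight": 1023}
  ]
}
"""


def unique_images(response: dict) -> list:
    """Return the objects array with repeated diffbotUri values dropped."""
    seen = set()
    images = []
    for obj in response.get("objects", []):
        uri = obj.get("diffbotUri")
        if uri in seen:
            continue  # already collected this image
        seen.add(uri)
        images.append(obj)
    return images


def summarize(image: dict) -> str:
    """Format one image object as 'url (widthxheight)'."""
    return f'{image["url"]} ({image["naturalWidth"]}x{image["naturalHeight"]})'


images = unique_images(json.loads(RESPONSE_TEXT))
for image in images:
    print(summarize(image))
```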
Optional Fields
Specify each desired field (comma-delimited) in the &fields= argument. In addition to the fields listed below, more fields are available with all Extract APIs.
Field | Description |
---|---|
displayHeight | Height of image as presented in the browser (and as sized via browser/CSS, if resized). |
displayWidth | Width of image as presented in the browser (and as sized via browser/CSS, if resized). |
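As a rough illustration of the &fields= argument, the sketch below assembles a request URL in Python without sending it. Several details are assumptions not stated in this section: the endpoint `https://api.diffbot.com/v3/image`, the `token` and `url` query parameter names, and the placeholder token value. The helper name `build_image_request_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint; this section does not state it.
IMAGE_API_ENDPOINT = "https://api.diffbot.com/v3/image"


def build_image_request_url(token: str, page_url: str, fields=()) -> str:
    """Build an Image API URL, joining optional fields with commas."""
    # "token" and "url" are assumed parameter names.
    params = {"token": token, "url": page_url}
    if fields:
        params["fields"] = ",".join(fields)
    # safe="," keeps the comma delimiter readable instead of encoding it as %2C.
    return f"{IMAGE_API_ENDPOINT}?{urlencode(params, safe=',')}"


request_url = build_image_request_url(
    "YOUR_TOKEN",  # placeholder, not a real token
    "https://www.diffbot.com/products/extract/",
    fields=["displayHeight", "displayWidth"],
)
print(request_url)
```

The page URL is percent-encoded by `urlencode`, so it can safely carry its own query string.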
Already have the source HTML? POST it to Image API.
Image API supports a POST option that allows you to upload HTML or plain text for extraction. See Extract Content Not Available Online.
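A POST request of this kind might be prepared as in the following Python sketch, which builds the request object but does not send it. The endpoint, the `token` and `url` parameter names, and the `text/html` Content-Type header are assumptions here; consult Extract Content Not Available Online for the actual requirements. The helper name `build_post_request` and the sample markup are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint; this section does not state it.
IMAGE_API_ENDPOINT = "https://api.diffbot.com/v3/image"


def build_post_request(token: str, page_url: str, html: str) -> Request:
    """Prepare (but do not send) a POST carrying the page markup as the body."""
    # Assumption: the page URL is still passed as a query parameter.
    query = urlencode({"token": token, "url": page_url})
    return Request(
        f"{IMAGE_API_ENDPOINT}?{query}",
        data=html.encode("utf-8"),
        headers={"Content-Type": "text/html"},  # assumed header value
        method="POST",
    )


# Hypothetical markup, used only for illustration.
sample_html = '<html><body><img src="/cat.png" alt="A cat"></body></html>'
post_request = build_post_request("YOUR_TOKEN", "https://example.com/cats", sample_html)
print(post_request.get_method(), post_request.full_url)
```

Passing the prepared object to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a token or network access.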