The Image API identifies the primary image(s) of a submitted web page and returns comprehensive information and metadata for each image.
Test drive Image API without a token at diffbot.com/testdrive.
Response
The Image API returns data in JSON format.
Each response includes a request object, which carries request-specific metadata, and an objects array, which contains the extracted information for every object found on the submitted page.
Objects in the Image API's objects array will include the following fields:
| Field | Description |
|---|---|
| type | Type of object (always |
| url | Direct link to image file. |
|  | Title or caption of the image, if available. |
| naturalHeight | Raw image height, in pixels. |
| naturalWidth | Raw image width, in pixels. |
| humanLanguage | Returns the (spoken/human) language of the submitted page, using two-letter ISO 639-1 nomenclature. |
|  | If the image is hyperlinked, returns the destination URL. |
| pageUrl | URL of submitted page / page from which the image is extracted. |
|  | Returned if the |
| xpath | XPath expression identifying the image node. |
| diffbotUri | Unique object ID. The |
| Optional fields, available using the fields= argument | |
| displayHeight | Height of image as presented in the browser (and as sized via browser/CSS, if resized). |
| displayWidth | Width of image as presented in the browser (and as sized via browser/CSS, if resized). |
|  | Returns a top-level object ( |
|  | Comma-separated list of image-embedded metadata (e.g., EXIF, XMP, ICC Profile), if available within the image file. |
|  | Returns any key/value pairs present in the URL querystring. Items without a discrete value will be returned as |
|  | Returns a top-level array ( |
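As a rough illustration of how these fields might be consumed, the sketch below parses a response body and collects the url, naturalWidth, and naturalHeight of each image object. The sample payload is a hypothetical, trimmed-down response written for this example, and the helper name summarize_images is not part of the API.

```python
import json

# Hypothetical, trimmed-down payload shaped like an Image API response.
# Only fields documented in the table above are used.
SAMPLE_RESPONSE = """
{
  "request": {
    "pageUrl": "https://www.diffbot.com/products/extract/",
    "api": "image",
    "version": 3
  },
  "objects": [
    {
      "type": "image",
      "url": "https://www.diffbot.com/assets/img/products/any_language.png",
      "naturalHeight": 270,
      "naturalWidth": 554,
      "humanLanguage": "en"
    }
  ]
}
"""


def summarize_images(raw_json):
    """Return a (url, width, height) tuple for each image object in a response."""
    payload = json.loads(raw_json)
    summaries = []
    # A response without an "objects" array is treated as having no images.
    for obj in payload.get("objects", []):
        if obj.get("type") != "image":
            continue
        summaries.append(
            (obj.get("url"), obj.get("naturalWidth"), obj.get("naturalHeight"))
        )
    return summaries


if __name__ == "__main__":
    for url, width, height in summarize_images(SAMPLE_RESPONSE):
        print(f"{url}: {width}x{height}")
```

Reading fields with .get() rather than indexing keeps the loop tolerant of fields that are only returned when available, such as a title or caption.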
The following is an example response from a successful extraction of a product page on diffbot.com.
{
  "request": {
    "pageUrl": "https://www.diffbot.com/products/extract/",
    "api": "image",
    "version": 3
  },
  "objects": [
    {
      "xpath": "/HTML/BODY/MAIN/DIV[@id='slice-readslikehumans']/DIV[@class='container px-3 py-5 py-md-6 mx-auto']/DIV[@class='row justify-content-center']/DIV[@class='col-12 col-md d-flex justify-content-center align-items-center order-md-1']/IMG[@class='img-fluid mb-4 align-self-start']",
      "humanLanguage": "en",
      "naturalHeight": 1023,
      "diffbotUri": "image|3|666824882",
      "pageUrl": "https://www.diffbot.com/products/extract/",
      "type": "image",
      "url": "https://www.diffbot.com/assets/img/products/extract_screenshot.png",
      "naturalWidth": 897,
      "tags": [
        {
          "typeHierarchy": [
            "http://www.w3.org/2002/07/owl#Thing",
            "http://dbpedia.org/ontology/Work",
            "http://dbpedia.org/ontology/Website"
          ],
          "id": 33898,
          "label": "Website",
          "type": "http://dbpedia.org/ontology/Website",
          "uri": "https://www.diffbot.com/entity/Xd90vp_U4MJOoRHIja3quxg"
        },
        {
          "id": 1404579,
          "label": "Ring binder",
          "uri": "http://diffbot.com/entity/XYwPij6UgPdaDEY6fXSrrRA"
        }
      ]
    },
    {
      "xpath": "/HTML/BODY/MAIN/DIV[@id='slice-lookslikeahuman']/DIV[@class='container px-3 py-5 py-md-6 mx-auto']/DIV[@class='row justify-content-center']/DIV[@class='col-12 col-md d-flex justify-content-center align-items-center order-md-1']/IMG[@class='img-fluid mb-4']",
      "humanLanguage": "en",
      "naturalHeight": 270,
      "diffbotUri": "image|3|1865428953",
      "pageUrl": "https://www.diffbot.com/products/extract/",
      "type": "image",
      "url": "https://www.diffbot.com/assets/img/products/any_language.png",
      "naturalWidth": 554,
      "tags": [
        {
          "id": 5462349,
          "label": "Sachet",
          "uri": "http://diffbot.com/entity/X1_7MIKBoPpuRnAPVQXqQxA"
        },
        {
          "id": 479373,
          "label": "Eraser",
          "uri": "http://diffbot.com/entity/XGPnWNBUZPyuNyfK9rJr6vQ"
        },
        {
          "id": 57260,
          "label": "Envelope",
          "uri": "http://diffbot.com/entity/XK5gEV93iP6SqKMDjDr0YBQ"
        }
      ]
    },
    {
      "xpath": "/HTML/BODY/MAIN/DIV[@id='slice-one-click-crawling']/DIV[@class='container px-3 py-5 py-md-6 mx-auto']/DIV[@class='row justify-content-center']/DIV[@class='col-12 offset-md-1 col-md d-flex justify-content-center align-items-center order-md-2']/IMG[@class='img-fluid mb-4']",
      "humanLanguage": "en",
      "naturalHeight": 372,
      "diffbotUri": "image|3|1699007329",
      "pageUrl": "https://www.diffbot.com/products/extract/",
      "type": "image",
      "url": "https://www.diffbot.com/assets/img/products/analyze_two.png",
      "naturalWidth": 451,
      "tags": [
        {
          "id": 268267,
          "label": "Tray",
          "uri": "http://diffbot.com/entity/XwXgQ3A7sNmypX2VX3oZCgg"
        },
        {
          "id": 467731,
          "label": "Spatula",
          "uri": "http://diffbot.com/entity/XZohZtDoDPvq1wusJQCSZZA"
        },
        {
          "id": 2649730,
          "label": "Measuring cup",
          "uri": "http://diffbot.com/entity/XhwdQ4HnKOsCVrf9YU3hyWw"
        }
      ]
    }
  ]
}
Optional Fields
Specify each desired field, comma-delimited, in the &fields= argument. In addition to the fields listed below, further optional fields are available across all Extract APIs.
| Field | Description |
|---|---|
| displayHeight | Height of image as presented in the browser (and as sized via browser/CSS, if resized). |
| displayWidth | Width of image as presented in the browser (and as sized via browser/CSS, if resized). |
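A request URL that asks for these optional fields could be assembled roughly as follows. This is a sketch under assumptions: the endpoint URL and the token and url parameter names are not stated in this section and are assumed here, and build_image_api_url is a hypothetical helper. Only the comma-delimited fields argument comes from the text above.

```python
from urllib.parse import urlencode

# Assumption: this section does not give the endpoint; this value is illustrative.
IMAGE_API_ENDPOINT = "https://api.diffbot.com/v3/image"


def build_image_api_url(token, page_url, fields=()):
    """Build a GET URL, joining any optional field names with commas."""
    # Assumption: "token" and "url" are the query parameter names.
    params = {"token": token, "url": page_url}
    if fields:
        params["fields"] = ",".join(fields)
    # urlencode percent-encodes the values, including the comma separator.
    return f"{IMAGE_API_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(
        build_image_api_url(
            "YOUR_TOKEN",
            "https://www.diffbot.com/products/extract/",
            fields=("displayHeight", "displayWidth"),
        )
    )
```

The helper only builds the URL string and sends nothing, so it can be checked without a token.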
Already have the source HTML? POST it to Image API.
Image API supports a POST option that allows you to upload HTML or plain text for extraction. See Extract Content Not Available Online.
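The POST option might be prepared along these lines. This is a sketch, not a confirmed recipe: the endpoint, the token and url query parameters, and the text/html Content-Type header are all assumptions not stated in this section, and build_post_request is a hypothetical helper. The request object is constructed but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: this section does not give the endpoint; this value is illustrative.
IMAGE_API_ENDPOINT = "https://api.diffbot.com/v3/image"


def build_post_request(token, page_url, html):
    """Prepare (but do not send) a POST request carrying HTML for extraction."""
    # Assumption: the page URL is still passed as a query parameter so that
    # relative links in the uploaded markup can be resolved.
    query = urlencode({"token": token, "url": page_url})
    return Request(
        f"{IMAGE_API_ENDPOINT}?{query}",
        data=html.encode("utf-8"),
        # Assumption: HTML uploads use text/html; plain text would use text/plain.
        headers={"Content-Type": "text/html"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_post_request(
        "YOUR_TOKEN",
        "https://example.com/page",
        "<html><body><img src='/a.png'></body></html>",
    )
    print(req.get_method(), req.full_url)
```

Passing the prepared request to urllib.request.urlopen would submit it; that step is left out so the example runs without network access or a token.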