Discussion

Automatically structure and extract entire threads of reviews/comments from articles, product pages, and forum threads.

The Discussion API automatically structures and extracts entire threads or lists of reviews/comments from most discussion pages, forums, and similarly structured web pages.

Test-drive the Discussion API without a trial token at diffbot.com/testdrive.

Response

The Discussion API returns data in JSON format.

Each response includes a request object, which contains request-specific metadata, and an objects array, which contains the extracted information for all objects on the submitted page.
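A minimal Python sketch of building a request and splitting the response into its two parts. It assumes the v3 endpoint https://api.diffbot.com/v3/discussion with token and url query parameters. The names build_discussion_url and split_response are hypothetical helpers, not part of any Diffbot SDK.

```python
import json
from urllib.parse import urlencode

# Assumed v3 endpoint; confirm against your account's API documentation.
DISCUSSION_ENDPOINT = "https://api.diffbot.com/v3/discussion"


def build_discussion_url(token: str, page_url: str) -> str:
    """Build a GET URL; the token and target page URL are query parameters."""
    query = urlencode({"token": token, "url": page_url})
    return f"{DISCUSSION_ENDPOINT}?{query}"


def split_response(body: str) -> tuple[dict, list[dict]]:
    """Return (request metadata, objects array) from a raw JSON response body."""
    data = json.loads(body)
    # Missing keys fall back to empty containers so callers can iterate safely.
    return data.get("request", {}), data.get("objects", [])


sample = '{"request": {"api": "discussion", "version": 3}, "objects": [{"type": "discussion", "numPosts": 13}]}'
meta, objects = split_response(sample)
print(meta["api"], objects[0]["numPosts"])  # discussion 13
```

The fetch itself is left out so the sketch runs offline; any HTTP client can GET the URL and pass the body to split_response.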

The Discussion API is also bundled with the Article and Product APIs, which use it to extract comment or review data when available. In those APIs, discussion data is returned within a nested discussion object instead of an objects array.
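Because discussion data can arrive either as an item of the objects array (Discussion API) or as a nested discussion object (Article and Product APIs), a small normalizing helper lets downstream code handle both shapes. This sketch assumes the nested object sits under a discussion key on each Article or Product object; extract_discussions is a hypothetical name.

```python
def extract_discussions(response: dict) -> list[dict]:
    """Collect discussion objects from a Discussion, Article, or Product API response.

    Discussion API: discussions are items of the top-level objects array.
    Article/Product API: each object may carry a nested "discussion" object (assumed key).
    """
    discussions = []
    for obj in response.get("objects", []):
        if obj.get("type") == "discussion":
            discussions.append(obj)
        elif isinstance(obj.get("discussion"), dict):
            discussions.append(obj["discussion"])
    return discussions


article_response = {
    "objects": [
        {"type": "article", "discussion": {"type": "discussion", "numPosts": 2, "posts": []}}
    ]
}
print(len(extract_discussions(article_response)))  # 1
```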

Objects in the Discussion API's objects array, and the nested discussion object in the Article and Product APIs, include the following fields:

Field | Description
type | Type of object (always discussion).
pageUrl | URL of the submitted page, i.e., the page from which the discussion is extracted.
resolvedPageUrl | Returned if the pageUrl redirects to another URL.
title | Title of the discussion.
numPosts | Number of individual posts in the thread.
posts | Array of individual posts. Each post contains the following fields:
  type | Type of element (always post).
  id | ID of the individual post. The first post of a thread has an ID of 0.
  parentId | ID of the parent post, if the post is a reply or response.
  text | Full text of the extracted post.
  html | Diffbot-normalized HTML of the extracted post. See Normalized HTML Fields for a breakdown of the elements and attributes returned.
  tags | If the post is long enough, an array of tags generated from its content.
  humanLanguage | Spoken/human language of the post, as a two-letter ISO 639-1 code.
  images | Images detected within the post content, returned as a separate array. Its fields are the same as those of the Article API's images array.
  date | Date of the post, in most cases normalized to RFC 1123 (HTTP/1.1) format.
  author | Name or username of the post author.
  authorUrl | URL of the author's profile page, if available.
  pageUrl | URL of the page on which the post was found.
  diffbotUri | Internal ID used for indexing.
tags | Array of tags/entities generated from analysis of all extracted posts and cross-referenced with DBpedia and other data sources.
participants | Number of unique participants in the discussion thread or comments.
numPages | Number of thread pages concatenated to form the posts array. Use maxPages to set how many pages to concatenate. More on automatic concatenation.
nextPage | If the discussion spans multiple pages, the URL of the next page.
nextPages | Array of all page URLs concatenated in a multipage discussion. More on automatic concatenation.
provider | Discussion service provider (e.g., Disqus, Facebook), if known.
humanLanguage | Spoken/human language of the discussion or comment thread, as a two-letter ISO 639-1 code.
rssUrl | URL of the discussion's RSS feed, if available.
diffbotUri | Unique object ID, generated from the values of various Discussion fields. It uniquely identifies the object and can be used for deduplication.

Optional fields (available using the fields= argument)
sentiment | Returns a sentiment score for each individual post, ranging from -1.0 (very negative) to 1.0 (very positive).
links | Returns a top-level object (links) containing all hyperlinks found on the page.
meta | Returns a top-level object (meta) containing the full contents of the page's meta tags, including sub-arrays for OpenGraph tags, Twitter Card metadata, schema.org microdata, and, if available, oEmbed metadata.
querystring | Returns any key/value pairs present in the URL query string. Items without a discrete value are returned as true.
breadcrumb | Returns a top-level array (breadcrumb) of URLs and link text from the page's breadcrumbs.
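Because the top-level diffbotUri uniquely identifies a discussion, it can serve as a deduplication key when the same thread is extracted more than once, for example by overlapping crawls. A minimal sketch; dedupe_by_uri is a hypothetical helper, and the sample values are taken from the example response on this page.

```python
def dedupe_by_uri(discussions):
    """Keep the first occurrence of each discussion, keyed on diffbotUri."""
    seen = set()
    unique = []
    for obj in discussions:
        uri = obj.get("diffbotUri")
        if uri is None:
            # Without an ID there is nothing to compare against, so keep the object.
            unique.append(obj)
        elif uri not in seen:
            seen.add(uri)
            unique.append(obj)
    return unique


first = {"diffbotUri": "discussion|3|-870809033", "numPosts": 13}
repeat = {"diffbotUri": "discussion|3|-870809033", "numPosts": 13}
print(len(dedupe_by_uri([first, repeat])))  # 1
```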

The following is an example response from a successful extraction of comments on a Reddit post.

{
  "request": {
    "pageUrl": "https://old.reddit.com/r/dataisbeautiful/comments/tbvdhu/oc_66_of_top_50_russian_exposed_companies_have/",
    "api": "discussion",
    "version": 3
  },
  "objects": [
    {
      "numPages": 1,
      "humanLanguage": "en",
      "confidence": 0.05500000089407453,
      "diffbotUri": "discussion|3|-870809033",
      "pageUrl": "https://old.reddit.com/r/dataisbeautiful/comments/tbvdhu/oc_66_of_top_50_russian_exposed_companies_have/",
      "numPosts": 13,
      "type": "discussion",
      "title": "[OC] 66% of Top 50 Russian Exposed Companies Have Announced Sanctions",
      "posts": [
        {
          "date": "Fri, 11 Mar 2022 00:00:00 GMT",
          "images": [
            {
              "naturalHeight": 767,
              "width": 457,
              "diffbotUri": "image|3|-804821395",
              "pageUrl": "https://old.reddit.com/r/dataisbeautiful/comments/tbvdhu/oc_66_of_top_50_russian_exposed_companies_have/",
              "url": "https://preview.redd.it/l76k59t8jsm81.png?width=457&auto=webp&s=632efa1f24e607358bbec99c161a6aa579aebfe1",
              "naturalWidth": 457,
              "height": 767
            }
          ],
          "humanLanguage": "en",
          "author": "hicheoo",
          "authorUrl": "https://old.reddit.com/user/hicheoo",
          "diffbotUri": "post|3|29462830",
          "html": "<figure><a href=\"https://i.redd.it/l76k59t8jsm81.png\"><img src=\"https://preview.redd.it/l76k59t8jsm81.png?width=457&auto=webp&s=632efa1f24e607358bbec99c161a6aa579aebfe1\"></img></a></figure>\n<h2>Want to add to the discussion?</h2>\n<p>Post a comment!</p>\n<p>Create an account</p>",
          "pageUrl": "https://old.reddit.com/r/dataisbeautiful/comments/tbvdhu/oc_66_of_top_50_russian_exposed_companies_have/",
          "id": 0,
          "text": "Want to add to the discussion?\nPost a comment!\n\n \nCreate an account",
          "type": "post",
          "title": "[OC] 66% of Top 50 Russian Exposed Companies Have Announced Sanctions"
        },
        {
          "date": "Fri, 11 Mar 2022 00:00:00 GMT",
          "humanLanguage": "en",
          "author": "not_mig",
          "authorUrl": "https://old.reddit.com/user/not_mig",
          "diffbotUri": "post|3|-720375378",
          "html": "<p>What's the difference between blue, yellow, and green?</p>",
          "pageUrl": "https://old.reddit.com/r/dataisbeautiful/comments/tbvdhu/oc_66_of_top_50_russian_exposed_companies_have/",
          "id": 1,
          "text": "What's the difference between blue, yellow, and green?",
          "type": "post",
          "parentId": 0
        },
        {
          "date": "Fri, 11 Mar 2022 00:00:00 GMT",
          "humanLanguage": "en",
          "author": "hicheoo",
          "authorUrl": "https://old.reddit.com/user/hicheoo",
          "diffbotUri": "post|3|-148816221",
          "html": "<p>They're exemptions. I should've clarified up top, but they're basically all in the description.</p>\n<p>Green: Typical Sanctions<br>\n Yellow: Sanctions, but might be a PR move.<br>\n Blue: Healthcare</p>",
          "pageUrl": "https://old.reddit.com/r/dataisbeautiful/comments/tbvdhu/oc_66_of_top_50_russian_exposed_companies_have/",
          "id": 2,
          "text": "They're exemptions. I should've clarified up top, but they're basically all in the description.\nGreen: Typical Sanctions\nYellow: Sanctions, but might be a PR move.\nBlue: Healthcare",
          "type": "post",
          "parentId": 1
        },
        {
          "date": "Fri, 11 Mar 2022 00:00:00 GMT",
          "humanLanguage": "en",
          "author": "Zealousideal-Lie7255",
          "authorUrl": "https://old.reddit.com/user/Zealousideal-Lie7255",
          "diffbotUri": "post|3|-683402068",
          "html": "<p>A lot of oil service companies have no reported sanctions. Like Schlumberger, Baker Hughes. Some Chinese companies too.</p>",
          "pageUrl": "https://old.reddit.com/r/dataisbeautiful/comments/tbvdhu/oc_66_of_top_50_russian_exposed_companies_have/",
          "id": 3,
          "text": "A lot of oil service companies have no reported sanctions. Like Schlumberger, Baker Hughes. Some Chinese companies too.",
          "type": "post",
          "parentId": 0
        },
        {
          "date": "Fri, 11 Mar 2022 00:00:00 GMT",
          "humanLanguage": "en",
          "author": "varnima",
          "authorUrl": "https://old.reddit.com/user/varnima",
          "diffbotUri": "post|3|-603833918",
          "html": "<p>JetBrains changed and imposed sanctions <a href=\"https://blog.jetbrains.com/blog/2022/03/11/jetbrains-statement-on-ukraine/\">https://blog.jetbrains.com/blog/2022/03/11/jetbrains-statement-on-ukraine/</a></p>",
          "pageUrl": "https://old.reddit.com/r/dataisbeautiful/comments/tbvdhu/oc_66_of_top_50_russian_exposed_companies_have/",
          "id": 4,
          "text": "JetBrains changed and imposed sanctions https://blog.jetbrains.com/blog/2022/03/11/jetbrains-statement-on-ukraine/",
          "type": "post",
          "parentId": 0
        },
        {
          "date": "Fri, 11 Mar 2022 00:00:00 GMT",
          "humanLanguage": "en",
          "author": "hicheoo",
          "authorUrl": "https://old.reddit.com/user/hicheoo",
          "diffbotUri": "post|3|-296888207",
          "html": "<p>Yeah, they're green in the chart.</p>",
          "pageUrl": "https://old.reddit.com/r/dataisbeautiful/comments/tbvdhu/oc_66_of_top_50_russian_exposed_companies_have/",
          "id": 5,
          "text": "Yeah, they're green in the chart.",
          "type": "post",
          "parentId": 4
        },
        {
          "date": "Fri, 11 Mar 2022 00:00:00 GMT",
          "humanLanguage": "en",
          "author": "hicheoo",
          "authorUrl": "https://old.reddit.com/user/hicheoo",
          "diffbotUri": "post|3|624793084",
          "html": "<p><strong>Sources:</strong> - Diffbot Sanctions Tracker (<a href=\"https://www.diffbot.com/insights/every-company-affected-by-sanctions/\">https://www.diffbot.com/insights/every-company-affected-by-sanctions/</a>) - Diffbot Knowledge Graph (more detail on query below)</p>\n<p><strong>Data Viz Tool:</strong> Infogram</p>\n<p><strong>Disclaimer:</strong> I work for Diffbot</p>\n<p>I started by querying the Knowledge Graph for people who live in Russia but work for a non-Russian company. Faceting this query by their employer provides me with a list of non-Russian companies ranked by # of Russian employees.</p>\n<p><code>\ntype:Person location.country.name:&quot;Russia&quot; employments.{employer.{location.country.name!=&quot;Russia&quot; nbLocations&gt;0} isCurrent:true} facet:employments.{employer.name isCurrent:true}\n</code></p>\n<p>This data underrepresents actual employment figures, as there are many employees who do not maintain an internet presence linking them to their employer. Underrepresentation should be fairly equal across all companies, and relative position in the rankings should be accurate.</p>",
          "pageUrl": "https://old.reddit.com/r/dataisbeautiful/comments/tbvdhu/oc_66_of_top_50_russian_exposed_companies_have/",
          "id": 6,
          "text": "Sources: - Diffbot Sanctions Tracker (https://www.diffbot.com/insights/every-company-affected-by-sanctions/) - Diffbot Knowledge Graph (more detail on query below)\nData Viz Tool: Infogram\nDisclaimer: I work for Diffbot\nI started by querying the Knowledge Graph for people who live in Russia but work for a non-Russian company. Faceting this query by their employer provides me with a list of non-Russian companies ranked by # of Russian employees.\ntype:Person location.country.name:\"Russia\" employments.{employer.{location.country.name!=\"Russia\" nbLocations>0} isCurrent:true} facet:employments.{employer.name isCurrent:true}\nThis data underrepresents actual employment figures, as there are many employees who do not maintain an internet presence linking them to their employer. Underrepresentation should be fairly equal across all companies, and relative position in the rankings should be accurate.",
          "type": "post",
          "parentId": 0
        },
        {
          "date": "Fri, 11 Mar 2022 00:00:00 GMT",
          "humanLanguage": "en",
          "author": "zzzmick",
          "authorUrl": "https://old.reddit.com/user/zzzmick",
          "diffbotUri": "post|3|-130810969",
          "html": "<p>epam had over 10k employees in Russia</p>",
          "pageUrl": "https://old.reddit.com/r/dataisbeautiful/comments/tbvdhu/oc_66_of_top_50_russian_exposed_companies_have/",
          "id": 7,
          "text": "epam had over 10k employees in Russia",
          "type": "post",
          "parentId": 0
        },
        {
          "date": "Fri, 11 Mar 2022 00:00:00 GMT",
          "humanLanguage": "en",
          "author": "hicheoo",
          "authorUrl": "https://old.reddit.com/user/hicheoo",
          "diffbotUri": "post|3|-1458692070",
          "html": "<p>Yup. The data underrepresents actual employment figures, as there are many employees who do not maintain an internet presence linking them to their employer. Underrepresentation should be fairly equal across all companies, and relative position in the rankings should be accurate.</p>",
          "pageUrl": "https://old.reddit.com/r/dataisbeautiful/comments/tbvdhu/oc_66_of_top_50_russian_exposed_companies_have/",
          "id": 8,
          "text": "Yup. The data underrepresents actual employment figures, as there are many employees who do not maintain an internet presence linking them to their employer. Underrepresentation should be fairly equal across all companies, and relative position in the rankings should be accurate.",
          "type": "post",
          "parentId": 7
        },
        {
          "date": "Fri, 11 Mar 2022 00:00:00 GMT",
          "humanLanguage": "en",
          "author": "JanitorKarl",
          "authorUrl": "https://old.reddit.com/user/JanitorKarl",
          "diffbotUri": "post|3|-149138223",
          "html": "<p>Schlumberger and Baker Hughes are both in the oilfield services industry.</p>",
          "pageUrl": "https://old.reddit.com/r/dataisbeautiful/comments/tbvdhu/oc_66_of_top_50_russian_exposed_companies_have/",
          "id": 9,
          "text": "Schlumberger and Baker Hughes are both in the oilfield services industry.",
          "type": "post",
          "parentId": 0
        },
        {
          "date": "Fri, 11 Mar 2022 00:00:00 GMT",
          "humanLanguage": "en",
          "author": "flumenia",
          "authorUrl": "https://old.reddit.com/user/flumenia",
          "diffbotUri": "post|3|889762151",
          "html": "<p>What if Microsoft stops to extend licenses of Microsoft Office to Russia? That would make the biggest impact, I guess</p>",
          "pageUrl": "https://old.reddit.com/r/dataisbeautiful/comments/tbvdhu/oc_66_of_top_50_russian_exposed_companies_have/",
          "id": 10,
          "text": "What if Microsoft stops to extend licenses of Microsoft Office to Russia? That would make the biggest impact, I guess",
          "type": "post",
          "parentId": 0
        },
        {
          "date": "Fri, 11 Mar 2022 00:00:00 GMT",
          "humanLanguage": "en",
          "author": "Imperial_Empirical",
          "authorUrl": "https://old.reddit.com/user/Imperial_Empirical",
          "diffbotUri": "post|3|-179317804",
          "html": "<p>Putin ordered the development of Russian alternatives after the Crimean annexation due to dependancy/spying fears. I believe from 2016 onwards Microsoft was largely fased out internally.</p>",
          "pageUrl": "https://old.reddit.com/r/dataisbeautiful/comments/tbvdhu/oc_66_of_top_50_russian_exposed_companies_have/",
          "id": 11,
          "text": "Putin ordered the development of Russian alternatives after the Crimean annexation due to dependancy/spying fears. I believe from 2016 onwards Microsoft was largely fased out internally.",
          "type": "post",
          "parentId": 10
        },
        {
          "date": "Fri, 11 Mar 2022 00:00:00 GMT",
          "humanLanguage": "en",
          "author": "Nightblood83",
          "authorUrl": "https://old.reddit.com/user/Nightblood83",
          "diffbotUri": "post|3|-901046006",
          "html": "<p>A lot of accountants for commies...</p>",
          "pageUrl": "https://old.reddit.com/r/dataisbeautiful/comments/tbvdhu/oc_66_of_top_50_russian_exposed_companies_have/",
          "id": 12,
          "text": "A lot of accountants for commies...",
          "type": "post",
          "parentId": 0
        }
      ],
      "tags": [
        {
          "score": 0.8428076505661011,
          "count": 5,
          "label": "economic sanctions",
          "uri": "https://diffbot.com/entity/EWnXSPtH6Osi0pmx8-WPKAg",
          "rdfTypes": [
            "http://dbpedia.org/ontology/Miscellaneous"
          ]
        }
      ],
      "participants": 9,
      "rssUrl": "https://old.reddit.com/r/dataisbeautiful/comments/tbvdhu/oc_66_of_top_50_russian_exposed_companies_have/.rss"
    }
  ]
}
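The id and parentId fields are enough to rebuild the reply hierarchy. The sketch below treats posts without a parentId as roots and prints authors as an indented outline, using the first four posts of the example response above (other fields omitted). build_reply_tree and print_thread are hypothetical helpers; the recursive walk is intended for ordinary thread depths.

```python
from collections import defaultdict


def build_reply_tree(posts):
    """Return (root ids, mapping of parentId -> ids of direct replies in response order)."""
    children = defaultdict(list)
    roots = []
    for post in posts:
        parent = post.get("parentId")
        if parent is None:
            roots.append(post["id"])  # e.g., the first post (id 0) has no parentId
        else:
            children[parent].append(post["id"])
    return roots, children


def print_thread(posts):
    """Print post authors as an indented outline following the reply hierarchy."""
    by_id = {post["id"]: post for post in posts}
    roots, children = build_reply_tree(posts)

    def walk(post_id, depth):
        print("  " * depth + by_id[post_id]["author"])
        for child_id in children.get(post_id, []):
            walk(child_id, depth + 1)

    for root_id in roots:
        walk(root_id, 0)


posts = [
    {"id": 0, "author": "hicheoo"},
    {"id": 1, "author": "not_mig", "parentId": 0},
    {"id": 2, "author": "hicheoo", "parentId": 1},
    {"id": 3, "author": "Zealousideal-Lie7255", "parentId": 0},
]
print_thread(posts)
# hicheoo
#   not_mig
#     hicheoo
#   Zealousideal-Lie7255
```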

Optional Fields

Specify each desired field in the &fields= argument as a comma-delimited list. In addition to the fields listed below, more fields are available with all Extract APIs.

Field | Description
sentiment | Returns a sentiment score of each individual post, a value ranging from -1.0 (very negative) to 1.0 (very positive).
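A sketch of requesting optional fields and summarizing the per-post sentiment scores. It assumes the v3 endpoint https://api.diffbot.com/v3/discussion and that each post in the response carries its score under a sentiment key; build_url_with_fields and average_sentiment are hypothetical names.

```python
from statistics import mean
from urllib.parse import urlencode


def build_url_with_fields(token, page_url, fields):
    """Build a Discussion API URL that requests optional fields (assumed v3 endpoint)."""
    # Optional fields are passed together as one comma-delimited fields= value.
    query = urlencode({"token": token, "url": page_url, "fields": ",".join(fields)})
    return "https://api.diffbot.com/v3/discussion?" + query


def average_sentiment(discussion):
    """Average the per-post sentiment scores (-1.0 to 1.0); None if no post has one."""
    scores = [post["sentiment"] for post in discussion.get("posts", []) if "sentiment" in post]
    return mean(scores) if scores else None


print(build_url_with_fields("YOUR_TOKEN", "https://example.com/thread", ["sentiment", "meta"]))
print(average_sentiment({"posts": [{"sentiment": 0.5}, {"sentiment": -0.25}, {}]}))  # 0.125
```

Posts without a score are skipped rather than counted as neutral, so a partially scored thread is not pulled toward 0.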

Already have the source HTML? POST it to the Discussion API.

The Discussion API supports a POST option that allows you to upload HTML or plain text for extraction. See Extract Content Not Available Online.
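A sketch of preparing such a POST with Python's standard library. It assumes the endpoint https://api.diffbot.com/v3/discussion accepts the raw HTML as the request body with a Content-Type of text/html, and that the url parameter is still passed to give the page context; confirm both against Extract Content Not Available Online before relying on them. build_post_request is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_request(token, page_url, html):
    """Prepare (but do not send) a POST of raw HTML to the Discussion API.

    Assumptions: the body is sent as text/html, and the url query parameter
    identifies the page the HTML came from.
    """
    query = urlencode({"token": token, "url": page_url})
    return Request(
        "https://api.diffbot.com/v3/discussion?" + query,
        data=html.encode("utf-8"),
        headers={"Content-Type": "text/html"},
        method="POST",
    )


req = build_post_request("YOUR_TOKEN", "https://example.com/thread", "<html>...</html>")
print(req.get_method())  # POST
# To send it: urllib.request.urlopen(req) returns the HTTP response with the JSON body.
```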
