What Organization Classifications are supported in the graph?

Industry Classifications Supported by the Diffbot Knowledge Graph
Each organization in the Diffbot Knowledge Graph can be associated with multiple industry classifications from both an internal taxonomy and standard industry registries.

Diffbot Classification
The Diffbot Classification taxonomy currently contains over 300 industry labels organized into a four-level hierarchy. When the KG is built, one or more industry labels are inferred for each organization. These labels are currently exposed in the KG through three fields:

  • Organization.industries (type: String) - the industry labels are presented as textual labels.
  • Organization.categories (type: LinkedEntity) - the industry labels are presented as linked entities, but the linked pages are not currently populated with more information about the industry label.
  • Organization.diffbotClassification (type: ClassificationCode) - the industry labels are presented as classification codes, which provide additional information about each industry. Each entry in this list includes a boolean "isPrimary" attribute and a level number. The list is also sorted so that the more specific primary industry labels appear first.

Please note: these three fields contain the same labels. They differ only in value type, which determines how much information accompanies each label.
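As a rough illustration of how the "isPrimary" attribute and level number might be used, the sketch below filters an organization record down to its primary industry labels. It works on a plain dictionary rather than calling any Diffbot API. The "name" and "level" keys, the helper name primary_labels, and the sample labels are assumptions made for this example; only the field name diffbotClassification and the "isPrimary" attribute come from the description above.

```python
from typing import Any


def primary_labels(org: dict[str, Any]) -> list[tuple[str, int]]:
    """Return (name, level) for every entry flagged as primary.

    The list is described as already sorted with the more specific
    primary labels first, so the original order is preserved here.
    The "name" and "level" keys are assumed, not confirmed.
    """
    entries = org.get("diffbotClassification") or []
    return [(e["name"], e["level"]) for e in entries if e.get("isPrimary")]


# Hypothetical record: the labels and levels are illustrative only.
sample_org = {
    "diffbotClassification": [
        {"name": "Software Companies", "level": 2, "isPrimary": True},
        {"name": "Technology Companies", "level": 1, "isPrimary": True},
        {"name": "Consulting Companies", "level": 1, "isPrimary": False},
    ]
}

print(primary_labels(sample_org))
```

Because the function only reads keys from a dictionary, it can be adapted to the actual response shape once that has been checked against the ontology documentation.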

Descriptors
In addition to the industry classification fields, another field lists free-form textual attributes used to describe organizations in the Diffbot Knowledge Graph. These attributes are extracted from the web and usually correspond to breadcrumbs or categories that specific sources use to organize their content. They provide additional context about an organization's activities, products, and services, and can be used as fine-grained categories to fill gaps in the Diffbot Classification taxonomy.

Organization.descriptors (type: String) - free-form textual attributes used to describe the organization across various websites
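One way descriptors could serve as fine-grained categories is a simple text match over the list. The following is a minimal sketch, assuming the descriptors field arrives as a list of strings on an organization record. The helper name has_descriptor and the sample values are hypothetical.

```python
from typing import Any


def has_descriptor(org: dict[str, Any], term: str) -> bool:
    """Case-insensitive substring match over the organization's descriptors."""
    needle = term.casefold()
    return any(needle in d.casefold() for d in org.get("descriptors") or [])


# Hypothetical record: the descriptor values are illustrative only.
sample_org = {"descriptors": ["Cloud Computing", "Developer Tools", "SaaS"]}

print(has_descriptor(sample_org, "cloud"))   # matches "Cloud Computing"
print(has_descriptor(sample_org, "retail"))  # no descriptor contains "retail"
```

Since descriptors are free-form, a substring match like this is deliberately loose and may need tightening for terms that appear inside unrelated phrases.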

Standard Industry Registries
In addition to the industry classification labels mentioned above, the Diffbot Knowledge Graph also includes classification codes for various standard industry registries. These codes provide a standardized way of referring to industries and can be useful for cross-referencing and comparing organizations with external sources.

We populate these fields largely by extracting values from the web. The exceptions are SIC and NAICS codes, which are also inferred as previously described, and NACE and ISIC codes, which we also enrich with possible crosswalks from NAICS. The original codes and names of each classification code are retained when English translations of the names are not available.

All of the following fields have type ClassificationCode:

  • The Standard Industrial Classification (SIC) - sicClassification - is a system for classifying industries by a four-digit code. We use the version adopted by the U.S. Securities and Exchange Commission.
  • The International Standard Industrial Classification of All Economic Activities (ISIC) - iSicClassification - was developed by the UN as a standard way of classifying economic activities into 4-digit group codes. We use ISIC Rev. 4 (Edition 2016).
  • The North American Industry Classification System (NAICS) - naicsClassification - is the standard used by United States Federal statistical agencies in classifying business establishments for the purpose of collecting, analyzing, and publishing statistical data related to the U.S. business economy.
  • A Merchant Category Code (MCC) - mccClassification - is a four-digit number listed in ISO 18245 for retail financial services. An MCC is used to classify a business by the types of goods or services it provides.
  • The Statistical classification of economic activities in the European Community (NACE) - naceClassification - is a 4-digit classification providing the framework for collecting and presenting a large range of statistical data. We use NACE Rev. 2 (2008). Values are inferred from iSicClassification.
  • The Australian and New Zealand Standard Industrial Classification (ANZSIC) - anzSicClassification - is the standard classification used in Australia and New Zealand for the collection, compilation and publication of statistics by industry.
  • Ukrainian Economic Activities Classification System code (KVED) - kvedClassification.
  • The U.K. Standard Industrial Classification (SIC) system - ukSicClassification - is the system used by Companies House, which uses a condensed version of the full list of codes available from the Office for National Statistics (ONS).
  • Russian Economic Activities Classification System code (OKVED) - okvedClassification.
  • NAF - nafClassification - is the French national statistical classification of business activities.
  • Norway statistical classification (SSB) - ssbClassification - is a statistical standard used in Norway that splits the economy into sectors on the basis of groups of homogeneous institutional units.
  • Thailand Standard Industrial Classification (TSIC) - tsicClassification.
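For cross-referencing with external sources, it can help to gather whatever registry codes an organization has into one mapping. The sketch below does this over a plain dictionary using the field names listed above. The "code" and "name" keys inside each entry and the helper name registry_codes are assumptions for illustration, not a confirmed response shape.

```python
from typing import Any

# Field names as listed in this article.
REGISTRY_FIELDS = [
    "sicClassification",
    "iSicClassification",
    "naicsClassification",
    "mccClassification",
    "naceClassification",
    "anzSicClassification",
    "kvedClassification",
    "ukSicClassification",
    "okvedClassification",
    "nafClassification",
    "ssbClassification",
    "tsicClassification",
]


def registry_codes(org: dict[str, Any]) -> dict[str, list[str]]:
    """Map each populated registry field to its list of codes.

    Assumes every entry is a dict with a "code" key; fields that are
    missing or empty are skipped.
    """
    result: dict[str, list[str]] = {}
    for field in REGISTRY_FIELDS:
        codes = [e["code"] for e in org.get(field) or [] if "code" in e]
        if codes:
            result[field] = codes
    return result


# Hypothetical record for a software publisher; values are illustrative.
sample_org = {
    "sicClassification": [{"code": "7372", "name": "Services-Prepackaged Software"}],
    "naicsClassification": [{"code": "511210", "name": "Software Publishers"}],
    "mccClassification": [],
}

print(registry_codes(sample_org))
```

Codes are kept as strings so that leading zeros and non-numeric codes in national registries are not lost.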

See:
Diffbot Documentation: Organization industries, KG Ontology Organization Entity