Filtering Fields

Filter the results of DQL or Bulk Enhance API requests to just the fields you need.

You can specify the filter parameter with the DQL and Bulk Enhance APIs to return a subset of fields in the JSON response.

Note that this is not the same as filtering results from the Knowledge Graph, which relies on DQL. Instead, the filter parameter constrains which fields of each entity record appear in the DQL or Bulk Enhance response.

Easy Mode

For simple filtering, provide a space-delimited, URL-encoded list of the fields you want DQL or Bulk Enhance to return in the filter parameter. For example, &filter=name%20description.
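
As a minimal sketch, the Easy Mode value can be built by joining the field names with spaces and percent-encoding the result. The helper name build_simple_filter is hypothetical and not part of any Diffbot client; only the Python standard library is used.

```python
from urllib.parse import quote


def build_simple_filter(fields):
    """Join field names with spaces and percent-encode them for a URL.

    Hypothetical helper, for illustration only.
    """
    # safe="" makes quote() encode every reserved character; a space becomes %20.
    return quote(" ".join(fields), safe="")


print("&filter=" + build_simple_filter(["name", "description"]))
```

The printed value matches the &filter=name%20description example above.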

Advanced Mode

For more advanced cases, such as specifying how many of a list of industries to return, or the ideal employment record, you can use JsonPath. We've implemented a variant of the original JsonPath specification for our use case, though the guide below follows most of the language of the original spec.

Basic Structure of Path

JsonPath expressions always refer to a JSON structure in the same way that XPath expressions are used in combination with an XML document. The "root member object" in JsonPath is always referred to as $, regardless of whether it is an object or an array.

JsonPath expressions can use:

  1. Dot-notation, when the path segment matches the [a-zA-Z0-9_]* pattern
    • Example: $.location.country.name gets only the country name from the primary location.
  2. Bracket-notation
    • Example: $['locations']['country']['name'] gets only the country names from all locations.
  3. Wildcard operator to match a single node
    • Example: $.locations.*.name
    • Example: $['locations'][*]['name']
  4. Recursive-descent operator to match any number of interleaving nodes (from E4X)
    • Example: $.locations..name
    • Example: $['locations']..['name']

🚧

You will be charged credits for executing the query below.

You can control the number of credits by changing the size parameter. Start with size=5 to get an idea of the output. Substitute YOURDIFFBOTTOKEN with your Diffbot token in the script below.

TOKEN=YOURDIFFBOTTOKEN
curl --request GET \
     --url "https://kg.diffbot.com/kg/dql_endpoint?token=${TOKEN}&size=5&query=type%3AOrganization&filter=%24.location.country.name" \
     --header 'Accept: application/json'
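
For readers who prefer to assemble the request URL in code, the sketch below builds the same URL as the curl command using only the Python standard library. The function name build_dql_url is hypothetical; the endpoint and parameters are copied from the example above. The sketch only constructs the URL and sends nothing, so it consumes no credits by itself.

```python
from urllib.parse import quote, urlencode

# Endpoint copied from the curl example above.
DQL_ENDPOINT = "https://kg.diffbot.com/kg/dql_endpoint"


def build_dql_url(token, query, json_path_filter, size=5):
    """Return a DQL request URL with every parameter percent-encoded.

    Hypothetical helper, for illustration only.
    """
    params = {
        "token": token,
        "size": size,
        "query": query,
        "filter": json_path_filter,
    }
    # quote_via=quote encodes a space as %20 (the default quote_plus uses "+").
    return DQL_ENDPOINT + "?" + urlencode(params, quote_via=quote, safe="")


url = build_dql_url("YOURDIFFBOTTOKEN", "type:Organization", "$.location.country.name")
print(url)
```

Pass the resulting URL to any HTTP client with an Accept: application/json header, as in the curl command.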

Operators

Operator                  Description
$                         The root element to query. This starts all path expressions.
@                         The current node being processed by a filter predicate.
*                         Wildcard. Available anywhere a name or number is required.
..                        Deep scan. Available anywhere a name is required.
.<name>                   Dot-notated child
['<name>' (, '<name>')]   Bracket-notated child or children
[<number> (, <number>)]   Array index or indexes
[start:end]               Array slice operator (from ECMA 2022 Language Specification)
[?(<expression>)]         Filter expression. Expression must evaluate to a boolean value.

Filter Operators

Filters are logical expressions used to filter arrays.

  • A typical filter would be [?(@.year > 2018)] where @ represents the current item being processed.
  • More complex filters can be created with the logical operators && and ||. String literals must be enclosed in single or double quotes ([?(@.name == 'Spain')] or [?(@.name == "France")]).
  • You can use ! to negate a predicate: [?(!(@.year < 2018 && @.year > 2020))].

Operator   Description
==         left is equal to right (note that 1 is not equal to '1')
!=         left is not equal to right
<          left is less than right
<=         left is less than or equal to right
>          left is greater than right
>=         left is greater than or equal to right
=~         left matches regular expression [?(@.name =~ /foo.*?/i)]
in         left exists in right [?(@.name in ['S', 'M'])]
nin        left does not exist in right
subsetof   left is a subset of right [?(@.sizes subsetof ['S', 'M', 'L'])]
anyof      left has an intersection with right [?(@.sizes anyof ['M', 'L'])]
noneof     left has no intersection with right [?(@.sizes noneof ['M', 'L'])]
size       size of left (array or string) should match right
empty      left (array or string) should be empty
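
To make the membership and set operators concrete, here is a small Python sketch that mirrors the one-line descriptions in the table. It illustrates the semantics as described and is not Diffbot's evaluator; the function names are hypothetical, and edge cases such as empty lists or mixed types may behave differently in the real implementation.

```python
def op_in(left, right):
    """in: left exists in right."""
    return left in right


def op_nin(left, right):
    """nin: left does not exist in right."""
    return left not in right


def op_subsetof(left, right):
    """subsetof: every element of left is also in right."""
    return set(left) <= set(right)


def op_anyof(left, right):
    """anyof: left and right share at least one element."""
    return bool(set(left) & set(right))


def op_noneof(left, right):
    """noneof: left and right share no elements."""
    return not (set(left) & set(right))


sizes = ["S", "M"]
print(op_subsetof(sizes, ["S", "M", "L"]))  # every size is in the allowed list
print(op_anyof(sizes, ["M", "L"]))          # "M" is shared
print(op_noneof(sizes, ["M", "L"]))         # fails because "M" is shared
```

As with ==, comparisons are type-sensitive in this sketch: the number 1 and the string '1' are different values.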

Path Examples

The examples will refer to this partial Diffbot Organization entity sample:

{
  "type": "Corporation",
  "name": "IBM",
  "homepageUri": "ibm.com",
  "nbEmployees": 345000,
  "yearlyRevenues": [
    {
      "revenue": {
        "value": 7.362E+10
      },
      "isCurrent": false,
      "year": 2020
    },
    {
      "revenue": {
        "value": 7.9591E+10
      },
      "isCurrent": false,
      "year": 2018
    }
  ],
  "capitalization": {
    "currency": "USD",
    "value": 1.12935797E+11
  },
  "categories": [
    {
      "name": "Computer Hardware Companies"
    },
    {
      "name": "Cloud Computing Companies"
    },
    {
      "name": "Software Consulting Firms"
    }
  ],
  "locations": [
    {
      "country": {
        "summary": "Sovereign state in North America",
        "name": "United States of America"
      },
      "isCurrent": true,
      "address": "1 New Orchard Road, Armonk, 10504-1722, New York, United States"
    },
    {
      "country": {
        "summary": "Sovereign state in Southern Africa",
        "name": "South Africa"
      },
      "isCurrent": false,
      "address": "90 Grayston Dr, Sandton, Gauteng Province, South Africa"
    }
  ]
}

JsonPath                                                     Result
$.name                                                       The name of the entity
$.locations[?(@.country.name=='United States of America')]   All locations in the US
$.locations[?(@.country.name=='United States of America')].['address', 'isCurrent']   The address and isCurrent fields of all locations in the US
$.locations[*].address                                       The address of all locations
$.locations[0]                                               The first location
$.locations[-2]                                              The second to last location
$.locations[0,1]                                             The first two locations
$.locations[:2]                                              All locations from index 0 (inclusive) until index 2 (exclusive)
$.locations[1:2]                                             All locations from index 1 (inclusive) until index 2 (exclusive)
$.locations[-2:]                                             Last two locations
$.locations[?(@.isCurrent)]                                  All locations which are current
$.yearlyRevenues[?(@.year > 2018)]                           yearlyRevenues for years > 2018
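
The index, slice, and filter rows above select the same elements as ordinary Python list operations, which makes them easy to check by hand. The sketch below applies those operations to a trimmed copy of the sample entity. It is an analogy for which elements each path selects, not a JsonPath evaluator, and it does not reproduce the shape of the API response.

```python
# Trimmed copy of the sample entity above (only the fields used here).
entity = {
    "yearlyRevenues": [
        {"revenue": {"value": 7.362e10}, "isCurrent": False, "year": 2020},
        {"revenue": {"value": 7.9591e10}, "isCurrent": False, "year": 2018},
    ],
    "locations": [
        {
            "country": {"name": "United States of America"},
            "isCurrent": True,
            "address": "1 New Orchard Road, Armonk, 10504-1722, New York, United States",
        },
        {
            "country": {"name": "South Africa"},
            "isCurrent": False,
            "address": "90 Grayston Dr, Sandton, Gauteng Province, South Africa",
        },
    ],
}
locations = entity["locations"]

first = locations[0]              # $.locations[0]
second_to_last = locations[-2]    # $.locations[-2]
first_two = locations[:2]         # $.locations[:2]
index_one_only = locations[1:2]   # $.locations[1:2]
last_two = locations[-2:]         # $.locations[-2:]
addresses = [loc["address"] for loc in locations]          # $.locations[*].address
current = [loc for loc in locations if loc["isCurrent"]]   # $.locations[?(@.isCurrent)]
after_2018 = [r for r in entity["yearlyRevenues"] if r["year"] > 2018]  # $.yearlyRevenues[?(@.year > 2018)]

print([loc["country"]["name"] for loc in current])
print([r["year"] for r in after_2018])
```

Because the sample has exactly two locations, $.locations[-2] and $.locations[0] refer to the same element here.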

Specifying multiple paths

Separate multiple paths with a semicolon (;).

Example: $.name;$.homepageUri;$.yearlyRevenues[?(@.year > 2018)] specifies 3 elements to be returned (separated by ;):

  • $.name: entity name
  • $.homepageUri: entity homepageUri
  • $.yearlyRevenues[?(@.year > 2018)]: yearlyRevenues for years > 2018
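
A minimal sketch of assembling a multi-path filter value: join the paths with ; and percent-encode the whole string before placing it in the URL. The helper name build_multi_path_filter is hypothetical. The sketch assumes no individual path contains a literal ; (for example inside a string literal), since this guide does not describe how such a case would be handled.

```python
from urllib.parse import quote, unquote


def build_multi_path_filter(paths):
    """Join JsonPath expressions with ';' and percent-encode the result.

    Hypothetical helper, for illustration only.
    """
    return quote(";".join(paths), safe="")


paths = ["$.name", "$.homepageUri", "$.yearlyRevenues[?(@.year > 2018)]"]
encoded = build_multi_path_filter(paths)
print("&filter=" + encoded)

# Decoding and splitting recovers the original three paths.
print(unquote(encoded).split(";"))
```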

DQL specific options

mostRelevant() operator

DQL supports the mostRelevant() function, which selects the element of a list that is most relevant to the DQL query.

For example, employments[mostRelevant()].{employer.name title} selects the employer.name and title fields of the employment that best matches the DQL query such as type:Person employments.title:"CEO".

If the DQL query has no clause on the list being filtered (employments in this example), all elements of that list are returned.

For example, employments[mostRelevant()].{employer.name title} returns employer.name and title fields of all employments for the DQL query type:Person skills.name:"cyber security".

Enhance specific options

Getting query parameters and score

JsonPath expressions can also be used in the exportspec parameter when exporting results in CSV, XLS, or XLSX format. You can specify special paths in the specification to export the Enhance query fields and score.

Query info can be accessed using:

  • #query.id
  • #query.type
  • #query.name
  • #query.url
  • #query.location
  • #query.phone
  • #query.email
  • #query.employer
  • #query.title
  • #query.school

Score can be accessed by #score.

Here is an example of exportspec using these special paths:
#query.type,Query Type;#query.name,Query Name;#score,Score;name,Name;homepageUri,Homepage;ceo.name,CEO;industries,Industries;summary,Summary;nbEmployeesMax,Employees;location.address,Address;location.city.name,City
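
Judging from the example above, an exportspec is a list of path,Header pairs joined by semicolons. Under that assumption, a minimal sketch for generating one from a list of columns follows. The helper name build_exportspec is hypothetical, and the sketch does not handle headers that themselves contain a comma or semicolon.

```python
def build_exportspec(columns):
    """Build an exportspec string from (path, header) pairs.

    Hypothetical helper; the "path,Header;path,Header" layout is inferred
    from the example above rather than from a formal grammar.
    """
    return ";".join(f"{path},{header}" for path, header in columns)


columns = [
    ("#query.type", "Query Type"),
    ("#query.name", "Query Name"),
    ("#score", "Score"),
    ("name", "Name"),
    ("homepageUri", "Homepage"),
]
exportspec = build_exportspec(columns)
print(exportspec)
```

Remember to URL-encode the resulting string before adding it to a request URL.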

Why a variant of JsonPath?

  • JsonPath does not support applying multiple paths to a single JSON document. This is reflected in its recommended output structure. With this variant, multiple elements can be selected from a single JSON document by specifying multiple paths.
  • The output format as specified by JsonPath is simplistic and loses the original structure of the document.

📘

Compatibility Notes

  1. JsonPath supports [start:end:step] from ECMAScript 4, but we don't support step, in order to stay compatible with ECMA 2022. It doesn't seem to be supported by the Jayway Java implementation either.
  2. JsonPath uses .length to refer to the length of the array. This is ambiguous when there is actually an element named length, so we don't support it.
  3. We don't support any of the aggregation functions provided by the Jayway Java implementation, as the goal of this implementation is to filter JSON.
  4. We don't support referencing an absolute path through an expression like $..book[?(@.price <= $['expensive'])].
  5. The original spec supports unions but not multiple paths. It's easier to use multiple paths when the filtered nodes are disjoint.

References for JsonPath