Custom Scoring & Relevance

Custom Scoring

You can use should and must clauses to boost and rank results. Together they let you implement a custom scoring function.

should clause

The should operator lets you specify optional clauses. Entities that match these clauses appear higher in the search results.

For example: to search for all employees currently employed at Google but boost the ones employed in leadership roles, you can use this query:

type:Person employments.{isCurrent:true employer.name:"Google" should:categories.name:"leadership"}

The should:categories.name:"leadership" clause is an optional match on employments.categories.name: results are returned whether or not they match it, and those that do match are boosted.

You can specify multiple should clauses. For example, to search for all employees currently employed at Google but boost the ones employed in leadership roles and located in the United States, you can use this query:

type:Person employments.{isCurrent:true employer.name:"Google" should:categories.name:"leadership" should:location.country.name:"United States of America"}

When using multiple should clauses, you can specify different weights for clauses using the should[<weight>] syntax where weight is a number between 1 and 100.

For example, to give the categories clause double the weight of the location clause, you can use this query:

type:Person employments.{isCurrent:true employer.name:"Google" should[100]:categories.name:"leadership" should[50]:location.country.name:"United States of America"}

The clause should[100]:categories.name has a weight of 100 and the clause should[50]:location.country.name has a weight of 50.
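The weighted syntax above can be assembled programmatically. The following is a minimal sketch in Python, not part of any official client: the helper names should_clause and employment_query are hypothetical, and the only rule enforced is the 1 to 100 weight range stated above.

```python
def should_clause(field, value, weight=None):
    """Build one optional clause, e.g. should[100]:categories.name:"leadership".

    Hypothetical helper for illustration; only the documented 1-100
    weight range is checked.
    """
    if weight is None:
        prefix = "should"
    else:
        if not 1 <= weight <= 100:
            raise ValueError("weight must be a number between 1 and 100")
        prefix = f"should[{weight}]"
    return f'{prefix}:{field}:"{value}"'


def employment_query(employer, clauses):
    """Wrap the required filters and the optional clauses in employments.{...}."""
    parts = ["isCurrent:true", f'employer.name:"{employer}"', *clauses]
    return "type:Person employments.{" + " ".join(parts) + "}"


query = employment_query(
    "Google",
    [
        should_clause("categories.name", "leadership", 100),
        should_clause("location.country.name", "United States of America", 50),
    ],
)
print(query)
```

Building the string this way reproduces the weighted query shown above and rejects out-of-range weights before the query is sent.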

You can also name a should clause so that you can tell which clauses matched a given result. The syntax for naming a should clause is should[<weight>,<clause_name>]. For example, the query below names two clauses:

  • should[100,"isLeader"]:categories.name:"leadership" is named isLeader
  • should[50,"isUS"]:location.country.name:"United States of America" is named isUS

type:Person employments.{isCurrent:true employer.name:"Google" should[100,"isLeader"]:categories.name:"leadership" should[50,"isUS"]:location.country.name:"United States of America"}

The JSON response lists the matched clauses in the data[].entity_ctx.matched_clauses element. Here, both the isLeader and isUS named clauses matched:

"data": [
  {
    "score": 1,
    "entity": {...},
    "entity_ctx": {
      "matched_clauses": [
        "isLeader",
        "isUS"
      ]
    }
  }
]
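As a rough illustration of reading this element on the client side, the Python sketch below parses a hand-written sample response and collects the named clauses for each result. The sample payload is made up for the example, and the second result (with no matched_clauses key) reflects an assumption: the source does not say whether the element is omitted or empty when no named clause matches, so the helper tolerates both.

```python
import json

# Hand-written sample shaped like the response above (not real API output).
response_text = """
{
  "data": [
    {
      "score": 1,
      "entity": {},
      "entity_ctx": {"matched_clauses": ["isLeader", "isUS"]}
    },
    {
      "score": 1,
      "entity": {},
      "entity_ctx": {}
    }
  ]
}
"""


def matched_clauses(hit):
    """Return the named clauses that matched this result, or an empty list."""
    return hit.get("entity_ctx", {}).get("matched_clauses", [])


response = json.loads(response_text)

# One list of clause names per result, in response order.
clauses_per_hit = [matched_clauses(hit) for hit in response["data"]]

# Keep only the results where the clause named isLeader matched.
leaders = [hit for hit in response["data"] if "isLeader" in matched_clauses(hit)]

print(clauses_per_hit)
print(len(leaders))
```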

must clause

You can also specify a weight for non-optional (required) clauses using the must keyword.

For example, to search for AI companies, optionally boosting those with a Series A investment while giving more weight to the required AI industry clause:

type:Organization investments.{should[50]:series:"Series A" isCurrent:true} must[100]:industries:"Artificial Intelligence Companies"

Relevance

inner_hits

When a query matches multi-valued (nested) fields such as employments or locations, the response contains a data[].entity_ctx.inner_hits element. It lists the zero-based indexes of the nested records (employments, locations, etc.) that matched the query. If multiple records match, they are sorted in order of relevance (generally, a record that matches more clauses is more relevant). For example, this response indicates that the second (index=1) and third (index=2) employments match the query:

"data": [
  {
    "score": 1,
    "entity": {...},
    "entity_ctx": {
      "inner_hits": {
        "employments": [
          1,
          2
        ]
      }
    }
  }
]
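To turn these indexes back into the nested records themselves, something like the following Python sketch could be used. It assumes that each result's entity carries the nested field as a list under the same key that appears in inner_hits (here, employments); the placeholder employment records and the helper name resolve_inner_hits are hypothetical.

```python
# Hand-written sample result; the employment records are placeholders.
hit = {
    "score": 1,
    "entity": {
        "employments": [
            {"title": "employment 0"},
            {"title": "employment 1"},
            {"title": "employment 2"},
        ]
    },
    "entity_ctx": {"inner_hits": {"employments": [1, 2]}},
}


def resolve_inner_hits(hit):
    """Map each inner_hits field to the nested records at the listed indexes.

    The order of the indexes is preserved, so the records stay in the
    relevance order given by the response.
    """
    entity = hit.get("entity", {})
    inner_hits = hit.get("entity_ctx", {}).get("inner_hits", {})
    resolved = {}
    for field, indexes in inner_hits.items():
        records = entity.get(field, [])
        # Skip indexes that fall outside the list instead of raising.
        resolved[field] = [records[i] for i in indexes if 0 <= i < len(records)]
    return resolved


matching = resolve_inner_hits(hit)
print(matching)
```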