Highlighting
The Atlas Search highlight option adds fields to the result set that display search terms in their original context. You can use it in conjunction with $search operators to display search terms as they appear in the returned documents, along with the adjacent text content (if any). The highlight results are returned as part of the $meta field.
Syntax
highlight has the following syntax:
    {
      $search: {
        "index": "<index name>", // optional, defaults to "default"
        "<operator>": { // may be search, term, compound, or span
          "query": "<search-string>",
          "path": "<field-to-search>"
        },
        "highlight": {
          "path": "<field-to-search>",
          "maxCharsToExamine": "<number-of-chars-to-examine>", // optional, defaults to 500,000
          "maxNumPassages": "<number-of-passages>" // optional, defaults to 5
        }
      }
    },
    {
      $project: {
        "highlights": { "$meta": "searchHighlights" }
      }
    }
Options
Field | Type | Description | Required? |
---|---|---|---|
path | string | Document field to search. See Path Construction for more information. | yes |
maxCharsToExamine | int | Maximum number of characters to examine on a document when performing highlighting for a field. If omitted, defaults to 500,000, which means that Atlas Search only examines the first 500,000 characters in the search field in each document for highlighting. | no |
maxNumPassages | int | Number of high-scoring passages to return per document in the highlights results for each field. A passage is roughly the length of a sentence. If omitted, defaults to 5, which means that for each document, Atlas Search returns the top 5 highest-scoring passages that match the search text. | no |
The "$meta": "searchHighlights" field contains the highlighted results. Because that field isn’t part of the original document, you must use a $project pipeline stage to add it to the query output.
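As a minimal sketch of how the options above fit together, the following helper assembles the $search and $project stages as plain JavaScript objects. The function name `buildHighlightPipeline` and its parameters are hypothetical and not part of Atlas Search; only the option names (path, maxCharsToExamine, maxNumPassages) and the "searchHighlights" metadata key come from this page. The sketch assumes the `text` operator.

```javascript
// Hypothetical helper (not an Atlas Search API): builds a two-stage pipeline
// that runs a text search and projects the highlight metadata.
function buildHighlightPipeline(query, path, options = {}) {
  const highlight = { path };
  // Add optional settings only when supplied, so the documented
  // defaults (500,000 characters, 5 passages) apply otherwise.
  if (options.maxCharsToExamine !== undefined) {
    highlight.maxCharsToExamine = options.maxCharsToExamine;
  }
  if (options.maxNumPassages !== undefined) {
    highlight.maxNumPassages = options.maxNumPassages;
  }
  return [
    { $search: { text: { query, path }, highlight } },
    // searchHighlights is metadata, so it has to be projected explicitly.
    { $project: { [path]: 1, _id: 0, highlights: { $meta: "searchHighlights" } } }
  ];
}

const pipeline = buildHighlightPipeline(
  ["variety", "bunch"],
  "description",
  { maxNumPassages: 2 }
);
console.log(JSON.stringify(pipeline, null, 2));
```

In mongosh, the resulting array could then be passed to an aggregate() call on the collection being searched.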
Examples
The following examples use a collection called fruit that contains the following documents:
    {
      "_id" : 1,
      "type" : "apple",
      "description" : "Apples come in several varieties, including Fuji, Granny Smith, and Honeycrisp. The most popular varieties are McIntosh, Gala, and Granny Smith."
    },
    {
      "_id" : 2,
      "type" : "banana",
      "description" : "Bananas are usually sold in bunches of five or six."
    },
    {
      "_id" : 3,
      "type" : "pear",
      "description" : "Bosc and Bartlett are the most common varieties of pears."
    }
One useful aspect of highlighting is that it reveals the original text returned by the search query, which may not be exactly the same as the search term. For example, if you use a language-specific analyzer, your text searches return all the stemmed variations of your search terms.
The fruit collection has an index definition that uses the english analyzer and dynamic field mappings.

    {
      "analyzer": "lucene.english",
      "searchAnalyzer": "lucene.english",
      "mappings": {
        "dynamic": true
      }
    }
The following query searches for variety and bunch in the description field of the fruit collection, with the highlight option enabled. The $project pipeline stage restricts the output to the description field and adds a new field called highlights, which contains highlighting information.
    db.fruit.aggregate([
      {
        $search: {
          "text": {
            "path": "description",
            "query": ["variety", "bunch"]
          },
          "highlight": {
            "path": "description"
          }
        }
      },
      {
        $project: {
          "description": 1,
          "_id": 0,
          "highlights": { "$meta": "searchHighlights" }
        }
      }
    ])
The query returns the following results:
    {
      "description" : "Bananas are usually sold in bunches of five or six. ",
      "highlights" : [
        {
          "path" : "description",
          "texts" : [
            { "value" : "Bananas are usually sold in ", "type" : "text" },
            { "value" : "bunches", "type" : "hit" },
            { "value" : " of five or six. ", "type" : "text" }
          ],
          "score" : 1.2816506624221802
        }
      ]
    }
    {
      "description" : "Bosc and Bartlett are the most common varieties of pears.",
      "highlights" : [
        {
          "path" : "description",
          "texts" : [
            { "value" : "Bosc and Bartlett are the most common ", "type" : "text" },
            { "value" : "varieties", "type" : "hit" },
            { "value" : " of pears.", "type" : "text" }
          ],
          "score" : 1.2691514492034912
        }
      ]
    }
    {
      "description" : "Apples come in several varieties, including Fuji, Granny Smith, and Honeycrisp. The most popular varieties are McIntosh, Gala, and Granny Smith. ",
      "highlights" : [
        {
          "path" : "description",
          "texts" : [
            { "value" : "Apples come in several ", "type" : "text" },
            { "value" : "varieties", "type" : "hit" },
            { "value" : ", including Fuji, Granny Smith, and Honeycrisp. ", "type" : "text" }
          ],
          "score" : 1.0356334447860718
        },
        {
          "path" : "description",
          "texts" : [
            { "value" : "The most popular ", "type" : "text" },
            { "value" : "varieties", "type" : "hit" },
            { "value" : " are McIntosh, Gala, and Granny Smith. ", "type" : "text" }
          ],
          "score" : 1.0910683870315552
        }
      ]
    }
The search term bunch returns a match on the document with _id: 2, because the description field contains the word bunches. The search term variety returns a match on the documents with _id: 3 and _id: 1, because the description field contains the word varieties.
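To see which original words matched a stemmed search term, you can scan a result's highlights for entries of type hit. The sketch below is illustrative only: `collectHits` is a hypothetical helper name, and `bananaResult` is a hand-copied version of the first result shown above.

```javascript
// Hypothetical helper: returns the distinct strings marked as hits
// in one result document's highlights array.
function collectHits(doc) {
  const hits = new Set();
  for (const highlight of doc.highlights || []) {
    for (const text of highlight.texts) {
      if (text.type === "hit") {
        hits.add(text.value);
      }
    }
  }
  return [...hits];
}

// Copy of the banana result from the query output above.
const bananaResult = {
  description: "Bananas are usually sold in bunches of five or six. ",
  highlights: [
    {
      path: "description",
      texts: [
        { value: "Bananas are usually sold in ", type: "text" },
        { value: "bunches", type: "hit" },
        { value: " of five or six. ", type: "text" }
      ],
      score: 1.2816506624221802
    }
  ]
};

console.log(collectHits(bananaResult)); // [ 'bunches' ]
```

The query term was bunch, but the helper reports bunches, the form that actually appears in the document.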
Output
The highlights field is an array of objects, each containing the following output fields:
Field | Type | Description |
---|---|---|
path | string | Document field which returned a match. |
texts | array of objects | Each search match returns one or more objects, containing the matching text and the surrounding text (if any). |
texts.value | string | Text from the field which returned a match. |
texts.type | string | Returned value is either hit or text. Results of type hit contain the string which returned a match. Results of type text contain the text content adjacent to the matching string. |
score | float | The score assigned to the highlights object. The highlights score is a measure of the relevance of the highlights object to the query. If multiple highlights objects are returned, the most relevant highlights object has the highest score. |
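One possible way to consume these fields on the client is to order the highlights objects by score and wrap each hit in a marker. The following is a sketch under stated assumptions: `renderHighlights` and its `open`/`close` parameters are hypothetical names, the sample data is copied from the apple result above, and the values are not HTML-escaped.

```javascript
// Hypothetical helper: turns a highlights array into marked-up passages,
// most relevant (highest score) first. Values are NOT HTML-escaped here.
function renderHighlights(highlights, open = "<mark>", close = "</mark>") {
  return [...highlights] // copy so the caller's array is not reordered
    .sort((a, b) => b.score - a.score)
    .map((highlight) =>
      highlight.texts
        .map((t) => (t.type === "hit" ? open + t.value + close : t.value))
        .join("")
    );
}

// Highlights copied from the apple result in the example output.
const appleHighlights = [
  {
    path: "description",
    texts: [
      { value: "Apples come in several ", type: "text" },
      { value: "varieties", type: "hit" },
      { value: ", including Fuji, Granny Smith, and Honeycrisp. ", type: "text" }
    ],
    score: 1.0356334447860718
  },
  {
    path: "description",
    texts: [
      { value: "The most popular ", type: "text" },
      { value: "varieties", type: "hit" },
      { value: " are McIntosh, Gala, and Granny Smith. ", type: "text" }
    ],
    score: 1.0910683870315552
  }
];

const passages = renderHighlights(appleHighlights);
console.log(passages.join("\n"));
```

Because the second passage has the higher score, it is listed first in the rendered output.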