
Index Definitions

When you configure an Atlas Search index, you can specify that certain fields should be indexed with a particular analyzer or with multiple analyzers. You can also specify that certain fields should be indexed while others are left unindexed, or you can dynamically index all the fields in a collection.

Important

If you use the $out aggregation stage to modify a collection with an Atlas Search index, you must delete and re-create the search index. If possible, consider using $merge instead of $out.
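A minimal sketch of the $merge alternative follows. It assumes aggregation pipelines are expressed as Python lists of dicts, as PyMongo accepts them. The helper name, the `reports` target collection, and the `status`/`name` fields are hypothetical. The `$merge` keys used here (`into`, `on`, `whenMatched`, `whenNotMatched`) are standard MongoDB aggregation syntax.

```python
from typing import Any


def pipeline_with_merge(stages: list[dict[str, Any]], target: str) -> list[dict[str, Any]]:
    """Return `stages` followed by a $merge stage that writes into `target`.

    Hypothetical helper: it refuses pipelines that already use $out, because
    $out replaces the target collection and, per this page, forces you to
    delete and re-create its Atlas Search index.
    """
    if any("$out" in stage for stage in stages):
        raise ValueError("pipeline uses $out; write results with $merge instead")
    merge_stage = {
        "$merge": {
            "into": target,
            "on": "_id",
            "whenMatched": "replace",
            "whenNotMatched": "insert",
        }
    }
    return [*stages, merge_stage]


pipeline = pipeline_with_merge(
    [{"$match": {"status": "active"}}, {"$project": {"name": 1, "status": 1}}],
    target="reports",
)
# With PyMongo, this would run as db.source.aggregate(pipeline).
```

The `whenMatched: "replace"` choice overwrites matching documents. Other `whenMatched` modes, such as `merge` or `keepExisting`, may suit incremental updates better.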

Static and Dynamic Mappings

Individual field mappings that you configure when you create the index are called static mappings.

Mappings that are automatically assigned when new data is inserted into a field are called dynamic mappings. Dynamic mappings are useful if you have a dynamic schema and you don’t know ahead of time all the fields that a collection may contain. You can configure an entire index to use dynamic mappings, or specify individual fields to be dynamically mapped.

Note

Indexes that use dynamic mappings occupy more disk space than indexes that use static mappings, and they may perform worse.

See the index configuration example.
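The two kinds of mapping can be sketched as index definitions built in Python and serialized to JSON. The sketch uses only the `mappings`, `dynamic`, `fields`, and `type` keys that appear elsewhere on this page. The field names `title` and `metadata` are hypothetical.

```python
import json

# Fully dynamic: every dynamically indexable field is indexed as data arrives.
dynamic_index = {"mappings": {"dynamic": True}}

# Static at the top level, with one embedded document (hypothetical field
# "metadata") whose sub-fields are mapped dynamically.
mixed_index = {
    "mappings": {
        "dynamic": False,
        "fields": {
            "title": {"type": "string"},
            "metadata": {"type": "document", "dynamic": True},
        },
    }
}

# json.dumps renders Python True/False as JSON true/false.
print(json.dumps(mixed_index, indent=2))
```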

Defining an Index

When you create a new Atlas Search index, you can specify a custom definition for the index. See Create an Atlas Search Index for complete instructions on creating a new Atlas Search index.

Screenshot of Create an Atlas Search Index modal window

The index name defaults to default. You can leave the default name in place or choose one of your own.

Note

If you name your index default, you don’t need to specify an index parameter when using the $search pipeline stage. Otherwise, you must specify the index name using the index parameter.

Index names must be unique within their namespace.
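The naming rule above can be sketched as a small hypothetical helper. It builds a $search stage with the text operator and adds the index parameter only when the index is not named default. The query string, the `company` path, and the index name `companyIndex` are illustrative.

```python
from typing import Any


def search_stage(query: str, path: str, index: str = "default") -> dict[str, Any]:
    """Build a $search stage; omit `index` when targeting the default index."""
    body: dict[str, Any] = {"text": {"query": query, "path": path}}
    if index != "default":
        body["index"] = index  # required for any index not named default
    return {"$search": body}


default_stage = search_stage("baker", "company")
named_stage = search_stage("baker", "company", index="companyIndex")
```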

The default index definition uses dynamic mappings, so it works with any collection. To create a custom index definition, specify which fields to index, which analyzer to use for each field, and which data type to index each field as.

Static Mapping Example

The following example index definition file uses static mappings.

  • The default index analyzer is lucene.standard.

  • The default search analyzer is lucene.standard.

  • The index specifies static field mappings (dynamic: false).

  • The field address is of type document. It has two embedded sub-fields, city and state.

  • The city sub-field uses the lucene.simple analyzer by default for queries. It uses the ignoreAbove option to ignore any string of more than 255 bytes in length.

  • The state sub-field uses the lucene.english analyzer by default for queries.

  • The company field is of type string. It uses the lucene.whitespace analyzer by default for queries. It has a multi analyzer named mySecondaryAnalyzer which uses the lucene.french analyzer by default for queries.

    For more information on multi analyzers, see Path construction.

  • The employees field is an array of strings. It uses the lucene.standard analyzer by default for queries. For indexing arrays, Atlas Search only requires the data type of the array elements. You don’t have to specify that the data is contained in an array in the index definition.

  • The dynamic value is set to false, so any fields which are not explicitly mentioned are not indexed.

{
  "analyzer": "lucene.standard",
  "searchAnalyzer": "lucene.standard",
  "mappings": {
    "dynamic": false,
    "fields": {
      "address": {
        "type": "document",
        "fields": {
          "city": {
            "type": "string",
            "analyzer": "lucene.simple",
            "ignoreAbove": 255
          },
          "state": {
            "type": "string",
            "analyzer": "lucene.english"
          }
        }
      },
      "company": {
        "type": "string",
        "analyzer": "lucene.whitespace",
        "multi": {
          "mySecondaryAnalyzer": {
            "type": "string",
            "analyzer": "lucene.french"
          }
        }
      },
      "employees": {
        "type": "string",
        "analyzer": "lucene.standard"
      }
    }
  }
}
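The following sketch walks the mappings of the definition above recursively and lists the analyzer that applies to each path. It is a simplified, hypothetical model. A string field that sets no analyzer is assumed to fall back to the top-level analyzer. Each multi entry is reported in the path form used in queries, {"value": ..., "multi": ...}.

```python
from typing import Any, Optional

# The static mapping example above, as a Python dict.
DEFINITION: dict[str, Any] = {
    "analyzer": "lucene.standard",
    "searchAnalyzer": "lucene.standard",
    "mappings": {
        "dynamic": False,
        "fields": {
            "address": {
                "type": "document",
                "fields": {
                    "city": {"type": "string", "analyzer": "lucene.simple", "ignoreAbove": 255},
                    "state": {"type": "string", "analyzer": "lucene.english"},
                },
            },
            "company": {
                "type": "string",
                "analyzer": "lucene.whitespace",
                "multi": {
                    "mySecondaryAnalyzer": {"type": "string", "analyzer": "lucene.french"}
                },
            },
            "employees": {"type": "string", "analyzer": "lucene.standard"},
        },
    },
}


def analyzed_paths(definition: dict[str, Any]) -> list[tuple[str, Optional[str], str]]:
    """List (path, multi name or None, analyzer) for every mapped string field."""
    default = definition.get("analyzer", "lucene.standard")
    out: list[tuple[str, Optional[str], str]] = []

    def walk(fields: dict[str, Any], prefix: str) -> None:
        for name, spec in fields.items():
            path = prefix + name
            # A field may carry one definition or a list of definitions.
            for s in spec if isinstance(spec, list) else [spec]:
                if s["type"] == "document":
                    walk(s.get("fields", {}), path + ".")
                elif s["type"] == "string":
                    out.append((path, None, s.get("analyzer", default)))
                    for alt, alt_spec in s.get("multi", {}).items():
                        out.append((path, alt, alt_spec.get("analyzer", default)))

    walk(definition["mappings"].get("fields", {}), "")
    return out


for path, multi, analyzer in analyzed_paths(DEFINITION):
    # Query path syntax: a plain string, or {"value": ..., "multi": ...}.
    query_path = path if multi is None else {"value": path, "multi": multi}
    print(query_path, "->", analyzer)
```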

Custom Analyzers

You can also define a custom analyzer within an index definition. A custom analyzer combines character filters, a tokenizer, and token filters, so you can tailor how Atlas Search processes text to your specific needs.
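As a minimal sketch, the definition below declares one custom analyzer in an `analyzers` array and references it by name from a field. The analyzer name `lowercaseWhitespace` and the field `notes` are hypothetical. The keys `name`, `charFilters`, `tokenizer`, and `tokenFilters` follow the Atlas Search custom analyzer format, but check the custom analyzer reference for the tokenizers and filters actually available. The `approximate` function is only a rough local stand-in for what such an analyzer does; it is not Atlas Search code.

```python
from typing import Any

# Hypothetical index definition with one custom analyzer.
DEFINITION: dict[str, Any] = {
    "analyzer": "lucene.standard",
    "analyzers": [
        {
            "name": "lowercaseWhitespace",
            "charFilters": [],
            "tokenizer": {"type": "whitespace"},
            "tokenFilters": [{"type": "lowercase"}],
        }
    ],
    "mappings": {
        "dynamic": False,
        "fields": {"notes": {"type": "string", "analyzer": "lowercaseWhitespace"}},
    },
}


def undefined_analyzers(definition: dict[str, Any]) -> list[str]:
    """Analyzer names used by top-level fields that are neither built in
    (lucene.*) nor defined in the "analyzers" array."""
    custom = {a["name"] for a in definition.get("analyzers", [])}
    used = [
        spec.get("analyzer")
        for spec in definition["mappings"].get("fields", {}).values()
        if isinstance(spec, dict)
    ]
    return [n for n in used if n and not n.startswith("lucene.") and n not in custom]


def approximate(text: str) -> list[str]:
    """Rough stand-in for a whitespace tokenizer followed by a lowercase filter."""
    return [token.lower() for token in text.split()]
```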

BSON Data Types

The table below enumerates all the BSON data types and whether they are included in an Atlas Search index with dynamic mappings.

BSON Type            Dynamic Index?   Atlas Search Data Type
Double               yes              number
32-bit integer       yes              number
64-bit integer       yes              number
String               yes              string
Date                 yes              date
Object               yes              document
ObjectId             no               objectId
Boolean              no               boolean
Timestamp            no               -
Array                yes              -
Binary Data          no               -
Null                 no               -
Regular Expression   no               -
JavaScript           no               -
Decimal128           no               -
Min key              no               -
Max key              no               -
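The table above can be turned into a rough predictor of what a dynamic mapping would index in a sample document. This is a simplified, hypothetical model that uses Python values in place of BSON types. Python bool, int/float, str, datetime, and dict stand in for Boolean, numbers, String, Date, and Object. Numbers and dates inside arrays are skipped, following the Limitation at the end of this page.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def dynamic_type(value: Any) -> Optional[str]:
    """Atlas Search type a dynamic mapping would assign, per the table above."""
    # Check bool first: Python's bool subclasses int, but BSON Boolean
    # values are not dynamically indexed.
    if isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, datetime):
        return "date"
    if isinstance(value, dict):
        return "document"
    return None  # e.g. None, bytes, ObjectId, Decimal128: not dynamically indexed


def dynamically_indexed(doc: dict[str, Any], prefix: str = "") -> dict[str, str]:
    """Map each path in `doc` to the type a dynamic mapping would index it as."""
    indexed: dict[str, str] = {}
    for key, value in doc.items():
        path = prefix + key
        is_array = isinstance(value, list)
        for item in value if is_array else [value]:
            kind = dynamic_type(item)
            # Numbers and dates inside arrays are skipped (see Limitation).
            if kind is None or (is_array and kind in ("number", "date")):
                continue
            indexed.setdefault(path, kind)
            if kind == "document":
                indexed.update(dynamically_indexed(item, path + "."))
    return indexed


sample = {
    "name": "Ada",
    "age": 36,
    "active": True,
    "joined": datetime(2020, 1, 1, tzinfo=timezone.utc),
    "tags": ["math", "engines"],
    "ratings": [4, 5],
    "address": {"city": "London"},
    "nickname": None,
}
print(dynamically_indexed(sample))
```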

Atlas Search Data Types

autocomplete

The autocomplete type is for indexing text values for autocompletion. The indexed fields can only be queried with the autocomplete operator.

Note

The autocomplete type can’t be used to index fields whose value is an array of strings.

The autocomplete type takes the following options:

  • type: string, required. The type of field. Value must be autocomplete.

  • maxGrams: int, optional, default 15. The maximum number of characters per indexed sequence. The value limits the character length of indexed tokens. When you search for terms longer than the maxGrams value, tokens are truncated to the maxGrams length.

  • minGrams: int, optional, default 2. The minimum number of characters per indexed sequence. The recommended minimum value is 4. A value less than 4 could impact performance because the index can become very large. The default value of 2 is recommended only for edgeGram.

  • tokenization: enum, optional, default edgeGram. The tokenization strategy to use when indexing the field for autocompletion. Value can be one of the following:

      • edgeGram - to create indexable tokens, referred to as grams, from variable-length character sequences starting at the beginning of the words and delimited by whitespace.
      • nGram - to create indexable tokens, referred to as grams, by sliding a variable-length character window over a word. Atlas Search creates more tokens for nGram than for edgeGram, so nGram takes more space and time to index the field. nGram is better suited for querying languages with long, compound words or languages that don’t use spaces.

    For example, consider the following sentence:

    The quick brown fox jumps over the lazy dog.

    When tokenized with a minGrams value of 2 and a maxGrams value of 5, the following character sequences are indexed:

    edgeGram:

    Th
    The
    The{SPACE}
    The q
    qu
    qui
    quic
    quick
    ...

    nGram:

    Th
    The
    The{SPACE}
    The q
    he
    he{SPACE}
    he q
    he qu
    e{SPACE}
    e q
    e qu
    e qui
    {SPACE}q
    {SPACE}qu
    {SPACE}qui
    {SPACE}quic
    qu
    qui
    quic
    quick
    ...

    Note

    Indexing a field for autocomplete with the edgeGram or nGram tokenization strategy is more computationally expensive and takes more space than indexing a regular string field.

  • foldDiacritics: boolean, optional, default true. Whether to remove diacritics from the indexed text or keep them. Value can be one of the following:

      • true - to remove diacritic marks
      • false - to include diacritic marks

Example

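The following template uses the default option values. Replace <field-name> with the field to index, and adjust dynamic, tokenization (edgeGram or nGram), and the other options as needed.

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "<field-name>": [
        {
          "type": "autocomplete",
          "tokenization": "edgeGram",
          "minGrams": 2,
          "maxGrams": 15,
          "foldDiacritics": true
        }
      ]
    }
  }
}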
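The edgeGram and nGram lists in the tokenization option can be reproduced with a short Python sketch. It is a simplified model that assumes plain character slicing, exactly as in those lists. It does not model analyzers, case handling, or any other processing Atlas Search may apply internally. The `fold_diacritics` helper approximates foldDiacritics by removing Unicode combining marks after canonical decomposition.

```python
import unicodedata


def edge_grams(text: str, min_grams: int, max_grams: int) -> list[str]:
    """Grams that start at the beginning of each whitespace-delimited word."""
    starts = [
        i for i, ch in enumerate(text)
        if not ch.isspace() and (i == 0 or text[i - 1].isspace())
    ]
    return [
        text[s:s + n]
        for s in starts
        for n in range(min_grams, max_grams + 1)
        if s + n <= len(text)
    ]


def n_grams(text: str, min_grams: int, max_grams: int) -> list[str]:
    """Grams from a min..max character window slid across every offset."""
    return [
        text[s:s + n]
        for s in range(len(text))
        for n in range(min_grams, max_grams + 1)
        if s + n <= len(text)
    ]


def fold_diacritics(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


sample = "The quick"
print(edge_grams(sample, 2, 5))
print(len(edge_grams(sample, 2, 5)), "edgeGram vs", len(n_grams(sample, 2, 5)), "nGram")
```

The gap in counts between the two functions, even on two words, illustrates why nGram takes more space and time to index than edgeGram.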

boolean

The boolean data type is for indexing true and false values. It works in conjunction with the equals operator.

Note

Fields of type boolean cannot be dynamically indexed. They must be specifically indexed as part of a static mapping.

Example

The following example index definition maps a field named verified_user with the boolean data type and a field named teammates with the objectId data type.

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "verified_user": {
        "type": "boolean"
      },
      "teammates": {
        "type": "objectId"
      }
    }
  }
}

objectId

The objectId data type is for indexing ObjectId fields. It works in conjunction with the equals operator.

Note

Fields of type objectId cannot be dynamically indexed. They must be specifically indexed as part of a static mapping. See the example in the boolean section on this page.
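Both boolean and objectId fields are queried with the equals operator. The hypothetical helper below builds such a $search stage for the verified_user field from the example above. For the teammates field, PyMongo code would pass a `bson.ObjectId` value; that usage appears only as a comment because it needs the `bson` package that ships with PyMongo.

```python
from typing import Any


def equals_stage(path: str, value: Any, index: str = "default") -> dict[str, Any]:
    """Build a $search stage that uses the equals operator on `path`."""
    body: dict[str, Any] = {"equals": {"path": path, "value": value}}
    if index != "default":
        body["index"] = index  # only needed for an index not named default
    return {"$search": body}


verified_stage = equals_stage("verified_user", True)
# For the objectId field, PyMongo code would pass an ObjectId value:
#   from bson import ObjectId
#   equals_stage("teammates", ObjectId("<24-hex-character id>"))
```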

string

The string data type takes the following parameters:

  • analyzer: string, default lucene.standard. Name of a built-in or overridden analyzer to use for indexing the field.

  • searchAnalyzer: string, default lucene.standard. Analyzer to use when querying the field.

  • indexOptions: string, default offsets. Specifies the amount of information to store for the indexed field. Value can be one of the following:

      • docs - Only indexes documents. The frequency and position of the indexed term are ignored. Only a single occurrence of the term is reflected in the score.
      • freqs - Only indexes documents and term frequency. The position of the indexed term is ignored.
      • positions - Indexes documents, term frequency, and term positions.
      • offsets - (Default) Indexes documents, term frequency, term positions, and term offsets. This is required for Highlighting.

  • store: boolean, default true. Specifies whether to store the exact document text in the index in addition to the analyzed values. Value can be true or false. The value must be true for Highlighting.

  • ignoreAbove: int, no default. Atlas Search does not index the field if its value is longer than the specified number of characters.

  • multi: object, no default. Maps names to alternate string field definitions, so that the same string field is also indexed with the analyzer specified in each alternate definition.

    Example

    The following index definition for a library.books collection indexes string values in the field text with the lucene.english and lucene.french analyzers in addition to the default lucene.standard analyzer:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "text": {
        "type": "string",
        "multi": {
          "english": {
            "type": "string",
            "analyzer": "lucene.english"
          },
          "french": {
            "type": "string",
            "analyzer": "lucene.french"
          }
        }
      }
    }
  }
}

  • norms: string, default include. Specifies whether to include or omit the field length when scoring. The length of the field is determined by the number of tokens produced by the analyzer for the field. Value can be one of the following:

      • include - to include the field length when scoring.
      • omit - to omit the field length when scoring.

    If the value is include, Atlas Search factors the field length into the score. For example, if two documents match an Atlas Search query, the document with the shorter field length scores higher than the document with the longer field length, all else being equal.

    If the value is omit, Atlas Search ignores the field length when scoring.
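A query selects one of the multi analyzers through the path construction form {"value": <field>, "multi": <name>}. The sketch below builds text-operator stages against the text field of the library.books example, once through the field's default analyzer and once through the french alternate. The helper names and the query term are hypothetical.

```python
from typing import Any, Optional, Union


def text_path(field: str, multi: Optional[str] = None) -> Union[str, dict[str, str]]:
    """Plain path, or the path form that selects a named multi analyzer."""
    return field if multi is None else {"value": field, "multi": multi}


def text_search(query: str, field: str, multi: Optional[str] = None) -> dict[str, Any]:
    """Build a $search stage using the text operator against `field`."""
    return {"$search": {"text": {"query": query, "path": text_path(field, multi)}}}


# Same field, queried through the default analyzer and the "french" alternate.
standard_stage = text_search("bibliothèque", "text")
french_stage = text_search("bibliothèque", "text", multi="french")
```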

document

The document data type is for fields with embedded documents. It takes the following parameters:

  • type: string, no default. Must be document.

  • dynamic: boolean, default true. If set to true, Atlas Search indexes all fields in the embedded document except:

      • geo fields.
      • Any fields explicitly excluded by the fields parameter.

    If set to false, you must specify individual fields to index.

  • fields: document, no default. Maps document field names to field definitions. See the example on this page.
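How dynamic and fields combine at each level of nesting can be sketched with a simplified, hypothetical checker. It ignores value types, so it answers only whether a dotted path is covered by the mapping. It reads dynamic only when that key is set explicitly, so the example mappings set it at every level rather than relying on defaults. The field names are illustrative.

```python
from typing import Any


def is_indexed(mapping: dict[str, Any], path: str) -> bool:
    """Simplified check of whether a dotted `path` is covered by `mapping`.

    `mapping` is the top-level "mappings" object or a document-type field
    definition; both use "dynamic" and "fields". Value types are ignored,
    field definitions are assumed to be single objects (not lists), and
    "dynamic" counts only when explicitly set to true.
    """
    head, _, rest = path.partition(".")
    fields = mapping.get("fields", {})
    if head in fields:
        spec = fields[head]
        if not rest:
            return True
        # Only document-type fields have sub-fields to descend into.
        return spec.get("type") == "document" and is_indexed(spec, rest)
    # Not mapped explicitly: covered only if this level is dynamic.
    return mapping.get("dynamic") is True


MAPPINGS = {
    "dynamic": False,
    "fields": {
        "address": {"type": "document", "dynamic": True},
        "company": {"type": "string"},
        "owner": {
            "type": "document",
            "dynamic": False,
            "fields": {"name": {"type": "string"}},
        },
    },
}
```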

date

The date type is for indexing date values. A date cannot be indexed if it is part of an array. It takes the type option. The value of type must be date.
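As a minimal sketch, the definition below maps a hypothetical last_login field as date. The helper then builds a $search stage with the range operator over one month. With PyMongo, Python `datetime` objects are encoded as BSON dates. Timezone-aware values are used here to avoid ambiguity about the time zone.

```python
from datetime import datetime, timezone
from typing import Any

# Hypothetical static mapping for a date field named "last_login".
DEFINITION = {"mappings": {"dynamic": False, "fields": {"last_login": {"type": "date"}}}}


def date_range_stage(path: str, start: datetime, end: datetime) -> dict[str, Any]:
    """$search stage with the range operator: start <= path < end."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    return {"$search": {"range": {"path": path, "gte": start, "lt": end}}}


stage = date_range_stage(
    "last_login",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 2, 1, tzinfo=timezone.utc),
)
```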

number

The number type is for fields with numeric values of int32, int64, and double data types. The number type has the following options:

  • type: no default. The type of field. Value must be number.

  • representation: default double. The data type of the field to index. Valid values are:

      • int64 - for indexing large integers without loss of precision and for rounding double values to integers. This cannot be used to index large double values.
      • double - for indexing large double values without rounding.

    Example

    The following index definition for the sample_analytics.accounts collection in the sample dataset indexes the account_id field as 64-bit integers. Other integer values and small double values in the account_id field are also indexed, with any decimal part of a double value rounded off before indexing.

{
   "mappings": {
       "dynamic": false,
       "fields": {
          "account_id": {
             "type": "number",
             "representation": "int64"
          }
       }
   }
}

  • indexIntegers: default true. Index or omit int32 and int64 type values. Value can be true or false.

    Example

    The following index definition for the sample_airbnb.listingsAndReviews collection in the sample dataset omits int32 and int64 values in the bathrooms field and indexes only its double values.

{
   "mappings": {
      "dynamic": false,
      "fields": {
         "bathrooms": {
            "type": "number",
            "indexIntegers": false
         }
      }
   }
}

  • indexDoubles: default true. Index or omit double type values. Value can be true or false.

    Example

    The following index definition for the sample_analytics.accounts collection in the sample dataset:

      • Indexes integer values in the account_id field as 64-bit integers.
      • Omits double values in the account_id field.

{
   "mappings": {
      "dynamic": false,
      "fields": {
         "account_id": {
            "type": "number",
            "representation": "int64",
            "indexDoubles": false
         }
      }
   }
}
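The following sketch illustrates two points from the options above. The hypothetical `indexed_values` helper is a simplified model of indexIntegers and indexDoubles filtering, in which Python int and float stand in for BSON int32/int64 and double. The precision check shows why int64 exists: a double has a 53-bit significand, so not every integer above 2**53 can be stored exactly as a double.

```python
from typing import Any


def indexed_values(values: list[Any], index_integers: bool = True,
                   index_doubles: bool = True) -> list[Any]:
    """Simplified model of which numeric values indexIntegers/indexDoubles keep."""
    kept = []
    for v in values:
        if isinstance(v, bool):  # bool subclasses int in Python; not a number here
            continue
        if isinstance(v, int) and index_integers:
            kept.append(v)
        elif isinstance(v, float) and index_doubles:
            kept.append(v)
    return kept


# 2**53 + 1 has no exact double representation; it rounds to 2**53.
big = 2**53 + 1
print(float(big) == float(2**53))  # True
```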

geo

The geo type is for indexing geographic point and shape coordinates. For this type, the indexed field must be a GeoJSON object.

The geo type takes the following options:

  • type: no default. The type of field. Value must be geo.

  • indexShapes: default false. Specifies whether to index shapes. By default:

      • Shape geometries such as lines and polygons are not indexed.
      • Points are indexed, even when nested.

    Value can be:

      • true to index shapes and points
      • false to index only points

Example

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "<field-name>": {
        "type": "geo",
        "indexShapes": true
      }
    }
  }
}
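Because the indexed field must hold GeoJSON, coordinate order matters: GeoJSON lists longitude before latitude. The sketch below builds a validated GeoJSON Point and a $search stage that uses the geoWithin operator with a circle. The radius is given in meters. The helper names, the location field, and the coordinates are hypothetical.

```python
from typing import Any


def geo_point(longitude: float, latitude: float) -> dict[str, Any]:
    """Build a GeoJSON Point; coordinates are ordered [longitude, latitude]."""
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude out of range: {longitude}")
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude out of range: {latitude}")
    return {"type": "Point", "coordinates": [longitude, latitude]}


def within_circle(path: str, center: dict[str, Any], radius_meters: float) -> dict[str, Any]:
    """$search stage matching geo values within `radius_meters` of `center`."""
    return {
        "$search": {
            "geoWithin": {
                "path": path,
                "circle": {"center": center, "radius": radius_meters},
            }
        }
    }


center = geo_point(-73.98, 40.75)
stage = within_circle("location", center, 1000)
```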

Limitation

Atlas Search cannot index numeric or date values if they are part of an array.
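To spot documents affected by this limitation before indexing, a simplified, hypothetical scan can report array fields that contain numbers or dates. It uses Python values in place of BSON types and descends only into embedded documents, not into documents nested inside arrays.

```python
from datetime import datetime
from typing import Any


def unindexable_array_paths(doc: dict[str, Any], prefix: str = "") -> list[str]:
    """Dotted paths of arrays that contain numeric or date values."""
    found: list[str] = []
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            found.extend(unindexable_array_paths(value, path + "."))
        elif isinstance(value, list):
            for item in value:
                # bool subclasses int in Python; BSON booleans are not numbers.
                is_number = isinstance(item, (int, float)) and not isinstance(item, bool)
                if is_number or isinstance(item, datetime):
                    found.append(path)
                    break
    return found


sample = {
    "scores": [1, 2],
    "tags": ["a"],
    "meta": {"dates": [datetime(2024, 1, 1)]},
    "flags": [True],
}
print(unindexable_array_paths(sample))
```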