Whitespace Analyzer

The whitespace analyzer divides text into searchable terms (tokens) wherever it finds a whitespace character. It leaves all text in its original letter case.

Important

Atlas Search won't index string fields that exceed 32766 bytes in size.

The following example index definition specifies an index on the title field in the sample_mflix.movies collection using the whitespace analyzer. If you loaded the collection on your cluster, you can create the example index using the Atlas UI Visual Editor or the JSON Editor. After you select your preferred configuration method, select the database and collection.
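The index definition itself is not reproduced on this page, so the following is only a sketch of what such a definition might look like. It assumes a static mapping on title with lucene.whitespace used both for indexing and for searching; the variable name indexDefinition and the index name default are placeholders chosen for illustration.

```javascript
// Hypothetical index definition for the title field (an assumption; the
// original definition is not shown in this section).
const indexDefinition = {
  mappings: {
    fields: {
      title: {
        type: "string",
        analyzer: "lucene.whitespace",       // applied to field text at index time
        searchAnalyzer: "lucene.whitespace"  // applied to query text at search time
      }
    }
  }
};

// In mongosh, an index could then be created with something like:
//   use sample_mflix
//   db.movies.createSearchIndex("default", indexDefinition)

// Print the definition as JSON, for example to paste into the JSON Editor.
console.log(JSON.stringify(indexDefinition, null, 2));
```

Setting the same analyzer for indexing and searching keeps the query term and the indexed tokens in the same form, which is what the matching behavior described below relies on.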

The following query searches for the term Lion's in the title field.

    db.movies.aggregate([
      {
        "$search": {
          "text": {
            "query": "Lion's",
            "path": "title"
          }
        }
      },
      {
        "$project": {
          "_id": 0,
          "title": 1
        }
      }
    ])

    [
      { title: "Lion's Den" },
      { title: "The Lion's Mouth Opens" }
    ]

Atlas Search returns these documents because the lucene.whitespace analyzer does the following to the text in the title field:

  • Retains the original letter case of the text.

  • Divides the text into tokens at each whitespace character.
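The two steps above can be approximated in a few lines of JavaScript. This is a rough sketch, not Lucene's implementation: the function name whitespaceTokens is hypothetical, and the JavaScript \s character class may not agree with Lucene's definition of whitespace for every Unicode character.

```javascript
// Rough approximation of whitespace tokenization (not Lucene's actual code).
function whitespaceTokens(text) {
  // Split on runs of whitespace; letter case and punctuation are left untouched.
  // The filter drops empty strings produced by leading or trailing whitespace.
  return text.split(/\s+/).filter((token) => token.length > 0);
}

console.log(whitespaceTokens("Lion's Den"));             // tokens: Lion's, Den
console.log(whitespaceTokens("The Lion's Mouth Opens")); // tokens: The, Lion's, Mouth, Opens
```

Because the apostrophe is not whitespace, Lion's stays a single token.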

The following table shows the tokens (searchable terms) that Atlas Search creates using the Whitespace Analyzer and, by contrast, the Simple Analyzer and Keyword Analyzer for the documents in the results:

| Title | Whitespace Analyzer Tokens | Simple Analyzer Tokens | Keyword Analyzer Tokens |
| --- | --- | --- | --- |
| Lion's Den | Lion's, Den | lion, s, den | Lion's Den |
| The Lion's Mouth Opens | The, Lion's, Mouth, Opens | the, lion, s, mouth, opens | The Lion's Mouth Opens |
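For contrast, the simple and keyword columns of the table can be approximated as follows. This sketch assumes that the simple analyzer lowercases text and splits at every non-letter character, and that the keyword analyzer emits the whole string as one token; the function names simpleTokens and keywordTokens are hypothetical, and the code does not call Lucene.

```javascript
// Approximation of the simple analyzer: lowercase, then split on any non-letter.
function simpleTokens(text) {
  return text
    .toLowerCase()
    .split(/[^\p{L}]+/u) // \p{L} matches any Unicode letter
    .filter((token) => token.length > 0);
}

// Approximation of the keyword analyzer: the entire string is a single token.
function keywordTokens(text) {
  return [text];
}

console.log(simpleTokens("Lion's Den"));  // tokens: lion, s, den
console.log(keywordTokens("Lion's Den")); // one token: Lion's Den
```

Under these assumptions, a query term such as Lion's only survives intact as a token under whitespace tokenization, since the simple approach splits at the apostrophe and the keyword approach keeps the full title.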

An index that uses the whitespace analyzer is case-sensitive. Therefore, Atlas Search matches the query term Lion's to the token Lion's that the whitespace analyzer created.
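The effect of case sensitivity can be illustrated with a small simulation. It assumes that the query text is tokenized the same way as the indexed text and that a document matches when any query token equals an indexed token exactly; matchesWhitespaceIndex is a hypothetical helper, not an Atlas Search API.

```javascript
// Hypothetical helper: true when any query token equals an indexed token exactly.
function matchesWhitespaceIndex(query, fieldValue) {
  // Same whitespace split for both the query and the field value (an assumption).
  const split = (text) => text.split(/\s+/).filter((t) => t.length > 0);
  const indexedTokens = new Set(split(fieldValue));
  return split(query).some((token) => indexedTokens.has(token));
}

console.log(matchesWhitespaceIndex("Lion's", "Lion's Den")); // true
console.log(matchesWhitespaceIndex("lion's", "Lion's Den")); // false: case differs
console.log(matchesWhitespaceIndex("Lion", "Lion's Den"));   // false: token is Lion's, not Lion
```

In this model, a lowercase query such as lion's finds nothing in these titles, because no indexed token has that exact form.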
