Simple Analyzer
The simple analyzer divides text into searchable terms (tokens) wherever it finds a non-letter character, such as whitespace, punctuation, or one or more digits. It converts all text to lowercase.
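The tokenization rule above can be approximated in a few lines of JavaScript. This is a minimal sketch, not the Atlas Search implementation. It assumes that "letter" means any Unicode letter (the `\p{L}` category) and that `toLowerCase()` is close enough to Lucene's lowercasing. The function name `simpleAnalyze` is hypothetical.

```javascript
// Hypothetical approximation of the simple analyzer (not the Atlas Search code).
// Splits on every run of non-letter characters, then lowercases each token.
function simpleAnalyze(text) {
  return text
    .split(/\P{L}+/u) // \P{L} matches any character that is not a Unicode letter
    .filter((t) => t.length > 0) // drop empty strings left by leading/trailing separators
    .map((t) => t.toLowerCase());
}

console.log(simpleAnalyze("The Lion King 1 1/2")); // [ 'the', 'lion', 'king' ]
console.log(simpleAnalyze("Lion's Den")); // [ 'lion', 's', 'den' ]
```

Digits and the slash in "1 1/2" are all non-letters, so they act only as separators and produce no tokens.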
Important
Atlas Search won't index string fields that exceed 32766 bytes in size.
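Because the limit is measured in bytes rather than characters, a string with many multi-byte characters reaches it sooner than its length suggests. The following is a minimal pre-check sketch, assuming the limit applies to the UTF-8 encoding of the string (BSON stores strings as UTF-8). The constant and function names are hypothetical.

```javascript
// Hypothetical pre-check; 32766 is the byte limit stated above.
const MAX_INDEXED_STRING_BYTES = 32766;

// TextEncoder always encodes to UTF-8, so the resulting length is the byte count.
function exceedsIndexLimit(value) {
  return new TextEncoder().encode(value).length > MAX_INDEXED_STRING_BYTES;
}

console.log(exceedsIndexLimit("a".repeat(32766))); // false: 32766 bytes
console.log(exceedsIndexLimit("\u00e9".repeat(16384))); // true: 16384 characters, 32768 bytes
```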
Example
The example in this section uses an index on the title field in the sample_mflix.movies collection that is configured with the simple analyzer. If you loaded the sample_mflix.movies collection on your cluster, you can create the example index using the Atlas UI Visual Editor or the JSON Editor. After you select your preferred configuration method, select the database and collection.
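A definition along the following lines could be entered in the JSON Editor. It is sketched as a JavaScript object that prints the JSON; the `dynamic: false` setting (index only the mapped title field) is an assumption, not necessarily the original example's choice. In a recent mongosh session connected to Atlas, an object like this could also be passed to `db.movies.createSearchIndex("default", definition)`, where the index name `default` is an assumption.

```javascript
// Assumed index definition: maps only the title field and analyzes it with lucene.simple.
const simpleTitleIndex = {
  mappings: {
    dynamic: false, // assumption: don't index fields other than those listed below
    fields: {
      title: {
        type: "string",
        analyzer: "lucene.simple",
      },
    },
  },
};

// Print JSON that could be pasted into the Atlas UI JSON Editor.
console.log(JSON.stringify(simpleTitleIndex, null, 2));
```

Because no separate search analyzer is set, query text for this field is expected to be analyzed with lucene.simple as well.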
The following query searches for the term lion in the title field and limits the output to five results.
    db.movies.aggregate([
      {
        "$search": {
          "text": {
            "query": "lion",
            "path": "title"
          }
        }
      },
      {
        "$limit": 5
      },
      {
        "$project": {
          "_id": 0,
          "title": 1
        }
      }
    ])
    [
      { title: 'White Lion' },
      { title: 'The Lion King' },
      { title: 'The Lion King 1 1/2' },
      { title: 'The Lion King 1 1/2' },
      { title: "Lion's Den" }
    ]
Atlas Search returns these documents by doing the following for the text in the title field using the lucene.simple analyzer:
Convert text to lowercase.
Create separate tokens by dividing text wherever there is a non-letter character.
The following table shows the tokens that Atlas Search creates using the Simple Analyzer and, by contrast, the Standard Analyzer and Whitespace Analyzer for the documents in the results:
| Title | Simple Analyzer Tokens | Standard Analyzer Tokens | Whitespace Analyzer Tokens |
|---|---|---|---|
| White Lion | white, lion | white, lion | White, Lion |
| The Lion King | the, lion, king | the, lion, king | The, Lion, King |
| The Lion King 1 1/2 | the, lion, king | the, lion, king, 1, 1, 2 | The, Lion, King, 1, 1/2 |
| Lion's Den | lion, s, den | lion's, den | Lion's, Den |
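The token lists in the table can be reproduced approximately with the rough stand-ins below. They are assumptions, not the Atlas Search implementations: the simple analyzer is modeled as "split on non-letters, then lowercase", the whitespace analyzer as "split on whitespace, keep case", and the standard analyzer as Unicode (UAX #29) word segmentation via `Intl.Segmenter`, keeping letter and number segments and lowercasing them. `Intl.Segmenter` requires a runtime that provides it, such as Node.js 16 or later.

```javascript
// Rough approximations of three Lucene analyzers; not the Atlas Search implementations.

// simple: split on runs of non-letters, then lowercase.
const simple = (text) =>
  text.split(/\P{L}+/u).filter((t) => t.length > 0).map((t) => t.toLowerCase());

// whitespace: split on whitespace only and keep the original case.
const whitespace = (text) => text.split(/\s+/).filter((t) => t.length > 0);

// standard (approximation): UAX #29 word boundaries, keep word-like segments
// (letters and numbers, not spaces or punctuation), then lowercase.
const segmenter = new Intl.Segmenter("en", { granularity: "word" });
const standard = (text) =>
  Array.from(segmenter.segment(text))
    .filter((s) => s.isWordLike)
    .map((s) => s.segment.toLowerCase());

for (const title of ["White Lion", "The Lion King", "The Lion King 1 1/2", "Lion's Den"]) {
  console.log(title, "|", simple(title), "|", standard(title), "|", whitespace(title));
}
```

Under UAX #29 an apostrophe between letters doesn't break a word, which is why the standard-style tokenizer keeps lion's together while the simple one splits it.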
Atlas Search returns the document Lion's Den in the results because the simple analyzer creates a separate token for lion, which matches the query term lion. By contrast, if you index the field using the Standard Analyzer or the Whitespace Analyzer, Atlas Search would return some of the other documents in the results for this query, but not Lion's Den, because these analyzers create the tokens lion's and Lion's respectively, and neither creates a token for lion.
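The reasoning above reduces to a set-membership check: a document matches when a token produced from the query text also appears among the field's tokens. The sketch below assumes the query text is analyzed with the same analyzer as the field, and it uses simplified stand-ins for the simple and whitespace analyzers. The helper names are hypothetical.

```javascript
// Simplified stand-ins for two analyzers (not the Atlas Search implementations).
const simple = (text) =>
  text.split(/\P{L}+/u).filter((t) => t.length > 0).map((t) => t.toLowerCase());
const whitespace = (text) => text.split(/\s+/).filter((t) => t.length > 0);

// A document matches if any query token appears among the field's tokens.
// Assumes the same analyzer is applied to the field value and to the query text.
function matches(analyze, fieldValue, queryText) {
  const fieldTokens = new Set(analyze(fieldValue));
  return analyze(queryText).some((token) => fieldTokens.has(token));
}

console.log(matches(simple, "Lion's Den", "lion")); // true: tokens lion, s, den
console.log(matches(whitespace, "Lion's Den", "lion")); // false: tokens Lion's, Den
```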