Simple Analyzer

The simple analyzer divides text into searchable terms (tokens) wherever it finds a non-letter character, such as whitespace, punctuation, or one or more digits. It also converts all text to lowercase.

Important

Atlas Search won't index string fields that exceed 32766 bytes in size.

The following example index definition specifies an index on the title field in the sample_mflix.movies collection using the simple analyzer. If you loaded the collection on your cluster, you can create the example index using the Atlas UI Visual Editor or the JSON Editor. After you select your preferred configuration method, select the database and collection.
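As a minimal sketch of what such an index definition might look like, the object below applies the lucene.simple analyzer to the title field with static mappings. The index name "default" and the variable name indexDefinition are assumptions made for illustration. The mongosh call in the comment assumes a deployment and shell version that support creating search indexes from the shell.

```javascript
// Sketch of an Atlas Search index definition that applies the simple
// analyzer to the title field. Only the title field is indexed because
// dynamic mapping is turned off.
const indexDefinition = {
  mappings: {
    dynamic: false,
    fields: {
      title: {
        type: "string",
        analyzer: "lucene.simple"
      }
    }
  }
};

// In mongosh, connected to the sample_mflix database, the definition could
// be applied like this (the index name "default" is an assumption):
//   db.movies.createSearchIndex("default", indexDefinition)
console.log(JSON.stringify(indexDefinition, null, 2));
```

The same JSON body can be pasted into the JSON Editor in the Atlas UI instead of being applied from the shell.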

The following query searches for the term lion in the title field and limits the output to five results.

db.movies.aggregate([
  {
    "$search": {
      "text": {
        "query": "lion",
        "path": "title"
      }
    }
  },
  {
    "$limit": 5
  },
  {
    "$project": {
      "_id": 0,
      "title": 1
    }
  }
])
[
  { title: 'White Lion' },
  { title: 'The Lion King' },
  { title: 'The Lion King 1 1/2' },
  { title: 'The Lion King 1 1/2' },
  { title: "Lion's Den" },
]

Atlas Search returns these documents by doing the following for the text in the title field using the lucene.simple analyzer:

  • Convert text to lowercase.

  • Create separate tokens by dividing text wherever there is a non-letter character.
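The two steps above can be approximated in a few lines of JavaScript. This is only an illustrative sketch, not the analyzer's actual implementation: the function name simpleTokens is hypothetical, and the Unicode letter class \p{L} is assumed to be a close enough stand-in for what the analyzer treats as a letter.

```javascript
// Approximates the simple analyzer: keep each run of letters as a token,
// then convert every token to lowercase. Digits, whitespace, and
// punctuation act as separators and are discarded.
function simpleTokens(text) {
  const letterRuns = text.match(/\p{L}+/gu) ?? [];
  return letterRuns.map((run) => run.toLowerCase());
}

const sampleTitles = [
  "White Lion",
  "The Lion King",
  "The Lion King 1 1/2",
  "Lion's Den"
];

for (const title of sampleTitles) {
  console.log(`${title} -> ${simpleTokens(title).join(", ")}`);
}
```

Note that the digits in The Lion King 1 1/2 produce no tokens at all, and the apostrophe in Lion's Den splits the word into lion and s.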

The following table shows the tokens that Atlas Search creates using the Simple Analyzer and, by contrast, the Standard Analyzer and Whitespace Analyzer for the documents in the results:

| Title | Simple Analyzer Tokens | Standard Analyzer Tokens | Whitespace Analyzer Tokens |
| --- | --- | --- | --- |
| White Lion | white, lion | white, lion | White, Lion |
| The Lion King | the, lion, king | the, lion, king | The, Lion, King |
| The Lion King 1 1/2 | the, lion, king | the, lion, king, 1, 1, 2 | The, Lion, King, 1, 1/2 |
| Lion's Den | lion, s, den | lion's, den | Lion's, Den |

Atlas Search returns the document Lion's Den in the results because the simple analyzer creates a separate token for lion, which matches the query term lion. By contrast, if you indexed the field using the Standard Analyzer or Whitespace Analyzer, Atlas Search would return some of the documents in the results for the query, but not Lion's Den, because these analyzers would create the tokens lion's and Lion's respectively and would not create a token for lion.
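The difference for Lion's Den can be illustrated with a rough sketch that compares two tokenization strategies and checks for an exact lion token. The helper names simpleTokens, whitespaceTokens, and hasToken are hypothetical, and exact token equality is a simplifying assumption about how the text query matches. The Standard Analyzer is left out because its word-boundary rules are more involved than a short regular expression can capture.

```javascript
// Simple-analyzer approximation: runs of letters, lowercased.
function simpleTokens(text) {
  const letterRuns = text.match(/\p{L}+/gu) ?? [];
  return letterRuns.map((run) => run.toLowerCase());
}

// Whitespace-analyzer approximation: split on whitespace only,
// leaving case and punctuation untouched.
function whitespaceTokens(text) {
  return text.split(/\s+/).filter((token) => token.length > 0);
}

// A term matches only if it equals one of the tokens exactly.
function hasToken(tokens, term) {
  return tokens.includes(term);
}

const denTitle = "Lion's Den";
console.log(simpleTokens(denTitle));                       // [ 'lion', 's', 'den' ]
console.log(whitespaceTokens(denTitle));                   // [ "Lion's", 'Den' ]
console.log(hasToken(simpleTokens(denTitle), "lion"));     // true
console.log(hasToken(whitespaceTokens(denTitle), "lion")); // false
```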
