
Language Analyzers

Language-specific analyzers provide a convenient way to create indexes tailored to a particular language. Each language analyzer has a built-in list of stop words and word-division rules based on that language’s usage patterns.
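To make "stop words" and "word divisions" concrete, here is a toy analyzer in JavaScript. It is a minimal sketch, not how Lucene implements lucene.french: real language analyzers typically also apply steps such as elision handling and stemming. The function name analyze and the five-word FRENCH_STOP_WORDS set are hypothetical.

```javascript
// Toy analyzer (NOT Lucene's implementation): lowercase the text, split it
// into tokens on any run of characters that are not letters or digits, and
// drop tokens that appear in the supplied stop-word set.
function analyze(text, stopWords = new Set()) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u) // word division: spaces, punctuation, apostrophes
    .filter((token) => token.length > 0 && !stopWords.has(token));
}

// Hypothetical, tiny subset of French stop words for illustration only;
// the built-in lucene.french list is much longer.
const FRENCH_STOP_WORDS = new Set(["d", "un", "les", "nos", "pour"]);

const sujet = "Mieux équiper nos voitures pour comprendre les causes d'un accident.";

// Without stop words: mieux, équiper, nos, voitures, pour, comprendre,
// les, causes, d, un, accident
console.log(analyze(sujet));

// With the toy French stop words: mieux, équiper, voitures, comprendre,
// causes, accident
console.log(analyze(sujet, FRENCH_STOP_WORDS));
```

Because this toy tokenizer splits on the apostrophe, d'un becomes the two tokens d and un, which is why both appear in the hypothetical stop-word set.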

Atlas Search offers the following language analyzers:

lucene.arabic lucene.armenian lucene.basque lucene.bengali lucene.brazilian
lucene.bulgarian lucene.catalan lucene.cjk [1] lucene.czech lucene.danish
lucene.dutch lucene.english lucene.finnish lucene.french lucene.galician
lucene.german lucene.greek lucene.hindi lucene.hungarian lucene.indonesian
lucene.irish lucene.italian lucene.latvian lucene.lithuanian lucene.norwegian
lucene.persian lucene.portuguese lucene.romanian lucene.russian lucene.sorani
lucene.spanish lucene.swedish lucene.thai lucene.turkish
[1] cjk means Chinese, Japanese, and Korean.

Example

The following index definition indexes the sujet field with the lucene.french analyzer:

{
  "mappings": {
    "fields": {
      "sujet": {
        "type": "string",
        "analyzer": "lucene.french"
      }
    }
  }
}
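If you index several fields or languages, you might prefer to generate this definition rather than write it by hand. The sketch below is one way to do that in JavaScript; the helper languageIndexDefinition and the LANGUAGE_ANALYZERS set are hypothetical, and the set mirrors only the analyzers listed above.

```javascript
// Hypothetical constant: the language analyzer names listed above.
// Atlas Search may support analyzers that are not in this set.
const LANGUAGE_ANALYZERS = new Set([
  "lucene.arabic", "lucene.armenian", "lucene.basque", "lucene.bengali",
  "lucene.brazilian", "lucene.bulgarian", "lucene.catalan", "lucene.cjk",
  "lucene.czech", "lucene.danish", "lucene.dutch", "lucene.english",
  "lucene.finnish", "lucene.french", "lucene.galician", "lucene.german",
  "lucene.greek", "lucene.hindi", "lucene.hungarian", "lucene.indonesian",
  "lucene.irish", "lucene.italian", "lucene.latvian", "lucene.lithuanian",
  "lucene.norwegian", "lucene.persian", "lucene.portuguese", "lucene.romanian",
  "lucene.russian", "lucene.sorani", "lucene.spanish", "lucene.swedish",
  "lucene.thai", "lucene.turkish",
]);

// Hypothetical helper: builds an index definition with the same shape as the
// example above, mapping one string field to one language analyzer.
function languageIndexDefinition(field, analyzer) {
  if (!LANGUAGE_ANALYZERS.has(analyzer)) {
    throw new Error(`Not a listed language analyzer: ${analyzer}`);
  }
  return {
    mappings: {
      fields: {
        [field]: { type: "string", analyzer },
      },
    },
  };
}

// Prints the same definition as the JSON example for the sujet field.
console.log(JSON.stringify(languageIndexDefinition("sujet", "lucene.french"), null, 2));
```

In mongosh, an object like this could be passed as the definition argument of db.voitures.createSearchIndex(), assuming your deployment supports that method; otherwise the printed JSON can be pasted into the Atlas UI index editor.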

Consider a collection named voitures with the following documents:

{ "_id": 1, "sujet": "Mieux équiper nos voitures pour comprendre les causes d'un accident." }
{ "_id": 2, "sujet": "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." }

The following query searches the sujet field for the term pour:

db.voitures.aggregate([
  {
    $search: {
      "text": {
        "query": "pour",
        "path": "sujet"
      }
    }
  }
])

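This query returns no results with the lucene.french analyzer because pour is a built-in French stop word, so it is dropped from both the indexed text and the query. With the lucene.standard analyzer, the same query would return both documents, since both sujet values contain pour.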
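The stop-word effect can be modeled with a toy search over the two sample documents. This is a hypothetical sketch, not Atlas Search internals: the analyze and search helpers and the stop-word subset are illustrative assumptions, a document counts as a match if it shares at least one token with the analyzed query, and an empty stop-word set roughly stands in for an analyzer that keeps stop words.

```javascript
// The two sample documents from the voitures collection.
const docs = [
  { _id: 1, sujet: "Mieux équiper nos voitures pour comprendre les causes d'un accident." },
  { _id: 2, sujet: "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." },
];

// Hypothetical, tiny subset of French stop words for illustration only.
const FRENCH_STOP_WORDS = new Set(["pour", "le", "de", "que", "vous", "nos", "les", "d", "un", "c"]);

// Toy analyzer: lowercase, split on non-letters/non-digits, drop stop words.
function analyze(text, stopWords) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((token) => token.length > 0 && !stopWords.has(token));
}

// The same analyzer is applied to documents and to the query. A query that
// analyzes to zero tokens cannot share a token with any document.
function search(query, stopWords) {
  const queryTokens = analyze(query, stopWords);
  return docs
    .filter((doc) => {
      const docTokens = new Set(analyze(doc.sujet, stopWords));
      return queryTokens.some((token) => docTokens.has(token));
    })
    .map((doc) => doc._id);
}

console.log(search("pour", FRENCH_STOP_WORDS)); // no matching _id values
console.log(search("pour", new Set()));         // _id values 1 and 2
```

Because the query itself is analyzed, pour disappears before any lookup happens, which is why no document can match.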

The following query searches for the string carburant in the sujet field:

db.voitures.aggregate([
  {
    $search: {
      "text": {
        "query": "carburant",
        "path": "sujet"
      }
    }
  }
])

This query returns the document with "_id": 2, the only document whose sujet field contains carburant:

{ "_id": 2, "sujet": "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." }