Language Analyzers

Language-specific analyzers provide a convenient way to create indexes tailored to a particular language. Each language analyzer has a built-in list of stop words and divides text into words according to that language’s usage patterns.

Full Text Search offers analyzers for the following languages:

Arabic      Armenian    Basque      Bengali     Brazilian
Bulgarian   Catalan     CJK         Czech       Danish
Dutch       English     Finnish     French      Galician
German      Greek       Hindi       Hungarian   Indonesian
Irish       Italian     Latvian     Lithuanian  Norwegian
Persian     Portuguese  Romanian    Russian     Sorani
Spanish     Swedish     Turkish     Thai

Example

The following example index definition indexes the sujet field using the lucene.french analyzer:

{
  "mappings": {
    "fields": {
      "sujet": {
        "type": "string",
        "analyzer": "lucene.french"
      }
    }
  }
}

Consider a collection named voitures with the following documents:

{ "_id": 1, "sujet": "Mieux équiper nos voitures pour comprendre les causes d'un accident." }
{ "_id": 2, "sujet": "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." }

The following query uses the index on the sujet field:

db.voitures.aggregate([
  {
    $searchBeta: {
      "term": {
        "query": "pour",
        "path": "sujet"
      }
    }
  }
])

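The above query returns no results when using the french analyzer, because pour is a built-in stop word: the analyzer drops it from the text, so it is never indexed as a searchable term. Using the standard analyzer, the same query would return both documents, since both contain pour.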

The following query searches for the string carburant in the sujet field:

db.voitures.aggregate([
  {
    $searchBeta: {
      "term": {
        "query": "carburant",
        "path": "sujet"
      }
    }
  }
])

The above query returns the document with "_id": 2, the only document in the collection that contains carburant:

{ "_id": 2, "sujet": "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." }
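
The effect of stop words on the two queries above can be illustrated with a small standalone JavaScript sketch. It is a toy model only, not how the french analyzer is implemented: the analyze, matches, and hits functions are hypothetical helpers, FRENCH_STOP_WORDS is a short illustrative subset rather than the analyzer's real list, and other language-specific processing such as stemming or elision handling is ignored.

```javascript
// Toy model of an analyzer: lowercase the text, split it into words,
// and drop stop words. This is an illustration, not the real analyzer.

// Hypothetical subset of French stop words (assumption: the real
// built-in list is longer; the source only confirms that it contains "pour").
const FRENCH_STOP_WORDS = new Set(["le", "les", "de", "un", "nos", "pour", "que", "vous"]);

// Return the searchable terms that remain after analysis.
function analyze(text, stopWords = new Set()) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{M}\p{N}]+/u) // split on anything that is not a letter, mark, or digit
    .filter((token) => token.length > 0 && !stopWords.has(token));
}

// A document matches when the term survives analysis of its sujet field.
function matches(doc, term, stopWords) {
  return analyze(doc.sujet, stopWords).includes(term);
}

const docs = [
  { _id: 1, sujet: "Mieux équiper nos voitures pour comprendre les causes d'un accident." },
  { _id: 2, sujet: "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." }
];

// Return the _id of every document that matches the term.
function hits(term, stopWords) {
  return docs.filter((doc) => matches(doc, term, stopWords)).map((doc) => doc._id);
}

console.log(hits("pour", FRENCH_STOP_WORDS));      // [] : "pour" was dropped as a stop word
console.log(hits("pour", new Set()));              // [ 1, 2 ] : no stop words, both documents match
console.log(hits("carburant", FRENCH_STOP_WORDS)); // [ 2 ]
```

Passing an empty stop word set stands in for an analyzer that keeps every word, which is why the second call finds pour in both documents.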