Language Analyzers

Language-specific analyzers provide a convenient way to create indexes tailored to a particular language. Each language analyzer has built-in stop words and word divisions based on that language's usage patterns.

Atlas Search offers the following language analyzers:

lucene.arabic
lucene.armenian
lucene.basque
lucene.bengali
lucene.brazilian
lucene.bulgarian
lucene.catalan
lucene.chinese
lucene.cjk (a generic Chinese, Japanese, and Korean analyzer)
lucene.czech
lucene.danish
lucene.dutch
lucene.english
lucene.finnish
lucene.french
lucene.galician
lucene.german
lucene.greek
lucene.hindi
lucene.hungarian
lucene.indonesian
lucene.irish
lucene.italian
lucene.japanese
lucene.korean
lucene.kuromoji (a Japanese analyzer)
lucene.latvian
lucene.lithuanian
lucene.morfologik (a Polish analyzer)
lucene.nori (a Korean analyzer)
lucene.norwegian
lucene.persian
lucene.portuguese
lucene.romanian
lucene.russian
lucene.smartcn (a Chinese analyzer)
lucene.sorani
lucene.spanish
lucene.swedish
lucene.thai
lucene.turkish
lucene.ukrainian

The following example index definition specifies an index on the sujet field using the french analyzer:

{
  "mappings": {
    "fields": {
      "sujet": {
        "type": "string",
        "analyzer": "lucene.french"
      }
    }
  }
}
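The same definition shape applies to any analyzer in the list above; only the field name and analyzer name change. As a minimal sketch, the hypothetical helper below (buildAnalyzerIndexDefinition is an illustrative name, not part of Atlas Search) builds such a definition as a plain JavaScript object:

```javascript
// Hypothetical helper (not part of Atlas Search): builds an index definition
// that applies one analyzer to one string field.
function buildAnalyzerIndexDefinition(fieldName, analyzerName) {
  return {
    mappings: {
      fields: {
        // Computed key so the same helper works for any field name.
        [fieldName]: { type: "string", analyzer: analyzerName },
      },
    },
  };
}

const definition = buildAnalyzerIndexDefinition("sujet", "lucene.french");
console.log(JSON.stringify(definition, null, 2));

// Assumption: in a mongosh version that provides createSearchIndex, the object
// could then be passed along these lines (the index name "default" is illustrative):
//   db.<collection>.createSearchIndex("default", definition);
```

The printed object should have the same structure as the JSON definition above. Building it in code is mainly useful when the same mapping has to be generated for several fields or languages.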

Consider a collection named voitures with the following documents:

{ "_id": 1, "sujet": "Mieux équiper nos voitures pour comprendre les causes d'un accident." }
{ "_id": 2, "sujet": "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." }

The following query uses the index on the sujet field:

db.voitures.aggregate([
  {
    $search: {
      "text": {
        "query": "pour",
        "path": "sujet"
      }
    }
  }
])

The above query returns no results when using the french analyzer, because pour is one of its built-in stop words and is therefore removed from both the indexed text and the query. Using the standard analyzer, the same query would return both documents.
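The effect of stop word removal can be illustrated with a toy analyzer in plain JavaScript. This is only a simplified sketch, not how Lucene's french analyzer is implemented: the stop word set below is a small assumed subset, and real analysis also handles elisions and stemming. The names FRENCH_STOP_WORDS, tokenize, and analyzeWithStopWords are hypothetical.

```javascript
// Illustrative only: a tiny assumed subset of French stop words,
// NOT the list that the lucene.french analyzer actually uses.
const FRENCH_STOP_WORDS = new Set([
  "le", "la", "les", "de", "un", "une", "pour", "que", "nos", "vous",
]);

// Lowercase the text and split it into runs of letters and digits.
function tokenize(text) {
  return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? [];
}

// Simplified stand-in for a language analyzer: tokenize, then drop stop words.
function analyzeWithStopWords(text) {
  return tokenize(text).filter((token) => !FRENCH_STOP_WORDS.has(token));
}

const sujet =
  "Mieux équiper nos voitures pour comprendre les causes d'un accident.";

// Without stop word removal, "pour" survives as a token and can be matched.
console.log(tokenize(sujet).includes("pour")); // true

// With stop word removal, "pour" disappears from the analyzed text...
console.log(analyzeWithStopWords(sujet).includes("pour")); // false

// ...and from the query itself, so no term is left to match.
console.log(analyzeWithStopWords("pour")); // []
```

In this sketch, plain tokenization stands in for an analyzer without stop words, which is why the term still matches in that case.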

The following query searches for the string carburant in the sujet field:

db.voitures.aggregate([
  {
    $search: {
      "text": {
        "query": "carburant",
        "path": "sujet"
      }
    }
  }
])

The above query returns the document with "_id": 2 from the collection:

{ "_id": 2, "sujet": "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." }