Tutorial: Create and Query an FTS Index

Special Offer

Want to try out Full Text Search on a cluster of your own? Use credit activation code MONGODB4DOT2 for $200 of Atlas credit. For information on redeeming Atlas credit, see Atlas Billing.

This tutorial walks you through creating and querying an Atlas Full Text Search index. You will use a collection of movie data from the Atlas sample dataset.

Prerequisites

To complete this tutorial you will need:

  • An Atlas cluster of size M30 or larger.
  • MongoDB version 4.2 or higher.
  • The mongo shell on your local machine.

Procedure

1
2

Click the button for your cluster.

Locate the button next to the Collections button and click to reveal the dropdown menu.

3

Select Load Sample Dataset from the dropdown menu.

4

Click the Collections button for your cluster.

5

Select the sample_mflix database.

Locate the sample_mflix database in the list of namespaces and click it to reveal its collections.

6

Select the movies collection.

Select movies from the list of collections under the sample_mflix database.

7

Select the Full Text Search tab on the right-side panel.

8

Click the button labelled Create a FTS Index.

9

Enter an index definition in the modal window.

The movies collection is large, so to save space this tutorial indexes only the title, genres, and plot fields.

Replace the default definition with the following:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": {
        "type": "string",
        "analyzer": "lucene.standard",
        "multi": {
          "keywordAnalyzer": {
            "type": "string",
            "analyzer": "lucene.keyword"
          }
        }
      },
      "genres": {
        "type": "string",
        "analyzer": "lucene.standard"
      },
      "plot": {
        "type": "string",
        "analyzer": "lucene.standard"
      }
    }
  }
}

The above index definition specifies the standard analyzer as the default analyzer for all three indexed fields. It also specifies the keyword analyzer as an alternate analyzer for the title field, with the name keywordAnalyzer. The keyword analyzer indexes the entire field as a single term, so it only returns results if the search term and the specified field match exactly.

For more information about static and dynamic field mappings, see index definitions. For more information about multi analyzer designations, see Path Construction.
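To make the difference between the two analyzers concrete, the following plain JavaScript sketch approximates how each one turns a title into index terms. It is an illustration only: standardTerms and keywordTerms are hypothetical helper names, and the real Lucene analyzers apply more elaborate rules than this simple split.

```javascript
// Rough approximation of lucene.standard: lowercase the text and split
// it into word terms. (Assumption: the real analyzer follows Unicode
// text segmentation rules and handles many more cases.)
function standardTerms(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Rough approximation of lucene.keyword: the whole field value is kept
// as a single term, unchanged.
function keywordTerms(text) {
  return [text];
}

const title = "The Count of Monte Cristo";
console.log(standardTerms(title)); // [ 'the', 'count', 'of', 'monte', 'cristo' ]
console.log(keywordTerms(title));  // [ 'The Count of Monte Cristo' ]
```

Because the keyword analyzer stores one term per field value, a query has to match the entire title to produce a hit, while the standard analyzer can match on any individual word.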

Leave the index name set to its default value, default, and click the Create Index button.

Note

If you use the name default for your index you do not need to specify it by name when querying against it. If you use a custom name you must add the "index": <index-name> parameter to all $searchBeta queries.
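As a sketch of the custom-name case, the following snippet builds a pipeline that targets a named index. The index name moviesTitlePlot is hypothetical, and placing "index" at the top level of the $searchBeta stage is an assumption to confirm against the $searchBeta reference page.

```javascript
// Hypothetical custom index name; replace it with the name you chose.
const indexName = "moviesTitlePlot";

const pipeline = [
  {
    $searchBeta: {
      // Assumption: "index" sits alongside the operator in the stage.
      index: indexName,
      search: { query: "baseball", path: "plot" }
    }
  },
  { $limit: 5 }
];

// In the mongo shell, pass the pipeline to aggregate():
//   db.movies.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```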

10

Wait for the index to finish building.

The index should take about one minute to build. While it is building, the Status column reads Build in Progress. When it is finished building, the Status column reads Active.

11

Connect to your cluster in the mongo shell.

Open the mongo shell in a terminal window and connect to your cluster. For detailed instructions on connecting, see Connect to a Cluster.

12

Use the sample_mflix database.

Run the following command at the mongo shell prompt:

use sample_mflix

13

Run your first Full Text Search query.

The following query searches for the word baseball in the plot field. It includes a $limit stage to limit the output to 5 results and a $project stage to exclude all fields except title and plot.

db.movies.aggregate([
  {
    $searchBeta: {
      "search": {
        "query": "baseball",
        "path": "plot"
      }
    }
  },
  {
    $limit: 5
  },
  {
    $project: {
      "_id": 0,
      "title": 1,
      "plot": 1
    }
  }
])

The above query returns the following results:

{ "plot" : "A trio of guys try and make up for missed opportunities
   in childhood by forming a three-player baseball team to compete
   against standard children baseball squads.", "title" : "The Benchwarmers" }
{ "plot" : "A young boy is bequeathed the ownership of a professional
   baseball team.", "title" : "Little Big League" }
{ "plot" : "A trained chimpanzee plays third base for a minor-league
   baseball team.", "title" : "Ed" }
{ "plot" : "The story of the life and career of the famed baseball
   player, Lou Gehrig.", "title" : "The Pride of the Yankees" }
{ "plot" : "Babe Ruth becomes a baseball legend but is unheroic to those
   who know him.", "title" : "The Babe" }

For more information about the $searchBeta pipeline stage, see its reference page. For complete aggregation pipeline documentation, see the MongoDB Server Manual.

14

Run a more complex query.

$searchBeta has several operators for constructing different types of queries. The following query uses the compound operator to combine several operators into a single query. It has the following search criteria:

  • The plot field must contain either Hawaii or Alaska.
  • The plot field must contain a four-digit number, such as a year.
  • The genres field must not contain either Comedy or Romance.
  • The title field must not contain Beach or Snow.

db.movies.aggregate([
  {
    $searchBeta: {
      "compound": {
        "must": [ {
          "search": {
            "query": ["Hawaii", "Alaska"],
            "path": "plot"
          }
        },
        {
          "term": {
            "query": "([0-9]{4})",
            "regex": true,
            "path": "plot"
          }
        } ],
        "mustNot": [ {
          "search": {
            "query": ["Comedy", "Romance"],
            "path": "genres"
          }
        },
        {
          "term": {
            "query": ["Beach", "Snow"],
            "path": "title"
          }
        } ]
      }
    }
  },
  {
    $project: {
      "title": 1,
      "plot": 1,
      "genres": 1,
      "_id": 0
    }
  }
])

The above query returns the following results:

{ "plot" : "A modern aircraft carrier is thrown back in time to 1941
  near Hawaii, just hours before the Japanese attack on Pearl Harbor.",
  "genres" : [ "Action", "Sci-Fi" ], "title" : "The Final Countdown" }
{ "plot" : "Follows John McCain's 2008 presidential campaign, from his
  selection of Alaska Governor Sarah Palin as his running mate to
  their ultimate defeat in the general election.",
  "genres" : [ "Biography", "Drama", "History" ], "title" : "Game Change" }
{ "plot" : "A devastating and heartrending take on grizzly bear
  activists Timothy Treadwell and Amie Huguenard, who were killed in
  October of 2003 while living among grizzlies in Alaska.",
  "genres" : [ "Documentary", "Biography" ], "title" : "Grizzly Man" }
{ "plot" : "Truman Korovin is a lonely, sharp-witted cab driver in
  Fairbanks, Alaska, 1980. The usual routine of picking up fares and
  spending his nights at his favorite bar, the Boatel, is
  disrupted ...", "genres" : [ "Drama" ], "title" : "Chronic Town" }
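
The way must and mustNot combine can be sketched in plain JavaScript over a few in-memory documents. This is a simplified model, not how Atlas evaluates the query: the helpers terms, containsAny, and matchesCompound are hypothetical, matching is reduced to case-insensitive word lookup, and three of the four sample documents are invented for illustration.

```javascript
// Split text into lowercase word terms (a rough stand-in for analysis).
const terms = (text) => text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// True if the field value (string or array of strings) contains any word.
function containsAny(value, words) {
  const fieldTerms = new Set([].concat(value).flatMap(terms));
  return words.some((w) => fieldTerms.has(w.toLowerCase()));
}

// Mirrors the four criteria: every "must" clause has to hold and
// no "mustNot" clause may hold.
function matchesCompound(doc) {
  const must = [
    containsAny(doc.plot, ["Hawaii", "Alaska"]),
    /[0-9]{4}/.test(doc.plot)
  ];
  const mustNot = [
    containsAny(doc.genres, ["Comedy", "Romance"]),
    containsAny(doc.title, ["Beach", "Snow"])
  ];
  return must.every(Boolean) && !mustNot.some(Boolean);
}

const docs = [
  // Taken from the results above (plot shortened).
  { title: "The Final Countdown", genres: ["Action", "Sci-Fi"],
    plot: "A modern aircraft carrier is thrown back in time to 1941 near Hawaii." },
  // Hypothetical documents, invented for illustration.
  { title: "Island Laughs", genres: ["Comedy"], plot: "A 1999 road trip across Hawaii." },
  { title: "Snow Line", genres: ["Drama"], plot: "Alaska, 1980: a rescue team sets out." },
  { title: "Northern Quiet", genres: ["Drama"], plot: "A family moves to Alaska." }
];

console.log(docs.filter(matchesCompound).map((d) => d.title));
// [ 'The Final Countdown' ]
```

The second document is excluded by its genre, the third by its title, and the fourth because its plot has no four-digit number.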

15

Run a query using a multi analyzer.

In the index definition you created in step 9, you specified that the title field should be able to use either the standard analyzer or the keyword analyzer for queries. The following query uses the alternate analyzer, named keywordAnalyzer, to search for exact matches on the string The Count of Monte Cristo.

db.movies.aggregate([
  {
    $searchBeta: {
      "search": {
        "query": "The Count of Monte Cristo",
        "path": { "value": "title", "multi": "keywordAnalyzer" }
      }
    }
  },
  {
    $project: {
      "title": 1,
      "year": 1,
      "_id": 0
    }
  }
])

The above query returns the following results:

{ "title" : "The Count of Monte Cristo", "year" : 1934 }
{ "title" : "The Count of Monte Cristo", "year" : 1954 }
{ "title" : "The Count of Monte Cristo", "year" : 1998 }

By contrast, the same query using the standard analyzer would find all the movies with the word Count or Monte or Cristo in the title.
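For comparison, the standard-analyzer version of the query can be sketched by passing the plain field name as the path instead of the document with "value" and "multi". The helper titleSearchStage is a hypothetical name used only for this illustration.

```javascript
// Build a $searchBeta stage that searches the title field, optionally
// through an alternate analyzer defined under "multi" in the index.
function titleSearchStage(query, multiName) {
  const path = multiName ? { value: "title", multi: multiName } : "title";
  return { $searchBeta: { search: { query: query, path: path } } };
}

const project = { $project: { title: 1, year: 1, _id: 0 } };

// Exact-match query from this step (keyword analyzer).
const exactPipeline = [
  titleSearchStage("The Count of Monte Cristo", "keywordAnalyzer"),
  project
];

// Same search terms against the default (standard) analyzer.
const standardPipeline = [
  titleSearchStage("The Count of Monte Cristo"),
  project
];

// In the mongo shell: db.movies.aggregate(standardPipeline)
console.log(JSON.stringify(standardPipeline[0]));
```

Running both pipelines side by side in the mongo shell is a quick way to see how much broader the standard-analyzer result set is.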

Summary

In this tutorial, you loaded a sample dataset into your Atlas cluster, created a Full Text Search index, and ran some example queries against it. More examples can be found throughout the Full Text Search documentation.