Tutorial: Create and Query an Atlas Search Index¶
This tutorial takes you through the steps of setting up and querying an Atlas Search index. You will use a collection with movie data from the Atlas sample data set.
Prerequisites¶
To complete this tutorial you will need:
- An Atlas cluster.
- MongoDB version 4.2 or higher.
- The mongo shell on your local machine.
Procedure¶
Navigate to your Atlas cluster.¶
In the Atlas UI, navigate to the Clusters page for your project.
Click the ... button for your cluster.¶
Locate the ... button next to the Collections button and click to reveal the dropdown menu.
Select Load Sample Dataset from the dropdown menu.¶
The sample dataset takes a few minutes to load. When it finishes, proceed to the next step.
Click the Collections button for your cluster.¶
Click the Search tab in the right-side panel.¶
Click the button labelled Create Search Index.¶
Select a Configuration Method and click Next.¶
- For a guided experience, select Visual Editor.
- To edit the raw index definition, select JSON Editor.
Specify the Database, Collection, and Index Name.¶
- In the Database field, specify sample_mflix.
- In the Collection field, specify movies.
- In the Index Name field, specify default.
If you use the name default for your index, you do not need to specify it by name when querying against it. If you use a custom name, you must add the "index": <index-name> parameter to all $search queries.
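As a minimal sketch of the note above, the following mongo shell snippet builds a pipeline that targets a non-default index. The index name movies_custom is hypothetical and used only for illustration; the text operator, query, and path are placeholders borrowed from the queries later in this tutorial.

```javascript
// Hypothetical custom index name; replace it with the name you chose.
const customIndexName = "movies_custom";

// With a non-default index name, the "index" field goes inside $search,
// next to the operator (here, text).
const pipeline = [
  {
    $search: {
      index: customIndexName,
      text: { query: "baseball", path: "plot" }
    }
  },
  { $limit: 5 }
];

// In the mongo shell you would then run: db.movies.aggregate(pipeline)
```

If the index is named default, the index field can simply be left out of the $search stage.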
Specify an index definition.¶
The movies collection is large, so to save space we will index only the title, genres, and plot fields.
- Click Next.
- Click Refine Your Index.
- Change Dynamic Mapping to Off.
Add the following fields:

| Field Name | Dynamic Mapping | Data Type Configuration |
| --- | --- | --- |
| genres | Change Enable Dynamic Mapping to Off. | Click Add Data Type, and select String. |
| plot | Change Enable Dynamic Mapping to Off. | Click Add Data Type, and select String. |
| title | Change Enable Dynamic Mapping to Off. | (1) Click Add Data Type, and select String. (2) Click Add Data Type, and select Multi. (3) Specify keywordAnalyzer as the name of the Multi analyzer. (4) Change Index Analyzer to lucene.keyword. |
- Click Save Changes.
The above index definition specifies the standard analyzer as the default analyzer for all three indexed fields. It also specifies the keyword analyzer as an alternate analyzer for the title field, with the name keywordAnalyzer. The keyword analyzer indexes the entire field as a single term, so it only returns results if the search term and the specified field match exactly.
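To make the difference between the two analyzers concrete, here is a rough sketch in plain JavaScript. The functions standardTerms and keywordTerms are hypothetical stand-ins written for this illustration; they only approximate what the real Lucene analyzers do (the actual standard analyzer uses more sophisticated word-boundary rules).

```javascript
// Rough stand-in for the standard analyzer (an assumption, not Lucene itself):
// lowercase the text and split it into word-like terms.
function standardTerms(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Rough stand-in for the keyword analyzer: the whole value is a single term,
// with case and spacing preserved.
function keywordTerms(text) {
  return [text];
}

const title = "The Count of Monte Cristo";
const standard = standardTerms(title); // ["the", "count", "of", "monte", "cristo"]
const keyword = keywordTerms(title);   // ["The Count of Monte Cristo"]
```

Under this simplified model, a query matches on the standard terms if it shares any word with the title, while a match on the keyword term requires the full string to be identical.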
The index definition also uses the standard analyzer by default for queries on the genres field, which is an array of strings. To index arrays, Atlas Search requires only the data type of the array elements. You don't have to specify in the index definition that the data is contained in an array.
For more information about static and dynamic field mappings, see index definitions. For more information about multi analyzer designations, see Path Construction.
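For readers who selected JSON Editor instead of Visual Editor, the Visual Editor settings described in this step should correspond roughly to the definition below. It is written as a JavaScript object so it can be inspected in a shell; the variable name indexDefinition is an assumption made for this sketch, and the JSON Editor would take only the object literal itself.

```javascript
// Sketch of the index definition configured in this step.
// Dynamic mapping is off, so only the three listed fields are indexed.
const indexDefinition = {
  "mappings": {
    "dynamic": false,
    "fields": {
      // genres is an array of strings, but only the element type is declared.
      "genres": { "type": "string" },
      "plot": { "type": "string" },
      "title": {
        "type": "string",
        // Alternate analyzer for title, addressed by the name keywordAnalyzer.
        "multi": {
          "keywordAnalyzer": {
            "type": "string",
            "analyzer": "lucene.keyword"
          }
        }
      }
    }
  }
};
```

No analyzer is named for genres, plot, or the primary title mapping, which is why those fields fall back to the default standard analyzer.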
Click Create Search Index.¶
Close the You're All Set! Modal Window.¶
A modal window appears to let you know your index is building. Click the Close button.
Wait for the index to finish building.¶
The index should take about one minute to build. While it is building, the Status column reads Build in Progress. When it is finished building, the Status column reads Active.
Connect to your cluster in the mongo shell.¶
Open the mongo shell in a terminal window and connect to your cluster. For detailed instructions on connecting, see Connect to a Cluster.
Use the sample_mflix database.¶
Run the following command at the mongo shell prompt:
use sample_mflix
Run your first Atlas Search query.¶
The following query searches for the word baseball in the plot field. It includes a $limit stage to limit the output to 5 results and a $project stage to exclude all fields except title and plot.
db.movies.aggregate([
  {
    $search: {
      "text": {
        "query": "baseball",
        "path": "plot"
      }
    }
  },
  { $limit: 5 },
  { $project: { "_id": 0, "title": 1, "plot": 1 } }
])
The above query returns the following results:
{ "plot" : "A trio of guys try and make up for missed opportunities in childhood by forming a three-player baseball team to compete against standard children baseball squads.", "title" : "The Benchwarmers" }
{ "plot" : "A young boy is bequeathed the ownership of a professional baseball team.", "title" : "Little Big League" }
{ "plot" : "A trained chimpanzee plays third base for a minor-league baseball team.", "title" : "Ed" }
{ "plot" : "The story of the life and career of the famed baseball player, Lou Gehrig.", "title" : "The Pride of the Yankees" }
{ "plot" : "Babe Ruth becomes a baseball legend but is unheroic to those who know him.", "title" : "The Babe" }
For more information about the $search pipeline stage, see its reference page. For complete aggregation pipeline documentation, see the MongoDB Server Manual.
Run a more complex query.¶
$search has several operators for constructing different types of queries. The following query uses the compound operator to combine several operators into a single query. It has the following search criteria:
- The plot field must contain either Hawaii or Alaska.
- The plot field must contain a four-digit number, such as a year.
- The genres field must not contain either Comedy or Romance.
- The title field must not contain Beach or Snow.
db.movies.aggregate([
  {
    $search: {
      "compound": {
        "must": [
          { "text": { "query": ["Hawaii", "Alaska"], "path": "plot" } },
          { "regex": { "query": "([0-9]{4})", "path": "plot", "allowAnalyzedField": true } }
        ],
        "mustNot": [
          { "text": { "query": ["Comedy", "Romance"], "path": "genres" } },
          { "text": { "query": ["Beach", "Snow"], "path": "title" } }
        ]
      }
    }
  },
  { $project: { "title": 1, "plot": 1, "genres": 1, "_id": 0 } }
])
The above query returns the following results:
{ "plot" : "A modern aircraft carrier is thrown back in time to 1941 near Hawaii, just hours before the Japanese attack on Pearl Harbor.", "genres" : [ "Action", "Sci-Fi" ], "title" : "The Final Countdown" }
{ "plot" : "Follows John McCain's 2008 presidential campaign, from his selection of Alaska Governor Sarah Palin as his running mate to their ultimate defeat in the general election.", "genres" : [ "Biography", "Drama", "History" ], "title" : "Game Change" }
{ "plot" : "A devastating and heartrending take on grizzly bear activists Timothy Treadwell and Amie Huguenard, who were killed in October of 2003 while living among grizzlies in Alaska.", "genres" : [ "Documentary", "Biography" ], "title" : "Grizzly Man" }
{ "plot" : "Truman Korovin is a lonely, sharp-witted cab driver in Fairbanks, Alaska, 1980. The usual routine of picking up fares and spending his nights at his favorite bar, the Boatel, is disrupted ...", "genres" : [ "Drama" ], "title" : "Chronic Town" }
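One way to read the regex clause in this query: the sketch below assumes that the regular expression is compared against whole indexed terms rather than substrings of the plot, which is how Lucene-style term-level regex matching is generally described. The standardTerms function is a hypothetical, simplified stand-in for the standard analyzer, and the anchored JavaScript pattern only imitates that whole-term behavior.

```javascript
// Simplified stand-in for the standard analyzer (an assumption, not Lucene itself).
function standardTerms(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Anchor the pattern so it has to match an entire term, imitating
// term-level regex matching on an analyzed field.
const fourDigits = /^([0-9]{4})$/;

const plot = "A modern aircraft carrier is thrown back in time to 1941 near Hawaii, " +
  "just hours before the Japanese attack on Pearl Harbor.";

// Keep only the terms that consist of exactly four digits.
const matchingTerms = standardTerms(plot).filter(term => fourDigits.test(term));
// matchingTerms -> ["1941"]
```

Under these assumptions, the plot of The Final Countdown satisfies the regex clause because the term 1941 matches the pattern in full.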
Run a query using a multi analyzer.¶
In the index definition you created earlier (step 9, Specify an index definition), you specified that the title field can use either the standard analyzer or the keyword analyzer for queries. The following query uses the alternate analyzer, named keywordAnalyzer, to search for exact matches on the string The Count of Monte Cristo.
db.movies.aggregate([
  {
    $search: {
      "text": {
        "query": "The Count of Monte Cristo",
        "path": { "value": "title", "multi": "keywordAnalyzer" }
      }
    }
  },
  { $project: { "title": 1, "year": 1, "_id": 0 } }
])
The above query returns the following results:
{ "title" : "The Count of Monte Cristo", "year" : 1934 }
{ "title" : "The Count of Monte Cristo", "year" : 1954 }
{ "title" : "The Count of Monte Cristo", "year" : 1998 }
By contrast, the same query using the standard analyzer would find all the movies with the word Count or Monte or Cristo in the title.
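For comparison, a standard-analyzer version of the query might look like the sketch below. The only change is that path is the plain string "title" rather than an object naming the multi analyzer; the variable name standardPipeline is an assumption made for this example, and no result set is shown because it depends on the sample data.

```javascript
// Same text query, but with a plain string path, so the field's default
// (standard) analyzer is used instead of the keywordAnalyzer multi.
const standardPipeline = [
  {
    $search: {
      text: { query: "The Count of Monte Cristo", path: "title" }
    }
  },
  { $project: { title: 1, year: 1, _id: 0 } }
];

// In the mongo shell you would then run: db.movies.aggregate(standardPipeline)
```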
Summary¶
In this tutorial, you loaded a sample dataset into your Atlas cluster, created an Atlas Search index, and ran some example queries against it. More examples can be found throughout the Atlas Search documentation.