Configure Online Archive

Beta

Online archive is available as a Beta feature. The feature and the corresponding documentation may change at any time during the Beta stage.

Overview

You can configure data in a collection to be archived by specifying an archiving rule. The archiving rule combines a date, which determines when to archive data, and a number of days, which determines how long to keep the data on the Atlas cluster. Data is archived when the current date is later than the specified date plus the number of days.
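The rule above can be sketched as a single date comparison. This is only an illustration of the arithmetic, not how Atlas evaluates the rule internally; the function name and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_due_for_archiving(date_value: datetime, age_limit_days: int, now: datetime) -> bool:
    """Return True when `now` is later than the document's date plus the age limit."""
    return now > date_value + timedelta(days=age_limit_days)


# Hypothetical values: a document dated 1 January 2020, checked on 10 January 2020.
released = datetime(2020, 1, 1, tzinfo=timezone.utc)
now = datetime(2020, 1, 10, tzinfo=timezone.utc)

print(is_due_for_archiving(released, 5, now))   # True: 6 January has passed
print(is_due_for_archiving(released, 30, now))  # False: 31 January has not
```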

To configure your Atlas cluster for online archive:

  1. Create an archiving rule by providing the collection namespace, an already indexed date field from the documents in the collection, and an age limit or number of days for the collection data on the Atlas cluster.
  2. (Optional) Specify commonly queried fields to partition archived data.

Configure Online Archive Through the UI

To configure an Online Archive, in your Atlas UI:

1
2
3

Click the Configure Online Archive button the first time you set up an archive, or the Add Archive button for subsequent archives, to start configuring online archive for your collection.

4

Review the Online Archive Overview and click Next to proceed.

5

Create an Archiving Rule by providing the following information.

  1. Specify the collection namespace, which includes the database name, the dot (.) separator, and the collection name (that is, <database>.<collection>), in the Namespace field.

    You can’t modify the namespace once the online archive is created.

  2. In the Date field to archive on field, specify an already indexed date field from the documents in the collection.

    You can’t modify the date field once the online archive is created.

  3. Specify the number of days in the Age limit field.

Data is archived once the specified number of days has passed since the date in the specified date field.
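Because the namespace can't be changed after the archive is created, it may help to check the <database>.<collection> format before you enter it. The following is a minimal sketch; the helper name is hypothetical, and it assumes the split happens at the first dot, since collection names can themselves contain dots.

```python
from typing import Tuple


def split_namespace(namespace: str) -> Tuple[str, str]:
    """Split '<database>.<collection>' at the first dot and validate both parts."""
    database, separator, collection = namespace.partition(".")
    if not separator or not database or not collection:
        raise ValueError(f"expected <database>.<collection>, got {namespace!r}")
    return database, collection


print(split_namespace("sample_mflix.movies"))
```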

6

Click Next to specify the most commonly queried fields.

7

(Optional) Enter up to two of the most commonly queried fields from the collection in the Most commonly queried field and Second most commonly queried field fields, respectively.

The specified fields are used to partition your archived data. Partitions are similar to folders. You can move the date field to the first position of the partition if you frequently query by the specified date field.

For example, suppose you are configuring the online archive for the movies collection in the sample_mflix database. If your archived field is the released date field, which you moved to the first position, your first queried field is title, and your second queried field is plot, your partition will look similar to the following:

/released/title/plot
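The ordering in the example above can be sketched as follows. The function is hypothetical and only mirrors the folder-like layout described here. In particular, placing the date field last when it is not moved to the first position is an assumption, not something this page states.

```python
from typing import Sequence


def partition_layout(date_field: str, queried_fields: Sequence[str], date_first: bool = False) -> str:
    """Build a folder-like partition layout from the date field and up to two queried fields."""
    fields = [name for name in queried_fields if name]
    if len(fields) > 2:
        raise ValueError("at most two commonly queried fields are allowed")
    # Assumption: the date field goes last unless it is moved to the first position.
    ordered = [date_field] + fields if date_first else fields + [date_field]
    return "/" + "/".join(ordered)


print(partition_layout("released", ["title", "plot"], date_first=True))  # /released/title/plot
```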

Note

Data partitions based on the date field from the document are truncated to the day even if there are timestamps with seconds in the date field value.
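To illustrate the truncation, the sketch below maps two timestamps from the same day to the same partition value. The helper name and the ISO date string format are assumptions made for illustration; the actual partition naming is not documented on this page.

```python
from datetime import datetime


def partition_day(value: datetime) -> str:
    """Drop the time of day so that all timestamps on one day share a partition value."""
    return value.date().isoformat()


morning = datetime(2015, 6, 12, 8, 15, 0)
evening = datetime(2015, 6, 12, 23, 59, 59)
print(partition_day(morning) == partition_day(evening))  # True
```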

While partitions improve query performance, queries that don’t contain these fields require a full collection scan of all archived documents, which will take longer and increase your costs. To learn more about how partitions improve your query performance in Data Lake, see Data Structure in S3.

8

Click Next to verify and confirm the online archive settings.

9

Copy and run the displayed query in your mongo shell to see the documents that match the criteria in the rule you defined in step 5.

You can run explain on the query to check whether it uses an index. Proceed to the next step to create the index if the fields are not indexed. If the fields are already indexed, skip to step 11.
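If you capture the explain output as a document, a check along the following lines can tell you whether the winning plan contains an index scan. This is a sketch that assumes the classic queryPlanner output shape, with nested inputStage or inputStages entries and IXSCAN or COLLSCAN stage names. The function names and the sample documents are hypothetical, and the output shape may differ on your server version.

```python
from typing import Any, Dict, Iterator


def iter_stages(plan: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield a plan stage and, recursively, every stage nested beneath it."""
    yield plan
    child = plan.get("inputStage")
    if child:
        yield from iter_stages(child)
    for child in plan.get("inputStages", []):
        yield from iter_stages(child)


def uses_index(explain_output: Dict[str, Any]) -> bool:
    """Return True if any stage of the winning plan is an index scan (IXSCAN)."""
    winning_plan = explain_output["queryPlanner"]["winningPlan"]
    return any(stage.get("stage") == "IXSCAN" for stage in iter_stages(winning_plan))


# Hand-written sample documents standing in for real explain output.
indexed = {"queryPlanner": {"winningPlan": {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "released_1"},
}}}
unindexed = {"queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}}

print(uses_index(indexed))    # True
print(uses_index(unindexed))  # False
```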

10

(Optional) Copy and run the displayed query in your mongo shell to create the required index. This ensures that your data is indexed for optimal performance.
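If you prefer a driver to the shell, the index could be created along these lines. This sketch assumes the required index is a single ascending index on the date field, which may differ from the query Atlas displays, and that `collection` is a PyMongo Collection object (or anything with a compatible create_index method). The function name is hypothetical.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def create_date_index(collection, date_field: str) -> str:
    """Create an ascending index on the date field and return the index name."""
    # Assumption: an index on the date field alone is what the archiving rule needs.
    return collection.create_index([(date_field, ASCENDING)])
```

With PyMongo, you might call it as create_date_index(client["sample_mflix"]["movies"], "released"), where client is an already connected MongoClient.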

11

Verify and confirm your archiving rule.

  1. Click Begin Archiving in the Confirm an online archive tab.
  2. Click Confirm in the Begin Archiving window.

Note

Once your document is queued for archiving, you can no longer edit the document. To move archived data back into the live Atlas cluster, you must use mongodump or mongoexport. To remove the data from the archive, delete the archive.

Configure Online Archive Through the API

To configure an online archive from the API, send a POST request to the onlineArchives endpoint. If an Active online archive already exists for the cluster for the same database and collection with the same archiving rule, the operation fails. However, if the existing online archive is in the Paused or Deleted state, the new online archive is created and its status is set to Active. To learn more about the API syntax and options, see Create an Online Archive.
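As a rough sketch, the request could be assembled as shown below. The base URL, the path segments before onlineArchives, and every body field name (dbName, collName, criteria, dateField, expireAfterDays, partitionFields, fieldName, order) are assumptions that should be confirmed against Create an Online Archive. The function only builds the URL and JSON body; it does not send the request or handle authentication.

```python
import json
from typing import Sequence, Tuple

# Assumed base URL; confirm against the API reference.
BASE_URL = "https://cloud.mongodb.com/api/atlas/v1.0"


def build_online_archive_request(
    group_id: str,
    cluster_name: str,
    db_name: str,
    coll_name: str,
    date_field: str,
    expire_after_days: int,
    partition_fields: Sequence[str] = (),
) -> Tuple[str, str]:
    """Return the (url, json_body) pair for a POST to the onlineArchives endpoint."""
    url = f"{BASE_URL}/groups/{group_id}/clusters/{cluster_name}/onlineArchives"
    body = {
        # Field names below are assumptions, not taken from this page.
        "dbName": db_name,
        "collName": coll_name,
        "criteria": {"dateField": date_field, "expireAfterDays": expire_after_days},
        "partitionFields": [
            {"fieldName": name, "order": position}
            for position, name in enumerate(partition_fields)
        ],
    }
    return url, json.dumps(body)


url, body = build_online_archive_request(
    "my-group-id", "my-cluster", "sample_mflix", "movies", "released", 30, ["title", "plot"]
)
print(url)
print(body)
```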

Limitations

You can create up to 50 online archives per cluster and up to 20 can be active per cluster. The following limitations apply:

  • You can configure multiple online archives in the same namespace, but only one can be active at any given time.
  • You cannot create multiple online archives on the same fields in the same collection.
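The numeric limits above can be checked before you request a new active archive. The sketch below models only the per-cluster totals and the one-active-archive-per-namespace rule. It does not model the restriction on archives that use the same fields, and the type and function names are hypothetical.

```python
from typing import List, NamedTuple

MAX_ARCHIVES_PER_CLUSTER = 50
MAX_ACTIVE_PER_CLUSTER = 20


class Archive(NamedTuple):
    namespace: str
    active: bool


def limit_violations(existing: List[Archive], namespace: str) -> List[str]:
    """List the limits that adding one more active archive for `namespace` would break."""
    problems = []
    if len(existing) >= MAX_ARCHIVES_PER_CLUSTER:
        problems.append("cluster already has 50 online archives")
    active = [archive for archive in existing if archive.active]
    if len(active) >= MAX_ACTIVE_PER_CLUSTER:
        problems.append("cluster already has 20 active online archives")
    if any(archive.namespace == namespace for archive in active):
        problems.append("namespace already has an active online archive")
    return problems


current = [Archive("sample_mflix.movies", True), Archive("sample_mflix.comments", False)]
print(limit_violations(current, "sample_mflix.movies"))
print(limit_violations(current, "sample_mflix.comments"))
```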