
Archive Cluster Data

Beta

Online archive is available as a Beta feature. The feature and the corresponding documentation may change at any time during the Beta stage.

Atlas offers a feature that moves infrequently accessed data from your Atlas cluster to a MongoDB-managed, read-only Data Lake on cloud object storage without requiring any action from you. Once Atlas archives the data, you have a unified, read-only view of your Atlas cluster data and your Online Archive data through the Data Lake.

Atlas archives data based on the criteria you specify in an archiving rule. The criteria combine a date field with a number of days: Atlas archives a document when the current date exceeds the value of that document’s date field plus the number of days specified in the archiving rule.
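As an illustration only, the following Python sketch shows how such a rule evaluates a single document. The field name lastAccessedDate and the 90-day threshold are hypothetical; you choose both when you define the archiving rule.

    from datetime import datetime, timedelta, timezone

    # Hypothetical archiving rule: the date field name and the number of
    # days are illustrative, not defaults.
    DATE_FIELD = "lastAccessedDate"
    EXPIRE_AFTER_DAYS = 90

    def qualifies_for_archive(document):
        """Return True when the document meets the archiving criteria."""
        archive_after = document[DATE_FIELD] + timedelta(days=EXPIRE_AFTER_DAYS)
        return datetime.now(timezone.utc) > archive_after

    doc = {"_id": 1, "lastAccessedDate": datetime(2020, 1, 1, tzinfo=timezone.utc)}
    print(qualifies_for_archive(doc))  # True once 90 days have passed since the date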

Cluster Requirements for Online Archive

Atlas offers Online Archive only on M10 and greater clusters that run MongoDB 3.6 or later.

How Atlas Archives Data

To archive data:

  1. Atlas runs a $match query every five minutes to identify the documents that match the criteria for archiving. Atlas refers to this query as a job. A sketch of such a filter appears after this list.
  2. For documents that match the archival criteria, Atlas:
    1. Writes the first 2 GB of document data to the AWS S3 bucket as files of up to 100 MB each, that is, up to 20 files per job.
    2. Writes each subsequent quantity of document data (up to 2 GB) on each subsequent run of the $match query.
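Conceptually, the filter for such a job resembles the following query document. This is an illustrative sketch only, not the query Atlas actually runs; it reuses the hypothetical lastAccessedDate field and 90-day threshold from the earlier example.

    from datetime import datetime, timedelta, timezone

    # Documents whose (hypothetical) date field is older than the cutoff
    # would qualify for archiving under the example rule above.
    cutoff = datetime.now(timezone.utc) - timedelta(days=90)
    archive_filter = {"lastAccessedDate": {"$lte": cutoff}}

    # With pymongo, the same filter could preview which documents would
    # qualify, given a `collection` handle to the live collection:
    #     matching_count = collection.count_documents(archive_filter)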

Atlas provides a unified endpoint through which you can query both your live cluster and archived data using the same database and collection name you use in your Atlas cluster. You can’t use the unified endpoint over a private connection such as Peering or AWS PrivateLink. You must use a standard internet connection over TLS.
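For example, a minimal pymongo sketch of querying through the unified endpoint. The connection string below is a placeholder; copy the actual string from the Data Lake Connect button. The database and collection names shown are hypothetical and stand in for the names you use on your cluster.

    from pymongo import MongoClient

    # Placeholder connection string; replace every <...> value. The unified
    # endpoint requires TLS over a standard internet connection, not Peering
    # or AWS PrivateLink.
    uri = "mongodb://<username>:<password>@<unified-endpoint-hostname>/?tls=true&authSource=admin"
    client = MongoClient(uri)

    # Use the same database and collection names as on your Atlas cluster.
    collection = client["mydb"]["mycollection"]

    # The query returns matching documents from both the live cluster data
    # and the archived data.
    for doc in collection.find({"status": "inactive"}).limit(5):
        print(doc)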

If you activate Online Archive for an AWS cluster, the cloud object storage exists in the same region in AWS as your cluster. If you activate Online Archive for a GCP or Azure cluster, Online Archive creates the archive in the AWS region closest to your cluster’s primary based on a calculation of distance between the cluster and cloud object storage.

Important

Atlas encrypts your archived data using MongoDB’s AWS encryption keys. It can’t use any encryption at rest keys you might have used to encrypt your cluster data.

Data Lake for Online Archive

When you configure your M10 or greater Atlas cluster for Online Archive, Atlas creates a read-only Data Lake, one per cluster, in cloud object storage for your archived data.

Limitations

  • You can’t write to the Online Archive Data Lake.
  • You can’t configure or administer any Data Lake through the:
    • Atlas console,
    • Data Lake CLI, or
    • Data Lake API.

Viewing the Online Archive

To view your Data Lake for the Online Archive:

  1. Open your Project page in the Atlas console.
  2. Click Data Lake in the left navigation to open the Data Lake page.

Querying the Online Archive

To query your Online Archive data, connect to the Data Lake using the connection string available through the Data Lake Connect button.

You can also query your Online Archive data with SQL. For more information, see Querying with SQL.
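As a rough sketch only, assuming the $sql aggregation stage covered in the Querying with SQL documentation (the exact stage name and fields may differ from what is shown here), a SQL query against the archive might look like this with pymongo:

    from pymongo import MongoClient

    # Placeholder connection string; copy the real one from the Data Lake
    # Connect button in the Atlas UI and replace every <...> value.
    uri = "mongodb://<username>:<password>@<unified-endpoint-hostname>/?tls=true&authSource=admin"
    db = MongoClient(uri)["mydb"]

    # Assumption: a $sql stage that accepts a `statement` field, per the
    # Querying with SQL documentation. Verify the syntax there before use.
    pipeline = [{"$sql": {"statement": "SELECT * FROM mycollection LIMIT 5"}}]
    for row in db.aggregate(pipeline):
        print(row)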

Deleting Online Archives

If you delete all of your Online Archives, Atlas deletes the Data Lake. If you then create an Online Archive with the same settings as a deleted one, Atlas creates a new Data Lake for the new Online Archive.

Online Archive Costs

Online Archive stores infrequently accessed data to lower the data storage costs on your Atlas cluster. You incur costs for storage on the cloud object storage and queries on archived data.

See also

To learn more about the storage and query costs, see the Atlas pricing page.

Manage Your Online Archive

You can configure an Online Archive for a collection on your cluster through your Atlas UI and API. Once you create an Online Archive, you can view the list of archives, edit an archiving rule, pause archiving, and delete your Online Archive at any time. If you need to move archived data back to your cluster, you must use mongodump or mongoexport. The following pages describe how to: