Archive Data

Important
Serverless Instances are in Preview

Serverless instances are in preview and do not support this feature at this time. To learn more, see Serverless Instance Limitations.

Atlas moves infrequently accessed data from your Atlas cluster to a MongoDB-managed read-only Data Lake on cloud object storage. Once Atlas archives the data, you have a unified view of your Atlas and Online Archive data in a read-only Data Lake.

Atlas archives data based on the criteria you specify in an archiving rule. The criteria can be one of the following:

  • A combination of a date field and the number of days to keep data on the Atlas cluster. Atlas archives a document once the current date exceeds the value of the document's date field plus the specified number of days.
  • A custom query. Atlas runs the query specified in the archiving rule to select the documents to archive.
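The date-based criterion can be sketched as a simple eligibility check. This is only an illustration of the rule as described here, not Atlas's implementation. The function name `is_eligible_for_archive`, the field name `created`, and the sample custom query are all hypothetical, and skipping documents that lack the date field is an assumption.

```python
from datetime import datetime, timedelta, timezone


def is_eligible_for_archive(doc, date_field, keep_days, now=None):
    """Return True when `now` is later than doc[date_field] + keep_days."""
    now = now or datetime.now(timezone.utc)
    value = doc.get(date_field)
    if value is None:
        # Assumption: a document without the date field is never selected.
        return False
    return now > value + timedelta(days=keep_days)


# Fixed "current time" so the example is reproducible.
now = datetime(2021, 6, 1, tzinfo=timezone.utc)
old_doc = {"_id": 1, "created": datetime(2021, 1, 1, tzinfo=timezone.utc)}
new_doc = {"_id": 2, "created": datetime(2021, 5, 20, tzinfo=timezone.utc)}

print(is_eligible_for_archive(old_doc, "created", 30, now))  # True
print(is_eligible_for_archive(new_doc, "created", 30, now))  # False

# A custom-query rule is a MongoDB query filter; this one is hypothetical.
custom_query = {"status": "closed"}
```

With a 30-day retention, the first document (created 1 January) is past its window on 1 June, while the second (created 20 May) is not.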

Online Archive in Atlas is available only on M10 and greater clusters that run MongoDB 3.6 or later.

To create an Online Archive, you must have one of these roles:

To archive data:

  1. Atlas periodically runs a query to determine the documents that match the criteria for archiving. Atlas refers to this query as a job. Initially, Atlas runs the job every five minutes. If the size or number of documents to archive doesn't meet the threshold, Atlas expands the job interval by ten minutes, up to a maximum of twelve hours. If the job interval reaches the maximum or if either the size or number of documents to archive reaches the threshold, Atlas runs the job again and resets the job interval to five minutes.

    Note

    The threshold is 2 GB per job.

  2. For documents that match the archival criteria, Atlas:

    1. Writes to a maximum of 10,000 partitions per archival job.
    2. Writes up to 2 GB of document data to partitions on the cloud object storage for each unique combination of query field values. Dates are the exception: Atlas groups them during each run to reduce the number of partitions.
    3. Writes each subsequent quantity of document data (up to 2 GB) with each query run.
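The job interval behavior described in step 1 can be modeled as a small back-off function. This is a rough sketch under stated assumptions, not Atlas's scheduler: the names `next_interval` and the constants are hypothetical, the 2 GB threshold is treated as binary gigabytes, and only the size threshold is modeled because the document-count threshold is not specified here.

```python
INITIAL_INTERVAL_MIN = 5          # the first job runs every five minutes
STEP_MIN = 10                     # interval grows by ten minutes per quiet run
MAX_INTERVAL_MIN = 12 * 60        # capped at twelve hours
THRESHOLD_BYTES = 2 * 1024 ** 3   # assumption: "2 GB" read as 2 GiB


def next_interval(current_min, pending_bytes):
    """Return the interval (in minutes) to wait before the following job."""
    # Reset when enough data is waiting or the interval has hit its cap.
    if pending_bytes >= THRESHOLD_BYTES or current_min >= MAX_INTERVAL_MIN:
        return INITIAL_INTERVAL_MIN
    # Otherwise back off by one step, never exceeding the cap.
    return min(current_min + STEP_MIN, MAX_INTERVAL_MIN)


interval = INITIAL_INTERVAL_MIN
schedule = []
for pending in [0, 0, 0, THRESHOLD_BYTES, 0]:
    schedule.append(interval)
    interval = next_interval(interval, pending)

print(schedule)  # [5, 15, 25, 35, 5]
```

In the sample run, three quiet jobs stretch the interval from 5 to 35 minutes, and the job that finds 2 GB of matching data resets it to 5.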

Atlas provides a unified endpoint through which you can query both your live cluster and archived data using the same database and collection name you use in your Atlas cluster. You can't use the unified endpoint over a private connection such as Peering or AWS PrivateLink. You must use a standard internet connection over TLS.

If you activate Online Archive for an AWS cluster, the cloud object storage exists in the same region in AWS as your cluster. If you activate Online Archive for a GCP or Azure cluster, Online Archive creates the archive in the AWS region closest to your cluster's primary based on a calculation of distance between the cluster and cloud object storage.
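The exact distance calculation Atlas uses is not described here. As one plausible illustration only, the sketch below picks the nearest region by great-circle (haversine) distance. The function names are hypothetical, and the coordinates are rough approximations of region locations chosen for the example, not values taken from Atlas.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def closest_region(cluster_location, candidates):
    """Return the name of the candidate region nearest to the cluster."""
    return min(candidates,
               key=lambda name: haversine_km(cluster_location, candidates[name]))


# Approximate, illustrative coordinates (assumptions, not Atlas data).
aws_regions = {
    "us-east-1": (38.9, -77.4),
    "eu-west-1": (53.3, -6.3),
    "ap-southeast-1": (1.3, 103.8),
}
gcp_cluster_primary = (50.4, 3.8)  # roughly Belgium

print(closest_region(gcp_cluster_primary, aws_regions))  # eu-west-1
```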

Important

Atlas encrypts your archived data using Amazon's server-side encryption with S3-managed keys (SSE-S3). It can't use any encryption-at-rest keys you might have used on your cluster data.

When you archive data, Atlas first copies the data to the cloud object storage and then deletes the data from your Atlas cluster. For performance reasons, WiredTiger does not release the storage blocks of the deleted data back to the OS. However, Atlas eventually reuses these storage blocks for new data automatically, which helps the Atlas cluster avoid fragmentation. To learn more about reclaiming the disk space, see How do I reclaim disk space in WiredTiger?.

When you configure your M10 or greater Atlas cluster for Online Archive, Atlas creates one read-only Data Lake per cluster on cloud object storage for your archived data.

  • You can't write to the Online Archive Data Lake.
  • You can't configure or administer the Online Archive Data Lake through the:

    • Atlas console,
    • Data Lake CLI, or
    • Data Lake API.

To view your Data Lake for the Online Archive:

  1. Navigate to the Data Lake page in the Atlas console.
  2. Click Data Lake from the left navigation in your Project page.

To query your Online Archive data, connect to the cloud object storage with the connection string available from the Data Lake Connect button.

You can also query your Online Archive data with SQL. For more information, see Querying with SQL.

If you delete all the Online Archives, Atlas deletes the Data Lake. After deleting all the Online Archives, if you create an Online Archive with the same settings as a deleted Online Archive, Atlas creates a new Data Lake for the new Online Archive.

Online Archive stores infrequently accessed data to lower the data storage costs on your Atlas cluster. You incur costs for the following:

  • data stored on the cloud object storage,
  • data scanned for queries, and
  • data returned by queries.

To learn more about Data Lake costs, see the billing documentation for Atlas Data Lake.
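The three cost components listed above combine additively, which can be sketched as a simple estimate. The function name is hypothetical, and the rates in the example are made-up placeholders, not actual Atlas prices; take real rates from the billing documentation.

```python
def estimate_monthly_cost(stored_gb, scanned_gb, returned_gb, *,
                          storage_rate, scan_rate, return_rate):
    """Sum the storage, scan, and data-returned charges (rates are per GB)."""
    return (stored_gb * storage_rate
            + scanned_gb * scan_rate
            + returned_gb * return_rate)


# Placeholder rates for illustration only (not real prices).
cost = estimate_monthly_cost(
    stored_gb=500, scanned_gb=100, returned_gb=10,
    storage_rate=0.5, scan_rate=0.25, return_rate=1.0,
)
print(cost)  # 285.0
```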

You can configure an Online Archive for a collection on your cluster through the Atlas UI or the API. Once you create an Online Archive, you can:

© 2021 MongoDB, Inc.