Query a Data Lake

Beta

The Atlas Data Lake is available as a Beta feature. The product and the corresponding documentation may change at any time during the Beta stage. For support, see Atlas Support.

About Atlas Data Lake

Atlas Data Lake ingests your S3 data into Atlas clusters so that you can quickly query data stored in your AWS S3 buckets using the mongo shell, MongoDB Compass, or any MongoDB driver.

When you create a Data Lake, you grant Atlas read-only access to S3 buckets in your AWS account and create a data configuration file that maps data from your S3 buckets to your MongoDB databases and collections. Atlas supports using any M10+ cluster, including Global Clusters, to connect to Data Lakes in the same project.
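To make the read-only grant concrete, the following is a minimal sketch of an IAM policy document that allows listing buckets and reading their objects. The helper name `read_only_s3_policy` and the bucket name `my-example-bucket` are hypothetical, and the two actions shown are an assumption about a reasonable minimum; the exact permissions Atlas requires are the ones given in the Atlas UI when you configure the Data Lake.

```python
import json


def read_only_s3_policy(bucket_names):
    """Build an IAM policy document granting read-only access to the given buckets.

    Hypothetical helper for illustration; not part of Atlas or the AWS SDK.
    """
    # Bucket-level ARNs are needed for listing, object-level ARNs for reading.
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing the contents of each bucket.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arns,
            },
            {
                # Allow reading objects; no write or delete actions are granted.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": object_arns,
            },
        ],
    }


# "my-example-bucket" is a placeholder bucket name.
policy = read_only_s3_policy(["my-example-bucket"])
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console or saved to a file for use with the AWS CLI. Keeping the bucket and object statements separate makes it easy to confirm that no write actions are included.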

Note

If you update your custom AWS role ARN, you must update the AWS trust policy associated with the role. See the Configure a New Data Lake modal for instructions.

A MongoDB user must have one of the following roles to query an Atlas Data Lake:

To view or modify existing Data Lakes in an Atlas project, or to create a new one, click Data Lake in the left-hand navigation.

Data Lake supports the following data formats:

  • Avro
  • Parquet
  • JSON
  • JSON/Gzipped
  • BSON
  • CSV (requires header row)
  • TSV (requires header row)
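Because CSV and TSV files must include a header row, one way to avoid uploading headerless files is to write them with a tool that always emits the header first. The sketch below uses Python's standard `csv.DictWriter`; the helper name `rows_to_delimited` and the sample rows are hypothetical.

```python
import csv
import io


def rows_to_delimited(rows, fieldnames, delimiter=","):
    """Serialize dict rows to CSV/TSV text whose first line is the header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()    # Emit the header row before any data rows.
    writer.writerows(rows)
    return buffer.getvalue()


# Sample rows for illustration only.
rows = [{"name": "alice", "age": 30}, {"name": "bob", "age": 25}]
print(rows_to_delimited(rows, ["name", "age"]))                  # CSV
print(rows_to_delimited(rows, ["name", "age"], delimiter="\t"))  # TSV
```

The returned text can then be written to a file and uploaded to your S3 bucket with whichever tool you normally use.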

Prerequisites

Verify that you meet the following prerequisites before you create a Data Lake:

  • One or more AWS S3 buckets in the same AWS account.
  • The AWS CLI configured to access your AWS account, or access to the AWS Management Console with permission to create IAM roles.
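If you plan to use the AWS CLI, a quick way to confirm that it is installed and can authenticate is to call `aws sts get-caller-identity`. The following sketch wraps that check in a hypothetical helper, `aws_cli_ready`; it assumes the executable is named `aws` and is on your `PATH`, and it does not verify that your credentials can create IAM roles.

```python
import shutil
import subprocess


def aws_cli_ready(timeout_seconds=30):
    """Return True if the `aws` executable is on PATH and can authenticate."""
    # Not installed (or not on PATH): nothing more to check.
    if shutil.which("aws") is None:
        return False
    try:
        # `aws sts get-caller-identity` succeeds only with valid credentials.
        result = subprocess.run(
            ["aws", "sts", "get-caller-identity"],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("AWS CLI ready:", aws_cli_ready())
```

A `False` result means either that the CLI is missing or that the identity call failed or timed out; running the same command directly in a terminal shows the underlying error.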