Create One Data Lake

Beta

The Atlas Data Lake is available as a Beta feature. The product and the corresponding documentation may change at any time during the Beta stage. For support, see Atlas Support.

Note

Groups and projects are synonymous terms. Your {GROUP-ID} is the same as your project ID. For existing groups, your group/project ID remains the same. The resource and corresponding endpoints use the term groups.

The Atlas API uses HTTP Digest Authentication. Provide a programmatic API public key and corresponding private key as the username and password when constructing the HTTP request.

For complete documentation on configuring API access for an Atlas project, see Configure Atlas API Access.
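As an illustration of the Digest authentication described above, the following is a minimal sketch using only the Python standard library. The helper name make_atlas_opener is hypothetical, and the base URL is the one this page lists; the sketch only prepares an opener and sends nothing.

```python
import urllib.request

# Base URL as listed on this page.
BASE_URL = "https://cloud.mongodb.com/api/atlas/v1.0"


def make_atlas_opener(public_key: str, private_key: str) -> urllib.request.OpenerDirector:
    """Return an opener that answers HTTP Digest challenges for the Atlas API.

    make_atlas_opener is a hypothetical helper name, not part of any Atlas SDK.
    The public key acts as the username and the private key as the password.
    """
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # A realm of None applies these credentials to any realm under BASE_URL.
    password_mgr.add_password(None, BASE_URL, public_key, private_key)
    digest_handler = urllib.request.HTTPDigestAuthHandler(password_mgr)
    return urllib.request.build_opener(digest_handler)
```

Calling opener.open(request) on the returned opener would then retry the request with Digest credentials when the server responds with a 401 challenge.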

Base URL: https://cloud.mongodb.com/api/atlas/v1.0

Use this endpoint to create one Atlas Data Lake associated with an Atlas project.

Syntax

POST /groups/{GROUP-ID}/dataLakes

Request Path Parameters

Path Element Required/Optional Description
GROUP-ID Required The unique identifier for the project.

Request Query Parameters

The following query parameters are optional:

Query Parameter Type Description Default
pretty boolean Displays response in a prettyprint format. false
envelope boolean Specifies whether or not to wrap the response in an envelope. false

Request Body Parameters

Field Required/Optional Description
name Required Name of the Atlas Data Lake.
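The path, query, and body parameters above can be assembled into a single request. The following is a rough sketch in Python using only the standard library; build_create_request is a hypothetical helper name, and the function only constructs the request object without sending it.

```python
import json
import urllib.parse
import urllib.request

# Base URL as listed on this page.
BASE_URL = "https://cloud.mongodb.com/api/atlas/v1.0"


def build_create_request(group_id: str, name: str,
                         pretty: bool = False,
                         envelope: bool = False) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates one Data Lake.

    build_create_request is a hypothetical helper, not an Atlas API.
    """
    # Both query parameters are optional and default to false,
    # so they are added only when explicitly enabled.
    query = {}
    if pretty:
        query["pretty"] = "true"
    if envelope:
        query["envelope"] = "true"

    url = f"{BASE_URL}/groups/{urllib.parse.quote(group_id, safe='')}/dataLakes"
    if query:
        url += "?" + urllib.parse.urlencode(query)

    # "name" is the only request body field this page documents.
    body = json.dumps({"name": name}).encode("utf-8")
    headers = {"Accept": "application/json", "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Sending the request still requires an opener or client configured for HTTP Digest Authentication with the programmatic API keys.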

Response

Name Type Description
cloudProviderConfig object Configuration information related to the cloud service where Atlas Data Lake source data is stored.
cloudProviderConfig.<provider> object

Name of the provider of the cloud service where Data Lake can access the S3 Bucket data stores.

Data Lake only supports aws.

cloudProviderConfig.aws.externalId string

Unique identifier generated by Atlas and associated with the created Data Lake.

Atlas requires an IAM Role with read-only access to the S3 buckets that you will associate with the data store. You must specify this value as the sts:ExternalId condition when defining that role’s trust policy.

Important

Atlas displays this value only once as part of the response body. You cannot retrieve this value outside of the response body.

cloudProviderConfig.aws.iamAssumedRoleARN string

Amazon Resource Name (ARN) of the IAM Role that Data Lake assumes when accessing the AWS S3 bucket associated with the data store.

The initial state of iamAssumedRoleARN is null. After you create the required IAM Role in AWS, set this field to that role’s ARN when performing the update operation.

The IAM Role must support the following actions against each S3 bucket:

  • s3:GetObject
  • s3:ListBucket
  • s3:GetObjectVersion

For more information on S3 actions, see Actions, Resources, and Condition Keys for Amazon S3.

cloudProviderConfig.aws.iamUserARN string

The Amazon Resource Name (ARN) of an Atlas IAM user associated with the project. Data Lake uses this user to assume the IAM role specified in cloudProviderConfig.aws.iamAssumedRoleARN when accessing the data store’s S3 bucket.

You must specify the iamUserARN as part of the Principal.AWS field when defining the iamAssumedRoleARN role’s trust policy.

Important

Atlas displays this value only once as part of the response body. You cannot retrieve this value outside of the response body.

dataProcessRegion Optional

The cloud provider region to which Atlas Data Lake routes client connections for data processing.

The default value null directs Atlas Data Lake client connections to the region nearest to the client based on DNS resolution.

Use the update endpoint to update the data store configuration with a specific dataProcessRegion.

groupId string The unique identifier for the project.
hostnames array The list of hostnames assigned to the Atlas Data Lake. Each string in the array is one hostname.
name string Name of the Atlas Data Lake.
state string

Current state of the Atlas Data Lake. The initial state after creation is always UNVERIFIED.

Use the update endpoint to update the data store configuration with the required settings. For cloudProviderConfig.aws, this requires setting the cloudProviderConfig.aws.iamAssumedRoleARN to an IAM role that grants access to the S3 buckets associated with any data stores.

storage object Configuration details for each data store and its mapping to MongoDB database(s) and collection(s).
storage.databases object

Mapping configuration for the data store.

The initial state of this field is an empty document {}.

storage.stores array

Each object in the array represents a storage resource associated with the data store.

The initial state of this field is an empty array [].
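The IAM Role requirements described in the cloudProviderConfig.aws fields above can be expressed as two AWS policy documents: a trust policy that lets the Atlas IAM user assume the role only when it presents the external ID, and a permissions policy that grants the three read-only S3 actions. The following Python sketch builds both as plain dictionaries. The function names and the bucket name are hypothetical, the key spellings ("AWS", "sts:ExternalId") follow the AWS IAM policy grammar, and the exact policies your account needs may differ.

```python
import json


def build_trust_policy(iam_user_arn: str, external_id: str) -> dict:
    """Trust policy letting the Atlas IAM user assume the role.

    iam_user_arn and external_id are the iamUserARN and externalId values
    returned once in the create response. Hypothetical helper name.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": iam_user_arn},
            "Action": "sts:AssumeRole",
            # The role can be assumed only when this external ID is supplied.
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


def build_read_only_policy(bucket_name: str) -> dict:
    """Permissions policy granting the three S3 actions listed on this page.

    bucket_name is a placeholder for a bucket you own.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            # s3:ListBucket applies to the bucket itself.
            {"Effect": "Allow", "Action": ["s3:ListBucket"],
             "Resource": [bucket_arn]},
            # The object-level actions apply to the keys inside the bucket.
            {"Effect": "Allow",
             "Action": ["s3:GetObject", "s3:GetObjectVersion"],
             "Resource": [f"{bucket_arn}/*"]},
        ],
    }


if __name__ == "__main__":
    # Placeholder values taken from the example response on this page.
    trust = build_trust_policy(
        "arn:aws:iam::1234567890123:user/queryengine",
        "12a3bc45-de6f-7890-12gh-3i45jklm6n7o",
    )
    print(json.dumps(trust, indent=2))
    print(json.dumps(build_read_only_policy("my-example-bucket"), indent=2))
```

The printed JSON could be pasted into the AWS console or passed to an AWS client when creating the role; the resulting role ARN is the value to supply as iamAssumedRoleARN in the update operation.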

Example

Request

curl -u "{PUBLIC-KEY}:{PRIVATE-KEY}" --digest \
 --header "Accept: application/json" \
 --header "Content-Type: application/json" \
 --request POST "https://cloud.mongodb.com/api/atlas/v1.0/groups/{GROUP-ID}/dataLakes?pretty=true" \
 --data '{ "name" : "UserMetricData" }'

Response

{
  "cloudProviderConfig": {
    "aws": {
      "externalId": "12a3bc45-de6f-7890-12gh-3i45jklm6n7o",
      "iamAssumedRoleARN": null,
      "iamUserARN": "arn:aws:iam::1234567890123:user/queryengine"
    }
  },
  "dataProcessRegion": null,
  "groupId": "1ab23c4567def890gh12ij34",
  "hostnames": [
    "hardwaremetricdata.mongodb.example.net"
  ],
  "name": "UserMetricData",
  "state": "UNVERIFIED",
  "storage": {
    "databases": {},
    "stores": []
  }
}
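Because Atlas returns externalId and iamUserARN only once, a client will usually want to capture them as soon as the response arrives. The following Python sketch parses a response body shaped like the example and pulls out those values. The function name and the returned dictionary keys are hypothetical, and the sample body is a trimmed copy of the example response on this page.

```python
import json

# Trimmed copy of the example response, used here as sample input.
SAMPLE_RESPONSE = """
{
  "cloudProviderConfig": {
    "aws": {
      "externalId": "12a3bc45-de6f-7890-12gh-3i45jklm6n7o",
      "iamAssumedRoleARN": null,
      "iamUserARN": "arn:aws:iam::1234567890123:user/queryengine"
    }
  },
  "name": "UserMetricData",
  "state": "UNVERIFIED"
}
"""


def extract_one_time_values(response_body: str) -> dict:
    """Collect the values needed to set up the IAM Role.

    extract_one_time_values is a hypothetical helper, not an Atlas API.
    """
    doc = json.loads(response_body)
    aws = doc["cloudProviderConfig"]["aws"]
    return {
        "name": doc["name"],
        "external_id": aws["externalId"],
        "iam_user_arn": aws["iamUserARN"],
        # A null iamAssumedRoleARN means no role has been supplied yet.
        "needs_role": aws["iamAssumedRoleARN"] is None,
        "state": doc["state"],
    }


if __name__ == "__main__":
    values = extract_one_time_values(SAMPLE_RESPONSE)
    print(values["external_id"], values["iam_user_arn"], values["state"])
```

Storing external_id and iam_user_arn somewhere durable at this point avoids having to recreate the Data Lake to obtain them again.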