Run Queries Against Your Data Lake

Beta

The Atlas Data Lake is available as a Beta feature. The product and the corresponding documentation may change at any time during the Beta stage. For support, see Request Support.

Estimated completion time: 5 minutes

You can run operations using the MongoDB Query Language (MQL), which includes most, but not all, standard server commands. In particular, Atlas Data Lake is currently a read-only service. MQL operations can run in parallel, which improves performance even for large and complex queries. However, Data Lake is designed for analytics-type workloads and is not intended for day-to-day operational workloads. To learn which MQL operations are supported, see the MQL Support documentation.

Prerequisites

To complete this part of the tutorial, you must have completed the earlier parts of the tutorial and be connected to your Data Lake with the mongo shell before running the following queries.

Queries

Find documents in the weather collection where pressure is higher than 900 millibars. Sort by timestamp and limit the number of documents returned:

db.weather.find({ "pressure": { $gt: 900 } }).sort({ "ts": 1 }).limit(5)
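To make the filter, sort, and limit semantics concrete, here is a minimal in-memory sketch in plain JavaScript. It does not talk to Data Lake; the sample documents and the helper name findHighPressure are hypothetical, and only the ts and pressure field names come from the query above. In the mongo shell, a chained sort is applied before the limit regardless of the order in which the two are written, and the sketch mirrors that.

```javascript
// Hypothetical sample documents; only the field names come from the query above.
const sampleWeather = [
  { ts: 3, pressure: 1012 },
  { ts: 1, pressure: 880 },
  { ts: 2, pressure: 950 },
  { ts: 5, pressure: 1001 },
  { ts: 4, pressure: 905 },
];

// Emulates find({ pressure: { $gt: minPressure } }).sort({ ts: 1 }).limit(n)
// on a plain array. filter() returns a new array, so the input is not mutated.
function findHighPressure(docs, minPressure, n) {
  return docs
    .filter((d) => d.pressure > minPressure) // $gt: strictly greater than
    .sort((a, b) => a.ts - b.ts)             // { ts: 1 }: ascending by ts
    .slice(0, n);                            // limit(n): keep the first n
}

const firstTwo = findHighPressure(sampleWeather, 900, 2);
console.log(firstTwo.map((d) => d.ts)); // → [ 2, 3 ]
```

The document with pressure 880 is dropped by the filter, and the remaining documents are ordered by ts before the first two are kept.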

Find Airbnb offerings in Porto with a review score rating higher than 79:

db.airbnb.find( { "address.market" : "Porto", "review_scores.review_scores_rating": {$gt: 79}})

Find properties in New York for less than $200 per night, and sort the returned documents by customer review rating, highest first:

db.airbnb.find({ "address.market" : "New York", "price": {$lt: NumberDecimal("200.00")} } ).sort({ "review_scores.review_scores_rating": -1 })

Find the average cost of accommodation in Porto by accommodation type:

db.airbnb.aggregate([{ $match: { "address.market" : "Porto" } },{ $group : {"_id" : "$property_type", avgPrice: {$avg: "$price"}}}])

Find the average price per night of an apartment in Sydney:

db.airbnb.aggregate([{ $match: { "address.market" : "Sydney", "property_type" : "Apartment" } }, { $group : {"_id" : "$property_type", avgPrice: {$avg: "$price"}}}])
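The two average-price pipelines above differ only in their $match filter. As a sketch, a small helper can build both; avgPricePipeline is a hypothetical name introduced here for illustration, not part of the shell or of Data Lake. It only constructs the pipeline array, so it runs in any JavaScript environment.

```javascript
// Hypothetical helper: builds the $match / $group pipeline used above.
// market is required; propertyType is optional and narrows the $match stage.
function avgPricePipeline(market, propertyType) {
  const match = { "address.market": market };
  if (propertyType !== undefined) {
    match.property_type = propertyType;
  }
  return [
    { $match: match },
    // Group by accommodation type and average the price within each group.
    { $group: { _id: "$property_type", avgPrice: { $avg: "$price" } } },
  ];
}

const portoPipeline = avgPricePipeline("Porto");
const sydneyPipeline = avgPricePipeline("Sydney", "Apartment");
console.log(JSON.stringify(sydneyPipeline, null, 2));
```

In the mongo shell, the result could then be passed to the collection directly, for example db.airbnb.aggregate(avgPricePipeline("Sydney", "Apartment")).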

Find the number of apartments available to rent in Barcelona:

db.airbnb.aggregate([{ $match: { "address.market" : "Barcelona" , "property_type": "Apartment"} },{ $count: "numApartments"}])
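In standard MongoDB aggregation, the $count stage is shorthand for a $group stage that sums 1 per document, followed by a $project stage that drops _id. The sketch below builds both forms so they can be compared side by side. The helper names countPipeline and countPipelineExpanded are hypothetical, and whether Data Lake accepts every stage in the expanded form is an assumption to check against the MQL Support documentation.

```javascript
// Hypothetical helper: the short form used in the query above.
function countPipeline(filter, fieldName) {
  return [{ $match: filter }, { $count: fieldName }];
}

// Hypothetical helper: the expanded $group + $project form of $count.
function countPipelineExpanded(filter, fieldName) {
  return [
    { $match: filter },
    // _id: null puts every matching document into a single group.
    { $group: { _id: null, [fieldName]: { $sum: 1 } } },
    // Drop _id so only the count field remains in the output document.
    { $project: { _id: 0 } },
  ];
}

const barcelonaApartments = {
  "address.market": "Barcelona",
  property_type: "Apartment",
};
console.log(JSON.stringify(countPipeline(barcelonaApartments, "numApartments")));
console.log(JSON.stringify(countPipelineExpanded(barcelonaApartments, "numApartments")));
```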

Summary

Congratulations! You just set up an Atlas Data Lake, created a database and collections from data stored in an S3 bucket, and queried the data using MQL commands.

For more information on Atlas Data Lake, see Query a Data Lake.

Screenshot of the Data Lake after running queries.

Note

When you dynamically generate collections from filenames, the number of collections is not accurately reported in the Data Lake view.