Reduce Number of Collections

Overview

Collections are groupings of MongoDB documents, similar to an RDBMS table. A collection exists within a single database.

Even if a collection does not contain any documents, it still carries a resource cost in the form of a default _id index that cannot be dropped. Although this index does not take up much space on its own (especially for small collections), if you have thousands of collections, these indexes add up and can strain your database resources.

If your deployment contains unnecessary collections, or the number of collections keeps growing, consider restructuring your data to reduce the number of collections and, ultimately, the resource requirements of your application.

Example

Consider a temperatures database that stores collections of temperature readings obtained from a sensor. The sensor takes readings every half hour from 10 AM to 10 PM. Each day’s readings are stored in a separate collection, named by the reading date:

// temperatures.march-09-2020

{
  "_id": 1,
  "timestamp": "2020-03-09T10:00:00Z",
  "temperature": 29
}
{
  "_id": 2,
  "timestamp": "2020-03-09T10:30:00Z",
  "temperature": 30
}
...
{
  "_id": 25,
  "timestamp": "2020-03-09T22:00:00Z",
  "temperature": 26
}

// temperatures.march-10-2020

{
  "_id": 1,
  "timestamp": "2020-03-10T10:00:00Z",
  "temperature": 30
}
{
  "_id": 2,
  "timestamp": "2020-03-10T10:30:00Z",
  "temperature": 32
}
...
{
  "_id": 25,
  "timestamp": "2020-03-10T22:00:00Z",
  "temperature": 28
}

With each passing day, the number of collections in the database grows. Because the number of collections is unbounded, the database must maintain an ever-growing set of collections and their corresponding indexes. If the database eventually manages thousands of collections and indexes, performance may degrade.

Additionally, this approach does not easily facilitate queries across multiple days. To query data from multiple days to obtain temperature trends over longer periods of time, you would need to perform a $lookup operation, which is not as performant as querying data in the same collection.

Updated Schema

A better approach is to store all temperature readings in a single collection, with each day’s readings in a single document. Consider this updated schema, where all temperatures are stored in the temperatures.readings collection:

// temperatures.readings

{
  "_id": ISODate("2020-03-09"),
  "readings": [
    {
      "timestamp": "2020-03-09T10:00:00Z",
      "temperature": 29
    },
    {
      "timestamp": "2020-03-09T10:30:00Z",
      "temperature": 30
    },
    ...
    {
      "timestamp": "2020-03-09T22:00:00Z",
      "temperature": 26
    }
  ]
}
{
  "_id": ISODate("2020-03-10"),
  "readings": [
    {
      "timestamp": "2020-03-10T10:00:00Z",
      "temperature": 30
    },
    {
      "timestamp": "2020-03-10T10:30:00Z",
      "temperature": 32
    },
    ...
    {
      "timestamp": "2020-03-10T22:00:00Z",
      "temperature": 28
    }
  ]
}

This updated schema requires far fewer resources than the original schema. Rather than maintaining a separate _id index for each day of readings, the database maintains a single default _id index on this collection, which also supports queries by date.
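
As a rough sketch of how a multi-day query might look against this schema, the helper below builds an aggregation pipeline that selects daily documents by a range on _id and averages each day's embedded readings. The function name dailyAveragePipeline and its parameters are hypothetical, chosen for illustration only. It assumes _id holds a date at midnight UTC, as in the sample documents.

```javascript
// Hypothetical helper: builds a pipeline that averages the readings of each
// daily document whose _id falls in the half-open range [start, end).
function dailyAveragePipeline(start, end) {
  return [
    // Range match on _id, which the default _id index can support.
    { $match: { _id: { $gte: start, $lt: end } } },
    // "$readings.temperature" resolves to an array of temperatures;
    // $avg in $project averages the numeric elements of that array.
    { $project: { averageTemperature: { $avg: "$readings.temperature" } } },
    // Return the days in chronological order.
    { $sort: { _id: 1 } }
  ];
}

// Covers March 9 and March 10, 2020 (the end bound is exclusive).
const pipeline = dailyAveragePipeline(new Date("2020-03-09"), new Date("2020-03-11"));

// In the shell, with the temperatures database selected (assumption):
// db.readings.aggregate(pipeline)
```

Because all days live in one collection, no cross-collection stage is needed; the date range alone selects the documents of interest.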

How to Check for Unnecessary Collections

Mongo Shell

To check the number of collections in your database, you can run the following command from the mongo shell:

db.getCollectionNames().length

The db.stats() method also returns the number of collections in your database, along with useful database statistics such as the total size of your data and indexes.
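
Building on the commands above, a small helper can list the collections that currently hold no documents. This is a minimal sketch: the function name findEmptyCollections is hypothetical, and it takes the database object as a parameter so that it can be exercised outside the shell. It relies on the shell methods getCollectionNames(), getCollection(), and estimatedDocumentCount(), and does not special-case views or system collections.

```javascript
// Hypothetical helper: returns the names of collections that report
// zero documents. Pass the shell's db object as the argument.
function findEmptyCollections(database) {
  return database.getCollectionNames().filter(function (name) {
    // estimatedDocumentCount() reads collection metadata rather than
    // scanning documents, so the check is inexpensive per collection.
    return database.getCollection(name).estimatedDocumentCount() === 0;
  });
}

// In the shell:
// findEmptyCollections(db)
```

Treat the result as a list of candidates to review, not as collections that are safe to drop: an empty collection may still be written to by your application.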

MongoDB Atlas

Data Explorer

The Atlas Data Explorer provides a high-level overview of collections in your databases. The Data Explorer shows the total size of a collection, including the size of a collection’s indexes. If indexes make up the majority of a collection’s size, consider consolidating that collection’s data into another collection and dropping the original collection. Refer to the $merge documentation for an approach to merging data from one collection into another.
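
For the temperature example, one possible consolidation is sketched below: a pipeline that folds all readings of a per-day collection into a single daily document and writes it to a target collection with $merge. The function name consolidateDayPipeline and its parameters are hypothetical. The sketch assumes MongoDB 4.2 or later (where $merge is available), that the per-day documents have the timestamp and temperature fields shown earlier, and that the target collection is in the same database.

```javascript
// Hypothetical helper: builds a pipeline that turns one per-day collection
// into a single document keyed by the given day and merges it into
// targetCollection.
function consolidateDayPipeline(day, targetCollection) {
  return [
    // Sort first so readings are pushed in chronological order
    // (ISO 8601 timestamp strings sort correctly as text).
    { $sort: { timestamp: 1 } },
    // A constant _id places every reading of the day in one group.
    {
      $group: {
        _id: day,
        readings: { $push: { timestamp: "$timestamp", temperature: "$temperature" } }
      }
    },
    // Insert the daily document, or replace it if that day already exists.
    {
      $merge: {
        into: targetCollection,
        on: "_id",
        whenMatched: "replace",
        whenNotMatched: "insert"
      }
    }
  ];
}

const mergePipeline = consolidateDayPipeline(new Date("2020-03-09"), "readings");

// In the shell, with the temperatures database selected (assumption):
// db.getCollection("march-09-2020").aggregate(mergePipeline)
```

After verifying the merged document in the target collection, the original per-day collection can be dropped. The getCollection() form is used because the collection name contains hyphens.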

Additionally, if the Data Explorer reveals that you have empty collections, you can drop those collections directly from the Data Explorer.

Real-Time Performance Panel

The Atlas Real-Time Performance Panel shows which collections receive the most activity. Before you drop a collection, use this tool to confirm that your application is not actively using it.
