Reduce the Size of Large Documents

Overview

Storing large documents in your database can lead to excessive RAM and bandwidth usage. MongoDB keeps frequently accessed data, referred to as the working set, in RAM. When the working set grows beyond the available RAM, performance degrades because data must be read from disk instead.

If your most frequent queries are for documents that contain much more information than you need for that query, consider restructuring your schema with smaller documents using references to additional collections. By breaking up your data into more collections and using smaller documents for frequently accessed data, you reduce the overall size of the working set and improve performance.

Note

Your hardware configuration, particularly the amount of available RAM, affects the document sizes that your system can support efficiently. Regardless of hardware, the BSON Document Size limit is 16 megabytes.
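
As a rough illustration of the limit, the following sketch flags documents that are likely to approach it. It assumes a Node.js runtime, and the helper names (approxJsonBytes, isLikelyTooLarge) are hypothetical. It measures the UTF-8 length of the JSON text, which is only a proxy: the real BSON size differs, so treat the result as an estimate rather than an exact check.

```javascript
// 16 megabyte BSON document size limit, expressed in bytes.
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Rough proxy only: the UTF-8 byte length of the JSON text.
// This is NOT the document's actual BSON size.
function approxJsonBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Hypothetical helper: true when the estimate exceeds the threshold.
function isLikelyTooLarge(doc, thresholdBytes = MAX_BSON_BYTES) {
  return approxJsonBytes(doc) > thresholdBytes;
}

const smallDoc = { _id: 123, title: "2001: A Space Odyssey" };
console.log(approxJsonBytes(smallDoc), isLikelyTooLarge(smallDoc));
```

Passing a lower threshold than the hard limit is one way to catch documents that are growing too large before they fail to insert.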

Example

Consider a movie catalog website that displays a list of the 50 most recently released movie titles and their poster images on the home page. From the home page, a user can click on a movie to see additional details.

The website stores information about movies in a movies collection. Each movie document contains all of the information available for that movie:

// movies collection

{
    "_id": 123,
    "title": "2001: A Space Odyssey",
    "poster": <url>,
    "director": "Stanley Kubrick",
    "release_year": 1968,
    "box_office_usd": 146000000,
    "countries_released": [
        "United States",
        ...
    ],
    "cast": [
        "Keir Dullea",
        ...
    ],
    "crew": [
         "Ray Lovejoy",
         ...
    ],
    ...
}

Note

Whenever possible, you should host images outside of your MongoDB deployment and reference them with URLs. If you store images in your database, you are much more likely to reach the document size limit.

In this example, the most frequent query the website performs is to find the titles and posters of the 50 most recent movies. Instead of querying for all movie information, consider breaking up the movies collection into two separate collections, movies and movie_metadata. Each movie_metadata document references its movies document through the movie_id field, which stores that document's _id:

// movies collection

{
    "_id": 123,
    "title": "2001: A Space Odyssey",
    "poster": <url>
}

// movie_metadata collection

{
    "_id": <object_id>,
    "movie_id": 123, // reference to a movies document
    "director": "Stanley Kubrick",
    "release_year": 1968,
    "box_office_usd": 146000000,
    "countries_released": [
        "United States",
        ...
    ],
    "cast": [
        "Keir Dullea",
        ...
    ],
    "crew": [
         "Ray Lovejoy",
         ...
    ],
    ...
}
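
The split shown above can be sketched as a small migration helper. This is a minimal sketch in plain JavaScript, not a definitive implementation: the function name splitMovie, the SUMMARY_FIELDS list, and the placeholder poster URL are assumptions for illustration. It operates on an in-memory object and does not connect to a database.

```javascript
// Fields kept in the small movies document (taken from the example schema).
const SUMMARY_FIELDS = ["_id", "title", "poster"];

// Hypothetical helper: split one full movie document into a summary
// document and a metadata document that references it through movie_id.
// The metadata document has no _id here; MongoDB would generate one on insert.
function splitMovie(fullDoc) {
  const summary = {};
  const metadata = { movie_id: fullDoc._id };
  for (const [key, value] of Object.entries(fullDoc)) {
    if (SUMMARY_FIELDS.includes(key)) {
      summary[key] = value;
    } else {
      metadata[key] = value;
    }
  }
  return { summary, metadata };
}

const fullMovie = {
  _id: 123,
  title: "2001: A Space Odyssey",
  poster: "https://example.com/posters/123.jpg", // placeholder URL
  director: "Stanley Kubrick",
  release_year: 1968,
  box_office_usd: 146000000,
  countries_released: ["United States"],
  cast: ["Keir Dullea"],
  crew: ["Ray Lovejoy"],
};

const { summary, metadata } = splitMovie(fullMovie);
console.log(summary);
console.log(metadata);
```

Keeping the list of summary fields in one place makes it easy to adjust which fields stay in the frequently accessed collection.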

This way, when the website queries for the 50 most recent movies and their posters, it loads only the information it needs. When a user clicks on a movie, the site performs a second query to find the movie_metadata document whose movie_id matches that movie. This schema performs better than the original because the most frequent query returns much smaller documents.
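
The two-step access pattern described above can be sketched with in-memory arrays standing in for the two collections. The function names (listHomePage, findMetadata), the sample data, and the placeholder URL are assumptions. The sketch also assumes the movies array is already ordered newest first, since the slimmed-down movies documents in this example carry no release date.

```javascript
// In-memory stand-ins for the two collections (illustrative data only).
const movies = [
  { _id: 123, title: "2001: A Space Odyssey", poster: "https://example.com/posters/123.jpg" },
];
const movieMetadata = [
  { movie_id: 123, director: "Stanley Kubrick", release_year: 1968 },
];

// Home page: read only the small documents, up to `limit` of them.
// Assumes moviesColl is already ordered newest first.
function listHomePage(moviesColl, limit = 50) {
  return moviesColl
    .slice(0, limit)
    .map(({ _id, title, poster }) => ({ _id, title, poster }));
}

// Detail page: a second lookup that follows the movie_id reference.
// Returns null when no metadata document matches.
function findMetadata(metadataColl, movieId) {
  return metadataColl.find((doc) => doc.movie_id === movieId) ?? null;
}

const homePage = listHomePage(movies);
const details = findMetadata(movieMetadata, homePage[0]._id);
console.log(homePage, details);
```

The larger metadata document is touched only when a user opens a detail page, so the frequent home page read stays small.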

Consider your use case, especially the operations you most frequently perform, and design a schema that efficiently uses your working set.

Learn More

MongoDB.live 2020 Presentations

To learn how to incorporate the flexible data model into your schema, see the following presentations from MongoDB.live 2020: