Test Failover

Replica set elections occur whenever Atlas makes configuration changes, such as patch updates or scaling events, and during failure scenarios. Write your applications so that they can handle elections without downtime.

Use the following procedure to simulate the failure of the primary replica set member in your Atlas cluster and observe how your application responds to the failover:

  1. Click Clusters.

  2. For the cluster on which you want to test failover, click the button.

  3. Click Test Failover.

  4. Atlas displays a Test Failover modal with the steps Atlas will take to simulate a failover event. Click Restart Primary to begin the test. During this process:

    1. Atlas shuts down the current primary.

    2. The members of the replica set hold an election to choose which of the secondaries will become the new primary.

    3. Atlas brings the original primary back to the replica set as a secondary. When the old primary rejoins the replica set, it syncs with the new primary to catch up on any writes that occurred during its downtime.

      Note

      If the original primary accepted write operations that had not been replicated to the secondaries when it stepped down, it rolls back those write operations when it rejoins the replica set and begins synchronizing. For more information on rollbacks, see Rollbacks During Replica Set Failover.

      Contact MongoDB support for assistance with resolving rollbacks.

    4. Atlas notifies you of the results of the failover test in the Test Failover modal.
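
To observe how an application behaves while the test runs, one option is to issue a small operation at a fixed interval and record when it fails. The following Python sketch is illustrative only: `probe`, `ProbeReport`, and their parameters are hypothetical names, not part of Atlas or any MongoDB driver. The operation is passed in as a callable, so with a driver such as PyMongo it could wrap a small insert or find.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ProbeReport:
    """Summary of a probe run (hypothetical helper type)."""
    successes: int = 0
    failures: int = 0
    # (start, end) clock readings for each run of consecutive failures
    outages: List[Tuple[float, float]] = field(default_factory=list)

    def longest_outage(self) -> float:
        """Length of the longest observed outage, or 0.0 if there was none."""
        return max((end - start for start, end in self.outages), default=0.0)


def probe(operation: Callable[[], object],
          attempts: int,
          interval: float = 0.5,
          clock: Callable[[], float] = time.monotonic,
          sleep: Callable[[float], None] = time.sleep) -> ProbeReport:
    """Call `operation` `attempts` times and record failure windows."""
    report = ProbeReport()
    outage_start: Optional[float] = None
    for _ in range(attempts):
        now = clock()
        try:
            operation()
        except Exception:
            # Deliberately broad: a probe should record any failure it sees.
            report.failures += 1
            if outage_start is None:
                outage_start = now
        else:
            report.successes += 1
            if outage_start is not None:
                # First success after a run of failures closes the outage.
                report.outages.append((outage_start, now))
                outage_start = None
        sleep(interval)
    if outage_start is not None:
        # The run ended while the operation was still failing.
        report.outages.append((outage_start, clock()))
    return report
```

Start the probe shortly before clicking Restart Primary and let it run until Atlas reports the result. The measured outage is only as precise as the probe interval, and it reflects the probed operation rather than the application as a whole.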

If your application does not handle the failover gracefully, ensure the following:

  • The connection string includes all members of the replica set.
  • You are using the latest version of the driver.
  • You have implemented appropriate retry logic in your application.
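
As an illustration of the last item, the following Python sketch shows one common shape for application-level retry logic: retry a failed operation a bounded number of times with exponential backoff and jitter. It is a minimal sketch under stated assumptions, not the method Atlas or any driver prescribes. `with_retry` and its parameters are hypothetical names, and the hostnames in the commented usage are placeholders.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retry(operation: Callable[[], T],
               retry_on: Tuple[Type[BaseException], ...],
               max_attempts: int = 5,
               base_delay: float = 0.1,
               max_delay: float = 2.0,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, retrying on the given exception types.

    Waits a random time between 0 and an exponentially growing cap
    before each retry, and re-raises the error after the last attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the original error.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


# Hypothetical usage with PyMongo (not executed here). The hostnames and
# replica set name are placeholders; the seed list names every member.
#
#   from pymongo import MongoClient
#   from pymongo.errors import AutoReconnect
#
#   client = MongoClient(
#       "mongodb://host1.example.net:27017,host2.example.net:27017,"
#       "host3.example.net:27017/?replicaSet=myReplicaSet"
#   )
#   orders = client.shop.orders
#   with_retry(lambda: orders.find_one({"_id": 1}), (AutoReconnect,))
```

Retrying is only safe for operations that are idempotent, since an operation that failed from the client's point of view may still have been applied. Recent MongoDB drivers also provide built-in retryable reads and writes (the `retryReads` and `retryWrites` connection string options), which may reduce the need for hand-written retry code.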