Message Loss
Incident Report for Mailgun
Resolved
Overview

On 3/23/16, between roughly 17:40 and 17:47 UTC, our monitoring system alerted us that Mailgun was not delivering any messages. We quickly discovered that the delivery failure was linked to a recent deployment of our API, and we immediately rolled back those changes.

Details

The changes triggered a bug in our code where we accepted a message but did not store it in our message store. As a result, messages that we accepted during this window were lost, and we are unable to retrieve or deliver them. We lost roughly 8,000 inbound messages and 129,000 outbound messages. While we gathered data on the extent of the message loss, messages remained in our queue, which caused some delivery delays until 21:30 UTC.

Mitigation

To prevent this from happening again, we've added failsafes to our code to ensure that we acknowledge accepting a message only after that message has been stored successfully, without any errors.
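
The idea behind that failsafe can be sketched as follows. This is a minimal illustration only, not Mailgun's actual code: `InMemoryStore`, `StoreError`, `accept_message`, and the status values are hypothetical names chosen for the example. The point it shows is the ordering: the write to the store happens first, and a success response is returned only if that write completed and can be read back.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


class StoreError(Exception):
    """Raised when a message could not be persisted (hypothetical)."""


@dataclass
class InMemoryStore:
    """Stand-in for a durable message store (hypothetical, illustration only)."""
    fail: bool = False
    saved: Dict[str, str] = field(default_factory=dict)

    def save(self, message_id: str, body: str) -> None:
        # Simulate a failing backend when `fail` is set.
        if self.fail:
            raise StoreError("write failed")
        self.saved[message_id] = body

    def has(self, message_id: str) -> bool:
        return message_id in self.saved


def accept_message(store: InMemoryStore, message_id: str, body: str) -> Tuple[int, str]:
    """Return an HTTP-style (status, text) pair.

    Success is reported only after the store confirms the write.
    """
    try:
        store.save(message_id, body)
    except StoreError as exc:
        # Do not acknowledge: the sender can retry instead of the message being lost.
        return 500, f"not accepted: {exc}"
    if not store.has(message_id):
        # Defensive read-back: a write that raised no error but left nothing behind
        # must not be acknowledged either.
        return 500, "not accepted: write could not be verified"
    return 200, "queued"
```

With a healthy store, `accept_message(InMemoryStore(), "m1", "hello")` returns `(200, "queued")`. With `InMemoryStore(fail=True)`, it returns a 500 status and nothing is acknowledged, so the sender knows to retry.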

We'll be reaching out to all customers affected by this outage with details of the messages lost and how to obtain a credit towards their invoice.

We're very sorry about this incident and for any inconvenience it may have caused. If you have any questions or concerns, please reach out to our support team.
Posted Mar 23, 2016 - 16:00 PDT