On August 22, 2024, between 17:04 and 18:55 UTC, Atlassian customers using Jira, Jira Service Management, and Confluence experienced intermittent failures during login. Intermittent failures also affected inviting teammates to Jira or Confluence, Jira Service Management helpseeker sign-up using an email domain, and creating requests in the Jira portal.
The event was triggered by a faulty database configuration change, which caused approximately 25% of new login attempts to fail. The incident was detected within five minutes by our automated monitoring system and mitigated by reverting the database configuration, which returned Atlassian systems to a known-good state. The total time to resolution was about one hour and 51 minutes.
The impact occurred on August 22, 2024, between 17:04 and 18:55 UTC, and affected the Jira, Jira Service Management, and Confluence products. The incident disrupted service for customers across all regions, where new logins experienced intermittent failures.
The issue was caused by a faulty database autoscaling configuration change. As a result, approximately 25% of new Atlassian login attempts received HTTP 5xx errors.
We know that outages impact your productivity. While we have a number of testing and preventative processes in place, this specific issue wasn’t identified because its effect depended on the system's traffic load. The change reduced the database's autoscaling capacity. The reduced capacity was sufficient under low load, but it was not enough once traffic increased. Our automated testing didn’t place enough load on the system to detect the issue.
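To illustrate why a change like this can pass testing under low load yet fail in production, here is a minimal sketch of a pre-deployment capacity check. It is not Atlassian's tooling: the `AutoscalingConfig` type, the `covers_peak_load` function, the 20% headroom, and all of the numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscalingConfig:
    """Hypothetical autoscaling limits, in abstract capacity units."""
    min_capacity: int
    max_capacity: int


def covers_peak_load(config: AutoscalingConfig, load: float,
                     headroom: float = 0.2) -> bool:
    """Return True if max capacity covers the load plus a safety margin."""
    required = load * (1.0 + headroom)
    return config.max_capacity >= required


# Illustrative numbers only; they are not taken from the incident.
current = AutoscalingConfig(min_capacity=10, max_capacity=200)
proposed = AutoscalingConfig(min_capacity=10, max_capacity=80)

low_load = 40.0    # load level similar to a light test environment
peak_load = 150.0  # load level similar to busy production traffic

# The proposed limit looks fine at low load but falls short at peak load.
print(covers_peak_load(proposed, low_load))
print(covers_peak_load(proposed, peak_load))
print(covers_peak_load(current, peak_load))
```

In this sketch, a check run only against `low_load` would accept the proposed configuration, which mirrors how testing at insufficient load can miss a capacity reduction.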
We are prioritizing the following improvement actions to avoid repeating this type of incident:
Furthermore, we deploy our changes progressively to avoid broad impact. This works well for code changes, but in this case the change was to a global resource, the database configuration, which is deployed universally. To minimize the impact of changes to our environments, we will implement additional preventative measures such as:
We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.
Thanks,
Atlassian Customer Support