Cloudflare service outage June 12, 2025

June 15, 2025 2:17 am

On June 12, 2025, Cloudflare suffered a service outage that affected many of its critical services, including Workers KV, WARP, Access, Gateway, Images, and Stream. The disruption lasted two hours and twenty-eight minutes and affected customers worldwide. The root cause was a failure in the underlying storage infrastructure used by Workers KV, on which many Cloudflare products depend for configuration management, authentication, and asset delivery.

Workers KV is a vital component of Cloudflare’s architecture, and its failure had widespread consequences. The failure was triggered by an outage at a third-party cloud provider, which severely reduced the availability of Workers KV and left many dependent services unable to function properly. Cloudflare nonetheless accepted responsibility, acknowledging that it is accountable for the dependencies it selects and for how it designs around them. The incident was not the result of a cyberattack or security breach, and no data was lost.

Cloudflare’s design philosophy emphasizes building its services on its own platform. During this outage, that reliance on Workers KV became a liability and disrupted several key services. The company published a detailed breakdown of how each service was affected, including error rates.

Workers KV itself saw 90.22% of requests fail. Access, which uses Workers KV to store application and policy configuration, failed 100% of identity-based logins. The failure covered all application types (Self-Hosted, SaaS, and Infrastructure), showing how the outage cascaded from the storage layer into dependent products.

Gateway was affected as well: most DNS queries continued to resolve, but authenticated queries that relied on identity information failed. Similarly, WARP clients were unable to connect because of authentication failures tied to Workers KV. Dashboard logins were largely unavailable because Turnstile, Access, and other services involved in signing in were themselves failing.

Media services, including Cloudflare Images and Stream, were affected as well, with significant error rates reported during the incident. Video streaming services, in particular, saw a 100% error rate at the peak of the failure.

Although many services were severely impaired, Cloudflare’s Magic Transit, Magic WAN, DNS, caching, and web application firewall (WAF) services were not directly impacted, highlighting the company’s efforts to build redundancies into critical infrastructure.

Cloudflare responded by starting several efforts to restore the affected services, and by 8:28 PM UTC all impacted services had returned to normal operation. The company acknowledged that the incident exposed weaknesses in its design and architecture and prompted an immediate reassessment.

Cloudflare’s remediation plan centers on improving the redundancy of its storage infrastructure and reducing the risk posed by third-party dependencies. The company plans to reduce its reliance on external providers, particularly for critical systems like Workers KV. It is also building tooling to manage service recovery more effectively during outages, so that services keep running through disruptions.
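To make the idea of storage redundancy concrete, here is a minimal sketch of a read path that falls back to a second backend when the first is unavailable. It is only an illustration under stated assumptions: the names `DictStore`, `RedundantReader`, and `StoreUnavailable` are hypothetical, the backends are in-memory stand-ins, and nothing here describes how Workers KV is actually built.

```python
class StoreUnavailable(Exception):
    """Raised when a storage backend cannot serve a request."""


class DictStore:
    """Hypothetical in-memory backend standing in for a real storage provider."""

    def __init__(self, data, healthy=True):
        self.data = dict(data)
        self.healthy = healthy

    def get(self, key):
        # Simulate a provider outage when the backend is marked unhealthy.
        if not self.healthy:
            raise StoreUnavailable("backend is down")
        return self.data.get(key)


class RedundantReader:
    """Reads from backends in order, so one failed provider is not fatal."""

    def __init__(self, backends):
        self.backends = list(backends)

    def get(self, key):
        failures = 0
        for backend in self.backends:
            try:
                return backend.get(key)
            except StoreUnavailable:
                failures += 1  # try the next backend instead of failing the read
        raise StoreUnavailable(f"all {failures} backends failed")


# Usage: the primary is down, so the read is served by the secondary.
primary = DictStore({"policy": "allow"}, healthy=False)
secondary = DictStore({"policy": "allow"})
reader = RedundantReader([primary, secondary])
print(reader.get("policy"))
```

A sketch like this only covers reads; keeping multiple backends consistent on writes is the harder part of any real redundancy design and is not addressed here.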

Going forward, the company is prioritizing the resilience of its services. This includes making Workers KV more independent of any single provider and removing single points of failure, both of which matter for keeping service uninterrupted for customers.

Following the June 12 outage, Cloudflare’s teams are continuing to evaluate the incident and make infrastructure changes so that the lessons learned translate into lasting improvements and similar incidents become less likely.

In summary, the June 12, 2025 outage is a reminder of how a single shared dependency can cascade across a cloud platform. Cloudflare’s response and planned changes signal its commitment to reliability and customer trust, and are intended to make the platform more resilient.
