Amazon Tightens Code Checks After AI Outages

Amazon is reinforcing its code review processes after AI tools were linked to outages.
One incident, which cost an estimated 120,000 orders, was traced to the AI coding tool 'Q'.
New rules mandate dual approvals & audits for critical systems to improve stability.

What is the story about?

Amazon is tightening its code reviews after AI coding tools contributed to serious outages on its retail and cloud platforms.

Navigating AI's Coding Double Edge

Amazon is reportedly tightening its internal safeguards for software development after a run of service interruptions that affected its e-commerce operations. One significant incident is understood to be connected to its AI coding assistant, Q. Dave Treadwell, Amazon's SVP of e-commerce services, indicated that the company is introducing immediate safety measures that create deliberate 'friction' in the deployment process for crucial parts of the retail experience. At the same time, it is working on more permanent solutions that combine predictable and adaptive safety mechanisms. The move follows a 'trend of incidents' observed since the third quarter of 2025, including several major disruptions in the weeks before an internal meeting at which the issues were discussed. The tighter controls are expected to require more detailed documentation for code modifications and additional layers of approval, along with new safeguards in the code-change review process. The situation highlights a broader challenge in contemporary software engineering: developers face pressure to use AI tools for code generation without adequate systems for reviewing and validating the output, particularly as large tech firms look to AI to cut costs.

The Ripple Effect of Disruptions

The e-commerce giant experienced significant service disruptions, particularly affecting its US operations, in the lead-up to a critical internal review. A report from the Financial Times indicated that 'novel GenAI usage for which best practices and safeguards are not yet fully established' was a contributing factor to these incidents. Furthermore, an Amazon Web Services (AWS) outage in December 2025, which lasted for 13 hours, has also been linked to its AI coding tool, identified internally as Kiro. These challenges underscore a growing tension in the software development landscape, where the drive to utilize AI-generated code is outpacing the establishment of robust review and validation processes. This is occurring against a backdrop of widespread layoffs in the tech industry, as companies seek to harness AI for operational cost savings. Amazon itself announced the elimination of over 16,000 corporate positions in January 2026. While the company denies any direct link between these workforce reductions and the increase in service disruptions, engineers have reportedly noted an escalation in 'Sev2s' – critical incidents requiring immediate attention to avert outages. In contrast, a spokesperson for Amazon characterized the internal meeting as part of a routine weekly review, focused on website and app availability for continuous improvement, and denied AWS involvement in recent outages, attributing only one incident to AI.

Specific Outage Incidents

Several distinct incidents have underscored the need for improved oversight. On March 2, 2026, Amazon's e-commerce platform and app experienced a nearly six-hour outage that caused incorrect delivery estimates, transaction failures, and difficulties accessing account details and product pricing. The event resulted in an estimated 120,000 lost orders and approximately 1.6 million website errors. Amazon's internal assessment identified its AI coding tool, 'Q', as a primary contributor to the disruption. A few days later, on March 5, 2026, the marketplace suffered another outage, which caused a 99% drop in US orders and a loss of roughly 6.3 million orders. That incident was attributed to a production change deployed without the necessary authorization. Separately, Amazon's cloud computing division, AWS, reportedly faced at least two outages linked to AI coding assistants. One disruption lasted 13 hours and affected a cost calculator tool for customers. It was traced to changes made by engineers using Amazon's internal Kiro AI coding tool, which in one instance reportedly led to a 'delete and recreate the environment' action.

New Code Verification Protocols

In response to these challenges, Amazon is implementing a new policy that puts engineers through a more rigorous code review process. Under the updated framework, developers must secure approval from two colleagues before making any code changes. They are also required to use an internal tool for documentation and approvals, along with a system that automatically verifies adherence to Amazon's core reliability engineering standards. The company has also directed the owners of 335 'Tier-1' systems – those directly impacting consumer services – along with their management chain up to the VP level, to audit all production code change activity within their respective departments. The approach aims to embed stricter quality control and accountability throughout the software development lifecycle and so reduce the risk of errors and improve system stability.
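To make the reported rules concrete, here is a minimal Python sketch of a deployment gate that combines the three conditions described above: two approvals from colleagues, documentation of the change, and a passing automated check. It is an illustration only, not Amazon's actual tooling. The names `ChangeRequest`, `can_deploy`, and `REQUIRED_APPROVALS` are hypothetical, and the rule that an author cannot approve their own change is an assumption rather than something the report states.

```python
from dataclasses import dataclass, field

# Assumption: the reported "approval from two colleagues" rule.
REQUIRED_APPROVALS = 2


@dataclass
class ChangeRequest:
    """Hypothetical model of a proposed production code change."""
    author: str
    description: str                       # documentation of the change
    approvers: set[str] = field(default_factory=set)
    checks_passed: bool = False            # automated reliability checks


def can_deploy(change: ChangeRequest) -> tuple[bool, list[str]]:
    """Return (allowed, reasons_blocked) under the assumed policy."""
    reasons = []
    # Assumption: the author's own approval does not count.
    independent = change.approvers - {change.author}
    if len(independent) < REQUIRED_APPROVALS:
        reasons.append(
            f"needs {REQUIRED_APPROVALS} approvals, has {len(independent)}"
        )
    if not change.description.strip():
        reasons.append("missing change documentation")
    if not change.checks_passed:
        reasons.append("automated reliability checks have not passed")
    return (not reasons, reasons)


if __name__ == "__main__":
    cr = ChangeRequest(
        author="alice",
        description="Fix delivery estimate rounding",
        approvers={"alice", "bob"},
        checks_passed=True,
    )
    print(can_deploy(cr))   # blocked: only one independent approver
    cr.approvers.add("carol")
    print(can_deploy(cr))   # allowed once a second colleague approves
```

The gate returns the reasons a change is blocked rather than a bare yes or no, which is one way to supply the deliberate 'friction' the article describes while still telling the engineer what is missing.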