For years, OpenAI built the world’s most powerful AI models almost entirely on hardware it didn’t make. Every ChatGPT response, every Codex suggestion, every API call ran through chips designed by Nvidia — powerful, expensive, and in constant short supply. That changed today. On June 24, 2026, OpenAI and Broadcom officially unveiled Jalapeño, OpenAI’s first custom “Intelligence Processor,” purpose-built from scratch for the unique demands of large language model inference. This isn’t a minor upgrade. It’s a strategic pivot that puts OpenAI in the same infrastructure league as Google, Amazon, and Microsoft — companies that have quietly controlled the AI economy not just through software, but through silicon.
If you’ve seen the announcement trending on X and want to understand what it actually means — what the chip does, why it matters, and how it changes the competitive landscape — this is your complete guide.
“We’ve designed and built our first AI chip: Jalapeño. Designed from the ground up by OpenAI and brought to production with @Broadcom, Jalapeño is purpose-built for the LLM workloads powering ChatGPT, Codex, the API, and future agentic products. Chips are foundational to the AI…”
OpenAI (@OpenAI), June 24, 2026
What Is the OpenAI Jalapeño Chip?
The OpenAI Jalapeño chip is a custom ASIC (Application-Specific Integrated Circuit) designed in collaboration with Broadcom exclusively for LLM inference — the process of running AI models in response to user queries. Developed in a record nine months, Jalapeño is the first step in a multi-generation compute platform that OpenAI plans to deploy at gigawatt scale in partnership with Microsoft and other data center operators beginning in late 2026.
What Is “LLM Inference” and Why Does It Need Its Own Chip?

Before diving into Jalapeño’s specs, it’s worth understanding the problem it was built to solve.
Training vs. inference: two completely different workloads
Most people picture AI “computing” as one giant process. In practice, it splits into two very different phases:
Training is the process of teaching a model — feeding it billions of tokens, adjusting hundreds of billions of parameters, running massive gradient calculations across clusters of thousands of chips simultaneously. It’s computationally brutal, runs for weeks or months, and Nvidia’s H100 and H200 GPUs remain the dominant hardware here.
Inference is what happens every time a user sends a message to ChatGPT, runs a Codex query, or hits OpenAI’s API. The model is already trained — inference is simply running it in response to real users, in real time, at enormous scale. Every single ChatGPT conversation that happens today is an inference workload.
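To make the split concrete, here is a minimal sketch using a toy one-parameter model. The model, the learning rate, and the function names are illustrative assumptions, not anything from OpenAI's stack. The point is only the shape of the two workloads: training repeats forward pass, gradient, and weight update, while inference is a forward pass against frozen weights.

```python
# Toy one-parameter "model": prediction = w * x.
# Illustrative only; real LLMs have billions of parameters.

def forward(w: float, x: float) -> float:
    """Inference: a single forward pass with fixed weights."""
    return w * x


def train_step(w: float, x: float, target: float, lr: float = 0.1) -> float:
    """Training: forward pass, gradient of squared error, weight update."""
    pred = forward(w, x)
    grad = 2.0 * (pred - target) * x  # d/dw of (w * x - target) ** 2
    return w - lr * grad


# Training: many repetitions of forward + gradient + update.
w = 0.0
for _ in range(50):
    w = train_step(w, x=1.0, target=3.0)

# Inference: reuse the trained weight; nothing is updated.
answer = forward(w, 2.0)
print(f"trained w = {w:.4f}, forward(w, 2.0) = {answer:.4f}")
```

Training pays for the loop once per model. Inference pays for `forward` once per user request, which is why it dominates at serving scale.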
Why inference is where the real cost lives
OpenAI reportedly serves hundreds of millions of users. Every message, every API call, every agent loop — all inference. At that scale, even small inefficiencies in how a chip handles a specific type of workload translate into hundreds of millions of dollars in unnecessary compute costs.
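A back-of-the-envelope sketch shows how a small inefficiency compounds. Every input below is a placeholder chosen for round numbers, not an OpenAI figure, and `annual_waste_usd` is a hypothetical helper written for this illustration.

```python
def annual_waste_usd(requests_per_day: float,
                     cost_per_request_usd: float,
                     inefficiency: float) -> float:
    """Yearly spend attributable to an avoidable fraction of per-request cost."""
    return requests_per_day * cost_per_request_usd * inefficiency * 365


# Placeholder inputs, NOT reported figures.
waste = annual_waste_usd(
    requests_per_day=1_000_000_000,  # hypothetical request volume
    cost_per_request_usd=0.01,       # hypothetical compute cost per request
    inefficiency=0.05,               # hypothetical 5% avoidable overhead
)
print(f"${waste:,.0f} per year")
```

With these made-up inputs, a 5% inefficiency works out to about $182.5 million a year. The specific numbers matter less than the fact that the result scales linearly with request volume.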
General-purpose GPUs like Nvidia’s are exceptional, but they are also expensive, power-hungry, and designed to handle many types of workloads well. That generality is their strength, and also their inefficiency for a company that runs one type of workload for hundreds of millions of users.
This is the problem Jalapeño was built to solve.
How OpenAI Built Its First Custom AI Chip in Just 9 Months
This is where the Jalapeño story gets genuinely remarkable, regardless of what you think about OpenAI as a company.
Possibly the fastest ASIC development cycle in semiconductor history
Designing a custom ASIC from scratch to manufacturing tape-out typically takes 18 to 24 months — and that’s for experienced teams at established chipmakers. The Jalapeño program went from initial design to tape-out in nine months. Broadcom and OpenAI describe this as potentially the fastest high-performance ASIC development cycle ever achieved in the advanced semiconductor industry.
Three things made this speed possible:
1. Deep software-hardware co-design. OpenAI didn’t hand Broadcom a specification and wait. OpenAI’s engineering teams were embedded throughout the process — sharing real kernel behavior, actual memory access patterns, and live serving data from production systems. The chip was shaped around how OpenAI’s models actually run, not how a textbook says they should run.
2. Broadcom’s silicon implementation expertise. Broadcom has built custom AI accelerators for Google (TPUs), Meta, and others. They maintain a substantial library of reusable logic that can be incorporated into new designs without being built from scratch — compressing the timeline significantly.
3. OpenAI’s own models accelerated the design process. OpenAI directly used its AI models to speed up parts of the chip design and optimization work. In other words, the same models that Jalapeño will run helped design Jalapeño — a genuinely recursive capability that will likely become a structural advantage for every future generation.
Who else was involved?
Beyond OpenAI and Broadcom, Celestica was brought in to handle board and rack system integration, high-performance networking, and scalable production systems. The chip itself was physically delivered to OpenAI CEO Sam Altman and President Greg Brockman by Broadcom CEO Hock Tan and Semiconductor Solutions President Charlie Kawwas — a ceremonial handover that reflects the strategic weight of the partnership.
OpenAI Jalapeño Chip: Architecture, Design, and What Sets It Apart

The Jalapeño chip is purpose-built for LLM inference by optimizing the specific bottlenecks that matter at frontier scale — data movement, memory-compute balance, networking latency, and serving efficiency — close to the hardware’s theoretical limits. It is not a repurposed training accelerator or a general-purpose AI processor.
This is the key architectural principle behind Jalapeño, and it’s worth unpacking what it means in practice.
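One common way to reason about “memory-compute balance” is the roofline model: a chip's attainable throughput is capped either by its peak compute or by how fast memory can feed it. The sketch below uses that standard model with hypothetical hardware numbers. Jalapeño's actual specifications are not public in this announcement, and the workload intensities are assumptions for illustration.

```python
def attainable_flops(peak_flops: float,
                     mem_bandwidth_bytes_per_s: float,
                     flops_per_byte: float) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic."""
    return min(peak_flops, mem_bandwidth_bytes_per_s * flops_per_byte)


def is_memory_bound(peak_flops: float,
                    mem_bandwidth_bytes_per_s: float,
                    flops_per_byte: float) -> bool:
    """True when arithmetic intensity falls below the chip's ridge point."""
    ridge = peak_flops / mem_bandwidth_bytes_per_s
    return flops_per_byte < ridge


# Hypothetical accelerator, NOT Jalapeño's specifications.
PEAK = 1.0e15  # 1 PFLOP/s of peak compute
BW = 4.0e12    # 4 TB/s of memory bandwidth (ridge point: 250 FLOPs/byte)

# Assumed intensities: a low-intensity phase (for example, generating one
# token at a time at small batch sizes) and a high-intensity phase (for
# example, processing a long prompt in one batch).
low = attainable_flops(PEAK, BW, flops_per_byte=2.0)
high = attainable_flops(PEAK, BW, flops_per_byte=500.0)
print(f"low intensity:  {low:.1e} FLOP/s")
print(f"high intensity: {high:.1e} FLOP/s")
```

In this sketch the low-intensity phase reaches under 1% of peak compute because memory bandwidth is the limit. A designer who knows where the real workload sits on this curve can choose how to split silicon between compute units and memory bandwidth, which is the kind of balance the paragraph above describes.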
ASIC vs. GPU: the core tradeoff
| | ASIC (Jalapeño) | GPU (Nvidia H100/H200) |
|---|---|---|
| Flexibility | Low — designed for one workload | High — runs many workload types |
| Cost | Lower (per inference unit) | Higher |
| Performance per watt | Higher for target workload | Lower for specialized tasks |
| Development cycle | Longer, higher upfront investment | Available off-the-shelf |
| Best use case | One well-understood workload at very large scale (here, LLM inference) | Training + R&D + varied workloads |
Jalapeño accepts this tradeoff deliberately. OpenAI knows its inference workload in extraordinary detail — the kernel patterns, memory movement, networking behavior, and serving patterns are all visible internally. Designing silicon around that known workload, rather than around every possible workload, is what enables the performance-per-watt advantage OpenAI describes.
What “reticle-sized” means and why it matters
Multiple technical reports describe Jalapeño as a “reticle-sized” chip — meaning it uses the maximum die area that semiconductor lithography equipment can expose in a single shot. Larger die = more logic, more memory bandwidth, fewer bottlenecks. This is a design choice that signals OpenAI and Broadcom are going for maximum performance, not minimum cost, on this first generation.
Inference efficiency over raw throughput
OpenAI’s Richard Ho, who leads the hardware program, stated that Jalapeño was designed to “efficiently execute our most important workloads close to the hardware’s theoretical limits.” Early testing shows performance per watt substantially better than current state-of-the-art, though OpenAI notes final performance benchmarks are still being measured.
Why Did OpenAI Build Its Own Chip? The Strategic Logic
Several trends converged to make this decision not just attractive but necessary.
The “compute is never enough” problem
OpenAI President Greg Brockman stated it plainly:
“We cannot get computers fast enough.” Broadcom CEO Hock Tan echoed this, saying demand from his company’s top AI customers is “simply insatiable” — and he’s not just talking about 2026. He sees elevated demand extending into 2028 and beyond.
When your product’s biggest constraint is chip supply controlled by a single company (Nvidia), and that supply is also being competed for by Google, Meta, Amazon, Microsoft, and every AI startup simultaneously, building your own source becomes a strategic imperative — not just a cost optimization.
Reducing the “Nvidia premium”
Ben Barringer of Quilter Cheviot summarized the sentiment succinctly in a CNN interview: “Nobody wants to be beholden to Nvidia.” Google built TPUs. Amazon built Trainium and Inferentia. Microsoft invested billions in OpenAI while simultaneously developing its own silicon. Now OpenAI is building its own as well.
Full-stack control = full-stack optimization
OpenAI’s own announcement framed the goal clearly: by designing more of the stack itself (chip architecture, kernels, memory systems, networking, deployment, product), the company can optimize every layer around a single goal. When one team understands both the chip behavior and the model behavior, it can make optimizations across the hardware/software boundary that no external supplier relationship could enable.
This mirrors the exact playbook Apple executed with its M-series chips — control the hardware, unlock performance and efficiency that commodity chips can’t match.
IPO economics
OpenAI is preparing for an IPO that analysts have valued in the $1 trillion range. At that valuation, the company needs to demonstrate that it can generate and keep revenue at scale — not pay it all upstream to Nvidia. Custom silicon that cuts inference costs materially is a direct line to improved margins, which is a direct line to a credible public market valuation.
Deployment Plans: Where and When Will Jalapeño Run?
Jalapeño is already in engineering trials at OpenAI’s labs. The public deployment timeline:
- Late 2026: Initial “small prototype development” deployment in production systems, per Broadcom CEO Hock Tan
- 2026–2027: Scaling from prototype to broader deployment across OpenAI’s infrastructure
- Multi-year roadmap: This is explicitly framed as the first chip in a multi-generation platform — not a one-off project
Microsoft’s role

Microsoft is expected to be Jalapeño’s largest single customer, with reports suggesting Broadcom required Microsoft to guarantee purchasing 40% of the first production run before committing to the initial phase. This makes sense: Microsoft is both OpenAI’s largest investor and its primary cloud infrastructure partner, and Jalapeño would power OpenAI’s models running on Azure.
Broadcom CEO Hock Tan described the deployment as targeting “gigawatt-scale data centers” — a reference to the raw power consumption of the facilities that will house these chips, which puts the scale in physical terms that are hard to ignore.
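To put “gigawatt-scale” in rough physical terms, the sketch below estimates how many accelerators a facility power budget could support. The per-accelerator power draw and the PUE (power usage effectiveness, total facility power divided by IT power) are hypothetical placeholders. Neither OpenAI nor Broadcom has published these figures here.

```python
def accelerators_per_site(site_power_watts: float,
                          watts_per_accelerator: float,
                          pue: float = 1.2) -> int:
    """Accelerators that fit within a facility's total power budget.

    pue: total facility power / IT equipment power (cooling and
    power-delivery overhead pushes it above 1.0).
    """
    it_power_watts = site_power_watts / pue
    return int(it_power_watts // watts_per_accelerator)


# Hypothetical inputs: a 1 GW site, 1 kW all-in per accelerator
# (chip plus its share of host, memory, and networking), PUE of 1.25.
count = accelerators_per_site(1e9, watts_per_accelerator=1000.0, pue=1.25)
print(f"{count:,} accelerators")
```

Under these assumptions, one gigawatt supports on the order of 800,000 accelerators. That is why performance per watt, rather than raw chip speed, becomes the limiting factor at this scale.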
What workloads does it target?
Jalapeño is specifically named as the chip powering ChatGPT, Codex, the OpenAI API, and future agentic products. The explicit call-out of agentic products is significant — agents that run continuously, make tool calls, and loop through multi-step reasoning are inference-heavy by design, and they represent the fastest-growing segment of OpenAI’s product roadmap.
Training workloads will likely remain on Nvidia hardware for the foreseeable future. This is a deliberate division: Jalapeño is optimized for inference at scale, not for the exploratory, high-variance compute patterns of model training.
What Jalapeño Means for the AI Industry
OpenAI joins the full-stack club
Before today, the full-stack AI companies — organizations that control chips, infrastructure, models, and products — were Google, Amazon, and Microsoft. Each spent years building custom silicon (TPUs, Trainium, Azure Maia) before it became the norm. OpenAI, despite being arguably the world’s most influential AI company, was running entirely on rented compute. Jalapeño changes that.
Implications for Nvidia
Nvidia remains indispensable for AI training, and that won’t change with Jalapeño. But inference is a massive and growing market, and every Jalapeño chip OpenAI deploys for inference is an Nvidia chip it doesn’t buy. As more frontier labs follow this path, the inference segment of the GPU market, historically a major revenue source, faces increasing competition from custom silicon. Broadcom’s shares climbed on the announcement; the effect on Nvidia is less clear-cut, since its training business is untouched.
The AI-designed-chip loop
Perhaps the most strategically important detail in the Jalapeño announcement is this: OpenAI used its own AI models to accelerate parts of the chip’s design and optimization. This isn’t science fiction — it’s a working demonstration that AI can compress the most expensive, time-consuming parts of semiconductor development. If AI models can design better chips faster, and those chips make AI models cheaper to run, that’s a compounding feedback loop. Every generation of Jalapeño could be designed faster and more efficiently than the last.
Conclusion
The OpenAI Jalapeño chip is more than a product launch. It’s a declaration that OpenAI intends to control its own destiny at every layer of the AI stack — not just the model that thinks, but the silicon that computes, the data centers that house it, and the infrastructure that serves it to users. The nine-month development sprint, the recursive use of AI to design AI hardware, the gigawatt-scale deployment with Microsoft, and the explicit multi-generation roadmap all point in the same direction: OpenAI is building for a decade of scale, not a quarterly news cycle.
For anyone tracking the AI industry, the Jalapeño announcement marks a clear before and after. The era of every frontier AI lab running on commodity chips is ending. What comes next — the arms race for efficient, purpose-built inference silicon — is only just beginning.
FAQs
What is the OpenAI Jalapeño chip?
Jalapeño is OpenAI’s first custom AI accelerator, developed with Broadcom, designed specifically for running large language models (inference) rather than training them. It’s the first chip in a planned multi-generation compute platform.
What does the Jalapeño chip power?
It is designed to power ChatGPT, OpenAI’s Codex coding agent, the OpenAI API, and future agentic AI products.
How long did it take to build Jalapeño?
Nine months from initial design to manufacturing tape-out, which Broadcom and OpenAI describe as potentially the fastest high-performance ASIC development cycle ever achieved in advanced semiconductors.
Is Jalapeño replacing Nvidia chips?
For inference workloads specifically, Jalapeño is intended to reduce OpenAI’s reliance on Nvidia. Training workloads will likely continue to use Nvidia hardware. In other words, it is meant to displace Nvidia hardware at the inference layer, not across all of OpenAI’s compute.
When will Jalapeño be deployed?
Initial deployment in production systems is targeted for late 2026, with expansion through 2027 and beyond as part of a multi-generation platform.
Who else is involved in the Jalapeño project?
Broadcom is the primary semiconductor partner handling chip implementation, while Celestica is handling board, rack, and networking system integration. Microsoft is expected to be the largest customer for Jalapeño chips.

