Once composed of a single piece of silicon, computer and server processors will increasingly become compositions of logic blocks glued or stacked together. A look at what is not a mere trend, but a prerequisite for the chip industry’s continued race for performance.
The new Meteor Lake chip, which Intel has just presented in detail at its Innovation show, is unlike any other chip the American giant has ever launched. This processor, which will start appearing in notebook PCs on December 14 under the name Core Ultra, breaks with the company’s monolithic tradition. You don’t need a microscope to see that the chip is a “collage” of several chip “pieces”. To use a wine analogy: Intel, which until now produced only single-vineyard Burgundies, has converted to blending and become a Bordeaux producer!
We could be content to say that the arrival of new CEO Pat Gelsinger in 2021 was the beginning of many changes at Intel, which, with its IDM 2.0 strategy, is carrying out a revolution across several of its branches. But that would overlook the fact that AMD has been playing Lego for some time now, that Apple has already glued two M1 Max dies together to make the M1 Ultra, and that many rumors point to Nvidia’s forthcoming conversion. So the question is less why Intel has changed than what is forcing high-power chip producers to design chips like sets of bricks.
Foveros and Chiplets: optimizing costs to the max
The first and foremost reason, as we mentioned when the Meteor Lake chip was unveiled, is cost. The fact that AMD and Intel now design their chips à la carte is first and foremost a way of saving money. Why etch the I/O die in 4 nm if you can make do with 7-8 nm? While choosing the right process node for each block is an important factor, there is also, and above all, an industrial logic behind it, based on yields. And to understand this logic, we need to delve into the wonderful world of semiconductor production. Semiconductors start life on a wafer, a primordial disc of silicon 300 mm in diameter and less than a millimeter thick.
After numerous chemical baths coupled with multiple exposures in the scanners that print the circuits, the electronic components are cut out and tested. Tested indeed, because during fabrication, numerous defects can appear on the surface of the wafer, often rendering the affected parts inoperable. Imagine a wafer with 10 defects scattered across its surface. If you cut large 3×4 cm chips, you end up with a very high waste rate. Take the same wafer and cut 8×6 mm blocks, and your yield rate explodes. This difference in yield largely explains why larger components become exponentially more expensive.
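To make this concrete, here is a deliberately crude back-of-the-envelope model. The die sizes and the 10-defect count come from the example above; the simplistic assumptions (no edge losses, one defect scraps exactly one randomly chosen die) are ours, not real fab data:

```python
import math

def dies_per_wafer(wafer_diameter_mm: float, die_w_mm: float, die_h_mm: float) -> int:
    """Crude count: usable wafer area divided by die area (ignores edge losses)."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    return int(wafer_area // (die_w_mm * die_h_mm))

def expected_good_dies(n_dies: int, n_defects: int) -> float:
    """Each defect lands on a uniformly random die; a die hit by any defect is scrapped.
    Probability that a given die avoids all k defects: (1 - 1/n) ** k."""
    p_good = (1 - 1 / n_dies) ** n_defects
    return n_dies * p_good

# Compare the article's two cases: 3x4 cm dies vs 8x6 mm dies, 10 defects per wafer
for die_w, die_h in [(30.0, 40.0), (8.0, 6.0)]:
    n = dies_per_wafer(300.0, die_w, die_h)
    good = expected_good_dies(n, n_defects=10)
    print(f"{die_w:g}x{die_h:g} mm die: {n} per wafer, ~{good:.0f} good ({good / n:.0%} yield)")
```

With the same 10 defects, the big dies lose a sizeable fraction of the wafer while the small dies stay above 99% yield, which is exactly the economic gap chiplets exploit.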
While assembling chips from pieces costs money and requires advanced technology (see below), this mechanism can lower the final price of a composite chip. In addition to this potential saving based on pure yields, there’s also the ability to limit the start-up costs of new production nodes. Indeed, producing a new component takes time to calibrate machines and processes, an expensive ramp-up period. Maximizing yields when introducing a new process can significantly reduce these start-up costs. Once the process has been mastered and peak yields achieved with small chips, it’s much easier (and cheaper!) to apply the same recipe to larger components, which naturally have a higher waste rate.
We’ve been seeing this for years with smartphone chips: with their staggering volumes, higher margins and smaller dies, our smartphones benefit from the most advanced process nodes. The launch in early September of the iPhone 15 Pro and its 3 nm A17 Pro chip is a reminder of this: it’s the first mass-produced chip on this node. And it’s good to see that the enormous production volumes of mobile SoCs are in part financing the computer chips of tomorrow!
Please note that the era of chiplet assembly does not mean the total disappearance of large chips. Just look at the size of the Cerebras mega-processor, which is literally the size of a square-cut wafer (hence the name wafer scale engine)! Some professional chips such as Xeon or EPYC, some of which are already composite, still include rather huge pieces of silicon, in order to guarantee maximum performance. Because disaggregated design brings limits and constraints with it, such as greater latency between certain chip elements. But these small losses can be partially contained.
Facilitating diversification of the chip portfolio
Once yields are accepted as the main reason for the “big set of bricks”, a second cost optimization comes into play, linked to each designer’s portfolio of offerings. Intel, AMD and others develop hundreds of chip references every year. Entry-level, mid-range or high-end chips, low-power or high-power chips: between customer needs and range effects, processor vendors need variety.
In the classic approach to monolithic processors, one of the tricks to creating different chips is to qualify, or “bin”, them differently. Depending on the defects on the surface of the wafer, and the needs of marketing segmentation, this means lowering frequencies or deactivating one or more CPU or GPU cores: controlling the number of active cores in the various logic blocks is an effective way of adjusting chip performance. This explains why the cheapest Core i5 often physically resembles a Core i9. In the latter case, the die has passed all the tests with flying colors and turns out to be the ultimate version of the design. In the former case, it’s a die that didn’t hold up so well at high frequencies, or was missing one or two cores.
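As a sketch, the binning logic described above might look like this. The thresholds, core counts and bin names are invented for illustration; they are not Intel’s actual qualification criteria:

```python
from dataclasses import dataclass

@dataclass
class TestedDie:
    working_cores: int    # cores that passed functional tests
    max_freq_ghz: float   # highest stable clock reached during qualification

def bin_die(die: TestedDie) -> str:
    """Hypothetical binning rules: one physical design, several commercial labels."""
    if die.working_cores >= 8 and die.max_freq_ghz >= 5.0:
        return "flagship"     # fully functional die, fastest bin
    if die.working_cores >= 6 and die.max_freq_ghz >= 4.2:
        return "mid-range"    # a couple of cores fused off, lower clocks
    if die.working_cores >= 4:
        return "entry-level"  # heavily cut down but still sellable
    return "scrap"

print(bin_die(TestedDie(8, 5.2)))  # flagship
print(bin_die(TestedDie(7, 4.5)))  # mid-range
```

The point of the sketch: the same silicon flows through one decision tree, and marketing names fall out of test results rather than separate designs.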
Logic block assembly has one great strength here: it lets chip designers create highly complex and/or à la carte chips, all without having to qualify a new monolithic design each time. What’s more, they can respond precisely to a customer who wants more CPU cores, or a less powerful and less expensive integrated graphics unit.
Overcoming two-dimensional limits
Designing chip layouts is such a complex undertaking that AI is now needed to find the best routing of information within the circuits. Circuits which, it must be remembered, are three-dimensional. This depth adds complexity, not only during the design phase, but also during the lithography (and testing!) phases. For example, there’s a limit to the number of circuit layers that can be etched on the wafer surface, a limit that forces designs to spread out horizontally. This is hardly compatible with the need for compactness in our electronic devices, whether smartphones or PCs.
Here again, the “big block set” offers a new way of getting around these limitations: stacking modules. All you have to do is visit your favorite online retailer to buy such chips from AMD. Ryzen 9 7900X3D processors, labelled “X3D”, are classic chips on which AMD (and TSMC!) have stacked extra cache memory to speed up certain tasks (in this case, video games). Why cache memory? Because it’s made up of SRAM cells, whose limitation is that they don’t miniaturize as well as the transistors in the logic blocks that make up CPUs and GPUs. For a given surface area, engineers think carefully about how much cache memory they need, with each kilobyte consuming precious die area. And although there are limits, particularly thermal ones – one heat-producing layer now sits on top of another, which explains the somewhat lower frequencies of AMD processors designed this way – stacking memory is a clever way to benefit from more cache at low cost.
Note here that AMD/TSMC are not the only ones who know how to stack blocks: Intel was the first with its Lakefield processor (2020), a very low-power chip that stacked 6 layers (including substrate and memory support), the last of which was nothing less than RAM! At the Innovation Forum’s press Q&A session on September 18, Intel CEO Pat Gelsinger guaranteed that cache stacking was not unique to AMD, and that Intel would be proposing similar solutions using its own methods.
Assembly: Intel and TSMC lead the way
If you follow semiconductor news, you’ve probably heard some of these acronyms: EMIB, CO-EMIB or Foveros at Intel, CoWoS at TSMC. These various “packaging” technologies, i.e. the integration of die pieces with one another on a substrate, are the secret weapons these two chip titans use to push back the current limits of silicon. Whether we’re talking about the huge EPYCs or the tiny Core Ultra, none of these products would ever have seen the light of day without interconnection know-how accurate down to the micron.
Does this mean that these two companies are the only ones who know how to assemble or stack die pieces? Certainly not: in addition to the know-how of Samsung, STMicroelectronics, Sony (particularly for stacked sensors) and GlobalFoundries, there are also players such as Taiwan’s ASE Technology Holdings, which specializes exclusively in this area of packaging (and testing). But Intel and TSMC are by far the most advanced, not only in terms of know-how and cutting-edge technologies, but also in terms of production capacity. Packaging is a different discipline from etching, requiring not only additional knowledge, but also specific factories and machines. The two titans are thus battling it out with tens of billions of dollars. Intel alone has invested 7 and 3.5 billion dollars respectively in its Malaysian plants in Penang and its American plant in New Mexico. And TSMC is doing the same on its home turf, with the future $2.9 billion Tongluo site, located in Miaoli County, in the north-west of the island.
Another advantage of block stacking is the possibility of creating chips that were once inconceivable for yield reasons, as we have seen. As cooling techniques have improved – watercooling has become widespread in supercomputers – chip designers have been able to create monsters such as the “Datacenter GPU Max” graphics chip (codenamed Ponte Vecchio). But here there’s another limit: the substrate.
This support for dies and chiplets – the “plate” on which blocks are glued, stacked and linked together – has become a new frontier. And, once again, it’s Intel that could push back the current limits of the organic substrate on which manufacturers place and connect pieces of chips. Currently based on laminated fiberglass, this heterogeneous material can’t be used to compose the giant chips the industry needs. And the density of the holes (vias) that let interconnection circuits pass between the different chip “pieces” is limited. That’s why Intel has been working for years on a new homogeneous glass substrate, which should see the light of day in commercial applications by the end of the decade. A substrate that will enable Intel to stack more bricks – bigger and bigger bricks at that.
While shrinking circuits further will become increasingly difficult as we approach physical limits, the assembly game that manufacturers are now playing, with the giant chips it promises, should keep Moore’s famous law alive for a few more years. And satisfy mankind’s unquenchable thirst for computing power.