From Intel’s Foveros to AMD’s chiplets: why the industry is playing Lego with chips


Once made from a single piece of silicon, PC and server processors will increasingly be assemblies of logic blocks bonded side by side or stacked on top of one another. A closer look at what is less a trend than a prerequisite for the chip industry’s continued race for performance.

Intel Foveros AMD Chiplets
The GPU Max graphics chip (codenamed Ponte Vecchio) is the extreme example of these new generations of “aggregated” compound chips. Here, dozens of chiplets make up a compute accelerator with over 100 billion transistors © Intel

The new Meteor Lake chip, which Intel has just presented in detail at its Innovation show, is unlike any other chip the American giant has ever launched. This processor, which will start appearing in notebook PCs on December 14 under the name Core Ultra, breaks with the company’s monolithic tradition. You don’t need a microscope to see that the chip is a “collage” of several chip “pieces”. To borrow a wine analogy: Intel, which until now produced only single-vineyard Burgundies, has converted to blending and become a Bordeaux producer!

Future generations of Intel Foveros chips
Intel CEO Pat Gelsinger at Innovation 2023 in San Jose, showcasing the company’s future generations of compound chips. Adrian Branco for Overclocking.com

It would be tempting to say that the arrival of new CEO Pat Gelsinger in 2021 simply marked the start of many changes at Intel, which is reinventing several of its divisions under its IDM 2.0 strategy. That would overlook the fact that AMD has been playing Lego for some time now, that Apple has already glued two M1 Max dies together to make the M1 Ultra, and that many rumors point to Nvidia’s forthcoming conversion. So the question is less why Intel has changed than what is forcing producers of high-performance chips to design them like sets of bricks.

Foveros and Chiplets: optimizing costs to the max

Gordon Moore
Back in the 1960s, iconic Intel co-founder Gordon Moore (after whom Moore’s Law is named) foresaw the future need to compose chips from smaller pieces: “It may prove to be more economical to build large systems out of smaller functions, which are separately packaged and interconnected.” Adrian Branco for Overclocking.com

The first and foremost reason, as we mentioned when the Meteor Lake chip was unveiled, is cost. If AMD and Intel now design their chips à la carte, it is above all to save money. Why etch the I/O die in 4 nm if you can make do with 7-8 nm? Choosing the right process node for each block matters, but behind it lies, above all, an industrial logic based on yields. To understand this logic, we need to delve into the wonderful world of semiconductor production. Semiconductors start life on a wafer, a disc of silicon 300 mm in diameter and less than a millimeter thick.

After numerous chemical baths and multiple light exposures in the scanners that print the circuits, the chips are cut out and tested. Testing matters, because during the actual manufacturing of the chips, numerous defects can appear on the surface of the wafer, often rendering the affected parts inoperable. Imagine a wafer with 10 defects scattered across its surface. If you cut large 3×4 cm chips, each defect ruins a big share of the wafer and you end up with a very high waste rate. Take the same wafer and cut 8×6 mm blocks, and each defect costs you only a tiny piece: your yield soars. This difference in yield largely explains why the larger the components, the more exponentially expensive they become.
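
To put rough numbers on this thought experiment, here is a minimal Python sketch. It assumes a simple Poisson yield model, yield = exp(-D × A) with D the defect density and A the die area, and a common textbook approximation for the number of dies that fit on a round wafer. Neither comes from Intel or AMD, the function names are purely illustrative, and real fabs use more elaborate models.

```python
import math

WAFER_DIAMETER_MM = 300.0  # wafer diameter quoted in the article


def wafer_area_mm2(diameter_mm: float = WAFER_DIAMETER_MM) -> float:
    """Surface of the round wafer in square millimeters."""
    return math.pi * (diameter_mm / 2) ** 2


def dies_per_wafer(die_area_mm2: float,
                   diameter_mm: float = WAFER_DIAMETER_MM) -> int:
    """Approximate die count: area ratio minus an edge-loss term.

    This is a rule-of-thumb formula (an assumption of this sketch),
    not an exact placement of rectangles on a disc.
    """
    gross = wafer_area_mm2(diameter_mm) / die_area_mm2
    edge_loss = math.pi * diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)


def poisson_yield(defects_on_wafer: float,
                  die_area_mm2: float,
                  diameter_mm: float = WAFER_DIAMETER_MM) -> float:
    """Share of dies expected to be defect-free under a Poisson model."""
    density = defects_on_wafer / wafer_area_mm2(diameter_mm)  # defects per mm^2
    return math.exp(-density * die_area_mm2)


if __name__ == "__main__":
    # The two die sizes from the example above, with 10 defects on the wafer.
    for label, width_mm, height_mm in (("3x4 cm die", 30, 40),
                                       ("8x6 mm die", 8, 6)):
        area = width_mm * height_mm
        count = dies_per_wafer(area)
        good_share = poisson_yield(10, area)
        print(f"{label}: ~{count} dies per wafer, "
              f"~{good_share:.1%} expected to be defect-free")
```

With the 10 defects of the example, this model gives a yield of roughly 84% for the large die and over 99% for the small one. Raise the defect count and the gap widens quickly.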

Assembling chips from pieces costs money and requires technology of its own (see below), but this mechanism can still lower the final price of a compound chip. In addition to this potential saving from pure yields, there’s also the ability to limit the start-up costs of new production nodes. Bringing a new process into production takes time to calibrate machines and recipes, an expensive ramp-up period. Maximizing yields when introducing a new process can significantly reduce those costs. Once the process has been mastered and peak yields achieved with small processors (or chiplets), it’s much easier (and cheaper!) to apply the same recipe to larger components, which naturally have a higher waste rate.

Sapphire Rapids: evolution of the chip industry
The new generation of Sapphire Rapids professional processors is indeed made up of several “tiles”. And they’re already very large! Adrian Branco for Overclocking.com

We’ve been seeing this for years with smartphone chips: thanks to their enormous volumes, higher margins and small dies, smartphones are the first to benefit from the finest process nodes. The launch in early September of the iPhone 15 Pro and its 3 nm A17 Pro chip is a reminder of this: it’s the first mass-produced chip on this node. And it’s good to see that the incredible sales volumes of mobile SoCs are in part financing the computer chips of tomorrow!

Note that the era of chiplet assembly does not mean the total disappearance of large chips. Just look at the size of the Cerebras mega-processor, which is literally the size of a square-cut wafer (hence the name Wafer Scale Engine)! Some professional chips such as Xeon or EPYC, several of which are already composite, still contain rather huge pieces of silicon. This is to guarantee maximum performance, because disaggregated design brings limits and constraints of its own, such as greater latency between certain chip elements. But these small losses can be partially contained.

Facilitating diversification of the chip portfolio

system on a package
Having integrated virtually all formerly separate elements (GPU, CPU, I/O, etc.) into a single chip called a “system on a chip”, the industry is now moving towards the “system on a package”, where the substrate becomes a kind of micro motherboard housing components measuring just a few square millimeters. Intel

Yields are the main reason for the “big set of bricks”, but they are not the only one: there is also the search for cost optimization across each designer’s product portfolio. Intel, AMD and others develop hundreds of chip models every year. Entry-level, mid-range or high-end, low-power or high-power: between customer needs and product-line segmentation, processor vendors need variety.

In the classic approach to monolithic processors, one of the tricks for creating different chips is to qualify, or “bin”, them differently. Depending on the defects on the surface of the wafer and the needs of marketing segmentation, this means lowering frequencies or deactivating one or more CPU or GPU cores: controlling the number of active cores in the various logic blocks is an effective way of adjusting chip performance. This explains why the cheapest Core i5 often physically resembles a Core i9. In the latter case, the die has passed all the tests with flying colors and turns out to be the ultimate version of the design. In the former case, it’s a die that didn’t hold up so well at high frequencies, or that had one or two defective cores.
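
The qualification logic described above can be sketched in a few lines of Python. The tier names, core counts and frequency thresholds below are invented for illustration and do not correspond to any real Intel or AMD product rules.

```python
from dataclasses import dataclass


@dataclass
class DieTestResult:
    """Outcome of testing one die cut from the wafer (hypothetical fields)."""
    working_cores: int
    max_stable_ghz: float


# Hypothetical tiers, most demanding first: (name, minimum cores, minimum GHz).
TIERS = (
    ("top", 8, 5.0),
    ("mid", 6, 4.5),
    ("entry", 4, 4.0),
)


def assign_tier(result: DieTestResult) -> str:
    """Return the best tier whose requirements the die meets, else 'scrap'."""
    for name, min_cores, min_ghz in TIERS:
        if result.working_cores >= min_cores and result.max_stable_ghz >= min_ghz:
            return name
    return "scrap"


if __name__ == "__main__":
    samples = [
        DieTestResult(working_cores=8, max_stable_ghz=5.2),  # flawless die
        DieTestResult(working_cores=8, max_stable_ghz=4.6),  # all cores, slower
        DieTestResult(working_cores=6, max_stable_ghz=5.2),  # fast, two bad cores
        DieTestResult(working_cores=3, max_stable_ghz=5.0),  # too many bad cores
    ]
    for die in samples:
        print(die, "->", assign_tier(die))
```

The same physical design thus feeds several price points, which is exactly what becomes harder to sustain as monolithic dies grow larger.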

Logic block assembly has one great strength here: it lets chip designers create highly complex and/or à la carte chips, all without having to go through the process of qualifying a new monolithic design. What’s more, they can respond precisely to a customer who wants more CPU cores, or a less powerful and less expensive integrated graphics unit.

Overcoming two-dimensional limits

SoC x86 intel Lakefield
Launched in 2020, the Lakefield chip was an ultra-low-power x86 SoC that combined CPU, GPU and I/O, with RAM stacked on top. A mini all-in-one system measuring just 12 mm x 12 mm x 1 mm! Intel

Designing chip layouts is such a complex undertaking that AI is now needed to find the best routing of information within the circuits. Circuits which, it must be realized, are three-dimensional. This depth adds complexity, not only during the design phase, but also during the lithography (and testing!) phases. For example, there’s a limit to the number of circuit layers that can be etched on the wafer surface, a limit that forces us to spread out. This is not compatible with the need for compactness in our electronic devices, whether smartphones or PCs.

Intel packaging
While Intel has lost its leadership in process nodes (a leadership the company is keen to regain), the American company nevertheless has cutting-edge packaging techniques that are unique in the industry. Intel

Here again, the “big set of bricks” offers a new way of getting around these limitations: stacking modules. You only need to visit your favorite online retailer to buy such chips from AMD. Processors labelled “X3D”, such as the Ryzen 9 7900X3D, are classic chips on top of which AMD (and TSMC!) have added cache memory to speed up certain tasks (in this case, video games). Why cache memory? Because it’s made up of cells of a memory called SRAM, whose drawback is that it doesn’t miniaturize as well as the transistors in the logic blocks that make up CPUs and GPUs. For a given surface area, engineers think carefully about how much cache memory they need, with each kilobyte consuming precious silicon. There are limits, particularly thermal ones: one heat-producing layer sits on top of another, which explains the somewhat lower frequencies of AMD processors designed in this way. Even so, stacking memory is a clever way to get more of it at low cost.

Note that AMD and TSMC are not the only ones who know how to stack blocks: Intel got there first with its Lakefield processor (2020), a very low-power chip that stacked 6 layers (including substrate and memory support), the last of which was nothing less than RAM! At the Innovation press Q&A session on September 18, Intel CEO Pat Gelsinger insisted that cache stacking was not unique to AMD, and that Intel would be offering similar solutions using its own methods.

Assembly: Intel and TSMC lead the way

Chip industry: Intel packaging technology, Foveros
After first grafting an AMD GPU onto some of its 8th-generation Core chips in 2018 with its EMIB technology (Kaby Lake G generation), Intel has never stopped investing in new generations of chiplet aggregation mechanisms. Intel

If you follow semiconductor news, you’ve probably heard some of these names: EMIB, Co-EMIB and Foveros at Intel, or CoWoS at TSMC. These various “packaging” technologies, i.e. ways of integrating pieces of die with one another on a substrate, are the secret weapons these two chip titans use to push back the current limits of silicon. Whether we’re talking about the huge EPYCs or the tiny Core Ultra, none of these products would have seen the light of day in their current form without cutting-edge interconnection know-how, precise down to the micron.

Does this mean that these two companies are the only ones who know how to assemble or stack pieces of die? Certainly not: in addition to the know-how of Samsung, STMicroelectronics, Sony (particularly for stacked sensors) and GlobalFoundries, there are also players such as Taiwan’s ASE Technology Holding, which specialize exclusively in packaging (and testing). But Intel and TSMC are by far the most advanced, not only in terms of know-how and cutting-edge technologies, but also in terms of production capacity. Packaging is a different discipline from etching, one that requires not only additional knowledge, but also dedicated factories and machines. The two titans are thus battling it out to the tune of tens of billions of dollars. Intel alone has committed $7 billion to its plants in Malaysia, notably in Penang, and $3.5 billion to its American plant in New Mexico. And TSMC is doing the same on its home turf, with the future $2.9 billion Tongluo site, located in Miaoli County, in the north-west of the island.

Another advantage of block stacking is the possibility of creating chips that were once inconceivable for yield reasons, as we have seen. As cooling techniques have improved (liquid cooling has become widespread in supercomputers), chip designers have been able to create monsters such as the “Datacenter GPU Max” graphics chip (codenamed Ponte Vecchio). But here there’s another limit: the substrate.

Chip industry: packaging evolution
In addition to the race to miniaturize transistors, industry players are also looking to shrink the “bumps”, the small balls that link the different tiles once the bonding process is complete. Between the first generation and Foveros Direct, Intel has already succeeded in reducing their size tenfold, from 100 microns to 10 microns. Intel
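
Why does this tenfold reduction matter so much? A rough calculation helps. The sketch below assumes that the 100 and 10 micron figures can be treated as the centre-to-centre spacing (pitch) of bumps laid out on a regular square grid, which is a simplification and not Intel’s published geometry; the function name is illustrative.

```python
def connections_per_mm2(pitch_um: float) -> float:
    """Bumps per square millimeter on a square grid with the given pitch.

    Assumption of this sketch: one bump every `pitch_um` microns
    in both directions.
    """
    bumps_per_mm = 1000 / pitch_um  # 1 mm = 1000 microns
    return bumps_per_mm ** 2


if __name__ == "__main__":
    for pitch in (100, 10):
        density = connections_per_mm2(pitch)
        print(f"{pitch} micron pitch: {density:,.0f} connections per mm^2")
```

Under that assumption, dividing the spacing by ten multiplies the number of connections per square millimeter by a hundred (from 100 to 10,000), because density grows with the square of the reduction.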

This support for dies and chiplets, the “plate” on which blocks are glued, stacked and linked together, is becoming a new frontier. And, once again, it’s Intel that could push back the current limits of the organic substrate on which manufacturers place and connect pieces of chip. Currently based on laminated fiberglass, this heterogeneous material can’t be used to build the giant chips the industry needs, and the density of the holes that let interconnection circuits pass between the different chip “pieces” is limited. That’s why Intel has been working for years on a new homogeneous glass substrate, which should reach commercial applications by the end of the decade. It is a substrate that will enable Intel to stack more bricks, and bigger and bigger bricks at that.

Moore's law

While shrinking circuits will become increasingly difficult as we approach physical limits, the assembly game that manufacturers are now playing, and the giant chips it makes possible, should keep Moore’s famous law alive for a few more years. And satisfy mankind’s unquenchable thirst for computing power.