Intel’s 3rd-generation Xeon Scalable CPUs offer 16-bit FPU processing
Intel today announced its third-generation Xeon Scalable (meaning Gold and Platinum) processors, along with new generations of its Optane persistent memory (read: extremely low-latency, high-endurance SSD) and Stratix AI FPGA products.
The fact that AMD is currently beating Intel on just about every conceivable performance metric except hardware-accelerated AI isn't news at this point. It's clearly not news to Intel, either, since the company made no claims whatsoever about Xeon Scalable's performance versus competing Epyc Rome processors. More interestingly, Intel hardly mentioned general-purpose computing workloads at all.
Finding an explanation of the only non-AI generation-on-generation improvement shown required jumping through multiple footnotes. With sufficient determination, we eventually discovered that the "1.9X average performance gain" mentioned on the overview slide refers to "estimated or simulated" SPECrate 2017 benchmarks comparing a four-socket Platinum 8380H system to a five-year-old, four-socket E7-8890 v3.
To be fair, Intel does seem to have introduced some unusually impressive innovations in the AI space. "Deep Learning Boost," which formerly was just branding for the AVX-512 instruction set, now encompasses an entirely new 16-bit floating point data type as well.
With earlier generations of Xeon Scalable, Intel pioneered and pushed heavily for using 8-bit integer (INT8) inference processing with its OpenVINO library. For inference workloads, Intel argued that the lower accuracy of INT8 was acceptable in most cases, while offering extreme acceleration of the inference pipeline. For training, however, most applications still needed the greater accuracy of FP32 32-bit floating point processing.
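To make the accuracy trade-off concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is a generic textbook scheme, not OpenVINO's actual implementation; the function names `quantize_int8` and `dequantize_int8` are hypothetical and chosen only for illustration.

```python
def quantize_int8(values):
    """Map floats onto integers in [-127, 127] with one shared scale factor.

    Assumption: simple symmetric, per-tensor quantization. Real toolkits
    use more elaborate calibration than this.
    """
    # Fall back to 1.0 so an all-zero input does not divide by zero.
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 127.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [q * scale for q in quantized]
```

Each restored value differs from the original by at most about half of `scale`. That rounding error is the "lower accuracy" in question: often tolerable when running a trained model, but harder to live with during training.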
The new generation adds 16-bit floating point processor support, which Intel is calling bfloat16. Cutting FP32 models' bit-width in half accelerates processing itself, but more importantly, halves the RAM needed to keep models in memory. Taking advantage of the new data type is also simpler for programmers and codebases using FP32 models than conversion to integer would be.
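Part of why the conversion is simpler: a bfloat16 value has the same sign bit and 8-bit exponent as an FP32 value, with the mantissa cut from 23 bits to 7. The sketch below illustrates this by keeping only the top 16 bits of an FP32 bit pattern. Truncation is an assumption made for brevity (hardware conversions typically round instead), and the helper names, including `model_bytes`, are hypothetical.

```python
import struct


def fp32_to_bf16_bits(x: float) -> int:
    """Return a 16-bit bfloat16 pattern for x by truncating its FP32 encoding."""
    fp32_bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep sign (1 bit), exponent (8 bits), and the top 7 mantissa bits.
    # Assumption: plain truncation, so edge cases such as some NaN
    # payloads are not handled the way real hardware would handle them.
    return fp32_bits >> 16


def bf16_bits_to_fp32(bits: int) -> float:
    """Widen a bfloat16 pattern back to a float by zero-filling the low bits."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def model_bytes(parameter_count: int, bytes_per_parameter: int) -> int:
    """Memory needed just to hold a model's parameters."""
    return parameter_count * bytes_per_parameter
```

Round-tripping 3.14159 through these helpers gives 3.140625: the range of FP32 survives, but only about two to three decimal digits of precision do. On the memory side, a hypothetical model with one billion parameters needs `model_bytes(1_000_000_000, 4)` bytes in FP32 and exactly half that at two bytes per parameter.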
Intel also thoughtfully provided a game revolving around the BF16 data type's efficiency. We cannot recommend it either as a game or as an educational tool.
Optane storage acceleration
Intel also announced a new, 25 percent-faster generation of its Optane "persistent memory" SSDs, which can be used to greatly accelerate AI and other storage pipelines. Optane SSDs operate on 3D XPoint technology rather than the NAND flash that typical SSDs use. 3D XPoint has tremendously higher write endurance and lower latency than NAND does. The lower latency and greater write endurance make it particularly attractive as a fast caching technology, which can even accelerate all-solid-state arrays.
The big takeaway here is that Optane's extremely low latency allows acceleration of AI pipelines—which frequently bottleneck on storage—by offering very rapid access to models too large to keep entirely in RAM. For pipelines which involve rapid, heavy writes, an Optane cache layer can also significantly increase the life expectancy of the NAND primary storage beneath it, by reducing the total number of writes which must actually be committed to it.
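The write-reduction effect can be sketched with a toy write-back cache. This is a simplified model under stated assumptions, not how any particular storage stack works: the `WriteBackCache` class is hypothetical, and it assumes repeated writes to the same block can be absorbed in the cache until a flush.

```python
class WriteBackCache:
    """Toy cache layer that absorbs writes before they reach slower storage."""

    def __init__(self):
        self.dirty = {}          # block number -> latest data held in the cache
        self.backend = {}        # stands in for the NAND primary storage
        self.backend_writes = 0  # writes actually committed to the backend

    def write(self, block: int, data: bytes) -> None:
        # Overwriting a block that is already dirty costs the backend nothing.
        self.dirty[block] = data

    def flush(self) -> None:
        # Only the most recent version of each dirty block is committed.
        for block, data in self.dirty.items():
            self.backend[block] = data
            self.backend_writes += 1
        self.dirty.clear()
```

If a workload issues 1,000 writes spread over the same 10 blocks and then flushes, the backend sees 10 writes rather than 1,000. Real caches flush on their own schedules, so the actual savings depend heavily on the write pattern.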
Meanwhile, this excellent Tom's Hardware review from 2019 demonstrates just how far in the dust Optane leaves traditional data center-grade SSDs in terms of latency.
Stratix 10 NX FPGAs
Finally, Intel announced a new version of its Stratix FPGA. Field-programmable gate arrays (FPGAs) can be used as hardware acceleration for some workloads, freeing more of the general-purpose CPU cores to tackle tasks that the FPGAs can't.