So far, there haven’t been any upsets in the MLPerf AI benchmarks. Nvidia not only wins everything, it is still the only company that even competes in every category. Today’s announcement of the MLPerf Training 0.7 results isn’t much different. Nvidia started shipping its A100 GPUs in time to submit results in the Released category for commercially available products, where it put in a top-of-the-charts performance across the board. However, there were some interesting results from Google in the Research category.
To help reflect the growing variety of uses for machine learning in production settings, MLPerf has added two new training benchmarks and upgraded a third. The first, Deep Learning Recommendation Model (DLRM), involves training a recommendation engine, which is particularly important in eCommerce applications among other large categories. As a hint to its intended use, it’s trained on a massive trove of click-through-rate data.
The second addition is the training time for BERT, a widely respected natural language processing (NLP) model. While BERT itself has since been extended into bigger and more complex versions, benchmarking the training time on the original is a good proxy for NLP deployments because BERT is one of a class of Transformer models that are widely used for that purpose.
Finally, with Reinforcement Learning (RL) becoming increasingly important in areas such as robotics, the MiniGo benchmark has been upgraded to MiniGo Full (on a 19 x 19 board), which makes a great deal of sense.
For the most part, commercially available alternatives to Nvidia either didn’t participate at all in some of the categories, or couldn’t even outperform Nvidia’s last-generation V100 on a per-processor basis. One exception is Google’s TPU v3, which beat the V100 by 20 percent on ResNet-50 and came in behind the A100 by only another 20 percent. It was also interesting to see Huawei compete with a respectable entry for ResNet-50, using its Ascend processor. While the company is still far behind Nvidia and Google in AI, it continues to make AI a major focus.
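To make the per-processor comparison concrete, here is a minimal Python sketch of one way to normalize time-to-train results by chip count. The function name `per_chip_speed` and every number in it are illustrative assumptions, not actual MLPerf submissions, and the calculation assumes performance scales linearly with the number of chips, which real systems only approximate.

```python
def per_chip_speed(minutes_to_train: float, num_chips: int) -> float:
    """Rough per-chip speed: the inverse of total chip-minutes consumed.

    Assumes linear scaling across chips, which is only an approximation.
    """
    if minutes_to_train <= 0 or num_chips <= 0:
        raise ValueError("minutes_to_train and num_chips must be positive")
    return 1.0 / (minutes_to_train * num_chips)


# Hypothetical placeholder figures, NOT real benchmark results.
baseline = per_chip_speed(minutes_to_train=60.0, num_chips=8)
candidate = per_chip_speed(minutes_to_train=50.0, num_chips=8)

# Ratio of per-chip speeds: a value of 1.2 means "20 percent faster per chip".
speedup = candidate / baseline
print(f"Per-chip speedup over baseline: {speedup:.2f}x")
```

With the same number of chips, finishing in 50 minutes instead of 60 works out to roughly a 20 percent per-chip advantage, which is the kind of gap the article describes between accelerators.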
As you can see from the chart below, the A100 is 1.5x to 2.5x the performance of the V100 depending on the benchmark:
If you have the budget, Nvidia’s solution also scales well beyond anything else submitted. Running on the company’s SELENE SuperPOD, which includes 2,048 A100s, models that used to take days can now be trained in minutes:
While many types of specialized hardware have been designed specifically for machine learning, most of them excel at either training or inferencing. Reinforcement Learning (RL) requires an interleaving of both. Nvidia’s GPGPU-based hardware is ideal for the task. And, because data is generated and consumed during the training process, Nvidia’s high-speed interlinks are also helpful for RL. Finally, because training robots in the real world is expensive and potentially dangerous, Nvidia’s GPU-accelerated simulation tools are useful when doing RL training in the lab.
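The interleaving described above can be sketched with a toy example. The Python snippet below is a minimal epsilon-greedy bandit, a far simpler setting than MiniGo, chosen only to show action selection (inference) and learning updates (training) alternating inside one loop. The function `run_bandit`, its parameters, and the reward values are all hypothetical.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Toy RL loop: each iteration does one inference step and one training step."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # learned value of each action
    counts = [0] * len(true_means)       # how often each action was tried

    for _ in range(steps):
        # Inference: use the current estimates to pick an action,
        # exploring at random with probability epsilon.
        if rng.random() < epsilon:
            action = rng.randrange(len(true_means))
        else:
            action = max(range(len(true_means)), key=lambda i: estimates[i])

        # Simulated environment: generate a noisy reward for that action.
        reward = rng.gauss(true_means[action], 1.0)

        # Training: fold the new data point into the running-mean estimate.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


# Hypothetical reward means for three actions.
estimates, counts = run_bandit([0.1, 0.5, 0.9])
print("Estimated values:", [round(e, 2) for e in estimates])
print("Pull counts:", counts)
```

Because each training update depends on data the model just generated through its own inference, neither phase can be batched off to separate hardware without moving data back and forth, which is why fast interconnects matter for this workload.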
Perhaps the most surprising piece of news from the new benchmarks is how well Google’s TPU v4 did. While v4 of the TPU is in the Research category, meaning it won’t be commercially available for at least six months, its near-Ampere-level performance on many training tasks is quite impressive. It was also interesting to see Intel weigh in with a decent reinforcement learning result from a soon-to-be-released CPU. That should help it deliver in future robotics applications that may not require a discrete GPU. Full results are available from MLPerf.