Oct. 11, 2018
NVIDIA has announced a set of open-source libraries intended to bring GPU acceleration to a wide swathe of data analytics applications. The new software suite has attracted support from IBM, HPE, Oracle and a number of other key providers.
Known as RAPIDS, the suite was developed over the past two years by NVIDIA, along with a handful of other open source contributors. It encompasses GPU support for not just conventional analytics, but also machine learning (including deep learning), graph analytics, stream processing, and eventually visualization. RAPIDS is aimed at the data science crowd, that is, researchers, engineers, and other developers looking to make the most out of their datasets – both literally and figuratively.
The aim is to draw businesses and other organizations away from their dependency on CPUs for their analytics and machine learning workloads. These encompass such mission-critical applications as credit card fraud detection, retail inventory forecasts, and customer purchasing prediction, each of which represents billions of dollars to the economy. Credit card fraud alone cost companies over $20 billion globally in 2015.
According to the RAPIDS developers, GPU acceleration can speed up these types of applications by an order of magnitude or more, up to 50x on some workloads. That translates into shorter turnaround times, which means businesses can save money through better monitoring and more timely information collection.
Although RAPIDS in its current form is built atop CUDA, the software suite is otherwise independent of NVIDIA and could presumably be targeted to AMD GPUs or other accelerators. Besides NVIDIA, the initial software was developed in concert with key open source providers, including Anaconda, BlazingDB, Databricks, Quansight and scikit-learn.
Another key contributor was Ursa Labs, the organization that developed Apache Arrow, upon which RAPIDS is based. Apache Arrow is a development platform for in-memory data processing that is now the industry standard for columnar in-memory data analytics. To spur broader adoption, NVIDIA is integrating RAPIDS into Apache Spark, the immensely popular open-source framework for large-scale data analytics.
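To make the columnar idea concrete, here is a minimal sketch in plain Python. It does not use Arrow or RAPIDS APIs; the transaction records and helper names are hypothetical and serve only to contrast a row-oriented layout with one typed, contiguous buffer per field.

```python
from array import array

# Row-oriented layout: one record per transaction (hypothetical data).
rows = [
    {"amount": 12.5, "flagged": 0},
    {"amount": 980.0, "flagged": 1},
    {"amount": 43.2, "flagged": 0},
]

# Columnar layout: one contiguous, typed buffer per field.
columns = {
    "amount": array("d", (r["amount"] for r in rows)),   # 8-byte floats
    "flagged": array("b", (r["flagged"] for r in rows)),  # 1-byte integers
}


def total_amount(cols):
    # Scans a single buffer; the other columns are never touched.
    return sum(cols["amount"])


def flagged_amount(cols):
    # Walks two columns in lockstep and sums the flagged amounts.
    return sum(a for a, f in zip(cols["amount"], cols["flagged"]) if f)
```

An aggregate over one field reads only that field's buffer, which is the general property that makes columnar layouts a natural fit for wide, parallel hardware such as GPUs.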
The potential value to NVIDIA is considerable: the company puts the server market for data science and machine learning at $20 billion per year. Since it already dominates the adjacent $16 billion market for scientific computing/analysis and deep learning (which it has broken out from the rest of machine learning), the GPU-maker thinks it can now effectively address a $36 billion market.
“Data analytics and machine learning are the largest segments of the high performance computing market that have not been accelerated — until now,” said Jensen Huang, founder and CEO of NVIDIA, who revealed RAPIDS in his keynote address at Europe’s GPU Technology Conference in Munich. “The world’s largest industries run algorithms written by machine learning on a sea of servers to sense complex patterns in their market and environment, and make fast, accurate predictions that directly impact their bottom line.”
IBM, HPE, Dell EMC, Oracle, Cisco, and Lenovo also stand to benefit, although perhaps not to the same extent as NVIDIA, inasmuch as they can also sell CPU-only systems to their big data customers. Besides these server-makers, RAPIDS has garnered backing from more than a dozen other organizations with a stake in data science, including big-name vendors such as NetApp, SAP, and OmniSci (formerly MapD), as well as public organizations like NERSC, Georgia Tech, and UC Davis. If you’re interested in reading all the endorsements, feel free to peruse NVIDIA’s press release.
The RAPIDS libraries can be accessed via http://rapids.ai/, where the code is available under the Apache open source license. Containerized versions of RAPIDS will also be available later this week on the NVIDIA GPU Cloud container registry.