How GPU Computing Can Boost Data Science Capabilities

November 17, 2020

GPU technology is an innovative option that can improve the speed of certain types of processes, namely those that can be run in parallel, enabling computer systems to "think" and "learn" while processing data as quickly and efficiently as possible. For example, GPUs power everyday technology that requires real-time processing, such as chatbots, text generation, sentiment analysis, and image recognition.

While adoption of GPU technology can streamline processes with faster compute times, especially in the discipline of data science, implementing and maintaining GPUs on self-hosted infrastructure is challenging because the technology evolves quickly. When executed correctly, however, GPUs can make a world of difference in improving and modernizing an organization’s processing infrastructure. Curious how adopting GPUs could optimize your company's workflows and insights? Keep reading to learn more.

CPU vs. GPU cores

What exactly is a GPU?

Adoption of GPUs (graphics processing units) in data science is growing as understanding of their benefits and implementation requirements increases. GPUs are often discussed in relation to machine learning, gaming, or real-time human language tasks such as chatbots, because a GPU is a processor that excels at specialized computations. We can contrast this with the central processing unit (CPU), which excels at general computations, both in personal computers and in enterprise infrastructure.

GPUs can complete specific tasks faster than a CPU, but their efficiency depends on the type of computation being performed. GPUs excel at tasks that can be run in parallel, a computing architecture in which several processors simultaneously execute many smaller calculations broken down from larger, more complex problems. If a processing unit has multiple cores, work can be split into many smaller tasks that run at the same time.
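The idea of splitting a larger problem into smaller tasks that run simultaneously can be sketched in plain Python. This minimal sketch uses CPU worker processes rather than GPU cores, and a hypothetical sum-of-squares workload, purely to illustrate the divide-and-recombine pattern:

```python
from concurrent.futures import ProcessPoolExecutor

def partial_sum(chunk):
    # each worker solves one small piece of the larger problem
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(n, workers=4):
    data = list(range(n))
    size = (len(data) + workers - 1) // workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # the chunks are processed simultaneously, then the results are recombined
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    print(parallel_sum_of_squares(1000))  # 332833500
```

A GPU applies the same idea at a far larger scale, using thousands of lightweight cores instead of a handful of CPU processes.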

Serial Processing vs. Parallel Processing

Fulcrum recently developed a prototype Natural Language Processing (NLP) model to classify consumer complaints about financial products, based on public data from the Consumer Financial Protection Bureau. Each text complaint is labeled by product, and to test the methodology we trained a model to predict the correct product from new complaint text.

The model was built using Transformers, a recent advancement in machine learning applied to NLP. Given the size of the data, a GPU-optimized server was the only feasible processing option in this instance. We estimated that training the same model and scoring the underlying data on a standard CPU would have taken 10 to 20 times as long.
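As a rough sketch of the pattern involved (not our actual model: the stand-in classifier head, the 18 product categories, and the 768-dimensional text embeddings below are illustrative assumptions), PyTorch code typically selects the GPU when one is available and falls back to the CPU otherwise:

```python
import torch
from torch import nn

# pick the GPU when one is available; otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# a stand-in classifier head (the real model was a pretrained Transformer)
model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 18)).to(device)

# a batch of 32 pooled text embeddings, one 768-dim vector per complaint
batch = torch.randn(32, 768).to(device)
logits = model(batch)              # the forward pass runs on the GPU when present
predicted = logits.argmax(dim=1)   # one predicted product category per complaint
```

Because the model and the data are moved to the same device, the identical code runs on either processor; the GPU simply executes the large matrix multiplications in parallel.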

The Operational Side of GPUs

Use of GPUs for modeling is a quickly evolving technology, with frequent releases of drivers, APIs, and software packages. This, along with the expertise needed to fully utilize GPU technology, makes it difficult for some organizations to make and maintain an investment in GPUs.

GPU-optimized servers are available on a per-hour basis from most cloud computing providers, or can be added to an enterprise’s data science stack. Technical expertise in setup and maintenance is likely a greater burden than the cost of the infrastructure itself. The software libraries and drivers that leverage GPU computing for machine learning are rapidly advancing, and the skills and knowledge required of data scientists and engineers are both in short supply and in high demand in the marketplace. Training and retention of trained staff must go hand in hand with any investment in GPUs in order to make the most of the technology.

Experimenting with GPUs without the Full Investment

When it comes to data science, GPUs are worth considering as an innovative solution for parallel processing needs such as NLP and transformer models. While use of GPUs for analytical processing is on the rise, many companies still have trouble adopting the technology due to the expertise required. Fulcrum offers access to and assistance with GPU processing through its Managed Services for Big Data and Agile Analytical Lab, for companies that want to quickly explore the benefits before making the investment. To learn more about trying out GPUs, or to get training or guidance on utilizing your own GPUs, contact us today.
