Conventional CPU Computing vs GPU Computing

A CPU is, in simple terms, the brain of a computer. All the arithmetic and logical operations are performed on a CPU. CPU computing is the process of developing code that enables programs to work according to specific application requirements. The processing unit within a CPU that carries out the actions of these programs is known as a CPU core. The arithmetic and logical operations are performed by the arithmetic logic units (ALUs) present in each CPU core.

At the time of writing, the highest number of cores on a desktop CPU is 32, on AMD's Ryzen Threadripper 2990WX.

A GPU, on the other hand, is a specialized processor focused on graphical processing of visual imagery, which reduces the load on the CPU by taking care of such visual elements. GPUs have great computational capabilities and can perform highly parallel, compute-intensive operations many times faster than a CPU. GPU cores also have ALUs.

At the time of writing, the highest number of cores on a GPU is 5120 CUDA cores on NVIDIA’s Titan V, which also has 640 Tensor Cores specialized for machine learning workloads.

A simple way to differentiate CPUs and GPUs

One of the simplest ways to differentiate between the two processors is the following example (presented during StampedeCon 2014), framed as a series of three questions you can ask yourself.

Consider some farmers ploughing an enormous field with 32 oxen. How efficiently can the entire field be ploughed with them?

There is a famous quote by Seymour Cray, the father of supercomputing:

If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens?

Now consider 5120 chickens being used to plough the same field. How different would the process be compared to 32 oxen?

Figure: Some ALUs on a CPU vs thousands on a GPU

Can we relate the above to 32 cores on the CPU compared with 5120 cores on the GPU?

Basic concepts – Why should you know them before getting started with GPU Programming

It is essential that we discuss some of the core concepts in detail.


Threads

In the world of computing, a thread is a part of a process under execution: the smallest sequence of instructions that can be scheduled independently. A process starts with a single thread but can spawn multiple threads according to its implementation requirements. A scheduler, which is part of the OS, manages these threads independently.
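As a rough illustration of a process starting with one thread and spawning more, here is a minimal sketch using Python's standard threading module. The function names describe and run_workers, and the "worker-N" thread names, are made up for this example; the OS scheduler decides the actual order in which the threads run.

```python
import threading


def describe(results, index):
    # Each worker records the name of the thread it is running on.
    results[index] = threading.current_thread().name


def run_workers(count):
    """Spawn `count` threads from the calling thread and wait for all of them."""
    results = [None] * count
    workers = [
        threading.Thread(target=describe, args=(results, i), name=f"worker-{i}")
        for i in range(count)
    ]
    for worker in workers:
        worker.start()   # hand the thread over to the scheduler
    for worker in workers:
        worker.join()    # block until that thread has finished
    return results


names = run_workers(3)
print("spawned from:", threading.current_thread().name)
print("workers ran on:", names)
```

Each worker writes to its own slot in the results list, so no lock is needed in this small example.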


Concurrency

Concurrency is the ability of a program or algorithm to make progress on multiple tasks over overlapping time periods. These tasks can be independent of each other and have different goals. They do not have to run at the very same instant; their execution may simply be interleaved. In a threaded program, at least two threads are involved during concurrency.
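The idea of overlapping time periods can be sketched with two threads that each record when they start and finish. This is only an illustration: the task names "download" and "render" are invented, and time.sleep stands in for real work.

```python
import threading
import time


def timed_task(name, duration, log):
    # Record when this task starts and finishes (monotonic clock, in seconds).
    start = time.monotonic()
    time.sleep(duration)  # placeholder for real work, e.g. waiting on I/O
    log[name] = (start, time.monotonic())


def run_concurrently(durations):
    """Run one thread per task and return each task's (start, end) times."""
    log = {}
    threads = [
        threading.Thread(target=timed_task, args=(name, duration, log))
        for name, duration in durations.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log


log = run_concurrently({"download": 0.2, "render": 0.2})
(d_start, d_end), (r_start, r_end) = log["download"], log["render"]

# Two intervals overlap when each one starts before the other one ends.
overlap = d_start < r_end and r_start < d_end
print("time periods overlap:", overlap)
```

The check only shows that the two tasks were in progress during the same period; it says nothing about whether they ever ran at the same instant.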


Parallelism

Parallelism is the simultaneous execution of two or more operations. Unlike concurrency, where tasks may merely be interleaved over overlapping time periods, true parallelism requires the operations to run at the very same instant on separate processing units. At least two threads always execute simultaneously during parallelism.

Memory management

Memory management refers to the ways in which portions of computer memory are dynamically allocated to programs that need them and deallocated for reuse when they are no longer required. Memory management is significant in computer science because any program and its processes can request memory allocation at any time.
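To make allocation and deallocation a little more concrete, the following sketch uses Python's standard tracemalloc module to observe traced memory before, during and after a buffer exists. It assumes the CPython interpreter, where an object is released as soon as its last reference is dropped; the function name measure_allocation and the 1 MB size are arbitrary choices for this example.

```python
import tracemalloc


def measure_allocation(size_bytes):
    """Return traced memory (bytes) before, during and after holding a buffer."""
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()

    before, _ = tracemalloc.get_traced_memory()
    buffer = bytearray(size_bytes)   # allocation: memory is requested
    during, _ = tracemalloc.get_traced_memory()
    del buffer                       # deallocation: the last reference is dropped
    after, _ = tracemalloc.get_traced_memory()

    if not was_tracing:
        tracemalloc.stop()
    return before, during, after


before, during, after = measure_allocation(1_000_000)
print("grew by roughly", during - before, "bytes while the buffer existed")
print("shrank by roughly", during - after, "bytes after it was released")
```

The exact byte counts vary slightly between runs and interpreter versions, which is why the output is described as approximate.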


SIMD, SMT and SIMT

SIMD stands for Single Instruction, Multiple Data. It uses multiple processing elements to perform the same operation simultaneously on multiple data units. Even though SIMD provides parallelism at the data level, all processing elements execute the same single instruction at any given time, so SIMD by itself is not concurrent.

SMT stands for Simultaneous Multi-Threading, which allows multiple independent threads to execute together in order to utilize the computational resources of modern processors effectively.

SIMT stands for Single Instruction, Multiple Threads, an execution model introduced by NVIDIA that is the essence of how parallel programming on GPUs works. SIMT combines the power of both SIMD and SMT to balance flexible and efficient parallelism.

Combined with SMT, NVIDIA’s SIMT allows the following benefits over conventional SIMD:
  • Single Instruction, Multiple Register Sets
  • Single Instruction, Multiple Addresses
  • Single Instruction, Multiple Flow Paths
Looking further ahead, when SIMT is combined with SIMD while SMT provides concurrency, GPUs can use core SIMD features to handle single-instruction parallelism at the data level much more efficiently.
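The three benefits listed above can be sketched with a small pure-Python simulation of a group of threads (a "warp") that all follow the same instruction stream. This is a teaching model, not how hardware is programmed: the function simt_warp, the register names x and y, and the sample data are all invented for illustration, and the Python loops stand in for lanes that a real GPU would step through in lockstep.

```python
def simt_warp(data, indices):
    """Simulate one warp computing y = x * 2 if x is even, else y = x + 1,
    where each lane loads x from data[indices[lane]]."""
    lanes = len(indices)

    # Multiple register sets: every lane (thread) has its own private registers.
    regs = [{"x": 0, "y": 0} for _ in range(lanes)]

    # Multiple addresses: one load instruction, but each lane reads its own address.
    for lane in range(lanes):
        regs[lane]["x"] = data[indices[lane]]

    # Multiple flow paths: a branch becomes a per-lane mask, and the two sides
    # of the branch run one after the other with non-matching lanes switched off.
    mask = [regs[lane]["x"] % 2 == 0 for lane in range(lanes)]

    for lane in range(lanes):          # "then" side: only even lanes are active
        if mask[lane]:
            regs[lane]["y"] = regs[lane]["x"] * 2

    for lane in range(lanes):          # "else" side: only odd lanes are active
        if not mask[lane]:
            regs[lane]["y"] = regs[lane]["x"] + 1

    return [reg["y"] for reg in regs]


data = [10, 11, 12, 13, 14, 15, 16, 17]
indices = [7, 0, 3, 4]                 # each lane picks a different element
result = simt_warp(data, indices)
print(result)  # [18, 20, 14, 28]
```

Because the two sides of the branch run one after the other, a warp whose lanes disagree on a condition spends time on both paths, which is why heavily divergent code tends to run slower on GPUs.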

Understanding how GPUs are invoked within CPU code

The basic idea behind invoking GPUs within CPU code is simple. You typically have a single source code file that contains your GPU program. With CUDA, this file uses the .cu extension, while with OpenCL, kernel code conventionally uses the .cl extension.

Within that single source file, the basic concept is to hand over the computationally intense fragments of your operations from the CPU to the GPU. After the GPU carries out those calculations, it transfers its output back to you via the CPU. The goal should always be to balance the operations between the CPU and the GPU as efficiently as possible.
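The hand-over described above usually follows a fixed sequence: copy the input to the GPU, launch the computation there, and copy the result back. The sketch below mimics that sequence in plain Python without touching a real GPU. SimulatedDevice, square_kernel and host_program are hypothetical names invented for this illustration and do not correspond to any CUDA or OpenCL API.

```python
class SimulatedDevice:
    """Hypothetical stand-in for a GPU with its own memory, separate from the host."""

    def __init__(self):
        self.memory = {}

    def copy_to_device(self, name, host_data):
        # Host -> device transfer: the device gets its own copy of the data.
        self.memory[name] = list(host_data)

    def launch(self, kernel, out_name, in_name):
        # One logical thread per element. A real GPU would run these in
        # parallel; this simulation simply loops over the thread indices.
        src = self.memory[in_name]
        self.memory[out_name] = [kernel(i, src) for i in range(len(src))]

    def copy_to_host(self, name):
        # Device -> host transfer: results come back as a fresh host-side list.
        return list(self.memory[name])


def square_kernel(thread_index, src):
    # The work of a single thread: square the element it is responsible for.
    return src[thread_index] * src[thread_index]


def host_program(values):
    """CPU-side code: offload the squaring, then finish with a sum on the CPU."""
    device = SimulatedDevice()
    device.copy_to_device("input", values)            # 1. send data to the device
    device.launch(square_kernel, "output", "input")   # 2. compute on the device
    squares = device.copy_to_host("output")           # 3. bring the results back
    return sum(squares)                               # 4. light work stays on the CPU


total = host_program([1, 2, 3, 4])
print(total)  # 30
```

In real programs the two copy steps are not free, so the balance mentioned above often comes down to offloading only work that is heavy enough to justify the transfers.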

GPU Computing on Linux vs Windows

Today, an open source approach is preferred by many. More and more communities and companies are adopting open source operating systems (OS) such as Ubuntu Linux for both academic and industrial purposes. Linux is where most scientific and computational applications are tested and run.

It is therefore recommended to use Ubuntu as the OS on which you learn or run your GPU computing programs.

Windows 10 is a proprietary OS, meaning that you need to purchase a license in order to run or deploy your GPU programs on it. Linux is free to use, so you can download a distribution whenever you want and get started with writing your GPU programs!

Linux’s lightweight design allows smoother and more efficient performance compared to Windows. On Windows, heavy use of system resources by many apps and GUIs can cause performance to take quite a hit.

It might be a bit easier on Windows to install CUDA and deploy your programming interface, but in the long run, Linux is generally the better choice for ensuring optimum efficiency of your deployed programs.

If a company plans to deploy some thousands of computing nodes or workstations to carry out its industrial operations, how many Windows licenses would be required for that deployment? How would it affect the total expenditure? How could the company spend that amount if it chose Linux instead? Would there not be more room in the hardware budget?

It is quite understandable why supercomputers run Linux as their host OS. Linux provides better tools to ensure more consistency, efficiency and reliability.
