Performance boost for GPU

I recently came across a couple of ways to speed up a GPU-based application. I was experimenting with art transformation of images on an AWS P2 class machine. The process was taking anywhere between 10 and 16 seconds, with an average of around 14 seconds. I wanted to find out whether the time could be reduced further.

A brief chat with AWS resolved it for me. The AWS support guy asked me to set up my P2 per

I had done almost everything in line with the link above, except the last 4 statements. It turns out these commands apply some useful performance settings.

nvidia-smi is a CLI for Nvidia GPU devices. As AWS P2 class servers contain Nvidia Tesla K80 devices, I can use nvidia-smi to view and set options.

nvidia-smi -pm 1
The above option enables Persistence Mode. It is a GPU attribute that ensures the Nvidia driver remains loaded even when no application is running. In Nvidia's words, “...This minimizes the driver load latency associated with running dependent apps, such as CUDA programs…”. The flag works only on Linux, and needs to be set again each time the instance reboots.
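Since the setting is lost on reboot, a small check can decide whether it needs to be reapplied. The following is only a sketch: all_persistent is a helper name I made up, and it assumes that nvidia-smi --query-gpu=persistence_mode --format=csv,noheader prints one line per GPU reading either Enabled or Disabled.

```shell
# Hypothetical helper: reads one persistence-mode value per line on stdin.
# Succeeds only if there is at least one line and every line is "Enabled".
all_persistent() {
    awk '{ n++; if ($1 != "Enabled") bad = 1 }
         END { exit ((n > 0 && !bad) ? 0 : 1) }'
}

# Possible use at boot (assumes the query output format described above):
#   nvidia-smi --query-gpu=persistence_mode --format=csv,noheader | all_persistent \
#       || sudo nvidia-smi -pm 1
```

Reading from stdin keeps the check separate from the nvidia-smi call, so it can be tried on a machine without a GPU.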

The following set of commands helps in using GPU Boost, a feature available in NVIDIA Tesla GPUs. In brief, GPU Boost lets you configure the GPU core and memory clock rates.

nvidia-smi --auto-boost-permission=0
This command removes the requirement to be root in order to toggle GPU Boost. This should be sufficient for single-GPU applications, but not for multi-GPU machines.

nvidia-smi --applications-clocks-permission 0
This command removes the requirement to be root in order to change application clocks.

nvidia-smi -q -d SUPPORTED_CLOCKS
nvidia-smi -ac 2505,875
The first command shows the supported clock rates. The second command sets the application clocks, where the first number is the memory clock rate and the second number is the graphics clock rate (both in MHz). You can safely use the highest supported memory and graphics clock rates. The GPU automatically moves down to a lower setting if it cannot run at the configured one. For AWS P2, the values to use are the ones given in the second command.
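Instead of reading the highest pair off the list by hand, it can be picked out with a few lines of awk. This is a rough sketch, not something from the AWS instructions: highest_clocks is a name I made up, and it assumes the SUPPORTED_CLOCKS output lists each memory clock as a line like "Memory : 2505 MHz" followed by its "Graphics : 875 MHz" lines. On a multi-GPU machine the query would need to be limited to one GPU with -i.

```shell
# Hypothetical helper: reads `nvidia-smi -q -d SUPPORTED_CLOCKS` output on stdin
# and prints "<memory>,<graphics>" for the highest memory clock and the highest
# graphics clock listed under it. Exits non-zero if no clocks were found.
highest_clocks() {
    awk '
        /Memory/ && /MHz/ {
            cur = $3 + 0
            if (cur > best_mem) { best_mem = cur; best_gfx = 0 }
        }
        /Graphics/ && /MHz/ {
            gfx = $3 + 0
            if (cur == best_mem && gfx > best_gfx) best_gfx = gfx
        }
        END {
            if (best_mem > 0 && best_gfx > 0) print best_mem "," best_gfx
            else exit 1
        }
    '
}

# Possible use (assumes the output layout described above):
#   nvidia-smi -ac "$(nvidia-smi -q -d SUPPORTED_CLOCKS | highest_clocks)"
```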

Be aware that each of the above settings can be applied to a single GPU by using the -i option. A GPU can be identified by its UUID, PCI bus ID, board serial number, or 0-based index from the driver. However, the UUID or PCI bus ID is recommended. For example:
nvidia-smi -ac 2505,875 -i 0
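
To apply the same settings to several GPUs, the -i form can be wrapped in a loop. Again, this is just a sketch under my own assumptions: apply_clocks and the RUN variable are names I invented, and RUN defaults to echo so the commands are only printed until it is switched to sudo.

```shell
# RUN is a hypothetical knob: leave it as "echo" to preview the commands,
# set RUN=sudo to actually execute them.
RUN="${RUN:-echo}"

# Hypothetical helper: apply_clocks <mem,gfx> <gpu-id>...
# Enables persistence mode and sets application clocks on each listed GPU.
apply_clocks() {
    clocks="$1"
    shift
    for gpu in "$@"; do
        $RUN nvidia-smi -pm 1 -i "$gpu"
        $RUN nvidia-smi -ac "$clocks" -i "$gpu"
    done
}

# Possible use, identifying GPUs by UUID:
#   RUN=sudo
#   apply_clocks 2505,875 $(nvidia-smi --query-gpu=uuid --format=csv,noheader)
```

Running it once with the default RUN=echo is a cheap way to check the generated commands before touching the GPUs.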

