Configuring GPU Accelerated Keras in Windows 10

Date: 1-24 2017

Tags: keras, tensorflow, python, deep learning

Update 10/27/2018: Anaconda now provides a standalone environment for both the CPU and GPU versions of TensorFlow (the GPU version bundles the correct version of the CUDA runtime and relevant libraries). You can read the blog post here. TLDR: once you have installed Anaconda (or Miniconda), use the following commands to create and activate a new conda environment containing GPU-accelerated TensorFlow:

conda create -n tf_gpu tensorflow-gpu
conda activate tf_gpu

You can change tf_gpu to any name you like. Note that jupyter (or jupyterlab) is not included in this preconfigured environment. If you need it, install it yourself with conda.

It should also be noted that the Keras API is now included in TensorFlow under tensorflow.keras. The tutorial below is now obsolete for newer versions of TensorFlow and Keras.
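As a rough illustration of what the bundled API looks like, here is a minimal sketch that builds a tiny model through tensorflow.keras. The helper name build_tiny_model and the layer sizes are made up for this example, and the function simply returns None when TensorFlow cannot be imported.

```python
def build_tiny_model():
    """Build a one-layer model with tensorflow.keras, or return None if TensorFlow is missing."""
    try:
        from tensorflow import keras
    except ImportError:
        return None
    # A single dense layer on 4 input features; the sizes are arbitrary.
    model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="sgd", loss="mse")
    return model


if __name__ == "__main__":
    model = build_tiny_model()
    if model is None:
        print("TensorFlow is not available in this environment")
    else:
        print("Built a model with tensorflow.keras")
```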

Update 1/26/2018: Updated some steps for newer TensorFlow versions. I have tested that the nightly build for the Windows-GPU version of TensorFlow 1.6 works with CUDA 9.0 and cuDNN 7.

This short tutorial summarizes my experience in setting up GPU-accelerated Keras in Windows 10 (more precisely, Windows 10 Pro with Creators Update). Keras is a high-level framework that makes building neural networks much easier. Keras supports both the TensorFlow backend and the Theano backend. The two backends are not mutually exclusive and you can have both of them installed. The backend can be specified in the Keras configuration file.

Note: Microsoft also added CNTK backend support for Keras. You can find more details here. This article will not cover the configuration of the CNTK backend. Nevertheless, it should be quite straightforward.

Note: MILA will stop developing Theano. It is recommended to migrate to the TensorFlow (or CNTK) backend in the future.

Note: The Keras API will be integrated into TensorFlow directly as tf.keras, serving as a high-level API for TensorFlow. tf.keras will be an independent implementation of the Keras specs using TensorFlow only. The development of the original Keras (fchollet/keras) will not stop and the backend support for Theano will continue.

Setting Up Backend

TensorFlow

Download and install NVIDIA CUDA Toolkit for Windows 10. Check the CUDA Toolkit Archive if you cannot find the version you want to install on the front page. (Do not worry, you can still play games/use your 3D software.)

The GPU version of TensorFlow requires a compatible version of CUDA to be installed. Here is what you need for different versions of TensorFlow:
- TensorFlow 1.4 : CUDA 8.0
- TensorFlow 1.5 - 1.8 : CUDA 9.0 (Not 9.1)
Note: CUDA 9.1 may work for TensorFlow 1.6+. However, it is still recommended to use CUDA 9.0 at the moment. For CUDA 9.1 support, follow this issue on GitHub. You may also try the nightly builds.
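The pairing above can be written down as a small lookup. This is only a sketch of the table in this post: the function name required_cuda is made up here, and versions not listed above deliberately return None instead of a guess.

```python
def required_cuda(tf_version):
    """Return the CUDA version listed above for a TensorFlow release, or None if not listed."""
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    if (major, minor) == (1, 4):
        return "8.0"
    if major == 1 and 5 <= minor <= 8:
        return "9.0"
    return None  # not covered by the table in this post


if __name__ == "__main__":
    for version in ("1.4.0", "1.6.0", "1.8.0"):
        print(version, "->", required_cuda(version))
```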
Download and install the cuDNN library for your CUDA version on Windows 10. This library contains optimized routines that will significantly speed up the training process. The cuDNN library does not come with an installer, so you need to set it up manually. An easy way to accomplish this is to copy the files to %CUDA_PATH% so they will be discovered automatically along with the CUDA toolkit. The cuDNN library contains three files: cudnn64_x.dll (here x denotes the version number, which can be 5, 6, or 7), cudnn.h and cudnn.lib. You should copy them to the following locations:
```
%CUDA_PATH%\bin\cudnn64_x.dll
%CUDA_PATH%\include\cudnn.h
%CUDA_PATH%\lib\x64\cudnn.lib
```
By default, for CUDA 8.0, %CUDA_PATH% points to
```
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0
```
and for CUDA 9.0, %CUDA_PATH% points to
```
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0
```
Note: if you installed CUDA toolkit in a different location, you need to copy the files to the location of your CUDA installation.
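If you prefer to script the copy, the following is a minimal sketch. It assumes you have already unzipped the cuDNN archive somewhere; the function name copy_cudnn and the folder name cudnn_extracted are placeholders invented for this example. It finds the three files by name and copies them into the matching subfolders of the CUDA installation.

```python
import os
import shutil
from pathlib import Path

# File extension -> subfolder of the CUDA installation, as listed above.
TARGET_SUBDIRS = {
    ".dll": "bin",
    ".h": "include",
    ".lib": os.path.join("lib", "x64"),
}


def copy_cudnn(extracted_dir, cuda_path):
    """Copy cudnn64_*.dll, cudnn.h and cudnn.lib from extracted_dir into cuda_path."""
    copied = []
    for source in sorted(Path(extracted_dir).rglob("cudnn*")):
        subdir = TARGET_SUBDIRS.get(source.suffix.lower())
        if not source.is_file() or subdir is None:
            continue  # skip folders and unrelated files
        target_dir = Path(cuda_path) / subdir
        target_dir.mkdir(parents=True, exist_ok=True)
        copied.append(shutil.copy2(str(source), str(target_dir / source.name)))
    return copied


if __name__ == "__main__":
    cuda_path = os.environ.get("CUDA_PATH")
    if cuda_path is None:
        print("CUDA_PATH is not set; is the CUDA toolkit installed?")
    else:
        # "cudnn_extracted" is a placeholder for wherever you unzipped cuDNN.
        for path in copy_cudnn("cudnn_extracted", cuda_path):
            print("copied", path)
```

Since the default CUDA location is under Program Files, you will most likely need to run this from an elevated prompt.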

Note: if for some reason cuDNN 6.x does not work when importing tensorflow, you can try cuDNN 5.x instead. TensorFlow 1.2.1 or earlier requires cuDNN 5.1 (cudnn64_5.dll), while TensorFlow 1.3 and 1.4 require cuDNN 6 (cudnn64_6.dll). For newer versions of TensorFlow such as TensorFlow 1.8, you will need cuDNN 7 (cudnn64_7.dll).
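To check quickly whether the right DLL is in place, something like the sketch below may help. The function names are made up for this example. Mapping every release after 1.4 to cuDNN 7 is an assumption that extends the note above, which only names 1.8 explicitly.

```python
import os


def expected_cudnn_dll(tf_version):
    """Map a TensorFlow 1.x version string to the cuDNN DLL name from the note above."""
    parts = tuple(int(p) for p in tf_version.split(".")[:2])
    if parts <= (1, 2):
        return "cudnn64_5.dll"
    if parts <= (1, 4):
        return "cudnn64_6.dll"
    # Assumption: anything newer than 1.4 wants cuDNN 7 (the post names 1.8).
    return "cudnn64_7.dll"


def cudnn_dll_present(tf_version, cuda_path=None):
    """Return True if the expected cuDNN DLL exists in the bin folder of cuda_path."""
    cuda_path = cuda_path or os.environ.get("CUDA_PATH")
    if not cuda_path:
        return False
    dll = expected_cudnn_dll(tf_version)
    return os.path.isfile(os.path.join(cuda_path, "bin", dll))


if __name__ == "__main__":
    version = "1.8.0"  # change to the TensorFlow version you plan to install
    print(expected_cudnn_dll(version), "found:", cudnn_dll_present(version))
```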
Install Anaconda. You need to install the 64-bit Python 3.5 for TensorFlow 1.1 and prior. Starting from TensorFlow 1.2, Python 3.6 is also supported. Here we use Anaconda for convenience. You can also install Python from python.org as well as all the dependencies yourself.
Open Anaconda Prompt and install the GPU version of TensorFlow following the official guide. This will install tensorflow-gpu to your root Anaconda environment. In short, it should be a simple pip command.

Important: The official installation guide for Anaconda will ask you to create a virtual environment named tensorflow via conda create -n tensorflow python=3.5. By default, Anaconda packages (such as jupyter) will not be installed into this environment. Therefore, if you run Jupyter notebook inside this environment, you may be using the jupyter command from the root Anaconda environment, which will complain that tensorflow is not found when you try to import it. To resolve this issue, simply use conda to install jupyter inside this virtual environment. Alternatively, you can use conda create -n tensorflow python=3.5 anaconda when creating the virtual environment to ensure that Anaconda is installed into this environment.
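If you are unsure which environment a notebook is actually running in, a small diagnostic like the one below can tell you. This is only a sketch; describe_interpreter is a name made up for this example. Run it in a notebook cell and compare the reported paths with the location of your tensorflow environment.

```python
import importlib.util
import sys


def describe_interpreter(package="tensorflow"):
    """Report which Python is running and whether it can see the given package."""
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        spec = None
    return {
        "executable": sys.executable,  # path of the running python.exe
        "prefix": sys.prefix,          # root folder of the active environment
        "package_found": spec is not None,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(key, "=", value)
```

If the executable path points at the root Anaconda folder rather than the environment you created, the notebook server was started from the wrong environment.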
Profit!

Tip: If everything is installed correctly, import tensorflow as tf should give no errors. If you see error messages reporting missing DLLs (usually CUDA related), check your configuration against these error messages. If you are still using TensorFlow 1.4 or earlier, you can use mrry's Self-check Script to troubleshoot your installation.
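The import test from the tip can be wrapped in a short script that also lists the devices TensorFlow can see. This is a sketch, not a replacement for the self-check script mentioned above. The function name check_tensorflow is made up, and device_lib is an internal TensorFlow module that is commonly used for this purpose but is not part of the documented public API.

```python
def check_tensorflow():
    """Try to import TensorFlow and list the devices it can see.

    Returns (ok, details); ok is False if the import or the device query failed.
    """
    try:
        import tensorflow as tf
        from tensorflow.python.client import device_lib
        devices = [device.name for device in device_lib.list_local_devices()]
    except Exception as exc:  # missing DLLs usually surface here as ImportError
        return False, "{}: {}".format(type(exc).__name__, exc)
    return True, "TensorFlow {} sees: {}".format(tf.__version__, ", ".join(devices))


if __name__ == "__main__":
    ok, details = check_tensorflow()
    print("OK" if ok else "FAILED", "-", details)
```

A working GPU setup should list a GPU device in addition to the CPU.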

Note: if you do not want to modify the root Anaconda environment, you can always create your own environment with the help of conda. Basically, conda create -n snake35 python=3.5 anaconda will create a new Python 3.5 environment and install Anaconda into this environment. You can then enter this environment with activate snake35. The full documentation on environment management can be found here.

Theano

Setting up the Theano backend is more complicated because we need to get nvcc working.

Follow steps 1-3 in the TensorFlow section above (installing the CUDA Toolkit, cuDNN and Anaconda). You may optionally use the 64-bit Python 2.7 version instead when installing Anaconda.

Note: It is recommended to stay with CUDA 8.0 for the Theano backend. Because Theano will no longer be actively developed, it is recommended to use the TensorFlow backend instead.
Install Visual Studio 2015. You can grab the free community edition from the official website. During installation, make sure to select Visual C++ under Programming Languages and the newest Windows 10 SDK under Windows and Web Development > Universal Windows App Development Tools.

Note: because Visual Studio 2017 is out, the default download page will get you an installer for Visual Studio 2017. Microsoft provides downloads of older versions here. However, you will need to register a free Visual Studio Dev Essentials account to access them.
Open Anaconda Prompt and install Theano via pip install theano. You will also need to install mingw and libpython. Simply use conda install mingw libpython to install them.
Configure Theano to use the GPU. To accomplish this, you need to create a file named .theanorc in %USERPROFILE%. If you are not sure where %USERPROFILE% points to, put it in the address bar of your file explorer and hit Enter. File Explorer won't let you create a file whose name starts with a period, so you need to use cmd instead; running echo.>.theanorc should do the trick. In .theanorc, add the following lines:
```
[global]
device=gpu
floatX=float32

[nvcc]
compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin
```
Note: If you installed Visual Studio 2015 in a different location, you will need to adjust compiler_bindir accordingly. Optionally, you can enable CNMeM by adding the following lines:
```
[lib]
cnmem=0.8
```
The value represents the start size (either in MB or the fraction of total GPU memory). If you have sufficient GPU memory, enabling CNMeM will usually speed up the training process. More details can be found here.
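As an alternative to creating the file by hand, the configuration above can be generated with Python's configparser. This is a minimal sketch; write_theanorc is a name made up for this example, and the script refuses to overwrite an existing file.

```python
import configparser
import os


def write_theanorc(path, compiler_bindir, cnmem=None):
    """Write a .theanorc with the [global] and [nvcc] settings shown above.

    If cnmem is given, the optional [lib] section is added as well.
    """
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep the case of keys such as floatX
    config["global"] = {"device": "gpu", "floatX": "float32"}
    config["nvcc"] = {"compiler_bindir": compiler_bindir}
    if cnmem is not None:
        config["lib"] = {"cnmem": str(cnmem)}
    with open(path, "w") as handle:
        config.write(handle, space_around_delimiters=False)


if __name__ == "__main__":
    # On Windows, "~" normally expands to %USERPROFILE%.
    target = os.path.join(os.path.expanduser("~"), ".theanorc")
    if os.path.exists(target):
        print("Not overwriting existing", target)
    else:
        write_theanorc(
            target,
            r"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin",
            cnmem=0.8,
        )
        print("Wrote", target)
```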
Resolve the issue of missing UCRT header files. This is a tricky issue with CUDA 8.0 + Visual Studio 2015. If you try to test run Theano at this moment, you will get the following error:
```
Cannot open include file: 'corecrt.h': No such file or directory.
```
This error occurs because the CRT headers are no longer in the VC subdirectory of the Visual Studio installation (by default it should be C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC). They are now placed under C:\Program Files (x86)\Windows Kits\10 by default. This issue can be resolved by adding the relevant directories to the %LIB% and %INCLUDE% environment variables. To accomplish this, follow these steps in the Anaconda Prompt:
1. Define some temporary variables using the following commands:
```
set "SDK_VERSION=10.0.14393.0"
set "SDK_LIB_DIR=C:\Program Files (x86)\Windows Kits\10\Lib\%SDK_VERSION%"
set "SDK_INCLUDE_DIR=C:\Program Files (x86)\Windows Kits\10\Include\%SDK_VERSION%"
```
  Important: you should change the SDK version and installation path of Windows Kits accordingly to match your installation. You also need to change the variable names above if you have already defined variables with similar names.
2. Check if %INCLUDE% and %LIB% have already been defined. Issue the following command:
```
echo %INCLUDE%
```
  If the output is %INCLUDE%, it means %INCLUDE% has not been defined yet. The same applies to %LIB%.
3. If %INCLUDE% has been defined, issue the following command:
```
set "INCLUDE=%INCLUDE%;%SDK_INCLUDE_DIR%\ucrt;%SDK_INCLUDE_DIR%\um"
```
  Otherwise, issue the following command:
```
set "INCLUDE=%SDK_INCLUDE_DIR%\ucrt;%SDK_INCLUDE_DIR%\um"
```
4. If %LIB% has been defined, issue the following command:
```
set "LIB=%LIB%;%SDK_LIB_DIR%\ucrt\x64;%SDK_LIB_DIR%\um\x64"
```
  Otherwise, issue the following command:
```
set "LIB=%SDK_LIB_DIR%\ucrt\x64;%SDK_LIB_DIR%\um\x64"
```
Now Theano should be able to compile your model without complaining about the missing header files.

Important: you need to go through step 5 (setting %INCLUDE% and %LIB%) again every time you open a new Anaconda Prompt, which can be tedious. You should consider writing a batch script to make your life easier.
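If you would rather stay in Python than write a batch script, the same logic can be sketched as a helper that builds the two variables. The function name with_ucrt_paths is made up for this example, and the SDK path and version are the ones used above, so adjust them to your installation.

```python
import ntpath
import os


def with_ucrt_paths(env, sdk_root, sdk_version):
    """Return a copy of env with the UCRT and UM folders appended to INCLUDE and LIB."""
    include_dir = ntpath.join(sdk_root, "Include", sdk_version)
    lib_dir = ntpath.join(sdk_root, "Lib", sdk_version)
    additions = {
        "INCLUDE": [ntpath.join(include_dir, "ucrt"), ntpath.join(include_dir, "um")],
        "LIB": [ntpath.join(lib_dir, "ucrt", "x64"), ntpath.join(lib_dir, "um", "x64")],
    }
    updated = dict(env)
    for name, paths in additions.items():
        existing = updated.get(name)
        # Append to an existing value, otherwise start a new one.
        updated[name] = ";".join(([existing] if existing else []) + paths)
    return updated


if __name__ == "__main__":
    env = with_ucrt_paths(
        os.environ,
        r"C:\Program Files (x86)\Windows Kits\10",
        "10.0.14393.0",  # change to match your installed SDK
    )
    print("INCLUDE =", env["INCLUDE"])
    print("LIB =", env["LIB"])
```

The returned dictionary can be passed as the env argument of subprocess.run when launching your training script, so the change stays local to that process.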

Note: instead of following step 5, you can also include the CRT directories in your %PATH% environment variable, or create two global environment variables, %INCLUDE% and %LIB%. However, I do not recommend doing this as it may cause issues if you also use Visual Studio to develop Windows applications.
Profit!

Note: if for some reason you insist on using CUDA 7.5, you will need to grab a copy of Visual Studio 2013 instead of Visual Studio 2015. In this case, you do not need to go through step 5 above.

Installing Keras

The installation of Keras is pretty simple: running pip install keras should work. Note that Keras will install Theano as a dependency, and you do not need to configure Theano if you choose to use the TensorFlow backend. Depending on the backend of your choice, create a configuration file and set the backend following the official documentation.

Note: TensorFlow and Theano have different image dimension orderings. Make sure the image dimension ordering in your Keras configuration matches your backend.
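The configuration file is a small JSON document, by default keras.json in the .keras folder of your home directory. The sketch below sets the backend together with a matching image layout. It assumes the Keras 2 key name image_data_format (Keras 1 used image_dim_ordering instead), and write_keras_config is a name made up for this example.

```python
import json
import os

# Image layout conventionally paired with each backend (Keras 2 key names).
DATA_FORMATS = {"tensorflow": "channels_last", "theano": "channels_first"}


def write_keras_config(backend, path=None):
    """Set the backend and a matching image_data_format in keras.json, keeping other keys."""
    if backend not in DATA_FORMATS:
        raise ValueError("unsupported backend: {!r}".format(backend))
    path = path or os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
    config = {}
    if os.path.exists(path):
        with open(path) as handle:
            config = json.load(handle)
    config["backend"] = backend
    config["image_data_format"] = DATA_FORMATS[backend]
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    with open(path, "w") as handle:
        json.dump(config, handle, indent=4)
    return config


if __name__ == "__main__":
    print(write_keras_config("tensorflow"))
```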

After installing Keras, you can test your installation using the Keras examples here. If you are using the TensorFlow backend, you should see messages like:

Using TensorFlow backend.
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally
...
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 760M
major: 3 minor: 0 memoryClockRate (GHz) 0.719
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.66GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 760M, pci bus id: 0000:01:00.0)
...

It will report that CUDA libraries have been successfully loaded and the TensorFlow device has been created on the GPU.
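If you capture the startup output to a file, a couple of regular expressions are enough to pull out the loaded libraries and the created devices. This is only a sketch that matches the message wording shown above; other TensorFlow versions may phrase these messages differently, and summarize_startup_log is a name made up for this example.

```python
import re

LIBRARY_PATTERN = re.compile(r"successfully opened CUDA library (\S+) locally")
DEVICE_PATTERN = re.compile(r"Creating TensorFlow device \((/gpu:\d+)\)")


def summarize_startup_log(text):
    """Return (libraries, devices) mentioned in startup messages like those above."""
    return LIBRARY_PATTERN.findall(text), DEVICE_PATTERN.findall(text)


if __name__ == "__main__":
    sample = (
        "I dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally\n"
        "I gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, "
        "name: GeForce GTX 760M, pci bus id: 0000:01:00.0)\n"
    )
    libraries, devices = summarize_startup_log(sample)
    print("libraries:", libraries)
    print("devices:", devices)
```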

Note: the above messages are taken from my old laptop with a dedicated GPU. You definitely need a desktop computer with a better GPU to train a complex neural network.

For the Theano backend, you should see messages like:

Using Theano backend.
...
Using gpu device 0: GeForce GTX 760M (CNMeM is enabled with initial size: 80.0% of memory, cuDNN 5105)

You will also see a bunch of messages starting with "creating library", which means that Theano is compiling the model. Compilation takes some time, and the training process will start once it is done. You may also see warning messages like:

DEBUG: nvcc STDOUT nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.

You can safely ignore them.

Mianzhi Wang
