Configuring GPU Accelerated Keras in Windows 10
Date: 1-24 2017
Tags: keras, tensorflow, python, deep learning
Update 10/27/2018: Anaconda now provides a standalone environment for both the CPU and GPU versions of TensorFlow (the GPU version bundles the correct version of the CUDA runtime and relevant libraries). You can read the blog post here. TL;DR: once you have installed Anaconda (or Miniconda), use the following commands to create and activate a new conda environment containing GPU-accelerated TensorFlow:
conda create -n tf_gpu tensorflow-gpu
conda activate tf_gpu
You can change tf_gpu to another name you like. Note that jupyter (or jupyterlab) is not included in this preconfigured environment. If you need it, install it yourself with conda.
It should also be noted that the Keras API is now included in TensorFlow under tensorflow.keras. The tutorial below is now obsolete for newer versions of TensorFlow and Keras.
Update 1/26/2018: Updated some steps for newer TensorFlow versions. I have tested that the nightly build for the Windows-GPU version of TensorFlow 1.6 works with CUDA 9.0 and cuDNN 7.
This short tutorial summarizes my experience in setting up GPU-accelerated Keras in Windows 10 (more precisely, Windows 10 Pro with Creators Update). Keras is a high-level framework that makes building neural networks much easier. Keras supports both the TensorFlow backend and the Theano backend. The two backends are not mutually exclusive and you can have both of them installed. The backend can be specified in the Keras configuration file.
Note: Microsoft also added CNTK backend support for Keras. You can find more details here. This article will not cover the configuration of the CNTK backend. Nevertheless, it should be quite straightforward.
Note: MILA will stop developing Theano. It is recommended to migrate to the TensorFlow (or CNTK) backend in the future.
Note: The Keras API will be integrated into TensorFlow directly as tf.keras, serving as a high-level API for TensorFlow. tf.keras will be an independent implementation of the Keras specs using TensorFlow only. The development of the original Keras (fchollet/keras) will not stop and the backend support for Theano will continue.
Setting Up Backend
TensorFlow
1. Download and install the NVIDIA CUDA Toolkit for Windows 10. Check the CUDA Toolkit Archive if you cannot find the version you want on the front page. (Do not worry, you can still play games or use your 3D software.)
The GPU version of TensorFlow requires a specific version of CUDA to be installed. Here is what you need for different versions of TensorFlow:
- TensorFlow 1.4 : CUDA 8.0
- TensorFlow 1.5 - 1.8 : CUDA 9.0 (Not 9.1)
Note: CUDA 9.1 may work for TensorFlow 1.6+. However, it is still recommended to use CUDA 9.0 at the moment. For CUDA 9.1 support, follow this issue on GitHub. You may also try the nightly builds.
2. Download and install the cuDNN library for your CUDA version on Windows 10. This library contains optimized routines that will significantly speed up the training process. The cuDNN library does not come with an installer, so you need to set it up manually. An easy way to accomplish this is to copy the files to %CUDA_PATH% so they will be discovered automatically along with the CUDA toolkit. The cuDNN library contains three files: cudnn64_x.dll (here x denotes the version number, which can be 5, 6, or 7), cudnn.h, and cudnn.lib. You should copy them to the following locations:
%CUDA_PATH%\bin\cudnn64_x.dll
%CUDA_PATH%\include\cudnn.h
%CUDA_PATH%\lib\x64\cudnn.lib
By default, %CUDA_PATH% points to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0 for CUDA 8.0 and to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0 for CUDA 9.0.
Note: if you installed CUDA toolkit in a different location, you need to copy the files to the location of your CUDA installation.
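If you set up cuDNN more than once, the manual copy can be scripted. The following is only a sketch, not an official installer: the function name install_cudnn is made up for this example, and it assumes the extracted cuDNN archive has the same bin, include, and lib\x64 layout as the target locations listed above. Writing under C:\Program Files normally requires an elevated (administrator) prompt.

```python
import os
import shutil


def install_cudnn(cudnn_root, cuda_path, version):
    """Copy the three cuDNN files into a CUDA toolkit directory.

    cudnn_root: folder that contains bin, include and lib\\x64
                (assumed layout of the extracted cuDNN archive).
    cuda_path:  the CUDA installation directory, i.e. the value of %CUDA_PATH%.
    version:    the x in cudnn64_x.dll (5, 6 or 7).
    """
    relative_files = [
        os.path.join("bin", "cudnn64_{}.dll".format(version)),
        os.path.join("include", "cudnn.h"),
        os.path.join("lib", "x64", "cudnn.lib"),
    ]

    # Check every source file first, so a wrong version number copies nothing.
    for relative in relative_files:
        source = os.path.join(cudnn_root, relative)
        if not os.path.isfile(source):
            raise FileNotFoundError("missing cuDNN file: " + source)

    copied = []
    for relative in relative_files:
        target = os.path.join(cuda_path, relative)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(os.path.join(cudnn_root, relative), target)
        copied.append(target)
    return copied
```

A call would look like install_cudnn(extracted_folder, os.environ["CUDA_PATH"], 7), where extracted_folder is wherever you unpacked the download.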
Note: if for some reason cuDNN 6.x does not work when importing tensorflow, you can try cuDNN 5.x instead. TensorFlow 1.2.1 or earlier requires cuDNN 5.1 (cudnn64_5.dll), while TensorFlow 1.3 and 1.4 require cuDNN 6 (cudnn64_6.dll). For newer versions of TensorFlow such as TensorFlow 1.8, you will need cuDNN 7 (cudnn64_7.dll).
3. Install Anaconda. You need to install the 64-bit Python 3.5 for TensorFlow 1.1 and prior. Starting from TensorFlow 1.2, Python 3.6 is also supported. Here we use Anaconda for convenience. You can also install Python from python.org and install all the dependencies yourself.
4. Open Anaconda Prompt and install the GPU version of TensorFlow following the official guide. This will install tensorflow-gpu to your root Anaconda environment. In short, it should be a simple pip command.
Important: The official installation guide for Anaconda will ask you to create a virtual environment named tensorflow via conda create -n tensorflow python=3.5. By default, Anaconda packages (such as jupyter) will not be installed into this environment. Therefore, if you run Jupyter notebook inside this environment, you may be using the jupyter command from the root Anaconda environment, which will complain that tensorflow is not found when you try to import it. To resolve this issue, simply use conda to install jupyter inside this virtual environment. Alternatively, you can use conda create -n tensorflow python=3.5 anaconda when creating the virtual environment to ensure that Anaconda is installed into this environment.
5. Profit!
Tip: If everything is installed correctly, import tensorflow as tf should give no errors. If you see error messages reporting missing DLLs (usually CUDA related), check your configuration according to these error messages. If you are still using TensorFlow 1.4 or earlier, you can use mrry's Self-check Script to troubleshoot your installation.
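When a DLL is reported missing, a quick first check is whether the file can be found on %PATH% at all. The sketch below is not mrry's script. It only walks the directories on the search path and looks for the library names that appear in the TensorFlow log output further down (cublas, cufft, curand, cudnn, nvcuda). The helper names are made up for this example, and the "90" suffix for CUDA 9.0 is an assumption extrapolated from the "80" suffix seen with CUDA 8.0.

```python
import os


def find_on_path(filename, search_path=None):
    """Return the first directory on the search path containing filename, or None."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        directory = directory.strip('"')
        if directory and os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None


def check_cuda_dlls(cuda_suffix="90", cudnn_version="7", search_path=None):
    """Map each expected DLL name to the directory it was found in (or None)."""
    names = [
        "nvcuda.dll",                               # installed with the GPU driver
        "cublas64_{}.dll".format(cuda_suffix),      # CUDA toolkit
        "cufft64_{}.dll".format(cuda_suffix),       # CUDA toolkit
        "curand64_{}.dll".format(cuda_suffix),      # CUDA toolkit
        "cudnn64_{}.dll".format(cudnn_version),     # cuDNN
    ]
    return {name: find_on_path(name, search_path) for name in names}


if __name__ == "__main__":
    # Use cuda_suffix="80" and cudnn_version="6" for TensorFlow 1.3 or 1.4.
    for name, where in sorted(check_cuda_dlls().items()):
        print(name, "->", where or "NOT FOUND")
```

A library that shows up as NOT FOUND here points to a missing install step or a directory that is not on %PATH%. A library that is found but still fails to load usually means a version mismatch.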
Note: if you do not want to modify the root Anaconda environment, you can always create your own environment with the help of conda. Basically, conda create -n snake35 python=3.5 anaconda will create a new Python 3.5 environment and install Anaconda into this environment. You can then enter this environment with activate snake35. The full documentation on environment management can be found here.
Theano
Setting up the Theano backend is more complicated because we need to get nvcc working.
1. Follow steps 1-3 in the tutorial above on setting up the TensorFlow backend. You may optionally use the 64-bit Python 2.7 version instead when installing Anaconda.
Note: It is recommended to stay with CUDA 8.0 for the Theano backend. Because Theano will no longer be actively developed, it is recommended to use the TensorFlow backend instead.
2. Install Visual Studio 2015. You can grab the free community edition from the official website. During installation, make sure to select Visual C++ under Programming Languages and the newest Windows 10 SDK under Windows and Web Development > Universal Windows App Development Tools.
Note: because Visual Studio 2017 is out, the default download page will get you an installer for Visual Studio 2017. Microsoft provides downloads of older versions here. However, you will need to register a free Visual Studio Dev Essentials account to access them.
3. Open Anaconda Prompt and install Theano via pip install theano. You will also need to install mingw and libpython. Simply use conda install mingw libpython to install them.
4. Configure Theano to use GPU. To accomplish this, you need to create a file named .theanorc in %USERPROFILE%. If you are not sure where %USERPROFILE% points to, put it in the address bar of your file explorer and hit Enter. File explorer won't let you name a file starting with a period, so you need to use cmd to do so. Running echo.>.theanorc should do the trick. In .theanorc, add the following lines:
[global]
device=gpu
floatX=float32
[nvcc]
compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin
Note: If you installed Visual Studio 2015 in a different location, you will need to adjust compiler_bindir accordingly. Optionally you can enable CNMeM by adding the following lines:
[lib]
cnmem=0.8
The value represents the start size (either in MB or the fraction of total GPU memory). If you have sufficient GPU memory, enabling CNMeM will usually speed up the training process. More details can be found here.
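Since File Explorer refuses to create a file whose name starts with a period, it can be easier to generate .theanorc from Python. Here is a minimal sketch using the standard configparser module. The function name write_theanorc is made up for this example, and the sections and keys are exactly the ones listed above.

```python
import configparser


def write_theanorc(path, compiler_bindir, cnmem=None):
    """Write a .theanorc with the [global], [nvcc] and optional [lib] sections."""
    # interpolation=None keeps any '%' in a path from being treated specially.
    config = configparser.ConfigParser(interpolation=None)
    # Keep option names case-sensitive; by default floatX would become floatx.
    config.optionxform = str

    config.add_section("global")
    config.set("global", "device", "gpu")
    config.set("global", "floatX", "float32")

    config.add_section("nvcc")
    config.set("nvcc", "compiler_bindir", compiler_bindir)

    if cnmem is not None:
        config.add_section("lib")
        config.set("lib", "cnmem", str(cnmem))

    with open(path, "w") as handle:
        # Write key=value without spaces, matching the listing above.
        config.write(handle, space_around_delimiters=False)
    return path
```

To use it, pass os.path.join(os.environ["USERPROFILE"], ".theanorc") as the path together with your own compiler_bindir. Note that it overwrites an existing file, so back up any .theanorc you already have.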
5. Resolve the issue of missing UCRT header files. This is a tricky issue with CUDA 8.0 + Visual Studio 2015. If you try to test run Theano at this moment, you will get the following error:
Cannot open include file: 'corecrt.h': No such file or directory.
This error occurs because the CRT headers are no longer in the VC subdirectory of the Visual Studio installation (by default it should be C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC). They are now placed under C:\Program Files (x86)\Windows Kits\10 by default. This issue can be resolved by adding relevant directories to the %LIB% and %INCLUDE% environment variables. To accomplish this, in the Anaconda Prompt, follow these steps:
Define some temporary variables using the following commands:
set "SDK_VERSION=10.0.14393.0"
set "SDK_LIB_DIR=C:\Program Files (x86)\Windows Kits\10\Lib\%SDK_VERSION%"
set "SDK_INCLUDE_DIR=C:\Program Files (x86)\Windows Kits\10\Include\%SDK_VERSION%"
Important: you should change the SDK version and installation path of Windows Kits accordingly to match your installation. You also need to change the variable names above if you have already defined variables with similar names.
Check if %INCLUDE% and %LIB% have already been defined. Issue the following command:
echo %INCLUDE%
If the output is %INCLUDE%, it means %INCLUDE% has not been defined yet. The same applies to %LIB%.
If %INCLUDE% has been defined, issue the following command:
set "INCLUDE=%INCLUDE%;%SDK_INCLUDE_DIR%\ucrt;%SDK_INCLUDE_DIR%\um"
Otherwise, issue the following command:
set "INCLUDE=%SDK_INCLUDE_DIR%\ucrt;%SDK_INCLUDE_DIR%\um"
If %LIB% has been defined, issue the following command:
set "LIB=%LIB%;%SDK_LIB_DIR%\ucrt\x64;%SDK_LIB_DIR%\um\x64"
Otherwise, issue the following command:
set "LIB=%SDK_LIB_DIR%\ucrt\x64;%SDK_LIB_DIR%\um\x64"
Now Theano should be able to compile your model without complaining about the missing header files.
Important: you need to go through step 5 again every time you open a new Anaconda Prompt, which can be tedious. You should consider writing a batch script to make your life easier.
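As an alternative to a batch script, the same logic can live in a small Python launcher. The sketch below mirrors the set commands from step 5: it appends the ucrt and um directories to INCLUDE and LIB when they are already defined and creates them otherwise. The function name ucrt_environment is made up for this example, and the default Windows Kits path is the default location mentioned above, so adjust it if yours differs.

```python
import os


def ucrt_environment(sdk_version,
                     kits_root=r"C:\Program Files (x86)\Windows Kits\10",
                     base_env=None):
    """Return a copy of the environment with the UCRT directories added."""
    env = dict(os.environ if base_env is None else base_env)
    include_dir = kits_root + "\\Include\\" + sdk_version
    lib_dir = kits_root + "\\Lib\\" + sdk_version

    extra = {
        "INCLUDE": [include_dir + "\\ucrt", include_dir + "\\um"],
        "LIB": [lib_dir + "\\ucrt\\x64", lib_dir + "\\um\\x64"],
    }
    for name, directories in extra.items():
        existing = env.get(name)
        # Keep whatever is already defined and append the SDK directories.
        parts = ([existing] if existing else []) + directories
        env[name] = ";".join(parts)  # ';' is the separator on Windows
    return env


# Example (hypothetical script name): run training with the patched environment.
#   import subprocess, sys
#   subprocess.call([sys.executable, "train.py"],
#                   env=ucrt_environment("10.0.14393.0"))
```

Because the modified variables are only passed to the child process, your normal environment stays untouched, which avoids the Visual Studio conflicts that global variables may cause.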
Note: instead of following step 5, you can also include the CRT directories in your %PATH% environment variable, or create two global environment variables, %INCLUDE% and %LIB%. However, I do not recommend doing this as it may cause issues if you also use Visual Studio to develop Windows applications.
6. Profit!
Note: if for some reason you insist on using CUDA 7.5, you will need to grab a copy of Visual Studio 2013 instead of Visual Studio 2015. In this case, you do not need to go through step 5 above.
Installing Keras
The installation of Keras is pretty simple. Running pip install keras should work. Note that Keras will install Theano as a dependency, and you do not need to configure Theano if you choose to use the TensorFlow backend.
Depending on the backend of your choice, create a configuration file and set the backend following the official documentation.
Note: TensorFlow and Theano have different image dimension orderings. Make sure the image dimension ordering in your Keras configuration matches your backend.
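To keep the backend and the image dimension ordering in sync, the configuration file can be generated rather than edited by hand. The sketch below assumes Keras 2, where the file is keras.json inside the .keras folder of your user directory and the ordering key is image_data_format (older Keras 1 releases used image_dim_ordering with the values tf and th instead). The function names are made up for this example.

```python
import json
import os


def keras_config(backend):
    """Build a Keras 2 style configuration dict for the given backend."""
    # TensorFlow convention: channels last; Theano convention: channels first.
    orderings = {"tensorflow": "channels_last", "theano": "channels_first"}
    if backend not in orderings:
        raise ValueError("unsupported backend: " + repr(backend))
    return {
        "backend": backend,
        "image_data_format": orderings[backend],
        "floatx": "float32",
        "epsilon": 1e-07,
    }


def write_keras_config(backend, path):
    """Write the configuration as JSON, creating the parent folder if needed."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "w") as handle:
        json.dump(keras_config(backend), handle, indent=4)
    return path
```

For example, write_keras_config("tensorflow", os.path.join(os.path.expanduser("~"), ".keras", "keras.json")) switches to the TensorFlow backend. It replaces an existing keras.json, so keep a copy if you have customized yours.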
After installing Keras, you can test your installation using the Keras examples here. If you are using the TensorFlow backend, you should see messages like:
Using TensorFlow backend.
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally
...
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 760M
major: 3 minor: 0 memoryClockRate (GHz) 0.719
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.66GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0: Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 760M, pci bus id: 0000:01:00.0)
...
It will report that CUDA libraries have been successfully loaded and the TensorFlow device has been created on the GPU.
Note: the above messages are taken from my old laptop with a dedicated GPU. You definitely need a desktop computer with a better GPU to train a complex neural network.
For the Theano backend, you should see messages like:
Using Theano backend.
...
Using gpu device 0: GeForce GTX 760M (CNMeM is enabled with initial size: 80.0% of memory, cuDNN 5105)
You will also see a bunch of messages starting with "creating library", which implies that Theano is compiling the model. It will take some time to compile the model and the training process will start when all the compilations are done. You may also see warning messages like:
DEBUG: nvcc STDOUT nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
You can safely ignore them.