System information

  • OS – High Sierra 10.13.2
  • Tensorflow – 1.5
  • Xcode command line tools – 8.3.3 (switched to temporarily; details in the guide)
  • Cmake – 3.10.1
  • Bazel – 0.9.0
  • CUDA – 9.1
  • cuDNN – 7.0.4 (for CUDA 9.0)
  • Python – 3.6

Install Requirements

sudo pip3 install six numpy wheel
brew install coreutils
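
  • If more than one Python is installed, it is worth confirming that pip3 belongs to the Python 3.6 you intend to build against; for example:

which python3 && python3 --version
pip3 --version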

<br>

Step-by-step guide

  • Install CUDA 9.0 for macOS

http://developer2.download.nvidia.com/compute/cuda/9.0/secure/Prod/local_installers/cuda_9.0.176_mac.dmg
<br>

  • Download cuDNN 7.0.4 for CUDA 9.0 on macOS (requires logging in to an NVIDIA developer account)
    https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v7.0.4/prod/9.0_20171031/cudnn-9.0-osx-x64-v7
    <br>

  • Install cuDNN

cd Downloads
tar -xvf cudnn-9.0-osx-x64-v7.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib/libcudnn* /usr/local/cuda/lib
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib/libcudnn*
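
  • (Optional) Confirm the copy by reading the version macros out of the installed header; these defines are present throughout the cuDNN 7.x releases, though the header layout can vary slightly between versions:

# Should report major 7, minor 0, patchlevel 4 for cuDNN 7.0.4
grep -A 2 "define CUDNN_MAJOR" /usr/local/cuda/include/cudnn.h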

<br>

  • Set Environment Variables
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:$DYLD_LIBRARY_PATH
export PATH=/Developer/NVIDIA/CUDA-9.0/bin${PATH:+:${PATH}}
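
  • These exports only apply to the current shell; to keep them across sessions you can append them to your shell profile (~/.bash_profile assumes the default bash shell, and the CUDA path should match the toolkit version you installed):

echo 'export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:$DYLD_LIBRARY_PATH' >> ~/.bash_profile
echo 'export PATH=/Developer/NVIDIA/CUDA-9.0/bin${PATH:+:${PATH}}' >> ~/.bash_profile
source ~/.bash_profile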

<br>

  • Install Homebrew
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

<br>

  • Install Prerequisites
brew install python3 coreutils swig llvm bazel
bazel version

<br>

  • Install Python Prerequisite Modules
sudo pip3 install six numpy

<br>

  • If the latest Xcode is installed, checking the compiler version shows:
/usr/bin/cc --version
Apple LLVM version 9.0.0 (clang-900.0.39.2)

<br>

  • In that case, the CUDA build fails with the following error:
nvcc fatal : The version ('90000') of the host compiler ('Apple clang') is not supported

<br>

  • To fix this, install Xcode 8.3.3 and set it as the default
    https://download.developer.apple.com/Developer_Tools/Xcode_8.3.3/Xcode8.3.3.xip

<br>

  • Double-click the downloaded .xip; Archive Utility will expand it (this takes some time). Then rename it, move it to /Applications, and select it:
cd ~/Downloads
mv Xcode.app Xcode-8.3.3.app
sudo mv Xcode-8.3.3.app /Applications
sudo xcode-select -s /Applications/Xcode-8.3.3.app
sudo xcodebuild -license
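
  • To confirm the switch took effect, check the active developer directory and the compiler version (it should now report an Apple LLVM 8.x clang rather than 9.0.0):

xcode-select -p          # expect /Applications/Xcode-8.3.3.app/Contents/Developer
/usr/bin/cc --version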

 

  • Compile deviceQuery to test CUDA
cd /usr/local/cuda/samples
sudo make -C 1_Utilities/deviceQuery

 

  • Run deviceQuery
cd /usr/local/cuda/samples/
./bin/x86_64/darwin/release/deviceQuery

 

  • It should look similar to this:
./bin/x86_64/darwin/release/deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1070"
CUDA Driver Version / Runtime Version 9.1 / 9.1
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 8192 MBytes (8589737984 bytes)
(15) Multiprocessors, (128) CUDA Cores/MP: 1920 CUDA Cores
GPU Max Clock rate: 1797 MHz (1.80 GHz)
Memory Clock rate: 4004 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 5 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS

 

  • Clone TensorFlow Repository
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout r1.5

 

  • Install Java SE Development Kit 8u152
    http://download.oracle.com/otn-pub/java/jdk/8u152-b16/aa0333dd3019491ca4f6ddbe78cdb6d0/jdk-8u152-macosx-x64.dmg

 

  • First, create a symlink to work around an Eigen bug
sudo ln -s /usr/local/cuda/include/crt/math_functions.hpp /usr/local/cuda/include/math_functions.hpp

 

  • Deal with other missing C++ headers
sudo ln -s /usr/include/c++/4.2.1/memory /usr/local/include/memory
mkdir /usr/local/include/bits
cp -r /usr/include/c++/4.2.1/bits /usr/local/include
sudo ln -s /usr/include/c++/4.2.1/cstring /usr/local/include/cstring

 

  • Set Environment Variables
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:$DYLD_LIBRARY_PATH
export PATH=/Developer/NVIDIA/CUDA-9.1/bin${PATH:+:${PATH}}

 

  • Fix the __align__(sizeof(T)) issue (thank you smitshilu):
sed -i.bu 's/__align__(sizeof(T)) //g' tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
sed -i.bu 's/__align__(sizeof(T)) //g' tensorflow/core/kernels/split_lib_gpu.cu.cc
sed -i.bu 's/__align__(sizeof(T)) //g' tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
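
  • A quick grep confirms the pattern is gone from all three files (no output means the edits applied cleanly):

grep -n '__align__(sizeof(T))' \
  tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc \
  tensorflow/core/kernels/split_lib_gpu.cu.cc \
  tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc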

 

  • Fix the “-lgomp” issue (thank you Mattias Arro) — note the straight quotes in the command:
sed -i.bu '/linkopts = \["-lgomp"\]/d' third_party/gpus/cuda/BUILD.tpl
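
  • Likewise, the gomp link option should no longer appear in the template (no output expected):

grep -n 'lgomp' third_party/gpus/cuda/BUILD.tpl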

 

  • Check GPU Compute Capability
  • https://developer.nvidia.com/cuda-gpus
  • My GeForce GTX 1070 is 6.1
  • Run ./configure in the tensorflow directory before the Bazel build; the CUDA and cuDNN versions default to 9.0 and 7
  • For me, every option other than CUDA support is answered “n” or left at its default; CUDA is “y” (a non-interactive sketch appears just before the Bazel build step below)
  • For the CPU optimization flags, run the following; my results are included:
$ sysctl -a | grep machdep.cpu.features

machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 SSE4.2 POPCNT AES PCID

<br>

  • I therefore include the following flags: --copt=-msse4.2 --copt=-mpopcnt --copt=-maes --copt=-mcx16
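
  • As noted above, run ./configure before the Bazel build. The interactive prompts can also be pre-seeded with environment variables; the variable names below are the ones the TF 1.x configure script reads, but verify them against configure.py in your checkout before relying on this sketch:

# Hedged sketch: pre-answer the questions ./configure would otherwise ask.
# configure will still prompt for anything not pre-set here.
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=9.0                 # match the toolkit you installed
export TF_CUDNN_VERSION=7
export TF_CUDA_COMPUTE_CAPABILITIES=6.1    # GTX 1070 (see the table linked above)
export CUDA_TOOLKIT_PATH=/usr/local/cuda
export CUDNN_INSTALL_PATH=/usr/local/cuda
./configure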

<br>

  • Next, compile TensorFlow with Bazel (the wheel itself is created in the following step):
bazel build --config=cuda --config=opt --copt=-msse4.2 --copt=-mpopcnt --copt=-maes --copt=-mcx16 --verbose_failures --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package

<br>

  • Now create the wheel:
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

<br>

  • Now install the wheel:
sudo -H pip3 install /tmp/tensorflow_pkg/tensorflow-1.5.0rc0-cp36-cp36m-macosx_10_13_x86_64.whl

<br>

  • Test TensorFlow
# Python
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
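
  • Beyond the hello-world above, you can ask this build for its GPU device name (tf.test.gpu_device_name() is part of the TF 1.x API; an empty string would indicate a CPU-only build):

python3 -c "import tensorflow as tf; print(tf.test.gpu_device_name())"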

<br>

  • My Output:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.797
pciBusID: 0000:05:00.0
totalMemory: 8.00GiB freeMemory: 3.33GiB
2017-12-23 18:39:24.371910: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:05:00.0, compute capability: 6.1)
>>> print(sess.run(hello))
b'Hello, TensorFlow!'

<br>

  • Switch back to the current Xcode
sudo xcode-select -s /Applications/Xcode.app
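
  • xcode-select -p should now point back at /Applications/Xcode.app/Contents/Developer:

xcode-select -p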

<br>

  • Enjoy!