February 26, 2015

Getting the Power of GPU for Deep Learning

Install EPD
Install bleeding-edge Theano

My MacBook Pro reports its graphics card as Intel HD Graphics 4000 1024 MB. I was very disappointed that Apple would leave only an Intel graphics card in a $2000 laptop; they could definitely do better. So I gave up on CUDA. Recently I found out that there are actually two graphics cards on board!
In a terminal, type:
$ system_profiler SPDisplaysDataType
Graphics/Displays:

    Intel HD Graphics 4000:

      Chipset Model: Intel HD Graphics 4000
      Type: GPU
      Bus: Built-In
      VRAM (Dynamic, Max): 1024 MB
      Vendor: Intel (0x8086)
      Device ID: 0x0166
      Revision ID: 0x0009
      gMux Version: 3.2.19 [3.2.8]
      Displays:
        Color LCD:
          Display Type: Retina LCD
          Resolution: 2880 x 1800 Retina
          Retina: Yes
          Pixel Depth: 32-Bit Color (ARGB8888)
          Main Display: Yes
          Mirror: Off
          Online: Yes
          Built-In: Yes

    NVIDIA GeForce GT 650M:

      Chipset Model: NVIDIA GeForce GT 650M
      Type: GPU
      Bus: PCIe
      PCIe Lane Width: x8
      VRAM (Total): 1024 MB
      Vendor: NVIDIA (0x10de)
      Device ID: 0x0fd5
      Revision ID: 0x00a2
      ROM Revision: 3688
      gMux Version: 3.2.19 [3.2.8]


Sure enough, it looks like I have two, and the NVIDIA GeForce GT 650M is CUDA-enabled as well.
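If you would rather check this from a script than by eye, here is a minimal sketch that pulls the chipset and vendor lines out of the system_profiler text. The function names `list_gpus` and `has_nvidia_gpu` are made up for this example, and the sample string is just a shortened copy of the output above.

```python
import re

# Shortened sample of the `system_profiler SPDisplaysDataType` output above.
SAMPLE = """\
Graphics/Displays:

    Intel HD Graphics 4000:

      Chipset Model: Intel HD Graphics 4000
      Bus: Built-In
      Vendor: Intel (0x8086)

    NVIDIA GeForce GT 650M:

      Chipset Model: NVIDIA GeForce GT 650M
      Bus: PCIe
      Vendor: NVIDIA (0x10de)
"""


def list_gpus(profiler_text):
    """Return (chipset, vendor) pairs found in system_profiler output."""
    chipsets = re.findall(r"^\s*Chipset Model:\s*(.+)$", profiler_text, re.M)
    vendors = re.findall(r"^\s*Vendor:\s*(\S+)", profiler_text, re.M)
    return list(zip(chipsets, vendors))


def has_nvidia_gpu(profiler_text):
    """True if any listed GPU reports NVIDIA as its vendor."""
    return any(vendor == "NVIDIA" for _, vendor in list_gpus(profiler_text))


gpus = list_gpus(SAMPLE)
print(gpus)
print("NVIDIA card present:", has_nvidia_gpu(SAMPLE))
```

On a real Mac you could feed it `subprocess.check_output(["system_profiler", "SPDisplaysDataType"], text=True)` instead of the sample. Note that an NVIDIA vendor line only says a card is there; whether that particular model supports CUDA still has to be checked against NVIDIA's list.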

To enable the GPU for Theano on a Mac, install NVIDIA's CUDA driver. On my machine it is installed in /Developer/NVIDIA/CUDA-6.5.

Follow the instructions here. In summary:
$ vim ~/.theanorc
    [global]
    device = gpu
    floatX = float32
   
    [nvcc]
    fastmath = True
   
    [cuda]
    root=/usr/local/cuda
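
The file is plain INI text, so a quick way to double-check what ended up in it is to read it back with Python's standard configparser. This is only a sketch: `read_theanorc` is a hypothetical helper, and the embedded string is a copy of the settings above rather than the real ~/.theanorc.

```python
import configparser

# The same settings as the ~/.theanorc shown above.
THEANORC = """\
[global]
device = gpu
floatX = float32

[nvcc]
fastmath = True

[cuda]
root=/usr/local/cuda
"""


def read_theanorc(text):
    """Parse .theanorc-style INI text into a plain nested dict."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names such as floatX case-sensitive
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


settings = read_theanorc(THEANORC)
print(settings["global"]["device"], settings["global"]["floatX"])
print(settings["cuda"]["root"])
```

To inspect the real file, pass `open(os.path.expanduser("~/.theanorc")).read()` to the same function.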


To test how fast the GPU is compared with the CPU:
$ vim test_gpu.py
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 300 * 768  # 10 x #cores x # threads per core
iters = 10000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print f.maker.fgraph.toposort()
t0 = time.time()

for i in xrange(iters):
    r = f()

t1 = time.time()
print 'Looping %d times took' % iters, t1 - t0, 'seconds'
print 'Result is', r

if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print 'Used the cpu'
else:
    print 'Used the gpu'


$ mv ~/.theanorc ~/.theanorc1  # ignore theanorc
$ python test_gpu.py
[Elemwise{exp,no_inplace}()]
Looping 10000 times took 216.094866037 seconds
Result is [ 1.23178032  1.61879341  1.52278065 ...,  1.51747798  1.41718145 1.92811885]
Used the cpu


$ mv ~/.theanorc1 ~/.theanorc 
$ python test_gpu.py
Using gpu device 0: GeForce GT 650M
[GpuElemwise{exp,no_inplace}(), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]

Looping 10000 times took 22.806940794 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  1.51747799  1.41718149 1.92811894]
Used the gpu
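
Putting the two timings side by side, the GPU run comes out a bit over nine times faster. A small sketch of the arithmetic, using only the numbers printed above (the `speedup` helper is just for illustration):

```python
# Wall-clock times reported by the two runs of test_gpu.py above.
cpu_seconds = 216.094866037
gpu_seconds = 22.806940794
iters = 10000


def speedup(slow, fast):
    """How many times faster the fast run is than the slow run."""
    return slow / fast


ratio = speedup(cpu_seconds, gpu_seconds)
print("GPU speedup: %.1fx" % ratio)
print("Per call: CPU %.2f ms, GPU %.2f ms" % (
    1000 * cpu_seconds / iters,
    1000 * gpu_seconds / iters,
))
```

This is a single run of one elementwise exp on one laptop, so treat the ratio as a rough indication rather than a benchmark.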


-----------------------------------------------------------------------------------

After setting up the GPU I realized that the convolutional network demo in Theano doesn't work:
       Segmentation fault: 11 theano

TL;DR: THE ISSUE STILL REMAINS
To try to resolve that I had to switch from Python 2.7 to EPD. Check here.

--------------------------

Something messed up my laptop:
I had to remove everything from MacPorts and go the hard way, reconfiguring everything from the ground up. After hours of pondering and fighting with compiler issues, this was the final savior:

https://solarianprogrammer.com/2013/06/11/compiling-gcc-mac-os-x/


Compiling GCC on OS X 

Since the Mac's compiler was corrupted/removed, install Apple's developer tools and point the broken gcc at clang:

  $ cd /usr/bin/
  $ sudo ln -s clang gcc
  $ sudo ln -s clang++ g++

In the end, you revert that and point gcc and g++ at the newly built compiler:

$ sudo mv /usr/bin/gcc /usr/bin/gccCLANG
$ sudo mv /usr/bin/g++ /usr/bin/g++CLANG
$ sudo ln -s /usr/gcc-4.9.2/bin/gcc-4.9.2 /usr/bin/gcc
$ sudo ln -s /usr/gcc-4.9.2/bin/g++-4.9.2 /usr/bin/g++

CUDA looks at /usr/bin/cc, but cc still refers to llvm-gcc-4.2, so:

$ sudo mv /usr/bin/cc /usr/bin/ccLLVM-GCC-4.2
$ sudo ln -s /usr/gcc-4.9.2/bin/gcc-4.9.2 /usr/bin/cc
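
With this much symlink shuffling it is easy to lose track of what cc, gcc and g++ actually point to. Here is a rough sketch for checking; `resolve_tool` and `compiler_report` are names invented for this example, and it only reports what is on the PATH without changing anything.

```python
import os
import shutil


def resolve_tool(name):
    """Return the real file a command resolves to, or None if it is not on PATH."""
    found = shutil.which(name)
    # realpath follows the whole symlink chain, e.g. /usr/bin/cc -> .../gcc-4.9.2
    return os.path.realpath(found) if found else None


def compiler_report(names=("cc", "gcc", "g++")):
    """Map each compiler command to the file it finally resolves to."""
    return {name: resolve_tool(name) for name in names}


for name, target in compiler_report().items():
    print("%-4s -> %s" % (name, target or "not found"))
```

After the steps above, all three would be expected to end up under /usr/gcc-4.9.2/bin; if one of them still resolves elsewhere, that is the link to fix.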

To uninstall CUDA:

$ sudo /usr/local/cuda/bin/uninstall


Posted on June 11, 2013 by Sol

Update 8 November 2014
I’ve updated the tutorial for GCC 4.9.2 and OS X Yosemite.
In this tutorial, I will show you how to compile from source and install the current stable version of GCC with Graphite loop optimizations on your OS X computer. The instructions from this tutorial were tested with Xcode 6.0.1 and Yosemite (OS X 10.10).
Clang, the default compiler for OS X, supports only C, C++ and Objective-C. If you are interested in a modern Fortran compiler, for example, you will need gfortran, which comes with GCC. Another reason to have the latest stable version of GCC on your Mac is that it provides you with an alternative C and C++ compiler. Testing your code with two different compilers is always a good idea.
In order to compile GCC from sources you will need a working C++ compiler. In the remainder of this article I will assume that you have installed the Command Line Tools for Xcode. At the time of this writing Apple’s Command Line Tools map gcc and g++ to clang and clang++.
Let’s start by downloading the last stable version of GCC from the GNU website, so go to: http://gcc.gnu.org/mirrors.html and download gcc-4.9.2.tar.bz2. I’ve saved the archive in my Downloads folder.
We will also need three other libraries for a successful build of gcc: mpc, mpfr and gmp. Use the above links and download the last versions for all of them: gmp-6.0.0a.tar.bz2, mpc-1.0.2.tar.gz and mpfr-3.1.2.tar.bz2, also save them in your Downloads folder.

To enable the Graphite loop optimizations you will need two extra libraries: go to ftp://gcc.gnu.org/pub/gcc/infrastructure/ and download isl-0.12.2.tar.bz2 and cloog-0.18.1.tar.gz.
Extract the above six archives in your Downloads folder and open a Terminal window.
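Before starting the long build it can save time to confirm that all six archives really were downloaded. A minimal sketch follows; the file names are the ones listed above, while `missing_archives` is a hypothetical helper and the demo runs against a throwaway temporary folder rather than your real Downloads folder.

```python
import os
import tempfile

# Archive names as listed in the text above.
ARCHIVES = [
    "gcc-4.9.2.tar.bz2",
    "gmp-6.0.0a.tar.bz2",
    "mpc-1.0.2.tar.gz",
    "mpfr-3.1.2.tar.bz2",
    "isl-0.12.2.tar.bz2",
    "cloog-0.18.1.tar.gz",
]


def missing_archives(folder, names=ARCHIVES):
    """Return the expected archive names that are not present in folder."""
    return [n for n in names if not os.path.isfile(os.path.join(folder, n))]


# Demo: a temporary folder that contains only the gmp archive.
with tempfile.TemporaryDirectory() as demo:
    open(os.path.join(demo, "gmp-6.0.0a.tar.bz2"), "w").close()
    print("Still missing:", missing_archives(demo))
```

For the real check, call `missing_archives(os.path.expanduser("~/Downloads"))`; an empty list means everything is in place.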
We will start by compiling the gmp library:

cd ~
cd Downloads
cd gmp*
Create a new folder named build in which the compiler will save the compiled library:

mkdir build && cd build
And now the fun part … write in your Terminal:

../configure --prefix=/usr/gcc-4.9.2 --enable-cxx
If you see no error message we can actually compile the gmp library:

make -j 4
In a few minutes you will have a compiled gmp library. If you see no error message … congratulations, we are ready to install the library in the /usr/gcc-4.9.2 folder (you will need the administrator password for this):

sudo make install
We will do the same steps for MPFR now:

cd ..
cd ..
cd mpfr*
mkdir build && cd build
Configuration phase:

../configure --prefix=/usr/gcc-4.9.2 --with-gmp=/usr/gcc-4.9.2
The second parameter will just inform the configure app that gmp is already installed in /usr/gcc-4.9.2.
After the configure phase is finished, we can make and install the library:

make -j 4
sudo make install
Now, we are going to build MPC:

cd ..
cd ..
cd mpc*
mkdir build && cd build
../configure --prefix=/usr/gcc-4.9.2 --with-gmp=/usr/gcc-4.9.2 --with-mpfr=/usr/gcc-4.9.2
make -j 4
sudo make install
At this point you should have finished building and installing the necessary prerequisites for GCC.
Next step is to build the libraries for the Graphite loop optimizations:

cd ..
cd ..
cd isl*
mkdir build && cd build
../configure --prefix=/usr/gcc-4.9.2 --with-gmp-prefix=/usr/gcc-4.9.2
make -j 4
sudo make install

cd ..
cd ..
cd cloog*
mkdir build && cd build
../configure --prefix=/usr/gcc-4.9.2 --with-gmp-prefix=/usr/gcc-4.9.2 --with-isl-prefix=/usr/gcc-4.9.2
make -j 4
sudo make install
We are ready to compile GCC now. Be prepared that this could take more than one hour on some machines … Since I’m interested only in the C, C++ and Fortran compilers, this is the configure command I’ve used on my machine:

cd ..
cd ..
cd gcc*
mkdir build && cd build
../configure --prefix=/usr/gcc-4.9.2 --enable-checking=release --with-gmp=/usr/gcc-4.9.2 --with-mpfr=/usr/gcc-4.9.2 --with-mpc=/usr/gcc-4.9.2 --enable-languages=c,c++,fortran --with-isl=/usr/gcc-4.9.2 --with-cloog=/usr/gcc-4.9.2 --program-suffix=-4.9.2
The above command tells the configure script where we have installed gmp, mpfr, mpc, isl and cloog; it also tells it to add a suffix to all the resulting executables, so, for example, to invoke GCC 4.9.2 you will write gcc-4.9.2, while the plain gcc command will still invoke Apple’s version of clang.
If you are interested in building more of the compilers available in the GCC collection, modify the --enable-languages configure option.
And now, the final touches:

make -j 4
Grab a coffee, maybe a book, and wait … this should take approximately, depending on your computer configuration, an hour … or more … and about 2GB of your disk space for the build folder.
Install the compiled gcc in /usr/gcc-4.9.2:

sudo make install
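The whole sequence follows one pattern: configure with the shared prefix, make, then install, in dependency order. As a recap, here is a sketch that writes the order and flags down as data and prints the configure line for each step. The flags are copied from the commands above; `BUILD_PLAN` and `configure_command` are names made up for this example, and nothing is actually executed.

```python
PREFIX = "/usr/gcc-4.9.2"

# (source directory pattern, extra configure flags), in dependency order.
BUILD_PLAN = [
    ("gmp*", ["--enable-cxx"]),
    ("mpfr*", ["--with-gmp=" + PREFIX]),
    ("mpc*", ["--with-gmp=" + PREFIX, "--with-mpfr=" + PREFIX]),
    ("isl*", ["--with-gmp-prefix=" + PREFIX]),
    ("cloog*", ["--with-gmp-prefix=" + PREFIX, "--with-isl-prefix=" + PREFIX]),
    ("gcc*", [
        "--enable-checking=release",
        "--with-gmp=" + PREFIX,
        "--with-mpfr=" + PREFIX,
        "--with-mpc=" + PREFIX,
        "--enable-languages=c,c++,fortran",
        "--with-isl=" + PREFIX,
        "--with-cloog=" + PREFIX,
        "--program-suffix=-4.9.2",
    ]),
]


def configure_command(flags, prefix=PREFIX):
    """Build the ../configure line used inside each package's build folder."""
    return " ".join(["../configure", "--prefix=" + prefix] + flags)


for source_dir, flags in BUILD_PLAN:
    print("cd %s && mkdir build && cd build" % source_dir)
    print(configure_command(flags))
    print("make -j 4 && sudo make install")
    print()
```

Keeping the order explicit matters because each later package is configured against the libraries already installed under the prefix.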
Now, you can keep the new compiler completely isolated from your Apple’s gcc compiler and, when you need to use it, just modify your path by writing in Terminal:

export PATH=/usr/gcc-4.9.2/bin:$PATH
If you want to avoid writing the above command each time you open a Terminal, save the above command in the file .bash_profile from your Home folder.
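If you drive the compiler from a script instead of an interactive shell, the same idea is to put the new bin folder at the front of PATH in the environment you hand to the child process. A minimal sketch, where `env_with_gcc` is a hypothetical helper and the install path is the one used in this tutorial:

```python
import os


def env_with_gcc(base_env=None, gcc_bin="/usr/gcc-4.9.2/bin"):
    """Return a copy of the environment with gcc_bin placed first on PATH."""
    env = dict(os.environ if base_env is None else base_env)
    # Drop empty entries and any earlier copy of gcc_bin to avoid duplicates.
    parts = [p for p in env.get("PATH", "").split(os.pathsep)
             if p and p != gcc_bin]
    env["PATH"] = os.pathsep.join([gcc_bin] + parts)
    return env


demo = env_with_gcc({"PATH": os.pathsep.join(["/usr/bin", "/bin"])})
print(demo["PATH"])
```

The returned dict can be passed as `env=` to `subprocess.run`, which leaves the PATH of your own shell untouched.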
You should now be able to invoke any of the newly compiled compilers (C, C++, Fortran …). Invoking g++ is as simple as writing in your Terminal:

g++-4.9.2 test.cpp -o test
Remember to erase your build directories from Downloads if you want to recover some space.
Let’s check if g++-4.9.2 can compile some C++11 specifics. In your favorite text editor, copy this test program and save it as tst_lambda.cpp (I’ll assume you will save the file in your Home directory):

//Program to test the new C++11 lambda syntax and initializer lists
#include <iostream>
#include <vector>

using namespace std;

int main()
{
  // Test lambda
  cout << [](int m, int n) { return m + n;} (2,4) << endl;

  // Test initializer lists and range based for loop
  vector<int> V({1,2,3});

  cout << "V =" << endl;
  for(auto e : V) {
    cout << e << endl;
  }

  return 0;
}
Compiling and running the above lambda example:

g++-4.9.2 -std=c++11 tst_lambda.cpp -o tst_lambda
./tst_lambda
6
V =
1
2
3
We could also compile a C++ code that uses the new thread header from C++11:

//Create a C++11 thread from the main program

#include <iostream>
#include <thread>

//This function will be called from a thread
void call_from_thread() {
    std::cout << "Hello, World!" << std::endl;
}

int main() {
    //Launch a thread
    std::thread t1(call_from_thread);

    //Join the thread with the main thread
    t1.join();

    return 0;
}
GCC 4.9.2, finally, implements the C++11 regex header. Next, we present a simple C++11 code that uses regular expressions to check if the input read from stdin is a floating point number:

//Uses a regex to check if the input is a floating point number

#include <iostream>
#include <regex>
#include <string>

using namespace std;

int main()
{
  string input;
  regex rr("((\\+|-)?[[:digit:]]+)(\\.(([[:digit:]]+)?))?((e|E)((\\+|-)?)[[:digit:]]+)?");
  //As long as the input is correct ask for another number
  while(true)
  {
    cout<<"Give me a real number!"<<endl;
    cin>>input;
    if(!cin) break;
    //Exit when the user inputs q
    if(input=="q")
      break;
    if(regex_match(input,rr))
      cout<<"float"<<endl;
    else
    {
      cout<<"Invalid input"<<endl;
    }
  }
}
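To see which inputs this pattern accepts without compiling anything, here is a rough Python equivalent. One substitution to be aware of: Python's re module does not support the POSIX class [[:digit:]], so it is replaced with [0-9] here; `looks_like_float` is a name invented for this sketch, and `fullmatch` plays the role of regex_match (the whole string must match).

```python
import re

# Same structure as the C++ pattern, with [[:digit:]] written as [0-9].
FLOAT_RE = re.compile(
    r"((\+|-)?[0-9]+)(\.(([0-9]+)?))?((e|E)((\+|-)?)[0-9]+)?"
)


def looks_like_float(text):
    """True if the whole string matches the floating point pattern."""
    return FLOAT_RE.fullmatch(text) is not None


for sample in ["3.14", "-2", "1e10", "6.02E+23", "7.", ".5", "abc", "1e"]:
    print(repr(sample), looks_like_float(sample))
```

The pattern requires at least one digit before the decimal point, so an input such as .5 is rejected while 7. is accepted.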
If you are a Fortran programmer, you can use some of the Fortran 2008 features like do concurrent with gfortran-4.9.2:

integer,parameter::mm=100000
real::a(mm), b(mm)
real::fact=0.5

! initialize the arrays
! ...

do concurrent (i = 1 : mm)
 a(i) = a(i) + b(i)
enddo

end
The above code can be compiled with (assuming you’ve named it tst_concurrent_do.f90):

gfortran-4.9.2 tst_concurrent_do.f90 -o tst_concurrent_do
./tst_concurrent_do
If you are interested in learning more about the new C++11 syntax I would recommend reading The C++ Programming Language by Bjarne Stroustrup, or Professional C++ (2nd edition) by M. Gregoire, N. A. Solter and S. J. Kleper.

If you need to brush up on your Fortran knowledge, a good book is Modern Fortran Explained by M. Metcalf, J. Reid and M. Cohen.