## Monday, November 8, 2010

### Restricted Boltzmann Machine on CUDA with Python

As promised, my group recently published our Restricted Boltzmann Machine implementation.
It is based upon the CUV Library that is being developed here.
The idea is to combine the ease of programming of Python with the computing power of the GPU.

We have used this implementation for several papers, and it has grown a lot over time.
Here is a list of most of the features:
- Restricted Boltzmann Machine training
  - with n-step Contrastive Divergence
  - with persistent Contrastive Divergence
  - weight decay, momentum, batch learning
  - binary or Gaussian visible nodes
- Restricted Boltzmann Machine evaluation
  - sampling from the model
  - visualizing filters
  - Annealed Importance Sampling for approximating the partition function
  - calculating the partition function exactly
  - visualization and saving of hidden representations
- Stacking RBMs to Deep Belief Networks
  - sampling from DBNs
- Deep Boltzmann Machine training
  - with n-step Contrastive Divergence
  - with persistent Contrastive Divergence
- Deep Boltzmann Machine evaluation
  - sampling from the model
- Neural network training
  - backpropagation of error
  - RPROP
  - weight decay, momentum, batch learning
  - variable number of layers
  - cross-entropy training
- Finetuning
  - initializing a neural network with an RBM or DBM
  - all of the above functionality can be used
- Training on image data
  - visualization of input, filters, and samples from the model
  - on-the-fly modification of the training set via Gaussian noise or translations
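To give a feel for what the n-step Contrastive Divergence items above refer to, here is a minimal NumPy sketch of CD-1 for a binary-binary RBM. This is an illustration of the algorithm, not the CUV API; all function and variable names here are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, b_vis, b_hid, v0, lr=0.1):
    """One CD-1 update for a binary-binary RBM on a mini-batch v0."""
    # positive phase: hidden probabilities given the data
    ph0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden states
    # negative phase: one step of Gibbs sampling
    pv1 = sigmoid(h0 @ W.T + b_vis)                   # reconstruct visibles
    ph1 = sigmoid(pv1 @ W + b_hid)
    # gradient approximation: <v h>_data - <v h>_model
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b_vis += lr * (v0 - pv1).mean(axis=0)
    b_hid += lr * (ph0 - ph1).mean(axis=0)
    return W, b_vis, b_hid

# toy usage: 6 visible units, 4 hidden units, a batch of 8 binary vectors
W = 0.01 * rng.standard_normal((6, 4))
b_vis = np.zeros(6)
b_hid = np.zeros(4)
v0 = (rng.random((8, 6)) < 0.5).astype(float)
W, b_vis, b_hid = cd1_step(W, b_vis, b_hid, v0)
```

n-step CD simply repeats the Gibbs step n times before computing the negative statistics; persistent CD keeps the negative-phase chain alive across mini-batches instead of restarting it at the data.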

I tried to cover most use cases in the README. Comments are very welcome.

Enjoy!

1. Many thanks! I have read a lot about DBNs, and now I can try it.

2. Hope it works out for you! If you have any problems installing the CUV library, we are glad to help :)

3. Does this depend on PyUblasExt? Because that library is being a huge pain to build.

-Brian

4. It depends only on PyUblas, which you have to build yourself. But that shouldn't be too hard.

5. Tweak,

A few more questions for you:

- Supposing PyUblasExt were working, would CUV take advantage of it?
- Does CUV take advantage of CULA (CUDA-enhanced LAPACK) if it's installed?

-Brian

6. I'm running into a build issue during the build for the target test_conv_op in (...)/src/tests/. I can't get make to spit out the command line, so I used strace to spit it out:

[pid 14201] execve("/usr/bin/c++", ["/usr/bin/c++", "-fPIC", "-O3", "-DNDEBUG", "-I/usr/lib64/python2.6/site-packages/PyUblas-0.93.1-py2.6-linux-x86_64.egg/include/", "-I/usr/lib64/python2.6/site-packages/numpy/core/include", "-fPIC", "CMakeFiles/test_conv_op.dir/conv_op.cpp.o", "-o", "test_conv_op", "-rdynamic", "../basics/libcuv_basics.a", "../tools/libcuv_tools.a", "../convert/libcuv_convert.a", "../vector_ops/libcuv_vector_ops.a", "../matrix_ops/libcuv_matrix_ops.a", "../convolution_ops/libcuv_convolution_ops.a", "../random/libcuv_random.a", "/opt/cuda/lib64/libcublas.so", "../convert/libcuv_convert.a", "../basics/libcuv_basics.a", "-lboost_serialization-mt-1_41", "../matrix_ops/libcuv_matrix_ops.a", "../vector_ops/libcuv_vector_ops.a", "/opt/cuda/lib64/libcublas.so", "../3rd_party/CudaConv/libcuda_conv.a", "/usr/lib64/blas/threaded-atlas/libblas.so", "/opt/cuda/lib64/libcudart.so", "-lcuda", "/opt/cuda/lib64/libcudart.so", "-lcuda", "-Wl,-rpath,/opt/cuda/lib64:/usr/lib64/blas/threaded-atlas"(...)

The (truncated) output from the make command is:

../matrix_ops/libcuv_matrix_ops.a(cuv_matrix_ops_generated_matrix_ops.cu.o): In function void cuv::prod(...):

There are 7 'undefined reference' errors being generated for various cblas_XXXX functions. I ran a little one-liner to see which libs might have those symbols (in bash):

for i in $(find /usr/lib64/ -type f -a \( -name '*.so' -o -name '*.o' -o -name '*.a' \)); do nm -o -C $i 2>/dev/null | grep -E '[^U] *(cblas_saxpy|cblas_sscal|cblas_sgemm)' 2>/dev/null; done

... which spits out:

/usr/lib64/blas/atlas/libcblas.a:cblas_saxpy.o:0000000000000000 T cblas_saxpy
/usr/lib64/blas/atlas/libcblas.a:cblas_sgemm.o:0000000000000000 T cblas_sgemm
/usr/lib64/blas/atlas/libcblas.a:cblas_sscal.o:0000000000000000 T cblas_sscal
/usr/lib64/blas/threaded-atlas/libcblas.a:cblas_sptaxpy.o:0000000000000000 T cblas_saxpy
/usr/lib64/blas/threaded-atlas/libcblas.a:cblas_sptgemm.o:0000000000000000 T cblas_sgemm
/usr/lib64/blas/threaded-atlas/libcblas.a:cblas_sptscal.o:0000000000000000 T cblas_sscal

I looked for libs in a few other locations (e.g. /opt/cuda), but those were the only ones. None of the .so's had symbols nm could dump, so there are probably some shared objects that satisfy it, but I wasn't able to find them. So I changed BLAS_blas_LIBRARY and BLAS_cblas_LIBRARY to use the .a's it found, as well as the shared objects by the same name, but as you can see from the first (admittedly difficult to visually parse) paste above, it doesn't attempt to link against the cblas libraries at all, only the blas libraries -- and it's the cblas libraries that export those symbols. So ... I'm not sure what else to do; I'm hoping you might have a good idea or two.

-Brian

7. Hey Brian. To your first question: no and no. We haven't actually looked into PyUblasExt yet, and we don't have access to CULA, though we might get it in the near future. One problem with CULA is that it is closed source, so we would have to make a CULA and a non-CULA version of CUV so that everyone could use it. Maybe we will rather use MAGMA, which is open source. We will definitely have to look into that at some point, since we can't do eigenvectors or quadratic programming at the moment, which limits CUV quite a lot.

About your second question: I'll think about it and hopefully give a solution later on ;)

Cheers, Andy

8. So Brian... The first thing I can think of is that you have something weird in your LD_LIBRARY_PATH. Can you tell me what the BLAS_*_LIBRARY variables are set to in cmake? They should be at the very top of your CMakeCache.txt, and they should point to something like /usr/lib/libblas.so. Maybe we can take this to email, which is probably easier. You can find my mail address at http://www.ais.uni-bonn.de/~amueller/.

Cheers, Andy

9. The issue ended up being a combination of things:

- I'm using ATLAS to auto-tune the LAPACK install, which gets built with gcc.
- I'm guessing your build of LAPACK was built with g++ or something similar, as your code expects C++-style linkage, whereas my libs had C-style linkage.
- It doesn't appear that some of the code using things like cblas_sgemm is linking against cblas. For example, the stuff in build/release/src/tests fails to build, but if I manually add a "-lcblas" flag it builds fine, once the first two problems are dealt with.

I changed src/matrix_ops/matrix_ops.cu and src/3rd_party/CudaConv/matrix.h to wrap the inclusion of cblas.h in an 'extern "C" {}' block, and that seems to have fixed the symbol-mismatch issues; as I said, adding the -lcblas flag resolves the other problem.

So, I have one more question now: Hannes said CUV doesn't link against cblas, and your stuff works fine. Is the disparity because of ATLAS, or am I still doing something wrong?

-Brian

10. As another aside to anyone else tinkering with this: be sure to set CUDA_TEST_DEVICE to a proper value, otherwise the ctest step will fail. I won't try to speak to the general case, but I have one card installed and its device # was 0. The default after the initial configure step is 3. After building, if you run ctest and it fails, look at (builddir)/Testing/Temporary/LastTest.log. The error indicating that you have the wrong device # will be "Invalid argument" on a call to 'cudaSafeCall'.

-Brian

11. Hi Brian. Actually, I am not sure about the linking against cblas. I grepped the verbose make log and there was no -lcblas in there. We have a pretty standard Ubuntu here with all BLAS packages straight from the standard repo. I'll ask Hannes about it again. And thanks for the comment on the default test device. We have meant to change that for quite some time but seem to have forgotten it. I'll set the default to 0 right now ;)

Andy

12. Andy,

Try this (in bash):

for i in $(find ::list all paths to search:: -type f -a \( -name '*.so*' -o -name '*.a' -o -name '*.o' \)); do nm -CDo $i 2>/dev/null | grep cblas_sgemm; done

Replace the :: :: with a space-separated list of paths where shared objects and static libraries can be found. 'find' will only find files, not links, unless you specify a parameter that changes the default behavior; that's why I put *.so*.

This will tell you what .so exports the symbol. If I had to put money on it, I'd guess your libblas.so has all of it, whereas mine has the cblas stuff split out in a separate .so

-Brian
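As an aside, the same kind of check can be done from Python for shared objects. The sketch below is a hypothetical helper, not part of CUV; it only works for .so files that the dynamic loader can open, not for static .a archives like the ones listed above (for those you still need nm).

```python
import ctypes
import ctypes.util

def exports_symbol(library, symbol):
    """Return True if the shared library exports the given symbol."""
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        return False  # library not found or not loadable
    # attribute lookup on a CDLL raises AttributeError (so hasattr
    # returns False) when the symbol is not exported
    return hasattr(lib, symbol)

# example: the math library exports cos, but no cblas symbols
libm = ctypes.util.find_library("m") or "libm.so.6"
print(exports_symbol(libm, "cos"))
```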

13. I've run into one other issue, but it appears to be an issue with the card we have.

(builddir)/src/tests/test_random times out and generates a SIGSEGV that terminates the test early.

(builddir)/src/tests/test_conv_op requests too many resources and bails as well.

So, moral of the story: if you have an 8800 GTS or a similarly outdated card, you'll have to get a new one.

You can test by running:

(cuda_dir)/sdk/C/bin/linux/release/deviceQuery

There are a few key things to look at in the output:

CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 0
Concurrent copy and execution: No
Run time limit on kernels: Yes

The first two lines indicate features your card is capable of. Mine is one of the least capable, based on those two lines. The 3rd is crappy because it means you can't be uploading data to the card while any kernels are running.

The 4th means my card is being used as a display device, so the card automatically places limits on anything running. If a kernel runs too long it bails (hence test_random failed).

-Brian

14. About that: the "too many resources" error will occur in random and conv on anything up to a 9400 GT. A 9800 should be fine.
It's trivial to fix the "too many resources" in random, but probably impossible in conv without major rewriting.
Your problem with random is new, but we never had a card that old, so we can't reproduce it.
Ok so the readme needs a "supported GPUs" section... I see ;)

15. By the way: you can still run everything that does not need random numbers or convolutions. So you can use the mlp, knn, and whatever else you want to write (although you would have to initialize the weights of the mlp in numpy, but that shouldn't really slow things down).

16. Well, the weights could probably be initialized in R or something. Or just switch it to use non-GPU code.

17. Well I think numpy.random.uniform is easier than importing R ;).
Something like
cuv_python.push(numpy.random.uniform(0,1,shape))
creates a device matrix of the specified shape...
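For completeness, here is a minimal numpy sketch of the CPU-side weight initialization suggested above. The layer sizes and the 1/sqrt(fan_in) scale are made-up illustrations, not CUV defaults; only the final push to the device, shown in the comment above, would be CUV-specific.

```python
import numpy as np

rng = np.random.default_rng(42)

# layer sizes for a small MLP: 784 inputs, 256 hidden units, 10 outputs
sizes = [784, 256, 10]

# one weight matrix per layer pair; the 1/sqrt(fan_in) scale is a
# common heuristic, chosen here for illustration
weights = [rng.uniform(-1.0 / np.sqrt(n_in), 1.0 / np.sqrt(n_in),
                       (n_in, n_out))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]

# each matrix could then be moved to the GPU, e.g. with cuv_python.push(w)
for w in weights:
    print(w.shape)
```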

18. Oh, I was thinking more along the lines of using R from Python, but you're right of course :)

19. Is it possible to run this without a GPU? If so, how do I set it up to run on the CPU only? I don't have any NVIDIA GPU (graphics card).

20. Hi vinay.
The RBM is based on the CUV library, as explained above. It is possible to run the CUV library without CUDA, and by now it should be pretty pain-free.
The hardest part is probably compiling CUV without CUDA, but it should be possible to configure this using cmake now.
Just give it a try and get back to me if you run into problems.

21. I can't compile it, it says
"
CMake Error at /usr/share/cmake-2.8/Modules/FindBLAS.cmake:457 (message):
location.
"

Yet I installed libblas (on Ubuntu 10.04)

My CMakeCache.txt is:

//Path to a library.
BLAS_Accelerate_LIBRARY:FILEPATH=BLAS_Accelerate_LIBRARY-NOTFOUND

//Path to a library.
BLAS_acml_LIBRARY:FILEPATH=BLAS_acml_LIBRARY-NOTFOUND

//Path to a library.
BLAS_atlas_LIBRARY:FILEPATH=/usr/lib/libatlas.so

//Path to a library.
BLAS_blas_LIBRARY:FILEPATH=/usr/lib/libblas.so

//Path to a library.
BLAS_cblas_LIBRARY:FILEPATH=/usr/lib/libcblas.so

//Path to a library.
BLAS_complib.sgimath_LIBRARY:FILEPATH=BLAS_complib.sgimath_LIBRARY-NOTFOUND

//Path to a library.
BLAS_cxml_LIBRARY:FILEPATH=BLAS_cxml_LIBRARY-NOTFOUND

//Path to a library.
BLAS_dxml_LIBRARY:FILEPATH=BLAS_dxml_LIBRARY-NOTFOUND

//Path to a library.
BLAS_essl_LIBRARY:FILEPATH=BLAS_essl_LIBRARY-NOTFOUND

//Path to a library.
BLAS_f77blas_LIBRARY:FILEPATH=/usr/lib/libf77blas.so

//Path to a library.
BLAS_mkl_LIBRARY:FILEPATH=BLAS_mkl_LIBRARY-NOTFOUND

//Path to a library.
BLAS_mkl_em64t_LIBRARY:FILEPATH=BLAS_mkl_em64t_LIBRARY-NOTFOUND

//Path to a library.
BLAS_mkl_ia32_LIBRARY:FILEPATH=BLAS_mkl_ia32_LIBRARY-NOTFOUND

//Path to a library.
BLAS_mkl_intel_LIBRARY:FILEPATH=BLAS_mkl_intel_LIBRARY-NOTFOUND

//Path to a library.
BLAS_mkl_intel_lp64_LIBRARY:FILEPATH=BLAS_mkl_intel_lp64_LIBRARY-NOTFOUND

//Path to a library.
BLAS_scsl_LIBRARY:FILEPATH=BLAS_scsl_LIBRARY-NOTFOUND

//Path to a library.
BLAS_sgemm_LIBRARY:FILEPATH=BLAS_sgemm_LIBRARY-NOTFOUND

//Path to a library.
BLAS_sunperf_LIBRARY:FILEPATH=BLAS_sunperf_LIBRARY-NOTFOUND

//Path to a library.
BLAS_vecLib_LIBRARY:FILEPATH=BLAS_vecLib_LIBRARY-NOTFOUND

22. Well, uninstalling ATLAS solved a large proportion of my issues.

23. Also libcuv_tools was having undefined symbols on make. I had to update my NVIDEA driver beyond Ubuntu 10.04 default in order to get the thing working with CUDA toolkit 4.1