Restricted Boltzmann Machine on CUDA with Python

November 08, 2010

As promised, my group recently published our Restricted Boltzmann Machine implementation.
It is based upon the CUV Library that is being developed here.
The idea is to combine the ease of programming of Python with the computing power of the GPU.

We used this implementation for several papers and it grew a lot over time.
Here is a list of most of the features:

Restricted Boltzmann Machine Training

With n-step Contrastive Divergence
With persistent Contrastive Divergence
Weight decay, momentum, batch-learning
Binary or Gaussian visible nodes
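
To make the training options above concrete, here is a minimal NumPy sketch of one n-step Contrastive Divergence update for a binary RBM. It is not the CUV-based implementation: the function name, argument names and learning rate are assumptions chosen for illustration, and weight decay and momentum are left out.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_update(v0, W, b_vis, b_hid, rng, k=1, lr=0.1):
    """One CD-k update for a binary RBM on a batch v0 (batch x n_vis).

    Hypothetical helper for illustration; W has shape (n_vis, n_hid).
    """
    # Positive phase: hidden probabilities given the data.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h = (rng.random(h0_prob.shape) < h0_prob).astype(v0.dtype)

    # Negative phase: k steps of alternating Gibbs sampling.
    for _ in range(k):
        v_prob = sigmoid(h @ W.T + b_vis)
        v = (rng.random(v_prob.shape) < v_prob).astype(v0.dtype)
        h_prob = sigmoid(v @ W + b_hid)
        h = (rng.random(h_prob.shape) < h_prob).astype(v0.dtype)

    # Gradient estimate: data statistics minus model statistics.
    n = v0.shape[0]
    W_new = W + lr * (v0.T @ h0_prob - v.T @ h_prob) / n
    b_vis_new = b_vis + lr * (v0 - v).mean(axis=0)
    b_hid_new = b_hid + lr * (h0_prob - h_prob).mean(axis=0)
    return W_new, b_vis_new, b_hid_new
```

Persistent Contrastive Divergence differs mainly in where the negative chain starts: instead of restarting from the data in every update, the last sampled state is kept and reused in the next update.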

Restricted Boltzmann Machine Evaluation

Sampling from the model
Visualizing Filters
Annealed Importance Sampling for approximating the partition function
Calculating the partition function exactly
Visualization and saving of hidden representations
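
As an illustration of what calculating the partition function exactly involves, the following NumPy sketch enumerates all hidden states of a small binary RBM and sums out the visible units analytically. It is only feasible for a handful of hidden units; the function name and the energy convention E(v, h) = -b_vis·v - b_hid·h - v·W·h are assumptions, not taken from the published code.

```python
import itertools
import numpy as np

def log_partition_exact(W, b_vis, b_hid):
    """Exact log Z of a binary RBM with W of shape (n_vis, n_hid).

    Hypothetical helper: cost grows as 2 ** n_hid.
    """
    n_hid = W.shape[1]
    log_terms = []
    for bits in itertools.product([0.0, 1.0], repeat=n_hid):
        h = np.array(bits)
        # For a fixed h, each visible unit contributes log(1 + exp(b_i + (W h)_i)).
        visible_part = np.sum(np.logaddexp(0.0, b_vis + W @ h))
        log_terms.append(b_hid @ h + visible_part)
    # Sum over all hidden configurations in log space.
    return np.logaddexp.reduce(log_terms)
```

With all weights and biases at zero, every joint state has the same unnormalized probability, so log Z is (n_vis + n_hid) * log(2). Annealed Importance Sampling is what one would use once this enumeration becomes too expensive.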

Stacking RBMs to Deep Belief Networks

Sampling from DBNs

Deep Boltzmann Machine Training

With n-step Contrastive Divergence
With persistent Contrastive Divergence

Deep Boltzmann Machine Evaluation

Sampling from the model

Neural Network Training

Backpropagation of error
RPROP
Weight decay, momentum, batch-learning
Variable number of layers
Cross entropy training
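
For readers unfamiliar with RPROP, here is a small NumPy sketch of the update rule in its iRprop- variant, which only uses the sign of the gradient and adapts a separate step size per weight. The function name and the default constants are assumptions for illustration and may differ from what the library uses.

```python
import numpy as np

def rprop_step(grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One iRprop- update. Returns (delta, grad_to_remember, new_step)."""
    sign_change = grad * prev_grad
    # Same sign as last time: grow the step size.
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    # Sign flipped: we jumped over a minimum, so shrink the step size.
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    # iRprop-: after a sign flip, skip the update for that weight.
    grad = np.where(sign_change < 0, 0.0, grad)
    delta = -np.sign(grad) * step
    return delta, grad, step
```

A typical loop computes the full-batch gradient, calls rprop_step, adds delta to the weights, and passes the returned gradient and step sizes into the next iteration.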

Finetuning

Initializing a Neural Network with an RBM and DBM
All of the above functionality can be used

Training on Image Data

Visualization of input, filters and samples from the model
On-the-fly modifications to the training set via Gaussian noise or translations
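
The on-the-fly modifications can be pictured with the following NumPy sketch, which adds Gaussian noise and a small random translation to each image in a batch. The function name and default values are assumptions, and the translation wraps around at the image border for simplicity, which may not match the behavior of the published code.

```python
import numpy as np

def augment_batch(images, rng, noise_std=0.05, max_shift=2):
    """Return a noisy, randomly translated copy of images (batch x height x width)."""
    out = np.empty_like(images, dtype=np.float64)
    for i, img in enumerate(images):
        # Draw an independent shift in y and x for every image.
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        shifted = np.roll(img, (dy, dx), axis=(0, 1))  # wrap-around translation
        out[i] = shifted + rng.normal(0.0, noise_std, size=img.shape)
    return out
```

Calling this once per mini-batch means the model never sees exactly the same input twice, without storing an enlarged training set.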

I tried to cover most use cases in the readme. Comments are very welcome.

Enjoy!

Comments

Unknown, November 9, 2010 at 9:49 AM
Many thanks! Read a lot about DBN's, now I can try it.
Andreas Mueller, November 9, 2010 at 10:50 AM
Hope it works out for you! If you have any problems installing the CUV library, we are glad to help :)
Unknown, December 15, 2010 at 12:01 PM
Does this depend on PyUblasExt? Because that library is being a huge pain to build.

-Brian
Andreas Mueller, December 15, 2010 at 1:53 PM
It depends only on PyUblas, which you have to build yourself. But that shouldn't be too hard.
Unknown, December 17, 2010 at 9:05 AM
Tweak,

A few more questions for you:

:- Supposing PyUblasExt were working, would CUV take advantage of it?
:- Does CUV take advantage of CULA (CUDA-enhanced LAPACK) if it's installed?

-Brian
Unknown, December 17, 2010 at 10:34 AM
I'm running into a build issue during the build for the target test_conv_op in (...)/src/tests/. I can't get make to spit out the command line, so I used strace to spit it out:

[pid 14201] execve("/usr/bin/c++", ["/usr/bin/c++", "-fPIC", "-O3", "-DNDEBUG", "-I/usr/lib64/python2.6/site-packages/PyUblas-0.93.1-py2.6-linux-x86_64.egg/include/", "-I/usr/lib64/python2.6/site-packages/numpy/core/include", "-fPIC", "CMakeFiles/test_conv_op.dir/conv_op.cpp.o", "-o", "test_conv_op", "-rdynamic", "../basics/libcuv_basics.a", "../tools/libcuv_tools.a", "../convert/libcuv_convert.a", "../vector_ops/libcuv_vector_ops.a", "../matrix_ops/libcuv_matrix_ops.a", "../convolution_ops/libcuv_convolution_ops.a", "../random/libcuv_random.a", "/opt/cuda/lib64/libcublas.so", "../convert/libcuv_convert.a", "../basics/libcuv_basics.a", "-lboost_serialization-mt-1_41", "../matrix_ops/libcuv_matrix_ops.a", "../vector_ops/libcuv_vector_ops.a", "/opt/cuda/lib64/libcublas.so", "../3rd_party/CudaConv/libcuda_conv.a", "/usr/lib64/blas/threaded-atlas/libblas.so", "/opt/cuda/lib64/libcudart.so", "-lcuda", "/opt/cuda/lib64/libcudart.so", "-lcuda", "-Wl,-rpath,/opt/cuda/lib64:/usr/lib64/blas/threaded-atlas"(...)

The (truncated) output from the make command is:

../matrix_ops/libcuv_matrix_ops.a(cuv_matrix_ops_generated_matrix_ops.cu.o): In function `void cuv::prod(...):
tmpxft_00000a1c_00000000-1_matrix_ops.cudafe1.cpp:(.text+0x25ad): undefined reference to `cblas_sgemm(...)

There are 7 'undefined reference' errors being generated for various cblas_XXXX functions. I ran a little one-liner to see what libs might have those symbols (in bash):

for i in `find /usr/lib64/ -type f -a \( -name '*.so' -o -name '*.o' -o -name '*.a' \)`; do nm -o -C $i 2>/dev/null | grep -E '[^U] *(cblas_saxpy|cblas_sscal|cblas_sgemm)' 2>/dev/null; done;

... which spits out:

/usr/lib64/blas/atlas/libcblas.a:cblas_saxpy.o:0000000000000000 T cblas_saxpy
/usr/lib64/blas/atlas/libcblas.a:cblas_sgemm.o:0000000000000000 T cblas_sgemm
/usr/lib64/blas/atlas/libcblas.a:cblas_sscal.o:0000000000000000 T cblas_sscal
/usr/lib64/blas/threaded-atlas/libcblas.a:cblas_sptaxpy.o:0000000000000000 T cblas_saxpy
/usr/lib64/blas/threaded-atlas/libcblas.a:cblas_sptgemm.o:0000000000000000 T cblas_sgemm
/usr/lib64/blas/threaded-atlas/libcblas.a:cblas_sptscal.o:0000000000000000 T cblas_sscal

I looked for libs in a few other locations (eg, /opt/cuda, etc) but those were the only ones. None of the .so's had symbols nm could dump, so there's probably some shared objects that satisfy it, but I wasn't able to find them.

So, I changed BLAS_blas_LIBRARY and BLAS_cblas_LIBRARY to use the .a's it found, as well as the shared objects by the same name, but as you can see from the first (admittedly difficult to visually parse) paste above it doesn't attempt to link to the cblas libraries at all, only the blas libraries -- and it's the cblas libraries that export it.

Sooo ... I'm not sure what else to do, I'm hoping you might have a good idea or two.

-Brian
Andreas Mueller, December 17, 2010 at 10:49 AM
Hey Brian.
To your first question: No and no. We haven't actually looked into PyUblasExt yet.
And we don't have access to CULA, but we might get it in the near future. One problem with CULA is that it is closed source, so we would have to maintain both a CULA and a non-CULA version of CUV so that everyone could use it.
We may use Magma instead, which is open source. We will definitely have to look into that at some point, since we can't do eigenvectors and quadratic programming at the moment, which limits CUV quite a lot.
About your second question: I'll think about it and hopefully give a solution later on ;)
Cheers,
Andy
Andreas Mueller, December 17, 2010 at 11:01 AM
So Brian...
The first thing I can think of is that you have something weird in your LD_LIBRARY_PATH.
Can you tell me what the BLAS_*_LIBRARY variables are set to in cmake? They should be at the very top of your CMakeCache.txt and should point to something like /usr/lib/libblas.so.
Maybe we can take this to email, which is probably easier. You can find my mail address at http://www.ais.uni-bonn.de/~amueller/.
Cheers,
Andy
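
For anyone checking the same thing, the BLAS_*_LIBRARY entries can be pulled out of CMakeCache.txt with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes the cache uses the usual NAME:TYPE=VALUE line format.

```python
import re

# Matches lines such as "BLAS_blas_LIBRARY:FILEPATH=/usr/lib/libblas.so".
_BLAS_ENTRY = re.compile(r"^(BLAS_[\w.]+_LIBRARY):[A-Z]+=(.*)$", re.MULTILINE)

def blas_cache_entries(cache_text):
    """Return {variable name: value} for all BLAS_*_LIBRARY cache entries."""
    return {name: value for name, value in _BLAS_ENTRY.findall(cache_text)}

def found_blas_libraries(cache_text):
    """Keep only the entries that cmake actually resolved to a path."""
    return {
        name: value
        for name, value in blas_cache_entries(cache_text).items()
        if not value.endswith("-NOTFOUND")
    }
```

Reading the file with open("CMakeCache.txt").read() in the build directory and passing the text to found_blas_libraries shows at a glance which BLAS libraries cmake picked up.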
Unknown, December 21, 2010 at 4:33 AM
The issue ended up being a combination of things:

* I'm using ATLAS to auto-tune the lapack install, which gets built with gcc.
* I'm guessing your build of lapack was built with g++ or something similar, as your code expects C++ style linkage, whereas my libs had C style linkage.
* It doesn't appear some of the stuff that's using things like *cblas_sgemm* is linking against cblas. For example, the stuff in *build/release/src/tests* fails to build, but if I manually add in a "-lcblas" flag it builds fine, after the first two problems are dealt with.

I changed *src/matrix_ops/matrix_ops.cu* and *src/3rd_party/CudaConv/matrix.h* to wrap the inclusion of cblas.h in an 'extern "C" {}' block and that seems to have fixed the symbol mismatch issues, and as I said adding the -lcblas flag resolves the other problem.

So, I have one more question now: Hannes said CUV doesn't link against cblas, and your stuff works fine. Is the disparity because of ATLAS, or am I still doing something wrong?

-Brian
Unknown, December 21, 2010 at 5:12 AM
As another aside for anyone else tinkering with this, be sure to set CUDA_TEST_DEVICE to a proper value, otherwise the ctest step will fail. I won't try to speak to the general case, but I have one card installed and its device # was 0. The default after the initial configure step is 3.

After building, if you run ctest and it fails, look at (builddir)/Testing/Temporary/LastTest.log. The error indicating whether you have the wrong device # will be "Invalid argument" on a call to 'cudaSafeCall'.

-Brian
Andreas Mueller, December 21, 2010 at 2:15 PM
Hi Brian. Actually I am not sure about the linking against cblas. I grepped the verbose make log and there was no -lcblas in there.
We have a pretty standard Ubuntu here with all blas packages just from the standard repo. I'll ask Hannes about it again.
And thanks for the comment on the standard test device. We meant to change that for quite some time but seem to have forgotten it. I'll set the default to 0 right now ;)
Andy
Unknown, December 21, 2010 at 6:18 PM
Andy,

Try this (in bash):

for i in `find ::list all paths to search:: -type f -a \( -name '*.so*' -o -name '*.a' -o -name '*.o' \)`; do nm -CDo $i 2>/dev/null | grep cblas_sgemm; done;

Replace the :: :: with a space-separated list of paths where shared objects and static libraries can be found. 'find' will only find files, not links, unless you specify a parameter that changes the default behavior; that's why I put *.so*.

This will tell you what .so exports the symbol. If I had to put money on it, I'd guess your libblas.so has all of it, whereas mine has the cblas stuff split out in a separate .so

-Brian
Unknown, December 21, 2010 at 6:28 PM
I've run into one other issue, but it appears to be an issue with the card we have.

(builddir)/src/tests/test_random times out and generates a SIGSEGV that terminates the test early.

(builddir)/src/tests/test_conv_op requests too many resources and bails as well.

So, moral of the story: if you have an 8800GTS or similarly out-dated card, you'll have to get a new one.

You can test by running:

(cuda_dir)/sdk/C/bin/linux/release/deviceQuery

There are a few key things to look at in the output:

CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 0
Concurrent copy and execution: No
Run time limit on kernels: Yes

The first two lines indicate features your card is capable of. Mine is one of the least capable, based on those two lines. The 3rd is crappy because it means you can't be uploading data to the card while any kernels are running.

The 4th means my card is being used as a display device, so the card automatically places limits on anything running. If a kernel runs too long it bails (hence test_random failed).

-Brian
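
The four lines quoted above can also be checked programmatically. The sketch below parses saved deviceQuery output in Python; the function names are hypothetical, and the field labels are copied from the output shown in this comment, so they may differ for other SDK versions.

```python
def parse_device_query(text):
    """Turn 'label: value' lines of deviceQuery output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields

def summarize_device(fields):
    """Extract the properties discussed above (labels assumed as quoted)."""
    major = int(fields["CUDA Capability Major revision number"])
    minor = int(fields["CUDA Capability Minor revision number"])
    return {
        "compute_capability": (major, minor),
        # "No" means host-to-device copies cannot overlap running kernels.
        "can_overlap_copy": fields["Concurrent copy and execution"].startswith("Yes"),
        # "Yes" means long-running kernels may be terminated (typical for a display card).
        "has_watchdog": fields["Run time limit on kernels"].startswith("Yes"),
    }
```

For the card described here this yields compute capability (1, 0), no copy/execute overlap, and an active run time limit.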
Andreas Mueller, December 22, 2010 at 11:58 AM
About that: the "too many resources" error will show up in random and conv on cards up to the 9400GT. A 9800 should be fine.
It's trivial to fix the "too many resources" in random, but maybe impossible in conv without major rewriting.
Your problem with random is new. But we never had a card that old so we can't reproduce that.
Ok so the readme needs a "supported GPUs" section... I see ;)
Andreas Mueller, December 22, 2010 at 12:03 PM
By the way: You can still run everything that does not need random numbers or convolutions. So you can use the mlp, knn and whatever else you want to write (although you would have to initialize the weights in the mlp in numpy but that shouldn't really slow things down).
Unknown, December 22, 2010 at 7:46 PM
Well, the weights could probably be initialized in R or something. Or just switch it to use non-GPU code.
Andreas Mueller, December 23, 2010 at 10:59 AM
Well I think numpy.random.uniform is easier than importing R ;).
Something like
cuv_python.push(numpy.random.uniform(0,1,shape))
creates a device matrix of the specified shape...
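
A slightly fuller sketch of initializing MLP weights on the host with NumPy might look as follows. The function name, the uniform range and the float32 dtype are assumptions for illustration; only the cuv_python.push call mentioned above comes from the library, and it is not executed here.

```python
import numpy as np

def init_mlp_weights(layer_sizes, rng, scale=0.1):
    """Create host-side weights in [-scale, scale) and zero biases per layer."""
    weights, biases = [], []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = rng.uniform(-scale, scale, size=(n_in, n_out)).astype(np.float32)
        weights.append(w)
        biases.append(np.zeros(n_out, dtype=np.float32))
    # Each array could then be handed to cuv_python.push(...) as in the
    # line above to obtain a device matrix (assumption: not run here).
    return weights, biases
```

For example, init_mlp_weights([784, 100, 10], np.random.default_rng(0)) returns two weight matrices of shapes (784, 100) and (100, 10).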
Unknown, December 24, 2010 at 9:54 PM
Oh, I was thinking more along the lines of using R from Python, but you're right of course :)
Vinay, April 28, 2011 at 6:20 PM
Is it possible to run this without a GPU? If so, how do I set it up to run on the CPU only? I don't have an NVIDIA GPU (graphics card).

I appreciate your help.
Andreas Mueller, May 20, 2011 at 9:38 PM
Hi vinay.
Sorry for the late answer, I had lots to do.
The RBM is based on the CUV library as explained above. It is possible to run the CUV library without CUDA and by now it should be pretty pain-free.
The hardest part is probably compiling CUV without CUDA, but it should be possible to configure this using cmake now.
Just give it a try and get back to me if you run into problems.
Tom Larkworthy, March 9, 2012 at 4:54 PM
I can't compile it, it says
"
-- A library with BLAS API not found. Please specify library location.
CMake Error at /usr/share/cmake-2.8/Modules/FindBLAS.cmake:457 (message):
A required library with BLAS API not found. Please specify library
location.
"

Yet I installed libblas (on Ubuntu 10.04)

my CMakeCache.txt is:-

//Path to a library.
BLAS_Accelerate_LIBRARY:FILEPATH=BLAS_Accelerate_LIBRARY-NOTFOUND

//Path to a library.
BLAS_acml_LIBRARY:FILEPATH=BLAS_acml_LIBRARY-NOTFOUND

//Path to a library.
BLAS_atlas_LIBRARY:FILEPATH=/usr/lib/libatlas.so

//Path to a library.
BLAS_blas_LIBRARY:FILEPATH=/usr/lib/libblas.so

//Path to a library.
BLAS_cblas_LIBRARY:FILEPATH=/usr/lib/libcblas.so

//Path to a library.
BLAS_complib.sgimath_LIBRARY:FILEPATH=BLAS_complib.sgimath_LIBRARY-NOTFOUND

//Path to a library.
BLAS_cxml_LIBRARY:FILEPATH=BLAS_cxml_LIBRARY-NOTFOUND

//Path to a library.
BLAS_dxml_LIBRARY:FILEPATH=BLAS_dxml_LIBRARY-NOTFOUND

//Path to a library.
BLAS_essl_LIBRARY:FILEPATH=BLAS_essl_LIBRARY-NOTFOUND

//Path to a library.
BLAS_f77blas_LIBRARY:FILEPATH=/usr/lib/libf77blas.so

//Path to a library.
BLAS_mkl_LIBRARY:FILEPATH=BLAS_mkl_LIBRARY-NOTFOUND

//Path to a library.
BLAS_mkl_em64t_LIBRARY:FILEPATH=BLAS_mkl_em64t_LIBRARY-NOTFOUND

//Path to a library.
BLAS_mkl_ia32_LIBRARY:FILEPATH=BLAS_mkl_ia32_LIBRARY-NOTFOUND

//Path to a library.
BLAS_mkl_intel_LIBRARY:FILEPATH=BLAS_mkl_intel_LIBRARY-NOTFOUND

//Path to a library.
BLAS_mkl_intel_lp64_LIBRARY:FILEPATH=BLAS_mkl_intel_lp64_LIBRARY-NOTFOUND

//Path to a library.
BLAS_scsl_LIBRARY:FILEPATH=BLAS_scsl_LIBRARY-NOTFOUND

//Path to a library.
BLAS_sgemm_LIBRARY:FILEPATH=BLAS_sgemm_LIBRARY-NOTFOUND

//Path to a library.
BLAS_sunperf_LIBRARY:FILEPATH=BLAS_sunperf_LIBRARY-NOTFOUND

//Path to a library.
BLAS_vecLib_LIBRARY:FILEPATH=BLAS_vecLib_LIBRARY-NOTFOUND
Tom Larkworthy, March 9, 2012 at 6:15 PM
Well, uninstalling atlas solved a large proportion of my issues.
Tom Larkworthy, March 23, 2012 at 5:31 PM
Also, libcuv_tools had undefined symbols on make. I had to update my NVIDIA driver beyond the Ubuntu 10.04 default in order to get it working with CUDA toolkit 4.1.