Monday, November 8, 2010

Machine Learning Toolkits

Wow. So much to read today.
While following link upon link, I found so many great toolkits that I think it is worth listing them here.
One of the greatest sources was the GNU/Linux AI & Alife HOWTO.

[edit] It's been a while since I wrote this blog post, but many people still seem to find it, so here is a quick update. After looking into many of these libraries, I started using scikit-learn, and then used it exclusively; I am now a regular contributor. It is a fast-growing project with great documentation, many algorithms, and it is just so easy to use. Also, working with Python and the Python crowd is fun. I heartily recommend it. [/edit]
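
Since the update above claims scikit-learn is "just so easy to use", here is a minimal sketch of what that looks like. This is written against the modern scikit-learn API (which has changed since 2010), and the dataset and parameters are purely illustrative:

```python
# Minimal scikit-learn sketch: fit an SVM on the iris dataset and
# report cross-validated accuracy. Uses the modern API; the 2010-era
# interface differed slightly.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="rbf", C=1.0)       # illustrative parameters
scores = cross_val_score(clf, X, y, cv=5)
print("mean accuracy: %.2f" % scores.mean())
```

The same fit/predict/score pattern carries over to nearly every estimator in the library, which is a large part of why it is so pleasant to work with.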

Here goes:
  • Vowpal Wabbit - project on very fast online gradient descent by Yahoo! Research (C++)
  • VFML (Very Fast Machine Learning) - library for very fast decision trees and Bayes networks (C++)
  • Stochastic Gradient Descent - library for SVMs with stochastic gradient descent (C++)
  • Maximum Entropy Modeling Toolkit for Python and C++ - the name says it all
  • Elefant - toolkit that includes kernel methods, optimization strategies and belief propagation. It has a GUI
  • Milk - toolkit for Python that includes SVMs, decision trees, kNN, PCA, k-means, NMF and feature selection
  • Peach - pure Python library that includes neural networks, fuzzy logic, genetic algorithms and swarm intelligence
  • Pebl - Python library and command-line application for learning the structure of a Bayesian network
  • Machine Learning: An Algorithmic Perspective - Actually a book, but with MANY MANY MANY examples online, all in Python. MOST AWESOME! I just ordered the book
  • dbacl - a digramic Bayesian classifier - a collection of command-line tools for Bayesian classification, particularly spam filtering
  • Shark - Modular library including neural networks, kernel methods, discrete and continuous optimization, fuzzy logic and control and mixtures density models (C++)
  • PyMVPA - Python module including more classifiers, regression and feature selection methods than can be listed here. Do a cross-validated classifier sweep and parameter search in fewer than 10 lines of Python.
  • Monte - gradient-based learning in Python - a Python module that contains neural networks, k-means and logistic regression, with a focus on parametric models
  • scikit-learn - Python module with a good API. Includes SVMs, generalized linear models, Gaussian mixture models, mean shift, feature selection and ranking, data management and many more.
  • mlpy - Python module that includes wavelet transforms, kernel methods, FDA, PDA, LASSO, LARS, feature selection and ranking, and data management. Very clean interface.
  • Modular toolkit for Data Processing - Python toolkit for data processing. In my opinion the API takes a little getting used to. Includes PCA, k-means, RBMs, FastICA, Neural Gas, SVMs, perceptrons and many more.
  • Orange - Data mining through visual programming or Python. Large toolbox that includes great visualization features, classifiers, data management, regression and clustering. Definitely worth trying.
  • Weka - A classic tool for all of data mining. Contains tools for data pre-processing, classification, regression, clustering, association rules and visualization. Can be used via its interface, scripting or Java.

5 comments:

  1. There's also RapidMiner, PyBrain, Apache Mahout, LibLinear, and that's just from the first couple of pages of http://www.delicious.com/tag/machinelearning

  2. Thanks for your note about scikit-learn.
    I'd like to test this library for some Bayesian network work, but I have some difficulty creating a simple Bayesian network. Do you know where I could find an example of a simple implementation using scikit-learn?

  3. This comment has been removed by the author.

  4. http://scikit-learn.org/stable/modules/naive_bayes.html
    @Stig
