While following link upon link, I found so many great toolkits that I think it is worth listing them here.
One of the greatest sources was the GNU/Linux AI & Alife HOWTO.
[edit] It's been a while since I wrote this blog post, but many people still seem to find it, so here is a quick update. After looking into many libraries, I started using scikit-learn and soon used it exclusively. Now I am a regular contributor. It is a fast-growing project with great documentation, many algorithms, and it is just so easy to use. Also, working with Python and the Python crowd is fun. I heartily recommend it. [/edit]
- Vowpal Wabbit - project on very fast online gradient descent by Yahoo! Research (C++)
- VFML (Very Fast Machine Learning) - library for very fast decision trees and Bayes networks (C++)
- Stochastic Gradient Descent - library for SVMs with stochastic gradient descent (C++)
- Maximum Entropy Modeling Toolkit for Python and C++ - the name says it all
- Elefant - toolkit that includes kernel methods, optimization strategies and belief propagation. It has a GUI
- Milk - toolkit for Python that includes SVMs, decision trees, kNN, PCA, k-means, NMF and feature selection
- Peach - pure Python library that includes neural networks, fuzzy logic, genetic algorithms and swarm intelligence
- Pebl - Python library and command line application for learning the structure of a Bayesian network
- Machine Learning: An Algorithmic Perspective - Actually a book. But with MANY MANY MANY examples online. All in Python. MOST AWESOME! - I just ordered the book
- dbacl - a digramic Bayesian classifier - a collection of command line tools for Bayesian classification particularly for spam filtering
- Shark - Modular library including neural networks, kernel methods, discrete and continuous optimization, fuzzy logic and control, and mixture density models (C++)
- PyMVPA - Python module including more classifiers, regression and feature selection methods than can be listed here. Do a cross-validated classifier sweep and parameter search in < 10 lines of Python.
- Monte - gradient-based learning in Python - Python module that contains neural networks, k-means and logistic regression, with a focus on parametric models
- scikit-learn - Python module with a good API. Includes SVMs, generalized linear models, Gaussian mixture models, mean-shift, feature selection and ranking, data management and many more.
- mlpy - Python module that includes wavelet transforms, kernel methods, FDA, PDA, LASSO, LARS, feature selection and ranking, and data management. Very clean interface.
- Modular toolkit for Data Processing - Python toolkit for data processing. In my opinion the API needs a little getting used to. Includes PCA, k-means, RBMs, FastICA, Neural Gas, SVMs, Perceptrons and many more.
- Orange - Data mining through visual programming or Python. Large toolbox that includes great visualization features, classifiers, data management, regression and clustering. Definitely worth trying.
- Weka - A classic tool for all data mining. Contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. Can be used via its interface, scripting or Java.
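
To give a rough idea of what "easy to use" means for scikit-learn, here is a minimal sketch of a cross-validated parameter search. It assumes a recent scikit-learn release (the `sklearn.model_selection` module is newer than this post), and the iris dataset, the `SVC` classifier and the parameter grid are choices made purely for illustration.

```python
# Minimal sketch: cross-validated grid search with scikit-learn.
# Assumptions (not from the original post): iris data, an SVC classifier,
# and a small illustrative parameter grid.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Load a small built-in dataset as feature matrix X and label vector y.
X, y = load_iris(return_X_y=True)

# Candidate hyperparameters to try; every combination is evaluated.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# 5-fold cross-validation over the grid, then refit on the best setting.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, round(best_score, 3))
```

Swapping `SVC()` for another estimator and adjusting `param_grid` should be all that is needed to sweep a different classifier, since scikit-learn estimators share the same `fit`/`predict` interface.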