One of the requests there was to provide some sort of flow chart on how to do machine learning.
As this is clearly impossible, I went to work straight away.
This is the result:
Clarification: By ensemble classifiers and ensemble regressors I mean random forests, extremely randomized trees, gradient boosted trees, and the soon-to-come weight-boosted trees (AdaBoost).
Needless to say, this sheet is completely authoritative.
Thanks to Rob Zinkov for pointing out an error in one yes/no decision.
More seriously: this is actually my workflow / train of thought whenever I try to solve a new problem. Basically, start simple. If that doesn't work out, try something more complicated.
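To make the "start simple, then escalate" idea concrete, here is a minimal sketch. The synthetic data, the helper name `try_in_order`, and the `good_enough` threshold are all made up for illustration; the choice of a linear SVC followed by a random forest is just one possible ordering, not a rule.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Synthetic stand-in data (an assumption); replace with your own X, y.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Ordered from simple to more complicated.
candidates = [
    ("linear SVC", LinearSVC(dual=False)),
    ("random forest", RandomForestClassifier(n_estimators=50, random_state=0)),
]


def try_in_order(candidates, X, y, good_enough=0.9):
    """Cross-validate models in order and stop at the first one that is good enough.

    Returns a list of (name, mean accuracy) for every model that was tried.
    `good_enough` is a hypothetical target; pick one that fits your problem.
    """
    results = []
    for name, model in candidates:
        score = cross_val_score(model, X, y, cv=5).mean()
        results.append((name, score))
        if score >= good_enough:
            break  # the simpler model suffices, no need to go further
    return results


results = try_in_order(candidates, X, y)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

The point of the early `break` is that the more complicated model only gets trained if the simple one falls short.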
The chart above includes the intersection of all algorithms that are in scikit-learn and the ones that I find most useful in practice.
Except that I always start out with "just looking". To make any of the algorithms actually work, you need to do the right preprocessing of your data, which is much more of an art than picking the right algorithm, imho.
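A small sketch of why preprocessing matters, under my own assumptions: the wine dataset, an RBF-kernel SVC, and standard scaling are picked purely for illustration, and printing the feature ranges is only one possible reading of "just looking". The same model is cross-validated once on the raw features and once behind a `StandardScaler`.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Example dataset (an assumption); its features live on very different scales.
X, y = load_wine(return_X_y=True)

# A quick look first: per-feature minimum and maximum.
print("feature minima:", X.min(axis=0).round(2))
print("feature maxima:", X.max(axis=0).round(2))

# Same estimator, with and without scaling.
raw_model = SVC(kernel="rbf")
scaled_model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

raw_score = cross_val_score(raw_model, X, y, cv=5).mean()
scaled_score = cross_val_score(scaled_model, X, y, cv=5).mean()

print(f"raw features:    {raw_score:.3f}")
print(f"scaled features: {scaled_score:.3f}")
```

Putting the scaler inside the pipeline means it is fit on the training folds only during cross-validation, so no information from the held-out fold leaks into the preprocessing.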
Anyhow, enjoy ;)
You can find the SVG and dia file I used here. I doubt this qualifies as a creative work, but to make sure, I put this under the CC0 license, which translates to "public domain" in the US.
As some people commented about structured prediction not being included in the chart: there is SVMstruct, which is a great library and has interfaces to many languages, but is only free for non-commercial use.
There is also the library I'm working on, pystruct, which I will write about on another day ;)
The chart is not really comprehensive, as I focused on scikit-learn. Otherwise I certainly would have included neural networks ;)