Subject: Re: ML for MI
> Well, I unintentionally stirred up a hornet's nest. The two most esteemed old-time members of the board, mungofitch and zeelotes, both seem to dismiss my ML attempt at stock prediction as complete data-mining folly.
I don't dismiss it.
Looking at the whole subject from a great distance, there are two things that using smarter tools might accomplish.
* Exploring unexamined "corners" of the parameter space and discovering overlooked pockets that tend to lead to good performance.
* Adding more degrees of freedom in the models.
The first one is a very good thing, provided only that the out-of-sample validation is done enough, and well enough, prior to trusting real money to it. Same with any investment method.
But the second one is generally a Very Bad Thing. There are good reasons that we have always liked screens with very few tuning parameters: less to tune means less chance of overtuning to the training set. It's astounding what a model will memorize if you don't watch it like a hawk.
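A toy illustration of that memorization, entirely synthetic and invented for this post: fit polynomials of increasing degree to a noisy linear trend. Training error falls as the degree (the number of tuning parameters) rises, while error on held-out data typically gets worse, because the extra parameters are fitting the noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: the true relationship is linear, so a model with
# two parameters is "right"; extra parameters can only memorize noise.
x_train = np.linspace(0, 1, 20)
y_train = 2.0 * x_train + rng.normal(0, 0.3, size=x_train.size)
x_test = np.linspace(0, 1, 200)
y_test = 2.0 * x_test + rng.normal(0, 0.3, size=x_test.size)

results = {}
for degree in (1, 5, 9, 12):
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}  test MSE {test_mse:.3f}")
```

The in-sample numbers look better and better as degrees of freedom are added; only the held-out numbers reveal the overtuning.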
I think there's a very good chance that you might accomplish good things on the first bullet, and I don't dismiss that at all. My prior post was merely intended to caution about the second, compounded by the hidden difficulty of doing out-of-sample validation cleanly.
An anecdote: back in the 90's, when my team was paid to do predictive modelling on very large data sets, we mostly used a whole lot of linear factors, and it worked. We souped it up by feeding that result into a neural net (not many layers back then), which polished the model for a few extra percentage points of accuracy. Since we understood the data set quite well, that gain was quite a surprise to us, even though it was modest. These things can work. Fortunately we had a lot of validation data; we generally held back around 20% for that.
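That two-stage setup can be sketched in a few lines of modern Python. Everything here is illustrative, not from the original project: an ordinary least-squares stage on synthetic data, then a tiny hand-rolled one-hidden-layer net trained on the residuals, with roughly 20% of the data held back for validation. The data shape, network size, and learning rate are all invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: mostly linear signal plus a mild nonlinearity
# that the linear stage cannot capture.
n, d = 1000, 5
beta = np.array([0.5, -0.3, 0.8, 0.1, -0.6])
X = rng.normal(size=(n, d))
y = X @ beta + 0.5 * X[:, 0] ** 2 + rng.normal(0, 0.1, size=n)

# Hold back ~20% for validation, as described above.
split = int(0.8 * n)
X_tr, X_val = X[:split], X[split:]
y_tr, y_val = y[:split], y[split:]

# Stage 1: linear factors (least squares with an intercept column).
w, *_ = np.linalg.lstsq(np.c_[X_tr, np.ones(split)], y_tr, rcond=None)
lin_tr = np.c_[X_tr, np.ones(split)] @ w
lin_val = np.c_[X_val, np.ones(n - split)] @ w

# Stage 2: a small one-hidden-layer net "polishes" the residuals,
# taking the raw features plus the linear prediction as input.
def fit_residual_net(F, r, hidden=8, lr=0.02, epochs=3000, seed=2):
    rng2 = np.random.default_rng(seed)
    W1 = rng2.normal(0, 0.1, size=(F.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng2.normal(0, 0.1, size=hidden)
    b2 = 0.0
    for _ in range(epochs):                    # full-batch gradient descent
        h = np.tanh(F @ W1 + b1)
        g = 2.0 * (h @ W2 + b2 - r) / len(r)   # dMSE/dprediction
        gh = np.outer(g, W2) * (1.0 - h ** 2)  # backprop through tanh
        W2 -= lr * (h.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (F.T @ gh)
        b1 -= lr * gh.sum(axis=0)
    return W1, b1, W2, b2

F_tr = np.c_[X_tr, lin_tr]
F_val = np.c_[X_val, lin_val]
W1, b1, W2, b2 = fit_residual_net(F_tr, y_tr - lin_tr)
polish_val = np.tanh(F_val @ W1 + b1) @ W2 + b2

mse_lin = np.mean((y_val - lin_val) ** 2)
mse_stacked = np.mean((y_val - (lin_val + polish_val)) ** 2)
print(f"linear-only validation MSE: {mse_lin:.4f}")
print(f"linear+net  validation MSE: {mse_stacked:.4f}")
```

The point of the structure is the same as in the anecdote: the linear stage does the heavy lifting, the net only mops up what's left over, and the held-back slice is the only honest scoreboard.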
Data mining is defined as the discovery of new, useful, non-obvious and predictive rules in old data. Go for it.
Jim