Subject: Re: ML for MI
Zee, there are always pros and cons, but one always has to try to frame the issue with an unbiased mind.

(1) The world of Data Science has advanced enormously since the 2000s. Most of what is "at play" today comes from breakthroughs made roughly between 2010 and 2015.
(2) I know who Lorren Cobb is/was. IIRC he used to be a professor at a university in Colorado, with a background in the physical sciences. The reason I mention this is that physicists tend to have a particular bias: "everything is a system", i.e. some closed-form representation exists.

Of course, we (i.e. statisticians/ML scientists) also tend to have a bias: we have a good tool, a hammer, so everything starts to look like a nail. (In fact, this is almost a direct quote from one of the best deep learning books.)

(3) Frankly, painting with this broad a brush is disrespectful to highly erudite and honorable folks. My Linear Models professor (he's among the top five in linear models/experimental design) once told me: "Learn to figure out how a model goes awry first. That's the key to building a good one."

And over years and years of practice we developed a saying: "All models are wrong in the long run; some happen to be useful."

I think going in with the presumption that a STABLE cause-effect relationship must exist is flawed, especially with financial markets. Weirdly, if there is one and you feed enough relevant info to a deep learner today, it's practically guaranteed to LEARN it from the data and give it back to you. As Bob (RAMc) says, if you do rolling cross-validations (enough of them through cycles) and certain relationships come out time and again, then you have proved a causal driver relationship from the data itself.
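To make the rolling idea concrete, here is a minimal sketch in Python (my choice of language, not something from this thread). It assumes a single predictor and a plain least-squares slope as a stand-in for whatever model you actually use; the window length, step, data, and helper names (`ols_slope`, `rolling_slopes`, `sign_consistency`) are all hypothetical. It only checks whether the fitted relationship keeps the same sign window after window.

```python
from typing import List, Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x (with intercept); assumes x is not constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def rolling_slopes(x: Sequence[float], y: Sequence[float], window: int, step: int) -> List[float]:
    """Refit the same simple model on successive time windows (oldest first)."""
    return [
        ols_slope(x[s:s + window], y[s:s + window])
        for s in range(0, len(x) - window + 1, step)
    ]


def sign_consistency(slopes: Sequence[float]) -> float:
    """Share of windows whose slope has the majority sign (exact zeros count against)."""
    positive = sum(1 for s in slopes if s > 0)
    negative = sum(1 for s in slopes if s < 0)
    return max(positive, negative) / len(slopes)


# Hypothetical data: the relationship flips sign halfway through the sample.
x = [float(i % 4) for i in range(16)]
y = [2 * xi if i < 8 else -2 * xi for i, xi in enumerate(x)]
slopes = rolling_slopes(x, y, window=4, step=4)
print(slopes, sign_consistency(slopes))  # [2.0, 2.0, -2.0, -2.0] 0.5
```

A consistency close to 1.0 across many windows spanning several cycles is the "comes out time and again" pattern; a value near 0.5, as in this toy example, says the relationship did not hold up.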

Again, purely from a practitioner's standpoint, these are the KEY steps IMHO:

(a) GIGO (Garbage In, Garbage Out), i.e. data quality: supremely important. In financial series the most critical thing is to ensure there's no look-ahead ("look-forward") leakage. E.g., suppose that for whatever reason one chooses to use Q2 company data (SEC filings/VL type) to forecast July returns. The issue is that even though the data is marked as Q2, companies report through July and August, so the full data set may not really be available until mid-August.

Where I see this issue most is actually macro data. Most economic data is released with various lags, but if you look at historical repositories (e.g. FRED), they stamp each value with the period it describes rather than the date it was released.

If your outcome (performance) periods are a year out or so, this won't matter that much, but it can still dilute your run-time results, i.e. actual real-life (RL) performance after deployment.
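A minimal sketch of one guard against this, in Python: key every value by the date it became public, not the period it describes, and only look up what was available as of the decision date. The `Observation` class, the fallback 45-day reporting lag, and the dates in the example are hypothetical assumptions; real lags differ by source, so use actual release dates wherever you have them.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Observation:
    period_end: date                 # period the value describes, e.g. end of Q2
    value: float
    released: Optional[date] = None  # date the value actually became public, if known


def availability_date(obs: Observation, assumed_lag_days: int) -> date:
    """Date from which a model may legitimately use this observation."""
    if obs.released is not None:
        return obs.released
    # Fallback assumption: the value becomes public a fixed number of days after period end.
    return obs.period_end + timedelta(days=assumed_lag_days)


def point_in_time(observations: List[Observation], as_of: date,
                  assumed_lag_days: int = 45) -> Optional[Observation]:
    """Most recently released observation that was public on or before `as_of`."""
    dated = sorted(
        ((availability_date(o, assumed_lag_days), o) for o in observations),
        key=lambda pair: pair[0],
    )
    release_dates = [d for d, _ in dated]
    i = bisect_right(release_dates, as_of)
    return dated[i - 1][1] if i else None  # None: nothing was public yet


history = [
    Observation(date(2023, 3, 31), 1.0),  # Q1, assumed public 2023-05-15
    Observation(date(2023, 6, 30), 2.0),  # Q2, assumed public 2023-08-14
]
# Forecasting July returns on 2023-07-01: only Q1 is actually usable.
print(point_in_time(history, date(2023, 7, 1)).value)  # 1.0
```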

(b) Benchmark yourself against a simple linear model. If you don't see much difference between methods, always stick to the simplest model. This is the Occam's Razor principle.
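A rough sketch of that benchmark in plain Python: fit a linear model and a more flexible one on the same development data, score both out of sample, and keep the linear model unless the flexible one wins by a clear margin. The k-nearest-neighbour regressor is only a hypothetical stand-in for "the fancy ML model", and the 5% `min_gain` threshold is an arbitrary assumption, not a rule.

```python
from typing import Callable, Sequence, Tuple

Model = Callable[[float], float]


def fit_linear(x: Sequence[float], y: Sequence[float]) -> Model:
    """Ordinary least squares with one predictor: y ~ a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return lambda v: a + b * v


def fit_knn(x: Sequence[float], y: Sequence[float], k: int = 5) -> Model:
    """Hypothetical 'flexible' model: average y of the k nearest training x values."""
    pairs = list(zip(x, y))

    def predict(v: float) -> float:
        nearest = sorted(pairs, key=lambda p: abs(p[0] - v))[:k]
        return sum(p[1] for p in nearest) / len(nearest)

    return predict


def mse(model: Model, x: Sequence[float], y: Sequence[float]) -> float:
    return sum((model(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(x)


def choose_model(x_dev, y_dev, x_val, y_val, min_gain: float = 0.05) -> Tuple[str, float, float]:
    """Return (chosen model name, linear validation MSE, flexible validation MSE)."""
    lin_err = mse(fit_linear(x_dev, y_dev), x_val, y_val)
    knn_err = mse(fit_knn(x_dev, y_dev), x_val, y_val)
    # Occam's Razor: switch only if the flexible model beats the linear one by more than min_gain.
    if knn_err < lin_err * (1 - min_gain):
        return "knn", lin_err, knn_err
    return "linear", lin_err, knn_err
```

Comparing on held-out (ideally later-in-time) data is the point; comparing in-sample fit would almost always flatter the flexible model.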

(c) Ensure a proper validation process. Read some books/articles on it: sequencing matters in financial time series, and your validation samples should have NO LEAKAGE from the development time periods.
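One way to enforce this is a walk-forward split, sketched below under my own assumptions (equally spaced observations in time order; `walk_forward_splits` and its parameters are hypothetical names). Each test block starts only after its training block ends, plus an optional `gap`. The gap matters when outcomes span several periods (e.g. 1-yr-out returns): without it, the last training labels overlap the test window.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_obs: int, train_size: int, test_size: int, gap: int = 0, expanding: bool = False
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs over time-ordered observations 0..n_obs-1.

    Every test index is strictly later than every train index, with `gap`
    observations skipped in between. With expanding=True the training window
    always starts at 0; otherwise it rolls forward with a fixed length.
    """
    train_end = train_size  # exclusive end of the current training block
    while train_end + gap + test_size <= n_obs:
        train_start = 0 if expanding else train_end - train_size
        test_start = train_end + gap
        yield list(range(train_start, train_end)), list(range(test_start, test_start + test_size))
        train_end += test_size  # roll forward by one test block


# 20 periods, 8 to train, 4 to test, 2 skipped to absorb overlapping outcome windows.
for train, test in walk_forward_splits(20, train_size=8, test_size=4, gap=2):
    print(train[0], train[-1], test[0], test[-1])
# prints "0 7 10 13", then "4 11 14 17"
```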

(d) Understand what the model is saying. In ML this is very time-consuming and computationally expensive, so it is sometimes skipped. DON'T. The field is called model explanation (explainability): you should be able to explain your model. IN ESSENCE, and this is the crux of what started the topic, your end result shouldn't be some magic or the Butter indicator.
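As one illustration, permutation importance is a simple, model-agnostic explanation technique (one of several; I'm picking it for brevity, not claiming it's the best). Shuffle one feature at a time on validation data and measure how much the error grows; a feature the model ignores scores about zero. The sketch assumes the model is any callable `predict(row)`; the function names and the toy model in the example are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Predictor = Callable[[Row], float]


def mse(predict: Predictor, X: Sequence[Row], y: Sequence[float]) -> float:
    return sum((predict(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(
    predict: Predictor, X: Sequence[Row], y: Sequence[float], n_repeats: int = 10, seed: int = 0
) -> List[float]:
    """Average increase in validation MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [list(row[:j]) + [value] + list(row[j + 1:]) for row, value in zip(X, column)]
            increases.append(mse(predict, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy "fitted model" (hypothetical) that only uses feature 0 and ignores feature 1.
X_val = [[float(i), float(i % 3)] for i in range(20)]
y_val = [2.0 * i for i in range(20)]
print(permutation_importance(lambda row: 2.0 * row[0], X_val, y_val))
# the second entry is 0.0 because the model never looks at feature 1
```

If the features that matter turn out to be ones you can explain in plain business terms, good; if the top feature is something like butter production, that's your warning sign.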

It may very well end up being Jim's ROE/cash model, and there's no harm or shame in that: you've just learnt (or re-learnt) that if a firm has a healthy return on equity and generates adequate cash flow, it IS A GOOD FIRM, and over the long term it produces healthy returns on its stock.

Best