Subject: Re: What constitutes success?
The more I think about it, the more I think I'm pretty happy with downside deviation as the key metric to watch.

Traditionally I have used a threshold hurdle ("minimum acceptable return") of 10%/year, but I think I might revise that to be "inflation + 8%/year". That's pretty much the same thing in the average year of backtests in recent years, but much more meaningful over the long run.

To recap the way I calculate downside deviation:
For every rolling year ending on a month boundary, check the extent (if any) to which the real return fell short of the MAR (in this case real 8%, i.e. inflation + 8%). Every observation period with a real return of 8% or more therefore counts as zero risk.
Take that set of numbers, including the zeroes, and calculate the RMS (square them all, average the squares, take the square root of the average).
That is the risk metric: rolling year downside deviation with MAR= real 8%.
Any portfolio which returns inflation+8% or more in every rolling year will by construction have a risk metric of zero. Each observed shortfall below 8% adds a bit of penalty, with a squared function on the size of each shortfall. Missing the MAR by 2% in a single one-year period (having return = inflation + 6%) is deemed to be four times as bad as missing by 1% (return inflation + 7%).
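For anyone who wants to try the recap above, here is a rough Python sketch. It is only an illustration under some assumptions of mine, not a definitive implementation: the inputs are month-end levels of a total return index and of a price index such as CPI, the hurdle is taken literally as the inflation rate over the year plus 8 percentage points, and RMS is read in the usual root-mean-square sense (mean of the squares). The function and parameter names are made up for the example.

```python
import math

def rolling_year_shortfalls(portfolio_index, cpi_index, real_mar=0.08):
    """Shortfall below (inflation + real_mar) for each rolling 12-month period.

    Both inputs are month-end index levels, oldest first, same length.
    (Hypothetical input format, chosen for this sketch.)
    """
    if len(portfolio_index) != len(cpi_index):
        raise ValueError("index series must have the same length")
    shortfalls = []
    for end in range(12, len(portfolio_index)):
        start = end - 12
        nominal_return = portfolio_index[end] / portfolio_index[start] - 1.0
        inflation = cpi_index[end] / cpi_index[start] - 1.0
        hurdle = inflation + real_mar  # MAR for this particular rolling year
        # Periods that meet or beat the hurdle contribute zero.
        shortfalls.append(max(0.0, hurdle - nominal_return))
    return shortfalls

def downside_deviation(shortfalls):
    """Root mean square of the shortfalls, zeroes included."""
    if not shortfalls:
        raise ValueError("need at least one rolling-year observation")
    return math.sqrt(sum(s * s for s in shortfalls) / len(shortfalls))

# Toy data: flat for 12 months, then up 6% with no inflation,
# so the single rolling year misses the 8% hurdle by about 2%.
portfolio = [100.0] * 12 + [106.0]
cpi = [100.0] * 13
risk = downside_deviation(rolling_year_shortfalls(portfolio, cpi))
```

Because each shortfall is squared before averaging, a 2% miss in a given period adds four times as much to the mean of squares as a 1% miss.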

Arguably using rolling two years would be even better, but that needs a whole lot more history before you learn anything, and I'm impatient.

The quant portfolio would still have to have a decent minimum amount of history. I'd say absolute bare minimum 3 years before even looking at the results, preferably 5 or more years. And, critically, the real world history has to include a bear market stretch.

For a comparison benchmark, my preference is whatever equally weighted index best matches your quant portfolio's search universe. I'm no more likely to buy a position in a trillion dollar company than I am in a billion dollar company, so I don't think the S&P 500 is a suitable benchmark for trying to assess value added--it's really just measuring the performance of a dozen huge stocks these days. For my own current portfolio I am however tilted towards large caps, so RSP (S&P 500 equal weight) is fine. If I wanted to be fussy I could create a synthetic benchmark of maybe 60-80% RSP and 20-40% the equal weight VL universe. But the difference would be negligible, and a lot more work.

With all those caveats out of the way, my yardstick of MI success would simply be this: after a few years of real world results including a bear, which had a better downside deviation risk metric, my portfolio or RSP? The key idea is this: if you've taken good control of the bad times, the good times will take care of themselves. If it got down to a risk metric of zero I'd know only that it had a return of at least inflation + 8%/year in every rolling period---my risk metric wouldn't know by how much it beat that, but it would surely be by a wholly satisfactory amount. I usually express this as the fraction of the risk metric from RSP in the same stretch, calculated the same way.
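The "fraction of the RSP risk metric" comparison can be sketched as below. This is a minimal illustration, assuming you already have the per-period shortfalls for both the portfolio and the benchmark over the same stretch, computed the same way. The names are hypothetical, and returning None when the benchmark's own metric is zero is just one way to handle that corner case.

```python
import math

def rms(values):
    """Root mean square of a non-empty list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def relative_risk(portfolio_shortfalls, benchmark_shortfalls):
    """Portfolio risk metric as a fraction of the benchmark's risk metric.

    Returns None if the benchmark never missed the hurdle, since the
    ratio is undefined in that case (my choice for this sketch).
    """
    benchmark_risk = rms(benchmark_shortfalls)
    if benchmark_risk == 0.0:
        return None
    return rms(portfolio_shortfalls) / benchmark_risk

# Toy numbers only: the portfolio's one miss is half the size of the benchmark's.
fraction = relative_risk([0.0, 0.03], [0.0, 0.06])
```

A result below 1.0 means the portfolio did a better job than the benchmark of avoiding shortfall years over that stretch.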

This risk metric has some hidden charms. As mentioned, you don't get minimum risk without making a decent return all the time, so sitting in cash won't give you low risk. Making no money is a real kind of risk. It doesn't care about portfolio volatility in the short term. It works fine for systems using options or market timing or shorting, since flat spots or jagged spots don't affect the risk metric for better or worse unless they are long lasting. Thus it is much more tolerant of different investment styles than anything involving short term volatility as a proxy for risk.

Once that's done, start-to-end CAGR is the logical companion metric, provided only that you have a long enough sample and the end date you calculate it isn't really atypical. For example, CAGR figures ending March 2009 are not going to give you a really good idea about what portfolio styles were really the best long run choices, since things were just too weird. With that caveat, end to end CAGR is the ultimate metric of success.
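For completeness, start-to-end CAGR is just the standard compound growth formula. A small sketch, with argument names invented for the example and the holding period expressed in years (fractions allowed):

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two portfolio values."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Toy example: 100 growing to 121 over two years.
growth = cagr(100.0, 121.0, 2.0)
```

The same function applied to the benchmark over the same dates gives the difference in CAGR, which is the companion number to the risk metric fraction.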

For example, I'd be very happy if my MI portfolio, after several years including a bear, ends up with (say) 85% of the risk metric of RSP and start-to-end CAGR of 2.0%/year better than RSP. The backtests are of course way better than that, but we know better than to believe backtests : )

As others have noted, you're never 100% sure if it was skill or chance. It's just a matter of becoming slowly more certain over time. There are those who still think that Mr Buffett has just been a lucky outlier playing a game of chance these last 60+ years. But with a good enough metric of how things have gone, I think it's possible to get the certainty of the answer down in a reasonable period of time. Wild guess: after maybe 8-10 years of passably good real world results, I think you might achieve maybe 80% certainty that the result was not entirely due to chance.

Jim