But the second one is generally a Very Bad Thing. There are good reasons that we have always liked screens with very few tuning parameters: less to tune means less chance of overtuning to the training set. It's astounding what a model will memorize if you don't watch it like a hawk.
A bit OT -
The whole subject makes me recall work I did many years ago in designing aspheric optical lenses for display systems (believe it or not, cathode ray tubes lol).
We needed to define the exposure lens surface very precisely in order to produce a physical lens that exposed the phosphor stripes exactly where they needed to be.
Essentially, the optics needed to match the electromagnetic deflection coil as well as the geometry of the CRT, with all of its asymmetries.
It also needed to account for deflection of the electron beam by magnetic fields, some stray, some not.
There were also thermal and mechanical effects, process effects, vacuum effects, and all kinds of things that made "lens design" a huge industry problem.
Very fancy software (for the day) and procedures were developed and honed by huge corporations - Philips/Sylvania (where I worked), GE, RCA, Sony, Zenith, and others.
We wanted to fit these surfaces with simple polynomials, but that rarely worked well enough.
You had to add higher-order terms. If you included terms up to sixth or eighth order, you could make a very nicely working lens.
We liked to stick to even orders, but sometimes you had to throw in odd-order terms to make things work - without understanding why they were needed.
And the problem was, to manufacture the lens, you needed to hold it in various fixtures - and those fixtures held the lens outside the used region.
That was called the "extrapolated region", where you had to extend the polynomial (or other) fit in a way to produce a continuous, smooth, real surface.
That region was what was gripped and held during lens manufacturing, and during the exposure process as well.
Junior lens designers quickly learned that a sixth-order polynomial made a wonderful lens in the used (in-sample) region, able to match most any dataset.
But in the extrapolated (out-of-sample) region, it was exceedingly difficult to extend those fits into a surface that could be held by lens grinding equipment.
The fit could "blow up" within millimeters of leaving the used region.
We would spend a lot of time trying to smooth and extend those complicated mathematical fits.
And senior lens designers would always say, if you need a sixth or eighth order term, you are trying to fit erroneous data, or noise.
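If you want to see the effect for yourself, here's a minimal Python sketch of the same phenomenon - made-up numbers standing in for the lens data, not our old FORTRAN code. It fits a low-order and a high-order polynomial to noisy samples of a simple profile, then evaluates both just past the fitted region:

import numpy as np

# Minimal sketch with hypothetical data: noisy samples of a simple
# "surface" profile over the used (in-sample) region [-1, 1].
rng = np.random.default_rng(0)
x_used = np.linspace(-1.0, 1.0, 25)

def true_surface(x):
    return 0.5 * x**2  # stand-in for a real aspheric profile

y = true_surface(x_used) + rng.normal(0.0, 0.01, x_used.size)

# The "extrapolated region" just outside the used region, where the
# fixtures grip the lens.
x_extrap = np.linspace(1.0, 1.3, 10)

for degree in (2, 8):
    fit = np.poly1d(np.polyfit(x_used, y, degree))
    in_rmse = np.sqrt(np.mean((fit(x_used) - y) ** 2))
    out_err = np.max(np.abs(fit(x_extrap) - true_surface(x_extrap)))
    print(f"degree {degree}: in-sample RMSE {in_rmse:.4f}, "
          f"worst extrapolation error {out_err:.4f}")

The high-order fit typically wins in-sample because it memorizes the noise, and typically loses worst in the extrapolated region - which is just what the junior designers kept rediscovering.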
I've always had this mental model of MI screen/strategy development being like my experience with lens design.
The better your fit in the used (in-sample) region, the less trustworthy it is in the out-of-sample region.
But hey, that was 35 years ago. Our FORTRAN programs got replaced by C and then more sophisticated software, and then CAD came along.
You fit surfaces with meshes, and like magic those fit and extrapolation problems disappeared. (It was still challenging to make an exposure lens that worked well.)
Then CRTs went away, and the whole problem went away, to be replaced by new problems in plasma displays, LCDs, OLEDs, quantum dots, etc.
I think ML has a place at the table.
Mark