Tedthedog: I'm curious about the features you used; do they include "fundamental ratios," e.g. P/B, P/S, etc., often used in screens?
Yes, the vast majority of the features are fundamental financial data that change with the quarterly reports, plus analysts' estimates and the usual price, volume, momentum, and sentiment factors. I have fed the ML models as many as 400 features, but removing the most correlated features and using a much smaller subset works better.
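If you want to try that pruning on your own data, here is a minimal sketch of dropping one member of each highly correlated feature pair. The 0.9 cutoff and the greedy drop order are my illustrative choices, not anything specific to P123:

```python
import numpy as np
import pandas as pd

def drop_correlated(features: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one feature from every pair whose |correlation| exceeds threshold."""
    corr = features.corr().abs()
    # Upper triangle (diagonal excluded) so each pair is examined only once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return features.drop(columns=to_drop)
```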
Although I started out using my Stock Investor Pro database coupled with scikit-learn, I've put that on the back burner. Portfolio123 is developing an ML service for professional and individual investors. I was invited to join their beta user group, trying out the product and evaluating different models before it is released as a service. The beta user group includes a few well-versed finance professionals with ML experience and some who are essentially beginners. They are developing many more user-friendly tools for use by individual investors.
The system includes most major ML models (9 categories, with several subsets within each category). In addition, you have the capability of defining your own hyperparameters for each model. Because financial data has a very large amount of noise, an ensemble of different learning models seems to achieve better results. Users do not get access to the proprietary data itself, but they can select their own features, run models, and run simulations with buy/sell rules (including friction) against that data. It is significantly better than anything I could have accomplished as an individual.
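As a concrete illustration of the ensembling idea (not P123's implementation, which I haven't seen), scikit-learn's VotingRegressor simply averages the predictions of several diverse learners. The hyperparameters and synthetic data below are placeholders:

```python
import numpy as np
from sklearn.ensemble import (ExtraTreesRegressor, RandomForestRegressor,
                              VotingRegressor)
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 77))   # 77 features, stand-in values
y = rng.normal(size=2000)         # forward 3-month gain, stand-in values

# Average the predicted gains from three tree-based learners; averaging
# tends to damp the noise that any single model latches onto.
ensemble = VotingRegressor([
    ("rf",  RandomForestRegressor(n_estimators=300, min_samples_leaf=20)),
    ("xt",  ExtraTreesRegressor(n_estimators=300, min_samples_leaf=20)),
    ("xgb", XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)),
])
ensemble.fit(X, y)
predicted_gain = ensemble.predict(X)
```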
It works! It looks to be an excellent system, but it is not likely to be affordable for small-net-worth investors.
On my very first try, months ago, I selected 77 mostly standard fundamental features of the kind typically used in screens. I trained 6 standard ML models (random forest, extra trees, XGBoost, DeepTables, and others) on 3.5 years of S&P MidCap data starting in 2004; the training data did not include the stock id, just the features and the subsequent 3-month gain. To avoid look-ahead data leakage, I tested the models on 2 years of data starting 6 months after the end of the training data. All the models performed better than the universe.
For the second fold, I again started in 2004, this time training on 5.5 years of data, skipped 6 months, and used the models to predict future gain over the next 2 years, with the same models (each retrained from scratch) and the same features. Again, all the retrained models outperformed the universe.
I repeated this 6 more times, giving 16 years of out-of-sample predictions, all of which achieved better-than-universe results. The expanding-window scheme is sketched in code below.
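For anyone who wants to replicate this protocol on their own data, here is a minimal sketch of the expanding-window loop: the first fold trains on 3.5 years, each later fold adds 2 more years, and every fold skips a 6-month embargo before its 2-year test window. The DataFrame layout, column names, and the random-forest stand-in are my assumptions for illustration, not P123 internals:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Stand-in panel: one row per (quarter, stock). Real features and realized
# 3-month forward gains would come from your data vendor.
dates = pd.date_range("2004-01-01", "2025-12-31", freq="QS")
rng = np.random.default_rng(0)
df = pd.DataFrame({"date": np.repeat(dates, 50)})
for i in range(5):                       # 5 toy features; I used 77 real ones
    df[f"x{i}"] = rng.normal(size=len(df))
df["fwd_gain_3m"] = rng.normal(size=len(df))

def walk_forward_folds(df, start="2004-01-01", first_train_years=3.5,
                       embargo_months=6, test_years=2, n_folds=8):
    """Expanding window: each fold adds 2 more years of training data,
    skips a 6-month embargo to avoid look-ahead, then tests on 2 years."""
    t0 = pd.Timestamp(start)
    for k in range(n_folds):
        train_end = t0 + pd.DateOffset(months=int(12 * (first_train_years + 2 * k)))
        test_start = train_end + pd.DateOffset(months=embargo_months)
        test_end = test_start + pd.DateOffset(years=test_years)
        yield (df[(df["date"] >= t0) & (df["date"] < train_end)],
               df[(df["date"] >= test_start) & (df["date"] < test_end)])

feature_cols = [f"x{i}" for i in range(5)]
for train, test in walk_forward_folds(df):
    model = RandomForestRegressor(n_estimators=100)   # fresh model each fold,
    model.fit(train[feature_cols], train["fwd_gain_3m"])  # retrained from scratch
    preds = model.predict(test[feature_cols])
```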
I did not have to change the models used or the features selected to achieve significant outperformance.
The outperformance held whether I selected the top 20, 40, 100, or 200 stocks.
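Concretely, "outperformed with N stocks" means: rank the test universe by predicted gain, hold the top N, and compare that bucket's realized mean gain with the universe mean. A tiny sketch reusing `test` and `preds` from the walk-forward code above (a proper run would aggregate this across all folds):

```python
def top_n_vs_universe(test, preds, n):
    """Mean realized 3-month gain of the n highest-predicted stocks
    versus the mean gain of the whole universe."""
    ranked = test.assign(pred=preds)
    return (ranked.nlargest(n, "pred")["fwd_gain_3m"].mean(),
            ranked["fwd_gain_3m"].mean())

for n in (20, 40, 100, 200):
    top, universe = top_n_vs_universe(test, preds, n)
    print(f"top {n}: {top:+.3f}  vs universe: {universe:+.3f}")
```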
Some of the models tell you which factors they found most valuable.
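For the tree-based models that just means reading off the built-in importance attribute (a quick look; permutation importance or SHAP would be more rigorous). Continuing the sketch above:

```python
import pandas as pd

# `model` and `feature_cols` are from the walk-forward sketch above.
importances = pd.Series(model.feature_importances_, index=feature_cols)
print(importances.sort_values(ascending=False).head(10))
```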
Some of the newer ML models significantly outperform the classics I used in my initial try, but they all work without my changing anything.