Portfolio Inputs Selection from Imprecise Training Data

Sarunas Raudys, Aistis Raudys, Gene Biziuleviciene, Zidrina Pabarskaite

This paper explores the acute problem of secondary overfitting of portfolios. We examine a random-selection optimization model for financial portfolio inputs and derive an equation for the mean Sharpe ratio as a function of three quantities: the number of portfolio inputs, the sample size L used to estimate the Sharpe ratio of each candidate subset of inputs, and the number of times the portfolio inputs were generated randomly. We demonstrate that overfitting appears as the complexity of the portfolio and of the optimization procedure increases. The theoretically derived conclusions were confirmed by experiments with artificial data and with real-world, 60,000-dimensional financial data covering 12 years.
Keywords: complexity, financial portfolio, overfitting, sample size, variable selection
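
The selection effect described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the equation derived in the paper; it is a toy simulation under assumptions made only for this example: returns are i.i.d. Gaussian with the same true mean for every asset, each candidate portfolio is equally weighted, and all names (`random_selection_experiment`, `n_assets`, `n_inputs`, `m`, `L_test`) are hypothetical. Because every subset is equally good by construction, any in-sample advantage of the selected subset is pure estimation noise.

```python
import numpy as np


def sharpe(returns):
    """Sharpe ratio of a return series (zero risk-free rate, not annualised)."""
    return returns.mean() / returns.std(ddof=1)


def random_selection_experiment(n_assets=200, n_inputs=10, L=100, m=500,
                                L_test=5000, seed=0):
    """Pick the best of m random input subsets by in-sample Sharpe ratio.

    Returns (in-sample Sharpe of the winner, its out-of-sample Sharpe).
    """
    rng = np.random.default_rng(seed)
    mu = 0.02  # assumed: identical true mean return for every asset
    train = rng.normal(mu, 1.0, size=(L, n_assets))      # sample of size L
    test = rng.normal(mu, 1.0, size=(L_test, n_assets))  # fresh validation data

    best_in, best_subset = -np.inf, None
    for _ in range(m):
        # Randomly generate one candidate set of portfolio inputs.
        subset = rng.choice(n_assets, size=n_inputs, replace=False)
        # Equally weighted portfolio of the chosen inputs.
        s = sharpe(train[:, subset].mean(axis=1))
        if s > best_in:
            best_in, best_subset = s, subset

    out = sharpe(test[:, best_subset].mean(axis=1))
    return best_in, out


if __name__ == "__main__":
    for m in (1, 10, 100, 1000):
        s_in, s_out = random_selection_experiment(m=m)
        print(f"m={m:5d}  in-sample={s_in:.3f}  out-of-sample={s_out:.3f}")
```

In this setting the in-sample Sharpe ratio of the selected subset can only grow as the number of random generations m increases, while the out-of-sample value has no reason to follow it. Reducing L or increasing m widens the gap, which is the qualitative behaviour the abstract attributes to secondary overfitting.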
