Portfolio Inputs Selection from Imprecise Training Data

Sarunas RAUDYS, Aistis RAUDYS

This paper explores the acute problem of secondary overfitting in portfolio
construction. We examine a model in which portfolio inputs are selected by
random search and derive an equation for the mean Sharpe ratio as a function
of the number of portfolio inputs, the sample size L used to estimate the
Sharpe ratio of each subset of inputs, and the number of times the subsets of
inputs are generated randomly. We show that overfitting appears as the
complexity of the portfolio and of the optimization procedure increases. The
theoretical conclusions are confirmed by experiments with artificial data and
with real-world 60,000-dimensional financial data covering 12 years.
Keywords: complexity, financial portfolio, overfitting, sample size, variable selection
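
The random selection procedure described in the abstract can be illustrated with a small simulation. This is a minimal sketch under our own assumptions, not the authors' model or their derived equation: the inputs are i.i.d. zero-mean Gaussian noise (so the true Sharpe ratio of every subset is 0), portfolios are equally weighted, and the names `random_selection`, `n_inputs`, `p`, `m` and `L_test` are hypothetical. Only the sample size L comes from the text.

```python
import random
import statistics


def sharpe(returns):
    """Sample Sharpe ratio (mean / standard deviation), not annualised."""
    sd = statistics.stdev(returns)
    return statistics.fmean(returns) / sd if sd > 0 else 0.0


def portfolio_returns(data, subset):
    """Equal-weight portfolio returns over the chosen input columns."""
    return [sum(row[j] for j in subset) / len(subset) for row in data]


def random_selection(n_inputs, p, L, m, rng, L_test=500):
    """Draw m random subsets of p inputs, keep the one with the best
    in-sample Sharpe ratio, and return (in-sample, out-of-sample) values."""
    # Assumption: pure-noise inputs, so any in-sample edge is spurious.
    train = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(L)]
    test = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(L_test)]
    best_subset, best_sr = None, float("-inf")
    for _ in range(m):
        subset = rng.sample(range(n_inputs), p)
        sr = sharpe(portfolio_returns(train, subset))
        if sr > best_sr:
            best_subset, best_sr = subset, sr
    return best_sr, sharpe(portfolio_returns(test, best_subset))


def average_over_runs(m, runs=10, seed=0):
    """Average in-sample and out-of-sample Sharpe ratios over repeated runs."""
    rng = random.Random(seed)
    results = [random_selection(30, 5, 50, m, rng) for _ in range(runs)]
    return (statistics.fmean(r[0] for r in results),
            statistics.fmean(r[1] for r in results))


if __name__ == "__main__":
    for m in (1, 10, 100):
        in_sr, out_sr = average_over_runs(m)
        print(f"m={m:4d}  in-sample={in_sr:+.3f}  out-of-sample={out_sr:+.3f}")
```

In this toy setting the in-sample Sharpe ratio of the selected subset tends to grow with the number of random trials m, while the out-of-sample value stays near zero. That gap is the overfitting effect the abstract refers to; the sketch says nothing about how it scales on real data.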


The journal is published continuously online.
The primary form of the journal is the electronic version.

The print version of the journal is available at www.wuj.pl