Subspace Memory Clustering

Łukasz Struski,

Jacek Tabor,

Przemysław Spurek

Abstract

We present a new subspace clustering method called SuMC (Subspace Memory Clustering), which efficiently divides a dataset $D \subset \mathbb{R}^N$ into $k \in \mathbb{N}$ pairwise disjoint clusters of possibly different dimensions. Since our approach is based on memory compression, we do not need to specify the dimensions of the groups explicitly; we only need to specify the mean number of scalars used to describe a data point. In the case of one cluster, our method reduces to the classical Karhunen-Loève (PCA) transform. We test our method on typical data from the UCI repository and on data from real-life experiments.
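
To make the memory budget concrete, the following minimal sketch computes the mean number of scalars per data point for a given partition. It is an illustration only, not the SuMC algorithm itself: the function name `mean_scalars_per_point`, the example partition, and the storage model (a point in a cluster of dimension d costs d coordinates, with the cost of storing the subspaces ignored) are assumptions not taken from the abstract.

```python
from fractions import Fraction


def mean_scalars_per_point(cluster_sizes, cluster_dims):
    """Average number of coordinates stored per data point.

    Assumption (not from the source): a point assigned to a cluster whose
    subspace has dimension d is stored as d coordinates, and the cost of
    storing the subspaces themselves is ignored.
    """
    if len(cluster_sizes) != len(cluster_dims):
        raise ValueError("one dimension per cluster is required")
    total_points = sum(cluster_sizes)
    if total_points == 0:
        raise ValueError("dataset is empty")
    # Total memory: each cluster contributes (number of points) * (dimension).
    total_scalars = sum(n * d for n, d in zip(cluster_sizes, cluster_dims))
    return Fraction(total_scalars, total_points)


# Hypothetical partition of 100 points into three clusters
# of dimensions 1, 2 and 4.
sizes = [50, 30, 20]
dims = [1, 2, 4]
print(float(mean_scalars_per_point(sizes, dims)))  # 1.9
```

Under this storage model, many different combinations of cluster dimensions can meet the same mean budget, which is one way to read the claim that the dimensions of the individual groups need not be fixed in advance.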

Keywords: subspace clustering, projection clustering, PCA

The journal is published continuously online.
The electronic version is the primary form of the journal.

The print version of the journal is available at www.wuj.pl