Objectives
Clustering real-world data sets is often hampered by the so-called curse of dimensionality: many real-world data sets consist of a very high dimensional feature space. In general, most of the common algorithms fail to generate meaningful results because of the inherent sparsity of the data space. Usually, clusters cannot be found in the original feature space because several features may be irrelevant for clustering due to correlation and/or redundancy. However, clusters are usually embedded in lower dimensional subspaces. In addition, different sets of features may be relevant for different sets of objects. Thus, objects can often be clustered differently in varying subspaces of the original feature space.
Techniques
Arbitrarily-oriented Correlation Clustering
Correlation clustering aims at partitioning the data objects into distinct sets of points that exhibit a dense, arbitrarily linear correlation, so called correlation sets. Points that do not belong to any correlation set are classified as noise.
4C (Computing Correlation Connected Clusters) 4C computes so-called correlation connected clusters, i.e.sets of neighboring points that exhibit an arbitrary linear correlation in a subset of the attributes. For that purpose, 4C examines the correlation in the neighborhood of each point and groups points with similar local correlation into clusters. | |
COPAC (COrrelation PArtition Clustering) Description | |
HICO (HIrachical Correlation Clustering) Description | |
ERiC (Exploring Relationships among Correlation clusters) Description | |
Increasing the Robustness of PCA-based Algorithms Description | |
CASH (Clustering in Arbitrary Subspaces based on the Hough transform) Description |
Axis-parallel Projected Clustering
Projected clustering aims at partitioning the data into a set of clusters and a noise set while allowing the clusters to exist in different subspaces. Thus, for each cluster, a different set of features is relevant, whith the remaining features being irrelevant. The points of each cluster are densely packed in the subspace spanned by the corresponding relevant features but are sparsely distributed in the subspace spanned by the other dimensions.
PreDeCon (subspace PREference weighted DEnsity CONnnected clustering) PreDeCon extends the 'full-dimensional' clustering algorithm DBSCAN with the concept of subspace preference weights. For each point, the local neighborhood is examined to determine a subspace tendency, i.e. an axis parallel projection onto which the point is located in a dense area. If the point is noise, this subspace is the entire feature space. Neighboring points having a similar subspace preference weight are grouped into clusters using the density-connected clustering notion. | |
HiSC (HIerarchical Subspace Clustering) Description | |
DiSH (DerIving Subspace Hierarchies) Description |
Axis-parallel Subspace Clustering
Subspace clustering aims at computing all clusters in all subspaces of the feature space. The information of objects clustered differently in varying subspaces is conserved. Objects may be assigned to several clusters (in different subspaces). Subspace ranking aims at identifying all subspaces of a (high dimensional) feature space that contain interesting clustering structures. The subspaces should be ranked according to this interestingness.
SUBCLU (density-based SUBspace CLUstering) SUBCLU is a subspace clustering algorithm that relies on the density-based clustering notion of DBSCAN. It efficiently computes all clusters DBSCAN would have found if applied to the set of all possible subspaces of the feature space. SUBCLU uses an Apriori-like subspace generation procedure which is based on the monotonicity property of density-connected sets and which ensures that only those subspaces are examined that contain clusters. | |
RIS (Ranking Interesting Subspaces) RIS is a subspace ranking algorithm that uses a quality criterion to rate the interestingness of subspaces. This criterion is based on the monotonicity of core points which are the central concept of the density-based clustering notion of DBSCAN. An Apriori-like subspace generation method (similar to SUBCLU) is used to compute all relevant subspaces and rank them by interestingness. The clusters can be computed in the generated subspaces using any clustering method of choice. | |
SURFING (SUbspaces Relevant For clusterING) SURFING is a subspace ranking algorithm that does not rely on a global density threshold. It computes the interstingness of a subspace based on the distribution of the k-nearest neighbors of all data points in the corresponding projection. An efficient, bottom-up subspace expansion heuristics ensures that less interesting subspaces are not generated for examination. | |
FIRES (FIlter/REfinement Subspace clustering) Description |
Publications
2015 | |
36 | A. Zimek, J. Vreeken The Blind Men and the Elephant: On Meeting the Problem of Multiple Truths in Data from Clustering and Pattern Mining Perspectives Machine Learning, 98(1–2): 121–155, 2015. |
2014 | |
35 | A. Zimek, I. Assent, J. Vreeken Frequent Pattern Mining Algorithms for Data Clustering In C. C. Aggarwal, J. Han (ed.): Frequent Pattern Mining, Springer: 403–423, 2014. |
2013 | |
34 | K. Sim, V. Gopalkrishnan, A. Zimek, G. Cong A survey on enhanced subspace clustering Data Mining and Knowledge Discovery, 26(2): 332–397, 2013. |
33 | A. Zimek Clustering High-Dimensional Data In C. C. Aggarwal, C. K. Reddy (ed.): Data Clustering: Algorithms and Applications, CRC Press: 201–230, 2013. |
32 | E. Achtert, H.-P. Kriegel, E. Schubert, A. Zimek Interactive Data Mining with 3D-Parallel-Coordinate-Trees In Proceedings of the ACM International Conference on Management of Data (SIGMOD), New York City, NY: 1009–1012, 2013. |
2012 | |
31 | H.-P. Kriegel, P. Kröger, A. Zimek Subspace Clustering Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(4): 351–364, 2012. |
30 | E. Ntoutsi, A. Zimek, T. Palpanas, P. Kröger, H.-P. Kriegel Density-based Projected Clustering over High Dimensional Data Streams In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA: 987–998, 2012. |
2011 | |
29 | H.-P. Kriegel, P. Kröger, E. Ntoutsi, A. Zimek Density Based Subspace Clustering over Dynamic Data In Proceedings of the 23rd International Conference on Scientific and Statistical Database Management (SSDBM), Portland, OR: 387–404, 2011. |
28 | H.-P. Kriegel, E. Ntoutsi, M. Spiliopoulou, G. Tsoumakas, A. Zimek Mining Complex Dynamic Data Tutorial at the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), Athens, Greece, 2011. |
2010 | |
27 | I. Färber, S. Günnemann, H.-P. Kriegel, P. Kröger, E. Müller, E. Schubert, T. Seidl, A. Zimek On Using Class-Labels in Evaluation of Clusterings In MultiClust: 1st International Workshop on Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with KDD 2010, Washington, DC, 2010. |
26 | H.-P. Kriegel, A. Zimek Subspace Clustering, Ensemble Clustering, Alternative Clustering, Multiview Clustering: What Can We Learn From Each Other? In MultiClust: 1st International Workshop on Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with KDD 2010, Washington, DC, 2010. |
2009 | |
25 | H.-P. Kriegel, P. Kröger, A. Zimek Clustering High Dimensional Data: A Survey on Subspace Clustering, Pattern-based Clustering, and Correlation Clustering ACM Transactions on Knowledge Discovery from Data (TKDD), 3(1): 1–58, 2009. |
24 | A. Zimek Correlation Clustering ACM SIGKDD Explorations, 11(1): 53–54, 2009. |
23 | G. Moise, A. Zimek, P. Kröger, H.-P. Kriegel, J. Sander Subspace and Projected Clustering: Experimental Evaluation and Analysis Knowledge and Information Systems (KAIS), 21(3): 299–326, 2009. |
22 | P. Kröger, A. Zimek Subspace Clustering Techniques In L. Liu, M. T. Özsu (ed.): Encyclopedia of Database Systems, Springer: 2873–2875, 2009. |
2008 | |
21 | H.-P. Kriegel, P. Kröger, A. Zimek Detecting clusters in moderate-to-high dimensional data: subspace clustering, pattern-based clustering, and correlation clustering Proceedings of the VLDB Endowment, 1(2): 1528–1529, 2008. |
20 | E. Achtert, C. Böhm, J. David, P. Kröger, A. Zimek Global Correlation Clustering Based on the Hough Transform Statistical Analysis and Data Mining, 1(3): 111–127, 2008. |
19 | H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek A General Framework for Increasing the Robustness of PCA-based Correlation Clustering Algorithms In Proceedings of the 20th International Conference on Scientific and Statistical Database Management (SSDBM), Hong Kong, China: 418–435, 2008. |
18 | E. Achtert, H.-P. Kriegel, A. Zimek ELKI: A Software System for Evaluation of Subspace Clustering Algorithms In Proceedings of the 20th International Conference on Scientific and Statistical Database Management (SSDBM), Hong Kong, China: 580–585, 2008. |
17 | E. Achtert, C. Böhm, J. David, P. Kröger, A. Zimek Robust Clustering in Arbitrarily Oriented Subspaces In Proceedings of the 8th SIAM International Conference on Data Mining (SDM), Atlanta, GA: 763–774, 2008. |
16 | H.-P. Kriegel, P. Kröger, A. Zimek Detecting Clusters in Moderate-to-High Dimensional Data: Subspace Clustering, Pattern-based Clustering, and Correlation Clustering Tutorial at the 14th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Las Vegas, NV, 2008. |
15 | H.-P. Kriegel, P. Kröger, A. Zimek Detecting Clusters in Moderate-to-High Dimensional Data: Subspace Clustering, Pattern-based Clustering, and Correlation Clustering Tutorial at the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Osaka, Japan, 2008. |
14 | H.-P. Kriegel, P. Kröger, A. Zimek Detecting Clusters in Moderate-to-High Dimensional Data: Subspace Clustering, Pattern-based Clustering, and Correlation Clustering Tutorial at the 34nd International Conference on Very Large Data Bases (VLDB), Auckland, New Zealand, 2008. |
13 | A. Zimek Correlation Clustering PhD Thesis, Ludwig-Maximilians-Universität München, Munich, Germany, 2008. |
2007 | |
12 | E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, I. Müller-Gorman, A. Zimek Detection and Visualization of Subspace Cluster Hierarchies In Proceedings of the 12th International Conference on Database Systems for Advanced Applications (DASFAA), Bangkok, Thailand: 152–163, 2007. |
11 | E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, A. Zimek On Exploring Complex Relationships of Correlation Clusters In Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM), Banff, Canada: 7–16, 2007. |
10 | E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, A. Zimek Robust, Complete, and Efficient Correlation Clustering In Proceedings of the 7th SIAM International Conference on Data Mining (SDM), Minneapolis, MN: 413–418, 2007. |
9 | H.-P. Kriegel, P. Kröger, A. Zimek Detecting Clusters in Moderate-to-High Dimensional Data: Subspace Clustering, Pattern-based Clustering, and Correlation Clustering Tutorial at the 7th International Conference on Data Mining (ICDM), Omaha, NE, 2007. |
2006 | |
8 | E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, I. Müller-Gorman, A. Zimek Finding Hierarchies of Subspace Clusters In Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), Berlin, Germany: 446–453, 2006. |
7 | E. Achtert, C. Böhm, P. Kröger, A. Zimek Mining Hierarchies of Correlation Clusters In Proceedings of the 18th International Conference on Scientific and Statistical Database Management (SSDBM), Vienna, Austria: 119–128, 2006. |
2005 | |
6 | H.-P. Kriegel, P. Kröger, M. Renz, S. Wurst A Generic Framework for Efficient Subspace Clustering of High-Dimensional Data In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM), Houston, TX: 250–257, 2005. |
2004 | |
5 | C. Böhm, K. Kailing, P. Kröger, H.-P. Kriegel Immer größere und komplexere Datenmengen: Herausforderungen für Clustering-Algorithmen Datenbank-Spektrum, 4(9): 11–17, 2004. |
4 | C. Böhm, K. Kailing, H.-P. Kriegel, P. Kröger Density Connected Clustering with Local Subspace Preferences In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM), Brighton, UK: 27–34, 2004. |
3 | C. Baumgartner, K. Kailing, H.-P. Kriegel, P. Kröger, C. Plant Subspace Selection for Clustering High-Dimensional Data In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM), Brighton, UK: 11–18, 2004. |
2 | K. Kailing, H.-P. Kriegel, P. Kröger Density-Connected Subspace Clustering for High-Dimensional Data In Proceedings of the 4th SIAM International Conference on Data Mining (SDM), Lake Buena Vista, FL: 246–257, 2004. |
1 | C. Böhm, K. Kailing, P. Kröger, A. Zimek Computing Clusters of Correlation Connected Objects In Proceedings of the ACM International Conference on Management of Data (SIGMOD), Paris, France: 455–466, 2004. |
Team
Scientific Head: | Prof. Dr. Hans-Peter Kriegel |
Project Leaders: | |
Current Members: |