Workshop Abstract

Despite advances in data clustering techniques, the literature still lacks a unified framework capable of handling the ill-posed nature of clustering, the high dimensionality and often multi-faceted nature of the data, and the opportunity to exploit application-driven and user-provided knowledge. Such a framework would enable clustering solutions for a variety of tasks and applications, including data integration, topic detection and tracking, evolving data management, collaborative filtering, document classification and retrieval, and Web data and social network analysis.

We solicit original papers (including work in progress) that help narrow the aforementioned research gap in data clustering. In particular, we solicit approaches for solving emerging problems such as clustering ensembles, semi-supervised clustering, subspace/projective clustering, co-clustering, and multi-view clustering. Of particular interest are papers that draw new and insightful connections between these techniques, and papers that contribute to a unified framework combining two or more of them.

Important Dates

Submission deadline: January 13, 2012
Acceptance notification: February 10, 2012
Camera-ready deadline: February 24, 2012

Workshop Theme and Topics

Clustering is a key step in many data/knowledge management and mining tasks whose aim is to discover unknown relationships and/or patterns in large sets of data. A considerable amount of work has been done on data clustering during the last four decades, and a myriad of methods have been proposed, focusing on different data types, proximity functions, cluster representation models, and cluster presentation. Clustering is a challenging problem due to its ill-posed nature. It is well known that different off-the-shelf clustering methods may discover different patterns in the same set of data. This is because each clustering algorithm has its own bias, resulting from the optimization of a different criterion.

The workshop aims to solicit and discuss the latest advances in data clustering research for solving emerging and challenging issues concerning three major themes: (i) multi-view data, (ii) high-dimensionality, and (iii) external knowledge. These themes and their interplay are visualized in Figure 1.

Figure 1 – Concept map of major research themes in advanced data clustering

Topics of interest:

Clustering Ensembles
Co-clustering Ensembles
Subspace/Projective Clustering
Semi-supervised Clustering
Multiview/Alternative Clustering
Combining Clustering Ensembles/Multiview Clustering and Subspace Clustering/Co-clustering
Combining Clustering Ensembles/Multiview Clustering and Semi-supervised Clustering
Combining Subspace Clustering/Co-clustering and Semi-supervised Clustering
Bayesian Learning for Clustering
Model Selection Issues: How Many Clusters?
Multiview and Clustering Ensembles: How Many Clusterings?
Co-clustering with External Knowledge for Relational Learning
Probabilistic Clustering with Constraints
Kernels for Semi-supervised Clustering
Active Learning of Constraints in Clustering Ensembles
Clustering Ensembles for Uncertain Data Management and Mining
Constraint-based Clustering for Uncertain Data Management and Mining
Integration of Frequent Pattern Mining in (Semi-supervised) Multi-view Clustering
Evaluation Criteria for Multi-view Data Clustering
Incorporating User Feedback in Semi-supervised Clustering

Workshop Goals and Expected Outcome

Most existing approaches to data clustering provide a single clustering solution and/or use the same (typically very large) space of attributes to represent all clusters. However, in several real-life domains, data can be explained according to different views. The high dimensionality of the data poses a further challenge to the clustering process. Almost all problems of practical interest are high-dimensional, and data with thousands of dimensions abound in many fields and applications. A common scenario with high-dimensional data is that several clusters exist in different subspaces composed of different combinations of features. In many real-world problems, points in a given region of the input space may cluster along a given set of dimensions, while points located in another region may form a tight group with respect to different dimensions. Each dimension could be relevant to at least one of the clusters. Multiple clustering solutions can also be hidden in projections of the data.

The development of advanced techniques for clustering data has become important in both academia and industry. As a result, this workshop will survey the emerging fields in data clustering with issues related to the high-dimensionality of the data, the multi-view nature of the data, and the use of application-driven and user-specified knowledge that can support the clustering process.

Therefore, the proposed workshop intends to provide a venue for active researchers to share their expertise in advanced data clustering fields, address open questions, identify emerging trends and challenges in those fields, and explore unified approaches to clustering problems.

Through the careful selection and review of submitted workshop papers, we aim to provide a suitable stage for discussion that will both generate follow-up interest and push forward the state of the art in data clustering.

The proposed workshop is mainly geared towards researchers in the Data Mining, Machine Learning, Pattern Recognition, and Artificial Intelligence areas who are particularly (but not exclusively) concerned with issues in data clustering.

Workshop Program Format (tentative)

The proposed workshop would be organized in three sessions, centered around the main research themes of the proposal, namely “Clustering Ensembles and Multi-View Clustering”, “Subspace/Projective and Co-Clustering”, and “Semi-supervised Clustering”. One or more sessions would include papers at the intersection of two or more research themes. We anticipate one or more invited talks to introduce the audience to the main focus of each session.

We also plan to organize a discussion panel, which would prompt the workshop participants to highlight current challenges and explore future directions of research.

Workshop Chairs and PC

Carlotta Domeniconi, George Mason University, USA
Francesco Gullo, University of Calabria, Italy
Andrea Tagarelli, University of Calabria, Italy

Program Committee

To be announced soon

Submission Instructions and Policy

Papers submitted to this workshop should have a maximum length of 12 pages and be formatted according to the Springer-Verlag Lecture Notes in Artificial Intelligence guidelines. Author instructions and style files can be downloaded at http://www.springer.de/comp/lncs/authors.html. All papers (in PDF format) should be submitted via the Microsoft CMT system (the submission site URL will be announced soon).

As required by the PAKDD 2012 Workshop Co-Chairs, by submitting a paper to the workshop, the authors promise that, if the paper is accepted, at least one author will attend the workshop to present the paper. The affiliations of no-show authors will be notified. Each workshop has the right to include its outstanding papers in an LNCS/LNAI post-proceedings volume of the PAKDD Workshops published by Springer. Under the program, the workshop chairs will organize a review committee to select the outstanding papers from those presented at the workshop. Based on the reviews, each selected paper should be further improved for the camera-ready version. A detailed schedule of due dates for the paper selection and the collection of the camera-ready versions will be announced immediately after the workshop.

Every submitted paper will be subject to peer review by at least three reviewers selected from the PC. While we do not plan a double-blind reviewing process, we will use blind bidding to prevent reviewers’ conflicts of interest when bidding.

3Clust Workshop at PAKDD 2012

In conjunction with the 16th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2012), May 29 – June 1, 2012, Kuala Lumpur, Malaysia.
