Clustering high dimensional categorical data via topographical features

Chen, Chao and Quadrianto, Novi (2016) Clustering high dimensional categorical data via topographical features. Published in: Balcan, Maria Florina and Weinberger, Kilian Q (eds.), Proceedings of the 33rd International Conference on Machine Learning, New York, 19-24 June 2016. Volume 48, pp. 2732-2740. JMLR. ISSN 1938-7288

PDF - Accepted Version
Available under License Creative Commons Attribution.

Abstract

Analysis of categorical data is a challenging task. In this paper, we propose to compute topographical features of high-dimensional categorical data. We propose an efficient algorithm to extract the modes of the underlying distribution and their basins of attraction. These topographical features provide a geometric view of the data and can be applied to the visualization and clustering of challenging real-world datasets. Experiments show that our principled method outperforms state-of-the-art clustering methods while also being embarrassingly parallel.
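
The abstract does not spell out the algorithm, so the following is only a generic sketch of the underlying idea under stated assumptions, not the authors' method. It estimates a density over categorical records with a hypothetical Hamming-distance kernel (weight `exp(-lam * distance)`). It then hill-climbs from each record to a local maximum (a mode), moving one attribute at a time. Records that reach the same mode form one basin of attraction, and each basin is treated as a cluster. The kernel, the parameter `lam`, the one-attribute neighbourhood, and all function names are illustrative assumptions. Because each climb is independent of the others, the outer loop could be distributed across workers, which is one way such a procedure can be embarrassingly parallel.

```python
import math
from collections import defaultdict


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def density(x, data, lam=1.0):
    """Assumed (unnormalized) kernel density: sum of exp(-lam * Hamming distance)."""
    return sum(math.exp(-lam * hamming(x, d)) for d in data)


def neighbors(x, domains):
    """Yield every record that differs from x in exactly one attribute."""
    for j, values in enumerate(domains):
        for v in values:
            if v != x[j]:
                yield x[:j] + (v,) + x[j + 1:]


def climb(x, data, domains, lam=1.0):
    """Steepest-ascent hill climbing from x; returns the local maximum (mode) reached."""
    current, f_cur = x, density(x, data, lam)
    while True:
        best, f_best = current, f_cur
        for y in neighbors(current, domains):
            f_y = density(y, data, lam)
            if f_y > f_best:  # strict increase guarantees termination
                best, f_best = y, f_y
        if best == current:
            return current
        current, f_cur = best, f_best


def cluster_by_basins(data, lam=1.0):
    """Group record indices by the mode their climb ends at (their basin)."""
    # Value domain of each attribute, taken from the observed records.
    domains = [sorted(set(col)) for col in zip(*data)]
    basins = defaultdict(list)
    # Each climb is independent, so this loop is embarrassingly parallel.
    for i, x in enumerate(data):
        basins[climb(x, data, domains, lam)].append(i)
    return dict(basins)


if __name__ == "__main__":
    records = [("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
               ("b", "z", "r"), ("b", "z", "s"), ("c", "z", "r")]
    print(cluster_by_basins(records))
```

On this six-record toy input, the first three records climb to the mode `("a", "x", "p")` and the last three climb to `("b", "z", "r")`, giving two basins. Exhaustively scoring every one-attribute neighbour is only practical for small attribute domains. A method aimed at high-dimensional data would need a cheaper search or density estimate.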

Item Type: Conference Proceedings
Schools and Departments: School of Engineering and Informatics > Informatics
Subjects: Q Science > QA Mathematics > QA0276 Mathematical statistics
Depositing User: Novi Quadrianto
Date Deposited: 03 Jun 2016 12:15
Last Modified: 16 Jun 2017 08:18
URI: http://sro.sussex.ac.uk/id/eprint/61285