Clustering high dimensional categorical data via topographical features

Chen, Chao and Quadrianto, Novi (2016) Clustering high dimensional categorical data via topographical features. International Conference on Machine Learning, New York, New York, USA, 20-22 June 2016. Published in: Balcan, Maria Florina and Weinberger, Kilian Q. (eds.) Proceedings of the 33rd International Conference on Machine Learning; New York; 19-24 June 2016. Vol. 48, pp. 2732-2740. JMLR. ISSN 1938-7288

PDF - Accepted Version
Available under License Creative Commons Attribution.

Abstract

Analysis of categorical data is a challenging task. In this paper, we propose to compute topographical features of high-dimensional categorical data. We propose an efficient algorithm to extract modes of the underlying distribution and their attractive basins. These topographical features provide a geometric view of the data and can be applied to visualization and clustering of challenging real-world datasets. Experiments show that our principled method outperforms state-of-the-art clustering methods while also admitting an embarrassingly parallel property.
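
To make the idea of modes and attractive basins concrete, the following is a minimal illustrative sketch and not the algorithm from the paper, whose details are not given in this record. It assumes a simple density estimate (the number of observations within Hamming distance 1 of a point) and a greedy hill-climb restricted to observed points. The function names `hamming`, `density`, `climb`, and `find_basins`, the `radius` parameter, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def density(point, counts, radius=1):
    """Assumed density estimate: observations within `radius` of `point`."""
    return sum(c for q, c in counts.items() if hamming(point, q) <= radius)


def climb(point, counts, radius=1):
    """Greedy ascent over observed points that differ in exactly one attribute.

    Stops at a point with no strictly denser neighbour; that point is
    treated as a mode in this sketch.
    """
    current = point
    current_score = density(current, counts, radius)
    while True:
        neighbours = [q for q in counts if hamming(current, q) == 1]
        if not neighbours:
            return current
        best = max(neighbours, key=lambda q: (density(q, counts, radius), q))
        best_score = density(best, counts, radius)
        if best_score <= current_score:
            return current
        current, current_score = best, best_score


def find_basins(data, radius=1):
    """Map each mode to the distinct observed points that climb to it."""
    counts = Counter(tuple(row) for row in data)
    basins = defaultdict(list)
    for point in counts:
        # Each climb is independent of the others, so this loop could be
        # distributed across workers without coordination.
        basins[climb(point, counts, radius)].append(point)
    return dict(basins)


# Hypothetical toy data: two groups of records over three attributes.
data = (
    [("a", "x", "p")] * 3 + [("a", "x", "q"), ("b", "x", "p")]
    + [("c", "z", "r")] * 3 + [("c", "z", "s"), ("d", "z", "r")]
)
basins = find_basins(data)
for mode, members in basins.items():
    print(mode, "<-", sorted(members))
```

In this sketch the basin of a mode doubles as a cluster, and because every starting point is climbed independently, the loop in `find_basins` illustrates in a small way what an embarrassingly parallel procedure looks like.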

Item Type: Conference Proceedings
Schools and Departments: School of Engineering and Informatics > Informatics
Subjects: Q Science > QA Mathematics > QA0276 Mathematical statistics
Depositing User: Novi Quadrianto
Date Deposited: 03 Jun 2016 12:15
Last Modified: 26 Nov 2021 15:28
URI: http://sro.sussex.ac.uk/id/eprint/61285
