On a fuzzy c-means algorithm for mixed incomplete data using partial distance and imputation

Furukawa T., Ohnishi S., Yamanoi T.

International MultiConference of Engineers and Computer Scientists, IMECS 2014, Kowloon, Hong Kong, 12-14 March 2014, vol.2209, pp.319-323

Publication Type: Conference Paper / Full Text
Volume: 2209
City of Publication: Kowloon
Country: Hong Kong
Pages: pp.319-323
Keywords: Clustering, Fuzzy c-means, Incomplete data, Mixed data, Partial distance
Open Archive Collection: Conference Paper
Affiliated: Yes

Abstract

The fuzzy c-means clustering method is normally applied to numerical data. However, most data stored in databases are a mix of categorical and numerical values. Moreover, clustering methods have to date largely been developed for complete data, so traditional methods cannot be applied directly to data sets in which one or more feature values are missing (incomplete data), even though such data sets are sometimes encountered. We therefore study clustering methods that can handle incomplete data with mixed numerical and categorical features. In this paper, we propose an algorithm that combines an imputation method for missing categorical values with distances between numerical data that contain missing values. Finally, we show through numerical experiments that the proposed method is applicable to real data.
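
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the standard squared partial distance for incomplete numerical vectors (the sum over observed features, rescaled by the ratio of total to observed features) and uses simple mode imputation as a stand-in for the categorical imputation step, whose details the abstract does not give. The function names `partial_distance` and `impute_mode` are hypothetical, and missing values are represented by `None`.

```python
from collections import Counter
from typing import List, Optional, Sequence


def partial_distance(x: Sequence[Optional[float]], v: Sequence[float]) -> float:
    """Squared partial distance between an incomplete point x and a center v.

    Missing entries of x are None. Only observed features contribute, and the
    sum is rescaled by (total features / observed features).
    """
    observed = [(xj, vj) for xj, vj in zip(x, v) if xj is not None]
    if not observed:
        raise ValueError("x has no observed features")
    total = sum((xj - vj) ** 2 for xj, vj in observed)
    return len(x) / len(observed) * total


def impute_mode(column: Sequence[Optional[str]]) -> List[str]:
    """Fill missing categorical values (None) with the most frequent observed value.

    Assumption: mode imputation is only a stand-in; the paper's own
    imputation method is not described in the abstract.
    """
    counts = Counter(c for c in column if c is not None)
    if not counts:
        raise ValueError("column has no observed values")
    mode = counts.most_common(1)[0][0]
    return [mode if c is None else c for c in column]
```

For example, `partial_distance([1.0, None, 3.0], [0.0, 5.0, 1.0])` uses two of the three features, giving (3/2) × (1 + 4) = 7.5, and `impute_mode(["a", None, "b", "a"])` fills the gap with `"a"`.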