On a fuzzy c-means algorithm for mixed incomplete data using partial distance and imputation


Furukawa T., Ohnishi S., Yamanoi T.

International MultiConference of Engineers and Computer Scientists, IMECS 2014, Kowloon, Hong Kong, 12 - 14 March 2014, vol.2209, pp.319-323

  • Publication Type: Conference Paper / Full Text
  • Volume: 2209
  • City of Publication: Kowloon
  • Country: Hong Kong
  • Pages: pp.319-323
  • Keywords: Clustering, Fuzzy c-means, Incomplete data, Mixed data, Partial distance
  • Open Archive Collection: Conference Material
  • Affiliated: Yes

Abstract

Fuzzy c-means clustering is normally applied to numerical data. However, most data stored in databases contain both categorical and numerical attributes. In addition, clustering methods to date have been developed to analyze only complete data, so traditional methods cannot be used on the data sets we sometimes encounter in which one or more feature values are missing (incomplete data). We therefore study clustering methods that can handle mixed numerical and categorical incomplete data. In this paper, we propose an algorithm that combines an imputation method for missing categorical values with distances between numerical data that contain missing values (partial distances). Finally, we show through numerical experiments that the proposed method is applicable to real data.
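
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. The partial distance below is the common textbook form: squared differences summed over the observed features only, then rescaled by the ratio of total to observed features. The abstract does not say which categorical imputation method the paper uses, so most-frequent-value imputation stands in as an assumption. The function names `partial_distance_sq` and `impute_mode`, and the use of `None` to mark a missing value, are hypothetical choices made for this illustration.

```python
from collections import Counter


def partial_distance_sq(x, v):
    """Squared partial distance between a point x and a cluster centre v.

    Missing numerical features in x are marked with None (an assumption of
    this sketch). Only observed features contribute to the sum, which is then
    rescaled by (total features / observed features).
    """
    observed = [(xj, vj) for xj, vj in zip(x, v) if xj is not None]
    if not observed:
        raise ValueError("point has no observed features")
    total = sum((xj - vj) ** 2 for xj, vj in observed)
    return len(x) / len(observed) * total


def impute_mode(column):
    """Fill missing categorical values (None) with the most frequent value.

    This is a stand-in imputation rule, not necessarily the one in the paper.
    """
    observed = [c for c in column if c is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if c is None else c for c in column]


if __name__ == "__main__":
    # One of three numerical features is missing in the data point.
    print(partial_distance_sq([1.0, None, 3.0], [0.0, 5.0, 1.0]))
    # One categorical value is missing in the column.
    print(impute_mode(["a", None, "b", "a"]))
```

For a point with no missing values, the rescaling factor is 1 and the partial distance reduces to the ordinary squared Euclidean distance, so complete and incomplete records can share the same fuzzy c-means update loop.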