Journal article. Computational Statistics and Data Analysis, Year: 2020

Model-based co-clustering for mixed type data

Abstract

The importance of clustering for creating groups of observations is well known. The emergence of high-dimensional data sets with a huge number of features has led to co-clustering techniques, and several methods have been developed for simultaneously producing groups of observations and groups of features. By partitioning the data set into blocks (each block being the crossing of a row-cluster and a column-cluster), these techniques can sometimes better summarize the data set and its inherent structure. The Latent Block Model (LBM) is a well-known method for performing co-clustering. However, data sets whose features are of different types (here called mixed type data sets) are becoming increasingly common, and the LBM is not directly applicable to them. Here a natural extension of the usual LBM, the "Multiple Latent Block Model" (MLBM), is proposed in order to handle mixed type data sets. Inference is performed using a Stochastic EM algorithm that embeds a Gibbs sampler and allows for missing data. A model selection criterion is defined to choose the number of row clusters and column clusters. The method is then applied to both simulated and real data sets.
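
To make the notion of a block concrete, the following minimal sketch summarizes a small numeric table by the mean of each block, given row and column partitions. It is only an illustration of the block structure, not the MLBM or its inference procedure: the function name `block_means`, the toy data, and the fixed labels are all assumptions introduced here, and skipping missing cells is a simplification rather than the paper's treatment of missing data.

```python
def block_means(data, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Mean of each block (row cluster k crossed with column cluster l).

    Hypothetical helper for illustration: the labels are assumed to be
    given, whereas a co-clustering method would estimate them.
    """
    sums = [[0.0] * n_col_clusters for _ in range(n_row_clusters)]
    counts = [[0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is None:  # missing cell: skipped (a simplification)
                continue
            k, l = row_labels[i], col_labels[j]
            sums[k][l] += value
            counts[k][l] += 1
    # A block with no observed cell has no mean; report None for it.
    return [
        [sums[k][l] / counts[k][l] if counts[k][l] else None
         for l in range(n_col_clusters)]
        for k in range(n_row_clusters)
    ]


# Toy table: 3 observations (rows) by 3 features (columns), one missing cell.
data = [
    [1.0, 1.0, 5.0],
    [1.0, None, 5.0],
    [9.0, 9.0, 2.0],
]
row_labels = [0, 0, 1]  # rows 0 and 1 share a row cluster
col_labels = [0, 0, 1]  # columns 0 and 1 share a column cluster

summary = block_means(data, row_labels, col_labels, 2, 2)
print(summary)
```

Here the 3 x 3 table is reduced to a 2 x 2 grid of block means, which is the kind of compact summary the abstract refers to.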
Main file
manuscript.pdf (616.99 KB)
Origin: files produced by the author(s)

Dates and versions

hal-01893457, version 1 (11-10-2018)
hal-01893457, version 2 (11-10-2019)


Cite

Margot Selosse, Julien Jacques, Christophe Biernacki. Model-based co-clustering for mixed type data. Computational Statistics and Data Analysis, 2020, 144, pp.106866. ⟨10.1016/j.csda.2019.106866⟩. ⟨hal-01893457v2⟩