Model-based co-clustering for mixed type data - INRIA - Institut National de Recherche en Informatique et en Automatique
Preprint, Working Paper. Year: 2018

Model-based co-clustering for mixed type data

Abstract

For decades, numerous studies have shown the importance of clustering for revealing groups of observations. More recently, with the emergence of high-dimensional datasets containing very large numbers of features, co-clustering techniques have been proposed to produce groups of observations and groups of features simultaneously. By summarizing the dataset in blocks (each block being the crossing of a row cluster and a column cluster), co-clustering can sometimes give a better summary of the data and its inherent structure. The Latent Block Model (LBM) is a well-known method for co-clustering. However, datasets with features of different types (here called mixed type datasets) are becoming more common, and the LBM is not directly applicable to this kind of dataset. The present work extends the usual LBM to the so-called Multiple Latent Block Model (MLBM), which can handle mixed type datasets. Inference is performed with a stochastic EM algorithm embedding a Gibbs sampler, and a model selection criterion is defined to choose the number of row and column clusters. The method was successfully applied to simulated and real datasets.
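
To make the inference scheme named in the abstract more concrete, here is a minimal sketch of a stochastic EM loop with a Gibbs-style sampling step on a toy mixed type dataset. It is not the authors' implementation. Several choices are assumptions made for illustration only and are not stated in the abstract: continuous features are modelled as Gaussian and binary features as Bernoulli, the row partition is shared by both feature types while each type gets its own column partition, and the mixing proportions are lightly smoothed. All function names (`simulate`, `sem_gibbs`, and so on) are hypothetical. The sketch returns the last sampled state and does not include burn-in averaging or the model selection criterion.

```python
import numpy as np

def log_gauss(x, th):
    mu, sd = th  # th stacks (mean, std) along its first axis
    return -0.5 * np.log(2 * np.pi * sd**2) - (x - mu) ** 2 / (2 * sd**2)

def log_bern(x, th):
    (p,) = th  # th stacks a single success probability
    return x * np.log(p) + (1 - x) * np.log(1 - p)

def simulate(rng, n=60, d_c=20, d_b=20):
    """Toy mixed dataset with 2 row clusters and 2 column clusters per type."""
    z = rng.integers(2, size=n)        # row labels, shared by both types
    w_c = rng.integers(2, size=d_c)    # column labels, continuous features
    w_b = rng.integers(2, size=d_b)    # column labels, binary features
    mu = np.array([[0.0, 4.0], [4.0, 0.0]])  # block means (assumed values)
    p = np.array([[0.1, 0.9], [0.9, 0.1]])   # block success probabilities
    x_c = rng.normal(mu[z][:, w_c], 1.0)
    x_b = (rng.random((n, d_b)) < p[z][:, w_b]).astype(float)
    return x_c, x_b, z, w_c, w_b

def log_prop(labels, n_clusters):
    # Smoothed log mixing proportions (smoothing avoids log(0) for empty clusters).
    counts = np.bincount(labels, minlength=n_clusters)
    return np.log((counts + 1.0) / (labels.size + n_clusters))

def update_theta(x, z, w, theta, kind):
    # M step: re-estimate each block's parameters from the cells it contains.
    for k in range(theta.shape[1]):
        for l in range(theta.shape[2]):
            cells = x[np.ix_(z == k, w == l)]
            if cells.size == 0:
                continue  # empty block: keep the previous value
            if kind == "gauss":
                theta[:, k, l] = cells.mean(), max(cells.std(), 1e-3)
            else:
                theta[:, k, l] = np.clip(cells.mean(), 1e-3, 1 - 1e-3)
    return theta

def sample_rows(rng, parts, log_pi):
    # Draw row labels given column labels, summing evidence over feature types.
    n = parts[0][0].shape[0]
    logp = np.tile(log_pi, (n, 1))
    for x, w, theta, logf in parts:
        for k in range(log_pi.size):
            logp[:, k] += logf(x, theta[:, k, w]).sum(axis=1)
    # Gumbel-max trick: an exact draw from the categorical softmax(logp).
    return np.argmax(logp + rng.gumbel(size=logp.shape), axis=1)

def sample_cols(rng, x, z, theta, logf, log_rho):
    # Draw the column labels of one feature type given the row labels.
    logp = np.tile(log_rho, (x.shape[1], 1))
    for l in range(log_rho.size):
        logp[:, l] += logf(x, theta[:, z, l][:, :, None]).sum(axis=0)
    return np.argmax(logp + rng.gumbel(size=logp.shape), axis=1)

def sem_gibbs(x_c, x_b, K=2, L_c=2, L_b=2, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.integers(K, size=x_c.shape[0])      # random initial partitions
    w_c = rng.integers(L_c, size=x_c.shape[1])
    w_b = rng.integers(L_b, size=x_b.shape[1])
    th_c = np.ones((2, K, L_c))        # (mean, std) per block
    th_b = np.full((1, K, L_b), 0.5)   # success probability per block
    for _ in range(n_iter):
        # M step on the current partitions, then one sampling sweep (SE step).
        th_c = update_theta(x_c, z, w_c, th_c, "gauss")
        th_b = update_theta(x_b, z, w_b, th_b, "bern")
        parts = [(x_c, w_c, th_c, log_gauss), (x_b, w_b, th_b, log_bern)]
        z = sample_rows(rng, parts, log_prop(z, K))
        w_c = sample_cols(rng, x_c, z, th_c, log_gauss, log_prop(w_c, L_c))
        w_b = sample_cols(rng, x_b, z, th_b, log_bern, log_prop(w_b, L_b))
    return z, w_c, w_b, th_c, th_b
```

A possible use is `x_c, x_b, *_ = simulate(np.random.default_rng(1))` followed by `z, w_c, w_b, th_c, th_b = sem_gibbs(x_c, x_b)`. Because the procedure is stochastic and cluster labels are only identified up to permutation, the recovered partitions should be compared with the simulated ones up to relabelling, and a run can occasionally end in a poor or degenerate partition.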
Main file
model-based-clustering.pdf (642.66 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01893457, version 1 (11-10-2018)
hal-01893457, version 2 (11-10-2019)

Identifiers

  • HAL Id: hal-01893457, version 1

Cite

Margot Selosse, Julien Jacques, Christophe Biernacki. Model-based co-clustering for mixed type data. 2018. ⟨hal-01893457v1⟩
654 views
633 downloads
