The algorithm of the HCPC method, as implemented in the FactoMineR package, can be summarized as follows:

1. Compute principal component methods: PCA, (M)CA or MFA, depending on the types of variables in the data set and the structure of the data set. At this step, you can choose the number of dimensions to be retained in the output by specifying the argument ncp. The default value is 5.
2. Compute hierarchical clustering: hierarchical clustering is performed using Ward's criterion on the selected principal components. Ward's criterion is used because, like principal component analysis, it is based on the multidimensional variance.
3. Choose the number of clusters based on the hierarchical tree: an initial partition is obtained by cutting the hierarchical tree.
4. Perform K-means clustering to improve the initial partition obtained from hierarchical clustering. The final partition, obtained after consolidation with k-means, can be (slightly) different from the one obtained with the hierarchical clustering.

The dendrogram suggests a 4-cluster solution.

The function fviz_cluster() can be used to visualize individuals' clusters: individuals are plotted on the principal component map and colored according to the cluster they belong to.

fviz_cluster(res.hcpc, palette = "jco") # Color palette; see ?ggpubr::ggpar

You can also draw a three-dimensional plot combining the hierarchical clustering and the factorial map using the R base function plot():

plot(res.hcpc, choice = "3D.map") # Principal components + tree

The function HCPC() returns a list containing:

- data.clust: the original data with a supplementary column called clust containing the partition.
- desc.var: the variables describing the clusters.
- desc.ind: the most typical individuals of each cluster.
- desc.axes: the axes describing the clusters.

To display the original data with the cluster assignments, type this:

head(res.hcpc$data.clust, 10)
# Murder Assault UrbanPop Rape clust

In the table above, the last column contains the cluster assignments.

To display the quantitative variables that best describe each cluster, type this:

res.hcpc$desc.var$quanti
# $`1`

Here, we show only some columns of interest: "Mean in category", "Overall mean", "p.value".

From the output above, it can be seen that the variables UrbanPop, Murder, Rape and Assault are the most significantly associated with cluster 1. For example, the mean value of the Assault variable in cluster 1 is 78.53, which is less than its overall mean (170.76) across all clusters. Therefore, it can be concluded that cluster 1 is characterized by a low rate of Assault compared to the other clusters.
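The whole pipeline can be run end to end in R. The sketch below is a minimal, non-authoritative example: it assumes the FactoMineR package is installed and uses R's built-in USArrests data set (which contains the Murder, Assault, UrbanPop and Rape variables shown in the output above).

```r
# Minimal sketch of the HCPC workflow, assuming FactoMineR is installed.
library(FactoMineR)

# Step 1: PCA on the USArrests data (PCA() standardizes the variables by
# default); keep ncp = 5 dimensions in the output.
res.pca <- PCA(USArrests, ncp = 5, graph = FALSE)

# Steps 2-4: Ward clustering on the principal components, tree cut into
# 4 clusters (the number suggested by the dendrogram), then k-means
# consolidation of the initial partition.
res.hcpc <- HCPC(res.pca, nb.clust = 4, graph = FALSE)

head(res.hcpc$data.clust, 10)   # original data + "clust" column
res.hcpc$desc.var$quanti        # quantitative variables describing clusters
```

Setting nb.clust = -1 instead lets HCPC() cut the tree automatically at the suggested level, while the default nb.clust = 0 asks you to click on the tree interactively.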