inter-group distance measures of dynamic PCA

Postby physio » Tue Oct 29, 2013 4:25 pm

How can I provide inter-group distance measures for the samples analysed by dynamic PCA?

Postby Mikael Kubista » Tue Oct 29, 2013 10:23 pm

It would indeed be nice to have an indicator of the distance between the groups, so that the optimum number of genes could be identified by dynamic PCA. It is, however, not trivial. The graph is drawn in reduced space, since the axes are principal components, and distances in the original space may be different. Still, it is possible to calculate the distance between clusters in the reduced space as well as in the original space. There are several alternative ways to do so (the distance between two clusters is already calculated in hierarchical clustering, and in the advanced menu you find options for the algorithms). Some indicators of distance will be available in GenEx 6. It gets more complicated if there are more than two groups (filtration based on SD).
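To illustrate the difference between the two spaces, here is a minimal Python sketch (not GenEx code) that computes the Euclidean distance between two group centroids, once using all genes and once using the scores on the first two principal components. The function names, the random data, and the choice of centroid distance are assumptions made for illustration only; other cluster distances (single linkage, complete linkage, and so on) could be substituted.

```python
import numpy as np

def centroid_distance(X, labels, group_a, group_b):
    """Euclidean distance between the centroids of two groups of samples."""
    labels = np.asarray(labels)
    centroid_a = X[labels == group_a].mean(axis=0)
    centroid_b = X[labels == group_b].mean(axis=0)
    return float(np.linalg.norm(centroid_a - centroid_b))

def pca_scores(X, n_components=2):
    """Project mean-centred samples onto the first principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Hypothetical data: 6 samples (rows) x 5 genes (columns), two groups of 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))
X[3:] += 2.0                      # shift group B so the groups differ
labels = ["A"] * 3 + ["B"] * 3

d_original = centroid_distance(X, labels, "A", "B")
d_reduced = centroid_distance(pca_scores(X, 2), labels, "A", "B")
print(f"original space: {d_original:.3f}, first two PCs: {d_reduced:.3f}")
```

Because the principal components form an orthonormal basis, the distance in the reduced space can never exceed the distance in the original space; how much is lost depends on how much of the group separation lies along the retained components.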

Another problem is that the sequential elimination of genes based on univariate (one gene at a time) comparison by t-test may not produce the optimum selection of genes, because PCA is a multivariate technique that exploits the genes' correlated expression, which the t-test does not account for. Although dynamic PCA is an excellent tool with integrated visualization, there are other methods for variable selection. There is, however, no single best method, and they all require extensive calculations when the number of genes is large. Therefore, rather advanced programming is needed to find the optimum selection of genes and also to validate it (which is done using leave-one-out validation or bootstrapping). These methods are currently being implemented in GenEx, and advanced users will be able to contact MultiD to get pre-releases.
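As a rough idea of what leave-one-out validation of a gene selection involves, here is a minimal Python sketch. It is not the GenEx implementation: the nearest-centroid classifier, the Welch t-statistic ranking, the function names, and the simulated data are all assumptions chosen to keep the example short. The important point is that the genes are re-selected inside every fold, using only the training samples, so the left-out sample never influences the selection.

```python
import numpy as np

def t_statistics(X, y):
    """Absolute Welch t-statistic per gene for two groups coded 0 and 1."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / se

def loo_accuracy(X, y, n_genes):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Gene selection (top n_genes by t-statistic) is repeated in every fold
    on the training samples only.
    """
    correct = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i          # leave sample i out
        X_train, y_train = X[train], y[train]
        top = np.argsort(t_statistics(X_train, y_train))[-n_genes:]
        c0 = X_train[y_train == 0][:, top].mean(axis=0)
        c1 = X_train[y_train == 1][:, top].mean(axis=0)
        x = X[i, top]
        predicted = int(np.linalg.norm(x - c1) < np.linalg.norm(x - c0))
        correct += int(predicted == y[i])
    return correct / len(y)

# Hypothetical data: 12 samples x 20 genes; only the first 3 genes differ.
rng = np.random.default_rng(1)
y = np.array([0] * 6 + [1] * 6)
X = rng.normal(size=(12, 20))
X[y == 1, :3] += 3.0

accuracy = loo_accuracy(X, y, n_genes=3)
print(f"leave-one-out accuracy with 3 genes: {accuracy:.2f}")
```

Running `loo_accuracy` for a range of `n_genes` values gives one possible way to compare selections of different sizes, although this sketch still ranks genes univariately and therefore shares the limitation described above.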
Mikael Kubista
