Missing data and Grubbs test

Here we discuss the problem of missing data

Moderator: MultiD Support

Postby enduser » Fri Jan 22, 2010 9:20 pm

Hi,

I have a few questions regarding missing data. I believe that when I have missing data within a technical replicate (for example, using a cut-off value of Cq = 35), I can use the Missing Data button to fill the cell with the average of the replicates. Also, when applying this cut-off eliminates all replicates of a biological sample (i.e., gene expression is very low for a particular animal), I can enter "0" as that value to complete the pre-processing and then, in the Data Manager in GenEx, remove that sample from that dose group for the downstream statistics. Is this correct?

When using the Grubbs test to identify outliers (I am working with biological samples in replicates), when should I apply it? After all the pre-processing of replicates (qPCR, ref genes, RTs), when only the animal identifier and dose remain, so that the test identifies the biological outlier (the actual animal)? If I apply the Grubbs test before the pre-processing, it seems to identify only one of the technical replicates (whose cell I would then fill in as missing data), thus not really identifying a biological outlier. I don't believe I should redo the Grubbs test, due to increased error?

I followed the tutorials and forum but found little info on the timing of the Grubbs test and how to deal with biological variation.
enduser
 
Posts: 3
Joined: Wed Jan 13, 2010 10:09 pm

Re: Missing data and Grubbs test

Postby Mikael Kubista » Sun Jan 24, 2010 1:08 am

You are asking two very important questions that are by no means trivial. Data can be missing for two very different reasons, which must be handled differently.

1) The reading is missing due to technical failure. In this case, the sample did contain target molecules but we failed to measure them. The correct approach is to replace this reading with the mean value of the technical replicates. In GenEx this can be done using the “Missing Data” function, but you can also leave the cell blank. GenEx will then automatically handle it as the mean of its technical replicates.
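For readers who want to see what option 1 amounts to outside GenEx, here is a minimal sketch in Python. It is only an illustration of the idea, not GenEx's implementation; the function name `fill_with_replicate_mean` and the use of `None` for a missing reading are assumptions made for this example.

```python
from statistics import mean

def fill_with_replicate_mean(cq_values):
    """Replace missing readings (None) with the mean of the observed
    technical replicates of the same sample.

    Hypothetical helper for illustration only; this is not GenEx code.
    """
    observed = [v for v in cq_values if v is not None]
    if not observed:
        # Nothing to average: leave the replicates as they are.
        return list(cq_values)
    replicate_mean = mean(observed)
    return [replicate_mean if v is None else v for v in cq_values]

# One technical replicate failed; it is filled with the mean of the other two.
print(fill_with_replicate_mean([24.0, None, 25.0]))
```

Note that the sketch deliberately does nothing when all replicates are missing, since in that case there is no technical information to fill from.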

2) The reading is missing because the number of target molecules is too low. In this case, the sample contained fewer molecules than we can detect. It is not necessarily blank, but the number of targets is below the limit of detection (LOD) of our assay. Although it is hardly a catastrophe to replace the missing reading with the mean of the positive technical replicates, such handling introduces a bias. The reason is that if a sample contains very few molecules, a fraction of the technical replicates is expected to be negative. In fact, this is the basis of the approach used to determine the LOD of qPCR assays (see the GenEx manual for LOD). For example, let us say a sample contains on average 1 molecule per aliquot. If three aliquots are measured as technical replicates, they may contain one molecule each. But, due to random sampling (the so-called Poisson distribution), there may be two targets in one aliquot, one target in a second aliquot, and the third aliquot may be blank. The last one will not give a qPCR reading. If this “missing data” is replaced by the average of the two positive replicates, the estimated target concentration in the sample is artificially increased! To avoid introducing this bias, GenEx offers an alternative way to handle missing data that is appropriate when data are missing due to too low a target concentration. The approach is to replace the missing data with the Cq measured at the LOD, plus 1. If you have not determined the LOD of your assay, you can usually use the highest Cq measured for a truly positive sample, plus 1. In the GenEx “Missing Data” function this option is called “Fill with column’s maximum +1”.
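The Poisson argument above can be put into numbers with a short Python sketch. It assumes an average of exactly 1 molecule per aliquot and independent aliquots, as in the example. The second function illustrates the "column maximum + 1" idea; the name `fill_with_column_max_plus_one` and the `None` convention for missing readings are assumptions for this sketch, not GenEx's code.

```python
import math

def poisson_pmf(k, lam):
    """Probability of finding exactly k molecules in an aliquot when the
    average number of molecules per aliquot is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# With on average 1 molecule per aliquot, an aliquot is blank with
# probability exp(-1), i.e. about 0.37.
p_blank = poisson_pmf(0, 1.0)

# Probability that at least one of three replicate aliquots is blank
# (about 0.75), assuming the aliquots are sampled independently.
p_any_blank_of_3 = 1 - (1 - p_blank) ** 3

def fill_with_column_max_plus_one(column):
    """Replace missing readings (None) in one gene's column with the
    highest observed Cq in that column plus 1 cycle.

    Hypothetical helper mirroring the idea described in the post.
    """
    observed = [v for v in column if v is not None]
    if not observed:
        return list(column)
    fill_value = max(observed) + 1.0
    return [fill_value if v is None else v for v in column]

print(round(p_blank, 3), round(p_any_blank_of_3, 3))
print(fill_with_column_max_plus_one([30.5, None, 34.0]))
```

So at this concentration a negative replicate is the rule rather than the exception, which is why filling it with the mean of the positive replicates overestimates the amount of target.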

If you have large data sets and want to automate the process, go for option 1 (average of technical replicates). It introduces only a small (usually negligible) error if there are readings below the LOD. Using option 2 for failed reactions could introduce a large error.

Any triplicates or higher-order replicates can be tested for outliers with Grubbs' test. It's straightforward. If you have a nested design, you perform an outlier test before each averaging of technical replicates (test for qPCR outliers; average qPCR replicates; test for RT outliers; average RT replicates; etc.). There is a risk in applying the standard Grubbs' test too extensively, because of multiple-testing complications. When the test is performed at 95% confidence, there is a 5% probability that a normal sample will accidentally have a deviant reading and be counted as an outlier. If an outlier test is performed once, this is usually an acceptable error rate. However, when an outlier test is performed on every sample for each gene, and furthermore on several levels, the number of tests may be very large and several normal readings may be eliminated. Extensive use of the standard test is therefore not recommended. GenEx offers a modified Grubbs' test, which requires that, in addition to fulfilling the Grubbs criterion (being off the mean by a number of standard deviations (SD) that depends on the number of replicates), the SD is larger than a predefined value. The default in GenEx is 0.25 cycles. This additional criterion removes most false outliers, allowing for multiple testing.
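To make the two criteria concrete, here is a rough Python sketch of a Grubbs-type check with an additional minimum-SD requirement. It is an interpretation of the description above, not GenEx's actual algorithm: the function names are hypothetical, the SD criterion is assumed to apply to the SD of the replicate group, and the critical values are the commonly tabulated two-sided 5% values (rounded) for 3 to 8 replicates. The last function shows how quickly the chance of at least one false outlier grows with the number of tests, assuming independent tests.

```python
from statistics import mean, stdev

# Commonly tabulated two-sided critical values for Grubbs' test at the
# 5% significance level, rounded to three decimals, for n = 3..8.
GRUBBS_CRIT_95 = {3: 1.154, 4: 1.481, 5: 1.715, 6: 1.887, 7: 2.020, 8: 2.127}

def modified_grubbs(values, min_sd=0.25):
    """Return the suspected outlier, or None if no reading is flagged.

    A reading is flagged only if (a) its distance from the mean, in units
    of SD, exceeds the Grubbs critical value and (b) the SD of the
    replicates exceeds min_sd (in cycles). Illustrative sketch only.
    """
    n = len(values)
    if n not in GRUBBS_CRIT_95:
        raise ValueError("this sketch only covers 3 to 8 replicates")
    m = mean(values)
    sd = stdev(values)
    if sd == 0:
        return None
    # The reading furthest from the mean is the only candidate.
    suspect = max(values, key=lambda v: abs(v - m))
    g = abs(suspect - m) / sd
    if g > GRUBBS_CRIT_95[n] and sd > min_sd:
        return suspect
    return None

def prob_at_least_one_false_outlier(n_tests, alpha=0.05):
    """Chance that at least one of n_tests independent tests flags a
    normal reading when each test has false-positive rate alpha."""
    return 1 - (1 - alpha) ** n_tests

# Large spread: both criteria are met and 27.0 is flagged.
print(modified_grubbs([25.0, 25.1, 24.9, 27.0]))
# Same pattern, but the SD is only about 0.1 cycles: nothing is flagged.
print(modified_grubbs([25.00, 25.01, 24.99, 25.20]))
print(round(prob_at_least_one_false_outlier(100), 3))
```

The second call shows the point of the extra criterion: the reading 25.20 is "off" in relative terms, but the replicates agree within a fraction of a cycle, so removing it would gain nothing.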

Whether an outlier test should be performed on biological replicates depends on the situation and context. Biological systems often show wide variation, and by removing an extreme reading you may lose the most exciting sample. Also, Grubbs' test assumes a normal distribution. Technical replicates usually show a normal distribution (in Cq scale), but biological replicates may not. In GenEx, non-parametric tests are available to compare data that do not show a normal distribution. In most cases it is therefore advisable to keep all the biological replicates. A situation in which a biological outlier can be considered for removal is when rather large groups are compared and the data show a normal distribution. The outlier is then most likely real, and caused by a rare condition in a factor that is not addressed in the study.
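As an illustration of what a non-parametric group comparison looks like, here is a small Python sketch of a Mann-Whitney U test with an exact permutation p-value. The post does not say which non-parametric tests GenEx provides, so the choice of Mann-Whitney is an assumption made for this example, and the group values are made-up numbers. The exhaustive enumeration is only practical for small groups.

```python
from itertools import combinations

def u_statistic(a, b):
    """Mann-Whitney U for group a: number of (x, y) pairs with x > y,
    counting ties as one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in a for y in b)

def mann_whitney_exact(a, b):
    """Return (U, two-sided exact p-value) by enumerating every way of
    splitting the pooled values into groups of the original sizes."""
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    centre = n_a * n_b / 2  # expected U when the groups do not differ
    u_obs = u_statistic(a, b)
    observed_dev = abs(u_obs - centre)
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        if abs(u_statistic(group_a, group_b) - centre) >= observed_dev:
            extreme += 1
    return u_obs, extreme / total

# Made-up normalized Cq values for two dose groups of four animals each.
control = [24.1, 24.5, 23.9, 24.3]
treated = [26.0, 25.8, 26.4, 27.1]
print(mann_whitney_exact(control, treated))
```

Because the test uses only the ordering of the values, a single extreme animal cannot dominate the result, which is one reason to keep all biological replicates and compare the groups this way instead of removing the extreme one.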

Good luck!
Mikael Kubista
 
Posts: 152
Joined: Tue Jul 01, 2008 12:28 pm

