by Mikael Kubista » Sun Jan 24, 2010 1:08 am
You are asking two very important questions that are by no means trivial. Data can be missing for two very different reasons, which must be handled differently.
1) The reading is missing due to technical failure. In this case, the sample did contain target molecules, but we failed to measure them. The correct approach is to replace this reading with the mean value of the technical replicates. In GenEx this can be done using the “Missing Data” function, but you can also leave the cell blank. GenEx will then automatically handle it as the mean of its technical replicates.
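As a rough illustration of option 1, here is a minimal Python sketch (not GenEx code). It assumes the Cq readings of one sample's technical replicates are held in a list with missing wells marked `None`, and replaces each missing reading with the mean of the replicates that did give a reading. The function name `fill_with_replicate_mean` is hypothetical.

```python
from statistics import mean
from typing import List, Optional


def fill_with_replicate_mean(cqs: List[Optional[float]]) -> List[float]:
    """Replace missing Cq values (None) with the mean of the measured technical replicates.

    Hypothetical helper: `cqs` holds the technical replicates of ONE sample and ONE gene.
    """
    measured = [cq for cq in cqs if cq is not None]
    if not measured:
        # Nothing to average: the whole replicate set failed.
        raise ValueError("all technical replicates are missing")
    replacement = mean(measured)
    return [replacement if cq is None else cq for cq in cqs]


# Three technical replicates; the second one failed for technical reasons.
print(fill_with_replicate_mean([24.8, None, 25.2]))  # [24.8, 25.0, 25.2]
```

This is only appropriate when the missing reading is a technical failure; the sample really did contain target.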
2) The reading is missing because there are too few target molecules. In this case, the sample contained fewer molecules than we can detect. It is not necessarily blank, but the number of targets is below the limit of detection of our assay. Although it is hardly a catastrophe to replace the missing reading with the mean of the positive technical replicates, such handling introduces a bias. The reason is that if a sample contains very few molecules, a fraction of the technical replicates are expected to be negative. In fact, this is how the limit of detection of qPCR assays is determined (see the GenEx manual on LOD). For example, let's say a sample contains on average 1 molecule per aliquot. If three aliquots are measured as technical replicates, they may each contain one molecule. But due to random sampling (described by the Poisson distribution), there may instead be two targets in one aliquot, one target in a second aliquot, and none in the third. The third aliquot will not give a qPCR reading. If this “missing data” is replaced by the average of the two positive replicates, the estimated target concentration in the sample is artificially increased! To avoid introducing this bias, GenEx offers alternative ways to handle missing data that are appropriate when data are missing due to too low a target concentration. The approach is to replace the missing data with the Cq measured at the LOD (limit of detection) plus 1. If you have not determined the LOD of your assay, you can usually use the highest Cq measured for a truly positive sample plus 1. In GenEx “Missing Data” this option is called “Fill with column’s maximum +1”.
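The Poisson argument and the LOD-based fill can be sketched in a few lines of Python. This is an illustration under stated assumptions, not GenEx's implementation: missing Cq values are marked `None`, `fill_below_lod` and the other function names are hypothetical, and the +1 cycle offset simply follows the rule described above. With on average 1 molecule per aliquot, the Poisson probability of an empty aliquot is e^-1, about 37%, so with three technical replicates there is roughly a 75% chance that at least one of them is empty.

```python
import math
from typing import List, Optional


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that an aliquot contains exactly k target molecules when the mean is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def prob_any_replicate_empty(lam: float, n_replicates: int) -> float:
    """Probability that at least one of n independent aliquots contains no target molecule."""
    return 1.0 - (1.0 - poisson_pmf(0, lam)) ** n_replicates


def fill_below_lod(column: List[Optional[float]], lod_cq: Optional[float] = None) -> List[float]:
    """Replace missing Cq values (None) in one gene's column with the LOD Cq + 1.

    If no LOD Cq is given, fall back to the highest measured Cq in the column + 1,
    mirroring the "column's maximum +1" idea. Hypothetical helper, not GenEx code.
    """
    measured = [cq for cq in column if cq is not None]
    if lod_cq is None:
        if not measured:
            raise ValueError("no measured Cq to derive a fill value from")
        lod_cq = max(measured)
    fill = lod_cq + 1.0
    return [fill if cq is None else cq for cq in column]


print(f"P(empty aliquot, 1 molecule on average) = {poisson_pmf(0, 1.0):.3f}")
print(f"P(at least one of 3 replicates empty)   = {prob_any_replicate_empty(1.0, 3):.3f}")
print(fill_below_lod([33.1, None, 35.4, 34.0]))               # fill from column maximum
print(fill_below_lod([33.1, None, 35.4, 34.0], lod_cq=36.0))  # fill from a known LOD Cq
```

The column-maximum fallback assumes every measured value in the column is a truly positive reading; a late, non-specific signal would push the fill value up.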
If you have large data sets and want to automate the process, go for option 1 (average of technical replicates). It introduces only a small (usually negligible) error for readings below the LOD. Using option 2 for failed reactions, by contrast, could introduce a large error, since a sample that actually contained plenty of target would be assigned a Cq near the detection limit.
Any replicate set of three or more can be tested for outliers with Grubbs’ test. It’s straightforward. If you have a nested design, you perform an outlier test before each averaging of technical replicates (test for qPCR outliers; average qPCR replicates; test for RT outliers; average RT replicates; etc.). However, applying the standard Grubbs’ test too extensively is risky because of multiple testing. If the test is performed at 95% confidence, there is a 5% probability that a normal sample will accidentally have a deviant reading and be counted as an outlier. If an outlier test is performed once, this is usually an acceptable error rate. However, when the outlier test is performed on every sample for each gene, and furthermore on several levels, the number of tests may be very large and several normal readings may be eliminated (for example, 200 tests at the 5% level would be expected to flag about 10 normal readings). The standard test should therefore not be applied too extensively. GenEx offers a modified Grubbs’ test, which requires that, in addition to fulfilling the Grubbs’ criterion (deviating from the mean by more than a critical number of standard deviations, SD, that depends on the number of replicates), the SD must be larger than a predefined value. The default in GenEx is 0.25 cycles. This additional criterion removes most false outliers, allowing for multiple testing.
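For a single replicate group, the standard and modified Grubbs’ tests can be sketched as below. Assumptions to keep in mind: the critical values are the commonly tabulated two-sided values for α = 0.05, hard-coded only for 3 to 10 replicates; the SD compared against the 0.25-cycle threshold is taken to be the sample SD of the replicate group, which is our reading of the description above rather than GenEx's documented formula; and `grubbs_outlier` is a hypothetical name. In a nested design you would call it once per level, before each averaging step.

```python
import statistics
from typing import Optional, Sequence, Tuple

# Two-sided Grubbs critical values for alpha = 0.05 (commonly tabulated values).
# Only N = 3..10 are included in this sketch.
GRUBBS_CRITICAL_05 = {
    3: 1.155, 4: 1.481, 5: 1.715, 6: 1.887,
    7: 2.020, 8: 2.126, 9: 2.215, 10: 2.290,
}


def grubbs_outlier(values: Sequence[float], min_sd: Optional[float] = None) -> Tuple[int, bool]:
    """Return (index of the most deviant reading, whether it is flagged as an outlier).

    With min_sd=None this is the standard two-sided Grubbs' test at alpha = 0.05.
    With min_sd set (e.g. 0.25 cycles) the reading is only flagged if the replicate
    SD also exceeds min_sd, in the spirit of the modified test described above.
    """
    n = len(values)
    if n not in GRUBBS_CRITICAL_05:
        raise ValueError(f"no tabulated critical value for N = {n}")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    # Suspect reading: the one farthest from the mean.
    idx = max(range(n), key=lambda i: abs(values[i] - mean))
    if sd == 0:
        return idx, False  # identical readings: nothing to flag
    g = abs(values[idx] - mean) / sd
    flagged = g > GRUBBS_CRITICAL_05[n]
    if min_sd is not None:
        flagged = flagged and sd > min_sd
    return idx, flagged


tight = [20.00, 20.01, 20.00, 20.01, 20.10]
print(grubbs_outlier(tight))               # (4, True)  standard test
print(grubbs_outlier(tight, min_sd=0.25))  # (4, False) modified test
```

In this made-up example the fifth reading is only about 0.1 cycle from the others, yet the standard test flags it because the replicates are so tight; the SD threshold keeps it, which is exactly the kind of false outlier the extra criterion is meant to suppress.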
Whether an outlier test should be performed on biological replicates depends on the situation and context. Biological systems often show wide variation, and removing an extreme reading may lose the most exciting sample. Also, Grubbs’ test assumes a normal distribution. Technical replicates are usually normally distributed (on the Cq scale), but biological replicates may not be. GenEx provides non-parametric tests for comparing data that are not normally distributed. In most cases it is therefore advisable to keep all the biological replicates. One situation in which a biological outlier can be considered for removal is when rather large groups are compared and the data are normally distributed. The outlier is then most likely real, caused by a rare condition in a factor that is not addressed.
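The post does not say which non-parametric tests GenEx provides. As one common example, a two-sided Mann-Whitney U test can be sketched in pure Python, with the exact p-value obtained by enumerating every way of splitting the pooled values into two groups of the original sizes (practical only for small groups). Because it works on ranks, an extreme biological replicate can stay in the data without dominating the comparison. Function names and Cq values are made up for illustration.

```python
from itertools import combinations
from math import comb
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U for group x: number of (xi, yj) pairs with xi > yj, ties counting 0.5."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)


def mann_whitney_exact(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, exact two-sided p-value) by enumerating all group assignments."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    center = n1 * n2 / 2  # expected U under the null hypothesis
    u_obs = mann_whitney_u(x, y)
    observed_dev = abs(u_obs - center)
    extreme = 0
    for chosen in combinations(range(n1 + n2), n1):
        chosen_set = set(chosen)
        gx = [pooled[i] for i in chosen]
        gy = [pooled[i] for i in range(n1 + n2) if i not in chosen_set]
        # Count assignments at least as far from the center as the observed one.
        if abs(mann_whitney_u(gx, gy) - center) >= observed_dev:
            extreme += 1
    return u_obs, extreme / comb(n1 + n2, n1)


# Made-up Cq values of biological replicates; 30.2 is an extreme reading kept in the data.
group_a = [24.1, 24.5, 25.0, 30.2]
group_b = [26.3, 27.0, 27.4, 28.1]
u, p = mann_whitney_exact(group_a, group_b)
print(f"U = {u}, exact two-sided p = {p:.3f}")  # U = 4.0, exact two-sided p = 0.343
```

For larger groups, a library implementation with a normal approximation would replace the brute-force enumeration.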
Good luck!