1 Introduction

In this vignette we demonstrate the functionality of the miRcomp package. This package uses data from a dilution / mixture experiment to assess methods that estimate microRNA expression from qPCR amplification curves. Specifically, this package provides assessments of accuracy, precision, data quality, titration response, limit of detection, and complete features. Each of these is described in the following sections. To avoid any confusion due to naming conventions (expression estimates from amplification curves have been called Ct values, Crt values, and Cq values to name a few), we refer to the reported values as expression estimates or simply expression.

1.1 Background

Life Technologies has designed a qPCR miRNA array panel to work within their TaqMan® OpenArray® system. The current version of the array covers 754 human miRNAs, plus additional non-human miRNAs used as normalization controls and spike-ins, across two primer pools. For raw data analysis, Life Technologies uses its closed-source ExpressionSuite software package. A number of open-source packages have been developed to analyze raw qPCR fluorescence data; however, unlike microarray and RNAseq data, raw qPCR data are typically analyzed using the manufacturer’s software. In fact, several open-source packages take expression values (e.g. Ct, Crt, or Cq values) as input, ignoring how these values were estimated. To assess the potential benefits of alternative expression measures, we developed miRcomp, a benchmark data set and R/Bioconductor package that facilitates the development of new algorithms to preprocess Life Technologies OpenArray® miRNA data and provides tools to integrate the data into other software packages from the Bioconductor tool set.

1.2 Experimental Design

Two separate RNA pools were prepared by blending two tissues each: (1) kidney and placenta and (2) skeletal muscle and brain (frontal cortex). These sources of RNA were chosen based on prior analyses suggesting that the majority of microRNAs are expressed in at least one of these tissues and that several microRNAs are unique to each pool (e.g. miR-133a for skeletal muscle and the chromosome 19 miRNA cluster for placenta). We extracted RNA from formalin-fixed, paraffin-embedded (FFPE) sections using the AllPrep DNA/RNA FFPE protocol (Qiagen). Mixtures of RNA were made by combining equal masses of kidney and placenta RNA or of skeletal muscle and frontal cortex RNA, and diluting each mixture to the same concentration of 3.3 ng/ul. 10 ng of RNA was used as the input for reverse transcription, following the Life Technologies Open Array protocol modification for low-concentration, FFPE RNA. Separate reverse transcription and pre-amplification reactions were performed for the Life Technologies MegaPlex ‘A’ and ‘B’ primer pools. Following pre-amplification, 30 ul from the ‘A’ and ‘B’ reactions for both pools were mixed with 570 ul of 0.1x TE. Further dilutions and combinations of the pools were then prepared according to the following design:

[Figure: Experimental Design]

Each of the 10 unique mixture / dilution sample types was run in four replicates, giving 40 samples in total.

2 Example Assessment

2.1 Data Sets

In the following, we will use two data sets included in the miRcomp package. The first was generated using the LifeTech ExpressionSuite software package. The second was generated using the default algorithm from the qpcR R package.

We first load the package and data.

## Load libraries
library('miRcomp')
data(lifetech)
data(qpcRdefault)

Each of the example data sets includes both an expression estimate (ct) and an assessment of the quality of each estimate (qc) for each of the 754 microRNAs and 40 samples. For the lifetech data set, the measure of quality is the AmpScore (a proprietary method of assessing the quality of an amplification curve). For the qpcRdefault data set, the measure of quality is the R² value from the sigmoidal model fit to the amplification curve data.

str(lifetech)
## List of 2
##  $ ct: num [1:754, 1:40] 28.2 19.8 11.9 15.7 23.7 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:754] "hsa-miR-409-5p_002331:A" "hsa-miR-424_000604:A" "hsa-miR-30b_000602:A" "hsa-miR-29a_002112:A" ...
##   .. ..$ : chr [1:40] "KW1:1" "KW1:2" "KW1:3" "KW1:4" ...
##  $ qc: num [1:754, 1:40] 1.29 1.33 1.42 1.45 1.17 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:754] "hsa-miR-409-5p_002331:A" "hsa-miR-424_000604:A" "hsa-miR-30b_000602:A" "hsa-miR-29a_002112:A" ...
##   .. ..$ : chr [1:40] "KW1:1" "KW1:2" "KW1:3" "KW1:4" ...
str(qpcRdefault)
## List of 2
##  $ ct: num [1:754, 1:40] 29.8 21.1 12.6 16.6 25.2 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:754] "hsa-miR-409-5p_002331:A" "hsa-miR-424_000604:A" "hsa-miR-30b_000602:A" "hsa-miR-29a_002112:A" ...
##   .. ..$ : chr [1:40] "KW1:1" "KW1:2" "KW1:3" "KW1:4" ...
##  $ qc: num [1:754, 1:40] 0.998 0.999 0.999 0.999 0.998 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:754] "hsa-miR-409-5p_002331:A" "hsa-miR-424_000604:A" "hsa-miR-30b_000602:A" "hsa-miR-29a_002112:A" ...
##   .. ..$ : chr [1:40] "KW1:1" "KW1:2" "KW1:3" "KW1:4" ...

The colnames of the example data matrices correspond to the sample types shown in the design figure above. Each name starts with KW, followed by the sample type (1-10), then a colon and the replicate number (1-4).

colnames(lifetech$ct)
##  [1] "KW1:1"  "KW1:2"  "KW1:3"  "KW1:4"  "KW2:1"  "KW2:2"  "KW2:3" 
##  [8] "KW2:4"  "KW3:1"  "KW3:2"  "KW3:3"  "KW3:4"  "KW4:1"  "KW4:2" 
## [15] "KW4:3"  "KW4:4"  "KW5:1"  "KW5:2"  "KW5:3"  "KW5:4"  "KW6:1" 
## [22] "KW6:2"  "KW6:3"  "KW6:4"  "KW7:1"  "KW7:2"  "KW7:3"  "KW7:4" 
## [29] "KW8:1"  "KW8:2"  "KW8:3"  "KW8:4"  "KW9:1"  "KW9:2"  "KW9:3" 
## [36] "KW9:4"  "KW10:1" "KW10:2" "KW10:3" "KW10:4"
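
The naming scheme described above can be split into its parts with base R. The following is a minimal sketch that uses a handful of hard-coded names in the same format; the variable names (sample_names, sample_type, rep_id, sample_info) are illustrative and not part of the miRcomp package.

```r
# Column names in the documented format: "KW<sample type>:<replicate>"
sample_names <- c("KW1:1", "KW1:2", "KW9:4", "KW10:1", "KW10:4")

# Strip the "KW" prefix, then split on the colon
parts <- strsplit(sub("^KW", "", sample_names), ":", fixed = TRUE)

# First piece is the sample type (1-10), second is the replicate (1-4)
sample_type <- as.integer(vapply(parts, `[`, character(1), 1))
rep_id      <- as.integer(vapply(parts, `[`, character(1), 2))

sample_info <- data.frame(name = sample_names,
                          type = sample_type,
                          replicate = rep_id)
print(sample_info)
```

With the package loaded, colnames(lifetech$ct) could be substituted for the hard-coded vector to obtain the sample type and replicate of all 40 columns.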

2.2 Quality Assessment

We begin by examining the quality scores provided by each of the example methods. The qualityAssessment function allows one to examine the relationship between quality scores and expression estimates (plotType="scatter") or the distribution of quality scores across samples (plotType="boxplot").

qualityAssessment(lifetech, plotType="scatter", label1="LifeTech AmpScore")

qualityAssessment(lifetech, plotType="boxplot", label1="LifeTech AmpScore")

In addition to assessing a single method, one has the option to compare two methods by passing an optional second data object to the function. For the scatter plot, this results in plotting the quality metrics against each other. For the boxplots, the results are simply presented in a single figure. One also has the option to apply the complementary log-log transformation to the quality metrics prior to plotting (e.g. cloglog2=TRUE). This is often useful for R² quality metrics.

qualityAssessment(lifetech, object2=qpcRdefault, cloglog2=TRUE, plotType="scatter", label1="LifeTech AmpScore", label2="qpcR R-squared")
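
To see why a complementary log-log transformation can help with R-squared values, which tend to pile up just below 1, the sketch below applies the textbook definition log(-log(1 - p)) to a few made-up values. This is an illustration only: the cloglog helper is hypothetical, and the exact form of the transformation used inside qualityAssessment is not confirmed here.

```r
# Hypothetical R-squared values clustered just below 1
r2 <- c(0.90, 0.99, 0.999, 0.9999)

# Textbook complementary log-log transform (assumed form, see lead-in)
cloglog <- function(p) log(-log(1 - p))

transformed <- cloglog(r2)

# On the raw scale the last three values differ by less than 0.01;
# on the transformed scale they are clearly separated
print(data.frame(r2 = r2, cloglog = round(transformed, 3)))
```

The transformation is monotone, so the ordering of quality scores is preserved while differences near the upper bound become easier to see in a plot.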

Finally, one can filter quality scores corresponding to NA expression estimates from the boxplots. This can be useful to focus on cases in which the expression estimates appear valid but may be of poor quality.

qualityAssessment(lifetech, plotType="boxplot", na.rm=TRUE, label1="LifeTech AmpScore")

2.3 Complete Features

Given the difficulty in measuring many microRNAs, we examine the number of complete features (here microRNAs). Complete features are ones that are detected (non-NA expression estimate) and of good quality across all 40 samples. The completeFeatures function allows one to assess a single method or compare two methods.

completeFeatures(lifetech, qcThreshold1=1.25, label1="LifeTech")
##                                             LifeTech
## Complete miRNAs (all good quality & non-NA)      165
## Partial miRNAs (some good quality & non-NA)      375
## Absent miRNAs (no good quality & non-NA)         214
completeFeatures(lifetech, qcThreshold1=1.25, object2=qpcRdefault, qcThreshold2=0.99, label1="LifeTech", label2="qpcR")
##                   qpcR:Complete qpcR:Partial qpcR:Absent
## LifeTech:Complete           162            3           0
## LifeTech:Partial             87          288           0
## LifeTech:Absent               2          109         103

One can also use this function to compare two quality thresholds for the same expression estimates.

completeFeatures(lifetech, qcThreshold1=1.25, object2=lifetech, qcThreshold2=1.4, label1="LT 1.25", label2="LT 1.4")
##                  LT 1.4:Complete LT 1.4:Partial LT 1.4:Absent
## LT 1.25:Complete              48            117             0
## LT 1.25:Partial                0            255           120
## LT 1.25:Absent                 0              0           214
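
The complete / partial / absent classification used in the tables above can be sketched in a few lines of base R. The example below uses a small made-up data set and a hypothetical helper, classify_features; it assumes that a value counts as good quality when its score is strictly greater than the threshold, which may differ from the comparison used inside completeFeatures.

```r
# Toy data: 4 features x 3 samples (hypothetical values)
ct <- matrix(c(20, 21, 22,
               25, NA, 26,
               NA, NA, NA,
               30, 31, 32),
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("miR-", 1:4), paste0("S", 1:3)))
qc <- matrix(c(1.4, 1.3, 1.5,
               1.3, 1.0, 1.4,
               0.5, 0.4, 0.6,
               1.3, 1.1, 1.3),
             nrow = 4, byrow = TRUE, dimnames = dimnames(ct))

classify_features <- function(ct, qc, qc_threshold) {
  # Usable = non-NA estimate with a quality score above the threshold
  # (strict inequality is an assumption of this sketch)
  usable <- !is.na(ct) & !is.na(qc) & qc > qc_threshold
  n_usable <- rowSums(usable)
  status <- ifelse(n_usable == ncol(ct), "Complete",
                   ifelse(n_usable == 0, "Absent", "Partial"))
  factor(status, levels = c("Complete", "Partial", "Absent"))
}

status <- classify_features(ct, qc, qc_threshold = 1.25)
print(table(status))
```

Calling the helper twice with different thresholds and cross-tabulating the two results with table() gives a comparison in the same spirit as the two-threshold table above.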

2.4 Limit of Detection

We also directly examine the limit of detection for a given method. While this is related to the previous assessment, here the focus is on determining the minimum signal that can be reliably detected. This is accomplished in three ways: examining the distribution of average observed expression stratified by the proportion of values within a set of replicates that are good quality (plotType="boxplot"), plotting the average observed expression in the two low input sample types (9 & 10) vs the expected expression (plotType="scatterplot"), or plotting the difference between the average observed expression in the two low input sample types and the expected expression vs the expected expression (plotType="MAplot"). The expected expression for both low input sample types (9 & 10) can be calculated based on the pure sample types (1 & 5) or, in the case of the 0.01/0.01 dilution (sample type 10), it can be calculated based on the expression in the 0.1/0.1 dilution (sample type 9).

par(mar=c(6,6,2,2))
boxes <- limitOfDetection(lifetech, qcThreshold=1.25, plotType="boxplot")

str(boxes)
## List of 5
##  $ 0.00: num [1:2979] 19.5 11.9 15.9 16.4 26.4 ...
##  $ 0.25: num [1:498] 25.9 15.1 16.4 26.2 16.2 ...
##  $ 0.5 : num [1:315] 24.7 30.5 19.7 26.3 31.5 ...
##  $ 0.75: num [1:461] 28.2 24 31 28.3 34.4 ...
##  $ 1.00: logi [1:3287] NA NA NA NA NA NA ...
par(mfrow=c(1,3))
lods <- limitOfDetection(lifetech, qcThreshold=1.25, plotType="scatter")

par(mfrow=c(1,3))
lods <- limitOfDetection(lifetech, qcThreshold=1.25, plotType="MAplot")

print(round(lods,digits=2))
##      0.1/0.1 vs pure 0.01/0.01 vs pure 0.01/0.01 vs 0.1/0.1
## 0.50            27.6              26.8                 26.3
## 0.75            28.9              28.4                 28.4
## 1.00            29.1              29.0                 29.2

In all three cases, the function also returns additional information. For plotType="boxplot" the function returns the values in each box. For the other two plot types, the function returns several potential limits of detection. Specifically, it returns a matrix with three columns corresponding to the three figures and three rows corresponding to three thresholds (0.50, 0.75, and 1.00) on the median absolute difference between the observed and expected values. Each value in the matrix is the expected expression value such that the median absolute difference for all larger expected expression values is approximately equal to the threshold for that row. For example, in the output shown above, if we focus on the 0.01/0.01 vs 0.1/0.1 comparison (column 3) and set a median absolute difference threshold of 1.00 (row 3), our estimate of the limit of detection is approximately 29.2.
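
One way to read the threshold rule described above is sketched below on made-up numbers. The helper lod_estimate is hypothetical and is not the package's implementation: it sorts features by expected expression, computes the median absolute difference over all features at or beyond each candidate cutoff, and reports the first cutoff at which that median reaches the threshold.

```r
# Hypothetical expected expression values and observed-minus-expected differences
expected   <- c(20, 22, 24, 26, 28, 30, 32, 34)
difference <- c(0.1, -0.2, 0.1, 0.3, -0.6, 1.2, -1.8, 2.5)

lod_estimate <- function(expected, difference, threshold) {
  ord <- order(expected)
  expected <- expected[ord]
  abs_diff <- abs(difference[ord])
  # Median absolute difference over the features at or beyond each cutoff
  tail_median <- vapply(seq_along(expected),
                        function(i) median(abs_diff[i:length(abs_diff)]),
                        numeric(1))
  # First cutoff whose tail median reaches the threshold (NA if none does)
  hit <- which(tail_median >= threshold)
  if (length(hit) == 0) NA_real_ else expected[hit[1]]
}

# One estimate per threshold, mirroring the three rows of the matrix above
lod_sketch <- vapply(c(0.50, 0.75, 1.00),
                     function(t) lod_estimate(expected, difference, t),
                     numeric(1))
print(lod_sketch)
```

In this toy example the disagreement between observed and expected values grows with the expected value, so a more permissive threshold yields a larger estimated limit of detection, which matches the pattern down each column of the matrix shown above.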

2.5 Titration Response

We now turn to assessments of the expression estimates themselves. Perhaps the most straightforward assessment of a dilution experiment is the titration response, which features a consistent increase in expression with increasing amounts of input RNA. The titrationResponse function allows one to assess a single method or compare two methods. The output is a table displaying the number of features that show a titration response (monotone increasing expression as the input RNA increases). Here we use sample types 2-4 and 6-8 as two separate titration series. The function also produces a figure showing the titration response stratified by the difference in expression between the sample being titrated and the sample being held constant. For example, in the sample type 2-4 titration series, mixture component A is held constant and mixture component B is titrated. To assess the difference in expression between mixture components A and B, we use the expression estimates in the pure sample types: sample type 1 (pure A) and sample type 5 (pure B).

titrationResponse(lifetech, qcThreshold1=1.25)

##            A   B
## Mono      98 164
## Non-Mono 109  43
titrationResponse(lifetech, qcThreshold1=1.25, object2=qpcRdefault, qcThreshold2=0.99, label1="LifeTech", label2="qpcR")

##          LifeTech:A qpcR:A LifeTech:B qpcR:B
## Mono             96     84        163    161
## Non-Mono        108    120         41     43

Note that the number of features included in the assessment differs between the two tables shown above. This is because, when comparing two methods, a given feature must by default be of acceptable quality according to both methods. However, one can remove this constraint by setting commonFeatures=FALSE.

titrationResponse(lifetech, qcThreshold1=1.25, object2=qpcRdefault, qcThreshold2=0.99, commonFeatures=FALSE, label1="LifeTech", label2="qpcR")
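
The Mono / Non-Mono counts reported by titrationResponse rest on a monotonicity check across the sample types in a titration series. The sketch below shows such a check on made-up values; the helper is_monotone_increasing and the toy matrix are illustrative only, and the values are taken to be on a scale where larger means more expression. How the package orients Ct-like estimates, where lower values indicate more template, is not shown here.

```r
# Hypothetical mean expression for the three sample types of one titration
# series, ordered from lowest to highest input of the titrated component
series <- rbind(
  "miR-a" = c(1.0, 2.1, 3.0),  # increases at every step
  "miR-b" = c(1.0, 0.8, 2.5),  # dips at the second step
  "miR-c" = c(2.0, NA, 3.0)    # missing value, cannot be assessed
)

is_monotone_increasing <- function(x) {
  # Features with a missing value are left unclassified
  if (anyNA(x)) return(NA)
  all(diff(x) > 0)
}

mono <- apply(series, 1, is_monotone_increasing)
print(table(ifelse(mono, "Mono", "Non-Mono"), useNA = "ifany"))
```

Features left unclassified by one method but not the other are one reason the totals can change when commonFeatures is switched between TRUE and FALSE.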