Back to Title Page

 

 

Statistical Analysis of Microarray Data Using the Software R

 

Introduction

 

Rweb is free statistical software online that can be used in data analysis. A data set can be entered into the program and then commands are typed in using R code to analyze and manipulate the data. Each command must be typed on a separate line. Once the commands are submitted a separate screen will appear with data output. To continue with the next commands, simply hit the back button and all of your previous commands will be saved in the code box. Rweb can be accessed at http://bayes.math.montana.edu/Rweb/Rweb.general.html

 

Procedure

 

R Code: 

 

WTI = X[ ,1]

WTB = X[ ,2]

MUTI = X[ ,3]

MUTB = X[ ,4]            (save each column as an individual vector- set of data)

 

 

WT.intensity = WTI - WTB

mut.intensity = MUTI – MUTB                         (subtract all the background intensities from all the total intensities for both red and green)

               
 

We created a single column of corrected intensities of both the green and red dyes (wild type and mutant).

 

 

 

R code:

 

WT.positive = WT.intensity[WT.intensity>0]

mut.positive = mut.intensity[mut.intensity>0]

WT.transformed = log2(WT.positive)

mut.transformed = log2(mut.positive)

           

            By listing the name of the vectors on a separate line you should get a

display of all data points in that vector (not shown here).           

 

 

R code:

 

m<- mut.transformed - WT.transformed

a<- (mut.transformed + WT.transformed)/2

plot(a,m, xlab = "A", ylab = "M", col=gray(.30), main = "MA plot")

 

In the MA plot, if the data are symmetric around a horizontal line through zero then no dye bias exits. However, if the MA plot shows that the data are not symmetric around a horizontal line through zero and clearly shows a nonlinear trend then a Lowess Normalization is needed.

 

A    B  

 

C     D

Figure 1. MA plots of the data for Groups 1 (A), 2 (B), 4 (C) and 9 (D). If data is not symmetric about a horizontal line at zero, represented in these graphs, then a

Lowess Normalization is needed to correct for dye bias.

 

R code:

 

lmodel <- loess(m~a)

norm.ratio <- m - fitted(lmodel)

scatter.smooth(norm.ratio~a, col=gray(.30),xlab="A",ylab="M - Est(M)", main="Lowess Plot")

 

N.mut <- mut.transformed -0.5*fitted(lmodel)

N.WT <- WT.transformed +0.5*fitted(lmodel)

 

A      B

 

C       D

Figure 2. Lowess plots of the data from Groups 1 (A), 2 (B), 4 (C) and 9 (D) after the Lowess Normalization. The lines on the graph represent the true axis

about which the data are symmetrically distributed. An improvement from the MA plots can be seen.

 

 

R code:

 

qmz<-(1/4)*(sort(N.mut.1)+sort(N.mut.2)+sort(N.mut.4)+sort(N.mut.9))

qm1<- qmz[rank(N.mut.1)]; qm2<- qmz[rank(N.mut.2)]; qm3<- qmz[rank(N.mut.4)]; qm4<- qmz[rank(N.mut.9)]

 

qwz<-(1/4)*(sort(N.WT.1)+sort(N.WT.2)+sort(N.WT.4)+sort(N.WT.9))

qw1<- qwz[rank(N.WT.1)]; qw2<- qwz[rank(N.WT.2)]; qw3<- qwz[rank(N.WT.4)]; qw4<- qwz[rank(N.WT.9)]

 

zz<- cbind(qw1,qw2,qw3,qw4,qm1,qm2,qm3,qm4)

boxplot(qw1,qw2,qw3,qw4,qm1,qm2,qm3,qm4, main="Stage 1 Quantile Normalized Box Plots",color=”Blue”)

 

           

            Figure 3. Boxplots of the green and red dye intensities for Groups 1, 2, 4 and 9 respectively after quantile normalization. The

            datasets have approximately the same distribution which is ideal for comparisons across the groups.

 

R code:

        
        ratios.1 = N.mut.1/N.WT.1
        ratios.2 = N.mut.2/N.WT.2
        ratios.4 = N.mut.4/N.WT.4
        ratios.9 = N.mut.9/N.WT.9 
 
 
	One sample, two-sided t-test
         
         X is a vector containing the ratios from all trials for a single gene
                         i.e. X = c(x1, x2, x3)
 
         In this case, we are comparing the ratios to zero so µ = 0
         
         The default confidence level is α = 0.05
 
         R code:
 
         t.test(X, alternative = “two.sided”, mu = 0)
 
         Data output will include the t test statistic and a p-value. The p-value is compared to α and if 
         p< α then the null hypothesis is rejected. In this case, the intensity ratio would be significantly different from zero. 
 
         Two-sample, two-sided t-test
 
         This is used if you wanted to compare two different datasets (i.e. Dzms1 and Dzms1/Dzms2) to see if they are differentially expressed. 
 
         The procedure is similar to that of the one sample t-test except a second variable, y, is added in.
 
          X is a vector containing the ratios from all trials for a single gene within the Dzms1 replicates
          Y is a vector containing the ratios from all the trials for a single gene within the Dzms1/Dzms2 replicates
 
          R code:
 
          t.test(X, Y, alternative = "two.sided", mu = 0)
 
    Data output will again include the t test statistic and a p-value. The p-value is compared to α as above and if p<α then the null hypothesis is rejected
          and we can conclude that the gene is differentially expressed between Dzms1 and Dzms1/Dzms2. 
         
Table 1. Results of one-sample t-tests on ten possible genes of interest in Dzms1 replicates. Exact p-values are shown and compared to a standard α = 0.05. Rejecting Ho indicates
that the average intensity ratio of that gene is significantly different from zero and therefore, failing to reject Ho indicates that the average intensity ratio of the gene is not significantly 
different from zero.       
Gene p-value p<α Conclusion
YKL197C 0.0738 No Fail to reject Ho
YAL045C 0.0157 Yes Reject Ho
YHL008C 0.0335 Yes Reject Ho
YHR187W 0.0015 Yes Reject Ho
YPL213W 0.2258 No Fail to reject Ho
YDR416W 0.0023 Yes Reject Ho
YLL006W 0.0173 Yes Reject Ho
YCR037C 0.0517 No Fail to reject Ho
YOR259C 0.0021 Yes Reject Ho
YDL207W 0.0026 Yes Reject Ho

 

 

 

 

Complete R code

 

WTI1 = X[ ,1]

WTB1 = X[ ,2]

MUTI1 = X[ ,3]

MUTB1 = X[ ,4]

 

WT.intensity.1 = WTI1 - WTB1

mut.intensity.1 = MUTI1 - MUTB1

 

WT.positive.1 = WT.intensity.1[WT.intensity.1>0]

mut.positive.1 = mut.intensity.1[mut.intensity.1>0]

WT.transformed.1 = log2(WT.positive.1)

mut.transformed.1 = log2(mut.positive.1)

 

m<- mut.transformed.1 - WT.transformed.1

a<- (mut.transformed.1 + WT.transformed.1)/2

plot(a,m, xlab="A", ylab="M", col=gray(.30), main ="MA plot1")

 

lmodel <- loess(m~a)

norm.ratio <- m - fitted(lmodel)

scatter.smooth(norm.ratio~a, col=gray(.30),xlab="A",ylab="M - Est(M)", main="Lowess Plot")

 

N.mut.1 <- mut.transformed.1 -0.5*fitted(lmodel)

N.mut.1

N.WT.1 <- WT.transformed.1 +0.5*fitted(lmodel)

N.WT.1

 

WTI2 = X[ ,5]

WTB2 = X[ ,6]

MUTI2 = X[ ,7]

MUTB2 = X[ ,8]

 

WT.intensity.2 = WTI2 - WTB2

mut.intensity.2 = MUTI2 - MUTB2

 

WT.positive.2 = WT.intensity.2[WT.intensity.2>0]

mut.positive.2 = mut.intensity.2[mut.intensity.2>0]

WT.transformed.2 = log2(WT.positive.2)

mut.transformed.2 = log2(mut.positive.2)

 

m<- mut.transformed.2 - WT.transformed.2

a<- (mut.transformed.2 + WT.transformed.2)/2

plot(a,m, xlab="A", ylab="M", col=gray(.30), main ="MA plot2")

 

lmodel <- loess(m~a)

norm.ratio <- m - fitted(lmodel)

scatter.smooth(norm.ratio~a, col=gray(.30),xlab="A",ylab="M - Est(M)", main="Lowess Plot")

 

N.mut.2 <- mut.transformed.2 -0.5*fitted(lmodel)

N.mut.2

N.WT.2 <- WT.transformed.2 +0.5*fitted(lmodel)

N.WT.2

 

WTI4 = X[ ,9]

WTB4 = X[ ,10]

MUTI4 = X[ ,11]

MUTB4 = X[ ,12]

 

WT.intensity.4 = WTI4 - WTB4

mut.intensity.4 = MUTI4 - MUTB4

 

WT.positive.4 = WT.intensity.4[WT.intensity.4>0]

mut.positive.4 = mut.intensity.4[mut.intensity.4>0]

WT.transformed.4 = log2(WT.positive.4)

mut.transformed.4 = log2(mut.positive.4)

 

m<- mut.transformed.4 - WT.transformed.4

a<- (mut.transformed.4 + WT.transformed.4)/2

plot(a,m, xlab="A", ylab="M", col=gray(.30), main ="MA plot4")

 

lmodel <- loess(m~a)

norm.ratio <- m - fitted(lmodel)

scatter.smooth(norm.ratio~a, col=gray(.30),xlab="A",ylab="M - Est(M)", main="Lowess Plot")

 

N.mut.4 <- mut.transformed.4 -0.5*fitted(lmodel)

N.mut.4

N.WT.4 <- WT.transformed.4 +0.5*fitted(lmodel)

N.WT.4

 

WTI9 = X[ ,13]

WTB9 = X[ ,14]

MUTI9 = X[ ,15]

MUTB9 = X[ ,16]

 

WT.intensity.9 = WTI9 - WTB9

mut.intensity.9 = MUTI9 - MUTB9

 

WT.positive.9 = WT.intensity.9[WT.intensity.9>0]

mut.positive.9 = mut.intensity.9[mut.intensity.9>0]

WT.transformed.9 = log2(WT.positive.9)

mut.transformed.9 = log2(mut.positive.9)

 

m<- mut.transformed.9 - WT.transformed.9

a<- (mut.transformed.9 + WT.transformed.9)/2

plot(a,m, xlab="A", ylab="M", col=gray(.30), main ="MA plot9")

 

lmodel <- loess(m~a)

norm.ratio <- m - fitted(lmodel)

scatter.smooth(norm.ratio~a, col=gray(.30),xlab="A",ylab="M - Est(M)", main="Lowess Plot")

 

N.mut.9 <- mut.transformed.9 -0.5*fitted(lmodel)

N.mut.9

N.WT.9 <- WT.transformed.9 +0.5*fitted(lmodel)

N.WT.9

 

qmz<-(1/4)*(sort(N.mut.1)+sort(N.mut.2)+sort(N.mut.4)+sort(N.mut.9))

qm1<- qmz[rank(N.mut.1)]; qm2<- qmz[rank(N.mut.2)]; qm3<- qmz[rank(N.mut.4)]; qm4<- qmz[rank(N.mut.9)]

 

qwz<-(1/4)*(sort(N.WT.1)+sort(N.WT.2)+sort(N.WT.4)+sort(N.WT.9))

qw1<- qwz[rank(N.WT.1)]; qw2<- qwz[rank(N.WT.2)]; qw3<- qwz[rank(N.WT.4)]; qw4<- qwz[rank(N.WT.9)]

 

zz<- cbind(qw1,qw2,qw3,qw4,qm1,qm2,qm3,qm4)

boxplot(qw1,qw2,qw3,qw4,qm1,qm2,qm3,qm4, main="Stage 1 Quantile Normalized Box Plots",color=”Blue”)

 

ratios.1 = N.mut.1/N.WT.1

ratios.2 = N.mut.2/N.WT.2

ratios.4 = N.mut.4/N.WT.4

ratios.9 = N.mut.9/N.WT.9

 

x.1 = c(ratios.1[61], ratios.4[61], ratios.9[61])

t.test(x.1, alternative = "two.sided", mu = 0)

 

x.2 = c(ratios.1[83], ratios.4[83], ratios.9[83])

t.test(x.2, alternative = "two.sided", mu = 0)

 

x.3 = c(ratios.1[87], ratios.4[87], ratios.9[87])

t.test(x.3, alternative = "two.sided", mu = 0)

 

x.4 = c(ratios.1[125], ratios.4[125], ratios.9[125])

t.test(x.4, alternative = "two.sided", mu = 0)

 

x.5 = c(ratios.1[260], ratios.4[260], ratios.9[260])

t.test(x.5, alternative = "two.sided", mu = 0)

 

x.6 = c(ratios.1[262], ratios.4[262], ratios.9[262])

t.test(x.6, alternative = "two.sided", mu = 0)

 

x.7 = c(ratios.1[263], ratios.4[263], ratios.9[263])

t.test(x.7, alternative = "two.sided", mu = 0)

 

x.8 = c(ratios.1[295], ratios.4[295], ratios.9[295])

t.test(x.8, alternative = "two.sided", mu = 0)

 

x.9 = c(ratios.1[338], ratios.4[338], ratios.9[338])

t.test(x.9, alternative = "two.sided", mu = 0)

 

x.10 = c(ratios.1[347], ratios.4[347], ratios.9[347])

t.test(x.10, alternative = "two.sided", mu = 0)