Statistical Analysis of Microarray Data Using the Software R
Introduction
Rweb is free statistical software online that can be used in data analysis. A data set can be entered into the program and then commands are typed in using R code to analyze and manipulate the data. Each command must be typed on a separate line. Once the commands are submitted a separate screen will appear with data output. To continue with the next commands, simply hit the back button and all of your previous commands will be saved in the code box. Rweb can be accessed at http://bayes.math.montana.edu/Rweb/Rweb.general.html.
Procedure
R Code:
WTI = X[ ,1]
WTB = X[ ,2]
MUTI = X[ ,3]
MUTB = X[ ,4] (save each column as an individual vector- set of data)
WT.intensity = WTI - WTB
mut.intensity = MUTI – MUTB (subtract all the background intensities from all the total intensities for both red and green)
We created a single column of corrected intensities of both the green and red dyes (wild type and mutant).
R code:
WT.positive = WT.intensity[WT.intensity>0]
mut.positive = mut.intensity[mut.intensity>0]
WT.transformed = log2(WT.positive)
mut.transformed = log2(mut.positive)
By listing the name of the vectors on a separate line you should get a
display of all data points in that vector (not shown here).
R code:
m<- mut.transformed - WT.transformed
a<- (mut.transformed + WT.transformed)/2
plot(a,m, xlab = "A", ylab = "M", col=gray(.30), main = "MA plot")
In the MA plot, if the data are symmetric around a horizontal line through zero then no dye bias exits. However, if the MA plot shows that the data are not symmetric around a horizontal line through zero and clearly shows a nonlinear trend then a Lowess Normalization is needed.
![]()
A
B 
C
D

Figure 1. MA plots of the data for Groups 1 (A), 2 (B), 4 (C) and 9 (D). If data is not symmetric about a horizontal line at zero, represented in these graphs, then a
Lowess Normalization is needed to correct for dye bias.
R code:
lmodel <- loess(m~a)
norm.ratio <- m - fitted(lmodel)
scatter.smooth(norm.ratio~a, col=gray(.30),xlab="A",ylab="M - Est(M)", main="Lowess Plot")
N.mut <- mut.transformed -0.5*fitted(lmodel)
N.WT <- WT.transformed +0.5*fitted(lmodel)
A
B

C
D

Figure 2. Lowess plots of the data from Groups 1 (A), 2 (B), 4 (C) and 9 (D) after the Lowess Normalization. The lines on the graph represent the true axis
about which the data are symmetrically distributed. An improvement from the MA plots can be seen.
R code:
qmz<-(1/4)*(sort(N.mut.1)+sort(N.mut.2)+sort(N.mut.4)+sort(N.mut.9))
qm1<- qmz[rank(N.mut.1)]; qm2<- qmz[rank(N.mut.2)]; qm3<- qmz[rank(N.mut.4)]; qm4<- qmz[rank(N.mut.9)]
qwz<-(1/4)*(sort(N.WT.1)+sort(N.WT.2)+sort(N.WT.4)+sort(N.WT.9))
qw1<- qwz[rank(N.WT.1)]; qw2<- qwz[rank(N.WT.2)]; qw3<- qwz[rank(N.WT.4)]; qw4<- qwz[rank(N.WT.9)]
zz<- cbind(qw1,qw2,qw3,qw4,qm1,qm2,qm3,qm4)
boxplot(qw1,qw2,qw3,qw4,qm1,qm2,qm3,qm4, main="Stage 1 Quantile Normalized Box Plots",color=”Blue”)

Figure 3. Boxplots of the green and red dye intensities for Groups 1, 2, 4 and 9 respectively after quantile normalization. The
datasets have approximately the same distribution which is ideal for comparisons across the groups.
R code:
ratios.1 = N.mut.1/N.WT.1
ratios.2 = N.mut.2/N.WT.2
ratios.4 = N.mut.4/N.WT.4
ratios.9 = N.mut.9/N.WT.9
Find the ratios you want to analyze for significance. Then a t-test can be performed on those values to see if they are significantly different from zero.
One sample, two-sided t-test
X is a vector containing the ratios from all trials for a single gene
i.e. X = c(x1, x2, x3)
In this case, we are comparing the ratios to zero so µ = 0
The default confidence level is α = 0.05
R code:
t.test(X, alternative = “two.sided”, mu = 0)
Data output will include the t test statistic and a p-value. The p-value is compared to α and if
p< α then the null hypothesis is rejected. In this case, the intensity ratio would be significantly different from zero.
Two-sample, two-sided t-test
This is used if you wanted to compare two different datasets (i.e. Dzms1 and Dzms1/Dzms2) to see if they are differentially expressed.
The procedure is similar to that of the one sample t-test except a second variable, y, is added in.
X is a vector containing the ratios from all trials for a single gene within the Dzms1 replicates
Y is a vector containing the ratios from all the trials for a single gene within the Dzms1/Dzms2 replicates
R code:
t.test(X, Y, alternative = "two.sided", mu = 0)
Data output will again include the t test statistic and a p-value. The p-value is compared to α as above and if p<α then the null hypothesis is rejected
and we can conclude that the gene is differentially expressed between Dzms1 and Dzms1/Dzms2.
Table 1. Results of one-sample t-tests on ten possible genes of interest in Dzms1 replicates. Exact p-values are shown and compared to a standard α = 0.05. Rejecting Ho indicates
that the average intensity ratio of that gene is significantly different from zero and therefore, failing to reject Ho indicates that the average intensity ratio of the gene is not significantly
different from zero.
| Gene | p-value | p<α | Conclusion |
| YKL197C | 0.0738 | No | Fail to reject Ho |
| YAL045C | 0.0157 | Yes | Reject Ho |
| YHL008C | 0.0335 | Yes | Reject Ho |
| YHR187W | 0.0015 | Yes | Reject Ho |
| YPL213W | 0.2258 | No | Fail to reject Ho |
| YDR416W | 0.0023 | Yes | Reject Ho |
| YLL006W | 0.0173 | Yes | Reject Ho |
| YCR037C | 0.0517 | No | Fail to reject Ho |
| YOR259C | 0.0021 | Yes | Reject Ho |
| YDL207W | 0.0026 | Yes | Reject Ho |
Complete R code
WTI1 = X[ ,1]
WTB1 = X[ ,2]
MUTI1 = X[ ,3]
MUTB1 = X[ ,4]
WT.intensity.1 = WTI1 - WTB1
mut.intensity.1 = MUTI1 - MUTB1
WT.positive.1 = WT.intensity.1[WT.intensity.1>0]
mut.positive.1 = mut.intensity.1[mut.intensity.1>0]
WT.transformed.1 = log2(WT.positive.1)
mut.transformed.1 = log2(mut.positive.1)
m<- mut.transformed.1 - WT.transformed.1
a<- (mut.transformed.1 + WT.transformed.1)/2
plot(a,m, xlab="A", ylab="M", col=gray(.30), main ="MA plot1")
lmodel <- loess(m~a)
norm.ratio <- m - fitted(lmodel)
scatter.smooth(norm.ratio~a, col=gray(.30),xlab="A",ylab="M - Est(M)", main="Lowess Plot")
N.mut.1 <- mut.transformed.1 -0.5*fitted(lmodel)
N.mut.1
N.WT.1 <- WT.transformed.1 +0.5*fitted(lmodel)
N.WT.1
WTI2 = X[ ,5]
WTB2 = X[ ,6]
MUTI2 = X[ ,7]
MUTB2 = X[ ,8]
WT.intensity.2 = WTI2 - WTB2
mut.intensity.2 = MUTI2 - MUTB2
WT.positive.2 = WT.intensity.2[WT.intensity.2>0]
mut.positive.2 = mut.intensity.2[mut.intensity.2>0]
WT.transformed.2 = log2(WT.positive.2)
mut.transformed.2 = log2(mut.positive.2)
m<- mut.transformed.2 - WT.transformed.2
a<- (mut.transformed.2 + WT.transformed.2)/2
plot(a,m, xlab="A", ylab="M", col=gray(.30), main ="MA plot2")
lmodel <- loess(m~a)
norm.ratio <- m - fitted(lmodel)
scatter.smooth(norm.ratio~a, col=gray(.30),xlab="A",ylab="M - Est(M)", main="Lowess Plot")
N.mut.2 <- mut.transformed.2 -0.5*fitted(lmodel)
N.mut.2
N.WT.2 <- WT.transformed.2 +0.5*fitted(lmodel)
N.WT.2
WTI4 = X[ ,9]
WTB4 = X[ ,10]
MUTI4 = X[ ,11]
MUTB4 = X[ ,12]
WT.intensity.4 = WTI4 - WTB4
mut.intensity.4 = MUTI4 - MUTB4
WT.positive.4 = WT.intensity.4[WT.intensity.4>0]
mut.positive.4 = mut.intensity.4[mut.intensity.4>0]
WT.transformed.4 = log2(WT.positive.4)
mut.transformed.4 = log2(mut.positive.4)
m<- mut.transformed.4 - WT.transformed.4
a<- (mut.transformed.4 + WT.transformed.4)/2
plot(a,m, xlab="A", ylab="M", col=gray(.30), main ="MA plot4")
lmodel <- loess(m~a)
norm.ratio <- m - fitted(lmodel)
scatter.smooth(norm.ratio~a, col=gray(.30),xlab="A",ylab="M - Est(M)", main="Lowess Plot")
N.mut.4 <- mut.transformed.4 -0.5*fitted(lmodel)
N.mut.4
N.WT.4 <- WT.transformed.4 +0.5*fitted(lmodel)
N.WT.4
WTI9 = X[ ,13]
WTB9 = X[ ,14]
MUTI9 = X[ ,15]
MUTB9 = X[ ,16]
WT.intensity.9 = WTI9 - WTB9
mut.intensity.9 = MUTI9 - MUTB9
WT.positive.9 = WT.intensity.9[WT.intensity.9>0]
mut.positive.9 = mut.intensity.9[mut.intensity.9>0]
WT.transformed.9 = log2(WT.positive.9)
mut.transformed.9 = log2(mut.positive.9)
m<- mut.transformed.9 - WT.transformed.9
a<- (mut.transformed.9 + WT.transformed.9)/2
plot(a,m, xlab="A", ylab="M", col=gray(.30), main ="MA plot9")
lmodel <- loess(m~a)
norm.ratio <- m - fitted(lmodel)
scatter.smooth(norm.ratio~a, col=gray(.30),xlab="A",ylab="M - Est(M)", main="Lowess Plot")
N.mut.9 <- mut.transformed.9 -0.5*fitted(lmodel)
N.mut.9
N.WT.9 <- WT.transformed.9 +0.5*fitted(lmodel)
N.WT.9
qmz<-(1/4)*(sort(N.mut.1)+sort(N.mut.2)+sort(N.mut.4)+sort(N.mut.9))
qm1<- qmz[rank(N.mut.1)]; qm2<- qmz[rank(N.mut.2)]; qm3<- qmz[rank(N.mut.4)]; qm4<- qmz[rank(N.mut.9)]
qwz<-(1/4)*(sort(N.WT.1)+sort(N.WT.2)+sort(N.WT.4)+sort(N.WT.9))
qw1<- qwz[rank(N.WT.1)]; qw2<- qwz[rank(N.WT.2)]; qw3<- qwz[rank(N.WT.4)]; qw4<- qwz[rank(N.WT.9)]
zz<- cbind(qw1,qw2,qw3,qw4,qm1,qm2,qm3,qm4)
boxplot(qw1,qw2,qw3,qw4,qm1,qm2,qm3,qm4, main="Stage 1 Quantile Normalized Box Plots",color=”Blue”)
ratios.1 = N.mut.1/N.WT.1
ratios.2 = N.mut.2/N.WT.2
ratios.4 = N.mut.4/N.WT.4
ratios.9 = N.mut.9/N.WT.9
x.1 = c(ratios.1[61], ratios.4[61], ratios.9[61])
t.test(x.1, alternative = "two.sided", mu = 0)
x.2 = c(ratios.1[83], ratios.4[83], ratios.9[83])
t.test(x.2, alternative = "two.sided", mu = 0)
x.3 = c(ratios.1[87], ratios.4[87], ratios.9[87])
t.test(x.3, alternative = "two.sided", mu = 0)
x.4 = c(ratios.1[125], ratios.4[125], ratios.9[125])
t.test(x.4, alternative = "two.sided", mu = 0)
x.5 = c(ratios.1[260], ratios.4[260], ratios.9[260])
t.test(x.5, alternative = "two.sided", mu = 0)
x.6 = c(ratios.1[262], ratios.4[262], ratios.9[262])
t.test(x.6, alternative = "two.sided", mu = 0)
x.7 = c(ratios.1[263], ratios.4[263], ratios.9[263])
t.test(x.7, alternative = "two.sided", mu = 0)
x.8 = c(ratios.1[295], ratios.4[295], ratios.9[295])
t.test(x.8, alternative = "two.sided", mu = 0)
x.9 = c(ratios.1[338], ratios.4[338], ratios.9[338])
t.test(x.9, alternative = "two.sided", mu = 0)
x.10 = c(ratios.1[347], ratios.4[347], ratios.9[347])
t.test(x.10, alternative = "two.sided", mu = 0)