

Introduction
This experiment sought to gain an understanding of gene expression in Saccharomyces cerevisiae in its control of oxidative stress. In particular, the zms1 and zms2 genes are of importance because of their reported role in relieving oxidative stress caused by free radicals, which are by-products of aerobic respiration.4 By using microarrays and statistical analysis, gene expression levels can be quantified and thus applied directly to the understanding of oxidative stress.
Statistical analysis of microarray data is necessary to determine which measurements within the microarray are statistically usable. Likewise, microarray data need some form of statistical accountability before they can be used in further studies. Without some means of quantification, the data are of little use to scientists. The method I will use to analyze our microarray is the t-test.
T-tests are one of the most common means of statistical analysis and are often used to separate differences that are statistically significant from differences that cannot be distinguished from random variation. Applying a t-test to a microarray presents particular problems, but it can usually be done given maximal replication and subsequent verification using other statistical methods. One such problem is that the unusually large number of genes sampled completely dominates the unusually low number of repeated experiments.3 Compounding this problem, the t-test assumes a normal distribution, an assumption that microarray data do not meet.3 Even given these pitfalls, a t-test will be applied to the twelve genes determined to be differentially expressed across the 3 microarray slides containing zms1 knockouts described in the previous experiments.
The t-test will essentially compare the mean signals measured for each gene in zms1 knockout yeast and in wild type yeast. I hypothesize that the zms1 knockout will cause the twelve genes to be differentially expressed relative to wild type expression. My null hypothesis is that the zms1 knockout has no effect on the expression of these genes.
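Stated in symbols, and assuming the paired design described in the methods, the comparison for a single gene can be sketched as follows. The notation (d_i, \bar{d}, s_d, n, \mu_d) is introduced here for illustration only and does not come from the group's analysis.

```latex
% d_i = A_i - B_i : difference in background-corrected signal on slide i
%                   (A = wild type, B = zms1 knockout)
% n               : number of slides with usable values in both groups
% \mu_d           : true mean of the paired differences
H_0:\ \mu_d = 0 \qquad H_1:\ \mu_d \neq 0

\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}, \qquad
t = \frac{\bar{d}}{s_d/\sqrt{n}}, \qquad \mathrm{df} = n - 1
```

The two-tailed p-value is the probability, under the null hypothesis, of a t statistic at least as far from zero as the one observed.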
Materials and Methods
The materials and methods for microarray data collection and analysis using MAGIC Tool are reported in the group materials and methods page. The analysis compares two groups of signals on the reported slides: wild type (group A) and zms1 knockout (group B). The genes chosen for analysis are those identified in the group project, and the slides used to obtain these genes are listed in the materials and methods page.
1. Arrange the signal values, corrected for background noise, in Excel, placing genes of interest in different rows and trials in different columns for each group.
2. Arrange the columns so that the groups are separated (e.g. Group A1, Group A2, Group B1, Group B2).
3. Calculate the mean and standard deviation for each group (use the Excel functions for these).
4. Calculate the p-value using the Excel function "TTEST", with Group A (wild type) as array 1 and Group B (zms1 knockout) as array 2. A two-tailed t-test is used since expression can be up-regulated or down-regulated, and a paired t-test is used since all samples came from the same specimen.
5. Choose the desired significance level, also called the alpha value (standard = 0.05).
6. Compare the calculated p-values to the alpha value to determine whether the changes in signal level are significant. A p-value lower than the alpha value suggests that there is significant differential expression, while a p-value greater than the alpha value fails to support the hypothesis that there is significant differential expression.
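The steps above can also be sketched outside Excel. The following is a minimal illustration in Python using only the standard library; it is not the procedure used for the reported results, which relied on the Excel function. The function names (t_pdf, two_tailed_p, paired_t_test) and the signal values are hypothetical, and dropping any pair with a missing value is an assumption about how blank cells should be handled.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value, integrating the density from 0 to |t| (Simpson's rule)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1 - 2 * area)


def paired_t_test(group_a, group_b):
    """Paired two-tailed t-test; pairs containing None (a blank cell) are skipped."""
    diffs = [a - b for a, b in zip(group_a, group_b)
             if a is not None and b is not None]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1, two_tailed_p(t, n - 1)


# Hypothetical background-corrected signals for one gene on four slides.
wild_type = [10, 12, 14, 16]   # group A
knockout = [8, 9, 13, 12]      # group B
alpha = 0.05

t_stat, df, p_value = paired_t_test(wild_type, knockout)
verdict = "significant" if p_value < alpha else "not significant"
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.4f} ({verdict})")
```

With only four slides per gene there are at most three degrees of freedom, so a gene must show a large and consistent difference between groups before its p-value falls below the alpha value.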
Results
The results leading up to the identification of 12 genes differentially expressed at a 3-fold level in 3 out of 4 microarray slides are detailed in our group results page. These genes were then analyzed for statistical significance in their signal intensity values. First, the data were organized into columns containing the values from each of the four microarray slides used. These data represent signal intensities corrected for background noise and are shown in Table 1.
Table 1: Intensity values minus background noise. Group A represents wild type yeast expression levels, while group B represents zms1 knockout yeast expression levels. Blank cells indicate negative values, which were discarded.

After organizing the data, mean values and standard deviations were calculated for each group. Again, blank cells indicating negative numbers were left out of the calculations. A t-test was then conducted to determine whether the mean expression levels of the two groups differed. This was done with the Excel TTEST function, using group A as array 1, group B as array 2, two tails, and the paired setting. Two tails were used since expression could be up-regulated or down-regulated, and the paired setting was used since the samples came from the same yeast. An alpha value of 0.05 was chosen for this study. The results are shown in Table 2.
Table 2: Means, standard deviations, and p-values for each gene chosen. Group A is wild type yeast and group B is zms1 knockout yeast. Only YLR293C_01 showed a statistically significant difference between the groups.

Finally, the results were interpreted by comparing the differentially expressed genes to their known functions within cells, using the yeast genome gene list from the GCAT website. This is shown in Table 3.
Table 3: The genes that were considered differentially expressed were compared to the yeast genome gene list available on the GCAT website. Red corresponds to up-regulated genes, while green corresponds to down-regulated genes in Δzms1 yeast.

Discussion
The original analysis of the microarray data can be found in our group discussion page. Statistical analysis of the data obtained by the group is important for validating our findings; without a statistical test, the results would have little standing. Although prior research has shown that applying t-tests to microarray data can be very difficult, the t-test has been used as an accepted way of quantifying the validity of results.1
By applying a t-test, I found that only 1 of the 12 originally reported genes showed a statistically significant difference. This gene, YLR293C_01, yielded a p-value of 0.0227, below the chosen alpha value of 0.05. This result supports my alternative hypothesis that the zms1 knockout causes differential expression in comparison to wild type yeast expression levels. In previous results, this gene was also determined to be up-regulated, leading to the conclusion that it is significantly up-regulated in zms1 knockout yeast.
YLR293C_01 plays a role in small monomeric GTPase activity. The up-regulation of this gene may therefore indicate that more GTP is being broken down within the mutant. This interpretation, however, is very general and cannot be applied directly to oxidative stress without further research.
The 11 other genes all yielded p-values above the chosen alpha level and thus failed to support my alternative hypothesis. However, it is important to note that the signal levels in all samples were relatively low, and low signal levels can make the data inaccurate. In combination with the other statistical difficulties, this means further research is warranted to determine how these genes relate to oxidative stress.
When conducting t-tests, several assumptions are made. One of those assumptions is that the data are normally distributed. As shown in our data, this is not the case, and violating this assumption may make the results unreliable. Also, the sample size for our data was very small and included only four trials, which leaves the test with little statistical power. There is work being done to compensate for this lack of statistical power, and thus this experiment requires further research to support my hypothesis.2
Literature Cited
1. Slonim DK. (2002). From patterns to pathways: gene expression data analysis comes of age. Nat Genet., 32 Suppl:502-8.
2. Fox RJ. (2006). A two-sample Bayesian t-test for microarray data. BMC Bioinformatics, 7:126. Published online 2006 March 10.
3. Trochim, William M.K. The T-Test. Research Methods Knowledge Base. Available [Online] http://www.socialresearchmethods.net/kb/stat_t.php. Nov. 2006.
4. Slekar, K. A Genetic Study of Anti-Oxidant Factors in Yeast. Lecture 2008 Oct.