Company A produces biological reagents and some laboratory equipment. The weekly production of reagent M follows a normal distribution with a mean of 200 and a standard deviation of 16. Recently, new production methods were introduced, and a sample of 50 was taken, with a mean of 203.5.
- The boss would like to investigate whether there has been a change in the weekly production of reagent M. Test at the 0.01 significance level.
- Suppose the boss wants to know whether there has been an increase in the weekly production of reagent M. To put it another way, can we conclude, because of the improved production methods, that the mean production of M is more than 200? Test at the 0.01 significance level.
Please write down the key steps to solve the problem (the process and R code).
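Since the question concerns the mean of n = 50 observations and the population standard deviation of 16 is treated as known, a one-sample z-test seems the natural choice. A worked sketch of the hypotheses and the test statistic follows; the symbols \bar{x}, \mu_0, \sigma and n are notation introduced here, not taken from the original exercise.

```latex
\begin{aligned}
H_0 &: \mu = 200, \qquad H_1 : \mu \neq 200 \ \text{(part 1)}, \qquad H_1 : \mu > 200 \ \text{(part 2)} \\
z &= \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
   = \frac{203.5 - 200}{16 / \sqrt{50}}
   \approx \frac{3.5}{2.263}
   \approx 1.55
\end{aligned}
```

Under H_0 the statistic z follows a standard normal distribution, so it can be compared with standard normal quantiles or converted to a p-value.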
# Q1
P_mean = 200    # population mean under H0
P_sd = 16       # population standard deviation (known)
s_mean = 203.5  # sample mean
n = 50          # sample size
# The mean of n observations has standard error P_sd/sqrt(n), so the
# test must use the sampling distribution of the mean, not the
# population distribution itself
se = P_sd / sqrt(n)  # about 2.26
# plot the sampling distribution of the mean under H0
x = seq(P_mean-3*se, P_mean+3*se, length=100)
density = dnorm(x, mean=P_mean, sd=se)
plot(x, density, type='l')
abline(v=s_mean, col='red')
# calculate the z statistic
z = (s_mean - P_mean) / se
z # about 1.55
# Part 1: H0: mu = 200 vs H1: mu != 200 (two-tailed)
pval_two = 2 * (1 - pnorm(abs(z)))
pval_two # about 0.12 > 0.01, fail to reject H0
# Part 2: H0: mu <= 200 vs H1: mu > 200 (one-tailed)
pval_one = 1 - pnorm(z)
pval_one # about 0.06 > 0.01, fail to reject H0
# Note: comparing s_mean directly with the population distribution
# (sd = 16 instead of se) gives a two-tailed P value of about 0.83.
# That is not the right test, and it does not mean the given sd is wrong.
# Conclusion: at the 0.01 level we cannot conclude that weekly
# production has changed or increased.
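Because the problem concerns the mean of n = 50 observations, the test statistic is built on the standard error sigma/sqrt(n). The decision can also be made by comparing that statistic with critical values rather than with P values. The following is a minimal sketch, assuming a known population standard deviation; the helper name `z_test_mean` and its arguments are hypothetical and not part of the original answer.

```r
# Hypothetical helper: one-sample z-test for a mean with known sigma
z_test_mean = function(s_mean, mu0, sigma, n, alpha = 0.01) {
  se = sigma / sqrt(n)             # standard error of the sample mean
  z = (s_mean - mu0) / se          # test statistic
  crit_two = qnorm(1 - alpha / 2)  # two-tailed critical value
  crit_one = qnorm(1 - alpha)      # upper-tail critical value
  list(z = z,
       crit_two = crit_two,
       crit_one = crit_one,
       reject_two = abs(z) > crit_two,  # evidence of a change?
       reject_one = z > crit_one)       # evidence of an increase?
}

res = z_test_mean(s_mean = 203.5, mu0 = 200, sigma = 16, n = 50)
res$z           # about 1.55
res$crit_two    # about 2.58
res$crit_one    # about 2.33
res$reject_two  # FALSE
res$reject_one  # FALSE
```

With z of about 1.55 below both critical values, neither null hypothesis is rejected at the 0.01 level.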
This question uses the data file "data.csv", which is derived from a microarray dataset investigating gene expression in a certain disease. The data have been processed: the first row contains the sample serial numbers (S1 - S20), and the first column contains the gene names. The numbers are the expression values of each gene. Please answer the following questions (R code required).
- Please draw a density plot (PDF) to investigate the distribution of G3 gene expression among the 20 samples, and then calculate its minimum, median and variance using appropriate functions in R.
- Please draw a boxplot to compare the distribution of all gene expression values among the 20 different samples. Note that you should check whether there are any outliers. If there are, please delete them and redraw the boxplot.
# Q2
dat = read.table('./homework3-4_data.csv', header=TRUE, row.names=1,
stringsAsFactors=FALSE, sep=',')
dim(dat)
dat[1:5,1:5]
pdf(file='boxplot.pdf', width=6, height=4, pointsize=6)
boxplot(dat, outline = FALSE, main='Boxplot without outliers')
dev.off()
pdf(file='G3_hist.pdf', width=6, height=4)
hist(unlist(dat['G3',]), main='Histogram of G3 expression')
dev.off()
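The question also asks for a density plot of G3 and for its minimum, median and variance, which the histogram alone does not provide. A minimal sketch follows. Since the real file is not available here, it builds a simulated stand-in for `dat` (an assumption for illustration only); with the real data, skip that step and reuse the `dat` read above.

```r
# Hypothetical stand-in for the real data: 10 genes x 20 samples
set.seed(1)
dat = as.data.frame(matrix(rnorm(10 * 20, mean = 8, sd = 1), nrow = 10,
                           dimnames = list(paste0('G', 1:10),
                                           paste0('S', 1:20))))

g3 = unlist(dat['G3', ])  # expression of G3 across the 20 samples

# kernel density estimate of G3 expression
plot(density(g3), main = 'Density of G3 expression')

# summary statistics requested in the question
g3_min = min(g3)
g3_median = median(g3)
g3_var = var(g3)  # sample variance (denominator n - 1)
c(min = g3_min, median = g3_median, variance = g3_var)
```

Note that `var()` returns the sample variance; if the population variance is wanted, multiply by (n - 1)/n.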
# Actually, there are several different ways to define outliers:
# points more than 3 standard deviations away from the mean
# points above quantile(x, .75) + 1.5*IQR or below quantile(x, .25) - 1.5*IQR,
# where IQR = quantile(x, .75) - quantile(x, .25)
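The interquartile-range rule mentioned above can be used to delete outliers and redraw the boxplot, as the question requests. The following is a rough sketch: the simulated `dat`, the planted outlier and the helper name `is_outlier` are all hypothetical, introduced only so the example runs on its own.

```r
# Hypothetical stand-in for the real data: 50 genes x 20 samples
set.seed(1)
m = matrix(rnorm(50 * 20, mean = 8, sd = 1), nrow = 50,
           dimnames = list(paste0('G', 1:50), paste0('S', 1:20)))
m[1, 1] = 30  # plant one obvious outlier in sample S1
dat = as.data.frame(m)

# TRUE where a value lies outside the 1.5 * IQR fences of its sample
is_outlier = function(x) {
  q = quantile(x, c(.25, .75), na.rm = TRUE)
  fence = 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

flags = sapply(dat, is_outlier)  # genes x samples logical matrix
colSums(flags)                   # number of outliers per sample

# replace the flagged values with NA and redraw the boxplot
dat_clean = dat
dat_clean[flags] = NA
boxplot(dat_clean, main = 'Boxplot after removing outliers')
```

The redrawn boxplot may still mark a few points, because the fences are recomputed on the cleaned data. Also, `boxplot()` derives its fences from the hinges given by `fivenum()`, which can differ slightly from `quantile()`.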