# R cheat sheet for working with normal distribution

This article presents a set of R code snippets you can use as a reference when working with the normal distribution. R makes it easy to run these statistics. Feel free to use and improve them, and please share your comments.

#The 68-95-99.7% rule for the normal distribution with regard to standard deviations
#How to work with standard normal tables

#What is the normal distribution?
#The normal distribution is a theoretical probability distribution that is perfectly symmetric about its mean
#A normal distribution is uniquely defined by two quantities: its mean (µ) and standard deviation (σ)
#Knowing just these two numbers completely specifies the entire distribution of values
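Because the mean and standard deviation fully specify the curve, R's `dnorm()` can be checked against the closed-form normal density written in terms of µ and σ alone. This is a minimal sketch: `normal_density()` is a hypothetical helper written for this sheet (not a base R function), and the grid of x values is an arbitrary choice; the blood sugar mean and SD from later in this sheet are reused as µ and σ.

```r
# Hypothetical helper: closed-form normal density
# f(x) = exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
normal_density <- function(x, mu, sigma) {
  exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
}

mu <- 123.6     # blood sugar mean used later in this sheet
sigma <- 12.9   # blood sugar SD used later in this sheet
x <- seq(mu - 3 * sigma, mu + 3 * sigma, length.out = 13)   # arbitrary grid

by_formula <- normal_density(x, mu, sigma)
by_dnorm   <- dnorm(x, mean = mu, sd = sigma)
all.equal(by_formula, by_dnorm)   # TRUE: same curve

# Symmetry about the mean: equal heights at mu - d and mu + d
d <- c(1, 5, 10)
all.equal(dnorm(mu - d, mu, sigma), dnorm(mu + d, mu, sigma))   # TRUE
```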
#The 68-95-99.7 Rule for the Normal Distribution
#About 68% of the observations fall within one standard deviation of the mean,
#about 95% within two standard deviations, and about 99.7% within three
#The probability that any randomly selected value is within one standard deviation of the mean is therefore about 0.68 (68%)
#In R, this can be found with the pnorm function as below:

# Just remember that pnorm(q) gives the cumulative area under the standard normal curve to the left of q, i.e. P(Z <= q).
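Printed standard normal (z) tables list exactly these cumulative areas, so `pnorm()` can regenerate one. A minimal sketch; the z grid from -3 to 3 in steps of 0.5 is an arbitrary choice.

```r
# Small standard normal table: area below and above each z value
z <- seq(-3, 3, by = 0.5)
z_table <- data.frame(
  z          = z,
  area_below = round(pnorm(z), 4),       # P(Z <= z), as in printed tables
  area_above = round(1 - pnorm(z), 4)    # P(Z > z)
)
print(z_table)
```

Reading the row for z = 1 gives 0.8413 below and 0.1587 above, the same numbers a printed table shows.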

area_under_1SD=pnorm(1)-pnorm(-1)
print(area_under_1SD)
print(100*area_under_1SD)

#Area above +1 SD: remember to use 1-pnorm for this one
area_above_1SD=1-pnorm(1)
area_above_1SD

#Area below -1 SD: you can get it directly from the pnorm function
area_below_1SD=pnorm(-1)
area_below_1SD
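`pnorm()` also has a `lower.tail` argument: with `lower.tail = FALSE` it returns the upper-tail area directly, so the `1 - pnorm()` step is optional. By symmetry, the area above +1 SD also equals the area below -1 SD. A short sketch of the three equivalent forms:

```r
# Three equivalent ways to get the area above +1 SD
above_a <- 1 - pnorm(1)
above_b <- pnorm(1, lower.tail = FALSE)
above_c <- pnorm(-1)   # symmetry: P(Z > 1) = P(Z < -1)

all.equal(above_a, above_b)   # TRUE
all.equal(above_b, above_c)   # TRUE
round(above_b, 4)             # 0.1587
```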

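#1.96 SD (roughly 2 SD) on either side of the mean covers exactly 95%; pnorm(2)-pnorm(-2) gives about 95.45%
area_under_2SD=pnorm(1.96)-pnorm(-1.96)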
print(area_under_2SD)
print(100*area_under_2SD)

#Area above +1.96 SD: remember to use 1-pnorm for this one
area_above_2SD=1-pnorm(1.96)
area_above_2SD

#Area below -1.96 SD: you can get it directly from the pnorm function
area_below_2SD=pnorm(-1.96)
area_below_2SD
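The snippets above handle one band at a time. Because `pnorm()` is vectorised, the whole 68-95-99.7 rule can be checked in a single call, which also shows the difference between exactly ±2 SD and the ±1.96 SD used for 95%. A minimal sketch:

```r
# Area within k SD of the mean, for k = 1, 2, 3
k <- c(1, 2, 3)
within_k_sd <- pnorm(k) - pnorm(-k)
names(within_k_sd) <- paste0("within_", k, "SD")
round(100 * within_k_sd, 2)   # about 68.27, 95.45 and 99.73 percent

# The exact 95% band uses 1.96 SD rather than 2 SD
round(100 * (pnorm(1.96) - pnorm(-1.96)), 2)   # about 95
```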

#Applying the Principles of the Normal Distribution to Sample Data to Estimate
#Characteristics of Population Data

#Given a blood sugar mean of 123.6 and a standard deviation of 12.9,
#using only the sample mean and standard deviation, and assuming normality,
#let’s estimate the 2.5th and 97.5th percentiles of blood sugar
qnorm(.025,123.6,12.9)
qnorm(.5,123.6,12.9) # Sanity check: the 50th percentile gives you back the mean
qnorm(.975,123.6,12.9)

#By-hand calculation, approximating the 1.96 multiplier by 2
#2.5th percentile: 123.6 - (2 × 12.9) = 97.8
#97.5th percentile: 123.6 + (2 × 12.9) = 149.4
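The `qnorm()` results and the by-hand numbers differ slightly because the hand calculation rounds the z multiplier to 2; the exact multiplier for the 2.5th and 97.5th percentiles is `qnorm(0.975)`, about 1.96. A sketch comparing the three versions with the blood sugar mean and SD:

```r
mu <- 123.6     # blood sugar mean from the example above
sigma <- 12.9   # blood sugar SD from the example above

z975 <- qnorm(0.975)   # about 1.959964

exact   <- mu + c(-1, 1) * z975 * sigma                  # mean -/+ z * sd
direct  <- qnorm(c(0.025, 0.975), mean = mu, sd = sigma) # qnorm with mean and sd
approx2 <- mu + c(-1, 1) * 2 * sigma                     # rule of thumb (2 SD)

round(exact, 1)            # 98.3 148.9
round(approx2, 1)          # 97.8 149.4
all.equal(exact, direct)   # TRUE
```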

##################
#Problem type 2
#mean and sd are given
#Test an individual observation relative to the rest of population
sample=130
n=113
mean=123.6
sd=12.9
# A patient has a blood sugar of 130.
#What proportion of men have a blood sugar higher than 130?

#First find the difference between the mean and the sample value
diff=sample-mean
diff
# 6.4
#Express the difference in SD units (the z-score)
diff_by_sd=diff/sd
diff_by_sd
# 0.496124

#Now we can determine what % of the normal curve is more than about 0.5 SD (z = 0.496) above its mean
pnorm(1) #cumulative area below +1 SD
pnorm(0) #cumulative area to the left of the mean (0.5)
pnorm(1)-pnorm(0) # area ABOVE the mean within 1 sd
pnorm(0)-pnorm(-1) #area BELOW the mean within 1 sd
pnorm(1)-pnorm(-1) # total area within 1 sd

pnorm(.5)-pnorm(0) # area ABOVE the mean within .5 sd

# But the question asks what % of the population lies above +0.5 SD: that is the right half of the curve (0.5)
#minus the area between the mean and +0.5 SD:
.5-(pnorm(.5)-pnorm(0))

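# or, more directly, take the whole upper tail above +0.5 SD (about 0.31, i.e. roughly 31%):
1-pnorm(.5)
# pnorm(3)-pnorm(.5) gives nearly the same value, since about 99.7% of the population lies within 3 SD,
# but it leaves out the small area above +3 SD, so it is only an approximation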
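Standardising by hand shows what is going on, but `pnorm()` also accepts `mean` and `sd`, so the proportion above 130 can be computed in one step and without rounding z to 0.5. A minimal sketch using the blood sugar numbers:

```r
mu <- 123.6     # blood sugar mean
sigma <- 12.9   # blood sugar SD
x <- 130        # the patient's value

# Upper-tail area using the unrounded z-score
z <- (x - mu) / sigma
p_above_z <- 1 - pnorm(z)

# Same area without standardising by hand
p_above <- pnorm(x, mean = mu, sd = sigma, lower.tail = FALSE)

all.equal(p_above_z, p_above)   # TRUE
round(p_above, 3)               # about 0.31
```

Using the exact z (0.496) instead of 0.5 changes the answer only in the third decimal place here.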

#####################################
# ANOTHER EXAMPLE
mean=7.1 #kg
sd=1.2 #kg
n=236 # number of children (observations)
#assume normal distribution
#calculate: Range of weights of children in the population
#we will use the qnorm function
#about 2 SD (exactly 1.96 SD) on either side of the mean covers the middle 95% of the population,
#from the 2.5th to the 97.5th percentile
q2.5=qnorm(0.025,mean,sd)
q2.5
q97.5=qnorm(.975,mean,sd)
q97.5
# so, the range is between 4.7 and 9.5 kg
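If you compute ranges like this often, a small helper keeps the percentile arithmetic in one place. `normal_range()` is a hypothetical helper written for this sheet, not a base R function; it returns the central interval that covers a chosen share (`coverage`) of a normal population.

```r
# Hypothetical helper: central interval covering `coverage` of Normal(mu, sigma)
normal_range <- function(mu, sigma, coverage = 0.95) {
  tail <- (1 - coverage) / 2                  # probability left in each tail
  qnorm(c(tail, 1 - tail), mean = mu, sd = sigma)
}

# Children's weights (kg) from the example above
round(normal_range(7.1, 1.2), 1)                    # 4.7 9.5
round(normal_range(7.1, 1.2, coverage = 0.68), 1)   # narrower band, roughly mean -/+ 1 SD
```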

# now suppose a child weighs 5 kg: how do we interpret this value?
sample=5 #kg
#First find the difference between the mean and the sample value
#Let’s go back to our previously defined equations
diff=sample-mean
diff
#Express the difference in SD units (the z-score)
diff_by_sd=diff/sd
diff_by_sd
#so, the sample is 1.75sd below the mean

# So, we can use pnorm to determine what percent of the population
#lies below -1.75 SD
pnorm(-1.75)
#so only about 4% of children would weigh less than 5 kg
pnorm(0)-pnorm(-1.75)
#so, 46% of the children will be between 5 kg and 7.1 kg
#area between -1.75 and +1.75 SD
pnorm(1.75)-pnorm(-1.75)
#so, about 92% of the population is within 1.75 SD of the mean
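As a sanity check on these `pnorm()` answers, you can simulate a large normal population with `rnorm()` and simply count. A minimal sketch; the seed and the simulated population size (one million) are arbitrary, so the simulated shares only approximately match the exact ones.

```r
set.seed(42)                                     # arbitrary seed for reproducibility
weights <- rnorm(1e6, mean = 7.1, sd = 1.2)      # simulated population of weights (kg)

sim_below_5 <- mean(weights < 5)                     # share below 5 kg
sim_within  <- mean(abs(weights - 7.1) < 1.75 * 1.2) # share within 1.75 SD

c(simulated = sim_below_5, exact = pnorm(5, 7.1, 1.2))          # both about 0.04
c(simulated = sim_within,  exact = pnorm(1.75) - pnorm(-1.75))  # both about 0.92
```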

#####################
#Example 3:
####################
n=1860
mean=23.6
sd=4.9
# first assume normal distribution
# find the range for 95% of the data
q2.5=qnorm(0.025,mean,sd)
q2.5
q97.5=qnorm(.975,mean,sd)
q97.5

#find the 95th percentile
q95=qnorm(.95,mean,sd)
q95
#note that the 95th percentile is often used as a cutoff value in studies
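To apply the 95th percentile as a cutoff, compare each observation with it. A sketch using the mean and SD from this example; the four observations are made-up values for illustration only, not data from the study.

```r
mu <- 23.6
sigma <- 4.9
cutoff <- qnorm(0.95, mean = mu, sd = sigma)   # about 31.7, i.e. mu + qnorm(0.95) * sigma

obs <- c(20, 30, 33, 35)        # hypothetical observations
above_cutoff <- obs > cutoff
above_cutoff                    # FALSE FALSE TRUE TRUE

# By construction, 5% of the normal population lies above the cutoff
pnorm(cutoff, mean = mu, sd = sigma, lower.tail = FALSE)   # 0.05
```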

# To find the % of the population within a given interval (a, b), convert both limits to z-scores:
a=18.5
b=24.9
mean=23.6
sd=4.9
z_a=(a-mean)/sd
z_b=(b-mean)/sd
z_a
z_b
#% of the area between a and b
pnorm(z_b)-pnorm(z_a)
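The same interval probability can also be computed without standardising, because `pnorm()` accepts `mean` and `sd`. `prob_between()` is a hypothetical helper name used only for this sketch; the limits and parameters are the ones from the example above.

```r
# Hypothetical helper: P(a < X < b) for X ~ Normal(mu, sigma)
prob_between <- function(a, b, mu, sigma) {
  pnorm(b, mean = mu, sd = sigma) - pnorm(a, mean = mu, sd = sigma)
}

p_direct <- prob_between(18.5, 24.9, mu = 23.6, sigma = 4.9)
# Same result via z-scores
p_z <- pnorm((24.9 - 23.6) / 4.9) - pnorm((18.5 - 23.6) / 4.9)

all.equal(p_direct, p_z)     # TRUE
round(100 * p_direct, 1)     # % of the population between 18.5 and 24.9

# The whole real line covers the entire population
prob_between(-Inf, Inf, mu = 23.6, sigma = 4.9)   # 1
```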