Hypothesis Testing in R Programming
A hypothesis is an assumption that researchers make about the data collected for an experiment or data set. It is not necessarily true. In simple words, a hypothesis is a claim about a population that researchers put forward based on the data collected. Hypothesis testing in R programming is the process of checking whether the sample data support the hypothesis made by the researcher. To perform hypothesis testing, a random sample is drawn from the population and a test is run on it. Based on the result, the null hypothesis is either rejected or not rejected. Drawing conclusions about a population from a sample in this way is known as statistical inference. In this article, we'll discuss the four-step process of hypothesis testing, one-sample t-testing, two-sample t-testing, the directional hypothesis, the one-sample and two-sample Wilcoxon tests, and the correlation test in R programming.
Four Step Process of Hypothesis Testing
There are 4 major steps in hypothesis testing:
State the hypothesis- This step starts by stating the null and alternative hypotheses. The null hypothesis is presumed to be true until the data provide evidence against it.
Formulate an analysis plan and set the criteria for decision- In this step, the significance level of the test is set. The significance level is the probability of rejecting the null hypothesis when it is actually true (a false rejection).
Analyze sample data- In this step, a test statistic is computed to compare the sample with the hypothesized population value, for example the sample mean against the population mean, or the sample standard deviation against the population standard deviation.
Interpret decision- The test statistic, usually through its p-value, is compared with the significance level to make the decision. For example, if the significance level is set to 0.1, the null hypothesis is rejected when the p-value is less than 0.1. Otherwise, the null hypothesis is retained (not rejected), which does not prove that it is true.
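The four steps can be sketched in R as follows. This is only an illustration: the hypothesized mean of 50, the significance level of 0.05, the seed, and the simulated sample are all assumptions chosen for the example and do not come from a real data set.

```r
# Step 1: state the hypotheses.
# H0: the population mean is 50; H1: the population mean is not 50.
mu0 <- 50

# Step 2: set the significance level before looking at the data.
alpha <- 0.05

# Step 3: analyze the sample (simulated here; the seed is arbitrary).
set.seed(42)
sample_data <- rnorm(30, mean = 52, sd = 5)
result <- t.test(sample_data, mu = mu0)

# Step 4: interpret the decision by comparing the p-value with alpha.
decision <- if (result$p.value < alpha) "reject H0" else "fail to reject H0"
print(result$statistic)
print(result$p.value)
print(decision)
```

Fixing alpha before running the test keeps the decision rule independent of the observed result.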
One Sample T-Testing
Student’s t-test is a classic method for comparing mean values of two samples that are normally distributed (i.e. they have a Gaussian distribution). Such samples are described as being parametric and the t-test is a parametric test. In R the t.test() command will carry out several versions of the t-test.
The one-sample t-test compares the mean of a single sample with a known or hypothesized population mean. The test assumes that the data are approximately normally distributed. For example, it can check whether the mean height of people living in one area differs from a reference mean height taken from other areas.
Syntax:
t.test(x, y, alternative, mu, paired, var.equal, …)
Parameters:
x – a numeric sample.
y – a second numeric sample (if this is missing the command carries out a 1-sample test).
alternative – how to compare means, the default is “two.sided”. You can also specify “less” or “greater”.
mu – the hypothesized value of the mean (or mean difference). The default is 0.
paired – the default is paired = FALSE. This assumes independent samples. The alternative paired = TRUE is used for matched pair tests.
var.equal – the default is var.equal = FALSE. This treats the variance of the two samples separately. If you set var.equal = TRUE you conduct a classic t-test using pooled variance.
… – there are additional parameters that we aren’t concerned with here.
To learn about the other optional parameters of t.test(), run the command below:
help("t.test")
Example:
# Defining sample vector
x <- rnorm(100)
# One Sample T-Test
t.test(x, mu = 5)
Output:
One Sample t-test
data: x
t = -49.6586, df = 99, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 5
95 percent confidence interval:
-0.2637418 0.1407442
sample estimates:
mean of x
-0.0614988
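Because rnorm() draws random numbers, the figures in the output change on every run. The sketch below fixes a seed (an arbitrary choice) so the run is repeatable, and recomputes the t statistic by hand to show what t.test() reports. The variable names res and t_manual are introduced only for this illustration.

```r
set.seed(123)             # arbitrary seed, only for reproducibility
x <- rnorm(100)           # 100 draws from a standard normal distribution

res <- t.test(x, mu = 5)  # H0: the true mean is 5

# Recompute the t statistic by hand: (sample mean - mu) / standard error
t_manual <- (mean(x) - 5) / (sd(x) / sqrt(length(x)))

print(unname(res$statistic))  # t statistic reported by t.test()
print(t_manual)               # agrees with the value above
print(res$conf.int)           # 95 percent confidence interval for the mean
```

The object returned by t.test() is a list, so individual pieces such as res$p.value or res$conf.int can be used in later code instead of being read off the printed output.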
Two Sample T-Testing
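In two-sample t-testing, the means of two sample vectors are compared. By default (var.equal = FALSE), t.test() runs Welch's test, which does not assume equal variances. If var.equal = TRUE, the test assumes that the variances of both samples are equal and uses a pooled variance.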
Syntax:
t.test(x, y)
Parameters:
x and y: Numeric vectors
Example:
# Defining sample vector
x <- rnorm(100)
y <- rnorm(100)
# Two Sample T-Test
t.test(x, y)
Output:
Welch Two Sample t-test
data: x and y
t = 0.485, df = 196.842, p-value = 0.6282
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.2116428 0.3496944
sample estimates:
mean of x mean of y
-0.01672685 -0.08575265
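As a rough comparison of the variants described by the var.equal and paired parameters, the sketch below runs the same two vectors through all three forms. The data are simulated with an arbitrary seed, and the paired call is shown only to illustrate the syntax, since these simulated vectors are not real matched pairs.

```r
set.seed(1)               # arbitrary seed, only for reproducibility
x <- rnorm(100)
y <- rnorm(100)

welch  <- t.test(x, y)                    # default: Welch test, unequal variances
pooled <- t.test(x, y, var.equal = TRUE)  # classic test with pooled variance
paired <- t.test(x, y, paired = TRUE)     # intended for matched pairs only

print(welch$parameter)    # Welch degrees of freedom, usually not an integer
print(pooled$parameter)   # pooled degrees of freedom: n1 + n2 - 2
print(paired$parameter)   # paired degrees of freedom: n - 1
```

The degrees of freedom are a quick way to tell which variant was actually run.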
Directional Hypothesis
With a directional hypothesis, the direction of the alternative can be specified, for example when the user wants to know whether the sample mean is lower or greater than a given value.
Syntax:
t.test(x, mu, alternative)
Parameters:
x: represents numeric vector data
mu: represents mean against which sample data has to be tested
alternative: sets the alternative hypothesis
Example:
# Defining sample vector
x <- rnorm(100)
# Directional hypothesis testing
t.test(x, mu = 2, alternative = 'greater')
Output:
One Sample t-test
data: x
t = -17.7266, df = 99, p-value = 1
alternative hypothesis: true mean is greater than 2
95 percent confidence interval:
0.09922095 Inf
sample estimates:
mean of x
0.2620122
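The p-value of 1 in the output reflects that the sample mean lies far below 2, so there is no evidence that the true mean is greater than 2. A minimal sketch, using simulated data and an arbitrary seed, shows how the two directions relate for the same sample:

```r
set.seed(7)               # arbitrary seed, only for reproducibility
x <- rnorm(100)

greater <- t.test(x, mu = 2, alternative = "greater")  # H1: true mean > 2
less    <- t.test(x, mu = 2, alternative = "less")     # H1: true mean < 2

print(greater$p.value)
print(less$p.value)
# For the same data and mu, the two one-sided p-values sum to 1
print(greater$p.value + less$p.value)
print(greater$conf.int)   # one-sided interval, upper bound is Inf
```

The direction should be chosen from the research question before the data are examined, not after seeing which side gives the smaller p-value.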
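One Sample Wilcoxon Test
This type of test is used when a single sample has to be compared with a hypothesized location and the data cannot be assumed to be normally distributed, so a non-parametric method is needed. It is performed using the wilcox.test() function in R programming.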
Syntax:
wilcox.test(x, y, exact = NULL)
Parameters:
x and y: represent numeric vectors (if y is omitted, a one-sample test is run)
exact: a logical value indicating whether an exact p-value should be computed
To learn about the other optional parameters of wilcox.test(), run the command below:
help("wilcox.test")
Example:
# Define vector
x <- rnorm(100)
# one sample test
wilcox.test(x, exact = FALSE)
Output:
Wilcoxon signed rank test with continuity correction
data: x
V = 3037, p-value = 0.07863
alternative hypothesis: true location is not equal to 0
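To make the V statistic less opaque, the sketch below recomputes it by hand. It assumes a simulated heavy-tailed sample (a shifted t distribution with 3 degrees of freedom) and an arbitrary seed; d and v_manual are names introduced only for this illustration.

```r
set.seed(10)                   # arbitrary seed, only for reproducibility
x <- rt(40, df = 3) + 1        # heavy-tailed sample centered near 1

res <- wilcox.test(x, mu = 0)  # H0: the distribution is centered at 0

# V is the sum of the ranks of |x - mu| taken over the positive differences
d <- x - 0
v_manual <- sum(rank(abs(d))[d > 0])

print(unname(res$statistic))   # V reported by wilcox.test()
print(v_manual)                # agrees with the value above
print(res$p.value)
```

The mu argument plays the same role as in t.test(): it sets the hypothesized location, which defaults to 0.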
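Two Sample Wilcoxon Test
This test is performed to compare two independent samples of data. It is also run with wilcox.test(), this time passing both x and y.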
Example:
# Define vectors
x <- rnorm(100)
y <- rnorm(100)
# Two sample test
wilcox.test(x, y)
Output:
Wilcoxon rank sum test with continuity correction
data: x and y
W = 3677, p-value = 0.001232
alternative hypothesis: true location shift is not equal to 0
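The W statistic has a simple interpretation when there are no ties: it counts the (x, y) pairs in which the x value is larger than the y value. The sketch below checks this on simulated data. The seed and the shift of 0.5 in the second sample are assumptions made for the illustration, and w_manual is a hypothetical helper name.

```r
set.seed(99)                  # arbitrary seed, only for reproducibility
x <- rnorm(100)
y <- rnorm(100, mean = 0.5)   # second sample shifted by 0.5 for illustration

res <- wilcox.test(x, y)

# Count the (x, y) pairs in which the x value exceeds the y value
w_manual <- sum(outer(x, y, ">"))

print(unname(res$statistic))  # W reported by wilcox.test()
print(w_manual)               # agrees with the value above
print(res$p.value)
```

A W close to half the number of pairs (here 100 * 100 / 2) suggests no location shift between the samples.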
Correlation Test
This test checks for an association between paired samples by testing whether the correlation between the two vectors provided in the function call is significantly different from zero.
Syntax:
cor.test(x, y)
Parameters:
x and y: represent numeric data vectors of the same length
To learn about the other optional parameters of the cor.test() function, run the command below:
help("cor.test")
Example:
cor.test(mtcars$mpg, mtcars$hp)
Output:
Pearson's product-moment correlation
data: mtcars$mpg and mtcars$hp
t = -6.7424, df = 30, p-value = 1.788e-07
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.8852686 -0.5860994
sample estimates:
cor
-0.7761684
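The components of the result can also be extracted directly. The sketch below reuses the built-in mtcars data set from the example and, as an additional illustration not covered in the text, runs a rank-based Spearman test through the method argument of cor.test(); exact = FALSE is passed there because the data contain ties.

```r
# Pearson correlation test (the default method)
pearson <- cor.test(mtcars$mpg, mtcars$hp)
print(pearson$estimate)   # sample correlation, same value as cor()
print(pearson$p.value)
print(pearson$conf.int)   # 95 percent confidence interval for the correlation

# Rank-based alternative that does not assume a linear relationship
spearman <- cor.test(mtcars$mpg, mtcars$hp, method = "spearman", exact = FALSE)
print(spearman$estimate)
print(spearman$p.value)
```

The negative estimate indicates that cars with higher horsepower tend to have lower miles per gallon in this data set.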