Posts

Showing posts from April, 2023

R Strip Charts

R Strip Chart

A strip chart is a type of chart that displays numerical data along a single strip. A strip chart can be used to visualize dozens of time series at once.

Dataset to Create Strip Chart

In R, we first need to load the dataset we want to create the strip chart from. In this tutorial, we will use the built-in dataset named airquality to create a strip chart. Let's look at the first six rows of the dataset:

# use head() to display the first six rows of the airquality dataset
head(airquality)

  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

Create Strip Chart in R

In R, we use the stripchart() function to create a strip chart. For example,

# strip chart for ozone readings of the airquality dataset
stripchart(airquality$Ozone)

In the above example, we have used the
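The stripchart() call above plots a single variable. As a sketch of a grouped strip chart, the code below draws daily temperatures split by month using the formula interface of stripchart(). The jitter, colour and label choices and the file name temp_by_month.png are illustrative assumptions, not settings from the original tutorial; the plot is written to a PNG file in R's temporary directory so the sketch also runs without a screen (assuming your R build supports PNG output).

```r
# Hypothetical output path in R's temporary directory
out_file <- file.path(tempdir(), "temp_by_month.png")

# Open a PNG graphics device (assumes this R build supports PNG output)
png(filename = out_file, width = 600, height = 400)

# One strip per month; "jitter" spreads overlapping points sideways
stripchart(Temp ~ Month, data = airquality,
           method = "jitter", vertical = TRUE,
           pch = 16, col = "steelblue",
           main = "Daily Temperature by Month",
           xlab = "Month", ylab = "Temperature (degrees F)")

# Close the device so the file is actually written
invisible(dev.off())
```

Without the png() and dev.off() lines, the same stripchart() call draws on the screen instead.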

Save Plot

R Save Plot

In this tutorial, you will learn how to save plots in R with the help of examples. All the graphs (bar plot, pie chart, histogram, etc.) we plot in R are displayed on the screen by default. We can save these plots as files on disk with the help of built-in functions.

Plots can be saved either as bitmap (raster) images, which have a fixed size, or as vector images, which can be resized without loss of quality.

We will use the temperature column of the built-in airquality dataset to demonstrate how plots are saved in R: we will create and save a histogram.

Save Plot as Bitmap Image

Most of the images we come across, such as jpeg or png files, are bitmap images. They have a fixed resolution and become pixelated when zoomed in far enough.

Note: All of these functions work the same way; they just produce different file types.

1. Save as jpeg Image

In R, to save a plot in jpeg format, we use the jpeg() function. For example,

# save histogram in jpeg
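The jpeg example is cut off in this preview. A minimal sketch of the general pattern (open a graphics device, draw the plot, close the device with dev.off()) is shown below, saving the same Temp histogram once as a bitmap JPEG and once as a vector PDF. The file names, sizes and colour are hypothetical choices, and JPEG output assumes your R build supports it (you can check with capabilities("jpeg")).

```r
# Bitmap (raster) output: width and height are in pixels
jpeg_file <- file.path(tempdir(), "temp_histogram.jpeg")
jpeg(filename = jpeg_file, width = 600, height = 400)
hist(airquality$Temp, main = "Maximum Daily Temperature",
     xlab = "Temperature (degrees F)", col = "orange")
invisible(dev.off())  # close the device to write the file

# Vector output: width and height are in inches
pdf_file <- file.path(tempdir(), "temp_histogram.pdf")
pdf(file = pdf_file, width = 6, height = 4)
hist(airquality$Temp, main = "Maximum Daily Temperature",
     xlab = "Temperature (degrees F)", col = "orange")
invisible(dev.off())
```

Because the PDF stores shapes rather than pixels, it can be scaled up without becoming pixelated, while the JPEG keeps its fixed 600 x 400 pixel size.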

R Data Sets

A dataset is a collection of data presented in a table. The R programming language has many built-in datasets that are commonly used as demo data to illustrate how R functions work.

Most Used Built-in Datasets in R

R ships with many datasets we can try, but the most commonly used built-in datasets are:

airquality - New York Air Quality Measurements
AirPassengers - Monthly Airline Passenger Numbers 1949-1960
mtcars - Motor Trend Car Road Tests
iris - Edgar Anderson's Iris Data

These are a few of the most used built-in datasets. If you want to learn about other built-in datasets, please visit The R Datasets Package.

In this tutorial, we will use the airquality dataset to demonstrate the use of datasets in R.

Information About the Data Set

You can use the question mark (?) to get information about the airquality dataset:

Example

# Use the question mark to get information about the data set
?airquality

Display R datasets

To display the dataset, we simply write the name o
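Besides ? for documentation, a few base R functions give a quick overview of a dataset before you print the whole thing. A short sketch for airquality:

```r
# Structure: column names, types and the first few values
str(airquality)

# Number of rows and columns
dim(airquality)      # 153 6

# Column names only
names(airquality)

# Min, quartiles, mean, max and NA count for each column
summary(airquality)
```

str() is usually the quickest first look, since it also reveals which columns are integer or numeric.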

R Statistics

Statistics is the science of analyzing, reviewing and drawing conclusions from data. Some basic statistical numbers include:

Mean, median and mode
Minimum and maximum value
Percentiles
Variance and Standard Deviation
Covariance and Correlation
Probability distributions

The R language was developed by two statisticians. It has many built-in functionalities, in addition to libraries, for the exact purpose of statistical analysis.

Min and Max

The min() and max() functions can be used to find the lowest or highest value in a set. Let's consider the mtcars dataset and find the largest and smallest value of the variable hp (horsepower).

> min(mtcars$hp)
[1] 52
> max(mtcars$hp)
[1] 335

Now we know that the largest horsepower value in the set is 335, and the lowest is 52. We could take a look at the dataset and try to find out which cars these two values belong to. For example, we can use the which.max() and which.min() functions to find the index position of the max and min value in the table:

> wh
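The preview stops just as which.max() and which.min() are introduced. As a sketch (not the post's own continuation), the index positions they return can be combined with rownames() to name the cars directly, since mtcars stores the car names as row names:

```r
# Index positions of the highest and lowest horsepower values
idx_max <- which.max(mtcars$hp)
idx_min <- which.min(mtcars$hp)

# mtcars keeps the car names as row names, so look them up by index
rownames(mtcars)[idx_max]  # "Maserati Bora"
rownames(mtcars)[idx_min]  # "Honda Civic"
```

If several rows shared the maximum or minimum, which.max() and which.min() would return only the first of them.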

Read and write csv files in R

The CSV (Comma Separated Values) file is a plain text file that uses a comma to separate values. R has built-in functionality that makes it easy to read and write CSV files.

Sample CSV File

To demonstrate how we read CSV files in R, let's suppose we have a CSV file named stud.csv with the following data:

  rollno   name    place mark
1    101   binu ernkulam   45
2    103  ashik alleppey   35
3    102 faisal   kollam   48
4    105   biju  kotayam   25
5    106    ann  thrisur   30

The CSV file contains the roll number, name, place and mark of each student.

Read a CSV File in R

In R, we use the read.csv() function to read a CSV file available in our current directory. For example,

stud <- read.csv("stud.csv")
print(stud)

Output:
  rollno   name    place mark
1    101   binu ernkulam   45
2    103  ashik alleppey   35
3    102 faisal   kollam   48
4    105   biju  kotayam   25
5    106    ann  thrisur   30

Note: If the file is in some other location, we have to specify the path along with the
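The post covers reading stud.csv; writing works the other way round with write.csv(). A minimal round-trip sketch, assuming the student data from the sample above and a hypothetical file in R's temporary directory. Setting row.names = FALSE stops R from adding an extra unnamed column of row numbers to the file.

```r
# Rebuild the sample student data as a data frame
stud <- data.frame(
  rollno = c(101, 103, 102, 105, 106),
  name   = c("binu", "ashik", "faisal", "biju", "ann"),
  place  = c("ernkulam", "alleppey", "kollam", "kotayam", "thrisur"),
  mark   = c(45, 35, 48, 25, 30)
)

# Write it to a CSV file without the row-number column
csv_file <- file.path(tempdir(), "stud.csv")
write.csv(stud, csv_file, row.names = FALSE)

# Read it back and check the contents
stud_back <- read.csv(csv_file)
print(stud_back)
```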

read and write xlsx files in R

An xlsx file is a file format used for Microsoft Excel spreadsheets. Excel can be used to store tabular data. Base R cannot read xlsx files out of the box, but add-on packages such as xlsx make it easy to read and write them.

Sample xlsx File

To demonstrate how we read xlsx files in R, let's suppose we have an Excel file named stud.xlsx with the following data:

Install and Load xlsx Package

In order to read, write, and format Excel files in R, we first need to install and load the xlsx package:

# install xlsx package
install.packages("xlsx")

# load xlsx package
library("xlsx")

Here, we have successfully installed and loaded the xlsx package. Now we are able to read data from an xlsx file.

Read an xlsx File in R

In R, we use the read.xlsx() function to read an xlsx file available in our current directory. For example,

# install xlsx package
install.packages("xlsx")

# load xlsx package
library("xlsx")

# read stud.xlsx file from our current directory
read_data <- read.xlsx("s

Data Handling in R

R Programming Language is used for statistics and data analytics. Importing and exporting data is a routine part of all these applications. R can read many types of files, such as comma-separated values (CSV) files, text files, Excel spreadsheets, SPSS files, SAS files, etc. (several of these formats through add-on packages).

R lets its users work smoothly with the system's directories through some predefined functions that either take a directory path as an argument or return the path of the current working directory. Below are some directory functions in R:

getwd(): This function returns the current working directory used by R.
setwd(): This function changes the current working directory; the path of the new directory is passed as the argument. Example:
setwd("C:/RExamples/")
OR
setwd("C:\\RExamples\\")
list.files(): This function lists all files and folders present in current
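A small sketch of the three directory functions working together. It switches to R's temporary directory (so no assumed path such as C:/RExamples is needed), creates a hypothetical example.txt there, lists the directory, and then restores the original working directory:

```r
# Remember the current working directory
old_dir <- getwd()

# Switch to R's per-session temporary directory
setwd(tempdir())
print(getwd())

# Create an empty example file and list the directory contents
invisible(file.create("example.txt"))
print(list.files())

# Restore the original working directory
setwd(old_dir)
```

Restoring the old directory at the end is a good habit, because relative paths used later in the session (such as read.csv("stud.csv")) depend on the working directory.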

Mean Median Mode

A measure of central tendency in R represents the whole set of data by a single value. It gives us the location of the central points. There are three main measures of central tendency:

Mean
Median
Mode

Before doing any computation, we need to prepare our data and save it in external .txt or .csv files; it's best practice to save the file in the current directory. After that, import your data into R as follows:

Get the CSV file here.

Read and display the csv file

# R program to import data into R
# Import the data using read.csv()
myData <- read.csv("CardioGoodFitness.csv", stringsAsFactors = FALSE)

# Print the first 6 rows
print(head(myData))

Output:
  Product Age Gender Education MaritalStatus Usage Fitness Income Miles
1   TM195  18   Male        14        Single     3       4  29562   112
2   TM195  19   Male        15        Single     2       3  31836    75
3   TM195  19 Female        14     Partnered     4       3
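Base R provides mean() and median(), but no built-in function for the statistical mode (the base mode() function returns an object's type, such as "numeric", instead). A common approach, sketched below with a hypothetical helper get_mode() and a made-up age vector rather than the CardioGoodFitness data, counts the values with table() and keeps the most frequent one(s):

```r
# Hypothetical helper: returns every value that occurs most often
get_mode <- function(x) {
  counts <- table(x)                             # frequency of each distinct value
  modal  <- names(counts)[counts == max(counts)]
  as.numeric(modal)                              # table names are strings; assumes numeric x
}

ages <- c(18, 19, 19, 20, 21, 21, 21, 25)  # made-up example data

mean(ages)      # 20.5
median(ages)    # 20.5
get_mode(ages)  # 21
```

If two or more values tie for the highest count, get_mode() returns all of them.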

Variance and Standard Deviation

Range

The range describes the difference between the largest and smallest data points in our dataset. The bigger the range, the greater the spread of the data, and vice versa.

Range = Largest data value - Smallest data value

# R program to illustrate
# Descriptive Analysis

# Import the data using read.csv()
myData <- read.csv("CardioGoodFitness.csv", stringsAsFactors = FALSE)

# Calculate the maximum
age_max <- max(myData$Age)

# Calculate the minimum
age_min <- min(myData$Age)

# Calculate the range
age_range <- age_max - age_min
cat("Range is:\n")
print(age_range)

# Alternate method to get min and max
r <- range(myData$Age)
print(r)

Output:
Range is:
[1] 32
[1] 18 50

Variance in R Programming Language

Variance is the average of the squared differences between each number and the mean. The mathematical formula for variance is as follows:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$

where x_i is the i-th data value, μ is the mean of the data, and N is the total number of elements or frequency of distribution. Note that R's var() function computes the sample variance, which divides by N - 1 instead of N.

Computing Variance in R Programming

One can calculate the variance by using the var() function in R. Syn
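To see the difference between the population formula (dividing by N) and R's var() (dividing by N - 1), here is a sketch on a small made-up vector; sd() is simply the square root of var():

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up example data
n <- length(x)

# Population variance: average squared deviation from the mean (divide by N)
pop_var <- sum((x - mean(x))^2) / n      # 4

# Sample variance as computed by var() (divide by N - 1)
sample_var <- var(x)                     # 32 / 7, about 4.571

# Standard deviations are the square roots of the variances
pop_sd    <- sqrt(pop_var)               # 2
sample_sd <- sd(x)                       # sqrt(32 / 7), about 2.138
```

For large N the two versions are almost identical; the N - 1 divisor matters mostly for small samples.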

Quartiles and Summary

Quartiles

A quartile is a type of quantile. The first quartile (Q1) is the middle number between the smallest number and the median of the dataset, the second quartile (Q2) is the median of the dataset, and the third quartile (Q3) is the middle number between the median and the largest value of the dataset.

Example:

# R program to illustrate
# Descriptive Analysis

# Import the data using read.csv()
myData <- read.csv("CardioGoodFitness.csv", stringsAsFactors = FALSE)

# Calculating Quartiles
quartiles <- quantile(myData$Age)
print(quartiles)

Output:
  0%  25%  50%  75% 100% 
  18   24   26   33   50 

Interquartile Range

The interquartile range (IQR), also called the midspread, middle 50%, or technically H-spread, is the difference between the third quartile (Q3) and the first quartile (Q1). It covers the center of the distribution and contains 50% of the observations.

IQR = Q3 - Q1

# R program to illustrate
# Descriptive Analysis
# Import the
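quantile() and IQR() work on any numeric vector. The sketch below uses a made-up vector of nine ages (not the CardioGoodFitness data) so the results can be checked by hand: with nine sorted values and R's default method (type = 7), Q1 and Q3 fall exactly on the 3rd and 7th values. summary() reports the same quartiles together with the minimum, mean and maximum.

```r
ages <- c(18, 20, 22, 24, 26, 28, 30, 33, 50)  # made-up, already sorted

q <- quantile(ages)   # 0%, 25%, 50%, 75% and 100% points
print(q)              # 18 22 26 30 50

# Interquartile range computed two ways
iqr_manual  <- unname(q["75%"] - q["25%"])
iqr_builtin <- IQR(ages)                      # 8

summary(ages)
```

With other sample sizes, type 7 interpolates between neighbouring values, so quartiles can be non-integers.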

Normal Distributions in R

Normal Distribution is a probability distribution used in statistics that describes how data values are distributed. It is the most important probability distribution in statistics because it describes many real-world quantities well, for example, the height of a population, shoe size, IQ level, and many more. Data tend to be approximately normally distributed when they arise from many independent random sources.

If we plot the values of the variable on the x-axis and the count of each value on the y-axis, the graph is a bell-shaped curve. The peak of the curve is at the mean of the dataset; half of the values lie to the left of the mean and the other half to the right. In other words, the distribution is symmetric.

In R, there are 4 built-in functions for working with the normal distribution:

dnorm()      dnorm(x, mean, sd)
pnorm()      pnorm(x, mean, sd
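R's normal-distribution family consists of dnorm() (density), pnorm() (cumulative probability), qnorm() (quantile, the inverse of pnorm()) and rnorm() (random draws). A short sketch on the standard normal and on a hypothetical IQ-like scale with mean 100 and standard deviation 15:

```r
# Density at the mean of the standard normal: 1 / sqrt(2 * pi)
dnorm(0)                             # 0.3989423

# Cumulative probability: half the values lie below the mean
pnorm(0)                             # 0.5

# Probability of a value below 115 when mean = 100 and sd = 15
pnorm(115, mean = 100, sd = 15)      # about 0.8413

# Quantile function, the inverse of pnorm()
qnorm(0.975)                         # about 1.96

# 1000 random draws; set.seed() makes them reproducible
set.seed(42)
samples <- rnorm(1000, mean = 100, sd = 15)
mean(samples)                        # close to 100
```

If mean and sd are omitted, all four functions default to the standard normal (mean = 0, sd = 1).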

Poisson Distribution in R

The Poisson distribution is a discrete distribution that has only one parameter, named lambda, which is the rate parameter. The rate parameter is the expected (average) number of events that occur in a fixed time interval.

To create a plot of the Poisson distribution in R, we can use the plot() function together with dpois(), which returns the probability mass (R calls it the density) of the Poisson distribution.

Example

# Poisson probabilities for 0 to 50 events with lambda = 3 (the support starts at 0)
plot(0:50, dpois(x = 0:50, lambda = 3), xlab = "x", ylab = "P(X = x)")
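Besides dpois(), R provides ppois() for cumulative probabilities and rpois() for random draws. A sketch with lambda = 3, matching the plot example; the range 0:15 and the plotting options are illustrative choices:

```r
lambda <- 3
x <- 0:15

# P(X = 0) equals exp(-lambda)
dpois(0, lambda = lambda)                 # 0.04978707

# P(X <= 2) is the sum of the individual probabilities for 0, 1 and 2
ppois(2, lambda = lambda)
sum(dpois(0:2, lambda = lambda))

# Vertical-line plot suits a discrete distribution
plot(x, dpois(x, lambda = lambda), type = "h", lwd = 2,
     xlab = "Number of events", ylab = "Probability")

# Random counts; their average should be close to lambda
set.seed(1)
counts <- rpois(1000, lambda = lambda)
mean(counts)
```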

Binomial Distribution in R

Binomial distribution in R is a probability distribution used in statistics. The binomial distribution is a discrete distribution that counts the number of successes in a fixed number of trials, where each trial has only two possible outcomes, i.e. success or failure. The trials are independent: the probability of success remains the same in every trial, and the previous outcome does not affect the next outcome.

The binomial distribution helps us find individual probabilities as well as cumulative probabilities over a certain range. It is also used in many real-life scenarios, such as determining whether a particular lottery ticket has won or not, whether a drug is able to cure a person or not, the number of heads or tails in a finite number of tosses, or the outcome of rolling a die, etc.

The binomial distribution for any random variable x is given by

$$P(x) = \binom{n}{x} p^{x} (1 - p)^{n - x}$$

where n is the number of trials and p is the probability of success in each trial.

Functions for Binomial Distribution

We have four functions for handling binomial distribution in R namely: dbino
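R's four binomial functions are dbinom(), pbinom(), qbinom() and rbinom(). As a sketch, the code below takes a hypothetical example of 10 fair coin tosses (n = 10, p = 0.5), checks dbinom() against the formula using choose() for the binomial coefficient, and computes a cumulative probability with pbinom():

```r
n <- 10    # number of tosses
p <- 0.5   # probability of heads on each toss

# P(exactly 5 heads) from dbinom() and from the formula
dbinom(5, size = n, prob = p)            # 0.2460938
choose(n, 5) * p^5 * (1 - p)^(n - 5)     # same value

# P(at most 5 heads): cumulative probability
pbinom(5, size = n, prob = p)            # 0.6230469
sum(dbinom(0:5, size = n, prob = p))     # same value
```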

Covariance and Correlation in R Programming

Covariance and Correlation are terms used in statistics to measure relationships between two random variables. Both of these terms measure linear dependency between a pair of random variables or bivariate data. In this article, we are going to discuss the cov(), cor() and cov2cor() functions in R, which use the covariance and correlation methods of statistics and probability theory.

Covariance in R Programming Language

In R programming, covariance can be measured using the cov() function. Covariance is a statistical term used to measure the direction of the linear relationship between data vectors. Mathematically,

$$\mathrm{Cov}(x, y) = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})$$

where,
x represents the x data vector
y represents the y data vector
x̄ represents the mean of the x data vector
ȳ represents the mean of the y data vector
N represents the total number of observations

Note that cov() divides by N - 1 instead of N, i.e. it returns the sample covariance.

Covariance Syntax in R

Syntax:
cov(x, y, method)

where,
x and y represent the data vectors
method defines the type of method to be used to compute covariance. Default is "pearson".

Example:

x <- c(1, 3, 5, 10)
y
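The example data in the post is cut off, so the sketch below uses its own made-up vectors. It checks cov() against the sample-covariance calculation (dividing by N - 1), shows that cor() equals the covariance scaled by both standard deviations, and uses cov2cor() to turn a covariance matrix into a correlation matrix:

```r
x <- c(1, 2, 3, 4, 5)       # made-up example vectors
y <- c(2, 5, 5, 8, 12)
n <- length(x)

# Covariance: cov() divides by N - 1 (sample covariance)
cov_xy     <- cov(x, y)                                      # 5.75
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Pearson correlation: covariance divided by both standard deviations
cor_xy     <- cor(x, y)
cor_manual <- cov_xy / (sd(x) * sd(y))

# Rank-based alternative
cor(x, y, method = "spearman")

# Convert a covariance matrix into a correlation matrix
cov_mat <- cov(cbind(x, y))
cor_mat <- cov2cor(cov_mat)
print(cor_mat)
```

Unlike covariance, correlation is unit-free and always lies between -1 and 1, which is why cov2cor() is handy when comparing variables measured on different scales.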