R Data Sets

A dataset is a collection of data, usually presented in a table.

The R programming language comes with many built-in datasets, which are commonly used as demo data to illustrate how R functions work.

Most Used Built-in Datasets in R

R offers many datasets to try, but the most commonly used built-in datasets are:
airquality - New York Air Quality Measurements
AirPassengers - Monthly Airline Passenger Numbers 1949-1960
mtcars - Motor Trend Car Road Tests
iris - Edgar Anderson's Iris Data

These are a few of the most used built-in datasets. To learn about the other built-in datasets, see The R Datasets Package.
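
The built-in datasets can also be listed from the R console. The snippet below is a minimal sketch using the data() function; the variable names dataset_info and dataset_names are illustrative only.

```r
# List the datasets shipped in the built-in "datasets" package
dataset_info <- data(package = "datasets")

# The result is a list; its "results" matrix has one row per dataset
dataset_names <- dataset_info$results[, "Item"]

# Show the first few dataset names
head(dataset_names)

# Check that the four datasets named above are in the list
c("airquality", "AirPassengers", "mtcars", "iris") %in% dataset_names
```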

In this tutorial, we will use the airquality dataset to demonstrate how to work with datasets in R.

Information About the Dataset

You can use the question mark (?) to open the help page for the airquality dataset:

Example
# Use the question mark to get information about the dataset

?airquality
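
The question mark is shorthand for the help() function. The sketch below assumes a standard R installation with help pages available; the variable name help_page is illustrative.

```r
# help() is the function form of the ? operator.
# Assigning the result looks up the page without opening it.
help_page <- help("airquality")

# The returned object records where the help page was found
class(help_page)

# Printing the object opens the help page, just like ?airquality
# print(help_page)
```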

Display R Datasets

To display a dataset, pass its name to the print() function. For example,
# display the airquality dataset
print(airquality)

    Ozone Solar.R Wind Temp Month Day
1      41     190  7.4   67     5   1
2      36     118  8.0   72     5   2
3      12     149 12.6   74     5   3
4      18     313 11.5   62     5   4
5      NA      NA 14.3   56     5   5
6      28      NA 14.9   66     5   6
7      23     299  8.6   65     5   7
8      19      99 13.8   59     5   8
9       8      19 20.1   61     5   9
10     NA     194  8.6   69     5  10
11      7      NA  6.9   74     5  11
12     16     256  9.7   69     5  12
13     11     290  9.2   66     5  13
14     14     274 10.9   68     5  14
15     18      65 13.2   58     5  15
(Only the first 15 rows are shown here; print() displays all 153 rows.)
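
Printing all 153 rows is rarely needed. As a minimal sketch, the head() and tail() functions show only the first or last few rows; the variable names first_rows and last_rows are illustrative.

```r
# Show only the first 15 rows instead of the whole dataset
first_rows <- head(airquality, n = 15)
print(first_rows)

# tail() works the same way from the end of the dataset
last_rows <- tail(airquality, n = 3)
print(last_rows)
```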

Get Information About the Dataset

In R, various functions return information about a dataset, such as its dimensions, the number of rows and columns, and the names of its variables. For example,
# use dim() to get the dimensions (rows, columns) of the dataset
cat("Dimension:", dim(airquality))
# use nrow() to get the number of rows
cat("\nRow:", nrow(airquality))
# use ncol() to get the number of columns
cat("\nColumn:", ncol(airquality))
# use names() to get the names of the variables in the dataset
cat("\nName of Variables:", names(airquality))

Output:
Dimension: 153 6 
Row: 153 
Column: 6 
Name of Variables: Ozone Solar.R Wind Temp Month Day
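
Beyond size and names, it is often useful to know the type of each variable and how many values are missing (the NA entries visible in the printed rows earlier). The following is a minimal sketch; the variable names variable_types and missing_counts are illustrative.

```r
# Compact overview: type and first few values of each variable
str(airquality)

# Storage class of each variable
variable_types <- sapply(airquality, class)
print(variable_types)

# Number of missing values (NA) in each variable
missing_counts <- colSums(is.na(airquality))
print(missing_counts)
```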

Display Variable Values in R

To display all the values of a variable, write the dataset name, the $ operator, and the variable name. For example,
# display all values of Temp variable 
print(airquality$Temp)
  [1] 67 72 74 62 56 66 65 59 61 69 74 69 66 68 58 64 66 57 68 62 59 73 61 61 57
 [26] 58 57 67 81 79 76 78 74 67 84 85 79 82 87 90 87 93 92 82 80 79 77 72 65 73
 [51] 76 77 76 76 76 75 78 73 80 77 83 84 85 81 84 83 83 88 92 92 89 82 73 81 91
 [76] 80 81 82 84 87 85 74 81 82 86 85 82 86 88 86 83 81 81 81 82 86 85 87 89 90
[101] 90 92 86 86 82 80 79 77 79 76 78 78 77 72 75 79 81 86 88 97 94 96 94 91 92
[126] 93 93 87 84 80 78 75 73 81 76 77 71 71 78 67 76 68 82 64 71 81 69 63 70 77
[151] 75 76 68
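
The $ operator is not the only way to reach a variable. As a short sketch, the same values can be extracted with bracket indexing, which is handy when the variable name is stored as a string; the names temp_dollar, temp_bracket, and temp_column are illustrative.

```r
temp_dollar  <- airquality$Temp        # $ operator, as above
temp_bracket <- airquality[["Temp"]]   # double brackets with the name as a string
temp_column  <- airquality[, "Temp"]   # all rows, one named column

# All three forms return the same vector
identical(temp_dollar, temp_bracket)
identical(temp_dollar, temp_column)

# Square brackets on the result select individual values, here the first five
airquality$Temp[1:5]
```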

Sort Variable Values in R

In R, we use the sort() function to sort the values of a variable in ascending order. For example,
# sort values of Temp variable 
sort(airquality$Temp)
  [1] 56 57 57 57 58 58 59 59 61 61 61 62 62 63 64 64 65 65 66 66 66 67 67 67 67
 [26] 68 68 68 68 69 69 69 70 71 71 71 72 72 72 73 73 73 73 73 74 74 74 74 75 75
 [51] 75 75 76 76 76 76 76 76 76 76 76 77 77 77 77 77 77 77 78 78 78 78 78 78 79
 [76] 79 79 79 79 79 80 80 80 80 80 81 81 81 81 81 81 81 81 81 81 81 82 82 82 82
[101] 82 82 82 82 82 83 83 83 83 84 84 84 84 84 85 85 85 85 85 86 86 86 86 86 86
[126] 86 87 87 87 87 87 88 88 88 89 89 90 90 90 91 91 92 92 92 92 92 93 93 93 94
[151] 94 96 97
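
sort() returns a new vector and leaves the dataset itself unchanged. As a minimal sketch, the decreasing argument reverses the order, and order() can be used to reorder whole rows by one variable; the names temp_desc and by_temp are illustrative.

```r
# Sort the Temp values in descending order
temp_desc <- sort(airquality$Temp, decreasing = TRUE)
head(temp_desc)

# order() returns row positions, so indexing with it
# reorders entire rows of the dataset by Temp
by_temp <- airquality[order(airquality$Temp), ]
head(by_temp, n = 3)

# The original dataset keeps its original row order
head(airquality$Temp, n = 3)
```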

Programming in R - Dr Binu V P