Data Structures in R

A data structure is a particular way of organizing data in a computer so that it can be used effectively. The idea is to reduce the space and time complexities of different tasks. Data structures in R programming are tools for holding multiple values.

R’s base data structures are often organized by their dimensionality (1D, 2D, or nD) and whether they’re homogeneous (all elements must be of the identical type) or heterogeneous (the elements are often of various types). This gives rise to the six data types which are most frequently utilized in data analysis.

The most essential data structures used in R include: 
  • Vectors
  • Lists
  • Dataframes
  • Matrices
  • Arrays
  • Factors

Vectors

A vector is an ordered collection of basic data types of a given length. The only key thing here is all the elements of a vector must be of the identical data type e.g homogeneous data structures. Vectors are one-dimensional data structures.
Example:
# R program to illustrate Vector
# Vectors(ordered collection of same data type)
X = c(1, 3, 5, 7, 8)

# Printing those elements in console
print(X)
Output:
[1] 1 35 7 8


Lists

A list is a generic object consisting of an ordered collection of objects. Lists are heterogeneous data structures. These are also one-dimensional data structures. A list can be a list of vectors, list of matrices, a list of characters and a list of functions and so on.

Example:
# R program to illustrate Vector

# Vectors(ordered collection of same data type)
X = list(101,"binu",45.0)

# Printing those elements in console
print(X)
Output:
[[1]]
[1] 101

[[2]]
[1] "binu"

[[3]]
[1] 45

# R program to illustrate a List

# The first attributes is a numeric vector
# containing the employee IDs which is
# created using the 'c' command here
empId = c(1, 2, 3, 4)

# The second attribute is the employee name
# which is created using this line of code here
# which is the character vector
empName = c("Binu", "Padma", "Aditi", "Abhijith")

# The third attribute is the number of employees
# which is a single numeric variable.
numberOfEmp = 4

# We can combine all these three different
# data types into a list
# containing the details of employees
# which can be done using a list command
empList = list(empId, empName, numberOfEmp)

print(empList)
Output:
[[1]]
[1] 1 2 3 4

[[2]]
[1] "Binu"     "Padma"    "Aditi"    "Abhijith"

[[3]]
[1] 4


Dataframes

Dataframes are generic data objects of R which are used to store the tabular data. Dataframes are the foremost popular data objects in R programming because we are comfortable in seeing the data within the tabular form. They are two-dimensional, heterogeneous data structures. These are lists of vectors of equal lengths.

Data frames have the following constraints placed upon them:
  • A data-frame must have column names and every row should have a unique name.
  • Each column must have the identical number of items.
  • Each item in a single column must be of the same data type.
  • Different columns may have different data types.

To create a data frame we use the data.frame() function.

Example:
# R program to illustrate dataframe

# A vector which is a character vector
Name = c("Amiya", "Raj", "Asish")

# A vector which is a character vector
Language = c("R", "Python", "Java")

# A vector which is a numeric vector
Age = c(22, 25, 45)

# To create dataframe use data.frame command
# and then pass each of the vectors
# we have created as arguments
# to the function data.frame()
df = data.frame(Name, Language, Age)
print(df)

Output:
Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45


Matrices

A matrix is a rectangular arrangement of numbers in rows and columns. In a matrix, as we know rows are the ones that run horizontally and columns are the ones that run vertically. Matrices are two-dimensional, homogeneous data structures.
Now, let’s see how to create a matrix in R. To create a matrix in R you need to use the function called matrix. The arguments to this matrix() are the set of elements in the vector. You have to pass how many numbers of rows and how many numbers of columns you want to have in your matrix and this is the important point you have to remember that by default, matrices are in column-wise order.

Example:
# R program to illustrate a matrix

A = matrix(
# Taking sequence of elements
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
# No of rows and columns
nrow = 3, ncol = 3,

# By default matrices are
# in column-wise order
# So this parameter decides
# how to arrange the matrix
#byrow = TRUE
)

print(A)

Output:
[,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9


Arrays

Arrays are the R data objects which store the data in more than two dimensions. Arrays are n-dimensional data structures. For example, if we create an array of dimensions (2, 3, 3) then it creates 3 rectangular matrices each with 2 rows and 3 columns. They are homogeneous data structures.

Now, let’s see how to create arrays in R. To create an array in R you need to use the function called array(). The arguments to this array() are the set of elements in vectors and you have to pass a vector containing the dimensions of the array.

Example:
# R program to illustrate an array

A = array(
# Taking sequence of elements
c(1, 2, 3, 4, 5, 6),

# Creating two rectangular matrices
# each with two rows and two columns
dim = c(3, 2)
)

print(A)

Output:
[,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

Factors

Factors are the data objects which are used to categorize the data and store it as levels. They are useful for storing categorical data. They can store both strings and integers. They are useful to categorize unique values in columns like “TRUE” or “FALSE”, or “MALE” or “FEMALE”, etc.. They are useful in data analysis for statistical modeling.

Now, let’s see how to create factors in R. To create a factor in R you need to use the function called factor(). The argument to this factor() is the vector.

Example:
# R program to illustrate factors

# Creating factor using factor()
fac = factor(c("Male", "Female", "Male",
"Male", "Female", "Male", "Female"))

print(fac)

Output:
[1] Male   Female Male   Male   Female Male   Female
Levels: Female Male

Comments

Popular posts from this blog

Programming in R - Dr Binu V P

R Data Types

R- Linear Regression