Data Handling in R

The R programming language is used for statistics and data analytics. Importing and exporting data is a routine part of almost all of these applications.

R can read many types of files, such as comma-separated values (CSV) files, text files, Excel workbooks, SPSS files and SAS files.

R lets users work with the file system's directories through built-in functions that either take a directory path as an argument or return the path of the current working directory. Some of these directory functions are listed below:

getwd(): This function is used to get the current working directory being used by R.
setwd(): This function changes the current working directory; the path of the new directory is passed as the argument. On Windows, write the path with forward slashes or with doubled backslashes.

Example:
setwd("C:/RExamples/")

OR
setwd("C:\\RExamples\\")

list.files(): This function lists all files and folders present in the current working directory.
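
The three directory functions above can be combined as in the following minimal sketch. It assumes a scratch folder inside R's session temporary directory (tempdir()) so that it runs on any machine; the folder name and the file names a.txt and b.csv are hypothetical.

```r
# Remember the starting directory so it can be restored at the end
old_dir <- getwd()

# Create a scratch directory (hypothetical name) and switch to it
scratch <- file.path(tempdir(), "RExamples")
dir.create(scratch, showWarnings = FALSE)
setwd(scratch)

# Create two empty files so that there is something to list
file.create("a.txt", "b.csv")

# List everything in the current working directory
files <- list.files()
print(files)

# List only the entries whose names end in .csv
csv_files <- list.files(pattern = "\\.csv$")
print(csv_files)

# Go back to where we started
setwd(old_dir)
```

Saving the result of getwd() before calling setwd() is a common habit, because it lets a script leave the session in the state it found it.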

Importing Files in R

Let us take some basic files to import in R to learn the approach of importing data files:

Importing Text Files

In R, text files can be read using the read.table() function.

Syntax:
read.table(filename, header = FALSE, sep = "")

Parameters:
header - indicates whether the file contains a header row
sep - the delimiter used in the file; the default "" means any run of whitespace (spaces, tabs or newlines)

To know about all the arguments of read.table(), run the following command in R:
help("read.table")

Example:
Suppose a text file named stud.txt is present in the current working directory, with the content shown below. The code that follows imports the data from that file into R.

101 'binu' 40
102 'biju'  50
103 'bini'  60
104 'biji'   45

The values are separated by one or more spaces, so the default sep = "" is the right delimiter setting.

# Check current working directory
getwd()

# Get content into a data frame
# sep = "" treats any run of whitespace as a single delimiter
data <- read.table("stud.txt", header = FALSE, sep = "")

# Printing content of Text File
print(data)

# Print the class of data
print(class(data))

Output:
[1] "C:/Users/binuvp/Documents"
   V1   V2 V3
1 101 binu 40
2 102 biju 50
3 103 bini 60
4 104 biji 45
[1] "data.frame"
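
The following sketch reproduces the example above without needing stud.txt on disk: it writes the same four rows, including their uneven spacing, to a temporary file. The column names id, name and marks are assumptions added for illustration. It also uses count.fields() to show why sep = " " is unsuitable for this file.

```r
# Write the sample rows (with uneven spacing) to a temporary file
path <- tempfile(fileext = ".txt")
writeLines(c("101 'binu' 40",
             "102 'biju'  50",
             "103 'bini'  60",
             "104 'biji'   45"), path)

# sep = "" (the default) treats any run of whitespace as one delimiter
stud <- read.table(path, header = FALSE, sep = "",
                   col.names = c("id", "name", "marks"))
print(stud)
str(stud)

# With sep = " " every single space counts as a delimiter, so rows with
# extra spaces get extra (empty) fields; count.fields() reports the
# number of fields found on each line
print(count.fields(path, sep = " "))
```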

The code below lets the user choose the file interactively through a file dialog:
# import and store the dataset in data2
data2 <- read.table(file.choose(), header = FALSE, sep = ",")

# display data
data2


Importing CSV Files

Comma-separated values (CSV) files can be imported and read in R using the read.csv() function.

Syntax:
read.csv(filename, header = TRUE, sep = ",")

Parameters:
header indicates whether the file contains a header row (TRUE by default for read.csv())
sep is the delimiter used in the file (a comma by default for read.csv())

To know about all the arguments of read.csv(), run the following command in R:
    help("read.csv")

Example:
# Check current working directory
getwd()

# Get content into a data frame
# The file has no header row and its fields are separated by commas
data <- read.csv("CSVFileExample.csv",
                 header = FALSE, sep = ",")

# Printing content of CSV File
print(data)

# Print the class of data
print(class(data))

Output:
[1] "C:/Users/binuvp/Documents"
   V1   V2 V3
1 101 binu 40
2 102 biju 50
3 103 bini 60
4 104 biji 45
[1] "data.frame"
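
Many CSV files start with a header row, which is why read.csv() defaults to header = TRUE. The following sketch assumes a small file with the hypothetical column names id, name and marks, written to a temporary location so the example is self-contained.

```r
# Write a small comma-separated file with a header row
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,marks",
             "101,binu,40",
             "102,biju,50",
             "103,bini,60",
             "104,biji,45"), csv_path)

# read.csv() defaults to header = TRUE and sep = ","
students <- read.csv(csv_path)
print(students)

# Columns can now be addressed by the names taken from the header row
print(mean(students$marks))

# header = FALSE would instead treat the first line as a data row
raw <- read.csv(csv_path, header = FALSE)
print(dim(raw))
```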

Importing Excel File

To read and import Excel files, the “xlsx” package provides the read.xlsx() function. To read older “.xls” Excel files, the “gdata” package provides the read.xls() function.

Syntax:
read.xlsx(filename, sheetIndex)

OR
read.xlsx(filename, sheetName)

Parameters:
sheetIndex specifies the index (position) of the sheet to read
sheetName specifies the name of the sheet to read

To know about all the arguments of read.xlsx(), run the following command in R after loading the package:
help("read.xlsx")

Example:
# Install xlsx package
install.packages("xlsx")

library(xlsx)

# Check current working directory
getwd()

# Get content into a data frame
data <- read.xlsx("ExcelExample.xlsx",
sheetIndex = 1,
header = FALSE)

# Printing content of Excel File
print(data)

# Print the class of data
print(class(data))

Output:
   X1   X2 X3
1 101 binu 40
2 102 biju 50
3 103 bini 60
4 104 biji 45
[1] "data.frame"

Exporting Files in R

Below are some methods to export data to a file in R:

Using console

The cat() function in R is used to output objects to the console. It can also be used to redirect that output to a particular file.

Syntax:
cat(..., file)

Parameter:
file specifies the name of the file to which the output has to be redirected

To know about all the arguments of cat(), run the following command in R:

help("cat")

Example:

str <- "World"

# Redirect output to file
# cat() already puts a space between its arguments
cat("Hello,", str, file = "Example.txt")


Output:
The above code creates a new file and redirects the output of cat() to it. After the code runs, the file contains:

Hello, World
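
By default cat() overwrites the target file and does not end the line. The sketch below, which assumes a temporary file rather than Example.txt, shows the append and sep arguments that control this behaviour.

```r
out_file <- tempfile(fileext = ".txt")

# The first call creates (or overwrites) the file; "\n" ends the line
cat("Hello, World\n", file = out_file)

# append = TRUE adds to the end of the file instead of overwriting it;
# sep sets the text placed between the items
cat(1, 2, 3, sep = ",", file = out_file, append = TRUE)
cat("\n", file = out_file, append = TRUE)

# Read the file back to check what was written
lines <- readLines(out_file)
print(lines)
```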

Using sink() function:

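The sink() function redirects the output that would normally appear on the console, such as the output of cat() and print(), to the given file. Calling sink() with no arguments ends the redirection.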

Syntax:
sink(filename)  # begins redirecting output to file
.
.
sink()          # stops redirecting output

To know about all the arguments of sink(), run the following command in R:
help("sink")

Example:
# Begin redirecting output
sink("SinkExample.txt")

x <- c(1, 3, 4, 5, 10)
print(mean(x))
print(class(x))
print(median(x))

sink()

Output:
The above code creates a new file and redirects the output to it. After the code runs, the file contains:
[1] 4.6
[1] "numeric"
[1] 4
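
A quick way to confirm what sink() captured is to read the file back once the redirection has ended. This sketch assumes a temporary file instead of SinkExample.txt and mixes print() and cat() output.

```r
sink_file <- tempfile(fileext = ".txt")

sink(sink_file)                 # start redirecting console output
x <- c(1, 3, 4, 5, 10)
print(summary(x))               # goes to the file, not the console
cat("Total: ", sum(x), "\n", sep = "")
sink()                          # stop redirecting

# Output is back on the console from here on
captured <- readLines(sink_file)
print(captured)
```

Every sink(file) call should be matched by a closing sink(); otherwise later output keeps going to the file.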

Writing to CSV files

A matrix or data frame object can be written to a CSV file using the write.csv() function.

Syntax: 
write.csv(x, file)

Parameters:
x specifies the matrix or data frame to be written
file specifies the file name used for writing

To know about all the arguments of write.csv(), run the following command in R:
help("write.csv")

Example:

# Create vectors
x <- c(1, 3, 4, 5, 10)
y <- c(2, 4, 6, 8, 10)
z <- c(10, 12, 14, 16, 18)

# Create matrix
data <- cbind(x, y, z)

# Writing matrix to CSV File
write.csv(data, file = "CSVWrite.csv", row.names = FALSE)

Output:
The above code creates a new file and writes the matrix to it. After the code runs, the file contains:

"x","y","z"
1,2,10
3,4,12
4,6,14
5,8,16
10,10,18
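
To check an export, the file can be read straight back with read.csv(). The sketch below assumes the same three vectors, stored in a data frame and written to a temporary file rather than CSVWrite.csv.

```r
x <- c(1, 3, 4, 5, 10)
y <- c(2, 4, 6, 8, 10)
z <- c(10, 12, 14, 16, 18)
df <- data.frame(x, y, z)

csv_out <- tempfile(fileext = ".csv")

# row.names = FALSE leaves out the column of row numbers
write.csv(df, file = csv_out, row.names = FALSE)

# Look at the raw text that was written
written <- readLines(csv_out)
print(written)

# Read the file back and compare the values with the original
back <- read.csv(csv_out)
print(all(back == df))
```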

Using RStudio

Data can also be imported through the RStudio interface with the following steps.

Steps:
  • In the Environment tab, click the Import Dataset menu.
  • Select the file type from the options.
  • In the dialog box that appears, either enter the file name or browse to the file.
  • The selected file is displayed in a new window along with its dimensions.
  • To see the data in the console, type the name of the imported dataset.
To refer to the columns of the dataset directly by name, attach it to the search path with the attach command.

attach(dataset)
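
The effect of attach() can be seen in the following sketch, which assumes a small hypothetical data frame named dataset with columns id and marks. It also shows with(), a base R alternative that does not modify the search path.

```r
# Hypothetical dataset standing in for an imported file
dataset <- data.frame(id = c(101, 102, 103, 104),
                      marks = c(40, 50, 60, 45))

# attach() puts the columns on the search path, so "marks" can be
# used without the "dataset$" prefix
attach(dataset)
avg_attached <- mean(marks)
detach(dataset)   # remove the data frame from the search path again

# with() evaluates an expression inside the data frame without attaching
avg_with <- with(dataset, mean(marks))

print(c(avg_attached, avg_with))
```

Pairing attach() with detach() keeps column names from lingering on the search path, where they could be confused with other objects of the same name.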


Read JSON Files Into R

To work with JSON files in R, one option is the “rjson” package, which needs to be installed first. The most common tasks done with JSON files using the rjson package are as follows:

  • Install and load the rjson package in R console
  • Create a JSON file
  • Reading data from JSON file
  • Write into JSON file
  • Converting the JSON data into Dataframes
  • Working with URLs

JSON file for demonstration:
{ 
   "ID":["1","2","3","4","5"],
   "Name":["Mithuna","Tanushree","Parnasha","Arjun","Pankaj"],
   "Salary":["722.5","815.2","1611","2829","843.25"],
   "StartDate":["6/17/2014","1/1/2012","11/15/2014","9/23/2013","5/21/2013"],
   "Dept":["IT","IT","HR","Operations","Finance"]
}

# Read a JSON file
# Load the package required to read JSON files.
library("rjson")
# Give the input file name to the function.
result <- fromJSON(file = "E:\\example.json")
# Print the result.
print(result)

Output:
$ID
[1] "1" "2" "3" "4" "5"
$Name 
[1] "Mithuna" "Tanushree" "Parnasha" "Arjun" "Pankaj" 
$Salary 
[1] "722.5" "815.2" "1611" "2829" "843.25" 
$StartDate 
[1] "6/17/2014" "1/1/2012" "11/15/2014" "9/23/2013" "5/21/2013" 
$Dept 
[1] "IT" "IT" "HR" "Operations" "Finance"
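
The list printed above can be converted into a data frame with base R. The sketch below assumes result has exactly the structure shown in the output, and writes it out literally so that the example runs without the JSON file or the rjson package; the name emp is hypothetical.

```r
# The structure that fromJSON() printed above, written out literally
result <- list(
  ID        = c("1", "2", "3", "4", "5"),
  Name      = c("Mithuna", "Tanushree", "Parnasha", "Arjun", "Pankaj"),
  Salary    = c("722.5", "815.2", "1611", "2829", "843.25"),
  StartDate = c("6/17/2014", "1/1/2012", "11/15/2014", "9/23/2013", "5/21/2013"),
  Dept      = c("IT", "IT", "HR", "Operations", "Finance")
)

# Each list element becomes one column
emp <- as.data.frame(result, stringsAsFactors = FALSE)

# Every value arrives as text, so convert the columns that are not text
emp$ID        <- as.integer(emp$ID)
emp$Salary    <- as.numeric(emp$Salary)
emp$StartDate <- as.Date(emp$StartDate, format = "%m/%d/%Y")

str(emp)

# Example query: employees with a salary above 1000
high_paid <- emp[emp$Salary > 1000, c("Name", "Salary")]
print(high_paid)
```

The conversions are needed because the demonstration file stores numbers and dates as quoted strings.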
