R Strings


A string is a sequence of characters. In R, a string is stored as an element of a character vector, so even a single string is a character vector of length one. Any run of characters enclosed in a matching pair of single or double quotes is a string. Strings represent textual content and can contain letters, digits, spaces, and special characters. An empty string is written as "". R always prints strings in double quotes, whichever quote style was used to create them. A double-quoted string can contain single quotes without escaping, and a single-quoted string can contain double quotes. A single-quoted string can't contain an unescaped single quote, and a double-quoted string can't contain an unescaped double quote.
Creation of String in R

A string is created by assigning a quoted character value to a variable. Strings can then be joined with functions such as paste() to build longer strings.

Example
# R program for String Creation

# creating a string with double quotes
str1 <- "OK1"
cat ("String 1 is : ", str1,"\n")

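# creating a string with single quotes
str2 <- 'OK2'
cat ("String 2 is : ", str2,"\n")

# single quotes inside a double-quoted string are allowed
str3 <- "This is 'acceptable and 'allowed' in R"
cat ("String 3 is : ", str3,"\n")

# double quotes inside a single-quoted string are allowed
str4 <- 'Hi, Wondering "if this "works"'
cat ("String 4 is : ", str4,"\n")

# an unescaped single quote inside a single-quoted string is a syntax error
str5 <- 'hi, ' this is not allowed'
cat ("String 5 is : ", str5)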

Output:
String 1 is :  OK1 
String 2 is :  OK2 
String 3 is :  This is 'acceptable and 'allowed' in R 
String 4 is :  Hi, Wondering "if this "works" 
Error: unexpected symbol in "str5 <- 'hi, ' this"
Execution halted
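
The quoting rules above can also be checked without triggering a parse error. The following is a minimal sketch; the variable names are illustrative and not part of the original example. It shows that a single string is a character vector of length one, that the empty string still counts as one element, and that the two quote styles produce identical values.

```r
# A single string is a character vector with one element
s <- "OK1"
is_char <- is.character(s)   # TRUE
vec_len <- length(s)         # 1 element
n_chars <- nchar(s)          # 3 characters inside that element

# The empty string is one element holding zero characters
empty       <- ""
empty_len   <- length(empty)
empty_chars <- nchar(empty)

# Single and double quotes create the same value
same_value <- identical('OK2', "OK2")

# An escaped single quote is allowed inside a single-quoted string
escaped <- 'hi, \' this is allowed'
print(escaped)
```

Note that length() counts vector elements while nchar() counts characters, which is why the two give different answers for the same string.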

Length of String

The length of a string is the number of characters it contains. It can be found with the str_length() function from the stringr package or with R's built-in nchar() function.

Using the str_length() function
# requires the stringr package: install.packages("stringr")
library(stringr)

# Calculating length of string
str_length("hello")

Output:
[1] 5


Using the nchar() function
# R program to find length of string

# Using nchar() function
nchar("hello")

Output:
[1] 5
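
nchar() is vectorised and can count either characters or bytes. The sketch below uses made-up sample values to illustrate both points; enc2utf8() is used so that the byte count does not depend on the local encoding.

```r
# nchar() returns one count per element of a character vector
words <- c("hello", "R", "")
char_counts <- nchar(words)
print(char_counts)

# For non-ASCII text, characters and bytes can differ
accented <- enc2utf8("caf\u00e9")            # "café", forced to UTF-8
n_chars  <- nchar(accented, type = "chars")  # counts characters
n_bytes  <- nchar(accented, type = "bytes")  # counts bytes in the UTF-8 encoding
print(c(chars = n_chars, bytes = n_bytes))
```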


Accessing portions of an R string

Individual characters and substrings can be extracted from a string by position. R has two built-in functions for this, substr() and substring().

Both functions extract the part of a string that begins at the start index and ends at the end index, with positions counted from 1. Used on the left-hand side of an assignment, they also replace that part of the string with new characters.

Syntax:

    substr(x, start, stop) or substring(text, first, last)

Using substr() function
# R program to access characters in string
str <- "R Programming"
# count the number of characters in str
len <- nchar(str)
# first six characters
print(substr(str, 1, 6))
# last three characters
print(substr(str, len-2, len))

Output:
[1] "R Prog"
[1] "ing"
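
To pull out a single character, the start and end positions can simply be the same. The sketch below (variable names are illustrative) also shows that substring() reads to the end of the string when the last position is omitted, and two common ways of splitting a string into its individual characters.

```r
text <- "R Programming"

# One character: start and stop are equal
third_char <- substr(text, 3, 3)

# substring() defaults its last position to a very large number,
# so this reads from position 3 to the end
tail_part <- substring(text, 3)

# Vectorised positions give one character per position
n <- nchar(text)
chars_a <- substring(text, 1:n, 1:n)

# strsplit() with an empty separator is a common alternative
chars_b <- strsplit(text, "")[[1]]

print(third_char)
print(tail_part)
print(chars_a)
```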

Case Conversion

The case of a string can be changed with R's built-in functions: toupper() converts all characters to upper case, tolower() converts all characters to lower case, and casefold(x, upper = TRUE/FALSE) converts to upper or lower case depending on the value of the upper argument. All of these functions are vectorised, so they also accept a character vector holding several strings. Each runs in time proportional to the number of characters in the input.

Example
# R program to Convert case of a string
str <- "Hi LeArn CodiNG"
print(toupper(str))
print(tolower(str))
print(casefold(str, upper = TRUE))

Output:
[1] "HI LEARN CODING"
[1] "hi learn coding"
[1] "HI LEARN CODING"

By default, upper is set to FALSE in casefold(), so the string is converted to lower case. Setting it to TRUE converts the string to upper case.
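
The following sketch illustrates the two points made above: the functions work element by element on a character vector, and casefold() without the upper argument behaves like tolower(). The sample words are arbitrary.

```r
words <- c("Hi", "LeArn", "CodiNG")

upper_words    <- toupper(words)
lower_words    <- tolower(words)
folded_default <- casefold(words)   # upper = FALSE by default

print(upper_words)
print(lower_words)

# A typical use: comparing strings while ignoring case
same_word <- tolower("CodiNG") == tolower("coding")
print(same_word)
```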

Concatenation of R Strings

Strings are concatenated with R's paste() function, which accepts any number of arguments and joins them into one string.

Syntax: 
paste(..., sep = " ", collapse = NULL)

Parameters:
...: One or more values to combine.
sep: The separator placed between the arguments. It is optional and defaults to a single space.
collapse: An optional string used to join the elements of the result into a single string when the arguments are vectors. It defaults to NULL, which leaves the result as a vector.

Example:
# concatenate two strings
str1 <- "hello"
str2 <- "how are you?"
print(paste(str1, str2, sep = " ", collapse = NULL))

# Create two strings
string1 <- "Hello"
string2 <- "World"

# Concatenate the two strings
result <- paste(string1, string2)

# Print the result
print(result)

Output:
[1] "hello how are you?"
[1] "Hello World"

You can also concatenate multiple strings by passing them as separate arguments to the paste function, like this:
# Concatenate three strings
result <- paste("Hello", "to", "the World")

# Print the result
print(result)

Output:
[1] "Hello to the World"
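
The difference between sep and collapse is easiest to see with a vector argument. The sketch below uses arbitrary sample values: sep goes between the arguments of each pasted element, while collapse joins the resulting elements into one string. paste0() is the base R shorthand for paste() with sep = "".

```r
parts <- c("a", "b", "c")

# sep separates the arguments; the result is still a vector of three strings
with_sep <- paste("x", parts, sep = "-")

# collapse joins the elements of the result into a single string
collapsed <- paste(parts, collapse = ", ")

# Both together: first sep within each element, then collapse across elements
both <- paste("x", parts, sep = "-", collapse = "+")

# paste0() concatenates with no separator
no_space <- paste0("Hello", "World")

print(with_sep)
print(collapsed)
print(both)
print(no_space)
```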

R String formatting

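String formatting in R is done with the sprintf() function, which fills the placeholders in a format string (such as %d for an integer and %.2f for a number with two decimal places) with the values of its arguments. A simple example that builds a string from two variable values is given below: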

# Create two variables with values
x <- 42
y <- 3.14159

# Format a string with the two variable values
result <- sprintf("The answer is %d, and pi is %.2f.", x, y)

# Print the result
print(result)

Output:
[1] "The answer is 42, and pi is 3.14."
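
sprintf() is also vectorised and supports width and padding flags. The sketch below is illustrative only; the player names and scores are made up.

```r
players <- c("Ann", "Bob")
scores  <- c(9.5, 10)

# %-5s left-justifies in 5 columns, %5.1f right-justifies with 1 decimal place
score_lines <- sprintf("%-5s scored %5.1f", players, scores)
print(score_lines)

# Zero-padding an integer to 5 digits
padded <- sprintf("%05d", 42L)
print(padded)

# A literal percent sign is written as %%
pct <- sprintf("%.1f%%", 12.34)
print(pct)
```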

Updating R strings
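Characters and substrings of a string can be replaced with new values. Functions such as gsub() return a modified copy, so the result must be assigned back to the variable to update it, while the replacement form substring(x, first, last) <- value changes the variable directly. In R, string values can be updated in the following way: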
# Create a string
string <- "Hello, World!"

# Replace "World" with "Universe"
string <- gsub("World", "Universe", string)
print(string)

# Replace characters 2 to 3 with "hi"
substring(string, 2, 3) <- "hi"
# Print the updated string
print(string)

Output:
[1] "Hello, Universe!"
[1] "Hhilo, Universe!"

Several strings in a character vector can be updated at once, provided start <= end.
  • If the portion selected by start and end is longer than the new string, only as many characters as the new string contains are replaced.
  • If the selected portion is shorter than the new string, the new string is truncated to fit and only its leading characters are used.
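
The two replacement rules above can be checked with a short sketch. The variable names are illustrative. It also shows that gsub() leaves its input untouched unless the result is assigned back, and how sub() differs from gsub().

```r
# Selected portion (5 chars) longer than the new string (2 chars):
# only the first 2 characters are replaced
s1 <- "Hello, World!"
substring(s1, 1, 5) <- "Hi"
print(s1)

# Selected portion (2 chars) shorter than the new string (5 chars):
# only the first 2 characters of the new string are used
s2 <- "Hello, World!"
substring(s2, 1, 2) <- "HOWDY"
print(s2)

# gsub() returns a new string; the original is unchanged
original <- "Hello, World!"
changed  <- gsub("World", "Universe", original)

# sub() replaces only the first match, gsub() replaces every match
first_only  <- sub("l", "L", "Hello")
all_matches <- gsub("l", "L", "Hello")
print(c(first_only, all_matches))
```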

R Multiline String

In R, a string can span multiple lines. Each line break typed inside the quotes is stored in the string as the newline character "\n". For example,
# define multiline string
message1 <- "R is awesome
It is a powerful language
R can be used for data science"
# display multiline string
cat(message1)

Output:
R is awesome
It is a powerful language
R can be used for data science
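
That the line break is stored as "\n" can be confirmed by comparing a multiline literal with the same text written using the escape sequence. This is a minimal sketch with made-up text.

```r
# A string literal that spans two source lines
msg <- "line one
line two"

# The same text written on one line with an explicit \n
msg_escaped <- "line one\nline two"

has_newline <- identical(msg, msg_escaped)
print(has_newline)

# Splitting on "\n" recovers the individual lines
msg_lines <- strsplit(msg, "\n")[[1]]
print(msg_lines)
```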

Check a String

Using grep()

The grep() function searches each element of a character vector for a pattern and returns the index of every element in which the pattern is found. If several elements match, it returns an integer vector of their indices. This is useful because it tells us not only that the pattern occurs but also where in the vector it occurs.

Syntax: grep(pattern, x, ignore.case = FALSE)

Parameters: 
pattern: A regular expression pattern.
x: The character vector to be searched.
ignore.case: Whether to ignore case in the search. It is optional and is set to FALSE by default.

Example:
str <- c("Hello", "hello", "hi", "hey")
grep('hey', str)

Output:
[1] 4

To find all elements that contain a pattern irrespective of case:
str <- c("Hello", "hello", "hi", "hey")
grep('he', str, ignore.case = TRUE)

Output:
[1] 1 2 4

Using grepl()

Use the grepl() function to check whether a character or a sequence of characters is present in a string. It returns TRUE or FALSE for each element instead of an index:

Example:
str <- "Hello World!"

grepl("H", str)
grepl("Hello", str)
grepl("X", str)

Output:
[1] TRUE
[1] TRUE
[1] FALSE
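
Because the pattern is a regular expression, characters such as "." have a special meaning. The sketch below uses hypothetical file names to show the effect, together with two standard grep() arguments: fixed = TRUE to match the pattern literally and value = TRUE to return the matching strings instead of their indices.

```r
files <- c("a.txt", "b.csv", "atxt", "c.TXT")

# As a regex, "." matches any character, so "atxt" also matches
idx_regex <- grep(".txt", files)

# fixed = TRUE treats the pattern as plain text
idx_fixed <- grep(".txt", files, fixed = TRUE)

# value = TRUE returns the matching elements themselves
matched <- grep(".txt", files, fixed = TRUE, value = TRUE)

# An escaped dot plus an end anchor, ignoring case
is_txt <- grepl("\\.txt$", files, ignore.case = TRUE)

print(idx_regex)
print(idx_fixed)
print(matched)
print(is_txt)
```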


Escape Sequences in R String

An escape sequence lets a string contain characters that would otherwise be difficult or impossible to type inside it.

Suppose we need to include double quotes inside a string.
# include double quote
example1 <- "This is "R" programming"
example1 # throws error

Since the string is delimited by double quotes, the R parser treats "This is " as the complete string and cannot make sense of what follows. Hence, the above code will cause an error.

To solve this issue, we use the escape character \ in R.
# use the escape character
example1 <- "This is \"R\" programming"
# use cat() to display the string without the backslashes
cat(example1)

Output:
This is "R" programming

Now the program runs without any error. Here, the escape character tells the parser to treat the quote that follows \ as part of the string rather than as its end.

Note: Auto-printing (or print()) shows the backslashes in the output. To display the string without backslashes, use the cat() function.
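
The backslash is only part of how the string is written in source code; it is not stored in the string itself. The sketch below (sample values are illustrative) checks this with nchar() and contrasts print() with cat() and writeLines().

```r
quoted <- "This is \"R\" programming"
path   <- "C:\\temp"
tabbed <- "a\tb"

# Each escape sequence counts as a single character
n_quoted <- nchar(quoted)
n_path   <- nchar(path)
n_tabbed <- nchar(tabbed)
print(c(n_quoted, n_path, n_tabbed))

# print() shows the escaped form; cat() and writeLines() show the raw text
print(path)
cat(path, "\n")
writeLines(quoted)
```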

Here is a list of commonly used escape sequences supported by R.

Escape Sequence    Character
\b                 backspace
\\                 plain backslash
\t                 horizontal tab
\n                 line feed (newline)
\"                 double quote

Note: A double-quoted string can have single quotes without escaping them. For example,
message <- "Let's code"
print(message)

Output:
[1] "Let's code"

Formatting numbers and strings: format() function

This function is used to format strings and numbers in a specified style.

Syntax: 
    format(x, digits, nsmall, scientific, width, justify = c("left", "right", "centre", "none"))

Parameters:
x is the vector input.
digits is the number of significant digits to display.
nsmall is the minimum number of digits to the right of the decimal point.
scientific is set to TRUE to display scientific notation.
width is the minimum width of the output; shorter values are padded with blanks.
justify is the alignment of strings: left, right, centre, or none.

Example:
# formatting numbers and strings

# Number of significant digits displayed.
# Last digit rounded off.
result <- format(69.145656789, digits=9)
print(result)

# Display numbers in scientific notation.
result <- format(c(3, 132.84521),
                 scientific=TRUE)
print(result)

# The minimum number of digits
# to the right of the decimal point.
result <- format(96.47, nsmall=5)
print(result)

# Format treats everything as a string.
result <- format(8)
print(result)

# Numbers are padded with blanks
# at the beginning to reach the width.
result <- format(67.7, width=6)
print(result)

# Left justify strings.
result <- format("Hello", width=8,
                 justify="l")
print(result)

Output:
[1] "69.1456568"
[1] "3.000000e+00" "1.328452e+02"
[1] "96.47000"
[1] "8"
[1] "  67.7"
[1] "Hello   "
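
format() is often applied to a whole vector so that every element comes out with the same width. The sketch below uses made-up prices; big.mark is a standard format() argument that is not covered in the parameter list above.

```r
prices <- c(3.5, 12, 104.25)

# All elements are formatted to a common width with at least 2 decimals
fixed <- format(prices, nsmall = 2)
print(fixed)

# Thousands separator
big <- format(1234567.891, big.mark = ",")
print(big)

# Centre and right justification of strings within 6 columns
centred <- format("Hi", width = 6, justify = "centre")
right   <- format("Hi", width = 6, justify = "right")
print(c(centred, right))
```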


Using the tidyverse

In this method, we use the tidyverse, a collection of R packages that covers the data science workflow from data exploration to data visualization. One of its packages, stringr, is designed for working with strings and provides many functions for data cleaning and data preparation tasks. Loading the tidyverse with library(tidyverse) also loads stringr.

Detect the string

In this example, we will check whether a string contains a pattern using the str_detect() function.

Syntax: 
        str_detect(string, pattern)

Parameters: string is the vector input; pattern is the text to look for.

Example:
library(tidyverse)
string<-"Hello World"
str_detect(string, "Hello")

Output:
[1] TRUE

Locate the string

In this example, we will find the position of a pattern in a string using the str_locate() function, which returns the start and end positions of the first match.

Syntax: 
            str_locate(string, pattern)

Parameters: string is the vector input; pattern is the text to look for.

Example:
library(tidyverse)
string<-"Hello World"
str_locate(string, "World")

Output:
     start end
[1,]     7  11


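Replace the string

In this example, we will replace part of a string using the str_replace() function, which replaces the first match of a pattern.

Syntax: str_replace(string, pattern, replacement)

Parameters: string is the vector input; pattern is the text to look for; replacement is the text to put in its place.

Example:
library(tidyverse)
string<-"Hello World"
str_replace(string, "World", "MEC")

Output:
[1] "Hello MEC"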
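
For readers who do not have the tidyverse installed, the same three operations can be approximated in base R. This is a rough sketch rather than an exact equivalent of the stringr functions: fixed = TRUE makes the base functions match the pattern literally, and regexpr() reports the start position and length of the first match, from which the end position is computed.

```r
string <- "Hello World"

# Detect: is the pattern present?
found <- grepl("World", string, fixed = TRUE)

# Locate: start and end position of the first match
pos         <- regexpr("World", string, fixed = TRUE)
match_start <- as.integer(pos)
match_end   <- match_start + attr(pos, "match.length") - 1L

# Replace: substitute the first match
replaced <- sub("World", "MEC", string, fixed = TRUE)

print(found)
print(c(start = match_start, end = match_end))
print(replaced)
```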

