R Strings
A string is a sequence of characters. In R, one or more characters enclosed in a pair of matching single or double quotes is treated as a string, and it is stored as an element of a character vector. Strings represent textual content and can contain letters, numbers, spaces, and special characters. An empty string is written as "". R always prints strings in double quotes, regardless of which quote style was used to create them. A double-quoted string can contain single quotes, and a single-quoted string can contain double quotes. A single-quoted string can't contain an unescaped single quote. Similarly, a double-quoted string can't contain an unescaped double quote.
Creation of String in R
R Strings can be created by assigning character values to a variable. These strings can be further concatenated by using various functions and methods to form a big string.
Example
# R program for String Creation
# creating a string with double quotes
str1 <- "OK1"
cat ("String 1 is : ", str1,"\n")
# creating a string with single quotes
str2 <- 'OK2'
cat ("String 2 is : ", str2,"\n")
str3 <- "This is 'acceptable and 'allowed' in R"
cat ("String 3 is : ", str3,"\n")
str4 <- 'Hi, Wondering "if this "works"'
cat ("String 4 is : ", str4,"\n")
str5 <- 'hi, ' this is not allowed'
cat ("String 5 is : ", str5)
Output:
String 1 is : OK1
String 2 is : OK2
String 3 is : This is 'acceptable and 'allowed' in R
String 4 is : Hi, Wondering "if this "works"
Error: unexpected symbol in "str5 <- 'hi, ' this"
Execution halted
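The assignment to str5 fails because the second single quote ends the string early. A minimal sketch of two ways to write such a string, assuming the intended text is the one shown in the comments (the variable names str6 and empty are made up for illustration):

```r
# Option 1: escape the inner single quote with a backslash
str5 <- 'hi, \' this is allowed'

# Option 2: wrap the text in double quotes instead
str6 <- "hi, ' this is allowed"

cat("String 5 is : ", str5, "\n")

# Both variables hold exactly the same characters
print(identical(str5, str6))

# The empty string is written as two adjacent quotes
empty <- ""
print(nchar(empty))
```

Either form works; switching the outer quote style usually reads more cleanly than escaping.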
Length of String
The length of a string is the number of characters it contains. It can be determined with the str_length() function from the stringr package or with R's built-in nchar() function.
Using the str_length() function
library(stringr)
# Calculating length of string
str_length("hello")
Using the nchar() function
# R program to find length of string
# Using nchar() function
nchar("hello")
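nchar() is vectorized, so it can measure several strings in one call. A minimal sketch, using a made-up vector named words:

```r
# Hypothetical example vector; any character vector works
words <- c("hello", "R", "")

# nchar() returns one length per element
n_chars <- nchar(words)
print(n_chars)

# Total number of characters across all elements
total <- sum(n_chars)
print(total)
```

Note that length(words) would return the number of elements in the vector (3 here), not the number of characters.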
Accessing portions of an R string
Individual characters and substrings can be extracted from a string by position. R provides two built-in functions for this, substr() and substring().
Both extract the part of a string that begins at the start index and ends at the end index (indices start at 1). Used on the left-hand side of an assignment, they also replace the specified substring with new characters.
Syntax:
substr(x, start, stop)
or
substring(text, first, last)
Using substr() function
# R program to access characters in string
str <- "R Programming"
# counts the number of characters of str
len <- nchar(str)
print(substr(str, 1, 6))
print(substr(str, len-2, len))
Output
[1] "R Prog"
[1] "ing"
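The same functions can also pick out a single character or several pieces at once. The following is a small sketch rather than a complete reference; the variable names third, pieces, and chars are made up for illustration:

```r
str <- "R Programming"

# A single character: start and stop are the same position
third <- substr(str, 3, 3)
print(third)

# substring() accepts vectors of positions and returns one piece per pair:
# positions 1-3, 2-4 and 3-5
pieces <- substring(str, 1:3, 3:5)
print(pieces)

# Split the string into its individual characters
chars <- strsplit(str, "")[[1]]
print(length(chars))
```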
Case Conversion
The characters of a string can be converted to upper or lower case with R's built-in functions: toupper() converts all characters to upper case, tolower() converts all characters to lower case, and casefold(..., upper = TRUE/FALSE) converts according to the value of the upper argument. All of these functions also accept a vector of multiple strings. Each operation takes time proportional to the number of characters in the string.
Example
# R program to Convert case of a string
str <- "Hi LeArn CodiNG"
print(toupper(str))
print(tolower(str))
print(casefold(str, upper = TRUE))
Output:
[1] "HI LEARN CODING"
[1] "hi learn coding"
[1] "HI LEARN CODING"
By default, upper is set to FALSE in casefold(), so the string is converted to lower case. If it is set to TRUE, the string is converted to upper case.
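Because these functions accept several strings at once, a whole vector can be converted in one call. A minimal sketch with a made-up vector, which also shows a common use of case conversion, comparing strings while ignoring case:

```r
# Hypothetical example vector
words <- c("Hi", "LeArn", "CodiNG")

upper_words <- toupper(words)
lower_words <- tolower(words)
folded <- casefold(words)  # upper defaults to FALSE

print(upper_words)
print(lower_words)

# Case-insensitive comparison: convert both sides before comparing
same <- tolower("CodiNG") == tolower("coding")
print(same)
```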
Concatenation of R Strings
Strings in R can be concatenated with the paste() function, which accepts any number of arguments and combines them.
Syntax:
paste(..., sep = " ", collapse = NULL)
Parameters:
...: One or more values to combine. They are converted to character vectors.
sep: The separator placed between the arguments. It is optional and defaults to a single space.
collapse: An optional string used to join the elements of the result into one single string. With the default NULL, the result is not collapsed.
Example:
# concatenate two strings
str1 <- "hello"
str2 <- "how are you?"
print(paste(str1, str2, sep = " ", collapse = NULL))
Note: when strings are updated with substring() (see "Updating R strings"), multiple strings can be updated at once, provided that start <= end.
# Create two strings
string1 <- "Hello"
string2 <- "World"
# Concatenate the two strings
result <- paste(string1, string2)
# Print the result
print(result)
Output:
[1] "Hello World"
You can also concatenate multiple strings by passing them as separate arguments to the paste function, like this:
# Concatenate three strings
result <- paste("Hello", "to", "the World")
# Print the result
print(result)
Output:
[1] "Hello to the World"
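The difference between sep and collapse is easiest to see with vectors. The sketch below uses made-up values; paste0() is the base R shorthand for paste() with an empty separator:

```r
# sep goes between the arguments, element by element
labels <- paste("item", 1:3, sep = "_")
print(labels)

# collapse joins the elements of the result into one string
joined <- paste(c("a", "b", "c"), collapse = "-")
print(joined)

# Both can be combined
both <- paste("item", 1:3, sep = "_", collapse = ", ")
print(both)

# paste0() concatenates without any separator
tight <- paste0("Hello", "World")
print(tight)
```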
R String formatting
String formatting in R can be done with the sprintf() function. A simple example that builds a string from variable values is shown below:
# Create two variables with values
x <- 42
y <- 3.14159
# Format a string with the two variable values
result <- sprintf("The answer is %d, and pi is %.2f.", x, y)
# Print the result
print(result)
Output:
[1] "The answer is 42, and pi is 3.14."
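sprintf() supports further format specifiers besides %d and %.2f. The following sketch shows a few common ones with made-up values; it is not a complete list:

```r
# Hypothetical example values
name <- "Ada"
count <- 3L
price <- 4.5

# %s inserts a string, %d an integer
msg <- sprintf("%s bought %d items", name, count)
print(msg)

# Pad an integer with leading zeros to width 5
padded <- sprintf("%05d", 42L)
print(padded)

# Fixed width of 8 characters with 2 decimal places
aligned <- sprintf("%8.2f", price)
print(aligned)

# A literal percent sign is written as %%
pct <- sprintf("%d%%", 75L)
print(pct)

# sprintf() is vectorized over its arguments
ids <- sprintf("ID-%02d", 1:3)
print(ids)
```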
Updating R strings
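Characters and substrings of a string can be replaced with new values. Functions such as gsub() return a modified copy, so the result must be assigned back to the variable, while the replacement form substring(x, start, end) <- value updates the variable directly. In R, string values can be updated in the following way: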
# Create a string
string <- "Hello, World!"
# Replace "World" with "Universe"
string <- gsub("World", "Universe", string)
print(string)
# Replace the characters at positions 2 to 3 with "hi"
substring(string, 2, 3) <- "hi"
# Print the updated string
print(string)
Output:
[1] "Hello, Universe!"
[1] "Hhilo, Universe!"
- If the substring being replaced is longer than the new string, only as many characters as the new string contains are replaced.
- If the substring being replaced is shorter than the new string, only the leading characters of the new string that fit in the substring are used.
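The two length rules can be checked directly. A minimal sketch with made-up strings x and y, followed by the difference between sub() (first match only) and gsub() (all matches):

```r
# Replaced range (positions 2 to 5) is longer than the new string "ZZ":
# only two characters are overwritten
x <- "abcdef"
substr(x, 2, 5) <- "ZZ"
print(x)

# Replaced range (positions 2 to 3) is shorter than the new string "WXYZ":
# only "WX" is used
y <- "abcdef"
substr(y, 2, 3) <- "WXYZ"
print(y)

# sub() replaces the first match, gsub() replaces every match
first_only <- sub("l", "L", "hello")
all_matches <- gsub("l", "L", "hello")
print(first_only)
print(all_matches)
```

In both substr() cases the total length of the string stays the same.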
R Multiline String
In R, a string can also span multiple lines. Each line break inside the quotes is stored in the string as the newline character "\n". For example,
# define multiline string
message1 <- "R is awesome
It is a powerful language
R can be used for data science"
# display multiline string
cat(message1)
Output:
R is awesome
It is a powerful language
R can be used for data science
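To confirm that the line break is stored as "\n", the multiline form can be compared with a single-line string that spells out the escape. A small sketch with a made-up variable message2, assuming the script file uses Unix-style line endings:

```r
# A string typed across two lines
message2 <- "first line
second line"

# The same text written on one line with an explicit newline escape
one_line <- "first line\nsecond line"
print(identical(message2, one_line))

# Split the text back into its separate lines
parts <- strsplit(message2, "\n")[[1]]
print(parts)
```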
Check a String
Using grep()
The grep() function returns the index at which the pattern is found in the vector. If several elements match the pattern, it returns a vector with the indices of all matching elements. This is useful because it tells us not only whether the pattern occurs but also where it occurs in the vector.
Syntax: grep(pattern, x, ignore.case = FALSE)
Parameters:
pattern: A regular expression pattern.
x: The character vector to be searched.
ignore.case: Whether to ignore case in the search. It is optional and set to FALSE by default.
Example:
str <- c("Hello", "hello", "hi", "hey")
grep('hey', str)
Output:
[1] 4
To find all elements that contain a pattern irrespective of case:
str <- c("Hello", "hello", "hi", "hey")
grep('he', str, ignore.case = TRUE)
Output:
[1] 1 2 4
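grep() can also return the matching elements themselves instead of their positions by setting value = TRUE. A minimal sketch using the same vector as above; the result variable names are made up:

```r
str <- c("Hello", "hello", "hi", "hey")

# Return the matching strings rather than their indices
matches <- grep("he", str, value = TRUE)
print(matches)

# The same search, ignoring case
matches_ci <- grep("he", str, ignore.case = TRUE, value = TRUE)
print(matches_ci)

# When nothing matches, the result is an empty vector
none <- grep("xyz", str)
print(length(none))
```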
Using grepl()
The grepl() function checks whether a character or a sequence of characters is present in a string. It returns TRUE when the pattern is found and FALSE otherwise:
Example
str <- "Hello World!"
grepl("H", str)
grepl("Hello", str)
grepl("X", str)
Output:
[1] TRUE
[1] TRUE
[1] FALSE
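Like grep(), grepl() works on whole vectors and treats the pattern as a regular expression by default. The sketch below filters a made-up vector and shows the fixed = TRUE argument, which makes the pattern a literal string:

```r
str <- c("Hello", "hello", "hi", "hey")

# One TRUE/FALSE per element: does it start with "he"?
flags <- grepl("^he", str)
print(flags)

# Use the logical vector to keep only the matching elements
kept <- str[flags]
print(kept)

# "." is a regular expression that matches any character...
any_char <- grepl(".", "abc")
# ...unless the pattern is treated as plain text
literal_dot <- grepl(".", "abc", fixed = TRUE)
print(c(any_char, literal_dot))
```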
Escape Sequences in R String
The escape sequence is used to escape some of the characters present inside a string.
Suppose we need to include double quotes inside a string.
# include double quote
example1 <- "This is "R" programming"
example1 # throws error
Since strings are delimited by double quotes, R treats "This is " as the complete string and cannot parse what follows. Hence, the above code causes an error.
To solve this issue, we use the escape character \ in R.
# use the escape character
example1 <- "This is \"R\" programming"
# use cat() to display the string without the backslashes
cat(example1)
Output:
This is "R" programming
Now the program runs without any error. Here, the escape character \ tells R to treat the character that follows it as part of the string rather than as the closing quote.
Note: Auto-printing and print() display the backslash in the output. To display the string without the backslash, use the cat() function.
Here is a list of commonly used escape sequences supported by R.
Escape Sequence    Character
\b                 backspace
\\                 plain backslash
\t                 horizontal tab
\n                 line feed
\"                 double quote
Note: A double-quoted string can have single quotes without escaping them. For example,
message <- "Let's code"
print(message)
Output:
[1] "Let's code"
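An escape sequence is typed as two characters but stored as one. The following sketch, with made-up variables tabbed and path, checks this with nchar() and shows how print() and cat() display the same string differently:

```r
# "\t" is stored as a single tab character
tabbed <- "name\tvalue"
print(nchar(tabbed))

# "\\" is stored as a single backslash
path <- "C:\\temp"
print(nchar(path))

# print() shows the escaped form, cat() shows the actual characters
print(path)
cat(path, "\n")

# writeLines() also displays the actual characters
writeLines(tabbed)
```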
Formatting with format()
The format() function is used to format strings and numbers in a specified style.
Syntax:
format(x, digits, nsmall, scientific, width, justify = c("left", "right", "centre", "none"))
Parameters:
x is the input vector.
digits is the number of significant digits to display.
nsmall is the minimum number of digits to the right of the decimal point.
scientific is set to TRUE to display scientific notation.
width is the minimum width of the output; shorter values are padded with blanks.
justify is the alignment of strings: left, right, or centre.
Example:
# formatting numbers and strings
# Total number of significant digits displayed.
# Last digit rounded off.
result <- format(69.145656789, digits = 9)
print(result)
# Display numbers in scientific notation.
result <- format(c(3, 132.84521),
                 scientific = TRUE)
print(result)
# The minimum number of digits
# to the right of the decimal point.
result <- format(96.47, nsmall = 5)
print(result)
# format() returns everything as a string.
result <- format(8)
print(result)
# Numbers are padded with blanks
# at the beginning to reach the width.
result <- format(67.7, width = 6)
print(result)
# Left justify strings.
result <- format("Hello", width = 8,
                 justify = "left")
print(result)
Output:
[1] "69.1456568"
[1] "3.000000e+00" "1.328452e+02"
[1] "96.47000"
[1] "8"
[1] "  67.7"
[1] "Hello   "
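format() is often used to line up values of different widths. A minimal sketch with made-up values; big.mark is a further argument of format() that is not covered in the parameter list above:

```r
# Numbers in a vector are padded to a common width
nums <- format(c(1, 10, 100))
print(nums)

# Insert a thousands separator
big <- format(1234567, big.mark = ",")
print(big)

# Right justify a string in a field of 8 characters
right <- format("Hello", width = 8, justify = "right")
print(right)
print(nchar(right))
```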
Using the Tidyverse
In this method, we use the tidyverse, a collection of R packages for the data science workflow, ranging from data exploration to data visualization. stringr is a tidyverse package designed for working with strings. It has many functions that simplify data cleaning and data preparation tasks.
In this example, we check whether a pattern is present in the string using the str_detect() function.
Syntax:
str_detect(string, pattern)
Parameters: string is the input vector; pattern is the text to look for.
Example:
library(tidyverse)
string <- "Hello World"
str_detect(string, "Hello")
In this example, we find the position of a pattern in the string using the str_locate() function.
Syntax:
str_locate(string, pattern)
Parameters: string is the input vector; pattern is the text to look for.
Example:
library(tidyverse)
string <- "Hello World"
str_locate(string, "World")
Output:
     start end
[1,]     7  11
Replace the string
In this example, we replace part of the string using the str_replace() function.
Syntax: str_replace(string, pattern, replacement)
Parameters: string is the input vector; pattern is the text to look for; replacement is the text that replaces the first match.
Example:
library(tidyverse)
string <- "Hello World"
str_replace(string, "World", "MEC")
Output:
[1] "Hello MEC"
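The stringr functions are vectorized, and str_replace() changes only the first match in each string, while str_replace_all() changes every match. The sketch below assumes the stringr package is installed and skips itself otherwise; the example values are made up:

```r
# Run only when stringr is available (assumption: it may not be installed)
if (requireNamespace("stringr", quietly = TRUE)) {
  fruits <- c("apple pie", "banana", "pineapple")

  # One TRUE/FALSE per element
  has_apple <- stringr::str_detect(fruits, "apple")
  print(has_apple)

  # Only the first "-" is replaced
  first_only <- stringr::str_replace("a-b-c", "-", "+")
  print(first_only)

  # Every "-" is replaced
  all_hits <- stringr::str_replace_all("a-b-c", "-", "+")
  print(all_hits)
}
```

Calling the functions with the stringr:: prefix avoids loading the whole tidyverse when only a few string helpers are needed.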