R Factors
A factor is a data structure for fields that take only a predefined, finite number of values (categorical data). For example, a data field such as marital status may contain only the values single, married, separated, divorced, or widowed.
In such a case, we know the possible values beforehand; these predefined, distinct values are called levels. Examples of factors are:
Demography: Male/Female
Music: Rock, Pop, Classic, Jazz
Training: Strength, Stamina
To create a factor, use the factor() function and pass a vector as the argument:
Example:
# Create a factor
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
# Print the factor
music_genre
Output:
[1] Jazz Rock Classic Classic Pop Jazz Rock Jazz
Levels: Classic Jazz Pop Rock
You can see from the example above that the factor has four levels (categories): Classic, Jazz, Pop and Rock.
To only print the levels, use the levels() function:
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
levels(music_genre)
Output:
[1] "Classic" "Jazz" "Pop" "Rock"
To inspect the internal structure of a factor, use the str() function:
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
str(music_genre)
Output:
Factor w/ 4 levels "Classic","Jazz",..: 2 4 1 1 3 2 4 2
We see that the levels are stored in a character vector, while the individual elements are stored as integer codes that index into that vector.
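The relationship between the codes and the levels can be checked directly. The following is a minimal sketch that reuses the music_genre factor from above; the variable names codes and rebuilt are only illustrative.

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

# The integer codes are positions in the levels vector
codes <- as.integer(music_genre)
codes
# [1] 2 4 1 1 3 2 4 2

# Indexing the levels by the codes rebuilds the original labels
rebuilt <- levels(music_genre)[codes]
identical(rebuilt, as.character(music_genre))
# [1] TRUE
```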
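Factors can also be created when we read non-numerical columns into a data frame.
Before R 4.0.0, the data.frame() function converted character vectors into factors by default, and we had to pass stringsAsFactors = FALSE to suppress this behavior. Since R 4.0.0 the default is stringsAsFactors = FALSE, so pass stringsAsFactors = TRUE if you want the conversion.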
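Since the default depends on the R version in use, it can be safer to state the argument explicitly. A small sketch, assuming a made-up vector genres and illustrative data frame names df_factor and df_char:

```r
genres <- c("Jazz", "Rock", "Pop")

# Explicitly request conversion of character columns to factors
df_factor <- data.frame(genre = genres, stringsAsFactors = TRUE)
class(df_factor$genre)
# [1] "factor"

# Explicitly keep character columns as they are
df_char <- data.frame(genre = genres, stringsAsFactors = FALSE)
class(df_char$genre)
# [1] "character"
```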
Factor Length
Use the length() function to find out how many items there are in the factor:
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
length(music_genre)
Output:
[1] 8
To access the items in a factor, refer to the index number, using [] brackets:
Example:
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
# Access the third item
music_genre[3]
# Access the first and second items
music_genre[c(1, 2)]
# Access all items except the first
music_genre[-1]
# Access items with a logical vector
music_genre[c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)]
Output:
[1] Classic
Levels: Classic Jazz Pop Rock
[1] Jazz Rock
Levels: Classic Jazz Pop Rock
[1] Rock Classic Classic Pop Jazz Rock Jazz
Levels: Classic Jazz Pop Rock
[1] Jazz Classic
Levels: Classic Jazz Pop Rock
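Note in the output above that every subset still lists all four levels, even when some of them no longer occur. If the unused levels are not wanted, one option is the base R function droplevels(). A short sketch, with subset_genre and trimmed as illustrative names:

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

# Subsetting keeps the full set of levels
subset_genre <- music_genre[c(1, 2)]
levels(subset_genre)
# [1] "Classic" "Jazz"    "Pop"     "Rock"

# droplevels() removes levels that no longer occur in the data
trimmed <- droplevels(subset_genre)
levels(trimmed)
# [1] "Jazz" "Rock"
```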
Levels can also be predefined by the programmer.
# Creating a factor with levels defined by programmer
gender <- factor(c("female", "male", "male", "female"),
                 levels = c("female", "transgender", "male"))
gender
Output:
[1] female male male female
Levels: female transgender male
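A predefined level that never occurs in the data is still part of the factor. As a rough illustration, nlevels() counts it and table() reports it with a count of zero; the name counts is only illustrative.

```r
gender <- factor(c("female", "male", "male", "female"),
                 levels = c("female", "transgender", "male"))

# Number of levels, including the unused one
nlevels(gender)
# [1] 3

# Frequency of each level: female 2, transgender 0, male 2
counts <- table(gender)
counts
```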
Change Item Value
To change the value of a specific item, refer to the index number:
Example:
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
music_genre[3] <- "Pop"
music_genre
Output:
[1] Jazz Rock Pop Classic Pop Jazz Rock Jazz
Levels: Classic Jazz Pop Rock
Note that you cannot assign a value that is not already one of the factor's levels. The following example does not stop with an error; it produces a warning and sets the item to NA:
Example
Trying to change the value of the third item ("Classic") to a value that is not a predefined level ("Opera"):
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
music_genre[3] <- "Opera"
music_genre[3]
Output:
Warning message:
In `[<-.factor`(`*tmp*`, 3, value = "Opera") :
invalid factor level, NA generated
[1] <NA>
Levels: Classic Jazz Pop Rock
However, if the value is already listed in the levels argument, the assignment works:
Example
Change the value of the third item:
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"), levels = c("Classic", "Jazz", "Pop", "Rock", "Opera"))
music_genre[3] <- "Opera"
music_genre[3]
Output:
[1] Opera
Levels: Classic Jazz Pop Rock Opera
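When the factor already exists and cannot be recreated, the missing level can also be appended afterwards. The following is a sketch of one possible approach using the levels<- replacement function from base R:

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

# Append "Opera" to the existing levels; the stored values are unchanged
levels(music_genre) <- c(levels(music_genre), "Opera")

# The assignment now succeeds without a warning
music_genre[3] <- "Opera"
music_genre[3]
# [1] Opera
# Levels: Classic Jazz Pop Rock Opera
```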