R Histogram
A histogram is a graphical display of data using bars of different heights. A histogram is used to summarize discrete or continuous data that are measured on an interval scale.
Create Histogram in R
In R, we use the hist() function to create Histograms.
For example,
temperatures <- c(67 ,72 ,74 ,62 ,76 ,66 ,65 ,59 ,61 ,69 )
# histogram of temperatures vector
result <- hist(temperatures)
print(result)
Output:
In the above example, we have used the hist() function to create a histogram of the temperatures vector.
The histogram we have created above is plain and simple, but we can customize it in many ways.
To add a title and a label to our Histogram in R, we pass the main and the xlab parameter respectively inside the hist() function.
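As a minimal sketch of the main and xlab parameters described above, the following reuses the temperatures vector from the first example. The title and label strings are only illustrative choices.

```r
temperatures <- c(67, 72, 74, 62, 76, 66, 65, 59, 61, 69)

# main sets the plot title, xlab sets the x-axis label
result <- hist(temperatures,
  main = "Histogram of Temperature",
  xlab = "Temperature in degrees Fahrenheit")
```

The y-axis label can be set in the same way with ylab.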
Change Bar Color of Histogram in R
In R, we pass the col parameter inside hist() to change the color of bars. For example,
temperatures <- c(67 ,72 ,74 ,62 ,76 ,66 ,65 ,59 ,61 ,69 )
# histogram of temperatures vector
result <- hist(temperatures,
main = "Histogram of Temperature",
xlab = "Temperature in degrees Fahrenheit",
col = "red")
print(result)
Output:
Range of Axes in R
To set the range of the axes in R, we pass the xlim and the ylim parameters inside hist(). For example,
temperatures <- c(67 ,72 ,74 ,62 ,76 ,66 ,65 ,59 ,61 ,69 )
# histogram of temperatures vector
result <- hist(temperatures,
main = "Histogram of Temperature",
xlab = "Temperature in degrees Fahrenheit",
col = "red",
xlim=c(50,100),
ylim=c(0,5)
)
print(result)
Output:
Let us use the built-in dataset airquality which, according to the R documentation, contains daily air quality measurements in New York from May to September 1973.
> str(airquality)
'data.frame': 153 obs. of 6 variables:
$ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
$ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
$ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
$ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
$ Month : int 5 5 5 5 5 5 5 5 5 5 ...
$ Day : int 1 2 3 4 5 6 7 8 9 10 ...
We will use the Temp variable, which has 153 observations in degrees Fahrenheit.
Example 1: Simple histogram
Temperature <- airquality$Temp
hist(Temperature)
We can see above that there are 9 cells with equally spaced breaks. In this case, the height of a cell is equal to the number of observations falling in that cell.
We can pass in additional parameters to control the way our plot looks. You can read about them in the help section ?hist.
Some of the frequently used ones are main to give the title, xlab and ylab to provide labels for the axes, xlim and ylim to provide the range of the axes, and col to define the color.
Additionally, with the argument freq=FALSE we can plot the probability density instead of the frequency.
Example:
# histogram with added parameters
Temperature <- airquality$Temp
hist(Temperature,
main="Maximum daily temperature at La Guardia Airport",
xlab="Temperature in degrees Fahrenheit",
xlim=c(50,100),
col="darkmagenta",
freq=FALSE
)
Note that the y axis is labelled density instead of frequency. In this case, the total area of the histogram is equal to 1.
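The claim that the total area equals 1 can be checked numerically. This is a small sketch that assumes the default breaks and uses plot = FALSE so that the histogram is computed without being drawn. The names cell_area and total_area are introduced here for illustration.

```r
Temperature <- airquality$Temp

# plot = FALSE computes the histogram without drawing it
h <- hist(Temperature, plot = FALSE)

# area of each cell = density of the cell * width of the cell
cell_area <- h$density * diff(h$breaks)
total_area <- sum(cell_area)
print(total_area)
```

The sum should be 1 up to floating-point rounding, because each density is the cell count divided by the total count and the cell width.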
The hist() function returns a list with 6 components in current versions of R.
breaks - the places where the breaks occur,
counts - the number of observations falling in each cell,
density - the density of each cell,
mids - the midpoints of the cells,
xname - the x argument name and
equidist - a logical value indicating whether the breaks are equally spaced.
Older versions of R also returned intensities, which held the same values as density.
We can use these values for further processing.
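One way to look at the returned values is to store the result and access its components by name. The sketch below assumes the default breaks and passes plot = FALSE, so nothing is drawn.

```r
Temperature <- airquality$Temp
h <- hist(Temperature, plot = FALSE)

names(h)      # names of the returned components
h$breaks      # cell boundaries
h$counts      # number of observations in each cell
h$mids        # midpoint of each cell
h$equidist    # TRUE when all cells have the same width
```

There is always one more break than there are cells, since each cell lies between two consecutive breaks.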
For example, we can use the return values to place the counts on top of each cell using the text() function.
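# histogram with the count printed above each cell
Temperature <- airquality$Temp
# ylim leaves room above the tallest bar for its label
h <- hist(Temperature, ylim=c(0,40))
# place each count at the cell midpoint, just above the top of the bar
text(h$mids, h$counts, labels=h$counts, adj=c(0.5, -0.5))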
Output:
Defining the Number of Breaks
With the breaks argument we can specify the number of cells we want in the histogram. However, this number is just a suggestion.
R calculates the best number of cells, keeping this suggestion in mind. Following are two histograms on the same data with different number of cells.
# histograms with different suggested numbers of breaks
Temperature <- airquality$Temp
hist(Temperature, breaks=4, main="With breaks=4")
hist(Temperature, breaks=20, main="With breaks=20")
In the above figure we see that the actual number of cells plotted is greater than we had specified.
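The number of cells R actually used can be read from the returned object instead of being counted in the plot. A minimal sketch, with h4 and h20 as illustrative variable names:

```r
Temperature <- airquality$Temp

# compute both histograms without drawing them
h4  <- hist(Temperature, breaks = 4,  plot = FALSE)
h20 <- hist(Temperature, breaks = 20, plot = FALSE)

# number of cells actually used for each suggestion
length(h4$counts)
length(h20$counts)

# the breakpoints R chose
h4$breaks
h20$breaks
```

R rounds the breakpoints to convenient values, which is why the number of cells can differ from the suggestion.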
We can also define the breakpoints between the cells as a vector. This makes it possible to plot a histogram with unequal intervals. In such a case, the area of a cell is proportional to the number of observations falling inside that cell.
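A sketch of a histogram with unequal intervals follows. The breakpoints are hypothetical values chosen for illustration. They must cover the whole range of the data, otherwise hist() reports an error.

```r
Temperature <- airquality$Temp

# illustrative breakpoints with unequal spacing that span the data
h <- hist(Temperature,
  main = "With unequal breaks",
  breaks = c(55, 60, 70, 75, 80, 100))

h$equidist   # FALSE, because the cells have different widths
```

When the breaks are not equally spaced, hist() plots densities by default, so the y-axis shows density rather than frequency.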