R Programming By Example

Understanding interactions with correlations

The correlation is a measure of the linear relation between two variables. Its value ranges from -1, representing a perfect inverse relation, to 1, representing a perfect direct relation; a value of 0 means there is no linear relation. Just as we created a matrix of scatter plots, we will now create a matrix of correlations, and the resulting graph is shown below. Large circles mean high absolute correlation. Blue circles mean positive correlation, while red circles mean negative correlation.
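To make the range of values concrete, here is a minimal sketch using toy vectors (they are invented for illustration and are not part of the book's data). It also compares cor() against a hand-written Pearson formula; the helper name pearson is hypothetical.

```r
# Toy vectors, invented for illustration only.
x <- c(1, 2, 3, 4, 5)
y_direct  <- 2 * x + 1    # perfect direct linear relation
y_inverse <- -3 * x + 10  # perfect inverse linear relation

cor(x, y_direct)   # equals 1, up to floating-point error
cor(x, y_inverse)  # equals -1, up to floating-point error

# Hypothetical helper: Pearson's correlation written out by hand.
pearson <- function(a, b) {
  a_centered <- a - mean(a)
  b_centered <- b - mean(b)
  sum(a_centered * b_centered) /
    sqrt(sum(a_centered^2) * sum(b_centered^2))
}

# The hand-written version should agree with cor(), which uses
# Pearson's method by default.
all.equal(pearson(x, y_direct), cor(x, y_direct))
```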

To create this plot, we will use the corrplot() function from the corrplot package and pass it the correlation matrix computed by the cor() function in R. Optionally, we can also pass some parameters for the text labels (tl), such as color (col) and size (cex).

Variable Correlations

Now, let's look at the following code:

# Plot the correlation matrix with black, slightly smaller text labels
library(corrplot)
corrplot(corr = cor(data_numerical), tl.col = "black", tl.cex = 0.6)
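If you want to try the same call without the book's data, the sketch below uses the built-in mtcars data set as a stand-in for data_numerical (that substitution is an assumption made for illustration). It first inspects the matrix that cor() returns, then draws the plot only if the corrplot package is installed.

```r
# mtcars ships with R; it stands in for data_numerical here (an assumption).
corr_matrix <- cor(mtcars)

dim(corr_matrix)                 # one row and one column per variable
isSymmetric(corr_matrix)         # cor(a, b) is the same as cor(b, a)
round(corr_matrix[1:3, 1:3], 2)  # peek at the top-left corner

# Draw the plot only when the corrplot package is available.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(corr = corr_matrix, tl.col = "black", tl.cex = 0.6)
}
```

Every variable is perfectly correlated with itself, so the diagonal of the matrix is all ones, which is why the diagonal of the plot shows the largest circles.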

If we look at the relation between the Proportion variable and the other variables, variables in large blue circles are positively correlated with it, meaning that as that variable increases, Proportion tends to increase as well. For examples of this type, look at the relations of AdultMeanAge and NoQuals with Proportion. If we find large red circles between Proportion and other variables, it means that as that variable increases, Proportion tends to decrease. For examples of this type, look at the relations of Age_25to29, Age_30to44, and L4Quals_plus with Proportion.
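Reading circle sizes by eye can be complemented by sorting the correlations with a single variable. The following is a minimal sketch under stated assumptions: it uses the built-in mtcars data set in place of data_numerical, and its mpg column in place of Proportion, so the variable names are stand-ins rather than the book's.

```r
# Stand-in data: mtcars replaces data_numerical, "mpg" replaces "Proportion".
corr_matrix <- cor(mtcars)
target <- "mpg"

# Take the target's column and drop its correlation with itself.
with_target <- corr_matrix[, target]
with_target <- with_target[names(with_target) != target]

# Sort from most positive to most negative correlation.
sorted <- sort(with_target, decreasing = TRUE)

head(sorted, 3)  # strongest positive relations (large blue circles)
tail(sorted, 3)  # strongest negative relations (large red circles)
```

With the book's data, the same steps would apply after replacing mtcars with data_numerical and setting target to "Proportion".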