Correlation Coefficient in Power BI: The Easy Way
Apr 24, 2024Correlation Coefficient in Power BI: The Easy Way
Correlation is a measure of how strongly two variables are related. It tells us how two variables “co”-relate with each other. It is a useful metric in data analysis as it provides us with a way to predict the value of a variable given another.
For example, we may observe that when we sleep for longer periods, we tend to be more productive during the day. This means there is a relationship between hours of sleep, and another variable, productivity. The more sleep we have, the higher our productivity.
In business, you will also see that when a product is expensive you are less likely to purchase it. Thus, we can say that when the price of a product is high, the quantity of the product sold will, on average, be low.
When two variables move in the same direction, they are said to have a positive relationship or are positively correlated (sleep and productivity). And when they are moving in opposite directions, they are negatively correlated (price and quantity). But sometimes you want to know the extent to which they are correlated.
Correlation coefficient is the statistical measure to quantify the strength of this relationship. The most common type of correlation coefficient is the Pearson correlation coefficient, usually represented with (r) and is computed with the following formula:
Correlation Coefficient: Simple Definition, Formula, Easy Calculation Steps (statisticshowto.com)
Now, you don’t need to know the scary formula to calculate correlation coefficient. In Excel, there is a function, CORREL, that computes the coefficient. However, in Power BI, we do not have the CORREL function. Typically, most people try to replicate that gigantic formula in Power BI to compute the correlation coefficient.
In this article, I show you how to do this without replicating the formula. Rather, we use the regression parameter outputs of the LINEST function in DAX (Data Analysis Expressions) to achieve the desired results.
R-Squared and the linear regression equation
From elementary statistics or Math, we must have seen this function: y=mx + c. The equation of a line. Which basically shows the relationship between ‘y’ and ‘x’. The parameter that tells us how much y is affected by x is m also known as the slope.
To calculate this slope, we estimate a linear regression equation using a statistical method called least squares estimation which you can do in any statistical software. The software will usually give us the slope value and a bunch of other parameters. We are not focusing on the technical details of regression in this article. However, for this computation of the correlation coefficient we are interested in one of these many parameters called R-Squared.
R-Squared measures how good our equation best describes the dependency between y and x. It has its own complex formula but as the name it implies, it is really the square of “R”. What is R? The correlation coefficient!
This means if we can get the R-Squared, we can take the square root of that to get our correlation coefficient. That is the focus of this article. Since we are not using statistical software. How then do we compute this R-Squared? Introducing the LINEST function in Power BI.
The LINEST function in Power BI
The LINEST function is a DAX function that uses the least squares method to compute the line that best fits the relationship between two variables.
Let’s look at the sample data for this exercise. We have a list of prices and the respective quantities sold. Plotting these values in a scatterplot we see that there is a negative relationship between price and quantity shown by the line sloping downwards in the chart. When the price is high, the quantity sold tends to be lower.
In Excel, to compute the correlation coefficient for this dataset, we would simply use the CORREL function. Using this data, we get a correlation coefficient of -0.46. The negative sign tells us that price and quantity are negatively correlated. And 0.46 tells us that it’s a moderately strong negative relationship. The closer to 1, the stronger the correlation.
correlation co-efficient in excel
With this data in Power BI, let’s see what the LINEST function gives us. The LINEST function is a table function, meaning that the output of the function is a table and not a scalar value like the output of the SUM or MAX function.
To see the results of this function we have to write DAX queries in the DAX Query view (what is DAX query view?). The LINEST function requires only two compulsory arguments, the X-Values and the Y-values. For this exercise, we are not particular about which is Y or X because we are not interested in the effect of Y on X, only the strength of the correlation.
The result of the LINEST function gives us the following. A table with a lot of columns. We are only interested in two columns, the slope and the R-Squared. The R-Squared is also known as the Coefficient of Determination and in the table it is called “CoefficientOfDetermination”.
Focusing on these two metrics, we get a slope of -0.24 and an R-Squared of 0.21. With these two values, we are to calculate; the square root of the R-Squared which gives the value correlation coefficient. But remember, that a square root is +/-. The square root therefore gives us the strength of the relationship but not the direction of the relationship. The sign of the slope is how we determine the direction of the relationship. If it’s positive, it’s a positive relationship other wise it’s negative. With these two, we can now calculate the correlation coefficient.
The DAX Measure
We can start creating the measure in bits. First let’s get the value of the R-Squared (CoefficientOfDetermination).
We start by storing the result of LINEST in a table. Next, we extract the value in the “CoefficientOfDetermination” column from the LINEST results using SELECTCOLUMNS and MAXX. We repeat the process for the slope by extracting the “Slope1” column from the LINEST results.
Next, we compute the square root of the R-Squared using the SQRT function.
Now we have the value of the correlation coefficient. The only thing left to do is to determine the direction of the relationship from the slope. To do that we use the SIGN function.
To get the final result we simply multiply the Direction with the square root of the R-squared and we have our correlation coefficient.
Conclusion
And there you have it. With a little bit of understanding of statistics and some simple DAX functions, we can calculate the correlation coefficient easily without having to write many lines of DAX code and replicate a complex formula.