Although Tableau is best known as a tool for quickly visualizing data, it can also be used to create and verify linear regression models for predictive analytics. Because Tableau integrates with external statistical languages such as Python and R, regression models built in those languages can be used directly in Tableau.
Integration of R and Tableau
Download and install software:
To integrate R with Tableau, we need R and RStudio:
R download link: https://cran.r-project.org/bin/windows/base/
RStudio download link: https://www.rstudio.com/products/rstudio/download/
We also need Tableau Desktop: https://www.tableau.com/products/desktop
Open RStudio and type the commands below at the R console:
install.packages("Rserve");
library(Rserve);
Rserve()
Open Tableau Desktop and go to Help menu -> Settings and Performance -> Manage External Service Connection
Select Localhost and port 6311
Click Test Connection, then OK.
Development of Linear Regression Model:
After integrating Rserve and Tableau, we are ready to embed the R code that builds the linear regression model in Tableau calculated fields.
The sample data used here is an open dataset available for download from Duke University’s website: http://www2.stat.duke.edu/~mc301/data/movies.html
The data contains a sample of 651 movies, with their reviews, critics' scores, etc. (The data dictionary is also available at the link above.)
Let us develop a regression model that predicts the audience score (the dependent variable) from independent variables such as IMDB Rating or Critics Score.
We will first examine the relationships among these variables using scatter plots in Tableau:
The figure above shows two plots:
IMDB votes Vs Audience Score
Critics Score Vs Audience Score
Clearly, Critics Score has a stronger linear relationship with Audience Score. This means that Critics Score is a better predictor of Audience Score than IMDB Votes.
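The visual comparison can also be checked numerically. The following is a minimal sketch in R that computes the Pearson correlation of each candidate predictor with the response. The column names (audience_score, critics_score, imdb_num_votes) and the helper compare_predictors are assumptions made for illustration, and the data frame holds toy values, not rows from the movies dataset.

```r
# Toy data for illustration only; these values are NOT from the movies dataset.
# Column names are assumed and may differ from the real data dictionary.
toy <- data.frame(
  audience_score = c(55, 62, 70, 81, 90, 45),
  critics_score  = c(50, 60, 72, 85, 93, 40),
  imdb_num_votes = c(12000, 800, 50000, 3000, 150000, 2000)
)

# Hypothetical helper: Pearson correlation of each predictor with the response.
compare_predictors <- function(df, response, predictors) {
  sapply(predictors, function(p) {
    cor(df[[response]], df[[p]], use = "complete.obs")
  })
}

cors <- compare_predictors(toy, "audience_score",
                           c("critics_score", "imdb_num_votes"))
print(round(cors, 3))
```

A predictor whose correlation is closer to 1 or -1 has a stronger linear relationship with the response, which is what the scatter plots suggest for Critics Score.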
Let us write a calculated field called “Predicted Audience Score”.
Tableau’s SCRIPT_REAL function can be used to embed R or Python code in a Tableau calculation.
Here we have used Critics Score to Predict Audience Score.
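The original calculation is not reproduced in the text, so the following is only a sketch of what such a calculated field might look like, not the author's exact code. It assumes Tableau fields named [Audience Score] and [Critics Score]. SCRIPT_REAL passes its arguments to R as vectors named .arg1, .arg2, and so on, and expects a result with one value per row. The function predict_audience_score is a hypothetical wrapper so the same logic can be run outside Tableau.

```r
# Hypothetical Tableau calculated field "Predicted Audience Score"
# (field names are assumed):
#
#   SCRIPT_REAL("
#     fit <- lm(.arg1 ~ .arg2, na.action = na.exclude)
#     as.numeric(fitted(fit))
#   ", SUM([Audience Score]), SUM([Critics Score]))
#
# The function below reproduces the same R logic for use outside Tableau.
predict_audience_score <- function(audience, critics) {
  # na.exclude keeps the output the same length as the input by
  # returning NA for rows with missing values.
  fit <- lm(audience ~ critics, na.action = na.exclude)
  as.numeric(fitted(fit))
}

# Toy values for illustration only (not from the movies dataset).
audience <- c(55, 62, NA, 81, 90)
critics  <- c(50, 60, 72, 85, 93)
predicted <- predict_audience_score(audience, critics)
print(predicted)
```

Using na.action = na.exclude with fitted() is a design choice in this sketch: the result then has one entry per input row even when some scores are missing, which matches what SCRIPT_REAL expects.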
Let us plot the Predicted Audience Score against the Audience Score.
As the plot shows, the Predicted Audience Score plotted against the Audience Score comes out as a perfect straight line.
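One way to quantify how closely the predictions track the actual values is R-squared. The sketch below uses toy values rather than the movies data, and the variable names are assumptions for illustration. It also checks that, for a simple linear regression with an intercept, R-squared equals the squared correlation between predicted and actual values.

```r
# Toy values for illustration only (not from the movies dataset).
audience <- c(55, 62, 70, 81, 90, 45)
critics  <- c(50, 60, 72, 85, 93, 40)

fit <- lm(audience ~ critics)
predicted <- fitted(fit)

# R-squared as reported by lm().
r_squared <- summary(fit)$r.squared

# For simple linear regression with an intercept, this equals the squared
# correlation between predicted and actual values.
r_squared_check <- cor(predicted, audience)^2

print(c(r_squared = r_squared, r_squared_check = r_squared_check))
```

A value close to 1 indicates that the points in the predicted-versus-actual plot lie close to a straight line.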
Conclusion:
The example above is a very basic demonstration of building a simple linear regression model using Tableau and R.
A more advanced linear regression model has been developed in R and is available at the GitHub URL below:
https://github.com/shashibhushan86/Linear_Regression/blob/master/reg_model_project.Rmd