
Tableau Prep Builder is a Tableau product designed to help anyone quickly and confidently combine, shape, and clean data for analysis. You start by connecting to your data from a variety of files, servers, or Tableau extracts, and you can combine data from multiple sources. Drag and drop or double-click tables to bring them into the flow pane, then clean and shape the data with operations such as filter, split, rename, pivot, join, and union.

Each step in the process is represented visually in a flow chart that you create and control. You can check your work and make changes at any point in the flow, and Tableau Prep Builder validates each operation.

Before the current release of Tableau Prep, users repeatedly asked for a way to use a scripting language to transform data, look up additional information in remote sources, and run complex machine learning algorithms on their inputs. Tableau looked into these requirements and came up with a new feature. With the release of Tableau Prep Builder 2019.3.1, Tableau introduced Python and R integration, which brings advanced scenarios to Tableau Prep. With this feature, users can plug their scripts in at any step of the flow and use the full power of their scripting language as their requirements demand.

Setting up TabPy

TabPy, short for Tableau Python Server, is an open source tool that Tableau Prep Builder uses to execute Python code. It is the component through which the Python integration takes place. Script execution is an advanced feature, so TabPy needs to be set up first. See the TabPy installation instructions for the detailed installation process.

If everything has been set up as expected, you will find a port number at the very bottom of the startup output. You need this port to connect to the TabPy server. By default, the port is 9004 and the server name is localhost.
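Before pointing Prep Builder at the server, it can help to confirm that something is actually listening on that port. The snippet below is a minimal sketch using only Python's standard library. The function name `tabpy_is_reachable` is our own, not part of TabPy, and the check only confirms that the port accepts connections, not that the process behind it is TabPy.

```python
import socket


def tabpy_is_reachable(host: str = "localhost", port: int = 9004,
                       timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError if the port is closed or unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Defaults match the values mentioned above: localhost and port 9004.
    print("TabPy port open:", tabpy_is_reachable())
```

If this prints `False`, check that the TabPy process is still running and that it was not started on a different port.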

 

SVM classifier implementation

One of the most popular machine learning classification algorithms is the Support Vector Machine (SVM) classifier. It is often used for multi-class classification problems. For example:

  • Given fruit features such as color, size, taste, weight, and shape, predicting the fruit type.
  • Analysing the skin to predict different skin diseases.
  • Given Google News articles, predicting the topic of each article, such as sports, movies, or tech news.
  • Classifying Twitter replies into different categories for sentiment analysis.

We are going to use the Iris dataset to implement the SVM classifier. The Iris dataset was first used in Fisher’s classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems. The dataset has four features of the iris flower and one target class. The four features are SepalLengthCm, SepalWidthCm, PetalLengthCm and PetalWidthCm. The target class is the flower species, which has three types: Setosa, Versicolor and Virginica.
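To get a feel for the data, here is a small sketch that builds the same table in pandas. It assumes scikit-learn is installed and uses its bundled copy of the Iris data. The helper name `load_iris_frame` is ours, and the columns are renamed to match the feature names above, since scikit-learn ships the data with different labels.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_frame() -> pd.DataFrame:
    """Return the Iris data as a DataFrame with four features and a Species column."""
    iris = load_iris()
    columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
    df = pd.DataFrame(iris.data, columns=columns)
    # iris.target holds class codes 0, 1, 2; map them to capitalized species names.
    df["Species"] = pd.Series(iris.target_names[iris.target]).str.capitalize()
    return df


iris_df = load_iris_frame()
print(iris_df.head())
print(iris_df["Species"].value_counts())
```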

The goal is to train the SVM classifier on the iris features and then use the trained model to predict the Iris species type. Now let’s see how this can be done.

You can do this by connecting the data to Prep Builder and adding a prewritten Python script to the flow. The script reads the data from Tableau Prep and runs the SVM classifier.

The ‘train data’ is used to train the SVM model, and the flower species is predicted for the ‘test data’. The result is sent back to Prep Builder, where we can use it for further analysis.
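A script of this kind could look roughly like the sketch below. It is not the exact script used in this post. It assumes scikit-learn and pandas are installed in the TabPy environment, and that the input has the four feature columns plus a Species column. The function name `predict_species`, the output column `PredictedSpecies`, the 70/30 split, the fixed seed and the linear kernel are all illustrative choices. Prep Builder passes the step's data to the function as a pandas DataFrame and expects a DataFrame back.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

FEATURES = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
TARGET = "Species"


def predict_species(df: pd.DataFrame) -> pd.DataFrame:
    """Train an SVM on part of the rows and predict the species for the rest."""
    # Hold out 30% of the rows as test data, keeping the species proportions.
    train, test = train_test_split(
        df, test_size=0.3, random_state=42, stratify=df[TARGET]
    )

    model = SVC(kernel="linear")
    model.fit(train[FEATURES], train[TARGET])

    # Return only the test rows, with the prediction next to the true species.
    result = test[FEATURES + [TARGET]].copy()
    result["PredictedSpecies"] = model.predict(test[FEATURES])
    return result.reset_index(drop=True)


def get_output_schema() -> pd.DataFrame:
    # prep_decimal() and prep_string() are supplied by Tableau Prep at run time.
    # They cannot be imported, so this function only works inside a flow.
    return pd.DataFrame({
        "SepalLengthCm": prep_decimal(),
        "SepalWidthCm": prep_decimal(),
        "PetalLengthCm": prep_decimal(),
        "PetalWidthCm": prep_decimal(),
        "Species": prep_string(),
        "PredictedSpecies": prep_string(),
    })
```

In the script step you would point Prep Builder at this file and enter `predict_species` as the function name. Because the output has a column the input does not, `get_output_schema` is what tells Prep Builder about the new `PredictedSpecies` field.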

With this new scripting support in Prep Builder, it’s now easier than ever to implement complex data transformation scenarios that go well beyond the built-in capabilities of Prep. The scripting feature can be used for anything from simple calculations to complex machine learning models and fetching data from the internet.