Getting started as a Tableau Server Administrator - Part 1
I’ve been managing our company’s Tableau Server environment for around 2 years now, although in the past my role was split with DevOps engineering work. It’s only been in the last year or so that I’ve really re-focused my efforts purely on being a Tableau Administrator. I thought a post about what I’ve learnt along the way would be useful for anyone who is either starting out or already an admin. If I had known earlier some of the things I know now, it would have saved a lot of time and given us a more robust platform sooner. I feel knowledge sharing is imperative; there is always something we can learn from others.
Who are Tableau?
Tableau provides a range of tools that allow companies to unlock the true value and potential of their data. These tools let people explore and understand their data, and provide a unified platform that works alongside their other tools to share insights with others, whether publicly or privately.
There are 4 main Tableau Software products:
- Tableau Desktop – Allows analysts to connect to data, create dashboards and insights
- Tableau Server – Usually installed on another server. It allows users to view, collaborate and share workbooks created in Tableau Desktop
- Tableau Online – Tableau-hosted version of Tableau Server removing the overhead of server hosting and maintenance costs
- Tableau Public* – A public-facing platform that allows users to create and publish workbooks to Tableau’s own servers for the public to view and interact with
*Tableau Public only connects to a subset of data sources and limits the number of rows
Tell me more about Tableau Server
In my opinion, the purpose of Tableau Server is to provide analysts a means to distribute their work for others to view. Usually the flow of getting a workbook on the server would be:
- Open Tableau Desktop
- Connect to data
- Create visualisations
- Connect to server/site
- If needed, publish workbook/data source
- View on Tableau Server or embedded webpage
There are obviously more detailed steps in between, but the above gives a high level overview of what is involved in getting a workbook onto Tableau Server.
What is it composed of?
Tableau Server is made up of many different cogs (some optional) that keep it running. I will do my best to explain the architecture without going into too much detail.
The main processes and what each one does:
- Gateway – A request handler powered by Apache. Any request sent to the server goes through here. You can only have 1 per machine. The default port is 80, but you can change this
- Application Server – Handles the web application and REST API calls, and supports browsing and searching
- VizQL Server – Loads and renders views, computes and executes queries
- Cache Server – An in-memory cache that speeds up the user experience across many scenarios
- Search & Browse – Deals with searching, retrieving and displaying the metadata on the server
- Backgrounder – Executes server tasks, including extract refreshes, subscriptions, and “Run Now” tasks, no matter where the request comes from. You can increase the number of backgrounder processes, but bear in mind the increased resources needed to run them
- Data Server – Manages connections to Tableau Server data sources
- Data Engine – Stores data extracts and answers queries
- File Store – Installed along with the Data Engine; controls the storage of extracts
- Repository – A database that stores server data. This data includes information about Tableau Server users, groups and group assignments, permissions, projects, data sources, and extract metadata and refresh information
A lot of these processes are limited by resources and will only scale up if certain criteria are met.
It looks a lot easier when you represent the architecture as a diagram. Many of the components interact with each other, some of which are internal and some external (e.g. connectors, SDKs, APIs)
Below is a small diagram of how a server looks from the front end (if you use the base functionality as there are more elements you could add in):
A Tableau Server instance can contain one or more sites, and within the server you can have n users. You can assign users to a single site or to multiple sites. Within a site you can also have groups. This makes it easier when you have many users across different areas and want to group them together. These groups can be used to make your permissions model clearer too. Rather than explicitly adding a user to each workbook/data source, you can assign a group to content instead. That way you can add and remove users from that group as you wish, and the permissions will follow from then on.
Projects are a funny thing; it all depends on what the admin considers a project. There may be times when a project becomes a site. The main thing to remember is that data sources and workbooks are published to projects within a site. You are not able to access data sources/workbooks from one project in another, so it’s only useful to use this split if you require a level of isolation and distinction between the workbooks/data sources. Below is another diagram so you have a better understanding of how everything links together.
- Can Workbook A access data source B? – No, Workbook A is in a different project to data source B
- Who has access to Workbook B? – User B and User C, as they are part of Group A, which has been assigned permissions to both the workbook and the data source
- What happens if we delete Group A? – User B and User C will no longer have access to any content, as their access came through Group A
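To make the three questions above a bit more concrete, here is a minimal sketch of the same model in Python. It is not how Tableau Server stores permissions internally; the `Content` class, the project names and the helper functions are all made up for illustration, and only Group A, User B, User C and the content names come from the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class Content:
    """A workbook or data source published to a project (illustrative only)."""
    name: str
    project: str
    allowed_groups: set = field(default_factory=set)


# Group membership taken from the diagram.
groups = {"Group A": {"User B", "User C"}}

# Project names are placeholders; the post does not name them.
workbook_a = Content("Workbook A", project="Project 1")
workbook_b = Content("Workbook B", project="Project 2", allowed_groups={"Group A"})
data_source_b = Content("Data source B", project="Project 2", allowed_groups={"Group A"})


def same_project(first: Content, second: Content) -> bool:
    """Apply the rule used in this post: content only reaches content in its own project."""
    return first.project == second.project


def users_with_access(item: Content, group_members: dict) -> set:
    """Resolve group-based permissions into the set of users who can see an item."""
    users = set()
    for group in item.allowed_groups:
        # A group that no longer exists contributes no users.
        users |= group_members.get(group, set())
    return users


# Can Workbook A access data source B?
print(same_project(workbook_a, data_source_b))

# Who has access to Workbook B?
print(sorted(users_with_access(workbook_b, groups)))

# What happens if we delete Group A? Model it as an empty membership table.
print(sorted(users_with_access(workbook_b, {})))
```

The useful part is that users are never attached to content directly, so removing someone from a group is the only change needed to remove their access.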
What do you want to do?
There are many reasons why you might want to use Tableau Server. I have outlined some scenarios which may help you pinpoint the way you organise your content:
Single site setup
- You have a small number of workbooks that can be split out easily into projects (e.g. department workbooks for marketing, sales and so on).
- You want to share your data sources across workbooks.
This is probably the best way to start out. If in the future you find that you start publishing a lot of workbooks, it can get a bit difficult to manage and keep track of them. It might be a better idea to move to a different site setup if you can’t logically split them across projects.
If you are happy with a single site setup but don’t want to worry about the actual server management, Tableau Online would be your best solution.
Multi site setup (small n of sites)
- If you want the highest level of isolation between sites.
- If you want to implement site level security (only give users access to a number of sites)
- If you don’t want to keep everything in a Default project or don’t want loads of projects (You might have a site per department and each department might have projects they can split workbooks across)
Multi site setup (large n of sites)
- The only time this option would be useful is if, as with a small n of sites, you want site level isolation of content and you have many logical clients you want to split the server between (e.g. if you were working with many other companies, each one could have its own site)
What do you need to run it?
Like all software, Tableau Server has hardware requirements. These are split into minimum required and minimum recommended. Required means the specification you need to match in order to install Tableau Server, and recommended is what you should have to run a production environment.
- Minimum required – 2-Core (4 vCPU)
- Minimum recommended – 8-Core (16 vCPU)
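If you want to sanity check a machine against the two CPU figures above (2 cores as the minimum required, 8 cores as the minimum recommended), a small helper like the one below could do it. This is only a sketch: the function name and the wording of the results are my own, and it looks at core count alone, not at memory or disk.

```python
MIN_REQUIRED_CORES = 2      # minimum required to install Tableau Server
MIN_RECOMMENDED_CORES = 8   # minimum recommended for a production environment


def classify_host(physical_cores: int) -> str:
    """Compare a machine's core count with the two thresholds quoted in this post."""
    if physical_cores < MIN_REQUIRED_CORES:
        return "below minimum required"
    if physical_cores < MIN_RECOMMENDED_CORES:
        return "installable, below production recommendation"
    return "meets production recommendation"


for cores in (1, 4, 8):
    print(cores, classify_host(cores))
```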
Tableau Server can be set up as a single node or as a multi-node cluster. For smaller deployments with a low level of user activity, extract refreshes and external requests, a single node setup would be fine.
If you are planning on having a high volume of complex workbooks with large/frequent data source refreshes and many concurrent users, it would be better to have a multi-node cluster where you can split some of the processes mentioned earlier across the machines. This balances the workload and maximises server resource utilisation without the risk of using it all up. You can also use a multi-node setup for high availability. Processes can be limited by your hardware as well (e.g. you cannot run 3 backgrounders on a 2-core machine).
Look here for examples of single/multi-node environments.
Logging and Monitoring
One of the things I wish I had known earlier is how useful logging and monitoring Tableau Server is. This involves not only application monitoring/logging but also host monitoring, which helps tie together any issues you might have (e.g. dropped sessions raised by the logs and explained by maxed out RAM). I’ve outlined below the main logs you should use:
- httpd – Apache logs. Look here for authentication entries. Very useful when you are using the REST API and/or trusted ticketing.
- vizqlserver – logs about loaded views and queries
- Backgrounder – Gives you information about server tasks such as refreshes, tabcmd and subscriptions.
- tabadmin.log – Contains logs about commands and processes that are run as part of tabadmin (e.g. starting and stopping of server, setting configurations, checking configurations)
Using a third-party logging agent such as LogEntries is very useful because, unless you load the data into another database, the logs won’t persist when you run a clean-up (to save disk space). LogEntries in particular has its own query language: you can read in the raw JSON log files and create custom dashboards and charts to give a visual representation of what’s going on. You can even set up triggers to alert you when certain criteria are met (e.g. within a 20 minute period there have been many dropped sessions).
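The trigger idea is simple enough to sketch outside of any particular logging tool. The Python below is not LogEntries syntax and does not parse real Tableau log files; it assumes you have already pulled out the timestamps of the events you care about (dropped sessions, say), and the threshold of 5 is just a stand-in for "many".

```python
from collections import deque
from datetime import datetime, timedelta


def should_alert(event_times, window=timedelta(minutes=20), threshold=5):
    """Return True if `threshold` or more events fall inside any span of length `window`."""
    recent = deque()
    for moment in sorted(event_times):
        recent.append(moment)
        # Discard events that are further than `window` behind the newest one.
        while moment - recent[0] > window:
            recent.popleft()
        if len(recent) >= threshold:
            return True
    return False


# Made-up timestamps for illustration.
start = datetime(2024, 1, 1, 9, 0)
burst = [start + timedelta(minutes=3 * i) for i in range(5)]   # 5 events in 12 minutes
spread = [start + timedelta(hours=i) for i in range(5)]        # 5 events over 4 hours

print(should_alert(burst))
print(should_alert(spread))
```

A sliding window like this only looks at how close together the events are, so a steady trickle of dropped sessions over a day will not trigger it, but a sudden cluster will.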
Tips and Tricks
There are a number of tips and tricks that I have picked up during my time as a Tableau Server admin:
- When you purchase a Tableau Server license, you are actually able to install up to 3 instances; 1 production and 2 non-production. This can be useful for setting up a development environment or sandbox for new versions of Tableau Server
- You can gain access to the underlying PostgreSQL database to create your own customised views
- You can export and import individual sites between servers, though they must be on the same minor version (you cannot go from 10.1 to 10.2, but you can go from 10.1.4 to 10.1.10)
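The version rule in the last tip is easy to get wrong when you are looking at two version strings side by side, so here is a small sketch of the check in Python. The function names are hypothetical and this is not part of any Tableau tooling; it simply compares the first two parts of each version and ignores the maintenance number.

```python
def minor_release(version: str) -> tuple:
    """Return the (major, minor) pair of a version string such as '10.1.4'."""
    parts = [int(part) for part in version.split(".")]
    if len(parts) < 2:
        raise ValueError(f"expected at least major.minor, got {version!r}")
    return parts[0], parts[1]


def can_move_site(source_version: str, target_version: str) -> bool:
    """Sites can move between servers only when both share the same minor release."""
    return minor_release(source_version) == minor_release(target_version)


print(can_move_site("10.1.4", "10.1.10"))
print(can_move_site("10.1", "10.2"))
```

Comparing numbers rather than raw strings matters here, because a plain string comparison would treat 10.1.10 and 10.1.4 in ways you would not expect.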