Tableau Session 03: Advanced Analytics

ADVANCED CALCULATIONS
DATE FUNCTIONS
COHORT ANALYSIS
SPATIAL ANALYTICS
DATA MODELING
TABLEAU PREP

Learning Goals

  • Use advanced Tableau functions for analytical modeling
  • Build complex calculated fields
  • Apply table calculations (running totals, percent of total, ranking, differences, moving averages)
  • Work with advanced date logic and date parameters
  • Build cohort and retention analysis views
  • Perform spatial analysis and spatial joins
  • Connect and visualize geographic data
  • Apply Tableau data modeling concepts from Session 2 in more complex use cases
  • Build advanced KPI dashboards and analytical heatmaps

In previous sessions, we focused on building visualizations, working with filters and parameters, and creating calculated fields. We also introduced key concepts such as relationships, joins, and Level of Detail expressions, which form the foundation of analytical work in Tableau.

In this session, we move from building charts to building analytical logic inside Tableau. Instead of focusing only on how data is displayed, we focus on how calculations are performed, how different components interact, and how to ensure that results remain accurate across different analytical scenarios.

This shift allows us to move from simple dashboards to more advanced analytical systems, where calculations, date logic, and data modeling work together to answer more complex business questions.

Advanced Table Calculations

In real analytical scenarios, calculations rarely rely on a single function. Instead, they combine multiple layers of logic, including aggregation, table calculations, conditional expressions, and sometimes date logic. These are referred to as complex calculations.

Complex calculations are used when simple aggregations such as SUM or AVG are not sufficient to answer analytical questions. They allow analysts to model behavior such as growth rates, comparisons across time, cumulative metrics, and conditional ranking.

A key characteristic of complex calculations is that they often combine multiple computation stages, where:

  • Some parts are calculated at the row or aggregate level
  • Other parts are computed as table calculations after aggregation

Understanding this layered computation is essential for building correct analytical models.


Combining Aggregation with Table Calculations

One of the most common patterns in complex calculations is combining aggregation with table calculations.

For example, calculating the cumulative share of the total over time:

RUNNING_SUM(SUM([Order Total])) / WINDOW_SUM(SUM([Order Total]))

This calculation combines:

  • Aggregation: SUM([Order Total])
  • Running accumulation: RUNNING_SUM
  • Total window comparison: WINDOW_SUM

This type of calculation is often used to understand cumulative contribution.
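
To make the two stages concrete, here is a minimal Python sketch of what the calculation above produces for one partition. It is an illustration under stated assumptions, not Tableau's implementation: the list monthly_totals is hypothetical and stands in for SUM([Order Total]) at each mark.

```python
from itertools import accumulate

# Hypothetical per-period totals, standing in for SUM([Order Total]) at each mark.
monthly_totals = [100.0, 300.0, 200.0, 400.0]

# RUNNING_SUM: accumulate along the addressing direction.
running = list(accumulate(monthly_totals))

# WINDOW_SUM with no offsets: the same partition total for every mark.
window_total = sum(monthly_totals)

# Cumulative share of the total at each mark.
cumulative_share = [value / window_total for value in running]

print(cumulative_share)  # [0.1, 0.4, 0.6, 1.0]
```

The last mark always reaches 1.0 (100%), which is a quick sanity check that the partition is configured as intended.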


Nested Table Calculations

Tableau allows nesting of table calculations, where one table calculation is used inside another.

For example, calculating the difference between consecutive running totals:

RUNNING_SUM(SUM([Order Total])) - LOOKUP(RUNNING_SUM(SUM([Order Total])), -1)

This calculation combines:

  • A running total
  • A lookup to the previous value

Nested calculations like this are powerful but require careful configuration of partitioning and addressing.


Conditional Table Calculations

Complex calculations often include conditional logic to control behavior dynamically.

For example, showing only positive growth:

IF SUM([Order Total]) - LOOKUP(SUM([Order Total]), -1) > 0 THEN
    SUM([Order Total]) - LOOKUP(SUM([Order Total]), -1)
END

This type of logic allows analysts to highlight specific patterns such as growth periods.


Window Functions

Window functions perform calculations across a defined range of data within a partition.

These are essential for complex analytics.

Moving Average

WINDOW_AVG(SUM([Order Total]), -3, 0)

Calculates average over the last 4 periods (offset -3 to 0 means the current period plus 3 previous periods = 4 total).
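
The offsets can be illustrated with a short Python sketch. It assumes, as a simplification, that a window reaching past the start of the partition is shortened to the marks that exist; the function window_avg and the sample values are hypothetical.

```python
def window_avg(values, start_offset, end_offset):
    """Average over the marks from start_offset to end_offset, relative to each mark."""
    result = []
    for i in range(len(values)):
        first = max(0, i + start_offset)               # do not reach before the partition
        last = min(len(values) - 1, i + end_offset)    # do not reach past the partition
        window = values[first:last + 1]
        result.append(sum(window) / len(window))
    return result

# Hypothetical per-period totals, standing in for SUM([Order Total]).
totals = [10.0, 20.0, 30.0, 40.0, 50.0]
moving = window_avg(totals, -3, 0)

print(moving)  # [10.0, 15.0, 20.0, 25.0, 35.0]
```

Only the fourth and fifth values are averages over a full 4-period window. The first three use fewer periods, which is worth keeping in mind when reading the start of a moving-average line.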


Window Sum

WINDOW_SUM(SUM([Order Total]))

Calculates total within the partition.


Window Max

WINDOW_MAX(SUM([Order Total]))

Finds the maximum value within the partition (or within a narrower window, if start and end offsets are supplied).


Difference and Percent Difference

Complex calculations often involve comparing values across time or categories.

Difference

SUM([Order Total]) - LOOKUP(SUM([Order Total]), -1)

Percent Difference

(SUM([Order Total]) - LOOKUP(SUM([Order Total]), -1)) 
/ LOOKUP(SUM([Order Total]), -1)

These calculations are used for:

  • Growth analysis
  • Trend comparison
  • Performance tracking
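
The following Python sketch mirrors the Difference and Percent Difference calculations for a single partition. The helper lookup is a hypothetical stand-in for LOOKUP and returns None when the offset points outside the partition, which is why the first period has no result.

```python
def lookup(values, index, offset):
    """Value at a relative position, or None if that position is outside the partition."""
    target = index + offset
    return values[target] if 0 <= target < len(values) else None

# Hypothetical per-period totals, standing in for SUM([Order Total]).
totals = [200.0, 250.0, 225.0]

difference = []
percent_difference = []
for i, current in enumerate(totals):
    previous = lookup(totals, i, -1)
    if previous is None:
        # No previous period to compare against.
        difference.append(None)
        percent_difference.append(None)
    else:
        difference.append(current - previous)
        percent_difference.append((current - previous) / previous)

print(difference)          # [None, 50.0, -25.0]
print(percent_difference)  # [None, 0.25, -0.1]
```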

Multi-Level Calculations

Complex calculations often operate across multiple levels of detail within the same view.

For example:

  • Category-level aggregation
  • Within-region ranking
  • Across-time accumulation

This requires careful control of partitioning and addressing to ensure that calculations are applied correctly at each level.


Interaction with View Structure

One of the most important aspects of complex table calculations is that they are highly dependent on the structure of the view.

Changes such as:

  • Adding a dimension
  • Changing sort order
  • Modifying layout

can significantly alter the result.

Because of this, it is important to always validate:

  • Partitioning fields
  • Addressing fields
  • Compute Using configuration

Table Calculations

When you add a table calculation, you must account for all dimensions in the level of detail.
Each dimension must be used either for:

  • Partitioning (scoping), or
  • Addressing (direction)

Partitioning Fields (Scope)

Partitioning fields define how the data is grouped before the table calculation is applied.

  • They break the view into multiple sub-tables (partitions)
  • The table calculation is performed separately within each partition
  • They determine the scope of the calculation

In other words, partitioning controls where the calculation resets.


Addressing Fields (Direction)

Addressing fields define how the calculation moves within each partition.

  • They determine the direction of the calculation
  • They control the sequence of marks used in calculations such as:
    • Running totals
    • Differences between values
    • Percent change

In short, addressing controls how the calculation progresses.


How Partitioning and Addressing Work Together

  1. Partitioning fields split the view into multiple sub-views (sub-tables).
  2. The table calculation is applied independently inside each partition.
  3. The addressing fields determine the direction in which the calculation moves through the marks within each partition.

For example:

  • In a running total, addressing determines the order of accumulation.
  • In a difference calculation, addressing determines which value is compared to which.
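
As a rough mental model, the three steps can be sketched in Python. This is an assumption-laden illustration rather than Tableau's engine: the marks list is hypothetical, Region plays the role of the partitioning field, and Year plays the role of the addressing field.

```python
from itertools import accumulate, groupby

# Hypothetical marks in the view: (region, year, sales).
marks = [
    ("East", 2022, 50),
    ("West", 2021, 10),
    ("East", 2021, 30),
    ("West", 2022, 40),
    ("East", 2023, 20),
]

def running_sum_by_region(marks):
    """Partition by region, move along year, and accumulate sales."""
    # Sorting by (region, year) groups each partition and fixes the addressing order.
    ordered = sorted(marks, key=lambda m: (m[0], m[1]))
    result = {}
    for region, group in groupby(ordered, key=lambda m: m[0]):
        group = list(group)
        # The running total starts again for every partition.
        for (_, year, _), total in zip(group, accumulate(m[2] for m in group)):
            result[(region, year)] = total
    return result

totals = running_sum_by_region(marks)
print(totals[("East", 2023)])  # 100
print(totals[("West", 2021)])  # 10
```

The West partition starts again at 10 instead of continuing from East, which is the reset that partitioning controls.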

Running Total Example

Difference Calculation Example

Common Table Calculations

Table calculations are widely used in analytical dashboards to understand trends, comparisons, and rankings.

Running Total

Running totals accumulate values across a defined direction.

RUNNING_SUM(SUM([Order Total]))

This is commonly used to track cumulative metrics such as revenue growth over time.


Percent of Total

Percent of total calculates each value’s contribution relative to the total.

SUM([Order Total]) / TOTAL(SUM([Order Total]))

This is useful for understanding share distribution across categories or regions.


Rank

Ranking assigns a position to values based on a selected measure.

RANK(SUM([Order Total]))

This is often used for identifying top or bottom performers.
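
A small Python sketch shows how ties behave, assuming the default descending order and competition-style ranking (tied values share a rank and the next rank is skipped). The function rank_desc and the sample values are hypothetical.

```python
def rank_desc(values):
    """Competition rank, descending: 1 + the number of strictly larger values."""
    return [1 + sum(other > value for other in values) for value in values]

# Hypothetical totals, standing in for SUM([Order Total]) per category.
totals = [500, 300, 500, 100]
ranks = rank_desc(totals)

print(ranks)  # [1, 3, 1, 4]
```

Both categories with 500 are ranked 1, and the next category is ranked 3, not 2.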


Specific Dimensions vs Compute Using

When using Compute Using options, Tableau automatically assigns some dimensions as:

  • Partitioning fields
  • Addressing fields

However, when selecting Specific Dimensions, you must manually decide:

  • Which dimensions define the partition (scope)
  • Which dimensions define the addressing (direction)

Rank: The direction of movement

Rank: Specific Dimension

In the Specific Dimensions section of the Table Calculation dialog:

  • The order of fields determines the direction of movement through the marks: Pane (Down)
  • Checked dimensions are the addressing fields and define how Tableau computes the table calculation across the view: Year of Order Date, Region. Unchecked dimensions become partitioning fields.

Tip

A helpful mental model:

  • Partitioning = Where does the calculation reset?
  • Addressing = In what order does it move?

Understanding this distinction is essential for correctly configuring running totals, rankings, percent differences, and other table calculations.


More Examples of Table Calculations

Using Table Calculations on Marks card fields

When you place a table calculation on Color on the Marks card, the table still displays the measure in the view, SUM(Revenue), while the table calculation on Color, based on SUM(Total Quantity), determines how each of those values is colored.

Marks Card Table Calculation

Year-over-year growth is another common use case. If you place SUM(Revenue) on the view and apply a table calculation for year-over-year growth, Tableau will compute the change in revenue across periods based on the addressing direction.

Year over Year Growth Calculation

Year over Year Growth

Using Table Calculations on Rows or Columns fields

When you place a table calculation on the Rows or Columns shelf, Tableau modifies the structure of the visualization.

For example, adding a moving average to SUM(Revenue) introduces a smoothed trend line that helps identify patterns over time.

Moving Average Example

Using Table Calculations on Filters

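When you place a table calculation on the Filters shelf, Tableau applies the filter after aggregation and after the table calculation itself has been computed. The filter hides marks from the view without changing the values the calculation produced.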

For example, applying a ranking calculation allows you to filter the view to show only the Top N values based on a measure such as Revenue.

This is commonly used in scenarios where users want to focus on the highest-performing categories, customers, or regions.


Date Functions

Dates are a common element in most data sources.
If a field contains recognizable dates, Tableau automatically assigns it a Date or Date & Time data type and enables special functionality.

When date fields are used in visualizations, Tableau provides:

  • Automatic date hierarchy (Year → Quarter → Month → Day)
  • Date-specific filtering options
  • Continuous and discrete date options
  • Specialized date formatting

Date functions are used to manipulate date values, not just change how they are displayed.

If you only want to change appearance (for example, show 01/09/24 instead of September 01, 2024), use formatting — not a calculation.

Date Formatting Action 1

Date Formatting Action 2

Date Parts

Many date functions use a date_part argument.

Common date parts include:

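  • 'year', 'quarter', 'month', 'week', 'day'
  • 'weekday', 'dayofyear'
  • 'hour', 'minute', 'second'
  • 'iso-year', 'iso-quarter', 'iso-week', 'iso-weekday'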

Core Date Functions

DATE

DATE converts a string, number, or date expression to a date. It can also be used to truncate a datetime to a date.

DATE(expression)

DATEADD

DateAdd adds a specified time interval to a date.

DATEADD(date_part, interval, date)

Example: Calculate Expected Delivery Date

Objective: Add 5 days to Order Date to estimate delivery.

DATEADD('day', 5, [Order Date])

Expected Delivery Date

Example: Rolling 12-Months Window

Objective: Filter records from the last 12 months dynamically.

[Order Date] >= DATEADD('month', -12, TODAY())

Filtering Last 12 Months
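
The rolling-window condition can be mirrored in Python as a sketch. It assumes that shifting by months keeps the day of the month and clamps it to the last valid day when needed (for example, March 31 minus one month becomes the last day of February), and it takes today as an argument so the result is reproducible. The helper names are hypothetical.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's last day."""
    index = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(index, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(d.day, last_day))

def in_last_12_months(order_date, today):
    """Same comparison as the filter: order date on or after (today - 12 months)."""
    return order_date >= add_months(today, -12)

today = date(2024, 3, 31)  # fixed here instead of calling date.today()

print(add_months(today, -12))                        # 2023-03-31
print(in_last_12_months(date(2023, 3, 31), today))   # True
print(in_last_12_months(date(2023, 3, 30), today))   # False
```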

DATEDIFF

Datediff calculates the difference between two dates in specified date parts.

DATEDIFF(date_part, start_date, end_date, [start_of_week])

Optional parameter:
[start_of_week] defines week beginning (Sunday, Monday, etc.)

Example: Time to First Order

Objective: Measure how long customers take to place their first order after signing up.

First order date per customer (saved as the calculated field First Order Date):

{ FIXED [Customer ID] : MIN([Order Date]) }

Months between signup and first order:

DATEDIFF('month', [Signup Date], [First Order Date])

Sign Up, First Order
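
One detail that often surprises analysts is that DATEDIFF counts the date-part boundaries crossed between the two dates, not full elapsed periods. The Python sketch below reproduces that logic for months; the function name datediff_months is hypothetical.

```python
from datetime import date

def datediff_months(start, end):
    """Number of month boundaries crossed between start and end."""
    return (end.year - start.year) * 12 + (end.month - start.month)

# One day apart, but a month boundary is crossed.
print(datediff_months(date(2024, 1, 31), date(2024, 2, 1)))   # 1

# Thirty days apart, but still the same calendar month.
print(datediff_months(date(2024, 1, 1), date(2024, 1, 31)))   # 0
```

If elapsed time matters more than calendar boundaries, a difference in days is usually the safer starting point.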

DATENAME

DateName returns the name of a date part as a string. For example, if you want to extract the month from a date, DATENAME will return the name of the month (e.g., “January”, “February”, etc.).

DATENAME(date_part, date)

DATEPART

Datepart returns the integer value of a date part. For example, if you want to extract the month from a date, DATEPART will return a number between 1 and 12, while DATENAME would return the name of the month (e.g., “January”, “February”, etc.).

DATEPART(date_part, date)

Note

DATEPART is typically faster than DATENAME.

Example: Extract Year for Grouping

Objective: Build yearly trend charts, create custom year filters

DATEPART('year', [Order Date])

When used in the view, convert the calculated field to a discrete dimension so that the data is grouped by year.

Year Grouping

DATEPARSE

Converts a specifically formatted string into a date.

DATEPARSE(format, string)

Use case: when DATE cannot recognize a custom string format.


DATETRUNC

Truncates a date to a specified level.

DATETRUNC(date_part, date, [start_of_week])

Important

DATETRUNC changes the actual value, not just the display.

Example: Order Date = 28-12-2023 15:45:30

DATETRUNC('month', [Order Date]) → 01-12-2023 00:00:00

DATETRUNC('year', [Order Date]) → 01-01-2023 00:00:00

Notice that:

  • The date is not just displayed differently
  • The underlying value is changed

If you only want to hide the time portion (for example, remove hours/minutes visually), you should format the field instead of using DATETRUNC.

Formatting affects appearance.
DATETRUNC affects the data itself.
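
The difference between truncating and formatting can be seen in a short Python sketch. The function datetrunc below is a hypothetical stand-in that supports only three date parts; it returns a new value, while strftime only changes how the original value is displayed.

```python
from datetime import datetime

def datetrunc(part, value):
    """Return a new datetime truncated to the start of the given date part."""
    if part == "year":
        return value.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    if part == "month":
        return value.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if part == "day":
        return value.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported date part: {part}")

order_date = datetime(2023, 12, 28, 15, 45, 30)

print(datetrunc("month", order_date))     # 2023-12-01 00:00:00
print(datetrunc("year", order_date))      # 2023-01-01 00:00:00

# Formatting hides the time but leaves the value untouched.
print(order_date.strftime("%d-%m-%Y"))    # 28-12-2023
print(order_date)                         # 2023-12-28 15:45:30
```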


What Is [start_of_week]?

This optional parameter defines which day is considered the first day of the week. Any further calculations are based on this definition of a week.

Example: Calculate the week based on the ISO standard (week starts on Monday) and the week starting on Sunday.

DATETRUNC('week', [Order Date], 'monday')

DATETRUNC('week', [Order Date], 'sunday')

Order Week

The first calculated field, [Order Week], calculates the week based on the ISO standard and groups orders by week starting on Monday, while the second one, [Order Week From Sunday], groups orders by week starting on Sunday.
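
To see how the start of the week changes the result, here is a minimal Python sketch. The function week_start is hypothetical and simply steps back to the most recent occurrence of the chosen weekday.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

def week_start(d, start_of_week="monday"):
    """Step back to the most recent start_of_week day (or stay put if d is that day)."""
    offset = (d.weekday() - WEEKDAYS.index(start_of_week)) % 7
    return d - timedelta(days=offset)

order_date = date(2023, 12, 28)  # a Thursday

print(week_start(order_date, "monday"))  # 2023-12-25
print(week_start(order_date, "sunday"))  # 2023-12-24
```

The same order lands in a week labelled December 25 or December 24 depending on the setting, so weekly totals from two reports only match when both use the same start of week.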


DAY / MONTH / QUARTER / YEAR / WEEK

These functions extract specific parts of a date as integers.

DAY([Order Date])

Order Day

TODAY

Returns the current system date (without time).

TODAY()

NOW

Returns the current system date and time.

NOW()

MAKEDATE

Constructs a date from numeric year, month, and day.

MAKEDATE(year, month, day)

Example: Create Date for 01.01.2026 as [Reporting Date]

MAKEDATE(2026, 1, 1)

Reporting Date

MAKEDATETIME

Combines date and time into datetime.

MAKEDATETIME(date, time)

MAKETIME

Constructs time using hour, minute, second.

MAKETIME(hour, minute, second)

Note

Output is datetime (Tableau does not support standalone time type).


ISDATE

Checks whether a string is a valid date.

ISDATE(string)

Use case: data validation and cleaning messy datasets.


MAX and MIN (with Dates)

Most recent date:

MAX(date)

Earliest date:

MIN(date) 

The Date Literal (#)

Date values enclosed in # symbols are interpreted as date literals.

Example: #3/25/2025#

This is a special syntax that allows you to directly input date values in calculations without using functions like DATE().

When you use #3/25/2025#, Tableau recognizes it as a date literal and treats it as a date value in calculations.

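Without the # symbols, Tableau does not treat the value as a date. Depending on how it is written, it may be interpreted as:

  • A string, if it is quoted ('3/25/2025')
  • A number, if it is unquoted (3/25/2025 is evaluated as division)
  • An invalid expression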

Custom Date

Date Parameters

Date parameters are user-defined controls that allow viewers to dynamically select dates and influence calculations, filters, and visual behavior within a Tableau workbook.

Unlike quick filters, which are directly tied to a specific field in a specific data source, parameters are independent objects. This independence makes them extremely powerful in advanced analytical scenarios, especially when working with multiple data sources, custom logic, or dynamic aggregations.

Conceptual Understanding

A parameter in Tableau is a single value that can be referenced inside one or more calculated fields. When that parameter is of data type Date (or Date & Time), it becomes a flexible time control that can drive time-based logic across worksheets.

Date parameters do not filter data automatically. Instead, they act as inputs to calculations. This means you must explicitly reference them inside a calculated field to control behavior.

For example:

  • They can define the start and end of a reporting period.
  • They can determine which dates should be included in a KPI.
  • They can dynamically adjust the level of time aggregation on an axis.
  • They can drive rolling windows or forecast periods.

Because parameters are global to the workbook, one Date parameter can influence multiple worksheets simultaneously.

How Date Parameters Differ from Filters

A standard date filter:

  • Is tied to a single data source.
  • Automatically filters the data.
  • Cannot easily control multiple data sources.

A Date parameter:

  • Is independent of any single data source.
  • Must be referenced in a calculation.
  • Can control multiple data sources.
  • Can drive complex logic beyond simple filtering.

This distinction is important. Filters limit data directly. Parameters influence calculations that then determine what to display.

Custom N Date Part Selection

This type of date parameter is flexible because it allows users to simultaneously choose:

  • The date part (in this example: day, month, or year)
  • The date range (N periods)

Instead of creating separate filters for:

  • Last 30 Days
  • Last 3 Months
  • Last 1 Year

the user can dynamically control both:

  • The unit of time
  • The number of periods

Step 1

Create a parameter named: Date Part

Configuration:

  • Data Type → String
  • Allowable Values → List
  • Add values:
    • Day
    • Month
    • Year

Right-click the parameter → Select Show Parameter

This parameter controls the time unit.

Date Part Parameter

Step 2

Create another parameter named: N Values

Configuration:

  • Data Type → Integer
  • Allowable Values → Range
  • Minimum → 1
  • Maximum → 30
  • Step Size → 1

Right-click → Select Show Parameter

This parameter controls how many periods to include.

N Values Parameter

Step 3

Create an Anchor Date (Recommended)

We will anchor the calculation to the latest available Order Date in the dataset.

Create a calculated field: Latest Order Date

{ FIXED : MAX([Order Date]) }

We do it this way because:

  • TODAY() → depends on the system date
  • NOW() → depends on server timezone
  • { FIXED : MAX([Order Date]) } → returns the most recent order date in the dataset and is not affected by dimension filters in the view

Step 4

Create a calculated field named: p. Date Part

Note

The p. prefix is a naming convention used throughout this course to indicate that a calculated field is parameter-driven — its value depends on a parameter rather than raw data. This makes it easy to identify parameter-linked fields in the Data pane at a glance.

IF LOWER([Date Part]) = 'day' THEN [Order Date] > DATEADD('day', -[N Values], [Latest Order Date])
ELSEIF LOWER([Date Part]) = 'month' THEN [Order Date] > DATEADD('month', -[N Values], [Latest Order Date])
ELSE [Order Date] > DATEADD('year', -[N Values], [Latest Order Date])
END
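
The logic of this Boolean field can be sketched in Python to check which orders pass the filter. This is an illustration only: the function names are hypothetical, month and year shifts are assumed to clamp to the last valid day of the month, and the parameter value is lowercased before comparison so that "Month" and "month" behave the same.

```python
import calendar
from datetime import date, timedelta

def shift_back(d, part, n):
    """Move a date back by n days, months, or years."""
    if part == "day":
        return d - timedelta(days=n)
    months = n if part == "month" else 12 * n
    index = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(index, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(d.day, last_day))

def in_last_n(order_date, date_part, n_values, latest_order_date):
    """True when the order falls inside the last n periods before the anchor date."""
    return order_date > shift_back(latest_order_date, date_part.lower(), n_values)

latest = date(2024, 6, 30)  # stands in for the Latest Order Date field

print(in_last_n(date(2024, 6, 24), "Day", 7, latest))    # True
print(in_last_n(date(2024, 6, 23), "Day", 7, latest))    # False
print(in_last_n(date(2024, 6, 1), "Month", 1, latest))   # True
print(in_last_n(date(2023, 7, 1), "Year", 1, latest))    # True
```

Because the comparison is strict, the boundary date itself is excluded, so "7 days" covers the anchor date and the six days before it.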

Step 5

  • Drag p. Date Part to the Filters shelf
  • Select True

Step 6

  • Drag the Region dimension to the Columns shelf
  • Drag COUNT([Order ID]) to the Rows shelf

Step 7

Change the parameter values and observe how the order count per region changes.

Date Part Parameter View

Dynamic KPI Calculations

Date parameters can define a reporting window inside calculations.

For example, instead of filtering out data outside the selected range, you can write a calculation that returns Sales only if the Order Date falls between the selected parameters.

This allows you to:

  • Compare full dataset vs selected period.
  • Build dynamic period-to-period comparisons.
  • Create flexible dashboard controls.

This will be further explored in the KPI section.
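
As a sketch of the idea, the Python snippet below returns a sales value only when the order date falls inside the selected window, so the full total and the period total can sit side by side. The orders list is hypothetical sample data, and start_date and end_date stand in for two Date parameters.

```python
from datetime import date

# Hypothetical orders: (order date, sales).
orders = [
    (date(2024, 1, 10), 100.0),
    (date(2024, 2, 5), 250.0),
    (date(2024, 3, 1), 50.0),
]

# Stand-ins for a Start Date and an End Date parameter.
start_date, end_date = date(2024, 2, 1), date(2024, 2, 29)

def sales_in_period(order_date, sales, start, end):
    """Return sales inside the window and None outside it (like a calculation with no ELSE)."""
    return sales if start <= order_date <= end else None

period_values = [sales_in_period(d, s, start_date, end_date) for d, s in orders]
period_sales = sum(v for v in period_values if v is not None)
total_sales = sum(s for _, s in orders)

print(period_sales, total_sales)  # 250.0 400.0
```

Nothing is filtered out of the data, so both numbers remain available in the same view.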

Dynamic Time Aggregation

One advanced use of Date Parameters is dynamically controlling the level of date aggregation.

For example, the date axis can automatically aggregate by:

  • Day
  • Week
  • Month
  • Quarter
  • Year

depending on the selected parameter value.

This is achieved using DATETRUNC, which adjusts the aggregation level dynamically.

By using this approach, the visualization adapts automatically to the selected time granularity, improving readability and analytical clarity.

Step 1

Create a parameter named Date Granularity.

Parameter configuration:

  • Data Type → String
  • Allowable values → List
  • Values → day, week, month, quarter, year

Right-click the parameter and select Show Parameter.

Date Granularity Parameter

Step 2

Create a calculated field named p. Date Granularity with the following calculation:

DATE(
CASE [Date Granularity]
WHEN 'day' THEN [Order Date]
WHEN 'week' THEN DATETRUNC('week', [Order Date])
WHEN 'month' THEN DATETRUNC('month', [Order Date])
WHEN 'quarter' THEN DATETRUNC('quarter', [Order Date])
ELSE DATETRUNC('year', [Order Date])
END
)
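
The effect of this calculated field can be previewed with a Python sketch that truncates each order date to the selected granularity and counts orders per bucket. The names truncate and count_orders and the sample dates are hypothetical, and weeks are assumed to start on Monday here.

```python
from collections import Counter
from datetime import date, timedelta

def truncate(d, granularity):
    """Map a date to the first day of its day, week, month, quarter, or year."""
    if granularity == "day":
        return d
    if granularity == "week":
        return d - timedelta(days=d.weekday())  # assumption: weeks start on Monday
    if granularity == "month":
        return d.replace(day=1)
    if granularity == "quarter":
        return date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)
    return date(d.year, 1, 1)  # any other value falls back to year, like the ELSE branch

def count_orders(order_dates, granularity):
    """Count orders per truncated date."""
    return dict(Counter(truncate(d, granularity) for d in order_dates))

order_dates = [date(2024, 1, 15), date(2024, 2, 20), date(2024, 2, 21), date(2024, 5, 5)]

print(count_orders(order_dates, "month"))
# {datetime.date(2024, 1, 1): 1, datetime.date(2024, 2, 1): 2, datetime.date(2024, 5, 1): 1}
print(count_orders(order_dates, "quarter"))
# {datetime.date(2024, 1, 1): 3, datetime.date(2024, 4, 1): 1}
```

The same four orders produce three buckets by month and two by quarter, which is the regrouping the parameter triggers in the view.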

Step 3

  • Drag p. Date Granularity to the Columns shelf
  • Right-click the pill
  • Select Exact Date
  • Change it to Discrete
  • Drag COUNT([Order ID]) to the Rows shelf

Selecting Exact Date ensures Tableau uses the full date value returned by the calculation.
Without this, Tableau may automatically aggregate the date into a higher-level hierarchy (for example, Year or Month), which would override the dynamic logic we created.

Changing the field to Discrete creates distinct headers for each date value instead of a continuous timeline.

This is important because:

  • Each truncated date (day/week/month/quarter/year) should appear as a separate category.
  • It prevents Tableau from interpolating values across a continuous axis.
  • It ensures the view respects the exact granularity selected in the parameter.

Step 4

Change the parameter value to observe how the granularity changes dynamically.

When you switch between:

  • day
  • week
  • month
  • quarter
  • year

the axis updates automatically to reflect the selected time level.

Date Granularity in the View

Multi-Source Dashboards

When working with multiple data sources, each source typically has its own date field.

If you use quick filters, you would need:

  • One date filter per data source

This leads to duplicated controls on the dashboard and a less clean user experience.

Using Date parameters instead allows you to create:

  • A single Date Part parameter
  • A single Start Date parameter
  • A single End Date parameter

Each data source then contains its own calculated filter that references the same parameters.

As a result:

  • All worksheets respond to the same date controls
  • The dashboard remains clean and professional
  • The user interacts with one unified time control

This creates a seamless and consistent experience across multi-source dashboards.

Spatial Analytics (spatial relationships, spatial joins, spatial functions)

Spatial analytics focuses on analyzing data that contains a geographic component. Unlike traditional visualizations, spatial analysis allows us to understand how data behaves in relation to location, distance, and geographic structures. It enables answering questions such as where events occur, how locations interact, how movement happens across space, and how metrics vary across regions.

Maps are powerful because they allow patterns to be understood visually. They are especially useful when geography is not just a visual element, but a key part of the analysis.


Geographic Data Formats

Geographic data can exist in multiple formats, and understanding these formats is essential for correct analysis.

  • Spatial files such as Shapefile, GeoJSON, KML
  • Flat files such as Excel or CSV
  • Databases with spatial or location-based fields

Spatial files store both:

  • Geometry → the shape (point, line, polygon)
  • Attributes → descriptive data

When a spatial file is connected, Tableau automatically creates a Geometry field, which can be directly used for mapping.

Location-based data (like City, Country, Latitude, Longitude) does not contain shapes. Tableau uses geocoding to convert these values into map coordinates.


Dataset for Spatial Analysis

In this session, we use:

  • Citi Bike dataset (trip-level data with coordinates)
  • NYC District GeoJSON file (polygon-level data)

The Citi Bike dataset contains:

  • Start and End coordinates
  • Time information
  • Station details

Since the dataset is split across multiple CSV files, we combine them using Union.

The GeoJSON dataset contains:

  • District boundaries (MultiPolygon)
  • Geographic shapes for mapping

Spatial Relationships

Spatial relationships are used when datasets do not share a common key but still need to be analyzed together.

Steps to Create Spatial Relationship

  1. Load the Citi Bike dataset
  2. Combine all CSV files using Union
  3. Add the GeoJSON file to the data model
  4. Create a calculated field in both datasets:
'New York'
  5. Use this field to create a relationship between the datasets

This approach allows Tableau to connect the datasets logically without physically joining them, preserving flexibility and avoiding duplication.

Relationship Calculations

Now we have a working relationship between two tables.

Spatial Relationship

Spatial Joins

Spatial joins combine datasets based on geographic relationships rather than keys.

The most common spatial join is:

  • INTERSECTS → checks if two geometries overlap

Steps to Create Spatial Join

  1. In the data source tab, switch to the physical layer
  2. Add the Citi Bike dataset
  3. Create calculated fields:
MAKEPOINT([Start Lat], [Start Lng])
MAKEPOINT([End Lat], [End Lng])
  4. Drag the GeoJSON dataset next to the Citi Bike dataset
  5. Choose Left Join
  6. Set join condition:
    • Spatial field (point) INTERSECTS polygon geometry

This will match each trip to a district based on location.
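
Conceptually, the join tests each trip point against each district polygon. The Python sketch below shows the idea with a simple planar ray-casting test. It is a simplified illustration, not how Tableau evaluates INTERSECTS: the districts are hypothetical squares, coordinates are treated as flat (x, y) pairs, and points exactly on a boundary are not handled specially.

```python
def point_in_polygon(lng, lat, polygon):
    """Ray casting: count how many polygon edges a ray to the right of the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # the edge spans the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside

def spatial_left_join(trips, districts):
    """Keep every trip; attach the first matching district name or None."""
    joined = []
    for trip_id, lng, lat in trips:
        match = next(
            (name for name, polygon in districts.items() if point_in_polygon(lng, lat, polygon)),
            None,
        )
        joined.append((trip_id, match))
    return joined

# Hypothetical districts as (lng, lat) vertex lists, and trips as (id, lng, lat).
districts = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
trips = [("t1", 1.0, 1.0), ("t2", 3.0, 0.5), ("t3", 5.0, 5.0)]

print(spatial_left_join(trips, districts))  # [('t1', 'A'), ('t2', 'B'), ('t3', None)]
```

Trip t3 falls outside every district and is kept with an empty district, which is the behavior expected from a Left Join.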

Spatial Join

Troubleshooting Spatial Joins

Common issue:

  • Geometry is incompatible with geography

Solution:

  • Convert geometry to geography
  • Use coordinate system EPSG:4326
  • Ensure coordinates in the spatial file follow Longitude, Latitude order (MAKEPOINT, by contrast, takes latitude first)

Spatial Functions

Spatial functions enable advanced geographic calculations directly within Tableau. They allow you to create, transform, and analyze spatial objects such as points, lines, and polygons. These functions are especially important when working with coordinate-based datasets, as they convert raw latitude and longitude values into map-ready objects and allow analytical operations such as distance measurement, movement analysis, and spatial comparison.

Spatial functions are commonly used together with mapping and spatial joins. They help bridge the gap between raw data and geographic insight, making it possible to answer questions about proximity, interaction, and movement.


Common Use Cases

  • Converting latitude and longitude into spatial points
  • Visualizing movement between two locations
  • Measuring distance between locations
  • Creating service or coverage areas
  • Identifying overlapping or interacting regions

Examples

Creating a spatial point from coordinates:

MAKEPOINT([Latitude], [Longitude])

Visualizing movement between two locations:

MAKELINE(
    MAKEPOINT([Start Lat], [Start Lng]),
    MAKEPOINT([End Lat], [End Lng])
)

Calculating distance between two points:

DISTANCE(
    MAKEPOINT([Start Lat], [Start Lng]),
    MAKEPOINT([End Lat], [End Lng]),
    'km'
)
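
For intuition about what a distance between two latitude/longitude points involves, here is a small Python sketch using the haversine formula. This is an assumption made for illustration: the notes do not say which method DISTANCE uses internally, and a spherical approximation may differ slightly from Tableau's result. The function name `haversine_km` and the sample coordinates are hypothetical.

```python
import math


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    earth_radius_km = 6371.0  # mean Earth radius; an approximation
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lng2 - lng1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Illustrative start and end coordinates (latitude, longitude).
start = (40.73, -73.99)
end = (40.75, -73.97)
print(round(haversine_km(*start, *end), 2), "km")
```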

Spatial Functions Reference

Function | Description | Typical Use Case
MAKEPOINT | Converts latitude and longitude columns into a spatial point. | Enabling spatial joins for coordinate-based datasets.
MAKELINE | Creates a line between two spatial points. | Origin–destination maps, mobility analysis, route visualization.
DISTANCE | Calculates the distance between two spatial points using specified units. | Nearest branch analysis, trip distance calculation, proximity analysis.
AREA | Returns the total surface area of a spatial polygon. | Territory size comparison, land coverage analysis.
LENGTH | Returns the total geodetic length of a linestring geometry. | Route length measurement, infrastructure analysis.
BUFFER | Creates a radius around a point, line, or polygon. | Service coverage zones, delivery radius modeling, proximity analysis.
INTERSECTS | Returns True or False indicating whether two geometries overlap. | Spatial joins, containment checks.
INTERSECTION | Returns the overlapping portion between two geometries. | Market overlap analysis, shared service area evaluation.
DIFFERENCE | Subtracts the overlapping area of one polygon from another. | Identifying uncovered or restricted areas.
SYMDIFFERENCE | Removes overlapping portions from both geometries and returns the remaining parts. | Territory comparison and competitive analysis.
OUTLINE | Converts polygon geometry into boundary lines. | Styling borders separately from polygon fill.
SHAPETYPE | Returns the geometry structure as text (Point, Polygon, etc.). | Debugging spatial data issues.
VALIDATE | Confirms whether spatial geometry is topologically correct. | Cleaning corrupted spatial files and preventing join errors.

Mapping in Tableau (Map Layers, Map Styling & Configuration)

Mapping in Tableau is not only about placing marks on a geographic background.
It is a combination of:

  • Data modeling
  • Geocoding
  • Aggregation logic
  • Visual design

For a map to function correctly — analytically and visually — four foundational components must be configured properly:

  1. Data Type
  2. Data Role
  3. Geographic Role
  4. Geographic Hierarchy

If any of these elements are misconfigured, you may encounter:

  • Unknown locations
  • Incorrect aggregation
  • Missing map rendering
  • Broken drill-down behavior
  • Spatial joins that do not work as expected

Geographical data configuration

Mapping accuracy begins with proper data configuration.

Data Type — Structural Foundation

The Data Type determines how Tableau stores and interprets the raw values.

This is the first layer of configuration.

Common Geographic Data Types

Field Type | Required Data Type | Why
Latitude | Number (Decimal) | Must allow precise coordinate plotting
Longitude | Number (Decimal) | Must allow precise coordinate plotting
Country/State/City | String | Needed for geocoding
Postal Code | String | Preserves leading zeros
Geometry (GeoJSON/Shapefile) | Geometry | Native spatial object

Incorrect data types can cause:

  • Aggregation errors
  • Loss of leading zeros (postal codes)
  • Tableau not recognizing geographic information
  • Failure in map rendering

Example:

If Postal Code is stored as Number:

  • 01234 becomes 1234
  • Geocoding fails
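
The leading-zero problem is not specific to Tableau. The short Python sketch below reproduces it and shows one possible repair, assuming (for illustration only) that every code should be five digits long.

```python
raw_postal_code = "01234"

as_number = int(raw_postal_code)  # numeric storage drops the leading zero
as_text = str(as_number)          # "1234": one digit short
repaired = as_text.zfill(5)       # pad back to 5 digits (assumes 5-digit codes)

print(as_number, as_text, repaired)
```

Padding only works when the expected length is known, which is why keeping the field as a String from the start is the safer choice.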

Data Role — Analytical Behavior

The Data Role defines how Tableau treats the field in analysis.

Two primary roles:

  • Dimension → categorical grouping
  • Measure → numeric aggregation

Typical Configuration for Mapping

Field | Data Role
Latitude | Measure
Longitude | Measure
Country | Dimension
State | Dimension
City | Dimension
Geometry | Measure

If Latitude/Longitude are set as Dimensions:

  • Points may not render correctly
  • Aggregation logic may break

Geographic Role — Geocoding Layer

The Geographic Role connects a field to Tableau’s geocoding engine.

This tells Tableau: “This field represents a real-world geographic level.”

Common Geographic Roles

  • Country/Region
  • State/Province
  • County
  • City
  • Postal Code
  • Latitude
  • Longitude

Once a geographic role is assigned:

  • Tableau generates Latitude (generated)
  • Tableau generates Longitude (generated)

These generated fields are automatically used for plotting.

How Tableau Geocoding Works

When using location names:

  1. Tableau references its internal geographic database
  2. Matches names to coordinates
  3. Places marks accordingly

If Tableau cannot match values, you will see:

  • Unknown locations warning

To resolve:

  • Click the warning icon
  • Edit locations
  • Specify country context
  • Correct spelling inconsistencies

Geographic Hierarchy — Drill-Down Structure

A Hierarchy defines the logical order of geographic levels.

Example:

  • Country
    • State
      • City
        • Postal Code

Hierarchies allow:

  • Drill-down navigation
  • Controlled aggregation
  • Structured geographic exploration

How to Create a Hierarchy

  1. Right-click a geographic field (e.g., Country)
  2. Select Hierarchy → Create Hierarchy
  3. Drag lower levels into it

Benefits

  • Enables + / − drill controls
  • Maintains geographic logic
  • Improves dashboard interactivity

Mapping

Mapping with Raw Coordinates

If your dataset contains Latitude and Longitude:

Required Configuration

  • Data Type → Number (Decimal)
  • Data Role → Measure
  • Geographic Role → Latitude / Longitude

Validation Rules

  • Longitude range: -180 to 180
  • Latitude range: -90 to 90
  • Coordinates must be decimal degrees
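
These range rules can be checked before the data reaches Tableau. The Python sketch below is a minimal illustration; the function name `is_valid_coordinate` and the sample values are hypothetical.

```python
def is_valid_coordinate(lat: float, lng: float) -> bool:
    """Check the decimal-degree ranges: latitude in [-90, 90], longitude in [-180, 180]."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lng <= 180.0


print(is_valid_coordinate(40.73, -73.99))   # plausible New York point
print(is_valid_coordinate(140.73, -73.99))  # latitude out of range
print(is_valid_coordinate(-73.99, 40.73))   # swapped columns still pass the range check
```

The last call shows a limitation: for New York, swapped latitude and longitude both stay within the legal ranges, so a range check alone will not catch a column mix-up. Plotting a sample of points is still worthwhile.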

Longitude always goes to:

  • Columns (X-axis)

Latitude always goes to:

  • Rows (Y-axis)

If properly configured:

  • Tableau plots points automatically
  • No internal geocoding is required

Mapping with Location Names

If your dataset contains names instead of coordinates:

Required Configuration

  • Data Type → String
  • Data Role → Dimension
  • Geographic Role → Appropriate geographic level

Tableau converts names into coordinates using geocoding. In some cases, you may need to specify the country context to resolve ambiguities.

Mapping with Spatial Files (GeoJSON / Shapefile)

Spatial files contain embedded geometry objects.

When imported:

  • Field Type → Geometry
  • Data Role → Measure

Characteristics:

  • Coordinates are embedded
  • No geocoding required
  • Supports polygon and line rendering

Enables:

  • Choropleth maps
  • Boundary overlays
  • Spatial joins
  • Spatial calculations

Map Styling and Layering

Map Styles

Once configuration is correct, map styling enhances interpretability.

Background map styles include:

  • Light
  • Normal
  • Streets
  • Satellite

Choose style based on:

  • Analytical clarity
  • Contrast with marks
  • Density visualization

Map Layers

Tableau allows multiple layers:

  • Polygon layer (district boundaries)
  • Point layer (stations)
  • Line layer (routes)

Multi-layer maps enable:

  • Territory + event visualization
  • Origin–destination flows
  • Hotspot analysis

Each layer can have:

  • Independent mark type
  • Independent color
  • Independent size

Basic Map Creation

Now let’s create a simple map showing the distribution of trips across New York City using the geometry field from the spatial file.

Step 1

As we do not have a column with states, we will create a calculated field with the value 'New York', name it [State], and assign it the geographic role State/Province. This will allow us to use the geometry field from the spatial file to plot the map of New York City.

Step 2

Now we can make a hierarchy by right-clicking the newly created [State] field, choosing Hierarchy → Create Hierarchy, and then dragging the [Boroname] field (NYC borough) into the hierarchy. This will allow us to drill down from the state level to the geometry level and see the districts of New York City that are available in our spatial file.

Location Hierarchy

Step 3

To count rides by district, drag the Ride ID field to the view and change the aggregation to Count. To show the distribution, place the Count of Ride ID on Color in the Marks card. The result is a choropleth map showing how rides are distributed across the districts of New York City.

Step 4

To make the districts easier to identify, drag the [Boroname] field to Label in the Marks card so that each district's name appears on the map.

Step 5

To make the map more readable, we can also change the background map style by right-clicking the map, choosing Background Layers, and picking a style. In this case, we will choose the Light style to make the districts more visible, and tick background map layers such as Land Cover and Labels to make the map more informative.

Simple Map

Map with layers, polygons, points, and lines

Step 1

As we have already done the join between the spatial file and the tabular file, we can now build a map using the geometry field from the spatial file. To do that:

  • First make sure that the fields with geographic roles are correctly assigned
  • Double click on the geometry field and it will be added to the view

Tableau automatically:

  • Adds Latitude (generated) to Rows
  • Adds Longitude (generated) to Columns
  • Places Geometry on the Marks card, Details

The result is a map of New York City with its districts.

Polygons

Step 2

Now we can add the trips data to the map. To do that let’s create calculated fields for the start and end locations of the trips using MAKEPOINT() function as explained in the spatial joins section. Then we can add these calculated fields to the view to show the trip start and end locations on the map.

Step 3

Having the trip start and end locations we can now make a flow map to show the movement of the trips between the start and end locations. To do that we will use MAKELINE() function to create a line between the start and end points of each trip. Then we can add this line to the view to visualize the flow of trips across the city.

Lines

Step 4

In map visualizations, we can combine different geometry types by adding map layers, each showing a different aspect of the data. For example, we can draw routes on top of the polygon layer by dragging the line geometry onto the map as a new marks layer (the Add a Marks Layer drop target), and show the start and end points of the trips by dragging the point geometries in the same way. This creates a multi-layered map that shows both the routes and the locations of trip starts and ends.

Map with layers

Proportional Symbol Map

Proportional symbol maps use sized marks (usually circles) to represent the magnitude of a measure at specific locations. The size of each mark is proportional to the value it represents.

Step 1

As in previous examples, we will start by creating a map from the spatial file, using its geometry field together with [Boroname] to show the districts of New York City.

Step 2

Add [Starting point] to the view to show the start locations of the trips on the map.

Step 3

Now we can add the Count of Ride ID to the size mark to show the number of trips starting at each location. This will create a proportional symbol map where the size of each mark corresponds to the number of trips starting at that location.

Step 4

Add the Count of Ride ID to Color to show the distribution of the trips across the city, and turn on borders from the Color settings. This will allow us to quickly identify areas where more trips start, based on the color intensity of the marks on the map.

Step 5

Add [Starting Point ID] to the detail mark to show the individual starting points of the trips on the map.

Step 6

From the Marks card, we can also change the mark type to Circle and adjust the size and color to make the map more visually appealing and easier to interpret. This will allow us to quickly identify areas with high trip activity based on the size of the circles on the map.

Step 7

Add [Boroname] to the filter shelf for interactive filtering by district. This will allow users to select specific districts and see the corresponding trip data on the map, enabling more detailed analysis of trip patterns within different areas of New York City.

This analysis can help identify which districts have the highest demand for Citi Bike trips, and can inform decisions about where to add more bike stations or increase bike availability.

Proportional Symbol Map

Density Map

Density maps use color intensity to represent the concentration of points in a given area. They are useful for visualizing patterns of activity across a geographic space.

Step 1

Bring the [Make route] calculated field, [Start Station ID], and [End Station ID] to the view to show the routes of the trips on the map.

Step 2

Change the mark type to Density to create a density map that shows the concentration of trips across the city. The color intensity will indicate areas with higher or lower trip activity.

Step 3

Adjust map style to Streets to make the trips density patterns more visible on the street map background. This will allow us to better understand the spatial distribution of trips in relation to the city’s street layout.

Step 4

Make density color intensity and opacity adjustments to enhance the visibility of high-density areas. This will help us quickly identify hotspots of trip activity across New York City.

This type of analysis can be useful for understanding where the highest demand for Citi Bike trips is located, and can inform decisions about where to focus resources for bike station placement or maintenance.

Step 5

Untick Aggregate Measures in the Analysis menu to show the individual trip routes on the map. Density maps calculate intensity based on the number of marks in a geographic area, so if we want to see the actual routes of the trips, we need to turn off aggregation.

Density Map
Note: Analysis → Aggregate Measures in Maps
  • ON (default) → Measures are summarized (SUM, AVG, etc.).
    Use for choropleth maps, KPIs by region, and territory comparison.

  • OFF → Each row becomes an individual mark.
    Use for point distribution maps, density maps, and event-level analysis.

Rule of thumb:
Use aggregation for regional summaries.
Turn it off for raw spatial events and clustering analysis.


Cohort Analysis in Tableau

Cohort Analysis is a technique used to analyze the behavior of groups of users that share a common characteristic over time.

Instead of analyzing all users together, cohort analysis groups users based on a shared starting event, allowing us to observe how behavior changes over time for each group.

Common cohort grouping criteria include:

  • First purchase date
  • First login date
  • First subscription date
  • First product activation

This method is widely used in:

  • Customer retention analysis
  • Product usage analysis
  • Marketing performance evaluation
  • Telecom subscriber lifecycle analysis

For example:

  • Customers who made their first purchase in January
  • Customers who made their first purchase in February

Each of these groups becomes a cohort.


Cohort Analysis Concept

A cohort analysis typically consists of three components:

  • Period
  • Start Date
  • Metric Being Measured

Example:

Start Date | Period 0 | Period 1 | Period 2 | Period 3
Jan 2024 | 100% | 70% | 55% | 40%
Feb 2024 | 100% | 75% | 60% | 45%
Mar 2024 | 100% | 72% | 58% | 44%

Interpretation:

  • Period 0 (Month 0) → when users first appeared
  • Period 1 (Month 1) → one month after acquisition
  • Period 2 (Month 2) → two months after acquisition

This structure allows us to analyze retention or activity decay over time.


Cohort Analysis Workflow in Tableau

Steps include:

  • Identify the first activity date for each user
  • Assign each user to a cohort group
  • Calculate time elapsed since the cohort start
  • Aggregate metrics by cohort and time period

Cohort Analysis Example

Let’s explore Cohort Analysis in Tableau using the Online Retail Dataset.

Cohort analysis becomes very practical with the Online Retail dataset because the table contains transactional data at the invoice line level.

From the dataset, we can see the following important fields:

  • Customer ID identifies the customer
  • Invoice Date identifies when the transaction happened
  • Quantity and Unit Price can be used to calculate sales
  • Multiple rows can belong to the same invoice because one invoice may contain several products

In cohort analysis, we usually group customers based on the date of their first purchase and then track their later activity.

In this dataset:

  • Each row is a product line within an invoice
  • One customer can have many invoices
  • One invoice can contain many products
  • Customers purchase at different times

This allows us to answer questions such as:

  • In which month did the customer make the first purchase?
  • Did the customer return in later months?
  • Which cohort has better retention?
  • Which cohort generates more revenue over time?

Cohort Definition

For this dataset, the cohort is defined as:

Customers grouped by the month of their first invoice date

So:

  • Customers whose first order was in January 2011 belong to the January 2011 cohort
  • Customers whose first order was in February 2011 belong to the February 2011 cohort

Cohort Analysis Logic

flowchart LR
A[Customer ID] --> B[Find First Invoice Date]
B --> C[Assign Cohort Month]
C --> D[Calculate Months Since First Purchase]
D --> E[Measure Retention or Sales]
E --> F[Build Cohort Heatmap]


Step 1: Find the First Purchase Date per Customer

{ FIXED [Customer ID] : MIN([Invoice Date]) }

Name: First Purchase Date

This calculation:

  • Identifies the earliest purchase per customer
  • Returns the same value for all rows of that customer
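
To see why the FIXED expression returns the same value on every row of a customer, the Python sketch below mimics the behaviour outside Tableau. It is an illustration, not Tableau's implementation; the rows and the key names (`customer_id`, `invoice_date`, `first_purchase_date`) are made up.

```python
from datetime import date

# Made-up invoice lines: two customers, two rows each.
rows = [
    {"customer_id": 17850, "invoice_date": date(2011, 1, 5)},
    {"customer_id": 17850, "invoice_date": date(2011, 3, 9)},
    {"customer_id": 12583, "invoice_date": date(2011, 2, 14)},
    {"customer_id": 12583, "invoice_date": date(2011, 2, 20)},
]

# Pass 1: earliest invoice date per customer (the MIN inside the FIXED scope).
first_purchase = {}
for row in rows:
    cid, invoice_date = row["customer_id"], row["invoice_date"]
    if cid not in first_purchase or invoice_date < first_purchase[cid]:
        first_purchase[cid] = invoice_date

# Pass 2: write that value back onto every row of the same customer.
for row in rows:
    row["first_purchase_date"] = first_purchase[row["customer_id"]]

for row in rows:
    print(row["customer_id"], row["invoice_date"], row["first_purchase_date"])
```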

Step 2: Create the Cohort Month

DATETRUNC('month', [First Purchase Date])

Name: Cohort Month

Step 3: Create Invoice Month

DATETRUNC('month', [Invoice Date])

Name: Invoice Month

Step 4: Calculate Period

DATEDIFF('month', [Cohort Month], [Invoice Month])

Name: Period
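
Steps 2 to 4 amount to truncating both dates to the first day of their month and counting the month boundaries between them. A small Python sketch of that arithmetic follows; the helper names `month_start` and `months_between` are hypothetical, and the dates are illustrative.

```python
from datetime import date


def month_start(d: date) -> date:
    """Truncate a date to the first day of its month, like DATETRUNC('month', ...)."""
    return d.replace(day=1)


def months_between(start: date, end: date) -> int:
    """Count month boundaries crossed from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


first_purchase_date = date(2010, 12, 14)
invoice_date = date(2011, 2, 3)

cohort_month = month_start(first_purchase_date)
invoice_month = month_start(invoice_date)
period = months_between(cohort_month, invoice_month)
print(cohort_month, invoice_month, period)
```

Counting years explicitly matters here: a purchase in February 2011 by a December 2010 customer falls in Period 2, even though the month numbers alone (12 and 2) would suggest otherwise.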

Step 5: Calculate Active Customers

COUNTD([Customer ID])

Name: Active Customers

Step 6: Build the Heatmap

  • Drag MONTH([Cohort Month]) → Rows (use the month-and-year level so that cohorts from different years stay separate)
  • Drag [Period] → Columns
  • Set both as Discrete Dimensions
  • Set Marks type → Square
  • Drag COUNTD([Customer ID]) → Color
  • Drag COUNTD([Customer ID]) → Label

Retention Rate Calculation (Important)

Retention rate is calculated using a table calculation.

  • Duplicate COUNTD([Customer ID]) on Label
  • Choose Add Table Calculation → Calculation Type: Percent From
  • Compute Using → Table (Across)
  • Relative to → First

This means:

  • Each value is divided by the value in Period 0
  • Period 0 becomes 100% baseline
  • All other periods show retention relative to cohort size
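
The "percent from first" logic can be written out by hand. The Python sketch below computes it for one cohort; the activity tuples are made up, and the function name `retention_by_period` is hypothetical.

```python
# (customer_id, cohort_month, period) for each invoice line; made-up data.
activity = [
    (1, "2011-01", 0), (1, "2011-01", 0),  # same customer twice in one period
    (2, "2011-01", 0), (3, "2011-01", 0), (4, "2011-01", 0),
    (1, "2011-01", 1), (2, "2011-01", 1),
    (1, "2011-01", 2),
]

# Distinct customers per (cohort, period), mirroring COUNTD([Customer ID]).
active = {}
for customer_id, cohort, period in activity:
    active.setdefault((cohort, period), set()).add(customer_id)


def retention_by_period(cohort: str) -> dict:
    """Divide each period's distinct customers by the Period 0 count."""
    baseline = len(active[(cohort, 0)])
    periods = sorted(p for (c, p) in active if c == cohort)
    return {p: len(active[(cohort, p)]) / baseline for p in periods}


rates = retention_by_period("2011-01")
print(rates)
```

Because customers are counted as a set, the duplicate invoice line in Period 0 does not inflate the baseline.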

Retention Rate Calculation

Important

Retention Rate is a table calculation, not a basic aggregation.
It depends on the view layout and uses Percent From First logic.


Cohort Initial View

Step 7: Clean the View

  • Create field [Retention Rate] from table calculation
  • Drag to Filters
  • Filter values >= 0.01%

Cohort Final View

Example Interpretation

If a cohort shows:

  • Month 0 = 100%
  • Month 1 = 38%
  • Month 2 = 33%

This means:

  • All users were active in the first month (the baseline)
  • 38% returned in the next month
  • 33% were still active after two months

Additional Analysis

Cohort by Country

  • Add Country to Filters

Questions:

  • Which countries retain better?
  • Which markets are stronger?

Product-Based Cohort

Group customers by first product category.

Questions:

  • Which products drive retention?
  • Which products lead to repeat purchases?

Revenue Cohort

Analyze revenue instead of user count.

Questions:

  • Which cohort generates highest revenue?
  • Do newer cohorts spend more?

Cleaning and Reshaping Data in Tableau

Before building visualizations or dashboards, data must often be cleaned and reshaped.
Real-world datasets rarely arrive in a perfectly structured format suitable for analysis.

Common issues include:

  • Missing values
  • Incorrect data types
  • Duplicated records
  • Inconsistent formatting
  • Wide tables that need to be normalized
  • Columns containing multiple values

Cleaning ensures data accuracy, while reshaping ensures data structure fits analytical needs.

flowchart LR
A[Raw Data] --> B[Data Cleaning]
B --> C[Data Reshaping/optional/]
C --> D[Structured Analytical Dataset]
D --> E[Visualization & Analysis]


Data Cleaning in Tableau

Data cleaning focuses on improving data quality before performing analysis.

Tableau allows several cleaning operations directly inside the Data Source page or through Calculated Fields.


Handling Missing Values (NULL Values)

Missing values are one of the most common problems in datasets.
NULL values may represent missing records, unavailable information, or incomplete data collection.

Common approaches to handle NULL values include:

  • Replace NULL values with a default value
  • Filter out rows containing NULL values
  • Use calculated fields to define fallback values

Example calculation replacing NULL revenue with zero:

IFNULL([Revenue],0)

Another common approach:

ZN([Revenue])

ZN() converts NULL numeric values to zero, which is often useful in financial analysis.
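
As a rough analogy outside Tableau, the Python sketch below shows the same fallback idea, with `None` standing in for NULL. The helper names `ifnull` and `zn` only mirror the Tableau functions for readability; they are defined here and are not part of any library.

```python
def ifnull(value, fallback):
    """Return fallback when value is missing, otherwise value itself."""
    return fallback if value is None else value


def zn(value):
    """Return 0 when a numeric value is missing, otherwise value itself."""
    return 0 if value is None else value


revenue = [120.0, None, 80.5]  # made-up values with one missing entry

print([ifnull(v, 0) for v in revenue])
print(sum(zn(v) for v in revenue))
```

Replacing a missing value with zero changes averages as well as totals, so it is worth confirming that zero is the right business meaning before applying it.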


Correcting Data Types

Each field in Tableau has a data type such as:

  • String
  • Number
  • Date
  • Boolean
  • Geographic

Incorrect data types can lead to incorrect aggregations or calculation errors.

Common corrections include:

  • Converting strings to numeric values
  • Converting strings to date values
  • Changing dimensions to measures

Example converting a field to a date:

DATE([Order Date])

Example converting a string to a number:

INT([Customer Age])

Ensuring the correct data type improves calculation accuracy and visualization behavior.


Removing Duplicate Records

Duplicate rows can distort metrics such as totals, averages, or counts.

Instead of counting all records, analysts often count distinct entities.

Example:

COUNTD([Customer ID])

This ensures that each customer is counted only once.
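
The difference between counting rows and counting distinct entities is easy to see in a few lines of Python. The customer IDs below are made up for illustration.

```python
# One entry per invoice line; some customers appear more than once.
customer_ids = [17850, 17850, 12583, 13047, 12583]

row_count = len(customer_ids)            # counts every row
distinct_count = len(set(customer_ids))  # counts each customer once, like COUNTD

print(row_count, distinct_count)
```

A distinct count leaves the duplicate rows in the data; it only prevents them from inflating this particular metric.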


Splitting Columns

Some datasets contain multiple pieces of information within a single column.
For proper analysis, it is often necessary to separate these values into multiple fields.

In the Online Retail dataset, the Description column contains long product names that may include multiple descriptive elements.

Example:

Description
SET/4 BADGES CUTE CREATURES
SET/4 SKULL BADGES
FANCY FONT BIRTHDAY CARD

Sometimes product descriptions may contain structured information such as category and product name within a single field.

Steps:

  1. Right-click the column
  2. Select Split or Custom Split
  3. Choose the delimiter (for example / or a space)

After splitting, Tableau automatically creates new columns.

Example result (the first two rows are split on /, the last on its first space):

Product Type | Product Name
SET | 4 BADGES CUTE CREATURES
SET | 4 SKULL BADGES
FANCY | FONT BIRTHDAY CARD
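
The effect of the delimiter choice can be previewed with a few lines of Python. This is a simplified sketch that splits only at the first delimiter; the helper name `split_once` is hypothetical, and Tableau's Split may produce a different number of columns.

```python
def split_once(description: str, delimiter: str):
    """Split at the first occurrence of delimiter; the tail is empty if it is absent."""
    head, _, tail = description.partition(delimiter)
    return head, tail


print(split_once("SET/4 BADGES CUTE CREATURES", "/"))
print(split_once("FANCY FONT BIRTHDAY CARD", "/"))
print(split_once("FANCY FONT BIRTHDAY CARD", " "))
```

Rows that do not contain the chosen delimiter end up with an empty second part, which is worth checking for after any split.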

Splitting fields improves:

  • filtering
  • grouping
  • hierarchical analysis
  • product categorization

Creating Hierarchies

Hierarchies organize fields into structured levels that support drill-down analysis.

Hierarchies allow:

  • Drill-down exploration
  • Structured navigation in dashboards
  • Aggregated views across levels
  • Simplified analysis of large datasets

The Online Retail dataset naturally supports a hierarchy such as:

graph TD
A[Country] --> B[Customer ID]
B --> C[Invoice No]

Example structure:

Country | Customer ID | Invoice No
United Kingdom | 17850 | 541696
France | 12583 | 541697

This hierarchy allows users to:

  1. Analyze sales by Country
  2. Drill down into Customers
  3. Explore specific Invoices

Steps:

  1. Drag one dimension onto another in the Data Pane
  2. Tableau creates a hierarchy automatically
  3. Rename the hierarchy if needed
  4. Use the hierarchy in visualizations for drill-down analysis

Using Tableau Prep for Advanced Data Preparation

When datasets become more complex, Tableau Prep provides a visual workflow for preparing and reshaping data.

Tableau Prep allows:

  • Joining multiple datasets
  • Removing duplicates
  • Standardizing values
  • Pivoting and unpivoting columns
  • Aggregating datasets
  • Cleaning inconsistent values

flowchart LR
A[Raw Sources] --> B[Tableau Prep Flow]
B --> C[Cleaning Steps]
C --> D[Reshaped Dataset]
D --> E[Tableau Visualization]

Prep uses a visual flow interface, allowing analysts to inspect transformations step by step before publishing the final dataset.


When Data Cleaning Should Be Done Outside Tableau

Although Tableau supports many data preparation operations, some transformations are better performed upstream.

Typical tools include:

  • SQL databases
  • ETL pipelines
  • Python data pipelines
  • Data warehouses

Reasons include:

  • Better performance for large datasets
  • Centralized transformation logic
  • Reusable data pipelines
  • Improved data governance

In production environments, Tableau often connects to pre-cleaned analytical datasets.


Data Reshaping in Tableau

Data reshaping changes the structure of a dataset so that it fits analytical needs.

Many datasets are stored in wide format, while Tableau analysis works better with long format.

flowchart LR
A[Wide Data Format] --> B[Pivot Operation]
B --> C[Long/Tidy Data Format]


Pivoting Data in Tableau

Pivoting is a data reshaping technique used to convert columns into rows.
This transformation is particularly useful when datasets store similar measures in multiple columns instead of a single column.

Many analytical workflows and visualization tools work better with long (tidy) data structures, where:

  • one column represents the type of measure
  • another column represents the value of the measure

flowchart LR
A[Gold Column]
B[Silver Column]
C[Bronze Column]

A --> D[Pivot Operation]
B --> D
C --> D

D --> E[Medal Type]
D --> F[Medal Count]


Example Dataset: Olympic Medals

For this example we use a dataset containing Olympic medal counts by country and year.
The dataset contains the following columns:

  • Country
  • Year
  • Season
  • Gold
  • Silver
  • Bronze
  • Total Medals

Structure before pivoting:

Data Structure

In this structure:

  • each medal type is stored in a separate column
  • this is called a wide format dataset

However, comparing medal types dynamically in Tableau becomes easier when the dataset is reshaped.


Pivoting Medal Columns

To reshape the dataset, we pivot the following columns:

  • Gold
  • Silver
  • Bronze

Steps in Tableau

  1. Open the dataset in Tableau

  2. Navigate to the Data Source page

  3. Select the columns:

    • Gold
    • Silver
    • Bronze
  4. Right-click the selected columns

  5. Choose Pivot

Tableau automatically creates two new fields:

  • Pivot Field Names
  • Pivot Field Values

Rename these fields:

Default Name | New Name
Pivot Field Names | Medal Type
Pivot Field Values | Medal Count

Dataset after pivoting:

Pivot
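
The same wide-to-long reshaping can be sketched in plain Python to show what happens to each row. The medal counts and country names below are made up, and `pivot_longer` is a hypothetical helper written for this illustration.

```python
def pivot_longer(rows, value_columns, names_to, values_to):
    """Turn each wide row into one long row per pivoted column."""
    long_rows = []
    for row in rows:
        # Columns that are not pivoted are repeated on every output row.
        kept = {k: v for k, v in row.items() if k not in value_columns}
        for column in value_columns:
            long_rows.append({**kept, names_to: column, values_to: row[column]})
    return long_rows


# Made-up medal counts for two placeholder countries.
wide_rows = [
    {"Country": "Country A", "Year": 2020, "Gold": 3, "Silver": 1, "Bronze": 2},
    {"Country": "Country B", "Year": 2020, "Gold": 0, "Silver": 4, "Bronze": 1},
]

long_rows = pivot_longer(
    wide_rows, ["Gold", "Silver", "Bronze"], "Medal Type", "Medal Count"
)
for row in long_rows:
    print(row)
```

Two wide rows with three medal columns become six long rows, while the total medal count stays the same.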

Benefits of Pivoting

Pivoting provides several advantages for analysis and visualization:

  • Enables comparison of multiple measures within one chart
  • Simplifies dataset structure
  • Allows filtering by measure type
  • Supports dynamic dashboards and parameter-driven visualizations

Example visualization setup:

Columns | Rows | Color
Year | SUM(Medal Count) | Medal Type

This visualization shows Gold, Silver, and Bronze medals over time in a single chart.


Analytical Questions Enabled by Pivoting

After reshaping the dataset, it becomes easier to answer questions such as:

  • How do Gold, Silver, and Bronze medals change over time?
  • Which countries win the most gold medals?
  • How do medal distributions differ between Summer and Winter Olympics?
  • Which countries have the highest total medal counts across seasons?

Pivoting therefore helps transform datasets into a structure that is more flexible for exploration and analysis in Tableau.


Tableau Prep vs Tableau Desktop

Both Tableau Prep and Tableau Desktop support data preparation tasks, but they are designed for different stages of the analytics workflow.

Tableau Desktop is primarily used for data exploration and visualization, while Tableau Prep is designed for building reusable data preparation pipelines.


Summary

Task Type | Tableau Desktop | Tableau Prep
Quick data fixes | ✓ |
Calculated fields | ✓ |
Simple pivots | ✓ |
Complex joins | | ✓
Large dataset cleaning | | ✓
Reusable data pipelines | | ✓

In practice, analysts often use both tools together:

  1. Tableau Prep to prepare and structure the dataset
  2. Tableau Desktop to explore, analyze, and visualize the data

Best Practices for Data Cleaning and Reshaping

  • Inspect datasets before building visualizations
  • Verify data types and formats
  • Handle NULL values explicitly
  • Convert wide datasets into tidy structures when needed
  • Document data transformations

Properly cleaned and reshaped data leads to more reliable analysis and better-performing dashboards.