Data Analytics Bootcamp

Session 08: Data Visualization | Plotly

Plotly
Data Visualization

Intro to Plotly

Plotly is a Python library for creating interactive visualizations. It is widely used in analytics, data science, business intelligence, dashboards, and reporting because it allows users not only to see a chart, but also to interact with it.

Unlike static plotting libraries, Plotly charts allow the audience to:

  • hover over data points to see exact values
  • zoom into specific regions
  • pan across the chart
  • hide or show categories from the legend
  • inspect complex charts more carefully

This makes Plotly especially useful when we want to move beyond simple chart display and support deeper data exploration.

Tip

Explore plotly here

Why Plotly is Important

Plotly is important for several reasons.

  1. It works naturally with Pandas DataFrames, so analysts can move directly from cleaned and transformed data to visualization.
  2. It supports a wide variety of chart types, from basic charts to more advanced business and analytical visuals.
  3. It fits well into modern Python tools such as:
    • Jupyter notebooks
    • Quarto documents
    • Dash applications
    • Streamlit applications

So Plotly is not only a charting library. It is also part of a larger ecosystem for analytical communication and interactive reporting.

Static vs Interactive Visualization

A useful way to understand Plotly is to compare static and interactive charts.

A static chart gives one fixed view. It is suitable for printed reports, PDFs, or slides where the figure is meant to be consumed passively.

An interactive chart gives the user control. The reader can inspect exact values, focus on specific sections, or compare categories dynamically.

This does not mean interactive charts are always better. It means that Plotly is particularly strong when the audience benefits from exploration.

Plotly Architecture

Plotly in Python is usually used in two main ways:

  • Plotly Express
  • Graph Objects

Plotly Express

Plotly Express is the high-level interface.

It is designed to make chart creation fast, concise, and readable. In many cases, a complete interactive chart can be built with a single function call.

It is especially useful when:

  • the data is already tidy
  • the goal is to build a standard chart quickly
  • we want to map variables to color, size, symbol, or facets in a simple way

Plotly Express is often the best starting point for analysts because it reduces boilerplate and helps students focus on chart logic rather than technical details.

Graph Objects

plotly.graph_objects is the lower-level interface.

It provides more control and flexibility. It is useful when:

  • we need custom traces
  • we want complex layouts
  • we need subplots
  • we want advanced annotations or specialized visual structures

In practice, many analysts start with Plotly Express and move to Graph Objects when they need more control.

Plotly and the Grammar of Graphics

Plotly also connects well to the idea of the grammar of graphics.

Instead of thinking only in terms of chart names, we can think in terms of components:

  • data: the table behind the chart
  • mapping: how variables are assigned to axes or visual properties
  • geometry: bars, lines, points, areas, flows
  • aesthetics: color, size, labels, symbols
  • annotations: average lines, reference markers, labels, notes

This way of thinking is useful because it teaches students that charting is not only about memorizing functions. It is about translating business questions into visual structure.

Common Chart Families in Plotly

Plotly supports many chart families. Some of the most common are:

  • bar plots for category comparison
  • line plots for trends over time
  • histograms for distributions
  • scatter plots for relationships between variables
  • multi-line charts for comparing trends across groups

These chart types form the foundation of most analytical reporting.

A Practical Workflow

A common analytical workflow with Plotly looks like this:

\[ \text{Raw Data} \rightarrow \text{Cleaning} \rightarrow \text{Transformation} \rightarrow \text{Aggregation} \rightarrow \text{Visualization in Plotly} \rightarrow \text{Insight} \]

This reminds students that visualization is not the first step. Plotly becomes most useful after the data is already structured for analysis.

Dummy Dataset for Plotly Examples

Before introducing the major chart types, let us create a small synthetic dataset.

import pandas as pd
import numpy as np
import plotly.express as px

np.random.seed(42)
px.defaults.template = "plotly_white"

months = pd.date_range("2024-01-01", periods=6, freq="MS")
regions = ["North", "South", "East"]
products = ["A", "B"]

rows = []

for month in months:
    for region in regions:
        for product in products:
            sales = np.random.randint(80, 220)
            customers = np.random.randint(20, 90)
            units = np.random.randint(10, 70)
            
            rows.append([month, region, product, sales, customers, units])

df_plotly = pd.DataFrame(
    rows,
    columns=["month", "region", "product", "sales", "customers", "units"]
)

df_plotly["month_name"] = df_plotly["month"].dt.strftime("%b")
df_plotly.head()

       month region product  sales  customers  units month_name
0 2024-01-01  North       A    182         71     38        Jan
1 2024-01-01  North       B     94         80     30        Jan
2 2024-01-01  South       A    182         43     12        Jan
3 2024-01-01  South       B    132         21     33        Jan
4 2024-01-01   East       A    117         21     69        Jan

This dummy dataset gives us:

  • a time variable: month
  • categorical variables: region, product
  • numeric variables: sales, customers, units

This structure is enough to introduce the most common chart types in Plotly.

Random Seed

Warning: Random Seed

What does np.random.seed(42) mean?

Computers are not truly random. They use algorithms called pseudo-random number generators (PRNGs).

These algorithms:

  • Take an initial value → the seed
  • Then produce a sequence of numbers based on it

If you use the same seed, you get exactly the same sequence every time. This is why everyone who runs the code above with seed 42 gets the same dummy dataset.
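A quick way to convince yourself of this is to reset the seed and draw again. The sketch below reuses the same randint range as the sales column; the variable names are only for illustration:

```python
import numpy as np

np.random.seed(42)
first_run = np.random.randint(80, 220, size=5)

np.random.seed(42)  # reset to the same seed
second_run = np.random.randint(80, 220, size=5)

print(first_run)
print(second_run)
print((first_run == second_run).all())  # True: both sequences are identical
```

Without the second call to np.random.seed(42), the generator would simply continue its sequence and the two arrays would almost certainly differ.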

Plotly Templates

Here you can explore Plotly templates. In this program we will stick with plotly_white. However, I highly encourage you to explore other templates and adapt them to your projects:

px.defaults.template = "plotly_white"

The default template is "plotly".

Available templates:

  • "ggplot2"
  • "seaborn"
  • "simple_white"
  • "plotly"
  • "plotly_white"
  • "plotly_dark"
  • "presentation"
  • "xgridoff"
  • "ygridoff"
  • "gridon"
  • "none"

Example 1: Bar Plot

A bar plot is used when we want to compare values across discrete categories.

Typical examples include:

  • sales by region
  • customers by segment
  • revenue by product category

For this example, we compare total sales across regions.

region_sales = (
    df_plotly.groupby("region", as_index=False)["sales"]
             .sum()
)

region_sales

  region  sales
0   East   1559
1  North   1800
2  South   1674

fig = px.bar(
    region_sales,
    x="region",
    y="sales",
    text="sales",
    title="Total Sales by Region"
)

fig.update_traces(textposition="outside")
fig.update_layout(
    xaxis_title="Region",
    yaxis_title="Sales"
)

fig.show()

Interpretation

This chart helps answer:

  • Which region has the highest total sales?
  • Which region performs the weakest?
  • How large are the differences across regions?

Example 2: Line Plot

A line plot is used when the x-axis has a natural order, most often time. In other words, a line plot is useful for showing change over time.

Typical examples include:

  • monthly sales
  • daily website traffic
  • weekly active users

For example, we may want to study monthly total sales.

monthly_sales = (
    df_plotly.groupby("month", as_index=False)["sales"]
             .sum()
)

monthly_sales

       month  sales
0 2024-01-01    807
1 2024-02-01    810
2 2024-03-01    831
3 2024-04-01    855
4 2024-05-01    934
5 2024-06-01    796

fig = px.line(
    monthly_sales,
    x="month",
    y="sales",
    markers=True,
    title="Monthly Total Sales"
)

fig.update_layout(
    xaxis_title="Month",
    yaxis_title="Sales"
)

fig.show()

Interpretation

This chart helps answer:

  • Is sales performance increasing or decreasing?
  • Which month had the highest sales?
  • Are there visible fluctuations over time?

Example 3: Histogram

A histogram is used to show the distribution of a numeric variable.

Typical examples include:

  • revenue distribution
  • customer age distribution
  • distribution of transaction amounts

For example, we may want to understand how sales values are distributed across all observations.

fig = px.histogram(
    df_plotly,
    x="sales",
    nbins=15,
    title="Distribution of Sales"
)

fig.update_layout(
    xaxis_title="Sales",
    yaxis_title="Count"
)

fig.show()

Interpretation

This chart helps answer:

  • Are most sales values concentrated in one range?
  • Is the distribution symmetric or skewed?
  • Are there unusually small or large values?

Example 4: Scatter Plot

A scatter plot is used to study the relationship between two numeric variables.

Typical examples include:

  • advertising spend vs sales
  • income vs spending
  • customers vs revenue

For example, we may want to see whether more customers are associated with higher sales.

fig = px.scatter(
    df_plotly,
    x="customers",
    y="sales",
    color="region",
    hover_data=["product", "month_name", "units"],
    title="Customers vs Sales"
)

fig.update_layout(
    xaxis_title="Customers",
    yaxis_title="Sales",
    legend_title="Region"
)

fig.show()

Interpretation

This chart helps answer:

  • Do higher customer counts tend to correspond to higher sales?
  • Are there outliers?
  • Do regions behave differently?

Example 5: Multi-Line Chart

A multi-line chart is a grouped line chart. It allows us to compare trends across categories over time.

Typical examples include:

  • sales by region over time
  • churn rate by segment across months
  • traffic by channel over several weeks

For example, we may want to compare monthly sales by region.

monthly_region_sales = (
    df_plotly.groupby(["month", "region"], as_index=False)["sales"]
             .sum()
)

monthly_region_sales.head()

       month region  sales
0 2024-01-01   East    217
1 2024-01-01  North    276
2 2024-01-01  South    314
3 2024-02-01   East    273
4 2024-02-01  North    305

fig = px.line(
    monthly_region_sales,
    x="month",
    y="sales",
    color="region",
    markers=True,
    title="Monthly Sales by Region"
)

fig.update_layout(
    xaxis_title="Month",
    yaxis_title="Sales",
    legend_title="Region"
)

fig.show()

Interpretation

This chart helps answer:

  • Which region is strongest over time?
  • Which region is most volatile?
  • Do all regions move in the same direction?

Example 6: Grouped Bar Plot

A grouped bar plot extends the basic bar plot by introducing one more categorical variable. Instead of showing only one bar per category, it allows us to compare subgroups inside each main category.

For example, we may want to compare sales by region and product at the same time. This helps us answer not only which region performs better, but also whether the same pattern holds across products.

region_product_sales = (
    df_plotly.groupby(["region", "product"], as_index=False)["sales"]
             .sum()
)

region_product_sales

  region product  sales
0   East       A    853
1   East       B    706
2  North       A    888
3  North       B    912
4  South       A    958
5  South       B    716

Understanding the Main Arguments

Before executing the code, let us understand the key arguments of px.bar() in this example.

  • data_frame = the dataset used for plotting
  • x = the main categorical variable shown on the x-axis
  • y = the numeric variable represented by the bar height
  • color = the variable used to split bars into subgroups
  • barmode="group" = places the subgroup bars side by side
  • title = chart title

In our example:

  • x="region" means each main category on the x-axis is a region
  • y="sales" means bar heights represent sales
  • color="product" means each region is split into product-based bars
  • barmode="group" means those bars appear side by side instead of stacked

fig = px.bar(
    region_product_sales,
    x="region",
    y="sales",
    color="product",
    barmode="group",
    title="Sales by Region and Product"
)

fig.update_layout(
    xaxis_title="Region",
    yaxis_title="Sales",
    legend_title="Product"
)

fig.show()

Interpretation

This chart helps answer:

  • Which product performs better within each region?
  • Are some regions strong across both products?
  • Does one product dominate across all regions?

Tip: Try It Yourself

Students should experiment with the following changes:

  • change y="sales" to another numeric column such as customers or units
  • change color="product" to color="region" and observe what happens
  • change barmode="group" to barmode="stack"
  • change the title to reflect the new chart meaning

Example 7: Donut Chart

A donut chart is a variation of a pie chart with a hole in the center. It is useful when we want to show how a total is divided across categories.

For example, we may want to understand how total sales are distributed across products.

product_sales = (
    df_plotly.groupby("product", as_index=False)["sales"]
             .sum()
)

product_sales

  product  sales
0       A   2699
1       B   2334

Understanding the Main Arguments

Before executing the code, let us understand the key arguments of px.pie() in this example.

  • data_frame = the dataset used for plotting
  • names = the categorical variable that defines the slices
  • values = the numeric variable that determines slice sizes
  • hole = controls the size of the empty center and turns the pie chart into a donut chart
  • title = chart title

In our example:

  • names="product" means each slice represents a product
  • values="sales" means slice size depends on total sales
  • hole=0.5 creates the donut shape

Try It Yourself

Students should experiment with the following changes:

  • change values="sales" to values="customers" or values="units"
  • change hole=0.5 to hole=0.2 or hole=0.7
  • change names="product" to names="region" after preparing a suitable aggregated table
  • change the title to reflect the new chart meaning

fig = px.pie(
    product_sales,
    names="product",
    values="sales",
    hole=0.5,
    title="Share of Total Sales by Product"
)

fig.update_traces(textinfo="label+percent")

fig.show()

Changing Colors in Plotly

In Plotly, colors can be changed in several ways depending on the chart type and how much control you want.

The most common approaches are:

  • set a single color for the whole chart
  • assign different colors by category
  • provide a custom color sequence
  • manually control colors in traces

Tip

Here you can find some interesting palettes.

We will take a deeper dive into color palettes during the Tableau sessions.

Common Color Formats in Plotly

Plotly accepts several color formats:

  • named colors: "blue", "red", "green"
  • hex colors: "#3B6EAD"
  • RGB: "rgb(59,110,173)"
  • RGBA: "rgba(59,110,173,0.5)"

Single Color for the Whole Plot

If you want all bars, points, or lines to have the same color, you can use color_discrete_sequence.

fig = px.bar(
    region_sales,
    x="region",
    y="sales",
    title="Total Sales by Region",
    color_discrete_sequence=["#a34e31"]
)

fig.show()

Here:

  • color_discrete_sequence=["#a34e31"] tells Plotly to use one color
  • you can replace "#a34e31" with any valid CSS color name (for example "steelblue") or hex code

Tip: Try It Yourself

Change "#a34e31" to:

  • "orange"
  • "green"
  • "#3B6EAD"
  • "#B7C2D1"

Different Colors by Category

If your chart uses a grouping variable such as color="product" or color="region", Plotly automatically assigns colors.

You can override those defaults with color_discrete_sequence.

Below is a grouped bar plot with custom colors:

fig = px.bar(
    region_product_sales,
    x="region",
    y="sales",
    color="product",
    barmode="group",
    title="Sales by Region and Product",
    color_discrete_sequence=["#3B6EAD", "#AFC4E8"]
)

fig.show()

Here:

  • the first category gets the first color
  • the second category gets the second color

Map Specific Categories to Specific Colors

If you want full control over which category gets which color, use color_discrete_map. In other words, each product can be given a fixed color.

fig = px.bar(
    region_product_sales,
    x="region",
    y="sales",
    color="product",
    barmode="group",
    title="Sales by Region and Product",
    color_discrete_map={
        "A": "#3B6EAD",
        "B": "#AFC4E8"
    }
)

fig.show()

This is often better than color_discrete_sequence when you want consistency across many charts.

Change Line Colors

For line charts, the same logic applies.

Single Line Color

fig = px.line(
    monthly_sales,
    x="month",
    y="sales",
    markers=True,
    title="Monthly Total Sales",
    color_discrete_sequence=["crimson"]
)

fig.show()

Multi-Line Colors

fig = px.line(
    monthly_region_sales,
    x="month",
    y="sales",
    color="region",
    markers=True,
    title="Monthly Sales by Region",
    color_discrete_sequence=["#1f77b4", "#ff7f0e", "#2ca02c"]
)

fig.show()

Change Histogram Colors

fig = px.histogram(
    df_plotly,
    x="sales",
    nbins=15,
    title="Distribution of Sales",
    color_discrete_sequence=["purple"]
)

fig.show()

Change Scatter Plot Colors

fig = px.scatter(
    df_plotly,
    x="customers",
    y="sales",
    title="Customers vs Sales",
    color_discrete_sequence=["darkorange"]
)

fig.show()

fig = px.scatter(
    df_plotly,
    x="customers",
    y="sales",
    color="region",
    title="Customers vs Sales by Region",
    color_discrete_sequence=["#3B6EAD", "#AFC4E8", "#B7C2D1"]
)

fig.show()

Change Colors After the Figure is Created

You can also modify colors after building the figure.

Example 1: Update Trace Color

fig = px.bar(
    region_sales,
    x="region",
    y="sales",
    title="Total Sales by Region"
)

fig.update_traces(marker_color="teal")

fig.show()

Example 2: Update Line Color

fig = px.line(
    monthly_sales,
    x="month",
    y="sales",
    title="Monthly Total Sales"
)

fig.update_traces(line_color="red")

fig.show()

Recommendation

A practical rule for learning is:

  • use color_discrete_sequence when you just want nicer colors
  • use color_discrete_map when you want specific categories to always keep the same colors
  • use update_traces() when you want to modify the figure after it is already created

Tip: Try It Yourself

Take one of your earlier charts and try all of the following:

  • apply one single custom color
  • apply two category colors
  • map exact colors to product names
  • update the color after the figure is created

This helps you understand that color in Plotly is not fixed. It is another argument you can control.

Highlighting Only the Specific Categories

Sometimes we want to guide the audience’s attention very deliberately. Instead of giving every category a different color, we can keep most categories in a neutral tone and highlight only the largest/smallest category.

This is a very useful analytical design technique because it helps the chart communicate one main message clearly.

It is especially useful when:

  • we want to emphasize the top-performing category
  • we want to reduce visual clutter
  • we want to make the most important comparison obvious
  • we want to keep the chart clean and readable

The same logic can be reused across multiple chart types.

The general pattern is:

  1. aggregate the data
  2. find the category with the highest/lowest value
  3. create a helper column for coloring
  4. assign one color to the largest category and another color to the rest
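The four steps above can be wrapped in a small reusable helper. This is only a sketch: the function name add_highlight, its parameters, and the demo data are made up for illustration and are not part of Plotly or pandas:

```python
import numpy as np
import pandas as pd


def add_highlight(summary, category_col, value_col,
                  label="Highlighted", other="Other", largest=True):
    """Return a copy of `summary` with a 'highlight' column.

    The row with the largest (or smallest) value gets `label`,
    every other row gets `other`.
    """
    values = summary[value_col]
    idx = values.idxmax() if largest else values.idxmin()
    target = summary.loc[idx, category_col]

    out = summary.copy()
    out["highlight"] = np.where(out[category_col] == target, label, other)
    return out


# Step 1: aggregate (hypothetical raw data)
raw = pd.DataFrame({
    "region": ["North", "North", "South", "South", "East", "East"],
    "sales": [100, 80, 60, 90, 70, 50],
})
summary = raw.groupby("region", as_index=False)["sales"].sum()

# Steps 2 and 3: find the top category and build the helper column
highlighted = add_highlight(
    summary, "region", "sales",
    label="Highest Sales", other="Other Regions",
)
print(highlighted)
```

Step 4 then happens in the chart call, by passing color="highlight" together with a color_discrete_map that has one entry per label.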

Highlighting the Largest Category in a Bar Plot

Preparing the Data

A bar plot is one of the most natural places to use this technique.

  1. Build a summary table by region: with as_index=False, region stays a regular column instead of becoming the index
  2. Find the region with the highest sales: idxmax() returns the row label of the largest value, which we use to look up top_region (idxmin() would return the row with the lowest value)
  3. Create a helper column for highlighting

region_sales = (
    df_plotly.groupby("region", as_index=False)["sales"]
             .sum()
)

top_region = region_sales.loc[region_sales["sales"].idxmax(), "region"]

region_sales["highlight"] = np.where(
    region_sales["region"] == top_region,
    "Highest Sales",
    "Other Regions"
)

region_sales

  region  sales      highlight
0   East   1559  Other Regions
1  North   1800  Highest Sales
2  South   1674  Other Regions

Creating the Bar Plot

fig = px.bar(
    region_sales,
    x="region",
    y="sales",
    color="highlight",
    text="sales",
    title="Highlighting Only the Largest Category",
    color_discrete_map={
        "Highest Sales": "#3B6EAD",
        "Other Regions": "#D9D9D9"
    }
)

fig.update_traces(textposition="outside")

fig.update_layout(
    xaxis_title="Region",
    yaxis_title="Sales",
    legend_title=""
)

fig.show()

Important: Text Position

In Plotly, textposition controls where the label (text) is placed relative to a data point, bar, or shape:

  • "inside" → text inside the bar
  • "outside" → text outside (at the end of the bar)
  • "auto" → smart placement (inside unless too small → then outside)
  • "none" → hides text

Tip: Try It Yourself
  • change the highlight color
  • change the neutral color
  • replace sales with customers
  • repeat the same logic for product

Highlighting the Largest Category in a Donut Chart

A donut chart can also highlight the largest category very effectively.

Preparing the Data

product_sales = (
    df_plotly.groupby("product", as_index=False)["sales"]
             .sum()
)

top_product = product_sales.loc[product_sales["sales"].idxmax(), "product"]

product_sales["highlight"] = np.where(
    product_sales["product"] == top_product,
    "Highest Sales",
    "Other Products"
)

product_sales

  product  sales       highlight
0       A   2699   Highest Sales
1       B   2334  Other Products

Tip: Try It Yourself
  • change the hole size
  • highlight by customers instead of sales
  • create the same chart for region

Creating the Visualization

fig = px.pie(
    product_sales,
    names="product",
    values="sales",
    color="highlight",
    hole=0.5,
    title="Highlighting Only the Largest Category",
    color_discrete_map={
        "Highest Sales": "#3B6EAD",
        "Other Products": "#D9D9D9"
    }
)

fig.update_traces(textinfo="label+percent")

fig.show()

Highlighting the Smallest Category in a Scatter Plot

In a scatter plot, this technique can be used to highlight only the observations that belong to the smallest category.

Here we first identify the region with the lowest total sales, then color all points from that region differently.

Preparing the Data

region_totals = (
    df_plotly.groupby("region", as_index=False)["sales"]
             .sum()
)

# idxmin() returns the row label of the smallest total
bottom_region = region_totals.loc[region_totals["sales"].idxmin(), "region"]

df_scatter = df_plotly.copy()

df_scatter["highlight"] = np.where(
    df_scatter["region"] == bottom_region,
    "Lowest Sales Region",
    "Other Regions"
)

df_scatter.head()

       month region product  sales  customers  units month_name            highlight
0 2024-01-01  North       A    182         71     38        Jan        Other Regions
1 2024-01-01  North       B     94         80     30        Jan        Other Regions
2 2024-01-01  South       A    182         43     12        Jan        Other Regions
3 2024-01-01  South       B    132         21     33        Jan        Other Regions
4 2024-01-01   East       A    117         21     69        Jan  Lowest Sales Region

Tip: Try It Yourself

  • change the highlighted grouping from region to product
  • change x="customers" to x="units"
  • add size="units"

Building the Plot

fig = px.scatter(
    df_scatter,
    x="customers",
    y="sales",
    color="highlight",
    hover_data=["region", "product", "month_name", "units"],
    title="Highlighting the Smallest Category",
    color_discrete_map={
        "Lowest Sales Region": "#3B6EAD",
        "Other Regions": "#D9D9D9"
    }
)

fig.update_layout(
    xaxis_title="Customers",
    yaxis_title="Sales",
    legend_title=""
)

fig.show()

Highlighting the Largest Category in a Bubble Chart

A bubble chart is similar to a scatter plot, but it adds size as another visual dimension.

Building the Data

# Recompute the region with the highest total sales,
# so this section does not depend on an earlier variable
region_totals = (
    df_plotly.groupby("region", as_index=False)["sales"]
             .sum()
)

top_region = region_totals.loc[region_totals["sales"].idxmax(), "region"]

df_bubble = df_plotly.copy()

df_bubble["highlight"] = np.where(
    df_bubble["region"] == top_region,
    "Highest Sales Region",
    "Other Regions"
)

df_bubble.head()

       month region product  sales  customers  units month_name             highlight
0 2024-01-01  North       A    182         71     38        Jan  Highest Sales Region
1 2024-01-01  North       B     94         80     30        Jan  Highest Sales Region
2 2024-01-01  South       A    182         43     12        Jan         Other Regions
3 2024-01-01  South       B    132         21     33        Jan         Other Regions
4 2024-01-01   East       A    117         21     69        Jan         Other Regions

Tip: Try It Yourself

  • change size="units" to another numeric variable
  • highlight the top product instead of the top region
  • compare the bubble chart to the simpler scatter plot

fig = px.scatter(
    df_bubble,
    x="customers",
    y="sales",
    size="units",
    color="highlight",
    hover_data=["region", "product", "month_name"],
    title="Highlighting the Largest Category",
    color_discrete_map={
        "Highest Sales Region": "#3B6EAD",
        "Other Regions": "#D9D9D9"
    }
)

fig.update_layout(
    xaxis_title="Customers",
    yaxis_title="Sales",
    legend_title=""
)

fig.show()

Highlighting the Largest Category in a Line Chart

For a simple line chart, there is only one line, so highlighting a category is not applicable in exactly the same way. But we can highlight the maximum point instead.

This is a closely related analytical idea.

monthly_sales = (
    df_plotly.groupby("month", as_index=False)["sales"]
             .sum()
)

max_month_row = monthly_sales.loc[monthly_sales["sales"].idxmax()]
max_month_row

month    2024-05-01 00:00:00
sales                    934
Name: 4, dtype: object

Tip: Try It Yourself

  • highlight the minimum point instead
  • change marker size
  • change marker color

fig = px.line(
    monthly_sales,
    x="month",
    y="sales",
    markers=True,
    title="Highlighting the Maximum Point"
)

fig.add_scatter(
    x=[max_month_row["month"]],
    y=[max_month_row["sales"]],
    mode="markers",
    marker=dict(size=14, color="#3B6EAD"),
    name="Maximum"
)

fig.update_layout(
    xaxis_title="Month",
    yaxis_title="Sales"
)

fig.show()

Highlighting the Largest Category in a Multi-Line Chart

For a grouped line chart, we can highlight the category with the highest total and keep the other lines neutral.

Data Preparation

monthly_region_sales = (
    df_plotly.groupby(["month", "region"], as_index=False)["sales"]
             .sum()
)

region_totals = (
    monthly_region_sales.groupby("region", as_index=False)["sales"]
                        .sum()
)

top_region = region_totals.loc[region_totals["sales"].idxmax(), "region"]

monthly_region_sales["highlight"] = np.where(
    monthly_region_sales["region"] == top_region,
    monthly_region_sales["region"],
    "Other Regions"
)

monthly_region_sales.head()

       month region  sales      highlight
0 2024-01-01   East    217  Other Regions
1 2024-01-01  North    276          North
2 2024-01-01  South    314  Other Regions
3 2024-02-01   East    273  Other Regions
4 2024-02-01  North    305          North

Tip: Try It Yourself
  • highlight the top product instead of the top region
  • change the highlighted color
  • change line width using update_traces()

Building the Plot

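fig = px.line(
    monthly_region_sales,
    x="month",
    y="sales",
    color="highlight",
    line_group="region",  # keep one line per region; otherwise all "Other Regions" rows are joined into a single zigzag line
    markers=True,
    title="Highlighting Only the Largest Category",
    color_discrete_map={
        top_region: "#3B6EAD",
        "Other Regions": "#D9D9D9"
    }
)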

fig.update_layout(
    xaxis_title="Month",
    yaxis_title="Sales",
    legend_title=""
)

fig.show()

Highlighting the Largest Category in a Histogram

A histogram shows a distribution, so categories are not always directly visible. But if we color observations by whether they belong to the largest category, we can compare the highlighted group against all others.

Preparing the Data

df_hist = df_plotly.copy()

df_hist["highlight"] = np.where(
    df_hist["region"] == top_region,
    "Highest Sales Region",
    "Other Regions"
)

df_hist.head()
month region product sales customers units month_name highlight
0 2024-01-01 North A 182 71 38 Jan Highest Sales Region
1 2024-01-01 North B 94 80 30 Jan Highest Sales Region
2 2024-01-01 South A 182 43 12 Jan Other Regions
3 2024-01-01 South B 132 21 33 Jan Other Regions
4 2024-01-01 East A 117 21 69 Jan Other Regions

barmode="overlay" = draws the bars of both groups on top of each other, so the two distributions can be compared directly

Try It Yourself
  • switch from sales to customers
  • try barmode="group"
  • reduce opacity for cleaner overlap

Building the Plot

fig = px.histogram(
    df_hist,
    x="sales",
    color="highlight",
    barmode="overlay",
    opacity=0.7,
    title="Distribution with Highlighted Largest Category",
    color_discrete_map={
        "Highest Sales Region": "#3B6EAD",
        "Other Regions": "#D9D9D9"
    }
)

fig.update_layout(
    xaxis_title="Sales",
    yaxis_title="Count",
    legend_title=""
)

fig.show()

Highlighting the Largest Category in a Grouped Bar Plot

For a grouped bar plot, we can highlight only the subgroup belonging to the largest category.

Here we highlight the product with the highest total sales.

Preparing the Data

region_product_sales = (
    df_plotly.groupby(["region", "product"], as_index=False)["sales"]
             .sum()
)

product_totals = (
    region_product_sales.groupby("product", as_index=False)["sales"]
                        .sum()
)

top_product = product_totals.loc[product_totals["sales"].idxmax(), "product"]

region_product_sales["highlight"] = np.where(
    region_product_sales["product"] == top_product,
    top_product,
    "Other Products"
)

region_product_sales
region product sales highlight
0 East A 853 A
1 East B 706 Other Products
2 North A 888 A
3 North B 912 Other Products
4 South A 958 A
5 South B 716 Other Products
Try It Yourself
  • instead of product, highlight the top region
  • use customers instead of sales
  • switch to stacked bars

Building the Plot

fig = px.bar(
    region_product_sales,
    x="region",
    y="sales",
    color="highlight",
    barmode="group",
    title="Highlighting Only the Largest Category",
    color_discrete_map={
        top_product: "#3B6EAD",
        "Other Products": "#D9D9D9"
    }
)

fig.update_layout(
    xaxis_title="Region",
    yaxis_title="Sales",
    legend_title=""
)

fig.show()

Vertical and Horizontal Reference Lines

Sometimes a chart becomes much more useful when we add a reference line. A reference line helps us move beyond simply displaying values and begin interpreting them.

Vertical and horizontal lines are commonly used to show:

  • average values
  • targets or thresholds
  • important dates
  • campaign launch points
  • policy changes
  • before-and-after comparisons

These lines are not decorative. They are analytical tools. They help the audience answer questions such as:

  • Which observations are above average?
  • Which values are below the target?
  • What happened after a major event?
  • Did the pattern change after a certain date?

Why Reference Lines Matter

A chart without a reference line may show a pattern, but a chart with a reference line often shows meaning more clearly.

For example:

  • a line chart of monthly sales shows trend
  • a line chart of monthly sales with an average line shows which months performed above or below the overall benchmark
  • a line chart with a campaign start marker helps separate pre-campaign and post-campaign periods

So reference lines help connect the chart to a business question.

Horizontal Lines

A horizontal line is usually used to show a benchmark on the y-axis.

Common examples include:

  • average sales
  • target revenue
  • minimum acceptable performance
  • upper control limit
  • lower control limit

Suppose we want to examine monthly sales and compare each month against the average monthly sales.

First, let us create the summary table.

monthly_sales = (
    df_plotly.groupby("month", as_index=False)["sales"]
             .sum()
)

monthly_sales
month sales
0 2024-01-01 807
1 2024-02-01 810
2 2024-03-01 831
3 2024-04-01 855
4 2024-05-01 934
5 2024-06-01 796

Now let us compute the average.

avg_sales = monthly_sales["sales"].mean()
avg_sales
np.float64(838.8333333333334)
Understanding the Main Arguments

Before building the chart, let us understand the important parts.

We first create the chart with px.line().

Then we add a reference line with fig.add_hline().

The most important arguments are:

  • y = the y-value where the horizontal line will be placed
  • line_dash = controls the line style, such as solid, dash, or dot
  • annotation_text = the label shown near the line
  • annotation_position = where the label appears

In our example:

  • y=avg_sales places the line at the average sales value
  • line_dash="dash" makes the line visually different from the main chart line
  • annotation_text="Average Sales" adds an explanatory label

Try It Yourself

Try to experiment with the following changes:

  • change the line style from "dash" to "dot"
  • replace the average with a fixed target value such as 900
  • change the annotation text
  • change the annotation position
fig = px.line(
    monthly_sales,
    x="month",
    y="sales",
    markers=True,
    title="Line Plot with Horizontal Reference Line"
)

fig.add_hline(
    y=avg_sales,
    line_dash="dash",
    annotation_text="Average Sales",
    annotation_position="top left"
)

fig.update_layout(
    xaxis_title="Month",
    yaxis_title="Sales"
)

fig.show()

Interpretation

This chart helps answer:

  • Which months are above average?
  • Which months are below average?
  • Is performance consistently above the benchmark or not?
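
The same questions can also be answered numerically. The sketch below is a small pandas-only check that labels each month relative to the average; the column name vs_average is an assumption introduced for this example.

```python
import numpy as np
import pandas as pd

# Monthly totals copied from the summary table above
monthly_sales = pd.DataFrame({
    "month": pd.date_range("2024-01-01", periods=6, freq="MS"),
    "sales": [807, 810, 831, 855, 934, 796],
})

avg_sales = monthly_sales["sales"].mean()

# Label each month by its position relative to the benchmark
monthly_sales["vs_average"] = np.where(
    monthly_sales["sales"] > avg_sales,
    "Above average",
    "Below average"
)

print(monthly_sales)
```

With these values, only April and May are above the average of roughly 838.8, which matches what the reference line shows visually.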

Vertical Lines

A vertical line is usually used to mark a specific point on the x-axis.

Common examples include:

  • campaign start date
  • product launch date
  • policy change date
  • market shock
  • start of a promotion

When the x-axis is time, vertical lines are especially useful.

Suppose a campaign started on April 1, 2024, and we want to mark that point on the chart.

Important Note

When the x-axis holds datetime values, calling add_vline() with annotation_text=... can raise an error in some Plotly versions. A safer and more reliable pattern is:

  • add the vertical line with fig.add_vline()
  • add the label separately with fig.add_annotation()

Understanding the Main Arguments

For the vertical line:

  • x = the x-value where the vertical line will be placed
  • line_dash = the style of the line
  • line_color = the color of the line

For the annotation:

  • x = where the label points
  • y = the vertical position of the label anchor
  • text = label text
  • showarrow=True = displays an arrow
  • ax and ay = control the arrow direction and text position

Try It Yourself

Try to experiment with the following changes:

  • move the event date to another month
  • change the line color
  • change the annotation text
  • adjust ax and ay to move the label
campaign_start = pd.Timestamp("2024-04-01")

fig = px.line(
    monthly_sales,
    x="month",
    y="sales",
    markers=True,
    title="Line Plot with Vertical Event Line"
)

fig.add_vline(
    x=campaign_start,
    line_dash="dot",
    line_color="red"
)

fig.add_annotation(
    x=campaign_start,
    y=monthly_sales["sales"].max(),
    text="Campaign Start",
    showarrow=True,
    arrowhead=2,
    ax=40,
    ay=-40
)

fig.update_layout(
    xaxis_title="Month",
    yaxis_title="Sales"
)

fig.show()

Interpretation

This chart helps answer:

  • What happened before the campaign?
  • What happened after the campaign?
  • Is there a visible shift in the pattern after the event?
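
A quick numeric companion to the chart is to compare average sales before and after the event. This is a minimal sketch; the column name period and its labels are assumptions introduced for this example.

```python
import numpy as np
import pandas as pd

# Monthly totals copied from the summary table in this session
monthly_sales = pd.DataFrame({
    "month": pd.date_range("2024-01-01", periods=6, freq="MS"),
    "sales": [807, 810, 831, 855, 934, 796],
})

campaign_start = pd.Timestamp("2024-04-01")

# Months on or after the campaign start count as "After campaign"
monthly_sales["period"] = np.where(
    monthly_sales["month"] >= campaign_start,
    "After campaign",
    "Before campaign"
)

period_means = monthly_sales.groupby("period")["sales"].mean()
print(period_means)
```

Here the average is 816.0 before the campaign and about 861.7 after it. With only three months on each side, this difference describes the data but does not by itself show that the campaign caused the change.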

Horizontal and Vertical Lines Together

In many real analytical situations, both types of lines are useful together.

For example, we may want to show:

  • the average performance level
  • the moment when a campaign started

This gives the audience both:

  • a benchmark
  • an event marker

When both lines appear together:

  • the horizontal line helps compare values against a benchmark
  • the vertical line helps compare periods before and after an event

Together, the two lines let the audience judge both the level of performance and the timing of any change in a single chart.

Try It Yourself

Try to experiment with the following changes:

  • change the benchmark from average to a target value
  • move the campaign start to another date
  • change the colors and line styles
  • change the annotation text
campaign_start = pd.Timestamp("2024-04-01")
avg_sales = monthly_sales["sales"].mean()

fig = px.line(
    monthly_sales,
    x="month",
    y="sales",
    markers=True,
    title="Line Plot with Horizontal and Vertical Reference Lines"
)

fig.add_hline(
    y=avg_sales,
    line_dash="dash",
    annotation_text="Average Sales",
    annotation_position="top left"
)

fig.add_vline(
    x=campaign_start,
    line_dash="dot",
    line_color="red"
)

fig.add_annotation(
    x=campaign_start,
    y=monthly_sales["sales"].max(),
    text="Campaign Start",
    showarrow=True,
    arrowhead=2,
    ax=40,
    ay=-40
)

fig.update_layout(
    xaxis_title="Month",
    yaxis_title="Sales"
)

fig.show()