Session 08: Data Visualization | Plotly

Intro to Plotly

Plotly is a Python library for creating interactive visualizations. It is widely used in analytics, data science, business intelligence, dashboards, and reporting because it allows users not only to see a chart, but also to interact with it.

Unlike static plotting libraries, Plotly charts allow the audience to:

hover over data points to see exact values
zoom into specific regions
pan across the chart
hide or show categories from the legend
inspect complex charts more carefully

This makes Plotly especially useful when we want to move beyond simple chart display and support deeper data exploration.

Tip

Explore plotly here

Why Plotly is Important

Plotly is important for several reasons.

It works naturally with pandas DataFrames, which means analysts can move directly from cleaned and transformed data into visualization.
It supports a wide variety of chart types, from basic charts to more advanced business and analytical visuals.
It fits into modern Python tools such as:
- Jupyter notebooks
- Quarto documents
- Dash applications
- Streamlit applications

So Plotly is not only a charting library. It is also part of a larger ecosystem for analytical communication and interactive reporting.

Static vs Interactive Visualization

A useful way to understand Plotly is to compare static and interactive charts.

A static chart gives one fixed view. It is suitable for printed reports, PDFs, or slides where the figure is meant to be consumed passively.

An interactive chart gives the user control. The reader can inspect exact values, focus on specific sections, or compare categories dynamically.

This does not mean interactive charts are always better. It means that Plotly is particularly strong when the audience benefits from exploration.

Plotly Architecture

Plotly in Python is usually used in two main ways:

Plotly Express
Graph Objects

Plotly Express

Plotly Express is the high-level interface.

It is designed to make chart creation fast, concise, and readable. In many cases, a complete interactive chart can be built in a single line.

It is especially useful when:

the data is already tidy
the goal is to build a standard chart quickly
we want to map variables to color, size, symbol, or facets in a simple way

Plotly Express is often the best starting point for analysts because it reduces boilerplate and helps students focus on chart logic rather than technical details.

Graph Objects

plotly.graph_objects is the lower-level interface.

It provides more control and flexibility. It is useful when:

we need custom traces
we want complex layouts
we need subplots
we want advanced annotations or specialized visual structures

In practice, many analysts start with Plotly Express and move to Graph Objects when they need more control.

Plotly and the Grammar of Graphics

Plotly also connects well to the idea of the grammar of graphics.

Instead of thinking only in terms of chart names, we can think in terms of components:

data: the table behind the chart
mapping: how variables are assigned to axes or visual properties
geometry: bars, lines, points, areas, flows
aesthetics: color, size, labels, symbols
annotations: average lines, reference markers, labels, notes

This way of thinking is useful because it teaches students that charting is not only about memorizing functions. It is about translating business questions into visual structure.

Common Chart Families in Plotly

Plotly supports many chart families. Some of the most common are:

bar plots for category comparison
line plots for trends over time
histograms for distributions
scatter plots for relationships between variables
multi-line charts for comparing trends across groups

These chart types form the foundation of most analytical reporting.

A Practical Workflow

A common analytical workflow with Plotly looks like this:

\[ \text{Raw Data} \rightarrow \text{Cleaning} \rightarrow \text{Transformation} \rightarrow \text{Aggregation} \rightarrow \text{Visualization in Plotly} \rightarrow \text{Insight} \]

This reminds students that visualization is not the first step. Plotly becomes most useful after the data is already structured for analysis.

Dummy Dataset for Plotly Examples

Before introducing the major chart types, let us create a small synthetic dataset.

import pandas as pd
import numpy as np
import plotly.express as px

np.random.seed(42)
px.defaults.template = "plotly_white"

months = pd.date_range("2024-01-01", periods=6, freq="MS")
regions = ["North", "South", "East"]
products = ["A", "B"]

rows = []

for month in months:
    for region in regions:
        for product in products:
            sales = np.random.randint(80, 220)
            customers = np.random.randint(20, 90)
            units = np.random.randint(10, 70)
            
            rows.append([month, region, product, sales, customers, units])

df_plotly = pd.DataFrame(
    rows,
    columns=["month", "region", "product", "sales", "customers", "units"]
)

df_plotly["month_name"] = df_plotly["month"].dt.strftime("%b")
df_plotly.head()

	month	region	product	sales	customers	units	month_name
0	2024-01-01	North	A	182	71	38	Jan
1	2024-01-01	North	B	94	80	30	Jan
2	2024-01-01	South	A	182	43	12	Jan
3	2024-01-01	South	B	132	21	33	Jan
4	2024-01-01	East	A	117	21	69	Jan

This dummy dataset gives us:

a time variable: month
categorical variables: region, product
numeric variables: sales, customers, units

This structure is enough to introduce the most common chart types in Plotly.

Random Seed

What does np.random.seed(42) mean?

Computers are not truly random. They use algorithms called pseudo-random number generators (PRNGs).

These algorithms:

Take an initial value → the seed
Then produce a sequence of numbers based on it

If you use the same seed, you get exactly the same sequence every time. This is why the dummy dataset above contains the same values on every run.
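
A small sketch of this behavior: resetting the seed to the same value reproduces the same draws. The variable names are illustrative. The second half uses NumPy's newer Generator interface (`np.random.default_rng`), which is an alternative to the global seed and is not used elsewhere in this session.

```python
import numpy as np

# Same seed, same sequence.
np.random.seed(42)
first = np.random.randint(80, 220, size=5)

np.random.seed(42)  # reset to the same seed
second = np.random.randint(80, 220, size=5)

print(first)
print(second)
print((first == second).all())  # True: both runs produced the same values

# Alternative: a local generator object instead of the global seed.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
draw_a = rng_a.integers(80, 220, size=5)
draw_b = rng_b.integers(80, 220, size=5)
print((draw_a == draw_b).all())  # True for the same reason
```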

Plotly Templates

Here you can explore Plotly templates. In this program we will stick with plotly_white. However, I highly encourage you to explore other themes and adapt them to your projects:

px.defaults.template = "plotly_white"

The default template is 'plotly'

Available templates:

'ggplot2'
'seaborn'
'simple_white'
'plotly'
'plotly_white'
'plotly_dark'
'presentation'
'xgridoff'
'ygridoff'
'gridon'
'none'

Example 1: Bar Plot

A bar plot is used when we want to compare values across discrete categories.

Typical examples include:

sales by region
customers by segment
revenue by product category

For this example, we compare total sales across regions.

region_sales = (
    df_plotly.groupby("region", as_index=False)["sales"]
             .sum()
)

region_sales

	region	sales
0	East	1559
1	North	1800
2	South	1674

fig = px.bar(
    region_sales,
    x="region",
    y="sales",
    text="sales",
    title="Total Sales by Region"
)

fig.update_traces(textposition="outside")
fig.update_layout(
    xaxis_title="Region",
    yaxis_title="Sales"
)

fig.show()

Interpretation

This chart helps answer:

Which region has the highest total sales?
Which region performs the weakest?
How large are the differences across regions?

Example 2: Line Plot

A line plot is used when the x-axis has an order, most often time. In other words, a line plot is useful for showing change over time.

Typical examples include:

monthly sales
daily website traffic
weekly active users

For example, we may want to study monthly total sales.

monthly_sales = (
    df_plotly.groupby("month", as_index=False)["sales"]
             .sum()
)

monthly_sales

	month	sales
0	2024-01-01	807
1	2024-02-01	810
2	2024-03-01	831
3	2024-04-01	855
4	2024-05-01	934
5	2024-06-01	796

fig = px.line(
    monthly_sales,
    x="month",
    y="sales",
    markers=True,
    title="Monthly Total Sales"
)

fig.update_layout(
    xaxis_title="Month",
    yaxis_title="Sales"
)

fig.show()

Interpretation

This chart helps answer:

Is sales performance increasing or decreasing?
Which month had the highest sales?
Are there visible fluctuations over time?

Example 3: Histogram

A histogram is used to show the distribution of a numeric variable.

Typical examples include:

revenue distribution
customer age distribution
distribution of transaction amounts

For example, we may want to understand how sales values are distributed across all observations.

fig = px.histogram(
    df_plotly,
    x="sales",
    nbins=15,
    title="Distribution of Sales"
)

fig.update_layout(
    xaxis_title="Sales",
    yaxis_title="Count"
)

fig.show()

Interpretation

This chart helps answer:

Are most sales values concentrated in one range?
Is the distribution symmetric or skewed?
Are there unusually small or large values?

Example 4: Scatter Plot

A scatter plot is used to study the relationship between two numeric variables.

Typical examples include:

advertising spend vs sales
income vs spending
customers vs revenue

For example, we may want to see whether more customers are associated with higher sales.

fig = px.scatter(
    df_plotly,
    x="customers",
    y="sales",
    color="region",
    hover_data=["product", "month_name", "units"],
    title="Customers vs Sales"
)

fig.update_layout(
    xaxis_title="Customers",
    yaxis_title="Sales",
    legend_title="Region"
)

fig.show()

Interpretation

This chart helps answer:

Do higher customer counts tend to correspond to higher sales?
Are there outliers?
Do regions behave differently?

Example 5: Multi-Line Chart

A multi-line chart is a grouped line chart. It allows us to compare trends across categories over time.

Typical examples include:

sales by region over time
churn rate by segment across months
traffic by channel over several weeks

For example, we may want to compare monthly sales by region.

monthly_region_sales = (
    df_plotly.groupby(["month", "region"], as_index=False)["sales"]
             .sum()
)

monthly_region_sales.head()

	month	region	sales
0	2024-01-01	East	217
1	2024-01-01	North	276
2	2024-01-01	South	314
3	2024-02-01	East	273
4	2024-02-01	North	305

fig = px.line(
    monthly_region_sales,
    x="month",
    y="sales",
    color="region",
    markers=True,
    title="Monthly Sales by Region"
)

fig.update_layout(
    xaxis_title="Month",
    yaxis_title="Sales",
    legend_title="Region"
)

fig.show()

Interpretation

This chart helps answer:

Which region is strongest over time?
Which region is most volatile?
Do all regions move in the same direction?

Example 6: Grouped Bar Plot

A grouped bar plot extends the basic bar plot by introducing one more categorical variable. Instead of showing only one bar per category, it allows us to compare subgroups inside each main category.

For example, we may want to compare sales by region and product at the same time. This helps us answer not only which region performs better, but also whether the same pattern holds across products.

region_product_sales = (
    df_plotly.groupby(["region", "product"], as_index=False)["sales"]
             .sum()
)

region_product_sales

	region	product	sales
0	East	A	853
1	East	B	706
2	North	A	888
3	North	B	912
4	South	A	958
5	South	B	716

Understanding the Main Arguments

Before executing the code, let us understand the key arguments of px.bar() in this example.

data_frame = the dataset used for plotting
x = the main categorical variable shown on the x-axis
y = the numeric variable represented by the bar height
color = the variable used to split bars into subgroups
barmode="group" = places the subgroup bars side by side
title = chart title

In our example:

x="region" means each main category on the x-axis is a region
y="sales" means bar heights represent sales
color="product" means each region is split into product-based bars
barmode="group" means those bars appear side by side instead of stacked

fig = px.bar(
    region_product_sales,
    x="region",
    y="sales",
    color="product",
    barmode="group",
    title="Sales by Region and Product"
)

fig.update_layout(
    xaxis_title="Region",
    yaxis_title="Sales",
    legend_title="Product"
)

fig.show()

Interpretation

This chart helps answer:

Which product performs better within each region?
Are some regions strong across both products?
Does one product dominate across all regions?

Try It Yourself

Students should experiment with the following changes:

change y="sales" to another numeric column such as customers or units
change color="product" to color="region" and observe what happens
change barmode="group" to barmode="stack"
change the title to reflect the new chart meaning

Example 7: Donut Chart

A donut chart is a variation of a pie chart with a hole in the center. It is useful when we want to show how a total is divided across categories.

For example, we may want to understand how total sales are distributed across products.

product_sales = (
    df_plotly.groupby("product", as_index=False)["sales"]
             .sum()
)

product_sales

	product	sales
0	A	2699
1	B	2334

Understanding the Main Arguments

Before executing the code, let us understand the key arguments of px.pie() in this example.

data_frame = the dataset used for plotting
names = the categorical variable that defines the slices
values = the numeric variable that determines slice sizes
hole = controls the size of the empty center and turns the pie chart into a donut chart
title = chart title

In our example:

names="product" means each slice represents a product
values="sales" means slice size depends on total sales
hole=0.5 creates the donut shape

Try It Yourself

Students should experiment with the following changes:

change values="sales" to values="customers" or values="units"
change hole=0.5 to hole=0.2 or hole=0.7
change names="product" to names="region" after preparing a suitable aggregated table
change the title to reflect the new chart meaning

fig = px.pie(
    product_sales,
    names="product",
    values="sales",
    hole=0.5,
    title="Share of Total Sales by Product"
)

fig.update_traces(textinfo="label+percent")

fig.show()

Changing Colors in Plotly

In Plotly, colors can be changed in several ways depending on the chart type and how much control you want.

The most common approaches are:

set a single color for the whole chart
assign different colors by category
provide a custom color sequence
manually control colors in traces

Tip

Here you can find some interesting palettes.

We are going to take a deeper dive into color during the Tableau sessions.

Common Color Formats in Plotly

Plotly accepts several color formats:

named colors: "blue", "red", "green"
hex colors: "#3B6EAD"
RGB: "rgb(59,110,173)"
RGBA: "rgba(59,110,173,0.5)"

Single Color for the Whole Plot

If you want all bars, points, or lines to have the same color, you can use color_discrete_sequence.

fig = px.bar(
    region_sales,
    x="region",
    y="sales",
    title="Total Sales by Region",
    color_discrete_sequence=["#a34e31"]
)

fig.show()

Here:

color_discrete_sequence=["#a34e31"] tells Plotly to use one color
you can replace "#a34e31" with any valid CSS color name or hex code

Try It Yourself

Change "#a34e31" to:

"orange"
"green"
"#3B6EAD"
"#B7C2D1"

Different Colors by Category

If your chart uses a grouping variable such as color="product" or color="region", Plotly automatically assigns colors.

You can override those defaults with color_discrete_sequence.

Check out the grouped bar plot with custom colors below.

fig = px.bar(
    region_product_sales,
    x="region",
    y="sales",
    color="product",
    barmode="group",
    title="Sales by Region and Product",
    color_discrete_sequence=["#3B6EAD", "#AFC4E8"]
)

fig.show()

Here:

the first category gets the first color
the second category gets the second color

Map Specific Categories to Specific Colors

If you want full control over which category gets which color, use color_discrete_map. In other words, we can assign fixed colors to products.

fig = px.bar(
    region_product_sales,
    x="region",
    y="sales",
    color="product",
    barmode="group",
    title="Sales by Region and Product",
    color_discrete_map={
        "A": "#3B6EAD",
        "B": "#AFC4E8"
    }
)

fig.show()

This is often better than color_discrete_sequence when you want consistency across many charts.

Change Line Colors

For line charts, the same logic applies.

Single Line Color

fig = px.line(
    monthly_sales,
    x="month",
    y="sales",
    markers=True,
    title="Monthly Total Sales",
    color_discrete_sequence=["crimson"]
)

fig.show()

Multi-Line Colors

fig = px.line(
    monthly_region_sales,
    x="month",
    y="sales",
    color="region",
    markers=True,
    title="Monthly Sales by Region",
    color_discrete_sequence=["#1f77b4", "#ff7f0e", "#2ca02c"]
)

fig.show()

Change Histogram Colors

fig = px.histogram(
    df_plotly,
    x="sales",
    nbins=15,
    title="Distribution of Sales",
    color_discrete_sequence=["purple"]
)

fig.show()

Change Scatter Plot Colors

fig = px.scatter(
    df_plotly,
    x="customers",
    y="sales",
    title="Customers vs Sales",
    color_discrete_sequence=["darkorange"]
)

fig.show()

fig = px.scatter(
    df_plotly,
    x="customers",
    y="sales",
    color="region",
    title="Customers vs Sales by Region",
    color_discrete_sequence=["#3B6EAD", "#AFC4E8", "#B7C2D1"]
)

fig.show()

Change Colors After the Figure is Created

You can also modify colors after building the figure.

Example 1: Update Trace Color

fig = px.bar(
    region_sales,
    x="region",
    y="sales",
    title="Total Sales by Region"
)

fig.update_traces(marker_color="teal")

fig.show()

Example 2: Update Line Color

fig = px.line(
    monthly_sales,
    x="month",
    y="sales",
    title="Monthly Total Sales"
)

fig.update_traces(line_color="red")

fig.show()

Recommendation

A practical rule for learning is:

use color_discrete_sequence when you just want nicer colors
use color_discrete_map when you want specific categories to always keep the same colors
use update_traces() when you want to modify the figure after it is already created

Try It Yourself

Take one of your earlier charts and try all of the following:

apply one single custom color
apply two category colors
map exact colors to product names
update the color after the figure is created

This helps you understand that color in Plotly is not fixed. It is another argument you can control.

Highlighting Only the Specific Categories

Sometimes we want to guide the audience’s attention very deliberately. Instead of giving every category a different color, we can keep most categories in a neutral tone and highlight only the largest/smallest category.

This is a very useful analytical design technique because it helps the chart communicate one main message clearly.

It is especially useful when:

we want to emphasize the top-performing category
we want to reduce visual clutter
we want to make the most important comparison obvious
we want to keep the chart clean and readable

The same logic can be reused across multiple chart types.

The general pattern is:

aggregate the data
find the category with the highest/lowest value
create a helper column for coloring
assign one color to the largest category and another color to the rest

Highlighting the Largest Category in a Bar Plot

Preparing the Data

A bar plot is one of the most natural places to use this technique.

Build a summary table by region: with as_index=False, region stays a regular column instead of becoming the index
Find the region with the highest sales: idxmax() returns the row label of the largest value, which we use to look up top_region (idxmin() would return the row with the lowest value)
Create a helper column for highlighting

region_sales = (
    df_plotly.groupby("region", as_index=False)["sales"]
             .sum()
)

top_region = region_sales.loc[region_sales["sales"].idxmax(), "region"]

region_sales["highlight"] = np.where(
    region_sales["region"] == top_region,
    "Highest Sales",
    "Other Regions"
)

region_sales

	region	sales	highlight
0	East	1559	Other Regions
1	North	1800	Highest Sales
2	South	1674	Other Regions

Creating the Bar Plot

fig = px.bar(
    region_sales,
    x="region",
    y="sales",
    color="highlight",
    text="sales",
    title="Highlighting Only the Largest Category",
    color_discrete_map={
        "Highest Sales": "#3B6EAD",
        "Other Regions": "#D9D9D9"
    }
)

fig.update_traces(textposition="outside")

fig.update_layout(
    xaxis_title="Region",
    yaxis_title="Sales",
    legend_title=""
)

fig.show()

Text Position

In Plotly, textposition controls where the label (text) is placed relative to a data point, bar, or shape. For bar charts, the options are:

"inside" → text inside the bar
"outside" → text outside (at the end of the bar)
"auto" → smart placement (inside unless too small → then outside)
"none" → hides text

Try Yourself

change the highlight color
change the neutral color
replace sales with customers
repeat the same logic for product

Highlighting the Largest Category in a Donut Chart

A donut chart can also highlight the largest category very effectively.

Preparing the Data

product_sales = (
    df_plotly.groupby("product", as_index=False)["sales"]
             .sum()
)

top_product = product_sales.loc[product_sales["sales"].idxmax(), "product"]

product_sales["highlight"] = np.where(
    product_sales["product"] == top_product,
    "Highest Sales",
    "Other Products"
)

product_sales

	product	sales	highlight
0	A	2699	Highest Sales
1	B	2334	Other Products

Try It Yourself

change the hole size
highlight by customers instead of sales
create the same chart for region

Creating the Visualization

fig = px.pie(
    product_sales,
    names="product",
    values="sales",
    color="highlight",
    hole=0.5,
    title="Highlighting Only the Largest Category",
    color_discrete_map={
        "Highest Sales": "#3B6EAD",
        "Other Products": "#D9D9D9"
    }
)

fig.update_traces(textinfo="label+percent")

fig.show()

Highlighting the Smallest Category in a Scatter Plot

In a scatter plot, this technique can be used to highlight only the observations that belong to the smallest category.

Here we first identify the region with the lowest total sales, then color all points from that region differently.

Preparing the Data

region_totals = (
    df_plotly.groupby("region", as_index=False)["sales"]
             .sum()
)

# idxmin() returns the row label of the smallest total
bottom_region = region_totals.loc[region_totals["sales"].idxmin(), "region"]

df_scatter = df_plotly.copy()

df_scatter["highlight"] = np.where(
    df_scatter["region"] == bottom_region,
    "Lowest Sales Region",
    "Other Regions"
)

df_scatter.head()

	month	region	product	sales	customers	units	month_name	highlight
0	2024-01-01	North	A	182	71	38	Jan	Other Regions
1	2024-01-01	North	B	94	80	30	Jan	Other Regions
2	2024-01-01	South	A	182	43	12	Jan	Other Regions
3	2024-01-01	South	B	132	21	33	Jan	Other Regions
4	2024-01-01	East	A	117	21	69	Jan	Lowest Sales Region

Try It Yourself

change the highlighted grouping from region to product
change x="customers" to x="units"
add size="units"

Building the Plot

fig = px.scatter(
    df_scatter,
    x="customers",
    y="sales",
    color="highlight",
    hover_data=["region", "product", "month_name", "units"],
    title="Highlighting the Smallest Category",
    color_discrete_map={
        "Lowest Sales Region": "#3B6EAD",
        "Other Regions": "#D9D9D9"
    }
)

fig.update_layout(
    xaxis_title="Customers",
    yaxis_title="Sales",
    legend_title=""
)

fig.show()

Highlighting the Largest Category in a Bubble Chart

A bubble chart is similar to a scatter plot, but it adds size as another visual dimension.

Building the Data

# Recompute the top region here, because the scatter example above used idxmin()
top_region = (
    df_plotly.groupby("region")["sales"]
             .sum()
             .idxmax()
)

df_bubble = df_plotly.copy()

df_bubble["highlight"] = np.where(
    df_bubble["region"] == top_region,
    "Highest Sales Region",
    "Other Regions"
)

df_bubble.head()

	month	region	product	sales	customers	units	month_name	highlight
0	2024-01-01	North	A	182	71	38	Jan	Highest Sales Region
1	2024-01-01	North	B	94	80	30	Jan	Highest Sales Region
2	2024-01-01	South	A	182	43	12	Jan	Other Regions
3	2024-01-01	South	B	132	21	33	Jan	Other Regions
4	2024-01-01	East	A	117	21	69	Jan	Other Regions

Try It Yourself

change size="units" to another numeric variable
highlight the top product instead of the top region
compare the bubble chart to the simpler scatter plot

fig = px.scatter(
    df_bubble,
    x="customers",
    y="sales",
    size="units",
    color="highlight",
    hover_data=["region", "product", "month_name"],
    title="Highlighting the Largest Category",
    color_discrete_map={
        "Highest Sales Region": "#3B6EAD",
        "Other Regions": "#D9D9D9"
    }
)

fig.update_layout(
    xaxis_title="Customers",
    yaxis_title="Sales",
    legend_title=""
)

fig.show()

Highlighting the Largest Category in a Line Chart

For a simple line chart, there is only one line, so highlighting a category is not applicable in exactly the same way. But we can highlight the maximum point instead.

This is a closely related analytical idea.

monthly_sales = (
    df_plotly.groupby("month", as_index=False)["sales"]
             .sum()
)

max_month_row = monthly_sales.loc[monthly_sales["sales"].idxmax()]
max_month_row

month    2024-05-01 00:00:00
sales                    934
Name: 4, dtype: object

Try It Yourself

highlight the minimum point instead
change marker size
change marker color

fig = px.line(
    monthly_sales,
    x="month",
    y="sales",
    markers=True,
    title="Highlighting the Maximum Point"
)

fig.add_scatter(
    x=[max_month_row["month"]],
    y=[max_month_row["sales"]],
    mode="markers",
    marker=dict(size=14, color="#3B6EAD"),
    name="Maximum"
)

fig.update_layout(
    xaxis_title="Month",
    yaxis_title="Sales"
)

fig.show()

Highlighting the Largest Category in a Multi-Line Chart

For a grouped line chart, we can highlight the category with the highest total and keep the other lines neutral.

Data Preparation

monthly_region_sales = (
    df_plotly.groupby(["month", "region"], as_index=False)["sales"]
             .sum()
)

region_totals = (
    monthly_region_sales.groupby("region", as_index=False)["sales"]
                        .sum()
)

top_region = region_totals.loc[region_totals["sales"].idxmax(), "region"]

monthly_region_sales["highlight"] = np.where(
    monthly_region_sales["region"] == top_region,
    monthly_region_sales["region"],
    "Other Regions"
)

monthly_region_sales.head()

	month	region	sales	highlight
0	2024-01-01	East	217	Other Regions
1	2024-01-01	North	276	North
2	2024-01-01	South	314	Other Regions
3	2024-02-01	East	273	Other Regions
4	2024-02-01	North	305	North

Try Yourself

highlight the top product instead of the top region
change the highlighted color
change line width using update_traces()

Building the Plot

fig = px.line(
    monthly_region_sales,
    x="month",
    y="sales",
    color="highlight",
    markers=True,
    title="Highlighting Only the Largest Category",
    color_discrete_map={
        top_region: "#3B6EAD",
        "Other Regions": "#D9D9D9"
    }
)

fig.update_layout(
    xaxis_title="Month",
    yaxis_title="Sales",
    legend_title=""
)

fig.show()

Highlighting the Largest Category in a Histogram

A histogram shows a distribution, so categories are not always directly visible. But if we color observations by whether they belong to the largest category, we can compare the highlighted group against all others.

Preparing the Data

df_hist = df_plotly.copy()

df_hist["highlight"] = np.where(
    df_hist["region"] == top_region,
    "Highest Sales Region",
    "Other Regions"
)

df_hist.head()

	month	region	product	sales	customers	units	month_name	highlight
0	2024-01-01	North	A	182	71	38	Jan	Highest Sales Region
1	2024-01-01	North	B	94	80	30	Jan	Highest Sales Region
2	2024-01-01	South	A	182	43	12	Jan	Other Regions
3	2024-01-01	South	B	132	21	33	Jan	Other Regions
4	2024-01-01	East	A	117	21	69	Jan	Other Regions

barmode="overlay" = allows overlap for comparison

Try Yourself

switch from sales to customers
try barmode="group"
reduce opacity for cleaner overlap

Building the Plot

fig = px.histogram(
    df_hist,
    x="sales",
    color="highlight",
    barmode="overlay",
    opacity=0.7,
    title="Distribution with Highlighted Largest Category",
    color_discrete_map={
        "Highest Sales Region": "#3B6EAD",
        "Other Regions": "#D9D9D9"
    }
)

fig.update_layout(
    xaxis_title="Sales",
    yaxis_title="Count",
    legend_title=""
)

fig.show()

Highlighting the Largest Category in a Grouped Bar Plot

For a grouped bar plot, we can highlight only the subgroup belonging to the largest category.

Here we highlight the product with the highest total sales.

Preparing the Data

region_product_sales = (
    df_plotly.groupby(["region", "product"], as_index=False)["sales"]
             .sum()
)

product_totals = (
    region_product_sales.groupby("product", as_index=False)["sales"]
                        .sum()
)

top_product = product_totals.loc[product_totals["sales"].idxmax(), "product"]

region_product_sales["highlight"] = np.where(
    region_product_sales["product"] == top_product,
    top_product,
    "Other Products"
)

region_product_sales

	region	product	sales	highlight
0	East	A	853	A
1	East	B	706	Other Products
2	North	A	888	A
3	North	B	912	Other Products
4	South	A	958	A
5	South	B	716	Other Products

Try It Yourself

instead of product, highlight the top region
use customers instead of sales
switch to stacked bars

Building the Plot

fig = px.bar(
    region_product_sales,
    x="region",
    y="sales",
    color="highlight",
    barmode="group",
    title="Highlighting Only the Largest Category",
    color_discrete_map={
        top_product: "#3B6EAD",
        "Other Products": "#D9D9D9"
    }
)

fig.update_layout(
    xaxis_title="Region",
    yaxis_title="Sales",
    legend_title=""
)

fig.show()

Vertical and Horizontal Reference Lines

Sometimes a chart becomes much more useful when we add a reference line. A reference line helps us move beyond simply displaying values and begin interpreting them.

Vertical and horizontal lines are commonly used to show:

average values
targets or thresholds
important dates
campaign launch points
policy changes
before-and-after comparisons

These lines are not decorative. They are analytical tools. They help the audience answer questions such as:

Which observations are above average?
Which values are below the target?
What happened after a major event?
Did the pattern change after a certain date?

Why Reference Lines Matter

A chart without a reference line may show a pattern, but a chart with a reference line often shows meaning more clearly.

For example:

a line chart of monthly sales shows trend
a line chart of monthly sales with an average line shows which months performed above or below the overall benchmark
a line chart with a campaign start marker helps separate pre-campaign and post-campaign periods

So reference lines help connect the chart to a business question.

Horizontal Lines

A horizontal line is usually used to show a benchmark on the y-axis.

Common examples include:

average sales
target revenue
minimum acceptable performance
upper control limit
lower control limit

Suppose we want to examine monthly sales and compare each month against the average monthly sales.

First, let us create the summary table.

monthly_sales = (
    df_plotly.groupby("month", as_index=False)["sales"]
             .sum()
)

monthly_sales

	month	sales
0	2024-01-01	807
1	2024-02-01	810
2	2024-03-01	831
3	2024-04-01	855
4	2024-05-01	934
5	2024-06-01	796

Now let us compute the average.

avg_sales = monthly_sales["sales"].mean()
avg_sales

np.float64(838.8333333333334)

Understanding the Main Arguments

Before executing the chart, let us understand the important parts.

We first create the chart with px.line().

Then we add a reference line with fig.add_hline().

The most important arguments are:

y = the y-value where the horizontal line will be placed
line_dash = controls the line style, such as solid, dash, or dot
annotation_text = the label shown near the line
annotation_position = where the label appears

In our example:

y=avg_sales places the line at the average sales value
line_dash="dash" makes the line visually different from the main chart line
annotation_text="Average Sales" adds an explanatory label

Try It Yourself

Experiment with the following changes:

change the line style from "dash" to "dot"
replace the average with a fixed target value such as 900
change the annotation text
change the annotation position

fig = px.line(
    monthly_sales,
    x="month",
    y="sales",
    markers=True,
    title="Line Plot with Horizontal Reference Line"
)

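fig.add_hline(
    y=avg_sales,                      # benchmark: mean of the monthly totals
    line_dash="dash",                 # dashed, so it is not confused with the data line
    annotation_text="Average Sales",
    annotation_position="top left"    # label above the line, at the left edge of the plot
)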

fig.update_layout(
    xaxis_title="Month",
    yaxis_title="Sales"
)

fig.show()

Interpretation

This chart helps answer:

Which months are above average?
Which months are below average?
Is performance consistently above the benchmark or not?
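The same questions can also be answered numerically. The sketch below rebuilds the summary table from the values shown earlier and flags each month against the average; the column names diff_from_avg and above_avg are introduced here for illustration.

```python
import pandas as pd

# Monthly totals copied from the summary table above
monthly_sales = pd.DataFrame({
    "month": pd.date_range("2024-01-01", periods=6, freq="MS"),
    "sales": [807, 810, 831, 855, 934, 796],
})

avg_sales = monthly_sales["sales"].mean()

# Distance from the benchmark and a simple above/below flag
monthly_sales["diff_from_avg"] = monthly_sales["sales"] - avg_sales
monthly_sales["above_avg"] = monthly_sales["sales"] > avg_sales

above_months = (
    monthly_sales.loc[monthly_sales["above_avg"], "month"]
    .dt.strftime("%Y-%m")
    .tolist()
)
print(above_months)
```

With these values, only April and May sit above the average line, so performance is not consistently above the benchmark.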

Vertical Lines

A vertical line is usually used to mark a specific point on the x-axis.

Common examples include:

campaign start date
product launch date
policy change date
market shock
start of a promotion

When the x-axis is time, vertical lines are especially useful.

Suppose a campaign started on April 1, 2024, and we want to mark that point on the chart.

Important Note

Important

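When the x-axis holds dates, add_vline() with annotation_text=... may raise a TypeError in some Plotly versions. This typically happens because Plotly tries to compute the label position by doing arithmetic on the x-value, which does not work for dates or timestamps. A more reliable pattern is: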

add the vertical line with fig.add_vline()
add the label separately with fig.add_annotation()

Understanding the Main Arguments

For the vertical line:

x = the x-value where the vertical line will be placed
line_dash = the style of the line
line_color = the color of the line

For the annotation:

x = where the label points
y = the vertical position of the label anchor
text = label text
showarrow=True = displays an arrow
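ax and ay = offsets of the label (the arrow tail) from the point the arrow targets, in pixels by default; a positive ax moves the label to the right, and a negative ay moves it up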

Try It Yourself

Experiment with the following changes:

move the event date to another month
change the line color
change the annotation text
adjust ax and ay to move the label

campaign_start = pd.Timestamp("2024-04-01")

fig = px.line(
    monthly_sales,
    x="month",
    y="sales",
    markers=True,
    title="Line Plot with Vertical Event Line"
)

fig.add_vline(
    x=campaign_start,
    line_dash="dot",
    line_color="red"
)

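fig.add_annotation(
    x=campaign_start,                  # the arrow points at the campaign date
    y=monthly_sales["sales"].max(),    # and at the height of the largest sales value
    text="Campaign Start",
    showarrow=True,
    arrowhead=2,
    ax=40,                             # label sits 40 px to the right of the arrow tip
    ay=-40                             # and 40 px above it
)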

fig.update_layout(
    xaxis_title="Month",
    yaxis_title="Sales"
)

fig.show()

Interpretation

This chart helps answer:

What happened before the campaign?
What happened after the campaign?
Is there a visible shift in the pattern after the event?
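A quick numerical check can accompany the visual comparison. The sketch below splits the months at the campaign date and compares average sales in each period; the period labels and the choice to count April itself as post-campaign are assumptions made for illustration.

```python
import pandas as pd

# Monthly totals copied from the summary table above
monthly_sales = pd.DataFrame({
    "month": pd.date_range("2024-01-01", periods=6, freq="MS"),
    "sales": [807, 810, 831, 855, 934, 796],
})

campaign_start = pd.Timestamp("2024-04-01")

# Label each month by its position relative to the event date
monthly_sales["period"] = (monthly_sales["month"] >= campaign_start).map(
    {True: "post-campaign", False: "pre-campaign"}
)

# Average monthly sales within each period
period_summary = monthly_sales.groupby("period")["sales"].mean()
print(period_summary)
```

A higher post-campaign average is consistent with a shift after the event, but with only three months on each side it should be read as a description of the data, not as evidence that the campaign caused the change.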

Horizontal and Vertical Lines Together

In many real analytical situations, both types of lines are useful together.

For example, we may want to show:

the average performance level
the moment when a campaign started

This gives the audience both:

a benchmark
an event marker

When both lines appear together:

the horizontal line helps compare values against a benchmark
the vertical line helps compare periods before and after an event

Together, they let the reader judge whether values moved above or below the benchmark after the event.
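The two comparisons can also be combined in a small table. The sketch below counts months by period and by position relative to the average; the labels and the column names period and vs_average are assumptions introduced for illustration.

```python
import pandas as pd

# Monthly totals copied from the summary table above
monthly_sales = pd.DataFrame({
    "month": pd.date_range("2024-01-01", periods=6, freq="MS"),
    "sales": [807, 810, 831, 855, 934, 796],
})

campaign_start = pd.Timestamp("2024-04-01")
avg_sales = monthly_sales["sales"].mean()

# Vertical line: before or after the event
monthly_sales["period"] = (monthly_sales["month"] >= campaign_start).map(
    {True: "post-campaign", False: "pre-campaign"}
)

# Horizontal line: above or below the benchmark
monthly_sales["vs_average"] = (monthly_sales["sales"] > avg_sales).map(
    {True: "above average", False: "below average"}
)

# Number of months in each combination
summary = pd.crosstab(monthly_sales["period"], monthly_sales["vs_average"])
print(summary)
```

With these values, every pre-campaign month is below average, while two of the three post-campaign months are above it.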

Try It Yourself

Experiment with the following changes:

change the benchmark from average to a target value
move the campaign start to another date
change the colors and line styles
change the annotation text

campaign_start = pd.Timestamp("2024-04-01")
avg_sales = monthly_sales["sales"].mean()

fig = px.line(
    monthly_sales,
    x="month",
    y="sales",
    markers=True,
    title="Line Plot with Horizontal and Vertical Reference Lines"
)

fig.add_hline(
    y=avg_sales,
    line_dash="dash",
    annotation_text="Average Sales",
    annotation_position="top left"
)

fig.add_vline(
    x=campaign_start,
    line_dash="dot",
    line_color="red"
)

fig.add_annotation(
    x=campaign_start,
    y=monthly_sales["sales"].max(),
    text="Campaign Start",
    showarrow=True,
    arrowhead=2,
    ax=40,
    ay=-40
)

fig.update_layout(
    xaxis_title="Month",
    yaxis_title="Sales"
)

fig.show()