Data Analytics Bootcamp

Grammar of Graphics

The Grammar of Graphics is a systematic way to build visualizations by combining several independent components.

Instead of thinking only in chart names such as bar chart, scatter plot, or line chart, the Grammar of Graphics helps us understand the structure behind a visualization.

A chart is created by combining:

  • data
  • variables
  • visual mappings
  • geometric objects
  • statistical transformations
  • scales
  • coordinate systems
  • facets
  • themes

In simple words:

\[ \text{Chart} = \text{Data} + \text{Mapping} + \text{Geometry} + \text{Layers} \]

Why Do We Need Grammar of Graphics?

When students start learning data visualization, they often memorize chart types:

  • bar chart
  • line chart
  • scatter plot
  • histogram
  • box plot

However, memorizing chart names is not enough.

A better approach is to understand how data becomes a visual object.

For example, when we create a scatter plot, we are not simply choosing a chart type. We are making several decisions:

  • which variable goes to the x-axis
  • which variable goes to the y-axis
  • whether color should represent a category
  • whether size should represent a numeric value
  • whether we need a trend line
  • whether we need separate charts for different groups

This is the main idea behind the Grammar of Graphics.
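The decisions listed above can be written down as separate fields of a small data structure. The following is only an illustrative sketch in plain Python: `ChartSpec` and its `describe` method are hypothetical names invented here, not part of Plotly or any other library. It shows that one decision (adding color) can change while every other decision stays the same.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ChartSpec:
    """Hypothetical container: one field per decision listed above."""
    x: str                          # variable on the x-axis
    y: str                          # variable on the y-axis
    geometry: str = "point"         # visual object used to draw the data
    color: Optional[str] = None     # categorical variable shown as color
    size: Optional[str] = None      # numeric variable shown as marker size
    facet: Optional[str] = None     # variable that splits the chart into groups
    trendline: bool = False         # whether to add a trend line

    def describe(self) -> str:
        """Return the chart as a sum of components, in the style of the formula above."""
        parts = [f"x({self.x})", f"y({self.y})"]
        if self.color:
            parts.append(f"color({self.color})")
        if self.size:
            parts.append(f"size({self.size})")
        if self.facet:
            parts.append(f"facet({self.facet})")
        if self.trendline:
            parts.append("trendline")
        parts.append(f"geom_{self.geometry}()")
        return " + ".join(parts)


base = ChartSpec(x="age", y="income")
by_segment = replace(base, color="segment")  # change one decision only

print(base.describe())
print(by_segment.describe())
```

The two specifications differ in a single field, which is the sense in which the components are independent.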

Main Components of Grammar of Graphics

| Component | Meaning | Example |
| --- | --- | --- |
| Data | The dataset used for visualization | customers, sales, transactions |
| Aesthetics | Mapping variables to visual properties | x-axis, y-axis, color, size |
| Geometry | The visual object used in the chart | points, bars, lines |
| Statistics | Data transformation before plotting | count, mean, regression line |
| Scales | How data values are shown visually | axis scale, color scale |
| Coordinates | The coordinate system | Cartesian, polar |
| Facets | Splitting one chart into smaller charts | one chart per region |
| Theme | Non-data design elements | title, font, gridlines |

Example: Scatter Plot

Suppose we have a customer dataset with the following variables:

| Variable | Description |
| --- | --- |
| age | Customer age |
| income | Customer income |
| segment | Customer segment |
| monthly_spend | Monthly spending |
| region | Customer region |

We want to visualize the relationship between age and income.

Using the Grammar of Graphics, we can describe the chart as follows:

| Grammar Component | Choice |
| --- | --- |
| Data | customer dataset |
| x-axis | age |
| y-axis | income |
| color | customer segment |
| size | monthly spend |
| facet | region |
| geometry | points |

So the chart can be described as:

\[ \text{Scatter Plot} = \text{Customers Data} + x(\text{age}) + y(\text{income}) + color(\text{segment}) + size(\text{monthly spend}) + facet(\text{region}) + geom\_point() \]

Plotly Example

import plotly.express as px

# df is assumed to be a pandas DataFrame with the columns described above.
fig = px.scatter(
    data_frame=df,
    x="age",
    y="income",
    color="segment",
    size="monthly_spend",
    facet_col="region",
    trendline="ols",  # requires the statsmodels package
    title="Relationship Between Customer Age and Income"
)

fig.show()

How Plotly Arguments Match Grammar of Graphics

| Grammar Concept | Plotly Argument |
| --- | --- |
| Data | data_frame=df |
| x aesthetic | x="age" |
| y aesthetic | y="income" |
| color aesthetic | color="segment" |
| size aesthetic | size="monthly_spend" |
| facet | facet_col="region" |
| statistical layer | trendline="ols" |
| geometry | points, chosen by calling px.scatter rather than by an argument |

Example: Bar Chart

Suppose we want to compare total sales by region.

In Grammar of Graphics terms:

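| Component | Choice |
| --- | --- |
| Data | sales dataset |
| x-axis | region |
| y-axis | total sales |
| geometry | bars |
| statistic | sum |

# sales_by_region is assumed to be pre-aggregated:
# one row per region, with total_sales already summed.
fig = px.bar(
    data_frame=sales_by_region,
    x="region",
    y="total_sales",
    title="Total Sales by Region"
)

fig.show()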

This chart can be described as:

\[ \text{Bar Chart} = \text{Sales Data} + x(\text{region}) + y(\text{total sales}) + geom\_bar() \]

Example: Line Chart

Suppose we want to show how sales changed over time.

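| Component | Choice |
| --- | --- |
| Data | sales dataset |
| x-axis | date |
| y-axis | sales |
| geometry | line |

# daily_sales is assumed to have one row per date, sorted by date.
fig = px.line(
    data_frame=daily_sales,
    x="date",
    y="sales",
    title="Sales Trend Over Time"
)

fig.show()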

The chart can be described as:

\[ \text{Line Chart} = \text{Sales Data} + x(\text{date}) + y(\text{sales}) + geom\_line() \]

Choosing the Right Geometry

Different analytical questions require different geometries.

| Analytical Question | Recommended Geometry | Common Chart |
| --- | --- | --- |
| Compare categories | Bars | Bar chart |
| Show trend over time | Lines | Line chart |
| Show relationship | Points | Scatter plot |
| Show distribution | Bars or boxes | Histogram, box plot |
| Show composition | Stacked bars or areas | Stacked bar, area chart |
| Compare groups | Facets or color | Small multiples, grouped charts |

Key Teaching Point

The most important lesson is:

A visualization is not just a chart type.
A visualization is a structured mapping between data and visual elements.

Students should not only ask:

Which chart should I use?

They should also ask:

Which variables should be mapped to which visual features?

Practical Rule

Before creating a chart, answer three questions:

  1. What data do I have?
  2. Which variables are important?
  3. Which visual encoding communicates the message best?

Summary

The Grammar of Graphics gives us a structured way to think about visualization.

It helps us understand that every chart is built from several components:

  • data
  • aesthetics
  • geometry
  • statistics
  • scales
  • coordinates
  • facets
  • theme

This approach is useful because it makes data visualization more analytical and less mechanical.

Instead of memorizing chart types, we learn how to build charts logically.