Data Analytics Bootcamp
  • Syllabus
  • Statistical Thinking
  • SQL
  • Python
  • Tableau
  • Lab
  • Capstone
  1. Statistical Thinking
  2. Statistics
  3. Statistics Session 04: Probabilistic Distributions
  • Syllabus
  • Statistical Thinking
    • Statistics
      • Statistics Session 01: Data Layers and Bias in Data
      • Statistics Session 02: Data Types
      • Statistics Session 03: Probabilistic Distributions
      • Statistics Session 04: Probabilistic Distributions
      • Statistics Session 05: Sampling
      • Statistics Session 06: Inferential Statistics
      • Slides
        • Course Intro
        • Descriptive Stats
        • Data Types
        • Continuous Distributions
        • Discrete Distributions
        • Sampling
        • Hypothesis Testing
  • SQL
    • SQL
      • Session 01: Intro to Relational Databases
      • Session 02: Intro to PostgreSQL
      • Session 03: DA with SQL | Data Types & Constraints
      • Session 04: DA with SQL | Filtering
      • Session 05: DA with SQL | Numeric Functions
      • Session 06: DA with SQL | String Functions
      • Session 07: DA with SQL | Date Functions
      • Session 08: DA with SQL | JOINs
      • Session 09: DA with SQL | Advanced SQL
      • Session 10: DA with SQL | Advanced SQL Functions
      • Session 11: DA with SQL | UDFs, Stored Procedures
      • Session 12: DA with SQL | Advanced Aggregations
      • Session 13: DA with SQL | Final Project
      • Slides
        • Intro to Relational Databases
        • Intro to PostgreSQL
        • Basic Queries: DDL DLM
        • Filtering
        • Numeric Functions
        • String Functions
        • Date Functions
        • Normalization and JOINs
        • Temporary Tables
        • Advanced SQL Functions
        • Reporting and Analysis with SQL
        • Advanced Aggregations
  • Python
    • Python
      • Session 01: Programming for Data Analysts
      • Session 02: Python basic Syntax, Data Structures
      • Session 03: Introduction to Pandas
      • Session 04: Advanced Pandas
      • Session 05: Intro to Data Visualization
      • Session 06: Data Visualization
      • Session 07: Working with Dates
      • Session 08: Data Visualization | Plotly
      • Session 09: Customer Segmentation | RFM
      • Slides
        • Data Analyst
  • Tableau
    • Tableau
      • Tableau Session 01: Introduction to Tableau
      • Tableau Session 02: Intermediate Visual Analytics
      • Tableau Session 03: Advanced Analytics
      • Tableau Session 04: Dashboard Design & Performance
      • Slides
        • Data Analyst
        • Data Analyst
        • Data Analyst
        • Data Analyst

On this page

  • Topics
  • Discrete Distributions
    • Discrete Random Variables
    • Probability Mass Function
    • Why Descrete Destributions are Essential?
  • Poisson Distribution
    • Real-World Narrative
    • Probability Mass Function
    • Expected Value and Variance
    • Business Interpretation of \(\lambda\)
    • Business Applications of Poisson Distribution
    • Spreadsheet Demonstration
    • Relationship to Continuous Models
    • Visualization
    • Interpretation
    • Case Study
  • Bernoulli Distribution
  • What Is a Bernoulli Trial
  • Real-World Narrative
  • Probability Mass Function (PMF)
  • Expected Value and Variance
  • Business Interpretation of \(p\)
  • Spreadsheet Demonstration
    • Case Study
    • Defining the Random Variable
  • Final Intuition Summary
  • Central Limit Theorem (CLT)
    • Big Idea
    • Why This Matters
    • Intuition with an Example
    • Simulation of CLT
    • What the Simulation Shows
    • Conditions for the CLT to Work
    • Practical Importance
    • Summary
  1. Statistical Thinking
  2. Statistics
  3. Statistics Session 04: Probabilistic Distributions

Statistics Session 04: Probabilistic Distributions

Discrete Probability Distributions

statistics
poison
bernuli
discrete

Topics

  • Discrete Distributions
  • Poisson Distribution
  • Bernuli Distribution
  • Central Limit Theorem | CLT

Discrete Distributions

Up to this point, we focused on continuous distributions, where variables can take infinitely many values within a range.

Now we move to discrete distributions, which model situations where outcomes are countable.

This shift is important because many business problems are naturally discrete.

A distribution is discrete if:

  • The random variable takes countable values
  • Probabilities are assigned to exact outcomes
  • The total probability is obtained by summing, not integrating

Examples of discrete outcomes:

  • Number of purchases made today
  • Number of customers who churned this month
  • Number of defects in a production batch
  • Whether a user clicked an ad (yes / no)

Discrete Random Variables

A discrete random variable is a numerical description of a countable outcome.

Typical forms:

  • Binary outcomes: \(X \in \{0,1\}\)
  • Counts: \(X = 0,1,2,3,\dots\)

Examples:

  • \(X = 1\) if a customer churns, \(0\) otherwise
  • \(X\) = number of purchases per user
  • \(X\) = number of support tickets per day

Probability Mass Function

Discrete distributions are described by a Probability Mass Function (PMF).

The PMF answers the question:

What is the probability that the random variable equals a specific value?

Mathematically:

\[ P(X = x) \]

Core Properties of a PMF

  • \(0 \le P(X = x) \le 1\)
  • Probabilities are assigned to exact values
  • The total probability sums to 1:

\[ \sum_x P(X = x) = 1 \]

Discrete vs Continuous

Aspect Discrete Continuous
Possible values Countable Infinite
Probability at a point Meaningful Always 0
Mathematical tool Sum Integral
Function PMF PDF
WarningDo Not Confuse Them

It is critical to clearly distinguish between PMF and PDF.

  • PMF (Probability Mass Function):
    Used for discrete random variables. It assigns probability to exact values: \[ P(X = x) \]

  • PDF (Probability Density Function)
    Used for continuous random variables. It does not give probabilities at exact points: \[ P(X = x) = 0 \]

Key differences:

  • PMF values are actual probabilities
  • PDF values are densities, not probabilities
  • PMFs sum to 1
  • PDFs integrate to 1

Using a PDF when the data is discrete (or vice versa) leads to incorrect interpretations and wrong conclusions.

Why Descrete Destributions are Essential?

Discrete distributions are essential when modeling:

  • Decisions (yes / no)
  • Event counts
  • Conversion funnels
  • Operational metrics
  • Risk events

They allow analysts to:

  • Quantify uncertainty in decisions
  • Estimate probabilities of rare events
  • Optimize policies and thresholds

Poisson Distribution

The Poisson Distribution models a very common business question:

How many times does an event occur within a fixed time (or space) interval?

Examples of such intervals:

  • per hour
  • per day
  • per kilometer
  • per page view session

A random variable \(X\) follows a Poisson distribution if:

\[ X \sim \text{Poisson}(\lambda) \]

where:

\(\lambda > 0\) is the average number of events per interval

The parameter \(\lambda\) captures the typical intensity of events.

It answers:

“On average, how many events should I expect in this interval?”

Importantly:

  • Events are independent
  • The average rate is stable
  • Events occur randomly, not in bursts or schedules

Real-World Narrative

Imagine a retail store observing customer foot traffic.

Every 10 minutes, a random number of customers enter the store.

Over many days, management notices:

  • Some intervals have 1–2 customers
  • Some have 5–6 customers
  • Occasionally, none

Yet the average stays roughly the same.

Typical Poisson use cases in retail and digital analytics:

  • number of customers entering per hour
  • number of purchases per minute in an online shop
  • number of returns processed in each 30-minute window
  • number of carts abandoned per hour

If these events:

  • happen independently
  • occur at a stable average rate

…then the count of events per interval follows a Poisson distribution.

Probability Mass Function

The PMF of a Poisson random variable is:

\[ P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0,1,2,\dots \]

  • \(x\) is an exact count
  • This formula gives the probability of seeing exactly \(x\) events

Expected Value and Variance

For a Poisson distribution both Variance and Expected Value

\[ E[X] = \lambda \]

\[ Var(X) = \lambda \]

  • The average number of events equals \(\lambda\)
  • The uncertainty (variance) grows at the same rate

This makes Poisson models easy to interpret and diagnose.

Business Interpretation of \(\lambda\)

  • Large \(\lambda\) → frequent events
  • Small \(\lambda\) → rare events

Concrete examples:

  • \(\lambda = 2\) → about 2 calls per hour
  • \(\lambda = 15\) → about 15 website visits per minute
  • \(\lambda = 0.2\) → a rare failure, once every 5 intervals

Business Applications of Poisson Distribution

Poisson models are widely used in practice for:

  • customer arrivals and foot traffic
  • call center volume estimation
  • website click and impression counts
  • manufacturing defects
  • incident and outage reporting

These models help translate random counts into actionable forecasts.

Spreadsheet Demonstration

Assume \(\lambda\) is stored in cell B1.

  • PMF for \(x\): =POISSON.DIST(x, B1, FALSE)
  • CDF (at most \(x\) events): =POISSON.DIST(x, B1, TRUE)

Typical spreadsheet questions:

  • “What is the probability of more than 10 calls this hour?”
  • “What is the chance we see zero purchases in 5 minutes?”

These are answered directly using the Poisson CDF.

Tip

Reference for POISSON.DIST in Distributions Spreadsheets:

Relationship to Continuous Models

  • Exponential distribution models time between events
  • Poisson distribution models number of events

They describe the same process from different perspectives.

Visualization

  • Left plot → PMF for \(\lambda = 3\) and \(\lambda = 10\)
  • Right plot → likelihood as a function of \(\lambda\)

Interpretation

  • Higher \(\lambda\) means more customers entering per time interval.
  • Lower \(\lambda\) means slower foot traffic.

In our observed retail data:

  • Total customers: \(S = 23\)
  • Number of intervals: \(n = 6\)
  • MLE: \(\hat{\lambda} = 23/6 = 3.83\) customers per interval

Meaning

  • on average, about 3.8 customers arrive every 10 minutes,
  • which translates to roughly 23 customers per hour.

This insight directly supports:

  • staff scheduling
  • checkout lane allocation
  • peak-hour planning
  • demand forecasting
  • capacity optimization

Poisson models turn randomness into operational clarity.

Case Study

In practice, “fitting a Poisson model” means:

  • confirming the data matches the Poisson story
  • estimating \(\lambda\)
  • checking whether the fitted model matches the observed counts

Business Context

A retail store records the number of customers entering the store every 10 minutes during a weekday afternoon (14:00–16:00).

  • Time window length is fixed
  • Events are counts
  • Goal: model arrivals and estimate \(\lambda\)

Observed Data (Raw Counts)

Interval Time Window Customers
1 14:00–14:10 4
2 14:10–14:20 3
3 14:20–14:30 5
4 14:30–14:40 2
5 14:40–14:50 6
6 14:50–15:00 4
7 15:00–15:10 3
8 15:10–15:20 5
9 15:20–15:30 4
10 15:30–15:40 6
11 15:40–15:50 2
12 15:50–16:00 4

Summary Statistics

Let \(X\) = number of customers per 10-minute interval.

  • Number of intervals: \(n = 12\)
  • Total customers: \(S = 48\)
  • Sample mean:
    \[ \bar{x} = \frac{48}{12} = 4 \]

This strongly suggests a candidate model:

\[ X \sim \text{Poisson}(\lambda = 4) \]

Define the Random Variable

Let:

\[ X = \text{number of customers entering the store in a 10-minute interval} \]

Key properties:

  • \(X\) is a count
  • \(X \in \{0,1,2,\dots\}\)
  • Interval length is fixed

This definition already aligns with the Poisson framework.

Before computing anything, we check the story, not formulas.

  • Fixed interval length → yes (10 minutes)
  • Counting events → yes (customers)
  • Independence → reasonable in short windows
  • Stable average rate → plausible for this time range

This tells us:

A Poisson model is a reasonable first approximation.

Bernoulli Distribution

The Bernoulli Distribution models the most fundamental probabilistic question in data analytics and business decision-making:

Did an event happen or not?

This is the simplest form of uncertainty we encounter in practice.

There are only two possible outcomes for a Bernoulli random variable:

  • success
  • failure

These outcomes can be represented in many equivalent ways depending on the business context:

  • yes / no
  • 1 / 0
  • click / no click
  • purchase / no purchase
  • tail / head (ղուշ / գիր)

A random variable \(X\) follows a Bernoulli distribution if:

\[ X \sim \text{Bernoulli}(p) \]

where:

  • \(p \in [0,1]\) is the probability of success

Unlike the Poisson distribution, which models how many times an event occurs over an interval, the Bernoulli distribution focuses on a single decision, attempt, or event.

It answers the question:

“Did the event occur this time?”

What Is a Bernoulli Trial

A Bernoulli trial is a single experiment characterized by:

  • exactly one attempt
  • exactly two possible outcomes
  • a fixed probability of success

Each trial is assumed to be independent of the others.

This structure appears constantly in real-world data.

Examples of Bernoulli trials include:

  • Did a user click an advertisement?
  • Did a customer complete a purchase?
  • Did a transaction fail?
  • Did a device respond to a health check?

Each experiment produces a single binary outcome.

Real-World Narrative

Consider an online store tracking customer conversions.

For each user session, the outcome is recorded as:

  • Purchase → 1
  • No purchase → 0

Across thousands of sessions:

  • Some users convert
  • Most users do not

However, each session is evaluated independently, with the same underlying probability of conversion.

This is precisely the setting modeled by a Bernoulli distribution.

Typical Bernoulli use cases in business analytics include:

  • ad click behavior (clicked / not clicked)
  • conversion events (purchase / no purchase)
  • email engagement (opened / ignored)
  • fraud detection flags (fraud / legitimate)
  • churn indicators (churned / stayed)

Probability Mass Function (PMF)

Because the Bernoulli distribution is discrete, it is described using a Probability Mass Function (PMF).

The PMF of a Bernoulli random variable is:

\[ P(X = x) = p^x (1-p)^{1-x}, \quad x \in \{0,1\} \]

This formula yields two probabilities:

  • \(P(X = 1) = p\)
  • \(P(X = 0) = 1 - p\)

These are the only possible outcomes, and their probabilities always sum to 1.

WarningPMF vs PDF
  • Bernoulli is a discrete distribution, so we use a PMF
  • Continuous distributions (Normal, Exponential) use a PDF
  • PMFs assign probability to exact outcomes
  • PDFs describe density, not probability at a single point

Never mix these concepts.

Expected Value and Variance

For a Bernoulli random variable:

\[ E[X] = p \]

\[ Var(X) = p(1-p) \]

Interpretation:

  • The expected value equals the probability of success
  • Variance is largest when \(p = 0.5\), reflecting maximum uncertainty
  • Variance decreases as outcomes become more predictable

Business Interpretation of \(p\)

The parameter \(p\) is often one of the most important quantities in analytics.

It represents a rate, probability, or KPI.

  • Large \(p\) → success is likely
  • Small \(p\) → success is rare

Concrete examples:

  • \(p = 0.02\) → 2% conversion rate
  • \(p = 0.35\) → 35% email open rate
  • \(p = 0.90\) → highly reliable system

Bernoulli models appear everywhere in practice:

  • conversion modeling
  • churn indicators
  • A/B testing outcomes
  • fraud detection flags
  • quality control pass/fail checks

They form the building block for more advanced statistical models.

Spreadsheet Demonstration

Assume the probability \(p\) is stored in cell B1.

  • Probability of success: =B1
  • Probability of failure: =1-B1
  • Simulate a Bernoulli outcome: =IF(RAND()<B1,1,0)
  • Estimate \(p\) from observed data: =AVERAGE(range)
  • Variance: =B1*(1-B1)

Typical spreadsheet questions include:

  • “What fraction of users converted?”
  • “How volatile is this conversion rate?”

Case Study

Business Context

An e-commerce company records whether each visitor completes a purchase during a session.

Each session is encoded as:

  • 1 → purchase
  • 0 → no purchase

Observed Data

{.smaller}

Session Purchase
1 0
2 1
3 0
4 0
5 1
6 0
7 1
8 0
9 0
10 1

Summary Statistics

Let \(X\) denote the purchase indicator per session.

  • Number of sessions: \(n = 10\)
  • Total purchases: \(S = 4\)

The sample mean is:

\[ \bar{x} = \frac{4}{10} = 0.4 \]

This provides a natural estimate of the probability of purchase:

\[ \hat{p} = 0.4 \]

Defining the Random Variable

The Bernoulli random variable can be written explicitly as:

\[ X = \begin{cases} 1 & \text{if a purchase occurs} \\ 0 & \text{otherwise} \end{cases} \]

Key properties:

  • binary outcome
  • independent trials
  • fixed probability per session

Before applying any formulas, we validate the modeling story:

  • single decision → yes
  • two outcomes → yes
  • stable probability → reasonable

Conclusion:

A Bernoulli model is an appropriate first representation of this process.

Bernoulli distributions transform binary behavior into measurable probabilities, forming the foundation of modern data analytics and statistical modeling.

Tip

Reference to the Distributions Spreadsheets:

Final Intuition Summary

  • Use Uniform for “random within limits.”
  • Use Exponential for “time until next event.”
  • Use Poisson for “number of events per interval.”
  • Use Bernoulli for “one yes/no outcome”.

Together, these four distributions cover the majority of basic retail/business modeling scenarios:

  • foot traffic analysis,
  • checkout optimization,
  • conversion measurement,
  • forecasting purchase behavior
  • and customer journey analytics.

Central Limit Theorem (CLT)

The normal distribution is kind of magical in that we see it a lot in nature. But there’s a reason for that, and that reason makes it super useful for statistics as well. The Central Limit Theorem is the basis for a lot of statistics and the good news is that it is a pretty simple concept. That magic is called The Central Limit Theorem (CLT).

“Even if you’re not normal, the average is normal”

Big Idea

The Central Limit Theorem states that when we take many independent samples from a population and compute the average of each sample, the distribution of those averages:

  • approaches a Normal distribution
  • as the sample size increases
  • regardless of the shape of the original distribution
    (provided the variance is finite)

In symbols, if:

\[ X_1, X_2, ..., X_n \] are independent with mean \(\mu\) and variance \(\sigma^2\), then the sample mean: \[ \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i \] has an approximate Normal distribution for large \(n\): \[ \bar{X} \approx \mathcal{N}\!\left(\mu,\,\frac{\sigma^2}{n}\right) \]


Why This Matters

In analytics, we rarely care about one single observation;

Instead we often care about:

  • sample means
  • totals
  • proportions
  • differences of means

Even if the raw data come from a non-Normal distribution, the CLT tells us the distribution of these statistics will be approximately Normal when the sample size is sufficiently large.

This underpins many common tools, including:

  • confidence intervals
  • hypothesis tests
  • regressions
  • A/B testing
  • control charts

Intuition with an Example

Imagine you repeatedly sample from a skewed distribution — say, an exponential distribution (like waiting times).
Even though individual values are skewed, the distribution of average values will look more symmetric and bell-shaped when you increase the sample size.

This “averaging effect” is why Normal distributions are so prevalent in real data.


Simulation of CLT


What the Simulation Shows

  • For small sample sizes, the distribution of sample means is skewed.
  • As sample size increases:
    • the distribution becomes more symmetric
    • it takes on the bell-shaped form
    • it becomes closer to Normal

This happens even though the original population was not Normal.


Conditions for the CLT to Work

The CLT applies when:

  • samples are independent
  • the sample size is large enough
  • the population has finite variance

In practice, a sample size of 30 or more is often enough for many distributions.


Practical Importance

The CLT justifies:

  • using Normal distribution tools for inference
  • approximating distributions of sums/means
  • constructing confidence intervals
  • performing hypothesis tests

Even for non-Normal raw data, aggregating information leads to Normal behavior.


Summary

  • The Central Limit Theorem explains the widespread appearance of Normal distributions.
  • It applies to sample means (and sums) of independent observations.
  • It works even when individual data are not Normal.
  • It is a cornerstone of statistical inference.

“Even if you’re not normal, the average is normal.”