
Statistical Thinking
2026-04-01
As data analysts and statisticians, we like to make an assumption about the value of a population parameter and then test it against sample data:
A recent Wall Street Journal article titled “Does the Internet Make You Smarter or Dumber?” posed the possibility that online activities turn us into shallow thinkers.
The article cited a statistic claiming that the average time an American spends looking at a Web page is 56 seconds.
A researcher at a local university would like to test this claim using a hypothesis test.
The null hypothesis, \(H_0\), represents the status quo and involves stating the belief that the population parameter is \(\le\), \(=\), or \(\ge\) a specific value.
The null hypothesis is believed to be true unless there is overwhelming evidence to the contrary.
The alternative hypothesis, \(H_1\), represents the opposite of the null hypothesis and is believed to be true if the null hypothesis is found to be false.
The alternative hypothesis always states that the population parameter is \(>\), \(\ne\), or \(<\) a specific value.
You need to be careful how you state the null and alternative hypotheses.
For the null hypothesis there are only two possible decisions: we can either reject it or fail to reject it; we never "accept" it.
As an analogy, the court system assumes a person is innocent until proven guilty, so the hypothesis test is formulated with innocence as the null hypothesis.
The court can reach one of two conclusions: reject the presumption of innocence (guilty) or fail to reject it (not guilty).
In this example, the claim is that Internet users spend an average time of 56 seconds on a Web page.
Status Quo:
\[ H_0: \mu = 56 \text{ seconds (status quo)} \]
Alternative:
\[ H_1: \mu \ne 56 \text{ seconds} \]
| | Two-Tailed Test | Left-Tailed Test | Right-Tailed Test |
|---|---|---|---|
| Null | \(H_0: \mu = 56\) | \(H_0: \mu \ge 56\) | \(H_0: \mu \le 56\) |
| Alternative | \(H_1: \mu \ne 56\) | \(H_1: \mu < 56\) | \(H_1: \mu > 56\) |
The level of significance represents the probability of making a Type I error. A Type I error occurs when we reject the null hypothesis but it is actually true.
The most common value is \(\alpha = 0.05\).
In the scope of the example above, a Type I error would be concluding that the true average time spent on a Web page is not 56 seconds, even though in reality it is 56 seconds.
With \(\alpha = 0.05\), there is a 5% chance that random sampling alone causes the sample mean to fall outside the acceptance region even when users really do spend 56 seconds on average.

| Alpha (α) | Tail | Critical z-Score | Critical t-Score (df = 20) |
|---|---|---|---|
| 0.01 | One | 2.33 | 2.528 |
| 0.01 | Two | 2.575 | 2.845 |
| 0.02 | One | 2.05 | 2.312 |
| 0.02 | Two | 2.33 | 2.528 |
| 0.05 | One | 1.645 | 1.725 |
| 0.05 | Two | 1.96 | 2.086 |
| 0.10 | One | 1.28 | 1.325 |
| 0.10 | Two | 1.645 | 1.725 |
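The critical values in the table can be reproduced with SciPy's inverse-CDF functions (a quick check, assuming `scipy` is available; shown here for \(\alpha = 0.05\) and df = 20):

```python
from scipy.stats import norm, t

alpha = 0.05
df = 20  # matches the t-score column of the table

z_one = norm.ppf(1 - alpha)       # one-tailed critical z
z_two = norm.ppf(1 - alpha / 2)   # two-tailed critical z
t_one = t.ppf(1 - alpha, df)      # one-tailed critical t
t_two = t.ppf(1 - alpha / 2, df)  # two-tailed critical t

print(round(z_one, 3), round(z_two, 2))   # 1.645 1.96
print(round(t_one, 3), round(t_two, 3))   # 1.725 2.086
```

Changing `alpha` to 0.01, 0.02, or 0.10 reproduces the other rows.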
When the population variance is known, the test statistic is the z-score of the sample mean:
\[ z_{\bar{x}} = \frac{\bar{x} - \mu_{H_0}}{\sigma / \sqrt{n}} \]
\[ z_{\bar{x}} = \frac{\bar{x} - \mu_{H_0}}{\sigma / \sqrt{n}} = \frac{62 - 56}{18 / \sqrt{45}} = \frac{6}{2.683} = 2.24 \]
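As a sanity check, the same computation in Python:

```python
import math

# Values from the worked example: x̄ = 62, μ0 = 56, σ = 18, n = 45
x_bar, mu_0, sigma, n = 62, 56, 18, 45

# z-statistic for a one-sample test with known population σ
z = (x_bar - mu_0) / (sigma / math.sqrt(n))
print(round(z, 2))  # 2.24
```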
The test statistic of 2.24 exceeds the two-tailed critical value \(z_{0.025} = 1.96\), so it falls in the rejection region.
Interpretation:
There is statistically significant evidence that the true average time spent on a webpage is different from 56 seconds.

| Test Type | Hypotheses | Condition | Conclusion |
|---|---|---|---|
| Two-tail | \(H_0: \mu = \mu_0\) vs. \(H_1: \mu \ne \mu_0\) | \(\lvert z_{\bar{x}} \rvert > z_{\alpha/2}\) | Reject \(H_0\) |
| | | \(\lvert z_{\bar{x}} \rvert \le z_{\alpha/2}\) | Do not reject \(H_0\) |
| One-tail (right) | \(H_0: \mu \le \mu_0\) vs. \(H_1: \mu > \mu_0\) | \(z_{\bar{x}} > z_\alpha\) | Reject \(H_0\) |
| | | \(z_{\bar{x}} \le z_\alpha\) | Do not reject \(H_0\) |
| One-tail (left) | \(H_0: \mu \ge \mu_0\) vs. \(H_1: \mu < \mu_0\) | \(z_{\bar{x}} < -z_\alpha\) | Reject \(H_0\) |
| | | \(z_{\bar{x}} \ge -z_\alpha\) | Do not reject \(H_0\) |
It is often more convenient to use the p-value approach, as it makes the decision rule easy to remember.
If the p-value is less than \(\alpha\), there is little chance of observing the sample mean from the population on which it is based if the null hypothesis were actually true. We therefore reject the null hypothesis under this condition.
| Condition | Conclusion |
|---|---|
| \(p\text{-value} \ge \alpha\) | Do not reject \(H_0\) |
| \(p\text{-value} < \alpha\) | Reject \(H_0\) |
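A minimal sketch of this rule applied to the web-page example (\(z = 2.24\)), using only the Python standard library:

```python
from statistics import NormalDist

z = 2.24      # test statistic from the example above
alpha = 0.05

# Two-tailed p-value: probability of a result at least this extreme
# in either direction under H0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(p_value, 4))  # ≈ 0.0251

decision = "Reject H0" if p_value < alpha else "Do not reject H0"
print(decision)
```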

A Type II error occurs when the null hypothesis is actually false and we fail to reject it. The probability of a Type II error is denoted \(\beta\).
Urban mobility reports often claim that the average weekday driving speed in Yerevan is 15 km/h. A transportation analyst wants to test this claim using recent GPS data from ride-sharing vehicles.
This example is ideal for explaining all three hypothesis-test formulations:
Since the population standard deviation is unknown, a one-sample t-test is appropriate.
The test statistic is:
\[ t_{\bar{x}} = \frac{\bar{x} - \mu_{H_0}}{s/\sqrt{n}} \]

\[\downarrow\]
The computed test statistic is \(t_{\bar{x}} = 7.44\). Since \(|7.44| > 1.96\), the p-value is below 0.05.
We reject the null hypothesis. There is very strong evidence that the true average weekday driving speed in Yerevan is not equal to 15 km/h.
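For illustration, here is how such a one-sample t-test could be run with SciPy. The speeds below are simulated stand-ins, since the actual GPS sample behind \(t = 7.44\) is not given in the text:

```python
import numpy as np
from scipy import stats

# Hypothetical GPS speed sample (km/h) — illustrative only,
# not the analyst's real data
rng = np.random.default_rng(42)
speeds = rng.normal(loc=18, scale=4, size=200)

# Two-tailed one-sample t-test against the claimed mean of 15 km/h
t_stat, p_value = stats.ttest_1samp(speeds, popmean=15)
print(t_stat, p_value)  # large positive t, tiny p -> reject H0
```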
We want to test whether the true average weekday driving speed in Yerevan is greater than
the stated value of 15 km/h.
\[H_0: \mu \le 15\]
\[H_1: \mu > 15\]

\[\downarrow\]
Since \(7.44 > 1.645\) and \(p < 0.05\), we reject \(H_0\).
Conclusion: There is extremely strong evidence that drivers in Yerevan drive faster than 15 km/h on average.
We want to test whether the true average weekday driving speed in Yerevan is less than
the stated value of 15 km/h.
\[H_0: \mu \ge 15\]
\[H_1: \mu < 15\]

\[\downarrow\]
Since \(7.44\) is not less than \(-1.645\) and \(p > 0.05\), we do NOT reject \(H_0\).
Conclusion: There is no evidence that drivers in Yerevan are slower than 15 km/h. In fact, the sample strongly indicates the opposite.
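The two directional versions of the test can be obtained with the `alternative` argument of `scipy.stats.ttest_1samp` (again on simulated stand-in data, since the real sample is not provided):

```python
import numpy as np
from scipy import stats

# Same illustrative sample as before
rng = np.random.default_rng(42)
speeds = rng.normal(loc=18, scale=4, size=200)

# Right-tailed: H0: mu <= 15 vs H1: mu > 15
res_greater = stats.ttest_1samp(speeds, popmean=15, alternative="greater")
# Left-tailed: H0: mu >= 15 vs H1: mu < 15
res_less = stats.ttest_1samp(speeds, popmean=15, alternative="less")

print(res_greater.pvalue)  # small  -> reject H0 (drivers are faster)
print(res_less.pvalue)     # near 1 -> fail to reject H0
```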
A telecom company is testing two versions of its mobile self-care app:
The team wants to know whether average daily user engagement (minutes/day) differs between the two versions.
Does Variant B change average daily engagement?
Since population SD is unknown for both groups, we use a two-sample Welch t-test.
\[H_0: \mu_A = \mu_B\]
\[H_1: \mu_A \ne \mu_B\]

\[\downarrow\]
Because \(|t| > 1.96\) and \(p < 0.05\), we reject \(H_0\).
Version B produces a statistically significant difference in average daily engagement compared to Version A.
Does Variant B increase engagement?
Now the team wants a directional test:
Does Version B strictly increase user engagement?
\[H_0: \mu_B \le \mu_A\]
\[H_1: \mu_B > \mu_A\]

\[\downarrow\]
We fail to reject \(H_0\).
There is no evidence that Version B increases user engagement.
Does Variant B reduce app load time?
Load time is a negative metric: smaller = better.
\[H_0: \mu_B \ge \mu_A\]
\[H_1: \mu_B < \mu_A\]

\[\downarrow\]
We reject \(H_0\).
There is strong evidence that Version B reduces app load time.
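A Welch two-sample t-test along these lines could be sketched as follows; the engagement numbers are invented for illustration, not the team's actual data:

```python
import numpy as np
from scipy import stats

# Invented daily engagement samples (minutes/day) for the two app versions
rng = np.random.default_rng(0)
a = rng.normal(loc=12.0, scale=3.0, size=150)   # Version A users
b = rng.normal(loc=10.0, scale=3.5, size=150)   # Version B users

# Two-tailed Welch test (equal_var=False): H0: mu_A = mu_B
t_stat, p_two = stats.ttest_ind(a, b, equal_var=False)
print(t_stat, p_two)  # significant difference -> reject H0

# Right-tailed test: H1: mu_B > mu_A (does B increase engagement?)
p_greater = stats.ttest_ind(b, a, equal_var=False,
                            alternative="greater").pvalue
print(p_greater)  # large p -> fail to reject H0
```

With this fabricated data the two-tailed test rejects \(H_0\) while the right-tailed test does not, mirroring the pattern in the summary table.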
| Test Type | Scenario | Hypotheses | Test Statistic | Critical Value(s) | p-Value | Decision | Interpretation |
|---|---|---|---|---|---|---|---|
| Two-Tailed | Difference between App A and App B | \(H_0:\mu_A=\mu_B\), \(H_1:\mu_A\ne\mu_B\) | \(t=-2.93\) | \(\pm 1.96\) | \(p=0.003\) | Reject | Engagement is significantly different between A and B. |
| Right-Tailed | Does B increase engagement vs A? | \(H_0:\mu_B\le\mu_A\), \(H_1:\mu_B>\mu_A\) | \(t=-2.93\) | \(t_{0.95}=1.645\) | \(p=0.99\) | Fail to reject | No evidence that Version B increases engagement. |
| Left-Tailed | Does B reduce app load time vs A? | \(H_0:\mu_B\ge\mu_A\), \(H_1:\mu_B<\mu_A\) | \(t=-4.37\) | \(t_{0.05}=-1.645\) | \(p\approx 0.0000\) | Reject | Version B significantly reduces app load time. |