The t-distribution


Introduction

The t-distribution is a fundamental concept in inferential statistics and a core topic in the College Board AP Statistics curriculum. It is used to conduct hypothesis tests and construct confidence intervals for means when the population standard deviation is unknown, a situation that matters most when the sample is small. Understanding the t-distribution lets students draw sound inferences about population parameters.

Key Concepts

Description and Definition

The t-distribution, also known as Student's t-distribution, is a probability distribution that arises when estimating the mean of a normally distributed population from a small sample with the population standard deviation unknown. Compared with the normal distribution, it has heavier tails, which reflect the additional uncertainty that comes from estimating the standard deviation from the sample.

Properties of the t-distribution

The t-distribution shares several properties with the standard normal distribution (Z-distribution), such as being symmetric and bell-shaped. However, it has heavier tails, meaning it is more prone to producing values that fall far from its mean. The key properties include:

Symmetry around zero.
Heavier tails compared to the normal distribution.
Mean equals 0.
Variance equals df / (df - 2) for df > 2, so it is greater than 1 and shrinks toward 1 as the degrees of freedom (df) increase.
As the sample size (and therefore df) increases, the t-distribution approaches the standard normal distribution.
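
The properties above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `t_pdf` and `t_variance` are illustrative and not part of any curriculum or library. It evaluates the standard t density formula and compares the tail height at x = 3 with the standard normal density.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    # Log of the normalizing constant Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)).
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    # The density depends on x only through x*x, so it is symmetric about zero.
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_variance(df: float) -> float:
    """Variance of the t-distribution; finite only for df > 2."""
    if df <= 2:
        raise ValueError("variance is finite only for df > 2")
    return df / (df - 2)


standard_normal = NormalDist()  # mean 0, standard deviation 1
for df in (3, 10, 30, 1000):
    print(f"df={df:>4}  t density at 3: {t_pdf(3, df):.5f}  "
          f"normal density at 3: {standard_normal.pdf(3):.5f}  "
          f"variance: {t_variance(df):.3f}")
```

For small df the t density at x = 3 is noticeably larger than the normal density, and the gap closes as df grows.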

Derivation and Theoretical Foundation

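The t-distribution describes the ratio of the sample mean's deviation from the population mean to its estimated standard error, which is the sample standard deviation divided by the square root of the sample size. When the data come from a normal population, this ratio follows a t-distribution. Mathematically, it is expressed as: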

$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} $$

Here, t is the t-statistic, 𝑥̄ is the sample mean, μ is the population mean, s is the sample standard deviation, and n is the sample size. This formulation accounts for the uncertainty in estimating the population standard deviation from a small sample.

Calculating the t-Statistic

The t-statistic is calculated using the following formula:

$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} $$

Where:

𝑥̄: Sample mean
μ: Hypothesized population mean
s: Sample standard deviation
n: Sample size

This statistic measures how many standard errors the sample mean is away from the hypothesized population mean. A larger absolute value of the t-statistic indicates a greater deviation from the hypothesized mean.
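
As a minimal sketch of this calculation in Python (standard library only), the function below computes the t-statistic from raw data. The function name `t_statistic` and the list of scores are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, stdev


def t_statistic(sample, mu0):
    """Return (x_bar - mu0) / (s / sqrt(n)) for a list of observations."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    x_bar = mean(sample)
    s = stdev(sample)                 # sample standard deviation (divides by n - 1)
    standard_error = s / math.sqrt(n)
    return (x_bar - mu0) / standard_error


# Hypothetical test scores, not taken from any real data set.
scores = [72, 78, 81, 69, 75, 84, 77, 80, 73, 79]
print(f"t = {t_statistic(scores, mu0=75):.3f}")
```

Note that `statistics.stdev` uses the n - 1 denominator, which matches the sample standard deviation s in the formula.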

Degrees of Freedom

Degrees of freedom (df) in the context of the t-distribution refer to the number of independent values that can vary in the calculation of a statistic. For the t-distribution used in estimating a population mean, degrees of freedom are calculated as:

$$ \text{df} = n - 1 $$

Where n is the sample size. Degrees of freedom affect the shape of the t-distribution; as df increases, the distribution becomes closer to the standard normal distribution.
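
To see how df changes the distribution in practice, the sketch below approximates two-sided critical values numerically using only the Python standard library. It is an illustration under stated assumptions, not how t-tables or calculators are actually produced: the names `t_pdf`, `t_cdf` and `t_critical` are made up here, the area is found with Simpson's rule, and the search assumes the critical value is below 100.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) with Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x > 0 else 0.5 - area


def t_critical(confidence, df, upper=100.0):
    """Approximate t* such that the central area equals `confidence`."""
    target = 0.5 + confidence / 2  # cumulative probability to the left of t*
    lo, hi = 0.0, upper            # assumes 0 < t* < upper
    for _ in range(60):            # bisection on the increasing CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for df in (5, 9, 30, 1000):
    print(f"df={df:>4}  95% critical value: {t_critical(0.95, df):.3f}")
```

The printed critical values decrease toward the standard normal value of about 1.96 as df increases, which is the sense in which the t-distribution approaches the standard normal distribution.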

Using the t-distribution for Confidence Intervals

The t-distribution is used to construct confidence intervals for a population mean when the population standard deviation is unknown and must be estimated by the sample standard deviation. The difference from a z-interval is largest when the sample size is small. The general formula for a 100(1-α)% confidence interval is:

$$ \bar{x} \pm t_{\alpha/2, \text{df}} \left( \frac{s}{\sqrt{n}} \right) $$

Where:

𝑥̄: Sample mean
t_{α/2, df}: Critical t-value from the t-table that leaves an area of α/2 in the upper tail for the given degrees of freedom
s: Sample standard deviation
n: Sample size

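This interval gives a range of plausible values for the true population mean. The confidence level describes the method rather than any single interval: in repeated sampling, about 100(1-α)% of intervals constructed this way would capture the true mean.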
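
The interval formula can be sketched in Python as follows. This is an illustration only: the function name `t_confidence_interval` and the scores are hypothetical, and the critical value is passed in by hand (2.262 is the t-table value for 95% confidence with df = 9) rather than computed.

```python
import math
from statistics import mean, stdev


def t_confidence_interval(sample, t_star):
    """Return (lower, upper) for x_bar +/- t_star * s / sqrt(n)."""
    n = len(sample)
    x_bar = mean(sample)
    margin = t_star * stdev(sample) / math.sqrt(n)  # margin of error
    return x_bar - margin, x_bar + margin


# Hypothetical test scores, not taken from any real data set (n = 10, so df = 9).
scores = [72, 78, 81, 69, 75, 84, 77, 80, 73, 79]
T_STAR_95_DF9 = 2.262  # read from a t-table: 95% confidence, df = 9

lower, upper = t_confidence_interval(scores, T_STAR_95_DF9)
print(f"95% confidence interval for the mean: ({lower:.2f}, {upper:.2f})")
```

Passing the critical value as an argument keeps the sketch free of third-party libraries and mirrors the exam workflow of looking up t* for the chosen confidence level and df.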

Hypothesis Testing with the t-distribution

The t-distribution is integral to hypothesis testing concerning population means, especially when the sample size is small and the population standard deviation is unknown. The steps involved in conducting a t-test include:

State the null hypothesis (H₀) and the alternative hypothesis (H₁).
Choose the significance level (α).
Calculate the t-statistic using the sample data.
Determine the critical t-value(s) from the t-table based on df and α.
Compare the calculated t-statistic to the critical value(s) to decide whether to reject H₀.

The decision hinges on whether the t-statistic falls in the critical region defined by the t-distribution for the given degrees of freedom.
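
The steps above can be sketched as a short two-sided test in Python. The function name `one_sample_t_test`, the scores, and the hypothesized mean of 75 are hypothetical; the critical value 2.262 is the t-table entry for α = 0.05 (two-sided) with df = 9 and is supplied by hand.

```python
import math
from statistics import mean, stdev


def one_sample_t_test(sample, mu0, t_critical):
    """Two-sided one-sample t-test; returns (t, reject_null)."""
    n = len(sample)
    # Calculate the t-statistic from the sample data.
    t = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    # Reject H0 when t falls in either tail beyond the critical value.
    return t, abs(t) > t_critical


# Hypothetical scores; H0: mu = 75 versus H1: mu != 75 at alpha = 0.05.
scores = [72, 78, 81, 69, 75, 84, 77, 80, 73, 79]
t_value, reject = one_sample_t_test(scores, mu0=75, t_critical=2.262)
decision = "reject H0" if reject else "fail to reject H0"
print(f"t = {t_value:.3f}, df = {len(scores) - 1}: {decision}")
```

For a one-sided alternative, the comparison would use only one tail and a critical value chosen for area α in that tail.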

Assumptions and Limitations

When using the t-distribution, several key assumptions must be met to ensure the validity of the results:

The data should come from a population that is approximately normally distributed. This matters most for small samples; for large samples, the Central Limit Theorem makes t procedures fairly robust to non-normality.
The sample observations must be independent.
The population variance is unknown and must be estimated from the sample.

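The t-distribution has two main limitations. Results become unreliable when the data are strongly skewed or contain outliers and the sample is small. With extremely small samples, the critical values are large, so intervals are wide and the normality assumption is hard to check.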

Examples and Applications

Consider a scenario where a teacher wants to estimate the average score of a standardized test for her class. If she takes a sample of 10 students and calculates the sample mean and standard deviation, she can use the t-distribution to construct a confidence interval for the true average score. Alternatively, if she hypothesizes that the mean score is 75, she can perform a t-test to determine whether there is statistically significant evidence to reject this hypothesis based on her sample data.
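
As a worked illustration of this scenario, suppose the sample gives the hypothetical summary values n = 10, x̄ = 78 and s = 6 (these numbers are made up for the example and are not from any real class). Then df = 9, and the t-table critical value for 95% confidence is 2.262. The test statistic for the hypothesized mean of 75 is:

$$ t = \frac{78 - 75}{6 / \sqrt{10}} \approx \frac{3}{1.897} \approx 1.58 $$

Since 1.58 is less than 2.262, these made-up data would not provide significant evidence at α = 0.05 against a mean of 75. The matching 95% confidence interval is:

$$ 78 \pm 2.262 \left( \frac{6}{\sqrt{10}} \right) \approx 78 \pm 4.29 $$

This is roughly 73.71 to 82.29, which contains 75 and so agrees with the test.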

Applications of the t-distribution extend beyond education to fields such as psychology, medicine, and business, where small sample studies are common and population parameters are often unknown. For instance, medical researchers may use the t-distribution to assess the efficacy of a new drug based on a limited number of trials, ensuring that their conclusions account for sample variability.

Comparison Table

Aspect	t-Distribution	Normal Distribution
Definition	A probability distribution used when estimating a population mean with small sample sizes and unknown population variance.	A continuous probability distribution characterized by its bell-shaped symmetric curve, used when population variance is known or sample size is large.
Shape	Symmetric and bell-shaped with heavier tails; the tails become lighter as df increases.	Symmetric and bell-shaped with lighter tails.
Degrees of Freedom	Dependent on sample size, calculated as df = n - 1.	Not applicable; the normal distribution is parameterized by mean and variance.
Applications	Confidence intervals and hypothesis testing for means with small samples.	General statistical analyses, especially with large sample sizes.
Pros	Accounts for the extra variability from estimating the standard deviation in small samples, so intervals and tests keep their stated confidence or significance level.	Simplicity and well-understood properties, suitable for large samples.
Cons	Gives wide intervals when degrees of freedom are very small; relies on normality assumption.	Requires large sample sizes or known population variance for accurate use.

Summary and Key Takeaways

The t-distribution is essential for making inferences about population means, especially with small sample sizes.
It accounts for additional uncertainty by having heavier tails compared to the normal distribution.
Degrees of freedom play a crucial role in defining the shape of the t-distribution.
The t-distribution is utilized in constructing confidence intervals and conducting hypothesis tests when the population variance is unknown.
Understanding the assumptions and limitations of the t-distribution ensures accurate and reliable statistical analysis.

Examiner Tip

Tips

To excel in AP Statistics, remember the acronym "SDF" to decide when to use the t-distribution: Small sample size, Degrees of freedom accounted for, and unknown population variance. Additionally, practice interpreting t-tables efficiently and always double-check your degrees of freedom calculation. Creating flashcards for t-formulas and common scenarios can also aid in retaining key concepts.

Did You Know

The t-distribution was introduced in 1908 by William Sealy Gosset, who published under the pseudonym "Student" to maintain confidentiality while working at the Guinness Brewery. Beyond the classroom, it is used in applied settings such as quality control in manufacturing and risk assessment in finance, where small sample sizes are common.

Common Mistakes

Students often confuse the t-distribution with the normal distribution, especially when deciding which to use for hypothesis testing. For example, using a Z-test instead of a t-test with a small sample size can lead to inaccurate results. Another common mistake is miscalculating degrees of freedom, such as forgetting to subtract one (df = n - 1), which affects the critical t-values and the resulting confidence intervals or hypothesis tests.

FAQ

When should I use the t-distribution instead of the normal distribution?

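Use the t-distribution for inference about a mean whenever the population standard deviation is unknown and is estimated by the sample standard deviation. The choice matters most for small sample sizes (typically n < 30); for large samples, the t and z critical values are nearly equal.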

How do degrees of freedom affect the t-distribution?

Degrees of freedom determine the shape of the t-distribution. With higher degrees of freedom, the t-distribution approaches the normal distribution.

Can the t-distribution be used for proportions?

No. In AP Statistics, the t-distribution is used for inference about means (and regression slopes). For proportions, the normal distribution or other methods are typically used.

What happens to the t-distribution as the sample size increases?

As the sample size increases, the t-distribution becomes more similar to the normal distribution, reducing the impact of heavier tails.

What are the key assumptions of using the t-distribution?

The data should come from a normally distributed population, the samples must be independent, and the population variance should be unknown and estimated from the sample.

How is the t-statistic interpreted in hypothesis testing?

The t-statistic indicates how many standard errors the sample mean is away from the hypothesized population mean. A larger absolute t-value suggests stronger evidence against the null hypothesis.