Skewness of Data

Introduction

Skewness is a fundamental concept in statistics that measures the asymmetry of a data distribution. Understanding skewness is crucial for interpreting data sets accurately and making informed decisions based on statistical analyses. In College Board AP Statistics, grasping skewness helps students analyze real-world data and write more complete summaries of a distribution's shape, center, and spread.

Key Concepts

Definition of Skewness

Skewness quantifies the degree of asymmetry in the probability distribution of a real-valued random variable about its mean. Its sign indicates whether the longer tail extends to the left (negatively skewed) or to the right (positively skewed) of the distribution's central peak. A perfectly symmetrical distribution has a skewness of zero; a value near zero suggests, but does not by itself guarantee, symmetry.

Types of Skewness

There are three primary types of skewness:

  • Positive Skewness (Right Skewness): The tail on the right side of the distribution is longer or fatter than the left side. The mean is typically greater than the median.
  • Negative Skewness (Left Skewness): The tail on the left side of the distribution is longer or fatter than the right side. The mean is typically less than the median.
  • Zero Skewness: The distribution is symmetrical, with tails that balance on both sides. The mean equals the median.
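The mean and median relationships in the list above can be illustrated with a small sketch. The data values below are hypothetical, chosen only so that one list has a long right tail, one is its mirror image, and one is symmetrical.

```python
from statistics import mean, median

# Hypothetical values: one very large observation creates a long right tail.
right_skewed = [300, 320, 350, 380, 400, 420, 2000]
# Mirror image of the same values, which moves the long tail to the left.
left_skewed = [2300 - x for x in right_skewed]
# Evenly spaced values with no asymmetry.
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_skewed),
                   ("left", left_skewed),
                   ("symmetric", symmetric)]:
    # Compare the two measures of center for each shape.
    print(name, "mean =", round(mean(data), 1), "median =", median(data))
```

In each skewed list the mean is pulled toward the long tail, while the median stays close to the bulk of the data.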

Measuring Skewness

Skewness can be quantified in several ways, each offering a different perspective on data asymmetry. The most commonly used measures are Pearson’s first and second coefficients of skewness and the standardized third moment.

Pearson’s First Coefficient of Skewness

Also known as the mode skewness, it is defined as:

$$ \text{Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} $$

This measure uses the mode as its measure of center, which makes it less reliable: the mode can be non-unique or poorly defined, especially in small data sets or for continuous data.
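As a rough illustration, the coefficient can be computed with Python's standard `statistics` module. The function name and the sample scores are made up for this sketch, and returning `None` when there is no single mode is a design assumption, not part of the formula.

```python
from statistics import mean, stdev, multimode

def pearson_mode_skewness(data):
    """Return (mean - mode) / s, or None when the mode is not unique."""
    modes = multimode(data)          # all values tied for most frequent
    if len(modes) != 1:
        return None                  # coefficient is ambiguous without a single mode
    return (mean(data) - modes[0]) / stdev(data)

# Hypothetical quiz scores with a single mode of 3.
scores = [2, 3, 3, 3, 4, 5, 8]
print(pearson_mode_skewness(scores))      # mean 4, mode 3, s = 2
print(pearson_mode_skewness([1, 2, 3]))   # every value is a mode
```

For the hypothetical scores, the mean is 4, the mode is 3, and the sample standard deviation is 2, so the coefficient is (4 - 3) / 2 = 0.5.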

Pearson’s Second Coefficient of Skewness

Also known as the median skewness, it is given by:

$$ \text{Skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}} $$

This coefficient utilizes the median, providing a more robust measure compared to the first coefficient.
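A minimal sketch of this coefficient, again using Python's standard `statistics` module. The function name and the waiting-time data are hypothetical and serve only to show the calculation.

```python
from statistics import mean, median, stdev

def pearson_median_skewness(data):
    """Return 3 * (mean - median) / s, using the sample standard deviation."""
    return 3 * (mean(data) - median(data)) / stdev(data)

# Hypothetical waiting times (minutes) with one long wait.
waits = [1, 2, 2, 3, 12]
print(round(pearson_median_skewness(waits), 3))

# Evenly spaced data has equal mean and median, so the coefficient is 0.
print(pearson_median_skewness([1, 2, 3, 4, 5]))
```

Because the median always exists and is unique, this version avoids the ambiguity that can arise with the mode.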

Standardized Third Moment

The most widely used measure, it is calculated for a sample (with a small-sample adjustment factor) as:

$$ \text{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3 $$

Where:

  • n = sample size
  • xᵢ = each individual data point
  • $\bar{x}$ = sample mean
  • s = sample standard deviation

This formula standardizes skewness, allowing comparison across different data sets.
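The formula above translates almost term for term into code. The following is a sketch under the stated definitions (sample standard deviation `s`, at least three data points); the function name and the test values are illustrative.

```python
from statistics import mean, stdev

def sample_skewness(data):
    """Adjusted standardized third moment: n / ((n-1)(n-2)) * sum(z_i ** 3)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least three data points")
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation; zero if all values are equal
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    total = sum(((x - x_bar) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * total

# Hypothetical data: symmetric, right-tailed, and the mirrored left-tailed set.
print(sample_skewness([1, 2, 3, 4, 5]))
print(round(sample_skewness([1, 2, 3, 4, 10]), 3))
print(round(sample_skewness([10, 9, 8, 7, 1]), 3))
```

Mirroring a data set flips the sign of the result but not its magnitude, which is what a measure of asymmetry should do.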

Interpreting Skewness

Understanding skewness is essential for interpreting the shape and nature of data distributions:

  • Positive Skewness: Indicates that the tail on the right side is longer. Common in income distributions where a small number of high incomes skew the data.
  • Negative Skewness: Indicates that the tail on the left side is longer. For example, age at retirement may exhibit negative skewness as most people retire around a similar age with fewer retiring early.
  • Zero Skewness: Suggests a symmetrical distribution, such as the normal distribution where mean and median coincide.

Implications of Skewness

Skewness affects various statistical analyses and interpretations:

  • Mean and Median Relationship: In positively skewed distributions, the mean is typically greater than the median. Conversely, in negatively skewed distributions, the mean is typically less than the median.
  • Data Transformation: Highly skewed data may require transformation (e.g., logarithmic) to meet the assumptions of parametric statistical tests.
  • Choice of Statistical Measures: Median may be a more appropriate measure of central tendency in skewed distributions as it is less affected by extreme values.
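The effect of a logarithmic transformation can be sketched numerically. The income figures below are hypothetical, and the helper function is one reasonable implementation of the adjusted third-moment measure, not a prescribed method.

```python
import math
from statistics import mean, stdev

def sample_skewness(data):
    """Adjusted standardized third moment of a sample with n >= 3."""
    n = len(data)
    x_bar, s = mean(data), stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in data)

# Hypothetical incomes (in thousands) with one very high earner.
incomes = [20, 25, 30, 35, 40, 60, 400]
# The natural log compresses large values more than small ones.
logged = [math.log(x) for x in incomes]

print("raw skewness:   ", round(sample_skewness(incomes), 2))
print("logged skewness:", round(sample_skewness(logged), 2))
```

The transformed values are still right-skewed in this small example, but less so than the raw incomes, which is the sense in which a log transformation reduces skewness.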

Applications of Skewness

Skewness is utilized across various fields to understand and interpret data:

  • Finance: Assessing the skewness of asset returns helps in understanding the risk of extreme losses or gains.
  • Disease Spread: Analyzing the skewness in the number of cases can inform public health strategies.
  • Quality Control: Monitoring the skewness of production processes ensures consistency and detects deviations.

Calculating Skewness: A Step-by-Step Example

Consider a data set representing the number of hours students study per week:

Data Set: 10, 12, 13, 15, 16, 18, 21, 24, 30
Mean ($\bar{x}$): 159 / 9 ≈ 17.67
Median: 16
Standard Deviation ($s$): ≈ 6.38

Using Pearson’s Second Coefficient of Skewness:

$$ \text{Skewness} = \frac{3(17.67 - 16)}{6.38} \approx \frac{5.0}{6.38} \approx 0.78 $$

An approximate skewness of 0.78 indicates a moderate positive skewness, suggesting a longer tail on the right side of the distribution.
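To check the arithmetic, the summary statistics and both skewness measures can be recomputed from the raw data. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median, stdev

hours = [10, 12, 13, 15, 16, 18, 21, 24, 30]

n = len(hours)
x_bar = mean(hours)     # sum of the values divided by n
med = median(hours)     # middle value of the sorted data
s = stdev(hours)        # sample standard deviation (divides by n - 1)

# Pearson's second coefficient: 3 * (mean - median) / s
pearson_second = 3 * (x_bar - med) / s

# Adjusted standardized third moment for the same data
third_moment = n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in hours)

print("mean =", round(x_bar, 2), "median =", med, "s =", round(s, 2))
print("Pearson's second coefficient =", round(pearson_second, 2))
print("standardized third moment    =", round(third_moment, 2))
```

Both measures come out positive for this data set, consistent with a longer right tail, although they are on different scales and need not agree numerically.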

Skewness vs. Kurtosis

While skewness measures the asymmetry of a distribution, kurtosis assesses its "tailedness", often loosely described as the sharpness of the peak. Skewness is based on the third standardized moment and kurtosis on the fourth; together they give deeper insight into the shape of the data distribution.
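The two measures can be compared side by side with a short sketch. It uses the simple population-moment forms (dividing by n), which differ slightly from the adjusted sample formula given earlier, and reports excess kurtosis, which is kurtosis minus 3 so that a normal distribution scores zero. The function names and data are hypothetical.

```python
def central_moment(data, k):
    """k-th central moment, dividing by n (population form)."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** k for x in data) / n

def moment_skewness(data):
    """Third central moment divided by the variance to the power 3/2."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Fourth central moment divided by the squared variance, minus 3."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

flat = [1, 2, 3, 4, 5]                    # symmetric, no extreme values
tailed = [1, 2, 2, 2, 3, 3, 3, 4, 20]     # one extreme value on the right

for name, data in [("flat", flat), ("tailed", tailed)]:
    print(name, round(moment_skewness(data), 2), round(excess_kurtosis(data), 2))
```

The evenly spaced list has zero skewness and negative excess kurtosis, while the list with one extreme value is positive on both measures.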

Comparison Table

| Aspect | Skewness | Kurtosis |
| --- | --- | --- |
| Definition | Measures the asymmetry of a distribution around its mean. | Measures the "tailedness" (often described as peak sharpness) of a distribution. |
| Positive Value Indicates | Right (positive) skew. | Leptokurtic distribution with fatter tails (positive excess kurtosis). |
| Negative Value Indicates | Left (negative) skew. | Platykurtic distribution with thinner tails (negative excess kurtosis). |
| Zero Value Indicates | Symmetrical distribution. | Mesokurtic distribution similar to the normal distribution (zero excess kurtosis). |
| Applications | Assessing balance in data, identifying outliers, and understanding distribution shape. | Analyzing risk and outliers, understanding data distribution in finance and quality control. |
| Impact on Mean and Median | In right skew, mean > median; in left skew, mean < median. | Kurtosis does not directly affect mean and median relationships. |

Summary and Key Takeaways

  • Skewness measures the asymmetry of data distributions.
  • Positive skew indicates a longer right tail; negative skew indicates a longer left tail.
  • Different methods exist for calculating skewness, with the standardized third moment being the most common.
  • Understanding skewness is essential for accurate data interpretation and statistical analysis.
  • Skewness differs from kurtosis, which measures the tailedness (often described as peakedness) of a distribution.

Tips

To remember skewness types, think "S for Skew" and "S for Side": the skew is named for the side with the longer tail, so positive skew tails to the right and negative skew tails to the left. For AP exam success, practice identifying skewness in a variety of data sets and interpreting its implications. The mnemonic "Mean Moves with Skew" is a reminder that in skewed distributions the mean shifts toward the tail. Also check for skewness before applying parametric tests, since strong skew can violate their assumptions.

Did You Know

Skewness isn't just a theoretical concept! In real estate, property prices often display positive skewness because while most homes are priced within a certain range, a few luxury homes can significantly increase the average price. Natural phenomena such as earthquake magnitudes also exhibit positive skewness: minor earthquakes are common, while major ones are rare and form a long right tail.

Common Mistakes

One common mistake is confusing skewness with kurtosis: skewness describes asymmetry, whereas kurtosis describes tail weight, so mixing them up leads students to misread data shapes. A second is assuming that a high skewness value always indicates a problematic data set, when many real-world variables, such as incomes, are naturally skewed. Another error is forgetting to standardize the third moment, which leaves a value that depends on the units of measurement and cannot be compared across data sets. Lastly, relying solely on the mean without considering skewness can distort the understanding of a data set's central tendency.

FAQ

What does a skewness value of zero indicate?
A skewness value of zero indicates no measured asymmetry. A perfectly symmetrical distribution, in which the mean and median are equal, has a skewness of zero.
How does skewness affect the mean and median?
In a positively skewed distribution, the mean is typically greater than the median. In a negatively skewed distribution, the mean is usually less than the median.
Can skewness impact statistical tests?
Yes, skewness can impact the assumptions of parametric statistical tests, which often assume normality. Highly skewed data may require transformation or the use of non-parametric tests.
What is the difference between skewness and kurtosis?
Skewness measures the asymmetry of a distribution, while kurtosis assesses the "tailedness" or sharpness of the peak.
How can skewness be corrected in a data set?
Skewness can be corrected by applying data transformations such as logarithmic, square root, or Box-Cox transformations to reduce asymmetry.
Why is the standardized third moment preferred for measuring skewness?
The standardized third moment is preferred because it allows for consistent comparison across different data sets by standardizing skewness, making it dimensionless.