
Basic Statistics

Histogram

A histogram is a graphical representation of data that shows how values are distributed across ranges (bins).

It helps to:
  • Visualize the shape of the data
  • Identify skewness and spread
  • Detect outliers or unusual patterns

Histograms are commonly used as a first step in data exploration.
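The binning step behind a histogram can be sketched in plain Python. This is a minimal illustration that assumes equal-width bins spanning the smallest to the largest value. The function names `histogram_counts` and `print_text_histogram` are hypothetical. In practice, libraries such as NumPy (`numpy.histogram`) or Matplotlib (`pyplot.hist`) handle this for you.

```python
from typing import List, Sequence


def histogram_counts(values: Sequence[float], num_bins: int) -> List[int]:
    """Count how many values fall into each of `num_bins` equal-width bins.

    Bins span [min(values), max(values)]. The maximum value is placed in the
    last bin so every value is counted exactly once.
    """
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    if not values:
        return [0] * num_bins
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        if width == 0:
            index = 0  # all values are identical: put them in the first bin
        else:
            index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts


def print_text_histogram(values: Sequence[float], num_bins: int) -> None:
    """Print one row of '#' characters per bin, labelled with the bin range."""
    if not values:
        print("(no data)")
        return
    counts = histogram_counts(values, num_bins)
    lo = min(values)
    width = (max(values) - lo) / num_bins
    for i, count in enumerate(counts):
        left, right = lo + i * width, lo + (i + 1) * width
        print(f"{left:6.2f} to {right:6.2f} | {'#' * count}")


# Hypothetical data with one unusually large value (10)
data = [1, 2, 2, 3, 3, 3, 4, 4, 10]
print(histogram_counts(data, 3))  # [6, 2, 1]
print_text_histogram(data, 3)
```

Most values land in the first bin, and the single value in the last bin stands apart. That isolated bar is the kind of unusual value a histogram makes easy to spot.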


Data distributions

To model and understand real-world data, we often assume that it follows a known distribution.

Commonly used distributions include:
  • Normal distribution
  • Exponential distribution


What is a distribution?

A distribution is a graphical or mathematical description of how measurements are spread across possible values.

It answers questions such as:
  • Which values occur most often?
  • How spread out is the data?
  • Is the data symmetric or skewed?


Normal distribution

The normal distribution (also called the Gaussian distribution) has:
  • A bell-shaped curve
  • A symmetric shape around its center

Key properties:
  • Most values cluster around the mean
  • The mean, median, and mode are equal
  • Values far from the mean become increasingly rare

It is a good approximation for many natural quantities and for errors in repeated measurements.
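The properties above can be checked on simulated data. This is a sketch under assumed parameters: mean 100, standard deviation 15, and 20,000 samples, none of which come from the text. For a large normal sample, the mean and median should come out close to each other. About 68% of values should lie within one standard deviation of the mean. The mode is left out because values drawn from a continuous distribution are almost always unique, so a mode only makes sense after binning. The helper `summarize_normal_sample` is hypothetical; `random.Random.gauss` and `statistics.NormalDist` are part of Python's standard library.

```python
import random
import statistics
from statistics import NormalDist


def summarize_normal_sample(mu: float, sigma: float, n: int, seed: int = 42) -> dict:
    """Draw n values from a normal distribution with mean mu and std dev sigma, then summarize them."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    data = [rng.gauss(mu, sigma) for _ in range(n)]
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return {
        "mean": mean,
        "median": statistics.median(data),
        "stdev": sd,
        # Share of sampled values lying within one standard deviation of the mean
        "within_one_sd": sum(abs(x - mean) <= sd for x in data) / n,
    }


# Hypothetical parameters: mean 100, standard deviation 15, 20,000 samples
summary = summarize_normal_sample(mu=100.0, sigma=15.0, n=20_000)
# Theoretical share of a normal distribution within one standard deviation (about 0.683)
expected_share = NormalDist().cdf(1) - NormalDist().cdf(-1)

print(f"mean   = {summary['mean']:.2f}")
print(f"median = {summary['median']:.2f}")
print(f"within 1 sd: {summary['within_one_sd']:.3f} (theory: {expected_share:.3f})")
```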


Exponential distribution

The exponential distribution is commonly used to model the time between two consecutive events that occur independently at a constant average rate.

Characteristics:
  • Values are never negative
  • Many small values, fewer large values
  • Right-skewed shape (a long tail toward large values)

Typical use cases:
  • Time until a system failure
  • Time between customer arrivals
  • Waiting times
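The customer-arrival use case can be simulated with Python's `random.expovariate`, whose argument is the event rate (often written λ). The mean waiting time is therefore 1/λ and the theoretical median is ln(2)/λ. This is a sketch that assumes a hypothetical rate of 2 arrivals per minute; the helper `simulate_waiting_times` is also hypothetical. The output illustrates the characteristics listed above. No gap is negative, and roughly 63% of gaps (1 − e⁻¹) are shorter than the mean. The median is below the mean, which is the signature of a right-skewed shape.

```python
import math
import random
import statistics


def simulate_waiting_times(rate: float, n: int, seed: int = 7) -> list:
    """Simulate n waiting times between events occurring at `rate` events per unit time.

    random.expovariate takes the rate (lambda), so the mean waiting time is 1 / rate.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [rng.expovariate(rate) for _ in range(n)]


rate = 2.0  # hypothetical: on average 2 customer arrivals per minute
gaps = simulate_waiting_times(rate, n=20_000)

mean_gap = statistics.mean(gaps)
median_gap = statistics.median(gaps)
share_below_mean = sum(g < mean_gap for g in gaps) / len(gaps)

print(f"smallest gap: {min(gaps):.4f} minutes")
print(f"mean gap:     {mean_gap:.3f} (theory: {1 / rate:.3f})")
print(f"median gap:   {median_gap:.3f} (theory: {math.log(2) / rate:.3f})")
print(f"share of gaps shorter than the mean: {share_below_mean:.2f} "
      f"(theory: {1 - math.exp(-1):.2f})")
```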


Mean, median, and mode

These are measures of central tendency.

Mean

  • The average of all values
  • Calculated as the sum of values divided by the number of values
  • Sensitive to outliers
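The definition translates directly into code: mean = (x1 + x2 + ... + xn) / n. The sketch below uses a hypothetical list of salaries (in thousands) to show the sensitivity to outliers. Adding a single extreme value more than doubles the mean, even though it describes only one person.

```python
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean: the sum of the values divided by how many values there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


salaries = [40, 42, 45, 47, 50]   # hypothetical salaries, in thousands
with_outlier = salaries + [400]   # the same data plus one extreme value

print(mean(salaries))      # 44.8
print(mean(with_outlier))  # 104.0
```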

Median

  • The middle value when data is ordered
  • Less sensitive to outliers
  • Useful for skewed distributions
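A minimal median sketch sorts the data and takes the middle value. With an even number of values, it averages the two middle values, the usual convention and the one Python's `statistics.median` follows. The sketch uses a hypothetical salary list (in thousands). Adding one extreme value moves the median only from 45 to 46, while the mean of the same lists rises from 44.8 to 104.

```python
from typing import Sequence


def median(values: Sequence[float]) -> float:
    """Middle value of the sorted data; with an even count, the average of the two middle values."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


salaries = [40, 42, 45, 47, 50]   # hypothetical salaries, in thousands
with_outlier = salaries + [400]   # the same data plus one extreme value

print(median(salaries))      # 45
print(median(with_outlier))  # 46.0
```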

Mode

  • The most frequently occurring value
  • Can be one value, multiple values, or none
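The three cases (one mode, several modes, or none) can be sketched with `collections.Counter`. The convention used here is an assumption: if no value repeats, the data is treated as having no mode and an empty list is returned. Python's own `statistics.multimode` differs; it returns every value in that case. The function name `modes` is hypothetical. Because it only counts occurrences, the mode also works for non-numeric (categorical) data.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def modes(values: Sequence[Hashable]) -> List[Hashable]:
    """Return every value that occurs most often.

    Assumed convention: if every value occurs exactly once (or the input is
    empty), there is no mode and an empty list is returned.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats, so treat the data as having no mode
    return [value for value, count in counts.items() if count == top]


print(modes([1, 2, 2, 3]))               # [2]      one mode
print(modes([1, 1, 2, 2, 3]))            # [1, 2]   two modes
print(modes([1, 2, 3]))                  # []       no mode
print(modes(["red", "blue", "red"]))     # ['red']  categorical data
```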

Summary

  • Histograms visualize how data is distributed
  • Distributions describe patterns in measurements
  • Normal distribution is symmetric and bell-shaped
  • Exponential distribution models time between events
  • Mean, median, and mode summarize central tendency