Basic Statistics
Histogram
A histogram is a graphical representation of data that shows how values are distributed across ranges (bins).
It helps to:
- Visualize the shape of the data
- Identify skewness and spread
- Detect outliers or unusual patterns
Histograms are commonly used as a first step in data exploration.
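As a minimal sketch of the binning idea, the snippet below counts values per bin and prints a text histogram using only the standard library. The sample values and the bin width of 10 are made-up assumptions for illustration, not data from this page.

```python
from collections import Counter

# Hypothetical sample: response times in milliseconds (made-up values).
values = [12, 15, 17, 21, 22, 22, 24, 25, 28, 31, 33, 35, 38, 41, 95]

bin_width = 10  # assumed bin width; each bin covers [start, start + 10)

# Map every value to the lower edge of its bin and count values per bin.
counts = Counter((v // bin_width) * bin_width for v in values)

# Print one row per bin, including empty bins, so gaps stay visible.
for start in range(min(counts), max(counts) + bin_width, bin_width):
    print(f"{start:3d}-{start + bin_width - 1:3d} | {'#' * counts[start]}")
```

In this made-up sample, most values fall between 10 and 49, while the single value 95 sits alone after several empty bins, which is how an outlier typically shows up in a histogram.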
Data distributions
To approximate and understand real-world data, we often assume that the data follows a known probability distribution.
Commonly used distributions include:
- Normal distribution
- Exponential distribution
What is a distribution?
A distribution is a graphical or mathematical description of how measurements are spread across possible values.
It answers questions such as:
- Which values occur most often?
- How spread out is the data?
- Is the data symmetric or skewed?
Normal distribution
The normal distribution (also called the Gaussian distribution) has a:
- Bell-shaped curve
- Symmetric form around the center (the mean)
Key properties:
- Most values cluster around the mean
- Mean, median, and mode are equal
- Extreme values are increasingly rare
Many natural and measurement-based quantities are approximately normal, such as adult body heights or the errors in repeated measurements.
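The key properties can be checked numerically with `statistics.NormalDist` from the Python standard library. This is only a sketch: the mean of 100 and standard deviation of 15 are arbitrary assumptions chosen for illustration.

```python
from statistics import NormalDist

# Assumed example parameters: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# For a normal distribution the mean, median, and mode coincide.
print(dist.mean, dist.median, dist.mode)

# Share of values within k standard deviations of the mean.
shares = {}
for k in (1, 2, 3):
    shares[k] = dist.cdf(100 + k * 15) - dist.cdf(100 - k * 15)
    print(f"within {k} sd: {shares[k]:.4f}")
```

The shares come out at roughly 68%, 95%, and 99.7%, which shows both that most values cluster around the mean and that values far from it become rare quickly.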
Exponential distribution
The exponential distribution is commonly used to model the time between two consecutive events.
Characteristics:
- Values are never negative
- Many small values, fewer large values
- Right-skewed shape
Typical use cases:
- Time until a system failure
- Time between customer arrivals
- Waiting times
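To see these characteristics in simulated data, the sketch below draws waiting times with `random.expovariate`. The average wait of 2.0 (say, minutes between customer arrivals), the sample size, and the seed are all assumptions made for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

mean_wait = 2.0  # assumed average time between events, e.g. minutes
# random.expovariate takes the rate (1 / mean), not the mean itself.
waits = [random.expovariate(1 / mean_wait) for _ in range(10_000)]

sample_mean = statistics.mean(waits)
sample_median = statistics.median(waits)
share_below_mean = sum(w < sample_mean for w in waits) / len(waits)

print(f"mean   = {sample_mean:.2f}")
print(f"median = {sample_median:.2f}")
print(f"share of waits below the mean = {share_below_mean:.2f}")
```

The median lands below the mean and well over half of the waits are shorter than average, which is the "many small values, fewer large values" pattern of a right-skewed distribution.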
Mean, median, and mode
These are measures of central tendency.
Mean
- The average of all values
- Calculated as the sum of values divided by the number of values
- Sensitive to outliers
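A short sketch of the calculation, using made-up daily order counts, shows how a single extreme value shifts the mean:

```python
# Hypothetical daily order counts (made-up values).
orders = [20, 22, 19, 21, 23]

# Mean = sum of values divided by the number of values.
mean = sum(orders) / len(orders)
print(mean)  # 21.0

# One extreme value pulls the mean far away from a typical day.
orders_with_outlier = orders + [200]
mean_with_outlier = sum(orders_with_outlier) / len(orders_with_outlier)
print(round(mean_with_outlier, 2))  # 50.83
```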
Median
- The middle value when data is ordered (the average of the two middle values when the count is even)
- Less sensitive to outliers
- Useful for skewed distributions
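The sketch below uses `statistics.median` on small made-up lists to show the odd-count and even-count cases, and how little a single outlier moves the result:

```python
import statistics

odd = [3, 1, 7, 5, 9]   # sorted: 1, 3, 5, 7, 9
even = [3, 1, 7, 5]     # sorted: 1, 3, 5, 7

print(statistics.median(odd))   # 5
print(statistics.median(even))  # 4.0 (average of 3 and 5)

# Made-up values with one outlier: the median stays near the typical values.
skewed = [20, 22, 19, 21, 23, 200]
print(statistics.median(skewed))  # 21.5
```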
Mode
- The most frequently occurring value
- Can be one value, multiple values (when several tie for most frequent), or none (when no value repeats)
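The three cases can be sketched as follows. `statistics.multimode` is part of the standard library, but it returns every value when nothing repeats, so the example also defines a hypothetical helper, `modes`, that follows the "no mode when no value repeats" convention used here.

```python
from collections import Counter
from statistics import multimode

print(multimode([1, 2, 2, 3]))     # [2]
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
print(multimode([1, 2, 3]))        # [1, 2, 3] (all values tie at one occurrence)


def modes(values):
    """Hypothetical helper: return all modes, or [] if no value repeats."""
    counts = Counter(values)
    top = max(counts.values(), default=0)
    if top <= 1:
        return []
    return [value for value, count in counts.items() if count == top]


print(modes([1, 2, 2, 3]))     # [2]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
print(modes([1, 2, 3]))        # []
```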
Summary
- Histograms visualize how data is distributed
- Distributions describe patterns in measurements
- Normal distribution is symmetric and bell-shaped
- Exponential distribution models time between events
- Mean, median, and mode summarize central tendency