📖 WIPIVERSE

🔍 Currently registered entries: 121,673건

Histogram

A histogram is a graphical representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous variable (quantitative data) and was first introduced by Karl Pearson.

The data is divided into intervals, also known as bins or classes, and for each bin, the height of a bar represents the number of data points that fall into that bin. The bins are typically adjacent and non-overlapping. The y-axis represents the frequency (count) or relative frequency (proportion) of occurrences for each bin.

Histograms are used to visualize the underlying frequency distribution (shape) of a set of continuous or discrete data that are measured on an interval scale. This visualization allows for quick assessment of characteristics such as:

  • Shape: Histograms reveal whether the data is symmetric, skewed (left or right), unimodal, bimodal, or multimodal.
  • Central Tendency: They visually indicate the approximate location of the mean, median, and mode of the data.
  • Spread (Variability): The width of the histogram indicates the range of the data and the degree of variation.
  • Outliers: Extreme values that lie far from the main cluster of data points are easily identifiable.

The choice of bin width can significantly impact the appearance of the histogram. Too few bins can oversimplify the distribution, while too many bins can create a noisy representation that obscures the underlying pattern. Various rules exist for determining optimal bin width based on the number of data points and the estimated standard deviation.

Histograms are distinguished from bar charts. A bar chart displays categorical data, with each bar representing a distinct category, whereas a histogram displays the distribution of continuous data over a range of values. In bar charts, the order of the bars is often arbitrary, but in histograms, the bins represent a continuous range and are therefore ordered sequentially. Furthermore, in a typical histogram, there are no gaps between bars (unless a bin is empty), to emphasize the continuous nature of the data.