A histogram visualizes the distribution of values in a dataset.
To plot a histogram, we use:
plt.hist(data)
bins sets the number of bins (bars) in our histogram.
By default, matplotlib creates a histogram with 10 bins of equal size spanning from the smallest sample to the largest sample in our data.
To change the number of bins, we use the keyword bins.
For example, plt.hist(data, bins=20) divides the histogram into 20 bins to show more detail.
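As a minimal sketch of the difference, the snippet below plots the same data with the default bins and with bins=20. The sample data and the variable names are assumptions made for illustration; they are not part of the original material.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample data: 1,000 draws from a normal distribution.
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=170, scale=10, size=1000)

# Default: 10 equal-width bins spanning min(data) to max(data).
plt.figure()
counts_default, edges_default, _ = plt.hist(data)

# 20 bins: narrower bars show more of the distribution's shape.
plt.figure()
counts_fine, edges_fine, _ = plt.hist(data, bins=20)

# plt.hist returns the bin edges, so n bins always give n + 1 edges.
print(len(edges_default), len(edges_fine))
```

In a script, call plt.show() afterwards to display both figures.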
range sets the minimum and maximum values that we include in our histogram. Data points outside this interval are ignored.
plt.hist(data, range=(xmin, xmax))
plt.hist(df.height, range=(50, 180))
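To see what range does to out-of-range values, here is a small sketch with a hypothetical list of heights (the values are made up for illustration). Points below 50 or above 180 are simply left out of the counts.

```python
import matplotlib.pyplot as plt

# Hypothetical heights; 40 and 200 fall outside the chosen range.
heights = [40, 55, 62, 120, 175, 200]

counts, edges, _ = plt.hist(heights, range=(50, 180))

# The bin edges now span exactly the requested interval,
# and only the 4 values inside [50, 180] are counted.
print(edges[0], edges[-1], counts.sum())
```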
Normalization scales the height of each bar by a constant factor so that the areas of the bars sum to one.
It makes two histograms comparable even if the sample sizes are different.
We can normalize a histogram with the keyword density=True. The area of each bar then represents the proportion of the dataset that falls in that bin.
plt.hist(df.male_weight, density=True)
plt.hist(df.female_weight, density=True)
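The sketch below checks the "areas sum to one" property on two samples of different sizes. The arrays male_weight and female_weight are randomly generated stand-ins for the columns used above, so the numbers themselves are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical samples of different sizes (stand-ins for the DataFrame columns).
rng = np.random.default_rng(seed=1)
male_weight = rng.normal(loc=80, scale=12, size=2000)
female_weight = rng.normal(loc=65, scale=10, size=500)

heights_m, edges_m, _ = plt.hist(male_weight, density=True)
heights_f, edges_f, _ = plt.hist(female_weight, density=True)

# Area of a bar = height * bin width. With density=True the areas
# of each histogram sum to 1, whatever the sample size.
area_m = np.sum(heights_m * np.diff(edges_m))
area_f = np.sum(heights_f * np.diff(edges_f))
print(area_m, area_f)
```

Because both histograms have a total area of one, their shapes can be compared directly even though one sample is four times larger than the other.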
When we plot multiple histograms on the same axes, the overlapping bars can be difficult to read.
To solve this problem, we can use:
alpha (between 0 and 1) to set the transparency of the histogram
histtype='step' to draw just the outline of a histogram
plt.hist(budget1, bins=20, alpha=0.4)
plt.hist(budget2, bins=20, alpha=0.4)
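Both options can be tried side by side. In this sketch, budget1 and budget2 are randomly generated placeholders (an assumption, since the original data is not shown), and the label and legend calls are added only to tell the two datasets apart.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data standing in for budget1 and budget2.
rng = np.random.default_rng(seed=2)
budget1 = rng.normal(loc=50, scale=15, size=800)
budget2 = rng.normal(loc=70, scale=20, size=800)

# Option 1: semi-transparent bars, so overlapping regions stay visible.
plt.figure()
counts1, edges1, bars1 = plt.hist(budget1, bins=20, alpha=0.4, label="budget1")
counts2, edges2, bars2 = plt.hist(budget2, bins=20, alpha=0.4, label="budget2")
plt.legend()

# Option 2: outlines only, with no filled bars to hide each other.
plt.figure()
plt.hist(budget1, bins=20, histtype="step", label="budget1")
plt.hist(budget2, bins=20, histtype="step", label="budget2")
plt.legend()
```

Transparency tends to work well for two datasets, while outlines usually stay readable when three or more histograms share the same axes.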
To classify a dataset's distribution, we can count the number of distinct peaks (modes) in the histogram: one peak is unimodal, two peaks is bimodal, and more than two is multimodal.