Missing Values

Missing values are common in real-world data, and you don’t want them interfering with your analysis.

Finding missing values

  • To detect missing values:
df.isna()
  • To check whether each column contains any missing values:
df.isna().any()
  • To count missing values in each column:
df.isna().sum()
  • To plot the number of missing values in each column:
import matplotlib.pyplot as plt
df.isna().sum().plot(kind="bar")
plt.show()
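The three checks above can be tried on a small, self-contained example. This is a minimal sketch using a hypothetical DataFrame (the column names and values are made up for illustration); np.nan marks a missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; np.nan marks a missing value
df = pd.DataFrame({
    "date": ["2016-01-03", "2016-01-10", "2016-01-17"],
    "avg_price": [1.05, np.nan, 1.12],
    "total_sold": [np.nan, np.nan, 300.0],
})

# True wherever a value is missing, same shape as df
print(df.isna())

# One Boolean per column: does the column contain at least one missing value?
print(df.isna().any())

# Number of missing values per column (each True counts as 1)
print(df.isna().sum())
```

Here date has no missing values, avg_price has one and total_sold has two, so .any() gives False, True, True and .sum() gives 0, 1, 2.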

Example 1

  • Print a DataFrame that shows whether each value is missing or not.
  • Print a summary that shows whether any value in each column is missing or not.
  • Create a bar plot of the total number of missing values in each column.
# Import matplotlib.pyplot with alias plt
import matplotlib.pyplot as plt

# Check individual values for missing values
print(avocados_2016.isna())

# Check each column for missing values
print(avocados_2016.isna().any())

# Bar plot of missing values by variable
avocados_2016.isna().sum().plot(kind="bar")

# Show plot
plt.show()

Removing missing values

One way to deal with missing values is to remove them from the dataset completely.

  • To remove the rows that contain missing values, we use .dropna():
df.dropna()
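A minimal sketch of what .dropna() does by default, using a hypothetical four-row DataFrame: any row with at least one missing value is dropped, and the result is returned as a new DataFrame rather than modifying the original.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows 1 and 2 each have one missing value
df = pd.DataFrame({
    "avg_price": [1.05, np.nan, 1.12, 0.98],
    "total_sold": [250.0, 310.0, np.nan, 275.0],
})

# By default, dropna() drops every row that has at least one missing value
complete = df.dropna()

print(df.shape)        # (4, 2)
print(complete.shape)  # (2, 2)

# The original DataFrame is unchanged and still has its 2 missing values
print(df.isna().sum().sum())  # 2
```

Because whole rows are removed, dropping can discard a lot of data when missing values are spread across many rows.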

Example 2

  • Remove the rows containing missing values.
  • Verify that all missing values have been removed: check whether each column has any NAs, and print the result.
# Remove rows with missing values
avocados_complete = avocados_2016.dropna()

# Check if any columns contain missing values
print(avocados_complete.isna().any())
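The per-column check in Example 2 prints one Boolean per column. As a sketch on a hypothetical toy DataFrame, calling .any() a second time collapses that result into a single True/False for the whole DataFrame, which is handy for a quick sanity check.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value
df = pd.DataFrame({
    "avg_price": [1.05, np.nan, 1.12],
    "total_sold": [250.0, 310.0, 275.0],
})
complete = df.dropna()

# Per-column check, as in Example 2: one Boolean per column
print(complete.isna().any())

# any() across columns, then any() again: one Boolean for the whole DataFrame
has_missing = complete.isna().any().any()
print(has_missing)  # False
```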

Replacing missing values

Another way of handling missing values is to replace them all with the same value. For numerical variables, one option is to replace missing values with 0.

  • To replace all missing values with 0:
df.fillna(0)
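A minimal sketch on a hypothetical DataFrame: fillna(0) returns a new DataFrame with every missing value set to 0. Passing a dict instead of a scalar fills only the listed columns, which is useful when 0 makes sense for some numeric columns but not others.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per column
df = pd.DataFrame({
    "avg_price": [1.05, np.nan, 1.12],
    "total_sold": [np.nan, 310.0, 275.0],
})

# Replace every missing value with 0; df itself is not modified
filled = df.fillna(0)
print(filled)

# Fill only the total_sold column; avg_price keeps its missing value
partly_filled = df.fillna({"total_sold": 0})
print(partly_filled.isna().sum())
```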