Code Like A Girl

How to deal with NA values in a dataframe using Python

The Problem

If you work with real-world datasets, you will often come across empty cells, also called null or NA values. These values make data preprocessing difficult, because many calculations and models cannot handle them. So how do you deal with them?

The cells highlighted in yellow are the null values.

The Solution

There are various ways to tackle this problem:

  • Replace the null values with a space (" ").
  • Replace the null values with the mean/median/mode of the respective columns.
  • As a last resort, delete the record/row containing the null value. NOTE: Do this only if you can afford to lose those rows. Doing this might delete crucial information from your dataset.

Steps

  • Import the pandas library.
import pandas as pd
  • Read the CSV file into a pandas dataframe.
dataset = pd.read_csv('Data.csv')
  • Check the number of rows and columns in the dataframe.
dataset.shape
  • Check for null values. This returns a dataframe of True/False values, with True marking each null cell.
dataset.isnull()
  • Count the NA values in each column.
dataset.isnull().sum()
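
The steps above can be tried without a CSV file. Here is a minimal sketch that builds a small dataframe by hand as a stand-in for Data.csv; the column names (Country, Age, Salary) and all values are made up for illustration and are not taken from the original dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Data.csv: column names and values are illustrative only.
dataset = pd.DataFrame({
    "Country": ["France", "Spain", None, "Germany"],
    "Age": [44.0, np.nan, 30.0, 38.0],
    "Salary": [72000.0, 48000.0, np.nan, np.nan],
})

# Number of rows and columns, as a (rows, columns) tuple.
n_rows, n_cols = dataset.shape

# Boolean dataframe of the same shape: True wherever a value is missing.
null_mask = dataset.isnull()

# Number of missing values in each column.
nulls_per_column = null_mask.sum()

# Total number of missing values in the whole dataframe.
total_nulls = int(nulls_per_column.sum())

print(nulls_per_column)
```

Looking at the per-column counts first helps you decide which of the strategies is reasonable for each column.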

Code snippets for the solution

  • To fill the null values with a space (" "), type the following code:
modifiedDataset = dataset.fillna(" ")
  • To replace the null values in numeric columns with the column mean, type the following code. For the median, use dataset.median(numeric_only=True) instead; for the mode, use dataset.mode().iloc[0].
modifiedDataset = dataset.fillna(dataset.mean(numeric_only=True))
  • To delete every row that contains a null value (not recommended), type the following code:
modifiedDataset = dataset.dropna()
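
To see how the mean, median, mode and row-deletion options differ, here is a rough sketch on a small hand-made dataframe. The column names and values are hypothetical, and filling numeric columns with the mean or median and the text column with the mode is just one reasonable choice, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
dataset = pd.DataFrame({
    "Country": ["France", "Spain", None, "Spain"],
    "Age": [44.0, np.nan, 30.0, 38.0],
    "Salary": [72000.0, 48000.0, np.nan, 60000.0],
})

# Mean: numeric_only=True skips text columns such as Country,
# so nulls in those columns are left untouched.
mean_filled = dataset.fillna(dataset.mean(numeric_only=True))

# Median: same pattern, often less sensitive to extreme values.
median_filled = dataset.fillna(dataset.median(numeric_only=True))

# Mode: a column can have several modes, so mode() returns a Series;
# [0] picks the first one. This also works for text columns.
country_filled = dataset["Country"].fillna(dataset["Country"].mode()[0])

# Last resort: drop every row that has at least one null value.
dropped = dataset.dropna()

print(mean_filled)
print(dropped)
```

None of these calls changes dataset itself. Each one returns a new object, which is why the result is assigned to a new variable.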