Cheat sheet for quick data preprocessing - Data Science Bootcamp journey

It happened! You are entering the Data Science field. In your dreams you create cool neural networks, write a voice assistant, beat financial marketing. But… One of the most important and time-consuming moments is processing data.

For myself I created a cheat sheet to obtain data before the project. Data example

import pandas as pd #import pandas
import numpy as np  #import numpy
df = pd.read_csv("AB_NYC_2019.csv") #read dataset and put it into df

Look at the first 3 lines to understand what the values look like:

df.head(3)

Demonstrating Column Information:

Display all columns with types
```
df.info()
```
and some more information about numeric columns:
```
df.describe()
```

Here we check few questions about dataset and create plan for next step:

Does the number of lines in each column correspond to the total number of lines?
What is data type in each column?
Can we see target columns?

Let’s check values:

missing values in columns:

import seaborn as sns
sns.heatmap(df.isnull(),yticklabels=False,cbar=False,cmap='viridis')

do we have columns with only one value in all rows (they will not affect the result):

df = df[[c for c
        in list(df)
        if len(df[c].unique()) > 1]]

do we have duplicate values:
```
df.drop_duplicates(inplace=True)
```
work with empty values: