What Pandas is

Pandas is a Python library for data analysis. Its name comes from “panel data”. We can load data from various sources into a data structure called a “DataFrame” and then compute on or manipulate that data as needed.

The basics

DataFrame is the core class. It is a two-dimensional table whose columns are Series.
A Series is a one-dimensional labeled array; each column of a DataFrame is a Series.
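
To make the DataFrame/Series relationship concrete, here is a minimal sketch. The sample rows and email addresses are made up for illustration; only the column names (first, last, email) follow the ones used later in this post.

```python
import pandas as pd

# Hypothetical sample data, not from any real dataset
df = pd.DataFrame({
    "first": ["Leonardo", "Ada", "Alan"],
    "last": ["Vinci", "Lovelace", "Turing"],
    "email": ["leo@example.com", "ada@example.com", "alan@example.com"],
})

# Selecting one column gives back a Series
first_col = df["first"]

print(type(df).__name__)         # DataFrame
print(type(first_col).__name__)  # Series
print(df.shape)                  # (rows, columns)
```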

Read/Access operations

Useful methods to display information

  
df.head()
# display the first n rows (default 5)
df.tail()
# display the last n rows (default 5)
df.info()
# print a concise summary such as column names, data types, and memory usage
df.columns # this is an attribute, not a method
# the labels of all columns
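
As a quick illustration of these calls, the sketch below uses a small made-up DataFrame (the column names n and square are assumptions for this example only).

```python
import pandas as pd

# Hypothetical 10-row table
df = pd.DataFrame({"n": range(10), "square": [i * i for i in range(10)]})

top3 = df.head(3)     # pass n to override the default of 5
bottom2 = df.tail(2)

print(top3)
print(bottom2)
df.info()                # prints the summary itself and returns None
print(list(df.columns))  # column labels as a plain list
```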

Read methods

The mental model for data access is similar to querying a database table: we can select rows or columns, and we can apply filter conditions.

  
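# select a column
df['first']
# Attribute access (df.some_column) works only when the name is a valid identifier
# and does not collide with a DataFrame attribute or method. In pandas versions
# that define DataFrame.first, df.first returns that method, not the column.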
# select columns
df[['first', 'last']]

# select a row (with the default integer index, label 0 and position 0 coincide)
index = 0
df.loc[index] # by label, or
df.iloc[index] # by position
# select rows
df.loc[[0,1]] # or 
df.iloc[[0,1]]

# select portion of DataFrame
# df.loc[row(s), col(s)]
df.loc[0, ['first', 'last']] # or
df.iloc[0, [0, 1]]

df.loc[[0,1], ['first', 'email']] # or
df.iloc[[0,1], [0,2]]

# or use slices to select a range (loc slices include both endpoints)
df.loc[0:2, 'first':'email']

loc vs iloc method

Both accessors are used the same way: to access a portion of a DataFrame by row(s) and column(s). The difference is in what they accept. loc takes labels, whereas iloc takes integer positions.
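
The difference is easiest to see when the row labels are not the default 0, 1, 2. The following sketch assumes a made-up DataFrame with string labels "a", "b", "c" as its index.

```python
import pandas as pd

# Hypothetical data with a non-default (string) index
df = pd.DataFrame(
    {"first": ["Leonardo", "Ada", "Alan"], "last": ["Vinci", "Lovelace", "Turing"]},
    index=["a", "b", "c"],
)

by_label = df.loc["b"]     # the row whose label is "b"
by_position = df.iloc[1]   # the second row, whatever its label is

# Slices behave differently: loc includes the end label, iloc excludes the end position
label_slice = df.loc["a":"b"]  # rows "a" and "b"
pos_slice = df.iloc[0:1]       # row at position 0 only

print(by_label)
print(label_slice)
print(pos_slice)
```

Note that with this index, df.loc[1] would raise a KeyError because no row is labeled 1.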

Filtering rows

We can pass a boolean Series to select the rows we want. To create a boolean Series, compare a column against a value:

  
mask = df['last'] == 'Vinci'  # named mask to avoid shadowing the built-in filter()
df[mask] # select all columns
df.loc[mask, ['first', 'last']]  # select only 2 columns
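
Boolean Series can also be combined. Here is a minimal sketch using made-up rows; it relies on the element-wise operators & (and), | (or), and ~ (not). Each comparison needs its own parentheses because these operators bind more tightly than ==.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "first": ["Leonardo", "Ada", "Piero"],
    "last": ["Vinci", "Lovelace", "Vinci"],
})

is_vinci = df["last"] == "Vinci"
starts_with_l = df["first"].str.startswith("L")

both = df[is_vinci & starts_with_l]             # rows matching both conditions
either = df[is_vinci | (df["first"] == "Ada")]  # rows matching at least one
not_vinci = df[~is_vinci]                       # rows where the condition is False

print(both)
print(either)
print(not_vinci)
```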

Update

  
# Update a single cell
df.at[0, 'first'] = 'Klur2' # or
df.loc[0, 'first'] = 'Klur2'

# Update schema: add or drop a column
df['middle'] = "init middle"
df.drop(columns=['middle'], inplace=True)
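
The same loc syntax can update several rows at once when given a boolean mask. The sketch below uses made-up data and a hypothetical derived column named full; it also shows that drop without inplace=True returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "first": ["Leonardo", "Ada"],
    "last": ["Vinci", "Lovelace"],
})

# Update every row that matches a condition in a single assignment
df.loc[df["last"] == "Vinci", "last"] = "da Vinci"

# Add a column computed from existing columns
df["full"] = df["first"] + " " + df["last"]

# Without inplace=True, drop() returns a new DataFrame
trimmed = df.drop(columns=["full"])

print(df)
print(trimmed)
```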

The official cheat sheet from Pandas

https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
