df.size    # 9 → total elements
df.ndim    # 2 → always for DataFrame
df.values  # underlying NumPy array
df.empty   # True if no elements
2
Exploring Data
EDA first step
quick look
df.head() # first 5 rows
df.head(10) # first 10 rows
df.tail() # last 5 rows
df.tail(3) # last 3 rows
df.sample(5) # 5 random rows
df.sample(frac=0.1) # random 10%
structure & types
df.info() # dtypes + null counts + memory
df.describe() # stats for numeric columns
df.describe(include='all') # all cols
df.describe(include='object') # strings
df.shape    # (rows, cols)
df.dtypes   # type per column
df.columns  # column names
counts & frequencies
df.count() # non-null per col
df.nunique() # unique count per col
df['col'].value_counts() # frequency table
df['col'].value_counts(normalize=True)
df.isnull().sum() # NaN count per col
df.memory_usage(deep=True) # bytes per col
interactive property inspector
click a method/property
Click any method above to inspect it
3
Selecting Columns
column access
single column → Series
df['name'] # returns Series
df.name  # dot notation (avoid spaces)
Dot notation breaks when column name has spaces or conflicts with a DataFrame method (like df.count). Use [] for safety.
df.drop(columns=['score'])
df[df.columns.difference(['score'])]
# Select by dtype
df.select_dtypes(include='number')
df.select_dtypes(include='object')
df.select_dtypes(exclude='float')
# Select by column name pattern
df.filter(like='sales') # contains 'sales'
df.filter(regex='^q') # starts with q
reorder columns
df = df[['score','name','age']] # manual reorder
df = df[sorted(df.columns)] # alphabetical
df = df[['name'] + [c for c in df.columns if c != 'name']] # name first
4
Selecting Rows — loc, iloc, at
row access
.loc — label based (INCLUSIVE)
df.loc[0] # row label 0
df.loc[0:2] # rows 0,1,2 (end inclusive!)
df.loc[0, 'name'] # single value
df.loc[0:2, 'name':'age'] # rows + cols
df.loc[[0,2], 'name'] # rows 0 and 2
df.at[0, 'name'] # fastest single cell
.loc slice end is INCLUSIVE. So df.loc[0:2] returns 3 rows (0, 1, 2).
.iloc — position based (EXCLUSIVE end)
df.iloc[0] # first row
df.iloc[0:2] # rows 0,1 (end exclusive)
df.iloc[0, 1] # row 0, col 1
df.iloc[-1] # last row
df.iloc[:, 0] # all rows, first col
df.iloc[0:3, 0:2] # slice rows + cols
df.iat[0, 1] # fastest scalar
.iloc slice end is EXCLUSIVE (like Python lists). df.iloc[0:2] → 2 rows.
loc vs iloc — key difference with non-default index
DATAFRAME (index → A)
 5 → 10
10 → 20
15 → 30
df.loc[5] → label 5
A = 10
Uses the INDEX LABEL 5
df.iloc[0] → position 0
A = 10
Uses POSITION 0 (1st row)
Avoid chained indexing: df['col'][0]. Use df.loc[0,'col'] or df.iloc[0,0] — chaining causes SettingWithCopyWarning and may not modify the original.
# AND — both must be true
df[(df['age'] > 25) & (df['score'] >= 90)]
# OR — either must be true
df[(df['age'] > 25) | (df['score'] >= 90)]
# NOT — negate
df[~(df['age'] > 25)]
Always wrap each condition in parentheses when using & or |. Without them Python's operator precedence causes errors.
special filters — isin, between, str
# isin — value in list
df[df['name'].isin(['Alice','Charlie'])]
df[~df['name'].isin(['Bob'])] # NOT in
# between — inclusive range
df[df['age'].between(20, 30)]
# String filters
df[df['name'].str.startswith('A')]
df[df['name'].str.contains('lic')]
df[df['name'].str.contains('al',case=False)]
6
Adding & Removing Columns / Rows
modify
adding columns
# Direct assignment
df['grade'] = 'A' # constant
df['age_plus_10'] = df['age'] + 10
# insert() — at specific position
df.insert(1, 'grade', 'A')
# assign() — method chaining
df = df.assign(
grade='A',
age_plus_10=df['age'] + 10
)
# Conditional column via apply
df['result'] = df['score'].apply(
lambda x: 'Pass' if x >= 60 else 'Fail'
)
removing columns
# drop() — preferred
df.drop(columns=['grade'])
df.drop(columns=['grade','age_plus_10'])
df.drop('grade', axis=1) # axis=1
df.drop(columns=['grade'], inplace=True)
# del — modifies in place
del df['grade']
# pop() — removes and returns
removed = df.pop('grade')
df.drop(0) # drop row label 0
df.drop([0,1,2]) # drop multiple rows
df.drop_duplicates() # remove duplicate rows
df.drop_duplicates(subset=['name']) # based on column
df.drop_duplicates(keep='last') # keep last occurrence
7
Sorting Data
order
sort by values
df.sort_values('age') # asc
df.sort_values('age', ascending=False)
# Multi-column sort
df.sort_values(['age','score'])
df.sort_values(['age','score'],
ascending=[True,False])
# Sort and reset index
df.sort_values('age').reset_index(drop=True)
sort by index & top N
df.sort_index()
df.sort_index(ascending=False)
# Top / bottom N rows (faster than sort + head)
df.nlargest(3, 'score') # top 3 scores
df.nsmallest(3, 'age') # youngest 3
df.nlargest(3, ['score','age']) # tie-break
nlargest / nsmallest are faster than sort_values().head(n) for large DataFrames.
8
Renaming & Reindexing
labels
rename columns
# dict mapping
df.rename(columns={'name':'full_name','age':'years'})
# function on all columns
df.rename(columns=str.upper)
df.rename(columns=str.lower)
df.rename(columns=lambda x: x.strip())
# Reassign all at once
df.columns = ['full_name','years','score']
# Add prefix/suffix
df.add_prefix('col_') # col_name, col_age
df.add_suffix('_2024') # name_2024, age_2024
set & reset index
# Set a column as the row index
df.set_index('name')
df.set_index('name', inplace=True)
# Reset back to 0, 1, 2...
df.reset_index() # old index → column
df.reset_index(drop=True) # discard old index
# Rename index
df.index.name = 'row_id'
df.rename_axis('row_id')
df.rename_axis(None) # remove name
9
Handling Missing Data
data quality
detect
df.isnull() # bool DataFrame
df.notnull()
df.isnull().sum() # NaN count per col
df.isnull().sum().sum() # total NaN count
df.isnull().any() # any NaN per col?
df.isnull().all() # all NaN per col?
df.isnull().mean() # fraction NaN per col
drop
df.dropna() # any NaN → drop row
df.dropna(how='all') # all NaN → drop row
df.dropna(subset=['age']) # NaN in 'age' → drop
df.dropna(axis=1) # drop NaN columns
df.dropna(thresh=2) # keep if ≥2 non-NaN
fill
df.fillna(0) # fill all NaN with 0
df['age'].fillna(0)
df.fillna({'age':0,'score':50}) # per column
# Fill with statistics
df['age'].fillna(df['age'].mean())
df['age'].fillna(df['age'].median())
df['name'].fillna(df['name'].mode()[0])
# Forward / backward fill
df.ffill()  # forward fill
df.bfill()  # backward fill
# Interpolate
df['score'].interpolate() # linear
category dtype saves up to 90% memory for columns with few unique string values (gender, country, status).
dtype quick reference
dtype
python type
memory
use for
int64
int
8 bytes
whole numbers (default)
int8/16/32
int
1/2/4 bytes
small integers — save memory
float64
float
8 bytes
decimals (default)
float32
float
4 bytes
ML model inputs
object
str/mixed
variable
strings (inefficient)
category
Categorical
small
repeated strings (gender, country)
bool
bool
1 byte
True/False flags
datetime64
datetime
8 bytes
date/time values
Int64 (cap I)
int+NaN
8 bytes
nullable integer
11
String Operations (.str)
text data
case, clean & search
s = df['name']
s.str.upper()       # UPPERCASE
s.str.lower()       # lowercase
s.str.title()       # Title Case
s.str.capitalize()  # First letter only
s.str.strip() # remove whitespace
s.str.len() # length per element
s.str.contains('ali', case=False)
s.str.startswith('A')
s.str.endswith('e')
s.str.find('li') # position of first match
split, replace & extract
s.str.split(',') # → list
s.str.split(',', expand=True) # → DataFrame
s.str.split(',').str[0] # first part
s.str.replace('alice','Alice')
s.str.replace(r'\d+','', regex=True)
s.str.extract(r'(\d+)') # capture group
s.str.extractall(r'(\d+)')
s.str.isdigit()  # all digits?
s.str.isalpha()  # all letters?
s.str.zfill(5) # zero-pad to width 5
Real-world uses: cleaning scraped text · email/domain extraction · parsing structured strings in a column · standardizing casing · feature extraction with regex · phone number formatting
Alternative: df['col'].apply(lambda x: x.upper()) — works but 5–10× slower than .str. Always prefer .str for vectorized string operations.
g = df.groupby('department')
g['salary'].mean()
g['salary'].sum()
g['salary'].count()
g['salary'].max()
g['salary'].min()
g['salary'].std()
g['salary'].median()
g['salary'].nunique()
# All numeric columns
g.mean()
g.describe()
multiple aggregations — agg()
# List of agg functions
g['salary'].agg(['mean','min','max'])
# Dict — different agg per column
g.agg({
'salary': ['mean','max'],
'age': 'median',
'name': 'count'
})
# Named aggregations (clean column names)
g.agg(
avg_salary=('salary','mean'),
max_salary=('salary','max'),
headcount=('name','count')
)
transform — keeps original index
# Returns same shape as original df
# Perfect for adding group stats to rows
df['dept_avg'] = df.groupby('dept')['salary'].\
transform('mean')
df['rank_in_dept'] = df.groupby('dept')['salary'].\
transform('rank')
transform() is one of the most powerful GroupBy tools — it lets you add group-level aggregations as a new column while keeping all original rows intact.
filter & apply
# filter — keep/remove entire groups
df.groupby('dept').filter(
lambda x: x['salary'].mean() > 50000
)
# apply — custom function on each group
def top_2(group):
return group.nlargest(2,'salary')
df.groupby('dept').apply(top_2)
# Multiple group keys
df.groupby(['dept','gender'])['salary'].mean()
# 1. Vectorized >> apply >> iterrows
df['double'] = df['score'] * 2 # fast
# 2. query() faster on large DFs
df.query('age > 25 and score > 80')
# 3. eval() for arithmetic
df.eval('result = score * 2 + age',
inplace=True)
# 4. Chunked reading for huge files
chunks = pd.read_csv('big.csv',
chunksize=10000)
result = pd.concat([process(c) for c in chunks])
# 5. If you must iterate, use itertuples
for row in df.itertuples():
print(row.name, row.score)