
Pandas DataFrame Cheatsheet

Every method · explained · with alternatives

1 · Creating a DataFrame · foundation
from dict of lists (most common)
df = pd.DataFrame({
    'name':  ['Alice','Bob','Charlie'],
    'age':   [25, 30, 22],
    'score': [90, 85, 92]
})
   name     age  score
0  Alice     25     90
1  Bob       30     85
2  Charlie   22     92
from list of dicts
data = [
  {'name':'Alice','age':25,'score':90},
  {'name':'Bob',  'age':30,'score':85}
]
df = pd.DataFrame(data)
Each dict is a row. Missing keys become NaN. Great for API response data.
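A minimal sketch of the NaN behavior, with one dict missing a key:

data = [
  {'name':'Alice','age':25,'score':90},
  {'name':'Bob',  'age':30}            # no 'score' key
]
pd.DataFrame(data)['score']   # → 90.0, NaN (column upcast to float)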
from list of lists
df = pd.DataFrame(
  [['Alice',25,90],['Bob',30,85]],
  columns=['name','age','score']
)
from numpy array
arr = np.array([[1,2,3],[4,5,6]])
df = pd.DataFrame(arr, columns=['A','B','C'])
from series (concat)
s1 = pd.Series([1,2,3], name='A')
s2 = pd.Series([4,5,6], name='B')
df = pd.concat([s1, s2], axis=1)
empty df & with custom index
# Empty with defined columns
df = pd.DataFrame(columns=['name','age'])

# Custom row index
df = pd.DataFrame(
  {'score':[90,85,92]},
  index=['alice','bob','charlie']
)
DataFrame attributes
df.shape    # (3, 3) → (rows, cols)
df.columns  # Index(['name','age','score'])
df.index    # RangeIndex(start=0,stop=3)
df.dtypes   # dtype per column
df.size     # 9 → total elements
df.ndim     # 2 → always for DataFrame
df.values   # underlying NumPy array
df.empty    # True if no elements
2 · Exploring Data · EDA first step
quick look
df.head()           # first 5 rows
df.head(10)         # first 10 rows
df.tail()           # last 5 rows
df.tail(3)          # last 3 rows
df.sample(5)        # 5 random rows
df.sample(frac=0.1) # random 10%
structure & types
df.info()      # dtypes + null counts + memory
df.describe()  # stats for numeric columns
df.describe(include='all')   # all cols
df.describe(include='object') # strings
df.shape       # (rows, cols)
df.dtypes      # type per column
df.columns     # column names
counts & frequencies
df.count()                      # non-null per col
df.nunique()                    # unique count per col
df['col'].value_counts()        # frequency table
df['col'].value_counts(normalize=True)
df.isnull().sum()               # NaN count per col
df.memory_usage(deep=True)      # bytes per col
3 · Selecting Columns · column access
single column → Series
df['name']       # returns Series
df.name          # dot notation (avoid spaces)
Dot notation breaks when column name has spaces or conflicts with a DataFrame method (like df.count). Use [] for safety.
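A quick illustration with a hypothetical column named 'count':

df2 = pd.DataFrame({'count': [1, 2]})
df2.count      # the count() method, not your data
df2['count']   # the column, always unambiguous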
multiple columns → DataFrame
df[['name','score']]   # double brackets!
cols = ['name','score']
df[cols]               # same, using variable
all columns except some
df.drop(columns=['score'])
df[df.columns.difference(['score'])]  # note: result columns are sorted

# Select by dtype
df.select_dtypes(include='number')
df.select_dtypes(include='object')
df.select_dtypes(exclude='float')

# Select by column name pattern
df.filter(like='sales')      # contains 'sales'
df.filter(regex='^q')       # starts with q
reorder columns
df = df[['score','name','age']]          # manual reorder
df = df[sorted(df.columns)]               # alphabetical
df = df[['name'] + [c for c in df.columns if c != 'name']] # name first
4 · Selecting Rows — loc, iloc, at · row access
.loc — label based (INCLUSIVE)
df.loc[0]              # row label 0
df.loc[0:2]           # rows 0,1,2 (end inclusive!)
df.loc[0, 'name']     # single value
df.loc[0:2, 'name':'age']  # rows + cols
df.loc[[0,2], 'name'] # rows 0 and 2
df.at[0, 'name']      # fastest single cell
.loc slice end is INCLUSIVE. So df.loc[0:2] returns 3 rows (0, 1, 2).
.iloc — position based (EXCLUSIVE end)
df.iloc[0]            # first row
df.iloc[0:2]          # rows 0,1 (end exclusive)
df.iloc[0, 1]         # row 0, col 1
df.iloc[-1]           # last row
df.iloc[:, 0]         # all rows, first col
df.iloc[0:3, 0:2]     # slice rows + cols
df.iat[0, 1]          # fastest scalar
.iloc slice end is EXCLUSIVE (like Python lists). df.iloc[0:2] → 2 rows.
loc vs iloc — key difference with non-default index
# DataFrame with a non-default index
    A
5   10
10  20
15  30

df.loc[5]   # uses the INDEX LABEL 5 → A = 10
df.iloc[0]  # uses POSITION 0 (1st row) → A = 10

Avoid chained indexing: df['col'][0]. Use df.loc[0,'col'] or df.iloc[0,0] — chaining causes SettingWithCopyWarning and may not modify the original.
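A minimal sketch of the safe pattern for writing a value:

df.loc[0, 'score'] = 100      # one operation, always hits df itself
# df['score'][0] = 100        # chained: may write to a temporary copy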

5 · Filtering Data · boolean indexing
single condition
df[df['age'] > 25]
df[df['name'] == 'Alice']
df[df['score'] != 90]
df[df['score'].isnull()]   # NaN rows
df[df['score'].notnull()]  # non-NaN rows
.query() — string syntax (faster on large DFs)
df.query('age > 25')
df.query('age > 25 and score >= 90')
df.query('name == "Alice"')
df.query('age > 25 or score >= 90')   # 'or' / '|' both work

# Use Python variable with @
min_age = 25
df.query('age > @min_age')
multiple conditions
# AND — both must be true
df[(df['age'] > 25) & (df['score'] >= 90)]

# OR — either must be true
df[(df['age'] > 25) | (df['score'] >= 90)]

# NOT — negate
df[~(df['age'] > 25)]

Always wrap each condition in parentheses when using & or |. Without them Python's operator precedence causes errors.
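For example, without the brackets & binds tighter than the comparisons, producing a chained comparison on a Series that raises "truth value of a Series is ambiguous":

df[(df['age'] > 25) & (df['score'] >= 90)]   # correct
# df[df['age'] > 25 & df['score'] >= 90]     # ValueError: ambiguous truth value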

special filters — isin, between, str
# isin — value in list
df[df['name'].isin(['Alice','Charlie'])]
df[~df['name'].isin(['Bob'])]  # NOT in

# between — inclusive range
df[df['age'].between(20, 30)]

# String filters
df[df['name'].str.startswith('A')]
df[df['name'].str.contains('lic')]
df[df['name'].str.contains('al',case=False)]
6 · Adding & Removing Columns / Rows · modify
adding columns
# Direct assignment
df['grade'] = 'A'                    # constant
df['age_plus_10'] = df['age'] + 10

# insert() — at specific position
df.insert(1, 'grade', 'A')

# assign() — method chaining
df = df.assign(
  grade='A',
  age_plus_10=df['age'] + 10
)

# Conditional column via apply
df['result'] = df['score'].apply(
  lambda x: 'Pass' if x >= 60 else 'Fail'
)
removing columns
# drop() — preferred
df.drop(columns=['grade'])
df.drop(columns=['grade','age_plus_10'])
df.drop('grade', axis=1)          # axis=1
df.drop(columns=['grade'], inplace=True)

# del — modifies in place
del df['grade']

# pop() — removes and returns
removed = df.pop('grade')
adding rows
# pd.concat — recommended
new = pd.DataFrame([{'name':'Dave','age':28}])
df = pd.concat([df, new], ignore_index=True)
removing rows
df.drop(0)                           # drop row label 0
df.drop([0,1,2])                    # drop multiple rows
df.drop_duplicates()                  # remove duplicate rows
df.drop_duplicates(subset=['name'])  # based on column
df.drop_duplicates(keep='last')      # keep last occurrence
7 · Sorting Data · order
sort by values
df.sort_values('age')                       # asc
df.sort_values('age', ascending=False)

# Multi-column sort
df.sort_values(['age','score'])
df.sort_values(['age','score'],
               ascending=[True,False])

# Sort and reset index
df.sort_values('age').reset_index(drop=True)
sort by index & top N
df.sort_index()
df.sort_index(ascending=False)

# Top / bottom N rows (faster than sort + head)
df.nlargest(3, 'score')         # top 3 scores
df.nsmallest(3, 'age')          # youngest 3
df.nlargest(3, ['score','age']) # tie-break
nlargest / nsmallest are faster than sort_values().head(n) for large DataFrames.
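A rough way to check the claim on your own machine (illustrative sizes, not a benchmark):

import numpy as np
from timeit import timeit
big = pd.DataFrame({'score': np.random.rand(1_000_000)})
timeit(lambda: big.nlargest(3, 'score'), number=10)
timeit(lambda: big.sort_values('score', ascending=False).head(3), number=10)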
8 · Renaming & Reindexing · labels
rename columns
# dict mapping
df.rename(columns={'name':'full_name','age':'years'})

# function on all columns
df.rename(columns=str.upper)
df.rename(columns=str.lower)
df.rename(columns=lambda x: x.strip())

# Reassign all at once
df.columns = ['full_name','years','score']

# Add prefix/suffix
df.add_prefix('col_')      # col_name, col_age
df.add_suffix('_2024')     # name_2024, age_2024
set & reset index
# Set a column as the row index
df.set_index('name')
df.set_index('name', inplace=True)

# Reset back to 0, 1, 2...
df.reset_index()           # old index → column
df.reset_index(drop=True) # discard old index

# Rename index
df.index.name = 'row_id'
df.rename_axis('row_id')
df.rename_axis(None)       # remove name
9 · Handling Missing Data · data quality
detect
df.isnull()              # bool DataFrame
df.notnull()
df.isnull().sum()        # NaN count per col
df.isnull().sum().sum() # total NaN count
df.isnull().any()        # any NaN per col?
df.isnull().all()        # all NaN per col?
df.isnull().mean()       # fraction NaN per col
drop
df.dropna()               # any NaN → drop row
df.dropna(how='all')      # all NaN → drop row
df.dropna(subset=['age']) # NaN in 'age' → drop
df.dropna(axis=1)          # drop NaN columns
df.dropna(thresh=2)        # keep if ≥2 non-NaN
fill
df.fillna(0)                     # fill all NaN with 0
df['age'].fillna(0)
df.fillna({'age':0,'score':50}) # per column

# Fill with statistics
df['age'].fillna(df['age'].mean())
df['age'].fillna(df['age'].median())
df['name'].fillna(df['name'].mode()[0])

# Forward / backward fill
df.ffill()   df.bfill()

# Interpolate
df['score'].interpolate()        # linear
strategy guide
dropna
When missing = meaningless. Survey skips, optional fields. Removes rows entirely.
fillna(mean/median)
ML preprocessing — preserves row count. Use median for skewed data.
ffill / bfill
Time series — carry last known value forward. Stock prices, sensor data.
interpolate
Smooth numeric series — temperature, financial curves between points.
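A small worked sketch combining these strategies (hypothetical columns with NaN gaps):

import numpy as np
df = pd.DataFrame({
    'age':   [25, np.nan, 22],
    'score': [90, np.nan, 92],
    'name':  ['Alice', 'Bob', None]
})
df['age']   = df['age'].fillna(df['age'].median())  # median: robust to skew
df['score'] = df['score'].interpolate()             # interior gap → 91.0
df = df.dropna(subset=['name'])                     # missing name → drop row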
10 · Data Types & Type Casting · dtypes
check & cast
df.dtypes                     # all column types

# Cast single column
df['age'] = df['age'].astype(int)
df['score'] = df['score'].astype(float)
df['name'] = df['name'].astype(str)

# Cast multiple at once
df = df.astype({'age': int, 'score': float})

# Memory-efficient types
df['age']   = df['age'].astype('int8')
df['score'] = df['score'].astype('float32')
safe conversions
# to_numeric — handles errors
df['age'] = pd.to_numeric(df['age'])
df['age'] = pd.to_numeric(df['age'],
             errors='coerce')  # invalid→NaN

# to_datetime
df['date'] = pd.to_datetime(df['date'])
df['date'] = pd.to_datetime(df['date'],
             format='%Y-%m-%d')

# category — low cardinality strings
df['gender'] = df['gender'].astype('category')
category dtype saves up to 90% memory for columns with few unique string values (gender, country, status).
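A quick check of the saving on a hypothetical low-cardinality column:

import numpy as np
s = pd.Series(np.random.choice(['M', 'F'], size=100_000))
s.memory_usage(deep=True)                      # object dtype: several MB
s.astype('category').memory_usage(deep=True)   # ~100 KB (int8 codes)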
dtype quick reference
dtype              python type   memory        use for
int64              int           8 bytes       whole numbers (default)
int8/16/32         int           1/2/4 bytes   small integers — save memory
float64            float         8 bytes       decimals (default)
float32            float         4 bytes       ML model inputs
object             str/mixed     variable      strings (inefficient)
category           Categorical   small         repeated strings (gender, country)
bool               bool          1 byte        True/False flags
datetime64         datetime      8 bytes       date/time values
Int64 (capital I)  int + NaN     8 bytes       nullable integer
11 · String Operations (.str) · text data
case, clean & search
s = df['name']

s.str.upper()      s.str.lower()
s.str.title()      s.str.capitalize()
s.str.strip()      # remove whitespace
s.str.len()        # length per element
s.str.contains('ali', case=False)
s.str.startswith('A')
s.str.endswith('e')
s.str.find('li')    # position of first match
split, replace & extract
s.str.split(',')                # → list
s.str.split(',', expand=True)  # → DataFrame
s.str.split(',').str[0]         # first part

s.str.replace('alice','Alice')
s.str.replace(r'\d+','', regex=True)

s.str.extract(r'(\d+)')     # capture group
s.str.extractall(r'(\d+)')

s.str.isdigit()    s.str.isalpha()
s.str.zfill(5)    # zero-pad to width 5
Real-world uses: cleaning scraped text · email/domain extraction · parsing structured strings in a column · standardizing casing · feature extraction with regex · phone number formatting
Alternative: df['col'].apply(lambda x: x.upper()) — works but 5–10× slower than .str. Always prefer .str for vectorized string operations.
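For instance, the email/domain extraction mentioned above (hypothetical data):

emails = pd.Series(['alice@example.com', 'bob@test.org'])
emails.str.split('@').str[1]        # domains via split
emails.str.extract(r'@([\w.]+)')    # domains via regex capture group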
12 · DateTime Operations (.dt) · time series
setup & components
df['date'] = pd.to_datetime(df['date'])

df['date'].dt.year
df['date'].dt.month
df['date'].dt.day
df['date'].dt.hour
df['date'].dt.quarter      # 1–4
df['date'].dt.dayofweek    # 0=Mon…6=Sun
df['date'].dt.day_name()   # 'Monday'…
df['date'].dt.month_name() # 'January'…
df['date'].dt.strftime('%d-%b-%Y')
filters, math & resample
# Boolean flags
df['date'].dt.is_month_end
df['date'].dt.is_leap_year

# Date arithmetic
df['date'] + pd.Timedelta(days=7)
df['date'] + pd.DateOffset(months=1)
df['end'] - df['start']  # → Timedelta

# Filter by date
df[df['date'] > '2024-06-01']
df[df['date'].dt.year == 2024]

# Resample (time-series groupby)
df.set_index('date').resample('M').sum()   # 'M' → 'ME' in pandas ≥ 2.2
df.set_index('date').resample('W').mean()
13 · GroupBy & Aggregation · split-apply-combine
basic groupby
g = df.groupby('department')

g['salary'].mean()
g['salary'].sum()
g['salary'].count()
g['salary'].max()
g['salary'].min()
g['salary'].std()
g['salary'].median()
g['salary'].nunique()

# All numeric columns
g.mean()
g.describe()
multiple aggregations — agg()
# List of agg functions
g['salary'].agg(['mean','min','max'])

# Dict — different agg per column
g.agg({
  'salary': ['mean','max'],
  'age':    'median',
  'name':   'count'
})

# Named aggregations (clean column names)
g.agg(
  avg_salary=('salary','mean'),
  max_salary=('salary','max'),
  headcount=('name','count')
)
transform — keeps original index
# Returns same shape as original df
# Perfect for adding group stats to rows
df['dept_avg'] = (df.groupby('dept')['salary']
                    .transform('mean'))
df['rank_in_dept'] = (df.groupby('dept')['salary']
                        .transform('rank'))
transform() is one of the most powerful GroupBy tools — it lets you add group-level aggregations as a new column while keeping all original rows intact.
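One common use, sketched with the same dept/salary columns: keep only rows paid above their own group's average.

above_avg = df[df['salary'] >
               df.groupby('dept')['salary'].transform('mean')]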
filter & apply
# filter — keep/remove entire groups
df.groupby('dept').filter(
  lambda x: x['salary'].mean() > 50000
)

# apply — custom function on each group
def top_2(group):
  return group.nlargest(2,'salary')

df.groupby('dept').apply(top_2)

# Multiple group keys
df.groupby(['dept','gender'])['salary'].mean()
14 · Merging, Joining & Concatenating · combine
pd.concat() — stack DataFrames
# Vertical (same columns)
pd.concat([df1, df2])
pd.concat([df1, df2], ignore_index=True)
pd.concat([df1, df2], keys=['2023','2024'])

# Horizontal (side by side)
pd.concat([df1, df2], axis=1)
pd.merge() — SQL-style joins
pd.merge(df1, df2, on='id')              # inner
pd.merge(df1, df2, on='id', how='left')
pd.merge(df1, df2, on='id', how='right')
pd.merge(df1, df2, on='id', how='outer')

# Different key column names
pd.merge(df1, df2, left_on='emp_id', right_on='id')

# Suffixes for overlapping columns
pd.merge(df1, df2, on='id', suffixes=('_l','_r'))

# Show merge source
pd.merge(df1, df2, on='id', how='outer', indicator=True)
join types — worked example

df1 (employees)      df2 (departments)
id | name            id | department
1  | Alice           1  | HR
2  | Bob             2  | IT
3  | Charlie         4  | Finance

INNER JOIN result — only matching ids
id | name  | department
1  | Alice | HR
2  | Bob   | IT
(id=3 Charlie and id=4 Finance dropped — no match)
When to use: concat → stack same-structure dfs · merge → SQL-style column join · join → quick index-based join
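A minimal sketch of the index-based df.join variant, using the employee/department frames above:

emp  = df1.set_index('id')
dept = df2.set_index('id')
emp.join(dept, how='inner')   # same rows as the inner merge above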
15 · Pivot Tables & Reshaping · reshape
pivot_table — with aggregation
pd.pivot_table(df,
  values='salary',
  index='department',
  columns='gender',
  aggfunc='mean',
  fill_value=0
)

# Multiple aggfuncs
pd.pivot_table(df,
  values='salary',
  index='department',
  columns='gender',
  aggfunc=['mean','count']
)

# With totals row/column
pd.pivot_table(df,
  values='salary',
  index='department',
  columns='gender',
  aggfunc='mean',
  margins=True,
  margins_name='Total'
)
pivot, crosstab, melt
# pivot — simple reshape (no agg)
df.pivot(index='date',
         columns='category',
         values='value')

# crosstab — frequency table
pd.crosstab(df['dept'], df['gender'])
pd.crosstab(df['dept'], df['gender'],
           normalize=True)   # proportions

# melt — wide to long format
df.melt(
  id_vars=['name','dept'],
  value_vars=['q1','q2','q3'],
  var_name='quarter',
  value_name='sales'
)

# stack / unstack
df.stack()    # columns → rows
df.unstack()  # rows → columns
16 · Apply, Map & Lambda Functions · transform
apply — column or row level
# On a Series (element-wise)
df['score'].apply(lambda x: x * 2)
df['score'].apply(
  lambda x: 'Pass' if x >= 60 else 'Fail'
)

# On entire row (axis=1)
df.apply(
  lambda row: row['score'] * 2
               if row['age'] > 25
               else row['score'],
  axis=1
)

# On entire column (axis=0)
df.apply(lambda col: col.max() - col.min())
map & replace
# map — element-wise on Series
df['gender'] = df['gender'].map({
  'M': 'Male', 'F': 'Female'
})
df['name'] = df['name'].map(str.upper)

# replace — swap values
df.replace('M', 'Male')
df.replace({'M':'Male','F':'Female'})

# DataFrame.map — element-wise on whole df
# (named applymap before pandas 2.1)
df.map(lambda x: round(x,2)
        if isinstance(x, float)
        else x)
performance — vectorized vs apply
approach                 speed       use for
df['col'] * 2            fastest     simple math on numeric column
np.where(cond, a, b)     very fast   conditional element-wise logic
df['col'].map(dict)      fast        value substitution / encoding
df['col'].str.method()   fast        string ops
df['col'].apply(fn)      slow        complex per-element logic
df.apply(fn, axis=1)     very slow   only when unavoidable
iterrows()               slowest     avoid — use vectorized instead
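A sketch of the np.where row from the table, replacing the Pass/Fail apply() from earlier:

import numpy as np
df['result'] = np.where(df['score'] >= 60, 'Pass', 'Fail')

# np.select for more than two branches
df['band'] = np.select(
    [df['score'] >= 90, df['score'] >= 60],
    ['High', 'Mid'], default='Low')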
17 · Window Functions · rolling/expanding
rolling — fixed window
df['sales'].rolling(7).mean()   # 7-day MA
df['sales'].rolling(30).sum()
df['sales'].rolling(7).max()
df['sales'].rolling(7).std()

# min_periods: compute with fewer values
df['sales'].rolling(7,min_periods=1).mean()
First (window-1) values = NaN unless min_periods is set. Window slides one row at a time.
expanding & ewm
# Expanding — growing from start
df['sales'].expanding().mean()
df['sales'].expanding().sum()  # = cumsum()
df['sales'].expanding().max()

# Cumulative (shorthand)
df['sales'].cumsum()
df['sales'].cumprod()
df['sales'].cummax()

# EWMA — weights recent values more
df['sales'].ewm(span=7).mean()
df['sales'].ewm(alpha=0.3).mean()
Real-world: stock 7-day moving average · smoothing sensor noise · sales trend detection · volatility windows
18 · Multi-Level Index (MultiIndex) · advanced
create
# From groupby (most common)
df_m = (df.groupby(['dept','gender'])
          .agg({'salary':'mean'}))

# From arrays
arrays = [['Q1','Q1','Q2','Q2'],
          ['N','S','N','S']]
idx = pd.MultiIndex.from_arrays(
  arrays, names=['quarter','region'])
df_m = pd.DataFrame({'sales':[100,200,150,250]},
                       index=idx)
access & flatten
# loc with tuples
df_m.loc['Q1']
df_m.loc[('Q1','N')]

# xs — cross section
df_m.xs('Q1', level='quarter')
df_m.xs('N', level='region')

# IndexSlice
idx = pd.IndexSlice
df_m.loc[idx['Q1','N'], :]

# Flatten to regular columns
df_m.reset_index()
df_m.columns = ['_'.join(c) for c in df_m.columns]  # if columns are a MultiIndex (e.g. after multi-func agg)
19 · Performance Optimization · speed
memory & dtype optimization
# Category for low-cardinality strings
df['gender'] = df['gender'].astype('category')

# Smaller numeric types
df['age']   = df['age'].astype('int8')
df['score'] = df['score'].astype('float32')

# Check memory usage
df.memory_usage(deep=True)

# Read only needed columns
df = pd.read_csv('data.csv',
                 usecols=['name','age'])
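One possible helper loop to automate the downcasting above, using pd.to_numeric's downcast option:

for col in df.select_dtypes(include='integer').columns:
    df[col] = pd.to_numeric(df[col], downcast='integer')
for col in df.select_dtypes(include='float').columns:
    df[col] = pd.to_numeric(df[col], downcast='float')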
speed tips
# 1. Vectorized >> apply >> iterrows
df['double'] = df['score'] * 2    # fast

# 2. query() faster on large DFs
df.query('age > 25 and score > 80')

# 3. eval() for arithmetic
df.eval('result = score * 2 + age',
        inplace=True)

# 4. Chunked reading for huge files
chunks = pd.read_csv('big.csv',
                      chunksize=10000)
result = pd.concat([process(c) for c in chunks])  # process() = your own per-chunk function

# 5. If you must iterate, use itertuples
for row in df.itertuples():
    print(row.name, row.score)
20 · Reading & Writing Data · I/O
reading
# CSV
pd.read_csv('file.csv')
pd.read_csv('file.csv', sep=';')
pd.read_csv('file.csv', index_col='id')
pd.read_csv('file.csv', usecols=['a','b'])
pd.read_csv('file.csv', nrows=100)
pd.read_csv('file.csv', skiprows=5)
pd.read_csv('file.csv', na_values=['NA','?'])
pd.read_csv('file.csv', dtype={'age':int})

# Excel / JSON / SQL
pd.read_excel('file.xlsx', sheet_name='Sheet1')
pd.read_json('file.json')
pd.read_sql('SELECT * FROM tbl', conn)
pd.read_clipboard()          # from Excel
pd.read_csv('https://url.com/data.csv')
writing
# CSV
df.to_csv('out.csv', index=False)
df.to_csv('out.csv', sep='\t')

# Excel — single sheet
df.to_excel('out.xlsx', index=False)
df.to_excel('out.xlsx', sheet_name='Data')

# Excel — multiple sheets
with pd.ExcelWriter('out.xlsx') as writer:
    df1.to_excel(writer, sheet_name='Sheet1')
    df2.to_excel(writer, sheet_name='Sheet2')

# JSON / SQL
df.to_json('out.json', orient='records')
df.to_sql('tbl', conn, if_exists='replace')
21 · Quick Reference Table · all methods
method                          returns       category      use for
pd.DataFrame({...})             DataFrame     create        create from dict of lists
pd.read_csv('file.csv')         DataFrame     I/O           load CSV into DataFrame
pd.read_excel('file.xlsx')      DataFrame     I/O           load Excel file
df.to_csv('out.csv')            None          I/O           save to CSV
df.head() / df.tail()           DataFrame     explore       first/last N rows
df.info()                       None          explore       column types, non-null counts
df.describe()                   DataFrame     explore       statistical summary
df.shape                        tuple         explore       number of rows and columns
df['col'] / df[['a','b']]       Series/DF     select        select one or multiple columns
df.select_dtypes()              DataFrame     select        select columns by dtype
df.loc[row, col]                scalar/DF     access        label-based row/col access
df.iloc[i, j]                   scalar/DF     access        position-based access
df.at / df.iat                  scalar        access        fastest single cell access
df[df['col'] > 10]              DataFrame     filter        boolean indexing
df.query('col > 10')            DataFrame     filter        string-based filtering
df['col'].isin([...])           bool Series   filter        filter by value list
df['col'].between(a, b)         bool Series   filter        inclusive range filter
df['new'] = ... / assign()      DataFrame     modify        add new column
df.drop(columns=[...])          DataFrame     modify        remove columns or rows
df.sort_values('col')           DataFrame     sort          sort by column values
df.nlargest(n, 'col')           DataFrame     sort          top N rows by column
df.rename(columns={...})        DataFrame     rename        rename columns
df.set_index('col')             DataFrame     index         set column as row index
df.reset_index()                DataFrame     index         reset to default integer index
df.isnull() / df.isna()         bool DF       missing       detect NaN values
df.dropna()                     DataFrame     missing       remove rows with NaN
df.fillna(val)                  DataFrame     missing       replace NaN with value
df.ffill() / df.bfill()         DataFrame     missing       forward/backward fill
df.astype(dtype)                DataFrame     types         change column data type
pd.to_numeric(df['col'])        Series        types         safely convert to number
pd.to_datetime(df['col'])       Series        datetime      convert to datetime
df['date'].dt.year              Series        datetime      extract datetime components
df.resample('M').sum()          DataFrame     datetime      time-frequency aggregation
df['col'].str.upper()           Series        strings       vectorized string ops
df['col'].str.contains()        bool Series   strings       string/regex search
df.groupby('col').agg()         DataFrame     groupby       group and aggregate
groupby.transform()             Series        groupby       group stats keeping all rows
groupby.filter(fn)              DataFrame     groupby       keep/drop entire groups
pd.concat([df1, df2])           DataFrame     combine       stack DataFrames vertically
pd.merge(df1, df2, on='id')     DataFrame     combine       SQL-style join on column
pd.pivot_table(df, ...)         DataFrame     pivot         pivot with aggregation
df.melt(id_vars=[...])          DataFrame     pivot         wide to long format
df['col'].apply(fn)             Series        apply         custom per-element function
df['col'].map(dict)             Series        apply         value substitution via dict
df['col'].rolling(7).mean()     Series        window        rolling window stats
df['col'].expanding().sum()     Series        window        growing window (cumulative)
df['col'].cumsum()              Series        window        running cumulative total
df['col'].ewm(span=7).mean()    Series        window        exponentially weighted avg
df.drop_duplicates()            DataFrame     clean         remove duplicate rows
df['col'].value_counts()        Series        explore       frequency of each value
pd.MultiIndex.from_arrays()     MultiIndex    advanced      create hierarchical index
col.astype('category')          Series        performance   memory-efficient string type
df['col'].interpolate()         Series        missing       fill gaps by interpolation