import pandas as pd
s = pd.Series([10, 20, 30],
              index=['a', 'b', 'c'],
              name='scores')
Index is optional — defaults to 0, 1, 2. name becomes the column header when converting to DataFrame.
a    10
b    20
c    30
Name: scores, dtype: int64
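For example, name becomes the column header when you convert (a quick sketch):
s.to_frame()       # one-column DataFrame, column named 'scores'
s.reset_index()    # index labels become their own column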
from dict
s = pd.Series({'a': 10, 'b': 20, 'c': 30})
Dict keys become the index automatically. Clean for labeled data.
from scalar (broadcast)
s = pd.Series(5, index=['a','b','c'])
Fills same value for every label. Great for placeholders or default flags.
from numpy array
import numpy as np
s = pd.Series(np.array([1.1, 2.2, 3.3]))
when is Series useful?
single column of data · time series (one metric) · labeled 1D array · a row or column pulled from a DataFrame
Alternatives: Python list (no labels, no vectorized math) · NumPy array (faster but no labels) · dict (labels but no .mean()/.sum())
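A quick sketch of what each alternative gives up, using the Series above:
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s * 2       # vectorized math a plain list lacks
s['b']      # 20, label access a NumPy array lacks
s.mean()    # 20.0, stats a plain dict can't compute directly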
2
Properties & Attributes
introspection
all properties
s.values     # NumPy array underneath (prefer s.to_numpy() in modern pandas)
s.index      # Index(['a', 'b', 'c'], dtype='object')
s.dtype      # dtype('int64')
s.name       # 'scores'
s.shape      # (3,) — always 1D
s.size       # 3 — same as len(s)
s.ndim       # 1 — always for Series
s.empty      # True if len == 0
s.hasnans    # True if any NaN
3
Accessing Elements
by label (.loc)
s = pd.Series([10, 20, 30], index=['a','b','c'], name='scores')
Use when index has meaningful labels. .loc slice is INCLUSIVE on both ends — unlike Python slicing.
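A minimal sketch of the usual label-access patterns, using the Series above:
s.loc['a']          # single label → 10
s.loc['a':'b']      # label slice, includes BOTH endpoints
s.loc[['a', 'c']]   # list of labels → 10, 30
s.at['a']           # fastest single-label lookup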
by position (.iloc)
s.iloc[0] # first → 10
s.iloc[-1] # last element
s.iloc[0:2] # first 2 (exclusive end)
s.iloc[[0,2]] # 1st and 3rd
s.iat[0] # fastest single
Always position-based. Safe even when index is non-integer. Like Python list slicing (end exclusive).
boolean filtering
s[s > 20]
s[(s>10) & (s<40)]
s[s.isin([10,30])]
Example: for s with values a: 10, b: 20, c: 30, d: 40,
s[s > 20] keeps only c (30) and d (40).
access methods compared
method    based on    slice end      use when
[]        label       inclusive*     quick access on a named index
.loc[]    label       inclusive      label slicing
.iloc[]   position    exclusive      always safe
.at[]     label       single value   speed-critical scalar reads
.iat[]    position    single value   speed-critical scalar reads
*[] label slices are inclusive; integer slices through [] are positional and exclusive.
Avoid chained indexing such as s[s > 20][0]; use s.loc[s > 20].iloc[0] instead. Chained indexing is ambiguous and can trigger SettingWithCopyWarning when assigning.
4
Math Operations
vectorized
element-wise (returns Series)
s + 10        # add to every element
s * 2         # multiply each
s ** 2        # square each
np.sqrt(s)    # NumPy ufuncs work on Series
np.log(s)     # natural log
s1 + s2       # aligns by index
When adding two Series, pandas aligns by index first. Mismatched labels produce NaN — always check with s.isna().sum() after arithmetic.
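A minimal sketch of that alignment behavior (illustrative values):
s1 = pd.Series([1, 2], index=['a', 'b'])
s2 = pd.Series([10, 20], index=['b', 'c'])
s1 + s2    # a: NaN, b: 12.0, c: NaN (only 'b' exists in both)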
5
Missing Data
detect
s.isna() # True where NaN
s.notna() # True where not NaN
s.isna().sum() # count of NaNs
s.isna().any() # any missing?
s.isna().all() # all missing?
fix
s.dropna() # remove NaN rows
s.fillna(0) # replace with 0
s.fillna(s.mean()) # impute with mean
s.ffill() # forward fill
s.bfill() # backward fill
s.interpolate() # linear fill
s.interpolate(method='cubic') # needs SciPy
strategy guide
dropna
When missing rows have no meaning. Survey skips, optional fields.
fillna(mean)
ML preprocessing — preserves row count, neutral assumption.
ffill
Time series — stock prices on weekends, sensor gaps.
interpolate
Smooth series — temperature, curves, between known points.
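A small sketch comparing two fix strategies (illustrative values):
s = pd.Series([1.0, None, None, 4.0])
s.ffill()          # 1.0, 1.0, 1.0, 4.0 (carry last known value forward)
s.interpolate()    # 1.0, 2.0, 3.0, 4.0 (linear between known points)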
6
String Operations (.str)
text data
case & clean
s.str.upper() # "hello" → "HELLO"
s.str.lower() # "HELLO" → "hello"
s.str.title() # "hello world" → "Hello World"
s.str.strip() # remove surrounding spaces
s.str.lstrip() # left only
s.str.rstrip() # right only
s.str.len() # length of each string
search & match
s.str.contains('py') # bool
s.str.startswith('A') # bool
s.str.endswith('ing') # bool
s.str.match(r'^\d+') # regex at start
s.str.fullmatch(r'\d+') # full match
s.str.count('l') # count occurrences
split & extract
s.str.split(',') # → list per row
s.str.split(',', expand=True) # → DataFrame
s.str.extract(r'(\d+)') # capture group
s.str.extractall(r'(\d+)') # all matches
s.str[0:3] # slice chars
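A quick sketch of split and extract (illustrative values):
s = pd.Series(['id-12,red', 'id-7,blue'])
s.str.split(',', expand=True)    # column 0: 'id-12', 'id-7' · column 1: 'red', 'blue'
s.str.extract(r'(\d+)')          # '12', '7' (first match of the capture group per row)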
7
Datetime Operations (.dt)
Must run pd.to_datetime() first. Then .dt unlocks 30+ time-aware properties. Ideal for groupby('month'), resampling, and filtering weekdays.
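A minimal sketch of common .dt accessors, assuming ISO date strings:
s = pd.to_datetime(pd.Series(['2024-01-15', '2024-02-20']))
s.dt.year         # 2024, 2024
s.dt.month        # 1, 2
s.dt.dayofweek    # 0, 1 (Monday = 0)
s.dt.day_name()   # 'Monday', 'Tuesday'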
8
Descriptive Statistics
explore
all stat methods
s.describe() # full summary
s.mean() s.median()
s.std() s.var()
s.min() s.max()
s.skew() # distribution skew
s.kurt() # kurtosis (peakedness)
s.sem() # standard error of mean
s.mad() # mean absolute deviation (removed in pandas 2.0; use (s - s.mean()).abs().mean())
s.quantile(0.25) # 25th percentile
s.quantile([.25,.5,.75])
describe() output explained
stat    value   meaning
count   4.0     non-null rows
mean    25.0    arithmetic average
std     12.9    spread around mean
min     10.0    smallest value
25%     17.5    Q1, lower quartile
50%     25.0    median
75%     32.5    Q3, upper quartile
max     40.0    largest value
9
Value Counts & Unique
frequency
methods
s.value_counts() # freq table
s.value_counts(normalize=True) # proportions
s.value_counts(dropna=False) # include NaN
s.value_counts(bins=4) # histogram bins
s.unique() # array of uniques
s.nunique() # count of uniques
s.mode() # most common value(s)
s.mode()[0] # top mode
example
s = pd.Series(['cat', 'dog', 'cat', 'bird', 'dog', 'cat'])
s.value_counts()
# cat     3
# dog     2
# bird    1
Real-world: count category frequencies · check class imbalance before ML · audit data quality · find mode values
10
Rank & Cumulative
running ops
ranking
s.rank() # 1-based rank
s.rank(ascending=False) # highest = 1
s.rank(method='dense') # no gaps
s.rank(method='min') # tie = min rank
s.rank(pct=True) # 0–1 percentile
Rank methods: 'average' (default), 'min', 'max', 'first' (order of appearance), 'dense' (no gaps for ties).
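A small sketch of the tie-handling methods (illustrative values):
s = pd.Series([10, 20, 20, 30])
s.rank()                  # 1.0, 2.5, 2.5, 4.0 (ties share the average rank)
s.rank(method='dense')    # 1.0, 2.0, 2.0, 3.0 (no gaps after ties)
s.rank(method='min')      # 1.0, 2.0, 2.0, 4.0 (ties take the lowest rank)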
cumulative
s.cumsum() # running total
s.cumprod() # running product
s.cummax() # running maximum
s.cummin() # running minimum
Duplicates
s.duplicated() # bool mask
s.duplicated(keep='last')
s.duplicated(keep=False) # mark ALL dupes
s.drop_duplicates() # remove dupes
s.drop_duplicates(keep='last')
s[~s.duplicated()] # filter unique
keep options explained
'first' (default)
Keep first occurrence, mark the rest as duplicate (True).
'last'
Keep last occurrence, mark earlier ones as duplicate.
False
Mark ALL occurrences as True — even the first one.
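A quick sketch of all three keep options (illustrative values):
s = pd.Series([1, 2, 1])
s.duplicated()               # False, False, True
s.duplicated(keep='last')    # True, False, False
s.duplicated(keep=False)     # True, False, True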
15
Shift, Diff & Pct Change
lag ops
code
s.shift(1) # lag by 1 (prev row)
s.shift(-1) # lead by 1 (next row)
s.diff() # s − s.shift(1)
s.diff(2) # 2-period difference
s.pct_change() # (s−prev)/prev → %
s.pct_change(7) # 7-day % change
shift(1) visualized
index:           0    1    2    3
ORIGINAL:        10   20   30   40
AFTER shift(1):  NaN  10   20   30
Use cases: daily return = pct_change() · compare to yesterday = shift(1) · autocorrelation · lag features for ML
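A minimal sketch of a daily return (illustrative prices):
prices = pd.Series([100.0, 102.0, 101.0])
prices.pct_change()    # NaN, 0.02, ≈ -0.0098 (each row vs. the previous one)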
16
Index Operations
indexing
modify index
s.index = ['x','y','z'] # direct set
s.rename({'a':'A'}) # rename labels
s.rename(index=str.upper) # via function
s.reset_index() # → DataFrame
s.reset_index(drop=True) # keep as Series
s.set_axis(['p','q','r'])
reindex & align
s.reindex(['a','b','d'])                  # new index; 'd' gets NaN
s.reindex(['a','b','d'], fill_value=0)    # fill missing labels with 0
s.reindex(['a','b','d'], method='ffill')  # forward fill missing labels
s1.align(s2) # align two Series
reindex is great for making two Series share the same index before doing math — avoids NaN from misaligned operations.
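A minimal sketch of align with a fill value (illustrative values):
s1 = pd.Series([1, 2], index=['a', 'b'])
s2 = pd.Series([10, 20], index=['b', 'c'])
a1, a2 = s1.align(s2, fill_value=0)
a1 + a2    # a: 1, b: 12, c: 20 (no NaN, every label filled)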
17
Sorting
order
sort by values
s.sort_values() # asc by default
s.sort_values(ascending=False)
s.sort_values(na_position='first')
s.nlargest(3) # top 3 (fast!)
s.nsmallest(3) # bottom 3 (fast!)
sort by index
s.sort_index() # A→Z or 0→n
s.sort_index(ascending=False)
s.sort_index(na_position='last')
All sort methods return a new Series — original unchanged. Use inplace=True to modify in place (generally discouraged in modern pandas).
nlargest / nsmallest are faster than sort_values().tail(n) for large Series.
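A quick sketch (illustrative values):
s = pd.Series([40, 10, 30, 20])
s.nlargest(2)    # 40, 30 (keeps original index labels 0 and 2)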
18
Combining Series
merge
concat — stacking
pd.concat([s1, s2]) # stack rows
pd.concat([s1, s2], axis=1) # side by side
pd.concat([s1, s2], ignore_index=True)    # reindex 0,1,2…
pd.concat([s1, s2], keys=['a','b'])       # MultiIndex from keys
combine — smart merge
s1.combine(s2, max) # element-wise max
s1.combine_first(s2)     # fill s1's NaN with values from s2
combine_first is perfect for merging two partial datasets where each has gaps the other fills.
Alternatives: np.where(s1.isna(), s2, s1) · pd.DataFrame({'a':s1,'b':s2}) for side-by-side comparison
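A minimal sketch of gap-filling with combine_first (illustrative values):
s1 = pd.Series([1.0, None, 3.0])
s2 = pd.Series([9.0, 2.0, None])
s1.combine_first(s2)    # 1.0, 2.0, 3.0 (s1 wins wherever it has data)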
19
Type Conversion & Export
convert
type conversion
s.astype(float) # to float64
s.astype('Int64') # nullable int
s.astype(str) # to string (object)
s.astype('category') # memory efficient
s.astype('datetime64[ns]')
pd.to_numeric(s) # strict: raises if parsing fails
pd.to_numeric(s, errors='coerce') # NaN on fail
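A quick sketch of coercion and categorical codes (illustrative values):
raw = pd.Series(['1', '2', 'oops'])
pd.to_numeric(raw, errors='coerce')    # 1.0, 2.0, NaN
raw.astype('category').cat.codes       # 0, 1, 2 (compact integer codes)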