
Pandas Series
Cheatsheet


Every method · explained · with alternatives

1
Creating a Series
foundation
from list (most common)
import pandas as pd

s = pd.Series([10, 20, 30],
             index=['a','b','c'],
             name='scores')
Index is optional — defaults to 0, 1, 2. name becomes the column header when converting to DataFrame.
# a    10
# b    20
# c    30
from dict
s = pd.Series({'a': 10, 'b': 20, 'c': 30})
Dict keys become the index automatically. Clean for labeled data.
from scalar (broadcast)
s = pd.Series(5, index=['a','b','c'])
Fills same value for every label. Great for placeholders or default flags.
from numpy array
import numpy as np
s = pd.Series(np.array([1.1, 2.2, 3.3]))
when is Series useful?
single column of data · time series (one metric) · labeled 1D array · row/column from a DataFrame
Alternatives: Python list (no labels, no vectorized math) · NumPy array (faster but no labels) · dict (labels but no .mean()/.sum())
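A minimal sketch of the trade-off (labels and values are illustrative): a Series gives you dict-style labels and NumPy-style math in one object.

```python
import pandas as pd

# dict keys become labels, but unlike a dict we also get vectorized math
s = pd.Series({'a': 10, 'b': 20, 'c': 30}, name='scores')

print(s['b'])     # label access, like a dict → 20
print(s.mean())   # aggregate math, unlike a dict → 20.0
print(s * 2)      # vectorized math, unlike a list
```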
2
Properties & Attributes
introspection
all properties
s.values   # NumPy array underneath
s.index    # Index(['a','b','c'])
s.dtype    # dtype('int64')
s.name     # 'scores'
s.shape    # (3,) — always 1D
s.size     # 3 — same as len(s)
s.ndim     # 1 — always for Series
s.empty    # True if len == 0
s.hasnans  # True if any NaN
s = pd.Series([10, 20, 30], index=['a','b','c'], name='scores')
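The properties above are plain attribute lookups, not computations. A quick check on the example series:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'], name='scores')

# cheap metadata lookups -- no data is copied
print(s.shape)    # (3,)
print(s.size)     # 3
print(s.ndim)     # 1
print(s.hasnans)  # False
print(s.name)     # 'scores'
```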
3
Accessing Elements
selection
by label
s['a']              # → 10
s[['a','c']]         # → Series
s.loc['a']          # explicit label
s.loc['a':'c']      # label slice (inclusive)
s.at['a']           # fastest single
Use when index has meaningful labels. .loc slice is INCLUSIVE on both ends — unlike Python slicing.
by position (.iloc)
s.iloc[0]           # first → 10
s.iloc[-1]          # last element
s.iloc[0:2]         # first 2 (exclusive end)
s.iloc[[0,2]]       # 1st and 3rd
s.iat[0]            # fastest single
Always position-based. Safe even when index is non-integer. Like Python list slicing (end exclusive).
boolean filtering
s[s > 20]
s[(s>10) & (s<40)]
s[s.isin([10,30])]
with s = [a: 10, b: 20, c: 30, d: 40]:
s[s > 20]  # → c: 30, d: 40
access methods compared
method    based on   slice end   use when
[]        label      inclusive   named index
.loc[]    label      inclusive   label slicing
.iloc[]   position   exclusive   always safe
.at[]     label      single      speed critical
.iat[]    position   single      speed critical

Avoid chained indexing: s[s>20][0] — use s.loc[s>20].iloc[0] to prevent SettingWithCopyWarning.
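The un-chained form from the warning above, sketched with illustrative values:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# chained form s[s > 20][0] does two separate lookups and can trigger
# SettingWithCopyWarning; a single .loc + .iloc chain resolves cleanly
filtered = s.loc[s > 20]
first_big = filtered.iloc[0]
print(first_big)  # 30
```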

4
Math Operations
vectorized
element-wise (returns Series)
s + 10          # add to every element
s * 2           # multiply each
s ** 2          # square each
np.sqrt(s)     # numpy ufunc works!
np.log(s)      # natural log
s1 + s2        # aligns by index

When adding two Series, pandas aligns by index first. Mismatched labels produce NaN — always check with s.isna().sum() after arithmetic.
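Index alignment in action, with deliberately mismatched labels (values illustrative). `Series.add` with `fill_value` is the standard escape hatch when NaN from alignment is unwanted:

```python
import math
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

total = s1 + s2            # aligned on the union of both indexes
print(total)               # 'a' and 'd' have no partner → NaN
print(total.isna().sum())  # 2 labels went missing

safe = s1.add(s2, fill_value=0)  # treat a missing partner as 0
print(safe['d'])                 # 30.0
```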

multiply example
s * 2   # → a: 20, b: 40, c: 60
s.sum()   s.mean()  s.median()
s.min()   s.max()   s.std()
s.var()   s.prod()  s.sem()
5
Handling Missing Values (NaN)
data quality
detect
s.isna()         # True where NaN
s.notna()        # True where not NaN
s.isna().sum()  # count of NaNs
s.isna().any()  # any missing?
s.isna().all()  # all missing?
fix
s.dropna()           # remove NaN rows
s.fillna(0)           # replace with 0
s.fillna(s.mean())   # impute with mean
s.ffill()             # forward fill
s.bfill()             # backward fill
s.interpolate()       # linear fill
s.interpolate(method='cubic')  # needs SciPy installed
strategy guide
dropna
When missing rows have no meaning. Survey skips, optional fields.
fillna(mean)
ML preprocessing — preserves row count, neutral assumption.
ffill
Time series — stock prices on weekends, sensor gaps.
interpolate
Smooth series — temperature, curves, between known points.
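The strategies above give visibly different results on the same gap (values illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

print(s.ffill().tolist())          # [1.0, 1.0, 1.0, 4.0] -- carry last value
print(s.interpolate().tolist())    # [1.0, 2.0, 3.0, 4.0] -- linear between knowns
print(s.fillna(s.mean()).tolist()) # mean of the non-NaN values is 2.5
print(s.dropna().tolist())         # [1.0, 4.0] -- rows simply removed
```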
6
String Operations (.str)
text data
case & clean
s.str.upper()      # "hello" → "HELLO"
s.str.lower()      # "HELLO" → "hello"
s.str.title()      # "hello world" → "Hello World"
s.str.strip()      # remove surrounding spaces
s.str.lstrip()     # left only
s.str.rstrip()     # right only
s.str.len()        # length of each string
search & match
s.str.contains('py')      # bool
s.str.startswith('A')     # bool
s.str.endswith('ing')     # bool
s.str.match(r'^\d+')      # regex at start
s.str.fullmatch(r'\d+')   # full match
s.str.count('l')          # count occurrences
split & extract
s.str.split(',')              # → list per row
s.str.split(',', expand=True) # → DataFrame
s.str.extract(r'(\d+)')       # capture group
s.str.extractall(r'(\d+)')   # all matches
s.str[0:3]                    # slice chars
replace & real-world uses
s.str.replace('a','A')
s.str.replace(r'\d','X', regex=True)
cleaning scraped text email domain extraction parsing CSV in a column standardizing casing regex feature extraction
Alternative: .apply(lambda x: x.upper()) works but is 5–10× slower than the vectorized .str accessor.
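The "email domain extraction" use case mentioned above, sketched with made-up addresses:

```python
import pandas as pd

emails = pd.Series(['ana@example.com', 'bo@test.org'])  # hypothetical data

domains = emails.str.split('@').str[1]     # text after the '@'
tlds = emails.str.extract(r'\.(\w+)$')[0]  # capture group → column 0

print(domains.tolist())  # ['example.com', 'test.org']
print(tlds.tolist())     # ['com', 'org']
```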
7
DateTime Operations (.dt)
time series
setup & components
s = pd.Series(pd.to_datetime(
  ['2024-01-15', '2024-06-30']))

s.dt.year         # [2024, 2024]
s.dt.month        # [1, 6]
s.dt.day          # [15, 30]
s.dt.hour         # hour of day
s.dt.minute       # minute
s.dt.dayofweek    # 0=Mon … 6=Sun
s.dt.day_name()   # 'Monday'…
s.dt.quarter      # 1–4
boolean & advanced
s.dt.is_month_end    # bool
s.dt.is_month_start  # bool
s.dt.is_leap_year    # bool
s.dt.is_quarter_end  # bool

# Timezone ops
s.dt.tz_localize('UTC')
s.dt.tz_convert('Asia/Kolkata')

# Formatting
s.dt.strftime('%d-%b-%Y')

.dt only works on a Series with datetime dtype, so convert with pd.to_datetime() first (wrapping the result in pd.Series if you started from a list). It then unlocks 30+ time-aware properties. Ideal for grouping by month, resampling, and filtering weekdays.
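A quick check of the accessors on the two example dates:

```python
import pandas as pd

# pd.to_datetime on a list returns a DatetimeIndex; wrap in Series for .dt
s = pd.Series(pd.to_datetime(['2024-01-15', '2024-06-30']))

print(s.dt.year.tolist())        # [2024, 2024]
print(s.dt.quarter.tolist())     # [1, 2]
print(s.dt.day_name().tolist())  # weekday names
print(s.dt.strftime('%d-%b-%Y').tolist())
```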

8
Descriptive Statistics
explore
all stat methods
s.describe()        # full summary
s.mean()   s.median()
s.std()    s.var()
s.min()    s.max()
s.skew()             # distribution skew
s.kurt()             # kurtosis (peakedness)
s.sem()              # standard error of mean
(s - s.mean()).abs().mean()  # mean absolute deviation (s.mad() was removed in pandas 2.0)
s.quantile(0.25)    # 25th percentile
s.quantile([.25,.5,.75])
describe() output explained
stat    value   meaning
count   4.0     non-null rows
mean    25.0    arithmetic average
std     12.9    spread around mean
min     10.0    smallest value
25%     17.5    Q1 (lower quartile)
50%     25.0    median
75%     32.5    Q3 (upper quartile)
max     40.0    largest value
9
Value Counts & Unique
frequency
methods
s.value_counts()             # freq table
s.value_counts(normalize=True) # proportions
s.value_counts(dropna=False)  # include NaN
s.value_counts(bins=4)        # histogram bins
s.unique()                    # array of uniques
s.nunique()                   # count of uniques
s.mode()                      # most common value(s)
s.mode()[0]                   # top mode
frequency example
s = pd.Series(['cat','dog','cat','bird','dog','cat'])
s.value_counts()   # cat: 3, dog: 2, bird: 1
Real-world: count category frequencies · check class imbalance before ML · audit data quality · find mode values
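The class-imbalance check mentioned above is usually just `normalize=True`:

```python
import pandas as pd

s = pd.Series(['cat', 'dog', 'cat', 'bird', 'dog', 'cat'])

vc = s.value_counts()
print(vc['cat'])      # 3 occurrences
print(s.nunique())    # 3 distinct animals
print(s.mode()[0])    # 'cat' -- the most common value

shares = s.value_counts(normalize=True)
print(shares['cat'])  # 0.5 -- proportion, not count
```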
10
Rank & Cumulative
running ops
ranking
s.rank()                 # 1-based rank
s.rank(ascending=False)  # highest = 1
s.rank(method='dense')  # no gaps
s.rank(method='min')    # tie = min rank
s.rank(pct=True)        # 0–1 percentile
Rank methods: 'average' (default), 'min', 'max', 'first' (order of appearance), 'dense' (no gaps for ties).
cumulative
s.cumsum()   # running total
s.cumprod()  # running product
s.cummax()   # running maximum
s.cummin()   # running minimum
cumsum([10, 20, 30, 40]) → [10, 30, 60, 100]
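Both halves of this section on small illustrative series, including how the tie-handling methods differ:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])
print(s.cumsum().tolist())  # [10, 30, 60, 100] -- running total
print(s.cummax().tolist())  # [10, 20, 30, 40]  -- running maximum

ties = pd.Series([10, 20, 20, 30])
print(ties.rank(method='dense').tolist())  # [1.0, 2.0, 2.0, 3.0] -- no gaps
print(ties.rank(method='min').tolist())    # [1.0, 2.0, 2.0, 4.0] -- gap after tie
```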
11
Rolling & Expanding
window ops
rolling window (fixed size)
s.rolling(3).mean()   # 3-point moving avg
s.rolling(3).sum()
s.rolling(3).min()
s.rolling(3).std()
s.rolling(3).max()
s.rolling(3, min_periods=1).mean()
First (window−1) values are NaN unless min_periods is set. Window slides one step at a time across the Series.
expanding window (grows from start)
s.expanding().mean()  # growing average
s.expanding().sum()   # = cumsum()
s.expanding().max()
s.expanding(2).std()  # min 2 periods
Like rolling but window grows — always uses all data up to the current point.
Real-world: stock moving average · noise smoothing · detecting trend shifts · volatility windows · sales 7-day average
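Rolling, rolling with min_periods, and expanding side by side (values illustrative):

```python
import math
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

r = s.rolling(3).mean()
print(r.tolist())   # first window-1 entries are NaN, then 20.0, 30.0, 40.0

r1 = s.rolling(3, min_periods=1).mean()
print(r1.tolist())  # [10.0, 15.0, 20.0, 30.0, 40.0] -- partial windows allowed

e = s.expanding().mean()
print(e.tolist())   # [10.0, 15.0, 20.0, 25.0, 30.0] -- all data so far
```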
12
Compare & Filter
conditions
comparison methods
s.between(10, 30)      # 10 ≤ x ≤ 30 → bool
s.isin([10, 30])       # exact match → bool
s.clip(lower=10, upper=30) # clamp range
s.where(s > 20)        # keep, else NaN
s.mask(s > 20)         # NaN where True
s.gt(20)  s.lt(20)      # .gt .lt .ge .le .eq
clip() example
s.clip(lower=15, upper=35)

value   result   why
5       15       below lower
20      20       in range
30      30       in range
50      35       above upper
where vs mask: .where keeps values where condition is True (NaN elsewhere). .mask is the opposite — NaN where True.
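The where/mask mirror-image behavior, checked on illustrative values:

```python
import pandas as pd

s = pd.Series([5, 20, 30, 50])

print(s.clip(lower=15, upper=35).tolist())  # [15, 20, 30, 35]
print(s.between(10, 30).tolist())           # inclusive on both ends

kept = s.where(s > 20)   # NaN where the condition is False
hidden = s.mask(s > 20)  # NaN where the condition is True
print(kept.dropna().tolist())    # [30.0, 50.0]
print(hidden.dropna().tolist())  # [5.0, 20.0]
```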
13
Map, Replace & Apply
transform
map — element lookup
s.map({1:'low', 2:'mid', 3:'high'})
s.map(lambda x: x * 10)
s.map(str)  # convert each to string
Dict lookup: keys missing from the dict map to NaN. The function form transforms every element, so it never introduces NaN that way.
replace & apply
s.replace(10, 99)            # exact swap
s.replace({10:99, 20:88})   # multi-swap
s.replace(r'\d+', 'N', regex=True)
s.apply(lambda x: x ** 2)    # custom fn
s.apply(np.log)               # numpy fn
when to use which
method       input        unmatched   use for
.map(dict)   dict         NaN         label encoding, lookup tables
.map(fn)     function     n/a         per-element transform
.replace()   dict/value   unchanged   fix specific values
.apply(fn)   function     n/a         complex logic, any return type
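The unmatched-value behavior in the table, demonstrated (9 is deliberately absent from the dict):

```python
import math
import pandas as pd

s = pd.Series([1, 2, 3, 9])

mapped = s.map({1: 'low', 2: 'mid', 3: 'high'})
print(mapped.tolist())  # ['low', 'mid', 'high', nan] -- 9 had no entry

print(s.replace({1: 99}).tolist())         # [99, 2, 3, 9] -- others untouched
print(s.apply(lambda x: x ** 2).tolist())  # [1, 4, 9, 81] -- arbitrary function
```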
14
Duplicate Handling
dedup
detect & remove
s.duplicated()           # bool mask
s.duplicated(keep='last')
s.duplicated(keep=False)  # mark ALL dupes
s.drop_duplicates()        # remove dupes
s.drop_duplicates(keep='last')
s[~s.duplicated()]         # filter unique
keep options explained
'first' (default)
Keep first occurrence, mark the rest as duplicate (True).
'last'
Keep last occurrence, mark earlier ones as duplicate.
False
Mark ALL occurrences as True — even the first one.
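All three keep options on one small series (10 appears twice):

```python
import pandas as pd

s = pd.Series([10, 20, 10, 30])

print(s.duplicated().tolist())             # [False, False, True, False]
print(s.duplicated(keep='last').tolist())  # [True, False, False, False]
print(s.duplicated(keep=False).tolist())   # [True, False, True, False]
print(s.drop_duplicates().tolist())        # [10, 20, 30]
```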
15
Shift, Diff & Pct Change
lag ops
code
s.shift(1)       # lag by 1 (prev row)
s.shift(-1)      # lead by 1 (next row)
s.diff()         # s − s.shift(1)
s.diff(2)        # 2-period difference
s.pct_change()   # (s−prev)/prev → %
s.pct_change(7)  # 7-period % change
shift(1) visualized
original        after shift(1)
0   10          0   NaN
1   20          1   10
2   30          2   20
3   40          3   30
Use cases: daily return = pct_change() · compare to yesterday = shift(1) · autocorrelation · lag features for ML
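The "daily return" use case above in code, on illustrative prices:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

print(s.shift(1).tolist())  # [nan, 10.0, 20.0, 30.0] -- previous row
print(s.diff().tolist())    # [nan, 10.0, 10.0, 10.0] -- s - s.shift(1)

returns = s.pct_change()    # (current - previous) / previous
print(returns.tolist())     # [nan, 1.0, 0.5, 0.333...]
```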
16
Index Operations
indexing
modify index
s.index = ['x','y','z']   # direct set
s.rename({'a':'A'})    # rename labels
s.rename(index=str.upper) # via function
s.reset_index()           # → DataFrame
s.reset_index(drop=True)  # keep as Series
s.set_axis(['p','q','r'])
reindex & align
s.reindex(['a','b','d'])                  # conform to new index
s.reindex(['a','b','d'], fill_value=0)    # fill missing labels with 0
s.reindex(['a','b','d'], method='ffill')  # forward-fill missing labels
s1.align(s2)                              # align two Series

reindex is great for making two Series share the same index before doing math — avoids NaN from misaligned operations.
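That NaN-avoidance pattern, sketched with illustrative labels:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# reindex to a new label set; 'd' is new, so fill it explicitly
r = s.reindex(['a', 'b', 'd'], fill_value=0)
print(r.tolist())  # [1, 2, 0]

# align puts two Series on the same (union) index before math
s2 = pd.Series([10, 20], index=['b', 'c'])
left, right = s.align(s2, fill_value=0)
print((left + right).tolist())  # [1, 12, 23] -- no NaN from misalignment
```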

17
Sorting
order
sort by values
s.sort_values()              # asc by default
s.sort_values(ascending=False)
s.sort_values(na_position='first')
s.nlargest(3)    # top 3 (fast!)
s.nsmallest(3)  # bottom 3 (fast!)
sort by index
s.sort_index()             # A→Z or 0→n
s.sort_index(ascending=False)
s.sort_index(na_position='last')
All sort methods return a new Series — original unchanged. Use inplace=True to modify in place (generally discouraged in modern pandas).
nlargest / nsmallest are faster than sort_values().tail(n) for large Series.
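Both points above, checked on a small unsorted series:

```python
import pandas as pd

s = pd.Series([30, 10, 40, 20])

print(s.sort_values().tolist())  # [10, 20, 30, 40]
print(s.nlargest(2).tolist())    # [40, 30] -- top 2 without a full sort
print(s.nsmallest(2).tolist())   # [10, 20]
print(s.tolist())                # original unchanged: [30, 10, 40, 20]
```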
18
Combining Series
merge
concat — stacking
pd.concat([s1, s2])          # stack rows
pd.concat([s1, s2], axis=1)  # side by side
pd.concat([s1, s2],
          ignore_index=True)   # reindex 0,1,2…
pd.concat([s1, s2], keys=['a','b']) # multi-index
combine — smart merge
s1.combine(s2, max)          # element-wise max
s1.combine_first(s2)         # fill s1's NaN with values from s2
combine_first is perfect for merging two partial datasets where each has gaps the other fills.
Alternatives: np.where(s1.isna(), s2, s1) · pd.DataFrame({'a':s1,'b':s2}) for side-by-side comparison
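The partial-dataset merge described above, with two illustrative gappy series:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, np.nan, 3.0], index=['a', 'b', 'c'])
s2 = pd.Series([10.0, 20.0, np.nan], index=['a', 'b', 'c'])

merged = s1.combine_first(s2)  # s1 wins; s2 only fills s1's gaps
print(merged.tolist())         # [1.0, 20.0, 3.0]

stacked = pd.concat([s1, s2], ignore_index=True)
print(len(stacked))            # 6 -- rows stacked, index rebuilt 0..5
```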
19
Type Conversion & Export
convert
type conversion
s.astype(float)           # to float64
s.astype('Int64')         # nullable int
s.astype(str)             # to string (object)
s.astype('category')      # memory efficient
s.astype('datetime64[ns]')
pd.to_numeric(s)          # safe conversion
pd.to_numeric(s, errors='coerce') # NaN on fail
export
s.to_list()      # Python list
s.to_dict()      # {index: value}
s.to_frame()     # 1-column DataFrame
s.to_numpy()     # NumPy array
s.to_csv('f.csv')
s.to_json()      # JSON string
s.to_clipboard() # copy to clipboard
Use 'category' dtype when a Series has few unique values — saves up to 90% memory compared to object dtype.
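The safe-conversion and category points, sketched with made-up values:

```python
import pandas as pd

s = pd.Series(['1', '2', 'oops'])

# errors='coerce' turns unparseable values into NaN instead of raising
clean = pd.to_numeric(s, errors='coerce')
print(clean.isna().sum())  # 1 -- only 'oops' failed

# few unique values → category dtype stores codes, not repeated strings
cat = pd.Series(['a', 'b', 'a', 'a']).astype('category')
print(cat.dtype)           # category

print(pd.Series([10, 20]).to_dict())  # {0: 10, 1: 20}
```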
20
Quick Reference Table
all methods
method | returns | category | use for
pd.Series([...]) | Series | create | create from list/dict/scalar
s['a'] / s.loc['a'] | scalar/Series | access | label-based access
s.iloc[0] / s.iat[0] | scalar/Series | access | position-based access
s.isna() / s.notna() | bool Series | missing | detect NaN values
s.fillna(val) | Series | missing | replace NaN with value
s.dropna() | Series | missing | remove NaN rows
s.ffill() / s.bfill() | Series | missing | forward/backward fill
s.interpolate() | Series | missing | linear fill between values
s.sum() / s.mean() | scalar | math | aggregate statistics
s.describe() | Series | stats | full summary statistics
s.std() / s.var() | float | stats | spread measurement
s.skew() / s.kurt() | float | stats | distribution shape
s.quantile(0.25) | float | stats | percentile values
s.value_counts() | Series | frequency | frequency table
s.unique() / s.nunique() | array / int | frequency | distinct values
s.mode() | Series | frequency | most common value(s)
s.sort_values() | Series | sort | sort by value
s.sort_index() | Series | sort | sort by index label
s.nlargest(n) | Series | sort | top N values (fast)
s.str.upper() | Series | strings | vectorized string ops
s.str.contains() | bool Series | strings | string/regex search
s.str.split() / .extract() | Series/DF | strings | split or regex extract
s.dt.year / s.dt.month | Series | datetime | datetime extraction
s.dt.strftime(fmt) | Series | datetime | format datetime as string
s.rolling(n).mean() | Series | rolling | moving average/stats
s.expanding().mean() | Series | rolling | growing window stats
s.cumsum() / s.cumprod() | Series | cumulative | running total/product
s.rank() | Series | rank | rank each value
s.shift(1) | Series | shift | lag/lead values
s.diff() / s.pct_change() | Series | shift | period change / returns
s.between(lo, hi) | bool Series | filter | range filter
s.isin([...]) | bool Series | filter | set membership test
s.clip(lower, upper) | Series | filter | clamp to range
s.where(cond) | Series | filter | keep where True, else NaN
s.map(dict/fn) | Series | transform | element lookup/transform
s.replace(old, new) | Series | transform | swap specific values
s.apply(fn) | Series | transform | custom function per element
s.drop_duplicates() | Series | dedup | remove duplicate values
s.reset_index() | DataFrame | index | reset to 0,1,2…
s.reindex([...]) | Series | index | conform to new index
pd.concat([s1, s2]) | Series | combine | stack or join series
s1.combine_first(s2) | Series | combine | fill NaN from another Series
s.astype(dtype) | Series | convert | change data type
s.to_list() / to_frame() | list/DF | export | convert to Python/pandas