import pandas as pd
s = pd.Series([10, 20, 30],
              index=['a', 'b', 'c'],
              name='scores')
Index is optional — defaults to 0, 1, 2. name becomes the column header when converting to DataFrame.
a    10
b    20
c    30
Name: scores, dtype: int64
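For example, name becomes the column header when you convert (a quick sketch):
s.to_frame()       # one-column DataFrame, column named 'scores'
s.reset_index()    # index labels become their own column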
from dict
s = pd.Series({'a': 10, 'b': 20, 'c': 30})
Dict keys become the index automatically. Clean for labeled data.
from scalar (broadcast)
s = pd.Series(5, index=['a','b','c'])
Fills same value for every label. Great for placeholders or default flags.
from numpy array
import numpy as np
s = pd.Series(np.array([1.1, 2.2, 3.3]))
when is Series useful?
single column of data · time series (one metric) · labeled 1D array · a row or column pulled from a DataFrame
Alternatives: Python list (no labels, no vectorized math) · NumPy array (faster but no labels) · dict (labels but no .mean()/.sum())
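A quick sketch of what each alternative gives up, using the Series above:
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s * 2       # vectorized math a plain list lacks
s['b']      # 20, label access a NumPy array lacks
s.mean()    # 20.0, stats a plain dict can't compute directly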
2
Properties & Attributes
introspection
all properties
s.values     # NumPy array underneath (prefer s.to_numpy() in modern pandas)
s.index      # Index(['a', 'b', 'c'], dtype='object')
s.dtype      # dtype('int64')
s.name       # 'scores'
s.shape      # (3,) — always 1D
s.size       # 3 — same as len(s)
s.ndim       # 1 — always for Series
s.empty      # True if len == 0
s.hasnans    # True if any NaN
3
Accessing Elements
by label (.loc)
s = pd.Series([10, 20, 30], index=['a','b','c'], name='scores')
Use when index has meaningful labels. .loc slice is INCLUSIVE on both ends — unlike Python slicing.
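A minimal sketch of the usual label-access patterns, using the Series above:
s.loc['a']          # single label → 10
s.loc['a':'b']      # label slice, includes BOTH endpoints
s.loc[['a', 'c']]   # list of labels → 10, 30
s.at['a']           # fastest single-label lookup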
by position (.iloc)
s.iloc[0] # first → 10
s.iloc[-1] # last element
s.iloc[0:2] # first 2 (exclusive end)
s.iloc[[0,2]] # 1st and 3rd
s.iat[0] # fastest single
Always position-based. Safe even when index is non-integer. Like Python list slicing (end exclusive).
boolean filtering
s[s > 20]
s[(s>10) & (s<40)]
s[s.isin([10,30])]
Example: for s with values a: 10, b: 20, c: 30, d: 40,
s[s > 20] keeps only c (30) and d (40).
access methods compared
method    based on    slice end      use when
[]        label       inclusive*     quick access on a named index
.loc[]    label       inclusive      label slicing
.iloc[]   position    exclusive      always safe
.at[]     label       single value   speed-critical scalar reads
.iat[]    position    single value   speed-critical scalar reads
*[] label slices are inclusive; integer slices through [] are positional and exclusive.
Avoid chained indexing such as s[s > 20][0]; use s.loc[s > 20].iloc[0] instead. Chained indexing is ambiguous and can trigger SettingWithCopyWarning when assigning.
4
Math Operations
vectorized
element-wise (returns Series)
s + 10        # add to every element
s * 2         # multiply each
s ** 2        # square each
np.sqrt(s)    # NumPy ufuncs work on Series
np.log(s)     # natural log
s1 + s2       # aligns by index
When adding two Series, pandas aligns by index first. Mismatched labels produce NaN — always check with s.isna().sum() after arithmetic.
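A minimal sketch of that alignment behavior (illustrative values):
s1 = pd.Series([1, 2], index=['a', 'b'])
s2 = pd.Series([10, 20], index=['b', 'c'])
s1 + s2    # a: NaN, b: 12.0, c: NaN (only 'b' exists in both)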
5
Missing Data
detect
s.isna() # True where NaN
s.notna() # True where not NaN
s.isna().sum() # count of NaNs
s.isna().any() # any missing?
s.isna().all() # all missing?
fix
s.dropna() # remove NaN rows
s.fillna(0) # replace with 0
s.fillna(s.mean()) # impute with mean
s.ffill() # forward fill
s.bfill() # backward fill
s.interpolate() # linear fill
s.interpolate(method='cubic') # needs SciPy
strategy guide
dropna
When missing rows have no meaning. Survey skips, optional fields.
fillna(mean)
ML preprocessing — preserves row count, neutral assumption.
ffill
Time series — stock prices on weekends, sensor gaps.
interpolate
Smooth series — temperature, curves, between known points.
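A small sketch comparing two fix strategies (illustrative values):
s = pd.Series([1.0, None, None, 4.0])
s.ffill()          # 1.0, 1.0, 1.0, 4.0 (carry last known value forward)
s.interpolate()    # 1.0, 2.0, 3.0, 4.0 (linear between known points)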
6
String Operations (.str)
text data
case & clean
s.str.upper() # "hello" → "HELLO"
s.str.lower() # "HELLO" → "hello"
s.str.title() # "hello world" → "Hello World"
s.str.strip() # remove surrounding spaces
s.str.lstrip() # left only
s.str.rstrip() # right only
s.str.len() # length of each string
search & match
s.str.contains('py') # bool
s.str.startswith('A') # bool
s.str.endswith('ing') # bool
s.str.match(r'^\d+') # regex at start
s.str.fullmatch(r'\d+') # full match
s.str.count('l') # count occurrences
split & extract
s.str.split(',') # → list per row
s.str.split(',', expand=True) # → DataFrame
s.str.extract(r'(\d+)') # capture group
s.str.extractall(r'(\d+)') # all matches
s.str[0:3] # slice chars
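A quick sketch of split and extract (illustrative values):
s = pd.Series(['id-12,red', 'id-7,blue'])
s.str.split(',', expand=True)    # column 0: 'id-12', 'id-7' · column 1: 'red', 'blue'
s.str.extract(r'(\d+)')          # '12', '7' (first match of the capture group per row)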
7
Datetime Operations (.dt)
Must run pd.to_datetime() first. Then .dt unlocks 30+ time-aware properties. Ideal for groupby('month'), resampling, and filtering weekdays.
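A minimal sketch of common .dt accessors, assuming ISO date strings:
s = pd.to_datetime(pd.Series(['2024-01-15', '2024-02-20']))
s.dt.year         # 2024, 2024
s.dt.month        # 1, 2
s.dt.dayofweek    # 0, 1 (Monday = 0)
s.dt.day_name()   # 'Monday', 'Tuesday'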
8
Descriptive Statistics
explore
all stat methods
s.describe() # full summary
s.mean() s.median()
s.std() s.var()
s.min() s.max()
s.skew() # distribution skew
s.kurt() # kurtosis (peakedness)
s.sem() # standard error of mean
s.mad() # mean absolute deviation (removed in pandas 2.0; use (s - s.mean()).abs().mean())
s.quantile(0.25) # 25th percentile
s.quantile([.25,.5,.75])
describe() output explained
stat    value   meaning
count   4.0     non-null rows
mean    25.0    arithmetic average
std     12.9    spread around mean
min     10.0    smallest value
25%     17.5    Q1, lower quartile
50%     25.0    median
75%     32.5    Q3, upper quartile
max     40.0    largest value
9
Value Counts & Unique
frequency
methods
s.value_counts() # freq table
s.value_counts(normalize=True) # proportions
s.value_counts(dropna=False) # include NaN
s.value_counts(bins=4) # histogram bins
s.unique() # array of uniques
s.nunique() # count of uniques
s.mode() # most common value(s)
s.mode()[0] # top mode
example
s = pd.Series(['cat', 'dog', 'cat', 'bird', 'dog', 'cat'])
s.value_counts()
# cat     3
# dog     2
# bird    1
Real-world: count category frequencies · check class imbalance before ML · audit data quality · find mode values
10
Rank & Cumulative
running ops
ranking
s.rank() # 1-based rank
s.rank(ascending=False) # highest = 1
s.rank(method='dense') # no gaps
s.rank(method='min') # tie = min rank
s.rank(pct=True) # 0–1 percentile
Rank methods: 'average' (default), 'min', 'max', 'first' (order of appearance), 'dense' (no gaps for ties).
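A small sketch of the tie-handling methods (illustrative values):
s = pd.Series([10, 20, 20, 30])
s.rank()                  # 1.0, 2.5, 2.5, 4.0 (ties share the average rank)
s.rank(method='dense')    # 1.0, 2.0, 2.0, 3.0 (no gaps after ties)
s.rank(method='min')      # 1.0, 2.0, 2.0, 4.0 (ties take the lowest rank)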
cumulative
s.cumsum() # running total
s.cumprod() # running product
s.cummax() # running maximum
s.cummin() # running minimum
Duplicates
s.duplicated() # bool mask
s.duplicated(keep='last')
s.duplicated(keep=False) # mark ALL dupes
s.drop_duplicates() # remove dupes
s.drop_duplicates(keep='last')
s[~s.duplicated()] # filter unique
keep options explained
'first' (default)
Keep first occurrence, mark the rest as duplicate (True).
'last'
Keep last occurrence, mark earlier ones as duplicate.
False
Mark ALL occurrences as True — even the first one.
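A quick sketch of all three keep options (illustrative values):
s = pd.Series([1, 2, 1])
s.duplicated()               # False, False, True
s.duplicated(keep='last')    # True, False, False
s.duplicated(keep=False)     # True, False, True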
15
Shift, Diff & Pct Change
lag ops
code
s.shift(1) # lag by 1 (prev row)
s.shift(-1) # lead by 1 (next row)
s.diff() # s − s.shift(1)
s.diff(2) # 2-period difference
s.pct_change() # (s−prev)/prev → %
s.pct_change(7) # 7-day % change
shift(1) visualized
index:           0    1    2    3
ORIGINAL:        10   20   30   40
AFTER shift(1):  NaN  10   20   30
Use cases: daily return = pct_change() · compare to yesterday = shift(1) · autocorrelation · lag features for ML
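A minimal sketch of a daily return (illustrative prices):
prices = pd.Series([100.0, 102.0, 101.0])
prices.pct_change()    # NaN, 0.02, ≈ -0.0098 (each row vs. the previous one)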
16
Index Operations
indexing
modify index
s.index = ['x','y','z'] # direct set
s.rename({'a':'A'}) # rename labels
s.rename(index=str.upper) # via function
s.reset_index() # → DataFrame
s.reset_index(drop=True) # keep as Series
s.set_axis(['p','q','r'])
reindex & align
s.reindex(['a','b','d'])                  # new index; 'd' gets NaN
s.reindex(['a','b','d'], fill_value=0)    # fill missing labels with 0
s.reindex(['a','b','d'], method='ffill')  # forward fill missing labels
s1.align(s2) # align two Series
reindex is great for making two Series share the same index before doing math — avoids NaN from misaligned operations.
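A minimal sketch of align with a fill value (illustrative values):
s1 = pd.Series([1, 2], index=['a', 'b'])
s2 = pd.Series([10, 20], index=['b', 'c'])
a1, a2 = s1.align(s2, fill_value=0)
a1 + a2    # a: 1, b: 12, c: 20 (no NaN, every label filled)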
17
Sorting
order
sort by values
s.sort_values() # asc by default
s.sort_values(ascending=False)
s.sort_values(na_position='first')
s.nlargest(3) # top 3 (fast!)
s.nsmallest(3) # bottom 3 (fast!)
sort by index
s.sort_index() # A→Z or 0→n
s.sort_index(ascending=False)
s.sort_index(na_position='last')
All sort methods return a new Series — original unchanged. Use inplace=True to modify in place (generally discouraged in modern pandas).
nlargest / nsmallest are faster than sort_values().tail(n) for large Series.
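A quick sketch (illustrative values):
s = pd.Series([40, 10, 30, 20])
s.nlargest(2)    # 40, 30 (keeps original index labels 0 and 2)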
18
Combining Series
merge
concat — stacking
pd.concat([s1, s2]) # stack rows
pd.concat([s1, s2], axis=1) # side by side
pd.concat([s1, s2], ignore_index=True)    # reindex 0,1,2…
pd.concat([s1, s2], keys=['a','b'])       # MultiIndex from keys
combine — smart merge
s1.combine(s2, max) # element-wise max
s1.combine_first(s2)     # fill s1's NaN with values from s2
combine_first is perfect for merging two partial datasets where each has gaps the other fills.
Alternatives: np.where(s1.isna(), s2, s1) · pd.DataFrame({'a':s1,'b':s2}) for side-by-side comparison
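A minimal sketch of gap-filling with combine_first (illustrative values):
s1 = pd.Series([1.0, None, 3.0])
s2 = pd.Series([9.0, 2.0, None])
s1.combine_first(s2)    # 1.0, 2.0, 3.0 (s1 wins wherever it has data)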
19
Type Conversion & Export
convert
type conversion
s.astype(float) # to float64
s.astype('Int64') # nullable int
s.astype(str) # to string (object)
s.astype('category') # memory efficient
s.astype('datetime64[ns]')
pd.to_numeric(s) # strict: raises if parsing fails
pd.to_numeric(s, errors='coerce') # NaN on fail
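A quick sketch of coercion and categorical codes (illustrative values):
raw = pd.Series(['1', '2', 'oops'])
pd.to_numeric(raw, errors='coerce')    # 1.0, 2.0, NaN
raw.astype('category').cat.codes       # 0, 1, 2 (compact integer codes)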