Interpolation vs Iterative Imputer

Both techniques fill missing values — but their logic and use-cases are different. This page explains the difference with clear examples and a short summary table.

1. Interpolation — “Smooth guessing between neighbors”

Concept:
Interpolation uses only nearby values of the same feature (column) to estimate missing values. It assumes the data changes smoothly over time or along a sequence (for example, from one video frame to the next).

Example

Suppose you track a single knee coordinate (Knee_Y) across frames:

Frame:     1     2     3     4
Knee_Y:   10   NaN    30    40

Linear interpolation fills frame 2 as:

Frame 2 = 10 + ((30 - 10) / (3 - 1)) * (2 - 1) = 20
Result: [10, 20, 30, 40]
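The formula above generalizes to any interior gap: find the nearest known value on each side and move along the straight line between them. A minimal pure-Python sketch, assuming equally spaced frames so that list positions can stand in for frame numbers (`interpolate_linear` is a hypothetical helper, not a library function):

```python
import math


def interpolate_linear(values):
    """Fill interior NaNs by linear interpolation between the nearest known neighbors.

    Assumes equally spaced samples, so list positions act as frame numbers.
    Leading/trailing NaNs have no neighbor on one side and are left unchanged.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    # Walk over consecutive pairs of known points and fill everything between them
    for left, right in zip(known, known[1:]):
        slope = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + slope * (i - left)
    return filled


knee_y = [10.0, float("nan"), 30.0, 40.0]  # frames 1..4, frame 2 missing
print(interpolate_linear(knee_y))  # [10.0, 20.0, 30.0, 40.0]
```

Because it never extrapolates, a gap at the very start or end of the sequence stays NaN in this sketch.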

Key traits

Works feature-wise: each column is filled independently of the others.
Relies on temporal or spatial continuity, assuming values progress smoothly between known neighbors.
Simple, fast, and intuitive.
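In pandas, `DataFrame.interpolate` applies the same idea to every column at once, but each column is still filled only from its own neighbors. A small sketch with two illustrative columns (the `Hip_Y` column and all values are made up for the example):

```python
import numpy as np
import pandas as pd

# Illustrative data: each column has its own gap in a different frame
df = pd.DataFrame({
    "Knee_Y": [10.0, np.nan, 30.0, 40.0],
    "Hip_Y": [100.0, 110.0, np.nan, 130.0],
})

# method="linear" treats rows as equally spaced and fills each column separately:
# Knee_Y's gap uses only Knee_Y values, Hip_Y's gap uses only Hip_Y values
filled = df.interpolate(method="linear")
print(filled)
```

Knee_Y becomes 10, 20, 30, 40 and Hip_Y becomes 100, 110, 120, 130; neither column looks at the other.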

2. Iterative Imputer — “Model-based prediction using all features”

Concept:
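Iterative imputation builds a regression model for each column that has missing values and predicts that column from the other columns. Because those other columns may contain gaps of their own (filled at first with a simple guess such as the column mean), it cycles through the columns several times, with each round refining the previous predictions.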

Example

Suppose you have three features per frame:

Frame	Hip_X	Knee_X	Ankle_X
1	100	80	50
2	110	NaN	60
3	120	90	70

To impute Knee_X at Frame 2:

- Use Hip_X and Ankle_X as predictors.
- Fit the model on the frames where Knee_X is observed (frames 1 and 3): Knee_X ≈ a * Hip_X + b * Ankle_X + c
- Predict the missing value: Knee_X = a*110 + b*60 + c
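As a sketch, the same three frames can be run through scikit-learn's `IterativeImputer` with a `LinearRegression` estimator (the DataFrame layout is an assumption; `IterativeImputer` is still experimental, so the `enable_iterative_imputer` import has to come first). With only two observed frames, and Hip_X and Ankle_X rising in lockstep, a and b are not uniquely determined. The prediction still lands at 85: frame 2 sits exactly at the mean of both predictors, and a least-squares fit with an intercept always passes through the mean of Knee_X at that point.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (must precede the next import)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

df = pd.DataFrame(
    {
        "Hip_X": [100.0, 110.0, 120.0],
        "Knee_X": [80.0, np.nan, 90.0],
        "Ankle_X": [50.0, 60.0, 70.0],
    },
    index=pd.Index([1, 2, 3], name="Frame"),
)

imputer = IterativeImputer(estimator=LinearRegression())
# fit_transform returns a NumPy array; wrap it back into a DataFrame with the same labels
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns, index=df.index)
print(imputed.loc[2, "Knee_X"])  # ≈ 85.0
```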

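IterativeImputer repeats this for every column with gaps. Missing entries start from a simple initial fill (the column mean by default in scikit-learn), and each column is then re-modeled and re-filled in round-robin fashion until the imputed values stop changing much or a maximum number of rounds (`max_iter`) is reached.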
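To make the loop concrete, here is a simplified from-scratch sketch of round-robin regression imputation in NumPy. It is not scikit-learn's implementation (which defaults to BayesianRidge, orders columns by how many values they are missing, and scales its stopping tolerance); `iterative_impute`, `n_rounds`, and `tol` are hypothetical names, and the model is plain least squares with an intercept:

```python
import numpy as np


def iterative_impute(X, n_rounds=10, tol=1e-6):
    """Hypothetical round-robin regression imputer (simplified sketch)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Step 1: initial guess, every NaN replaced by its column mean
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(n_rounds):
        previous = filled.copy()
        # Step 2: revisit each column that originally had gaps
        for j in np.flatnonzero(missing.any(axis=0)):
            # Predictors: all other columns (with their current fills) plus an intercept column
            design = np.column_stack([np.delete(filled, j, axis=1), np.ones(len(filled))])
            observed = ~missing[:, j]
            # Fit least squares only on rows where column j was actually measured
            coef, *_ = np.linalg.lstsq(design[observed], filled[observed, j], rcond=None)
            # Overwrite only the originally missing entries of column j
            filled[~observed, j] = design[~observed] @ coef
        # Step 3: stop once the fills have settled
        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled


# Columns: Hip_X, Knee_X, Ankle_X (the three-frame example above)
frames = np.array([[100.0, 80.0, 50.0],
                   [110.0, np.nan, 60.0],
                   [120.0, 90.0, 70.0]])
print(iterative_impute(frames)[1, 1])  # ≈ 85.0
```

When several columns have gaps, each round's predictions feed into the next round's predictors, which is why the process is repeated rather than run once.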

Key traits

Uses all other columns/features to predict a missing column.
Assumes a statistical relationship among features.
Uses a pluggable regression estimator (BayesianRidge by default in scikit-learn; LinearRegression and others also work).
Better for multivariate data where features correlate.

Quick Comparison

Aspect	Interpolation	Iterative Imputer
Data used	Only same-column neighbors	All other columns/features
Assumption	Smooth temporal/spatial change	Statistical relationships across features
Model	None (math between neighbors)	Regression or other estimators
Good for	Time-series single-feature gaps	Multivariate sensor or keypoint data
Complexity	Low	Higher
Speed	Very fast	Slower
Example method	`df.interpolate()`	`IterativeImputer(estimator=LinearRegression())`
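The "Data used" row shows up clearly at the edge of a recording. In this sketch (hypothetical columns and values), Knee_X is missing in the very first frame: linear interpolation has no earlier neighbor and leaves the gap, while `IterativeImputer` predicts it from Hip_X and Ankle_X in the same frame.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (must precede the next import)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

# Hypothetical keypoint columns; Knee_X is missing in the first frame
df = pd.DataFrame({
    "Hip_X": [100.0, 110.0, 120.0, 130.0],
    "Knee_X": [np.nan, 85.0, 90.0, 95.0],
    "Ankle_X": [50.0, 60.0, 70.0, 80.0],
})

# Interpolation: no earlier Knee_X value exists, so the leading gap stays NaN
by_interp = df.interpolate(method="linear")
print(by_interp.loc[0, "Knee_X"])  # nan

# Iterative imputation: frame 0 is predicted from Hip_X and Ankle_X in that frame
imputer = IterativeImputer(estimator=LinearRegression())
by_model = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(by_model.loc[0, "Knee_X"])  # ≈ 80.0
```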

Why your code uses `IterativeImputer`

Your gait dataset is multivariate — each frame contains many coordinates (≈50 columns) that are correlated.

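If a knee coordinate is missing, the hip and ankle coordinates in the same frame often carry enough information to predict it well.
Iterative imputation models these cross-feature relationships, so its estimates tend to stay consistent with the rest of the skeleton in that frame, whereas time-only interpolation ignores the other joints and can drift across longer gaps.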
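The argument can be illustrated on synthetic toy data under a deliberately favorable assumption: Knee_X is defined as exactly the midpoint of Hip_X and Ankle_X, while the ankle swings back and forth so the knee trajectory is curved. Real gait keypoints are not related this cleanly, so treat this as an illustration of the mechanism, not evidence of accuracy on your data.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (must precede the next import)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

# Synthetic toy data (an assumption, not real gait data):
# Knee_X is exactly the midpoint of Hip_X and Ankle_X; the ankle oscillates
t = np.arange(10, dtype=float)
hip = 100.0 + 10.0 * t
ankle = 50.0 + 20.0 * np.sin(t)
truth = 0.5 * hip + 0.5 * ankle

df = pd.DataFrame({"Hip_X": hip, "Knee_X": truth.copy(), "Ankle_X": ankle})
gap = [3, 4, 5]  # a three-frame dropout in the knee marker
df.loc[gap, "Knee_X"] = np.nan

# Time-only: a straight line across the gap in the Knee_X column
by_interp = df["Knee_X"].interpolate(method="linear").to_numpy()

# Cross-feature: regress Knee_X on Hip_X and Ankle_X (column 1 is Knee_X)
by_model = IterativeImputer(estimator=LinearRegression()).fit_transform(df)[:, 1]

err_interp = np.max(np.abs(by_interp[gap] - truth[gap]))
err_model = np.max(np.abs(by_model[gap] - truth[gap]))
print(f"interpolation max error: {err_interp:.2f}")
print(f"iterative imputer max error: {err_model:.2e}")
```

On this toy data, interpolation misses the middle of the gap by roughly 10 units because it draws a straight line through a curved trajectory, while the regression recovers the exact linear relationship. For short gaps in smooth signals, or columns that correlate weakly with the rest, the ranking can reverse.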
