Interpolation vs Iterative Imputer
Both techniques fill missing values — but their logic and use-cases are different. This page explains the difference with clear examples and a short summary table.
1. Interpolation — “Smooth guessing between neighbors”
Concept:
Interpolation uses only nearby values of the same feature (column) to estimate missing values. It assumes the data changes smoothly over time or sequence.
Example
Suppose you have knee angle measurements (one feature over frames):
| Frame | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Knee_Y | 10 | NaN | 30 | 40 |
Linear interpolation fills frame 2 as:
Frame 2 = 10 + ((30 - 10) / (3 - 1)) * (2 - 1) = 20
Result: [10, 20, 30, 40]
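The same fill can be reproduced with pandas. This is a minimal sketch that assumes the measurements sit in a `pandas.Series` indexed by frame number; the variable names `knee_y` and `filled` are illustrative only.

```python
import numpy as np
import pandas as pd

# Knee_Y for frames 1..4, with frame 2 missing (values from the example above)
knee_y = pd.Series([10.0, np.nan, 30.0, 40.0], index=[1, 2, 3, 4], name="Knee_Y")

# method="linear" treats rows as equally spaced and draws a straight line
# between the nearest known neighbors (10 at frame 1, 30 at frame 3)
filled = knee_y.interpolate(method="linear")

print(filled.tolist())  # [10.0, 20.0, 30.0, 40.0]
```

If frames were unevenly spaced, `method="index"` would weight by the actual frame numbers instead of by row position.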
Key traits
- Works feature-wise (independent for each column).
- Uses temporal or spatial continuity (smooth changes).
- Works best for short gaps; across long gaps the smoothness assumption becomes less reliable.
- Simple, fast, and intuitive.
2. Iterative Imputer — “Model-based prediction using all features”
Concept:
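Iterative imputation fits a model (e.g. linear regression) for each column that has missing values, predicting that column from the other columns. Missing entries start from a simple initial guess (such as the column mean), and the fit-and-predict cycle is repeated so that later rounds refine the earlier estimates.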
Example
Suppose you have three features per frame:
| Frame | Hip_X | Knee_X | Ankle_X |
|---|---|---|---|
| 1 | 100 | 80 | 50 |
| 2 | 110 | NaN | 60 |
| 3 | 120 | 90 | 70 |
To impute Knee_X at Frame 2:
- Use Hip_X and Ankle_X as predictors.
- Fit a model on the frames where Knee_X is observed (Frames 1 and 3): Knee_X ≈ a * Hip_X + b * Ankle_X + c
- Predict missing Knee_X = a*110 + b*60 + c
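IterativeImputer repeats that across columns: if other columns also had missing values, each would be modeled and filled in turn, round after round, until the estimates stop changing by more than a tolerance or a maximum number of rounds is reached.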
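The steps above can be sketched with scikit-learn's `IterativeImputer`. This is a minimal illustration on the three-frame table, not a recommended setup: the variable names and the choices `max_iter=10` and `random_state=0` are assumptions made for the example. Note that `IterativeImputer` is still marked experimental in scikit-learn, so the `enable_iterative_imputer` import has to come first.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must precede the next import)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

# The three frames from the table above; Knee_X is missing at Frame 2
frames = pd.DataFrame(
    {
        "Hip_X": [100.0, 110.0, 120.0],
        "Knee_X": [80.0, np.nan, 90.0],
        "Ankle_X": [50.0, 60.0, 70.0],
    },
    index=pd.Index([1, 2, 3], name="Frame"),
)

# Each column with gaps is regressed on the other columns;
# max_iter caps the number of refinement rounds
imputer = IterativeImputer(estimator=LinearRegression(), max_iter=10, random_state=0)

# fit_transform returns a NumPy array, so rebuild the DataFrame with the original labels
filled = pd.DataFrame(
    imputer.fit_transform(frames), columns=frames.columns, index=frames.index
)

print(filled)
```

Frame 2's Hip_X and Ankle_X lie exactly midway between those of Frames 1 and 3, so the filled Knee_X should come out near 85. Three rows are far too few to fit a trustworthy model (and Hip_X and Ankle_X move in lockstep here), so treat this only as a demonstration of the API on the example data.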
Key traits
- Uses all other columns/features to predict a missing column.
- Assumes a statistical relationship among features.
- Uses a regression estimator: scikit-learn's default is BayesianRidge, and others such as LinearRegression can be passed in.
- Better for multivariate data where features correlate.
Quick Comparison
| Feature | Interpolation | Iterative Imputer |
|---|---|---|
| Data used | Only same-column neighbors | All other columns/features |
| Assumption | Smooth temporal/spatial change | Statistical relationships across features |
| Model | None (math between neighbors) | Regression or other estimators |
| Good for | Time-series single-feature gaps | Multivariate sensor or keypoint data |
| Complexity | Low | Higher |
| Speed | Very fast | Slower |
| Example method | df.interpolate() | IterativeImputer(estimator=LinearRegression()) |
Why your code uses IterativeImputer
Your gait dataset is multivariate — each frame contains many coordinates (≈50 columns) that are correlated.
- If a knee coordinate is missing, the hip and ankle coordinates in the same frame usually carry enough information to estimate it well.
- Iterative imputation models these cross-feature relationships, so its estimates tend to stay consistent with the rest of the body pose in that frame, whereas time-only interpolation looks at each coordinate in isolation.