Interpolation vs Iterative Imputer

Both techniques fill in missing values, but their logic and use cases differ. This page explains the difference with worked examples and a short summary table.


1. Interpolation — “Smooth guessing between neighbors”

Concept:
Interpolation uses only nearby values of the same feature (column) to estimate missing values. It assumes the data changes smoothly over time or along the sequence.

Example

Suppose you have knee angle measurements (one feature over frames):

Frame:     1     2     3     4
Knee_Y:   10   NaN    30    40
        

Linear interpolation fills frame 2 as:

Frame 2 = 10 + ((30 - 10) / (3 - 1)) * (2 - 1) = 20
Result: [10, 20, 30, 40]
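
The same calculation can be reproduced in code. This is a minimal sketch, assuming pandas and NumPy are available; the variable names (knee_y, filled, manual) are illustrative and not taken from any particular codebase.

```python
import numpy as np
import pandas as pd

# Knee_Y over frames 1..4, with frame 2 missing (values from the example above).
knee_y = pd.Series([10.0, np.nan, 30.0, 40.0], index=[1, 2, 3, 4], name="Knee_Y")

# method="linear" (the default) treats the rows as equally spaced
# and draws a straight line between the nearest known neighbors.
filled = knee_y.interpolate(method="linear")

# The hand calculation for frame 2, written out for comparison.
manual = 10 + ((30 - 10) / (3 - 1)) * (2 - 1)

print(filled.tolist())  # [10.0, 20.0, 30.0, 40.0]
print(manual)           # 20.0
```

Note that only the Knee_Y column is consulted; no other feature influences the filled value.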
        

Key traits

2. Iterative Imputer — “Model-based prediction using all features”

Concept:
Iterative imputation builds a model (e.g. linear regression) for each column that has missing values, predicting it from the other columns. It repeats this process iteratively to refine predictions.

Example

Suppose you have three features per frame:

Frame   Hip_X   Knee_X   Ankle_X
1       100     80       50
2       110     NaN      60
3       120     90       70

To impute Knee_X at Frame 2:

- Use Hip_X and Ankle_X as predictors.
- Fit model: Knee_X ≈ a * Hip_X + b * Ankle_X + c
- Predict missing Knee_X = a*110 + b*60 + c
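
A single regression step like this can be sketched by hand. The example assumes scikit-learn's LinearRegression as the model and uses illustrative variable names. With only two complete frames, the coefficients a, b and c are not uniquely determined, so the fitted values show the mechanics rather than a meaningful model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Complete frames (1 and 3): predictors are [Hip_X, Ankle_X].
X_train = np.array([[100.0, 50.0],
                    [120.0, 70.0]])
y_train = np.array([80.0, 90.0])  # observed Knee_X for frames 1 and 3

# Fit Knee_X ≈ a * Hip_X + b * Ankle_X + c on the complete frames.
model = LinearRegression().fit(X_train, y_train)
a, b = model.coef_
c = model.intercept_

# Predict the missing Knee_X for frame 2 (Hip_X=110, Ankle_X=60).
knee_x_frame2 = model.predict(np.array([[110.0, 60.0]]))[0]
print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}, Knee_X(frame 2)={knee_x_frame2:.1f}")
```

Because frame 2's predictors sit exactly midway between those of frames 1 and 3, any fit that passes through both complete frames predicts 85 here. A real dataset would have many more complete rows than coefficients.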
      

IterativeImputer repeats this for every column with missing values: each one is modeled from the other columns and refilled in rounds, until the estimates stop changing or a maximum number of rounds is reached.
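
With scikit-learn, the whole procedure is a single call. The sketch below runs it on the three-frame table above. The settings max_iter=10 and random_state=0 are assumptions chosen for illustration, not values taken from the original code.

```python
import numpy as np
import pandas as pd
# IterativeImputer is still experimental, so this import must come first.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

frames = pd.DataFrame(
    {
        "Hip_X": [100.0, 110.0, 120.0],
        "Knee_X": [80.0, np.nan, 90.0],
        "Ankle_X": [50.0, 60.0, 70.0],
    },
    index=pd.Index([1, 2, 3], name="Frame"),
)

# Each round regresses every column on the others and refills its gaps.
imputer = IterativeImputer(estimator=LinearRegression(), max_iter=10, random_state=0)

# fit_transform returns a NumPy array, so wrap it back into a DataFrame.
filled = pd.DataFrame(
    imputer.fit_transform(frames), columns=frames.columns, index=frames.index
)
print(filled)
```

Observed values pass through unchanged; only the NaN cells are replaced with model predictions.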

Key traits

Quick Comparison

Feature          Interpolation                     Iterative Imputer
Data used        Only same-column neighbors        All other columns/features
Assumption       Smooth temporal/spatial change    Statistical relationships across features
Model            None (math between neighbors)     Regression or other estimators
Good for         Time-series single-feature gaps   Multivariate sensor or keypoint data
Complexity       Low                               Higher
Speed            Very fast                         Slower
Example method   df.interpolate()                  IterativeImputer(estimator=LinearRegression())

Why your code uses IterativeImputer

Your gait dataset is multivariate — each frame contains many coordinates (≈50 columns) that are correlated.