What is a Savitzky-Golay Filter?
Basic Concept
The Savitzky-Golay filter smooths noisy data by fitting a polynomial to a sliding window of data points. Unlike simple moving averages, it preserves the shape and features of your signal (like peaks and valleys).
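To make the comparison with a moving average concrete, here is a minimal sketch on a synthetic, noise-free Gaussian peak. The peak shape, its width, and the window of 11 are illustrative assumptions, not gait data. With the same window, the moving average flattens the peak noticeably more than the Savitzky-Golay filter does.

```python
import numpy as np
from scipy.signal import savgol_filter

# Noise-free Gaussian peak of height 1.0 (illustrative, not real gait data)
x = np.arange(101)
peak = np.exp(-((x - 50) ** 2) / (2 * 3.0 ** 2))

window_length = 11

# Moving average: every point in the window gets the same weight
kernel = np.ones(window_length) / window_length
moving_avg = np.convolve(peak, kernel, mode="same")

# Savitzky-Golay: the weights come from a local quadratic fit
savgol = savgol_filter(peak, window_length=window_length, polyorder=2)

print(f"true peak height:      {peak.max():.3f}")
print(f"moving average height: {moving_avg.max():.3f}")
print(f"Savitzky-Golay height: {savgol.max():.3f}")
```

The gap between the two printed heights is the "shape preservation" the text refers to.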
How It Works
- Sliding Window: Takes a window of neighboring points around each data point
- Polynomial Fit: Fits a polynomial to those points by least squares (the curve does not have to pass through every point)
- Replace: Replaces the center point with the value of the fitted polynomial at that position
- Slide: Moves to the next point and repeats
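The four steps above can be sketched by hand with `np.polyfit` and compared against SciPy's `savgol_filter`. The helper name `savgol_by_hand` and the synthetic signal are assumptions made for illustration. Only interior points are compared, because SciPy treats the first and last half-window differently.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
t = np.arange(60) / 30.0                      # 2 seconds at 30 fps
y = 20 + 15 * np.sin(2 * np.pi * t) + rng.normal(0, 2, t.size)

window_length, polyorder = 11, 3
half = window_length // 2

def savgol_by_hand(values, i):
    """Hypothetical helper: smooth the single sample at index i."""
    idx = np.arange(i - half, i + half + 1)   # 1. sliding window around i
    x_local = idx - i                         #    center the window at 0
    coeffs = np.polyfit(x_local, values[idx], polyorder)  # 2. least-squares fit
    return np.polyval(coeffs, 0.0)            # 3. fitted value at the center

# 4. slide over every point that has a full window around it
manual = np.array([savgol_by_hand(y, i) for i in range(half, len(y) - half)])
reference = savgol_filter(y, window_length, polyorder)[half:-half]

print("matches scipy on interior points:", np.allclose(manual, reference))
```

In practice `savgol_filter` is much faster, since it precomputes one set of weights and applies them as a convolution instead of refitting at every point.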
Key Parameters
Window Length (window_length)
What it is: Number of data points used for fitting
Must be: Odd number (e.g., 5, 7, 9, 11)
Effect:
- Larger → More smoothing
- Smaller → Less smoothing, preserves detail
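One way to see the window-length effect in isolation is to filter pure random noise and measure how much of it survives. This is a sketch under assumed settings (unit-variance Gaussian noise, polyorder 3, windows of 5, 11, and 21); the surviving noise should shrink as the window grows.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, 5000)   # pure noise, standard deviation 1

remaining_std = {}
for window_length in (5, 11, 21):
    filtered = savgol_filter(noise, window_length=window_length, polyorder=3)
    remaining_std[window_length] = filtered.std()
    print(f"window {window_length:>2}: remaining noise std = {filtered.std():.3f}")
```

The flip side is not visible with pure noise: a wider window also averages over more of the real signal, which is what eventually erases detail.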
Polynomial Order (polyorder)
What it is: Degree of polynomial used for fitting
Must be: Less than window_length
Common values: 2 or 3
Effect:
- Higher → Follows sharp curves and peaks more closely, but keeps more noise
- Lower → More smoothing, but can flatten peaks
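The trade-off can be illustrated with a fixed window and two polynomial orders. This sketch assumes a synthetic Gaussian peak and compares orders 2 and 4 so the difference is easy to see. The "noise gain" is the sum of squared filter weights, which for uncorrelated noise is the factor by which the noise variance is scaled.

```python
import numpy as np
from scipy.signal import savgol_coeffs, savgol_filter

# Noise-free Gaussian peak of height 1.0 (illustrative)
x = np.arange(101)
peak = np.exp(-((x - 50) ** 2) / (2 * 3.0 ** 2))
window_length = 11

results = {}
for polyorder in (2, 4):
    height = savgol_filter(peak, window_length, polyorder).max()
    weights = savgol_coeffs(window_length, polyorder)
    noise_gain = np.sum(weights ** 2)     # variance scaling for white noise
    results[polyorder] = (height, noise_gain)
    print(f"polyorder {polyorder}: peak height {height:.3f}, noise gain {noise_gain:.3f}")
```

The higher order keeps more of the peak and also lets more noise through, which is the same trade-off described in the list above.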
Example: Gait Data
Typical Gait Analysis Settings
For hip, knee, and ankle angles sampled at 30 fps:
- Window Length: 11-15 (captures ~0.37-0.50 seconds)
- Polynomial Order: 2 or 3
from scipy.signal import savgol_filter
import pandas as pd
import numpy as np

# Load your gait data
df = pd.read_csv('gait_data.csv')

# Apply Savitzky-Golay filter
window_length = 11  # Must be odd
poly_order = 3      # Must be < window_length

# Smooth hip angles
df['RightHipAngle_smooth'] = savgol_filter(
    df['RightHipAngle'],
    window_length=window_length,
    polyorder=poly_order
)
df['LeftHipAngle_smooth'] = savgol_filter(
    df['LeftHipAngle'],
    window_length=window_length,
    polyorder=poly_order
)

# Smooth knee angles
df['RightKneeAngle_smooth'] = savgol_filter(
    df['RightKneeAngle'],
    window_length=window_length,
    polyorder=poly_order
)
df['LeftKneeAngle_smooth'] = savgol_filter(
    df['LeftKneeAngle'],
    window_length=window_length,
    polyorder=poly_order
)

# Smooth ankle angles
df['RightAnkleAngle_smooth'] = savgol_filter(
    df['RightAnkleAngle'],
    window_length=window_length,
    polyorder=poly_order
)
df['LeftAnkleAngle_smooth'] = savgol_filter(
    df['LeftAnkleAngle'],
    window_length=window_length,
    polyorder=poly_order
)

print("Smoothed data ready!")
Choosing the Right Parameters
Window Length Selection Guide
| Sampling Rate | Recommended Window | Time Span | Use Case |
|---|---|---|---|
| 30 fps | 7-11 | 0.23-0.37 sec | Moderate smoothing |
| 30 fps | 11-15 | 0.37-0.50 sec | Heavy smoothing |
| 60 fps | 15-21 | 0.25-0.35 sec | Moderate smoothing |
| 100 fps | 25-35 | 0.25-0.35 sec | Research-grade data |
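The Time Span column is just the window length divided by the sampling rate. A small sketch that reproduces the arithmetic for each row follows; the helper name `window_span_seconds` is an assumption, and it follows the table's convention of counting samples rather than the gaps between them.

```python
def window_span_seconds(window_length, fps):
    """Hypothetical helper: time covered by a window, as samples / fps."""
    return window_length / fps

# (sampling rate, smallest window, largest window) for each table row
table_rows = [(30, 7, 11), (30, 11, 15), (60, 15, 21), (100, 25, 35)]

spans = {}
for fps, w_min, w_max in table_rows:
    span_min = window_span_seconds(w_min, fps)
    span_max = window_span_seconds(w_max, fps)
    spans[(fps, w_min, w_max)] = (span_min, span_max)
    print(f"{fps:>3} fps, window {w_min}-{w_max}: {span_min:.2f}-{span_max:.2f} sec")
```

The same function can be used to check a window length for a sampling rate that is not listed in the table.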
Polynomial Order Selection
| Poly Order | Best For | Pros | Cons |
|---|---|---|---|
| 2 | Simple curves | More smoothing, stable | May miss sharp features |
| 3 | Gait data (recommended) | Good balance | Moderate complexity |
| 4-5 | Complex patterns | Preserves fine detail | May amplify noise |
Step-by-Step Decision Process
Step 1: Determine Your Sampling Rate
Example: Video at 30 fps → 30 frames per second
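If the frame rate is not known up front, it can usually be estimated from a timestamp column. The sketch below assumes timestamps in seconds; the function name `estimate_sampling_rate` and the 1% tolerance are illustrative choices. It also reports whether the spacing is uniform, which the filter requires.

```python
import numpy as np

def estimate_sampling_rate(timestamps, tolerance=0.01):
    """Hypothetical helper: return (samples per second, spacing is uniform)."""
    dt = np.diff(np.asarray(timestamps, dtype=float))
    median_dt = np.median(dt)
    # Uniform if every gap is within `tolerance` (relative) of the median gap
    is_uniform = bool(np.all(np.abs(dt - median_dt) <= tolerance * median_dt))
    return 1.0 / median_dt, is_uniform

timestamps = np.arange(120) / 30.0            # 120 frames at 30 fps
fps, is_uniform = estimate_sampling_rate(timestamps)
print(f"{fps:.1f} fps, uniform spacing: {is_uniform}")

# Dropping one frame leaves the median rate unchanged but breaks uniformity
with_gap = np.delete(timestamps, 50)
fps_gap, uniform_gap = estimate_sampling_rate(with_gap)
print(f"{fps_gap:.1f} fps, uniform spacing: {uniform_gap}")
```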
Step 2: Choose Window Length
Rule of thumb: Window should span 0.3-0.5 seconds of data
sampling_rate = 30  # frames per second (from Step 1)
window_length = int(sampling_rate * 0.4)  # 0.4 seconds
# Make it odd
if window_length % 2 == 0:
    window_length += 1
# For 30 fps: 30 * 0.4 = 12, which is even, so this gives 13
Step 3: Choose Polynomial Order
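- Start with polyorder = 3
- If too noisy → reduce to 2 or, more effectively, widen the window (with a centered odd window, orders 2 and 3 give the same smoothed values except near the ends of the signal)
- If losing important features → increase to 4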
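One detail worth checking before switching between orders 2 and 3: for a centered, odd-length window the smoothing weights are the same for both, so the choice only matters near the ends of the signal. A short check using SciPy's `savgol_coeffs` (the window of 11 is just an example):

```python
import numpy as np
from scipy.signal import savgol_coeffs

window_length = 11
c2 = savgol_coeffs(window_length, 2)
c3 = savgol_coeffs(window_length, 3)
c4 = savgol_coeffs(window_length, 4)

same_2_3 = np.allclose(c2, c3)   # orders 2 and 3 share the same weights
same_3_4 = np.allclose(c3, c4)   # order 4 is a genuinely different filter
print("order 2 == order 3:", same_2_3)
print("order 3 == order 4:", same_3_4)
```

So when the output is still too noisy at order 3, widening the window is the change that actually adds smoothing, and moving to order 4 is the change that actually adds flexibility.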
Step 4: Visualize and Validate
Always plot before/after to ensure you're not over-smoothing!
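Alongside the plot, a rough numeric check is to look at the residuals (raw minus smoothed). If the filter removed only noise, the residuals should look random; if it also removed signal, they grow and become correlated from one sample to the next. The sketch below uses a synthetic signal, and the helper name `residual_stats` is an assumption.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(1)
t = np.arange(120) / 30.0                      # 4 seconds at 30 fps
noisy = 20 + 15 * np.sin(2 * np.pi * t) + rng.normal(0, 2, t.size)

def residual_stats(signal, window_length, polyorder):
    """Hypothetical helper: residual spread and lag-1 autocorrelation."""
    smoothed = savgol_filter(signal, window_length, polyorder)
    residuals = signal - smoothed
    lag1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]
    return residuals.std(), lag1

std_ok, lag1_ok = residual_stats(noisy, window_length=11, polyorder=3)
std_over, lag1_over = residual_stats(noisy, window_length=51, polyorder=3)

print(f"window 11: residual std {std_ok:.2f}, lag-1 autocorrelation {lag1_ok:.2f}")
print(f"window 51: residual std {std_over:.2f}, lag-1 autocorrelation {lag1_over:.2f}")
```

A residual spread well above the expected noise level, or a strongly positive autocorrelation, suggests the window is too wide for the signal.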
Common Mistakes to Avoid
❌ Too Large Window
Window = 51 for 30fps data
Problem: Loses important gait cycle features
❌ Too Small Window
Window = 3 or 5
Problem: Doesn't smooth enough, spikes remain
❌ Polyorder ≥ Window
polyorder = 5, window = 5
Problem: savgol_filter raises a ValueError, because polyorder must be less than the window length
✅ Good Balance
Window = 11, polyorder = 3
Result: Smooth data, preserved features
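The "Polyorder ≥ Window" case above is easy to reproduce. This sketch uses a synthetic sine wave and catches the exception so the message can be inspected; the exact wording of the message may differ between SciPy versions.

```python
import numpy as np
from scipy.signal import savgol_filter

signal = np.sin(np.linspace(0, 2 * np.pi, 60))

# Invalid: polyorder is not less than window_length
error_message = None
try:
    savgol_filter(signal, window_length=5, polyorder=5)
except ValueError as err:
    error_message = str(err)
print(f"window=5, polyorder=5 -> ValueError: {error_message}")

# Valid: the "good balance" settings
smoothed = savgol_filter(signal, window_length=11, polyorder=3)
print(f"window=11, polyorder=3 -> OK, {smoothed.shape[0]} samples")
```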
Complete Working Example
import pandas as pd
import numpy as np
from scipy.signal import savgol_filter
import matplotlib.pyplot as plt
# Create sample noisy gait data
np.random.seed(42)
time = np.arange(120) / 30 # 120 frames at 30 fps (4 seconds)
# Simulate hip angle with noise
clean_signal = 20 + 15 * np.sin(2 * np.pi * time) # Gait cycle
noise = np.random.normal(0, 2, len(time)) # Noise
noisy_signal = clean_signal + noise
# Apply Savitzky-Golay filter
window = 11
poly = 3
smoothed_signal = savgol_filter(noisy_signal, window, poly)
# Plot comparison
plt.figure(figsize=(12, 6))
plt.subplot(2, 1, 1)
plt.plot(time, noisy_signal, 'b-', alpha=0.5, label='Noisy Data')
plt.plot(time, clean_signal, 'g--', label='True Signal')
plt.ylabel('Hip Angle (degrees)')
plt.legend()
plt.title('Original Noisy Data')
plt.grid(True, alpha=0.3)
plt.subplot(2, 1, 2)
plt.plot(time, noisy_signal, 'b-', alpha=0.3, label='Noisy Data')
plt.plot(time, smoothed_signal, 'r-', linewidth=2, label='Filtered Data')
plt.plot(time, clean_signal, 'g--', label='True Signal')
plt.xlabel('Time (seconds)')
plt.ylabel('Hip Angle (degrees)')
plt.legend()
plt.title(f'Savitzky-Golay Filter (window={window}, poly={poly})')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('savgol_comparison.png', dpi=300, bbox_inches='tight')
plt.show()
# Apply to your DataFrame
df = pd.DataFrame({
    'time': time,
    'RightHipAngle': noisy_signal
})
# Smooth every angle column that is present in the DataFrame
angle_columns = ['RightHipAngle', 'LeftHipAngle', 'RightKneeAngle',
                 'LeftKneeAngle', 'RightAnkleAngle', 'LeftAnkleAngle']
for col in angle_columns:
    if col in df.columns:
        df[f'{col}_smooth'] = savgol_filter(df[col], window, poly)
print("Smoothing complete!")
print(df.head())
When NOT to Use Savitzky-Golay
Avoid if:
- Very short sequences: Fewer than 20 data points
- Non-uniform sampling: Irregular time intervals
- Missing data: NaN values in the sequence (interpolate first!)
- Already smooth data: May introduce artifacts
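For the missing-data case, one common approach is to fill the gaps with pandas before filtering. The sketch below assumes short gaps and linear interpolation, which may not suit long dropouts; the signal is synthetic and the positions of the missing frames are arbitrary.

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter

t = np.arange(60) / 30.0                       # 2 seconds at 30 fps
angle = pd.Series(20 + 15 * np.sin(2 * np.pi * t))
angle.iloc[[20, 21, 40]] = np.nan              # simulate dropped frames

# Fill gaps first; limit_direction="both" also covers NaNs at either end
filled = angle.interpolate(method="linear", limit_direction="both")

smoothed = savgol_filter(filled.to_numpy(), window_length=11, polyorder=3)

print("NaNs before interpolation:", int(angle.isna().sum()))
print("NaNs after interpolation: ", int(filled.isna().sum()))
print("NaNs in smoothed output:  ", int(np.isnan(smoothed).sum()))
```

It is worth keeping a mask of the interpolated frames (for example `angle.isna()`) so that filled values can be excluded from later analysis if needed.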
Quick Reference Cheat Sheet
For Gait Analysis (30 fps):
# Recommended settings
window_length = 11 # Good default
polyorder = 3 # Good default
# Conservative smoothing (preserve detail)
window_length = 7
polyorder = 2
# Aggressive smoothing (remove more noise)
window_length = 15
polyorder = 3
Summary
Key Takeaways
- Start simple: window=11, poly=3
- Window must be odd and greater than polyorder
- Larger window = more smoothing but loses detail
- Higher poly = preserves curves but may keep noise
- Always visualize before and after filtering
- Interpolate NaN values before applying filter