📊 Savitzky-Golay Filter Guide

Understanding Window Size & Polynomial Order for Time Series Smoothing

What Is the Savitzky-Golay Filter?

Basic Concept

The Savitzky-Golay filter smooths noisy data by fitting a low-degree polynomial, by least squares, to a sliding window of data points. Unlike a simple moving average, it largely preserves the shape and features of your signal, such as the height and width of peaks and valleys.

How It Works

  1. Sliding Window: Takes a window of neighboring points around each data point
  2. Polynomial Fit: Fits a polynomial curve to those points by least squares
  3. Replace: Replaces the center point with the value from the fitted polynomial
  4. Slide: Moves to the next point and repeats
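
The four steps above can be sketched directly in code. This is a minimal, deliberately slow illustration rather than how SciPy implements the filter: the helper name savgol_manual is hypothetical, the fit is assumed to be ordinary least squares via np.polyfit, and edge points (where a full window does not fit) are simply left unfiltered.

```python
import numpy as np
from scipy.signal import savgol_filter


def savgol_manual(y, window_length, polyorder):
    """Naive Savitzky-Golay smoothing for interior points only (illustrative)."""
    y = np.asarray(y, dtype=float)
    half = window_length // 2
    out = y.copy()                    # edge points are left unfiltered in this sketch
    x = np.arange(-half, half + 1)    # local coordinates, window center at 0
    for i in range(half, len(y) - half):
        window = y[i - half : i + half + 1]        # 1. take the sliding window
        coeffs = np.polyfit(x, window, polyorder)  # 2. least-squares polynomial fit
        out[i] = np.polyval(coeffs, 0.0)           # 3. replace the center point
    return out                                     # 4. the loop slides the window


# Synthetic test signal (an assumption for illustration only)
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 60)
noisy = np.sin(2 * np.pi * t) + rng.normal(0, 0.1, t.size)

manual = savgol_manual(noisy, window_length=11, polyorder=3)
reference = savgol_filter(noisy, window_length=11, polyorder=3)

# Compare only the interior, where both methods use a full window
half = 11 // 2
interior_match = np.allclose(manual[half:-half], reference[half:-half])
print("Interior points match SciPy:", interior_match)
```

On interior points the loop should agree with scipy.signal.savgol_filter, which reaches the same values far faster by precomputing convolution coefficients.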

Key Parameters

Window Length (window_length)

What it is: Number of data points used for fitting

Must be: Odd number (e.g., 5, 7, 9, 11)

Effect:

  • Larger → More smoothing
  • Smaller → Less smoothing, preserves detail
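
To see the effect of window length, the sketch below filters the same synthetic signal with three windows and reports the standard deviation of the second differences as a rough jaggedness score. The signal (a 1 Hz sine plus Gaussian noise at an assumed 30 fps) and the score itself are illustrative choices, not a standard metric.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(1)
t = np.arange(120) / 30.0                   # 4 s at an assumed 30 fps
clean = 20 + 15 * np.sin(2 * np.pi * t)     # synthetic 1 Hz "joint angle"
noisy = clean + rng.normal(0, 2, t.size)

roughness = {}
for window_length in (5, 11, 21):
    smoothed = savgol_filter(noisy, window_length=window_length, polyorder=3)
    # Std of second differences: a rough proxy for how jagged the curve is
    roughness[window_length] = float(np.std(np.diff(smoothed, n=2)))
    print(f"window_length={window_length}: roughness={roughness[window_length]:.3f}")
```

The score should fall as the window grows, which is the "larger → more smoothing" behavior described above.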

Polynomial Order (polyorder)

What it is: Degree of polynomial used for fitting

Must be: Less than window_length

Common values: 2 or 3

Effect:

  • Higher → Follows sharp curvature more closely, but keeps more noise
  • Lower → More smoothing, but may flatten peaks
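
The sketch below holds the window fixed and varies the polynomial order, again using the standard deviation of second differences as an informal jaggedness score on an assumed synthetic signal. It also checks a property of centered (odd) windows: an even order and the next odd order, such as 2 and 3, give the same smoothed values wherever a full window fits, so they differ only near the ends of the signal.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(3)
t = np.arange(120) / 30.0                   # 4 s at an assumed 30 fps
noisy = 20 + 15 * np.sin(2 * np.pi * t) + rng.normal(0, 2, t.size)

window_length = 11
results = {p: savgol_filter(noisy, window_length, p) for p in (1, 2, 3, 5)}

# Std of second differences as a rough jaggedness score
roughness = {p: float(np.std(np.diff(y, n=2))) for p, y in results.items()}
for p in (1, 3, 5):
    print(f"polyorder={p}: roughness={roughness[p]:.3f}")

# Orders 2 and 3 should coincide away from the first and last half-window
half = window_length // 2
same_interior = np.allclose(results[2][half:-half], results[3][half:-half])
print("orders 2 and 3 agree away from the edges:", same_interior)
```

In practice this means that moving between orders 2 and 3 mainly changes how the ends of the signal are handled, while changing the window length or jumping to order 4 or 5 changes the smoothing everywhere.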

Example: Gait Data

Typical Gait Analysis Settings

For hip, knee, and ankle angles sampled at 30 fps:

  • Window Length: 11-15 (captures ~0.37-0.50 seconds)
  • Polynomial Order: 2 or 3

    from scipy.signal import savgol_filter
    import pandas as pd
    import numpy as np

    # Load your gait data
    df = pd.read_csv('gait_data.csv')

    # Apply Savitzky-Golay filter
    window_length = 11  # Must be odd
    poly_order = 3      # Must be < window_length

    # Smooth hip angles
    df['RightHipAngle_smooth'] = savgol_filter(
        df['RightHipAngle'], window_length=window_length, polyorder=poly_order
    )
    df['LeftHipAngle_smooth'] = savgol_filter(
        df['LeftHipAngle'], window_length=window_length, polyorder=poly_order
    )

    # Smooth knee angles
    df['RightKneeAngle_smooth'] = savgol_filter(
        df['RightKneeAngle'], window_length=window_length, polyorder=poly_order
    )
    df['LeftKneeAngle_smooth'] = savgol_filter(
        df['LeftKneeAngle'], window_length=window_length, polyorder=poly_order
    )

    # Smooth ankle angles
    df['RightAnkleAngle_smooth'] = savgol_filter(
        df['RightAnkleAngle'], window_length=window_length, polyorder=poly_order
    )
    df['LeftAnkleAngle_smooth'] = savgol_filter(
        df['LeftAnkleAngle'], window_length=window_length, polyorder=poly_order
    )

    print("Smoothed data ready!")

Choosing the Right Parameters

Window Length Selection Guide

Sampling Rate   Recommended Window (frames)   Time Span        Use Case
30 fps          7-11                          0.23-0.37 sec    Moderate smoothing
30 fps          11-15                         0.37-0.50 sec    Heavy smoothing
60 fps          15-21                         0.25-0.35 sec    Moderate smoothing
100 fps         25-35                         0.25-0.35 sec    Research-grade data

Polynomial Order Selection

Poly Order   Best For                  Pros                     Cons
2            Simple curves             More smoothing, stable   May miss sharp features
3            Gait data (recommended)   Good balance             Moderate complexity
4-5          Complex patterns          Preserves fine detail    May amplify noise

Step-by-Step Decision Process

Step 1: Determine Your Sampling Rate

Example: Video at 30 fps → sampling_rate = 30 (one sample every ~0.033 seconds)

Step 2: Choose Window Length

Rule of thumb: Window should span 0.3-0.5 seconds of data

    sampling_rate = 30  # frames per second (from Step 1)

    window_length = int(round(sampling_rate * 0.4))  # 0.4 seconds

    # Make it odd
    if window_length % 2 == 0:
        window_length += 1

    # For 30 fps: 30 * 0.4 = 12, which is even, so this yields 13
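
The rule of thumb above can be wrapped in a small helper. This is a sketch under stated assumptions: the name window_from_rate and its default arguments are hypothetical, the target span defaults to 0.4 seconds, and even results are bumped up (not down) to the next odd number.

```python
def window_from_rate(sampling_rate, span_seconds=0.4, polyorder=3):
    """Hypothetical helper: odd window length covering roughly span_seconds."""
    window_length = int(round(sampling_rate * span_seconds))
    if window_length % 2 == 0:
        window_length += 1  # bump even values up to the next odd number
    # Smallest odd window that is still larger than the polynomial order
    min_window = polyorder + 1 if polyorder % 2 == 0 else polyorder + 2
    return max(window_length, min_window)


for fps in (30, 60, 100):
    print(f"{fps} fps -> window_length = {window_from_rate(fps)}")
```

Passing a shorter span, for example span_seconds=0.3, gives the lighter smoothing end of the 0.3-0.5 second range.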

Step 3: Choose Polynomial Order

  • Start with polyorder = 3
  • If too noisy → reduce to 2 or widen the window (with an odd window, orders 2 and 3 differ only near the ends of the signal)
  • If losing important features → increase to 4

Step 4: Visualize and Validate

Always plot before/after to ensure you're not over-smoothing!
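
Alongside the plot, a quick numeric check on the residual (raw minus smoothed) can help flag over-smoothing. The sketch below is one informal heuristic, not an established test: if the filter removes only noise, neighboring residuals should be nearly uncorrelated, whereas a strongly positive lag-1 autocorrelation suggests real signal is being smoothed away. The synthetic signal and the helper lag1_autocorr are assumptions for illustration.

```python
import numpy as np
from scipy.signal import savgol_filter


def lag1_autocorr(x):
    """Lag-1 autocorrelation of a 1-D array (hypothetical helper)."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


rng = np.random.default_rng(2)
t = np.arange(120) / 30.0                   # 4 s at an assumed 30 fps
clean = 20 + 15 * np.sin(2 * np.pi * t)     # synthetic 1 Hz "joint angle"
noisy = clean + rng.normal(0, 2, t.size)

autocorr = {}
for window_length in (11, 51):
    smoothed = savgol_filter(noisy, window_length=window_length, polyorder=3)
    residual = noisy - smoothed
    autocorr[window_length] = lag1_autocorr(residual)
    print(f"window_length={window_length}: "
          f"residual std={residual.std():.2f}, "
          f"lag-1 autocorr={autocorr[window_length]:.2f}")
```

With these settings the 51-frame window should leave a residual that still contains most of the 1 Hz oscillation, which shows up as a lag-1 autocorrelation close to 1.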

Common Mistakes to Avoid

❌ Too Large Window

Window = 51 for 30fps data

Problem: Spans 1.7 seconds of data, so important gait cycle features are smoothed away

❌ Too Small Window

Window = 3 or 5

Problem: Doesn't smooth enough, spikes remain

❌ Polyorder ≥ Window

polyorder = 5, window = 5

Problem: SciPy raises a ValueError; polyorder must be less than window_length
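
A short sketch of what this mistake looks like in practice, plus a pre-check that can be run before filtering. The helper savgol_params_ok is hypothetical and simply encodes the constraints described in this guide (odd window, polyorder below the window length, window no longer than the data).

```python
import numpy as np
from scipy.signal import savgol_filter

data = np.arange(20, dtype=float)

# polyorder equal to window_length is rejected by SciPy
try:
    savgol_filter(data, window_length=5, polyorder=5)
    raised = False
except ValueError as err:
    raised = True
    print("Rejected:", err)


def savgol_params_ok(window_length, polyorder, n_samples):
    """Hypothetical pre-check mirroring the constraints described in this guide."""
    return (
        window_length % 2 == 1           # odd window
        and polyorder < window_length    # polynomial order below window length
        and window_length <= n_samples   # window fits inside the data
    )


print(savgol_params_ok(5, 5, len(data)))    # the failing combination above
print(savgol_params_ok(11, 3, len(data)))   # the recommended defaults
```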

✅ Good Balance

Window = 11, polyorder = 3

Result: Smooth data, preserved features

Complete Working Example

    import pandas as pd
    import numpy as np
    from scipy.signal import savgol_filter
    import matplotlib.pyplot as plt

    # Create sample noisy gait data
    np.random.seed(42)
    # 120 samples spaced exactly 1/30 s apart: 4 seconds at 30 fps
    time = np.linspace(0, 4, 120, endpoint=False)

    # Simulate hip angle with noise
    clean_signal = 20 + 15 * np.sin(2 * np.pi * time)  # Gait cycle
    noise = np.random.normal(0, 2, len(time))          # Noise
    noisy_signal = clean_signal + noise

    # Apply Savitzky-Golay filter
    window = 11
    poly = 3
    smoothed_signal = savgol_filter(noisy_signal, window, poly)

    # Plot comparison
    plt.figure(figsize=(12, 6))

    plt.subplot(2, 1, 1)
    plt.plot(time, noisy_signal, 'b-', alpha=0.5, label='Noisy Data')
    plt.plot(time, clean_signal, 'g--', label='True Signal')
    plt.ylabel('Hip Angle (degrees)')
    plt.legend()
    plt.title('Original Noisy Data')
    plt.grid(True, alpha=0.3)

    plt.subplot(2, 1, 2)
    plt.plot(time, noisy_signal, 'b-', alpha=0.3, label='Noisy Data')
    plt.plot(time, smoothed_signal, 'r-', linewidth=2, label='Filtered Data')
    plt.plot(time, clean_signal, 'g--', label='True Signal')
    plt.xlabel('Time (seconds)')
    plt.ylabel('Hip Angle (degrees)')
    plt.legend()
    plt.title(f'Savitzky-Golay Filter (window={window}, poly={poly})')
    plt.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.savefig('savgol_comparison.png', dpi=300, bbox_inches='tight')
    plt.show()

    # Apply to your DataFrame
    df = pd.DataFrame({
        'time': time,
        'RightHipAngle': noisy_signal
    })

    # Smooth all angle columns that are present in the DataFrame
    angle_columns = ['RightHipAngle', 'LeftHipAngle',
                     'RightKneeAngle', 'LeftKneeAngle',
                     'RightAnkleAngle', 'LeftAnkleAngle']

    for col in angle_columns:
        if col in df.columns:
            df[f'{col}_smooth'] = savgol_filter(df[col], window, poly)

    print("Smoothing complete!")
    print(df.head())

When NOT to Use Savitzky-Golay

Avoid if:

  • Very short sequences: Fewer than 20 data points
  • Non-uniform sampling: Irregular time intervals
  • Missing data: NaN values in the sequence (interpolate first!)
  • Already smooth data: May introduce artifacts
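
For the missing-data case, one possible workflow is to fill the gaps first and then filter. The sketch below assumes short gaps and linear interpolation via pandas; the synthetic signal and the positions of the dropped frames are made up for illustration, and longer gaps may call for a different strategy.

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter

t = np.arange(60) / 30.0                             # 2 s at an assumed 30 fps
angle = pd.Series(20 + 15 * np.sin(2 * np.pi * t))
angle.iloc[[25, 26, 40]] = np.nan                    # simulate dropped frames

# Filtering the raw series lets each NaN spread across its whole window
naive = savgol_filter(angle.to_numpy(), window_length=11, polyorder=3)
print("NaNs after naive filtering:", int(np.isnan(naive).sum()))

# Fill the gaps first, then filter
filled = angle.interpolate(method="linear", limit_direction="both")
smoothed = savgol_filter(filled.to_numpy(), window_length=11, polyorder=3)
print("NaNs after interpolate + filter:", int(np.isnan(smoothed).sum()))
```

Because every output sample depends on its full window, a single missing frame contaminates many neighboring outputs unless it is filled beforehand.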

Quick Reference Cheat Sheet

For Gait Analysis (30 fps):

    # Recommended settings
    window_length = 11  # Good default
    polyorder = 3       # Good default

    # Conservative smoothing (preserve detail)
    window_length = 7
    polyorder = 2

    # Aggressive smoothing (remove more noise)
    window_length = 15
    polyorder = 3

Summary

Key Takeaways

  1. Start simple: window=11, poly=3
  2. Window must be odd and greater than polyorder
  3. Larger window = more smoothing but loses detail
  4. Higher poly = preserves curves but may keep noise
  5. Always visualize before and after filtering
  6. Interpolate NaN values before applying filter