Problem Setup
We have gait data from 4 subjects, each with 5 timesteps and 3 features per timestep.
Input Shape: (4, 5, 3)
→ (batch_size, timesteps, features)
Example Data
# 4 subjects, 5 timesteps, 3 features (e.g., velocity, stride_length, acceleration)
X = [
    # Subject 1
    [[0.5, 1.2, 0.3],   # timestep 0
     [0.6, 1.3, 0.4],   # timestep 1
     [0.7, 1.1, 0.5],   # timestep 2
     [0.8, 1.4, 0.3],   # timestep 3
     [0.9, 1.2, 0.6]],  # timestep 4
    # Subject 2
    [[0.4, 1.0, 0.2],
     [0.5, 1.1, 0.3],
     [0.6, 0.9, 0.4],
     [0.7, 1.2, 0.2],
     [0.8, 1.0, 0.5]],
    # Subject 3
    [[0.3, 0.8, 0.1],
     [0.4, 0.9, 0.2],
     [0.5, 0.7, 0.3],
     [0.6, 1.0, 0.1],
     [0.7, 0.8, 0.4]],
    # Subject 4
    [[0.6, 1.5, 0.4],
     [0.7, 1.6, 0.5],
     [0.8, 1.4, 0.6],
     [0.9, 1.7, 0.4],
     [1.0, 1.5, 0.7]]
]
# Labels: 1 = ASD, 0 = No ASD
y = [1, 0, 0, 1]
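The example data above can be loaded into a NumPy array to confirm the shape (a minimal sketch; the values are exactly those listed):

```python
import numpy as np

# 4 subjects × 5 timesteps × 3 features (velocity, stride_length, acceleration)
X = np.array([
    [[0.5, 1.2, 0.3], [0.6, 1.3, 0.4], [0.7, 1.1, 0.5], [0.8, 1.4, 0.3], [0.9, 1.2, 0.6]],  # Subject 1
    [[0.4, 1.0, 0.2], [0.5, 1.1, 0.3], [0.6, 0.9, 0.4], [0.7, 1.2, 0.2], [0.8, 1.0, 0.5]],  # Subject 2
    [[0.3, 0.8, 0.1], [0.4, 0.9, 0.2], [0.5, 0.7, 0.3], [0.6, 1.0, 0.1], [0.7, 0.8, 0.4]],  # Subject 3
    [[0.6, 1.5, 0.4], [0.7, 1.6, 0.5], [0.8, 1.4, 0.6], [0.9, 1.7, 0.4], [1.0, 1.5, 0.7]],  # Subject 4
])
y = np.array([1, 0, 0, 1])  # 1 = ASD, 0 = No ASD

print(X.shape)  # (4, 5, 3) → (batch_size, timesteps, features)
```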
1D CNN Architecture
Input (4, 5, 3)
↓
Conv1D (kernel=3, filters=8)
↓
ReLU Activation
↓
MaxPooling1D (pool=2)
↓
Flatten
↓
Dense Layer (32 neurons)
↓
Output (1 neuron, Sigmoid)
↓
Prediction (0 or 1)
Step 1: Conv1D Layer
What is 1D Convolution?
A sliding window that moves along the time axis to detect patterns.
Parameters:
- Kernel size: 3 (window looks at 3 consecutive timesteps)
- Filters: 8 (we'll learn 8 different patterns)
- Stride: 1 (move window by 1 timestep)
Filter Structure
Each filter has shape (kernel_size, input_features) = (3, 3)
Example Filter 1:
W₁ = [[ 0.1,  0.2, -0.1],   # weights for timestep t
      [ 0.3, -0.2,  0.4],   # weights for timestep t+1
      [-0.1,  0.5,  0.2]]   # weights for timestep t+2
b₁ = 0.1  # bias
Convolution Operation for Subject 1
Input for Subject 1:
[[0.5, 1.2, 0.3],   # t=0
 [0.6, 1.3, 0.4],   # t=1
 [0.7, 1.1, 0.5],   # t=2
 [0.8, 1.4, 0.3],   # t=3
 [0.9, 1.2, 0.6]]   # t=4
Window Position 1 (timesteps 0-2):
Extract window:
window₁ = [[0.5, 1.2, 0.3],   # t=0
           [0.6, 1.3, 0.4],   # t=1
           [0.7, 1.1, 0.5]]   # t=2
Compute convolution (element-wise multiply + sum):
result₁ = Σ(window₁ ⊙ W₁) + b₁
        = (0.5×0.1 + 1.2×0.2 + 0.3×(-0.1)) +    # t=0
          (0.6×0.3 + 1.3×(-0.2) + 0.4×0.4) +    # t=1
          (0.7×(-0.1) + 1.1×0.5 + 0.5×0.2) +    # t=2
          0.1                                    # bias
        = (0.05 + 0.24 - 0.03) +    # = 0.26
          (0.18 - 0.26 + 0.16) +    # = 0.08
          (-0.07 + 0.55 + 0.10) +   # = 0.58
          0.1
        = 0.26 + 0.08 + 0.58 + 0.1
        = 1.02
Window Position 2 (timesteps 1-3):
window₂ = [[0.6, 1.3, 0.4],   # t=1
           [0.7, 1.1, 0.5],   # t=2
           [0.8, 1.4, 0.3]]   # t=3
result₂ = Σ(window₂ ⊙ W₁) + b₁
        = (0.6×0.1 + 1.3×0.2 + 0.4×(-0.1)) +
          (0.7×0.3 + 1.1×(-0.2) + 0.5×0.4) +
          (0.8×(-0.1) + 1.4×0.5 + 0.3×0.2) +
          0.1
        = (0.06 + 0.26 - 0.04) +    # = 0.28
          (0.21 - 0.22 + 0.20) +    # = 0.19
          (-0.08 + 0.70 + 0.06) +   # = 0.68
          0.1
        = 1.25
Window Position 3 (timesteps 2-4):
window₃ = [[0.7, 1.1, 0.5],   # t=2
           [0.8, 1.4, 0.3],   # t=3
           [0.9, 1.2, 0.6]]   # t=4
result₃ = Σ(window₃ ⊙ W₁) + b₁
        = (0.7×0.1 + 1.1×0.2 + 0.5×(-0.1)) +
          (0.8×0.3 + 1.4×(-0.2) + 0.3×0.4) +
          (0.9×(-0.1) + 1.2×0.5 + 0.6×0.2) +
          0.1
        = 0.24 + 0.08 + 0.63 + 0.1
        = 1.05
After Conv1D with Filter 1:
For Subject 1, using Filter 1, we get:
output_filter1_subject1 = [1.02, 1.25, 1.05]  # Shape: (3,)
Note: We had 5 timesteps and a kernel size of 3, so the output length = 5 - 3 + 1 = 3
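This sliding-window computation can be checked in a few lines of NumPy, using the example W₁ and b₁ from above:

```python
import numpy as np

# Subject 1 input, shape (5, 3): (timesteps, features)
x1 = np.array([[0.5, 1.2, 0.3], [0.6, 1.3, 0.4], [0.7, 1.1, 0.5],
               [0.8, 1.4, 0.3], [0.9, 1.2, 0.6]])
# Example Filter 1, shape (3, 3): (kernel_size, features)
W1 = np.array([[0.1, 0.2, -0.1], [0.3, -0.2, 0.4], [-0.1, 0.5, 0.2]])
b1 = 0.1

kernel = 3
# One output per valid window position: 5 - 3 + 1 = 3
out = np.array([np.sum(x1[t:t + kernel] * W1) + b1
                for t in range(len(x1) - kernel + 1)])
print(np.round(out, 2))  # [1.02 1.25 1.05]
```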
With All 8 Filters:
We repeat this process with 8 different filters (each learning different patterns).
Resulting Shape (3, 8):
# Subject 1 after Conv1D
# Each row = one output timestep with all 8 filter responses
# (Filter 1's column was computed above; the other filters' values are illustrative)
conv_output_subject1 = [
    [1.02, 0.85, 1.15, 0.78, 1.10, 0.92, 1.08, 0.88],  # Output timestep 0
    [1.25, 0.92, 1.30, 0.95, 1.18, 1.05, 1.22, 0.98],  # Output timestep 1
    [1.05, 0.88, 1.22, 0.81, 1.25, 0.98, 1.15, 0.92]   # Output timestep 2
]
# Shape: (3, 8) → (timesteps_out, filters)
#          ↑  ↑
#          │  └─ 8 filters
#          └─ 3 output timesteps
Explanation:
- Row 0: Output at position 0 (from window covering timesteps 0-2), showing responses from all 8 filters
- Row 1: Output at position 1 (from window covering timesteps 1-3), showing responses from all 8 filters
- Row 2: Output at position 2 (from window covering timesteps 2-4), showing responses from all 8 filters
After processing all 4 subjects:
Conv1D output shape: (4, 3, 8) → (batch, timesteps_out, filters)
Step 2: ReLU Activation
Apply ReLU(x) = max(0, x) element-wise:
# Subject 1 (all values are already positive, so they pass through unchanged)
relu_output_subject1 = [
    [1.02, 0.85, 1.15, 0.78, 1.10, 0.92, 1.08, 0.88],  # Timestep 0
    [1.25, 0.92, 1.30, 0.95, 1.18, 1.05, 1.22, 0.98],  # Timestep 1
    [1.05, 0.88, 1.22, 0.81, 1.25, 0.98, 1.15, 0.92]   # Timestep 2
]
# Shape: (3, 8)
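ReLU itself is a one-liner; a negative value (made up here for illustration, since all of Subject 1's activations are positive) shows the clipping:

```python
import numpy as np

def relu(z):
    """Element-wise ReLU: negative activations are clipped to zero."""
    return np.maximum(0.0, z)

print(relu(np.array([1.02, -0.3, 0.85])))  # [1.02 0.   0.85]
```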
Step 3: MaxPooling1D
Pool size: 2
Take the max over every 2 consecutive timesteps
How It Works:
We pool along the timestep axis (axis 0). For each filter, divide the timesteps into groups of 2 and take the maximum of each group. Since 3 timesteps don't divide evenly by 2, we keep the leftover timestep as its own group (the behavior of padding='same'); note that a default 'valid' pooling layer, such as Keras's MaxPooling1D, would drop the leftover and output only ⌊3/2⌋ = 1 timestep.
For Filter 1 (first column):
Input: [1.02, 1.25, 1.05] (3 timesteps)
↓
Group 1: [1.02, 1.25] → max = 1.25
Group 2: [1.05] → max = 1.05 (leftover single value)
↓
Output: [1.25, 1.05]
For Filter 2 (second column):
Input: [0.85, 0.92, 0.88]
↓
Group 1: [0.85, 0.92] → max = 0.92
Group 2: [0.88] → max = 0.88
↓
Output: [0.92, 0.88]
After MaxPooling:
# Subject 1
# Each row = one pooled timestep with all 8 filter responses
pooled_subject1 = [
    [1.25, 0.92, 1.30, 0.95, 1.18, 1.05, 1.22, 0.98],  # Pooled timestep 0
    [1.05, 0.88, 1.22, 0.81, 1.25, 0.98, 1.15, 0.92]   # Pooled timestep 1
]
# Shape: (2, 8) → (timesteps_pooled, filters)
# ↑ ↑
# │ └─ 8 filters
# └─ 2 pooled timesteps
After processing all 4 subjects:
MaxPool output shape: (4, 2, 8) → (batch, timesteps_pooled, filters)
Intuition: Pooling reduces dimensionality while keeping the strongest activations (most important patterns detected).
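The pooling step above can be sketched as follows; keeping the leftover group reproduces the padding='same' behavior described earlier:

```python
import numpy as np

def max_pool_1d(x, pool=2):
    """Max-pool along the timestep axis (axis 0), keeping any leftover
    group of fewer than `pool` timesteps (i.e. padding='same' behavior)."""
    return np.array([x[i:i + pool].max(axis=0) for i in range(0, len(x), pool)])

# Filter 1's responses for Subject 1, as a (3, 1) column
col = np.array([[1.02], [1.25], [1.05]])
print(max_pool_1d(col).ravel())  # [1.25 1.05]
```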
Step 4: Flatten
Convert 3D tensor to 2D for dense layers:
# Subject 1
# Flatten by concatenating all timesteps and filters
flattened_subject1 = [
    1.25, 0.92, 1.30, 0.95, 1.18, 1.05, 1.22, 0.98,  # Timestep 0
    1.05, 0.88, 1.22, 0.81, 1.25, 0.98, 1.15, 0.92   # Timestep 1
]
# Shape: (16,) → (2 timesteps × 8 filters = 16 features)
Flatten output shape: (4, 16) → (batch, features)
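Flattening is just a reshape; NumPy's default row-major order concatenates timestep 0's 8 filter values, then timestep 1's, matching the listing above:

```python
import numpy as np

pooled = np.array([[1.25, 0.92, 1.30, 0.95, 1.18, 1.05, 1.22, 0.98],
                   [1.05, 0.88, 1.22, 0.81, 1.25, 0.98, 1.15, 0.92]])  # (2, 8)

flattened = pooled.reshape(-1)  # row-major: timestep 0's filters, then timestep 1's
print(flattened.shape)          # (16,)
```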
Step 5: Dense Layer (32 neurons)
Weight matrix W_dense: shape (16, 32)
Bias vector b_dense: shape (32,)
Calculation for Subject 1:
Z_dense = flattened_subject1 · W_dense + b_dense
# Example calculation for the first neuron (weights 0.15, 0.22, ... are illustrative):
Z_dense[0] = Σ(flattened_subject1[i] × W_dense[i,0]) + b_dense[0]
           = 1.25×0.15 + 0.92×0.22 + ... + 0.92×(-0.18) + 0.05
           = 0.85
# For all 32 neurons:
Z_dense = [0.85, 0.92, 0.73, ..., 0.68]  # Shape: (32,)
Apply ReLU:
A_dense = ReLU(Z_dense) = [0.85, 0.92, 0.73, ..., 0.68] # Shape: (32,)
Dense output shape: (4, 32) → (batch, hidden_features)
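The dense step is a single matrix product followed by ReLU. The weights below are randomly initialized stand-ins (the learned values, like the illustrative 0.15 and 0.22 above, would come from training):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(16)                      # stand-in for flattened_subject1
W_dense = rng.normal(0, 0.1, (16, 32))  # (input_features, neurons)
b_dense = np.zeros(32)

z = x @ W_dense + b_dense  # pre-activation, shape (32,)
a = np.maximum(0.0, z)     # ReLU
print(a.shape)             # (32,)
```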
Step 6: Output Layer (Sigmoid)
Weight matrix W_out: shape (32, 1)
Bias: b_out = 0.0
For Subject 1:
Z_out = A_dense · W_out + b_out
      = Σ(A_dense[i] × W_out[i]) + b_out
      = 0.85×0.12 + 0.92×0.15 + ... + 0.68×0.10 + 0.0
      = 0.65
ŷ = sigmoid(Z_out) = 1 / (1 + e^(-0.65))
  = 1 / (1 + 0.522)
  = 0.657
Interpretation: Subject 1 has 65.7% probability of ASD
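The sigmoid value is easy to verify:

```python
import math

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(round(sigmoid(0.65), 3))  # 0.657
```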
Final Predictions:
# Subjects 2-4 are computed the same way (their values here are illustrative)
predictions = [
    0.657,  # Subject 1 (actual label: 1 - ASD)
    0.412,  # Subject 2 (actual label: 0 - No ASD)
    0.385,  # Subject 3 (actual label: 0 - No ASD)
    0.723   # Subject 4 (actual label: 1 - ASD)
]
# Apply threshold (0.5):
binary_predictions = [1, 0, 0, 1] # Perfect match!
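Thresholding the probabilities at 0.5 can be done in one vectorized step:

```python
import numpy as np

predictions = np.array([0.657, 0.412, 0.385, 0.723])
binary = (predictions >= 0.5).astype(int)  # True/False → 1/0
print(binary.tolist())  # [1, 0, 0, 1]
```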
What Makes This "Time Series"?
Static Model (your current approach)
[t₀: 0.5,1.2,0.3, t₁: 0.6,1.3,0.4, ..., t₄: 0.9,1.2,0.6]
↓
(all treated as independent features)
↓
[flatten]
↓
[0.5, 1.2, 0.3, 0.6, 1.3, 0.4, ..., 0.9, 1.2, 0.6]
Time Series Model (1D CNN)
Looks at PATTERNS across time:
Window 1: [t₀, t₁, t₂] → detects "increasing velocity"
Window 2: [t₁, t₂, t₃] → detects "stable stride"
Window 3: [t₂, t₃, t₄] → detects "irregular acceleration"
What the CNN Learns:
Each filter learns different temporal patterns:
| Filter      | Pattern Detected                            |
|-------------|---------------------------------------------|
| Filter 1    | Detects sudden acceleration changes         |
| Filter 2    | Identifies rhythmic stride patterns         |
| Filter 3    | Recognizes irregular gait (common in ASD)   |
| Filter 4    | Captures velocity trends                    |
| Filters 5-8 | Other diagnostic patterns                   |
Example Pattern Detection:
Filter detecting "irregular acceleration":
If timesteps show: [smooth, smooth, SPIKE] → High activation
If timesteps show: [smooth, smooth, smooth] → Low activation
Summary: Information Flow
Input (4, 5, 3)
↓
[Conv1D: Scan 3-timestep windows, detect 8 patterns]
↓
(4, 3, 8) ← 3 windows per subject, 8 pattern types
↓
[ReLU: Keep only positive activations]
↓
(4, 3, 8)
↓
[MaxPool: Keep strongest activations in 2-timestep groups]
↓
(4, 2, 8) ← Reduced to 2 time positions
↓
[Flatten: Convert to feature vector]
↓
(4, 16) ← 16 features per subject
↓
[Dense Layer: Combine pattern features]
↓
(4, 32)
↓
[Output Layer: Make final decision]
↓
(4, 1) ← ASD probability for each subject
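The whole pipeline can be traced end to end in NumPy with random stand-in weights; the shapes, not the values, are the point of this sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.random((4, 5, 3))               # (batch, timesteps, features)
W_conv = rng.normal(0, 0.1, (8, 3, 3))  # 8 filters, each (kernel=3, features=3)
b_conv = np.zeros(8)

# Conv1D over valid windows: 5 - 3 + 1 = 3 output timesteps per subject
conv = np.stack([
    [[np.sum(x[t:t + 3] * W_conv[f]) + b_conv[f] for f in range(8)]
     for t in range(3)]
    for x in X
])                                      # (4, 3, 8)

relu = np.maximum(0.0, conv)            # (4, 3, 8)

# MaxPool (pool=2), keeping the leftover timestep (padding='same' behavior)
pooled = np.stack([
    np.array([r[i:i + 2].max(axis=0) for i in range(0, 3, 2)])
    for r in relu
])                                      # (4, 2, 8)

flat = pooled.reshape(4, -1)            # (4, 16)

W_d, b_d = rng.normal(0, 0.1, (16, 32)), np.zeros(32)
hidden = np.maximum(0.0, flat @ W_d + b_d)           # (4, 32)

W_o, b_o = rng.normal(0, 0.1, (32, 1)), 0.0
probs = 1.0 / (1.0 + np.exp(-(hidden @ W_o + b_o)))  # (4, 1)
print(probs.shape)  # (4, 1)
```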
Key Advantages of 1D CNN for Time Series
- Local Pattern Detection: Learns patterns in small time windows