Problem Setup
We have gait data from 4 subjects, each with 5 timesteps and 3 features per timestep.
Input Shape: (4, 5, 3)
→ (batch_size, timesteps, features)
Example Data
# 4 subjects, 5 timesteps, 3 features (e.g., velocity, stride_length, acceleration)
X = [
    # Subject 1
    [[0.5, 1.2, 0.3],   # timestep 0
     [0.6, 1.3, 0.4],   # timestep 1
     [0.7, 1.1, 0.5],   # timestep 2
     [0.8, 1.4, 0.3],   # timestep 3
     [0.9, 1.2, 0.6]],  # timestep 4
    # Subject 2
    [[0.4, 1.0, 0.2],
     [0.5, 1.1, 0.3],
     [0.6, 0.9, 0.4],
     [0.7, 1.2, 0.2],
     [0.8, 1.0, 0.5]],
    # Subject 3
    [[0.3, 0.8, 0.1],
     [0.4, 0.9, 0.2],
     [0.5, 0.7, 0.3],
     [0.6, 1.0, 0.1],
     [0.7, 0.8, 0.4]],
    # Subject 4
    [[0.6, 1.5, 0.4],
     [0.7, 1.6, 0.5],
     [0.8, 1.4, 0.6],
     [0.9, 1.7, 0.4],
     [1.0, 1.5, 0.7]]
]
# Labels: 1 = ASD, 0 = No ASD
y = [1, 0, 0, 1]
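The example data above can be loaded into a NumPy array to confirm the shape (a minimal sketch; the values are exactly those listed):

```python
import numpy as np

# 4 subjects × 5 timesteps × 3 features (velocity, stride_length, acceleration)
X = np.array([
    [[0.5, 1.2, 0.3], [0.6, 1.3, 0.4], [0.7, 1.1, 0.5], [0.8, 1.4, 0.3], [0.9, 1.2, 0.6]],  # Subject 1
    [[0.4, 1.0, 0.2], [0.5, 1.1, 0.3], [0.6, 0.9, 0.4], [0.7, 1.2, 0.2], [0.8, 1.0, 0.5]],  # Subject 2
    [[0.3, 0.8, 0.1], [0.4, 0.9, 0.2], [0.5, 0.7, 0.3], [0.6, 1.0, 0.1], [0.7, 0.8, 0.4]],  # Subject 3
    [[0.6, 1.5, 0.4], [0.7, 1.6, 0.5], [0.8, 1.4, 0.6], [0.9, 1.7, 0.4], [1.0, 1.5, 0.7]],  # Subject 4
])
y = np.array([1, 0, 0, 1])  # 1 = ASD, 0 = No ASD

print(X.shape)  # (4, 5, 3) → (batch_size, timesteps, features)
```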
1D CNN Architecture
Input (4, 5, 3)
↓
Conv1D (kernel=3, filters=8)
↓
ReLU Activation
↓
MaxPooling1D (pool=2)
↓
Flatten
↓
Dense Layer (32 neurons)
↓
Output (1 neuron, Sigmoid)
↓
Prediction (0 or 1)
Step 1: Conv1D Layer
What is 1D Convolution?
A sliding window that moves along the time axis to detect patterns.
Parameters:
- Kernel size: 3 (window looks at 3 consecutive timesteps)
- Filters: 8 (we'll learn 8 different patterns)
- Stride: 1 (move window by 1 timestep)
Filter Structure
Each filter has shape (kernel_size, input_features) = (3, 3)
Example Filter 1:
W₁ = [[ 0.1,  0.2, -0.1],   # weights for timestep t
      [ 0.3, -0.2,  0.4],   # weights for timestep t+1
      [-0.1,  0.5,  0.2]]   # weights for timestep t+2
b₁ = 0.1  # bias
Convolution Operation for Subject 1
Input for Subject 1:
[[0.5, 1.2, 0.3],   # t=0
 [0.6, 1.3, 0.4],   # t=1
 [0.7, 1.1, 0.5],   # t=2
 [0.8, 1.4, 0.3],   # t=3
 [0.9, 1.2, 0.6]]   # t=4
Window Position 1 (timesteps 0-2):
Extract window:
window₁ = [[0.5, 1.2, 0.3],   # t=0
           [0.6, 1.3, 0.4],   # t=1
           [0.7, 1.1, 0.5]]   # t=2
Compute convolution (element-wise multiply + sum):
result₁ = Σ(window₁ ⊙ W₁) + b₁
        = (0.5×0.1 + 1.2×0.2 + 0.3×(-0.1)) +    # t=0
          (0.6×0.3 + 1.3×(-0.2) + 0.4×0.4) +    # t=1
          (0.7×(-0.1) + 1.1×0.5 + 0.5×0.2) +    # t=2
          0.1                                    # bias
        = (0.05 + 0.24 - 0.03) +    # = 0.26
          (0.18 - 0.26 + 0.16) +    # = 0.08
          (-0.07 + 0.55 + 0.10) +   # = 0.58
          0.1
        = 0.26 + 0.08 + 0.58 + 0.1
        = 1.02
Window Position 2 (timesteps 1-3):
window₂ = [[0.6, 1.3, 0.4],   # t=1
           [0.7, 1.1, 0.5],   # t=2
           [0.8, 1.4, 0.3]]   # t=3
result₂ = Σ(window₂ ⊙ W₁) + b₁
        = (0.6×0.1 + 1.3×0.2 + 0.4×(-0.1)) +
          (0.7×0.3 + 1.1×(-0.2) + 0.5×0.4) +
          (0.8×(-0.1) + 1.4×0.5 + 0.3×0.2) +
          0.1
        = (0.06 + 0.26 - 0.04) +    # = 0.28
          (0.21 - 0.22 + 0.20) +    # = 0.19
          (-0.08 + 0.70 + 0.06) +   # = 0.68
          0.1
        = 1.25
Window Position 3 (timesteps 2-4):
window₃ = [[0.7, 1.1, 0.5],   # t=2
           [0.8, 1.4, 0.3],   # t=3
           [0.9, 1.2, 0.6]]   # t=4
result₃ = Σ(window₃ ⊙ W₁) + b₁
        = (0.7×0.1 + 1.1×0.2 + 0.5×(-0.1)) +
          (0.8×0.3 + 1.4×(-0.2) + 0.3×0.4) +
          (0.9×(-0.1) + 1.2×0.5 + 0.6×0.2) +
          0.1
        = 0.24 + 0.08 + 0.63 + 0.1
        = 1.05
After Conv1D with Filter 1:
For Subject 1, using Filter 1, we get:
output_filter1_subject1 = [1.02, 1.25, 1.05]  # Shape: (3,)
Note: We had 5 timesteps and a kernel size of 3, so the output length = 5 - 3 + 1 = 3
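This sliding-window computation can be checked in a few lines of NumPy, using the example W₁ and b₁ from above:

```python
import numpy as np

# Subject 1 input, shape (5, 3): (timesteps, features)
x1 = np.array([[0.5, 1.2, 0.3], [0.6, 1.3, 0.4], [0.7, 1.1, 0.5],
               [0.8, 1.4, 0.3], [0.9, 1.2, 0.6]])
# Example Filter 1, shape (3, 3): (kernel_size, features)
W1 = np.array([[0.1, 0.2, -0.1], [0.3, -0.2, 0.4], [-0.1, 0.5, 0.2]])
b1 = 0.1

kernel = 3
# One output per valid window position: 5 - 3 + 1 = 3
out = np.array([np.sum(x1[t:t + kernel] * W1) + b1
                for t in range(len(x1) - kernel + 1)])
print(np.round(out, 2))  # [1.02 1.25 1.05]
```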
With All 8 Filters:
We repeat this process with 8 different filters (each learning different patterns).
Resulting Shape (3, 8):
# Subject 1 after Conv1D
# Each row = one output timestep with all 8 filter responses
# (Filter 1's column was computed above; the other filters' values are illustrative)
conv_output_subject1 = [
    [1.02, 0.85, 1.15, 0.78, 1.10, 0.92, 1.08, 0.88],  # Output timestep 0
    [1.25, 0.92, 1.30, 0.95, 1.18, 1.05, 1.22, 0.98],  # Output timestep 1
    [1.05, 0.88, 1.22, 0.81, 1.25, 0.98, 1.15, 0.92]   # Output timestep 2
]
# Shape: (3, 8) → (timesteps_out, filters)
#          ↑  ↑
#          │  └─ 8 filters
#          └─ 3 output timesteps
Explanation:
- Row 0: Output at position 0 (from window covering timesteps 0-2), showing responses from all 8 filters
- Row 1: Output at position 1 (from window covering timesteps 1-3), showing responses from all 8 filters
- Row 2: Output at position 2 (from window covering timesteps 2-4), showing responses from all 8 filters
After processing all 4 subjects:
Conv1D output shape: (4, 3, 8) → (batch, timesteps_out, filters)
Step 2: ReLU Activation
Apply ReLU(x) = max(0, x) element-wise:
# Subject 1 (all values are already positive, so they pass through unchanged)
relu_output_subject1 = [
    [1.02, 0.85, 1.15, 0.78, 1.10, 0.92, 1.08, 0.88],  # Timestep 0
    [1.25, 0.92, 1.30, 0.95, 1.18, 1.05, 1.22, 0.98],  # Timestep 1
    [1.05, 0.88, 1.22, 0.81, 1.25, 0.98, 1.15, 0.92]   # Timestep 2
]
# Shape: (3, 8)
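ReLU itself is a one-liner; a negative value (made up here for illustration, since all of Subject 1's activations are positive) shows the clipping:

```python
import numpy as np

def relu(z):
    """Element-wise ReLU: negative activations are clipped to zero."""
    return np.maximum(0.0, z)

print(relu(np.array([1.02, -0.3, 0.85])))  # [1.02 0.   0.85]
```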
Step 3: MaxPooling1D
Pool size: 2
Take the max over every 2 consecutive timesteps
How It Works:
We pool along the timestep axis (axis 0). For each filter, divide the timesteps into groups of 2 and take the maximum of each group. Since 3 timesteps don't divide evenly by 2, we keep the leftover timestep as its own group (the behavior of padding='same'); note that a default 'valid' pooling layer, such as Keras's MaxPooling1D, would drop the leftover and output only ⌊3/2⌋ = 1 timestep.
For Filter 1 (first column):
Input: [1.02, 1.25, 1.05] (3 timesteps)
↓
Group 1: [1.02, 1.25] → max = 1.25
Group 2: [1.05] → max = 1.05 (leftover single value)
↓
Output: [1.25, 1.05]
For Filter 2 (second column):
Input: [0.85, 0.92, 0.88]
↓
Group 1: [0.85, 0.92] → max = 0.92
Group 2: [0.88] → max = 0.88
↓
Output: [0.92, 0.88]
After MaxPooling:
# Subject 1
# Each row = one pooled timestep with all 8 filter responses
pooled_subject1 = [
    [1.25, 0.92, 1.30, 0.95, 1.18, 1.05, 1.22, 0.98],  # Pooled timestep 0
    [1.05, 0.88, 1.22, 0.81, 1.25, 0.98, 1.15, 0.92]   # Pooled timestep 1
]
# Shape: (2, 8) → (timesteps_pooled, filters)
# ↑ ↑
# │ └─ 8 filters
# └─ 2 pooled timesteps
After processing all 4 subjects:
MaxPool output shape: (4, 2, 8) → (batch, timesteps_pooled, filters)
Intuition: Pooling reduces dimensionality while keeping the strongest activations (most important patterns detected).
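The pooling step above can be sketched as follows; keeping the leftover group reproduces the padding='same' behavior described earlier:

```python
import numpy as np

def max_pool_1d(x, pool=2):
    """Max-pool along the timestep axis (axis 0), keeping any leftover
    group of fewer than `pool` timesteps (i.e. padding='same' behavior)."""
    return np.array([x[i:i + pool].max(axis=0) for i in range(0, len(x), pool)])

# Filter 1's responses for Subject 1, as a (3, 1) column
col = np.array([[1.02], [1.25], [1.05]])
print(max_pool_1d(col).ravel())  # [1.25 1.05]
```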
Step 4: Flatten
Convert 3D tensor to 2D for dense layers:
# Subject 1
# Flatten by concatenating all timesteps and filters
flattened_subject1 = [
    1.25, 0.92, 1.30, 0.95, 1.18, 1.05, 1.22, 0.98,  # Timestep 0
    1.05, 0.88, 1.22, 0.81, 1.25, 0.98, 1.15, 0.92   # Timestep 1
]
# Shape: (16,) → (2 timesteps × 8 filters = 16 features)
Flatten output shape: (4, 16) → (batch, features)
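Flattening is just a reshape; NumPy's default row-major order concatenates timestep 0's 8 filter values, then timestep 1's, matching the listing above:

```python
import numpy as np

pooled = np.array([[1.25, 0.92, 1.30, 0.95, 1.18, 1.05, 1.22, 0.98],
                   [1.05, 0.88, 1.22, 0.81, 1.25, 0.98, 1.15, 0.92]])  # (2, 8)

flattened = pooled.reshape(-1)  # row-major: timestep 0's filters, then timestep 1's
print(flattened.shape)          # (16,)
```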
Step 5: Dense Layer (32 neurons)
Weight matrix W_dense: shape (16, 32)
Bias vector b_dense: shape (32,)
Calculation for Subject 1:
Z_dense = flattened_subject1 · W_dense + b_dense
# Example calculation for the first neuron (weights 0.15, 0.22, ... are illustrative):
Z_dense[0] = Σ(flattened_subject1[i] × W_dense[i,0]) + b_dense[0]
           = 1.25×0.15 + 0.92×0.22 + ... + 0.92×(-0.18) + 0.05
           = 0.85
# For all 32 neurons:
Z_dense = [0.85, 0.92, 0.73, ..., 0.68]  # Shape: (32,)
Apply ReLU:
A_dense = ReLU(Z_dense) = [0.85, 0.92, 0.73, ..., 0.68] # Shape: (32,)
Dense output shape: (4, 32) → (batch, hidden_features)
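The dense step is a single matrix product followed by ReLU. The weights below are randomly initialized stand-ins (the learned values, like the illustrative 0.15 and 0.22 above, would come from training):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(16)                      # stand-in for flattened_subject1
W_dense = rng.normal(0, 0.1, (16, 32))  # (input_features, neurons)
b_dense = np.zeros(32)

z = x @ W_dense + b_dense  # pre-activation, shape (32,)
a = np.maximum(0.0, z)     # ReLU
print(a.shape)             # (32,)
```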
Step 6: Output Layer (Sigmoid)
Weight matrix W_out: shape (32, 1)
Bias: b_out = 0.0
For Subject 1:
Z_out = A_dense · W_out + b_out
      = Σ(A_dense[i] × W_out[i]) + b_out
      = 0.85×0.12 + 0.92×0.15 + ... + 0.68×0.10 + 0.0
      = 0.65
ŷ = sigmoid(Z_out) = 1 / (1 + e^(-0.65))
  = 1 / (1 + 0.522)
  = 0.657
Interpretation: Subject 1 has 65.7% probability of ASD
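The sigmoid value is easy to verify:

```python
import math

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(round(sigmoid(0.65), 3))  # 0.657
```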
Final Predictions:
# Subjects 2-4 are computed the same way (their values here are illustrative)
predictions = [
    0.657,  # Subject 1 (actual label: 1 - ASD)
    0.412,  # Subject 2 (actual label: 0 - No ASD)
    0.385,  # Subject 3 (actual label: 0 - No ASD)
    0.723   # Subject 4 (actual label: 1 - ASD)
]
# Apply threshold (0.5):
binary_predictions = [1, 0, 0, 1] # Perfect match!
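Thresholding the probabilities at 0.5 can be done in one vectorized step:

```python
import numpy as np

predictions = np.array([0.657, 0.412, 0.385, 0.723])
binary = (predictions >= 0.5).astype(int)  # True/False → 1/0
print(binary.tolist())  # [1, 0, 0, 1]
```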
What Makes This "Time Series"?
Static Model (your current approach)
[t₀: 0.5,1.2,0.3, t₁: 0.6,1.3,0.4, ..., t₄: 0.9,1.2,0.6]
↓
(all treated as independent features)
↓
[flatten]
↓
[0.5, 1.2, 0.3, 0.6, 1.3, 0.4, ..., 0.9, 1.2, 0.6]
Time Series Model (1D CNN)
Looks at PATTERNS across time:
Window 1: [t₀, t₁, t₂] → detects "increasing velocity"
Window 2: [t₁, t₂, t₃] → detects "stable stride"
Window 3: [t₂, t₃, t₄] → detects "irregular acceleration"
What the CNN Learns:
Each filter learns different temporal patterns:
| Filter      | Pattern Detected                            |
|-------------|---------------------------------------------|
| Filter 1    | Detects sudden acceleration changes         |
| Filter 2    | Identifies rhythmic stride patterns         |
| Filter 3    | Recognizes irregular gait (common in ASD)   |
| Filter 4    | Captures velocity trends                    |
| Filters 5-8 | Other diagnostic patterns                   |
Example Pattern Detection:
Filter detecting "irregular acceleration":
If timesteps show: [smooth, smooth, SPIKE] → High activation
If timesteps show: [smooth, smooth, smooth] → Low activation
Summary: Information Flow
Input (4, 5, 3)
↓
[Conv1D: Scan 3-timestep windows, detect 8 patterns]
↓
(4, 3, 8) ← 3 windows per subject, 8 pattern types
↓
[ReLU: Keep only positive activations]
↓
(4, 3, 8)
↓
[MaxPool: Keep strongest activations in 2-timestep groups]
↓
(4, 2, 8) ← Reduced to 2 time positions
↓
[Flatten: Convert to feature vector]
↓
(4, 16) ← 16 features per subject
↓
[Dense Layer: Combine pattern features]
↓
(4, 32)
↓
[Output Layer: Make final decision]
↓
(4, 1) ← ASD probability for each subject
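The whole pipeline can be traced end to end in NumPy with random stand-in weights; the shapes, not the values, are the point of this sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.random((4, 5, 3))               # (batch, timesteps, features)
W_conv = rng.normal(0, 0.1, (8, 3, 3))  # 8 filters, each (kernel=3, features=3)
b_conv = np.zeros(8)

# Conv1D over valid windows: 5 - 3 + 1 = 3 output timesteps per subject
conv = np.stack([
    [[np.sum(x[t:t + 3] * W_conv[f]) + b_conv[f] for f in range(8)]
     for t in range(3)]
    for x in X
])                                      # (4, 3, 8)

relu = np.maximum(0.0, conv)            # (4, 3, 8)

# MaxPool (pool=2), keeping the leftover timestep (padding='same' behavior)
pooled = np.stack([
    np.array([r[i:i + 2].max(axis=0) for i in range(0, 3, 2)])
    for r in relu
])                                      # (4, 2, 8)

flat = pooled.reshape(4, -1)            # (4, 16)

W_d, b_d = rng.normal(0, 0.1, (16, 32)), np.zeros(32)
hidden = np.maximum(0.0, flat @ W_d + b_d)           # (4, 32)

W_o, b_o = rng.normal(0, 0.1, (32, 1)), 0.0
probs = 1.0 / (1.0 + np.exp(-(hidden @ W_o + b_o)))  # (4, 1)
print(probs.shape)  # (4, 1)
```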
Key Advantages of 1D CNN for Time Series
- Local Pattern Detection: Learns patterns in small time windows