🎯 1. Introduction & Motivation
T-SMOTE (Temporal-SMOTE) is a specialized oversampling technique designed to address class imbalance in time-series classification problems while preserving temporal dependencies and dynamics.
1.1 The Class Imbalance Problem
In many real-world applications, datasets exhibit severe class imbalance where the minority class (positive cases) is significantly underrepresented compared to the majority class (negative cases). This imbalance causes several critical issues:
- Model Bias: Classifiers tend to predict the majority class to maximize overall accuracy
- Poor Minority Detection: The minority class, often the one we care about most (e.g., disease, fraud, failure), gets ignored
- Evaluation Pitfalls: High accuracy can be misleading when 95% of data belongs to one class
- Business Impact: Missing rare events can have severe consequences (missed diagnoses, undetected fraud, equipment failures)
🤔 Why Do We Need T-SMOTE?
The Real-World Scenarios:
Medical Diagnosis
In gait analysis for autism spectrum disorder (ASD) detection, you might have 1000 normal gait sequences but only 50 ASD cases. Without proper handling, your LSTM model will simply learn to classify everything as "normal" and achieve 95% accuracy—completely missing the point.
Equipment Failure Prediction
Industrial sensors record thousands of hours of normal operation but only a few failure events. Predicting these rare failures is crucial for maintenance scheduling and preventing costly downtime.
Financial Fraud Detection
Among millions of legitimate transactions, fraudulent ones are rare but extremely costly. The temporal pattern of how fraud develops is key to detection.
Why Traditional Methods Fail
Standard oversampling techniques like SMOTE treat time series as static feature vectors, destroying the temporal order that contains critical information about how patterns evolve and transition between classes.
1.2 Core Innovation of T-SMOTE
T-SMOTE introduces three fundamental innovations:
- Temporal Awareness: Treats time as a first-class dimension, not just another feature
- Progressive Subsequencing: Generates samples at different temporal positions using "leading times"
- Confidence-Guided Synthesis: Uses model predictions to guide interpolation, ensuring synthetic samples are realistic and useful
1.3 Key Terminology
- Time Series
- A sequence of observations ordered in time, where each observation is a vector of features measured at a specific time point.
- Class Imbalance
- A situation where one class (minority) has significantly fewer samples than the other class (majority), typically with a ratio more extreme than 1:10.
- Oversampling
- A technique to balance class distribution by generating synthetic samples for the minority class.
- Temporal Dependency
- The relationship between observations at different time points, where the current state depends on previous states.
- Decision Boundary
- The hyperplane or surface that separates different classes in feature space. Samples near this boundary are hardest to classify.
1.4 When to Use T-SMOTE
T-SMOTE is particularly effective when:
✅ Ideal Use Cases
- Your data is sequential (time series, sensor data, behavioral sequences)
- You have severe class imbalance (minority class < 20% of total)
- The temporal evolution of patterns is important (not just final state)
- You have a pretrained classifier that can provide confidence scores
- Transitional patterns matter (how normal becomes abnormal)
⚠️ Not Recommended When
- Your data is static/tabular without temporal order → use standard SMOTE
- Classes are already balanced → no oversampling needed
- You have very short sequences (e.g., <10 time steps) → limited room for subsequencing
- Temporal order doesn't matter to the classification task
⚠️ 2. The Problem with Standard SMOTE
2.1 How Standard SMOTE Works
SMOTE (Synthetic Minority Over-sampling Technique), introduced by Chawla et al. in 2002, is a foundational technique for handling imbalanced data in traditional machine learning.
Mathematical Formulation
For two static minority-class feature vectors X_A and X_B, SMOTE generates a synthetic sample by linear interpolation:
X_new = X_A + λ(X_B - X_A), where λ is drawn uniformly from [0, 1]
Step-by-Step Process
Standard SMOTE Algorithm
- Select a minority class sample X_A
- Find its k nearest neighbors (typically k=5) in feature space
- Randomly choose one neighbor X_B
- Generate random λ ∈ [0,1]
- Create synthetic sample: X_new = X_A + λ(X_B - X_A)
- Repeat until desired class balance is achieved
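A minimal Python sketch of this procedure (not the reference `imbalanced-learn` implementation; the function name and toy data are illustrative):

```python
import numpy as np

def smote_sample(X_minority, k=5, rng=np.random.default_rng(0)):
    """Generate one synthetic sample by standard SMOTE interpolation."""
    # 1. Select a minority class sample X_A
    X_A = X_minority[rng.integers(len(X_minority))]
    # 2. Find its k nearest neighbors in feature space (excluding itself)
    dists = np.linalg.norm(X_minority - X_A, axis=1)
    neighbors = np.argsort(dists)[1:k + 1]
    # 3. Randomly choose one neighbor X_B
    X_B = X_minority[rng.choice(neighbors)]
    # 4. Generate random lambda in [0, 1]
    lam = rng.uniform()
    # 5. Create synthetic sample: X_new = X_A + lambda * (X_B - X_A)
    return X_A + lam * (X_B - X_A)

# Toy minority class: columns = [age, income, credit_score]
X_min = np.array([[35, 45000, 580], [40, 50000, 600], [38, 47000, 590]], dtype=float)
print(smote_sample(X_min))
```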
📊 Concrete Example: Credit Scoring
Sample A (defaulter):
| Age | Income | Credit Score |
|---|---|---|
| 35 | 45000 | 580 |
Sample B (defaulter, nearest neighbor):
| Age | Income | Credit Score |
|---|---|---|
| 40 | 50000 | 600 |
With λ = 0.6, the synthetic sample is X_new = X_A + 0.6(X_B - X_A):
| Age | Income | Credit Score |
|---|---|---|
| 38 | 48000 | 592 |
✅ This works perfectly because these features are independent and static.
2.2 Why SMOTE Fails for Time Series
🚫 Critical Failure Modes
Problem 1: Temporal Order Destruction
When you flatten a time series into a feature vector, you lose the sequential structure: every (time step, feature) value becomes just another anonymous entry in one long vector.
SMOTE then treats f₁₁ (feature 1 at time 1) and f₃₂ (feature 2 at time 3) as if they're interchangeable—completely ignoring that they occur at different times.
Problem 2: Unrealistic Temporal Mixing
SMOTE might interpolate between samples at completely different temporal phases:
- Mixing the beginning of one gait cycle with the end of another
- Combining early-stage failure indicators with late-stage indicators
- Blending different phases of a heartbeat cycle
Result: Physically impossible synthetic sequences
Problem 3: Loss of Dynamics
Time series contain information in their dynamics—velocity, acceleration, trends. SMOTE interpolation destroys these:
- Smooth trends become jagged
- Periodic patterns get distorted
- Temporal correlations are broken
🎭 Illustrative Example: Gait Analysis
Sequence A (ASD gait): Complete gait cycle from heel strike to toe-off
Sequence B (ASD gait): Similar but different timing
What SMOTE produces: Random mixing of phases
❌ This creates biomechanically impossible movement patterns!
2.3 Comparison: What Works vs. What Doesn't
✅ SMOTE Works Great For:
- Tabular data: Customer demographics, financial ratios
- Image features: Pixel values, color histograms
- Static measurements: Lab test results, survey responses
- Independent features: Where feature order doesn't matter
Why? These features don't have temporal dependencies
❌ SMOTE Fails For:
- Time series: Sensor readings, physiological signals
- Sequential data: Video frames, speech signals
- Behavioral sequences: User actions, transaction patterns
- Temporal patterns: Where order and dynamics are crucial
Why? Temporal dependencies get destroyed
⏰ 3. Understanding Time-Series Structure
3.1 Mathematical Representation
A time series is fundamentally different from static data because it contains ordered observations over time.
Formal Definition
A time series is written Xᵢ = [xᵢ¹, xᵢ², ..., xᵢᵀ], where each xᵢᵗ ∈ ℝᵈ is the d-dimensional feature vector observed at time step t and T is the sequence length.
📝 Understanding the Notation
Subscript i: Identifies which time series (e.g., patient #5, sensor #12)
Superscript t: Identifies the time step within that series
Example: x³₅ means "features at time step 3 of time series #5"
3.2 Anatomy of Time-Series Data
🚶 Concrete Example: Gait Analysis Data
Setup: Motion capture of a person walking for 2 seconds at 30 FPS
- T = 60 time steps (frames)
- d = 12 features per frame (4 joints × 3 coordinates each)
- Features: [hip_x, hip_y, hip_z, knee_x, knee_y, knee_z, ankle_x, ankle_y, ankle_z, foot_x, foot_y, foot_z]
Data structure:
| Time | hip_x | hip_y | hip_z | ... | foot_z |
|---|---|---|---|---|---|
| t=1 | 0.45 | 0.92 | 0.15 | ... | 0.02 |
| t=2 | 0.46 | 0.93 | 0.16 | ... | 0.03 |
| ... | ... | ... | ... | ... | ... |
| t=60 | 0.52 | 0.89 | 0.18 | ... | 0.01 |
Shape: (60, 12) — a matrix where each row is one time step
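In code, one such sequence is just a 2-D array; a small sketch (the coordinate values beyond the first frame are made up for illustration):

```python
import numpy as np

T, d = 60, 12   # 60 frames, 12 features per frame
feature_names = ["hip_x", "hip_y", "hip_z", "knee_x", "knee_y", "knee_z",
                 "ankle_x", "ankle_y", "ankle_z", "foot_x", "foot_y", "foot_z"]

# One gait sequence: row t holds the 12 joint coordinates at frame t
X = np.zeros((T, d))
X[0] = [0.45, 0.92, 0.15, 0.60, 0.55, 0.14, 0.70, 0.10, 0.13, 0.75, 0.05, 0.02]

print(X.shape)  # (60, 12): rows = time steps, columns = features
```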
3.3 What Makes Time Series Special
🎯 Critical Properties of Time Series
1. Temporal Order Matters
Frame 5 → Frame 6 → Frame 7 represents physical reality. Reversing or shuffling this order creates meaningless data.
2. Temporal Dependencies
Current values depend on past values: x^t is influenced by x^(t-1), x^(t-2), and so on, rather than being an independent draw.
Example: Your foot position at frame 10 is influenced by where it was at frame 9.
3. Patterns Evolve Over Time
The transition from normal to abnormal happens gradually:
- Frames 1-20: Normal walking
- Frames 21-40: Subtle asymmetry appears
- Frames 41-60: Clear ASD gait pattern
4. Dynamics Matter
Not just position, but velocity and acceleration:
- Position: Where the joint is
- Velocity: How fast it's moving (x^t - x^(t-1))
- Acceleration: How velocity changes
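These dynamics are easy to derive from the position matrix with first differences; a quick sketch (using a random stand-in for the `(60, 12)` gait array):

```python
import numpy as np

# Stand-in gait data: cumulative noise gives a smooth-ish trajectory
X = np.cumsum(np.random.default_rng(0).normal(scale=0.01, size=(60, 12)), axis=0)

velocity = np.diff(X, axis=0)             # x^t - x^(t-1), shape (59, 12)
acceleration = np.diff(velocity, axis=0)  # change in velocity, shape (58, 12)

print(velocity.shape, acceleration.shape)
```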
3.4 Challenges in Time-Series Classification
🎯 Key Challenges
Variable Length
Different sequences may have different lengths (some walks are longer than others). Solutions: padding, truncation, or subsequencing.
Temporal Misalignment
Similar patterns may occur at different time offsets. One person's gait cycle might start at frame 5, another's at frame 15.
High Dimensionality
With T=60 and d=12, you have 720 features. This creates the "curse of dimensionality" problem.
Class Imbalance
In medical/industrial applications, abnormal cases are rare. This is where T-SMOTE comes in!
3.5 Why Standard Methods Fail
Visualization: What Happens When You Flatten Time Series
Original time series (meaningful): a (60, 12) matrix, where each row holds the 12 features of one time step.
After flattening for SMOTE (order lost):
Now it's just a 720-dimensional vector. The model has no way to know that f₁₁ (hip_x at time 1) should be close to f₂₁ (hip_x at time 2).
📏 4. The Leading Time Concept
The leading time is T-SMOTE's most innovative concept. It captures the idea that for classification tasks with temporal events (like failure, disease onset, or pattern occurrence), the most informative samples are those that capture the transition period—not just the final state.
4.1 Mathematical Definition
- Leading Time (l)
- The temporal offset from the end of a sequence. It determines how far back in time we extract a subsequence.
- Subsequence with Leading Time l
- X⁽ˡ⁾ᵢ = [xᵢ^(T-l-w+1), xᵢ^(T-l-w+2), ..., xᵢ^(T-l)], where T is the total sequence length, w is the window size (subsequence length), and l is the leading time (l = 0, 1, 2, ..., L)
Intuitive Understanding
Think of leading time as "rewinding" the sequence:
- l=0: The most recent w frames (ending at time T)
- l=1: One step earlier (ending at time T-1)
- l=2: Two steps earlier (ending at time T-2)
- And so on...
🤔 Why Do We Need Leading Time?
The Problem with Just Using Final Frames
If you only look at the last window (l=0) for all positive samples:
- All samples are deep in the positive region
- Model doesn't learn the transition from negative → positive
- Can't detect early-stage patterns
- Poor performance on borderline cases
What Leading Time Achieves
By creating subsequences at different leading times:
- l=0: Captures fully developed positive pattern (high confidence)
- l=3: Captures mid-stage pattern (medium confidence)
- l=7: Captures early-stage pattern (low confidence, near boundary)
This gives the model examples of how patterns evolve, not just their final state.
4.2 Visual Demonstration
Example: 10-Frame Sequence with Window Size w=5
Complete original sequence:
Frames 1-10; in the original figure, opacity represents pattern strength: darker = more obvious ASD pattern.
X⁽⁰⁾ (l=0): Last 5 frames [6,7,8,9,10]
Model confidence: s⁽⁰⁾ = 0.95 (very confident this is ASD)
Meaning: Clear, fully-developed ASD gait pattern
X⁽¹⁾ (l=1): Frames [5,6,7,8,9]
Model confidence: s⁽¹⁾ = 0.78 (fairly confident)
Meaning: Pattern is developing but not fully established
X⁽²⁾ (l=2): Frames [4,5,6,7,8]
Model confidence: s⁽²⁾ = 0.54 (uncertain, borderline)
Meaning: Transition phase—could be ASD or normal
X⁽³⁾ (l=3): Frames [3,4,5,6,7]
Model confidence: s⁽³⁾ = 0.32 (looks more normal)
Meaning: Early stage, before pattern fully emerges
The Magic of Leading Time: By generating subsequences at different leading times, we create a temporal spectrum from "clearly positive" to "borderline" to "almost negative." This teaches the model to recognize patterns at all stages of development.
4.3 Calculating Leading Time Indices
Step-by-Step Calculation
Given:
- Total sequence length: T = 10
- Window size: w = 5
- Leading time: l = 2
Calculate start and end indices:
Start index: T - l - w + 1 = 10 - 2 - 5 + 1 = 4
End index: T - l = 10 - 2 = 8
Therefore: X⁽²⁾ = [x₄, x₅, x₆, x₇, x₈]
Verify:
- Length = 8 - 4 + 1 = 5 ✓ (matches window size)
- Ends at T-l = 8 ✓ (two steps before the end)
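This index arithmetic translates directly into a small helper function (a sketch; the 1-based indices of the math become 0-based Python slices):

```python
import numpy as np

def subsequence(X, w, l):
    """Return the length-w window of X (shape (T, d)) ending l steps before the end,
    i.e. frames x^(T-l-w+1) ... x^(T-l) in the 1-based notation of the text."""
    T = X.shape[0]
    if l > T - w:
        raise ValueError("leading time exceeds L_max = T - w")
    return X[T - l - w : T - l]

# Toy series: 10 frames whose single feature equals the frame number
X = np.arange(1, 11).reshape(10, 1)
print(subsequence(X, w=5, l=2).ravel())   # [4 5 6 7 8], matching X⁽²⁾ above
```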
4.4 Choosing Maximum Leading Time (L)
📐 Practical Guidelines
Maximum Possible Value
L_max = T - w
This ensures you don't try to extract a subsequence that starts before the beginning of the series.
Recommended Range
Typically, L is set to capture the meaningful transition period:
- Short sequences (T < 50): L = 3 to 5
- Medium sequences (50 ≤ T ≤ 200): L = 5 to 10
- Long sequences (T > 200): L = 10 to 20
Domain Knowledge Matters
For gait analysis: If a gait cycle is ~30 frames and the transition takes ~15 frames, set L ≈ 15
For equipment failure: If failure indicators appear 100 time steps before failure, set L ≈ 100
4.5 Why Not Just Use the Entire Sequence?
❌ Using Full Sequence
- Very long input to model
- Computational cost increases
- Early frames may be irrelevant noise
- Harder to learn what's important
- Doesn't focus on transition period
✅ Using Subsequences with Leading Time
- Fixed-length windows (easier to batch)
- Computationally efficient
- Focuses on relevant time period
- Multiple training examples from one sequence
- Captures temporal evolution
🎯 5. Model Confidence Scores
Model confidence scores are the bridge between the temporal subsequences and the synthetic sample generation. They tell us how "positive" each subsequence looks, which guides how we mix them.
5.1 What is Model Confidence?
- Model Confidence Score (s⁽ˡ⁾ᵢ)
- The predicted probability that subsequence X⁽ˡ⁾ᵢ belongs to the positive (minority) class, as output by a trained classifier.
- Mathematical Form
- s⁽ˡ⁾ᵢ = f(X⁽ˡ⁾ᵢ) ∈ [0, 1], where f is the trained classifier (LSTM, CNN, etc.) and s⁽ˡ⁾ᵢ is its confidence score, ranging from 0 (definitely negative) to 1 (definitely positive)
🤔 Why Do We Need Confidence Scores?
Problem: Not All Subsequences Are Equally Useful
Consider two subsequences from an ASD gait sequence:
- X⁽⁰⁾: Last 5 frames — clearly shows ASD pattern
- X⁽⁸⁾: Very early frames — looks completely normal
If we mix these randomly with equal weight, we might generate samples that are too normal to be useful, or too mixed to be realistic.
Solution: Use Model's Own Assessment
The trained model can tell us:
- Which subsequences are "deep" in the positive region (high confidence)
- Which are borderline (medium confidence)
- Which look negative (low confidence)
We use this information to intelligently guide the mixing process.
5.2 Computing Confidence Scores
Step-by-Step Process
Train a Base Classifier
First, train an initial classifier on your imbalanced dataset (before applying T-SMOTE). This can be:
- LSTM (for sequential dependencies)
- 1D CNN (for local patterns)
- Transformer (for long-range dependencies)
- Any model that outputs probabilities
Note: This doesn't need to be perfect—it just needs to provide reasonable confidence estimates.
Generate Subsequences
For each positive sample in your dataset, create subsequences at different leading times (X⁽⁰⁾, X⁽¹⁾, ..., X⁽ᴸ⁾).
Run Through Classifier
Pass each subsequence through the trained model to get the predicted probability of the positive class, s⁽ˡ⁾ = f(X⁽ˡ⁾); a code sketch follows at the end of this subsection.
Store and Use
Store these confidence scores—you'll use them to:
- Determine mixing weights (via Beta distribution)
- Calculate synthetic sample confidences
- Filter unreliable samples (via weighted sampling)
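Putting the steps together, a minimal sketch: here `model` is assumed to be any callable that maps a batch of windows of shape `(n, w, d)` to positive-class probabilities of shape `(n,)` (e.g. a wrapped LSTM or CNN); the function names and API are illustrative, not a fixed interface.

```python
import numpy as np

def score_subsequences(model, X_pos, w, L):
    """For one positive sequence X_pos of shape (T, d), return the subsequences
    X^(l) and their confidence scores s^(l) for l = 0, 1, ..., L."""
    T = X_pos.shape[0]
    subs = [X_pos[T - l - w : T - l] for l in range(L + 1)]   # window at each leading time
    scores = [float(model(X_l[None])[0]) for X_l in subs]     # s^(l) = f(X^(l))
    return subs, scores

# Usage sketch with a placeholder "model" that always answers 0.5
dummy_model = lambda batch: np.full(len(batch), 0.5)
subs, scores = score_subsequences(dummy_model, np.zeros((60, 12)), w=20, L=25)
print(len(subs), scores[:3])   # 26 subsequences with their confidence scores
```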
5.3 Interpreting Confidence Scores
| Score Range | Interpretation | Position in Feature Space | Usefulness for Training |
|---|---|---|---|
| 0.9 - 1.0 | Very high confidence positive | Deep in positive region | Good for establishing class center |
| 0.7 - 0.9 | Confident positive | Solidly in positive region | Most useful for training |
| 0.5 - 0.7 | Likely positive | Approaching decision boundary | Critical for learning boundaries |
| 0.3 - 0.5 | Uncertain/borderline | Near or on decision boundary | Handle carefully—may be mislabeled |
| 0.0 - 0.3 | Looks negative | In negative region | Likely mislabeled or very early stage |
5.4 The Confidence Progression Pattern
📊 Typical Pattern for an ASD Gait Sequence
Sequence Length: T = 60 frames, Window: w = 20 frames
| Leading Time (l) | Frames Used | Confidence (s⁽ˡ⁾) | Description |
|---|---|---|---|
| 0 | [41-60] | 0.94 | Clear ASD pattern established |
| 5 | [36-55] | 0.88 | Pattern visible but less pronounced |
| 10 | [31-50] | 0.76 | Transitional phase beginning |
| 15 | [26-45] | 0.58 | Subtle abnormalities emerging |
| 20 | [21-40] | 0.42 | Mostly normal with hints |
| 25 | [16-35] | 0.31 | Appears normal |
Key Observation: Confidence scores decrease as we go back in time, showing the gradual evolution from normal to ASD gait.
5.5 Edge Cases and Considerations
⚠️ Common Pitfalls
1. Poor Base Classifier
Problem: If your initial classifier is terrible (random guessing), confidence scores will be meaningless.
Solution: Ensure your base classifier achieves at least moderate performance (e.g., AUC > 0.6) before using T-SMOTE.
2. All High Confidences
Problem: If all subsequences have confidence > 0.9, you're not capturing the transition.
Solution: Increase maximum leading time L to go further back in time.
3. All Low Confidences
Problem: If all scores are < 0.5, the sample might be mislabeled.
Solution: Review labels or exclude this sample from augmentation.
4. Non-Monotonic Progression
Problem: Sometimes scores don't decrease smoothly (e.g., s⁽³⁾ > s⁽¹⁾).
Solution: This is normal due to noise. T-SMOTE is robust to small fluctuations.
💡 Pro Tip: Warm-Start Strategy
If your initial dataset is very imbalanced (e.g., 1:100), your base classifier might struggle. Try this approach:
- Apply simple oversampling (duplication) to get to 1:10 ratio
- Train a base classifier on this
- Use this classifier to compute T-SMOTE confidence scores
- Apply T-SMOTE to get to 1:1 ratio
- Train final classifier on T-SMOTE augmented data
📊 6. Beta Distribution & Mixing Weights
The Beta distribution is the mathematical heart of T-SMOTE. It determines how to mix two temporal neighbors based on their confidence scores, ensuring synthetic samples are both diverse and realistic.
6.1 What is the Beta Distribution?
- Beta Distribution
- A continuous probability distribution defined on the interval [0, 1], parameterized by two shape parameters α and β (often denoted as a and b).
- Mathematical Form
- X ~ Beta(a, b), with probability density function f(x; a, b) = x^(a-1)(1-x)^(b-1) / B(a, b), where B(a, b) is the Beta function (normalization constant)
- Mean: E[X] = a / (a + b)
- Variance: Var[X] = (a·b) / ((a+b)²(a+b+1))
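A short sketch with SciPy confirms the mean and variance formulas for a few parameter choices (the values are illustrative):

```python
from scipy.stats import beta

for a, b in [(1, 1), (0.3, 0.8), (0.8, 0.3), (5, 5)]:
    dist = beta(a, b)
    # mean = a / (a + b), variance = a*b / ((a+b)^2 (a+b+1))
    print(f"Beta({a}, {b}): mean = {dist.mean():.3f}, var = {dist.var():.4f}")
```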
6.2 Why Beta Distribution?
🎯 Perfect Properties for Our Task
1. Bounded to [0,1]
For interpolation X_new = α·X⁽ˡ⁾ + (1-α)·X⁽ˡ⁺¹⁾, we need α ∈ [0,1]. Beta naturally lives in this range.
2. Flexible Shapes
By varying parameters a and b, Beta can be:
- Uniform: Beta(1,1) → equal probability for all α
- Left-skewed: Beta(0.3, 0.8) → favors small α
- Right-skewed: Beta(0.8, 0.3) → favors large α
- Bell-shaped: Beta(5, 5) → concentrated around 0.5
3. Natural Interpretation
When a = s⁽ˡ⁾ and b = s⁽ˡ⁺¹⁾:
- If s⁽ˡ⁾ > s⁽ˡ⁺¹⁾: α tends toward 1 → more weight on recent (confident) subsequence
- If s⁽ˡ⁾ < s⁽ˡ⁺¹⁾: α tends toward 0 → more weight on earlier subsequence
- If s⁽ˡ⁾ ≈ s⁽ˡ⁺¹⁾: α around 0.5 → balanced mixing
4. Incorporates Model Knowledge
By using confidence scores as parameters, we let the model's own assessment guide the augmentation process.
6.3 T-SMOTE's Use of Beta
The Formula
α ~ Beta(a, b), where a = s⁽ˡ⁾ (confidence of the more recent subsequence) and b = s⁽ˡ⁺¹⁾ (confidence of the earlier one)
The sampled α is then used as the mixing weight: X_new = α·X⁽ˡ⁾ + (1-α)·X⁽ˡ⁺¹⁾
Intuitive Interpretation
Think of a and b as "votes" for which subsequence to favor:
- a = s⁽ˡ⁾ = 0.9: 9 votes for X⁽ˡ⁾ (recent, confident)
- b = s⁽ˡ⁺¹⁾ = 0.4: 4 votes for X⁽ˡ⁺¹⁾ (earlier, less confident)
- Expected α: 0.9/(0.9+0.4) ≈ 0.69 → leans toward recent one
6.4 Visual Examples
Case 1: High vs Low Confidence
Parameters: Beta(0.90, 0.30)
- Mean: 0.90/(0.90+0.30) = 0.75
- Shape: Mass concentrated toward α = 1
- Interpretation: Draws of α tend to be large (mean 0.75), heavily favoring X⁽ˡ⁾
Distribution Shape:
Probability
▲
1.5│ ╱█
│ ╱ █
1.0│ ╱ █
│ ╱ █
0.5│ ▁▁▁▁▁▁▁╱ █
│ ▁▁▁▁▁▁▁ █
0.0└────────────────────────────▶ α
0 0.2 0.4 0.6 0.8 1.0
Result: Synthetic samples will stay very close to the more recent, confident subsequence X⁽ˡ⁾
Case 2: Similar Confidences
Parameters: Beta(0.65, 0.60)
- Mean: 0.65/(0.65+0.60) = 0.52
- Shape: Roughly symmetric
- Interpretation: Balanced mixing with slight favor to X⁽ˡ⁾
Distribution Shape:
Probability
▲
1.5│ ╱█╲
│ ╱ █ ╲
1.0│ ╱ █ ╲
│ ╱ █ ╲
0.5│ ▁▁▁╱ █ ╲▁▁▁
│ ▁▁▁ █ ▁▁▁
0.0└────────────────────────────▶ α
0 0.2 0.4 0.6 0.8 1.0
Result: Diverse synthetic samples spanning both subsequences
6.5 Numerical Example
🔢 Step-by-Step Calculation
Setup:
- Subsequence X⁽¹⁾: confidence s⁽¹⁾ = 0.84
- Subsequence X⁽²⁾: confidence s⁽²⁾ = 0.58
Set Beta Parameters
a = s⁽¹⁾ = 0.84, b = s⁽²⁾ = 0.58, so α ~ Beta(0.84, 0.58)
Calculate Expected Value
E[α] = a / (a + b) = 0.84 / (0.84 + 0.58) = 0.84 / 1.42 ≈ 0.592
Interpretation: On average, synthetic samples will be 59.2% from X⁽¹⁾ and 40.8% from X⁽²⁾
Sample α (in practice)
Each synthetic sample uses its own draw from Beta(0.84, 0.58); a typical draw lands near the mean, e.g. α ≈ 0.59
Create Synthetic Sample
X_new = α·X⁽¹⁾ + (1-α)·X⁽²⁾ ≈ 0.59·X⁽¹⁾ + 0.41·X⁽²⁾
This preserves temporal structure while creating a slightly earlier version of the pattern.
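These numbers are easy to reproduce with NumPy's Beta sampler (a quick check, not part of the algorithm itself):

```python
import numpy as np

s_1, s_2 = 0.84, 0.58                     # confidences of X^(1) and X^(2)
print(round(s_1 / (s_1 + s_2), 3))        # 0.592 -> ~59.2% expected weight on X^(1)

rng = np.random.default_rng(42)
alphas = rng.beta(s_1, s_2, size=10_000)  # one draw per synthetic sample in practice
print(round(alphas.mean(), 3))            # empirical mean, close to 0.592
```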
6.6 Why Not Simpler Alternatives?
❌ Uniform Random (α ~ U[0,1])
- Ignores confidence information
- Treats all mixing equally likely
- Could create unrealistic samples
- No model guidance
Example: Might mix 90% confident with 30% confident subsequence using α=0.1, creating mostly negative-looking sample
✅ Beta Distribution
- Incorporates confidence
- Adaptively weights mixing
- Creates realistic samples
- Model-guided augmentation
Same scenario: Beta(0.9, 0.3) has mean 0.75, so α stays high on average, keeping synthetic samples positive
❌ Fixed α (e.g., α=0.5)
- No diversity
- All synthetics are identical
- Overfitting risk
- Doesn't explore space
✅ Sampled α from Beta
- Natural diversity
- Different synthetics each time
- Better generalization
- Explores around mean
⚗️ 7. Synthesizing New Samples
Now we bring everything together: temporal subsequences, confidence scores, and Beta-sampled mixing weights combine to create synthetic time-series samples that are both temporally coherent and strategically positioned in feature space.
7.1 The Core Synthesis Formula
X_new = α·X⁽ˡ⁾ + (1-α)·X⁽ˡ⁺¹⁾, with α ~ Beta(s⁽ˡ⁾, s⁽ˡ⁺¹⁾)
📐 Element-Wise Operation
The interpolation happens for every single value in the matrices:
X_new[t, j] = α·X⁽ˡ⁾[t, j] + (1-α)·X⁽ˡ⁺¹⁾[t, j] for every time step t and feature j
This ensures temporal coherence—we're not mixing different time steps!
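As a sketch, the whole synthesis step is a couple of NumPy lines applied to two aligned `(w, d)` arrays (the function and parameter names are illustrative):

```python
import numpy as np

def synthesize(X_l, X_l1, s_l, s_l1, rng=np.random.default_rng()):
    """Mix two temporally adjacent subsequences of identical shape (w, d).

    alpha is drawn from Beta(s_l, s_l1); the interpolation is element-wise,
    so time step t of the synthetic sample only mixes time step t of its
    two parents, preserving temporal order."""
    alpha = rng.beta(s_l, s_l1)
    X_new = alpha * X_l + (1 - alpha) * X_l1
    s_new = alpha * s_l + (1 - alpha) * s_l1   # expected confidence of the synthetic sample
    return X_new, s_new
```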
7.2 Complete Example with Real Numbers
🔢 Full Worked Example
Scenario: 3-feature gait data, window size w=4
Step 1: Two Temporal Subsequences
X⁽⁰⁾ (recent, l=0, confidence = 0.84):
| Time | hip_x | knee_x | ankle_x |
|---|---|---|---|
| 1 | 1.6 | 0.9 | 2.5 |
| 2 | 1.8 | 1.0 | 2.7 |
| 3 | 2.0 | 1.1 | 2.9 |
| 4 | 2.2 | 1.3 | 3.0 |
X⁽¹⁾ (earlier, l=1, confidence = 0.58):
| Time | hip_x | knee_x | ankle_x |
|---|---|---|---|
| 1 | 1.3 | 0.8 | 2.3 |
| 2 | 1.6 | 0.9 | 2.5 |
| 3 | 1.8 | 1.0 | 2.7 |
| 4 | 2.0 | 1.1 | 2.9 |
Step 2: Sample Mixing Weight
Draw α ~ Beta(s⁽⁰⁾, s⁽¹⁾) = Beta(0.84, 0.58); suppose the sampled value is α = 0.59, so 1 - α = 0.41.
Step 3: Compute Synthetic Sample (Element-by-Element)
Time step 1:
hip_x: 0.59×1.6 + 0.41×1.3 = 0.944 + 0.533 = 1.477
knee_x: 0.59×0.9 + 0.41×0.8 = 0.531 + 0.328 = 0.859
ankle_x: 0.59×2.5 + 0.41×2.3 = 1.475 + 0.943 = 2.418
Time step 2:
hip_x: 0.59×1.8 + 0.41×1.6 = 1.062 + 0.656 = 1.718
knee_x: 0.59×1.0 + 0.41×0.9 = 0.590 + 0.369 = 0.959
ankle_x: 0.59×2.7 + 0.41×2.5 = 1.593 + 1.025 = 2.618
Time step 3:
hip_x: 0.59×2.0 + 0.41×1.8 = 1.180 + 0.738 = 1.918
knee_x: 0.59×1.1 + 0.41×1.0 = 0.649 + 0.410 = 1.059
ankle_x: 0.59×2.9 + 0.41×2.7 = 1.711 + 1.107 = 2.818
Time step 4:
hip_x: 0.59×2.2 + 0.41×2.0 = 1.298 + 0.820 = 2.118
knee_x: 0.59×1.3 + 0.41×1.1 = 0.767 + 0.451 = 1.218
ankle_x: 0.59×3.0 + 0.41×2.9 = 1.770 + 1.189 = 2.959
Step 4: Final Synthetic Sample (X_new)
| Time | hip_x | knee_x | ankle_x |
|---|---|---|---|
| 1 | 1.477 | 0.859 | 2.418 |
| 2 | 1.718 | 0.959 | 2.618 |
| 3 | 1.918 | 1.059 | 2.818 |
| 4 | 2.118 | 1.218 | 2.959 |
✅ Verification:
- All values lie between X⁽⁰⁾ and X⁽¹⁾ ✓
- Temporal progression is smooth ✓
- Closer to X⁽⁰⁾ (since α=0.59) ✓
- Physically plausible joint positions ✓
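The whole table can be reproduced with a single NumPy expression (values copied from the example above):

```python
import numpy as np

X0 = np.array([[1.6, 0.9, 2.5],   # X^(0): recent subsequence (hip_x, knee_x, ankle_x)
               [1.8, 1.0, 2.7],
               [2.0, 1.1, 2.9],
               [2.2, 1.3, 3.0]])
X1 = np.array([[1.3, 0.8, 2.3],   # X^(1): earlier subsequence
               [1.6, 0.9, 2.5],
               [1.8, 1.0, 2.7],
               [2.0, 1.1, 2.9]])

alpha = 0.59
X_new = alpha * X0 + (1 - alpha) * X1
print(np.round(X_new, 3))
# [[1.477 0.859 2.418]
#  [1.718 0.959 2.618]
#  [1.918 1.059 2.818]
#  [2.118 1.218 2.959]]
```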
7.3 Synthetic Label Confidence
Along with the synthetic sequence, we also compute its expected confidence:
s_new = α·s⁽ˡ⁾ + (1-α)·s⁽ˡ⁺¹⁾
Continuing Our Example:
s_new = 0.59 × 0.84 + 0.41 × 0.58 = 0.4956 + 0.2378 ≈ 0.73
Interpretation: The synthetic sample is expected to have ~73% confidence of being ASD—still positive, but closer to the decision boundary than X⁽⁰⁾.
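The same interpolation applied in code (continuing with the example's numbers):

```python
alpha, s_0, s_1 = 0.59, 0.84, 0.58
s_new = alpha * s_0 + (1 - alpha) * s_1
print(round(s_new, 3))   # 0.733 -> roughly 73% confidence of being ASD
```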
7.4 Why This Works: The Geometry
Geometric Interpretation in Feature Space
Imagine plotting confidence scores along a temporal axis:
Confidence
1.0│ ● X⁽⁰⁾ (0.84)
│ ╱
0.8│ ★ ← X_new (0.73)
│ ╱
0.6│ ● X⁽¹⁾ (0.58)
│ ╱
0.4│ ● X⁽²⁾
│ ╱
0.2│ ● X⁽³⁾
│
0.0└─────────────────────────────────▶ Time
earlier recent
Key Points:
- Synthetic sample (★