🏔️ Understanding Plateaus & ReduceLROnPlateau
1. What is a Plateau?
Plateau: When your model's loss (or accuracy) stops improving and stays flat for several epochs.
Example: suppose the loss drops quickly for the first 10 epochs, then hovers around 0.3 from epoch 10 to epoch 25, barely changing. The model is "stuck" at 0.3 loss - that flat stretch is a plateau.
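To make "stops improving" concrete, here is a minimal sketch (an assumed helper, not part of Keras or this lesson's code) that flags a plateau in a list of recorded losses:

```python
# Assumed helper for illustration only: flags a plateau in a loss history.
# The window size and tolerance are arbitrary illustrative choices.
def is_plateau(losses, window=5, tol=1e-3):
    """True if loss improved by less than `tol` over the last `window` epochs."""
    if len(losses) <= window:
        return False
    return losses[-window - 1] - min(losses[-window:]) < tol

history = [0.90, 0.60, 0.45, 0.35, 0.3010, 0.3008, 0.3006, 0.3005, 0.3004, 0.3004]
print(is_plateau(history))  # True - the last 5 epochs improved by less than 0.001
```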
2. Why Does a Plateau Happen?
❌ Problem: Learning Rate Too Large
Steps are too big - the model "bounces around" the optimal point without settling in.
Analogy: Like trying to walk down stairs by jumping 5 steps at a time - you'll miss the bottom!
❌ Problem: Learning Rate Too Small
Steps are too tiny - progress is so slow that the loss curve looks flat, as if the model were stuck.
Analogy: Like walking down stairs taking baby steps - it takes forever! (The toy sketch below shows both failure modes.)
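Both failure modes are easy to see in a toy gradient descent on f(x) = x² (an assumed, purely illustrative setup):

```python
# Toy illustration (assumed setup): gradient descent on f(x) = x**2,
# whose gradient is 2*x and whose minimum sits at x = 0.
def descend(lr, x=1.0, steps=6):
    path = [round(x, 4)]
    for _ in range(steps):
        x -= lr * 2 * x  # one gradient step: x <- x - lr * f'(x)
        path.append(round(x, 4))
    return path

print(descend(lr=0.95))   # [1.0, -0.9, 0.81, -0.729, ...] bounces across x = 0
print(descend(lr=0.001))  # [1.0, 0.998, 0.996, ...]       crawls - looks stuck
```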
3. What is ReduceLROnPlateau?
A callback that automatically reduces the learning rate when training plateaus.
The Logic: "If the model hasn't improved in a while, let's try smaller steps!"
Example: suppose a plateau is detected at epoch 10 while the LR is 0.01; the callback halves it to 0.005 and the loss starts improving again. The same pattern can repeat at each later plateau. Section 4 walks through the bookkeeping step by step.
4. How ReduceLROnPlateau Works
| Epoch | Val Loss | Best So Far | Patience Counter | Action | Learning Rate |
|---|---|---|---|---|---|
| 1 | 0.500 | 0.500 | 0 | ✅ New best! | 0.01 |
| 2 | 0.450 | 0.450 | 0 | ✅ New best! | 0.01 |
| 3 | 0.455 | 0.450 | 1 | ⚠️ No improvement | 0.01 |
| 4 | 0.452 | 0.450 | 2 | ⚠️ No improvement | 0.01 |
| 5 | 0.451 | 0.450 | 3 → 0 (reset) | 🔻 REDUCE LR! (patience=3 reached) | 0.01 → 0.005 |
| 6 | 0.430 | 0.430 | 0 | ✅ New best! (smaller LR helps) | 0.005 |
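In plain Python, the table's bookkeeping looks roughly like this (a simplified sketch of the callback's core loop; it assumes no min_delta or cooldown):

```python
# Simplified sketch of the patience logic above - not the actual Keras source.
best, wait = float("inf"), 0
lr, factor, patience, min_lr = 0.01, 0.5, 3, 1e-6

for epoch, val_loss in enumerate([0.500, 0.450, 0.455, 0.452, 0.451, 0.430], start=1):
    if val_loss < best:
        best, wait = val_loss, 0           # new best: reset the counter
    else:
        wait += 1                          # no improvement: count this epoch
        if wait >= patience:               # patience exhausted...
            lr = max(lr * factor, min_lr)  # ...switch to smaller steps
            wait = 0
    print(f"epoch {epoch}: val_loss={val_loss:.3f}  wait={wait}  lr={lr}")
```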
5. Code Example
```python
from tensorflow.keras.callbacks import ReduceLROnPlateau

reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',  # Watch validation loss
    factor=0.5,          # Reduce LR by half (new_lr = old_lr * 0.5)
    patience=10,         # Wait 10 epochs without improvement
    min_lr=1e-6,         # Don't go below 0.000001
    verbose=1            # Print a message when the LR is reduced
)

model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=100,
    callbacks=[reduce_lr]
)
```
6. Key Parameters Explained
| Parameter | What It Does | Example |
|---|---|---|
| monitor | Which metric to watch | 'val_loss', 'val_accuracy' |
| patience | How many epochs to wait before reducing | 10 = wait 10 epochs with no improvement |
| factor | How much to reduce LR | 0.5 = cut in half, 0.2 = reduce to 20% |
| min_lr | Minimum learning rate (stop reducing here) | 1e-6 = don't go below 0.000001 |
| verbose | Print messages when LR is reduced | 1 = yes, 0 = no |
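One hedged variant (illustrative values): if you monitor an accuracy-style metric instead of a loss, set mode='max' so the callback treats an increase as improvement:

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau

# Illustrative configuration - values are examples, not recommendations.
reduce_lr_acc = ReduceLROnPlateau(
    monitor='val_accuracy',  # higher is better for accuracy...
    mode='max',              # ...so tell the callback ('auto' usually infers this)
    factor=0.2,              # reduce to 20% of the current LR
    patience=5,
    min_lr=1e-6,
    verbose=1
)
```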
7. Real Example Output
```
Epoch 1/100  loss: 0.6500 - val_loss: 0.6200
Epoch 10/100 loss: 0.3200 - val_loss: 0.3500
Epoch 20/100 loss: 0.3150 - val_loss: 0.3480
Epoch 21: ReduceLROnPlateau reducing learning rate to 0.0005.
Epoch 25/100 loss: 0.2800 - val_loss: 0.3100   ← Loss improving again!
Epoch 35/100 loss: 0.2750 - val_loss: 0.3050
Epoch 36: ReduceLROnPlateau reducing learning rate to 0.00025.
```
8. When to Use ReduceLROnPlateau?
✅ Use When:
- You're not sure what LR to use
- Training long-term (100+ epochs)
- You want automatic LR adjustment
- Working with small datasets
❌ Don't Use When:
- Running an LR-finder experiment (the LR must follow your schedule, not the callback's)
- Already using LearningRateScheduler (the two would fight over the learning rate)
- Training for very few epochs (patience never gets a chance to trigger)
- You need a fixed LR for a fair comparison between runs
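In practice, ReduceLROnPlateau is often paired with EarlyStopping: the model tries smaller steps first, and training stops only if those don't help either. A sketch with illustrative values (EarlyStopping's patience should exceed ReduceLROnPlateau's so the LR drop gets a chance to work):

```python
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, verbose=1)
early_stop = EarlyStopping(monitor='val_loss', patience=15, restore_best_weights=True)

model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=100,
    callbacks=[reduce_lr, early_stop]
)
```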
9. Summary
🎯 The Big Picture:
- Plateau = Model stops improving (stuck)
- ReduceLROnPlateau = Automatically makes learning rate smaller when stuck
- Why it works = Smaller steps let the model fine-tune and settle into a minimum it was previously bouncing over
- Result = Better final performance without manual tuning!
Think of it as: Your model's GPS saying "You're close to the destination, take smaller steps now!"