Complete Guide to Hyperparameter Tuning with Optuna

1. What is Hyperparameter Tuning?

A hyperparameter is anything you choose before training:

  • Learning rate
  • Number of LSTM units (e.g., 32, 64, 128)
  • Number of layers (1 LSTM vs 2 stacked BiLSTM)
  • Dropout rate (e.g. 0.2–0.5)
  • Batch size (e.g. 16, 32, 64)
  • Optimizer type ("adam" vs "rmsprop")

Hyperparameter tuning is the process of searching for the combination of these values that gives the best validation performance (e.g., highest validation accuracy or lowest validation loss).
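
To make this concrete, here is a minimal sketch of the same choices written as fixed constants in a Keras model (the input shape of 100 timesteps × 6 features is just a hypothetical example). Tuning simply searches over these values instead of hard-coding them:

import tensorflow as tf

# Hand-picked hyperparameters; tuning would search over these instead
learning_rate = 1e-3   # learning rate
lstm_units = 64        # number of LSTM units
dropout_rate = 0.3     # dropout rate
batch_size = 32        # batch size (passed later to model.fit)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100, 6)),   # hypothetical: 100 timesteps x 6 features
    tf.keras.layers.LSTM(lstm_units),
    tf.keras.layers.Dropout(dropout_rate),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)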

2. What is Optuna?

Optuna is a modern, Python-based library for automatic hyperparameter optimization. Instead of manually guessing hyperparameters, Optuna:

  • Defines a search space of possibilities.
  • Runs many trials (experiments) with different combinations.
  • Uses smart algorithms (like TPE) to focus on promising regions.
  • Optionally prunes bad trials early to save time.

Why Optuna? Key benefits:
  • Flexible: Works with any Python code (Keras, PyTorch, XGBoost, etc.).
  • Efficient: Uses Bayesian optimization (TPE) instead of naive grid search.
  • Supports pruning: Stops unpromising trials early.
  • Good for deep learning: Integrates with Keras through callbacks.

3. Core Concepts: Study, Trial, Objective

3.1 Objective Function

The objective(trial) function is the heart of Optuna. Optuna keeps calling this function with different trial objects. Inside it you:

  1. Ask for hyperparameters with trial.suggest_....
  2. Build and train a model using those hyperparameters.
  3. Return a scalar metric (e.g., best validation accuracy).

3.2 Trial

A trial = one complete run of the objective function with one particular set of hyperparameters.

  • trial.suggest_categorical(...) → picks one value from a list.
  • trial.suggest_float(...) → picks a float from a range.
  • trial.suggest_int(...) → picks an integer from a range.

For example, if you use n_trials=40, Optuna will run 40 different trials with different hyperparameters.
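
As a toy illustration (the parameter names and the dummy score below are made up for this sketch, not taken from the gait model), each trial asks Optuna for one concrete value per hyperparameter:

import optuna

def objective(trial):
    optimizer_name = trial.suggest_categorical("optimizer", ["adam", "rmsprop"])  # one value from a list
    dropout = trial.suggest_float("dropout", 0.2, 0.5)                            # float from a range
    units = trial.suggest_int("units", 32, 128, step=16)                          # integer from a range

    # Dummy score just to make the sketch runnable; a real objective would
    # train a model here and return its validation metric.
    return units * (1.0 - dropout) + (1.0 if optimizer_name == "adam" else 0.0)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=40)   # 40 trials = 40 different hyperparameter combinations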

3.3 Study

A study is an Optuna object that:

  • Manages all trials.
  • Stores trial results and hyperparameters.
  • Knows which trial is the best.

import optuna

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=40)

print("Best value:", study.best_value)
print("Best params:", study.best_params)

You can think of the study as the “experiment manager” that records everything.

4. Installation & Basic Template

4.1 Install Optuna

pip install optuna

4.2 Minimal Example (Non-Deep-Learning)

We want to maximize the function f(x) = -(x - 2)^2 + 10.

import optuna

def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return -(x - 2) ** 2 + 10

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)

print("Best value:", study.best_value)
print("Best params:", study.best_params)

Optuna will discover that the best x is around 2.

5. Defining Search Spaces

The search space is defined via trial.suggest_* methods. Each call tells Optuna:

  • What the hyperparameter is named (e.g. "learning_rate").
  • Which values are allowed (range or list).

Common Suggest Methods

  • suggest_categorical: choose from discrete options, e.g. trial.suggest_categorical("optimizer", ["adam", "rmsprop"])
  • suggest_float: continuous value from a uniform range, e.g. trial.suggest_float("dropout", 0.2, 0.5)
  • suggest_float(..., log=True): continuous value on a log scale (good for learning rates), e.g. trial.suggest_float("lr", 1e-5, 1e-3, log=True)
  • suggest_int: integer values, e.g. trial.suggest_int("lstm_units", 32, 128, step=16)

Example: BiLSTM Search Space
def build_model(trial, input_shape):
    lstm_units_1 = trial.suggest_categorical("lstm_units_1", [16, 32, 64, 96, 128])
    lstm_units_2 = trial.suggest_categorical("lstm_units_2", [8, 16, 24, 32, 48])
    dense_units  = trial.suggest_categorical("dense_units",  [16, 32, 48, 64])
    dropout      = trial.suggest_float("dropout_rate", 0.2, 0.5)
    use_second   = trial.suggest_categorical("use_second_lstm", [True, False])
    lr           = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    opt_choice   = trial.suggest_categorical("optimizer_choice", ["adam", "rmsprop"])
    ...

suggest_categorical means the trial must pick exactly one value from the list. It is NOT a range; it is a discrete set of allowed choices.

6. The Objective Function in Depth

6.1 General pattern

def objective(trial):
    # 1. Ask Optuna for hyperparameters
    hp1 = trial.suggest_...(...)
    hp2 = trial.suggest_...(...)

    # 2. Build model using these hyperparameters
    model = build_model(trial, input_shape)

    # 3. Train model (on train set, evaluate on validation set)
    history = model.fit(...)

    # 4. Compute scalar metric from history (e.g. best val_accuracy)
    score = max(history.history["val_accuracy"])

    # 5. Return that scalar
    return score

6.2 Who uses the returned value?

The value you return from objective() is used by study.optimize() behind the scenes:

  • Optuna records it as the result of the trial and uses it to decide how good that trial was.
  • Based on all returned values so far, the sampler explores more promising hyperparameters in the next trials.

In other words: the returned metric is how the trial is judged.
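
After a study has run, you can see exactly which value each trial returned; a short sketch (assuming a finished study object from study.optimize):

for t in study.trials:
    print(t.number, t.value, t.params)   # trial index, returned metric, hyperparameters used

print("Best trial:", study.best_trial.number, "with value", study.best_value)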

7. What to Return: Last vs Best Validation Accuracy

7.1 history.history Explained

When you call model.fit(), it returns a History object:

  • history.history is a dict: metric name → list of values per epoch.
  • history.history["val_accuracy"] = list of validation accuracies across epochs.
  • history.history["val_accuracy"][-1] = validation accuracy in the last epoch.
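
For instance (a sketch assuming history came from an earlier model.fit(..., validation_data=...) call):

print(history.history.keys())                # e.g. dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
print(history.history["val_accuracy"][-1])   # validation accuracy of the last epoch
print(max(history.history["val_accuracy"]))  # validation accuracy of the best epoch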

7.2 Last vs Max

If you return:

  • history.history["val_accuracy"][-1]: performance of the final epoch.
  • max(history.history["val_accuracy"]): best epoch performance.

With EarlyStopping (restore_best_weights=True), training often stops near the best epoch, but it’s still safer to use the max.

def objective(trial):
    ...
    history = model.fit(...)

    # Option 1: last validation accuracy
    # score = history.history["val_accuracy"][-1]

    # Option 2 (recommended with EarlyStopping): best validation accuracy
    score = max(history.history["val_accuracy"])
    return score

The validation accuracy is computed over the entire validation set, not per-single sample.

8. Samplers: TPE and Others

A sampler decides how Optuna chooses hyperparameters in each trial.

  • TPE (Tree-structured Parzen Estimator): default, Bayesian optimization.
  • RandomSampler: pure random search.
  • CmaEsSampler: CMA-ES (useful for continuous spaces).

TPE Sampler in a Study
import optuna

study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.TPESampler(seed=42)
)
study.optimize(objective, n_trials=40)

TPE learns from previous trials: it builds probability distributions of “good” vs “bad” hyperparameters and samples more often from the “good” area. This is why Optuna is usually faster than grid or random search.
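
If you want a baseline to compare TPE against, you can swap in a different sampler when creating the study; a quick sketch:

import optuna

# Pure random search baseline
random_study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.RandomSampler(seed=42),
)

# CMA-ES, suited to continuous search spaces; requires the cmaes package
# (categorical parameters fall back to independent sampling)
cmaes_study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.CmaEsSampler(seed=42),
)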

9. Pruning vs EarlyStopping

9.1 EarlyStopping (Keras)

EarlyStopping is used inside a single training run to stop when validation performance stops improving.

from tensorflow.keras.callbacks import EarlyStopping

es = EarlyStopping(
    monitor="val_loss",
    patience=10,
    restore_best_weights=True
)

history = model.fit(..., callbacks=[es])

It saves time and reduces overfitting for one model.

9.2 Pruning (Optuna + TFKerasPruningCallback)

Pruning operates at the trial level:

  • If a trial’s validation metric is clearly worse than other trials early on, Optuna stops this entire trial.
  • Then it moves on to a new set of hyperparameters.

from optuna.integration import TFKerasPruningCallback

def objective(trial):
    model = build_model(trial, input_shape)
    pruning_cb = TFKerasPruningCallback(trial, monitor="val_accuracy")

    history = model.fit(
        X_train, y_train,
        validation_data=(X_val, y_val),
        epochs=80,
        callbacks=[pruning_cb],
        verbose=0
    )

    return max(history.history["val_accuracy"])

Difference:
  • EarlyStopping: stops this model’s training early.
  • Pruning: stops this whole trial early and lets Optuna start another trial.
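
Pruning decisions come from the study’s pruner (Optuna’s default is MedianPruner). You can also configure it explicitly when creating the study; the numbers below are only illustrative:

study = optuna.create_study(
    direction="maximize",
    pruner=optuna.pruners.MedianPruner(
        n_startup_trials=5,   # never prune the first 5 trials
        n_warmup_steps=10,    # give each trial at least 10 epochs before pruning
    ),
)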

10. Keras / BiLSTM + Optuna Example

Below is a simplified BiLSTM tuning pipeline similar to your gait model.

import optuna
import tensorflow as tf
from optuna.integration import TFKerasPruningCallback
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Input, LSTM, Bidirectional, Dropout, Dense, GlobalAveragePooling1D
from tensorflow.keras.models import Model

def build_model_optuna(trial, input_shape):
    lstm1 = trial.suggest_categorical("lstm_units_1", [16, 32, 64, 96, 128])
    lstm2 = trial.suggest_categorical("lstm_units_2", [8, 16, 24, 32, 48])
    dense_units = trial.suggest_categorical("dense_units", [16, 32, 48, 64])
    dropout = trial.suggest_float("dropout_rate", 0.2, 0.5)
    use_second = trial.suggest_categorical("use_second_lstm", [True, False])
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    opt_choice = trial.suggest_categorical("optimizer_choice", ["adam", "rmsprop"])

    inputs = Input(shape=input_shape)
    x = Bidirectional(LSTM(lstm1, return_sequences=True))(inputs)
    x = Dropout(dropout)(x)

    if use_second:
        x = Bidirectional(LSTM(lstm2, return_sequences=False))(x)
    else:
        x = GlobalAveragePooling1D()(x)

    x = Dropout(dropout)(x)
    x = Dense(dense_units, activation="relu")(x)
    x = Dropout(dropout)(x)
    outputs = Dense(1, activation="sigmoid")(x)
    model = Model(inputs, outputs)

    if opt_choice == "adam":
        opt = tf.keras.optimizers.Adam(learning_rate=lr)
    else:
        opt = tf.keras.optimizers.RMSprop(learning_rate=lr)

    model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])
    return model

def objective(trial):
    model = build_model_optuna(trial, X_train_res.shape[1:])
    pruning = TFKerasPruningCallback(trial, monitor="val_accuracy")

    history = model.fit(
        X_train_res, y_train_res,
        validation_data=(X_val, y_val),
        epochs=80,
        batch_size=trial.suggest_categorical("batch_size", [16, 32, 64]),
        callbacks=[EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
                   pruning],
        verbose=0,
        class_weight=class_weight
    )

    return max(history.history["val_accuracy"])

study = optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler(seed=42))
study.optimize(objective, n_trials=40)

11. Using Best Hyperparameters for Final Training

After study.optimize(), you should rebuild a fresh model using the best hyperparameters, and then train it properly (with callbacks).

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau

best_params = study.best_trial.params
print("Best params:", best_params)

best_batch = best_params["batch_size"]
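# study.best_trial is a FrozenTrial: its suggest_* calls return the stored best
# values, so build_model_optuna can be reused unchanged to rebuild the model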
best_model = build_model_optuna(study.best_trial, X_train_res.shape[1:])

callbacks = [
    EarlyStopping(monitor="val_loss", patience=12, restore_best_weights=True),
    ModelCheckpoint("Model_biLSTM_optuna.keras", monitor="val_loss", save_best_only=True),
    ReduceLROnPlateau(monitor="val_loss", patience=6, factor=0.5, min_lr=1e-6)
]

best_model.fit(
    X_train_res, y_train_res,
    validation_data=(X_val, y_val),
    batch_size=best_batch,
    epochs=150,
    callbacks=callbacks,
    class_weight=class_weight,
    verbose=1
)

loss, acc = best_model.evaluate(X_test, y_test)
print("Test Accuracy:", acc)

Note: During tuning, we train multiple (cheaper) models to explore the space. After that, we train one final full model using the best hyperparameters.

12. Reproducibility & Seeds

For stable Optuna results (especially on GPU), set seeds and deterministic options:

import os, random, numpy as np, tensorflow as tf

SEED = 42

def set_seed():
    tf.random.set_seed(SEED)
    tf.keras.utils.set_random_seed(SEED)
    np.random.seed(SEED)
    random.seed(SEED)
    os.environ["PYTHONHASHSEED"] = str(SEED)
    os.environ["TF_DETERMINISTIC_OPS"] = "1"
    os.environ["TF_CUDNN_DETERMINISTIC"] = "1"
    tf.config.experimental.enable_op_determinism()

set_seed()

This reduces randomness and makes hyperparameter search more repeatable.

13. Common Pitfalls in Hyperparameter Tuning

Frequent Issues
  • Data leakage: Tuning hyperparameters using test data. Always keep a separate test set untouched until the very end (see the split sketch after this list).
  • Too few trials: With very complex models, 10–20 trials might not be enough. Try 40–100 trials if you can.
  • Too many hyperparameters: Huge search spaces make optimization harder. Start with a small but meaningful space.
  • Ignoring batch size: Batch size affects convergence and generalization; consider tuning it too.
  • Returning the wrong metric: Be sure the returned value matches your goal (e.g., maximizing validation accuracy).
  • Inconsistent final training: If you don’t use the tuned batch size / hyperparameters / callbacks in final training, performance might drop.
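
To avoid the data-leakage pitfall in particular, carve off the test set before tuning starts; a sketch using scikit-learn (split ratios are only an example, and X / y are assumed to be your full feature and label arrays):

from sklearn.model_selection import train_test_split

# Hold out a test set that Optuna never sees
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42
)
# Split the remainder into train / validation for the tuning loop
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.15, stratify=y_tmp, random_state=42
)
# Optuna tunes on (X_train, y_train) vs (X_val, y_val);
# (X_test, y_test) is evaluated only once, after the final model is trained.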

14. Advanced Features & Visualization

14.1 Saving & Loading a Study

study = optuna.create_study(
    direction="maximize",
    storage="sqlite:///optuna_study.db",
    study_name="bilstm_study",
    load_if_exists=True
)
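
Because the study lives in SQLite, you can reopen it later (for example in a new session) and either continue optimizing or just read the results; a sketch:

study = optuna.load_study(
    study_name="bilstm_study",
    storage="sqlite:///optuna_study.db",
)
study.optimize(objective, n_trials=20)   # continues from where the previous run stopped
print("Best params so far:", study.best_params)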

14.2 Visualization

import optuna.visualization as vis

vis.plot_optimization_history(study).show()
vis.plot_param_importances(study).show()
vis.plot_parallel_coordinate(study).show()

These interactive plots help you see:

  • How validation accuracy improved over trials.
  • Which hyperparameters are most important.
  • How hyperparameters interact.

15. Practical Checklist

  • ✅ Define a clean train / validation / test split.
  • ✅ Decide your objective metric (e.g., maximize val_accuracy).
  • ✅ Start with a reasonable search space (don’t go too crazy at first).
  • ✅ Implement objective(trial) that:
    • Builds a model from trial hyperparameters.
    • Trains with EarlyStopping + (optional) TFKerasPruningCallback.
    • Returns best validation metric.
  • ✅ Run study.optimize(...) with sufficient n_trials.
  • ✅ Rebuild a fresh model with study.best_trial and train fully.
  • ✅ Evaluate on test set only at the end.
  • ✅ Save the final model and best hyperparameters (see the sketch below).
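
For the last item, a minimal sketch of saving both artifacts (file names are just examples; study and best_model are the objects from Sections 10 and 11):

import json

# Save the tuned hyperparameters next to the model file
with open("best_params.json", "w") as f:
    json.dump(study.best_trial.params, f, indent=2)

# Save the fully trained final model in the native Keras format
best_model.save("Model_biLSTM_optuna_final.keras")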

With this, you have a full understanding of Optuna’s fundamentals and how to use it in deep learning projects (like your BiLSTM gait classifier). You can extend this template to more complex architectures, add attention, or use nested cross-validation if needed.