Hydra Advanced Configuration

LitDet leverages Hydra to provide a modular, hierarchical configuration system. Every run is governed by a composition of several YAML modules that define the data, model, and training behavior.

Figure: The high-level architecture of LitDet.

Hydra instantiates five primary objects based on your configuration. You can find these organized in the configs/ directory.
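To make the role of the `_target_` key used throughout the configs concrete, here is a minimal sketch of what instantiation from a config boils down to. It is a simplified stand-in for hydra.utils.instantiate, not Hydra's actual implementation: it ignores nested configs, `_partial_`, and `_convert_`. The helper names `locate` and `instantiate` are written for this example, and a standard-library class replaces a LitDet class so that the snippet runs without LitDet installed.

```python
import importlib


def locate(path: str):
    """Resolve a dotted path such as 'fractions.Fraction' to a Python object."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def instantiate(config: dict):
    """Simplified stand-in for hydra.utils.instantiate (flat configs only)."""
    kwargs = {key: value for key, value in config.items() if key != "_target_"}
    return locate(config["_target_"])(**kwargs)


# Stand-in config: a stdlib class plays the role of a LitDet class.
config = {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
obj = instantiate(config)
print(obj)  # prints 3/4
```

Every other key next to `_target_` is passed to the target as a keyword argument, which is why the YAML keys in the configs below mirror constructor parameters.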

Data

The data module configures the LightningDataModule, which handles all data loading, splitting, and preprocessing.

COCO configuration
_convert_: "object" # Convert ListConfig objects from transformation into a list
defaults:
  - _self_
  - transformation@train_transforms: train_transforms
  - transformation@test_transforms: test_transforms
  - transformation@predict_transforms: predict_transforms
  - transformation@post_transforms: post_transforms

_target_: litdet.data.coco_datamodule.COCODataModule
data_dir: ${paths.data_dir}
data_name: "COCO"
batch_size: 2 # Needs to be divisible by the number of devices (e.g., if in a distributed setup)
auto_partition_from_train: false
train_val_test_split: [0.8, 0.1, 0.1]
num_workers: 0
pin_memory: true
test_on_val_split: false
auto_download_dataset: false
include_background: false
is_contiguous: True

Dataset Selection: By default, LitDet uses COCO (configs/data/coco.yaml), but it also supports MNIST for classification.

Handling Missing Labels: If your dataset lacks test ground-truth (like official COCO), set data.test_on_val_split=True to use the validation set for final evaluation.

Automatic Partitioning: If you only have a training set, set data.auto_partition_from_train=True and data.train_val_test_split to automatically generate validation and test splits.
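As a rough illustration of how split fractions such as [0.8, 0.1, 0.1] can translate into sample counts, the sketch below floors each share and then hands out the leftover samples one by one. This follows the rounding rule that torch.utils.data.random_split uses for fractional lengths; whether LitDet partitions the training set in exactly this way is an assumption, and `split_lengths` is a hypothetical helper.

```python
import math


def split_lengths(num_samples: int, fractions: list[float]) -> list[int]:
    """Turn split fractions into sample counts that sum to num_samples."""
    if not math.isclose(sum(fractions), 1.0):
        raise ValueError("split fractions must sum to 1")
    lengths = [math.floor(num_samples * fraction) for fraction in fractions]
    # Hand out samples lost to rounding, one per split, starting with the first.
    for i in range(num_samples - sum(lengths)):
        lengths[i % len(lengths)] += 1
    return lengths


print(split_lengths(1000, [0.8, 0.1, 0.1]))
print(split_lengths(103, [0.8, 0.1, 0.1]))
```

With 103 samples the floored shares only add up to 102, so the remaining sample goes to the training split.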

Image Augmentations: Transforms are managed via sub-configs. You can edit the torchvision augmentations listed in these sub-configs (for example RandomHorizontalFlip or ScaleJitter).

Warning

LitDet distinguishes between user-defined data augmentations (e.g., RandomPhotometricDistort) and internal pre-processing functions (e.g., ToDtype). You can modify the former at runtime to suit your ML experiments. The latter are required for model integrity and are already defined within the library, so they must not be included in your transform configs.

Performance: To speed up data loading, increase data.num_workers so that batches are prepared in parallel worker processes, and set data.pin_memory=True so that batches are placed in pinned (page-locked) host memory, which makes transfers to the GPU faster.

Task

The task module defines the LightningModule for the object detection task, which contains the model architecture, loss functions, and optimization logic.

Detect task configuration
defaults:
  - _self_
  - model: faster_rcnn

_target_: litdet.tasks.detect_module.DetectLitModule

optimizer:
  _target_: torch.optim.Adam
  _partial_: true
  lr: 1e-3
  weight_decay: 1e-6

lr_decay_config: null

lr_scheduler_config: null

compile: false

model:
  _partial_: true

Model Architectures: You can swap between detection models by selecting an option from the task/model config group, such as Faster R-CNN (task/model=faster_rcnn) or SSD (task/model=ssd).

Model configuration
_target_: litdet.tasks.components.build_ssd300_vgg16

weights:
  _target_: hydra.utils.get_object
  path: torchvision.models.detection.SSD300_VGG16_Weights.DEFAULT

num_classes: 91

weights_backbone:
  _target_: hydra.utils.get_object
  path: torchvision.models.VGG16_Weights.DEFAULT

trainable_backbone_layers: 4

Depending on the model, you should define the number of classes (task.model.num_classes) and, optionally, pre-trained weights (task.model.weights). For a full list of pre-trained weights, check the torchvision documentation or our Model Zoo.

Optimization: Configure the training by choosing an optimizer (e.g. Adam [KB17]) and adjusting its base learning rate (task.optimizer.lr) or schedule (task.lr_scheduler_config).
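The optimizer config above sets `_partial_: true` because the model parameters are not yet known when Hydra reads the configuration. With that flag, Hydra returns a functools.partial instead of a finished object. The sketch below imitates this with a hypothetical `ToyOptimizer` class standing in for torch.optim.Adam, so that it runs without PyTorch. How LitDet completes the call internally is an assumption here.

```python
from functools import partial


class ToyOptimizer:
    """Hypothetical stand-in for torch.optim.Adam, so the sketch needs no PyTorch."""

    def __init__(self, params, lr=1e-3, weight_decay=0.0):
        self.params = list(params)
        self.lr = lr
        self.weight_decay = weight_decay


# Roughly what the optimizer config yields with `_partial_: true`:
# a callable with lr and weight_decay bound, still waiting for `params`.
optimizer_factory = partial(ToyOptimizer, lr=1e-3, weight_decay=1e-6)

# Later, once the model exists, the construction can be completed.
model_parameters = [0.5, -1.2]  # placeholder for model.parameters()
optimizer = optimizer_factory(params=model_parameters)
print(optimizer.lr, optimizer.weight_decay, len(optimizer.params))
```

Overriding task.optimizer.lr on the command line therefore only changes the value bound into the partial; the parameters are supplied afterwards.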

See below for an example of how to set up a learning rate scheduler with ReduceLROnPlateau.

Learning rate scheduler configuration
lr_scheduler_config:
    monitor: "train/loss"
    interval: "step"
    frequency: 1
    strict: true
    name: null
    scheduler:
      _target_: torch.optim.lr_scheduler.ReduceLROnPlateau
      _partial_: true
      mode: min
      factor: 0.1
      patience: 10000
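To give a feel for what mode, factor, and patience do in this configuration, here is a deliberately simplified pure-Python version of the plateau rule. It is a sketch only: the real torch.optim.lr_scheduler.ReduceLROnPlateau also supports a threshold, a cooldown, and a minimum learning rate, all of which are left out. `PlateauSchedule` is a hypothetical class written for this example.

```python
class PlateauSchedule:
    """Simplified 'min' mode plateau rule: no threshold, cooldown, or min_lr."""

    def __init__(self, lr: float, factor: float = 0.1, patience: int = 10):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_steps = 0

    def step(self, metric: float) -> float:
        if metric < self.best:
            self.best = metric
            self.bad_steps = 0
        else:
            self.bad_steps += 1
        # Reduce only once the stall has lasted longer than `patience` checks.
        if self.bad_steps > self.patience:
            self.lr *= self.factor
            self.bad_steps = 0
        return self.lr


schedule = PlateauSchedule(lr=1e-3, factor=0.1, patience=2)
losses = [1.0, 0.9, 0.9, 0.9, 0.9, 0.8]
history = [schedule.step(loss) for loss in losses]
print(history)
```

In the toy run the learning rate drops on the third consecutive check without improvement. With interval "step" and patience 10000 as configured above, the monitored train/loss would need to stall for more than 10000 consecutive checks before the learning rate is multiplied by 0.1.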

Note

Several features are currently in development, including support for state-of-the-art detection models like DINOv3 [SVS+25] and the implementation of layer-wise learning rate decay [BDPW22]. Furthermore, our roadmap includes expanding LitDet to support other core computer vision tasks, such as classification and segmentation.

Trainer

This module configures the Lightning Trainer, controlling the hardware and the high-level training loop.

Hardware: Switch between trainer=cpu, trainer=gpu or trainer=mps (for Apple Silicon).

Performance: Enable 16-bit mixed precision with trainer=mixed_precision for faster training on compatible GPUs.

Distributed Training and Inference: Use trainer=ddp to scale across multiple GPUs. We provide more information for a distributed setup in this section.
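The comment on data.batch_size in the COCO config says that the value needs to be divisible by the number of devices. The sketch below shows the kind of check this implies, assuming that data.batch_size is a global batch size shared evenly across devices. That interpretation, and the helper `batch_size_per_device`, are assumptions for illustration rather than LitDet's confirmed behavior.

```python
def batch_size_per_device(batch_size: int, num_devices: int) -> int:
    """Split a global batch size evenly across devices, or fail loudly."""
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    if batch_size % num_devices != 0:
        raise ValueError(
            f"batch_size ({batch_size}) is not divisible by "
            f"the number of devices ({num_devices})"
        )
    return batch_size // num_devices


print(batch_size_per_device(8, 4))  # each of the 4 devices gets 2 samples
```

Under this assumption, the default batch_size of 2 would work on one or two devices but not on four.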

Callbacks

Callbacks are modular tools injected into the training loop to add extra functionality.

Default callbacks used by LitDet
defaults:
  - model_checkpoint
  - model_best
  - model_summary
  - rich_progress_bar
  - prediction_writer
  - _self_

Custom Visualizers: Inspect augmented images before they enter the model with callbacks=batch_visualizer. You can also view model predictions with a custom score threshold with callbacks=prediction_visualizer.

Custom Writers: Easily write the predictions and images to disk with callbacks=batch_writer, and the annotation file with the prediction_writer callback.

Warning

The prediction_writer callback derives from the Lightning BasePredictionWriter callback and is available only during the predict stage.

Standard Tools: These include model_checkpoint for saving the best model checkpoints, model_best for saving the best model weights, early_stopping to prevent overfitting, and rich_progress_bar for readable console output.

Note

To combine multiple callbacks, use Hydra's list syntax: callbacks=[batch_writer,prediction_visualizer]. To disable all callbacks, use callbacks=none.

Loggers

Track your metrics and visualize experiment progress with popular Lightning loggers. In addition, LitDet provides specific features for Aim and TensorBoard, such as metrics visualization and predictions at each validation step.

Multirun

The multi-run feature is a powerful tool for automating hyperparameter sweeps or running the same training pipeline across different configurations. It allows you to launch multiple jobs sequentially or in parallel from a single command.

To trigger a multi-run, append the -m (or --multirun) flag to your command. You can then provide a comma-separated list of values for any configuration parameter. For example, to trigger a multirun training with different models:

light-train -m +experiment=my_config.yaml task/model=faster_rcnn,ssd,ssdlite

Hydra saves one output folder per model, following its multirun directory pattern.
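When several parameters are swept at once, Hydra's default sweeper launches one job per combination of values (a Cartesian product). The sketch below reproduces that expansion in plain Python to show how quickly the job count grows. `expand_sweep` is a hypothetical helper, not part of Hydra or LitDet, and it deliberately ignores bracketed list values such as callbacks=[a,b].

```python
from itertools import product


def expand_sweep(overrides: list[str]) -> list[list[str]]:
    """Expand comma-separated override values into one override list per job."""
    choices = []
    for override in overrides:
        key, _, values = override.partition("=")
        # Simplification: bracketed lists such as callbacks=[a,b] are not handled.
        choices.append([f"{key}={value}" for value in values.split(",")])
    return [list(job) for job in product(*choices)]


jobs = expand_sweep(
    ["task/model=faster_rcnn,ssd,ssdlite", "task.optimizer.lr=1e-3,1e-4"]
)
for index, job in enumerate(jobs):
    print(index, " ".join(job))
```

Sweeping three models and two learning rates yields six jobs, each with its own output folder.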