Quickstart¶

This page gets you from a blank environment to a written benchmark report in about 60 seconds.

1. Install¶

pip install 'rpx-benchmark[depth]'

See Installation for the full extras matrix.

2. List what's available¶

rpx ls        # tasks + ESD splits + required modalities
rpx models    # runnable adapter names (also shows deferred stubs)

Example output of rpx models:

Runnable models:
  depth_anything_v2_metric_indoor_base
  depth_anything_v2_metric_indoor_large
  depth_anything_v2_metric_indoor_small
  depth_pro
  metric3d_v2_vit_giant2
  metric3d_v2_vit_large
  metric3d_v2_vit_small
  unidepth_v2_vitb
  unidepth_v2_vitl
  zoedepth_nyu

Deferred (registered for visibility; raise on resolve):
  depth_anything_3
  prompt_depth_anything_vits
  video_depth_anything_large

3. Run a registered model on the Hard split¶

rpx bench monocular_depth --model depth_pro --split hard

4. Or run ANY HuggingFace depth checkpoint — zero code¶

rpx bench monocular_depth \
    --hf-checkpoint depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf \
    --split hard

Works with any model loadable via transformers.AutoModelForDepthEstimation.

5. Run segmentation¶

rpx bench object_segmentation \
    --hf-checkpoint facebook/mask2former-swin-tiny-coco-instance \
    --split hard

6. Read the output¶

Every run writes two files under ./rpx_results/<model>/<split>/:

result.json — machine-readable. Contains:
- aggregated: metrics averaged over the split.
- per_sample: one row per sample with metric values and id / phase / difficulty metadata, so downstream analysis can group back to scenes and phases without re-reading the manifest.
- deployment_readiness: Weighted Phase Score, State-Transition Robustness, Temporal Stability, FLOPs, median latency, parameter count.
summary.md — human-readable. Rendered with the same tables the CLI prints.

7. Live terminal output¶

If rich is installed (pulled in by [depth]), the CLI renders Claude-Code-style panels:

A header panel with model / split / device
Live progress bar during inference
Aggregated metric table
ESD-weighted phase score table with coloured Δ_int / Δ_rec
Efficiency table (params / FLOPs / latency)
Footer pointing at the output files

Pass --plain to disable the rich UI for CI logs / plain ssh / tmux weirdness.

8. Global flags¶

rpx --verbose <command>   # DEBUG-level logging
rpx --quiet <command>     # WARNING-level logging only

Exit codes:

Code	Meaning
`0`	Success
`1`	`RPXError` (config / dataset / model / metric / download failure)
`2`	CLI argument error (handled by argparse)
`130`	KeyboardInterrupt

All errors subclass rpx.RPXError and carry a hint line telling the user exactly what to fix.