Core types (rpx_benchmark.api)¶
The stable public surface: task enums, phase / difficulty labels,
prediction and ground-truth dataclasses, and the
BenchmarkModel abstract base.
api
¶
Core task enums, data contracts, and the base model interface.
This module is the stable public surface users bind against when they
plug a model into the benchmark. Everything here is either an
Enum, a plain @dataclass prediction/ground-truth container, or
the :class:BenchmarkModel abstract base that defines what a model
looks like to the runner.
The three pluggable systems that sit on top of these types are:

- :mod:rpx_benchmark.adapters turns an arbitrary model into a :class:BenchmarkModel-shaped object via the `InputAdapter / model / OutputAdapter` contract.
- :mod:rpx_benchmark.metrics is the task → calculator plugin registry.
- :mod:rpx_benchmark.tasks.registry is the task → runner plugin registry.
Stability
Enums and dataclasses in this module are append-only: adding new tasks or new fields is fine; renaming or removing them is a breaking change that requires a major version bump.
TaskType
¶
Bases: str, Enum
Enumeration of every task the benchmark toolkit recognises.
Members are plain strings so they serialise cleanly to JSON and can be used as dict keys for logging / table rows.
Members
- `MONOCULAR_DEPTH`: Dense metric depth from a single RGB frame.
- `OBJECT_DETECTION`: Closed-vocabulary detection with category labels.
- `OBJECT_SEGMENTATION`: Instance segmentation masks with per-pixel instance IDs.
- `OBJECT_TRACKING`: Multi-object tracking with persistent track IDs.
- `RELATIVE_CAMERA_POSE`: 6-DoF pose of frame B relative to frame A.
- `OPEN_VOCAB_DETECTION`: Detection conditioned on a free-text vocabulary.
- `VISUAL_GROUNDING`: Referring expression → bounding box on the image.
- `SPARSE_DEPTH`: Depth values at a sparse set of image locations only.
- `NOVEL_VIEW_SYNTHESIS`: RGB synthesis from a held-out target pose.
- `KEYPOINT_MATCHING`: Dense/sparse correspondences between two images.
Examples:
>>> from rpx_benchmark.api import TaskType
>>> TaskType.MONOCULAR_DEPTH.value
'monocular_depth'
>>> TaskType("monocular_depth") is TaskType.MONOCULAR_DEPTH
True
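The `str` + `Enum` mixin behaviour described above can be sketched with a stand-in enum (`TaskLike` below is illustrative only, not part of the package):

```python
import json
from enum import Enum

# Stand-in using the same str + Enum mixin pattern as TaskType.
# TaskLike is illustrative only, not part of rpx_benchmark.
class TaskLike(str, Enum):
    MONOCULAR_DEPTH = "monocular_depth"
    OBJECT_DETECTION = "object_detection"

# Members serialise as plain strings without a custom JSON encoder...
payload = json.dumps({"task": TaskLike.MONOCULAR_DEPTH})

# ...and work directly as dict keys for logging / table rows.
scores = {TaskLike.MONOCULAR_DEPTH: 0.91}

print(payload)  # {"task": "monocular_depth"}
```

Because each member *is* a string, round-tripping through JSON and looking members back up with `TaskLike("monocular_depth")` both work with no extra machinery.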
Phase
¶
Bases: str, Enum
Capture phases of the three-phase RPX reconfiguration protocol.
Every scene is recorded in three phases so the benchmark can attribute performance changes to scene state rather than to lighting / viewpoint / camera identity.
Members
- `CLUTTER`: Initial dense object arrangement; significant inter-object occlusion.
- `INTERACTION`: Human operator grasps and moves objects. Introduces hand-object contact and transient occlusion.
- `CLEAN`: Same objects re-organised sparsely. Serves as a within-scene control for the other two phases.
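Because every frame is tagged with a phase, attributing a score change to scene state reduces to a group-by. A minimal sketch (`PhaseLike` is a stand-in; its lowercase string values are an assumption following the `TaskType` naming pattern):

```python
from collections import defaultdict
from enum import Enum

# Stand-in for Phase; the lowercase values are an assumption
# modelled on the TaskType value style ('monocular_depth').
class PhaseLike(str, Enum):
    CLUTTER = "clutter"
    INTERACTION = "interaction"
    CLEAN = "clean"

# Toy per-frame scores, each tagged with its capture phase.
results = [
    (PhaseLike.CLUTTER, 0.62),
    (PhaseLike.CLEAN, 0.88),
    (PhaseLike.CLUTTER, 0.58),
]

# Group by phase so CLEAN can serve as the within-scene control.
by_phase = defaultdict(list)
for phase, score in results:
    by_phase[phase].append(score)

means = {p.value: sum(v) / len(v) for p, v in by_phase.items()}
print(means)
```

Comparing the `clutter` and `clean` means for the same scene isolates the effect of object arrangement from lighting, viewpoint, and camera identity.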
Difficulty
¶
Bases: str, Enum
Effort-Stratified Difficulty (ESD) split label.
ESD splits are derived per (scene, phase) from the
annotation-effort signal described in paper §4. See
:mod:rpx_benchmark.deployment for the scoring details.
Members
- `EASY`: Few annotation iterations, low occlusion, stable visibility.
- `MEDIUM`: Intermediate between the `EASY` and `HARD` regimes on the same signals.
- `HARD`: Many annotation iterations, dense occlusion, high depth-invalid fraction, high jerk.
Sample(id: str, rgb: np.ndarray, ground_truth: Any, metadata: Dict[str, Any] | None = None, phase: Phase | None = None, difficulty: Difficulty | None = None, camera_pose: np.ndarray | None = None)
dataclass
¶
One input unit handed by :class:RPXDataset to a model.
Samples are produced by the loader and consumed by
BenchmarkModel.predict. Every field is deliberately simple
(numpy arrays, enums, plain dicts) so models and adapters don't
need to know anything about the on-disk dataset format.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `id` | `str` | Unique identifier of the form … | required |
| `rgb` | `ndarray` | H × W × 3 uint8 RGB image in row-major order. | required |
| `ground_truth` | `Any` | Task-specific GroundTruth dataclass (e.g. :class:…). | required |
| `metadata` | `dict` | Free-form metadata the loader can attach; conventionally holds fisheye images, secondary RGB frames for pair tasks, and any label paths that do not fit into the ground-truth dataclass. Consumers should treat unknown keys as opaque. | `None` |
| `phase` | `Phase` | Capture phase the frame belongs to. Required for ESD-weighted phase scoring. | `None` |
| `difficulty` | `Difficulty` | ESD difficulty label of the … | `None` |
| `camera_pose` | `ndarray` | 4 × 4 float64 SE(3) matrix (camera → world) sourced from the T265 tracker. Used for the temporal-stability metric. | `None` |
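The field contract can be sketched with a stand-in dataclass (`SampleLike` mirrors the signature above but is not the real class, and the example `id` string is hypothetical):

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

import numpy as np

# Illustrative stand-in mirroring the Sample signature above;
# the real class lives in rpx_benchmark.api.
@dataclass
class SampleLike:
    id: str
    rgb: np.ndarray
    ground_truth: Any
    metadata: Optional[Dict[str, Any]] = None
    phase: Optional[str] = None
    difficulty: Optional[str] = None
    camera_pose: Optional[np.ndarray] = None

s = SampleLike(
    id="scene0001/clutter/000042",  # id format here is hypothetical
    rgb=np.zeros((480, 640, 3), dtype=np.uint8),
    ground_truth=None,              # would be a task-specific GroundTruth dataclass
    camera_pose=np.eye(4, dtype=np.float64),  # camera -> world SE(3)
)
assert s.rgb.shape == (480, 640, 3) and s.rgb.dtype == np.uint8
```

Every field is a numpy array, enum, or plain dict, so an adapter can consume a sample without touching the on-disk dataset format.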
BenchmarkModel
¶
Bases: ABC
Abstract base class every RPX-compatible model must implement.
In practice, most users should not subclass this directly —
instead compose a :class:rpx_benchmark.adapters.BenchmarkableModel
from an input adapter, a model callable, and an output adapter.
BenchmarkableModel already implements :meth:predict and
:meth:setup correctly for you.
Subclass only when you need complete control over how samples are routed to your model (e.g. true minibatching across GPU devices).
Attributes:

| Name | Type | Description |
|---|---|---|
| `task` | `TaskType` | The task this model solves. Must be set by subclasses (either at class level or in …). |
Examples:

Minimal subclass:

```python
class MyDepth(BenchmarkModel):
    task = TaskType.MONOCULAR_DEPTH

    def setup(self):
        self.net = load_my_checkpoint()

    def predict(self, batch):
        return [
            DepthPrediction(depth_map=self.net(s.rgb))
            for s in batch
        ]
```

Composed via :class:BenchmarkableModel:

```python
bm = rpx.BenchmarkableModel(
    task=TaskType.MONOCULAR_DEPTH,
    input_adapter=MyInputAdapter(),
    model=my_nn_module,
    output_adapter=MyOutputAdapter(),
    name="my_model",
)
```
setup() -> None
abstractmethod
¶
Load checkpoints, warm CUDA, and do any other one-time init.
The runner calls this exactly once before iterating the
dataset, unless BenchmarkRunner(call_setup=False) was
passed — in which case the caller is responsible.
Source code in rpx_benchmark/api.py
predict(batch: Sequence[Sample]) -> Sequence[Any]
abstractmethod
¶
Run inference on a batch of samples.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `batch` | sequence of `Sample` | One or more samples. Length equals … | required |
Returns:

| Type | Description |
|---|---|
| sequence | One task-specific Prediction dataclass per input sample, in the same order. The prediction dataclass must match what :class:… |
Raises:

| Type | Description |
|---|---|
| `ModelError` | (By convention) when a sample cannot be processed. The runner surfaces it as a clean error rather than a stack trace. |
Source code in rpx_benchmark/api.py
validate_prediction(task: TaskType, prediction: Any, sample: Sample | None = None) -> None
¶
Validate a Prediction dataclass's shape and type for a given task.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `task` | `TaskType` | Task the runner is evaluating. | required |
| `prediction` | `Any` | Prediction dataclass the model just returned. | required |
| `sample` | `Sample` | The sample the prediction was produced for; used for shape cross-checks (e.g. segmentation mask vs RGB size). | `None` |
Raises:

| Type | Description |
|---|---|
| `ModelError` | If the prediction is the wrong type or the wrong shape for the task. |
Source code in rpx_benchmark/api.py
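The kind of shape cross-check described above can be sketched as follows; this is an illustrative stand-alone function for a depth-like task, not the actual `validate_prediction` implementation, and `ModelErrorLike` stands in for the package's `ModelError`:

```python
import numpy as np

class ModelErrorLike(Exception):
    """Stand-in for rpx_benchmark's ModelError."""

def check_depth_prediction(depth_map, rgb) -> None:
    # Reject the wrong type outright...
    if not isinstance(depth_map, np.ndarray):
        raise ModelErrorLike(f"depth_map must be ndarray, got {type(depth_map)!r}")
    # ...and cross-check shape against the sample's RGB frame:
    # a dense depth map must align pixel-for-pixel with the image.
    if depth_map.shape != rgb.shape[:2]:
        raise ModelErrorLike(
            f"depth shape {depth_map.shape} != image shape {rgb.shape[:2]}"
        )

rgb = np.zeros((480, 640, 3), dtype=np.uint8)
check_depth_prediction(np.ones((480, 640), dtype=np.float32), rgb)  # passes
try:
    check_depth_prediction(np.ones((240, 320), dtype=np.float32), rgb)
except ModelErrorLike as e:
    print("rejected:", e)
```

Raising a dedicated error type, rather than letting a bare `AssertionError` or `IndexError` escape, is what lets the runner surface a clean per-sample failure instead of a stack trace.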