NRTK Tutorial

A guide to running NRTK via checkmaite.

What is NRTK?

The Natural Robustness Toolkit (NRTK) is an open source toolkit for generating image perturbations that mimic conditions an imaging system may encounter in the real world.

See the official NRTK documentation for the latest information.

Background and Overview

NRTK Components

In the context of checkmaite, these two NRTK interfaces are most relevant:

Perturbers - Algorithms that modify an image according to a preconfigured set of rules. For example, a brightness perturber with its factor set to 0.5 halves the brightness of the input image.
Perturbation Factories - Generators of Perturbers of a common class whose input configurations vary along one or more dimensions. Theta keys are the names of the parameters that vary, and thetas are the values those parameters take in each generated perturber.
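
The relationship between a perturber, a theta key, and its thetas can be sketched in plain Python. This is only an illustration: `BrightnessSketch` and `factory_sketch` are hypothetical names invented here, not NRTK classes, and the image is a toy nested list rather than a real array.

```python
from dataclasses import dataclass
from typing import Iterator, List

Image = List[List[int]]  # toy grayscale image: rows of 0-255 pixel values


@dataclass(frozen=True)
class BrightnessSketch:
    """Hypothetical stand-in for a perturber: scales every pixel by `factor`."""

    factor: float

    def __call__(self, image: Image) -> Image:
        # Scale each pixel and clip to the valid 0-255 range.
        return [[min(255, round(px * self.factor)) for px in row] for row in image]


def factory_sketch(perturber_cls, theta_key: str, thetas: List[float]) -> Iterator:
    """Hypothetical stand-in for a factory: yields one perturber per theta."""
    for theta in thetas:
        yield perturber_cls(**{theta_key: theta})


image = [[100, 200], [50, 250]]
perturbers = list(factory_sketch(BrightnessSketch, "factor", [0.5, 1.0, 1.5]))
halved = perturbers[0](image)
print(halved)  # [[50, 100], [25, 125]]
```

Here "factor" plays the role of the theta key, and 0.5, 1.0, and 1.5 are the thetas, so the factory yields three perturbers of the same class.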

The checkmaite NRTK Capability

In short, the NRTK checkmaite capability works as follows:

Specify one MAITE-compatible model, one dataset, and one metric. (You will also need to specify a threshold value for displaying results.)
Generate an NRTK Robustness capability configuration which specifies a Perturbation Factory, including:
- A Perturber which specifies an algorithm. For example, AverageBlurPerturber applies blurring to an image.
- A Perturbation Factory which specifies one or more theta_keys and any additional parameters made available by the implementation. In the example that follows, the perturber factory accepts one theta key, as well as start, stop, and step. (See definitions below.)
Execute the capability, which will:
- Apply each yielded perturbation (i.e. possible combination of values for theta_keys) to the dataset
- Run the configured metric against each perturbed dataset
- Generate a report that displays the change in the model performance (as measured by the provided metric) as each theta_key changes within its specified range.
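
The execution steps above amount to a loop over perturbations. The sketch below mimics that loop with toy stand-ins; `toy_model`, `accuracy`, and `sweep` are hypothetical helpers written for this illustration, and none of them is part of checkmaite, MAITE, or NRTK.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, int]  # (toy "image" value, ground-truth label)


def toy_model(x: float) -> int:
    """Hypothetical model: predicts 1 when the input is brighter than 0.5."""
    return 1 if x > 0.5 else 0


def accuracy(predictions: List[int], labels: List[int]) -> float:
    """Hypothetical metric: fraction of correct predictions."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


def sweep(dataset: List[Sample], thetas: List[float]) -> Dict[float, float]:
    """Perturb the whole dataset once per theta and score the model each time."""
    results = {}
    for theta in thetas:
        perturbed = [x * theta for x, _ in dataset]  # toy perturbation: dimming
        predictions = [toy_model(x) for x in perturbed]
        results[theta] = accuracy(predictions, [label for _, label in dataset])
    return results


dataset = [(0.9, 1), (0.6, 1), (0.2, 0), (0.7, 1)]
scores = sweep(dataset, thetas=[1.0, 0.8, 0.5])
print(scores)  # {1.0: 1.0, 0.8: 0.75, 0.5: 0.25}
```

The resulting mapping from theta to metric value is the kind of data a robustness report plots: one score per perturbation setting.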

Keep in mind these two considerations when using NrtkRobustness:

Be aware of performance considerations. For each perturber configuration, the NrtkRobustness capability must perturb the entire dataset and generate predictions over it, and the number of perturber configurations is the product of the number of values for each theta key. For example, if three theta keys are specified and each key takes five values, the capability will run 5*5*5=125 prediction passes over the entire dataset.
NrtkRobustness does NOT save the image perturbations - only the metrics resulting from the theta key changes.
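
To estimate the cost of a sweep before running it, the configuration count can be computed directly. This is a rough sketch: the theta keys, their values, and the dataset size below are made-up numbers chosen only to reproduce the 5*5*5 example.

```python
from itertools import product

# Hypothetical theta keys, each taking five values.
theta_grid = {
    "ksize": [1, 3, 5, 7, 9],
    "factor": [0.2, 0.4, 0.6, 0.8, 1.0],
    "noise_level": [0, 1, 2, 3, 4],
}

# Every combination of theta values is one perturber configuration.
configs = [dict(zip(theta_grid, combo)) for combo in product(*theta_grid.values())]

num_images = 1000  # hypothetical dataset size
n_configs = len(configs)
total_inferences = n_configs * num_images

print(n_configs)         # 125
print(total_inferences)  # 125000
```

Adding a fourth theta key with five values would multiply the total by five again, which is why it helps to keep the number of keys and values small.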

Demonstration

In plain language, we will measure how a model's accuracy degrades across five levels of increasing camera blur.

This is the NrtkRobustness capability configuration which we will use as our baseline as we demonstrate the features of the NRTK stack.

{
    "name": "natural_robustness_blur_model_and_sim",
    "perturber_factory": {
        "type": "nrtk.impls.perturb_image_factory.PerturberStepFactory",
        "nrtk.impls.perturb_image_factory.PerturberStepFactory": {
            "perturber": "nrtk.impls.perturb_image.photometric.blur.AverageBlurPerturber",
            "theta_key": "ksize",
            "start": 1,
            "stop": 10,
            "step": 2,
            "to_int": True
        }
    }
}
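
For intuition about what the "ksize" parameter controls, here is a minimal pure-Python box blur. It assumes that ksize is the side length of a square averaging window, as in a conventional average (box) blur; the function `box_blur` is a hypothetical illustration, and its edge handling (clamping to the nearest pixel) may differ from what AverageBlurPerturber does.

```python
from typing import List

Image = List[List[float]]  # toy grayscale image


def box_blur(image: Image, ksize: int) -> Image:
    """Replace each pixel with the mean of its ksize x ksize neighbourhood.

    Assumes an odd ksize; out-of-range neighbours are clamped to the edge.
    """
    height, width = len(image), len(image[0])
    radius = ksize // 2
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            row.append(sum(window) / len(window))
        blurred.append(row)
    return blurred


image = [
    [0.0, 0.0, 0.0],
    [0.0, 9.0, 0.0],
    [0.0, 0.0, 0.0],
]
unchanged = box_blur(image, ksize=1)  # a 1x1 window leaves the image as-is
smoothed = box_blur(image, ksize=3)   # the bright centre is spread out
print(smoothed[1][1])  # 1.0
```

Under this assumption, a ksize of 1 acts as an unperturbed baseline and larger values produce progressively stronger blur.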

We will use a PerturberStepFactory, which generates evenly spaced perturbation values across the provided range. It has the following relevant configuration options:

perturber – Python implementation type of the PerturbImage interface to produce.
theta_key – Perturber parameter to vary between instances.
start – Initial value of desired range (inclusive).
stop – Final value of desired range (exclusive).
step – Step size between successive values in the range.
to_int – Whether to cast the generated values to int.
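
As a quick check on the configuration used in this demonstration, the sketch below enumerates the values such a factory would produce. It assumes the factory walks the half-open range from start to stop in increments of step, which is consistent with the five blur levels evaluated here; `step_values` is a hypothetical helper, not an NRTK function.

```python
from typing import List, Union

Number = Union[int, float]


def step_values(start: Number, stop: Number, step: Number, to_int: bool = True) -> List[Number]:
    """List the theta values in [start, stop), spaced `step` apart."""
    values = []
    current = start
    while current < stop:
        values.append(int(current) if to_int else current)
        current += step
    return values


ksizes = step_values(start=1, stop=10, step=2)
print(ksizes)  # [1, 3, 5, 7, 9]
```

With start=1, stop=10, and step=2, this yields five kernel sizes, and the stop value of 10 is never reached.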

To begin, we create the three necessary MAITE-wrapped objects: a dataset, a compatible model, and a metric.

First, a VisdroneDetectionDataset wrapper is generated using data found in our test directory, a three-image subset of the Visdrone test dataset.

Then a VisdroneODModel wrapper with the resnet18 architecture is generated, using weights pre-trained by Kitware, Inc.

Finally, the default map50_torch_metric_factory function is used to generate a mean average precision metric to be used for evaluating performance.

from pathlib import Path
from checkmaite.core.object_detection.dataset_loaders import VisdroneDetectionDataset
from checkmaite.core.object_detection.models import VisdroneODModel
from checkmaite.core.object_detection.metrics import map50_torch_metric_factory

BASE_DIR = Path.cwd().parents[1]
dataset_root_path = BASE_DIR / "tests/data_for_tests/visdrone_dataset"
dataset_ann_file_path = BASE_DIR / "tests/data_for_tests/visdrone_dataset/ann_file.json"
model_name = "resnet18"

print("Loading Visdrone dataset...")
# Use the example Visdrone dataset
dataset = VisdroneDetectionDataset(root=dataset_root_path)
print(f"Dataset loaded with {len(dataset)} images")

model = VisdroneODModel(arch=model_name)

metric = map50_torch_metric_factory()
metric.metadata["id"] = metric.return_key

/home/runner/work/checkmaite/checkmaite/.venv/lib/python3.10/site-packages/xaitk_saliency/__init__.py:3: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources

/home/runner/work/checkmaite/checkmaite/.venv/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

Loading Visdrone dataset...
Dataset loaded with 3 images

/home/runner/work/checkmaite/checkmaite/.venv/lib/python3.10/site-packages/smqtk_detection/impls/detect_image_objects/centernet.py:897: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast()
/home/runner/work/checkmaite/checkmaite/.venv/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py:54: UserWarning: CUDA is not available or torch_xla is imported. Disabling autocast.
  super().__init__(

Next, we initialize an NrtkRobustness object. We load the dataset, model, and metric wrapped above.

After loading all the necessary stage inputs, we run the tool.

from checkmaite.core.object_detection import NrtkRobustness, NrtkRobustnessConfig

nrtk_config_dict = {
    "name": "natural_robustness_blur_model_and_sim",
    "perturber_factory": {
        "type": "nrtk.impls.perturb_image_factory.PerturberStepFactory",
        "nrtk.impls.perturb_image_factory.PerturberStepFactory": {
            "perturber": "nrtk.impls.perturb_image.photometric.blur.AverageBlurPerturber",
            "theta_key": "ksize",
            "start": 1,
            "stop": 10,
            "step": 2,
            "to_int": True
        }
    },
}
nrtk_config = NrtkRobustnessConfig(**nrtk_config_dict)
capability = NrtkRobustness()
output = capability.run(use_cache=False, datasets=[dataset], models=[model], metrics=[metric], config=nrtk_config)

  0%|          | 0/3 [00:00<?, ?it/s]

 33%|███▎      | 1/3 [00:02<00:05,  2.91s/it]

 67%|██████▋   | 2/3 [00:05<00:02,  2.52s/it]

100%|██████████| 3/3 [00:07<00:00,  2.38s/it]

100%|██████████| 3/3 [00:07<00:00,  2.45s/it]

/home/runner/work/checkmaite/checkmaite/.venv/lib/python3.10/site-packages/torchmetrics/utilities/prints.py:43: UserWarning: Encountered more than 100 detections in a single image. This means that certain detections with the lowest scores will be ignored, that may have an undesirable impact on performance. Please consider adjusting the `max_detection_threshold` to suit your use case. To disable this warning, set attribute class `warn_on_many_detections=False`, after initializing the metric.
  warnings.warn(*args, **kwargs)

  0%|          | 0/3 [00:00<?, ?it/s]

 33%|███▎      | 1/3 [00:02<00:04,  2.21s/it]

 67%|██████▋   | 2/3 [00:04<00:02,  2.20s/it]

100%|██████████| 3/3 [00:06<00:00,  2.20s/it]

100%|██████████| 3/3 [00:06<00:00,  2.20s/it]

  0%|          | 0/3 [00:00<?, ?it/s]

 33%|███▎      | 1/3 [00:02<00:04,  2.20s/it]

 67%|██████▋   | 2/3 [00:04<00:02,  2.20s/it]

100%|██████████| 3/3 [00:06<00:00,  2.20s/it]

100%|██████████| 3/3 [00:06<00:00,  2.20s/it]

  0%|          | 0/3 [00:00<?, ?it/s]

 33%|███▎      | 1/3 [00:02<00:04,  2.20s/it]

 67%|██████▋   | 2/3 [00:04<00:02,  2.20s/it]

100%|██████████| 3/3 [00:06<00:00,  2.21s/it]

100%|██████████| 3/3 [00:06<00:00,  2.21s/it]

  0%|          | 0/3 [00:00<?, ?it/s]

 33%|███▎      | 1/3 [00:02<00:04,  2.20s/it]

 67%|██████▋   | 2/3 [00:04<00:02,  2.21s/it]

100%|██████████| 3/3 [00:06<00:00,  2.21s/it]

100%|██████████| 3/3 [00:06<00:00,  2.21s/it]

Report

Once the run has completed, the code below uses the markdown output functionality to create a report of the results of the NRTK capability. We also set a threshold, which becomes an input for the visualizations in the reports rendered by this capability. In the example below we use 0.15, a value chosen arbitrarily so that it appears relevant next to the actual metric outputs.

from checkmaite.core.report._markdown import create_markdown_output

# construct MD report with summarized results

THRESHOLD = 0.15

report_path = Path("report")
report_path.mkdir(parents=True, exist_ok=True)
report_filename = 'NRTK_Sample_Report.md'
report = create_markdown_output(output.collect_md_report(threshold=THRESHOLD), report_path, report_filename)
print(f"Markdown report saved in {report_path}.")

Markdown report saved in report.