Image Classification Workflow via API¶
How to evaluate common workflows via the checkmaite API
This notebook demonstrates a full end-to-end workflow using the checkmaite backend. The general steps are:
- Create wrapper objects as necessary (not all wrappers are needed for every tool):
  - Create a model wrapper
  - Create a dataset wrapper
  - Create a metric wrapper
- Create a configuration that defines the analyses to run
- Generate "capability" object(s)
- Run analyses
All checkmaite wrappers are maite-compliant (https://mit-ll-ai-technology.github.io/maite/).
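"MAITE-compliant" means the wrappers expose the structural interfaces that MAITE expects, so any tool written against those interfaces can consume them. The sketch below is a simplified, hypothetical illustration loosely modeled on MAITE's dataset, model, and metric protocols. The class names, method signatures, and the evaluate helper are assumptions written for this example and are not imported from maite; consult the MAITE documentation for the real definitions.

```python
from __future__ import annotations

from typing import Any, Protocol, Sequence, runtime_checkable


# Hypothetical, simplified stand-ins loosely modeled on MAITE's protocols;
# they are NOT imported from maite and omit details such as metadata typing.
@runtime_checkable
class Dataset(Protocol):
    def __len__(self) -> int: ...
    def __getitem__(self, index: int) -> tuple[Any, Any, dict]: ...


@runtime_checkable
class Model(Protocol):
    def __call__(self, batch: Sequence[Any]) -> Sequence[Any]: ...


@runtime_checkable
class Metric(Protocol):
    def reset(self) -> None: ...
    def update(self, preds: Sequence[Any], targets: Sequence[Any]) -> None: ...
    def compute(self) -> dict[str, Any]: ...


class ToyDataset:
    """Wraps (input, label) pairs and returns (input, target, metadata) triples."""

    def __init__(self, items: list[tuple[float, int]]):
        self.items = items

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, index: int) -> tuple[float, int, dict]:
        x, y = self.items[index]
        return x, y, {"id": index}


class ThresholdModel:
    """Predicts class 1 for inputs above 0.5, else class 0."""

    def __call__(self, batch: Sequence[float]) -> list[int]:
        return [1 if x > 0.5 else 0 for x in batch]


class ToyAccuracy:
    """Accumulates correct/total counts across batches."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.correct = 0
        self.total = 0

    def update(self, preds: Sequence[int], targets: Sequence[int]) -> None:
        self.correct += sum(p == t for p, t in zip(preds, targets))
        self.total += len(targets)

    def compute(self) -> dict[str, float]:
        return {"accuracy": self.correct / self.total if self.total else 0.0}


def evaluate(model: Model, dataset: Dataset, metric: Metric) -> dict[str, Any]:
    """Run the model over the dataset one datum at a time and compute the metric."""
    metric.reset()
    for i in range(len(dataset)):
        x, y, _metadata = dataset[i]
        metric.update(model([x]), [y])
    return metric.compute()


dataset = ToyDataset([(0.2, 0), (0.9, 1), (0.7, 0)])
# 2 of the 3 predictions match their labels
result = evaluate(ThresholdModel(), dataset, ToyAccuracy())
```

Because the protocols are structural, the toy classes never inherit from them; they only need matching methods, which is the same idea that lets checkmaite's wrappers plug into MAITE-based tooling.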
IMPORTANT:
- Because this example uses synthetic, nearly all-black images, the values in the results are meaningless. Use full datasets with models trained on similar data to obtain meaningful results.
- This notebook requires a dev install because it uses data from the test suite.
import json
import os
import warnings
from pathlib import Path
from pprint import pprint as print  # intentionally pretty-print all outputs below

import torch

# configure warning filters before importing checkmaite so import-time warnings are also silenced
for cat in (UserWarning, FutureWarning, RuntimeWarning):
    warnings.filterwarnings("ignore", category=cat)

from checkmaite import cache_path
from checkmaite._docs import create_expandable_output
Set cache path¶
Checkmaite has a default output cache path, but it can be overridden as needed. Note that the cache is only written if use_prediction_and_evaluation_cache=True is passed to the capability run.
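The cell output below shows cache_path() acting as both a getter and a setter: with no argument it returns the active directory, and with a relative path such as 'tscache' it switches to that directory resolved against the notebook's working directory. The following is a minimal sketch of that getter/setter pattern. The function name cache_path_sketch and its module-level state are hypothetical, chosen so the sketch does not shadow checkmaite's cache_path; the real implementation may differ.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical default, mirroring the default path shown in the cell output below;
# checkmaite's real implementation may compute it differently.
_DEFAULT_CACHE = Path.home() / ".cache" / "checkmaite"
_override: Path | None = None


def cache_path_sketch(new_path: str | Path | None = None) -> Path:
    """Return the active cache directory, optionally overriding it first."""
    global _override
    if new_path is not None:
        # Relative paths are resolved against the current working directory.
        _override = Path(new_path).resolve()
    return _override if _override is not None else _DEFAULT_CACHE


default = cache_path_sketch()
custom = cache_path_sketch("tscache")
```

Storing the resolved absolute path at override time means later calls keep pointing at the same directory even if the working directory changes afterwards.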
print(f'The default cache path is {cache_path()}')
# set the cache path
cache_path('tscache')
print(f'The current cache path is {cache_path()}')
'The default cache path is /home/runner/.cache/checkmaite'
('The current cache path is '
'/home/runner/work/checkmaite/checkmaite/docs/get-started/tscache')
Define task¶
task = 'image_classification'
Create a model wrapper¶
Model wrappers hold the model object, weights and metadata.
Checkmaite has one image classification model wrapper, TorchvisionICModel, which wraps torchvision models. Below are examples of each of its available configuration options.
Torchvision IC Model using default weights from Torchvision¶
from checkmaite.core.image_classification.models import TorchvisionICModel
model_name = "resnext50_32x4d"
model_wrapper = TorchvisionICModel(model_name=model_name)
Downloading: "https://download.pytorch.org/models/resnext50_32x4d-1a0047aa.pth" to /home/runner/.cache/torch/hub/checkpoints/resnext50_32x4d-1a0047aa.pth
Torchvision IC Model using custom weights and config¶
Note: the configuration file is a JSON text file with an index2label key whose value is a dictionary mapping each class index to its category label ({index: category label}).
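One detail worth knowing when writing this file by hand: JSON object keys are always strings, so integer class indices are written as "0", "1", and so on, and come back as strings when the file is read with the standard json module. Whether TorchvisionICModel converts them back internally is not shown here. The sketch below writes and reads an index2label config with only the standard library; save_config and load_index2label are hypothetical helper names.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def save_config(index2label: dict[int, str], path: Path) -> None:
    """Write {"index2label": {...}} as JSON."""
    with open(path, "w") as f:
        json.dump({"index2label": index2label}, f)


def load_index2label(path: Path) -> dict[int, str]:
    """Read index2label back, restoring integer keys."""
    with open(path) as f:
        raw = json.load(f)["index2label"]
    # JSON object keys are always strings, so "0" must be turned back into 0.
    return {int(k): v for k, v in raw.items()}


with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "my_model.json"
    save_config({0: "cat", 1: "dog"}, config_file)
    raw_text = config_file.read_text()
    restored = load_index2label(config_file)
```

Converting keys back to int on load keeps lookups such as index2label[predicted_index] working when the prediction is an integer.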
# save the metadata and state_dict of the model created in the previous cell to disk
config_path = cache_path() / "my_model.json"
pickle_path = cache_path() / "my_model.pt"
os.makedirs(os.path.dirname(config_path), exist_ok=True)
with open(config_path, "w") as f:
    json.dump({"index2label": model_wrapper.index2label}, f)
_ = torch.save(model_wrapper.model.state_dict(), pickle_path)

# extra kwargs to pass through to the torchvision model object
kwargs = {}

# rebuild the same architecture from the saved weights and config
model_wrapper = TorchvisionICModel(
    model_name=model_name,
    weights_path=pickle_path,
    config_path=config_path,
    model_id="arbitraryidnumber",
    **kwargs,
)
Create dataset wrapper¶
Dataset wrappers control the access to the dataset and contain metadata about the dataset and about individual images.
Checkmaite has one dataset wrapper for image classification, YoloClassificationDataset, which loads YOLO-formatted data. The cell below generates a dummy YOLO-formatted dataset and then shows how to load it.
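The dummy dataset uses a folder-per-class layout: root_dir/split/class_name/image.jpg. The stdlib sketch below builds the same layout with empty placeholder files and indexes it into (path, label) pairs. It assumes class indices follow the alphabetical order of the class folder names; that rule, and the helper names build_dummy_layout and index_layout, are assumptions for illustration rather than confirmed YoloClassificationDataset behavior.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def build_dummy_layout(root: Path, split: str, classes: list[str], n: int) -> None:
    """Create root/split/<class>/<i>_<class>.jpg placeholder files."""
    for name in classes:
        class_dir = root / split / name
        class_dir.mkdir(parents=True, exist_ok=True)
        for i in range(n):
            # Empty placeholder files; a real dataset contains image data.
            (class_dir / f"{i}_{name}.jpg").touch()


def index_layout(root: Path, split: str) -> tuple[list[str], list[tuple[Path, int]]]:
    """Return class names and (image path, label index) pairs for one split."""
    split_dir = root / split
    # Assumption: class indices follow the sorted order of the folder names.
    class_names = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    samples = []
    for label, name in enumerate(class_names):
        for img in sorted((split_dir / name).glob("*.jpg")):
            samples.append((img, label))
    return class_names, samples


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "temp_yolo_dataset"
    build_dummy_layout(root, "test", ["cat", "dog"], n=3)
    names, samples = index_layout(root, "test")
```

Sorting both folders and files makes the label assignment and sample order deterministic regardless of the file system's listing order.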
from PIL import Image

from checkmaite.core.image_classification.dataset_loaders import YoloClassificationDataset

# generate a dummy YOLO-formatted classification dataset: root_dir/split/class_name/*.jpg
classes = ["cat", "dog"]
num_images_per_class = 3
img_shape = (64, 128)  # PIL size is (width, height)
root_dir = Path('temp_yolo_dataset').resolve()
split = 'test'
os.makedirs(root_dir / split, exist_ok=True)
for class_name in classes:
    class_dir = root_dir / split / class_name
    os.makedirs(class_dir, exist_ok=True)
    for i in range(num_images_per_class):
        # solid, nearly black image with pixel value i in every channel
        img = Image.new("RGB", img_shape, color=(i, i, i))
        img.save(class_dir / f"{i}_{class_name}.jpg")

# load the dataset
dataset_wrapper = YoloClassificationDataset(
    dataset_id="temp_yolo_dataset", root_dir=root_dir, split=split
)
Create metric wrapper¶
Metric wrappers provide standardized access to metric algorithms across the PyData ecosystem.
Checkmaite has two image classification metric wrappers: accuracy and F1 score.
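The factory name accuracy_multiclass_torch_metric_factory suggests a torch-based implementation; the plain-Python sketch below only illustrates what the two quantities measure and is not checkmaite's code. It follows the reset/update/compute pattern common to streaming metrics, reports accuracy as correct/total, and reports macro F1 as the unweighted mean of per-class F1 = 2TP / (2TP + FP + FN). The class name MulticlassScores is hypothetical.

```python
from __future__ import annotations


class MulticlassScores:
    """Accumulates predictions and reports accuracy and macro-averaged F1."""

    def __init__(self, num_classes: int):
        self.num_classes = num_classes
        self.reset()

    def reset(self) -> None:
        self.tp = [0] * self.num_classes
        self.fp = [0] * self.num_classes
        self.fn = [0] * self.num_classes
        self.correct = 0
        self.total = 0

    def update(self, preds: list[int], targets: list[int]) -> None:
        for p, t in zip(preds, targets):
            self.total += 1
            if p == t:
                self.correct += 1
                self.tp[t] += 1
            else:
                # A wrong prediction is a false positive for the predicted class
                # and a false negative for the true class.
                self.fp[p] += 1
                self.fn[t] += 1

    def compute(self) -> dict[str, float]:
        accuracy = self.correct / self.total if self.total else 0.0
        f1s = []
        for c in range(self.num_classes):
            denom = 2 * self.tp[c] + self.fp[c] + self.fn[c]
            f1s.append(2 * self.tp[c] / denom if denom else 0.0)
        return {"accuracy": accuracy, "f1_macro": sum(f1s) / self.num_classes}


metric = MulticlassScores(num_classes=3)
metric.update(preds=[0, 1, 1, 2], targets=[0, 1, 0, 2])
scores = metric.compute()
```

Here 3 of 4 predictions are correct, and the per-class F1 scores are 2/3, 2/3, and 1, giving a macro F1 of 7/9. In this sketch a class that never appears contributes an F1 of 0 to the macro average; libraries differ in how they treat such classes, so this choice is an assumption.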
from checkmaite.core.image_classification.metrics import accuracy_multiclass_torch_metric_factory

metric_wrapper = accuracy_multiclass_torch_metric_factory(num_classes=12)
# use the metric's return key (e.g. 'accuracy') as its metadata ID
metric_wrapper.metadata['id'] = metric_wrapper.return_key
Instantiate and run each Capability¶
As each capability is run, we add its results to our Markdown report. A slides list is also kept so that slide data can be collected for Gradient PowerPoint generation.
NOTE: Because these are synthetic, nearly all-black images, the values in the results are meaningless.
# collect all the report slides in a list (to be converted to PPT after all stages are run)
slides = []
# Collect all the report MD strings in a single object (to be rendered and saved after all stages are executed)
report_md = ""
MAITE - Baseline evaluation¶
%%time
from checkmaite.core.image_classification import MaiteEvaluation
# instantiate capability object
capability = MaiteEvaluation()
# run analysis for this capability
# NOTE: `use_cache=True` will bypass the compute and load cached results if available
maite_eval_run = capability.run(
    use_cache=False,
    datasets=[dataset_wrapper],
    metrics=[metric_wrapper],
    models=[model_wrapper],
)
# view results
maite_eval_run.outputs
CPU times: user 1.05 s, sys: 13.8 ms, total: 1.07 s Wall time: 544 ms
MaiteEvaluationOutputs(overall_metric_name='accuracy', result={'accuracy': 0.0}, class_metrics=None)
# collect the markdown outputs for the final report
_md = maite_eval_run.collect_md_report(threshold=0.5)
report_md += _md
NRTK¶
%%time
from checkmaite.core.image_classification import NrtkRobustness, NrtkRobustnessConfig
# define nrtk config
nrtk_config_dict = {
    'name': 'natural_robustness_TestFactory',
    'perturber_factory': {
        'type': 'nrtk.impls.perturb_image_factory.PerturberOneStepFactory',
        'nrtk.impls.perturb_image_factory.PerturberOneStepFactory': {
            'perturber': 'nrtk.impls.perturb_image.photometric.enhance.BrightnessPerturber',
            'theta_key': 'factor',
            'theta_value': 10.0,
        },
    },
}
nrtk_config = NrtkRobustnessConfig(**nrtk_config_dict)
# instantiate capability object
capability = NrtkRobustness()
# run analysis for this capability
# NOTE: `use_cache=True` will bypass the compute and load cached results if available
nrtk_run = capability.run(
    use_cache=False,
    datasets=[dataset_wrapper],
    metrics=[metric_wrapper],
    models=[model_wrapper],
    config=nrtk_config,
)
# view results
nrtk_run.outputs
CPU times: user 11.3 s, sys: 276 ms, total: 11.6 s Wall time: 10 s
NrtkRobustnessOutputs(perturbations=[{'accuracy': tensor(0.)}], return_key='accuracy')
# collect the markdown outputs for the final report
_md = nrtk_run.collect_md_report(threshold=0.5)
report_md += _md
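The NRTK config above requests a single BrightnessPerturber step with factor 10.0. As a rough illustration, assuming the perturber behaves like Pillow's ImageEnhance.Brightness (which blends the image with a black image, effectively multiplying each 8-bit value by the factor and clipping to the 0 to 255 range), the sketch below applies that rule to a list of pixel values. The function brightness_perturb is hypothetical and uses simple rounding, so its results may differ slightly from Pillow's.

```python
from __future__ import annotations


def brightness_perturb(pixels: list[int], factor: float) -> list[int]:
    """Scale 8-bit pixel values by `factor`, rounding and clipping to [0, 255]."""
    return [max(0, min(255, round(v * factor))) for v in pixels]


# Pixel values 0, 1, 2 match the dummy images; 30 shows clipping at 255.
perturbed = brightness_perturb([0, 1, 2, 30], factor=10.0)
```

Under this assumption the dummy images' pixel values of 0, 1, and 2 become 0, 10, and 20, so the perturbed images remain very dark, which is another reason the robustness numbers here carry no meaning.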
Dataeval - Bias¶
%%time
from checkmaite.core.image_classification import DataevalBias
# instantiate capability object (no parameters necessary)
capability = DataevalBias()
# run analysis for this capability
bias_run = capability.run(use_cache=False, datasets=[dataset_wrapper]) # NOTE: `use_cache=True` will bypass the compute and load cached results if available
Downloading: "https://download.pytorch.org/models/efficientnet_b0_rwightman-7f5810bc.pth" to /home/runner/.cache/torch/hub/checkpoints/efficientnet_b0_rwightman-7f5810bc.pth
CPU times: user 792 ms, sys: 204 ms, total: 996 ms Wall time: 708 ms
# collect the markdown outputs for the final report
_md = bias_run.collect_md_report(threshold=0)
report_md += _md
# view results
# output is large, so expandable outputs are shown instead of raw outputs here
create_expandable_output(bias_run.outputs.balance)
create_expandable_output(bias_run.outputs.coverage)
total=6 uncovered_indices=array([5]) critical_value_radii=array([1.40883256, 1.40883256, 1.5544856 , 1.40883256, 1.40883256,
1.59248615]) coverage_radius=1.5734858762040544 image=
create_expandable_output(bias_run.outputs.diversity)
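The coverage output lists one critical radius per image plus a single coverage_radius. Image 5 is the only one whose radius (about 1.592) exceeds the coverage radius (about 1.573), and it is the only entry in uncovered_indices. The sketch below reproduces that flagging rule, assuming each critical radius is the distance from an image's embedding to its k-th nearest neighbor. How checkmaite or DataEval computes the embeddings, k, and coverage_radius is not shown in this notebook, so those details are assumptions, and the helper names are hypothetical.

```python
from __future__ import annotations

import math


def knn_radii(points: list[tuple[float, ...]], k: int) -> list[float]:
    """Distance from each point to its k-th nearest neighbor (excluding itself)."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def uncovered_indices(radii: list[float], coverage_radius: float) -> list[int]:
    """Indices whose neighborhood radius exceeds the coverage threshold."""
    return [i for i, r in enumerate(radii) if r > coverage_radius]


# Critical radii and coverage radius as reported in the coverage output above.
reported_radii = [1.40883256, 1.40883256, 1.5544856, 1.40883256, 1.40883256, 1.59248615]
reported_coverage_radius = 1.5734858762040544
flagged = uncovered_indices(reported_radii, reported_coverage_radius)
```

A large k-th neighbor distance means an image sits in a sparsely populated region of embedding space, which is why such images are reported as poorly covered by the rest of the dataset.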
Dataeval - Feasibility¶
%%time
from checkmaite.core.image_classification import DataevalFeasibility
# instantiate capability object (no parameters necessary)
capability = DataevalFeasibility()
# run analysis for this capability
feasibility_run = capability.run(use_cache=False, datasets=[dataset_wrapper]) # NOTE: `use_cache=True` will bypass the compute and load cached results if available
# view results
feasibility_run.outputs
CPU times: user 845 ms, sys: 131 ms, total: 975 ms Wall time: 300 ms
DataevalFeasibilityOutputs(ber=0.5, ber_lower=0.5)
# collect the markdown outputs for the final report
_md = feasibility_run.collect_md_report(threshold=0.5)
report_md += _md
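The feasibility output reports an estimated Bayes error rate (BER) of 0.5 with a lower bound of 0.5. One standard estimator, shown below under the assumption that it resembles what is configured here, takes the leave-one-out 1-nearest-neighbor error E as the upper estimate and inverts the Cover-Hart bound for M classes to get a lower estimate: ((M - 1) / M) * (1 - sqrt(1 - (M / (M - 1)) * E)). The helpers are hypothetical and operate on small lists of feature vectors.

```python
from __future__ import annotations

import math


def one_nn_error(points: list[tuple[float, ...]], labels: list[int]) -> float:
    """Leave-one-out 1-nearest-neighbor error rate."""
    errors = 0
    for i, p in enumerate(points):
        best_j, best_d = -1, math.inf
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if d < best_d:
                best_j, best_d = j, d
        if labels[best_j] != labels[i]:
            errors += 1
    return errors / len(points)


def ber_bounds(points: list[tuple[float, ...]], labels: list[int]) -> tuple[float, float]:
    """Return (upper, lower) Bayes error estimates from the 1-NN error."""
    m = len(set(labels))
    if m < 2:
        return 0.0, 0.0  # a single class is trivially separable
    err = one_nn_error(points, labels)
    # Invert the Cover-Hart bound for M classes; clamp the sqrt argument at 0.
    lower = ((m - 1) / m) * (1 - math.sqrt(max(0.0, 1 - (m / (m - 1)) * err)))
    return err, lower


separable = ber_bounds([(0.0,), (0.1,), (1.0,), (1.1,)], [0, 0, 1, 1])
mixed = ber_bounds([(0.0,), (0.1,), (5.0,), (5.1,)], [0, 1, 0, 0])
```

With two classes, a 1-NN error of 0.5 yields a lower bound of exactly 0.5, the same pair of values reported above: the features cannot separate "cat" from "dog" any better than chance, as expected for images that differ only by class folder.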
Dataeval - Cleaning¶
%%time
from checkmaite.core.image_classification import DataevalCleaning
# instantiate capability object (no parameters necessary)
capability = DataevalCleaning()
# run analysis for this capability
cleaning_run = capability.run(use_cache=False, datasets=[dataset_wrapper]) # NOTE: `use_cache=True` will bypass the compute and load cached results if available
# view results
# output is large, so expandable outputs are shown instead of raw outputs here
create_expandable_output(cleaning_run.outputs)
CPU times: user 24.3 ms, sys: 10 ms, total: 34.4 ms Wall time: 30.3 ms
duplicates=DataevalCleaningDuplicatesOutputs(exact=[[0, 3], [1, 4], [2, 5]], near=[[0, 1, 3, 4]]) image_outliers={} image_stats=DataevalCleaningStatsOutputs(source_index=[SourceIndex(0), SourceIndex(1), SourceIndex(2), SourceIndex(3), SourceIndex(4), SourceIndex(5)], object_count=[0, 0, 0, 0, 0, 0], image_count=6, invalid_box_count=[0, 0, 0, 0, 0, 0], dim_stats=DataevalCleaningDimensionStatsOutputs(offset_x=array([0., 0., 0., 0., 0., 0.], dtype=float32), offset_y=array([0., 0., 0., 0., 0., 0.], dtype=float32), width=array([64., 64., 64., 64., 64., 64.], dtype=float32), height=array([128., 128., 128., 128., 128., 128.], dtype=float32), channels=array([3, 3, 3, 3, 3, 3]), size=array([8192., 8192., 8192., 8192., 8192., 8192.], dtype=float32), aspect_ratio=array([-0.5, -0.5, -0.5, -0.5, -0.5, -0.5], dtype=float32), depth=array([1, 1, 8, 1, 1, 8]), center=array([[32., 64.],
[32., 64.],
[32., 64.],
[32., 64.],
[32., 64.],
[32., 64.]], dtype=float32), distance_center=array([0., 0., 0., 0., 0., 0.], dtype=float32), distance_edge=array([0., 0., 0., 0., 0., 0.], dtype=float32), invalid_box=array([False, False, False, False, False, False])), vis_stats=DataevalCleaningVisualStatsOutputs(brightness=array([0. , 1. , 0.00784314, 0. , 1. ,
0.00784314], dtype=float32), contrast=array([0., 0., 0., 0., 0., 0.], dtype=float32), darkness=array([0. , 1. , 0.00784314, 0. , 1. ,
0.00784314], dtype=float32), sharpness=array([0., 0., 0., 0., 0., 0.], dtype=float32), percentiles=array([[0. , 0. , 0. , 0. , 0. ],
[1. , 1. , 1. , 1. , 1. ],
[0.00784314, 0.00784314, 0.00784314, 0.00784314, 0.00784314],
[0. , 0. , 0. , 0. , 0. ],
[1. , 1. , 1. , 1. , 1. ],
[0.00784314, 0.00784314, 0.00784314, 0.00784314, 0.00784314]],
dtype=float32), missing=array([0., 0., 0., 0., 0., 0.], dtype=float32), zeros=array([1., 0., 0., 1., 0., 0.], dtype=float32)), ratio_stats=None) label_stats=DataevalCleaningLabelStatsOutputs(label_counts_per_class={0: 3, 1: 3}, label_counts_per_image=[1, 1, 1, 1, 1, 1], image_counts_per_class={0: 3, 1: 3}, image_indices_per_class={0: [0, 1, 2], 1: [3, 4, 5]}, image_count=6, class_count=2, label_count=6, class_names=['cat', 'dog']) box_outliers=None box_stats=None
# collect the markdown outputs for the final report
_md = cleaning_run.collect_md_report(threshold=0)
report_md += _md
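The cleaning output reports exact duplicates [[0, 3], [1, 4], [2, 5]]. That follows from how the dummy data was generated: each cat image and the dog image with the same index were filled with the same color, (i, i, i). A simple way to find exact duplicates is to hash each image's raw pixel bytes and group identical hashes, as sketched below. Whether checkmaite uses this particular hash is an assumption, and exact_duplicate_groups is a hypothetical helper.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def exact_duplicate_groups(images: list[bytes]) -> list[list[int]]:
    """Group indices of byte-identical images, keeping only groups of size > 1."""
    groups: dict[str, list[int]] = defaultdict(list)
    for idx, data in enumerate(images):
        groups[hashlib.sha256(data).hexdigest()].append(idx)
    return [g for g in groups.values() if len(g) > 1]


# Stand-in pixel buffers: three solid "cat" images followed by three matching "dog" images.
images = [bytes([v]) * 12 for v in (0, 1, 2)] * 2
duplicates = exact_duplicate_groups(images)
```

Hashing is exact by construction, so it cannot catch near duplicates; the near-duplicate group [0, 1, 3, 4] in the output requires a perceptual comparison instead.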
Dataeval - Shift¶
%%time
from checkmaite.core.image_classification import DataevalShift
# instantiate capability object (no parameters necessary)
capability = DataevalShift()
# run analysis for this capability
# the same dataset is passed twice, so no shift is expected between the two
# NOTE: `use_cache=True` will bypass the compute and load cached results if available
shift_run = capability.run(use_cache=False, datasets=[dataset_wrapper, dataset_wrapper])
# view results
# output is large, so expandable outputs are shown instead of raw outputs here
create_expandable_output(shift_run.outputs)
CPU times: user 3.13 s, sys: 35.4 ms, total: 3.17 s Wall time: 1.26 s
drift=DataevalShiftDriftOutputs(mmd=DriftOutput(drifted=False, threshold=0.05, distance=-0.16259467601776123, metric_name='mmd2', details={'p_val': 0.8999999761581421, 'distance_threshold': 0.5238606333732605}), cvm=DriftOutput(drifted=False, threshold=0.008333333333333333, distance=0.0, metric_name='cvm_distance', details={'p_val': 1.0, 'feature_drift': array([False, False, False, False, False, False]), 'feature_threshold': 0.05, 'p_vals': array([1., 1., 1., 1., 1., 1.], dtype=float32), 'distances': array([0., 0., 0., 0., 0., 0.], dtype=float32)}), ks=DriftOutput(drifted=False, threshold=0.008333333333333333, distance=0.0, metric_name='ks_distance', details={'p_val': 1.0, 'feature_drift': array([False, False, False, False, False, False]), 'feature_threshold': 0.05, 'p_vals': array([1., 1., 1., 1., 1., 1.], dtype=float32), 'distances': array([0., 0., 0., 0., 0., 0.], dtype=float32)})) ood=DataevalShiftOODOutputs(ood_knn=DataevalShiftOODKNNOutput(is_ood=array([False, False, False, False, False, False]), instance_score=array([1.171088 , 0.42721528, 0.4406994 , 1.171088 , 0.42721528,
0.4406994 ], dtype=float32), feature_score=None))
# collect the markdown outputs for the final report
_md = shift_run.collect_md_report(threshold=0)
report_md += _md
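Because the same dataset is compared with itself, the KS and CvM distances are 0.0 and no drift is flagged. The reported KS and CvM thresholds of 0.00833 match 0.05 / 6, consistent with a Bonferroni correction across the six per-feature p-values. The two-sample Kolmogorov-Smirnov statistic is the largest gap between the two samples' empirical CDFs; the sketch below computes it for a single one-dimensional feature. ks_distance is a hypothetical helper, not the DataEval implementation.

```python
from __future__ import annotations

from bisect import bisect_right


def ks_distance(a: list[float], b: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    best = 0.0
    # Empirical CDFs only change at sample values, so checking those points suffices.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        best = max(best, abs(cdf_a - cdf_b))
    return best


same = ks_distance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
disjoint = ks_distance([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

Identical samples give 0.0, matching the distance reported above, while fully separated samples give the maximum of 1.0.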
# XAITK is deprecated.
# See https://gitlab.jatic.net/jatic/reference-implementation/reference-implementation/-/issues/345
#%%time
#from checkmaite.core.image_classification import XaitkExplainable, XaitkExplainableConfig
#
## define xaitk config
#xaitk_config_dict = {
# 'name': 'saliency_XAITKApp_0',
# 'saliency_generator': {
# 'type': 'xaitk_saliency.impls.gen_image_classifier_blackbox_sal.rise.RISEStack',
# 'xaitk_saliency.impls.gen_image_classifier_blackbox_sal.rise.RISEStack': {
# 'n': 50,
# 's': 7,
# 'p1': 0.7,
# 'seed': 42,
# 'threads': 8,
# 'debiased': True,
# },
# },
# 'img_batch_size': 1,
#}
#xaitk_config = XaitkExplainableConfig(**xaitk_config_dict)
## instantiate capability object
#capability = XaitkExplainable()
#
## run analysis for this capability
#xaitk_run = capability.run(use_cache=False, datasets=[dataset_wrapper],
# models=[model_wrapper], config=xaitk_config) # NOTE: `use_cache=True` will bypass the compute and load cached results if available
#
## XAITK produces one slide per object per image. The example dataset is too large to generate a slide deck of this size.
## collect the slides for the final report
#capability_slides = xaitk_run.collect_report_consumables()
## add to overall slide list
#slides += capability_slides
## collect the markdown outputs for the final report
#_md = xaitk_run.collect_md_report(threshold=0)
#report_md += _md
# view results
# output is large, so expandable outputs are shown instead of raw outputs here
# create_expandable_output(xaitk_run.outputs.results)
Construct final report¶
Finally, we build our reports using the collected capability run outputs.
PowerPoint and HTML-based reports can be generated with the external Gradient capability; the cell below builds a Markdown-based report using a package-native rendering feature.
from checkmaite.core.report._markdown import create_markdown_output
# construct MD report with summarized results
report_path = cache_path() / "report"
report_path.mkdir(parents=True, exist_ok=True)
report_filename = 'checkmaite_IC_sample_report.md'
report = create_markdown_output(report_md, report_path, report_filename)