dataeval Feasibility Tutorial¶
A guide to running the dataeval feasibility tools via checkmaite.
NOTE: The dataeval package can normally be used within the checkmaite framework for both image classification (IC) and object detection (OD) tasks. However, feasibility can only be used for IC tasks until checkmaite upgrades to MAITE v0.8. As such, this tutorial covers only the IC scenario.
What is dataeval?¶
The dataeval package analyzes datasets and models to give users the ability to train and test performant, unbiased, and reliable AI models and monitor data for impactful shifts to deployed models.
The tools demonstrated in this tutorial are a subset of the larger dataeval framework. They focus specifically on estimating the maximum achievable performance of a model on a dataset. Put another way, the feasibility tools attempt to measure the irreducible error of a dataset: variability that is inherently random and cannot be explained by any predictive model. This quantity is of interest because it tells an engineer how difficult the problem inherently is. If that difficulty rules out the operational performance requirements, the problem itself must be changed before it can become feasible.
There are currently only two feasibility tools in dataeval:
- the Bayes Error Rate (BER) for IC calculations
- the Upper-Bound Average Precision (UAP) for OD calculations
The dataeval feasibility tools are generally applied to an entire dataset. Their computational demands are low to moderate, and they run on CPU.
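To make the idea of irreducible error concrete, the sketch below computes the Bayes Error Rate exactly for a toy problem whose joint distribution is fully known. This is only an illustration of the definition: the function name `bayes_error_rate` and the probabilities are made up for this example, and it is not how dataeval estimates the BER from real image data, where the true distribution is unknown.

```python
def bayes_error_rate(joint: list[list[float]]) -> float:
    """Exact BER for a discrete problem, where joint[x][y] = P(X=x, Y=y).

    The best possible classifier picks the most probable class for each x,
    so the probability of being right is the sum of the row-wise maxima.
    """
    total = sum(sum(row) for row in joint)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("joint probabilities must sum to 1")
    return 1.0 - sum(max(row) for row in joint)


# Hypothetical toy problem: one binary feature X, two classes (cat, dog).
joint = [
    [0.40, 0.10],  # X = 0: mostly cats
    [0.15, 0.35],  # X = 1: mostly dogs
]

ber = bayes_error_rate(joint)
print(f"BER = {ber:.2f}, best achievable accuracy = {1 - ber:.2f}")
```

No model, however large, can beat this error rate on the toy problem, because the overlap between the classes is a property of the data rather than of the classifier.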
Running the dataeval feasibility algorithms inside checkmaite¶
The following section uses the checkmaite API to run the dataeval feasibility test stage for Image Classification.
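First, we create some machinery for generating a YOLO-style IC dataset for demonstration purposes. The layout is one sub-directory per class under a split directory (here test/cat and test/dog), each holding a few small placeholder images.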
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterable, Iterator, Tuple

from PIL import Image

CLASSES: list[str] = ["cat", "dog"]
NUM_IMAGES_PER_CLASS: int = 4
IMG_SHAPE: Tuple[int, int] = (64, 128)  # (H, W)


def create_fake_yolo_dataset(
    root_dir: Path,
    split: str,
    classes: Iterable[str],
    num_images_per_class: int,
    image_shape: Tuple[int, int],
) -> None:
    """Populate *root_dir/split/* with one sub-dir of RGB images per class."""
    root_dir = Path(root_dir)
    (root_dir / split).mkdir(parents=True, exist_ok=True)
    height, width = image_shape
    for class_name in classes:
        class_dir = root_dir / split / class_name
        class_dir.mkdir(parents=True, exist_ok=True)
        for i in range(num_images_per_class):
            # PIL takes size as (width, height), so swap the (H, W) shape.
            img = Image.new("RGB", (width, height), color=(i, i, i))
            img.save(class_dir / f"{i}_{class_name}.jpg")


@contextmanager
def fake_dataset(
    *,
    split: str = "test",
    classes: Iterable[str] = CLASSES,
    num_images_per_class: int = NUM_IMAGES_PER_CLASS,
    image_shape: Tuple[int, int] = IMG_SHAPE,
) -> Iterator[tuple[Path, list[str], int, Tuple[int, int]]]:
    """
    Context-manager that yields a temporary dataset root and
    removes it automatically afterwards.

    Example
    -------
    >>> with fake_dataset() as (root, classes, n_per_cls, img_shape):
    ...     print(root)
    ...     # use files under `root / 'test' / <class_name>` ...
    """
    # Materialize once so a one-shot iterable is not exhausted before the yield.
    class_list = list(classes)
    with tempfile.TemporaryDirectory() as tmp:
        dataset_root = Path(tmp)
        create_fake_yolo_dataset(
            root_dir=dataset_root,
            split=split,
            classes=class_list,
            num_images_per_class=num_images_per_class,
            image_shape=image_shape,
        )
        yield dataset_root, class_list, num_images_per_class, image_shape
        # directory is automatically cleaned up on exit
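Before handing a directory like this to a dataset loader, it can help to confirm that the on-disk layout is what you expect. The following is a minimal sketch using only the standard library; `count_images_per_class` is a hypothetical helper written for this tutorial, not part of checkmaite or dataeval, and the stand-in tree uses empty placeholder files rather than real images so that the example is self-contained.

```python
import tempfile
from pathlib import Path


def count_images_per_class(root: Path, split: str = "test") -> dict[str, int]:
    """Count the .jpg files in each class sub-directory of root/split."""
    counts: dict[str, int] = {}
    for class_dir in sorted((root / split).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(1 for _ in class_dir.glob("*.jpg"))
    return counts


# Build a tiny stand-in tree with the same <split>/<class>/<file> layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for class_name in ("cat", "dog"):
        class_dir = root / "test" / class_name
        class_dir.mkdir(parents=True)
        for i in range(2):
            (class_dir / f"{i}_{class_name}.jpg").touch()
    counts = count_images_per_class(root)

print(counts)
```

The same helper can be pointed at the root yielded by fake_dataset() to check that every class received the requested number of images.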
We then create the necessary MAITE-wrapped dataset and initialize a DataevalFeasibility capability object. We pass the wrapped dataset to its run method to execute the test stage; the performance threshold is supplied afterwards, when the report is generated. Because the BER is a lower bound on the error rate, the best achievable accuracy is 1 - BER. If the performance threshold exceeds that value, the problem is said to be infeasible.
from pathlib import Path

from checkmaite.core.image_classification import DataevalFeasibility
from checkmaite.core.image_classification.dataset_loaders import YoloClassificationDataset

with fake_dataset() as (root, _, _, _):
    dataset = YoloClassificationDataset(root_dir=root, dataset_id="test")
    capability = DataevalFeasibility()
    # Run inside the context manager so the temporary image files still exist.
    output = capability.run(use_cache=False, datasets=[dataset])
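The feasibility decision itself reduces to a single comparison. The sketch below spells it out, assuming the performance threshold is a required accuracy in [0, 1]. The function `feasibility_verdict` and the BER value are hypothetical and chosen for illustration; they are not taken from checkmaite's implementation or from the run above.

```python
def feasibility_verdict(ber: float, required_accuracy: float) -> str:
    """Compare a required accuracy against the ceiling implied by the BER."""
    if not 0.0 <= ber <= 1.0:
        raise ValueError("ber must lie in [0, 1]")
    if not 0.0 <= required_accuracy <= 1.0:
        raise ValueError("required_accuracy must lie in [0, 1]")
    max_accuracy = 1.0 - ber  # no classifier can do better than this
    return "feasible" if required_accuracy <= max_accuracy else "infeasible"


# Illustrative BER only; a real value would come from the feasibility run.
example_ber = 0.25
for threshold in (0.6, 0.9):
    print(f"threshold={threshold}: {feasibility_verdict(example_ber, threshold)}")
```

With an estimated BER of 0.25, a requirement of 60% accuracy is within reach in principle, while a requirement of 90% cannot be met without changing the problem, for example by collecting more informative inputs or merging ambiguous classes.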
Slide Deck¶
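Once the test stage has completed, the code below uses checkmaite's create_markdown_output helper to write a Markdown report of the dataeval feasibility results to an output directory. The threshold argument passed to collect_md_report (0.6 here) is the performance threshold against which feasibility is reported.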
import os
from checkmaite.core.report._markdown import create_markdown_output
output_dir = Path("dataeval_feasibility_example_output")
os.makedirs(output_dir, exist_ok=True)
create_markdown_output(output.collect_md_report(threshold=0.6), output_dir, md_filename='Dataeval_Feasibility_Example_Report.md')
print(f"Markdown report saved in {output_dir}.")
Markdown report saved in dataeval_feasibility_example_output.