Checkmaite (API) Conventions
Overview
The MAITE protocols for metrics, models, and datasets allow for a great deal of flexibility, and only define specific functionality where needed by the integrations that are nearly universal to the workflow of a model evaluation.
During the course of testing capability development, and particularly when new testing capabilities are developed or acquired as part of the JATIC program, we will likely identify integrations where the protocols lack the specificity required to smoothly incorporate those new capabilities into our existing test cases (or new ones).
This document declares the specific narrowing of scope that checkmaite will use when creating classes and objects that comply with the MAITE protocols.
Note: These conventions ensure that the objects comply with the protocols, and any additional specificity required by these conventions must be consistent with the generalized form of the protocols.
As these conventions are incorporated into checkmaite, they may be absorbed into the MAITE protocols in future releases. If this occurs, the checkmaite team will update checkmaite to align with the new protocols and remove any redundant sections from this document.
Models
Model conventions
- Model classes and wrappers created within checkmaite (and any models that will be evaluated using checkmaite) will be built with the underlying model API (pytorch, tensorflow, keras, etc.) object accessible under an attribute called model. For example, a thinly wrapped fasterrcnn_resnet50_fpn model called wrapped will have the underlying fasterrcnn_resnet50_fpn model accessible via a call to wrapped.model. Because model is not an attribute of the protocols for models, using it in a test stage or elsewhere will trigger a failure of the type checker, so use # pyright: ignore[reportUnknownMemberType] on those lines that depend on the model attribute.
- Model classes within checkmaite will have a mapping from the ids to the names of the classes that they predict, under the attribute index2label. This class attribute will be of type dict[int, str]. Because index2label is not an attribute of the protocols for models, using it in a test stage or elsewhere will trigger a failure of the type checker, so use # pyright: ignore[reportTypedDictNotRequiredAccess] on those lines that depend on the index2label attribute.
- Object detection models will return MAITE-compliant object detection targets, which consist of ArrayLike of predicted classes, ArrayLike of scores, and ArrayLike of bounding boxes.
- Image classification models will return MAITE-compliant image classification targets, which consist of an ArrayLike of pseudo-probabilities for each class it is trained to predict.
- Models which find no prediction on a given target will return an ObjectDetectionTarget with empty ArrayLike elements for bounding boxes/labels.
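The model conventions above can be illustrated with a minimal, dependency-free sketch. Everything named here (DetectionTarget, WrappedDetector, fake_backbone) is hypothetical and is not part of checkmaite or MAITE; plain Python lists stand in for ArrayLike values, and a simple callable stands in for a real framework model such as fasterrcnn_resnet50_fpn.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class DetectionTarget:
    """Simplified stand-in for a MAITE object detection target (assumption)."""

    boxes: list[list[float]] = field(default_factory=list)  # [x0, y0, x1, y1] per detection
    labels: list[int] = field(default_factory=list)
    scores: list[float] = field(default_factory=list)


class WrappedDetector:
    """Thin wrapper that follows the model and index2label conventions."""

    def __init__(self, model: Callable[[Any], dict[str, list]], index2label: dict[int, str]) -> None:
        self.model = model  # underlying framework object, reachable as wrapped.model
        self.index2label = index2label  # class id -> class name

    def __call__(self, batch: list[Any]) -> list[DetectionTarget]:
        targets = []
        for image in batch:
            raw = self.model(image)
            # An input with no detections still yields a target, with empty fields.
            targets.append(
                DetectionTarget(
                    boxes=list(raw.get("boxes", [])),
                    labels=list(raw.get("labels", [])),
                    scores=list(raw.get("scores", [])),
                )
            )
        return targets


def fake_backbone(image: Any) -> dict[str, list]:
    """Hypothetical underlying model: one detection unless the image is None."""
    if image is None:
        return {}
    return {"boxes": [[10.0, 20.0, 110.0, 220.0]], "labels": [1], "scores": [0.9]}


wrapped = WrappedDetector(fake_backbone, index2label={0: "background", 1: "person"})
predictions = wrapped(["some image", None])

# Translate predicted class ids into names through index2label.
names = [wrapped.index2label[i] for i in predictions[0].labels]
```

Here the wrapper is concretely typed, so the attribute accesses pass the type checker. Code that is typed against the MAITE model protocol rather than the concrete wrapper would still need the pyright ignore comments described in the conventions above.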
Supported models
So that users can bring their own models to checkmaite without writing additional code to load and wrap those models into MAITE-compliant objects, we must provide generalized wrappers for commonly used models. This list contains the set of models that checkmaite supports or will support.
Object Detection
- fasterrcnn_resnet50_fpn
- fasterrcnn_resnet50_fpn_v2
- maskrcnn_resnet50_fpn_v2
- maskrcnn_resnet50_fpn
- retinanet_resnet50_fpn
- retinanet_resnet50_fpn_v2
Image Classification
Datasets
Dataset conventions
- Dataset classes and wrappers created within checkmaite (and any datasets used within checkmaite) will have the mapping from id to classes accessible under the attribute index2label, and will return a dict[int, str]. Because index2label is not an attribute of the protocols for datasets, using it in a test stage or elsewhere will trigger a failure of the type checker, so use # pyright: ignore[reportTypedDictNotRequiredAccess] on those lines that depend on the index2label attribute.
- Object detection bounding boxes will be defined as ArrayLikes of integers in the xyxy format (the top-left and bottom-right corners of the bounding box).
- Labels will be defined as ArrayLikes of integers, whose values map to the keys in index2label.
- Other datum-level metadata will be stored as strings in DatumMetadata.
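As a rough illustration of the bounding box and label conventions, the sketch below converts a box from an assumed [x, y, width, height] layout into xyxy and checks one datum against index2label. The helper names xywh_to_xyxy and check_target are hypothetical, the xywh input layout is an assumption about the source annotations, and plain lists stand in for ArrayLike values.

```python
def xywh_to_xyxy(box: list[int]) -> list[int]:
    """Convert [x, y, width, height] to [x0, y0, x1, y1] (top-left, bottom-right)."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def check_target(boxes: list[list[int]], labels: list[int], index2label: dict[int, str]) -> list[str]:
    """Return the convention violations found for one datum (empty list if none)."""
    problems = []
    if len(boxes) != len(labels):
        problems.append("boxes and labels differ in length")
    for x0, y0, x1, y1 in boxes:
        # In xyxy format the second corner must not precede the first.
        if x1 < x0 or y1 < y0:
            problems.append(f"box {[x0, y0, x1, y1]} is not in xyxy order")
    for label in labels:
        # Every label value must be a key of index2label.
        if label not in index2label:
            problems.append(f"label {label} has no entry in index2label")
    return problems


index2label = {0: "cat", 1: "dog"}
boxes = [xywh_to_xyxy([10, 20, 100, 50])]

ok = check_target(boxes, [1], index2label)
bad = check_target(boxes, [7], index2label)
```

A check like this could run once when a dataset is wrapped, so that a label without a name is caught before a test stage looks it up.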
Supported dataset annotation formats
So that users can bring their own datasets to checkmaite without writing additional code to load and wrap those datasets into MAITE-compliant objects, we must provide generalized wrappers for commonly used annotation formats. This list contains the set of dataset annotation and storage formats that checkmaite supports or will support, including the ability to load datasets from a user-provided location.
Metrics
Metric conventions
- Classes for computing metrics within checkmaite will follow these conventions:
  - The Metric class will have an attribute called return_key which describes the top-level performance metric:
    - E.g., for a metric that computes map_50, the return_key will be map_50.
    - Because return_key is not an attribute of the protocols for metrics, using it in a test stage or elsewhere will trigger a failure of the type checker, so use # pyright: ignore[reportUnknownMemberType] on those lines that depend on the return_key attribute.
  - The list of keys returned by compute() will include a key that matches the return_key of the metric, which describes the top-level performance of predictions against the ground truth provided in the update() method.
  - The values of each element in the dictionary returned by compute() will adhere to the numpy.ArrayLike type, and will consist of an array containing a single floating point value such that:
    - The value can safely be cast to a float with float(<value>).
    - The value will possess the <value>.numpy() method.
  - Datum-level metrics will not be guaranteed by the metric classes at this time, and tools that rely on datum-level metrics will need to compute them within their test stage classes by iterating through the dataset and computing the metric on a "Dataset" with only a single datum in it.
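The metric conventions, including the datum-level workaround, might look roughly like the following sketch. The Accuracy and Scalar classes and the datum_level_scores helper are hypothetical and are not checkmaite code. To stay dependency-free, predictions and targets are simplified to integer class ids, and Scalar only imitates a single-element array: its numpy() method returns a list, whereas a real tensor would return a numpy array.

```python
class Scalar:
    """Stand-in for a single-element array (assumption; not a real tensor)."""

    def __init__(self, value: float) -> None:
        self._value = float(value)

    def __float__(self) -> float:
        return self._value

    def numpy(self) -> list[float]:
        # A list keeps this sketch stdlib-only; a real tensor returns an array.
        return [self._value]


class Accuracy:
    """Toy classification metric that follows the return_key convention."""

    return_key = "accuracy"

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self._correct = 0
        self._total = 0

    def update(self, predictions: list[int], targets: list[int]) -> None:
        self._correct += sum(p == t for p, t in zip(predictions, targets))
        self._total += len(targets)

    def compute(self) -> dict[str, Scalar]:
        value = self._correct / self._total if self._total else 0.0
        # The returned dictionary includes a key equal to return_key.
        return {self.return_key: Scalar(value)}


def datum_level_scores(metric: Accuracy, predictions: list[int], targets: list[int]) -> list[float]:
    """Score each datum on its own by treating it as a one-element dataset."""
    scores = []
    for pred, target in zip(predictions, targets):
        metric.reset()
        metric.update([pred], [target])
        scores.append(float(metric.compute()[metric.return_key]))
    metric.reset()
    return scores


metric = Accuracy()
metric.update([0, 1, 1, 2], [0, 1, 2, 2])
overall = float(metric.compute()[metric.return_key])

per_datum = datum_level_scores(Accuracy(), [0, 1, 1, 2], [0, 1, 2, 2])
```

Resetting the metric before each datum keeps every per-datum score independent of the others, at the cost of one compute() call per datum.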