Skip to main content

Testsets

A testset is a versioned bag of testcases: an artifact you commit rows into, where each commit produces an immutable revision. Evaluations pin a specific testset revision, so the same run keeps replaying the same inputs even after you add or change rows.

Testsets are versioned. For commit semantics, include_archived, and how revision IDs stay stable, see Versioning. The rest of this page covers what is specific to testsets.

Body

A testset revision commits an ordered list of testcase IDs plus commit metadata. The testcases themselves are immutable blobs scoped to the testset.

A testcase payload:

{
"id": "e44bf373-44ec-52ce-aa5c-a3bd982e783b",
"testset_id": "019d9ca2-0000-0000-0000-000000000000",
"data": {
"country": "Germany",
"capital": "Berlin",
"testcase_dedup_id": "gr-001"
}
}

The schema of a testcase is implicit in the keys of data. There is no separate column list. Adding a column means committing rows that have the new key.

Dedup IDs

A testcase can carry a caller-supplied dedup ID that the server uses to identify the same row across re-imports and edits. If you don't provide one, the server generates one from the content hash.

Two equivalent forms are accepted:

  • data.testcase_dedup_id: the canonical JSON-payload key, inside data. Used by JSON commits and the SDK.
  • __dedup_id__: the metadata-column form, alongside the other __X__ import/export columns (__id__, __flags__, __tags__, __meta__). Used by CSV uploads and downloads.

The server normalises one form to the other in both directions, so you can commit through one path and download through the other and the dedup ID round-trips.

Testcase immutability

A testcase is content-addressed. Its id is a hash of data plus the testset_id (plus an optional salt), so two testcases with the same content in the same testset collapse to the same row.

Because a testcase is a blob, editing a row does not mutate the existing testcase. The commit writes a new blob with a new ID, then creates a new revision whose testcase list points at the new ID. The old blob is still reachable by its original ID. That is why revision history is reproducible: a revision is just a list of pointers to blobs that never change.

Import and export

Testcases move in and out of a testset as CSV or JSON arrays.

EndpointAction
POST /testsets/revisions/{id}/uploadCommit a new revision from a CSV or JSON file.
POST /testsets/revisions/{id}/downloadDownload the revision's testcases as CSV or JSON.
POST /simple/testsets/uploadCreate a new testset and seed its first revision in one call.
POST /simple/testsets/{id}/uploadCommit a new revision on an existing simple testset.

Each row may include the metadata columns __id__ (preserve an existing testcase ID), __dedup_id__ (caller-supplied dedup ID, see Dedup IDs above), __flags__, __tags__, and __meta__. Empty metadata columns are dropped from exports.

Retrieve semantics

POST /testsets/revisions/retrieve resolves a testset, variant, or revision reference to a single revision. Two options control what comes back:

  • include_testcases defaults to true. Full testcase objects are returned unless you opt out.
  • include_testcase_ids defaults to true. The ordered ID list is returned unless you opt out.

For instance, to retrieve a revision with only the ordered ID list and no testcase bodies:

curl -X POST "$AGENTA_HOST/api/testsets/revisions/retrieve" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey $AGENTA_API_KEY" \
-d '{
"testset_ref": {"id": "019d9ca1-0000-0000-0000-000000000000"},
"include_testcases": false
}'
info

This is the opposite default of POST /queries/revisions/retrieve (see Query Pattern), where include_traces defaults to false because trace materialisation is expensive. Keep the asymmetry in mind when you build clients that consume both.

Simple endpoints

The /simple/testsets/ surface collapses the artifact, variant, and latest revision into one flat record with its testcases merged in:

curl "$AGENTA_HOST/api/simple/testsets/019d9ca1-0000-0000-0000-000000000000" \
-H "Authorization: ApiKey $AGENTA_API_KEY"

Use it when you want the "current testset" without tracking lineage. For commits, forks, or specific-revision retrieval, use /testsets/, /testsets/variants/, and /testsets/revisions/. See Simple Endpoints for the general pattern.

Relationship to evaluations

Evaluations are run against a testset. Each evaluation pins a specific testset_revision_id when the run is configured. Committing a new revision on the testset does not retroactively change a pinned run.

Example

Create a testset with two rows, add a row, and retrieve the latest revision.

# 1. Create a simple testset with seed rows.
curl -X POST "$AGENTA_HOST/api/simple/testsets/" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey $AGENTA_API_KEY" \
-d '{
"testset": {
"slug": "country-capitals",
"name": "country-capitals",
"data": {
"testcases": [
{"data": {"country": "France", "capital": "Paris"}},
{"data": {"country": "Japan", "capital": "Tokyo"}}
]
}
}
}'

The response includes id, revision_id, and the testcase IDs that were created.

# 2. Add a row as a new revision using the structured commit endpoint.
curl -X POST "$AGENTA_HOST/api/testsets/revisions/commit" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey $AGENTA_API_KEY" \
-d '{
"testset_revision_commit": {
"testset_id": "019d9ca1-0000-0000-0000-000000000000",
"testset_revision_id": "019d9ca1-0000-0000-0000-000000000001",
"message": "Add Brazil",
"delta": {
"rows": {
"add": [{"data": {"country": "Brazil", "capital": "Brasilia"}}]
}
}
}
}'
# 3. Retrieve the latest revision of the testset.
curl -X POST "$AGENTA_HOST/api/testsets/revisions/retrieve" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey $AGENTA_API_KEY" \
-d '{"testset_ref": {"id": "019d9ca1-0000-0000-0000-000000000000"}}'
info

An evaluation that pinned revision 019d9ca1-0000-0000-0000-000000000001 keeps replaying the original two rows even though the testset now has three.

Lifecycle

Testsets, variants, and revisions are soft-deleted. Use POST /testsets/{id}/archive and POST /testsets/{id}/unarchive to flip deleted_at. The same pattern applies at the variant and revision level. Archiving a testset hides it from /query responses unless the request sets include_archived: true. See Versioning.