
How to evaluate on a specific dataset version

Recommended reading

Before diving into this content, it might be helpful to read the guides on versioning datasets and on fetching examples.

Using list_examples

Because evaluate / aevaluate accept any iterable of examples, you can use them to evaluate on a particular version of a dataset. Use list_examples / listExamples with as_of / asOf to fetch the examples from a specific version tag, then pass the result to the data argument.

from langsmith import Client

ls_client = Client()

# Assumes actual outputs have a 'class' key.
# Assumes example outputs have a 'label' key.
def correct(outputs: dict, reference_outputs: dict) -> bool:
    return outputs["class"] == reference_outputs["label"]

results = ls_client.evaluate(
    lambda inputs: {"class": "Not toxic"},
    # Pass in the examples from a specific dataset version here:
    data=ls_client.list_examples(
        dataset_name="Toxic Queries",
        as_of="latest",  # version tag (e.g. "latest") or a datetime
    ),
    evaluators=[correct],
)
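To make the data flow concrete, here is a minimal offline sketch of what happens when an iterable of examples is passed as data: each example's inputs go to the target, and each evaluator compares the target's outputs with that example's reference outputs. The `Example` dataclass, the `evaluate_locally` helper, and the sample examples are hypothetical stand-ins, not part of the LangSmith SDK. The real `evaluate` also records runs and experiment metadata, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Example:
    # Hypothetical stand-in for a dataset example: inputs plus reference outputs.
    inputs: dict
    outputs: dict


def correct(outputs: dict, reference_outputs: dict) -> bool:
    # Same evaluator as in the LangSmith example: compare predicted class to label.
    return outputs["class"] == reference_outputs["label"]


def evaluate_locally(
    target: Callable[[dict], dict],
    data: Iterable[Example],
    evaluators: list[Callable[[dict, dict], bool]],
) -> list[dict]:
    # Hypothetical helper: run the target on each example and score the result.
    # Any iterable works for `data`, because it is consumed exactly once.
    results = []
    for example in data:
        outputs = target(example.inputs)
        scores = {ev.__name__: ev(outputs, example.outputs) for ev in evaluators}
        results.append({"inputs": example.inputs, "outputs": outputs, "scores": scores})
    return results


# Hypothetical examples standing in for one pinned version of a dataset.
snapshot = [
    Example(inputs={"text": "Have a nice day"}, outputs={"label": "Not toxic"}),
    Example(inputs={"text": "You are terrible"}, outputs={"label": "Toxic"}),
]

results = evaluate_locally(
    lambda inputs: {"class": "Not toxic"},
    data=iter(snapshot),  # a one-shot iterator, like the one list_examples returns
    evaluators=[correct],
)
accuracy = sum(r["scores"]["correct"] for r in results) / len(results)
print(accuracy)  # 0.5
```

In this model the dataset version is fixed when the examples are fetched, and the evaluation loop simply iterates over whatever it receives, which is why pinning as_of on list_examples is enough to pin the evaluation.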
  • Learn more about how to fetch views of a dataset.
