Introduction to Evaluations
Weave is a toolkit for developing AI-powered applications.
This notebook demonstrates how to evaluate a model or function using Weave’s Evaluation API.
In Weave, you evaluate your application by running it against a dataset of examples and scoring the outputs using custom-defined functions. This helps you to measure and improve your application’s performance.
In this notebook, you define a simple model, create a labeled dataset, track scoring functions with @weave.op, run an evaluation, and review the results in the Weave UI.
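A scorer is just a function that compares a model's output against the example's label and returns a score. The sketch below shows the shape of such a function as plain Python; in a real Weave project you would decorate it with @weave.op so each call is tracked. The function name, argument names, and metric key are illustrative, not part of Weave's API.

```python
# Minimal scorer sketch. In a Weave project you would add the
# @weave.op decorator so calls are tracked in the UI; here it is a
# plain function so it runs standalone. Names are illustrative.

def exact_match(expected: str, output: str) -> dict:
    """Compare the labeled answer with the model output.

    Returning a dict of named metrics lets an evaluation framework
    aggregate each metric across the dataset.
    """
    return {"correct": expected.strip().lower() == output.strip().lower()}

print(exact_match("Paris", " paris "))  # {'correct': True}
print(exact_match("Paris", "London"))   # {'correct': False}
```

Returning a dict rather than a bare boolean makes it easy to add more metrics later (for example, a similarity score alongside exact match) without changing the scorer's call sites.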
This workflow forms the foundation for more advanced use cases like fine-tuning an LLM, detecting regressions, and comparing models.
To get started, complete the prerequisites. Then, define a Weave Model with a predict method, create a labeled dataset and scoring function, and run an evaluation using weave.Evaluation.evaluate().
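To make the overall shape of that workflow concrete, here is a plain-Python analogue of what an evaluation run does: iterate over a labeled dataset, call the model on each example, score each output, and aggregate. In Weave itself you would subclass weave.Model, decorate functions with @weave.op, and call weave.Evaluation(...).evaluate(model); everything below (the dataset rows, the stand-in predict function, the scorer) is an illustrative sketch, not Weave code.

```python
# Plain-Python analogue of an evaluation run, to show the workflow's
# shape. All names and data here are illustrative.

# A labeled dataset: each row pairs an input with the expected answer.
dataset = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "What is 2 + 2?", "expected": "4"},
]

def predict(question: str) -> str:
    # Stand-in for a real model call (e.g. an LLM request).
    canned = {
        "What is the capital of France?": "Paris",
        "What is 2 + 2?": "5",  # deliberately wrong, to show scoring
    }
    return canned.get(question, "")

def exact_match(expected: str, output: str) -> bool:
    return expected == output

# Run every example through the model, score the outputs, aggregate.
scores = [exact_match(row["expected"], predict(row["question"]))
          for row in dataset]
accuracy = sum(scores) / len(scores)
print(f"accuracy: {accuracy}")  # accuracy: 0.5
```

The value of a framework like Weave over this hand-rolled loop is that each prediction, score, and aggregate is recorded and linked in the UI, so you can inspect individual failures and compare runs across model versions.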
Run your first evaluation
In this example, we’re using W&B Inference or OpenAI; see the W&B Inference API documentation to learn more. Using another provider? Weave supports all major clients and frameworks.