Glossary
LLM as a judge
In short
Using one model to evaluate, rank, or merge the outputs of other models.
Once several models have answered, something has to decide what the final result is. An LLM judge reads the candidate answers and scores them, ranks them, or fuses their best parts into a single answer.
It is a practical way to get a high-quality result without a human reviewing every output. The judge does not have to be the most expensive model; a capable, efficient one is often enough for the merging step.
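To make the ranking case concrete, here is a minimal sketch in Python. It assumes a hypothetical `judge` callable that takes a prompt string and returns the model's text reply. The prompt wording, the 1 to 10 scale, and the names `build_prompt`, `parse_score`, `rank_candidates`, and `fake_judge` are illustrative assumptions, not LLMWeave's API.

```python
import re
from typing import Callable, List, Tuple


def build_prompt(question: str, answer: str) -> str:
    """Format a scoring request for the judge (the wording is illustrative)."""
    return (
        "Rate the answer to the question on a scale of 1 to 10.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply with the number only."
    )


def parse_score(reply: str) -> float:
    """Pull the first number out of the judge's reply; 0.0 if none is found."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    return float(match.group()) if match else 0.0


def rank_candidates(
    question: str,
    candidates: List[str],
    judge: Callable[[str], str],
) -> List[Tuple[float, str]]:
    """Score every candidate with the judge and return them best first."""
    scored = [
        (parse_score(judge(build_prompt(question, c))), c) for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def fake_judge(prompt: str) -> str:
    """Stand-in for a real model call (assumption): scores by word count."""
    answer = prompt.split("Answer: ", 1)[1].split("\n", 1)[0]
    return str(min(10, len(answer.split())))


ranked = rank_candidates(
    "What is 2 + 2?",
    ["4", "The sum is 4", "It is four, since 2 + 2 = 4"],
    fake_judge,
)
best_answer = ranked[0][1]  # the top-ranked candidate
```

Swapping `fake_judge` for a function that calls a real model is the only change needed to use this with an actual judge. Parsing the reply defensively matters in practice, because a judge model may wrap the score in extra text.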
In LLMWeave
Weaves use a judge model for the synthesis and ranking steps. An efficient model fills that role by default, which keeps the combining step cheap while the responding models do the heavy thinking.
Related terms
Response synthesis
Merging several model answers into one clean result, rather than showing them side by side and making you pick.
Model ensemble
Combining the outputs of several models into one result, so the group performs better than any single member.
Best-of-N
Generating several candidate answers and keeping the best one, instead of trusting a single attempt.