LLM as a judge

In short

Using one model to evaluate, rank, or merge the outputs of other models.

Once several models have answered, something has to decide what the final result is. An LLM judge reads the candidate answers and then scores them, ranks them, or fuses the best parts into a single answer.
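The scoring and ranking path can be sketched in a few lines. This is a minimal illustration, not a description of any particular product: `JudgeFn`, `build_ranking_prompt`, `rank_answers`, and `fake_judge` are hypothetical names, the 1 to 10 scale and the JSON reply format are assumptions, and the judge is a stub so the example runs without a model.

```python
import json
from typing import Callable, Dict, List

# Hypothetical type: any function that sends a prompt to a judge model
# and returns the model's text reply.
JudgeFn = Callable[[str], str]


def build_ranking_prompt(question: str, answers: Dict[str, str]) -> str:
    """Build a prompt asking the judge to score each labelled answer."""
    lines = [
        "You are grading candidate answers to a question.",
        f"Question: {question}",
        "Candidates:",
    ]
    for label, text in answers.items():
        lines.append(f"[{label}] {text}")
    # Assumed reply format: a JSON object mapping label to score.
    lines.append('Reply with JSON only, mapping each label to a score from 1 to 10.')
    return "\n".join(lines)


def rank_answers(question: str, answers: Dict[str, str], judge: JudgeFn) -> List[str]:
    """Return the answer labels ordered from highest to lowest judge score."""
    reply = judge(build_ranking_prompt(question, answers))
    scores = json.loads(reply)
    # Candidates the judge did not score fall back to 0.
    return sorted(answers, key=lambda label: float(scores.get(label, 0)), reverse=True)


def fake_judge(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return '{"A": 6, "B": 9, "C": 3}'


if __name__ == "__main__":
    candidates = {"A": "First draft.", "B": "Second draft.", "C": "Third draft."}
    print(rank_answers("What is an LLM judge?", candidates, fake_judge))
```

In a real setup the judge's reply would need validation, since a model can return malformed JSON or scores outside the requested range.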

It is a practical way to arrive at a strong final answer without a human reviewing every output. The judge does not have to be the most expensive model; a capable, efficient one is often enough for the merging step.

In LLMWeave

Weaves use a judge model for the synthesis and ranking steps. An efficient model fills that role by default, which keeps the combining step cheap while the responding models do the heavy thinking.
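The division of labour described above, where several responding models draft and a cheaper judge combines, might look roughly like the following. This is a sketch under assumptions and is not the LLMWeave API: `ModelFn`, `build_synthesis_prompt`, and `synthesize` are hypothetical names, and each model is represented by a plain function from prompt to text.

```python
from typing import Callable, Dict

# Hypothetical type: a model is any function from prompt text to reply text.
ModelFn = Callable[[str], str]


def build_synthesis_prompt(prompt: str, drafts: Dict[str, str]) -> str:
    """Build a prompt asking the judge to merge the drafts into one answer."""
    lines = [
        "Combine the best parts of the drafts below into one answer.",
        f"Original prompt: {prompt}",
    ]
    for name, text in drafts.items():
        lines.append(f"Draft from {name}: {text}")
    return "\n".join(lines)


def synthesize(prompt: str, responders: Dict[str, ModelFn], judge: ModelFn) -> str:
    """Collect one draft per responding model, then call the judge once."""
    # The responders do the expensive work: one call each.
    drafts = {name: model(prompt) for name, model in responders.items()}
    # The judge is called a single time, so a cheaper model keeps this step inexpensive.
    return judge(build_synthesis_prompt(prompt, drafts))
```

Because the judge runs once per prompt while the responders run once each, swapping in a smaller judge lowers the cost of the combining step without changing how the drafts are produced.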

Related terms

Response synthesis

Model ensemble

Best-of-N
