Pairwise Evaluations

Tally's Corpus Pairing feature lets you generate pairwise combinations of content items for comparative evaluation. This is useful any time you need to evaluate relationships between items rather than properties of individual items — for example:

Influence analysis — evaluate the impact of academic paper A on paper B
Plagiarism detection — assess whether writing A appears to have been derived from writing B
Agreement prediction — estimate how likely the authors of A and B are to agree on a given subject

Corpus Pairing works with your existing rubrics and evaluation infrastructure. The only difference is that the content being evaluated is a structured combination of two source items instead of a single item.

How It Works

A corpus pairing takes a source corpus and produces a target corpus whose items are pairwise combinations of the source content. Each pair is formatted using a configurable template that wraps the two content bodies in labeled tags:

<ContentItemA>
[body of item A]
</ContentItemA>

<ContentItemB>
[body of item B]
</ContentItemB>

You can customize this template and the labels to fit your evaluation context. For example, a plagiarism use case might use <OriginalWork> and <SubmittedWork> tags.

The generated target corpus can then be evaluated with any rubric, just like a regular corpus.
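
As a rough illustration of the formatting step, the sketch below wraps two content bodies in labeled tags. The names `DEFAULT_TEMPLATE` and `format_pair` and the `{label_a}`-style placeholders are hypothetical, chosen for this example; they are not Tally's actual template syntax or API.

```python
# Hypothetical sketch: the placeholder syntax and function name are
# assumptions for illustration, not Tally's real template format.
DEFAULT_TEMPLATE = (
    "<{label_a}>\n{body_a}\n</{label_a}>\n\n"
    "<{label_b}>\n{body_b}\n</{label_b}>"
)


def format_pair(
    body_a: str,
    body_b: str,
    label_a: str = "ContentItemA",
    label_b: str = "ContentItemB",
    template: str = DEFAULT_TEMPLATE,
) -> str:
    """Combine two source bodies into one pair item using labeled tags."""
    return template.format(
        label_a=label_a, body_a=body_a, label_b=label_b, body_b=body_b
    )


# Default labels, then the plagiarism-style labels mentioned above.
print(format_pair("[body of item A]", "[body of item B]"))
print(format_pair("original text", "submitted text",
                  label_a="OriginalWork", label_b="SubmittedWork"))
```

Because the labels are the only thing the rubric sees to tell the two items apart, it usually helps to pick labels that match the wording of the rubric's questions.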

Pairing Options

When creating a pairing, you control which pairs are generated:

Include reciprocals — if (A, B) is created, should (B, A) also be created? Enable this when direction matters (e.g., "did A influence B?" is different from "did B influence A?"). Disable for symmetric comparisons (e.g., "how similar are A and B?").
Include self-pairs — should (A, A) pairs be created? Occasionally useful as a baseline or control.
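
The effect of the two options on the set of generated pairs can be sketched with Python's `itertools`. The function `generate_pairs` and its parameter names are hypothetical and only mirror the options described above; this is not how Tally necessarily builds pairs internally.

```python
from itertools import (
    combinations,
    combinations_with_replacement,
    permutations,
    product,
)


def generate_pairs(items, include_reciprocals=False, include_self_pairs=False):
    """Return the candidate pairs for a list of items under the two options."""
    if include_reciprocals and include_self_pairs:
        return list(product(items, repeat=2))                 # N * N pairs
    if include_reciprocals:
        return list(permutations(items, 2))                   # N * (N - 1) pairs
    if include_self_pairs:
        return list(combinations_with_replacement(items, 2))  # N * (N + 1) / 2 pairs
    return list(combinations(items, 2))                       # N * (N - 1) / 2 pairs


items = ["A", "B", "C"]
print(generate_pairs(items))
# [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(len(generate_pairs(items, include_reciprocals=True)))
# 6
print(len(generate_pairs(items, include_reciprocals=True, include_self_pairs=True)))
# 9
```

Enabling reciprocals roughly doubles the number of pairs, so for symmetric comparisons leaving it off halves the evaluation cost.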

Sampling

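For large corpora, the full Cartesian product can be enormous: N items produce up to N² pairs when reciprocals and self-pairs are both included, and N(N-1)/2 pairs when both are excluded. Tally supports percentage-based sampling to select a subset of the candidate pairs.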

Sampling uses a hash-based approach: for each candidate pair, the content hashes of the two source items are combined and hashed to produce a deterministic value. The pair is included if that value falls below the configured percentage threshold.

This gives you important stability properties:

Adding or removing items doesn't change which of the other pairs are included; only pairs involving the added or removed items are affected.
Changing the template doesn't affect sampling, since it's based on source content hashes, not the combined output.
Results are deterministic: the same source content and sampling percentage always produce the same sample.
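
A minimal sketch of this kind of hash-based sampling is shown below. The details are assumptions made for illustration: SHA-256 as the hash, joining the two content hashes with a `:` separator in (A, B) order, and mapping the digest to one of 10,000 buckets. Tally's actual hash function and combination scheme may differ.

```python
import hashlib


def content_hash(body: str) -> str:
    """Hash of a single source item's content (SHA-256 is an assumption)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def pair_is_sampled(hash_a: str, hash_b: str, percentage: float) -> bool:
    """Decide inclusion from the two source hashes alone."""
    combined = f"{hash_a}:{hash_b}".encode("utf-8")
    digest = hashlib.sha256(combined).digest()
    # Map the first 8 bytes to a bucket in [0, 10000), i.e. 0.01% steps.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percentage * 100


def sample_pairs(bodies, percentage):
    """Sampled ordered pairs of distinct items (reciprocals on, self-pairs off)."""
    hashes = [content_hash(b) for b in bodies]
    return {
        (a, b)
        for a, ha in zip(bodies, hashes)
        for b, hb in zip(bodies, hashes)
        if a != b and pair_is_sampled(ha, hb, percentage)
    }


before = sample_pairs(["alpha", "beta", "gamma"], 50)
after = sample_pairs(["alpha", "beta", "gamma", "delta"], 50)
# Adding "delta" leaves every decision about the other pairs unchanged.
print(before == {pair for pair in after if "delta" not in pair})  # True
```

Since each decision depends only on the two source hashes and the threshold, no pair's inclusion depends on what else is in the corpus. In this sketch, raising the percentage only ever adds pairs to the sample; whether Tally guarantees the same is not stated here.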

Snapshot vs. Live Mode

Corpus pairings support two modes:

Snapshot: generates pairs once at creation time. The target corpus is static and won't change even if the source corpus is updated. Good for one-time analyses.
Live: the target corpus can be kept in sync with the source. When source content is added, removed, or changed, you can re-sync to regenerate pairs. Good for ongoing evaluation pipelines where the source corpus grows or changes over time.
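
One way to picture what a re-sync has to work out in Live mode is a set difference over pair keys. The sketch below assumes each pair is identified by the content hashes of its two source items, with reciprocals on and self-pairs off; `pair_keys` and `plan_resync` are hypothetical names, and Tally's actual re-sync logic may work differently.

```python
from itertools import permutations


def pair_keys(content_hashes):
    """Ordered pairs of distinct content hashes (reciprocals on, self-pairs off)."""
    return set(permutations(sorted(set(content_hashes)), 2))


def plan_resync(old_hashes, new_hashes):
    """Compare the pairs implied by the old and new source corpus."""
    old_pairs = pair_keys(old_hashes)
    new_pairs = pair_keys(new_hashes)
    return {
        "add": new_pairs - old_pairs,     # pairs involving new or changed items
        "remove": old_pairs - new_pairs,  # pairs whose source item is gone or changed
        "keep": old_pairs & new_pairs,    # untouched pairs, no regeneration needed
    }


# "h2" was edited (its content hash became "h2b") and "h3" was added.
plan = plan_resync(["h1", "h2"], ["h1", "h2b", "h3"])
print(len(plan["add"]), len(plan["remove"]), len(plan["keep"]))
# 6 2 0
```

A Snapshot pairing, by contrast, corresponds to computing the pair set once and never running this comparison again.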