Literature Synthesis at Scale: From 500 Papers to One Dossier
The Synthesis Problem
Reading 500 papers and forming a coherent view of a field is cognitively demanding in a way that is hard to parallelise. You can split the reading across a team, but integrating 20 people’s notes into a coherent synthesis is its own problem.
assay.it approaches this differently: instead of summarising each paper independently, it builds a shared evidence structure across the entire corpus, then synthesises from that structure.
Building the Evidence Structure
The process has three phases.
Phase 1 — Entity Resolution
All entities (concepts, compounds, methods, organisms, institutions) are resolved to canonical identifiers. “Rapamycin” and “sirolimus” are recognised as two names for the same compound, and both are linked to the broader class “mTOR inhibitor”. This allows claims about the same thing to be aggregated regardless of terminology.
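As a rough illustration of the idea, entity resolution can be sketched as a lookup from normalised surface forms to canonical identifiers, with an optional roll-up from compound to class. The alias table, the identifier scheme (`compound:…`, `class:…`), and the function names below are assumptions made for this example, not a description of how assay.it implements the step.

```python
from typing import Optional

# Hypothetical alias table: normalised surface form -> canonical identifier.
ALIASES = {
    "rapamycin": "compound:sirolimus",
    "sirolimus": "compound:sirolimus",
    "mtor inhibitor": "class:mtor-inhibitor",
}

# Hypothetical hierarchy: compound identifier -> class identifier.
PARENT_CLASS = {
    "compound:sirolimus": "class:mtor-inhibitor",
}


def normalise(mention: str) -> str:
    """Lowercase and collapse whitespace so lookups ignore formatting."""
    return " ".join(mention.lower().split())


def resolve(mention: str) -> Optional[str]:
    """Return the canonical identifier for a mention, or None if unknown."""
    return ALIASES.get(normalise(mention))


def resolve_to_class(mention: str) -> Optional[str]:
    """Resolve a mention, then roll it up to its class when one is known."""
    canonical = resolve(mention)
    if canonical is None:
        return None
    return PARENT_CLASS.get(canonical, canonical)
```

With this table, `resolve("Rapamycin")` and `resolve("sirolimus")` return the same identifier, and `resolve_to_class` maps all three example terms to one class. A real system would likely need fuzzy matching and a curated ontology rather than a hand-written dictionary.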
Phase 2 — Claim Aggregation
All claims about each entity pair are collected and grouped by type (causal, correlational, definitional). Claims with compatible semantics are merged; conflicting claims are flagged.
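The grouping and flagging described here might look something like the following minimal sketch. The `Claim` fields, the `polarity` encoding (+1 for a claim that supports the relationship, -1 for one that contradicts it), and the `aggregate` function are all hypothetical. "Compatible semantics" is reduced to "same polarity" purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    subject: str   # canonical identifier of the first entity
    obj: str       # canonical identifier of the second entity
    kind: str      # "causal", "correlational", or "definitional"
    polarity: int  # assumed encoding: +1 supports, -1 contradicts
    paper_id: str


def aggregate(claims):
    """Group claims by (entity pair, type); merge them and flag conflicts."""
    groups = defaultdict(list)
    for claim in claims:
        groups[(claim.subject, claim.obj, claim.kind)].append(claim)

    merged = {}
    for key, group in groups.items():
        polarities = {claim.polarity for claim in group}
        merged[key] = {
            # Papers contributing to this relationship, deduplicated.
            "papers": sorted({claim.paper_id for claim in group}),
            # A group is conflicting when papers disagree on polarity.
            "conflicting": len(polarities) > 1,
        }
    return merged
```

Keying on the ordered pair keeps "A causes B" separate from "B causes A". For symmetric relationships such as correlations, sorting the pair before grouping would be a reasonable variation.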
Phase 3 — Evidence Grading
The aggregated evidence is graded by study design, sample size, replication count, and recency.
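One way to picture grading on these four factors is a weighted score in the range 0 to 1. Everything numeric below is an assumption chosen for the example: the design weights, the saturation points (10,000 participants, 3 replications, 20 years), and the 0.4/0.2/0.2/0.2 mix are not taken from assay.it.

```python
import math

# Hypothetical weights per study design; unknown designs fall back to 0.2.
DESIGN_WEIGHT = {
    "meta_analysis": 1.0,
    "rct": 0.9,
    "cohort": 0.6,
    "case_report": 0.3,
}


def grade(design, sample_size, replications, year, current_year):
    """Combine the four grading factors into a single score in [0, 1]."""
    design_score = DESIGN_WEIGHT.get(design, 0.2)
    # Log scale so that going from 10 to 100 participants matters as much
    # as going from 100 to 1,000; saturates at 10,000 participants.
    size_score = min(math.log10(max(sample_size, 1)) / 4, 1.0)
    # Saturates at three independent replications.
    replication_score = min(replications / 3, 1.0)
    # Linear decay to zero over 20 years, clamped to [0, 1].
    age = current_year - year
    recency_score = min(1.0, max(0.0, 1.0 - age / 20))
    return (
        0.4 * design_score
        + 0.2 * size_score
        + 0.2 * replication_score
        + 0.2 * recency_score
    )
```

Under these assumed weights, a recent, well-replicated randomised trial scores higher than an old single case report, which is the ordering a grading scheme of this kind is meant to produce.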
What the Output Looks Like
The final dossier contains:
- Evidence map — a network of entities and the evidence-weighted relationships between them
- Consensus view — a narrative synthesis of the strongest evidence
- Controversy log — a list of claims where the evidence is conflicting or insufficient
- Open questions — gaps where the corpus contains no direct evidence
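To make the four parts concrete, the dossier can be thought of as a small data structure derived from the graded relationships. This is a sketch under stated assumptions: the `Relationship` and `Dossier` types, the `min_grade` threshold, and the idea of passing in a list of entity pairs the research question cares about are all hypothetical, and the narrative text of the consensus view is represented only by the relationships it would draw on.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    grade: float       # evidence weight in [0, 1]
    conflicting: bool  # True when the underlying claims disagree


@dataclass
class Dossier:
    evidence_map: list = field(default_factory=list)
    consensus_view: list = field(default_factory=list)
    controversy_log: list = field(default_factory=list)
    open_questions: list = field(default_factory=list)


def build_dossier(relationships, pairs_of_interest, min_grade=0.7):
    """Partition graded relationships into the four dossier parts."""
    dossier = Dossier(evidence_map=list(relationships))
    for rel in relationships:
        if rel.conflicting or rel.grade < min_grade:
            # Conflicting or insufficient evidence.
            dossier.controversy_log.append(rel)
        else:
            # Strong, uncontested evidence feeds the consensus narrative.
            dossier.consensus_view.append(rel)
    # Pairs the question asks about that the corpus never addresses.
    covered = {(rel.source, rel.target) for rel in relationships}
    dossier.open_questions = [
        pair for pair in pairs_of_interest if pair not in covered
    ]
    return dossier
```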
Practical Tips
- Start with a focused research question; broad queries produce large corpora that take longer to synthesise
- Use the entity resolution step to align terminology before synthesis
- Always review the controversy log — disagreements in the literature are often the most interesting findings