Open Problems in AI-Assisted Research Tools
A Field That Moves Fast and Misrepresents Itself
AI-assisted research tools have proliferated rapidly. Most claim to “summarise” or “analyse” documents. Few are transparent about what they actually do, what they get wrong, and which hard problems remain unsolved.
This post attempts an honest account of the state of the field, including our own limitations.
Open Problem 1: Hallucination in Low-Evidence Regimes
When a document is silent on a topic, a model may fill the gap with plausible-sounding fabrication. Grounding outputs strictly to retrieved passages reduces the problem but does not eliminate it.
What needs to be solved: calibrated uncertainty quantification that reflects the model's actual epistemic state, not just its output confidence scores.
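Calibration is one measurable piece of this. The sketch below assumes each extracted claim comes with a model confidence in [0, 1] and a human judgement of whether it was correct, and computes expected calibration error (ECE): the weighted gap between stated confidence and observed accuracy. The function name and the ten equal-width bins are illustrative choices, not a description of any particular tool. A low ECE is necessary but not sufficient, since it says nothing about whether confidence tracks what the source document actually supports.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Expected calibration error (ECE) with equal-width confidence bins.

    ECE = sum over bins of (bin_size / N) * |accuracy(bin) - mean_confidence(bin)|
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Group prediction indices by confidence bin.
    bins: list[list[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confidences):
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence out of range: {c}")
        # A confidence of exactly 1.0 goes in the top bin, not a bin past the end.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append(i)

    ece = 0.0
    for members in bins:
        if not members:
            continue
        accuracy = sum(correct[i] for i in members) / len(members)
        mean_conf = sum(confidences[i] for i in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

For example, a model that reports 0.9 confidence on four claims but gets only two of them right has an ECE of about 0.4, whereas one that reports 0.75 and is right three times out of four is perfectly calibrated on that sample.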
Open Problem 2: Long-Document Faithfulness
Current architectures struggle to maintain consistent attention across very long documents. When synthesising conclusions, which typically appear late in a document, a model may ignore claims made in its early sections.
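One way to make this failure visible is to check whether the evidence behind a synthesis clusters towards the end of the document. The sketch below assumes each claim has already been attributed to the section it draws on (for example by an evidence-attribution step), with sections numbered in document order; `positional_coverage` is a hypothetical helper. A low score for an early bucket is a symptom to investigate, not proof of the problem, because some early sections legitimately contain nothing worth citing.

```python
from typing import Iterable


def positional_coverage(
    cited_sections: Iterable[int],
    n_sections: int,
    n_buckets: int = 3,
) -> list[float]:
    """Fraction of sections in each positional bucket cited by at least one claim.

    Sections are numbered 0..n_sections-1 in document order and split into
    n_buckets contiguous buckets (e.g. first, middle and last third).
    Cited indices outside that range are ignored.
    """
    if n_sections <= 0 or n_buckets <= 0:
        raise ValueError("n_sections and n_buckets must be positive")
    cited = set(cited_sections)
    totals = [0] * n_buckets
    hits = [0] * n_buckets
    for s in range(n_sections):
        # Map each section to a bucket in proportion to its position.
        b = s * n_buckets // n_sections
        totals[b] += 1
        if s in cited:
            hits[b] += 1
    # Buckets with no sections (only possible if n_buckets > n_sections) score 0.
    return [h / t if t else 0.0 for h, t in zip(hits, totals)]
```

With nine sections and claims citing only sections 5 to 8, the coverage is 0.0 for the first third, one third for the middle third and 1.0 for the last third: a pattern consistent with early material being ignored.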
Open Problem 3: Domain Shift
Models trained on general scientific text may perform well in heavily represented fields such as biomedicine but poorly in highly specialised sub-fields. Domain-specific fine-tuning is expensive and requires labelled data that may not exist.
Open Problem 4: Evaluation
How do you measure whether a hypothesis was correctly extracted? We do not yet have robust benchmarks for most research-assistance tasks. This makes it difficult to compare tools or track progress rigorously.
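To make the difficulty concrete: the simplest metric is precision and recall against a hand-labelled gold set. The sketch below assumes hypotheses are short strings and that a match means identical text after light normalisation (lowercasing, stripping punctuation, collapsing whitespace); the function names are hypothetical. That assumption is exactly where it breaks down, since a paraphrased but correct hypothesis scores as a miss, which is one reason robust benchmarks are hard to build.

```python
import re
import string


def normalise(text: str) -> str:
    """Lowercase, strip ASCII punctuation and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def extraction_prf(predicted: list[str], gold: list[str]) -> tuple[float, float, float]:
    """Precision, recall and F1 of extracted hypotheses under normalised exact match."""
    pred_set = {normalise(p) for p in predicted}
    gold_set = {normalise(g) for g in gold}
    true_pos = len(pred_set & gold_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


p, r, f = extraction_prf(
    ["Drug X reduces inflammation.", "Gene Y is upregulated"],
    ["drug x reduces inflammation", "Gene Z regulates Y"],
)
print(p, r, f)  # 0.5 0.5 0.5
```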
What We Are Working On
At assay.it, our current research priorities are:
- Retrieval-augmented grounding with explicit evidence attribution
- Uncertainty-aware claim scoring
- Evaluation datasets for hypothesis extraction and literature synthesis
- Domain adaptation for specialised corpora
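Evidence attribution, the first item above, can be sketched as linking each generated claim to the retrieved passage that best supports it and flagging the claim as unsupported when no passage clears a threshold. This is a minimal illustration, not how assay.it implements it: the token-overlap `support` score, the 0.6 threshold and every name below are assumptions. Lexical overlap is a weak proxy (a claim that negates a passage would still score highly), and a real system would more likely use an entailment or embedding model in its place.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attribution:
    claim: str
    passage_id: Optional[str]  # None: no passage supports the claim well enough
    score: float               # best support score found, in [0, 1]


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support(claim: str, passage: str) -> float:
    """Fraction of the claim's tokens that also appear in the passage."""
    c = tokens(claim)
    return len(c & tokens(passage)) / len(c) if c else 0.0


def attribute(
    claims: list[str], passages: dict[str, str], threshold: float = 0.6
) -> list[Attribution]:
    """Attach each claim to its best-supporting passage, or to None if below threshold."""
    results = []
    for claim in claims:
        best_id, best_score = None, 0.0
        for pid, text in passages.items():
            s = support(claim, text)
            if s > best_score:
                best_id, best_score = pid, s
        if best_score < threshold:
            best_id = None  # flag as unsupported rather than guess a source
        results.append(Attribution(claim, best_id, best_score))
    return results


passages = {
    "p1": "Drug X reduced inflammation in mice.",
    "p2": "Gene Y was upregulated in tumour samples.",
}
for a in attribute(["Drug X reduced inflammation", "Drug X cures cancer in humans"], passages):
    print(a.passage_id, round(a.score, 2), a.claim)
```

The second claim shares "Drug X" with a real passage yet is left unattributed, which is the behaviour grounding aims for: the tool should surface the missing evidence rather than cite the nearest passage.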
We publish our findings and limitations openly. If you are working on any of these problems, we would like to hear from you.