Open Problems in AI-Assisted Research Tools

A Field That Moves Fast and Misrepresents Itself

AI-assisted research tools have proliferated rapidly. Most claim to “summarise” or “analyse” documents. Few are transparent about what they actually do, what they get wrong, and where the hard problems remain unsolved.

This post is an attempt at honesty about the state of the field — including our own limitations.

Open Problem 1: Hallucination in Low-Evidence Regimes

When a document is silent on a topic, a model may fill the gap with plausible-sounding fabrication. Grounding outputs strictly to retrieved passages reduces the problem but does not eliminate it.

What needs to be solved: calibrated uncertainty quantification that reflects the model's actual epistemic state, not just its output confidence scores.
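To make "calibrated" concrete, here is a minimal sketch of expected calibration error (ECE), a common diagnostic that compares stated confidence against observed accuracy. It is an illustration, not a description of how any particular tool scores claims. The function name, the equal-width binning, and the inputs (one confidence per claim, plus a human judgement of whether the claim was supported) are all assumptions made for the example.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted average gap between mean confidence and accuracy per bin.

    confidences: model-reported confidence in [0, 1] for each claim (assumed input).
    correct: whether each claim was judged supported by the source (assumed input).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        return 0.0

    # Group claims into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece
```

A low ECE says only that confidence scores track accuracy on the evaluated sample. It does not show that the model knows when a document is silent on a topic, which is why this remains an open problem rather than a solved one.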

Open Problem 2: Long-Document Faithfulness

Current architectures struggle to maintain consistent attention across very long documents. Claims from early sections may be ignored when synthesising conclusions that appear late in the document.
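One rough way to check for this is to look at where in the document a tool's cited evidence comes from. The sketch below assumes the tool reports a character offset for each passage it cites. That is an assumption for illustration, and the function name and segment count are hypothetical.

```python
from typing import List, Sequence


def evidence_position_histogram(
    evidence_offsets: Sequence[int],
    document_length: int,
    n_segments: int = 3,
) -> List[float]:
    """Share of cited passages that start in each equal-length segment of the document.

    evidence_offsets: character offsets of cited passages (assumed to be available).
    document_length: total length of the document in characters.
    """
    if document_length <= 0:
        raise ValueError("document_length must be positive")
    if n_segments <= 0:
        raise ValueError("n_segments must be positive")

    counts = [0] * n_segments
    for offset in evidence_offsets:
        if not 0 <= offset < document_length:
            raise ValueError(f"offset {offset} is outside the document")
        # Integer arithmetic maps the offset to its segment index.
        counts[offset * n_segments // document_length] += 1

    total = len(evidence_offsets)
    if total == 0:
        return [0.0] * n_segments
    return [count / total for count in counts]
```

For example, `evidence_position_histogram([10, 950, 980, 990], 1000)` returns `[0.25, 0.0, 0.75]`: three of the four citations come from the final third. A skewed histogram is a prompt for closer inspection, not proof of unfaithfulness, since the relevant evidence may really be concentrated in one part of the document.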

Open Problem 3: Domain Shift

Models trained on general scientific text may perform well in biomedicine but poorly in highly specialised sub-fields. Domain-specific fine-tuning is expensive and requires labelled data that may not exist.

Open Problem 4: Evaluation

How do you measure whether a hypothesis was correctly extracted? We do not yet have robust benchmarks for most research-assistance tasks. This makes it difficult to compare tools or track progress rigorously.
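To show why this is hard, here is a deliberately simple scoring sketch: precision, recall, and F1 of extracted hypotheses against a human-written gold list, with matches decided by token overlap. Everything here is an assumption for illustration, including the function names, the Jaccard similarity, the 0.5 threshold, and the greedy one-to-one matching. It is not an established benchmark.

```python
import re
from typing import List, Set, Tuple


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a sentence."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two sentences, in [0, 1]."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def score_extraction(
    predicted: List[str],
    gold: List[str],
    threshold: float = 0.5,
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using greedy one-to-one matching."""
    unmatched_gold = list(gold)
    true_positives = 0
    for pred in predicted:
        # Pick the most similar gold hypothesis that has not been matched yet.
        best = max(unmatched_gold, key=lambda g: jaccard(pred, g), default=None)
        if best is not None and jaccard(pred, best) >= threshold:
            true_positives += 1
            unmatched_gold.remove(best)

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1
```

Lexical overlap is a crude proxy. It rewards a paraphrase that reverses the meaning ("A increases B" against "A does not increase B") and penalises a correct extraction phrased in different words. Closing that gap between surface similarity and semantic correctness is a large part of what robust benchmarks would need to address.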

What We Are Working On

At assay.it, our current research priorities are:

Retrieval-augmented grounding with explicit evidence attribution
Uncertainty-aware claim scoring
Evaluation datasets for hypothesis extraction and literature synthesis
Domain adaptation for specialised corpora

We publish our findings and limitations openly. If you are working on any of these problems, we would like to hear from you.