How to Build a Research Assistant with LLMs
In the last few years, LLMs have been real game changers in many domains. However, an LLM is just a model trained to predict the next token, and that alone is not enough to build complex applications.
The need to chain LLMs with the traditional software stack to build larger applications gave rise to Langchain, one of the fastest-growing open-source projects of all time.
The simple idea behind Langchain is to allow models and other software (APIs, databases, …) to interact through a common interface.
One of the most powerful techniques (and one that integrates easily with Langchain) is Retrieval Augmented Generation (RAG).
RAG was proposed by Meta (Lewis et al.) at NeurIPS 2020. It combines an LLM (parametric memory) with information retrieval (non-parametric memory) to extend and update the model's knowledge without additional training. More specifically, the two components are combined in a probabilistic model:
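In the RAG-Sequence formulation of the original paper, a retriever $p_\eta(z \mid x)$ scores documents $z$ given the query $x$, and a generator $p_\theta$ produces the answer $y$ token by token; the retrieved documents are marginalized out over the top-$k$ results (equation reconstructed here from the paper's definitions):

$$
p_{\text{RAG-Sequence}}(y \mid x) \;\approx\; \sum_{z \,\in\, \text{top-}k\left(p_\eta(\cdot \mid x)\right)} p_\eta(z \mid x)\, \prod_{i=1}^{N} p_\theta\left(y_i \mid x, z, y_{1:i-1}\right)
$$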
A typical RAG application has two main components: indexing, which ingests the source documents and stores them in a searchable form offline, and retrieval-and-generation, which at query time fetches the relevant chunks and passes them to the LLM together with the user's question.
In Langchain, indexing is composed of several substeps: the documents are loaded, split into chunks, embedded, and stored in a vector store, as sketched below.
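A minimal sketch of the split/embed/store substeps with common Langchain defaults (import paths and the embedding model are assumptions that depend on the installed version; `docs` is the list of Documents produced by the loading step described below):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Split the loaded documents into overlapping chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)  # `docs` comes from the loading step

# Embed the chunks and store them in a local vector store.
vectorstore = Chroma.from_documents(chunks, embedding=OpenAIEmbeddings())
```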
The problem faced in this project is starting from a large bibliography with a huge number of papers and using it as the basis for writing a new paper on those topics. It is hard for an off-the-shelf LLM to help, because the required knowledge is very specific and the writing style must be suitable for a scientific paper. LLMs also suffer from hallucinations, so it is hard to tell when they generate wrong sentences. The idea is to integrate RAG over these papers to obtain a chatbot able to help with the writing.
I started with the bibliography of my Master's Thesis (40 papers) about weakly supervised object detection (WSOD) in remote sensing images (RSI). I asked the LLM a question about PCIR, one of the most relevant networks in the literature. There are no additional experiments or questions, since this was only a preliminary assessment.
I used the Langchain framework, and most of the tools in the indexing pipeline are the default ones; the main exception is the document parser, Grobid.
Grobid is a machine learning library for extracting, parsing, and re-structuring raw documents. It is designed for academic papers and uses a cascade of sequence-labelling models, whose architecture can be chosen among CRF and various deep learning models. The models do not work on plain text but on layout tokens. Grobid can be used (and integrated into Langchain) to parse papers in PDF format into Documents.
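A minimal sketch of the loading step with the Grobid integration (it assumes a Grobid server is running locally on its default port; the papers directory is hypothetical and the import paths depend on the Langchain version):

```python
from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers import GrobidParser

# Parse every PDF in the bibliography folder into Langchain Documents,
# sending them to the locally running Grobid server.
loader = GenericLoader.from_filesystem(
    "./bibliography/",          # hypothetical folder with the 40 papers
    glob="*",
    suffixes=[".pdf"],
    parser=GrobidParser(segment_sentences=False),
)
docs = loader.load()
```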
The basic experiment (without RAG) is performed using this prompt:
System: The user will provide one or more questions
related to a scientific argument.
You are a paper writing assistant that
writes a paragraph that answers to the questions.
Notice that the paragraph will be
part of a scientific paper,
so the writing style must be technical and formal.
Human: What are the main components and the
key innovations of Progressive Contextual Instance Refinement (PCIR)?
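For reference, a minimal sketch of how such a prompt can be wired to a chat model in Langchain (the model name and import paths are assumptions that depend on the installed version):

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "The user will provide one or more questions related to a scientific "
     "argument. You are a paper writing assistant that writes a paragraph "
     "that answers the questions. Notice that the paragraph will be part of "
     "a scientific paper, so the writing style must be technical and formal."),
    ("human", "{question}"),
])

llm = ChatOpenAI(model="gpt-3.5-turbo")  # any chat model can be plugged in here
chain = prompt | llm

answer = chain.invoke({
    "question": "What are the main components and the key innovations of "
                "Progressive Contextual Instance Refinement (PCIR)?"
})
print(answer.content)
```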
The answer is inaccurate and partially hallucinated.
Progressive Contextual Instance Refinement (PCIR) is a
novel approach in the field of computer vision that
aims to improve the accuracy of object detection and
instance segmentation tasks.
The main components of PCIR include a backbone network,
a contextual module, and an instance refinement module.
The backbone network is responsible for extracting
high-level features from the input image.
The contextual module utilizes contextual information
to enhance the representation of objects and their surrounding context.
This module incorporates a contextual attention mechanism
that selectively focuses on relevant regions.
The instance refinement module refines the initial object
proposals by iteratively updating the bounding box coordinates
and segmentation masks.
It employs a progressive refinement strategy that
gradually improves the accuracy of the detected instances.
The key innovation of PCIR lies in its ability to
effectively leverage contextual information and refine
object instances in a progressive manner,
leading to improved performance in object detection and
instance segmentation tasks.
Applying RAG, the prompt becomes the following (with the context filled in at run time):
System: The user will provide one or more questions
related to a scientific argument.
You are a paper writing assistant that
writes a paragraph that answers to the questions.
Notice that the paragraph will be
part of a scientific paper,
so the writing style must be technical and formal.
Use the following pieces of retrieved context
to write a more complete paragraph:
Context: {context}
Human: What are the main components and the
key innovations of Progressive Contextual Instance Refinement (PCIR)?
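A minimal sketch of how the retrieved context can be filled in at run time, reusing the vector store built during indexing (again, names and import paths are assumptions, not the exact code used in the project):

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

retriever = vectorstore.as_retriever()  # vector store built during indexing

rag_prompt = ChatPromptTemplate.from_messages([
    ("system",
     "The user will provide one or more questions related to a scientific "
     "argument. You are a paper writing assistant that writes a paragraph "
     "that answers the questions. Notice that the paragraph will be part of "
     "a scientific paper, so the writing style must be technical and formal. "
     "Use the following pieces of retrieved context to write a more complete "
     "paragraph:\nContext: {context}"),
    ("human", "{question}"),
])

def format_docs(docs):
    # Concatenate the retrieved chunks into a single context string.
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | ChatOpenAI(model="gpt-3.5-turbo")
)

answer = rag_chain.invoke(
    "What are the main components and the key innovations of "
    "Progressive Contextual Instance Refinement (PCIR)?"
)
print(answer.content)
```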
In this case the result is much more precise and consistent with the writing style of a scientific paper.
The main components of Progressive Contextual Instance Refinement
(PCIR) are the DCIR module and the PPSP module.
The DCIR module is responsible for mining all
potential instances existing in the images,
while the PPSP module removes noisy negative samples
and maintains the diversity of the samples.
These two modules work together to progressively
refine the contextual instances.
The key innovation of PCIR lies in its ability
to effectively combine these two modules to
achieve superior performance in weakly supervised object detection.
Experiments conducted on the NWPU VHR-10.v2 and
DIOR data sets have clearly demonstrated the
effectiveness of PCIR compared to state-of-the-art
methods under a weakly supervised paradigm.