In this article we will explore why models with 128K-token (and larger) context windows can’t fully replace RAG.
We’ll start with a brief reminder of the problems RAG solves, before looking at the improvements in LLMs and their impact on the need for RAG.
RAG isn’t really new
The idea of injecting context to give a language model access to up-to-date data is quite “old” (by LLM standards). It was first introduced by Facebook AI/Meta researchers in the 2020 paper “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”. By comparison, the first version of ChatGPT was only released in November 2022.
In this paper they distinguish two kinds of memory:
- the parametric memory, which is inherent to the LLM: what it learned while being fed lots and lots of text during training,
- the non-parametric memory, which is the memory you provide by feeding context into the prompt.
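To make the non-parametric side concrete, here is a minimal sketch of how retrieved text gets injected into a prompt. The document list and the keyword-overlap scorer are illustrative placeholders, not the paper’s actual retriever (which used dense vector search over Wikipedia):

```python
# Minimal sketch of non-parametric memory: retrieved text is injected
# into the prompt so the model can use facts it never saw in training.
# The documents and the naive keyword scorer are illustrative only.

DOCUMENTS = [
    "The Eiffel Tower was completed in 1889.",
    "RAG was introduced by Facebook AI researchers in 2020.",
    "ChatGPT was first released in November 2022.",
]

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the question."""
    words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: -len(words & set(d.lower().split())))
    return scored[:k]

def build_prompt(question: str) -> str:
    """Prepend retrieved context (the non-parametric memory) to the question."""
    context = "\n".join(retrieve(question, DOCUMENTS))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("When was RAG introduced?"))
```

A real system would replace the keyword scorer with embedding-based similarity search, but the shape of the final prompt (context first, then the question) is the same idea.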