
How Tellen is Using AI and Large Language Models: RAGs to Riches
Tellen builds audit quality solutions for accounting firms, leveraging the latest innovations in Large Language Models (LLMs) and Artificial Intelligence (AI). With private endpoints, secure architecture, and an elegant user interface, Tellen provides applications ranging from chatbots trained on customers' own data to audit compilation assistance.
The Data Challenge
To accomplish this, we need to efficiently extract data from source documents—PDFs, Word documents, images, spreadsheets, audio, and video—and use that source data to answer questions posed by users and our systems, often routing those questions through LLMs.
However, LLMs come with significant limitations:
- Finite context windows
- Domain knowledge gaps
- "Hallucinated" answers¹
While we can work to counter these issues by expanding context windows, scraping more data, and fine-tuning models, LLMs will always fundamentally lack complete domain knowledge and the ability to efficiently access precise knowledge required for specific tasks.
The RAG Solution
What do we do? One way to mitigate these issues is to augment queries with the information required to answer them, drawn from datasets that are more relevant, and more readily citable, than the LLM's training data.
For enterprises, this dataset is often:
- Proprietary firm data (e.g., the firm's procedural handbook for completing an audit)
- Client data (e.g., a client's commercial lease contract)
This information was not available to the LLM during initial training, but it provides the specificity required to answer user questions more precisely. The user's query and this augmenting information are sent to the LLM together as a single, longer prompt, yielding far more accurate results; a minimal sketch of this augmentation step follows the list below.
Additional benefits:
- Augmenting information is citable as sources
- Reduces the "black box" nature of LLM responses
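As a minimal sketch, assuming each retrieved chunk carries its text plus a source label (the prompt wording below is illustrative, not Tellen's actual template), the augmentation step might look like this:

```python
# Minimal sketch of query augmentation: retrieved chunks are prepended to the
# user's question, each labeled with its source so the LLM's answer can cite it.
# The prompt wording is illustrative, not Tellen's actual template.
def augment_query(question: str, chunks: list[dict]) -> str:
    """Build a longer prompt: retrieved context first, then the question."""
    context = "\n\n".join(f"[Source: {c['source']}]\n{c['text']}" for c in chunks)
    return (
        "Answer using only the sources below, and cite each source you rely on.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```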
RAG Phase 1: Rudimentary Semantic Search
This is where Retrieval Augmented Generation (RAG) comes in. RAG is the concept of retrieving better data to augment your LLM's context window, then generating useful output.
Tellen began its journey with the most rudimentary form of RAG:
- Split input text into "chunks" based solely on character count
- Created high-dimensional vectors (embeddings) to represent them
- Stored vectors in a database alongside original chunks
- For user queries, used the same embedding model to create query vectors
- Calculated distances between query and chunk vectors (using cosine similarity)
- Retrieved the "closest" information to augment the original query
This process is known as semantic search.
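A minimal sketch of this Phase 1 pipeline, assuming the open-source sentence-transformers library as a stand-in for the embedding model (the post doesn't name Tellen's model or vector database):

```python
# Phase 1 RAG sketch: fixed-size chunking, embedding, and cosine-similarity
# retrieval. The embedding model is a stand-in; a production system would
# persist the vectors in a vector database instead of holding them in memory.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk_by_chars(text: str, size: int = 500) -> list[str]:
    """Split input text into chunks based solely on character count."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_index(text: str):
    """Embed each chunk and keep the vectors alongside the original chunks."""
    chunks = chunk_by_chars(text)
    vectors = model.encode(chunks, normalize_embeddings=True)
    return chunks, vectors

def retrieve(query: str, chunks: list[str], vectors, k: int = 3) -> list[str]:
    """Embed the query and return the k chunks closest by cosine similarity."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = vectors @ q  # dot product of normalized vectors = cosine similarity
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]
```

Because the embeddings are normalized, the dot product above is exactly cosine similarity.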
RAG Phase 2: Precision Chunking and MetaDocumenting Agents
Tellen's current data sources include text, tables, and images. (Images undergo OCR to become text; audio/video sources are transcribed to text, though multimodal LLMs are preferred for these formats.)
Precision Chunking
We developed "Precision Chunking" to obtain chunks not by simply counting characters, but by delineating paragraphs and tables using Document Intelligence AI models (see the sketch after this list). This approach:
- Prevents corrupting input data by splitting sentences, paragraphs, or tables
- Minimizes subject bleed between chunks, since boundaries follow the document's natural structure
- Stores metadata alongside chunks and vectors, including PDF bounding boxes for precise source identification
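As a sketch of what this can look like, assuming Azure AI Document Intelligence's prebuilt layout model as the Document Intelligence service (the post doesn't name a specific vendor or SDK):

```python
# Precision Chunking sketch: each layout-detected paragraph becomes one chunk,
# stored with page and bounding-box metadata so an answer can be traced back
# to an exact region of the source PDF. Endpoint and key are placeholders.
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

client = DocumentAnalysisClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

def precision_chunks(pdf_path: str) -> list[dict]:
    """Chunk a PDF along paragraph boundaries detected by the layout model."""
    with open(pdf_path, "rb") as f:
        result = client.begin_analyze_document("prebuilt-layout", document=f).result()
    chunks = []
    for para in result.paragraphs:
        region = para.bounding_regions[0]  # first region; paragraphs can span pages
        chunks.append({
            "text": para.content,  # never split mid-sentence or mid-table
            "page": region.page_number,
            "bounding_box": [(point.x, point.y) for point in region.polygon],
        })
    return chunks
```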
MetaDocumenting
For answers that span multiple documents, we use "MetaDocumenting" (sketched after this list), which:
- Summarizes each document in its entirety
- Embeds the summaries
- Runs semantic search over these MetaDocuments to choose the most relevant documents
- Runs a secondary semantic search over the chunks within the selected documents
- Concatenates the original query with the retrieved chunks for LLM synthesis
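A sketch of this two-stage retrieval, reusing the `build_index` and `retrieve` helpers from the Phase 1 sketch above (`summarize` is a placeholder; the post doesn't say how summaries are produced):

```python
# MetaDocumenting sketch: stage 1 searches over whole-document summaries,
# stage 2 searches over chunks within only the selected documents.
def summarize(doc_text: str) -> str:
    raise NotImplementedError("call an LLM to summarize the document here")

def metadocument_search(query: str, documents: dict[str, str],
                        top_docs: int = 2) -> list[str]:
    names = list(documents)
    # Stage 1: embed the summaries (the "MetaDocuments") and pick documents.
    summaries = [summarize(documents[name]) for name in names]
    summary_vecs = model.encode(summaries, normalize_embeddings=True)
    q = model.encode([query], normalize_embeddings=True)[0]
    best = np.argsort(summary_vecs @ q)[::-1][:top_docs]

    # Stage 2: chunk-level semantic search inside the selected documents only.
    results = []
    for i in best:
        chunks, vectors = build_index(documents[names[i]])
        results.extend(retrieve(query, chunks, vectors))
    return results  # concatenated with the original query before the LLM call
```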
RAG Phase 3: Agentic Workflows (Future)
We're building an advanced system of agents stitched together into workflows to optimize answers. This will form a RAG pipeline with multiple modules:
Planned Components
- More sophisticated routing
- Reranking capabilities
- Refined semantic search
- Workflow-specific modules based on query type
System Architecture
- Routing agent: Receives queries and determines paths through various modules
- Adaptive paths: Agent paths vary based on module outputs
- Primary tools: LLMs and embedding models producing high-dimensional vectors
- Dual workflows: Some scripted by Tellen (e.g., parsing audit files), others driven by chatbots
Users or Tellen's backend will enter queries into the routing agent, which will execute appropriate workflows depending on the query type.
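As an illustrative sketch only (the workflow names and the classification rule below are hypothetical stand-ins; the actual modules are the subject of our next post):

```python
# Routing-agent sketch. In practice an LLM would label the query, and each
# workflow would be a chain of modules whose path can adapt to intermediate
# outputs; the names and classification logic here are hypothetical.
from typing import Callable

WORKFLOWS: dict[str, Callable[[str], str]] = {
    "single_doc_qa": lambda q: f"chunk-level RAG over one document: {q}",
    "multi_doc_qa": lambda q: f"MetaDocument two-stage retrieval: {q}",
    "audit_parsing": lambda q: f"scripted audit-file workflow: {q}",
}

def classify(query: str) -> str:
    """Stand-in for an LLM call that labels the query with a workflow type."""
    return "multi_doc_qa" if "across" in query.lower() else "single_doc_qa"

def route(query: str) -> str:
    """Dispatch the query to the workflow matching its type."""
    return WORKFLOWS[classify(query)](query)
```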
Looking Forward
In our next post, we'll dive into precisely what modules we'll be building as part of our advanced agentic workflow system.
¹ Arguably, "hallucination" is a poor choice of words: given its statistical backbone, all LLM output is, in a sense, hallucinated. Some hallucinations are simply more accurate than others, and only those falling below a certain accuracy threshold get labeled hallucinations.
Reference: https://arxiv.org/pdf/2312.10997