Less Is More: Enabling LLM-powered Chatbots to Generate Relevant Answers with Curated, Accurate Information

Of the over 100 million people who have tried out ChatGPT, most have been impressed by its ability to answer almost any question on any topic. Even the least technical among us know that ChatGPT and OpenAI have access to vast quantities of data. Hence, it is easy to assume that, to always provide the correct answer, it is best to be armed with as much information as possible.

That makes sense at first glance: if I don't know the question in advance, having as much information as I can muster should prepare me to answer just about anything. In short, the more information I have, the lower the chance that a question will stump me. But odd as it may sound, that's not necessarily so.

In fact, the opposite is often the case: less but more relevant and accurate information typically generates better results. In this post, we explore why this concept is important for enterprises to grasp if they are to leverage LLMs in their business applications.

Among the most initially attractive elements of large language models (LLMs) are their scale and general applicability. Indeed, GPT, the prefix OpenAI adopted for its series of LLMs, stands for "Generative Pre-trained Transformer." The sheer volume of data used to train the models, and the vast number of dimensions each term acquires from that training (capturing which words and phrases are likely to follow it), ensure that general LLMs, sometimes referred to as "foundation models," can, in theory, be applied to a wide range of enterprise use cases.

Where we're looking to provide something more specific, for example a customer- or employee-facing support bot, using a pure LLM in isolation will likely create troublesome issues that we'll need to resolve. Here, the general nature of the LLM is only partially helpful: its ability to structure a response linguistically remains valuable, but the substantive data on which the answer must rely is likely lacking in several ways.

Firstly, our enterprise chatbot use case will always have an associated and highly specific context that needs to be applied. Secondly, the human-bot interaction will occur within one of our owned web properties, and a conversation may be initiated from a specific product or service page within that property. Both of these elements give us part of the context for the question being asked of the bot: they capture what we know of the user, whether explicitly as a known entity or implicitly from collated visit data, and that knowledge shapes both the question and the correct response.
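As a minimal sketch of what such context might look like, a handling application could assemble something like the structure below. The class and field names here are illustrative assumptions, not any particular product's schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class InteractionContext:
    page_url: str                     # page the conversation was initiated from
    product: Optional[str] = None     # product or service inferred from that page
    user_id: Optional[str] = None     # set when the user is an explicit, known entity
    entitlements: list = field(default_factory=list)   # what this user may see
    recent_pages: list = field(default_factory=list)   # implicit signal from visit data

ctx = InteractionContext(
    page_url="https://example.com/products/widget-pro/support",
    product="widget-pro",
    user_id="u-1234",
    entitlements=["widget-pro-docs"],
    recent_pages=["/pricing", "/products/widget-pro"],
)
```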

We also need to control what information a response contains and who has access to that information. Ignoring these requirements by using LLMs in isolation to respond to interactions opens an enterprise to hallucinatory or otherwise inaccurate information, both of which are unpredictable because the training sources for the LLM are outside the enterprise's control (and often deliberately opaque).

An alternative way to make the best use of LLMs while managing the content, context, access, and veracity of the responses is a hybrid approach: a separate handling application adds user context and approved information to each query (prompt) to build an augmented prompt, which can then be sent to the LLM to generate a complete response. SearchUnify's FRAG methodology is a good example of this approach.
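Below is a minimal, generic sketch of that hybrid flow, building on the context object sketched earlier; it is not SearchUnify's FRAG implementation. retrieve_approved_snippets and call_llm are hypothetical stand-ins for an enterprise search index and an LLM provider's client:

```python
def retrieve_approved_snippets(question, ctx):
    """Return curated, approved passages relevant to the question and context.
    In practice this would query an enterprise search index filtered by the
    user's entitlements; a hardcoded list keeps the sketch self-contained."""
    return ["Widget Pro supports SSO via SAML 2.0 (see the admin guide)."]

def build_augmented_prompt(question, ctx, snippets):
    """Combine user context and approved information into one augmented prompt."""
    sources = "\n".join(f"- {s}" for s in snippets)
    return (
        "Answer the customer's question using ONLY the approved sources below.\n"
        f"The customer is currently viewing: {ctx.page_url}\n"
        f"Approved sources:\n{sources}\n\n"
        f"Question: {question}"
    )

def call_llm(prompt):
    """Hypothetical stand-in: wire up your LLM provider's SDK here."""
    raise NotImplementedError

def answer(question, ctx):
    snippets = retrieve_approved_snippets(question, ctx)
    return call_llm(build_augmented_prompt(question, ctx, snippets))
```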

This way of deploying the technology has several advantages. First, it puts the enterprise in control of which interactions get sent to the LLM. Regular, repetitive, and easily fulfilled queries can be filtered out and answered directly, without burning any tokens with an LLM provider. For example, users asking how to sign in can be redirected quickly to the login or account pages at essentially no cost. Many other recurring and easily resolved examples will be found within your existing search log files, and the sketch below illustrates this kind of triage.
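The patterns and URLs here are illustrative assumptions; in practice they would be mined from your own search logs:

```python
import re

# Pre-LLM triage: easily fulfilled queries get a canned answer instead of
# spending tokens with an LLM provider. Patterns and URLs are illustrative.
CANNED_ROUTES = [
    (re.compile(r"\b(log ?in|sign ?in|password)\b", re.I),
     "You can sign in or reset your password here: https://example.com/login"),
    (re.compile(r"\b(my account|billing|invoice)\b", re.I),
     "Your account and billing details are here: https://example.com/account"),
]

def triage(question):
    """Return a canned reply if one matches; None means 'send to the LLM path'."""
    for pattern, reply in CANNED_ROUTES:
        if pattern.search(question):
            return reply
    return None

print(triage("How do I reset my password?"))   # canned reply, zero tokens spent
print(triage("Does Widget Pro support SSO?"))  # None -> full LLM pipeline
```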

For more advanced queries, the handling application can assemble a set of approved information (e.g., vetted links and passages) deduced from the user's context within the property and any active login status; the LLM then turns that material into a complete, detailed response for the user. In all such cases, the enterprise remains in full control of the initial query (prompt), what is added to it before it is sent to an LLM (tuning), and any post-response actions that might be applied (filtering).

All returned information will be up to date, approved, and subject to any access controls or permissions that the application assigns to the user and their context. So, by leveraging LLMs' scope, scale, and rich linguistic knowledge alongside local knowledge, control, and context, your chatbot interactions will be richer and more natural, while your enterprise information standards are maintained without worry.
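Those two control points, entitlement-aware selection before the prompt is built and a check on what comes back, might be sketched as follows. The document fields and the approved-link check are assumptions for illustration:

```python
import re

def permitted(doc, ctx):
    """Entitlement check: only surface documents this user is allowed to see."""
    audience = doc.get("audience", "public")
    return audience == "public" or audience in ctx.entitlements

def postfilter(response, approved_links):
    """Post-response action: strip any URL the handler did not approve."""
    for url in re.findall(r"https?://\S+", response):
        if url not in approved_links:
            response = response.replace(url, "[link removed]")
    return response

docs = [
    {"title": "Public FAQ", "audience": "public"},
    {"title": "Internal runbook", "audience": "staff-only"},
]
visible = [d for d in docs if permitted(d, ctx)]  # ctx from the earlier sketch
```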

When something is as hyped as Generative AI and LLMs, it's easy to be initially impressed and then quickly dismissive. But the fact is that LLMs are here to stay; what works impressively for answering random questions about random topics won't necessarily work so well for responding to difficult questions about highly specialized topics. Enterprises around the world are testing and experimenting with LLMs, and lessons are being learned about how to harness their potential in the business world. The first major lesson is that less is often more: to generate value from LLMs, their use needs to be adapted and tailored for the enterprise. Your organization has things that generic LLMs don't, such as an understanding of context and access to highly curated, relevant, and accurate specialized information sources. Used wisely, these are the key ingredients to fast-track your business to LLM success.