Building a Customer Support Chat Bot

I hate chat bots. So naturally, in a fit of self-defiance, I decided to build one. This consisted of fine-tuning an OpenAI model and then developing a simple UI wrapper around the resulting custom chat endpoint.

Preparing the Training Dataset

I’ve been running a small business for about ten years, so I started by collecting all of the customer support emails that we’ve received to date. I built a simple spreadsheet with two columns: prompt and completion. The customer’s question is in the prompt column (“Are you open on Sundays?”). The response is in the completion column (“I’m sorry, we are not open on Sundays. We are open Monday through Friday.”).

In the case of extended back and forth conversations, just the very last response is in the completion column. The rest of the back and forth conversation all goes into the prompt column. This was one of the more pleasant surprises I learned while preparing the training data for the fine-tuning. You may have a situation that starts off one way but then evolves into an escalated exchange — it may seem important to capture this dynamic, and you can. You feed that entire arc of the conversation in at the “prompt” and then train the model to understand the appropriate final response as the “completion.”

So I went through all of our emails and prepped this dataset. This was the most time-consuming aspect of the entire process.

Fine-Tuning a Custom Model

Once the dataset was ready, I simply saved it as a csv file, then ran two scripts in the command line. That’s it. Just two lines.

This first line validates and reformats the data as a JSONL file ready for fine-tuning:

openai tools fine_tunes.prepare_data -f <LOCAL_FILE>

The actual fine-tuning is then initiated with this line, with the “-m davinci” flag indicating the base model (in this case, “davinci”) used for fine-tuning:

openai api fine_tunes.create -t <TRAIN_FILE> -m davinci

Note that as of July 2023, the fine-tuning is not yet available on GPT3.5 or GPT4 models. So we’re working with the davinci model, which is far less dynamic and convincing when compared with the more recent models.

In my case, with only a few hundred customer support emails, the actual fine-tuning process took less than ten minutes and cost about $3. The final output is simply a custom model name that you will now put in the model field when you make a request to the createCompletion endpoint:

const response = await openai.createCompletion({
   *** note the model value ***
  model: "davinci:ft-personal-custom-file-name",
  prompt: promptText,
  ...
  });

First Results and Retraining the Model

I was surprised by how much the fine-tuned model hallucinated even on basic facts that had been included in the training data several times. The model hallucinated operating hours, where the business is located, and even made up an elaborate business origin story that has no basis in facts from the training set.

Not wanting to blame this all on the fine-tuning process, I decided to perform another fine-tuning with an improved data set. I added some clear language to more often say “I do not know the answer to your question.” This did not have a major effect. I then created additional “synthetic data” by doubling the entire training set and slightly rewording the customer emails (“Where are you located?” → “What is the address?”). This gave me several hundred additional examples to train on. Still no major improvements in outcomes.

I then experimented with the number of epochs to train the fine-tuning model.

And I specified in the prompt for the model to simply say “I am not sure” if the model was at all uncertain of the answer. Still, despite all of this effort, the results contained too many hallucinations to be useful. The only way I could get consistent, reliable results that I would feel comfortable putting in production was by constraining the user input on the front end to toggling pre-written questions. So this becomes less of a chat interface and more of an interactive FAQ.

As I developed this pitiful user interface to mitigate the poor performance of the fine-tuned model, it’s lack of genuine choice and limited granularity increasingly reminded me of another terrible user experience that has become ubiquitous on modern websites: the cookie consent banner.

A generous interpretation would be that chat bots and cookie consent banners start out with good intentions but then end up becoming frustrating user experiences that are mostly ineffective. While chat bots aim to provide automated assistance and convenience, they often fall short in delivering accurate and nuanced responses. Similarly, cookie consent banners aim to protect user privacy, but their implementation often leads to consent fatigue and fails to provide users with meaningful control over their data. In both cases, the initial intentions of improving user experiences and privacy are overshadowed by the subsequent shortcomings and total disregard for users in the development of these technologies.

One Good Takeaway

The vast amount of the work required to fine-tune a customer support chat bot is in the data preparation itself. This is a tedious process, but it was during this process that I came to the one good takeaway of this whole project: The vast majority of user questions were the result of poor website design.

Instead of building a system to more efficiently answer these customer questions, I realized that our customers would benefit more from a website that made it so that they didn’t even need to ask these questions in the first place.

If I‘m so annoyed that people are asking the same questions over and over that I build a chat bot to more efficiently provide a response, I’m missing the point entirely — nobody should need to write in and ask for our location or if we are open on Sunday. Most of the questions we were getting could have been avoided entirely with basic UX/UI improvements.

I don’t regret the time spent combing through all of our customer support emails and organizing them into a cohesive dataset. It just became clear at the end of the day that building a chat bot was the wrong next step.

We want websites that are intuitive and easy to navigate. Whether this is the IRS or a small business website, using customer support emails sent in due to poor website design to then deploy a fine-tuned customer support bot on top of that same website is a Kafka-esque nightmare.

So what do I think when I see a chat bot pop up in the bottom corner of a company website? I think that I must be one of hundreds of customers who have struggled with basic questions about how to process a return, change an appointment, close an account, or track a package. Rather than improving the website by proactively addressing these common concerns, the mid-level managers have scooped up the customer support emails born out of poor website design and used these to fuel the deployment of a flawed chat bot on the very same website. I minimize the chat bot modal and close out the cookie consent banner.

(I decided not to deploy the chat bot.)