Introduction
In this article, we will go over the pros and cons of building proprietary LLMs. One motivation for this discussion is the recent events at OpenAI. But beyond that single episode, there have been several recent developments in the applied AI research community that strengthen arguments on both sides of this debate.
"This framework is meant to help teams decide if building an LLM is a good idea for them."
What does “building an LLM” even mean?
Before we get into the specifics, let’s clarify what “build your own LLM” actually means. Many consider fine-tuning an existing open- or closed-source LLM equivalent to building their own. While fine-tuning might sometimes be enough, there are several other steps that organizations may need to take depending on their use case. These include pre-training, reinforcement learning (either synthetically via an adversarial approach or with a human in the loop), ensembles, retrieval-augmented generation (RAG, everyone’s current favorite approach) and more. For this article, we consider all permutations and combinations of these steps to be “building your own LLM.”
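Of the steps above, RAG is the easiest to illustrate: retrieve the documents most relevant to a query, then prepend them to the prompt so the model answers from them. Here is a minimal sketch of that loop; the bag-of-words similarity is a toy stand-in for a real embedding model, and all names are illustrative:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words count vector. A real pipeline
    # would use a dense embedding model here instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=1):
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs):
    # Prepend the retrieved context so the model answers from it,
    # rather than from whatever it memorized during training.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The annual report covers fiscal year 2022 revenue figures.",
]
print(build_prompt("What is the refund policy?", docs))
```

The same shape holds at production scale; only the embedding model, the vector store and the ranking get more sophisticated.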
Pros
- Model sovereignty: Building your own model means that you own it. Ideally, that’s not just written in a contract somewhere; it means exclusively owning all of the weights of your model. There are many reasons why this matters for some organizations, but most critically, it makes the model your intellectual property. It can also form the basis of a moat: because no one else has your data and training process, no one else can recreate the same results. As an added benefit, it eliminates vendor lock-in: you can run your model anywhere, which is particularly useful if one of your vendors is going through corporate upheaval.
- Reliability and resource planning: Because custom-built models are more flexible, you get complete control over the kind of infrastructure you require - this means fewer incidents, less downtime and easier resource planning. It also means lower latency, control over regional deployments and the ability to decide what kind of hardware the model runs on. Enterprise engineering teams make these decisions for all of their other production systems today; why should it be any different for LLMs?
- Higher accuracy: Building a model with a handful of specific use cases in mind and using directly relevant data is a sure way to get good results. Existing models like GPT-4 and Claude sound human because they’re trained on reams of data pulled from the internet, from places like Reddit and Twitter. Unfortunately, that data also contains factual inaccuracies. Training an LLM on factually accurate, albeit specific, data is a good way to mitigate the hallucination problem. While GPT-4 might do well in a demo, making it perform consistently across millions or billions of inferences requires a lot of work.
- Lower operating costs at scale: Although building a model costs a good bit of money upfront, if you’re running a lot of inference, it is cheaper in the long run. This is because the model is relatively small, and because you can negotiate a reserved-instance rate for the computing resources with your cloud provider. Also consider that getting GPT-4 to work for a complicated use case usually requires few-shot prompting with many examples, which uses up far more tokens than a model that’s already trained on the problem. Some of the latest applied AI research - vLLM, Hugging Face’s TGI, FlashAttention, S-LoRA, etc. - revolves around making inference faster and easier to run on cheaper hardware. Training is also an area of interest - we’ve seen LoRA, PEFT, bitsandbytes and more find success in the fine-tuning space, and we think it’s only a matter of time before the other parts of the building process see similar leaps.
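The break-even logic behind this point is straightforward to sketch. Every constant below is an illustrative assumption, not a quoted price; plug in your own numbers:

```python
# Back-of-the-envelope break-even between a per-token API and a
# self-hosted custom model. All figures are hypothetical.
from typing import Optional

API_COST_PER_1K_TOKENS = 0.03   # hypothetical blended API rate, USD
TOKENS_PER_REQUEST = 2_000      # few-shot prompts inflate this number
SELF_HOSTED_MONTHLY = 15_000.0  # hypothetical reserved-GPU bill, USD
BUILD_COST = 100_000.0          # hypothetical one-off training cost, USD

def api_monthly(requests: int) -> float:
    # Monthly API spend at a given request volume.
    return requests * TOKENS_PER_REQUEST / 1_000 * API_COST_PER_1K_TOKENS

def months_to_break_even(requests_per_month: int) -> Optional[float]:
    saving = api_monthly(requests_per_month) - SELF_HOSTED_MONTHLY
    # Below the crossover volume the API stays cheaper and the
    # build cost never pays itself back.
    return BUILD_COST / saving if saving > 0 else None

for r in (100_000, 1_000_000, 5_000_000):
    print(r, months_to_break_even(r))
```

With these (made-up) numbers, 100k requests a month never pays off, while a million requests a month recoups the build cost in a couple of months - which is exactly why this pro only applies at scale.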
- Privacy: A “private LLM” usually means one that runs in a VPC, in the customer’s cloud or on a dedicated instance hosted by the infrastructure provider - and that’s usually sufficient. But some companies aren’t cloud native yet and need physical air-gapping. The beauty of a purpose-built LLM is that it can even run on-premise - yes, that means a server rack - if it needs to.
- More explainability: While LLMs are notoriously black boxes, building your own gives you far more room to make them explainable. There are a number of approaches to consider, as discussed in this paper: step-by-step, recall-then-generate, recall-reason-generate, direct answer, etc. The ultimate takeaway is that by knowing precisely what training data went into the model, carefully crafting the training process, having a good RAG pipeline and focusing the model on familiar, well-documented tasks, LLMs can become less of a black box.
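To make one of these approaches concrete, here is a sketch of a recall-then-generate pipeline: first ask the model to quote the evidence it will rely on, then answer using only that evidence. The prompt wording is ours, and `call_llm` is a placeholder for whatever inference API or model you actually use:

```python
def call_llm(prompt: str) -> str:
    # Placeholder: wire up your model or API client here.
    raise NotImplementedError

def recall_then_generate(question, documents, llm=call_llm):
    # Step 1 (recall): have the model quote the relevant passages.
    recall_prompt = (
        f"Documents:\n{documents}\n\n"
        f"Quote the passages relevant to: {question}"
    )
    evidence = llm(recall_prompt)
    # Step 2 (generate): answer constrained to the quoted evidence.
    answer_prompt = (
        f"Using ONLY this evidence:\n{evidence}\n\n"
        f"Answer: {question}"
    )
    answer = llm(answer_prompt)
    # Returning the evidence alongside the answer is what buys
    # explainability: a reviewer can check the quoted passages.
    return {"evidence": evidence, "answer": answer}
```

The design choice here is that the intermediate evidence is surfaced to the caller instead of being discarded, so every answer ships with the passages that justify it.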
Cons
- High initial time and financial cost: Building a model starts at tens of thousands of dollars, takes weeks (or even months) and usually requires a fair bit of expertise. Obviously, this is not a good approach for a pre-product-market-fit or MVP product, and it’s usually also unnecessary for low-value use cases. Starting with OpenAI or an equivalent, and then potentially moving to a custom model over time, is a more reasonable approach.
- Data requirements: Working around data silos is probably one of the hardest challenges in enterprise technology today - and building a custom model requires millions, even billions, of tokens of data that is likely spread across said silos. Furthermore, cleaning and fact-checking these datasets is an intense exercise that also requires great care, experience and resources.
- Ongoing maintenance: An LLM needs upkeep like any other piece of software, and is sometimes even more finicky. The model needs to continuously improve to keep showing value, and (just like all other software) it will make mistakes that need to be fixed. That means your first training run isn’t your last, and even the simplest of hallucinations or mistakes can involve a fair bit of work to get the model back in shape.
- GPU availability: This is a temporary problem and, we think, will improve steadily in the next 2-3 years. However, for now, getting access to enough GPUs to train and run inference is no small feat. There are lots of serverless GPU companies (Replicate, Banana.dev, Modal, to name a few) and they can be a great option for those wishing to get started.
- Less generalizability: Because these models are trained on such specific datasets and tasks, they are unlikely to handle anything outside the purview of their training well. Adding a new task to the model’s repertoire is therefore rarely trivial; in the best case it could take a few days, and in the worst case several weeks or even months. Additionally, if the use case is high value, the model needs to be extensively red-teamed on every task before it’s allowed to touch a production environment. This is still a relatively manual process and can take a while.
Takeaways
In summary, building your own LLM is not for the faint of heart. It is also a collection of steps, not just fine-tuning. But the very complexity that makes it hard also makes it an incredibly powerful alternative to an off-the-shelf model, and for some use cases that’s exactly what’s needed. We find that privacy, long-term cost and additional explainability are usually the driving forces that make enterprise buyers interested.
Want to build your own LLM? Get in touch with us by booking a call using the link above!