

The Ultimate Pricing Cheat-Sheet for Large Language Models

Last update: 05 March 2024

Latest update: pricing changes from Claude, OpenAI and Google are now reflected in the data


How much does it cost to integrate with a large language model? This article shows the rough relative costs of using different chat / completion models, along with the costs of embeddings and of using a fine-tuned model. I will keep this page updated, so bookmark it and use it as a reference guide whenever you're working on your next cool idea. Happy building!


Additionally, if you have information on other models, or additional pricing inputs to share, do write in the comments.


A quick intro before moving ahead with the pricing comparison

Interest in and usage of large language models have exploded. Most of you have already tried ChatGPT / Bard, and the next value unlock will be to customize the powerful capabilities of an LLM for your specific use case and application. The two key questions for you to answer are:


1. Which model should I choose?
2. How much will it cost me / my organization?

While I cannot answer question 1 in this article (the short answer is: it depends), we can certainly look into question 2 to understand how much it costs, so we can make a better business case for developing a custom app. If you're looking for help with assessing which model to use, with solution design, or with development of your LLM-based use case, fill out this form and let's talk!


Tokens vs Characters vs Everything else


The big challenge in comparing pricing across providers is that they use different pricing units - OpenAI uses tokens, Google uses characters, Cohere uses a mix of "generations" / "classifications" / "summarization units" (someone's been busy there!), and I'm pretty sure we'll see more pricing levers introduced as time goes on. What I've tried to do instead is normalize all pricing from all providers to a single unit: price per 1,000 tokens. So I've converted Google's character-based pricing and Cohere's generation / classification units into pricing per 1,000 tokens. Note that this is by no means perfect, as there are likely other costs involved, and 1,000 tokens is not the standard pricing unit for all vendors. Nevertheless, it's a good start for estimating costs.
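To make the conversion concrete, here's a minimal sketch of that normalization, assuming roughly 4 characters per token for English text. The prices in it are placeholders, not real vendor rates:

```python
# A minimal sketch of normalizing vendor pricing to a common unit:
# price per 1,000 tokens. Assumes ~4 characters per token in English;
# the prices used here are illustrative placeholders, not vendor rates.

CHARS_PER_TOKEN = 4  # rough average for English text

def per_1k_tokens_from_chars(price_per_1k_chars: float) -> float:
    """Convert character-based pricing (e.g. Google's) to price per 1,000 tokens."""
    return price_per_1k_chars * CHARS_PER_TOKEN

def per_1k_tokens_from_units(price_per_unit: float, tokens_per_unit: int) -> float:
    """Convert unit-based pricing (e.g. Cohere's 'generations') to price per 1,000 tokens."""
    return price_per_unit / tokens_per_unit * 1000

print(per_1k_tokens_from_chars(0.0005))        # -> 0.002 ($ per 1,000 tokens)
print(per_1k_tokens_from_units(0.0025, 1000))  # -> 0.0025
```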


Some definitions:

1. Session - I created the concept of a "session" to make it easier to visualize costs for an application we're building. The definition of 1 session is: an exchange between a human and your bot consisting of 100 words x 5 characters each = 500 characters = 125 tokens (in English). Assume 25% of the tokens come from the human's prompts and 75% from your bot's generated outputs.


2. Page - I like to think about embeddings in terms of pages rather than tokens. In this sheet, I've assumed 1 page = 500 words x 5 characters = 2,500 characters = 625 tokens per page. (The arithmetic behind both definitions is sketched below.)
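Here's that arithmetic as a short sketch; the 4-characters-per-token ratio is an approximation for English text:

```python
# The token arithmetic behind the "session" and "page" definitions,
# assuming ~4 characters per token (a reasonable approximation for English).

CHARS_PER_TOKEN = 4

def words_to_tokens(words: int, chars_per_word: int = 5) -> float:
    return words * chars_per_word / CHARS_PER_TOKEN

session_tokens = words_to_tokens(100)  # 100 words -> 500 chars -> 125 tokens
page_tokens = words_to_tokens(500)     # 500 words -> 2,500 chars -> 625 tokens
print(session_tokens, page_tokens)     # 125.0 625.0
```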


Pricing

So let's get into it!


A. Usage of General Prompt / Completion or Chat models

Let's first look at costs for all completion and chat models, the ones we'd use most often: "ChatGPT for my App", chatbots, knowledge retrieval bots (add the cost of embeddings to the latter).

1. Costs for models with separate prompt and completion prices are calculated as 25% x prompt cost + 75% x completion cost (see the sketch after these notes)

2. Assuming 1 session = 100 words x 5 characters = 500 characters = 125 tokens
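Putting those two notes together, here's a sketch of the per-session cost calculation. The example prices are placeholders, not any vendor's actual rates; substitute the current prices for whichever model you're evaluating:

```python
# Estimating per-session cost for a model with separate prompt and
# completion prices, using the 25% / 75% split described above.
# The example rates are placeholders, not real vendor prices.

SESSION_TOKENS = 125                          # from the session definition
PROMPT_SHARE, COMPLETION_SHARE = 0.25, 0.75

def session_cost(prompt_price_per_1k: float, completion_price_per_1k: float) -> float:
    blended = PROMPT_SHARE * prompt_price_per_1k + COMPLETION_SHARE * completion_price_per_1k
    return SESSION_TOKENS / 1000 * blended

# Example: $0.03 / 1k prompt tokens and $0.06 / 1k completion tokens
print(f"${session_cost(0.03, 0.06):.4f} per session")  # ~$0.0066
```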


B. Embedding Costs

What does it cost for you to embed your knowledge base using one of the embedding models available? Let's look at 3 scenarios - a knowledge base of 10,000 pages, 100,000 pages and 1 million pages.

1. Assuming 1 page = 500 words x 5 characters = 2,500 characters = 625 tokens per page

2. Azure OpenAI may have additional infrastructure costs for embeddings

3. Costs for hosting / licensing of vector databases are not included
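As a sketch of the arithmetic for those three scenarios, here's the calculation assuming an illustrative rate of $0.0001 per 1,000 tokens (roughly OpenAI's ada-002 price at the time of writing):

```python
# One-time embedding cost for a knowledge base, using the page definition
# above. The $0.0001 / 1k tokens rate is illustrative; infrastructure and
# vector database costs are excluded, per the notes above.

TOKENS_PER_PAGE = 625

def embedding_cost(pages: int, price_per_1k_tokens: float = 0.0001) -> float:
    return pages * TOKENS_PER_PAGE / 1000 * price_per_1k_tokens

for pages in (10_000, 100_000, 1_000_000):
    print(f"{pages:>9,} pages -> ${embedding_cost(pages):,.2f}")
# 10,000 -> $0.62 | 100,000 -> $6.25 | 1,000,000 -> $62.50
```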


C. Cost of Fine-tuning a model + usage:

How much would it cost to fine-tune a model and then use the fine-tuned model within our app? Surprisingly little when using OpenAI's models. We can't put a price on Google's and Azure OpenAI's fine-tuning, because both are billed on compute usage, which is variable. Cohere's custom models require talking to their sales team, which makes a clear comparison impossible.
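For the token-priced case (OpenAI-style), the total splits into a one-time training charge plus ongoing usage. Here's a sketch with illustrative rates; the 4-epoch count and ~200-token examples are my assumptions for the example, not anything a vendor publishes:

```python
# A sketch of total fine-tuning cost: a one-time training charge plus
# monthly usage, in the shape of OpenAI's token-based fine-tuning pricing.
# Rates, epoch count and example length are illustrative assumptions.

def training_cost(examples: int, tokens_per_example: int,
                  epochs: int, train_price_per_1k: float) -> float:
    return examples * tokens_per_example * epochs / 1000 * train_price_per_1k

def monthly_usage_cost(sessions: int, tokens_per_session: int,
                       usage_price_per_1k: float) -> float:
    return sessions * tokens_per_session / 1000 * usage_price_per_1k

# Example: 5,000 prompt-completion pairs of ~200 tokens each, 4 epochs
print(training_cost(5_000, 200, 4, 0.03))      # $120.00 one-time
print(monthly_usage_cost(100_000, 125, 0.12))  # $1,500.00 / month
```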


Pricing scenario: Comparing [Embeddings + Prompt models] vs [Fine-tuning + completions]

Now here's an interesting comparison for you - there's currently a raging debate in the AI community about the approach towards building domain specific bots:


1. Should we run a semantic search over our entire knowledge base and feed the results into prompts for a standard model? This requires us to use embeddings and general LLM models.


OR


2. Should we create a fine-tuned model trained specifically on our data, and then use that for our application? This requires us to create a set of prompt-completion pairs and then use the resulting model exclusively.


Now I know I'm simplifying the comparison here, but if we were to compare the pricing required for these 2 scenarios, here's what it looks like:


1 - An LLM bot with embeddings over ~100,000 pages of information and 100,000 sessions / month of usage on GPT-4-32k (an extremely capable model, and the best available right now) would cost ~$1,400 / month
2 - An LLM bot fine-tuned on ~5,000 prompt-completion pairs, with 100,000 sessions / month of usage on the text-davinci model, would cost about $1,500 / month (the back-of-envelope math is sketched below)
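If you want to sanity-check those figures yourself, here's the math, using the session and page definitions from earlier and the per-1,000-token rates published at the time of writing (GPT-4-32k at $0.06 / $0.12 per 1k prompt / completion tokens; fine-tuned davinci usage at $0.12 per 1k). Rates change often, so treat this as a template rather than a quote:

```python
# Back-of-envelope math behind the two scenarios, using the session and
# page definitions above and per-1k-token rates at the time of writing.

SESSION_TOKENS, TOKENS_PER_PAGE = 125, 625
SESSIONS_PER_MONTH = 100_000

# Scenario 1: one-time embeddings + GPT-4-32k usage (25% / 75% blend)
embed_once = 100_000 * TOKENS_PER_PAGE / 1000 * 0.0001
gpt4_blended = 0.25 * 0.06 + 0.75 * 0.12
scenario_1 = SESSIONS_PER_MONTH * SESSION_TOKENS / 1000 * gpt4_blended

# Scenario 2: fine-tuned davinci usage at $0.12 / 1k tokens
scenario_2 = SESSIONS_PER_MONTH * SESSION_TOKENS / 1000 * 0.12

print(f"Scenario 1: ~${scenario_1:,.0f}/month + ${embed_once:.2f} one-time embedding")
print(f"Scenario 2: ~${scenario_2:,.0f}/month")
# Scenario 1: ~$1,312/month | Scenario 2: ~$1,500/month
```

Scenario 1 lands in the same ballpark as the ~$1,400 figure above; the point is the order of magnitude, not the exact number.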

Which one is better for your use case? The answer, as always, is "it depends" - but the fact that you could have your very own custom-trained OpenAI model for nearly the same price as GPT-4's generic 8k model is compelling to explore, especially for highly specific enterprise use cases.


Did you enjoy this article? Have I missed something in the pricing calculations? Drop a note below and get the discussion started!

