
When Small is Mighty: Time to Ditch LLMs for Smaller AI Models

Last Updated on June 07, 2024 10 min read


We're undoubtedly in the era of AI, and large language models (LLMs) like ChatGPT, Gemini, and Claude 3 Opus are all the rage. Machine learning and AI have been around for a while, of course, but these models have taken the field by storm. They're incredibly versatile, powering everything from chatbots to content creation. So don't get me wrong: they're incredible, no doubt about it. However, I've noticed a growing trend: people now rely on LLMs for tasks where smaller, fine-tuned models would actually do a better job.

In this article, I'll share my thoughts on why smaller, task-specific models can sometimes outperform these giant LLMs. We'll explore their strengths, limitations, and when it's best to use each. By the end, you'll have a better understanding of how to choose the right model for your next project.


The Versatility of Large Language Models

First, let's talk about why LLMs have become so popular.

Strengths of LLMs

As I mentioned earlier, LLMs are super versatile. For natural language processing (NLP) tasks, they're all-purpose, and can handle a wide range of tasks without needing specific training for each one. This versatility comes from being trained on massive datasets that cover a broad spectrum of human language. For example:

  • Text Generation: Writing essays, stories, or even code. ChatGPT, for instance, can generate human-like text based on prompts, making it useful for content creation and coding assistance.
  • Translation: Converting text from one language to another. They can translate between multiple languages, making them valuable tools for global communication.
  • Summarization: Breaking down long articles into concise summaries. LLMs can read lengthy documents and provide succinct summaries, which is helpful in research and information retrieval.
  • Vision and Audio Tasks: Although primarily designed for text, LLMs are being adapted to handle tasks like image captioning (e.g., describing the content of a photo) and audio transcription (e.g., converting speech to text).

And more - my guess is that you use them for a lot more than these. These capabilities are impressive, super impressive. However, versatility often comes at the expense of specialization. While LLMs are good at many things, they aren't always the best at everything.

Limitations of LLMs

Here’s where the cracks start to show:

  • Performance: LLMs might be overkill for simple tasks. They can also struggle with tasks requiring high precision or domain-specific knowledge. For example, while an LLM might generate a decent summary, a model specifically trained for summarization in the legal domain will likely produce more accurate and relevant results.
  • Speed: Because they're so big, they can be slow, which is a problem in real-time applications. For instance, if you need to translate text on the fly during a live conversation, you don't want to wait for an LLM to churn through the data.
  • Resource Intensive: Running and fine-tuning these models requires substantial computational power and infrastructure. For example, GPT-4 was reportedly trained over about 3 months (3 freaking months!) using thousands of GPUs and enormous amounts of energy. Sam Altman has said that training it cost more than $100 million. That's a lot of power and money, and not everyone has access to it (I most certainly don't).
  • Deployment: Deploying LLMs is a significant challenge. Even though there are open-source LLMs like Llama 3 and Gemma available on the Hugging Face Hub, running them in production requires specialized systems loaded with GPUs or TPUs and enough memory to handle inference. That's why most of the time we just opt for API services from companies like OpenAI or Google, who have the resources to deploy these large models. In contrast, smaller models can run on a laptop or even on a 2GB RAM DigitalOcean droplet, making them much easier to deploy.

The Challenges of Fine-Tuning LLMs

Let's talk a bit about fine-tuning. Fine-tuning is the process of further training a pre-trained model on a specific dataset to adapt it to a particular task, and it can significantly improve the model's performance on that task. However, fine-tuning LLMs comes with its own set of challenges.

To fine-tune an LLM, you need serious computational resources. We're talking about powerful GPUs or TPUs, ample storage, and a robust infrastructure to handle the training process. This setup is expensive and often out of reach for many organizations. There are approaches that reduce the cost, such as parameter-efficient fine-tuning (for example, LoRA) or distilling the model into a smaller one, but they still require significant resources. Also, it's not just about having the right hardware. Fine-tuning involves a solid understanding of the model architecture, careful preprocessing of data, and extensive training time. The costs add up quickly, both in money and in the expertise required. In the end, the practicality of fine-tuning an LLM is questionable for most organizations.

Now, let's switch gears and talk about smaller, task-specific models.

What Are Task-Specific Models?

Task-specific models are designed and trained to excel at one particular task. Just as you'd see a cardiologist for heart issues and a neurologist for brain issues, you pick a task-specific model that is tailored to the problem at hand. For example:

  • Sentiment Analysis: A model trained specifically to determine the sentiment of a piece of text. For instance, analyzing customer reviews to detect positive or negative sentiment, which is crucial for businesses. A more specific example would be a sentiment analysis model tailored to detect the unique language and slang used by teenagers on social media. It's a unique niche that a general sentiment analysis model might struggle with.
  • Spam Detection: A model focused solely on identifying spam emails. A model fine-tuned for financial sector emails will perform better than a general spam detector.
  • Named Entity Recognition (NER): A model trained to identify names, dates, and other entities in text. NER models are vital for extracting structured information from unstructured data. For example, a NER model can be fine-tuned to extract medical terms from text.
  • Image-to-Text: CNN architectures like DenseNet or ResNet are commonly used as the image encoder in captioning pipelines that turn images into text descriptions, and they perform really well in that role. There are also small purpose-built models like PaddleOCR, from the PaddlePaddle ecosystem, designed for Optical Character Recognition (OCR). These models are super small (sometimes < 80MB) but still provide high accuracy in OCR tasks.
  • Object Detection: Models like YOLO (You Only Look Once) or SSD (Single Shot MultiBox Detector) are designed to detect objects within images, and they're super fast and efficient.
  • Automatic Speech Recognition (ASR): Models like DeepSpeech, Whisper, or wav2vec are trained to convert spoken language into written text.
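
To give a feel for how small a task-specific model can be, here is a minimal sketch of a spam detector: a multinomial naive Bayes classifier written in plain Python. The class name `TinyNaiveBayes` and the training emails are made up for illustration, and a real project would use far more data and a proper library, but the idea is the same.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial naive Bayes over lowercase whitespace tokens."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus log likelihood of each word, with add-one smoothing.
            score = math.log(n_docs / total_docs)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up training examples, purely for illustration.
train_texts = [
    "win a free prize now",
    "claim your free money today",
    "cheap loans click now",
    "meeting moved to friday",
    "please review the attached report",
    "lunch with the team tomorrow",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = TinyNaiveBayes().fit(train_texts, train_labels)
print(model.predict("free prize money"))
print(model.predict("review the report before the meeting"))
```

The whole "model" is a handful of word counts, which is why this kind of classifier runs comfortably on a CPU with almost no memory.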

Advantages of Task-Specific Models

  • Performance: Because these models are trained for a specific task, they often perform better than LLMs in their specialized domain. For example, a sentiment analysis model trained to detect sentiment in social media posts about mental health will be more accurate than a general LLM.
  • Size: They are much smaller than LLMs, making them easier to deploy and run on modest hardware. We're talking about models that are sometimes less than 100MB, a huge difference compared to LLMs that can take up tens or hundreds of GBs (Llama 3 70B is roughly 40GB even when quantized to 4 bits).
  • Speed: Smaller models are faster and more efficient, which is crucial for real-time applications like chatbots or recommendation systems.
  • Resource Efficiency: They require less computational power, making them accessible to more organizations without needing massive infrastructure. Heck, some don't even need GPUs to run, just a decent CPU and a few GBs of RAM.
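
The size gap comes down to simple arithmetic: the weights take roughly the number of parameters times the bytes used per parameter. Here is a rough back-of-the-envelope sketch. The function name `weights_size_gb` is my own, and the numbers cover the weights only, ignoring activations, caches, and file-format overhead.

```python
def weights_size_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the model weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# 110M parameters is roughly BERT-base territory; 70B is a large open LLM.
for name, params in [("110M-parameter model", 110e6), ("70B-parameter model", 70e9)]:
    for bits in (16, 4):
        print(f"{name} at {bits}-bit: {weights_size_gb(params, bits):.2f} GB")
```

Even aggressive quantization leaves the large model orders of magnitude bigger than the small one, which is what decides whether it fits on a laptop or a small cloud instance.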

Case Study: Embedding Models

To illustrate this point, let's take a closer look at embedding models. These models are designed for feature extraction, turning text into numerical representations (embeddings) that capture the meaning and context.

Using an LLM for embeddings might seem like a good idea since it understands a lot of language nuances. However, there are downsides. They carry a lot of unnecessary baggage for simple embedding tasks, can be slow, and tuning them to generate high-quality embeddings can be challenging and resource-intensive.

In contrast, there are models that are well suited to generating embeddings, like BERT or FastText. These models are tailored for the job: they generate embeddings quickly, without the overhead of a full LLM; they often produce better-quality embeddings that capture the meaning of words, sentences, or paragraphs more effectively; and they are easier to fine-tune and deploy, even on modest hardware.
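
Whatever model produces them, embeddings are usually compared with cosine similarity. The sketch below shows that comparison step. Note that `toy_embed` is only a stand-in I made up (hashed bag-of-words counts, not a learned model) so the example runs without downloading anything; in practice you would swap in vectors from a real embedding model.

```python
import hashlib
import math


def toy_embed(text: str, dim: int = 64) -> list:
    """Toy stand-in for an embedding model: hashed bag-of-words counts.

    A real embedding model would return learned dense vectors instead.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Rank a few made-up documents against a query by similarity.
query = "reset my account password"
docs = [
    "how to reset a forgotten password",
    "shipping times for international orders",
    "change the email on my account",
]
query_vec = toy_embed(query)
ranked = sorted(docs, key=lambda d: cosine_similarity(query_vec, toy_embed(d)), reverse=True)
for doc in ranked:
    print(f"{cosine_similarity(query_vec, toy_embed(doc)):.3f}  {doc}")
```

The ranking logic stays exactly the same when you replace `toy_embed` with a real model; only the quality of the vectors changes.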

Hybrid Approaches

In some cases, a hybrid approach might actually be the best solution. One option is to use an LLM for initial processing and then pass the output to a smaller, fine-tuned model for specific tasks. This way, you get the best of both worlds: the versatility of an LLM and the efficiency of a task-specific model.

Another method worth mentioning is distillation. Distillation is a machine learning technique for transferring what a large model has learned to a smaller one: the smaller "student" model is trained to mimic the behavior of the larger "teacher" model. This can cut the cost of deploying large models while retaining most of their performance. On the Hugging Face Hub, you can find distilled versions of popular models, like DistilBERT and TinyBERT (both distilled from BERT), which offer a balance between performance and efficiency.
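
To make "mimic the behavior" a bit more concrete, here is a minimal sketch of the classic soft-target distillation loss: the KL divergence between the teacher's and the student's temperature-softened output distributions, scaled by the temperature squared. The logits are made-up numbers, and real training setups usually combine this term with the ordinary loss on the true labels.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (higher = softer)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return kl * temperature ** 2


# Made-up logits for a 3-class problem.
teacher = [4.0, 1.0, 0.5]
student_close = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
student_far = [0.5, 1.0, 4.0]    # prefers a different class

close_loss = distillation_loss(teacher, student_close)
far_loss = distillation_loss(teacher, student_far)
print(f"close student: {close_loss:.4f}")
print(f"far student:   {far_loss:.4f}")
```

The student that agrees with the teacher gets the lower loss, so minimizing this quantity pushes the small model toward the large model's full output distribution rather than just its top answer.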

One other strategy worth mentioning is using LLMs to generate training data for smaller models. Sourcing high-quality training data can be difficult and time-consuming. However, LLMs can generate good enough data to augment your training datasets. For example, if you're training a sentiment analysis model, you can prompt an LLM to generate various sentences with different sentiments, providing a diverse dataset to fine-tune your smaller, task-specific model. Of course, you'll need to validate the generated data carefully to ensure its quality and relevance. There's also a risk of introducing biases if the LLM itself is biased.

Imagine you need a dataset to train a spam detection model for a niche field, such as medical emails. You can prompt an LLM to generate examples of both spam and non-spam emails in this context, providing a valuable dataset to fine-tune your smaller, task-specific model.
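
Here is a rough sketch of what the prompting and validation steps could look like for that medical spam example. Everything in it is an assumption on my part: the prompt wording, the JSON-lines output format, the `build_prompt` and `clean_examples` helpers, and the hard-coded `fake_llm_output` that stands in for a real LLM call.

```python
import json

ALLOWED_LABELS = {"spam", "ham"}


def build_prompt(label: str, domain: str, n: int) -> str:
    """Build a prompt asking for n examples as JSON lines (wording is illustrative)."""
    example = json.dumps({"text": "...", "label": label})
    return (
        f"Write {n} short {domain} emails that are {label}. "
        f"Return one JSON object per line, like {example}"
    )


def clean_examples(raw_output: str, min_words: int = 3) -> list:
    """Keep only well-formed, correctly labeled, non-duplicate examples."""
    seen = set()
    kept = []
    for line in raw_output.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            item = json.loads(line)
        except json.JSONDecodeError:
            continue  # the model returned something that is not valid JSON
        if not isinstance(item, dict):
            continue
        text = str(item.get("text", "")).strip()
        label = item.get("label")
        key = " ".join(text.lower().split())  # case-insensitive duplicate check
        if not isinstance(label, str) or label not in ALLOWED_LABELS:
            continue
        if len(text.split()) < min_words or key in seen:
            continue
        seen.add(key)
        kept.append({"text": text, "label": label})
    return kept


print(build_prompt("spam", "medical", 5))

# Stand-in for a real LLM response, including the kinds of junk you should expect.
fake_llm_output = "\n".join([
    '{"text": "Limited offer on discount prescription refills", "label": "spam"}',
    '{"text": "limited offer on discount prescription refills", "label": "spam"}',
    '{"text": "Your lab results are ready in the patient portal", "label": "ham"}',
    '{"text": "Buy now", "label": "spam"}',
    '{"text": "Appointment reminder for Tuesday at 10am", "label": "maybe"}',
    "Sure! Here are some more examples:",
])
examples = clean_examples(fake_llm_output)
print(len(examples), "usable examples")
```

Automated filters like these only catch the obvious problems (duplicates, bad labels, malformed output). You would still want to read a sample by hand before training on it.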

Practical Applications and Recommendations

So, when should you use smaller models, and when are LLMs the right choice?

Consider smaller, fine-tuned models when:

  • Specific Tasks: You need high performance for a specific task, like sentiment analysis or spam detection.
  • Real-Time Applications: Speed is crucial, such as in chatbots or live recommendation systems.
  • Limited Resources: You don't have the infrastructure to support large models.

LLMs, on the other hand, shine in scenarios where:

  • Versatility is Needed: You need a model that can handle multiple tasks without specific training for each one.
  • Exploratory Projects: You're experimenting with different NLP applications and need a flexible tool.
  • Resource Availability: You have the infrastructure and budget to support the computational demands.

And as covered under Hybrid Approaches, you don't always have to choose: an LLM can handle the initial processing while a smaller, fine-tuned model handles the specific task.


To wrap things up, while large language models like ChatGPT and Claude 3 are powerful and versatile, they aren't the best tool for every job. Bigger isn't always better 😉. Smaller, fine-tuned models offer significant advantages in terms of performance, speed, and resource efficiency, especially for specific tasks.

It's important to understand the strengths and limitations of each type of model so you can make more informed decisions that lead to better outcomes in your projects.

So next time you're faced with an NLP or Vision task, take a moment to consider whether a smaller, fine-tuned model might be the better choice. You might be surprised at the results.

About Me

Hey there, welcome to my blog! I'm Kyrian, a programming enthusiast with a passion for web and game development. I love sharing my knowledge with others and helping them discover new things. Head over to my about page to learn more about me and my journey in the tech world. See you there!
