Run Hugging Face LLMs Free on Google Colab
If you’ve ever wanted to experiment with large language models but lacked the hardware, here’s the good news: you can run Hugging Face models directly in Google Colab, taking advantage of free T4 GPUs.
Here’s the setup in a nutshell:
- Open a Colab notebook and select GPU (T4).
- Obtain a token from Hugging Face Hub (for accessing models).
- Use the `transformers` library to load a model from the Hugging Face Hub.
- Run inference locally in Colab. No paid API or hosting required!
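Before loading a model, it helps to confirm the two prerequisites from the steps above: a GPU runtime and a Hugging Face token. Here's a minimal sketch (the `check_setup` helper is hypothetical, not part of any library; it assumes your token is stored in the `HF_TOKEN` environment variable):

```python
import os

def check_setup():
    """Hypothetical helper: verify the Colab prerequisites.

    Returns (has_gpu, has_token) as booleans.
    """
    try:
        import torch
        has_gpu = torch.cuda.is_available()  # True when a T4 (or other GPU) is attached
    except ImportError:
        has_gpu = False
    has_token = "HF_TOKEN" in os.environ  # Token obtained from huggingface.co/settings/tokens
    return has_gpu, has_token

print(check_setup())
```

If `has_gpu` comes back `False` in Colab, switch the runtime via Runtime → Change runtime type → T4 GPU.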
Hugging Face Can Feel Overwhelming at First
If you’re new to Hugging Face, it’s easy to get lost in its rich ecosystem: transformers, datasets, inference, and more. Each library plays a different role, and understanding how they connect can take a bit of time.
A key distinction many newcomers miss is how and where your model actually runs, and that's where the difference between pipeline and InferenceClient becomes important:
- pipeline. Downloads the model weights and runs it locally (on your Colab T4 or your own GPU). Great for learning, experimentation, and custom workflows.
```python
from transformers import pipeline

pipe = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")
result = pipe("Explain quantum computing in simple terms:")
print(result)
```
- InferenceClient. Sends your request to the Hugging Face Inference API, where the model runs remotely on one of their AI infrastructure providers. You don't need to manage hardware. The compute is handled entirely by Hugging Face and their partners.
```python
import os

from dotenv import load_dotenv
from huggingface_hub import InferenceClient

load_dotenv()
hf_token = os.getenv("HF_TOKEN")

client = InferenceClient(token=hf_token)
resp = client.text_generation(
    prompt="Tell me a math joke",
    model="meta-llama/Llama-3.1-8B-Instruct",
    max_new_tokens=100,  # Generate up to 100 new tokens
    temperature=0.7,     # Add some randomness
    do_sample=True,      # Enable sampling for more creative responses
)
print(resp)
```
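To build some intuition for the `temperature` and `do_sample` parameters above, here's a tiny stdlib-only sketch of temperature sampling over raw logits. This is illustrative only, not how the library implements it; the function name and the example logits are made up:

```python
import math
import random

def sample_with_temperature(logits, temperature=0.7, seed=None):
    """Scale logits by 1/temperature, softmax them, then sample an index.

    Lower temperature sharpens the distribution (closer to argmax);
    higher temperature flattens it (more random picks).
    """
    rng = random.Random(seed)
    scaled = [l / temperature for l in logits]
    m = max(scaled)                              # Subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling: walk the cumulative distribution
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r <= cum:
            return i
    return len(probs) - 1

# Near-zero temperature collapses onto the highest-logit token (index 0)
print(sample_with_temperature([2.0, 0.5, 0.1], temperature=0.01, seed=0))
```

With `do_sample=False` the model would instead always pick the argmax token, which is why sampling must be enabled for `temperature` to have any effect.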