Unleashing Linguistic Magic: Running a Large Language Model Locally with Ollama
In a world where words are the building blocks of dreams, the rise of Large Language Models (LLMs) has sparked a revolution in how we interact with language. These models, like the mighty GPT-3, have the power to generate text, answer questions, and even engage in meaningful conversations. But what if I told you that you can harness this linguistic magic right on your local machine, without the need for a powerful GPU? Yes, it’s possible! Join me on this journey as we explore the art of running an LLM on your local machine, no GPU required.
Within the realm of running LLMs on local machines, a variety of tools exist, such as Ollama, LM Studio, and others. In this blog, our spotlight shines on Ollama. Why? Because among its peers, it stands out for its speed and user-friendly nature.
Step 1: Setup Ollama on Your Local Machine
To begin, download Ollama from the link provided below:
Check the Ollama website: https://ollama.ai
Once the download is complete, run the installer on your machine. After installation, you will find the Ollama icon conveniently located in your taskbar.
Next, you need to have a local Ollama server running. To do this, follow these steps:
Run the following command to install Ollama (the script install works on Linux):
curl https://ollama.ai/install.sh | sh
Start the Ollama server:
ollama serve
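Before moving on, it is worth confirming the server is actually reachable. Here is a minimal Python sketch, assuming the default port 11434 and using only the standard library:
import urllib.request

# Ollama listens on localhost:11434 by default; the root endpoint
# replies with a short status message when the server is up.
try:
    with urllib.request.urlopen("http://localhost:11434", timeout=5) as resp:
        print(resp.read().decode())  # typically "Ollama is running"
except OSError as exc:
    print(f"Ollama server not reachable: {exc}")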
Step 2: Setting the Stage – Choosing the Right Model
Ollama embraces a range of models, such as Llama 2, Code Llama, and more, encapsulating model weights, configuration, and data in a single package known as a Model File.
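To give you a feel for what such a Model File looks like, here is a minimal sketch; the base model, parameter value, and system prompt are purely illustrative:
FROM llama2
PARAMETER temperature 0.7
SYSTEM "You are a concise assistant that answers in plain English."
Saving this as a file named Modelfile and running ollama create my-assistant -f Modelfile registers it as a local model you can run like any other.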
Some of the most popular models on Ollama include:
- llama2
- mistral
- codellama
- dolphin-mixtral
- mistral-openorca
- llama2-uncensored
In their default sizes, these models are relatively light on resources but still pack a punch when it comes to generating text.
Step 3: Unleashing the Magic – Loading Your Model
To download an Ollama model, use the following command:
ollama pull <model_name>
For example, to download the latest version of Code Llama, use:
ollama pull codellama:latest
Once the model is downloaded, verify it by running the command:
ollama list
Now, you can run your selected model with:
ollama run codellama
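If you would rather manage models from Python than from the terminal, the ollama package introduced in Step 5 exposes the same operations. A minimal sketch, assuming the package is installed and the local server is running:
import ollama

# Download a model (equivalent to `ollama pull codellama:latest`).
ollama.pull('codellama:latest')

# Show what is available locally (equivalent to `ollama list`).
print(ollama.list())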
Step 4: Casting Spells – Generating Text
There are many ways to connect to the model for generating text:
Using the CLI
Run the command:
ollama run <model_name>
Hitting the API
Ollama also exposes an HTTP API on port 11434 of your local machine. Use the following curl command to generate text:
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "What is wexa ai?"
}'
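The same endpoint works from any HTTP client. Here is a minimal Python sketch using only the standard library; it sets "stream": false so the server returns a single JSON object instead of a stream of partial responses:
import json
import urllib.request

# Mirror the curl example, but ask for one JSON response.
payload = json.dumps({
    "model": "llama2",
    "prompt": "What is wexa ai?",
    "stream": False,
}).encode()

request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as resp:
    print(json.load(resp)["response"])  # the generated text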
Step 5: Installing and Using the Ollama Python Library
The Ollama Python library provides the easiest way to integrate Python 3.8+ projects with Ollama.
Prerequisites
You need to have a local Ollama server running. To do this:
- Download and install Ollama (see Step 1)
- Run an LLM, for example:
ollama run llama2
- Or, if your machine has enough memory, a larger variant:
ollama run llama2:70b
Install the Ollama Python Library
pip install ollama
Usage
Here's how to use the Ollama library in your Python code:
import ollama

response = ollama.chat(model='llama3', messages=[
    {'role': 'user', 'content': 'What is wexa ai?'},
])
print(response['message']['content'])
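For single-turn prompts where you do not need a chat history, the library also provides a generate helper. A minimal sketch, reusing the same model and prompt purely for illustration:
import ollama

# One-shot generation: no message list, just a prompt string.
result = ollama.generate(model='llama3', prompt='What is wexa ai?')
print(result['response'])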
Streaming Responses
Response streaming can be enabled by setting stream=True, which makes the function call return a Python generator where each part is an object in the stream.
import ollama

stream = ollama.chat(
    model='llama3',
    messages=[{'role': 'user', 'content': 'What is wexa ai?'}],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
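If you are working inside an asyncio application, the library also ships an AsyncClient that supports the same streaming pattern. A minimal sketch, assuming the same model and prompt as above:
import asyncio
from ollama import AsyncClient

async def main():
    message = {'role': 'user', 'content': 'What is wexa ai?'}
    # With stream=True the awaited call yields an async generator of chunks.
    async for chunk in await AsyncClient().chat(model='llama3', messages=[message], stream=True):
        print(chunk['message']['content'], end='', flush=True)

asyncio.run(main())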
Embracing the Magic: The World is Your Canvas
As you delve deeper into the world of LLMs, you’ll discover endless possibilities. Write stories, compose poems, or even engage in philosophical debates with the model. The power of language is now at your fingertips, and with a few lines of code, you can create worlds that were once only possible in dreams.