Unleashing Linguistic Magic: Running a Large Language Model Locally with Ollama

In a world where words are the building blocks of dreams, the rise of Large Language Models (LLMs) has sparked a revolution in how we interact with language. These models, like the mighty GPT-3, have the power to generate text, answer questions, and even engage in meaningful conversations. But what if I told you that you can harness this linguistic magic right on your local machine, without the need for a powerful GPU? Yes, it’s possible! Join me on this journey as we explore the art of running an LLM on your local machine, no GPU required.

Within the realm of running LLMs on local machines, a variety of tools exist, such as Ollama, LM Studio, and others. In this blog, our spotlight shines on Ollama. Why? Because among its peers, it stands out for its speed and user-friendly nature.

Step 1: Setup Ollama on Your Local Machine

To begin, download Ollama from the official website:

https://ollama.ai

Once the download is complete, proceed to install the setup on your machine. After installation, you will find the Ollama icon conveniently located on your taskbar.

Next, you need to have a local Ollama server running. To do this, follow these steps:

Alternatively, on Linux you can install Ollama with a single command:
curl https://ollama.ai/install.sh | sh

Start the Ollama server:
ollama serve
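
To confirm the server is up, you can send it a quick request; by default Ollama listens on port 11434 and replies with a short status message:

curl http://localhost:11434
# Should print: Ollama is running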

Step 2: Setting the Stage – Choosing the Right Model

Ollama embraces a range of models, such as Llama 2, Code Llama, and more, encapsulating model weights, configuration, and data in a single package known as a Model File.
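
As a quick illustration (a minimal sketch, not an official recipe), a Modelfile lets you customize a base model; FROM, PARAMETER, and SYSTEM are standard Modelfile directives:

# Modelfile
FROM llama2
PARAMETER temperature 0.7
SYSTEM "You are a concise assistant that answers in plain English."

You can then build and run your customized model (the name my-assistant here is just an example):

ollama create my-assistant -f ./Modelfile
ollama run my-assistant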

Some of the most popular models on Ollama include:

  1. llama2
  2. mistral
  3. codellama
  4. dolphin-mixtral
  5. mistral-openorca
  6. llama2-uncensored

These models are lighter on resources but still pack a punch when it comes to generating text.

Step 3: Unleashing the Magic – Loading Your Model

To download an Ollama model, use the following command:

ollama pull <model_name>

For example, to download the latest version of Code Llama, use:

ollama pull codellama:latest

Once the model is downloaded, verify it by running the command:

ollama list

Now, you can run your selected model with:

ollama run codellama

Step 4: Casting Spells – Generating Text

There are several ways to connect to the model for generating text:

Using the CLI

Run the command:

ollama run <model_name>
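
This drops you into an interactive prompt where you can chat with the model directly; a session looks roughly like this (output will vary):

>>> Write a one-line Python function that adds two numbers
def add(a, b): return a + b
>>> /bye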

Hitting the API

Ollama also exposes a REST API on port 11434 of your local machine. Use the following curl command to generate text:

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "What is wexa ai?"
}'
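
By default, /api/generate streams the response back as a series of JSON objects. If you'd rather receive a single JSON reply, the API accepts a "stream" flag you can set to false:

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "What is wexa ai?",
  "stream": false
}'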

Step 5: Installing and Using the Ollama Python Library

The Ollama Python library provides the easiest way to integrate Python 3.8+ projects with Ollama.

Prerequisites

You need to have a local Ollama server running. To do this:

  1. Download Ollama
  2. Run an LLM:
    • Example: ollama run llama2
    • Example: ollama run llama2:70b

Install the Ollama Python Library

pip install ollama

Usage

Here's how to use the Ollama library in your Python code:

import ollama

response = ollama.chat(
    model='llama3',
    messages=[
        {'role': 'user', 'content': 'What is wexa ai?'},
    ],
)

print(response['message']['content'])
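
The library also offers a one-shot completion call: ollama.generate() takes a model name and a prompt and returns the text under the 'response' key (a minimal sketch following the library's documented usage):

import ollama

# Single-turn completion: no message history, just a prompt.
response = ollama.generate(model='llama3', prompt='What is wexa ai?')
print(response['response'])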

Streaming Responses

Response streaming can be enabled by setting stream=True. The call then returns a Python generator, and each chunk it yields is one piece of the streamed response.

import ollama

stream = ollama.chat(
    model='llama3',
    messages=[{'role': 'user', 'content': 'What is wexa ai?'}],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
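
If your Ollama server isn't running at the default address, the library also provides a Client class you can point at a specific host (shown here with the default URL, which is an assumption about your setup):

from ollama import Client

# Connect to an Ollama server at an explicit address.
client = Client(host='http://localhost:11434')
response = client.chat(
    model='llama3',
    messages=[{'role': 'user', 'content': 'What is wexa ai?'}],
)
print(response['message']['content'])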

Embracing the Magic: The World is Your Canvas

As you delve deeper into the world of LLMs, you’ll discover endless possibilities. Write stories, compose poems, or even engage in philosophical debates with the model. The power of language is now at your fingertips, and with a few lines of code, you can create worlds that were once only possible in dreams.