
Run ChatGPT-like AI locally on your Mac for free—no internet required. Here’s how to set it up and chat privately, anytime.
Ever wondered if you could run something like ChatGPT locally on your Mac without needing the internet? With just a bit of setup, you actually can, and it won't cost you a thing. Whether you want to keep your chats private or just want offline access to AI, here's how to run powerful large language models locally on your Mac.
Before we dive into the setup, here's what you'll need: a Mac with enough free disk space and memory for the model you plan to run, and the Terminal app.
We’ll be using a free tool called Ollama, which lets you download and run LLMs locally with just a few commands. Here’s how to get started:
Homebrew is a package manager for macOS that helps you install apps from the Terminal app. If you already have Homebrew installed on your Mac, you can skip this step. But if you don’t, here’s how you can install it:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Once the installation finishes, run the following command to check that Homebrew is working:
brew doctor
If you see “Your system is ready to brew”, you’re good to go.
If you’re having any issues or want a more detailed step-by-step process, check out our guide on how to install Homebrew on a Mac.
Now that Homebrew is installed and ready to use on your Mac, let’s install Ollama:
brew install ollama
Once it's installed, start the Ollama server:
ollama serve
Leave this window open or minimize it. This command starts the Ollama server, which needs to stay running while you use the models.
Alternatively, download the Ollama app and install it like any regular Mac app. Once done, open the app and keep it running in the background.
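Either way, a quick sanity check is to ask the command-line tool for its version (the exact version string will depend on when you installed it):
ollama --version
If this prints a version number, Ollama is installed and ready to use.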
Ollama gives you access to popular LLMs like DeepSeek, Meta’s Llama, Mistral, Gemma, and more. Here’s how you can choose and run one:
ollama run [model-name]
Replace [model-name] with the exact tag of the model you want; it's different for each model. For example:
ollama run deepseek-r1:1.5b
ollama run llama3
ollama run mistral
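The first time you run a model, Ollama downloads it, which can mean several gigabytes. If you'd rather download now and chat later, you can pull a model without starting a session, for example:
ollama pull mistral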
If you choose a large model, expect some lag—after all, the entire model is running locally on your MacBook. Smaller models respond faster, but they can struggle with accuracy, especially for math and logic-related tasks. Also, remember that since these models have no internet access, they can’t fetch real-time information.
That said, for things like checking grammar, writing emails, or brainstorming ideas, they work brilliantly. I’ve used DeepSeek-R1 extensively on my MacBook with a web UI setup, which also lets me upload images and paste code snippets. While its answers—and especially its coding skills—aren’t as sharp as top-tier models like ChatGPT or DeepSeek 671B, it still gets most everyday tasks done without needing the internet.
Once the model is running, you can simply type your message and hit Return. The model will respond right below.
To exit the session, press Control+D (or type /bye). When you want to start chatting again, just use the same ollama run [model-name] command. Since the model is already downloaded, it loads almost instantly.
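You don't even have to open an interactive session: passing a prompt directly to ollama run gets you a one-off answer and returns you to the shell. For example, assuming you've already downloaded llama3:
ollama run llama3 "Write a two-line summary of what Homebrew does."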
To check which models are currently downloaded, run:
ollama list
To delete a model you don’t need anymore, use:
ollama rm [model-name]
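Newer versions of Ollama also include a command that shows which models are currently loaded in memory, which is handy for seeing what's using your RAM:
ollama ps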
While Ollama runs in the Terminal, it also starts a local API service at http://localhost:11434, allowing you to connect it to a web interface for visual interaction with the models—similar to using a chatbot. One popular option for this is Open WebUI, which provides a user-friendly interface on top of Ollama’s core functionality. Let’s see how to set it up.
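Before setting up a full web interface, you can confirm the API is reachable with a quick request from the Terminal. This example assumes you've already downloaded llama3; swap in any model from your ollama list output:
curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Why is the sky blue?", "stream": false}'
If Ollama is running, this returns a JSON object containing the model's response.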
Docker is a tool that lets you package a program and all its essential elements into a portable container so you can run it easily on any device. We’ll use it to open a web-based chat interface for your AI model.
If your Mac doesn't have it already, download Docker Desktop from docker.com, install it like a regular Mac app, and launch it. Then verify the installation from the Terminal:
docker --version
If the command returns a version number, Docker is installed on your Mac.
Open WebUI is a simple tool that gives you a chat window in your browser. Pulling the image just means downloading the files needed to run it.
To do this, go to the Terminal app and type:
docker pull ghcr.io/open-webui/open-webui:main
This will download the necessary files for the interface.
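If you want to double-check that the image downloaded successfully, you can list it:
docker images ghcr.io/open-webui/open-webui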
Now, it’s time to run the Open WebUI using Docker. You’ll see a clean interface where you can chat with your AI—no Terminal needed. Here’s what you need to do:
docker run -d -p 9783:8080 -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main
Once the container is running, open your browser and go to:
http://localhost:9783/
From here, you can chat with any installed model in a clean, user-friendly browser interface. This step is optional, but it gives you a smoother chat experience without using the Terminal.
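Because the container was started with --name open-webui, you can stop and restart it later without retyping the long run command:
docker stop open-webui
docker start open-webui
Your chat history survives restarts, since the -v open-webui:/app/backend/data flag stores the app's data in a named Docker volume.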
That’s it! In just a few steps, you’ve set up your Mac to run a powerful AI model completely offline. No accounts, no cloud, and no internet needed after setup. Whether you want private conversations, local text generation, or just want to experiment with LLMs, Ollama makes it easy and accessible—even if you’re not a developer. Give it a try!