Technically Impossible

Let's look at the weak link in your statement. Anything "Technically Impossible" basically means we haven't figured out how yet.

GPT on Linux PC with 8GB RAM, No GPU, No Container, and No Python


2023-03-23

The prediction performance in this post is not practical: about one token per minute. The next post covers another case, where performance is better because it uses a smaller language model. If you are interested, please refer to it.
impsbl.hatenablog.jp

Abstract

One of the popular topics in "AI" is GPT (Generative Pre-trained Transformer). Although it usually requires rich computational resources, running it in a minimal environment such as a Raspberry Pi is now technically possible*1. And regardless of whether it is practical, it can run on a consumer PC as well. "ggerganov/ggml"*2, especially its "gpt-j" example, has made it possible to run GPT on a PC with 16GB of RAM.

With 8GB of swap, it can also run on a Linux PC with 8GB of RAM. This post introduces how to do that and what performance to expect. The specification of the test machine in this post is as follows.

OS Clear Linux 38590*3
CPU Intel Core i5-8250U
RAM 8GB
Storage SSD: 256GB

FYI, prediction performance on this machine is about one token per minute, which is much slower than the Raspberry Pi case mentioned above.

Build and installation of ggml

In an environment where cmake and make are already available, as on Linux, building and installing ggml is quite simple. As described on the GitHub page, the required commands are for

  1. download (clone) the source code
  2. make a directory if required
  3. cmake and make
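
Before running these steps, it may help to confirm that the tools they rely on are actually on the PATH. The following is a minimal sketch and not part of ggml; the function name "missing_tools" is hypothetical and introduced only for illustration.

```shell
# Print the name of every listed command that is not found on PATH.
# Returns 0 when nothing is missing, 1 otherwise.
# "missing_tools" is a hypothetical helper, not part of ggml.
missing_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For the steps in this post, "missing_tools git cmake make" should print nothing.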

To download "ggml" under the home directory and build only gpt-j there, the commands will be

cd ~
git clone https://github.com/ggerganov/ggml
cd ggml
mkdir build
cd build
cmake ..
make -j4 gpt-j

Then, "gpt-j" is built under the directory "~/ggml/build/bin/".

Download GPT-J 6B model

A GPT model is required to run "gpt-j". Although the GitHub page says to run "download-ggml-model.sh", this post takes another way. The size of this model is around 12GB, so I think it is better to download it directly in a familiar way. Download "ggml-model-gpt-j-6B.bin" from Hugging Face below, and save it under "~/ggml/build/bin/".

huggingface.co
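
Because the file is so large, an interrupted download is easy to miss. As a rough sanity check, the size on disk can be compared with a minimum size. This is only a sketch: the helper name "at_least_gib" is hypothetical, and any threshold passed to it is an assumption derived from the "around 12GB" figure above, not a value published by ggml.

```shell
# Succeed if FILE is at least MIN_GIB gibibytes in size.
# Usage: at_least_gib FILE MIN_GIB
# "at_least_gib" is a hypothetical helper; choose the threshold yourself.
at_least_gib() {
    file=$1
    min_gib=$2
    size=$(wc -c < "$file") || return 1
    [ "$size" -ge $((min_gib * 1024 * 1024 * 1024)) ]
}
```

For example, at_least_gib ~/ggml/build/bin/ggml-model-gpt-j-6B.bin 11 || echo "model file looks incomplete" would flag a file much smaller than expected.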

Create swap

As described at the GitHub page, "gpt-j" requires 16GB RAM.

No video card required. You just need to have 16 GB of RAM.

ggml/examples/gpt-j at master · ggerganov/ggml · GitHub

This is because it loads the GPT model into memory. If the required amount of memory cannot be allocated, it causes a segmentation fault. To avoid this, create swap with the commands below. In this case, the swap file "swap.img" is created under "~/ggml/build/bin/".

sudo dd if=/dev/zero of=~/ggml/build/bin/swap.img bs=1M count=8096

sudo chown 0:0 ~/ggml/build/bin/swap.img
sudo chmod 600 ~/ggml/build/bin/swap.img

sudo mkswap ~/ggml/build/bin/swap.img
sudo swapon ~/ggml/build/bin/swap.img
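
To see whether RAM and swap together now approach the 16GB mentioned on the GitHub page, the totals reported by the kernel can be summed. The following is a minimal sketch that reads the MemTotal and SwapTotal fields of /proc/meminfo; the function name "mem_plus_swap_mib" is hypothetical.

```shell
# Sum MemTotal and SwapTotal (both reported in kB) and print the result in MiB.
# The file argument defaults to /proc/meminfo; it is a parameter only so the
# function can also be tried on a saved copy.
# "mem_plus_swap_mib" is a hypothetical helper, not part of ggml.
mem_plus_swap_mib() {
    awk '/^(MemTotal|SwapTotal):/ { kb += $2 }
         END { printf "%d\n", kb / 1024 }' "${1:-/proc/meminfo}"
}
```

With 8GB of RAM and the swap file above, the sum can be at most 8192 + 8096 = 16288 MiB, which is slightly below 16GiB. The run described in this post worked with that amount. "swapon --show" or "free -h" also shows whether the swap file is active.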

Run "gpt-j"

Now run GPT. The options of "gpt-j" are defined in "examples/utils.cpp". The major ones are

-m path of the model
-n number of tokens to predict (so to speak, the count of words in the response)
-p prompt
-t threads assigned for processing

In this post, the model and "gpt-j" are saved in the same folder. Then, a typical command will be

cd ~/ggml/build/bin
./gpt-j -m ggml-model-gpt-j-6B.bin -p "hello"

To measure and compare processing performance, run it with different options, such as

./gpt-j -n 5 -t 4 -m ggml-model-gpt-j-6B.bin -p "hello"
./gpt-j -n 5 -t 8 -m ggml-model-gpt-j-6B.bin -p "hello"
./gpt-j -n 10 -t 4 -m ggml-model-gpt-j-6B.bin -p "hello"

The results show that more threads reduce the predict time per token, while the number of tokens has no impact on it. The test machine predicts a single token in around a minute.
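
The per-token time can also be cross-checked from wall-clock time. The following sketch times a command with "date" and divides by the token count. The function name "secs_per_token" is hypothetical, and the result includes model loading, so it is an upper bound on the pure prediction time.

```shell
# Run a command, measure its wall-clock time in whole seconds, and print
# the average seconds per token on stdout.
# Usage: secs_per_token TOKENS COMMAND [ARGS...]
# The command's own output is redirected to stderr so that stdout carries
# only the number.
# "secs_per_token" is a hypothetical helper, not part of ggml.
secs_per_token() {
    tokens=$1
    shift
    start=$(date +%s)
    "$@" >&2
    end=$(date +%s)
    echo $(( (end - start) / tokens ))
}
```

For the first command above, this would be called as secs_per_token 5 ./gpt-j -n 5 -t 4 -m ggml-model-gpt-j-6B.bin -p "hello".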

And the next command, with default options, takes around 4 hours to finish.

./gpt-j -m ggml-model-gpt-j-6B.bin -p "hello"
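
The 4 hours are roughly consistent with the per-token time. As a back-of-the-envelope check, assume that the default number of tokens to predict is 200 (this is an assumption about the defaults of the ggml examples and is not stated in this post) and that one token takes about 60 seconds. The helper name "estimate_run" is hypothetical.

```shell
# Convert TOKENS x SECONDS_PER_TOKEN into hours and minutes.
# Usage: estimate_run TOKENS SECONDS_PER_TOKEN
# "estimate_run" is a hypothetical helper; both inputs are assumptions.
estimate_run() {
    total=$(( $1 * $2 ))
    echo "$(( total / 3600 ))h$(( total % 3600 / 60 ))m"
}

estimate_run 200 60   # prints 3h20m
```

That is a bit over 3 hours for prediction alone. Model loading and swapping would presumably account for the rest.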

🔎result of the commands above

Google Colab

Although "gpt-j" can be built and run on Google Colab*4 with the commands below, its process is forcibly terminated within a few minutes.

!git clone https://github.com/ggerganov/ggml
%cd /content/ggml
%mkdir build
%cd build
!cmake ..
!make -j4 gpt-j

!../examples/gpt-j/download-ggml-model.sh 6B
!./bin/gpt-j -m models/gpt-j-6B/ggml-model.bin -p "who are you?"
!./bin/gpt-j -m models/gpt-j-6B/ggml-model.bin -p "Do you know what day today is?"

*1:twitter.com

*2:github.com

*3:clearlinux.org

*4:colab.research.google.com