
Technically Impossible

Let's look at the weak link in your statement. Anything "Technically Impossible" basically means we haven't figured out how yet.

GPT on Linux PC with 8GB RAM, No GPU, No Container, and No Python - case of alpaca.cpp


2023-03-29 - To Windows users

This post covers Linux. For the Windows case, please refer to the following post.
impsbl.hatenablog.jp

Abstract

The last post*1 reported impractical performance: roughly one predicted token per minute. This time, performance is practical. The language model is smaller, a 4GB file, so it fits in RAM without swapping, and a consumer PC with 8GB RAM can run a ChatGPT-like function at a practical level.

Only the program and model differ from last time; the working procedure is the same. This post walks through that procedure with "rupeshs/alpaca.cpp"*2.

The specification of the test machine is also the same:

OS: Clear Linux 38590*3
CPU: Intel Core i5-8250U
RAM: 8GB
Storage: 256GB SSD
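Before starting, it is worth confirming the thread count and available RAM with standard Linux tools (this check is my addition, not part of the original procedure): the 4GB model must fit in memory, and the logical CPU count is a sensible upper bound for the -t option used later.

```shell
nproc    # number of logical CPUs; an upper bound for chat's -t option
free -h  # total and available RAM; the 4GB model file must fit here
```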

Build and installation of chat

In an environment where cmake and make are already available, as is common on Linux, building and installing "chat" is quite simple. As described on the GitHub page, the required commands are for:

  1. downloading (cloning) the source code
  2. making a build directory if required
  3. running cmake and make

To download "alpaca.cpp" under the home directory and build only "chat" there, the commands are:

cd ~
git clone https://github.com/rupeshs/alpaca.cpp
cd alpaca.cpp
mkdir build
cd build
cmake ..
make chat

Download Alpaca 7B model

This model is a 4GB file. The GitHub page describes how to download it:

github.com

In any case, download it to the folder "~/alpaca.cpp/build/bin/", where the "chat" binary is also located.
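If a command-line download is preferred, here is a sketch using curl, reusing the IPFS gateway URL that appears in the Google Colab section of this post. Treat the URL as an assumption: gateways and mirrors may change over time.

```shell
# Download the 4GB model next to the "chat" binary; -C - resumes a partial file.
cd ~/alpaca.cpp/build/bin
curl -o ggml-alpaca-7b-q4.bin -C - \
  https://gateway.estuary.tech/gw/ipfs/QmQ1bf2BTnYxq73MFJWu1B7bQ2UD6qG7D7YDCxhTndVkPC
```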

Run "chat"

Now run "chat". Its options are defined in "utils.cpp". The major ones are:

-n  number of tokens; so to speak, the length of the response
-s  seed for the random-number generator
-t  number of threads assigned to processing

To run with 8 threads, the command is:

cd ~/alpaca.cpp/build/bin
./chat -t 8
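The options can be combined. Here is a sketch that picks the thread count automatically with nproc and fixes the seed for repeatable output; the -n and -s values are arbitrary examples of mine, not recommendations from the project.

```shell
cd ~/alpaca.cpp/build/bin
# Use every logical CPU, cap the response at 128 tokens, fix the seed at 42.
./chat -t "$(nproc)" -n 128 -s 42
```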

As the htop screenshot below shows, nearly all CPU threads are busy while RAM still has headroom.

Even on a PC with 8GB RAM, "chat" responds quickly enough for anything from trivia to programming, though precision and accuracy are low due to the small model.
The image at the top of this post shows "chat" behaving as a chatbot. The next images show it answering a question about alpacas and outputting JavaScript code.

Google Colab

"chat" can be built and run on Google Colab*4 with the commands below, but its performance is no better than that of the 8th-gen i5.

!git clone https://github.com/rupeshs/alpaca.cpp
%cd /content/alpaca.cpp
%mkdir build
%cd build
!cmake ..
!make chat

!curl -o ggml-alpaca-7b-q4.bin -C - https://gateway.estuary.tech/gw/ipfs/QmQ1bf2BTnYxq73MFJWu1B7bQ2UD6qG7D7YDCxhTndVkPC

!./chat