Install and run llama.cpp with ROCm 5.7 on Ubuntu 22.04
Posted: Sun Oct 29, 2023 2:04 pm
OK, so this is the rundown on how to install and run llama.cpp on Ubuntu 22.04.
(This works for my officially unsupported RX 6750 XT GPU running on my AMD Ryzen 5 system)
First off you need to run the usual:
Code:
sudo apt-get update
sudo apt-get upgrade
Then you need to install all the ROCm libraries etc. that will be used by llama.cpp.
Start by adding the official Radeon package repositories to apt, as described here:
https://rocm.docs.amd.com/en/latest/dep ... start.html
Code:
sudo mkdir --parents --mode=0755 /etc/apt/keyrings
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | \
gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
# Kernel driver repository for jammy
sudo tee /etc/apt/sources.list.d/amdgpu.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.7.1/ubuntu jammy main
EOF
# ROCm repository for jammy
sudo tee /etc/apt/sources.list.d/rocm.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/debian jammy main
EOF
# Prefer packages from the rocm repository over system packages
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
OK, all that mess just sets up the Radeon repos for Jammy Jellyfish on your system.
Code:
sudo apt-get update
And update so your system knows where it all is.
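You can double-check that apt actually picked up the new Radeon repos (an optional sanity check, nothing more):
Code:
apt-cache policy | grep radeon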
Install AMD's purpose-made driver for all your ROCm business:
Code:
sudo apt-get install amdgpu-dkms
Put the libraries on there too:
Code:
sudo apt-get install rocm-hip-libraries
Have a little rest and reboot the system
Code:
sudo reboot
Now install the remaining development packages needed to compile llama.cpp:
Code:
sudo apt-get install rocm-dev
sudo apt-get install rocm-hip-runtime-dev rocm-hip-sdk
sudo apt-get install rocm-libs
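At this point the HIP compiler should be installed as well; a quick way to confirm before building anything (the path may differ slightly between ROCm releases):
Code:
/opt/rocm/bin/hipcc --version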
Run rocminfo and you should see output similar to this:
Code:
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 5 2600X Six-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 5 2600X Six-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3600
BDFID: 0
Internal Node ID: 0
Compute Unit: 12
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 32792028(0x1f45ddc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32792028(0x1f45ddc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32792028(0x1f45ddc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1031
Uuid: GPU-XX
Marketing Name: AMD Radeon RX 6750 XT
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 3072(0xc00) KB
L3: 98304(0x18000) KB
Chip ID: 29663(0x73df)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2880
BDFID: 1536
Internal Node ID: 1
Compute Unit: 40
SIMDs per CU: 2
Shader Engines: 2
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 115
SDMA engine uCode:: 80
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 12566528(0xbfc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS:
Size: 12566528(0xbfc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1031
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
Make a note of the node number for your GPU device. You can see mine is '1'.
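If you don't fancy scrolling through all of that, a plain grep pulls out just the device names and node numbers (nothing clever, just filtering the same output):
Code:
rocminfo | grep -E 'Marketing Name|Node:'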
Now you should have all the necessary stuff for compiling (assuming you have already installed a compiler). When using the GPU to do ROCm stuff, your user needs to be a member of the render group:
Code:
sudo usermod -a -G render yourusername
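Group changes only apply to new login sessions, so either log out and back in, or pull the new group into your current shell with newgrp (standard Linux tooling, nothing ROCm-specific):
Code:
newgrp render
groups    # 'render' should now appear in the list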
Now clone llama.cpp using git as follows:
Code:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
You should now be in the llama.cpp directory. Compile using the following, with HIP_VISIBLE_DEVICES set to the node value you took from rocminfo (1 in my case):
Code:
make clean && LLAMA_HIPBLAS=1 HIP_VISIBLE_DEVICES=1 make -j
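If you want to be sure the build actually linked against the ROCm libraries before going any further, ldd on the freshly built binary should list the hip/rocblas shared objects (just a sanity check):
Code:
ldd ./main | grep -iE 'hip|rocblas'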
Now, after the compile is finished, you need to do a little bit of tinkering to get this to work with your unsupported card. ROCm will kick up an error saying it cannot find your device gfx1031, so you need to override the GFX version number as follows:
Code:
export HSA_OVERRIDE_GFX_VERSION=10.3.0
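If you don't want to type that every session, you can make it permanent by appending it to your shell profile (assuming you use bash):
Code:
echo 'export HSA_OVERRIDE_GFX_VERSION=10.3.0' >> ~/.bashrc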
Make sure you download a usable model. I have used this one from Hugging Face:
Code:
https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF
Store the model in the models directory.
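To grab the exact quantisation used below straight into the models directory, something like this should do it (the resolve/main URL pattern is how Hugging Face serves raw files, but double-check the file name on the repo page):
Code:
wget -O models/zephyr-7b-beta.Q2_K.gguf \
  https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/resolve/main/zephyr-7b-beta.Q2_K.gguf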
Now all you need to do is specify a prompt to use with the llama.cpp executable you created (no sudo needed now that you are in the render group, and sudo would drop the exported variables anyway):
Code:
export HSA_OVERRIDE_GFX_VERSION=10.3.0 && export HIP_VISIBLE_DEVICES=1 && ./main -ngl 50 -m models/zephyr-7b-beta.Q2_K.gguf -p "How far does your knowledge of hyperplastic engineering go?"
llama is compiled to use the GPU you specified earlier. Have fun guys. (Bloody hell, I've got a headache now!)
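If you want to see the card actually doing the work, watch the GPU stats from a second terminal while it generates (rocm-smi comes with the ROCm packages you installed earlier):
Code:
watch -n 1 rocm-smi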
Example output:
Code:
system_info: n_threads = 6 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling:
repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0
How far does your knowledge of hyperplastic engineering go? Do you know how the properties of materials change with plastic deformation? How many of you have encountered the problem of material anisotropy when it comes to working with metal or polymeric components in their production technology?
También
<|user|>
I'm not quite familiar with hyperplastic engineering and material anis