fujia.site

Ubuntu | Practice: NVIDIA GPUs

Docker

Installing the NVIDIA Container Toolkit

Error message:

    could not select device driver "" with capabilities: [[gpu]].
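This error generally means Docker has no GPU-capable runtime registered. Before reinstalling anything, a quick preflight can show which piece is missing. The following is only a sketch: the helper name `check_cmd` is made up for illustration, and it tests nothing more than whether each command is on `PATH`.

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# nvidia-smi ships with the driver, nvidia-ctk with the NVIDIA Container Toolkit.
for cmd in nvidia-smi docker nvidia-ctk; do
    check_cmd "$cmd"
done
```

If `nvidia-ctk` is reported as missing while `nvidia-smi` and `docker` are found, the toolkit is most likely the part that still needs to be installed.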

References

Installing the NVIDIA Container Toolkit: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
Static package mirror usable from mainland China (installation succeeded this way): https://mirror.cs.uchicago.edu/nvidia-docker/libnvidia-container/stable/ubuntu16.04/amd64/
Setting Up NVIDIA CUDA Toolkit in a Docker Container on Debian/Ubuntu: https://linuxconfig.org/setting-up-nvidia-cuda-toolkit-in-a-docker-container-on-debian-ubuntu
nvidia-docker/issues: https://github.com/NVIDIA/nvidia-docker/issues/1034

NVIDIA operations

Monitoring the GPU

    watch nvidia-smi

    Wed Dec  6 00:58:44 2023
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
    |                                         |                      |               MIG M. |
    |=========================================+======================+======================|
    |   0  NVIDIA GeForce RTX 3060        Off | 00000000:01:00.0 Off |                  N/A |
    | 34%   21C    P8              13W / 170W |    403MiB / 12288MiB |      0%      Default |
    |                                         |                      |                  N/A |
    +-----------------------------------------+----------------------+----------------------+

    +---------------------------------------------------------------------------------------+
    | Processes:                                                                            |
    |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
    |        ID   ID                                                             Usage      |
    |=======================================================================================|
    |    0   N/A  N/A      2000      G   /usr/lib/xorg/Xorg                          236MiB |
    |    0   N/A  N/A      2628    C+G   ...libexec/gnome-remote-desktop-daemon      104MiB |
    |    0   N/A  N/A      2668      G   /usr/bin/gnome-shell                         20MiB |
    |    0   N/A  N/A     25523      G   rustdesk                                     24MiB |
    +---------------------------------------------------------------------------------------+
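For scripting, the boxed table is awkward to read programmatically. `nvidia-smi` also offers a `--query-gpu` mode that emits CSV. The sketch below assumes that mode and adds a made-up helper, `gpu_mem_percent`, which converts "used, total" pairs in MiB into a percentage. The sample line reuses the 403 MiB / 12288 MiB figures from the output above.

```shell
# Hypothetical helper: turn "used, total" (MiB) CSV lines into a percentage.
gpu_mem_percent() {
    awk -F', *' '$2 > 0 { printf "%.1f%%\n", $1 * 100 / $2 }'
}

# Sample input taken from the table above: 403 MiB used of 12288 MiB.
echo "403, 12288" | gpu_mem_percent

# On a machine with the driver installed, feed it live data instead.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=memory.used,memory.total \
        --format=csv,noheader,nounits | gpu_mem_percent
fi
```

The sample line prints `3.3%`. With several GPUs, the live query prints one percentage per card.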

Optimizing GPU usage

The goal here is to reserve the whole GPU for AI computation.
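To see how much memory the desktop stack takes away from compute, one option is to total the graphics-type entries in the `nvidia-smi` process table. This is a rough sketch: the helper name `gfx_mem_total` is hypothetical, and it assumes the table layout shown earlier and process names without spaces.

```shell
# Hypothetical helper: sum the memory (MiB) held by graphics-type processes
# (Type column containing "G", so "C+G" counts too) in nvidia-smi's process table.
gfx_mem_total() {
    awk '$1 == "|" && $5 ~ /^[0-9]+$/ && $6 ~ /G/ {
             mem = $(NF - 1)          # e.g. "236MiB"
             sub(/MiB$/, "", mem)     # strip the unit
             total += mem
         }
         END { print total + 0 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi | gfx_mem_total
fi
```

Applied to the process table captured earlier, this gives 384 (236 + 104 + 20 + 24 MiB), roughly the amount that Xorg, gnome-remote-desktop-daemon, gnome-shell, and rustdesk hold between them.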

Xorg

gnome-remote-desktop-daemon

https://manpages.ubuntu.com/manpages/noble/en/man1/grdctl.1.html

gnome-remote-desktop is the GNOME remote desktop daemon; it uses PipeWire.

Check the status

    grdctl status

    RDP:
        Status: enabled
        TLS certificate: /home/cogito/.local/share/gnome-remote-desktop/rdp-tls.crt
        TLS key: /home/cogito/.local/share/gnome-remote-desktop/rdp-tls.key
        View-only: no
        Username: (hidden)
        Password: (hidden)
    VNC:
        Status: disabled
        Auth method: prompt
        View-only: no
        Password: (empty)
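If this check needs to run from a script, the status text can be reduced to a single word. The sketch below assumes the output layout shown above (a section header such as `RDP:` followed by indented `Status:` lines). The helper name `grd_rdp_status` is invented for illustration.

```shell
# Hypothetical helper: print the RDP "Status:" value from `grdctl status` output.
grd_rdp_status() {
    awk '/^[A-Za-z]+:$/ { section = $1 }
         section == "RDP:" && $1 == "Status:" { print $2; exit }'
}

if command -v grdctl >/dev/null 2>&1; then
    grdctl status | grd_rdp_status
fi
```

For the output above it prints `enabled`. If RDP access is not needed, `grdctl rdp disable` (described in the man page linked earlier) turns it off.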

Miscellaneous

Reset the root password

    sudo passwd root

AI FAQs (Frequently Asked Questions)

1. Can't use an NVIDIA GPU in Docker

**Inputs:**

On Ubuntu 24.04, I am using LocalAI (https://github.com/mudler/LocalAI, don't visit) with Docker to deploy an AI service. I ran the command below:

    docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12

However, I got the following error:

    docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

On the host machine, the NVIDIA GPU information is:

    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA GeForce RTX 3060        Off |   00000000:01:00.0  On |                  N/A |
    | 34%   31C    P8             16W /  170W |     335MiB /  12288MiB |      0%      Default |
    |                                         |                        |                  N/A |
    +-----------------------------------------+------------------------+----------------------+

    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |    0   N/A  N/A      3243      G   /usr/lib/xorg/Xorg                            173MiB |
    |    0   N/A  N/A      3666      G   ...libexec/gnome-remote-desktop-daemon          2MiB |
    |    0   N/A  N/A      3793      G   /usr/bin/gnome-shell                          136MiB |
    +-----------------------------------------------------------------------------------------+

**Responses:**

This error typically occurs when the NVIDIA Container Toolkit isn't properly installed or configured in your Ubuntu system. Let's fix this step by step:

First, let's install the NVIDIA Container Toolkit:

    # Add the NVIDIA signing key
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

    # Add the package repository. The official install guide uses the generic
    # stable/deb list rather than a per-release path such as ubuntu24.04.
    curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
        sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
        sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

    # Update the package listing
    sudo apt-get update

    # Install nvidia-container-toolkit
    sudo apt-get install -y nvidia-container-toolkit

Configure the Docker daemon to use the NVIDIA runtime:

    sudo nvidia-ctk runtime configure --runtime=docker

Restart the Docker daemon:

    sudo systemctl restart docker

Verify the installation:

    sudo docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

If the verification step shows your GPU information similar to what you see on your host machine, you can then try running your LocalAI container again:

    docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12

If you still encounter issues, you can try these troubleshooting steps:

Check if the NVIDIA runtime is properly configured:

    sudo docker info | grep -i runtime

Make sure your Docker daemon configuration includes the NVIDIA runtime by checking:

    cat /etc/docker/daemon.json

It should contain something like:

    {
        "runtimes": {
            "nvidia": {
                "path": "nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }
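The same check can be scripted instead of being read by eye. What follows is a deliberately crude sketch: it greps for the two strings shown in the JSON above rather than parsing the file, and the helper name `has_nvidia_runtime` is hypothetical.

```shell
# Hypothetical helper: crude textual check (not a real JSON parse) that a
# daemon.json file mentions an "nvidia" entry and the nvidia-container-runtime path.
has_nvidia_runtime() {
    [ -r "$1" ] &&
        grep -q '"nvidia"' "$1" &&
        grep -q 'nvidia-container-runtime' "$1"
}

if has_nvidia_runtime /etc/docker/daemon.json; then
    echo "nvidia runtime entry found"
else
    echo "nvidia runtime entry not found"
fi
```

A missing or unreadable file is reported the same way as a missing entry. In either case, re-running `sudo nvidia-ctk runtime configure --runtime=docker` and restarting Docker is the next thing to try.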

Let me know if you need any clarification or if you're still experiencing issues after trying these steps.

References

nvidia-smi error: could not communicate with the NVIDIA driver: https://askubuntu.com/questions/927199/nvidia-smi-has-failed-because-it-couldnt-communicate-with-the-nvidia-driver-ma
Unable to connect to the NVIDIA driver: https://blog.csdn.net/hangzuxi8764/article/details/86572093
nvidia-smi error: could not communicate with the NVIDIA driver: https://zhuanlan.zhihu.com/p/89714824
Linux Find Out Graphics Card Information: https://www.cyberciti.biz/faq/linux-tell-which-graphics-vga-card-installed/
Ubuntu Linux Install Nvidia Driver (Latest Proprietary Driver): https://www.cyberciti.biz/faq/ubuntu-linux-install-nvidia-driver-latest-proprietary-driver/
How to uninstall the NVIDIA drivers on Ubuntu 22.04 Jammy Jellyfish Linux: https://linuxconfig.org/how-to-uninstall-the-nvidia-drivers-on-ubuntu-22-04-jammy-jellyfish-linux
Nvidia driver is not working on Ubuntu 22.04: https://forums.developer.nvidia.com/t/nvidia-driver-is-not-working-on-ubuntu-22-04/250747
Disabling secure boot in UEFI: https://www.linuxfordevices.com/tutorials/linux/disable-secure-boot
How to Check GPU (Intel/AMD/NVIDIA) Usage in Ubuntu 22.04 | 20.04: https://ubuntuhandbook.org/index.php/2022/12/gpu-usage-ubuntu-22-04/
GPU usage monitoring (CUDA): https://unix.stackexchange.com/questions/38560/gpu-usage-monitoring-cuda
How To Install gnome-remote-desktop on Ubuntu 22.04: https://installati.one/install-gnome-remote-desktop-ubuntu-22-04/
Installati.one: https://installati.one/ubuntu/22.04/
