Ubuntu | In Practice: NVIDIA Graphics Cards

May you live ten million years, and may no year pass without spring. (Li Yuan, "剪彩")

Docker

Installing the NVIDIA Container Toolkit

  1. Error message:

         could not select device driver "" with capabilities: [[gpu]].
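
As the FAQ later in this post explains, this error typically appears when the NVIDIA Container Toolkit is missing or not configured. Before reinstalling anything, it can help to see which piece is actually absent. The following is a minimal sketch, not part of the original notes: the helper name `missing_tools` is made up for illustration, and it only checks whether each command is on `PATH`, not whether it is configured correctly.

```bash
#!/usr/bin/env bash
# Print every command from the argument list that is NOT found on PATH.
# (Hypothetical helper for illustration; it does not verify configuration.)
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# nvidia-smi                -> ships with the NVIDIA driver
# docker                    -> Docker CLI
# nvidia-ctk                -> NVIDIA Container Toolkit CLI
# nvidia-container-runtime  -> runtime that Docker has to be configured to use
missing_tools nvidia-smi docker nvidia-ctk nvidia-container-runtime
```

An empty result means all four commands are installed, so the next thing to check is the Docker daemon configuration. If `nvidia-ctk` or `nvidia-container-runtime` is listed, the toolkit itself still needs to be installed.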

References

  1. Installing the NVIDIA Container Toolkit: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
  2. Static package mirror reachable from mainland China (the installation succeeded with this method): https://mirror.cs.uchicago.edu/nvidia-docker/libnvidia-container/stable/ubuntu16.04/amd64/
  3. Setting Up NVIDIA CUDA Toolkit in a Docker Container on Debian/Ubuntu: https://linuxconfig.org/setting-up-nvidia-cuda-toolkit-in-a-docker-container-on-debian-ubuntu
  4. nvidia-docker/issues: https://github.com/NVIDIA/nvidia-docker/issues/1034

NVIDIA Operations

  1. Monitor the GPU

         watch nvidia-smi

         Wed Dec  6 00:58:44 2023
         +---------------------------------------------------------------------------------------+
         | NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
         |-----------------------------------------+----------------------+----------------------+
         | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
         | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
         |                                         |                      |               MIG M. |
         |=========================================+======================+======================|
         |   0  NVIDIA GeForce RTX 3060        Off | 00000000:01:00.0 Off |                  N/A |
         | 34%   21C    P8              13W / 170W |    403MiB / 12288MiB |      0%      Default |
         |                                         |                      |                  N/A |
         +-----------------------------------------+----------------------+----------------------+

         +---------------------------------------------------------------------------------------+
         | Processes:                                                                            |
         |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
         |        ID   ID                                                             Usage      |
         |=======================================================================================|
         |    0   N/A  N/A      2000      G   /usr/lib/xorg/Xorg                          236MiB |
         |    0   N/A  N/A      2628    C+G   ...libexec/gnome-remote-desktop-daemon      104MiB |
         |    0   N/A  N/A      2668      G   /usr/bin/gnome-shell                         20MiB |
         |    0   N/A  N/A     25523      G   rustdesk                                     24MiB |
         +---------------------------------------------------------------------------------------+
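
The table printed by `nvidia-smi` is laid out for people. For scripts, `nvidia-smi` can also emit CSV through its `--query-gpu` and `--format` options. The following is a minimal sketch, not part of the original notes; the helper name `summarize_gpu_line` is made up for illustration, and it assumes the four queried fields arrive in the order shown.

```bash
#!/usr/bin/env bash
# Turn one CSV line "name, used_MiB, total_MiB, util_percent" into a short summary.
# (Hypothetical helper; the field order must match the --query-gpu list below.)
summarize_gpu_line() {
  awk -F', *' '{
    pct = ($3 > 0) ? ($2 * 100) / $3 : 0   # share of GPU memory in use
    printf "%s: %d/%d MiB (%d%% of memory), GPU util %d%%\n", $1, $2, $3, pct, $4
  }'
}

# Live usage; only runs when the NVIDIA driver utilities are installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,memory.used,memory.total,utilization.gpu \
             --format=csv,noheader,nounits | summarize_gpu_line
fi
```

Wrapping the query in `watch -n 1` gives a compact refreshing view, in the same spirit as `watch nvidia-smi` above.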

Optimizing GPU Usage

The goal here is to dedicate the GPU entirely to AI computation.
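
To judge how much GPU memory the desktop processes hold, and therefore how much could be reclaimed for AI workloads, the process table printed by `nvidia-smi` can be summed. The following is a rough sketch, not part of the original notes: the helper name `sum_gpu_process_memory` is made up, and it assumes the human-readable table layout shown earlier, which is not a stable interface. Process names containing spaces are cut to their last word.

```bash
#!/usr/bin/env bash
# Read `nvidia-smi` text output on stdin, list each process row with its
# GPU memory, and print a total. (Hypothetical helper for illustration.)
sum_gpu_process_memory() {
  awk '
    # Process rows end with "<N>MiB |"; the GPU summary row ends with "Default |".
    NF >= 3 && $NF == "|" && $(NF-1) ~ /^[0-9]+MiB$/ {
      mem = $(NF-1); sub(/MiB$/, "", mem)
      printf "%-45s %6d MiB\n", $(NF-2), mem
      total += mem
    }
    END { printf "%-45s %6d MiB\n", "TOTAL", total }
  '
}

# Live usage; only runs when the NVIDIA driver utilities are installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi | sum_gpu_process_memory
fi
```

Applied to the process table from the monitoring step, the four rows (Xorg, gnome-remote-desktop-daemon, gnome-shell, rustdesk) add up to 384 MiB of the 403 MiB reported as in use.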

Xorg

gnome-remote-desktop-daemon

https://manpages.ubuntu.com/manpages/noble/en/man1/grdctl.1.html 

gnome-remote-desktop is the GNOME remote desktop daemon, built on PipeWire.

  1. Check the status

         grdctl status

         RDP:
             Status: enabled
             TLS certificate: /home/cogito/.local/share/gnome-remote-desktop/rdp-tls.crt
             TLS key: /home/cogito/.local/share/gnome-remote-desktop/rdp-tls.key
             View-only: no
             Username: (hidden)
             Password: (hidden)
         VNC:
             Status: disabled
             Auth method: prompt
             View-only: no
             Password: (empty)
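
When this check is scripted, only the `Status:` line of each backend matters. The following is a minimal sketch, not part of the original notes: the helper name `grd_backend_status` is made up, and it assumes the `grdctl status` layout shown above (a section header such as `RDP:` followed by indented `Key: value` lines).

```bash
#!/usr/bin/env bash
# Reduce `grdctl status` output on stdin to lines like "rdp=enabled".
# (Hypothetical helper; relies on the section layout shown in this post.)
grd_backend_status() {
  awk '
    # An unindented "NAME:" line starts a new section (RDP, VNC, ...).
    /^[A-Za-z]+:[[:space:]]*$/ { section = $1; sub(/:$/, "", section); next }
    # Inside a section, report only the Status line.
    $1 == "Status:" { print tolower(section) "=" $2 }
  '
}

# Live usage; only runs when gnome-remote-desktop is installed.
if command -v grdctl >/dev/null 2>&1; then
  grdctl status | grd_backend_status
fi
```

If the result shows `rdp=enabled` and remote desktop access is not needed on this machine, the man page linked above documents `grdctl rdp disable` for turning the backend off.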

Miscellaneous

  1. Reset the root password

         sudo passwd root

AI FAQs (Frequently Asked Questions)

1. Can't use an NVIDIA GPU in Docker


**Inputs:**

On Ubuntu 24.04, I am using LocalAI (https://github.com/mudler/LocalAI, don't visit) with Docker to deploy an AI service. I ran the following command:

    docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12

However, I got the following error:

    docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

On the host machine, the NVIDIA GPU's info is:

    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA GeForce RTX 3060        Off |   00000000:01:00.0  On |                  N/A |
    | 34%   31C    P8              16W / 170W |      335MiB / 12288MiB |      0%      Default |
    |                                         |                        |                  N/A |
    +-----------------------------------------+------------------------+----------------------+

    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |    0   N/A  N/A      3243      G   /usr/lib/xorg/Xorg                            173MiB |
    |    0   N/A  N/A      3666      G   ...libexec/gnome-remote-desktop-daemon          2MiB |
    |    0   N/A  N/A      3793      G   /usr/bin/gnome-shell                          136MiB |
    +-----------------------------------------------------------------------------------------+

**Responses:**

This error typically occurs when the NVIDIA Container Toolkit isn't properly installed or configured in your Ubuntu system. Let's fix this step by step:

  1. First, let's install the NVIDIA Container Toolkit:

         # Add the NVIDIA package repository.
         # Note: NVIDIA's current install guide fetches the list from
         # https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list;
         # if the distribution-specific URL below is not found, use that one instead.
         curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
         curl -fsSL https://nvidia.github.io/libnvidia-container/ubuntu24.04/libnvidia-container.list | \
           sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
           sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

         # Update package listing
         sudo apt-get update

         # Install nvidia-container-toolkit
         sudo apt-get install -y nvidia-container-toolkit

  2. Configure the Docker daemon to use the NVIDIA runtime:

         sudo nvidia-ctk runtime configure --runtime=docker

  3. Restart the Docker daemon:

         sudo systemctl restart docker

  4. Verify the installation:

         sudo docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

If the verification step shows your GPU information similar to what you see on your host machine, you can then try running your LocalAI container again:

               
  • 1
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12 COPY

If you still encounter issues, you can try these troubleshooting steps:

  1. Check if the NVIDIA runtime is properly configured:

         sudo docker info | grep -i runtime

  2. Make sure your Docker daemon configuration includes the NVIDIA runtime by checking:

         cat /etc/docker/daemon.json

It should contain something like:

               
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
{ "runtimes": { "nvidia": { "path": "nvidia-container-runtime", "runtimeArgs": [] } } } COPY
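
The two troubleshooting checks above can be turned into yes/no tests for use in a script. The following is a minimal sketch, not part of the original answer: the helper names `has_nvidia_runtime` and `runtime_listed` are made up, `jq` is an optional extra tool that may not be installed, and the fallback is only a rough text match rather than real JSON parsing.

```bash
#!/usr/bin/env bash
# Succeed if the given daemon.json declares an "nvidia" runtime.
# (Hypothetical helper; uses jq when available, otherwise a rough grep.)
has_nvidia_runtime() {
  local file=$1
  [ -r "$file" ] || return 1
  if command -v jq >/dev/null 2>&1; then
    # -e makes jq exit non-zero when the expression evaluates to false or null.
    jq -e '.runtimes.nvidia.path != null' "$file" >/dev/null 2>&1
  else
    # Fallback without a JSON parser: look for an "nvidia" key.
    grep -q '"nvidia"[[:space:]]*:' "$file"
  fi
}

# Succeed if `docker info` output on stdin lists nvidia on its "Runtimes:" line.
# (Hypothetical helper for illustration.)
runtime_listed() {
  grep -i 'runtimes:' | grep -qw 'nvidia'
}
```

On the host this could be used as `has_nvidia_runtime /etc/docker/daemon.json && echo configured` and `sudo docker info 2>/dev/null | runtime_listed && echo registered`. If the file check passes but the second check fails, the Docker daemon probably has not been restarted since the configuration changed.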

Let me know if you need any clarification or if you're still experiencing issues after trying these steps.


References

  1. nvidia-smi error: couldn't communicate with the NVIDIA driver: https://askubuntu.com/questions/927199/nvidia-smi-has-failed-because-it-couldnt-communicate-with-the-nvidia-driver-ma
  2. Unable to connect to the NVIDIA driver: https://blog.csdn.net/hangzuxi8764/article/details/86572093
  3. nvidia-smi error: couldn't communicate with the NVIDIA driver: https://zhuanlan.zhihu.com/p/89714824
  4. Linux Find Out Graphics Card Information: https://www.cyberciti.biz/faq/linux-tell-which-graphics-vga-card-installed/
  5. Ubuntu Linux Install Nvidia Driver (Latest Proprietary Driver): https://www.cyberciti.biz/faq/ubuntu-linux-install-nvidia-driver-latest-proprietary-driver/
  6. How to uninstall the NVIDIA drivers on Ubuntu 22.04 Jammy Jellyfish Linux: https://linuxconfig.org/how-to-uninstall-the-nvidia-drivers-on-ubuntu-22-04-jammy-jellyfish-linux
  7. Nvidia driver is not working on Ubuntu 22.04: https://forums.developer.nvidia.com/t/nvidia-driver-is-not-working-on-ubuntu-22-04/250747
  8. Disabling secure boot in UEFI: https://www.linuxfordevices.com/tutorials/linux/disable-secure-boot
  9. How to Check GPU (Intel/AMD/NVIDIA) Usage in Ubuntu 22.04 | 20.04: https://ubuntuhandbook.org/index.php/2022/12/gpu-usage-ubuntu-22-04/
  10. GPU usage monitoring (CUDA): https://unix.stackexchange.com/questions/38560/gpu-usage-monitoring-cuda
  11. How To Install gnome-remote-desktop on Ubuntu 22.04: https://installati.one/install-gnome-remote-desktop-ubuntu-22-04/
  12. Installati.one: https://installati.one/ubuntu/22.04/
