May you live ten million years, and may every one of them bring spring. (Li Yuan, "Jian Cai")
docker
Installing the NVIDIA Container Toolkit
- Error message:
could not select device driver "" with capabilities: [[gpu]].
References
- Installing the NVIDIA Container Toolkit: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
- Static mirror reachable from mainland China (the installation succeeded using this method): https://mirror.cs.uchicago.edu/nvidia-docker/libnvidia-container/stable/ubuntu16.04/amd64/
- Setting Up NVIDIA CUDA Toolkit in a Docker Container on Debian/Ubuntu - https://linuxconfig.org/setting-up-nvidia-cuda-toolkit-in-a-docker-container-on-debian-ubuntu
- nvidia-docker/issues: https://github.com/NVIDIA/nvidia-docker/issues/1034
NVIDIA operations
- Monitor the GPU
watch nvidia-smi
Wed Dec 6 00:58:44 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 Off | 00000000:01:00.0 Off | N/A |
| 34% 21C P8 13W / 170W | 403MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 2000 G /usr/lib/xorg/Xorg 236MiB |
| 0 N/A N/A 2628 C+G ...libexec/gnome-remote-desktop-daemon 104MiB |
| 0 N/A N/A 2668 G /usr/bin/gnome-shell 20MiB |
| 0 N/A N/A 25523 G rustdesk 24MiB |
+---------------------------------------------------------------------------------------+
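For scripting, the table printed by `watch nvidia-smi` is awkward to parse. A minimal sketch, assuming the `--query-gpu` and `--format=csv,noheader,nounits` options of `nvidia-smi` are available on the installed driver: the hypothetical helper `gpu_mem_percent` turns "used, total" pairs into a whole-number percentage, one line per GPU.

```shell
#!/usr/bin/env bash
# Sketch: report GPU memory use as a percentage.
# gpu_mem_percent is a hypothetical helper, not part of nvidia-smi.
# It reads "used, total" pairs (MiB, one GPU per line) on stdin.
gpu_mem_percent() {
  # Skip lines whose total is missing or non-numeric (for example "[N/A]").
  awk -F', *' 'NF >= 2 && ($2 + 0) > 0 { printf "%d\n", ($1 * 100) / $2 }'
}

# Only query the GPU when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=memory.used,memory.total \
             --format=csv,noheader,nounits | gpu_mem_percent
fi
```

The percentage is truncated rather than rounded, which is enough for a quick check of how much memory the desktop processes are holding.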
Optimizing GPU usage
The goal here is to dedicate the GPU entirely to AI computation.
Xorg
gnome-remote-desktop-daemon
https://manpages.ubuntu.com/manpages/noble/en/man1/grdctl.1.html
gnome-remote-desktop is the GNOME remote desktop daemon, which uses PipeWire.
- Check status
grdctl status
RDP:
Status: enabled
TLS certificate: /home/cogito/.local/share/gnome-remote-desktop/rdp-tls.crt
TLS key: /home/cogito/.local/share/gnome-remote-desktop/rdp-tls.key
View-only: no
Username: (hidden)
Password: (hidden)
VNC:
Status: disabled
Auth method: prompt
View-only: no
Password: (empty)
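The status output above can also be checked from a script. A minimal sketch, assuming `grdctl status` keeps the layout shown here (a `RDP:` or `VNC:` header followed by an indented `Status:` line); `grd_protocol_status` is a hypothetical helper name, and `grdctl rdp disable` is the subcommand documented in the grdctl man page linked above.

```shell
#!/usr/bin/env bash
# Sketch: extract the Status value of one protocol from `grdctl status` output.
# grd_protocol_status is a hypothetical helper; it reads the output on stdin.
grd_protocol_status() {
  # $1: protocol header without the colon, e.g. "RDP" or "VNC"
  awk -v proto="$1:" '
    { gsub(/^[ \t]+/, "") }                      # strip leading indentation
    $0 == proto { in_section = 1; next }         # found the requested section
    in_section && $1 == "Status:" { print $2; exit }
  '
}

# Only call grdctl when it is installed.
if command -v grdctl >/dev/null 2>&1; then
  if [ "$(grdctl status | grd_protocol_status RDP)" = "enabled" ]; then
    echo "RDP is enabled; 'grdctl rdp disable' would turn it off"
  fi
fi
```

The script only reports the state; whether to disable remote desktop depends on whether the machine is still needed for remote access.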
Miscellaneous
- Reset the root password
sudo passwd root
AI FAQs (Frequently Asked Questions)
1. Can't use an NVIDIA GPU in Docker
**Inputs:**
On Ubuntu 24.04, I am using LocalAI (https://github.com/mudler/LocalAI, don't visit) with Docker to deploy an AI service. I run the command below:
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
However, I got the following error:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
On the host machine, the NVIDIA GPU info is:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3060 Off | 00000000:01:00.0 On | N/A |
| 34% 31C P8 16W / 170W | 335MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 3243 G /usr/lib/xorg/Xorg 173MiB |
| 0 N/A N/A 3666 G ...libexec/gnome-remote-desktop-daemon 2MiB |
| 0 N/A N/A 3793 G /usr/bin/gnome-shell 136MiB |
+-----------------------------------------------------------------------------------------+
**Responses:**
This error typically occurs when the NVIDIA Container Toolkit isn't properly installed or configured in your Ubuntu system. Let's fix this step by step:
- First, let's install the NVIDIA Container Toolkit:
# Add the NVIDIA package repository (generic deb list from the official install guide)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Update package listing
sudo apt-get update
# Install nvidia-container-toolkit
sudo apt-get install -y nvidia-container-toolkit
- Configure the Docker daemon to use NVIDIA runtime:
sudo nvidia-ctk runtime configure --runtime=docker
- Restart the Docker daemon:
sudo systemctl restart docker
- Verify the installation:
sudo docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
If the verification step shows your GPU information similar to what you see on your host machine, you can then try running your LocalAI container again:
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
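The `sed` expression in the repository setup step is the least obvious part of the install: it adds a `signed-by` option to each `deb` line so that apt verifies packages against the downloaded keyring. A minimal sketch of that rewrite on one line; the sample line and its host are hypothetical, chosen only for illustration, and `add_signed_by` is a helper name invented here.

```shell
#!/usr/bin/env bash
# Sketch: show what the signed-by rewrite does to one repository line.
KEYRING=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# add_signed_by is a hypothetical helper wrapping the sed expression.
add_signed_by() {
  # Insert a [signed-by=...] option after "deb" on every matching line of stdin.
  sed "s#deb https://#deb [signed-by=${KEYRING}] https://#g"
}

# Hypothetical sample line; the real list file may differ.
sample='deb https://example.invalid/libnvidia-container/stable/deb/amd64 /'
printf '%s\n' "$sample" | add_signed_by
```

Using `#` as the `sed` delimiter avoids escaping the slashes in the URL and the keyring path.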
If you still encounter issues, you can try these troubleshooting steps:
Check if the NVIDIA runtime is properly configured:
sudo docker info | grep -i runtime
Make sure your Docker daemon configuration includes the NVIDIA runtime by checking:
cat /etc/docker/daemon.json
It should contain something like:
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
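The daemon.json check can be scripted as well. A minimal sketch, assuming the configuration lives at `/etc/docker/daemon.json`; `has_nvidia_runtime` is a hypothetical helper, and the `grep` match is a heuristic for an `"nvidia"` key rather than a real JSON parse.

```shell
#!/usr/bin/env bash
# Sketch: check whether a daemon.json registers an "nvidia" runtime.
# has_nvidia_runtime is a hypothetical helper; the grep is a heuristic,
# not a JSON parser, so unusual formatting could fool it.
has_nvidia_runtime() {
  # $1: path to a Docker daemon configuration file
  [ -r "$1" ] && grep -Eq '"nvidia"[[:space:]]*:' "$1"
}

if has_nvidia_runtime /etc/docker/daemon.json; then
  echo "nvidia runtime is registered in /etc/docker/daemon.json"
else
  echo "nvidia runtime not found; 'sudo nvidia-ctk runtime configure --runtime=docker' would add it"
fi
```

A missing or unreadable file is reported the same way as a file without the runtime entry, since both lead to the same next step.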
Let me know if you need any clarification or if you're still experiencing issues after trying these steps.
References
- nvidia-smi error: couldn't communicate with the NVIDIA driver: https://askubuntu.com/questions/927199/nvidia-smi-has-failed-because-it-couldnt-communicate-with-the-nvidia-driver-ma
- Unable to connect to the NVIDIA driver: https://blog.csdn.net/hangzuxi8764/article/details/86572093
- nvidia-smi error: couldn't communicate with the NVIDIA driver: https://zhuanlan.zhihu.com/p/89714824
- Linux Find Out Graphics Card Information: https://www.cyberciti.biz/faq/linux-tell-which-graphics-vga-card-installed/
- Ubuntu Linux Install Nvidia Driver (Latest Proprietary Driver): https://www.cyberciti.biz/faq/ubuntu-linux-install-nvidia-driver-latest-proprietary-driver/
- How to uninstall the NVIDIA drivers on Ubuntu 22.04 Jammy Jellyfish Linux: https://linuxconfig.org/how-to-uninstall-the-nvidia-drivers-on-ubuntu-22-04-jammy-jellyfish-linux
- Nvidia driver is not working on Ubuntu 22.04: https://forums.developer.nvidia.com/t/nvidia-driver-is-not-working-on-ubuntu-22-04/250747
- Disabling secure boot in UEFI: https://www.linuxfordevices.com/tutorials/linux/disable-secure-boot#:~:text=Step 1%3A Navigate to the,choose Yes to confirm it .
- How to Check GPU (Intel/AMD/NVIDIA) Usage in Ubuntu 22.04 | 20.04: https://ubuntuhandbook.org/index.php/2022/12/gpu-usage-ubuntu-22-04/
- GPU usage monitoring (CUDA): https://unix.stackexchange.com/questions/38560/gpu-usage-monitoring-cuda
- How To Install gnome-remote-desktop on Ubuntu 22.04: https://installati.one/install-gnome-remote-desktop-ubuntu-22-04/
- Installati.one: https://installati.one/ubuntu/22.04/