Failed to initialize NVML: Driver/library version mismatch
NVML library version: 570.172
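Today, when I ran nvidia-smi -l to check GPU usage, it no longer displayed anything and printed the message above instead. After some searching, I learned that the cause is a mismatch between the version of the installed NVIDIA user-space libraries (the driver packages) and the version of the NVIDIA kernel module that is currently loaded.
1. Check recent NVIDIA-related updates: sure enough, the driver was upgraded last week (September 16).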
cat /var/log/dpkg.log | grep nvidia
2025-09-16 06:56:23 upgrade nvidia-driver-570-server:amd64 570.133.20-0ubuntu0.24.04.1 570.172.08-0ubuntu0.24.04.1
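The grep above also matches the configure and status lines for each package. As a minimal sketch, the upgrade records alone can be pulled out with awk. The function name nvidia_upgrades is hypothetical, and the field layout assumed here is the one seen in the log line above (date, time, action, package, old version, new version).

```shell
# Hypothetical helper: list NVIDIA package upgrades recorded in a dpkg log.
# Reads the log file given as $1 (default: /var/log/dpkg.log).
nvidia_upgrades() {
    log="${1:-/var/log/dpkg.log}"
    # Assumed "upgrade" line layout, as in the log line shown above:
    #   DATE TIME upgrade PACKAGE OLD_VERSION NEW_VERSION
    awk '$3 == "upgrade" && $4 ~ /nvidia/ { print $1, $4, $5, "->", $6 }' "$log"
}
```

Calling nvidia_upgrades with no argument reads /var/log/dpkg.log. Older entries may have been rotated into /var/log/dpkg.log.1 or compressed files, which this sketch does not look at.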
2. Check the installed NVIDIA driver version: the installed packages are at 570.172.08.
dpkg -l | grep nvidia
ii libnvidia-cfg1-570-server:amd64 570.172.08-0ubuntu0.24.04.1 amd64 NVIDIA binary OpenGL/GLX configuration library
ii libnvidia-common-570-server 570.172.08-0ubuntu0.24.04.1 all Shared files used by the NVIDIA libraries
ii libnvidia-compute-570-server:amd64 570.172.08-0ubuntu0.24.04.1 amd64 NVIDIA libcompute package
ii libnvidia-decode-570-server:amd64 570.172.08-0ubuntu0.24.04.1 amd64 NVIDIA Video Decoding runtime libraries
ii libnvidia-egl-wayland1:amd64 1:1.1.17-0ubuntu0~gpu24.04.1 amd64 Wayland EGL External Platform library -- shared library
ii libnvidia-encode-570-server:amd64 570.172.08-0ubuntu0.24.04.1 amd64 NVENC Video Encoding runtime library
ii libnvidia-extra-570-server:amd64 570.172.08-0ubuntu0.24.04.1 amd64 Extra libraries for the NVIDIA Server Driver
ii libnvidia-fbc1-570-server:amd64 570.172.08-0ubuntu0.24.04.1 amd64 NVIDIA OpenGL-based Framebuffer Capture runtime library
ii libnvidia-gl-570-server:amd64 570.172.08-0ubuntu0.24.04.1 amd64 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii nvidia-compute-utils-570-server 570.172.08-0ubuntu0.24.04.1 amd64 NVIDIA compute utilities
ii nvidia-dkms-570-server 570.172.08-0ubuntu0.24.04.1 amd64 NVIDIA DKMS package
ii nvidia-driver-570-server 570.172.08-0ubuntu0.24.04.1 amd64 NVIDIA Server Driver metapackage
ii nvidia-firmware-570-server-570.172.08 570.172.08-0ubuntu0.24.04.1 amd64 Firmware files used by the kernel module
ii nvidia-kernel-common-570-server 570.172.08-0ubuntu0.24.04.1 amd64 Shared files used with the kernel module
ii nvidia-kernel-source-570-server 570.172.08-0ubuntu0.24.04.1 amd64 NVIDIA kernel source package
ii nvidia-utils-570-server 570.172.08-0ubuntu0.24.04.1 amd64 NVIDIA Server Driver support binaries
ii xserver-xorg-video-nvidia-570-server 570.172.08-0ubuntu0.24.04.1 amd64 NVIDIA binary Xorg driver
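3. Check the version of the loaded NVIDIA kernel module: it is 570.133.20. Since 570.133.20 < 570.172.08, the loaded kernel module lags behind the installed driver packages.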
cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 570.133.20 Sun Apr 13 04:50:56 UTC 2025
GCC version: gcc version 13.3.0 (Ubuntu 13.3.0-6ubuntu2~24.04)
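The comparison in step 3 can be scripted. The following is a rough sketch, not a definitive check: the function names loaded_module_version, upstream_version and check_versions are made up for illustration, and the parsing assumes the NVRM line looks like the one shown above.

```shell
# Extract the kernel module version from /proc/driver/nvidia/version (stdin).
# Takes the first dotted number on the "NVRM version:" line.
loaded_module_version() {
    awk '/^NVRM version:/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+(\.[0-9]+)?$/) { print $i; exit }
    }'
}

# Strip the epoch and the Debian revision from a package version string,
# e.g. 570.172.08-0ubuntu0.24.04.1 becomes 570.172.08.
upstream_version() {
    v="${1#*:}"                 # drop an "N:" epoch prefix if present
    printf '%s\n' "${v%%-*}"    # drop everything from the first "-"
}

# Compare the two versions; returns non-zero on a mismatch.
check_versions() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "mismatch: loaded module $1, installed packages $2"
        return 1
    fi
}
```

On the machine from this post it could be used like this: check_versions "$(loaded_module_version < /proc/driver/nvidia/version)" "$(upstream_version "$(dpkg-query -W -f='${Version}' nvidia-utils-570-server)")". The package name is the one from the dpkg -l listing above and will differ on other driver branches. modinfo -F version nvidia prints the version of the module file on disk, which helps confirm that the new module was actually built.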
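4. Update the loaded kernel module: rebooting would normally load the new module as well, but here I chose not to reboot. Instead I removed the relevant modules and had them loaded again, as shown below. After that, the GPU information is displayed again.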
sudo rmmod nvidia_uvm #remove the nvidia_uvm module
sudo rmmod nvidia_drm #remove the nvidia_drm module
sudo rmmod nvidia_modeset #remove the nvidia_modeset module
sudo rmmod nvidia #remove the nvidia module
sudo nvidia-smi #when nvidia-smi finds no kernel module loaded, it loads it automatically
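The four rmmod calls can be wrapped in a small script that only touches modules that are actually loaded and stops at the first failure. This is a sketch under assumptions: the function names nvidia_unload_order and reload_nvidia are hypothetical, and the module list is just the four modules used above. rmmod refuses to remove a module that is still in use, so any process holding the GPU (which sudo lsof /dev/nvidia* can help find) has to be stopped first.

```shell
# Print the NVIDIA modules that are currently loaded, in the unload order
# used above (dependent modules first, the core "nvidia" module last).
# Reads `lsmod` output on stdin.
nvidia_unload_order() {
    loaded=$(awk 'NR > 1 { print $1 }')   # skip the lsmod header line
    for m in nvidia_uvm nvidia_drm nvidia_modeset nvidia; do
        printf '%s\n' "$loaded" | grep -qx "$m" && printf '%s\n' "$m"
    done
    return 0
}

# Unload the modules one by one, then let nvidia-smi load the new module.
# Not called automatically; run it by hand once the GPU is idle.
reload_nvidia() {
    for m in $(lsmod | nvidia_unload_order); do
        sudo rmmod "$m" || {
            echo "could not unload $m; is something still using the GPU?" >&2
            return 1
        }
    done
    sudo nvidia-smi
}
```

Running lsmod | nvidia_unload_order first is a harmless dry run that shows what reload_nvidia would remove.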
References:
"[Solved] [nvidia-smi] Failed to initialize NVML: Driver/library version mismatch: how to fix it"
"Driver and library version mismatch (Failed to initialize NVML: Driver/library version mismatch) preventing the NVIDIA driver from running: a fix without rebooting"