Enabling GPU Monitoring
Activating GPU Monitoring
Via Configuration File
Set the gpu parameter to true in the /opt/nezha/agent/config.yml file:
gpu: true
After saving the changes, restart the Agent service to apply the configuration:
sudo systemctl restart nezha-agent.service
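As a quick sanity check before or after the restart, the sketch below tests whether the config file contains the gpu: true line. The gpu_enabled helper is hypothetical and not part of the Agent. It uses a simple pattern match rather than a real YAML parser, so treat it as a rough check only.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Agent): succeeds when the given file
# contains a line of the form "gpu: true". Defaults to the path from this guide.
gpu_enabled() {
    config_file="${1:-/opt/nezha/agent/config.yml}"
    # Simple line match, not a full YAML parse; errors (e.g. missing file) are silenced.
    grep -Eq '^[[:space:]]*gpu:[[:space:]]*true([[:space:]]|$)' "$config_file" 2>/dev/null
}

if gpu_enabled; then
    echo "GPU monitoring is enabled in the config file"
else
    echo "GPU monitoring is not enabled (or the config file is missing or unreadable)"
fi
```

To confirm that the service came back up after the restart, systemctl is-active nezha-agent.service prints "active" when the unit is running.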
GPU Utilization Monitoring Support
GPU model and GPU utilization are separate monitoring metrics, and their retrieval methods differ:
Windows and macOS
- No additional dependencies are required for monitoring GPU utilization.
- Supports multiple GPU brands (e.g., NVIDIA, AMD).
Linux
- Supports only NVIDIA and AMD GPUs.
- Requires additional drivers or tools to retrieve GPU utilization.
Below are specific steps to enable GPU utilization monitoring on Linux.
NVIDIA GPUs
Required Tool
NVIDIA GPUs use the nvidia-smi tool to retrieve utilization data. This tool is typically included with the official NVIDIA driver.
Important Notes
- Third-party drivers (e.g., nouveau) are not supported for GPU utilization monitoring.
- Ensure the official NVIDIA driver is installed and that GPU data is accessible via nvidia-smi.
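One way to confirm that GPU data is accessible is to query nvidia-smi directly. The sketch below wraps that query in a hypothetical check_nvidia helper. The --query-gpu and --format options belong to nvidia-smi itself; whether the Agent issues the same query is not stated here and should not be assumed.

```shell
#!/bin/sh
# Hypothetical helper: prints one CSV line per GPU (name, current utilization).
# Returns non-zero if nvidia-smi is missing or cannot reach the driver.
check_nvidia() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: is the official NVIDIA driver installed?" >&2
        return 1
    fi
    nvidia-smi --query-gpu=name,utilization.gpu --format=csv,noheader
}

check_nvidia || echo "GPU utilization is not available via nvidia-smi"
```

If the command prints a GPU name followed by a percentage, the driver is exposing utilization data. An error from nvidia-smi usually means the official driver is not loaded.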
AMD GPUs
Required Tool
AMD GPUs require the open-source amdgpu driver and the rocm-smi tool for utilization monitoring.
Installation Instructions
Use the following commands for popular Linux distributions:
# Arch Linux
sudo pacman -Sy rocm-smi-lib
# Debian / Ubuntu
sudo apt install rocm-smi
# Fedora / RHEL 8+
sudo dnf install rocm-smi
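After installation, it can help to confirm that rocm-smi runs and reports GPU use. The sketch below uses a hypothetical check_amd helper around the rocm-smi --showuse option. It only illustrates a manual check and does not describe how the Agent itself calls the tool.

```shell
#!/bin/sh
# Hypothetical helper: shows current GPU use as reported by rocm-smi.
# Returns non-zero if rocm-smi is not on PATH or exits with an error.
check_amd() {
    if ! command -v rocm-smi >/dev/null 2>&1; then
        echo "rocm-smi not found in PATH" >&2
        return 1
    fi
    rocm-smi --showuse
}

check_amd || echo "GPU utilization is not available via rocm-smi"
```

If rocm-smi is present but reports no devices, checking that the amdgpu kernel module is loaded (for example with lsmod | grep amdgpu) is a reasonable next step.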
Manual Installation of rocm_smi_lib
If your system lacks a pre-packaged version of rocm-smi, you can manually compile and install rocm_smi_lib.
Prerequisites
Ensure the following tools are installed:
- git
- cmake
- gcc
Installation Steps
Clone the Repository:
git clone https://github.com/ROCm/rocm_smi_lib
Compile and Install:
cd rocm_smi_lib
mkdir -p build
cd build
cmake ..
make -j $(nproc)
# Install the library and headers to the default location (/opt/rocm)
sudo make install
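Because a manual install targets /opt/rocm rather than a system directory, the rocm-smi command may not be on your PATH afterwards. The sketch below assumes the executable ends up at /opt/rocm/bin/rocm-smi. That location is an assumption based on the default install prefix, so verify it on your system. The add_rocm_to_path helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to PATH if it contains an executable
# rocm-smi and is not already listed. /opt/rocm/bin is an assumed default.
add_rocm_to_path() {
    rocm_bin="${1:-/opt/rocm/bin}"
    [ -x "$rocm_bin/rocm-smi" ] || return 1
    case ":$PATH:" in
        *":$rocm_bin:"*) ;;                        # already on PATH, nothing to do
        *) PATH="$rocm_bin:$PATH"; export PATH ;;  # otherwise prepend it
    esac
}

if add_rocm_to_path; then
    echo "rocm-smi resolved to: $(command -v rocm-smi)"
else
    echo "rocm-smi not found under /opt/rocm/bin"
fi
```

This change only affects the current shell session. A service such as the Agent has its own environment, so the tool may also need to be reachable from a directory on the service's PATH.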