Enabling GPU Monitoring

Activating GPU Monitoring

Via Configuration File

Enable the gpu parameter in the /opt/nezha/agent/config.yml file:

yaml
gpu: true

After saving the changes, restart the Agent service to apply the configuration:

bash
sudo systemctl restart nezha-agent.service
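
To confirm that the setting is actually present in the file, a small check like the following may help. It is only a sketch: `gpu_enabled` is a hypothetical helper name, and the pattern is a plain text match for a top-level `gpu: true` line, not a real YAML parser.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given config file has a top-level "gpu: true" line.
gpu_enabled() {
    # Allow trailing whitespace or a trailing comment after the value.
    grep -Eq '^gpu:[[:space:]]*true[[:space:]]*(#.*)?$' "$1"
}

config=/opt/nezha/agent/config.yml
if [ -r "$config" ] && gpu_enabled "$config"; then
    echo "GPU monitoring is enabled in $config"
else
    echo "GPU monitoring is not enabled (or $config is not readable)"
fi
```

Quoted values such as `gpu: "true"` are not matched by this pattern, so treat a negative result as a prompt to inspect the file by hand.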

GPU Utilization Monitoring Support

GPU model and GPU utilization are separate metrics retrieved by different methods. Support for utilization monitoring depends on the operating system:

  1. Windows and macOS

    • No additional dependencies are required for monitoring GPU utilization.
    • Supports multiple GPU brands (e.g., NVIDIA, AMD).
  2. Linux

    • Supports only NVIDIA and AMD GPUs.
    • Requires vendor-specific drivers or tools to retrieve GPU utilization.
      The sections below describe the setup required for each vendor on Linux.

NVIDIA GPUs

Required Tool

NVIDIA GPUs use the nvidia-smi tool to retrieve utilization data. This tool is typically included with the official NVIDIA driver.

Important Notes

  • Third-party drivers (e.g., nouveau) are not supported for GPU utilization monitoring.
  • Ensure the official NVIDIA driver is installed and that GPU data is accessible via nvidia-smi.
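
One way to check the second point is to query the GPU directly from a shell. The sketch below assumes nothing about how the Agent itself invokes `nvidia-smi`; `have_tool` is a hypothetical helper name, and the query simply prints each GPU's name and current utilization.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command is available on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool nvidia-smi; then
    # Print one line per GPU: model name and current utilization.
    nvidia-smi --query-gpu=name,utilization.gpu --format=csv,noheader \
        || echo "nvidia-smi is installed but could not query the GPU"
else
    echo "nvidia-smi not found: install the official NVIDIA driver"
fi
```

If the query fails even though the tool is present, the running kernel driver is likely not the official one (for example, nouveau is loaded instead).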

AMD GPUs

Required Tool

AMD GPUs require the open-source amdgpu driver and the rocm-smi tool for utilization monitoring.

Installation Instructions

Use the following commands for popular Linux distributions:

bash
# Arch Linux
sudo pacman -S rocm-smi-lib

# Debian / Ubuntu
sudo apt install rocm-smi

# Fedora / RHEL 8+
sudo dnf install rocm-smi
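
After installation, both requirements (the amdgpu driver and the rocm-smi tool) can be checked from a shell. This is a minimal sketch under a few assumptions: `module_loaded` is a hypothetical helper, its optional second argument exists only so the check can be pointed at a different sysfs root, and a directory under `/sys/module` is taken as evidence that the driver is loaded.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel module directory exists under the
# given sysfs root (defaults to /sys).
module_loaded() {
    [ -d "${2:-/sys}/module/$1" ]
}

if ! module_loaded amdgpu; then
    echo "amdgpu kernel module does not appear to be loaded"
elif ! command -v rocm-smi >/dev/null 2>&1; then
    echo "amdgpu is loaded, but rocm-smi is not on PATH"
else
    # --showuse prints the current GPU utilization for each device.
    rocm-smi --showuse || echo "rocm-smi failed to query the GPU"
fi
```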

Manual Installation of rocm_smi_lib

If your distribution does not package rocm-smi, you can compile and install rocm_smi_lib from source.

Prerequisites

Ensure the following tools are installed:

  • git
  • cmake
  • gcc
  • make

Installation Steps

  1. Clone the Repository:

    bash
    git clone https://github.com/ROCm/rocm_smi_lib
  2. Compile and Install:

    bash
    cd rocm_smi_lib
    mkdir -p build
    cd build
    cmake ..
    make -j $(nproc)
    # Install the library and headers to the default location (/opt/rocm)
    sudo make install
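
A manually installed build may not be on the shell's search path. The sketch below assumes the executables ended up under `/opt/rocm/bin`, which follows from the default prefix mentioned above but is not guaranteed for every version; `path_append` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: append a directory to PATH unless it is already listed.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already on PATH, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

# Assumption: the build installed its executables under /opt/rocm/bin.
path_append /opt/rocm/bin
export PATH

if command -v rocm-smi >/dev/null 2>&1; then
    echo "rocm-smi found at $(command -v rocm-smi)"
else
    echo "rocm-smi still not found; check the install prefix"
fi
```

This only changes the current shell session. The Agent service runs with its own environment, so rocm-smi may also need to be reachable from the service's PATH.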