Chimera Performance Dashboard

Project Overview

This project focuses on collecting Chimera performance data and displaying it in a clear, user-friendly dashboard.

Metrics

This section will show key system and GPU metrics, such as CPU usage, memory usage, GPU utilization, temperature, and power consumption.

Documentation

Our task is to collect performance metrics from servers on the UMass Boston CS cluster and display them visually on a dashboard website. The project uses two main tools:

  • Prometheus - monitoring software that periodically collects (scrapes) metrics from servers and stores them in a time-series database.
  • Grafana - software that queries the data stored by Prometheus and displays it as charts and graphs.
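
Grafana does not read the servers directly; its Prometheus data source sends PromQL queries to Prometheus's HTTP API. The Python sketch below shows the shape of one such request and how to read the result. It assumes Prometheus listens on its default port 9090 on the local machine (a hypothetical address for this project), and the CPU-usage expression is a commonly used PromQL pattern over node_exporter's `node_cpu_seconds_total` counter, not necessarily the query our dashboard uses.

```python
import json
from urllib.parse import urlencode

# Hypothetical address: Prometheus on the team VM, default port 9090.
PROMETHEUS_URL = "http://localhost:9090"

# Common PromQL pattern: per-host CPU usage (%) = 100 minus the average
# idle fraction over the last 5 minutes, from node_exporter counters.
CPU_USAGE_QUERY = (
    '100 - (avg by (instance) '
    '(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'
)


def build_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against /api/v1/query."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body: str) -> dict:
    """Map each series' instance label to its value in a /api/v1/query response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    results = {}
    for series in payload["data"]["result"]:
        instance = series["metric"].get("instance", "<no instance label>")
        _timestamp, value = series["value"]  # [unix_time, "value as string"]
        results[instance] = float(value)
    return results


# Usage with a trimmed, made-up response body (not real measurements):
sample_body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "chimera:9100"},
             "value": [1711900000.0, "37.5"]},
        ],
    },
})
print(build_query_url(PROMETHEUS_URL, CPU_USAGE_QUERY))
print(parse_instant_vector(sample_body))  # {'chimera:9100': 37.5}
```

Against a live server, the body could be fetched with `urllib.request.urlopen(url).read().decode()`. Time-series panels in Grafana typically use the range endpoint `/api/v1/query_range` instead, which returns a list of `[timestamp, value]` pairs per series rather than a single value.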

For Prometheus to collect data, a node exporter must run on each monitored server. The node exporter (node_exporter) is a program that takes live readings from the system, such as CPU usage, memory, and system load, and exposes them over HTTP. Prometheus reads from each node exporter at a regular interval and stores the results in its database, building a history of each system's performance over time.
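
To make the scraping step concrete: node_exporter serves its readings as plain text (the Prometheus text exposition format), one `metric{labels} value` line per series, by default at `http://<host>:9100/metrics`. CPU time is exposed as a cumulative counter, `node_cpu_seconds_total`, so a usage percentage only comes from comparing two scrapes. The Python sketch below parses two hand-written sample scrapes (illustrative numbers, not Chimera data) and computes the busy share of CPU time between them. It is a simplified parser that assumes no per-sample timestamps and ignores counter resets; in practice Prometheus and PromQL's `rate()` do this work.

```python
def parse_exposition(text: str) -> dict:
    """Parse simple Prometheus text-format lines into {series: value}.

    Skips blank lines and '# HELP' / '# TYPE' comments. Assumes each sample
    line is 'name{labels} value' with no trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(" ", 1)  # value is the last field
        samples[series] = float(value)
    return samples


def cpu_busy_percent(before: dict, after: dict) -> float:
    """Share of CPU time (all CPUs, all modes) not spent idle between two scrapes."""
    total = idle = 0.0
    for series, new_value in after.items():
        if not series.startswith("node_cpu_seconds_total{"):
            continue
        # Counter increase since the previous scrape (resets are not handled).
        delta = new_value - before.get(series, new_value)
        total += delta
        if 'mode="idle"' in series:
            idle += delta
    return 100.0 * (total - idle) / total if total > 0 else 0.0


# Two illustrative scrapes of one CPU, taken some seconds apart.
SCRAPE_1 = """
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 100.0
node_cpu_seconds_total{cpu="0",mode="system"} 5.0
node_cpu_seconds_total{cpu="0",mode="user"} 20.0
# TYPE node_load1 gauge
node_load1 0.5
"""
SCRAPE_2 = """
node_cpu_seconds_total{cpu="0",mode="idle"} 112.0
node_cpu_seconds_total{cpu="0",mode="system"} 6.0
node_cpu_seconds_total{cpu="0",mode="user"} 22.0
node_load1 0.75
"""

before, after = parse_exposition(SCRAPE_1), parse_exposition(SCRAPE_2)
print(after["node_load1"])              # 0.75
print(cpu_busy_percent(before, after))  # 20.0
```

Between the two scrapes the CPU accumulated 15 seconds in total, 12 of them idle, so 3 / 15 = 20% was busy. Gauges such as `node_load1` need no such comparison; their current value is the reading.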

March 31, 2026
[Diagram: diagram_4.31.png]

So far, we have completed four main tasks:

  • Set up a virtual machine on the CS server - The VM runs Ubuntu, and all group members can access it over SSH.
  • Ran node_exporter on Babbage and Chimera - Babbage is the CS department's GPU server, and Chimera is the cluster we are monitoring. We deployed node_exporter on both and successfully read their metrics.
  • Prometheus data collection - Prometheus is successfully scraping data from the node exporters and storing it.
  • Grafana display - Grafana is connected to Prometheus and correctly displaying the collected metrics.
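
One way to confirm the scraping described above is Prometheus's `/api/v1/targets` endpoint, which lists every active scrape target with its `health` (`up`, `down`, or `unknown`) and `lastError`. The Python sketch below checks a saved response offline. The instance names `babbage:9100` and `chimera:9100` and the job name `node` are hypothetical placeholders for however the targets are labeled in our configuration, and the sample body is hand-written and trimmed to the fields the code reads.

```python
import json

# Hypothetical instance labels for the two node exporters.
EXPECTED_TARGETS = {"babbage:9100", "chimera:9100"}


def target_health(body: str) -> dict:
    """Map each active target's instance label to (health, lastError)."""
    payload = json.loads(body)
    return {
        target["labels"].get("instance", "?"): (target["health"], target.get("lastError", ""))
        for target in payload["data"]["activeTargets"]
    }


def problem_targets(body: str, expected: set) -> dict:
    """Return expected targets that are missing or not 'up', with a reason."""
    health = target_health(body)
    problems = {}
    for instance in sorted(expected):
        state, error = health.get(instance, ("missing", "not in activeTargets"))
        if state != "up":
            problems[instance] = f"{state}: {error}" if error else state
    return problems


# Hand-written sample in the shape of a /api/v1/targets response.
sample = json.dumps({
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"instance": "babbage:9100", "job": "node"},
             "health": "up", "lastError": ""},
            {"labels": {"instance": "chimera:9100", "job": "node"},
             "health": "down", "lastError": "connection refused"},
        ],
        "droppedTargets": [],
    },
})
print(problem_targets(sample, EXPECTED_TARGETS))
# {'chimera:9100': 'down: connection refused'}
```

Prometheus's own `up` metric (1 when the last scrape of a target succeeded, 0 otherwise) carries the same information, so a similar check could also be shown as a Grafana panel.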

Team Members

Blake, Phoenix, Stephanie, Desmond, Dhruv, Aihemaiti