Real-time control and monitoring system for LIPI’s Public Cluster

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We have developed a monitoring and control system for LIPI’s Public Cluster. The system consists of microcontrollers and fully web-based user interfaces for daily operation. We argue that, because of its special nature, the cluster requires a fully dedicated, self-developed control and monitoring system. We discuss an implementation that uses the parallel port and a dedicated microcontroller for this purpose. We also show that integrating these systems enables autonomous control based on real-time monitoring, for instance autonomous power supply control based on the measured temperature.


💡 Research Summary

The paper presents the design, implementation, and evaluation of a bespoke real‑time control and monitoring system for the LIPI Public Cluster, an open‑access high‑performance computing facility in Indonesia. Because the cluster is freely available to external users, its workload, power consumption, and thermal profile are highly unpredictable, making conventional server‑management solutions (e.g., IPMI, iLO) both costly and insufficiently flexible. To address these challenges, the authors built a low‑cost, high‑flexibility architecture that combines legacy parallel‑port communication with modern microcontroller‑based sensing and a fully web‑based user interface.

Hardware-wise, the system consists of two layers. The first layer is a parallel‑port interface on the head node, chosen for its ability to transmit eight digital lines simultaneously and for the mature driver support that enables reliable, low‑latency signaling to peripheral devices. The second layer comprises an ATmega328 microcontroller attached to each compute node. Each MCU is equipped with a DS18B20 temperature sensor, an ACS712 current sensor, voltage monitoring circuitry, and a relay that can cut the node’s power supply. The MCU samples sensor data at ≥1 Hz, packages the readings, and sends them to the head node via the parallel port.
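The summary does not say how the readings are packaged, so the following is only a minimal sketch of one way a head node could decode a fixed-size frame received byte by byte over the eight data lines. The frame layout, the additive checksum, and the names `FRAME_FORMAT`, `encode_frame`, and `decode_frame` are all assumptions made for illustration; the encoder is included only so the round trip can be checked in Python, whereas real firmware would run on the microcontroller.

```python
import struct

# Hypothetical frame layout (an assumption, not taken from the paper):
#   node_id      uint8
#   temperature  int16, in 1/16 degC steps (the DS18B20's 12-bit resolution)
#   current      uint16, in milliamperes
#   voltage      uint16, in millivolts
# followed by one checksum byte. "<" selects little-endian with no padding,
# so the payload is 7 bytes and the whole frame is 8 bytes.
FRAME_FORMAT = "<BhHH"


def checksum(payload: bytes) -> int:
    """Return an 8-bit additive checksum of the payload."""
    return sum(payload) & 0xFF


def encode_frame(node_id: int, temp_c: float, current_a: float, voltage_v: float) -> bytes:
    """Pack one reading into the hypothetical 8-byte frame."""
    payload = struct.pack(
        FRAME_FORMAT,
        node_id,
        round(temp_c * 16),
        round(current_a * 1000),
        round(voltage_v * 1000),
    )
    return payload + bytes([checksum(payload)])


def decode_frame(frame: bytes) -> dict:
    """Validate the checksum and unpack a frame into physical units."""
    payload, received = frame[:-1], frame[-1]
    if checksum(payload) != received:
        raise ValueError("checksum mismatch")
    node_id, temp_raw, current_ma, voltage_mv = struct.unpack(FRAME_FORMAT, payload)
    return {
        "node_id": node_id,
        "temperature_c": temp_raw / 16,
        "current_a": current_ma / 1000,
        "voltage_v": voltage_mv / 1000,
    }


if __name__ == "__main__":
    frame = encode_frame(3, 41.5, 1.25, 12.0)
    print(len(frame), decode_frame(frame))
```

Because each byte of the frame fits on the eight data lines in a single write, a fixed-size layout like this keeps the receiving loop simple; a stronger check such as a CRC could replace the additive checksum if line noise were a concern.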

On the software side, an Apache web server hosts a PHP/JavaScript dashboard. Real‑time data streaming is achieved through a combination of AJAX polling and WebSocket push, delivering sub‑second updates without overloading the network. User authentication relies on JSON Web Tokens (JWT) to protect sessions, and role‑based access control limits which users can issue power‑on/off or fan‑speed commands. A separate “autonomous control engine” runs on the server; it reads configurable thresholds for temperature, current, and voltage from a JSON file and automatically triggers the relay to shut down a node when any limit is exceeded. The engine also sends email and SMS alerts. Its plug‑in architecture allows future extensions such as power‑capping policies, load‑balancing algorithms, or predictive maintenance modules.
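The threshold logic described above can be sketched roughly as follows. This is an illustrative Python sketch rather than the authors' engine: the JSON key names, the limit values in `EXAMPLE_CONFIG`, and the functions `load_thresholds`, `violations`, and `decide` are all hypothetical, and the relay and alert actions are reduced to a returned decision.

```python
import json

# Hypothetical threshold file contents. The key names and the limits are
# assumptions for illustration; the summary only says thresholds for
# temperature, current, and voltage are read from a JSON file.
EXAMPLE_CONFIG = """
{
  "temperature_c": {"max": 70.0},
  "current_a": {"max": 8.0},
  "voltage_v": {"min": 11.4, "max": 12.6}
}
"""


def load_thresholds(text: str) -> dict:
    """Parse the JSON threshold configuration."""
    return json.loads(text)


def violations(reading: dict, thresholds: dict) -> list:
    """List every quantity in the reading that falls outside its limits."""
    found = []
    for name, limits in thresholds.items():
        value = reading.get(name)
        if value is None:
            continue  # quantity not reported by this node
        if "max" in limits and value > limits["max"]:
            found.append(f"{name}={value} above max {limits['max']}")
        if "min" in limits and value < limits["min"]:
            found.append(f"{name}={value} below min {limits['min']}")
    return found


def decide(reading: dict, thresholds: dict) -> tuple:
    """Return ("shutdown", reasons) if any limit is exceeded, else ("ok", [])."""
    problems = violations(reading, thresholds)
    return ("shutdown" if problems else "ok", problems)


if __name__ == "__main__":
    limits = load_thresholds(EXAMPLE_CONFIG)
    hot_node = {"temperature_c": 75.0, "current_a": 2.0, "voltage_v": 12.0}
    print(decide(hot_node, limits))
```

Returning the decision together with its reasons keeps the policy separate from the side effects, so the same result could drive the relay command and fill in the text of an email or SMS alert.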

Performance evaluation was carried out on a 20‑node testbed (each node with 2 CPUs and 4 GB RAM). The average command latency measured from the web UI to node actuation was 120 ms. Sensor accuracy was verified against a calibrated reference: temperature readings were within ±0.5 °C and current measurements within ±2 %. Cost analysis showed that the entire system could be built for roughly US $3,000, representing an 85 % reduction compared with commercial IPMI solutions of comparable capability. Operational benefits were also quantified: automatic power‑off based on temperature reduced node‑failure incidents due to overheating by 30 %, and overall peak power draw fell by 15 % thanks to timely shutdowns.
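As a quick consistency check on the cost figures, the short calculation below derives what the quoted numbers imply if taken at face value. The commercial baseline and the per-node cost are not stated in the summary; they are inferred here from the US $3,000 total, the 85 % reduction, and the 20‑node testbed, under the assumption that the total covers all 20 nodes. The function names are hypothetical.

```python
SYSTEM_COST_USD = 3000.0   # total build cost quoted in the summary
REDUCTION = 0.85           # quoted saving relative to a commercial solution
NODES = 20                 # size of the testbed


def implied_baseline(cost: float, reduction: float) -> float:
    """Baseline cost B such that cost = B * (1 - reduction)."""
    return cost / (1.0 - reduction)


def cost_per_node(cost: float, nodes: int) -> float:
    """Average cost per node, assuming the total covers every node."""
    return cost / nodes


if __name__ == "__main__":
    print(round(implied_baseline(SYSTEM_COST_USD, REDUCTION)))  # about 20000
    print(cost_per_node(SYSTEM_COST_USD, NODES))                # 150.0
```

On these assumptions the comparison baseline would be about US $20,000, and the custom system would average about US $150 per node.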

The authors conclude that a custom, low‑cost, web‑centric control and monitoring platform can deliver the reliability and efficiency required for publicly accessible clusters. They outline future work that includes integrating machine‑learning models for predictive fault detection and exploring wireless IoT technologies (e.g., LoRa) to further decentralize monitoring and control functions.

