Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and grids. It lets the user remotely view live or historical statistics (such as CPU load averages or network utilization) for all monitored machines. It is based on a hierarchical design targeted at federations of clusters.
It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on thousands of clusters around the world.
Gmond is a multi-threaded daemon which runs on each cluster node you want to monitor. Installing it does not require a common NFS filesystem, a database back-end, special accounts, or configuration files that must be maintained.
Gmond has four main responsibilities:
- Monitor changes in host state
- Announce relevant changes
- Listen to the state of all other Ganglia nodes via a unicast or multicast channel
- Answer requests for an XML description of the cluster state
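
The last responsibility, answering requests for an XML description of the cluster state, can be illustrated with a small client. The following is a minimal sketch rather than an official Ganglia API. It assumes gmond accepts TCP connections on port 8649 (the customary default; check your own configuration) and that the reply nests HOST and METRIC elements with NAME and VAL attributes. The sample document and the function names `fetch_cluster_xml` and `parse_cluster_state` are hypothetical and exist only for illustration.

```python
import socket
import xml.etree.ElementTree as ET

# Hypothetical sample of the kind of XML gmond might return.
# Host names and metric values are made up for illustration.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
  <CLUSTER NAME="example" LOCALTIME="0">
    <HOST NAME="node1" IP="10.0.0.1" REPORTED="0">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="8" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def fetch_cluster_xml(host="localhost", port=8649, timeout=5.0):
    """Read raw XML bytes from a gmond TCP channel.

    Assumes the daemon writes the whole document and then closes
    the connection, so we read until recv() returns no data.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_cluster_state(xml_data):
    """Map each host name to a dict of {metric name: value string}."""
    root = ET.fromstring(xml_data)
    state = {}
    for host in root.iter("HOST"):
        metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        state[host.get("NAME")] = metrics
    return state


if __name__ == "__main__":
    # Parse the bundled sample; swap in fetch_cluster_xml() on a real node.
    for name, metrics in parse_cluster_state(SAMPLE_XML).items():
        print(name, metrics)
```

Against a live daemon, `parse_cluster_state(fetch_cluster_xml("some-node"))` would be the equivalent call. Metric values are kept as strings because the XML reports the type of each metric separately.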