Monitoring Standby Servers with ICMP and Heartbeat Polling
A Geo SCADA Expert server monitors the status of its network partner Hot Standby servers, and other specified hosts, by using ICMP and Heartbeat polling as follows:
- ICMP polling (or pinging) detects whether a network-connected remote computer is reachable (see Define the ICMP Poll Interval and Retry Count for a Server).
- Heartbeat polling determines the status of a Geo SCADA Expert server running on a remote computer. A heartbeat is a signal that networked servers (and other devices) send at regular intervals to indicate normal operation or to synchronize with other servers and devices. Hot Standby servers use heartbeat polling to establish which server is the Main server. Heartbeat polling also provides a mechanism by which a change in the role of a server can be undertaken and detected.
By using both types of polling, Geo SCADA Expert can differentiate between a non-running Geo SCADA Expert server on a remote computer and an unreachable computer (a Geo SCADA Expert server may or may not still be running on the unreachable computer).
Geo SCADA Expert also uses these polls to:
- Determine the status of partner servers
- Ensure that there is only one Main server in a network and that, when the Main server goes offline, a Standby server becomes the Main server.
You can configure ICMP and Heartbeat polling in the Standby tab of the Partners section of the Server Administration tool. Use the following fields:
- ICMP interval: see Define the ICMP Poll Interval and Retry Count for a Server
- ICMP retry count: see Define the ICMP Poll Interval and Retry Count for a Server
- Heartbeat interval: see Define the Monitor Timeout Settings for a Server
- Heartbeat timeout: see Define the Monitor Timeout Settings for a Server.
If you change the value in any of these fields, you must restart the Geo SCADA Expert server for the change to take effect.
Example:
The ICMP interval is 5000 milliseconds and the ICMP retry count is 2.
The Heartbeat interval is 5000 milliseconds and the Heartbeat timeout is 30000 milliseconds.
Geo SCADA Expert sends an ICMP poll to a remote computer every 5 seconds (the ICMP interval). If a poll does not get a response within 5 seconds, Geo SCADA Expert sends another poll; this is the first retry. If that poll does not get a response within a further 5 seconds, Geo SCADA Expert sends another poll; this is the second retry. Because the ICMP retry count is set to 2 in this example, if the second retry also gets no response, Geo SCADA Expert determines that the remote computer is unreachable.
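The retry sequence described above can be sketched in Python. This is an illustrative model only, not Geo SCADA Expert code: `send_poll` is a hypothetical callable that returns True when a reply arrives within one ICMP interval, and the worst-case figure assumes that the final retry also waits one full interval before the computer is marked unreachable.

```python
from typing import Callable, Tuple


def icmp_reachable(send_poll: Callable[[], bool], retry_count: int) -> Tuple[bool, int]:
    """Return (reachable, polls_sent) for one poll plus up to retry_count retries.

    send_poll is a hypothetical stand-in for sending one ICMP poll and
    waiting one ICMP interval for the reply.
    """
    attempts = 1 + retry_count  # the initial poll plus the configured retries
    for sent in range(1, attempts + 1):
        if send_poll():
            return True, sent  # a reply arrived; no further retries needed
    return False, attempts  # every poll went unanswered


def worst_case_detection_ms(interval_ms: int, retry_count: int) -> int:
    """Time until 'unreachable' is declared, assuming each poll waits one interval."""
    return interval_ms * (1 + retry_count)
```

With the example values (an ICMP interval of 5000 milliseconds and a retry count of 2), three polls go unanswered before the computer is treated as unreachable, which under the stated assumption takes 15000 milliseconds.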
Likewise, the Heartbeat poll indicates an issue if it does not receive a response within 30 seconds (the Heartbeat timeout). On its own, a missing Heartbeat response implies that either the remote Geo SCADA Expert server is not running or the remote computer is unreachable. The ICMP polls resolve this uncertainty: if they succeed, the remote computer is reachable and the Geo SCADA Expert server on it is not running; if they fail, the remote computer is unreachable and the state of the remote Geo SCADA Expert server is unknown. If the ICMP polls fail because of a network failure, the remote Geo SCADA Expert server may still be running in isolation.
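The way the two poll results combine can be expressed as a small decision function. The sketch below is a simplified model under stated assumptions, not the product's implementation: the names `RemoteState` and `classify_remote` are hypothetical, and a Heartbeat response received inside the timeout is treated as proof that the server is running regardless of the ICMP result.

```python
from enum import Enum


class RemoteState(Enum):
    RUNNING = "server is running"
    NOT_RUNNING = "computer reachable, server not running"
    UNREACHABLE = "computer unreachable, server state unknown"


def classify_remote(heartbeat_age_ms: int,
                    heartbeat_timeout_ms: int,
                    icmp_reachable: bool) -> RemoteState:
    """Combine the Heartbeat and ICMP results into one remote status.

    heartbeat_age_ms is the time since the last Heartbeat response
    (a hypothetical input for this sketch).
    """
    if heartbeat_age_ms < heartbeat_timeout_ms:
        # Assumption: a recent Heartbeat response means the server is running.
        return RemoteState.RUNNING
    if icmp_reachable:
        # The computer answers pings but the server does not respond.
        return RemoteState.NOT_RUNNING
    # Neither poll succeeds; the server may still be running in isolation.
    return RemoteState.UNREACHABLE
```

For example, with a Heartbeat timeout of 30000 milliseconds, `classify_remote(45000, 30000, False)` returns `RemoteState.UNREACHABLE`, while the same Heartbeat age with a successful ICMP poll returns `RemoteState.NOT_RUNNING`.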
You can use the Server Status tool to view details of the ICMP and Heartbeat monitoring (see Working with the Server Status Tool).
Use the ICMP Polls and Polls categories of the Standby group as follows:
- ICMP Polls shows the responses to ICMP polls and the status associated with them (see ICMP Polls in the Geo SCADA Expert Guide to the Server Status Tool).
- Polls shows the Heartbeat and ICMP polls and the status associated with them (see Polls in the Geo SCADA Expert Guide to the Server Status Tool).