Why did a switchover occur?

The event log is the fastest way to determine why a switchover occurred. Be sure to check the event log on both computers. If the backup server can still communicate with the active server, it updates the event log on the active server. Otherwise, it updates the event log only on the backup server.

After each switchover, examine the Switchover subsystem log on the backup server for an entry that contains "Switchover Service: Switch Initiated." The message immediately before this entry should state why the switchover occurred. It can be one of the following:

Loss of Connection to Notifier - The Switchover subsystem on the server in the backup role lost connection with the Notifier subsystem on the server in the primary role.
TS Ping Failure - The Switchover system lost communications with TSServer on the active computer.
Manual Switchover - Something other than standard diagnostics started the switchover. For example:
- The hardware tests on TSServer
- A user who used the Switchover Control Panel

Loss of Connection to Notifier

The Switchover subsystem on the server in the backup role lost connection with the Notifier subsystem on the server in the primary role.

To address this:

Check the Notifier log at Notes logging level on the active server to determine what the Notifier was doing at the time. A message like "remote subsystem disconnects" generally indicates a network problem. A message like "took too long" generally indicates a server performance problem.

This example shows the lost Notifier connection from the Switchover subsystem's point of view:

Time	Message
08:59:20.000_0004	[Error] CNotifierSocketConnection::OnListenToNotifier: detected socket failure 10060 on connection 150 Closing connection
08:59:20.000_0012	[Error] CNotifierSocketConnection::OnListenToNotifier: detected socket failure 10060 on connection 149 Closing connection
08:59:20.000_0084	[Warning] DispatcherLib::CallbackRouter::svc(): Lost notifier connection.
08:59:20.016_0014	[Warning] DispatcherLib::CallbackRouter::svc(): Lost notifier connection.
08:59:20.109_0000	SwitchoverState::StartReconnectTimer: Scheduling reconnect timeout to execute in 30 second(s)
08:59:50.125_0006	[Error] SwitchoverState::Reconnecting: Reconnect period expired, posting [event=eConnectionsDown] to initiate a switchover
08:59:50.125_0007	[Warning] SwitchoverState::StateBackup: Lost connections to the primary, going to StatePrimary and switching now
08:59:50.125_0012	CVirtualSwitchControl::Switch(): Initiating switch

Note: Use the Switchover Reconnect Timeout server parameter to adjust how long the backup server waits for a reconnect before it starts a switch.
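
The reconnect period in the sample above can be checked from the timestamps. The following is a minimal Python sketch, assuming the timestamp layout shown in the sample (HH:MM:SS.mmm followed by an underscore and a sequence number). The function names parse_stamp and elapsed_seconds are hypothetical helpers, not part of the product, and the sketch does not handle logs that span midnight.

```python
from datetime import datetime


def parse_stamp(stamp: str) -> datetime:
    """Parse a stamp such as '08:59:20.109_0000' (assumed layout)."""
    # Drop the '_0000'-style sequence suffix and keep HH:MM:SS.mmm.
    clock = stamp.split("_")[0]
    return datetime.strptime(clock, "%H:%M:%S.%f")


def elapsed_seconds(start_stamp: str, end_stamp: str) -> float:
    """Seconds between two stamps that fall on the same day."""
    return (parse_stamp(end_stamp) - parse_stamp(start_stamp)).total_seconds()


# Stamps copied from the sample: the StartReconnectTimer entry
# and the "Initiating switch" entry.
elapsed = elapsed_seconds("08:59:20.109_0000", "08:59:50.125_0012")
print(f"Reconnect timer to switch: {elapsed:.3f} s")
```

For the sample entries, the result is just over 30 seconds, which matches the reconnect timeout reported in the StartReconnectTimer message.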

TS Ping Failure

The Switchover system lost communications with TSServer on the active computer.

This occurs when TSServer sends a TSP packet and the packet fails to reach the backup computer.

To address this issue:

Check the TS log on the active server to determine what it was doing at the time. If you don't see the ping request, check for more information in the Notifier log if the NOTIFIERLIB subtopic is at NOTES (80) or higher.

Here is a sample of a TS ping failure, taken from the Switchover log on the backup server:

Time	Topic	Message
`14:40:00.402`	`SystemMonitor`	`[Enter] PingModule::ModuleByName::Ping: TsServer error count=0 timeout=10000`
`14:40:10.452`	`SystemMonitor`	`[Error] error pinging TsServer count=1`
`14:40:10.452`	`SystemMonitor`	`[Error] error in ping count=1`
`14:40:10.452`	`SystemMonitor`	`[Error] scheduling retry in 1000 ms`
`14:40:11.453`	`SystemMonitor`	`[Enter] PingModule::ModuleByName::Ping: TsServer error count=1 timeout=10000`
`14:40:21.483`	`SystemMonitor`	`[Error] exceeded max error count - signaling system down`
`14:40:21.503`	`SystemMonitor`	`[Error] error in ping count=2`
`14:40:21.503`	`SystemMonitor`	`[Error] maximum error count reached - not scheduling retry`
`14:40:21.543`	`Switchover`	`[Enter] SwitchoverState::StateBackup: event=eTSDown`
`14:40:21.543`	`Switchover`	`[Error] SwitchoverMain::StatusMessage: Backup server switching to primary mode`

Note:
In this example, both pings fail: the first ping fails and, after a one-second sleep, a second attempt also fails. After the second failure, the backup server attempts a switch, about 21 seconds after the first ping transmission.
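
The 21-second figure follows from the values visible in the sample: a 10000 ms ping timeout, a 1000 ms retry delay, and two attempts. The sketch below works through that arithmetic in Python. The function name detection_delay_ms is hypothetical, and the attempt count of two is inferred from the sample log rather than taken from a documented setting.

```python
def detection_delay_ms(timeout_ms: int, retry_delay_ms: int, attempts: int) -> int:
    """Time until the last of `attempts` pings times out.

    Each attempt waits the full timeout, and a retry delay
    separates consecutive attempts.
    """
    return attempts * timeout_ms + (attempts - 1) * retry_delay_ms


# Values read from the sample log: timeout=10000, retry in 1000 ms,
# and a switch after the second failed ping (assumed limit of 2).
delay = detection_delay_ms(timeout_ms=10000, retry_delay_ms=1000, attempts=2)
print(f"Expected delay before switch: {delay / 1000:.0f} s")
```

The observed gap in the sample, from 14:40:00.402 to 14:40:21.543, is slightly longer than this estimate because of processing time between entries.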

Manual Switchover

Something other than the Switchover system's standard diagnostics started the switchover.

For example:

The hardware tests on TSServer forced the switchover through its own diagnostics.
A user forced the switchover using the Switchover Control Panel.

To address this issue:

To determine if TS forced the switchover, check the TS log on the computer that was formerly the active server around the time this switchover occurred. The following entries illustrate how the time of a switchover is indicated in the TS log.

Time	Topic	Message
`20:26:52.069`	`Switchover`	`[Enter] SwitchoverState::StateBackup: event=eInternalSwitchReq`
`20:26:52.069`	`Switchover`	`[Error] SwitchoverMain::StatusMessage: Backup server switching to primary mode`
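
Each sample log in this article shows a different event name at the point where the backup server decides to switch. As a rough aid when scanning logs, the following Python sketch maps those event names to the three causes described here. The mapping is inferred only from the samples in this article and may not cover every event the product can log; the names EVENT_CAUSES and classify are hypothetical.

```python
import re
from typing import Optional

# Event names as they appear in the sample logs in this article.
# The pairing with a cause is an inference from those samples.
EVENT_CAUSES = {
    "eConnectionsDown": "Loss of Connection to Notifier",
    "eTSDown": "TS Ping Failure",
    "eInternalSwitchReq": "Manual Switchover",
}


def classify(line: str) -> Optional[str]:
    """Return the likely cause for a log line, or None if unrecognized."""
    match = re.search(r"event=(\w+)", line)
    if match is None:
        return None
    return EVENT_CAUSES.get(match.group(1))


sample = "[Enter] SwitchoverState::StateBackup: event=eInternalSwitchReq"
print(classify(sample))
```

A line without a recognized event returns None, so the log still needs to be read manually in that case.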

Often you need to find specific details about a switchover, such as the exact time it occurred.

Open the Switchover log file on the computer that was the backup server before the switchover occurred.
Near the end of the file, search for "Initiating Connections Switch".
Just before this line, you should see a line that indicates why the backup computer switched over.
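
The search described above can be sketched in Python as follows. This is a minimal illustration, not a supported tool: the function name reason_before_marker is hypothetical, and the marker text is a parameter because the exact wording can differ between logs. The demonstration uses two lines from the Loss of Connection to Notifier sample in this article, where the marker reads "Initiating switch".

```python
from typing import List, Optional


def reason_before_marker(
    lines: List[str], marker: str = "Initiating Connections Switch"
) -> Optional[str]:
    """Return the line just before the last line that contains `marker`."""
    # Walk backward so the most recent switchover is found first.
    # Stop at index 1 so that a preceding line always exists.
    for index in range(len(lines) - 1, 0, -1):
        if marker in lines[index]:
            return lines[index - 1].rstrip("\n")
    return None


# Two consecutive lines taken from the sample log in this article.
log_lines = [
    "08:59:50.125_0007\t[Warning] SwitchoverState::StateBackup: "
    "Lost connections to the primary, going to StatePrimary and switching now\n",
    "08:59:50.125_0012\tCVirtualSwitchControl::Switch(): Initiating switch\n",
]
reason = reason_before_marker(log_lines, marker="Initiating switch")
print(reason)
```

To run this against a real file, read the exported log into a list of lines first, for example with open(path).readlines(), where path is the location of your log export.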