
Why did a switchover occur?

The event log is the fastest way to determine why a switchover occurred. Be sure to check the event log on both computers. If the backup server can still communicate with the active server, it updates the event log on the active server. Otherwise, it updates the event log only on the backup server.

After each switchover, examine the Switchover subsystem log on the backup server for an entry that contains "Switchover Service: Switch Initiated." The message immediately before this entry should state why the switchover occurred. It can be one of the following:

  • Loss of Connection to Notifier - The Switchover subsystem on the server in the backup role lost connection with the Notifier subsystem on the server in the primary role.

  • TS Ping Failure - The Switchover system lost communications with TSServer on the active computer.

  • Manual Switchover - Something other than standard diagnostics started the switchover. For example:

    • The hardware tests on TSServer

    • A user who used the Switchover Control Panel

Loss of Connection to Notifier

The Switchover subsystem on the server in the backup role lost connection with the Notifier subsystem on the server in the primary role.

To address this issue:

Check the Notifier log at Notes logging level on the active server to determine what it was doing at the time. A message like remote subsystem disconnects generally indicates a network problem. A message like took too long generally indicates a server performance problem.
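The triage rule above can be sketched as a small Python helper. This is only an illustration: the function name and the phrase table are hypothetical, and the phrases are taken from the guidance above, so the exact wording in a real Notifier log may differ.

```python
# Hypothetical helper. The phrases come from the guidance above; the exact
# wording in a real Notifier log may differ, so treat a match as a hint only.
HINTS = {
    "remote subsystem disconnect": "network",
    "took too long": "server performance",
}


def classify_notifier_message(message: str) -> str:
    """Return the likely problem area for one Notifier log message."""
    lowered = message.lower()
    for phrase, area in HINTS.items():
        if phrase in lowered:
            return area
    return "unknown"
```

A result of "unknown" only means that neither phrase was found; it does not rule out a network or performance problem.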

This example shows the lost Notifier connection from a Switchover subsystem point of view:

Time               Message
08:59:20.000_0004  [Error] CNotifierSocketConnection::OnListenToNotifier: detected socket failure 10060 on connection 150  Closing connection
08:59:20.000_0012  [Error] CNotifierSocketConnection::OnListenToNotifier: detected socket failure 10060 on connection 149  Closing connection
08:59:20.000_0084  [Warning] DispatcherLib::CallbackRouter::svc(): Lost notifier connection.
08:59:20.016_0014  [Warning] DispatcherLib::CallbackRouter::svc(): Lost notifier connection.
08:59:20.109_0000  SwitchoverState::StartReconnectTimer: Scheduling reconnect timeout to execute in 30 second(s)
08:59:50.125_0006  [Error] SwitchoverState::Reconnecting: Reconnect period expired, posting [event=eConnectionsDown] to initiate a switchover
08:59:50.125_0007  [Warning] SwitchoverState::StateBackup: Lost connections to the primary, going to StatePrimary and switching now
08:59:50.125_0012  CVirtualSwitchControl::Switch(): Initiating switch

Note: Use the Switchover Reconnect Timeout server parameter to adjust how long the backup server waits for a reconnect before it starts a switch.
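To confirm that the reconnect timeout behaved as configured, you can measure the gap between the entry that schedules the reconnect timer and the entry that reports its expiry. The following Python sketch does this for the two timestamps in the example above. The helper names are hypothetical, and the sketch assumes the "_NNNN" suffix is a per-millisecond sequence counter that can be ignored.

```python
from datetime import datetime, timedelta


def parse_log_time(stamp: str) -> datetime:
    """Parse 'HH:MM:SS.mmm_NNNN', dropping the '_NNNN' suffix.

    Assumption: the suffix is a sequence counter, not part of the time.
    """
    clock = stamp.split("_", 1)[0]
    return datetime.strptime(clock, "%H:%M:%S.%f")


def elapsed(start: str, end: str) -> timedelta:
    """Time between two stamps on the same day (no midnight rollover)."""
    return parse_log_time(end) - parse_log_time(start)


# Timestamps copied from the example log above.
timer_started = "08:59:20.109_0000"   # StartReconnectTimer entry
period_expired = "08:59:50.125_0006"  # Reconnect period expired entry
print(elapsed(timer_started, period_expired).total_seconds())  # 30.016
```

The measured gap of roughly 30 seconds matches the "30 second(s)" value in the StartReconnectTimer entry.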

TS Ping Failure

The Switchover system lost communications with TSServer on the active computer.

This occurs when TSServer sends a TSP packet that fails to reach the backup computer.

To address this issue:

Check the TS log on the active server to determine what it was doing at the time. If you don't see the ping request, check the Notifier log for more information, provided the NOTIFIERLIB subtopic is logged at NOTES (80) or higher.

Here is a sample of a TS ping failure, taken from the Switchover log on the backup server:

Time          Topic          Message
14:40:00.402  SystemMonitor  [Enter] PingModule::ModuleByName::Ping: TsServer error count=0 timeout=10000
14:40:10.452  SystemMonitor  [Error] error pinging TsServer count=1
14:40:10.452  SystemMonitor  [Error] error in ping count=1
14:40:10.452  SystemMonitor  [Error] scheduling retry in 1000 ms
14:40:11.453  SystemMonitor  [Enter] PingModule::ModuleByName::Ping: TsServer error count=1 timeout=10000
14:40:21.483  SystemMonitor  [Error] exceeded max error count - signaling system down
14:40:21.503  SystemMonitor  [Error] error in ping count=2
14:40:21.503  SystemMonitor  [Error] maximum error count reached - not scheduling retry
14:40:21.543  Switchover     [Enter] SwitchoverState::StateBackup: event=eTSDown
14:40:21.543  Switchover     [Error] SwitchoverMain::StatusMessage: Backup server switching to primary mode

Note: In this example, both pings fail. The first ping times out after 10 seconds (timeout=10000), and after a one-second delay the second attempt also times out. As a result, the switch starts about 21 seconds after the first ping was sent.
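The 21-second figure follows from the values in the sample log: two 10000 ms timeouts plus one 1000 ms retry delay. The Python sketch below generalizes that arithmetic. The function name is hypothetical, and the maximum error count of 2 is inferred from the sample log rather than from a documented setting.

```python
def worst_case_detection_ms(timeout_ms: int, retry_delay_ms: int, max_errors: int) -> int:
    """Approximate time from the first ping to the 'system down' signal.

    Assumes every ping runs to its full timeout and that a retry delay
    separates consecutive attempts, as in the sample log.
    """
    if max_errors < 1:
        raise ValueError("max_errors must be at least 1")
    return max_errors * timeout_ms + (max_errors - 1) * retry_delay_ms


# Values read from the sample log: timeout=10000, retry in 1000 ms, 2 failures.
print(worst_case_detection_ms(10000, 1000, 2))  # 21000
```

The log shows slightly more than 21000 ms (14:40:00.402 to 14:40:21.543) because each step adds a few milliseconds of overhead.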

Manual Switchover

Something other than the Switchover system's standard diagnostics started the switchover.

For example:

  • The hardware tests on TSServer forced the switchover through its own diagnostics.

  • A user forced the switchover using the Switchover Control Panel.

To address this issue:

To determine if TS forced the switchover, check the TS log on the computer that was formerly the active server around the time this switchover occurred. The following entries illustrate how the time of a switchover is indicated in the TS log.

Time          Topic       Message
20:26:52.069  Switchover  [Enter] SwitchoverState::StateBackup: event=eInternalSwitchReq
20:26:52.069  Switchover  [Error] SwitchoverMain::StatusMessage: Backup server switching to primary mode
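Each sample log in this topic contains an event name that corresponds to one of the three switchover reasons. The following Python sketch extracts that name from a log message and maps it to a reason. The mapping is inferred only from the samples shown in this topic, the helper names are hypothetical, and other event names may exist.

```python
import re
from typing import Optional

# Inferred from the sample logs in this topic only; not an exhaustive list.
EVENT_REASONS = {
    "eConnectionsDown": "Loss of Connection to Notifier",
    "eTSDown": "TS Ping Failure",
    "eInternalSwitchReq": "Manual Switchover",
}

# Matches 'event=eSomething', with or without surrounding brackets.
EVENT_PATTERN = re.compile(r"event=(e\w+)")


def switchover_reason(message: str) -> Optional[str]:
    """Return the reason for the event named in a log message, if any."""
    match = EVENT_PATTERN.search(message)
    if match is None:
        return None
    event = match.group(1)
    return EVENT_REASONS.get(event, f"unrecognized event: {event}")


print(switchover_reason("[Enter] SwitchoverState::StateBackup: event=eInternalSwitchReq"))
```

For the entry above, the sketch prints Manual Switchover. It returns None for messages that contain no event name.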

Often you need to find specific details about a switchover, such as the exact time it occurred.

  1. Open the Switchover log file on the computer that was the backup server before the switchover occurred.

  2. Near the end of the file, search for Initiating Connections Switch.

  3. Just before this line, you should see a line that indicates why the backup computer switched over.
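The steps above can be sketched in Python as a search from the end of the log for the last line that contains the marker text, returning the nearest non-blank line before it. The function name is hypothetical. The demonstration reuses lines from the Notifier example earlier in this topic, so it passes "Initiating switch" as the marker; the default marker is the search text from step 2.

```python
from typing import List, Optional


def reason_before_marker(lines: List[str],
                         marker: str = "Initiating Connections Switch") -> Optional[str]:
    """Return the non-blank line just before the last line containing marker."""
    needle = marker.lower()
    # Walk backward so the most recent switchover is found first.
    for index in range(len(lines) - 1, -1, -1):
        if needle in lines[index].lower():
            for previous in reversed(lines[:index]):
                if previous.strip():
                    return previous.strip()
            return None  # marker found, but nothing precedes it
    return None  # marker not found


# Lines copied from the Notifier example earlier in this topic.
sample = [
    "08:59:50.125_0006 [Error] SwitchoverState::Reconnecting: Reconnect period expired, posting [event=eConnectionsDown] to initiate a switchover",
    "08:59:50.125_0007 [Warning] SwitchoverState::StateBackup: Lost connections to the primary, going to StatePrimary and switching now",
    "08:59:50.125_0012 CVirtualSwitchControl::Switch(): Initiating switch",
]
print(reason_before_marker(sample, marker="Initiating switch"))
```

To run it against a real log, read the file into a list of lines first, for example with open(path).read().splitlines(), where path is the location of the Switchover log on your system.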