Monday, November 4, 2013

Protecting OLTP Performance After Failover

An earlier article High Availability Demo, Revised discussed the possibility that read-only connections to the secondary or mirror database in a High Availability setup could suddenly be allowed to perform updates after a failover when the read-only secondary database suddenly becomes the updatable primary.

This happens because read-only connections don't get dropped when the failover occurs because the database they're connected to (the secondary) is still up and running. This behavior is intended, as discussed in Configuring read-only access to a database running on the mirror server.

Updates aren't the problem . . .

Chances are, a read-only connection isn't suddenly going to start performing updates when the DB_PROPERTY ( 'ReadOnly' ) changes from 'On' to 'Off' after a failover... so maybe nobody cares about this behavior.

. . . performance is

The risk of updates may not be a concern to anyone, even if those read-only connections did start updating data... that's not the reason the read-only connections were segregated to run on a separate server. They were moved so the resource-intensive OLAP queries would not affect OLTP response time on the primary database server.

After a failover, even if the read-only connections don't start performing updates, all the OLTP work moves over to the same server that's doing the OLAP Queries From Hell... there's nothing automatic to tell the OLAP workload to go away, and the OLTP workload has to share the same server.

Preserving The Divine Right Of OLTP After FailOver

That's what this article is about: Implementing a process that will automatically disconnect all the read-only connections immediately after a failover, forcing them to wait and reconnect when the read-only secondary server becomes available again.

Here's how it's done, with a CREATE EVENT TYPE MirrorFailover that is executed when either of the following events occur:
  • A database is started up and it determines that it should be the primary database in a High Availabiliy setup, or

  • a secondary (mirror) database in a High Availability setup becomes the primary database after a failover.
The first case is not a problem: the High Availability setup is just starting up and there are no connections worth dropping.

In the second case, however, all of the connections on the former-secondary-and-now-primary database should be dropped because they are all OLAP connections which will soon be joined by all those OLTP connections that were dropped during failover and are now going to reconnect to the new primary. In other words, everything needs to be dropped... or almost everything as the code will show:
CREATE EVENT failover TYPE MirrorFailover
HANDLER BEGIN

DECLARE @other_connection_id   INTEGER;
DECLARE @sql                   VARCHAR ( 1000 );

MESSAGE STRING ( CURRENT TIMESTAMP, ' MirrorFailover event has started...' ) TO CONSOLE;
MESSAGE which_database() TO CONSOLE;

IF DB_PROPERTY ( 'MirrorRole' ) = 'primary' THEN

   SET @other_connection_id = NEXT_CONNECTION ( NULL );

   WHILE @other_connection_id IS NOT NULL LOOP

      IF  @other_connection_id <> @@SPID  
      AND @other_connection_id < 1000000000 
      AND CONNECTION_PROPERTY ( 'Name', @other_connection_id ) LIKE 'OLAP%' THEN
         MESSAGE STRING ( 
            CURRENT TIMESTAMP,   
            ' MirrorFailover event is dropping connection ',  
            @other_connection_id,
            ' ("',
            CONNECTION_PROPERTY ( 'Name', @other_connection_id ),  
            '")...' ) TO CONSOLE;
         SET @sql = STRING (  
            'DROP CONNECTION ',  
            @other_connection_id );
         EXECUTE IMMEDIATE @sql;
      END IF;

      SET @other_connection_id = NEXT_CONNECTION ( @other_connection_id ) 

   END LOOP;

END IF;

END;
Here's how the code works:

The IF statement on line 10 is probably unnecessary because the MirrorFailover is only fired on a primary database, but it can't hurt.

The SET ... WHILE ... SET ... END LOOP statements on lines 12, 14, 32 and 34 step through all of the connections to the database, including client connections and internal connections like the one this EVENT is running on. It's the same logic that appeared previously in the article on p_drop_other_connections.

The IF statement on lines 16 through 18 determines which connections should be dropped. How you write this IF is up to you, but it's important not to drop the current connection, or any internal connections (ones that have large connection numbers like "INT: ApplyRecovery" or events like "DatabaseStart"). In the code shown here, the CON= connection name property is checked to make sure that only "OLAP..." connections are dropped.

The SET and EXECUTE statements on lines 26 through 29 do the actual work of dropping the OLAP connections.

The MESSAGE statements writes a record of what happened to the server's diagnostic console log file, including the MESSAGE statement on line 8 which calls this procedure:

CREATE FUNCTION which_database()
RETURNS LONG VARCHAR
BEGIN

   RETURN STRING (
      'PROPERTY ( ''MachineName'' )     = ', PROPERTY ( 'MachineName' ),     '\x0d\x0a',
      'PROPERTY ( ''Name'' )            = ', PROPERTY ( 'Name' ),            '\x0d\x0a',
      'PROPERTY ( ''ServerName'' )      = ', PROPERTY ( 'ServerName' ),      '\x0d\x0a',
      'PROPERTY ( ''TcpIpAddresses'' )  = ', PROPERTY ( 'TcpIpAddresses' ),  '\x0d\x0a',
      'DB_PROPERTY ( ''Name'' )              = ', DB_PROPERTY ( 'Name' ),         '\x0d\x0a',
      'DB_PROPERTY ( ''ReadOnly'' )          = ', DB_PROPERTY ( 'ReadOnly' ),     '\x0d\x0a',
      'DB_PROPERTY ( ''ArbiterState'' )      = ', DB_PROPERTY ( 'ArbiterState' ), '\x0d\x0a', 
      'DB_PROPERTY ( ''PartnerState'' )      = ', DB_PROPERTY ( 'PartnerState' ), '\x0d\x0a', 
      'DB_PROPERTY ( ''MirrorMode'' )        = ', DB_PROPERTY ( 'MirrorMode' ),         '\x0d\x0a', 
      'DB_PROPERTY ( ''MirrorRole'' )        = ', DB_PROPERTY ( 'MirrorRole' ),         '\x0d\x0a', 
      'DB_PROPERTY ( ''MirrorServerState'' ) = ', DB_PROPERTY ( 'MirrorServerState' ),  '\x0d\x0a', 
      'DB_PROPERTY ( ''MirrorServerWaits'' ) = ', DB_PROPERTY ( 'MirrorServerWaits' ),  '\x0d\x0a', 
      'DB_PROPERTY ( ''MirrorState'' )       = ', DB_PROPERTY ( 'MirrorState' ),  '\x0d\x0a', 
      'DB_PROPERTY ( ''File'' )              = ', DB_PROPERTY ( 'File' ),         '\x0d\x0a', 
      'CONNECTION_PROPERTY ( ''Number'' )          = ', CONNECTION_PROPERTY ( 'Number' ),   '\x0d\x0a',
      'CONNECTION_PROPERTY ( ''Name'' )            = ', CONNECTION_PROPERTY ( 'Name' ),     '\x0d\x0a',
      'CONNECTION_PROPERTY ( ''CommLink'' )        = ', CONNECTION_PROPERTY ( 'CommLink' ), '\x0d\x0a',
      'CONNECTION_PROPERTY ( ''CommNetworkLink'' ) = ', CONNECTION_PROPERTY ( 'CommNetworkLink' ) );

END; 
Here's what the failover looks like, as shown by the diagnostic console log for the secondary-becoming-primary database server:
I. 11/03 09:27:41. Database server started at Sun Nov 03 2013 09:27
I. 11/03 09:27:41. Starting database "demo" (C:\temp\server2\demo.db) at Sun Nov 03 2013 09:27
I. 11/03 09:27:41. Trying to start SharedMemory link ...
I. 11/03 09:27:41. Performance warning: Database file "C:\temp\server2\demo.db" consists of 3 disk fragments
I. 11/03 09:27:41.     SharedMemory link started successfully
I. 11/03 09:27:41. Database recovery in progress
I. 11/03 09:27:41. Trying to start TCPIP link ...
I. 11/03 09:27:41.     Last checkpoint at Sun Nov 03 2013 09:27
I. 11/03 09:27:41.     Checkpoint log...
I. 11/03 09:27:41. Starting on port 55502
I. 11/03 09:27:41.     TCPIP link started successfully
I. 11/03 09:27:41. Now accepting requests
I. 11/03 09:27:51. Database "demo" mirroring: local status: role=primary, state=synchronizing, sequence=1, yielding=N
I. 11/03 09:27:51.     Transaction log: demo.log...
I. 11/03 09:27:51. Database "demo" mirroring:  determining mirror role ...
E. 11/03 09:27:51. Database "demo" mirroring:  mirror connection to server "partner1_demo" failed: database not found
I. 11/03 09:27:52. Database "demo" mirroring:  mirror partner connected
I. 11/03 09:27:52. Database "demo" mirroring:  mirror arbiter connected
I. 11/03 09:27:52. Database "demo" mirroring: arbiter status: role=mirror, state=synchronizing, sequence=1, yielding=N
I. 11/03 09:27:53. Database "demo" mirroring: partner status: role=mirror, state=synchronizing, sequence=1, yielding=N
I. 11/03 09:27:53. Database "demo" mirroring:  synchronizing ...
I. 11/03 09:27:53. Database "demo" (demo.db) started as mirror at Sun Nov 03 2013 09:27
I. 11/03 09:27:53. Starting checkpoint of "demo" (demo.db) at Sun Nov 03 2013 09:27
I. 11/03 09:27:53. Finished checkpoint of "demo" (demo.db) at Sun Nov 03 2013 09:27
I. 11/03 09:27:53. Database "demo" mirroring:  synchronized
I. 11/03 09:32:19. Database "demo" mirroring:  mirror partner disconnected

I. 11/03 09:32:19. Database "demo" mirroring:  becoming primary server
I. 11/03 09:32:20.     Rollback log...
I. 11/03 09:32:20.     Checkpointing...
I. 11/03 09:32:20. Starting checkpoint of "demo" (demo.db) at Sun Nov 03 2013 09:32
I. 11/03 09:32:20. Finished checkpoint of "demo" (demo.db) at Sun Nov 03 2013 09:32
I. 11/03 09:32:20. Recovery complete
I. 11/03 09:32:20. Database "demo" (demo.db) started at Sun Nov 03 2013 09:32

I. 11/03 09:32:20. 2013-11-03 09:32:20.740 MirrorFailover event has started...
I. 11/03 09:32:20. PROPERTY ( 'MachineName' )     = ENVY 
PROPERTY ( 'Name' )            = partner2_demo 
PROPERTY ( 'ServerName' )      = partner2_demo 
PROPERTY ( 'TcpIpAddresses' )  = 192.168.1.100:55502 
DB_PROPERTY ( 'Name' )              = demo 
DB_PROPERTY ( 'ReadOnly' )          = Off 
DB_PROPERTY ( 'ArbiterState' )      = connected 
DB_PROPERTY ( 'PartnerState' )      = disconnected 
DB_PROPERTY ( 'MirrorMode' )        = synchronous 
DB_PROPERTY ( 'MirrorRole' )        = primary 
DB_PROPERTY ( 'MirrorServerState' ) =  
DB_PROPERTY ( 'MirrorServerWaits' ) = 0 
DB_PROPERTY ( 'MirrorState' )       = synchronizing 
DB_PROPERTY ( 'File' )              = C:\temp\server2\demo.db 
CONNECTION_PROPERTY ( 'Number' )          = 1000000134 
CONNECTION_PROPERTY ( 'Name' )            = failover 
CONNECTION_PROPERTY ( 'CommLink' )        = NA 
CONNECTION_PROPERTY ( 'CommNetworkLink' ) = NA

I. 11/03 09:32:20. 2013-11-03 09:32:20.751 MirrorFailover event is dropping connection 223 ("OLAP Runaway")...
I. 11/03 09:32:20. User "DBA" dropped connection 223 ("DBA")

I. 11/03 09:32:20. 2013-11-03 09:32:20.758 MirrorFailover event is dropping connection 221 ("OLAP Rogue")...
I. 11/03 09:32:20. Connection terminated abnormally
I. 11/03 09:32:20. User "DBA" dropped connection 221 ("DBA")

I. 11/03 09:32:20. Disconnected TCPIP client's AppInfo: IP=192.168.1.100;HOST=ENVY;
   OSUSER=Breck;OS='Windows 7 Build 7601 Service Pack 1';
   EXE='C:\Program Files\SQL Anywhere 12\bin32\dbisql.exe';
   PID=0xf48;THREAD=0x2cf0;VERSION=12.0.1.3298;API=iAnywhereJDBC;TIMEZONEADJUSTMENT=-300

I. 11/03 09:32:20. 2013-11-03 09:32:20.760 MirrorFailover event is dropping connection 220 ("OLAP Read-Only")...
I. 11/03 09:32:20. Connection terminated abnormally
I. 11/03 09:32:20. User "DBA" dropped connection 220 ("DBA")

I. 11/03 09:32:20. Disconnected TCPIP client's AppInfo: IP=192.168.1.100;HOST=ENVY;
   OSUSER=Breck;OS='Windows 7 Build 7601 Service Pack 1';
   EXE='C:\Program Files\SQL Anywhere 12\bin32\dbisql.exe';
   PID=0xf48;THREAD=0x3354;VERSION=12.0.1.3298;API=iAnywhereJDBC;TIMEZONEADJUSTMENT=-300

I. 11/03 09:32:20. Connection terminated abnormally
I. 11/03 09:32:20. Disconnected TCPIP client's AppInfo: IP=192.168.1.100;HOST=ENVY;
   OSUSER=Breck;OS='Windows 7 Build 7601 Service Pack 1';
   EXE='C:\Program Files\SQL Anywhere 12\bin32\dbisql.exe';
   PID=0xf48;THREAD=0x1398;VERSION=12.0.1.3298;API=iAnywhereJDBC;TIMEZONEADJUSTMENT=-300

After the "OLAP Rogue" connection is dropped during the failover process, it cannot reconnect because there is no server named "secondary_demo" running:

Once the secondary (mirror) server is started again, the "OLAP Rogue" ISQL session can connect again:

Everything's OK now; "OLAP Rogue" is connected to the new read-only secondary server where it cannot interfere with the OLTP workload:


No comments: