Thursday, February 4, 2010

JDBC, TCP channels and other funny dudes...

My most affectionate readers already know that I recently worked on a Java tool that automates the migration of database structure and content between different database versions (you know, changes in tables, fields, new indexes, and other database refactoring operations). The tool is called DMT, and I have already written about it in the pages of this blog.
The tool is used by several development teams for all database re-creation, migration and test operations, and since I had not been receiving many complaint emails lately, I was fairly confident that it was decently stable and that the planned tests in a production-like environment would run smoothly. Of course, that is not what happened: when DMT was used in a pre-production environment for testing purposes, the DBAs who ran the tests found that it could not complete the migration, hanging indefinitely without ever finishing the migration task.
This was the right occasion for me to learn what happens when a TCP channel is closed while a JDBC connection is active over it.

The problem
  1. DMT connects to the Oracle host and sends the SQL statements to be executed through a JDBC connection.
  2. The Oracle server receives the SQL statements and begins executing them. While the statements are running, there is no traffic between DMT and the Oracle server, because DMT simply waits for a signal from the Oracle server that the execution has completed (or that an Oracle exception has occurred).
  3. While the Oracle server is running the SQL statements and DMT is waiting for the completion signal, a firewall detects that the TCP channel between DMT and the Oracle server is inactive and closes the connection because of a timeout configuration.
  4. When Oracle completes the SQL execution, the TCP connection with DMT is no longer open, so the server cannot tell DMT that the job is done, and DMT keeps waiting for a message forever, like an unlucky man nursing an unrequited love.
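The silent-wait failure mode in step 4 can be reproduced with plain sockets, no Oracle needed: a blocking read() on a connection over which the peer will never send anything simply never returns. This sketch (my own illustration, not DMT code) adds a socket timeout only so the demo terminates; DMT's JDBC read had no such timeout, hence the indefinite hang.

```java
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SilentWaitDemo {
    public static void main(String[] args) throws Exception {
        // A "server" that accepts the connection at TCP level (via the
        // listen backlog) but never sends a single byte back.
        try (ServerSocket silentServer = new ServerSocket(0);
             Socket client = new Socket("localhost", silentServer.getLocalPort())) {

            // Without this timeout the read below would block forever,
            // exactly like DMT waiting for Oracle's completion signal.
            client.setSoTimeout(1000); // 1 second, for the demo only
            try {
                client.getInputStream().read();
                System.out.println("got data (unexpected)");
            } catch (SocketTimeoutException e) {
                System.out.println("no data ever arrived; without the timeout this read would block forever");
            }
        }
    }
}
```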
The solution
The TCP keep-alive mechanism!
It's possible to enable this mechanism on a JDBC connection simply by adding the parameter ENABLE=BROKEN to the JDBC string used to open the connection: keep-alive "probes" will then be sent over the connection after a period of inactivity, keeping the connection alive. The JDBC URL will look like the following one:

jdbc:oracle:thin:@(DESCRIPTION=(ENABLE=broken)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=yourhost1)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=yourhost2)(PORT=1521))(LOAD_BALANCE=on)(FAILOVER=on))(CONNECT_DATA=(SERVER=dedicated)(SERVICE_NAME=service.yourcompany.com)(FAILOVER_MODE=(TYPE=session)(METHOD=basic)(RETRIES=10)(DELAY=3))))
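In Java code the same descriptor might be embedded along these lines (a minimal sketch: the hostnames, port and service name are placeholders, and the failover clauses are trimmed for brevity). The check in main just demonstrates that the keep-alive clause is present in the descriptor:

```java
public class KeepAliveUrl {
    // Placeholder hosts/service - substitute your own environment's values.
    static final String URL =
        "jdbc:oracle:thin:@(DESCRIPTION=(ENABLE=broken)"
      + "(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=yourhost1)(PORT=1521))"
      + "(ADDRESS=(PROTOCOL=TCP)(HOST=yourhost2)(PORT=1521))"
      + "(LOAD_BALANCE=on)(FAILOVER=on))"
      + "(CONNECT_DATA=(SERVER=dedicated)(SERVICE_NAME=service.yourcompany.com)))";

    public static void main(String[] args) {
        System.out.println(URL.contains("(ENABLE=broken)")
            ? "keep-alive requested in descriptor"
            : "keep-alive NOT requested");
        // With the Oracle thin driver on the classpath you would then
        // connect as usual, e.g.:
        // Connection conn = DriverManager.getConnection(URL, user, password);
    }
}
```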

The TCP settings of the host OS determine when to start sending the keep-alive probes (the "keep-alive time" parameter), how many probes to send before declaring the connection dead (the "keep-alive probes" parameter) and the interval between two consecutive probes (the "keep-alive interval" parameter).
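On Linux, for example, these three parameters can be inspected (and, as root, tuned) through /proc/sys; the file names below are the standard Linux sysctl names, other operating systems expose the same knobs differently:

```shell
# Idle seconds before the first keep-alive probe is sent ("keep-alive time")
cat /proc/sys/net/ipv4/tcp_keepalive_time
# Number of unanswered probes before the connection is declared dead
cat /proc/sys/net/ipv4/tcp_keepalive_probes
# Seconds between two consecutive probes ("keep-alive interval")
cat /proc/sys/net/ipv4/tcp_keepalive_intvl
```

Note that the default keep-alive time is typically two hours, so if your firewall's idle timeout is shorter, the OS settings may need lowering for ENABLE=BROKEN to actually help.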

Bonus: What does the classic "TCP/IP Illustrated" book say about keepalive?
Many newcomers to TCP/IP are surprised to learn that no data whatsoever flows across an idle TCP connection. That is, if neither process at the ends of a TCP connection is sending data to the other, nothing is exchanged between the two TCP modules. There is no polling, for example, as you might find with other networking protocols. This means we can start a client process that establishes a TCP connection with a server, and walk away for hours, days, weeks or months, and the connection remains up. Intermediate routers can crash and reboot, phone lines may go down and back up, but as long as neither host at the ends of the connection reboots, the connection remains established. [...]
There are times, however, when a server wants to know if the client's host has either crashed and is down, or crashed and rebooted. The keepalive timer, a feature of many implementations, provides this capability.
But be aware: 
Keepalives are not part of the TCP specification. The Host Requirements RFC provides three reasons not to use them: (1) they can cause perfectly good connections to be dropped during transient failures, (2) they consume unnecessary bandwidth, and (3) they cost money on an internet that charges by the packet. Nevertheless, many implementations provide the keepalive timer. 
Hope this helps!
