MVSFORUMS.com

krylon

Hello everyone,

At our company we have several zSeries machines running in a sysplex and providing services to our customers. So far this has been done primarily for load balancing, but another issue comes to mind:
On some occasions, one of the machines was handling a connection and - due to overload - got a little laggy in responding; so laggy, in fact, that the client application considered it a timeout and reset the connection.
Since the machine in question was not really down, just slow, I am not 100% sure if this is where "stateful failover" may come in, but it sounds like what we are looking for:
If the server handling a client connection fails to answer a request within a certain amount of time, we want another machine to take over the connection and answer *without the client knowing about this*.

I found out about some solutions pointing in that direction. Cisco offers certain firewall products that do this - but only for the firewalls themselves (so if one of your firewalls breaks, you are still online).
OpenVMS and its clustering capabilities look more promising, but I seriously doubt HP will port it to zSeries or that the applications we are running are available for VMS.
I also found a paper describing a somewhat customized solution using Linux and a protocol called 'BTCP' (Backup TCP) and multiple multicast groups.

But I could not find anything specific to z/OS. Are there any solutions available to do this that do not require massive customization? Has anyone ever worked on such a solution?

Thanks for any hints,
Kind regards,
Benjamin

taltyman · Posted: Fri Nov 05, 2004 8:36 am

You may also want to post this at IBM List
I hope a link to this site is acceptable, Kolusu; if not, feel free to delete it.

semigeezer · Posted: Fri Nov 05, 2004 11:57 am

It seems like this would need to be in the firewall or router layer. The application isn't cognizant of the connection state (and isn't responding anyway). There are, as you are aware, load balancers, but they sit in front of the mainframe or any other server. I don't know about stateful failover at all, and I can understand how the communications layer can manage the connections, but what happens if the first box makes updates and then the request is retried on another box? Wouldn't the application need to be aware of the potential for multiple, out-of-sync requests at the back end? In other words, I have no idea, but if you find a solution, please post back here, 'cause at least I'm curious.
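
To make the retry concern concrete, here is a minimal Python sketch of one common way applications guard against it: each request carries an ID, and the back end records which IDs it has already applied. Everything in it (SharedStore, Backend, handle, the "req-42" ID) is made up for illustration and is not a z/OS or sysplex facility. It also assumes both boxes can see the same store, and a real system would need to do the check and the update atomically, for example inside one database transaction.

```python
# Hypothetical sketch: all names here are illustrative, not a real API.

class SharedStore:
    """Stands in for state that both boxes can see (e.g. a shared database)."""

    def __init__(self):
        self.balance = 100
        self.completed = {}  # request_id -> result already produced


class Backend:
    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, request_id, amount):
        # If some box already applied this request, return the recorded
        # result instead of applying the update a second time.
        if request_id in self.store.completed:
            return self.store.completed[request_id]
        self.store.balance -= amount
        result = f"debited {amount}, balance {self.store.balance}"
        self.store.completed[request_id] = result
        return result


store = SharedStore()
box1 = Backend("box1", store)
box2 = Backend("box2", store)

# box1 applies the update, but assume its reply is too slow to reach the client.
first = box1.handle("req-42", 30)

# The same request is then retried on box2; the debit is not applied twice.
retry = box2.handle("req-42", 30)
print(retry)
print(store.balance)
```

Without the request ID check, the retry on the second box would debit the amount a second time, which is exactly the out-of-sync situation described above.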