The current WS-AT/BA participant implementations resend prepared/completed messages at a fixed frequency until they receive a response. The period is currently defined by a settable property of class TransportTimer (which defaults to 5 seconds). It would be better if the period between resends could be configured to increase gradually up to some maximum period (setting the maximum period equal to the initial period maintains the status quo). The benefit of a longer period is that it stops resends from using up the available network bandwidth when a response from the coordinator is slow. It is particularly beneficial where a web service employs BA ParticipantCompletion participants, since there may be a very long delay between the first completed message being sent and a subsequent close or cancel operation being dispatched by the coordinator. If the service is likely to support many long-running transactions then configuring a high maximum resend period will limit the extent to which resent messages clog up the network. The downside is that a longer resend period means a higher latency before participant (bottom-up) recovery is initiated following a coordinator crash.
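As a rough illustration of the resend schedule described above, here is a minimal Java sketch. The class name ResendBackoff, its methods, the use of milliseconds and the choice to double the period on each resend are all assumptions made for this example; the request only asks that the period increase gradually up to a maximum, and none of this exists in the current XTS code.

```java
// Hypothetical helper, not part of the existing XTS code base.
// Assumption: the period doubles after each resend and is capped at the maximum.
final class ResendBackoff {
    private final long initialPeriodMillis;
    private final long maxPeriodMillis;
    private long currentPeriodMillis;

    ResendBackoff(long initialPeriodMillis, long maxPeriodMillis) {
        if (initialPeriodMillis <= 0 || maxPeriodMillis < initialPeriodMillis) {
            throw new IllegalArgumentException("require 0 < initial <= max");
        }
        this.initialPeriodMillis = initialPeriodMillis;
        this.maxPeriodMillis = maxPeriodMillis;
        this.currentPeriodMillis = initialPeriodMillis;
    }

    /** Returns the delay to wait before the next resend, then advances the schedule. */
    long nextPeriodMillis() {
        long period = currentPeriodMillis;
        // Compare against max / 2 rather than doubling first, to avoid overflow.
        currentPeriodMillis = (period > maxPeriodMillis / 2) ? maxPeriodMillis : period * 2;
        return period;
    }

    /** Restarts the schedule, e.g. once a response has been received. */
    void reset() {
        currentPeriodMillis = initialPeriodMillis;
    }
}
```

With an initial period of 5000 and a maximum of 30000 this yields delays of 5000, 10000, 20000, 30000, 30000, and so on. With both values set to 5000 every delay is 5000, which is the current fixed-frequency behaviour.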
It would also be useful if the initial and maximum resend periods could be configured via bean properties associated with the XTS Service bean.
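The two properties might look something like the following on a bean. This is only a sketch: the class name ResendPeriodConfig, the property names, the millisecond units and the validate() method are hypothetical and do not reflect the actual XTS Service bean. Both defaults are set to 5000 so that an unconfigured deployment keeps the existing behaviour.

```java
// Hypothetical configuration bean; names and units are illustrative only.
class ResendPeriodConfig {
    // Assumed defaults: both equal to the current 5 second period (status quo).
    private long initialResendPeriod = 5000L;
    private long maxResendPeriod = 5000L;

    public long getInitialResendPeriod() { return initialResendPeriod; }
    public void setInitialResendPeriod(long millis) { this.initialResendPeriod = millis; }

    public long getMaxResendPeriod() { return maxResendPeriod; }
    public void setMaxResendPeriod(long millis) { this.maxResendPeriod = millis; }

    /**
     * Checks the pair of values once all properties have been injected.
     * Validation is kept out of the setters because the injection order
     * of the two properties is not guaranteed.
     */
    public void validate() {
        if (initialResendPeriod <= 0) {
            throw new IllegalStateException("initial resend period must be positive");
        }
        if (maxResendPeriod < initialResendPeriod) {
            throw new IllegalStateException("maximum resend period must be >= initial resend period");
        }
    }
}
```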
Note that it only makes sense to implement this feature for retries dispatched from the participant side. Retries only occur on the coordinator side while the coordinator is waiting for a specific response from the participant, and in these cases the wait will always time out and cancel further retries (using the timeout interval defined by TransportTimer, which defaults to 30 seconds).
This change is a preliminary to a related change required to successfully recover BA participants. In order to detect coordinator crashes which occur between complete and close/cancel, BA participants need to switch from sending Completed messages to sending GetStatus messages until they get a response or an invalid transaction/participant SOAP fault. The switchover algorithm needs to be defined so that it kicks in compatibly with this incremental resend.