  1. JBoss Transaction Manager
  2. JBTM-487

Make WS-AT and WS-BA participants gradually increment the resend period when retrying prepared/completed messages

    Details

    • Type: Feature Request
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 4.5.0
    • Fix Version/s: 4.6.0
    • Component/s: XTS
    • Labels:
      None

      Description

      The current WS-AT/BA participant implementations resend prepared/completed messages at a fixed frequency until they receive a response. The period is currently defined by a settable property of class TransportTimer (which defaults to 5 seconds). It would be better if the period between resends could be configured to increase gradually up to some maximum period (setting the maximum period equal to the initial period maintains the status quo).

      The benefit of a longer period is that it stops resends from using up the available network bandwidth when a response from the coordinator is slow. This is particularly beneficial where a web service employs BA ParticipantCompletion participants, since there may be a very long delay between the first completed message being sent and a subsequent close or cancel operation being dispatched by the coordinator. If the service is likely to support many long-running transactions, configuring a high maximum resend period will limit the extent to which resent messages clog up the network. The downside is that a longer resend period means a higher latency before participant (bottom-up) recovery is initiated following a coordinator crash.
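
      The incrementing resend period described above could be sketched roughly as follows. This is only an illustration, not the XTS implementation: the class name ResendSchedule and its methods are hypothetical, and the doubling growth rule is an assumption, since the description only asks for a gradual increase up to a maximum.

```java
/**
 * Hypothetical sketch of an incrementing resend schedule (not XTS code).
 * Assumption: the period doubles after each resend until it reaches the cap.
 */
final class ResendSchedule {
    private final long initialPeriodMillis;
    private final long maxPeriodMillis;
    private long nextPeriodMillis;

    ResendSchedule(long initialPeriodMillis, long maxPeriodMillis) {
        if (initialPeriodMillis <= 0 || maxPeriodMillis < initialPeriodMillis) {
            throw new IllegalArgumentException("require 0 < initial <= max");
        }
        this.initialPeriodMillis = initialPeriodMillis;
        this.maxPeriodMillis = maxPeriodMillis;
        this.nextPeriodMillis = initialPeriodMillis;
    }

    /** Returns the delay to wait before the next resend, then grows the period. */
    long nextDelayMillis() {
        long delay = nextPeriodMillis;
        // Double the period (assumed rule) without overflowing or exceeding the maximum.
        nextPeriodMillis = (delay > maxPeriodMillis / 2) ? maxPeriodMillis : delay * 2;
        return delay;
    }

    /** Restarts the schedule from the initial period, e.g. once a response arrives. */
    void reset() {
        nextPeriodMillis = initialPeriodMillis;
    }
}
```

      With an initial period of 5000 ms and a maximum of 40000 ms, successive calls to nextDelayMillis() yield 5000, 10000, 20000, 40000 and then stay at 40000. Setting the maximum equal to the initial period yields a constant delay, which corresponds to the existing fixed-frequency behaviour.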

      It would also be useful if the initial and maximum resend periods could be configured via bean properties associated with the XTS Service bean.
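
      As a rough illustration of what such bean properties might look like, here is a minimal JavaBean-style sketch. The class name ResendPeriodConfig and its property names are hypothetical and are not taken from the XTS service bean; the 5 second default mirrors the TransportTimer default mentioned in the description.

```java
/**
 * Hypothetical configuration bean (property names are assumptions,
 * not the actual XTS service bean API).
 */
final class ResendPeriodConfig {
    // Both default to 5 seconds, so an unconfigured bean keeps fixed-frequency resends.
    private long initialResendPeriodMillis = 5_000L;
    private long maxResendPeriodMillis = 5_000L;

    public long getInitialResendPeriodMillis() {
        return initialResendPeriodMillis;
    }

    public void setInitialResendPeriodMillis(long millis) {
        if (millis <= 0) {
            throw new IllegalArgumentException("initial resend period must be positive");
        }
        this.initialResendPeriodMillis = millis;
    }

    public long getMaxResendPeriodMillis() {
        return maxResendPeriodMillis;
    }

    public void setMaxResendPeriodMillis(long millis) {
        if (millis <= 0) {
            throw new IllegalArgumentException("maximum resend period must be positive");
        }
        this.maxResendPeriodMillis = millis;
    }

    /**
     * The cap actually applied: never below the initial period, so the
     * order in which the two properties are injected does not matter.
     */
    public long effectiveMaxResendPeriodMillis() {
        return Math.max(initialResendPeriodMillis, maxResendPeriodMillis);
    }
}
```

      Computing the effective maximum on read, rather than validating in the setters, is a design choice of this sketch: it avoids rejecting a configuration merely because the container happens to set the maximum before the initial period.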

      Note that it only makes sense to implement this feature for retries dispatched from the participant side. On the coordinator side, retries only occur while the coordinator is waiting for a specific response from the participant, and in these cases the wait will always time out and cancel further retries (using the timeout interval defined by TransportTimer, which defaults to 30 seconds).

      This change is a precursor to a related change required to successfully recover BA participants. In order to detect coordinator crashes which occur between complete and close/cancel, participants need to switch from sending Completed messages to sending GetStatus messages until they get a response or an invalid transaction/participant SOAP fault. The switchover algorithm needs to be defined so that it kicks in compatibly with this incremental resend.

              People

              • Assignee: adinn Andrew Dinn
              • Reporter: adinn Andrew Dinn