We tested JBoss 4.2.0.GA and judged it ready for production. Once we actually put it in production and more than about 300 people were using the application simultaneously, users complained that it was extremely slow. Some commands, which issue several RMI invocations, would take several minutes. The servers themselves were not heavily loaded, however. It seemed like every request was just waiting for something to time out, or that something was causing a deadlock in a synchronized block.
I analyzed the problem by watching the JBoss server.log and by using tcpdump to watch the network traffic. I noticed long pauses during which the server, the client, and the network were all doing nothing at all. I ran tcpdump with a snaplen of zero, so that it captured whole packets, and read through each packet to find out exactly what was happening. I followed the process from port 1099 (naming) to 1098 (RMI) to 4446 (unified invoker). Right when the client invoked the stub it had obtained from RMI, it would sit there for 12 to 14 seconds. I checked this against other commands and, indeed, each one was waiting on the invoker.
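A stall like this can also be confirmed from the client side by timing each call. The following is only a minimal sketch, not what I actually ran: the class name InvocationTimer, its two methods, and the 10-second threshold are all made up for illustration, and the Thread.sleep stands in for a real EJB invocation.

```java
import java.util.concurrent.Callable;

public class InvocationTimer {

    /** Runs the call and returns the elapsed wall-clock time in milliseconds. */
    static <T> long timeMillis(Callable<T> call) throws Exception {
        long start = System.nanoTime();
        call.call();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    /** True when a call took at least thresholdMs, i.e. it looks like a stall. */
    static boolean isStalled(long elapsedMs, long thresholdMs) {
        return elapsedMs >= thresholdMs;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a remote EJB call; replace the sleep
        // with the real invocation on the client stub.
        long elapsed = timeMillis(() -> {
            Thread.sleep(50);
            return null;
        });

        // Assumed threshold: 10 seconds, just below the 12 to 14 second
        // pauses seen in the packet capture.
        System.out.println("elapsed ms: " + elapsed
                + ", stalled: " + isStalled(elapsed, 10_000));
    }
}
```

Logging the elapsed time next to the name of each command would show whether every invocation pays the same delay or only some of them do.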
I started googling for hangs with the jboss invoker and stumbled across some bugs, and found these:
They're all old bugs, but they at least make it obvious that the unified invoker has had problems in the past with hanging while marshalling/unmarshalling with JBoss's custom (un)marshallers. My suspicion was that the problem had been patched over with a hack back then, and that it was rearing its ugly head again now that the server was handling many simultaneous users.
Searching further, I found that the readme for jboss-4.2.0.GA says: "The default invoker for EJBs has been changed from the rmi-invoker to the unified-invoker, provided by JBoss Remoting". This made me very suspicious of the unified invoker, since it was definitely the invoker that was sitting there for around 15 seconds on every request when the server had more users on it. I looked at the change to the standardjboss.xml:
I changed the proxy bindings in standardjboss.xml from the unified invoker back to the RMI invoker, and the problem went away. Invocations now go through port 4444 and are instantaneous.