
Re: [xmlblaster] missing volatile messages

Hi, Marcel:

I wrote another script (testsubpub.sh, attached) that subscribes, publishes and validates the results automatically. The script keeps running until messages get lost.

Three types of servers were tested. They all ran Linux, JDK 1.5.0 and xmlBlaster 1.1.1.

 Server1: one CPU: AMD Athlon 64 3000+ 2.0GHz, 1.0GB RAM
 Server2: two CPUs: Pentium III (Coppermine) 866MHz, 1.0GB RAM
 Server3: two CPUs: Intel Xeon 3.06GHz, 3.8GB RAM

Test results:
The test script needs more than 70 loops before messages get lost on Server1. On Server2 it needs only 2 to 7 loops, and on Server3 6 to 30 loops.

Message loss was always accompanied by the following exception. (By the way, the erase flag was false in the test.)

[Apr 5, 2006 4:06:01 PM ERROR XmlBlaster.SOCKET.tcpListener-ewu TopicHandler/topic/AABB] PANIC: invoke callback is strange in state 'UNREFERENCED'
java.lang.Exception: Stack trace
at java.lang.Thread.dumpStack(Thread.java:1158)
at org.xmlBlaster.engine.TopicHandler.checkIfAllowedToSend(TopicHandler.java:1198)
at org.xmlBlaster.engine.TopicHandler.invokeCallback(TopicHandler.java:1310)
at org.xmlBlaster.engine.TopicHandler.invokeCallbackAndHandleFailure(TopicHandler.java:1170)
at org.xmlBlaster.engine.TopicHandler.publish(TopicHandler.java:645)
at org.xmlBlaster.engine.RequestBroker.publish(RequestBroker.java:1677)
at org.xmlBlaster.engine.RequestBroker.publish(RequestBroker.java:1483)
at org.xmlBlaster.engine.RequestBroker.publish(RequestBroker.java:1477)
at org.xmlBlaster.engine.XmlBlasterImpl.publishArr(XmlBlasterImpl.java:185)
at org.xmlBlaster.util.protocol.RequestReplyExecutor.receiveReply(RequestReplyExecutor.java:402)
at org.xmlBlaster.protocol.socket.HandleClient.handleMessage(HandleClient.java:231)
at org.xmlBlaster.protocol.socket.HandleClient.run(HandleClient.java:352)
at java.lang.Thread.run(Thread.java:595)

Analysis:
 1. One CPU is more robust than two CPUs, whether or not it is faster.
 2. Two fast CPUs are better than two slow CPUs.

Based on our test results, we think it might be a multithreaded race condition.
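
To illustrate the kind of race we suspect, here is a minimal Java sketch of a check-then-act problem. It is not xmlBlaster code: the Topic class, the State enum and the method name are hypothetical stand-ins, and the latches only force the unlucky interleaving that would otherwise happen occasionally on a multi-CPU machine.

```java
import java.util.concurrent.CountDownLatch;

public class TopicRaceSketch {
    enum State { ALIVE, UNREFERENCED }

    // Hypothetical stand-in for a topic; NOT xmlBlaster's TopicHandler.
    static class Topic {
        private volatile State state = State.ALIVE;
        State getState() { return state; }
        void setState(State s) { state = s; }
    }

    // Forces the interleaving: the publisher checks the state, a second
    // thread changes it, and only then does the publisher deliver.
    // Returns the state the publisher sees at delivery time.
    static State deliverWithForcedInterleaving() {
        Topic topic = new Topic();
        CountDownLatch checked = new CountDownLatch(1);
        CountDownLatch changed = new CountDownLatch(1);
        State[] seenAtDelivery = new State[1];

        Thread publisher = new Thread(() -> {
            if (topic.getState() == State.ALIVE) {        // check
                checked.countDown();
                try {
                    changed.await();                      // window where the state can change
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                seenAtDelivery[0] = topic.getState();     // act on a stale check
            }
        });
        Thread cleaner = new Thread(() -> {
            try {
                checked.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            topic.setState(State.UNREFERENCED);
            changed.countDown();
        });

        publisher.start();
        cleaner.start();
        try {
            publisher.join();
            cleaner.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seenAtDelivery[0];
    }

    public static void main(String[] args) {
        System.out.println("state at delivery: " + deliverWithForcedInterleaving());
    }
}
```

In the sketch the publisher passes its check while the topic is ALIVE but delivers when it is already UNREFERENCED. Doing the check and the delivery under one lock shared with the state change would close that window; whether this matches what happens inside xmlBlaster is only our guess.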


Marcel Ruff wrote:


Thanks for this detailed test.

We couldn't reproduce losing messages with your scripts.
We tried on the same server machine and distributed over two machines.
But we once got a non-reproducible stack trace.
So the combination with XPath and volatile messages and
erasing the topic during operation (by the first of your publishers finishing)
seems to be some open issue.
We have added now some code to handle the described NPE.

> "but some messages(<1%) completely lost."
It could be possible because the first publisher is erasing the topic
while the second publisher is firing.
Anyhow, I couldn't reproduce any loss here.

We'll keep the case open,


Attachment: testsubpub.sh
Description: application/shellscript