
Re: [xmlblaster] ssl and compression with SOCKET

PBal wrote:
Hi Marcel!

this is a useful extension to our basic SOCKET implementation.
If you are willing to donate it under LGPL license we would be
happy to add it to the xmlBlaster distribution.

I'm willing.

There are many variants of compression.
---------------------------------------

1. Compress MsgUnit above the protocol plugin layer

 This is not easily possible because, for example, the CORBA publish(MsgUnit)
 call has no way to flag the MsgUnit as compressed.
 We would need to set a compression flag on login to announce that all
 publishes are compressed.
 On subscription we could set a compression flag requesting that all updates
 be compressed.
 + This runs with all protocol plugins.
 - No fine grained control
 - CPU overhead (see below)

2. Compress in the SOCKET protocol plugin

 This is your approach. The SOCKET spec supports it with a compression flag
 and a 'lenUnzipped' field.
 + Simple
 ++ Compresses everything: Key, content and Qos, and even MsgUnit[] arrays in bulk
 - The CPU load in the xmlBlaster server grows with each subscriber,
   as every received message is decompressed on arrival and must be
   compressed again separately for each subscriber.
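A minimal sketch of option 2, assuming the plugin deflates the complete serialized message body with java.util.zip (standing in for jzlib) and sends the original length alongside it in the 'lenUnzipped' field. The class and method names are hypothetical and the actual SOCKET wire layout is not reproduced; the point is that lenUnzipped lets the receiver allocate its output buffer once:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical helper: compresses a whole serialized message and remembers its original length. */
final class SocketPayloadCodec {

    /** Compressed bytes plus the original length, playing the role of 'lenUnzipped'. */
    static final class Compressed {
        final byte[] data;
        final int lenUnzipped;

        Compressed(byte[] data, int lenUnzipped) {
            this.data = data;
            this.lenUnzipped = lenUnzipped;
        }
    }

    /** Deflates the complete message body (Key, content and Qos serialized together). */
    static Compressed compress(byte[] body) {
        Deflater deflater = new Deflater();
        deflater.setInput(body);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return new Compressed(out.toByteArray(), body.length);
    }

    /** Inflates into a buffer sized from lenUnzipped, so no resizing is needed. */
    static byte[] decompress(Compressed c) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(c.data);
            byte[] result = new byte[c.lenUnzipped];
            int off = 0;
            while (off < result.length) {
                int n = inflater.inflate(result, off, result.length - off);
                if (n == 0 && (inflater.finished() || inflater.needsInput() || inflater.needsDictionary())) {
                    break; // stream ended early or is truncated
                }
                off += n;
            }
            if (off != result.length) {
                throw new IllegalArgumentException("payload shorter than lenUnzipped");
            }
            return result;
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed payload", e);
        } finally {
            inflater.end();
        }
    }
}
```

Because the server calls decompress() on arrival and compress() again per subscriber, this is also exactly where the per-subscriber CPU cost mentioned above comes from.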

3. Compress in MessageUnit.setContent()

 Here the Key and Qos are transferred uncompressed and only the
 message content is compressed. We could support this in our
 C/C++/Java MessageUnit struct, using a ClientProperty "__gzip"
 to mark compressed content.
 + The xmlBlaster server would never uncompress the content
   (as it never looks into it) when receiving
   it and forwarding it to the subscribers.
 + This runs with all protocol plugins.
 ++ No compression CPU overhead in the server
 - Key and Qos are not compressed
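For option 3, a sketch of what the client side might do. SimpleMsgUnit is a hypothetical, simplified stand-in for MsgUnit (not the xmlBlaster class), and gzip comes from java.util.zip. The content is gzipped and tagged with the "__gzip" ClientProperty, while Key and Qos stay plain text, so the server can forward the content bytes without ever looking into them:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical stand-in for a MsgUnit: Key and Qos stay plain, only the content may be gzipped. */
final class SimpleMsgUnit {
    static final String GZIP_PROPERTY = "__gzip"; // marker name proposed in this thread
    final String key;
    final byte[] content;
    final Map<String, String> clientProperties;

    SimpleMsgUnit(String key, byte[] content, Map<String, String> clientProperties) {
        this.key = key;
        this.content = content;
        this.clientProperties = clientProperties;
    }

    /** Publisher side: gzip the content and mark it with the "__gzip" client property. */
    static SimpleMsgUnit compressed(String key, byte[] plainContent) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(plainContent);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> props = new HashMap<>();
        props.put(GZIP_PROPERTY, "true");
        return new SimpleMsgUnit(key, bytes.toByteArray(), props);
    }

    /** Subscriber side: unzip only if the marker is present; the server never needs to call this. */
    byte[] plainContent() {
        if (!"true".equals(clientProperties.get(GZIP_PROPERTY))) {
            return content;
        }
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(content))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```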

4. Compress messages in the security plugin

 + This runs with all protocol plugins.
 - Rather a special case (similar to 5.)

5. The client developers do it themselves.

 An xmlBlaster user can implement compression along the lines of 3.
 + This runs with all protocol plugins
 + Every fine grained combination is possible
 - Reinvent the wheel

Solutions 2 and 3 are probably the way to go.

Configuration:
--------------

Typically only publish(), update() and the return value of get() (and their Array and Oneway variants)
need compression.
Other requests like connect(), disconnect(), subscribe(),
unSubscribe(), ping() and erase() don't need compression.

Your configuration switches compression on or off for a publisher
as a whole.
Updates to all subscribers are always delivered either uncompressed or compressed, depending
on the SOCKET plugin configuration in the server.
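The split between requests that need compression and those that don't could be expressed as a small policy object. This is a sketch only: the method-name strings and the CompressionPolicy class are hypothetical, and the single 'enabled' switch mirrors the current per-connection configuration rather than the planned QoS flag:

```java
import java.util.Set;

/** Hypothetical policy: only message-carrying requests are worth compressing. */
final class CompressionPolicy {
    // Illustrative names; the real SOCKET plugin uses its own method identifiers.
    private static final Set<String> COMPRESSIBLE = Set.of(
            "publish", "publishArr", "publishOneway",
            "update", "updateOneway",
            "get"); // i.e. the messages returned by get()

    private final boolean enabled; // whole-connection on/off switch

    CompressionPolicy(boolean enabled) {
        this.enabled = enabled;
    }

    /** connect, disconnect, subscribe, unSubscribe, ping and erase are never compressed. */
    boolean shouldCompress(String methodName) {
        return enabled && COMPRESSIBLE.contains(methodName);
    }
}
```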

In a future step we will add a <compress type="gzip"/> flag to PublishQos and
SubscribeQos for fine-grained control.

If so, you can send the patch directly to my mail address.
You would also need to add a test case and some documentation to
the SOCKET requirement (xmlBlaster/doc/requirements/protocol.socket.xml).

Is there a compatible C compression library with a free license around
which could be added to the C/C++ client library?
Do you have a property for a minimum message size to switch on

Yes, there is a compatible C compression library. As I mentioned, jzlib is used as the Java library. Because it is based on (in fact, largely a straight port of) the zlib C library, the two libraries and their inputs/outputs should be fully compatible. This is also stated on their website: http://www.jcraft.com/jzlib/
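One way to see the shared format in practice: java.util.zip (used here in place of jzlib, since it ships with the JDK and is itself backed by zlib) produces the zlib stream format defined in RFC 1950, whose 2-byte header is what zlib's inflate() in C expects by default. The class below is a hypothetical check of that header:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

/** Hypothetical check that deflated output carries the standard zlib (RFC 1950) header. */
final class ZlibHeaderCheck {

    static byte[] deflate(byte[] input) {
        Deflater d = new Deflater(); // default: zlib wrapper, not raw deflate
        d.setInput(input);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!d.finished()) {
            int n = d.deflate(buf);
            out.write(buf, 0, n);
        }
        d.end();
        return out.toByteArray();
    }

    /** RFC 1950: CM (low nibble of CMF) must be 8 and CMF*256 + FLG must be a multiple of 31. */
    static boolean hasZlibHeader(byte[] data) {
        if (data.length < 2) {
            return false;
        }
        int cmf = data[0] & 0xFF;
        int flg = data[1] & 0xFF;
        return (cmf & 0x0F) == 8 && (cmf * 256 + flg) % 31 == 0;
    }
}
```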

Let me explain the implementation details:
- Currently, compression is turned on if and only if SSL sockets are used. However, the two features can easily be separated.
- The _whole_ TCP stream is compressed, because that was the easiest way to implement it. This means that the "compression window" is not reinitialized for every message: the deflater's dictionary is not reset between messages, so repeated sequences are compressed even if they occurred in a previous message. I expected this to let the deflater compress the beginning of each message too, and to give better results when the same kind of small message is sent frequently (which is very much my case).
What I don't know is how all this behaves when arbitrary messages are sent over a connection. For example, what if virtually every message had some ugly binary content? I still believe this would not make the messages bigger on average.
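To illustrate what keeping the dictionary across messages buys, here is a sketch using java.util.zip.Deflater instead of jzlib. One deflater is reused for all messages with a SYNC_FLUSH after each one (so the receiver can decode every message as soon as it arrives, while the history window is kept), and this is compared with a fresh deflater per message. The class name and the sample data in the test are made up; real savings depend entirely on the traffic:

```java
import java.util.zip.Deflater;

/** Compares one long-lived deflater (stream-wide dictionary) with a fresh deflater per message. */
final class StreamVsPerMessage {

    /** Same Deflater for all messages; SYNC_FLUSH after each keeps the history window.
     *  finish() is never called, as on a connection that stays open. */
    static int streamCompressedSize(byte[][] messages) {
        Deflater d = new Deflater();
        byte[] buf = new byte[8192];
        int total = 0;
        for (byte[] msg : messages) {
            d.setInput(msg);
            int n;
            do {
                n = d.deflate(buf, 0, buf.length, Deflater.SYNC_FLUSH);
                total += n;
            } while (n == buf.length); // a full buffer means more output may be pending
        }
        d.end();
        return total;
    }

    /** Each message compressed independently, so no history is shared between messages. */
    static int perMessageCompressedSize(byte[][] messages) {
        byte[] buf = new byte[8192];
        int total = 0;
        for (byte[] msg : messages) {
            Deflater d = new Deflater();
            d.setInput(msg);
            d.finish();
            while (!d.finished()) {
                total += d.deflate(buf);
            }
            d.end();
        }
        return total;
    }
}
```

With many similar small messages the shared-history variant should come out clearly smaller. With random binary content neither variant gains much, and each sync flush adds a few bytes of overhead per message, which bears on the open question about arbitrary messages.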

I don't understand this. Are you saying that the message stream is compressed on the fly as bytes are pushed in? I'll look into the code you sent to understand it.

A deeper understanding (and more modification) of the SOCKET implementation would be required if I wanted to compress individual messages. This is not impossible, either. Moreover, if I put the compression filter in the right place, we would have compression for every protocol, wouldn't we?

See discussion above.

Thinking about this, I already have another implementation which, in a way, supports compressing individual messages: it only compresses a block of bytes when the stream is flushed or when its buffer is full. When xmlBlaster sends a message, it writes the whole message to the OutputStream and then flushes it. When flush is invoked, the buffered data lets it decide whether to compress the current buffer (which should be one message, or a large part of it) or not. In this scheme we would give up the deflater's state and start with a fresh one on every flush, compress the buffer, and check whether the result is smaller than the original. If it is, we send it compressed, otherwise uncompressed. A simple protocol wrapper is also needed to mark which blocks are compressed, but that is no problem.
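The per-flush idea could look roughly like the stream below. It is a sketch under assumptions: the frame layout (1 flag byte, a 4-byte length, then the payload) is invented for illustration, java.util.zip stands in for jzlib, and bufferLimit is a hypothetical parameter covering the "buffer is full" case:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.Deflater;

/** Buffers bytes until flush() (or bufferLimit), then sends each block deflated only if that is smaller.
 *  Hypothetical frame: [1 byte flag][4 byte big-endian length][payload]. */
final class FlushCompressingOutputStream extends OutputStream {
    static final int RAW = 0, DEFLATED = 1;

    private final DataOutputStream out;
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final int bufferLimit;

    FlushCompressingOutputStream(OutputStream target, int bufferLimit) {
        this.out = new DataOutputStream(target);
        this.bufferLimit = bufferLimit;
    }

    @Override public void write(int b) throws IOException {
        pending.write(b);
        if (pending.size() >= bufferLimit) emitPending();
    }

    @Override public void write(byte[] b, int off, int len) throws IOException {
        pending.write(b, off, len);
        if (pending.size() >= bufferLimit) emitPending();
    }

    @Override public void flush() throws IOException {
        emitPending();
        out.flush();
    }

    @Override public void close() throws IOException {
        flush();
        out.close();
    }

    private void emitPending() throws IOException {
        if (pending.size() == 0) return;
        byte[] raw = pending.toByteArray();
        pending.reset();
        byte[] packed = deflate(raw); // fresh Deflater: no state shared with earlier blocks
        if (packed.length < raw.length) {
            writeFrame(DEFLATED, packed);
        } else {
            writeFrame(RAW, raw); // compression did not pay off, send as-is
        }
    }

    private void writeFrame(int flag, byte[] payload) throws IOException {
        out.writeByte(flag);
        out.writeInt(payload.length);
        out.write(payload);
    }

    private static byte[] deflate(byte[] input) {
        Deflater d = new Deflater();
        d.setInput(input);
        d.finish();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!d.finished()) {
            int n = d.deflate(chunk);
            buf.write(chunk, 0, n);
        }
        d.end();
        return buf.toByteArray();
    }
}
```

The trade-off versus whole-stream compression is that each block starts with an empty dictionary, so small repetitive messages compress less well, but incompressible blocks never grow by more than the 5-byte frame header.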

If the message turns out bigger after compression, the CPU time spent on it is wasted, so a minimum message size could probably help here. (The minimum size could be adjusted dynamically after each unnecessary compression...)
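A dynamically adjusted minimum size could be as simple as the hypothetical class below. A message shorter than the current threshold is sent without trying to compress it, and every compression that did not pay off raises the threshold to just above that message's length. The cap (maxMinSize, also an assumption) ensures that large messages are always tried:

```java
/** Hypothetical adaptive threshold for skipping compression of small messages. */
final class AdaptiveMinSize {
    private int minSize;
    private final int maxMinSize;

    AdaptiveMinSize(int initialMinSize, int maxMinSize) {
        this.minSize = initialMinSize;
        this.maxMinSize = maxMinSize;
    }

    /** Only messages at least minSize bytes long are worth a compression attempt. */
    boolean worthTrying(int length) {
        return length >= minSize;
    }

    /** Report the outcome of an attempt; an unnecessary compression raises the threshold. */
    void record(int originalLength, int compressedLength) {
        if (compressedLength >= originalLength) {
            minSize = Math.min(maxMinSize, Math.max(minSize, originalLength + 1));
        }
    }

    int minSize() {
        return minSize;
    }
}
```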

I think these are our options.

I will assemble a patch for you as soon as I have separated the SSL and compression layers and cleaned up some code. If you like the idea of deciding on compression per flush, that will be included too.