We define a cluster as a configuration where more than one xmlBlaster server instance is running and those instances know of each other. The instances may run on the same host or distributed over the internet.
All clustering abilities of xmlBlaster reduce to a simple master/slave approach. This approach is easy to understand, as we do not leave the MoM paradigm to support clustering.
An important part of clustering is discovery and lookup: how to find other cluster nodes, how to access detailed node information from them, and how to keep that information up to date. This is addressed with the publish/subscribe idea as well. XmlBlaster nodes store their cluster information in messages, so other nodes can subscribe to this data. If necessary, one xmlBlaster runs as a 'naming service' holding the information of all available xmlBlaster instances.
In the following examples, we use the terms xmlBlaster server instance, xmlBlaster node, and node interchangeably.
In this example we have three xmlBlaster instances running, each of them has a unique cluster node ID, here the names heron, golan and avalon.
Each of the nodes has an arbitrary number of clients attached. The clients can publish or subscribe to any message in the cluster and may send PtP messages to any other client.
It is important to understand that clustering is based on topics. The above picture shows the physical connection of the cluster nodes, but in any node some topics may be defined as master and others as slave.
The example shows a tree-like configuration of xmlBlaster nodes. In this way we can connect an almost unlimited number of clients. Every child node supplies a certain number of slaves, which supply other slaves, which finally supply clients with message updates. The slaves cache the messages and respond to requests of their clients directly. The cache is always up to date, as it is updated in real time according to the publish/subscribe paradigm. With every child level in the tree, the latency for newly published message updates increases by typically 5-40 milliseconds (intranet). Note that the publisher does not need to be connected to the master node; the client in the lower left corner of the picture is publishing as well.
We introduce the term stratum as the distance of a node from the master. This is done in analogy to the network time protocol (NTP). A stratum of 0 is the master itself, 1 is the first slave and stratum=2 would be bilbo in the above picture.
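The stratum of a node follows directly from its chain of parent nodes. The following Python sketch illustrates this; it is a model of the concept only, not xmlBlaster code, and the parent table is taken from the example nodes above.

```python
# Illustrative sketch (not xmlBlaster code): derive a node's stratum
# from its parent chain, NTP-style. None marks the master node.
parents = {
    "heron": None,     # master -> stratum 0
    "frodo": "heron",  # slave of the master -> stratum 1
    "bilbo": "frodo",  # slave of a slave -> stratum 2
}

def stratum(node: str) -> int:
    """Number of hops from the master (0 for the master itself)."""
    steps = 0
    while parents[node] is not None:
        node = parents[node]
        steps += 1
    return steps
```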
Implementation status:

Mode | Description | Hot | Impl |
---|---|---|---|
Publish/Subscribe | This feature is implemented for Publish/Subscribe and ready for production use. Changing the cluster configuration during hot operation is addressed by the design, but the final implementation and testing of this feature are missing. | | |
Point to point (PtP) | PtP routing in a cluster environment is readily available. If your destination address has an absolute name like '/node/heron/client/joe', the local node and all direct neighbors are checked and the message is delivered directly. Otherwise the same routing rules as for Publish/Subscribe apply. | | |
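The PtP routing rule above can be sketched as a small decision function. This is an illustrative Python model only, not the actual implementation; the node names and helper names are invented for the example.

```python
# Sketch of the PtP routing rule (illustrative only): an absolute
# destination like '/node/heron/client/joe' is delivered directly if
# the addressed node is the local node or a direct neighbor; otherwise
# the normal Publish/Subscribe routing applies.
LOCAL_NODE = "avalon"
DIRECT_NEIGHBORS = {"heron", "golan"}

def route_ptp(destination: str) -> str:
    if destination.startswith("/node/"):
        target = destination.split("/")[2]  # e.g. 'heron'
        if target == LOCAL_NODE or target in DIRECT_NEIGHBORS:
            return "deliver directly to " + target
    return "use Publish/Subscribe routing"
```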
Autonomous failure recovery without a distinct cluster manager (no single point of failure).
We have three different failure situations to cover:
Implementation status:

Mode | Description | Hot | Impl |
---|---|---|---|
Publish/Subscribe | - | | |
PtP | - | | |
As we can see, the node heron is the master of messages of the domain "RUGBY_NEWS" but caches "STOCK_EXCHANGE" messages as well.
Implementation status:

Mode | Description | Hot | Impl |
---|---|---|---|
Publish/Subscribe | This feature is implemented for Publish/Subscribe and ready for production use. Note that erase() calls to the slaves need to have the domain set in the XmlKey (similar to the publishes) to be forwarded to the master. erase() calls to the master are automatically propagated to all slaves, even with a missing domain setting. | | |
PtP | This feature is implemented for PtP and ready for production use. | | |
In the above scenario heron1 and heron2 share their knowledge. Slave nodes can choose which of those servers to use.
Implementation status:

Mode | Description | Hot | Impl |
---|---|---|---|
Publish/Subscribe | Mirroring of messages is possible in master/slave operation; mirroring of session-stateful information is currently not implemented. | | |
PtP | Mirroring of PtP messages is currently not supported, as user session mirroring is not available. | | |
We have to code and manage three logical mapping functionalities:
The domain-based approach maps domain names to cluster node IDs, for example <key domain='STOCK_EXCHANGE'/>. Please see the examples below.
The plugin interface I_MapMsgToMasterId.java allows you to code your own mapping logic; the default plugin delivered with xmlBlaster is DomainToMaster.java, which implements a domain-attribute-based approach.
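The domain-attribute approach can be pictured as a simple lookup table from domain name to master node. The following Python sketch is illustrative only; the real logic lives in the Java plugin DomainToMaster.java, and the domain/node assignments here are invented for the example.

```python
# Illustrative sketch of a domain-based master lookup (not the code of
# DomainToMaster.java). '*' acts as the wildcard fallback entry.
DOMAIN_TO_MASTER = {
    "RUGBY_NEWS": "heron",
    "STOCK_EXCHANGE": "golan",
    "*": "avalon",  # master for every other domain
}

def master_of(domain: str) -> str:
    return DOMAIN_TO_MASTER.get(domain, DOMAIN_TO_MASTER["*"])
```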
The plugin interface I_LoadBalancer.java allows you to code your own load-balancing logic; the default plugin delivered with xmlBlaster is RoundRobin.java, which implements a round-robin approach.
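Round-robin balancing simply cycles through the equivalent nodes in turn. A minimal Python sketch of the idea (not the code of RoundRobin.java; the node names are invented):

```python
import itertools

# Illustrative round-robin selection over equivalent master nodes,
# as the RoundRobin plugin does conceptually (this is not its code).
nodes = ["heron1", "heron2"]
cycle = itertools.cycle(nodes)

def next_node() -> str:
    """Return the next node in round-robin order."""
    return next(cycle)
```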
The cluster-specific features are driven by the domain attribute in the message key, e.g. <key domain='RUGBY'/> (see examples below).
Please visit xmlBlaster/demo/javaclients/cluster for demos.
1 | Mapping of a cluster node ID to a physical xmlBlaster instance | Comments |
---|---|---|
key | <key oid='__sys__cluster.node.info[heron]'> <__sys__cluster.node.info/> </key> | - |
content | <clusternode id='heron' maxConnections='800'> <connect> <qos> <address type='IOR'>IOR:00044550005...</address> <address type='XMLRPC' maxConnections='20'> http://www.mars.edu/RPC2 </address> <callback type='XMLRPC'>http://www.mars.universe:8081/RPC2</callback> <backupnode> <clusternode id='avalon'/> <!-- first failover node --> <clusternode id='golan'/> <!-- second backup node --> </backupnode> <nameservice>true</nameservice> </qos> </connect> <disconnect/> </clusternode> | The connect tag contains a ConnectQos markup as described in the interface.connect requirement. The backupnode setting is currently not implemented. The disconnect markup can be used to force a disconnect on cluster node shutdown; usually you won't set this, to keep the connection alive in the remote server (to be able to collect messages during our shutdown). |
2 | Determine the master: Mapping of messages to cluster node IDs. See NodeDomainInfo.java and plugin DomainToMaster.java | Comments |
---|---|---|
2a) key | <key oid='__sys__cluster.node.master[heron]'> <__sys__cluster.node.master/> </key> | - |
content | // This is a master for domainless messages and // for football and rugby <clusternode id='heron'> <master stratum='0' acceptOtherDefault='true'> <key queryType='DOMAIN' domain='football'/> <key queryType='DOMAIN' domain='rugby'/> </master> </clusternode> | This cluster node is the master of the domains 'football' and 'rugby'. Messages without a domain specified are treated locally as well. |
2b) key | <key oid='__sys__cluster.node.master[frodo]'> <__sys__cluster.node.master/> </key> | - |
content | // frodo is a slave for everything <clusternode id='frodo'> <master stratum='0' acceptDefault='false'/> <!-- forward empty domains --> ... // heron is master for everything (domain '*') cluster.node[heron]=\ <clusternode id='heron'>\ <connect><qos>\ <address type='IOR' bootstrapHostname='' bootstrapPort='7600'/>\ </qos></connect>\ <master type='DomainToMaster'>\ <key queryType='DOMAIN' domain='*'/>\ </master>\ </clusternode> | Messages without a domain specified are normally handled by the local xmlBlaster node; here this is switched off. This cluster node is the master for all Pub/Sub messages because of the wildcard '*' setting. |
2c) key | <key oid='__sys__cluster.node.master[bilbo]'> <__sys__cluster.node.master/> </key> | - |
content | // Bilbo is master of RECIPIES and local clients, // but slave for everything else <clusternode id='bilbo'> <master stratum='0'> <key queryType='DOMAIN' domain=''/> <key queryType='DOMAIN' domain='RECIPIES'/> </master> // refid points to a node one stratum closer to the master <master stratum='2' refid='frodo'/> </clusternode> | Bilbo is a slave of a slave for heron messages. Therefore it has stratum = 2 (two steps from the master). It only knows frodo, its direct parent node. |
2d) key | <key oid='__sys__cluster.node.master[heron]'> <__sys__cluster.node.master/> </key> | - |
content | // The master is determined in a generic way // (no explicit domain) <clusternode id='heron'> <master> <key queryType='EXACT' oid='radar.track'/> <key queryType='XPATH'> //STOCK_EXCHANGE </key> <filter type='ContentLength'> <!-- Use your I_AccessFilter plugin --> 8000 <!-- Msg contents smaller than 8000 bytes only --> </filter> </master> </clusternode> | Approach without domains. Every message is filtered with the given rules. If one of the rules matches, we are the master of this message. |
2e) key | <key oid='__sys__cluster.node.master[heron]'> <__sys__cluster.node.master/> </key> | - |
content | // The master is determined with a custom plugin // (no explicit domain) <clusternode id='heron'> <master> Java plugin (implements I_MapMsgToMasterId) </master> </clusternode> | Approach without domains. Every message is filtered by a user-supplied plugin. The plugin looks into the message key, content, or QoS and decides who is the master. |
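The generic rule-and-filter selection of example 2d can be modeled as a list of predicates plus a content-length check. The following Python sketch is illustrative only; it is not xmlBlaster's query engine, and the rules are plain Python functions rather than EXACT/XPATH query syntax.

```python
# Illustrative sketch of rule-based master selection (example 2d):
# a node is master of a message if any key rule matches and the
# ContentLength-style filter passes (content smaller than the limit).
rules = [
    lambda msg: msg["oid"] == "radar.track",         # EXACT-like match
    lambda msg: "STOCK_EXCHANGE" in msg["key_xml"],  # XPATH-like match
]

def is_master(msg: dict, max_content_len: int = 8000) -> bool:
    if len(msg["content"]) >= max_content_len:       # filter: too large
        return False
    return any(rule(msg) for rule in rules)
```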
A message can specify its domain as a key attribute:
<key oid='football.49.results' domain='football'/>
3 | The current status of a cluster node |
---|---|
key | <key oid='__sys__cluster.node.state[heron]'> <__sys__cluster.node.state/> </key> |
content | <clusternode id='heron'> <state> <cpu id='0' idle='40'/> <!-- currently 60% load on first CPU --> <cpu id='1' idle='44'/> <ram free='12000'/> <!-- xmlBlaster server has 12 MB free memory --> <performance bogomips='1205.86' idleIndex='20'/> </state> </clusternode> |
- | Quality of Service (QoS) of a published message traversing a cluster | Comments |
---|---|---|
qos | <qos> <sender>joe</sender> <route> <node id='bilbo' stratum='2' timestamp='34460239640'/> <node id='frodo' stratum='1' timestamp='34460239661'/> <node id='heron' stratum='0' timestamp='34460239590'/> </route> </qos> | A message published to bilbo found its way over frodo to the master heron. |
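Conceptually, each node a message passes appends its own entry to the route section of the QoS. The following Python sketch models this stamping; it is not xmlBlaster code, and the dictionary representation of a route node is invented for the example.

```python
import time

# Illustrative sketch: each node stamps the <route> section of a
# message QoS while forwarding it towards the master (compare the
# QoS example above; this is a model, not xmlBlaster code).
def stamp_route(route: list, node_id: str, stratum: int) -> list:
    route.append({"id": node_id, "stratum": stratum,
                  "timestamp": time.time_ns()})
    return route

route = []
stamp_route(route, "bilbo", 2)   # the publishing client's node
stamp_route(route, "frodo", 1)   # intermediate slave
stamp_route(route, "heron", 0)   # master
```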
This shows the syntax of the configuration possibilities more completely:
<clusternode id='heron.mycomp.com'> <connect><qos> <address type='IOR'> IOR:09456087000 </address> <address type='XMLRPC'> http://www.mycomp.com/XMLRPC/ </address> <callback type='RMI'> rmi://mycomp.com </callback> </qos></connect> <master type='DomainToMaster' version='0.9'> <![CDATA[ <key domain='RUGBY'/> <key type='XPATH'>//STOCK</key> ]]> </master> <master stratum='1' refId='frodo' type='MyOwnMapperPlugin' version='2.0'> <![CDATA[My own rule]]> </master> <state> <cpu id='0' idle='40'/> <cpu id='1' idle='44'/> <ram free='12000'/> </state> </clusternode>

If everything is OK, the return QoS value of a published message is, as usual:
<qos><state id='OK'/></qos>

If the message can't be forwarded to the master node, it is tailed back by your local xmlBlaster node and flushed on reconnect to the master. The publish return QoS indicates this situation with a "FORWARD_WARNING" response:
<qos><state id='FORWARD_WARNING'/></qos>
These parameters allow you to configure the cluster behavior.
The cluster manager is activated in the xmlBlasterPlugins.xml file; take care to activate the protocol plugins you want to use for inter-cluster communication in an earlier run-level.
<plugin id='cluster' className='org.xmlBlaster.engine.cluster.ClusterManager'> <action do='LOAD' onStartupRunlevel='5' sequence='5' /> <action do='STOP' onShutdownRunlevel='4' sequence='5'/> </plugin>
The properties can be set on the command line, in the xmlBlaster.properties file, or dynamically via messages.
Property | Default / Example | Description | Implemented |
---|---|---|---|
cluster.node.id | 167.92.1.4:7607 or heron.mycomp.com | The world-wide unique name of this xmlBlaster instance (= cluster node ID); if not specified, it defaults to the unique listen address of one of your activated protocol drivers. If you specify the name yourself, you should use a unique name like heron.mycompany.com | |
cluster.loadBalancer.type | RoundRobin | Specifies which load balance plugin to use (see xmlBlaster.properties) | |
cluster.loadBalancer.version | 1.0 | The plugin version to use | |
cluster.node.info[heron] | <clusternode id='heron'> <connect><qos> <address type='SOCKET'> 192.168.1.2:7607 </address> </qos></connect> </clusternode> | Configures how to access heron; replace the node name in the brackets with your specific xmlBlaster node. NOTE: This setting can be overwritten by __sys__cluster.node.info[heron] messages. | |
cluster.node.master[heron] | <clusternode id='heron'> <master type='DomainToMaster'> <![CDATA[ <key domain='RUGBY_NEWS'/> ]]> </master> </clusternode> | Configures for which message types heron is the master node. NOTE: This setting can be overwritten by __sys__cluster.node.master[heron] messages. | |
cluster.node[heron] | <clusternode id='heron'> <connect><qos> <address type='SOCKET'> 192.168.1.2:7607 </address> </qos></connect> <master type='DomainToMaster'> <![CDATA[ <key domain='RUGBY_NEWS'/> ]]> </master> </clusternode> | The combination of cluster.node.info[...] and cluster.node.master[...], allowing a more compact configuration. | |
pingInterval[heron] ... | -pingInterval 2000 -pingInterval[frodo] 1000 | All client connection configuration settings are adjustable. Try java HelloWorld3 -help for a list of current options. Here we show the ping interval as an example: the time between pings to another node, in milliseconds. A node-specific pingInterval[frodo] has precedence over the general pingInterval setting. This way you could tell xmlBlaster to ping its partner nodes every 2 seconds (pingInterval=2000) but to ping frodo more often (pingInterval=1000). | |
passwd[bilbo] | secret | Allows setting the password for cluster node bilbo. Bilbo uses this password when it logs in to another xmlBlaster node. You can't change the loginName of a cluster node: every cluster node logs in to remote nodes with its cluster node ID as the loginName. | |
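The per-node override rule for properties like pingInterval can be sketched as a two-step lookup. This Python fragment is a model of the precedence rule only, not xmlBlaster's property handling.

```python
# Illustrative sketch of the per-node override rule: a node-specific
# property like pingInterval[frodo] takes precedence over the general
# pingInterval setting (property names follow the table above).
props = {"pingInterval": 2000, "pingInterval[frodo]": 1000}

def ping_interval(node: str) -> int:
    return props.get("pingInterval[%s]" % node, props["pingInterval"])
```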