We define a cluster as a configuration where more than one xmlBlaster server instance is running and those instances know of each other. The instances may run on the same host or distributed over the internet.
All clustering abilities of xmlBlaster reduce to a simple master/slave approach. This approach is easy to understand, as we do not leave the MoM paradigm to support clustering.
An important part of clustering is discovery and lookup: how to find other cluster nodes, how to access detailed node information from them, and how to keep that information up to date. This is addressed with the publish/subscribe idea as well. XmlBlaster nodes store their cluster information in messages, so other nodes can subscribe to this data. If necessary, one xmlBlaster runs as a 'naming service' holding the information of all available xmlBlaster instances.
In the following examples, we use the terms xmlBlaster server instance, xmlBlaster node, and node interchangeably.
In this example we have three xmlBlaster instances running, each of them has a unique cluster node ID, here the names heron, golan and avalon.
Each of the nodes has an arbitrary number of clients attached. The clients can publish or subscribe to any message in the cluster and may send PtP messages to any other client.
It is important to understand that clustering is based on topics. The above picture shows the physical connection of the cluster nodes, but in any node some topics may be defined as master and others as slave.
The example shows a tree-like configuration of xmlBlaster nodes. In this way we can connect an almost unlimited number of clients. Every child node supplies a certain number of slaves, which supply other slaves, which finally supply clients with message updates. The slaves cache the messages and respond to requests of their clients directly. The cache is always up to date, as it is updated in real time according to the publish/subscribe paradigm. With every child level in the tree, the latency for newly published message updates increases by typically 5-40 milliseconds (intranet). Note that the publisher does not need to be connected to the master node; the client in the lower left corner of the picture is publishing as well.
We introduce the term stratum as the distance of a node from the master. This is done in analogy to the network time protocol (NTP). A stratum of 0 is the master itself, 1 is the first slave and stratum=2 would be bilbo in the above picture.
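The stratum of a node follows directly from its chain of parent nodes. The following Python sketch illustrates this; it is a model of the concept only, not xmlBlaster code, and the parent table is taken from the example nodes above.

```python
# Illustrative sketch (not xmlBlaster code): derive a node's stratum
# from its parent chain, NTP-style. None marks the master node.
parents = {
    "heron": None,     # master -> stratum 0
    "frodo": "heron",  # slave of the master -> stratum 1
    "bilbo": "frodo",  # slave of a slave -> stratum 2
}

def stratum(node: str) -> int:
    """Number of hops from the master (0 for the master itself)."""
    steps = 0
    while parents[node] is not None:
        node = parents[node]
        steps += 1
    return steps
```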
Implementation status:

Mode | Description | Hot | Impl |
---|---|---|---|
Publish/Subscribe | This feature is implemented for Publish/Subscribe and ready for production use. Changing the cluster configuration during hot operation is addressed by the design, but the final implementation and testing of this feature are missing. | | |
Point to point (PtP) | PtP routing in a cluster environment is readily available. If your destination address has an absolute name like '/node/heron/client/joe', the local node and all direct neighbors are checked and the message is delivered directly. Otherwise the same routing rules as for Publish/Subscribe apply. | | |
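The PtP routing rule above can be sketched as a small decision function. This is an illustrative Python model only, not the actual implementation; the node names and helper names are invented for the example.

```python
# Sketch of the PtP routing rule (illustrative only): an absolute
# destination like '/node/heron/client/joe' is delivered directly if
# the addressed node is the local node or a direct neighbor; otherwise
# the normal Publish/Subscribe routing applies.
LOCAL_NODE = "avalon"
DIRECT_NEIGHBORS = {"heron", "golan"}

def route_ptp(destination: str) -> str:
    if destination.startswith("/node/"):
        target = destination.split("/")[2]  # e.g. 'heron'
        if target == LOCAL_NODE or target in DIRECT_NEIGHBORS:
            return "deliver directly to " + target
    return "use Publish/Subscribe routing"
```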
Autonomous failure recovery without a distinct cluster manager (no single point of failure).
We have three different failure situations to cover:
Implementation status:

Mode | Description | Hot | Impl |
---|---|---|---|
Publish/Subscribe | - | | |
PtP | - | | |
As we can see, the node heron is the master of messages of the domain "RUGBY_NEWS" but caches "STOCK_EXCHANGE" messages as well.
Implementation status:

Mode | Description | Hot | Impl |
---|---|---|---|
Publish/Subscribe | This feature is implemented for Publish/Subscribe and ready for production use. Note that erase() calls to the slaves need to have the domain set in the XmlKey (similar to the publishes) to be forwarded to the master. erase() calls to the master are automatically propagated to all slaves, even with a missing domain setting. | | |
PtP | This feature is implemented for PtP and ready for production use. | | |
In the above scenario heron1 and heron2 share their knowledge. Slave nodes can choose which of those servers to use.
Implementation status:

Mode | Description | Hot | Impl |
---|---|---|---|
Publish/Subscribe | Mirroring of messages is possible in master/slave operation; mirroring of session-stateful information is currently not implemented. | | |
PtP | Mirroring of PtP messages is currently not supported, as user session mirroring is not available. | | |
We have to code and manage three logical mapping functionalities:
The domain-based approach maps domain names to cluster node IDs, for example <key domain='STOCK_EXCHANGE'/>. Please see the examples below.
The plugin interface I_MapMsgToMasterId.java allows you to code your own mapping logic; the default plugin delivered with xmlBlaster is DomainToMaster.java, which implements a domain-attribute-based approach.
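The domain-attribute approach can be pictured as a simple lookup table from domain name to master node. The following Python sketch is illustrative only; the real logic lives in the Java plugin DomainToMaster.java, and the domain/node assignments here are invented for the example.

```python
# Illustrative sketch of a domain-based master lookup (not the code of
# DomainToMaster.java). '*' acts as the wildcard fallback entry.
DOMAIN_TO_MASTER = {
    "RUGBY_NEWS": "heron",
    "STOCK_EXCHANGE": "golan",
    "*": "avalon",  # master for every other domain
}

def master_of(domain: str) -> str:
    return DOMAIN_TO_MASTER.get(domain, DOMAIN_TO_MASTER["*"])
```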
The plugin interface I_LoadBalancer.java allows you to code your own load-balancing logic; the default plugin delivered with xmlBlaster is RoundRobin.java, which implements a round-robin approach.
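Round-robin balancing simply cycles through the equivalent nodes in turn. A minimal Python sketch of the idea (not the code of RoundRobin.java; the node names are invented):

```python
import itertools

# Illustrative round-robin selection over equivalent master nodes,
# as the RoundRobin plugin does conceptually (this is not its code).
nodes = ["heron1", "heron2"]
cycle = itertools.cycle(nodes)

def next_node() -> str:
    """Return the next node in round-robin order."""
    return next(cycle)
```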
The cluster-specific features are driven by the domain attribute in the message key, e.g. <key domain='RUGBY'/> (see examples below).
Please visit xmlBlaster/demo/javaclients/cluster for demos.
1 | Mapping of a cluster node ID to a physical xmlBlaster instance | Comments |
---|---|---|
key | <key oid='__sys__cluster.node.info[heron]'> <__sys__cluster.node.info/> </key> | - |
content | <clusternode id='heron' maxConnections='800'> <connect> <qos> <address type='IOR'>IOR:00044550005...</address> <address type='XMLRPC' maxConnections='20'> http://www.mars.edu/RPC2 </address> <callback type='XMLRPC'>http://www.mars.universe:8081/RPC2</callback> <backupnode> <clusternode id='avalon'/> <!-- first failover node --> <clusternode id='golan'/> <!-- second backup node --> </backupnode> <nameservice>true</nameservice> </qos> </connect> <disconnect/> </clusternode> | The connect tag contains a ConnectQos markup as described in the interface.connect requirement. The backupnode setting is currently not implemented. The disconnect markup can be used to force a disconnect on cluster node shutdown; usually you won't set this, to keep the connection alive in the remote server (to be able to collect messages during our shutdown). |
2 | Determine the master: Mapping of messages to cluster node IDs. See NodeDomainInfo.java and plugin DomainToMaster.java | Comments |
---|---|---|
2a) key | <key oid='__sys__cluster.node.master[heron]'> <__sys__cluster.node.master/> </key> | - |
content | // This is a master for domainless messages and // for football and rugby <clusternode id='heron'> <master stratum='0' acceptOtherDefault='true'> <key queryType='DOMAIN' domain='football'/> <key queryType='DOMAIN' domain='rugby'/> </master> </clusternode> | This cluster node is the master of the domains 'football' and 'rugby'. Messages without a domain specified are treated locally as well. |
2b) key | <key oid='__sys__cluster.node.master[frodo]'> <__sys__cluster.node.master/> </key> | - |
content | // frodo is a slave for everything <clusternode id='frodo'> <master stratum='0' acceptDefault='false'/> <!-- forward empty domains --> ... // heron is master for everything (domain '*') cluster.node[heron]=\ <clusternode id='heron'>\ <connect><qos>\ <address type='IOR' bootstrapHostname='' bootstrapPort='7600'/>\ </qos></connect>\ <master type='DomainToMaster'>\ <key queryType='DOMAIN' domain='*'/>\ </master>\ </clusternode> | Messages without a domain specified are normally handled by the local xmlBlaster node; here this is switched off. This cluster node is the master for all Pub/Sub messages because of the wildcard '*' setting. |
2c) key | <key oid='__sys__cluster.node.master[bilbo]'> <__sys__cluster.node.master/> </key> | - |
content | // Bilbo is master of RECIPIES and local clients, // but slave for everything else <clusternode id='bilbo'> <master stratum='0'> <key queryType='DOMAIN' domain=''/> <key queryType='DOMAIN' domain='RECIPIES'/> </master> // refid points to a node one stratum closer to the master <master stratum='2' refid='frodo'/> </clusternode> | Bilbo is a slave of a slave for heron messages. Therefore it has stratum = 2 (two steps from the master). It only knows frodo, its direct parent node. |
2d) key | <key oid='__sys__cluster.node.master[heron]'> <__sys__cluster.node.master/> </key> | - |
content | // The master is determined in a generic way // (no explicit domain) <clusternode id='heron'> <master> <key queryType='EXACT' oid='radar.track'/> <key queryType='XPATH'> //STOCK_EXCHANGE </key> <filter type='ContentLength'> <!-- Use your I_AccessFilter plugin --> 8000 <!-- Msg contents smaller than 8000 bytes only --> </filter> </master> </clusternode> | Approach without domains. Every message is filtered with the given rules. If one of the rules matches, we are the master of this message. |
2e) key | <key oid='__sys__cluster.node.master[heron]'> <__sys__cluster.node.master/> </key> | - |
content | // The master is determined with a custom plugin // (no explicit domain) <clusternode id='heron'> <master> Java plugin (implements I_MapMsgToMasterId) </master> </clusternode> | Approach without domains. Every message is filtered by a user-supplied plugin. The plugin looks into the message key, content, or QoS and decides who is the master. |
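The generic rule-and-filter selection of example 2d can be modeled as a list of predicates plus a content-length check. The following Python sketch is illustrative only; it is not xmlBlaster's query engine, and the rules are plain Python functions rather than EXACT/XPATH query syntax.

```python
# Illustrative sketch of rule-based master selection (example 2d):
# a node is master of a message if any key rule matches and the
# ContentLength-style filter passes (content smaller than the limit).
rules = [
    lambda msg: msg["oid"] == "radar.track",         # EXACT-like match
    lambda msg: "STOCK_EXCHANGE" in msg["key_xml"],  # XPATH-like match
]

def is_master(msg: dict, max_content_len: int = 8000) -> bool:
    if len(msg["content"]) >= max_content_len:       # filter: too large
        return False
    return any(rule(msg) for rule in rules)
```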
A message can specify its domain as a key attribute:
<key oid='football.49.results' domain='football'/>
3 | The current status of a cluster node |
---|---|
key | <key oid='__sys__cluster.node.state[heron]'> <__sys__cluster.node.state/> </key> |
content | <clusternode id='heron'> <state> <cpu id='0' idle='40'/> <!-- currently 60% load on first CPU --> <cpu id='1' idle='44'/> <ram free='12000'/> <!-- xmlBlaster server has 12 MB free memory --> <performance bogomips='1205.86' idleIndex='20'/> </state> </clusternode> |
- | Quality of Service (QoS) of a published message traversing a cluster | Comments |
---|---|---|
qos | <qos> <sender>joe</sender> <route> <node id='bilbo' stratum='2' timestamp='34460239640'/> <node id='frodo' stratum='1' timestamp='34460239661'/> <node id='heron' stratum='0' timestamp='34460239590'/> </route> </qos> | A message published to bilbo found its way over frodo to the master heron. |
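Conceptually, each node a message passes appends its own entry to the route section of the QoS. The following Python sketch models this stamping; it is not xmlBlaster code, and the dictionary representation of a route node is invented for the example.

```python
import time

# Illustrative sketch: each node stamps the <route> section of a
# message QoS while forwarding it towards the master (compare the
# QoS example above; this is a model, not xmlBlaster code).
def stamp_route(route: list, node_id: str, stratum: int) -> list:
    route.append({"id": node_id, "stratum": stratum,
                  "timestamp": time.time_ns()})
    return route

route = []
stamp_route(route, "bilbo", 2)   # the publishing client's node
stamp_route(route, "frodo", 1)   # intermediate slave
stamp_route(route, "heron", 0)   # master
```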
This shows the syntax of the configuration possibilities more completely:
<clusternode id='heron.mycomp.com'> <connect><qos> <address type='IOR'> IOR:09456087000 </address> <address type='XMLRPC'> http://www.mycomp.com/XMLRPC/ </address> <callback type='RMI'> rmi://mycomp.com </callback> </qos></connect> <master type='DomainToMaster' version='0.9'> <![CDATA[ <key domain='RUGBY'/> <key type='XPATH'>//STOCK</key> ]]> </master> <master stratum='1' refId='frodo' type='MyOwnMapperPlugin' version='2.0'> <![CDATA[My own rule]]> </master> <state> <cpu id='0' idle='40'/> <cpu id='1' idle='44'/> <ram free='12000'/> </state> </clusternode>

If everything is OK, the return QoS value of a published message is, as usual:
<qos><state id='OK'/></qos>

If the message can't be forwarded to the master node, it is tailed back by your local xmlBlaster node and flushed on reconnect to the master. The publish return QoS indicates this situation with a "FORWARD_WARNING" response:
<qos><state id='FORWARD_WARNING'/></qos>
These parameters allow you to configure the cluster behavior.
The cluster manager is activated in the xmlBlasterPlugins.xml file; take care to activate the protocol plugins you want to use for inter-cluster communication in an earlier run-level.
<plugin id='cluster' className='org.xmlBlaster.engine.cluster.ClusterManager'> <action do='LOAD' onStartupRunlevel='5' sequence='5' /> <action do='STOP' onShutdownRunlevel='4' sequence='5'/> </plugin>
The properties can be set on the command line, in the xmlBlaster.properties file, or dynamically via messages.
Property | Default / Example | Description | Implemented |
---|---|---|---|
cluster.node.id | 167.92.1.4:7607 or heron.mycomp.com | The world-wide unique name of this xmlBlaster instance (= cluster node ID); if not specified, it defaults to the unique listen address of one of your activated protocol drivers. If you specify the name yourself, you should use a unique name like heron.mycompany.com | |
cluster.loadBalancer.type | RoundRobin | Specifies which load balance plugin to use (see xmlBlaster.properties) | |
cluster.loadBalancer.version | 1.0 | The plugin version to use | |
cluster.node.info[heron] | <clusternode id='heron'> <connect><qos> <address type='SOCKET'> 192.168.1.2:7607 </address> </qos></connect> </clusternode> | Configures how to access heron; replace the node name in the brackets with your specific xmlBlaster node. NOTE: This setting can be overwritten by __sys__cluster.node.info[heron] messages. | |
cluster.node.master[heron] | <clusternode id='heron'> <master type='DomainToMaster'> <![CDATA[ <key domain='RUGBY_NEWS'/> ]]> </master> </clusternode> | Configures for which message types heron is the master node. NOTE: This setting can be overwritten by __sys__cluster.node.master[heron] messages. | |
cluster.node[heron] | <clusternode id='heron'> <connect><qos> <address type='SOCKET'> 192.168.1.2:7607 </address> </qos></connect> <master type='DomainToMaster'> <![CDATA[ <key domain='RUGBY_NEWS'/> ]]> </master> </clusternode> | The combination of cluster.node.info[...] and cluster.node.master[...], allowing a more compact configuration. | |
pingInterval[heron] ... | -pingInterval 2000 -pingInterval[frodo] 1000 | All client connection configuration settings are adjustable. Try java HelloWorld3 -help for a list of current options. Here we show the ping interval as an example: the time between pings to another node, in milliseconds. A node-specific pingInterval[frodo] has precedence over the general pingInterval setting. This way you could tell xmlBlaster to ping its partner nodes every 2 seconds (pingInterval=2000) but to ping frodo more often (pingInterval=1000). | |
passwd[bilbo] | secret | Allows setting the password for cluster node bilbo. Bilbo uses this password when it logs in to another xmlBlaster node. You can't change the loginName of a cluster node: every cluster node logs in to remote nodes with its cluster node ID as the loginName. | |
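The per-node override rule for properties like pingInterval can be sketched as a two-step lookup. This Python fragment is a model of the precedence rule only, not xmlBlaster's property handling.

```python
# Illustrative sketch of the per-node override rule: a node-specific
# property like pingInterval[frodo] takes precedence over the general
# pingInterval setting (property names follow the table above).
props = {"pingInterval": 2000, "pingInterval[frodo]": 1000}

def ping_interval(node: str) -> int:
    return props.get("pingInterval[%s]" % node, props["pingInterval"])
```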