From http://www.theserverside.com/home/thread.jsp?thread_id=18314 (thanks to edesouza@jamcracker.com)

Read:
http://www.research.ibm.com/AEM/d-spheres.html
http://otn.oracle.com/tech/java/architect/distributed_transactions.html

INTERVIEW:

1. Tell us about yourself.

- My name is Larry Jacobs. I'm the Director of Development of Oracle's Web Cache. Prior to that I worked on transaction processing for nearly a decade, on the two-phase commit protocols used to ensure that distributed transactions reach consistent outcomes as they're spread across many systems or across large areas. After that I came to Oracle to work on transaction models for Oracle's application server, and then I noticed that Web sites were being built with disproportionately large data centers: huge, big-iron sites with large pipes coming in to support the users. I thought there must be some better way to deliver this content without requiring all that hardware, so I started the Web Cache group, and that's what I run today.

2. Many members of TheServerSide.com have heterogeneous applications, and we've heard of Two-Phase Commit (2PC) and transactional messaging as potential solutions for interoperability between heterogeneous applications that are transactional in nature. What is 2PC? How does that help interoperability?

- OK, good question. Two-phase commit is a protocol where a transaction co-ordinator controls multiple resources (resources are things like databases) and makes sure that all the resources participating in a transaction either commit together or abort together. That is, the distributed work of the transaction is performed as an atom. The co-ordinator is responsible for making sure that all the participating databases are prepared to either commit or abort when instructed to do so. Two-phase commit, as its name implies, has two phases. In the first phase, the co-ordinator asks all the participants to prepare, meaning each must log the effect of the transaction (here is the old value, here is the new value) in reliable, persistent storage, and that record must survive almost any failure: a disk failure, a system crash, a network partition. When the co-ordinator later delivers the outcome, the participant must either make the changes permanent (commit) or go through and undo all the changes (abort). The problem it's addressing is this: if I have a transaction that updates a database over there and a database over here, and at the end of the day the two sides didn't both commit or both abort, you wind up destroying data or creating erroneous data. For example, if a bank transaction said "remove a hundred dollars from the New York bank and add a hundred dollars to the San Francisco bank," and the San Francisco side committed while the New York side aborted, you've just created a hundred dollars that didn't exist before. Two-phase commit is the mechanism that's been used for decades in distributed transaction processing and in TP monitors, the predecessors to application servers.
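To make the protocol concrete, here is a minimal sketch of the co-ordinator's side of that dance, in Java. The Participant interface and its methods are invented for illustration (real systems use interfaces like XA's prepare/commit/rollback); crash recovery and timeouts are omitted:

    import java.util.List;

    // Hypothetical participant; each prepare() must durably log the old and
    // new values before voting yes.
    interface Participant {
        boolean prepare();   // phase 1: log undo/redo records, vote yes or no
        void commit();       // phase 2: make the prepared changes permanent
        void rollback();     // phase 2: undo the prepared changes
    }

    class Coordinator {
        // Commit only if every participant voted yes; otherwise abort everywhere.
        static void runTwoPhaseCommit(List<Participant> participants) {
            boolean allPrepared = true;
            for (Participant p : participants) {
                if (!p.prepare()) { allPrepared = false; break; }
            }
            // A real co-ordinator force-logs its decision here, so the outcome
            // survives a crash between the two phases.
            for (Participant p : participants) {
                if (allPrepared) p.commit(); else p.rollback();
            }
        }
    }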
3. What is transactional messaging? How does it help interoperability?

- Transactional messaging is different from two-phase commit in that, in two-phase commit, all of the participants are active in one transaction: all the databases and all the application servers involved are working on that transaction at the same time. Transactional messaging lets us decouple remote systems, so you instead have a collection of local transactions tied together with messages. In the funds-transfer example from New York to San Francisco, you could remove the hundred dollars from the New York account and put a message on a queue as one transaction. In a separate transaction, the messaging system delivers that message from New York to San Francisco. And in a third, independent transaction, you take the message off the queue and add the money to the account in San Francisco. The nice thing about this is that the transactions run independently: if there's an outage along the way, a network partition or a service going down, the other systems can continue to function independently.

4. When would you use each one: 2PC transactions vs. transactional messaging?

- Good question. Having spent ten years working on the two-phase commit protocol, I believe transactional messaging is a much more flexible way of implementing distributed transactions when you have to make updates on different systems. The reason is that two-phase commit is all about dealing with failures: what happens if the server goes down? How do I make sure I get a consistent outcome? But the way it's designed, it's synchronous: if you take any piece out, you lose availability of your system. You get global consistency, but the price is availability. Transactional messaging gives you the ability to operate through failures: if one part fails, the others can continue operating. So I almost always recommend transactional messaging, because it gives you much better availability of the system; everybody can continue operating independently, whereas with 2PC losing one system can take down the whole system. Imagine every server is up and running nine-tenths of the time. Every server you add to a distributed transaction reduces the availability of the system as a whole, because the availability of the whole is the product of the availability of all the servers. With 10 servers it's 0.9 to the 10th power, about 0.35, so the system as a whole is available only about a third of the time; you can see it drastically reduces the availability of a system. With transactional messaging everything is independent; it works much better. When do you have to use two-phase commit? When you're stuck: you have this old legacy system speaking some old protocol like LU 6.2, that's the only thing it can talk, you need to get data in and out of that system and another system and keep them consistent, and you have no choice. Then you should use two-phase commit co-ordination, and that's why almost all application servers and databases today support two-phase commit.
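To make the decomposition concrete, here is a minimal sketch of the New York side using the JMS point-to-point API. The JNDI names, queue name, and debitAccount helper are invented for illustration; note that the debit and the enqueue only truly form one atomic transaction if an XA co-ordinator ties them together or the queue lives in the same database as the accounts, a point the interview returns to in question 6:

    import javax.jms.*;
    import javax.naming.InitialContext;

    public class TransferSender {
        public static void main(String[] args) throws Exception {
            InitialContext jndi = new InitialContext();
            QueueConnectionFactory factory =
                (QueueConnectionFactory) jndi.lookup("jms/ConnectionFactory"); // invented name
            QueueConnection conn = factory.createQueueConnection();

            // Transacted session: the send is not visible until session.commit().
            QueueSession session = conn.createQueueSession(true, Session.AUTO_ACKNOWLEDGE);
            Queue queue = (Queue) jndi.lookup("jms/Transfers");                // invented name
            QueueSender sender = session.createSender(queue);

            debitAccount("NY-12345", 10000);   // meant to be atomic with the send; see Q6
            sender.send(session.createTextMessage("credit SF-67890 10000"));
            session.commit();                  // local transaction 1 of 3 completes here

            conn.close();
        }

        // Placeholder for the New York database update.
        static void debitAccount(String account, int cents) { /* JDBC update here */ }
    }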
5. We've heard rumors that because of the overhead involved with messaging, in that you have message brokers, it can actually be slower than doing simple ACID transactions. What do you think about that?

- Yeah, good question. Is it really faster to take the data and put it on a message queue, then take it from that queue and push it to another queue, and finally take it off the other queue and apply it to the database? That seems more expensive computationally than just taking the data from one database and sticking it in the other. The answer comes from all of the work needed to implement the two-phase commit. If you're doing local transactions, you just tell the local resource manager, say the database, to do the work. With distributed transactions, you go through this dance, this two-phase commit dance. You tell everybody to prepare; they have to log records to their persistent store, the file system, usually a RAID file system, because you don't want one disk failure to affect the availability of your data. You have to wait for that log flush to complete, then send the prepare response back to the co-ordinator; then the co-ordinator has to log a record and wait for that, and only then can the co-ordinator tell the participants the outcome. So at the end of the day it actually takes more cycles to do the distributed transaction than to do the queuing, because the queuing can be very efficient.

6. Do you have any recommended best practices developers should consider when doing messaging?

- When messaging, I think it's important to minimize the distribution. You want to keep all of your data on as few resource managers as possible. For example, you could put a message broker in each local site, say the New York and San Francisco offices in the banking example. That eliminates a two-phase commit spanning the country, which would subject the transaction to availability loss whenever there's an outage between New York and San Francisco. But if you put a message broker next to the New York database and another next to the San Francisco database, you're still doing a two-phase commit between each database and its local message broker. Ideally, if you can get your queuing in the same resource manager as your data, that is, use queuing provided by the database holding your schema and your tables, the transactions become truly local; you don't have to do a two-phase commit between them. One benefit is true point-in-time recovery: if you have some failure in your application and you must restore the system to a precise state at, let's say, 8 a.m., it's one resource manager rolling its logs to bring it right back to 8 a.m. If your message queue and your database are separate resource managers, it's nearly impossible to bring them back to the same point in time. So you get great point-in-time recovery, you get great performance because you're not doing any of the two-phase commit dance between servers, and you still get the independence of the systems by using message brokers to distribute your work across the servers.

7. How does Oracle address this messaging challenge?

- We do this with Advanced Queuing in the Oracle database. Our Java Message Service implementation can use an advanced queue as its repository, so all the data that gets queued and shipped around is stored in the Oracle database. When we commit the transaction, it commits just within the Oracle database: you get the ability to distribute your work, but all the transactions commit locally.
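A minimal JDBC sketch of that advice, with an invented outbound_messages table standing in for a database-resident queue such as Oracle AQ: the debit and the enqueue share one local transaction, so no two-phase commit is needed:

    import java.sql.*;

    public class LocalDebitAndEnqueue {
        // One local transaction against one database: debit the account and
        // enqueue the transfer message under the same commit. Table and column
        // names are invented for illustration.
        public static void transferOut(Connection db, String account, int cents)
                throws SQLException {
            db.setAutoCommit(false);
            try (PreparedStatement debit = db.prepareStatement(
                     "UPDATE accounts SET balance = balance - ? WHERE id = ?");
                 PreparedStatement enqueue = db.prepareStatement(
                     "INSERT INTO outbound_messages (account, amount) VALUES (?, ?)")) {
                debit.setInt(1, cents);
                debit.setString(2, account);
                debit.executeUpdate();
                enqueue.setString(1, account);
                enqueue.setInt(2, cents);
                enqueue.executeUpdate();
                db.commit();   // both changes, or neither -- no 2PC dance
            } catch (SQLException e) {
                db.rollback();
                throw e;
            }
        }
    }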
8. If your background is transactions, why did you go into caching?

- Good question. Because caching is at the opposite end of the technology spectrum from two-phase commit. Two-phase commit is all about spending a tremendous amount of effort making sure one little thing comes through; caching is all about "hey, if you can cache this, great, and if you can't, so be it." There's no recovery; it's a much simpler infrastructure. Why did I look at doing this? Because I saw so many people building these large data centers to serve simple content. They would wind up with huge infrastructure so that, on the fly, when a request comes in, they could run application logic in a J2EE container, open database connections, and do queries against a large database to present the content. I saw that so much of this was repetitive, and I thought that if we could just cache it in memory, we should be able to deliver it much faster. Oracle Web Cache is an in-memory cache, and we measure our throughput by how many cycles it takes to deliver a page. It's only a few hundred microseconds to parse an HTTP request, do a lookup in memory, and ship the response out. If instead you have to go into the JVM, instantiate a bunch of classes, and open database connections, you're looking at two, three, or four orders of magnitude greater cost than shipping the page out of the cache. The challenge is knowing what pages to cache, how to cache them, and how to manage the coherence of the cache.

9. Caching sounds great, but a lot of developers and IT shops would say "it's much cheaper and faster to throw more hardware at the problem and not use caching at all." Why spend the extra time, effort, and money on programmers' salaries and so forth when I can spend just a little more money on hardware and solve the problem that way?

- Good question. With caching we're out to solve two problems: latency and scalability. First, scalability: if you want to double the number of users, you could double the amount of hardware, but that adds a lot of complexity to the infrastructure, and as we know, if you have a thousand servers, there are always going to be one, or two, or three servers down at any time. If you had one server, it's most likely operating. The complexity of operating your system is compounded by the number of servers: managing twice the number of servers is more than twice as difficult because of the interdependencies among them.

Ed Roman: "Isn't there also an argument that says the more servers you have, the more reliable your system is, because there's backup for all your machines?"

- Yes and no. If your system is architected so that the servers are all independent and each can always run without the others around, then in some sense you do get improved availability. But if there is any dependency among them, availability is actually reduced, because with interdependency the availability of the system as a whole is the product of the availability of all the systems. So every time you add a system, it brings down the availability of the system as a whole.
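That product rule is easy to check with hypothetical numbers: n interdependent servers, each available 90% of the time.

    public class Availability {
        public static void main(String[] args) {
            double perServer = 0.9;
            for (int n = 1; n <= 10; n++) {
                // availability of the whole = product of the parts
                System.out.printf("%2d servers: %.2f%n", n, Math.pow(perServer, n));
            }
            // at 10 servers the combined system is available only ~35% of the time
        }
    }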
And even if you think your mid-tier servers are independent, there's still management overhead and management infrastructure: there's metadata you keep on all the servers, application logic you deploy to the servers, and data that's cached on the servers. All of those things get more complicated when you have servers coming up and down all the time. So it isn't cheaper to just throw more hardware at it.

10. Can every IT shop benefit from caching? What about those IT shops that have a lot of dynamic content? How do you cache dynamic content?

- I see three different kinds of dynamic content. Content might be truly different for each request, like getting a stock quote or looking at your account balance. It may be dynamic in the sense of a portal, where every user's version of a page is distinct but the content itself is shared. Or it might be dynamic in the sense that a different version is delivered to each user who comes in, maybe a different price for a product depending on what category of customer you are. In the first case, where it's dynamic for each request, we provide a big benefit by caching the boilerplate on the page closer to the user. If you look at a traditional HTML page, it's 20K to 60K of HTML, while the actual content a human reads is only a few hundred bytes; 99% of the HTML is really boilerplate. When we separate that boilerplate from the dynamic content, the request from the client to the server becomes a lot smaller, and if we can get that boilerplate out near the user, they get much faster response times. Now we're getting back towards the half-second response time we want. In the case of dynamically assembled content, like a portal, we've introduced ESI, Edge Side Includes, which we brought to the W3C with Akamai as a standard for doing page assembly. It's a way of demarcating pages so we can keep templates separate from the actual content. Portals are a great use of that: our own portal uses ESI for page assembly, separating each user's template from the shared fragments. By caching those at the edge, we're able to do the assembly for each user independently. You request your page; we have your template cached; we see which fragments you want; we have those cached; we merely assemble them and ship them out. It's not expensive to do in the cache: you're just doing scatter-gather writes on the network interface. So it's a very efficient way to get the data out to the user, and we don't have to go back to the server.
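As a flavor of what ESI page assembly looks like, here is a minimal hypothetical template (the fragment URLs are invented). The cache keeps the boilerplate template and each fragment separately, and assembles the page per request at the edge:

    <html>
      <body>
        <!-- shared boilerplate lives in the cached template itself -->
        <esi:include src="/fragments/header"/>
        <!-- per-user fragment, selected here by a request variable -->
        <esi:include src="/fragments/portfolio?user=$(HTTP_COOKIE{userid})"/>
        <esi:include src="/fragments/footer"/>
      </body>
    </html>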
11. What are some best practices you'd recommend to developers in the area of caching?

- If there's any message I could convey to developers, it would be: think about your performance when you're designing the application. Many people think the objective is just to get something working, and you leave it to the operational people to solve scalability and performance: let them tune the databases, figure out how many servers it takes to run the site, decide where to deploy the systems. But that doesn't always work, and we're building larger and more complicated systems today than we did ten years ago. We're introducing MVC frameworks like Struts and JavaServer Faces, with more abstraction and more ways for different people to separate the layout from the business logic, and we have Enterprise JavaBeans as another abstraction. We have lots of things that make it easier to build applications, but they all introduce overhead. Anything we can do up front to think about what can be cached, and to separate the cacheable content from the dynamic content, will give us not only a faster application but a more scalable one. It can mean supporting an order of magnitude more users with just a little hardware. For example, with Oracle Web Cache we can deliver a page in a few hundred microseconds; you're talking about supporting thousands of concurrent users with a single CPU. It could be a cheap Intel commodity machine supporting thousands of users. That's tremendous value.

12. How do I know if I need any of this caching? As an application developer, how do I know for sure whether my performance is going to be good enough for my end users?

- Good question. I would contend that most people may think they know what their performance is, but they really don't. A lot of internet companies use services that measure a particular URL from, say, 35 points around the world and tell you how long that URL took to be delivered to those 35 servers. That's nice, but your customers don't live in the data centers where those 35 servers are running. They live wherever they happen to live, with whatever network bandwidth runs to their house and however many hops away they are. I would contend that most sites have no idea what their users are really experiencing, because they aren't there at every site. This is a new problem we introduced with the World Wide Web: we can deliver content to everyone around the world, but we don't really know what they're experiencing. What we've introduced, as part of the new version of Oracle Enterprise Manager, is a way of actually measuring the delivery of every page to every user. From the moment somebody clicks on a link, we measure until the last image is rendered on the page. That data gets sent back to the log in our application server and rolled up in a data warehouse. We can then look at every URL and ask: for this URL, over the past day, week, or month, what was the average delivery time? What were the 100 slowest deliveries? We can look by region of the world, by IP address, by collection of URLs; there are so many ways of looking at it. When we started doing this at Oracle for our own web site, we thought we were delivering our home page in half a second, as measured by the popular services companies use to measure their performance. But when we looked at the actual deliveries, we saw a large number of users in Russia seeing half-minute load times for some of these pages. So we were able to go back to our Russian content developers and start trimming the content for the Russian users, and we can put our Web caches over in Europe to cut down the latency of delivering the content there. It's not until you really understand the problem that you can go about fixing it, and that's why it's so important to understand what your users are experiencing. A measurement service isn't necessarily the solution, because it only measures a single URL to a single site; it doesn't measure what real users are requesting and what they really experience.

13. Do you have any final words for members of TheServerSide? Any final piece of advice?
- Yes: think about caching up front. Think about what you can get into a cache and how you're going to utilize caching, whether it's caching in the Web cache, caching fragments in your JSPs, or caching relational data that you access often but that doesn't change much, like price lists. You can utilize various forms of caching within the J2EE container. Any time you can take something that is constructed once and delivered repeatedly, cache it and avoid the subsequent reconstruction; that's a big win in performance. We think a Web cache living out in front of the whole thing provides the greatest benefit, because you eliminate the request into the Apache server, the request into the J2EE container, the instantiation of classes, the opening of database connections, and the queries on the database. Any time you can eliminate that much execution, you bring tremendous value in scalability and performance of the site, bringing user response time down and supporting more users with less hardware.

===============================================================

2PC all over the place
Message #76377 Posted By: Mike Spille on March 12, 2003 @ 07:12 PM in response to Message #76377.

\Muntean\ The story was about a TX that should span 10 DB servers and the fact that if one server is down the TX will abort. What is the asynchronous architecture that can model a TX that has (as a business requirement) to span 10 DB servers? \Muntean\

Well, if you use a messaging model that features guaranteed messaging in some form, then you can do it. At the message injection and reception ends you may still need XA, but for the distribution of messages (e.g. getting to all the reception ends) you rely on the guaranteed-messaging aspect. BUT - and this is a big but - you have to have an application that can live with data sources being slightly out of sync. And you have to live with an application that can either:
- get good performance by dealing with out-of-order messages, or
- force message order at a significant hit to performance.

\Horia\ What about parallel processing? (All the examples were about one operator that moves money from NY to San Francisco.) You have to do business separation if you want to maintain performance and use messaging to "distribute" transactions. (How can you use messaging when 1,000 operators are moving money independently?) \Horia\

This is an excellent point, and I'm sorry I didn't point it out myself :-) Assuming you're using a system like JMS queues to get messages point-to-point to the right place, you can get parallelism/load balancing by having lots of listeners on the queue. The problem, of course, is the ordering problem I mentioned previously. In order to get good performance, a queue-based system is going to spread the queue messages over all the listeners - and this means you lose message ordering. Even if you have only one listener, in the case of a failure it's common for messages to arrive out of order. To do otherwise, once again, entails a big performance hit in normal-path message delivery. Why does this matter? Well, imagine an equity trading system that's sending out these messages:
- Create trade
- Cancel trade (trader screwed up!)
If you have N queue listeners, it's entirely possible that the cancel will get processed first, followed by the create. So you get an error on the cancel, the create goes through, and you've got bad data in your database. There are, of course, ways around this. But they incur either a lot of application complexity or a big performance hit to guarantee message order.
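One common workaround, sketched minimally with invented names: tag each trade's messages with a sequence number and have listeners defer any message that arrives ahead of its predecessor. This trades memory and latency for order, and ignores lost-message timeouts:

    import java.util.*;

    // Hypothetical guard against out-of-order processing: each trade's messages
    // carry a sequence number; a listener defers any message whose predecessor
    // has not been processed yet.
    class OrderingGuard {
        private final Map<String, Integer> lastSeen = new HashMap<>();
        private final Map<String, SortedMap<Integer, String>> deferred = new HashMap<>();

        // Returns the messages now safe to process, in order.
        synchronized List<String> accept(String tradeId, int seq, String payload) {
            List<String> ready = new ArrayList<>();
            int expected = lastSeen.getOrDefault(tradeId, 0) + 1;
            if (seq != expected) {
                // Arrived early (e.g. "cancel" before "create"): hold it back.
                deferred.computeIfAbsent(tradeId, k -> new TreeMap<>()).put(seq, payload);
                return ready;
            }
            ready.add(payload);
            lastSeen.put(tradeId, seq);
            // Release any deferred successors that are now in order.
            SortedMap<Integer, String> held = deferred.get(tradeId);
            while (held != null && !held.isEmpty()
                    && held.firstKey() == lastSeen.get(tradeId) + 1) {
                int next = held.firstKey();
                ready.add(held.remove(next));
                lastSeen.put(tradeId, next);
            }
            return ready;
        }
    }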
\Muntean\ I think that the 2 processing models have their own benefits and should be used with care. \Muntean\

I agree. And I'll go further and say that the two models are fundamentally different in enough ways that the choice is going to have a big impact on your architecture. 2PC does imply a performance hit (I don't quite buy the HA hit in the article), but async messaging is _not_ a drop-in replacement that fixes all of 2PC's ills without any downside. They're two very distinct ways of passing data around in a distributed system.

As a complete side note - I wish the keepers of the J2EE transactional spec would go back and reconsider asynchronous XA resources. The biggest hit in J2EE XA processing is the synchronous, serial nature of the spec. In a typical system with 2 XA resources, 6 very expensive forces-to-disk happen serially: 1 for the tran manager's 2PC start, 1 prepare per resource, 1 commit per resource, and 1 "done" for the tran manager. If JTA allowed asynchronous resource management, then the XAResource prepare() and commit() calls could be done in parallel (e.g. issue prepare() to all XAResources in a transaction without waiting for a result, then wait for the replies to come back, and do the same on commit()). This can result in a 20%-40% performance improvement, and it scales much better to many XAResources. For those interested, look at the original C-based X/Open spec and its async support. J2EE could benefit from such a model, particularly as 2PC transaction processing becomes more common and higher levels of integration are sought on many projects.
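A sketch of what that parallel prepare phase could look like. To be clear, this is not something a compliant JTA transaction manager may do today; it only illustrates the hypothetical asynchronous model, with error handling and the forced decision log omitted:

    import javax.transaction.xa.XAResource;
    import javax.transaction.xa.Xid;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.*;

    // Hypothetical async phase 1: issue prepare() to every XAResource
    // concurrently, then gather the votes, instead of calling them serially.
    class ParallelPreparer {
        static boolean prepareAll(List<XAResource> resources, Xid xid,
                                  ExecutorService pool) throws Exception {
            List<Future<Integer>> votes = new ArrayList<>();
            for (XAResource r : resources) {
                votes.add(pool.submit(() -> r.prepare(xid)));  // fire off all prepares
            }
            boolean allOk = true;
            for (Future<Integer> vote : votes) {
                allOk &= (vote.get() == XAResource.XA_OK);     // then collect the votes
            }
            return allOk;   // commit phase could be parallelized the same way
        }
    }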
-Mike

===============================================================

I'm in principle not against (transactional) messaging, but what Larry said about two-phase commit (2PC) and transactional messaging is not correct. When talking about 2PC, he mentions storing logs to a persistent store, a RAID system to make it more fail-safe, sending prepare and commit requests and their responses, etc. On the other side, when speaking about messaging, he just speaks about sending messages. I find it hard to believe Larry doesn't know it is not so easy.

First, to have messaging with the same level of data stability as distributed atomic transactions, you of course have to use the same means: logging every sent message, keeping persistent message queues, etc. The high-level abstractions differ, but the low-level tools for ensuring reliability remain the same. In the example Larry gives, if the bank in San Francisco doesn't know the requested account number, a message has to be sent back to New York to cancel the banking transaction. In other words, you have to perform the same "two-phase commit dance" as in true 2PC, just in terms of messages.

Second, with messaging as Larry describes it, you don't have the same level of data stability at all. If you commit your local transaction in New York before the message is received in San Francisco, you run the risk of having to compensate your transaction in New York, and compensation sometimes cannot be completed (e.g., there is no longer enough money in the account). If you have multiple banking transactions at the same time, it's also not as simple as sending a message from one bank to the other: there is, for example, a risk of a negative account balance when executing multiple withdrawal operations against the same account.

So, please, do not say A without saying B; don't provide us with "simple solutions" by hiding lots of details; don't say transaction monitors need RAID systems and reliable logging while transactional messaging systems do not. Otherwise your interview looks like a paid advertisement for Oracle products.

Marek Prochazka
ObjectWeb/JOTM
INRIA Rhone-Alpes

==============================================================

Some messaging performance problems stem from the J2EE specs
Message #76448 Posted By: Mike Spille on March 13, 2003 @ 09:15 AM in response to Message #76440.

\Purdy\ On the messaging side, the reason that so many messaging implementations are slow is because they use the database. Even MQ Series uses DB2 inside it. \Purdy\

While this is true, it's not the only slow aspect when you consider JMS implementations. For example, if you're publishing under XA, then the JMS implementation must log its prepare and "finished" actions to disk in a disk-forcing manner. By disk-forcing I mean that you're not just doing I/O; you must force the data all the way out to disk. This is typically done with a rolling-forward transaction log. On a plain old disk this forcing can take 10-20 milliseconds. Adding in miscellaneous overhead, each XAResource in the transaction is going to add 80-100 milliseconds to the transaction length at a minimum (large objects on the JMS side and complex transactions on the RDBMS side can obviously increase this). In the end, a naive JMS implementation with a simple transaction log is going to max out around 30 transactions/second on a regular disk. You can perhaps double this with a fast RAID array. Techniques like batching disk-forces across multiple transactions can be used to amortize the disk-forcing cost over many transactions and boost you up to the 150-200 TPS range, at the expense of slightly lengthening the average transaction time. Even so, these numbers aren't great, and the JMS server spends most of its time waiting for disk I/Os to complete. The problem is exacerbated on the transaction manager side - it spends most of its time both waiting for disk I/O and waiting for the XAResources to return from their prepare or commit calls. About the only thing that can be done in these cases to speed things up is to buy very expensive disk arrays that force to memory cache.

As more and more applications start using JMS, their JMS calls are invariably going to get tied into RDBMS work - and hence we get stuck in the two-phase commit cycle. This normally wouldn't be so terrible, but the JTA spec forces us into serial, synchronous access to the XA resources by the transaction manager. The JMS acking model also slows things down significantly - it's a shame they don't allow message gap-detection as an alternative model, which is much faster (but harder to implement). To give you an example: in the work I've done on a JMS product, I can pump out 600 messages/second per JMS server on a 4-way HP-UX machine. If I eliminate the acking model, that tips over 1,000 messages/second. But involve the same JMS server in XA transactions, and the messaging rate drops to about 160 messages/second - and that rate was only achieved through very aggressive optimization of the transaction logs. Meanwhile, a single app server/tran manager physically cannot drive that rate by itself, because it's spending the bulk of its time waiting on its own tran-log forces or on XA resource calls to return. I need 3 app server processes to drive the JMS provider adequately. So - while using an RDBMS under the covers will certainly contribute to performance problems in JMS implementations, the JMS and JTA specs force even more problems. If the spec were changed to allow alternate implementations of guaranteed messaging, and to allow for asynchronous dispatch to XAResources, then existing JMS implementations combined with a global transaction manager could easily get 3x-5x gains in their publishing rates.
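The disk-force batching mentioned above is essentially group commit. A minimal sketch, with invented names and the actual file I/O elided: committing transactions append a record and block, while one flusher thread pays a single disk force for each whole batch:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical group-commit log: many transactions share one disk force.
    class GroupCommitLog {
        private final List<byte[]> pending = new ArrayList<>();
        private final Object lock = new Object();
        private long appended = 0;      // records handed to the log
        private long flushedUpTo = 0;   // records made durable so far

        GroupCommitLog() {
            Thread flusher = new Thread(this::flushLoop);
            flusher.setDaemon(true);
            flusher.start();
        }

        // Called by each committing transaction; returns once the record is durable.
        void append(byte[] record) throws InterruptedException {
            synchronized (lock) {
                pending.add(record);
                long myIndex = ++appended;
                lock.notifyAll();
                while (flushedUpTo < myIndex) {
                    lock.wait();   // sleep until a batch flush covers this record
                }
            }
        }

        private void flushLoop() {
            while (true) {
                List<byte[]> batch;
                long upTo;
                synchronized (lock) {
                    while (pending.isEmpty()) {
                        try { lock.wait(); } catch (InterruptedException e) { return; }
                    }
                    batch = new ArrayList<>(pending);
                    pending.clear();
                    upTo = appended;
                }
                forceToDisk(batch);   // one expensive force covers the whole batch
                synchronized (lock) {
                    flushedUpTo = upTo;
                    lock.notifyAll(); // wake every transaction the batch covered
                }
            }
        }

        private void forceToDisk(List<byte[]> batch) {
            // a real log writes the batch and calls FileChannel.force(true) here
        }
    }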
===============================================================

Combining 2PC and transactional messaging
Message #76480 Posted By: Stefan Tai on March 13, 2003 @ 02:59 PM in response to Message #76377.

You may be interested to read about some research work on "Conditional Messaging" and "Dependency-Spheres", which allows combining standard 2PC transactions with transactional messaging in a single atomic unit of work. http://www.research.ibm.com/AEM/d-spheres.html

Stefan

===============================================================

Combining 2PC and transactional messaging
Message #76483 Posted By: Mike Spille on March 13, 2003 @ 04:04 PM in response to Message #76480.

\Tai\ You may be interested to read about some research work on "Conditional Messaging" and "Dependency-Spheres", which allows combining standard 2PC transactions with transactional messaging in a single atomic unit of work. \Tai\

In the interest of objectivity, perhaps you should point out that you are one of the primary creators of the D-Spheres research :-) I've read about this work several times over the past few months, but unfortunately I haven't had the time to study it in depth. From what I've read, though, I have a few issues...

- Loss of isolation (the I in ACID). I understand D-Spheres needs to pump messages around, and therefore a global transaction can't really be isolated. As a result, pieces of a global transaction are visible before it's fully committed. This can be problematic for some applications.

- Atomicity. There are repeated claims in the D-Spheres papers as to it achieving atomicity, but I think it's overtaxing the word. The overall result of transactions in this model appears to ultimately achieve consistency, but it's stretching things to claim that each global transaction is atomic in any meaningful way.

- Compensating actions. This is related to the above - the middleware at times needs to invoke application-defined compensating actions in the event of certain failures. This really doesn't meet any useful definition of the term "atomic". On top of that nitpick, it puts a heavy burden on application developers: compared to relying on automatic rollback by the XA resources under XA/2PC, in this model the app effectively has to direct a sort of manual rollback operation.

Overall I think there are some good ideas in this research, but I'm doubtful of this or anything like it becoming a widely-implemented standard. Beyond my questions above, the sheer complexity of the model would mitigate against it becoming widely used (people think EJBs are complex - what are they going to think about D-Spheres and all of its subtle implications?).

-Mike

===============================================================

Middleware Arch Series on Distributed Transactions
Message #76486 Posted By: Sudhakar Ramakrishnan on March 13, 2003 @ 04:49 PM in response to Message #76339.

Thought this might interest many on this thread. Larry Jacobs contributed to week 6 of the Middleware Architecture Series on "Distributed Transactions".
He discusses the global consistency promises of two-phase commitment protocols and transactional messaging systems, and how architects can select the right technology for the application. http://otn.oracle.com/tech/java/architect/distributed_transactions.html

Check out the Middleware Architecture Series - Distributed Transactions: What you need to know to stay out of trouble: http://otn.oracle.com/middleware

- Sudhakar Ramakrishnan, Oracle Corp.

==============================================================

Middleware Arch Series on Distributed Transactions
Message #76490 Posted By: Mike Spille on March 13, 2003 @ 05:44 PM in response to Message #76486.

\Ramakrishnan\ Thought this might interest many on this thread. Larry Jacobs contributed to week 6 of the Middleware Architecture Series on "Distributed Transactions". He discusses the global consistency promises of two-phase commitment protocols and transactional messaging systems, and how architects can select the right technology for the application. \Ramakrishnan\

Well, here's what I got out of that article:

Number one point - use Oracle products for everything!!! According to the article, you can avoid 2PC by having the database and the messaging provider be one and the same. Um - this only works if every single database/messaging combination in my system happens to be Oracle. Not bloody likely! If my database and messaging system aren't Oracle, we're back to the in-doubt transaction problem.

The article's bottom line: "In just about every case you and your system will be better off without any distributed two-phase commit". The author lost a tremendous amount of credibility with this one. He's advocating moving systems towards messaging-in-the-middle - and thereby completely losing the "Isolation" part of ACID that you get with true global transactions. Using the example from the article, your New York system will _always_ be out of sync with your San Fran one (because for any given transaction, San Fran will be committed and just sending the message over to NY; NY will commit later when the message comes in).

On point-in-time recovery: almost complete nonsense. In most recovery scenarios, the transaction manager has sufficient information to coordinate recovery. If the tran manager or one of the XA resources becomes "corrupted", as the article says, you're pretty well screwed whether you're using his messaging solution or a 2PC solution. In the end, you've still got two databases, one of them is inconsistent due to "buggy" app code, and some poor schmoe is going to have to manually slog through both sides to figure out what to do.

On availability, the article says "Failure isolation, and therefore availability, distinguishes this message-based system from the earlier 2PC system. The participants can continue executing even when another participant fails....". This makes it sound like the messaging solution is a drop-in replacement for 2PC. It's not. This comes back to isolation, atomicity, and consistency. In a typical 2PC situation, you're doing 2PC because the resources involved must be in sync at all times; to go out of sync is very, very bad. But in the messaging scenario, the author effectively boasts that you can keep doing transactions on database A while database B is down. If you're relying on consistency, this is unbelievably bad - when database B comes back it will be horrendously out of date with database A!
This may not seem like a problem, until you realize that data likely isn't just flowing from A to B, but may be going from B to A as well. If the data flow is bi-directional, then each side keeps applying updates against stale data from the other, and both databases end up inconsistent.

There are some real problems that the author does a good job of pointing out - such as database locks held by in-doubt transactions in the case of a failure, and performance problems. But a responsible author would advocate hardening the hardware for the in-doubt case (make sure your friggin' hardware is highly available!) and offer suggestions on the performance side (memory-caching disk arrays, the possibility of async XA, etc.). To imply that you can drop a messaging solution in place of a 2PC solution is simply wrong.

-Mike

==============================================================

Combining 2PC and transactional messaging
Message #76516 Posted By: Stefan Tai on March 13, 2003 @ 09:04 PM in response to Message #76483.

\Mike\ Loss of isolation (the I in ACID). I understand D-Spheres needs to pump messages around, and therefore a global transaction can't really be isolated. As a result, pieces of a global transaction are visible before it's fully committed. This can be problematic for some applications. \Mike\

That's true; D-Spheres may not be appropriate for some applications. However, there are many applications for which the D-Spheres model and its "relaxed isolation" is applicable, if not desirable. Many business transactions, for example, need to be built using object middleware and messaging middleware in combination (many legacy apps can only be accessed using messaging); a purely synchronous, tightly-coupled JTS transaction simply would not do. Also, the isolation provided by the messaging operations in a D-Sphere is the same as with conventional transactional messaging.

\Mike\ Atomicity. There are repeated claims in the D-Spheres papers as to it achieving atomicity, but I think it's overtaxing the word. The overall result of transactions in this model appears to ultimately achieve consistency, but it's stretching things to claim that each global transaction is atomic in any meaningful way. \Mike\

I do not agree. Atomicity is clearly achieved; a D-Sphere may be longer-running than a simple JTS transaction, but that is only a reflection of many business-transaction scenarios and needs.

\Mike\ Compensating actions. This is related to the above - the middleware at times needs to invoke application-defined compensating actions in the event of certain failures. This really doesn't meet any useful definition of the term "atomic". On top of that nitpick, it puts a heavy burden on application developers: compared to relying on automatic rollback by the XA resources under XA/2PC, in this model the app effectively has to direct a sort of manual rollback operation. \Mike\

D-Spheres combine rollback and compensation techniques; either one can be applied where applicable. If rollback (with the corresponding before-images and locks) is possible, fine; if not (and that is also common practice), then compensation is the choice. D-Spheres do not promote one technique over the other, but support both. And while compensation today is mostly hand-coded and the responsibility of the app developer, a middleware like the D-Spheres system reduces that burden by supporting "guaranteed compensation".
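For readers following the rollback-versus-compensation distinction, a minimal sketch of a compensating action, with invented names: the original debit has already committed, so the undo is a new, semantically inverse transaction, which, unlike a 2PC rollback, can itself fail for business reasons:

    import java.sql.*;

    // A compensating action: the debit committed earlier, so if the remote
    // step ultimately fails, we undo it with a fresh inverse transaction.
    // Note this can fail for business reasons (e.g. the money has since
    // been withdrawn) -- exactly the risk raised earlier in this thread.
    class TransferCompensation {
        static void compensateDebit(Connection db, String account, int cents)
                throws SQLException {
            db.setAutoCommit(false);
            try (PreparedStatement credit = db.prepareStatement(
                     "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
                credit.setInt(1, cents);
                credit.setString(2, account);
                credit.executeUpdate();
                db.commit();   // a forward-running "undo", visible as its own transaction
            } catch (SQLException e) {
                db.rollback();
                throw e;
            }
        }
    }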
\Mike\ Overall I think there are some good ideas in this research, but I'm doubtful of this or anything like it becoming a widely-implemented standard. Beyond my questions above, the sheer complexity of the model would mitigate against it becoming widely used (people think EJBs are complex - what are they going to think about D-Spheres and all of its subtle implications?). \Mike\

Thanks very much for your feedback and opinion; this is very valuable and interesting. Of course, as one of the authors of the D-Spheres technology (as you rightly point out), I am biased. I tend to think that the problem D-Spheres is trying to solve is inherently complex, and that D-Spheres only makes solving it easier. I have yet to see an alternative solution that has comparable features and that would be easier to use.

Stefan