Monday, August 20, 2012

When CICS and DB2 for z/OS Monitors Disagree (Part 2)

About a week ago, in part 1 of this two-part blog entry, I described a situation I've encountered several times at mainframe DB2 sites over the past 20 years: CICS users complain of elongated response times for DB2-accessing transactions, and these users' complaints are backed up by CICS monitor data that indicate elevated "wait for DB2" times as being the cause of the performance degradation. Concurrently, DB2 systems people point to their own data, from a DB2 monitor, showing that in-DB2 time for CICS transactions is NOT significantly higher than normal during times when the CICS programs are running long. Finger-pointing can ensue, and frustration can build in the face of a seemingly intractable problem: how can it be that the CICS monitor shows higher "wait for DB2" times, while according to DB2 monitor data the performance picture looks good? I noted in the aforementioned part 1 blog entry that this situation can occur when there is either a shortage of DB2 threads available for CICS-DB2 transactions or a shortage of TCBs through which the DB2 threads are utilized. In this part 2 entry I'll shine a light on another possibility: the disagreement between CICS and DB2 monitors can be caused by inappropriate priority specifications for DB2 address spaces and/or CICS transactions.

The priority to which I'm referring is dispatching priority -- the kind that determines which of several "in and ready" tasks in a z/OS system will get CPU cycles first. Dispatching priorities for address spaces are specified in the WLM policy of the z/OS LPAR in which they run. The busier a system is (with respect to CPU utilization), the more important it is to get the address space dispatching priorities right. DB2's IRLM address space (the lock manager) should have a very high priority, and in fact should be assigned to the SYSSTC service class (for ultra-high-priority address spaces). Most sites get this right. The mistake that I've seen more than once is giving the DB2 system services and database services address spaces (aka MSTR and DBM1) a priority that is lower than that of the CICS application owning regions (AORs) in the z/OS LPAR. The thinking here is often something like this: "If the DB2 MSTR and DBM1 address spaces have a higher priority than the CICS AORs, they will get in the way of CICS transactions and performance will be the worse for it." Not so. The DB2 "system" address spaces generally consume little in the way of CPU time (DDF can be a different animal in this regard, as I pointed out in an entry posted to this blog last month), so "getting in the way" of CICS regions is not an issue. In fact, it's when CICS regions "get in the way" of the DB2 system address spaces (as they can in a busy system when they have a higher-than-DB2 priority), that CICS transaction performance can go downhill.

How's that? Well, it can come down to a thread-availability issue. It's common for CICS-DB2 threads to be created and terminated with great frequency as transactions start and complete. The DB2 address space that handles thread creation and termination is MSTR, the system services address space. If CICS address spaces are ahead of MSTR in the line for CPU cycles (i.e., if the CICS AORs have a higher priority than MSTR), and if the z/OS LPAR is really busy (think CPU utilization north of 90%), MSTR may not be readily dispatched when it has work to do -- and the work MSTR needs to do may be related to CICS-DB2 thread creation. If DB2 threads aren't made available to CICS transactions in a timely manner, the transactions will take longer to complete, and "wait for DB2" will be seen -- via a CICS monitor -- as the reason for the not-so-good performance. Check DB2 monitor data at such times and you'll likely see that things look fine. This is because DB2 doesn't "see" a CICS transaction until it gets a thread. Thus, as I pointed out in part 1 of this two-part entry, the "wait for DB2" time reported by the CICS monitor can be time spent in-between CICS and DB2, and that's why the different monitors paint different pictures of the same performance scene: the CICS monitor indicates that transactions are waiting longer for DB2, and that's true, but the DB2 monitor shows that when a transaction does get the thread it needs it performs just fine. If you see this kind of situation in your environment, check the dispatching priorities of the DB2 address spaces and of the CICS AORs. The priority of the DB2 MSTR, DBM1, and DDF address spaces should be a little higher than that of the CICS AORs in the LPAR. [Don't sweat giving DDF a relatively high dispatching priority. The vast majority of DDF CPU utilization is associated with the execution of SQL statements that come from network-attached clients, and these execute at the priority of the application processes that issue the SQL statements -- not at the priority of DDF.] Oh, and in addition to ensuring that DB2 MSTR can create threads quickly when they are needed for CICS transaction execution, think about taking some of the thread creation pressure off of MSTR by increasing the rate at which CICS-DB2 threads are reused -- I blogged on this topic a few months ago.

One other thing to check is the priority of the CICS transactions themselves -- more precisely, the priority of the TCBs used for CICS transaction execution. This is specified through the value given to PRIORITY in the definition of a CICS DB2ENTRY resource, or in the DB2CONN definition for transactions that utilize pool threads. The default value of the PRIORITY attribute is HIGH, and this means that the tasks associated with entry threads (or pool threads, as the case may be) will have a dispatching priority that is a little higher than that of the CICS AOR's main task. HIGH is an OK specification if the z/OS LPAR isn't too busy -- it helps to get transactions through the system quickly; however, if the z/OS LPAR is very busy, PRIORITY(HIGH) may lead to a throughput issue -- this because not only the DB2 address spaces, but the CICS AORs, as well, could end up waiting behind transactions to get dispatched. In that case, going with PRIORITY(LOW) could actually improve CICS-DB2 transaction throughput. I have seen this myself. Bear in mind that PRIORITY(LOW) doesn't mean batch low -- it means that the transactions will have a priority that is a little lower than that of the CICS region's main task.

Bottom line: dealing with (or better, preventing) CICS-DB2 performance problems is sometimes just a matter of getting your priorities in order.

No comments:

Post a Comment