Tuesday, December 29, 2015

Thoughts on DB2 for z/OS Index Compression

Not long ago, a DB2 for z/OS DBA requested of me some guidance in planning for his organization's initial use of index compression (a capability introduced with DB2 9 for z/OS). He asked some good questions, and with the thought that my responses might be useful to others in the DB2 community, I offer this blog entry.

First, strive to get good "bang" from your index compression "buck." "Bang for the buck" is an expression here in the USA (and perhaps elsewhere in the English-speaking world) that refers to getting a good return on investment. At this DBA's site, there were many thousands of indexes defined on DB2 tables (in two large production environments and multiple development and test systems). The DBA understandably viewed compression of every last one of these indexes as a daunting prospect. My advice: don't compress all the indexes. Generally speaking, at a DB2 for z/OS site a relatively small number of really big tables will hold the lion's share of data rows, and the indexes defined on these really big tables will be the ones that occupy the large majority of disk space used for indexes. Compress this proportionally small number of indexes, and you'll achieve maximum disk space savings with minimal effort and overhead.

I told the DBA to identify the largest indexes in a given DB2 system (and by "system," I'm referring to the objects for which information can be found in a particular DB2 catalog). He did that with a query similar to this one (the grouping and summing aggregates information pertaining to the partitions of a partitioned index):


We speak sometimes of the "80/20" rule, whereby 80% of results are tied to 20% of things acted upon. In the case of the DB2 systems examined by the DBA with whom I was working, 80/20 would have understated the outcome: in one of the production DB2 environments, 1% of the indexes (the top 1% in terms of number of entries) occupied 85% of the total disk space used for indexes (and results were similar for the other DB2 systems at this site). By compressing a relatively small number of indexes, a large decrease in disk space consumption will be achieved (index compression can often deliver a 50-70% reduction in the disk space occupied by a given index). That's a very good bang for the index compression buck.

Second, choose the right page size for an index that is to be compressed. When a DB2 for s/OS table space is compressed, data is in compressed form in memory as well as on disk (data is compressed when inserted into a page, and decompressed when accessed on behalf of a program), so a 4KB page (for example) of compressed data in memory also occupies 4KB of space on disk.

Not so with index compression. Index pages in memory are always uncompressed -- they are in compressed form only on disk. That being the case, index compression is achieved by squeezing the contents of a 32KB, 16KB, or 8KB index page in memory into a 4KB page on disk. Compressing an index, then, involves two actions: 1) change the index's page size to something larger than 4KB (if the current page size is 4KB) via ALTER INDEX with a BUFFERPOOL specification that references an 8K, 16K, or 32K buffer pool; and 2) change the index again to be compressed via a second ALTER INDEX with a COMPRESS YES specification. [The first ALTER INDEX, specifying an 8K, 16K, or 32K buffer pool, will place the index in AREOR status, indicating a pending DDL change -- a subsequent online REORG of the index will change the index's page size. The second ALTER INDEX, with the COMPRESS YES specification, will place the index in rebuild-pending status.] Knowing this, you might think, "Hey, I'll change the page size for all the indexes I want to compress to 32KB. That way, I'll get an 87% reduction in disk space for these indexes!"

That would not be a good move. Here's why: if a compressed index has 32KB-sized pages, DB2 will stop inserting entries into a page in memory if it determines that the page already holds the maximum amount of information that can be squeezed into a 4KB-sized page on disk; thus, going with 32KB pages for an index could result in space being wasted in the in-memory pages of the index. Does that mean that 8KB would be a better page-size choice for indexes that you want to compress? Not necessarily -- with 8KB-size pages, you might be missing out on disk space consumption reduction that could be achieved with a larger page size. What to do?

Thankfully, DB2 provides a utility, DSN1COMP, that provides a nice assist with respect to choosing the right page size for an index that you want to compress. When you run DSN1COMP for an index, the output will show you, for each page size (8, 16, and 32 KB), the estimated amount of disk space savings that could be achieved with compression, and the estimated amount of space that would be wasted in in-memory pages of the index. For a given index that you want to compress, the right page size will be the one -- per DSN1COMP output -- that will deliver the most disk space savings with the least amount of in-memory space wastage.

Finally, leverage z/OS LPAR memory resources to minimize the CPU overhead of index compression. As noted above, pages of a compressed DB2 for z/OS table space are compressed in memory as well as on disk. Consequently, data compression CPU overhead is incurred when a row is inserted into a table in a compressed table space (or an existing row is updated), and when a row is retrieved from a compressed page in memory. Because pages of compressed indexes are compressed only on disk, index compression CPU overhead is incurred when a compressed page is decompressed upon being read into memory, and when an in-memory index page is compressed as it is written to disk. In other words, the CPU cost of index compression is paid at read and write I/O time -- once an index page is in memory, there is no compression-related cost to access that page, because the page is in memory in uncompressed form.

Since the cost of index compression is associated with index I/O operations, it stands to reason that as these operations decrease, the CPU cost of index compression will go down. And what makes index I/O rates go down? A larger buffer pool. If you compress index XYZ, a subsequent enlargement of the buffer pool to which XYZ is assigned should reduce the rate of I/Os (especially read I/Os) for the index, and that should translate to reduced compression overhead for index XYZ. Where would you see this overhead reduction? That depends on the type of I/Os that are being reduced via buffer pool enlargement. To the extent that the rate of synchronous read I/Os is lowered by way of buffer pool enlargement, the in-DB2 (aka class 2) CPU time for programs using a compressed index should be positively impacted. Reduced prefetch read and database write activity should reduce the CPU consumption of the DB2 database services address space (aka DBM1) -- and recall that prefetch reads and database writes became 100% zIIP eligible starting with DB2 10, so a lower volume of that activity will cut down on associated zIIP engine consumption.

Index compression has been an option for some years now in a DB2 for z/OS environment, but I have the impression that it's a DB2 feature that is not yet used as often as it should be. If you're not yet using index compression, think about putting the capability to work at your site; and, if you decide to compress some of your indexes, do it the right way.

Monday, November 30, 2015


Having recently posted a blog entry on my use of output from the DB2 for z/OS -DISPLAY BUFFERPOOL command, it seems a good time to post a companion entry focused on the -DISPLAY GROUPBUFFERPOOL command. Obviously, this latter command is relevant to people who work with DB2 operating in data sharing mode on a Parallel Sysplex. If you are such a person, read on.

First, enter the command the right way

Actually, you'll want to enter the command in two forms. First, from any one member of the DB2 data sharing group of interest, issue the following:


More specifically, issue that command twice, one hour apart, and retain the output of the second issuance of the command. Why? Because that command's output will show activity, from a group perspective, for each group buffer pool for a one-hour period of time (i.e., for the period of time between the first and second issuances of the command). In the command output you'll see, for each group buffer pool (GBP), a DSNB782I message, the text of which will include the phrase "INCREMENTAL GROUP DETAIL STATISTICS SINCE timestamp-value." You can check that timestamp value to verify that the data in the second issuance of a pair of hour-separated -DISPLAY GROUPBUFFERPOOL command captures one hour of activity.

Consider issuing pairs of hour-separated -DISPLAY GROUPBUFFERPOOL commands, in the form shown above, at two different times of day -- maybe once during a busy online transaction time, and once during a period of heavy batch activity.

The other form of the command you'll want to use is as follows:


Since this command provides member-view information, you might want to issue it for more than one member of the data sharing group. As mentioned above for the GDETAIL form of the command, issue the MDETAIL form, on a given DB2 data sharing group member, twice, with an hour between each command issuance, so as to capture (in the output of the second issuance of the command) one hour of member-scope activity for each GBP (note that for the MDETAIL form of the command, the "STATISTICS SINCE timestamp-value" information is in a DSNB771I message, versus the DSNB782I message associated with the GDETAIL form of the command). Also as mentioned previously, consider issuing pairs of hour-separated commands at different times of the day (maybe heavy online and heavy batch).

If you or one of your colleagues has some REXX programming skills, you can do as some have done and have a program issue pairs of hour-separated -DISPLAY GROUPBUFFERPOOL commands, and then have a REXX routine parse the output of the second of those commands and insert useful information (as described below) into a DB2 table or maybe a file. If such a program and associated REXX routine are executed on a daily basis, you will have information that you can use to track, and adjust to, trends in GBP usage.

How I use information in the output of -DISPLAY GROUPBUFFERPOOL(*) TYPE(GCONN) GDETAIL(INTERVAL)
  • Look for the "double zeros" -- referring here to zero write failures due to lack of storage (see the DSNB786I message text that's part of the command output), and zero cross-invalidations due to directory entry reclaims (part of the DSNB788I message text in the command output), for each GBP. If GBP write failures occur due to lack or storage, more than likely the GBP is too small (or the ratio of directory entries to data entries for the GBP is too large -- more on this below). You don't want GBP write failures, because that can land pages on the logical page list (LPL), and that can lead to failures for programs that subsequently attempt to access those pages prior to their being recovered from the LPL. [Note that a DB2 11 enhancement called group buffer pool write-around was introduced to help reduce the incidence of GBP write failures due to lack of storage. You can read about this enhancement in section 5.1 of the IBM "redbook" titled DB2 11 for z/OS Technical Overview (downloadable from http://www.redbooks.ibm.com/abstracts/sg248180.html?Open).] Cross-invalidations due to directory entry reclaims occur when a directory entry (used to track inter-DB2 interest in table space and index pages) has to be stolen (because all are in use and a new page has to be registered). Because the directory entry reclaim results in the GBP losing the "pointer" to a page, from a GBP-dependent data set, that is cached locally in a buffer pool of at least one group member, the page has to be preemptively invalidated wherever it is locally cached. You'd rather avoid that situation, because when a page invalidated for this reason is next accessed by a DB2 member subsystem, it will have to be read back into memory from disk (or from the GBP, but a read "hit" in this case tends to be quite unlikely). The page read in from disk will probably look just like the previously cached page that was marked invalid (because the invalidation was necessitated by a directory entry reclaim, not by a change of the page by a program on another group member); thus, a disk read that would be otherwise unnecessary is driven by a directory entry reclaim. The key to avoiding directory entry reclaims is to have a number of directory entries in a GBP that is at least as large as the number of data entries in the GBP plus the number of buffers in the corresponding local buffer pools in each member DB2 subsystem (as described in further detail in an entry I posted to this blog a couple of years ago).
  • Check out the specifications of each GBP. In particular, look at the ratio of directory entries to data entries. The default for this ratio is 5:1. If you see a directory-to-data-entry ratio that is significantly higher than 5:1, check to see if ALLOWAUTOALT(YES) is specified for the GBP in the CFRM policy for the Sysplex. If the directory-to-data-entry ratio for a GBP has been automatically increased by the system in order to avoid directory entry reclaims, that's OK, but in my experience this kind of adjustment can be an indication that the GBP is smaller than it should be (I wrote about this in a blog entry I posted a few months ago). Note that if you decide to make a GBP larger, make sure that all structures defined for both coupling facilities used by a DB2 data sharing group can fit in one coupling facility LPAR (except for secondary group buffer pools, when GBPs are -- as they should be -- duplexed), in case the other CF LPAR is down for maintenance purposes or due to a failure (if you temporarily had all structures in one CF LPAR, you would not during that time be duplexing the GBPs).
How I use information in the output of -DISPLAY GROUPBUFFERPOOL(*) TYPE(MCONN) MDETAIL(INTERVAL)
  • Basically, I use the output of this command to get one item of information that is not available in the output of the GDETAIL form of the command: the XI GBP hit ratio. That ratio (described in more detail in the blog entry referenced in the item above about directory-to-data-entry ratios) is so named because it has to do with synchronous GBP read requests that are driven by cross-invalidations (or XIs, in DB2 data sharing shorthand). Why does this ratio merit attention? Because if there are no cross-invalidations due to directory entry reclaims (see the first item above pertaining to the output of the GDETAIL form of the -DISPLAY GROUPBUFFERPOOL command) then what cross-invalidations did occur pretty much had to be caused by data-change activity (when a program connected to member DB2A changes a page cached locally in a buffer pool of member DB2B, the DB2B copy of the page has to be invalidated so that DB2B will know to retrieve the current version of the page). Data-change activity associated with GBP-dependent data sets results in changed pages being written to the appropriate GBP, and if pages were written to a GBP and the GBP has enough data entries to keep changed pages cached for a while, it is reasonable to expect that a good percentage of GBP reads driven by data-change-caused cross-invalidations will result in GBP read "hits." The XI read hit ratio is easily calculated using the SYNCHRONOUS READS DUE TO BUFFER INVALIDATION information in the DSNB773I message that is part of the -DISPLAY GROUPBUFFERPOOL MDETAIL output for each GBP: just divide the number in the DATA RETURNED field by the sum of the DATA RETURNED and DATA NOT RETURNED fields. I regularly see this ratio in excess of 80% for well-sized GBPs. If you see an XI read hit ratio that is substantially below 80% for one of your GBPs, consider making that GBP larger (if available CF LPAR memory permits).
There you have it. I hope that this information will be of use to you.

Sunday, November 29, 2015

DB2 for z/OS: How I Use -DISPLAY BUFFERPOOL Output

A couple of weeks ago, I was going over output of the DB2 for z/OS command -DISPLAY BUFFERPOOL with a group of people, and one of the meeting participants said words to this effect: "You've got some useful formulas and rules of thumb here. Could you write them down for us?" I did that, and it occurred to me that the information would be useful to others in the larger DB2 for z/OS community; so, I've packaged it in this blog entry.

First, enter the command the right way

Based on my experience, the most useful form of the -DISPLAY BUFFERPOOL command is as follows:


You actually want to enter the command twice, one hour apart. Why? Because the command output shows activity since the buffer pools were last allocated (which was probably when the target DB2 subsystem was last "bounced"), or since the command was last issued. If you issue the command once, then issue it again one hour later, the output of the second issuance of the command will show the preceding hour's worth of activity. That being the case, you can divide the activity counters by 3600 to get per-second figures, and those figures are very useful for performance monitoring and tuning work. [Note that the command output includes a DSNB409I message for each buffer pool, the text of which reads, "INCREMENTAL STATISTICS SINCE timestamp-value." Check that timestamp in the output of the second of the two issuances of the command, to verify that the command captured approximately one hour of activity.]

It might be a good idea to use -DISPLAY BUFFERPOOL to capture activity for a busy "online" hour (often a mid-morning or mid-afternoon hour) and for an hour during which a batch workload is particularly heavy -- one period might see more of a load on your buffer pool configuration than the other.

Something else to consider: some organizations have written programs that issue a pair of -DISPLAY BUFFERPOOL commands (again, separated by one hour) one or more times per day, and have a REXX routine that parses the output of the second issuance of a pair of commands and extracts useful information and inserts these values into a DB2 table or maybe a file. In that way, trends can be tracked and responded to with adjustments to the buffer pool configuration.

OK, with the output of the second issuance of a pair of one-hour-apart -DISPLAY BUFFERPOOL commands in hand, here's how you can put the information to good use:

  • Add up the size of all the buffer pools (in megabytes), to get the total size of the buffer pool configuration for the DB2 subsystem. That's easily done by multiplying the number of buffers allocated for a pool by the pool's page size; so, a pool with 20,000 buffers of 4KB apiece is sized at 80 MB. If you have a single production DB2 subsystem running in a DB2 for z/OS LPAR, I would look to move towards a situation in which the total size of the subsystem's buffer pool configuration is 30-40% of the size of the z/OS LPAR's memory resource. For example, if a z/OS LPAR with 100 GB of central storage holds one production DB2 subsystem, I'd aim for a buffer pool configuration size (that is, the aggregate size of all buffer pools allocated for the subsystem) that is in the range of 30-40 GB (if a z/OS LPAR holds more than one production DB2 subsystem, I'd want the total size of all the subsystems' buffer pool configurations to be not much more than 50% of the LPAR's real storage resource). If you're not in that 30-40% of memory "zone" at present, there's no need get there in one giant step from where you are. A series of steps (but pretty good-sized steps, not a lot of baby steps) would be fine. You should be most aggressive in growing the pools that have the highest total read I/O rates (see the next item). By the way, the aim here is to have a buffer pool configuration that makes really good use of an LPAR's memory for enhanced performance, while avoiding a situation in which LPAR memory is under too much pressure -- if the LPAR's demand paging rate, which can be obtained from an RMF (or equivalent z/OS monitor) CPU activity report, is in the low single digits or less per second, the memory resource is not under too much pressure.
  • For each pool, calculate the total read I/O rate. That's the sum of five numbers (for each pool) from the command output -- random synchronous reads, sequential synchronous reads, sequential prefetch reads, list prefetch reads, and dynamic prefetch reads -- divided by the number of seconds in the interval between the first and second issuance of the -DISPLAY BUFFERPOOL command (if that interval was, as suggested, an hour, the number of seconds would be 3600). Focus your attention first on the pools (if any) with read I/O rates in excess of 1000 per second, and see if you can get the I/O rate for those pools below 1000 per second by making the buffer pools larger. You can then turn your attention to pools with a read I/O rate in the hundreds per second, to see if the read I/O rate for those pools can be brought below 100 per second. If the read I/O rate for each of your buffer pools is below 100 per second, you are making exceedingly good use of LPAR memory.
  • The -DISPLAY BUFFERPOOL output shows the specifications for each buffer pool, and you should examine those. All high-I/O pools (total read I/O rate > 100 per second) should be defined with PGFIX(YES), and those page-fixed pools should be backed by 1 MB page frames (check to see that the value of the LFAREA parameter in the IEASYSxx member of the system's PARMLIB data set is large enough for this purpose). If your demand paging rate is very low (as noted previously, that would be low single digits per second or less), consider going with PGFIX(YES) even for low-I/O pools that have a lot of GETPAGE activity (more than 1000 GETPAGEs per second). Page-fixing these pools will deliver a CPU benefit if the pools are backed by 1 MB page frames.
  • Are any of the DB2 subsystem's pools used to "pin" objects in memory (i.e., to cache table spaces and/or indexes in memory in their entirety)? If yes, such pools should be defined with PGSTEAL(NONE), so that DB2 will know that they are intended to be "pinning" pools (the PGSTEAL(NONE) specification was introduced with DB2 10 for z/OS). Additionally, the target read I/O rate for a "pinning" pool is zero. If a "pinning" pool has a read I/O rate that is greater than zero, consider whether the pool needs to be enlarged so that all pages of all objects assigned to the pool can be kept in memory.
  • The counter for the number of times that the data manager threshold (DMTH) was reached should be zero for all pools. If that threshold is hit, either the buffer pool is way too small or the deferred write thresholds are way too high. [For a pool dedicated to 4KB-page or 32KB-page work file table spaces, the deferred write thresholds -- DWQT and VDWQT -- can be set to values higher than the respective defaults of 30 and 5, and doing this can result in some CPU savings, but don't take that idea too far. DWQT/VDWQT settings of 70/40 or even 80/50 should be OK for a work file-dedicated pool, but some sites have gone to 90 for DWQT for such pools, and have ended up hitting the data manager threshold as a result. Hitting DMTH causes the CPU cost of accessing data in a pool to jump, so make sure that you're staying under that threshold.]
  • The "prefetch disabled" counters -- no buffer, and no read engine -- should ideally be zero for all pools. If either is non-zero for a pool, either the pool is too small, or the VPSEQT threshold (default value is 80) is too low (so there are too few buffers available to hold pages read via prefetch, leading to elevated levels of prefetch read activity), or the deferred write queue threshold (DWQT) is too high (this would generally apply to a work file-dedicated pool, as mentioned in the preceding item).
  • For each pool, the number of asynchronous write operations should be significantly larger than the number of synchronous write operations. If this is not the case, the vertical deferred write queue threshold (VDWQT), or the horizontal deferred write queue threshold (DWQT), or both, should be lowered.

I hope that you find this information to be useful. Stay tuned for a companion entry in which I'll document the ways in which I use -DISPLAY GROUPBUFFERPOOL information (relevant to DB2 data sharing systems).

Thursday, October 29, 2015

DB2 for z/OS: DDF and Accounting Trace Records

I recently encountered a situation that served to illuminate a number of important aspects of application processing in the context of the DB2 for z/OS distributed data facility (DDF). I'll go through the scenario here, in hopes that you will find the information to be useful.

A systems programmer contacted me about a problem his organization had run into with a new application that accesses DB2 for z/OS by way of a network connection (and so uses DDF). The application drives large numbers of row-insert operations, and was generating huge numbers of DB2 accounting trace records (written to SMF data sets). In one test, executed in a pre-production environment, the application issued 140 million INSERT statements, and that resulted in DB2 cutting 140 million accounting trace records. The trace records filled the SMF data sets faster than they could be offloaded, and as a consequence some of the records were lost.

To reduce the torrent of trace records flooding the SMF data sets, the systems programmer changed the value of the ZPARM parameter ACCUMACC in the target DB2 subsystem to 100. In doing that, he made it possible for DB2 to "roll up" activity for as many as 100 DDF units of work into a single accounting trace record. That action addressed the immediate problem, but only for the test system: in the production environment, the organization had a requirement for one accounting trace record per DDF unit of work (ACCUMACC on the production DB2 subsystem was set to NO).

With roll-up of activity for multiple DDF transactions into a single accounting record not an option in production, the systems programmer looked into multi-row INSERTs (also referred to as "bulk inserts") as a means of reducing the volume of trace records that would be generated when the new application executed. He put this question to me: Could multi-row INSERT be utilized for a Java application accessing DB2 for z/OS-managed data via DDF? I responded affirmatively, and pointed out that information on multi-row INSERT for Java programs can be found in the DB2 for z/OS Application Programming Guide and Reference for Java (the DB2 10 manual can be downloaded from http://www-01.ibm.com/support/docview.wss?uid=swg27019288#manuals, and the DB2 11 manual from http://www-01.ibm.com/support/docview.wss?uid=swg27039165#manuals).

The systems programmer got together with one of the developers of the new application, and they ran a test in which 1000 rows were bulk-inserted into a DB2 table, in chunks of 100 rows. Instead of 1000 accounting trace records, execution of the test program generated 10 trace records. Here's the interesting part: the program was not issuing any explicit commits. That led the systems programmer to surmise that perhaps accounting record volume was being driven by the volume of INSERT statements issued by the application program. After all, when 140 million single-row INSERTs had been issued by the program, 140 million accounting records had been written to SMF. When 1000 rows were inserted by way of 10 bulk inserts of 1000 rows apiece, 10 trace records had been cut. I explained that accounting records are generated not by issuance of data-changing SQL DML statements, but by commits issued following execution of one or more SQL DML statements. [That's the way it works for application processes, such as DDF-using programs, that are transactional in nature -- a z/OS batch program is different, in that one accounting record will be generated when the program completes, even if the program issued hundreds of commits while executing.]. As the Java application developer was not issuing explicit commits from his program, I speculated that AutoCommit was enabled on the client side of this client-server application. Not so. The systems programmer verified that AutoCommit was set to FALSE for the new application. This information left us scratching our heads.

The sysprog next suggested that perhaps the DBAT used by the application was going inactive after each INSERT, and the accounting records were being generated as a result of DBATs going inactive (DBAT is short for database access thread -- the kind of thread used by applications that connect to DB2 via DDF). Not the case, I told him. DBATs don't go inactive in the usual sense. A connection (from a client application to DB2) will go inactive when a DDF transaction completes. Yes, the DBAT that had been used to service the transaction will go back to the DBAT pool at that time, but I'd say that "disconnected" is a better term that "inactive" for the pooled DBATs (I'm talking here about "regular" DBATs -- a high-performance DBAT will remain associated with the connection through which it was instantiated, and that connection will not go inactive until the high performance DBAT is terminated after having been reused 200 times). In any case, I said, the key is transactions completing, not connections going inactive. Connections going inactive, like accounting records getting cut (with ACCUMACC set to NO), are a response to DDF transactions completing, and transaction completion involves commit processing.

Asked the sysprog: Could packages bound with RELEASE(COMMIT) be the cause of the large volume of accounting trace records associated with execution of the new application? My response: No. Accounting trace records are generated at DDF transaction boundaries, and RELEASE(COMMIT) and RELEASE(DEALLOCATE) do not affect transactional boundaries. Commits establish transaction boundaries. Could DB2 itself be driving the commit activity that was in turn driving trace record generation? No. DB2 does not cause commits. DB2 responds to commits (an exception to that rule is the commit driven upon completion of a DB2 stored procedure defined with COMMIT ON RETURN YES).

Finally, in searching for the elusive commits that we knew were coming from somewhere, the systems programmer and his application development colleagues found the answer: the new application was using the open-source Spring Framework, and in response to the application indicating completion of a transaction, the Spring Framework would drive a commit that in turn would drive generation of a DB2 accounting trace record. With this now clarified, the application team can vary commit frequency as desired to balance volume of trace records generated (don't want too many commits) against accumulation of DB2 child X-locks associated with INSERT processing (don't want too few commits).

The key take-away here: for a lot of reasons, issuance of commits is really important for DB2-accessing application processes. Sometimes, developers of DDF-using applications have direct visibility of commit-issuing logic. Other times, use of something like a framework between a client program and a DB2 for z/OS data server (and the Spring Framework is just one example of this kind of thing) abstracts issuance of commits in a way that removes commit visibility from an application developer. When that is the case, commit frequency can still be controlled and changed as needed, but that requires an understanding of how interaction by a program with the framework drives the commits that flow to DB2. If your DB2-accessing client programs interface with an application framework, take the time to understand how commits are made to flow to DB2. In doing that, you'll avoid surprises and you'll be less likely to spend time scratching your head and spinning your wheels.

Tuesday, October 20, 2015

DB2 for z/OS: Getting a Handle on a CPU-Constrained Environment

z/OS systems are famous for their reliability, and part of that story is the ability of z/OS to accommodate surges of application activity: you can pile more and more work onto a z/OS system, and it will keep on trucking, slicing the "pie" of available processing capacity into smaller and smaller slices to keep a workload moving along. That's highly preferable to failing as a result of being overloaded, but there is a point at which applications running on a z/OS system will perform in a sub-optimal way if the system's processing capacity is undersized relative to the workload that the system is being asked to execute. If you work with DB2 for z/OS, you want to be able to spot a situation in which DB2 performance is being negatively impacted by a shortage of CPU capacity, so that your organization can take corrective action. Through this blog entry, I want to help you in that endeavor.

Recognizing indications of CPU overload in DB2 monitor reports and displays

First, know the signs of strained CPU capacity that are reflected in DB2 monitor data. With respect to overloaded general-purpose engines, the key indicator is in-DB2 not-accounted-for time. You can get that metric from a DB2 monitor-generated accounting long report (also known as an accounting detail report) or an online display of DB2 application workload activity. My preference is to use a report, and what I particularly like to use is an accounting long report with information ordered by (or the term may be "grouped by," depending on the DB2 monitor product in use at your site) connection type. That kind of report will contain a sub-report for each connection type used with the target DB2 subsystem: one for the CICS-DB2 workload (if you use CICS with DB2), one for the IMS-DB2 workload (if that's present in your environment), one for the DRDA workload (i.e., the DDF workload -- again, if you have such a workload), one for batch jobs that connect to DB2 via the call attach facility, etc. Check the in-DB2 not-accounted-for times for your higher-priority transactional workloads, such as CICS-DB2, IMS-DB2, and DRDA. Average in-DB2 not-accounted-for time should be reported by your DB2 monitor, but if you don't see a field called not-accounted-for time among the in-DB2 (also known as class 2) times, you can easily derive the value yourself: just subtract average in-DB2 CPU time (that's general-purpose engine CPU time plus zIIP, or "specialty engine," CPU time) from average in-DB2 elapsed time, and then subtract from that figure the total class 3 suspend time (class 3 times include wait for synchronous read time, wait for other read time, wait for lock/latch time, etc.). What's left is average in-DB2 not-accounted-for time. If that time is more than 10% of in-DB2 elapsed time for a higher-priority transactional workload (CICS-DB2, IMS-DB2, DRDA), you have an indication that overloaded general-purpose engines are having a detrimental effect on application throughput. [An in-DB2 not-accounted-for time that is more than 10% of a batch workload's in-DB2 elapsed time is less of an issue in my eyes, owing to the fact that 1) batch programs often run at a lower priority than transactional programs, and 2) organizations sometimes intentionally push general-purpose engine utilization close to 100% during periods of heavy batch processing.]

Don't stop with an assessment of CPU constraint related to general-purpose engines. You need to check out the situation regarding a mainframe's zIIP engines (if any), as well. Here, the most useful indicator is a field in the "class 1" column of a DB2 monitor-generated accounting report (this field might be available via online displays as well, depending on the DB2 monitor product in use at your site). The field to which I'm referring might have a label (depending on the monitor) like SECP CPU, with the "SE" being short for "specialty engine" (that would be zIIP engine) and the "CP" standing for "central processor," which is a reference to a general-purpose engine. The value of the SECP CPU field shows the average class 1 CPU time that is associated with zIIP-eligible work that ended up being dispatched to a general-purpose engine. If you're looking at a DB2 monitor-generated accounting long report with data ordered by connection type, look at the sub-report for the DRDA connection type (SQL from DRDA requesters tends to be the main DB2-related driver of zIIP engine utilization). Referring to class 1 times (versus class 2), call the value in the SECP CPU field A, and the value in the SE CPU TIME field B (this is CPU time for zIIP-eligible work that was dispatched to a zIIP engine). Perform the simple calculation A / (A + B), and what you get is the percentage of zIIP-eligible work that ran on a general-purpose engine. I call that the "zIIP spill-over percentage," and if it is higher than 5% for a workload that drives a lot of zIIP usage (and again, the prime example there, from a DB2 perspective, is the DRDA workload), that's an indicator that your zIIP engines are over-used to the point of reducing overall throughput for DB2-accessing applications.

Why does zIIP-eligible work get dispatched to general-purpose engines, leading to a non-zero value in the SECP CPU field for your DRDA workload? That happens when a zIIP-eligible piece of work is ready for dispatch but no zIIP engine is available to process that work. Why can more than a small amount of zIIP-eligible-on-general-purpose time be a performance-impacting issue? Because the system will wait a few milliseconds before dispatching a piece of zIIP-eligible work to a general-purpose engine (this to, presumably, give a zIIP engine a chance to become available in that few-millisecond window). Why is that a problem? Because prefetch processing is 100% zIIP-eligible starting with DB2 10 for z/OS, and if zIIP engines are busy enough to cause more than a small amount of zIIP-eligible work to be redirected to general purpose engines, that can slow prefetch processing. That, in turn, can reduce throughput for applications that drive a lot of prefetch read activity (and don't think that's just batch -- plenty of online transactional applications are characterized by significant prefetch activity in addition to synchronous database reads).

Check your z/OS monitor for engine-busy information

If your DB2 monitor is pointing to a shortage of general-purpose and/or zIIP engine capacity, use your z/OS monitor to check on the utilization of the engines assigned to the LPAR in which the DB2 subsystem is running. In doing this, use the right monitor-generated report for the task at hand. Take IBM's RMF (Resource Measurement Facility) monitor, for example. RMF can be used to generate a number of different reports pertaining to processor utilization. All are useful, but for different purposes. If you want to confirm a general purpose engine-utilization problem suggested by elevated in-DB2 not-accounted-for times as reported by your DB2 monitor, look at an RMF CPU Activity Report (information about that report -- and others -- can be found in the IBM Knowledge Center on the Web). In that report (generated for the LPAR of interest), if the TOTAL/AVERAGE value for the general-purpose engines (identified as CPs, short for central processors) in the MVS BUSY column is greater than 85% (for a transactional workload) AND the value in the % column of the IN READY row for "number of address spaces" <= N in the SYSTEM ADDRESS SPACE ANALYSIS section of the report is less than 80%, you likely have work delays that are caused by CPU contention (an average CP utilization greater than 85% can be OK during periods of heavy batch processing).

You'll also see in an RMF CPU Activity Report a TOTAL/AVERAGE value in the MVS BUSY column for the specialty engines (zIIP engines, identified as IIPs) assigned to the LPAR. How high can this value be before performance problems arise? That depends very much on the number of zIIP engines available. As I mentioned previously in this entry, you can get from your DB2 monitor (or should be able to get) the numbers (from an accounting detail report or online display of accounting information) that you need to calculate the "zIIP spill-over percentage" for a zIIP-driving workload, such as the DRDA workload (that ratio indicates the percentage of zIIP-eligible work that ends up being executed on general-purpose processors). I noted that you want the "zIIP spill-over ratio" to be not more than 5%. If the z/OS LPAR in which the DB2 subsystem of interest is running has a single zIIP engine, you could see an undesirable zIIP spill-over percentage if that zIIP engine's utilization is in the 30-40% range. Contrast that with a real-world situation I encountered recently in which average utilization of an LPAR's zIIP engines was 78%, while the zIIP spill-over percentage was only 1.6%. How could that be? The LPAR has 9 zIIP engines, that's how. It's queuing theory, folks. Even though these zIIPs are running at a relatively high level of utilization, because there are so many of them a zIIP engine is almost always available on this system when a zIIP-eligible piece of work is ready for dispatch. Note that starting with the EC12 z Systems servers (and continuing with the z13 servers), a mainframe can be configured with up to two zIIP engines per general-purpose engine (that ratio previously couldn't go above 1:1). The more zIIPs you have, the hotter you can run them while still avoiding an overly large degree of zIIP spill-over.

Summing it up

The fact that z Systems remain reliable servers even when running at very high levels of utilization is a very good thing, but even mainframes can be loaded with work to an extent that application performance will be sub-optimal. Using DB2 and z/OS monitor data, keep an eye on your system to make sure that a heavily-loaded server doesn't become an overloaded server. With effective capacity planning, you can keep the supply of MIPS sufficiently ahead of demand to deliver topflight performance for your mainframe-based applications.

Tuesday, September 29, 2015

An Oft-Overlooked DB2 for z/OS Big Memory Exploitation Play

Over the past few years, I've done a lot of writing and presenting and talking about ways in which large-scale z Systems memory resources (which I refer to as Big Memory) can be leveraged so as to enhance the performance of DB2 for z/OS subsystems and related applications. While I very much like to see z/OS LPARs with memory sizes of 100 GB or more, I DON'T like to see situations in which organizations stop well short of exploiting Big Memory to maximum positive effect with regard to DB2 performance. Big buffer pools are a great way to exploit available real storage gigabytes, but too often DB2 people don't look much beyond that particular use of mainframe memory to boost workload throughput and CPU efficiency. In this blog entry I want to highlight a DB2 application of Big Memory that, in my experience, is often overlooked: the EDM skeleton pool.

The EDM pool (long name: environmental descriptor manager pool) occupies a portion of a DB2 subsystem's database services address space (also known as DBM1). Various DB2 elements supporting SQL statement execution are cached in different parts of the EDM pool. Among these elements are skeleton package tables, or SKPTs -- one SKPT for a given package. Here's how this space is used: when an application process needs to execute a package, DB2 checks to see if the package (actually, the requisite sections of the package) is already in memory. If it isn't, DB2 loads the package (sections) from the SPT01 table space in the directory into an SKPT in the skeleton pool part of the EDM pool. From there, the package, in being allocated to the thread through which it is to be executed, is copied into another part of the DBM1 address space (the thread-specific copy of the package goes into what is called agent local pool storage -- if a given package is allocated to 10 threads, there will be 10 copies of the package in agent local pool storage in DBM1). The more times a package that is to be allocated to a thread is found in the skeleton pool in the EDM pool, versus having to be loaded from the directory on disk, the better for the overall performance of the associated DB2 workload. Often, this is is not taken into account by people charged with optimizing the performance of a DB2 for z/OS environment.

Here's what you should do (if you haven't already): check on skeleton pool activity by way of your DB2 monitor. You can use an online display of EDM pool information for this purpose, but my preference is to use the EDM pool information found in a DB2 monitor-generated statistics long report (depending on the DB2 monitor product used at your site, this may be called a statistics detail report). In such a report, the EDM pool section would contain some fields like these (I've left out some lines to highlight the information of interest):

EDM POOL                     QUANTITY
---------------------------  --------

  HELD BY SKCT                  40.98
  HELD BY SKPT                3724.36
  FREE PAGES                   175.66

In many cases, a person looking at this information would see the 0.00 value for FAILS DUE TO POOL FULL, conclude that the skeleton pool is plenty big, and move on to something else. Now, it is true that having no program failures caused by DB2 not being able to load a package into the skeleton pool is a good thing; however, that's only part of the story. Look a little further down in the EDM pool section of the report, and you'll see numbers like these:

EDM POOL                     QUANTITY
---------------------------  --------

PT REQUESTS                   6446.1K
PT NOT FOUND                   193.0K 

Take note of the ratio of PT REQUESTS to PT NOT FOUND. The former value is the number of times that a package section was requested for allocation to a thread, and the latter is the number of times that a package request could not be satisfied from the skeleton pool (and so required a loading of the package section from the DB2 directory). The ratio calculated from the numbers above (which show activity in a real-world DB2 subsystem) is about 33:1. That might look OK to you, but you should be able to do a lot better than that on your system (unless you are really memory-constrained). Here are numbers from a different real-world DB2 subsystem:

EDM POOL                     QUANTITY
---------------------------  --------

PT REQUESTS                   1717.9K
PT NOT FOUND                   128.00 

The ratio of PT REQUESTS to PT NOT FOUND in this case is over 13,000 to 1. Looked at from an activity rate perspective, the rate of package section loads from SPT01 on disk associated with the first set of PT REQUESTS and NOT FOUND numbers is about 27 per second (193,000 requests in a 2-hour reporting interval), while the rate of package section loads associated with the second set of numbers is about 1 every 14 seconds (128 requests in a 30-minute reporting interval).

I'd rather have the second set of numbers. And you know what? It often doesn't take much memory to get to a skeleton pool size that yields a really high ratio of package section requests to package section loads from disk. Interestingly, though two systems (used by two different organizations) are represented by the numbers shown above, in both cases the rate of requests for package table sections was a little over 900 per second. An important difference between the two environments is a skeleton pool that is about 2.5 times larger for the DB2 subsystem with the really high ratio of requests to loads versus the skeleton pool that has the much lower ratio of package requests to loads. The size delta amounts to about 6000 pages (4K pages), and that's about 24 MB. The point I want to make here is this: for an investment of maybe a few tens of megabytes of memory, you might be able to dramatically increase the ratio of package section requests to package section loads in your DB2 environment, if that ratio is not already high (and I like to see at least hundreds to one, and ideally thousands to one). Unless your z/OS LPAR is quite storage-constrained, there is a good chance that you could boost the size of the skeleton pool (via the EDM_SKELETON_POOL parameter in ZPARM) by a few tens of megabytes, and still have a demand paging rate for the LPAR that is very low ("very low" is what you want for the demand paging rate in a production z/OS LPAR, and I define that as being a rate -- available from a z/OS monitor -- that is in the low single digits or less per second during busy processing periods). Note that there are also request-versus-load numbers pertaining to the DBD cache in the EDM pool -- I've focused on the skeleton pool numbers because I more often see lower-than-desired ratios there.

That's it. Keep in mind that using Big Memory effectively means using it for all kinds of DB2 performance-boosting purposes -- bigger buffer pools, sure, but don't stop there. Give the skeleton pool enough space to satisfy the vast majority of package section requests out of memory, thereby reducing the rate of package section loads from the DB2 directory on disk. Doing this will get you that much closer to optimal DB2 application performance.

Sunday, September 27, 2015

I Don't Worry About DB2 for z/OS Buffer Pool Hit Ratios

And neither should you.

It's true that for years, mine was among a chorus of DB2 specialists' voices that placed a lot of emphasis on monitoring DB2 buffer pool hit ratios (there are variations with respect to calculating this ratio, but the basic formula is (GETPAGEs - synchronous reads) / GETPAGEs). Some years ago, it dawned on me (and on others -- and for some folks earlier than for me) that I'd been focusing on the wrong buffer pool performance metric. Nowadays, I don't look at buffer pool hit ratios at all.

What I look at instead is the total read I/O rate for each and every buffer pool. Before getting more into the details of that performance indicator, I'll give you some reasons for my not giving consideration to buffer pool hit ratios:
  • They can give you a false sense of being "done" with respect to leveraging buffer pool resources to boost system performance. People can see a value of greater than 99% for a buffer pool hit ratio and conclude, "The performance impact of this pool is as good as it's going to get. I'll turn my attention to other pools." That conclusion can be very much incorrect, as I'll explain momentarily.
  • They can cause you to disregard asynchronous buffer pool read activity. Because a commonly used formula for calculating a buffer pool's hit ratio ignores asynchronous read activity, it can lead to, at best, a benign neglect of prefetch read activity, or worse, actions that overly diminish asynchronous read space in buffer pools for the purpose of "goosing" a synchronous read-based hit ratio. That can work against your overall DB2 performance goals. As I pointed out in an entry posted to this blog a couple of years ago, prefetch reads matter.
  • Depending on how they're calculated, they can have weird values that confuse people. To avoid leaving prefetch reads out of the picture, some performance monitoring products will (or at least have, in the past) use buffer pool hit ratio formulas that take asynchronous read activity into account. On occasion, given a particular level and type of prefetch activity, such formulas will yield odd-looking results, like negative buffer pool hit ratios. Values of this nature can be hard for users to interpret.

[Note that I do care about a particular kind of hit ratio that is relevant to group buffer pools in a DB2 data sharing environment -- something I call the XI GBP read hit ratio. I described this ratio in a blog entry I posted about two months ago.]

I've mentioned that the buffer pool performance metric that matters most to me is the total read I/O rate. For a given buffer pool, that value is calculated as follows:

Total read I/O rate = (synchronous reads/second) + (asynchronous reads/second)

Getting the numbers to plug into this formula is a pretty easy thing. If you can generate a statistics long report (sometimes called a statistics detail report) for a DB2 subsystem by way of your DB2 monitor, look in such a report for the section containing buffer pool activity information. For each active pool in the environment, you'll see the synchronous read rate, and you'll also see the three asynchronous read rates, for sequential prefetch, list prefetch, and dynamic prefetch. Your monitor might already have these rates in per-second form for you, and that makes it really easy: just sum those four numbers (synchronous reads per second, sequential prefetch reads per second, list prefetch reads per second, and dynamic prefetch reads per second). If your monitor reports activity in per-minute form, just divide the four relevant values by 60 to get per-second numbers, and do the previously mentioned addition. You could also get these synchronous and asynchronous read numbers from an online display of buffer pool activity provided by your DB2 monitor, but for ongoing performance tuning, I prefer DB2 monitor-generated reports over online displays.

An alternative means of getting buffer pool activity information is to use the output of the DB2 command -DISPLAY BUFFERPOOL. If you go the command route, my suggestion is to do the following:
  1. Issue the command -DISPLAY BUFFERPOOL(ACTIVE) DETAIL on the DB2 subsystem of interest. You can disregard the output of this issuance of the command.
  2. One hour after issuing the command the first time, issue -DISPLAY BUFFERPOOL(ACTIVE) DETAIL again. The output of this second issuance of the command will capture, for each active pool in the environment, activity for the one-hour period since the first issuance of the command (the command captures activity since the buffer pools were last allocated, or since the command was last issued -- whichever occurred most recently). You can then divide the activity numbers by 3600 to get per-second figures.

In the output of the second issuance of the -DISPLAY BUFFERPOOL command you'll see a timestamp value in a DSNB409I message (for example, "INCREMENTAL STATISTICS SINCE 10:00:32 SEP 1, 2015"). Check that timestamp to verify that the value is approximately one hour since the command was issued the first time. If you see a timestamp that is, say, 25 minutes prior to the time at which the command was issued the second time, it means that someone else got in there 25 minutes ago and issued the command.

Instead of adding together four numbers, as you'd do when using information from a DB2 monitor statistics long report (or from an online monitor display of buffer pool activity), you sum five numbers found in -DISPLAY BUFFERPOOL(ACTIVE) DETAIL command output. Why five instead of four? Because in the command output, synchronous read activity is reported in two "buckets": random synchronous reads and sequential synchronous reads. Total up, then, the random synchronous reads, the sequential synchronous reads, the sequential prefetch reads, the list prefetch reads, and the dynamic prefetch reads, and divide that sum by 3600 to get the total read I/O rate per second (assuming that you issued the command once and then again an hour later, and you're using the output of the second issuance of the command). At some sites, folks have created jobs that issue the -DISPLAY BUFFERPOOL(ACTIVE) DETAIL command at a certain time of day, then issue it again an hour later, and via REXX code parse the output of the second issuance of the command, pulling out the read activity values for each pool and totaling them and dividing the total by 3600 to get the total read I/O rate per second.

What I like to see, for each buffer pool, is a total read I/O rate of less than 1000 per second. A rate of less than 100 per second for a pool is really good, unless that pool is used to "pin" objects in memory, in which case you'd like to see a read I/O rate of zero (and "pinning" pools should be defined with PGSTEAL(NONE), as described in a blog entry I wrote a few years ago). Driving read I/Os down benefits system performance in two important ways: 1) it reduces elapsed time for DB2-accessing programs, and 2) it improves the CPU efficiency of a DB2 workload (every I/O operation consumes some CPU time, so reducing that activity lowers the CPU cost of DB2 data access). The best way to lower DB2 read I/O rates is to increase the size of buffer pools that have a high level of read I/O operations. Thus, monitoring buffer pool read I/O rates points you to the pools for which the potential for performance gains through read I/O reduction is the greatest.

Less than a week ago, I was reviewing buffer pool activity statistics for a DB2 subsystem that is a key component of an organization's IT infrastructure. One of the busier pools showed a hit ratio of 99.3%. That looks pretty good, doesn't it? Seeing that, might you turn your attention to another of the subsystem's buffer pools? You could, but that would be a mistake -- this same buffer pool had a total read I/O rate of over 2800 per second -- higher than that of any other pool belonging to the subsystem. THIS was seen to be the pool most in need of attention, per the marker provided by the read I/O rate. The pool, with approximately 600,000 4K buffers (about 2.4 GB of space), could stand to be at least doubled in size so as to substantially lower the high rate of read I/O activity. Could the organization make that move? Yes, because the LPAR housing the DB2 subsystem has over 100 GB of real storage (an increasingly common configuration these days), and much of that memory is unused. With another 2.4 GB of memory utilized for buffering DB2 pages (in the form of a twice-as-large buffer pool), the z/OS LPAR's demand paging rate (available via a z/OS monitor), currently zero, would likely remain at that level. If doubling the size of the high-I/O pool took the associated read I/O rate from 2800 per second to, say, 1200 per second, I'd make the pool significantly larger again to try to get the read I/O rate below 1000 per second -- while keeping an eye on the LPAR's demand paging rate (a demand paging rate in the low single digits per second or less during busy processing periods is an indicator that a system's real storage resource is not under too much pressure).

Here's the really important point that I want to make through this blog entry: at more and more sites, you find production DB2 for z/OS subsystems running in LPARs with lots of real storage (more than 100 GB -- sometimes several hundred gigabytes). That's a good thing, if you put that big memory resource to productive use. A great way to do that is to grow DB2 buffer pools in a performance-positive way, and the total read I/O rate for the various buffer pools -- NOT the hit ratio -- is your guide for doing this in an effective manner. So, look at the right buffer pool performance numbers, and act on them in the right way.

Monday, August 31, 2015

DB2 for z/OS: Which Buffer Pools Should You Page-Fix?

A few days ago, I delivered a presentation on DB2 for z/OS performance monitoring and tuning for the DBA team at a large financial institution. At one point during the session, an attendee asked this question: "Which of our buffer pools should we page-fix?" Because that's a good question, and because it's a question to which the answer has changed over the years, I decided to make it the subject of this blog entry.

The ability to fix a buffer pool in memory, accomplished through the PGFIX(YES) option of the -ALTER BUFFERPOOL command, was introduced with DB2 for z/OS Version 8. [Note that PGFIX(YES), unlike other -ALTER BUFFERPOOL specifications, does not take effect immediately for an already-allocated buffer pool. Actualizing this change requires deallocation and reallocation of the pool -- something typically accomplished by stopping and restarting the associated DB2 subsystem.] The benefit initially ascribed to page-fixed DB2 buffer pools was cheaper (in terms of CPU cost) I/O operations. How so? Well, if a DB2 buffer is not fixed in memory (and PGFIX(NO) is the default), DB2 has to request that z/OS fix the buffer in memory (i.e., make it non-pageable) every time a table space or index page is read into or written from the buffer (and that is true for reads and writes from and to group buffer pools in a DB2 data sharing environment, as well as reads and writes from and to disk volumes). When the I/O operation is complete, the buffer is "de-fixed" (i.e., made pageable again). Why is this done? It's done so that the buffer won't be stolen out from under DB2 by z/OS in the midst of the I/O operation. One pair of page-fix and page-release operations is pretty inexpensive, but when these happen at a rate of thousands per second, you're talking about a fairly significant consumption of processor resources (and keep in mind that a single prefetch read that brings, say, 32 pages of an object into memory will require 32 page-fix and 32 page-release actions). When the pages of a buffer pool are fixed in memory from the get-go, obviously page-fix and page-release actions are not required for every movement of a page into or out of a buffer, and thus I/O operations associated with a page-fixed buffer pool are more CPU-efficient than they otherwise would be. In-DB2 CPU time can be reduced, in some cases, by several percentage points for programs accessing data in a page-fixed DB2 buffer pool.

So, the original answer to the question, "Which of our buffer pools should we page-fix?" was, "The pools with the highest I/O rates." More specifically, I'd tell people to check on the total read I/O rate for each of the buffer pools of a production DB2 subsystem, and page-fix those pools having a rate of 1000 or more read I/Os per second, and perhaps as well pools with read I/O rates in the high hundreds per second. [The read I/O rate for a buffer pool is the number of synchronous reads plus the number of prefetch reads associated with the pool, expressed as a per-second figure. I/O activity information can be obtained from a DB2 monitor, or from the output of the DB2 command -DISPLAY BUFFERPOOL(ACTIVE) DETAIL. If the command is used, it should be issued once, then issued again an hour later. The statistics in the output of the second issuance of the command can then be divided by 3600 to get per-second figures.] In DB2 Version 8 and DB2 9 environments, specifying PGFIX(YES) for a low-I/O pool wouldn't do much good.

Along came DB2 10 for z/OS, and the answer to the "Which of our buffer pools should we page-fix?" question changed. It changed because of DB2 10's support for 1 MB real storage page frames (versus traditional 4 KB frames) for page-fixed buffer pools (1 MB page frames are themselves made available in a z/OS LPAR via the LFAREA parameter in the IEASYSxx member of SYS1.PARMLIB). When a buffer pool's pages are backed by 1 MB page frames, CPU efficiency is improved because the translation of virtual storage addresses to real storage addresses in that context is a less-costly process versus the 4 KB page frame situation (this thanks to a better hit ratio in what's called the translation lookaside buffer). That being the case, in DB2 10 (and 11) environments, high-I/O buffer pools are joined, as good choices for PGFIX(YES), by high-activity pools, and some in the latter category may not be in the former category. Sure, high-I/O pools are very likely to be high-activity pools (and I think that GETPAGEs per second is a good measure of activity), but a high-activity pool could very well be a low-I/O pool. In fact, a high-activity pool could be a no-I/O pool (no read I/Os, that is): consider a pool, defined with the DB2 10-introduced PGSTEAL(NONE) option, that is used to "pin" (that is, entirely cache) one or more database objects in memory. Ideally, you would see no read I/Os for such a pool once all of the pages belonging to table spaces or indexes assigned to the pool have been read into memory from disk (and for a PGSTEAL(NONE) buffer pool, DB2 will asynchronously read all of an associated object's pages into memory when the object is first referenced after pool allocation). If a "pinning" pool has a high GETPAGE rate, PGFIX(YES) could be a nice CPU-saving move.

The most recent development affecting the answer to the "Which buffer pools should we page-fix?" question is the advent of really large memory resources for z/OS LPARs. I like to see at least 20-40 GB of memory for each engine in a production z/OS LPAR (zIIP engines included, so I would want 160-320 GB -- or more -- of memory for a production z/OS LPAR with four general-purpose and four zIIP engines), and I'm seeing more LPARs that have this Big Memory characteristic. In a z/OS system with memory to burn, you might consider page-fixing ALL of the buffer pools for a production DB2 subsystem. [Here, I'm talking about a situation in which a z/OS LPAR holds a single production DB2 subsystem, and the size of that subsystem's buffer pool configuration is less than half of the LPAR's real storage size. If there are multiple DB2 subsystems in the LPAR, I'd want the combined size of those subsystems' buffer pool configurations to be not more than half of the LPAR's real storage size.] A set-up like this is not blue sky thinking on my part. It's reality. An organization with which I've worked a good bit over the past few years has a production DB2 subsystem in a z/OS LPAR with 9 engines (5 general-purpose, 4 zIIP) and 212 GB of real storage. The DB2 subsystem has a 90 GB buffer pool configuration, and every one of the buffer pools is defined with PGFIX(YES). The demand paging rate for that z/OS LPAR (my preferred measure of the pressure on a z/OS LPAR's memory resource) is zero -- even with 90 GB of the LPAR's real storage fenced off for exclusive use for buffering DB2 data, the remaining 122 GB is more than enough for other memory users in the system. Now, even if you have a z/OS LPAR with a ginormous amount of real storage, don't just change to PGFIX(YES) for all of a production DB2 subsystem's buffer pools at one time. Do that, instead, in stages, and keep an eye on the LPAR's demand paging rate (available from a z/OS monitor). You're good if that rate is zero. If it's non-zero but low (e.g., 1 or 2 per second), you're not in a bad situation, but you might want to put the brakes on additional use of PGFIX(YES) until you can get more memory on the system.

In conclusion, the answer to the question, "Which buffer pools should we page-fix?" depends on the nature of your DB2 environment:
  • If the z/OS LPAR's demand paging rate is on the high side (high single digits or more per second), you might want to stay away from PGFIX(YES) altogether until you can add to the LPAR's real storage resource.
  • If the z/OS LPAR's demand paging rate is low but non-zero (e.g., in the vicinity of 1 or 2 per second), consider page-fixing one or more of your buffer pools that have the highest total read I/O rates.
  • If the z/OS LPAR's demand paging rate is zero and you are running DB2 10 or 11, consider page-fixing the pools that have the highest read I/O rates and the pools that have the highest GETPAGE rates (assuming that you are managing some of the LPAR's real storage resource in 1 MB page frames).
  • If you have a z/OS LPAR with memory to burn (running z/OS on what I would call a muscle machine -- rev it up, baby), and you're running DB2 10 or 11, consider page-fixing all of the buffer pools for a production DB2 subsystem. [And one more thing: if you have one or more REALLY BIG buffer pools (meaning 20 GB or more for a single pool) in a DB2 11 environment, consider backing that pool with 2 GB page frames (if the system has such -- again, this is related to the LFAREA parameter in member IEASYSxx of SYS1.PARMLIB), by way of the new FRAMESIZE option of the -ALTER BUFFERPOOL command. For a pool that's not really big, 2 GB page frames won't make much of a difference, CPU efficiency-wise.]

Hope this information proves to be helpful for you. Survey your DB2 landscape, and leverage buffer pool page-fixing as it makes sense for your environment.

Wednesday, August 12, 2015

What Does a DB2 for z/OS System Look Like When You Have Memory to Burn?

I've long been an advocate of Big Memory (meaning, lots of real storage) for DB2 for z/OS systems. For years, I was disappointed at seeing one production DB2 subsystem after another running in a z/OS LPAR with 16 GB, 20 GB, maybe 40 GB of memory -- enough real storage to run decently, but not enough to let DB2 do its thing with maximum performance. It was like seeing a thoroughbred racehorse in a small coral. You want to see that beast run free in a big pasture, to see what it can do with some real space to work in.

Lately, I have noticed things changing for the better. Memory sizes for z/OS LPARs -- especially the ones that I really care about, which are those that house production DB2 subsystems -- are finally starting to get seriously large. That's large as in more than a hundred gigabytes -- sometimes several hundred gigabytes -- for one LPAR. This change is being fueled by multiple factors:
  • The cost of z Systems memory continues to go down on a per-GB basis. The z13 servers took us another big step in that direction, and if you get enough memory when upgrading your z196 or zEC12 mainframe to a z13 -- what our sales reps call "mega memory" -- then the discount versus the list price of the memory can be downright ginormous.
  • You can get -- and use -- a lot more mainframe memory than you could before. A single z13 server can be configured with as much as 10 TB of real storage (versus a previous max of 3 TB), and up to 4 TB of z13 memory can be used for a single z/OS LPAR (versus 1 TB previously), if you're running z/OS 2.2 -- soon to be available -- or z/OS 2.1 with some PTFs.
  • Organizations are finally paying attention to z Systems memory. Mainframe engines have become really powerful (about 1000 MIPS of processing capacity per CPU), and z Systems configurations were getting a little out of balance for a while, with big-time processing power being paired with too-small real storage resources. Memory sizes are now catching up with engine capacity. People are also increasingly waking up to the fact that Big Memory is somewhat analogous to zIIP engines: it boosts performance and throughput for DB2 workloads (and for Java applications, as well) without impacting the license cost of z/OS software.

Now, some DB2 for z/OS people might be thinking, "What would I do if I had a ton of memory to work with (let's say, at least a few hundred GB in a z/OS LPAR)?" How might I exploit that resource, and what would my DB2 system look like with that resource exploited in a major way?" Well, I'm glad you asked. I think your DB2 system would look something like this:
  • Your buffer pool configuration size is really big, and the total read I/O rate for each pool is really low. Size-wise, given at least a few hundred GB of memory in the LPAR, I'd say that your buffer pool configuration would be at least 100 GB (i.e., the aggregate size of all buffer pools allocated for the DB2 subsystem would be 100 GB or more). In general, when a z/OS LPAR houses a single production DB2 subsystem, I think that a buffer pool configuration size that is 30-40% of the LPAR's real storage size is very reasonable; so, if you have 400 GB in the LPAR, a buffer pool configuration of 120-160 GB should fit very nicely. With a buffer pool configuration of that size, you might see really low read I/O rates for each pool (the read I/O rate for a buffer pools is the rate of all read I/Os for the pool, synchronous plus asynchronous, per second). In my mind, a "really low" total read I/O rate for a given pool is less than 100 per second. That said, with a really big buffer pool configuration you might use some pools for "pinning" certain objects in memory (you'd use the PGSTEAL(NONE) option in that case), and for those pools your target read I/O rate would be zero. Also with a really big buffer pool configuration, you might have one or more individual pools sized at 20 GB or more, and for pools of that size 2 GB real storage page frames (usable for page-fixed buffer pools starting with DB2 11) could deliver additional CPU savings. Finally, with a whole lot of real storage on hand, you might decide to page-fix most, and maybe even all, of your buffer pools, for maximum CPU efficiency.
  • Your packages associated with frequently-executed transactions that re-use threads, and packages associated with batch jobs that issue frequent commits, are bound with RELEASE(DEALLOCATE). For packages bound or rebound in a DB2 10 or DB2 11 system, almost all of the virtual storage associated with those packages when they are allocated to threads for execution goes above the 2 GB bar in the DB2 DBM1 address space, and it uses agent local pool storage versus the EDM pool, so you don't need to worry about running out of space in a virtual storage sense (RELEASE(DEALLOCATE), in combination with threads that persist through commits, increases virtual and real storage utilization). CICS-DB2 thread re-use can be boosted through protected entry threads, IMS-DB2 thread re-use can be increased via pseudo-WFI and/or WFI regions, and DDF thread re-use can be achieved with high-performance DBATs. For relatively simple transactions (those with relatively low in-DB2 CPU times), the combination of RELEASE(DEALLOCATE) packages and thread re-use can reduce in-DB2 CPU time by 10% or more. For batch programs that issue lots of commits, RELEASE(DEALLOCATE) has the added benefit of making sequential prefetch and index lookaside more effective. Note that DB2 11 provided relief for the problem of some bind/re-bind, DDL, and utility operations being blocked by RELEASE(DEALLOCATE) packages executed via persistent LOCAL threads. [If you need to keep RELEASE(DEALLOCATE) packages associated with DDF work from blocking database administration tasks, you can turn off high-performance DBAT functionality via the command -MODIFY DDF PKGREL(COMMIT), and then later turn it back on with the command -MODIFY DDF PKGREL(BNDOPT).]
  • The hit ratio for your DB2 dynamic statement cache is north of 90%. More memory allows for a larger dynamic statement cache, and that means more cache hits and more avoidance of full PREPAREs.
  • All of the RID list processing operations performed on your system are completed using only RID pool space. Starting with DB2 10, two important things related to RID list processing occurred: 1) the default RID pool size went way up (to 400 MB, from 8 MB), and 2) RID list processing operations that can't complete using only RID pool space (because there's not enough of that resource) will continue, using space in 32K-page work file table spaces. Your DB2 monitor (if it supports DB2 10) will show you the extent to which work file space is used for the completion of RID list processing operations that ran out of RID pool space, and if you see such RID list processing "spill-over" activity, you make your RID pool larger (which you can do because the LPAR in which the DB2 subsystem is running has a whole lot of memory). That action allows RID list processing operations to complete in the RID pool, and that boosts performance (versus having to use work file space).
  • You have a really big DB2 sort pool, and that reduces use of work file space for SQL sorts, and that improves SQL sort performance. The sort pool (sized per the value specified for the SRTPOOL parameter in ZPARM) is the amount of in-memory work space that can be used for each concurrent SQL-related sort executing in your DB2 system (so, if the SRTPOOL value is Y, and there are 10 large SQL-related sorts executing concurrently on your system, you could have 10Y of space in the DB2 DBM1 address space used for in-memory processing of these sorts). The default value of SRTPOOL is 10 MB (up from 2 MB prior to DB2 10). Because you have lots of real storage in your z/OS LPAR, you have a larger SRTPOOL value (maybe 40 MB or more), and that means more SQL-related sort work gets done in memory, and that is good for performance.
  • If you run DB2 in data sharing mode, your group buffer pools are large enough so that you have zero directory entry reclaims AND high "XI" GBP read hit ratios. Here, I'm assuming that your coupling facility LPARs, as well as your z/OS LPARs, have lots and lots of memory. Don't know what the "XI" GBP read hit ratio is? Read about it here.
  • With all these big uses of Big Memory, your z/OS LPAR's demand paging rate is still zero, or close to zero. The demand paging rate, my preferred indicator of pressure (or lack thereof) on a z/OS LPAR's real storage resource, is the rate at which pages that had been moved out of memory to auxiliary storage by z/OS (to make room for other pages to be brought into memory) are brought back into server memory on-demand. With a way big real storage resource in your z/OS LPAR, you're able to have a really large buffer pool configuration, lots of RELEASE(DEALLOCATE) packages executed via persistent threads, a big dynamic statement cache, and lots of RID pool and sort pool space, while still leaving plenty of memory for other DB2 and non-DB2 uses. With enough memory to go around, demand paging should be nil or close to it.

So, does your system look like that? If not, why not? Do you not have hundreds of gigabytes of real storage in the LPARs in which you run production DB2 subsystems? If you don't, work on getting there. If you do have a z/OS LPAR with tons of memory and a production DB2 subsystem running therein, and you've not leverage that big memory resource, get to work on that; otherwise, you're missing out on optimal DB2 performance. DB2's a thoroughbred. Give it room to run.

Friday, July 31, 2015

DB2 for z/OS Group Buffer Pools: ALLOWAUTOALT, Directory Entries, and GBP Size

So, I've seen something interesting lately in reviewing some DB2 for z/OS data sharing systems, and as is often the case in such situations, when I see something interesting I want to write about it. Thus this blog entry. Hope you find it to be useful.

What has caught my eye recently in looking over DB2 data sharing configuration information? Big group buffer pool (GBP) directory to data entry ratios. Like 29:1. 30:1. 32:1. 34:1. The default ratio is 5:1. What's going on here?

I'll provide at this point a bit of background information. In a DB2 data sharing system, group buffer pools have a couple of functions. First, they are used to keep track of pages of GBP-dependent objects that are cached in members' local buffer pools, so that the system can mark the copy of page X cached in buffer pool BP3 on member DBP1 invalid if that page is changed by a process running on member DBP2. Second, they serve as a super-high-performance cache for changed data and index pages, so that when member DBP1 sees that the copy of page X that it had cached locally in BP3 has been marked invalid, it can pull the current image of the page from GBP3 (to which the page was written by DBP2 at the time of the commit of a change to the page effected on that member) in a few microseconds. There is usually more page registration activity than GBP page write activity in a data sharing system (not every page read is changed), so the ratio of 5 directory entries (used to register the local caching of pages belonging to GBP-dependent objects) for every 1 data entry (used for caching changed pages of GBP-dependent objects) is a sensible default setting.

There will be times when table spaces and/or indexes assigned to a given buffer pool have a mix of read versus update activity that makes a downward or upward adjustment of the default directory-to-data entry ratio, to, say, 3:1 or 8:1, in the corresponding GBP a reasonable performance tuning action. A ratio greater than 20:1 seems high to me.

On seeing, just a couple of weeks ago, a very high GBP directory-to-data entry ratio in analyzing system information at a mainframe DB2 site, I asked, "Why did you set that ratio to such a high value?" The response: "We didn't do that." I asked, "Is that GBP defined with ALLOWAUTOALT(YES) in the CFRM policy for this data sharing group?" "Yes." Ah, so.

ALLOWAUTOALT can be a useful specification for a coupling facility structure, as it allows the system to dynamically adjust some characteristics of the structure to enhance operational efficiency and availability. When the structure is a GBP, ALLOWAUTOALT can alter the GBP's directory-to-data entry ratio. Why would the system do that, maybe up to a very high value? That happens in an effort by the system to avoid what are called directory entry reclaims. Those can happen when the number of "slots" into which pages of objects assigned to a given buffer pool can go (and that number of "slots" would be all of the buffers in, for example, BP4 on each member of the data sharing group, plus all of the data entries in GBP4) exceeds the number of directory entries in the associated GBP. If BP4 has 50,000 buffers on member DBP1 and another 50,000 buffers on member DBP2, and GBP4 has 10,000 data entries, that's 110,000 "slots" into which, conceivably (though not probably), 110,000 different pages could be placed. If the GBP has 50,000 directory entries, that discrepancy could result in the system having to steal, or reclaim, an in-use directory entry in order to track a page being newly brought into a member's local buffer pool. When such a directory reclaim occurs, the page that WAS being tracked using the reclaimed directory entry is preemptively marked as invalid, wherever it's locally cached. When THAT happens, the page will likely have to be read back into memory from disk when next referenced, and the page read in from disk is likely to be identical to the one that is already cached in memory, but marked invalid, on the reading member (because the invalidation was caused by a directory entry reclaim and not by a page update on another member). Thus it is that directory entry reclaims drive disk reads that would otherwise be unnecessary. That's a drag on performance, and that's why you like to avoid directory entry reclaims (directory entry reclaim activity for GBPn can be checked via the output of the DB2 command -DISPLAY GROUPBUFFERPOOL(GBPn) GDETAIL).

I've seen that with ALLOWAUTOALT in effect for a GBP, the system will often increase the directory-to-data entry ratio for a GBP in an effort to avoid directory entry reclaims. If the ratio were made too high, you could end up with a number of data entries in a GBP that is so small that you get GBP write failures due to lack of storage (also trackable via -DISPLAY GROUPBUFFERPOOL(GNPn) GDETAIL command output). You really don't want GBP write failures to occur, because they can cause pages to land on the logical page list (LPL), and that can lead to failures of programs that try to access LPL pages. The good news here is that ALLOWAUTOALT functions with that admonition in mind (figuratively speaking): I tend NOT to see any GBP write failures due to lack of storage for GBPs that have had their directory-to-data entry ratio automatically adjusted upwards through ALLOWAUTOALT(YES).

That doesn't mean that a significant upwards adjustment of a GBP's directory-to-data entry ratio is completely without penalty, even when no GBP write failures occur. Because more directory entries means fewer data entries (absent an increase in the size of a GBP), a directory-to-data entry ratio increase will result in shorter GBP residency time for changed pages written to the GBP at commit time. That, in turn, leads to a degradation in the one GBP read hit ratio that matters to me: the hit ratio for GBP reads due to buffer invalidation (which I also call the "XI" GBP read hit ratio, for a reason that will be apparent). This read hit ratio can be calculated for a member of a DB2 data sharing group using numbers found in a DB2 monitor statistics long report, or an online monitor display of GBP activity (you can also find information on GBP reads due to buffer invalidations in the output of the DB2 command -DISPLAY GROUPBUFFERPOOL(GBPn) MDETAIL). In an IBM OMEGAMON for DB2 statistics long report, the numbers would look like this:

GROUP BP3                      QUANTITY
----------------------------   --------
GROUP BP R/W RATIO (%)            41.23


The overall GBP hit ratio (highlighted in blue above) is not important to me. Neither is the read hit ratio for GBP synchronous reads due to page not found, which one can calculate using the numbers highlighted in green (it's usually pure luck when a page not found in a local buffer pool is found in the corresponding GBP). The "XI" read hit ratio, calculated using the numbers highlighted in red in the OMEGAMON report snippet, matters. These figures indicate the result of synchronous GBP reads due to buffer invalidation (XI, for short). If there are no directory entry reclaims for a GBP, you KNOW that any local pool buffer invalidations occurred because of the updating of a page of a GBP-dependent page set or partition. Such an updated page would HAVE to be written by DBP2 (if that was the member on which the page-changing action occurred) to the appropriate GBP, and your hope is that the page is still in the GBP when member DBP1 goes there looking for it (and DBP1 will look for the page in the GBP when its locally cached copy of the page, invalidated via member DBP2's update action, is referenced). The longer a changed page stays in a GBP, the better the odds that it will be found there (versus having to be read from disk, which takes MUCH more time than a GBP read) when it is next referenced by a member with an old (since updated) copy of the page in its local buffer pool. More GBP data entries means longer GBP page residency time, and thus more hits for GBP reads due to buffer invalidation. Fewer GBP data entries means shorter GBP page residency times, and a lower "XI" GBP read hit ratio. The numbers in the OMEGAMON report information shown above (displaying activity in a real-world system) provide this "XI" GBP read hit ratio:

39,738.37 / (39,738.37 + 451.85) = 98.87%

That's a very good "XI" GBP read hit ratio. I often see "XI" GBP read hit ratios that are 90% or better (when GBPs are adequately sized), and I don't think that it's unreasonable for you to shoot for a similarly high ratio.

Here, then, is what I'd recommend: check the directory-to-data entry ratio for your production DB2 GBPs. Is it really high for a GBP (I'd say that 20 or more to one is quite high)? If so, how does the "XI" read hit ratio for that GBP look? If that read hit ratio is low (I'd say that below 70% is low, and below 40% is quite low), see if you have enough memory available in the coupling facility LPAR to significantly enlarge the GBP. If you can accomplish that GBP enlargement, see if that allows for a reduction in the directory-to-data entry ratio for the GBP that still provides enough directory entries to avoid directory entry reclaims (auto-alter can lower the ratio dynamically -- if you manually change a GBP's directory-to-data entry ratio via an -ALTER BUFFERPOOL command, that change will take effect the next time the GBP is allocated). Finally, see if those two actions (bigger GBP, lower directory-to-data entry ratio) yields enough additional data entries to substantially improve the "XI" read hit ratio for the GBP. If you do see that read hit ratio improve, you've enhanced the performance of your DB2 data sharing system.

ALLOWAUTOALT is useful in that it imbues coupling facility structures with some added flexibility and resiliency; however, in the case of GBPs, at least, ALLOWAUTOALT should not be thought of as a "set it and forget it" thing. You still need to monitor GBP configuration settings, and understand that when you see a really high directory-to-data entry ratio that was set through ALLOWAUTOALT action, that's the system trying to make the best use of a GBP that is probably smaller than it ought to be. Making good use of too little GBP space is good. Providing more GBP space, in that situation, is better still.