Monday, August 31, 2015

DB2 for z/OS: Which Buffer Pools Should You Page-Fix?

A few days ago, I delivered a presentation on DB2 for z/OS performance monitoring and tuning for the DBA team at a large financial institution. At one point during the session, an attendee asked this question: "Which of our buffer pools should we page-fix?" Because that's a good question, and because it's a question to which the answer has changed over the years, I decided to make it the subject of this blog entry.

The ability to fix a buffer pool in memory, accomplished through the PGFIX(YES) option of the -ALTER BUFFERPOOL command, was introduced with DB2 for z/OS Version 8. [Note that PGFIX(YES), unlike other -ALTER BUFFERPOOL specifications, does not take effect immediately for an already-allocated buffer pool. Actualizing this change requires deallocation and reallocation of the pool -- something typically accomplished by stopping and restarting the associated DB2 subsystem.] The benefit initially ascribed to page-fixed DB2 buffer pools was cheaper (in terms of CPU cost) I/O operations. How so? Well, if a DB2 buffer is not fixed in memory (and PGFIX(NO) is the default), DB2 has to request that z/OS fix the buffer in memory (i.e., make it non-pageable) every time a table space or index page is read into or written from the buffer (and that is true for reads and writes from and to group buffer pools in a DB2 data sharing environment, as well as reads and writes from and to disk volumes). When the I/O operation is complete, the buffer is "de-fixed" (i.e., made pageable again). Why is this done? It's done so that the buffer won't be stolen out from under DB2 by z/OS in the midst of the I/O operation. One pair of page-fix and page-release operations is pretty inexpensive, but when these happen at a rate of thousands per second, you're talking about a fairly significant consumption of processor resources (and keep in mind that a single prefetch read that brings, say, 32 pages of an object into memory will require 32 page-fix and 32 page-release actions). When the pages of a buffer pool are fixed in memory from the get-go, obviously page-fix and page-release actions are not required for every movement of a page into or out of a buffer, and thus I/O operations associated with a page-fixed buffer pool are more CPU-efficient than they otherwise would be. In-DB2 CPU time can be reduced, in some cases, by several percentage points for programs accessing data in a page-fixed DB2 buffer pool.

So, the original answer to the question, "Which of our buffer pools should we page-fix?" was, "The pools with the highest I/O rates." More specifically, I'd tell people to check on the total read I/O rate for each of the buffer pools of a production DB2 subsystem, and page-fix those pools having a rate of 1000 or more read I/Os per second, and perhaps as well pools with read I/O rates in the high hundreds per second. [The read I/O rate for a buffer pool is the number of synchronous reads plus the number of prefetch reads associated with the pool, expressed as a per-second figure. I/O activity information can be obtained from a DB2 monitor, or from the output of the DB2 command -DISPLAY BUFFERPOOL(ACTIVE) DETAIL. If the command is used, it should be issued once, then issued again an hour later. The statistics in the output of the second issuance of the command can then be divided by 3600 to get per-second figures.] In DB2 Version 8 and DB2 9 environments, specifying PGFIX(YES) for a low-I/O pool wouldn't do much good.

Along came DB2 10 for z/OS, and the answer to the "Which of our buffer pools should we page-fix?" question changed. It changed because of DB2 10's support for 1 MB real storage page frames (versus traditional 4 KB frames) for page-fixed buffer pools (1 MB page frames are themselves made available in a z/OS LPAR via the LFAREA parameter in the IEASYSxx member of SYS1.PARMLIB). When a buffer pool's pages are backed by 1 MB page frames, CPU efficiency is improved because the translation of virtual storage addresses to real storage addresses in that context is a less-costly process versus the 4 KB page frame situation (this thanks to a better hit ratio in what's called the translation lookaside buffer). That being the case, in DB2 10 (and 11) environments, high-I/O buffer pools are joined, as good choices for PGFIX(YES), by high-activity pools, and some in the latter category may not be in the former category. Sure, high-I/O pools are very likely to be high-activity pools (and I think that GETPAGEs per second is a good measure of activity), but a high-activity pool could very well be a low-I/O pool. In fact, a high-activity pool could be a no-I/O pool (no read I/Os, that is): consider a pool, defined with the DB2 10-introduced PGSTEAL(NONE) option, that is used to "pin" (that is, entirely cache) one or more database objects in memory. Ideally, you would see no read I/Os for such a pool once all of the pages belonging to table spaces or indexes assigned to the pool have been read into memory from disk (and for a PGSTEAL(NONE) buffer pool, DB2 will asynchronously read all of an associated object's pages into memory when the object is first referenced after pool allocation). If a "pinning" pool has a high GETPAGE rate, PGFIX(YES) could be a nice CPU-saving move.

The most recent development affecting the answer to the "Which buffer pools should we page-fix?" question is the advent of really large memory resources for z/OS LPARs. I like to see at least 20-40 GB of memory for each engine in a production z/OS LPAR (zIIP engines included, so I would want 160-320 GB -- or more -- of memory for a production z/OS LPAR with four general-purpose and four zIIP engines), and I'm seeing more LPARs that have this Big Memory characteristic. In a z/OS system with memory to burn, you might consider page-fixing ALL of the buffer pools for a production DB2 subsystem. [Here, I'm talking about a situation in which a z/OS LPAR holds a single production DB2 subsystem, and the size of that subsystem's buffer pool configuration is less than half of the LPAR's real storage size. If there are multiple DB2 subsystems in the LPAR, I'd want the combined size of those subsystems' buffer pool configurations to be not more than half of the LPAR's real storage size.] A set-up like this is not blue sky thinking on my part. It's reality. An organization with which I've worked a good bit over the past few years has a production DB2 subsystem in a z/OS LPAR with 9 engines (5 general-purpose, 4 zIIP) and 212 GB of real storage. The DB2 subsystem has a 90 GB buffer pool configuration, and every one of the buffer pools is defined with PGFIX(YES). The demand paging rate for that z/OS LPAR (my preferred measure of the pressure on a z/OS LPAR's memory resource) is zero -- even with 90 GB of the LPAR's real storage fenced off for exclusive use for buffering DB2 data, the remaining 122 GB is more than enough for other memory users in the system. Now, even if you have a z/OS LPAR with a ginormous amount of real storage, don't just change to PGFIX(YES) for all of a production DB2 subsystem's buffer pools at one time. Do that, instead, in stages, and keep an eye on the LPAR's demand paging rate (available from a z/OS monitor). You're good if that rate is zero. If it's non-zero but low (e.g., 1 or 2 per second), you're not in a bad situation, but you might want to put the brakes on additional use of PGFIX(YES) until you can get more memory on the system.

In conclusion, the answer to the question, "Which buffer pools should we page-fix?" depends on the nature of your DB2 environment:
  • If the z/OS LPAR's demand paging rate is on the high side (high single digits or more per second), you might want to stay away from PGFIX(YES) altogether until you can add to the LPAR's real storage resource.
  • If the z/OS LPAR's demand paging rate is low but non-zero (e.g., in the vicinity of 1 or 2 per second), consider page-fixing one or more of your buffer pools that have the highest total read I/O rates.
  • If the z/OS LPAR's demand paging rate is zero and you are running DB2 10 or 11, consider page-fixing the pools that have the highest read I/O rates and the pools that have the highest GETPAGE rates (assuming that you are managing some of the LPAR's real storage resource in 1 MB page frames).
  • If you have a z/OS LPAR with memory to burn (running z/OS on what I would call a muscle machine -- rev it up, baby), and you're running DB2 10 or 11, consider page-fixing all of the buffer pools for a production DB2 subsystem. [And one more thing: if you have one or more REALLY BIG buffer pools (meaning 20 GB or more for a single pool) in a DB2 11 environment, consider backing that pool with 2 GB page frames (if the system has such -- again, this is related to the LFAREA parameter in member IEASYSxx of SYS1.PARMLIB), by way of the new FRAMESIZE option of the -ALTER BUFFERPOOL command. For a pool that's not really big, 2 GB page frames won't make much of a difference, CPU efficiency-wise.]

Hope this information proves to be helpful for you. Survey your DB2 landscape, and leverage buffer pool page-fixing as it makes sense for your environment.

Wednesday, August 12, 2015

What Does a DB2 for z/OS System Look Like When You Have Memory to Burn?

I've long been an advocate of Big Memory (meaning, lots of real storage) for DB2 for z/OS systems. For years, I was disappointed at seeing one production DB2 subsystem after another running in a z/OS LPAR with 16 GB, 20 GB, maybe 40 GB of memory -- enough real storage to run decently, but not enough to let DB2 do its thing with maximum performance. It was like seeing a thoroughbred racehorse in a small coral. You want to see that beast run free in a big pasture, to see what it can do with some real space to work in.

Lately, I have noticed things changing for the better. Memory sizes for z/OS LPARs -- especially the ones that I really care about, which are those that house production DB2 subsystems -- are finally starting to get seriously large. That's large as in more than a hundred gigabytes -- sometimes several hundred gigabytes -- for one LPAR. This change is being fueled by multiple factors:
  • The cost of z Systems memory continues to go down on a per-GB basis. The z13 servers took us another big step in that direction, and if you get enough memory when upgrading your z196 or zEC12 mainframe to a z13 -- what our sales reps call "mega memory" -- then the discount versus the list price of the memory can be downright ginormous.
  • You can get -- and use -- a lot more mainframe memory than you could before. A single z13 server can be configured with as much as 10 TB of real storage (versus a previous max of 3 TB), and up to 4 TB of z13 memory can be used for a single z/OS LPAR (versus 1 TB previously), if you're running z/OS 2.2 -- soon to be available -- or z/OS 2.1 with some PTFs.
  • Organizations are finally paying attention to z Systems memory. Mainframe engines have become really powerful (about 1000 MIPS of processing capacity per CPU), and z Systems configurations were getting a little out of balance for a while, with big-time processing power being paired with too-small real storage resources. Memory sizes are now catching up with engine capacity. People are also increasingly waking up to the fact that Big Memory is somewhat analogous to zIIP engines: it boosts performance and throughput for DB2 workloads (and for Java applications, as well) without impacting the license cost of z/OS software.

Now, some DB2 for z/OS people might be thinking, "What would I do if I had a ton of memory to work with (let's say, at least a few hundred GB in a z/OS LPAR)?" How might I exploit that resource, and what would my DB2 system look like with that resource exploited in a major way?" Well, I'm glad you asked. I think your DB2 system would look something like this:
  • Your buffer pool configuration size is really big, and the total read I/O rate for each pool is really low. Size-wise, given at least a few hundred GB of memory in the LPAR, I'd say that your buffer pool configuration would be at least 100 GB (i.e., the aggregate size of all buffer pools allocated for the DB2 subsystem would be 100 GB or more). In general, when a z/OS LPAR houses a single production DB2 subsystem, I think that a buffer pool configuration size that is 30-40% of the LPAR's real storage size is very reasonable; so, if you have 400 GB in the LPAR, a buffer pool configuration of 120-160 GB should fit very nicely. With a buffer pool configuration of that size, you might see really low read I/O rates for each pool (the read I/O rate for a buffer pools is the rate of all read I/Os for the pool, synchronous plus asynchronous, per second). In my mind, a "really low" total read I/O rate for a given pool is less than 100 per second. That said, with a really big buffer pool configuration you might use some pools for "pinning" certain objects in memory (you'd use the PGSTEAL(NONE) option in that case), and for those pools your target read I/O rate would be zero. Also with a really big buffer pool configuration, you might have one or more individual pools sized at 20 GB or more, and for pools of that size 2 GB real storage page frames (usable for page-fixed buffer pools starting with DB2 11) could deliver additional CPU savings. Finally, with a whole lot of real storage on hand, you might decide to page-fix most, and maybe even all, of your buffer pools, for maximum CPU efficiency.
  • Your packages associated with frequently-executed transactions that re-use threads, and packages associated with batch jobs that issue frequent commits, are bound with RELEASE(DEALLOCATE). For packages bound or rebound in a DB2 10 or DB2 11 system, almost all of the virtual storage associated with those packages when they are allocated to threads for execution goes above the 2 GB bar in the DB2 DBM1 address space, and it uses agent local pool storage versus the EDM pool, so you don't need to worry about running out of space in a virtual storage sense (RELEASE(DEALLOCATE), in combination with threads that persist through commits, increases virtual and real storage utilization). CICS-DB2 thread re-use can be boosted through protected entry threads, IMS-DB2 thread re-use can be increased via pseudo-WFI and/or WFI regions, and DDF thread re-use can be achieved with high-performance DBATs. For relatively simple transactions (those with relatively low in-DB2 CPU times), the combination of RELEASE(DEALLOCATE) packages and thread re-use can reduce in-DB2 CPU time by 10% or more. For batch programs that issue lots of commits, RELEASE(DEALLOCATE) has the added benefit of making sequential prefetch and index lookaside more effective. Note that DB2 11 provided relief for the problem of some bind/re-bind, DDL, and utility operations being blocked by RELEASE(DEALLOCATE) packages executed via persistent LOCAL threads. [If you need to keep RELEASE(DEALLOCATE) packages associated with DDF work from blocking database administration tasks, you can turn off high-performance DBAT functionality via the command -MODIFY DDF PKGREL(COMMIT), and then later turn it back on with the command -MODIFY DDF PKGREL(BNDOPT).]
  • The hit ratio for your DB2 dynamic statement cache is north of 90%. More memory allows for a larger dynamic statement cache, and that means more cache hits and more avoidance of full PREPAREs.
  • All of the RID list processing operations performed on your system are completed using only RID pool space. Starting with DB2 10, two important things related to RID list processing occurred: 1) the default RID pool size went way up (to 400 MB, from 8 MB), and 2) RID list processing operations that can't complete using only RID pool space (because there's not enough of that resource) will continue, using space in 32K-page work file table spaces. Your DB2 monitor (if it supports DB2 10) will show you the extent to which work file space is used for the completion of RID list processing operations that ran out of RID pool space, and if you see such RID list processing "spill-over" activity, you make your RID pool larger (which you can do because the LPAR in which the DB2 subsystem is running has a whole lot of memory). That action allows RID list processing operations to complete in the RID pool, and that boosts performance (versus having to use work file space).
  • You have a really big DB2 sort pool, and that reduces use of work file space for SQL sorts, and that improves SQL sort performance. The sort pool (sized per the value specified for the SRTPOOL parameter in ZPARM) is the amount of in-memory work space that can be used for each concurrent SQL-related sort executing in your DB2 system (so, if the SRTPOOL value is Y, and there are 10 large SQL-related sorts executing concurrently on your system, you could have 10Y of space in the DB2 DBM1 address space used for in-memory processing of these sorts). The default value of SRTPOOL is 10 MB (up from 2 MB prior to DB2 10). Because you have lots of real storage in your z/OS LPAR, you have a larger SRTPOOL value (maybe 40 MB or more), and that means more SQL-related sort work gets done in memory, and that is good for performance.
  • If you run DB2 in data sharing mode, your group buffer pools are large enough so that you have zero directory entry reclaims AND high "XI" GBP read hit ratios. Here, I'm assuming that your coupling facility LPARs, as well as your z/OS LPARs, have lots and lots of memory. Don't know what the "XI" GBP read hit ratio is? Read about it here.
  • With all these big uses of Big Memory, your z/OS LPAR's demand paging rate is still zero, or close to zero. The demand paging rate, my preferred indicator of pressure (or lack thereof) on a z/OS LPAR's real storage resource, is the rate at which pages that had been moved out of memory to auxiliary storage by z/OS (to make room for other pages to be brought into memory) are brought back into server memory on-demand. With a way big real storage resource in your z/OS LPAR, you're able to have a really large buffer pool configuration, lots of RELEASE(DEALLOCATE) packages executed via persistent threads, a big dynamic statement cache, and lots of RID pool and sort pool space, while still leaving plenty of memory for other DB2 and non-DB2 uses. With enough memory to go around, demand paging should be nil or close to it.

So, does your system look like that? If not, why not? Do you not have hundreds of gigabytes of real storage in the LPARs in which you run production DB2 subsystems? If you don't, work on getting there. If you do have a z/OS LPAR with tons of memory and a production DB2 subsystem running therein, and you've not leverage that big memory resource, get to work on that; otherwise, you're missing out on optimal DB2 performance. DB2's a thoroughbred. Give it room to run.

Friday, July 31, 2015

DB2 for z/OS Group Buffer Pools: ALLOWAUTOALT, Directory Entries, and GBP Size

So, I've seen something interesting lately in reviewing some DB2 for z/OS data sharing systems, and as is often the case in such situations, when I see something interesting I want to write about it. Thus this blog entry. Hope you find it to be useful.

What has caught my eye recently in looking over DB2 data sharing configuration information? Big group buffer pool (GBP) directory to data entry ratios. Like 29:1. 30:1. 32:1. 34:1. The default ratio is 5:1. What's going on here?

I'll provide at this point a bit of background information. In a DB2 data sharing system, group buffer pools have a couple of functions. First, they are used to keep track of pages of GBP-dependent objects that are cached in members' local buffer pools, so that the system can mark the copy of page X cached in buffer pool BP3 on member DBP1 invalid if that page is changed by a process running on member DBP2. Second, they serve as a super-high-performance cache for changed data and index pages, so that when member DBP1 sees that the copy of page X that it had cached locally in BP3 has been marked invalid, it can pull the current image of the page from GBP3 (to which the page was written by DBP2 at the time of the commit of a change to the page effected on that member) in a few microseconds. There is usually more page registration activity than GBP page write activity in a data sharing system (not every page read is changed), so the ratio of 5 directory entries (used to register the local caching of pages belonging to GBP-dependent objects) for every 1 data entry (used for caching changed pages of GBP-dependent objects) is a sensible default setting.

There will be times when table spaces and/or indexes assigned to a given buffer pool have a mix of read versus update activity that makes a downward or upward adjustment of the default directory-to-data entry ratio, to, say, 3:1 or 8:1, in the corresponding GBP a reasonable performance tuning action. A ratio greater than 20:1 seems high to me.

On seeing, just a couple of weeks ago, a very high GBP directory-to-data entry ratio in analyzing system information at a mainframe DB2 site, I asked, "Why did you set that ratio to such a high value?" The response: "We didn't do that." I asked, "Is that GBP defined with ALLOWAUTOALT(YES) in the CFRM policy for this data sharing group?" "Yes." Ah, so.

ALLOWAUTOALT can be a useful specification for a coupling facility structure, as it allows the system to dynamically adjust some characteristics of the structure to enhance operational efficiency and availability. When the structure is a GBP, ALLOWAUTOALT can alter the GBP's directory-to-data entry ratio. Why would the system do that, maybe up to a very high value? That happens in an effort by the system to avoid what are called directory entry reclaims. Those can happen when the number of "slots" into which pages of objects assigned to a given buffer pool can go (and that number of "slots" would be all of the buffers in, for example, BP4 on each member of the data sharing group, plus all of the data entries in GBP4) exceeds the number of directory entries in the associated GBP. If BP4 has 50,000 buffers on member DBP1 and another 50,000 buffers on member DBP2, and GBP4 has 10,000 data entries, that's 110,000 "slots" into which, conceivably (though not probably), 110,000 different pages could be placed. If the GBP has 50,000 directory entries, that discrepancy could result in the system having to steal, or reclaim, an in-use directory entry in order to track a page being newly brought into a member's local buffer pool. When such a directory reclaim occurs, the page that WAS being tracked using the reclaimed directory entry is preemptively marked as invalid, wherever it's locally cached. When THAT happens, the page will likely have to be read back into memory from disk when next referenced, and the page read in from disk is likely to be identical to the one that is already cached in memory, but marked invalid, on the reading member (because the invalidation was caused by a directory entry reclaim and not by a page update on another member). Thus it is that directory entry reclaims drive disk reads that would otherwise be unnecessary. That's a drag on performance, and that's why you like to avoid directory entry reclaims (directory entry reclaim activity for GBPn can be checked via the output of the DB2 command -DISPLAY GROUPBUFFERPOOL(GBPn) GDETAIL).

I've seen that with ALLOWAUTOALT in effect for a GBP, the system will often increase the directory-to-data entry ratio for a GBP in an effort to avoid directory entry reclaims. If the ratio were made too high, you could end up with a number of data entries in a GBP that is so small that you get GBP write failures due to lack of storage (also trackable via -DISPLAY GROUPBUFFERPOOL(GNPn) GDETAIL command output). You really don't want GBP write failures to occur, because they can cause pages to land on the logical page list (LPL), and that can lead to failures of programs that try to access LPL pages. The good news here is that ALLOWAUTOALT functions with that admonition in mind (figuratively speaking): I tend NOT to see any GBP write failures due to lack of storage for GBPs that have had their directory-to-data entry ratio automatically adjusted upwards through ALLOWAUTOALT(YES).

That doesn't mean that a significant upwards adjustment of a GBP's directory-to-data entry ratio is completely without penalty, even when no GBP write failures occur. Because more directory entries means fewer data entries (absent an increase in the size of a GBP), a directory-to-data entry ratio increase will result in shorter GBP residency time for changed pages written to the GBP at commit time. That, in turn, leads to a degradation in the one GBP read hit ratio that matters to me: the hit ratio for GBP reads due to buffer invalidation (which I also call the "XI" GBP read hit ratio, for a reason that will be apparent). This read hit ratio can be calculated for a member of a DB2 data sharing group using numbers found in a DB2 monitor statistics long report, or an online monitor display of GBP activity (you can also find information on GBP reads due to buffer invalidations in the output of the DB2 command -DISPLAY GROUPBUFFERPOOL(GBPn) MDETAIL). In an IBM OMEGAMON for DB2 statistics long report, the numbers would look like this:

GROUP BP3                      QUANTITY
----------------------------   --------
GROUP BP R/W RATIO (%)            41.23


The overall GBP hit ratio (highlighted in blue above) is not important to me. Neither is the read hit ratio for GBP synchronous reads due to page not found, which one can calculate using the numbers highlighted in green (it's usually pure luck when a page not found in a local buffer pool is found in the corresponding GBP). The "XI" read hit ratio, calculated using the numbers highlighted in red in the OMEGAMON report snippet, matters. These figures indicate the result of synchronous GBP reads due to buffer invalidation (XI, for short). If there are no directory entry reclaims for a GBP, you KNOW that any local pool buffer invalidations occurred because of the updating of a page of a GBP-dependent page set or partition. Such an updated page would HAVE to be written by DBP2 (if that was the member on which the page-changing action occurred) to the appropriate GBP, and your hope is that the page is still in the GBP when member DBP1 goes there looking for it (and DBP1 will look for the page in the GBP when its locally cached copy of the page, invalidated via member DBP2's update action, is referenced). The longer a changed page stays in a GBP, the better the odds that it will be found there (versus having to be read from disk, which takes MUCH more time than a GBP read) when it is next referenced by a member with an old (since updated) copy of the page in its local buffer pool. More GBP data entries means longer GBP page residency time, and thus more hits for GBP reads due to buffer invalidation. Fewer GBP data entries means shorter GBP page residency times, and a lower "XI" GBP read hit ratio. The numbers in the OMEGAMON report information shown above (displaying activity in a real-world system) provide this "XI" GBP read hit ratio:

39,738.37 / (39,738.37 + 451.85) = 98.87%

That's a very good "XI" GBP read hit ratio. I often see "XI" GBP read hit ratios that are 90% or better (when GBPs are adequately sized), and I don't think that it's unreasonable for you to shoot for a similarly high ratio.

Here, then, is what I'd recommend: check the directory-to-data entry ratio for your production DB2 GBPs. Is it really high for a GBP (I'd say that 20 or more to one is quite high)? If so, how does the "XI" read hit ratio for that GBP look? If that read hit ratio is low (I'd say that below 70% is low, and below 40% is quite low), see if you have enough memory available in the coupling facility LPAR to significantly enlarge the GBP. If you can accomplish that GBP enlargement, see if that allows for a reduction in the directory-to-data entry ratio for the GBP that still provides enough directory entries to avoid directory entry reclaims (auto-alter can lower the ratio dynamically -- if you manually change a GBP's directory-to-data entry ratio via an -ALTER BUFFERPOOL command, that change will take effect the next time the GBP is allocated). Finally, see if those two actions (bigger GBP, lower directory-to-data entry ratio) yields enough additional data entries to substantially improve the "XI" read hit ratio for the GBP. If you do see that read hit ratio improve, you've enhanced the performance of your DB2 data sharing system.

ALLOWAUTOALT is useful in that it imbues coupling facility structures with some added flexibility and resiliency; however, in the case of GBPs, at least, ALLOWAUTOALT should not be thought of as a "set it and forget it" thing. You still need to monitor GBP configuration settings, and understand that when you see a really high directory-to-data entry ratio that was set through ALLOWAUTOALT action, that's the system trying to make the best use of a GBP that is probably smaller than it ought to be. Making good use of too little GBP space is good. Providing more GBP space, in that situation, is better still.

Thursday, July 9, 2015

How Will You Use DB2 for z/OS Global Variables?

About seven years ago, in an entry posted to the blog I maintained while working as an independent DB2 consultant (prior to rejoining IBM), I wrote of the evolution of DB2 for z/OS SQL from merely a data manipulation language to a programming language. That evolution continues with DB2 Version 11, in part through the introduction in that DB2 release of global variable functionality. If you want to know more about this new application enabling feature of DB2 for z/OS, read on.

As SQL (in a DB2 for z/OS context) has become an ever-richer language, the scenarios in which lines of code written in some other language (e.g., COBOL or C or Java) have to be interspersed with SQL statements to accomplish some objective have decreased in number. A few more such scenarios were eliminated thanks to global variables. In particular, consider the situation in which you want to use one query to obtain a value from a DB2 table, which you then plug into a predicate of a second query. You could certainly write a simple routine (in COBOL or some other programming language) that declares a host variable, populates it with a value retrieved by a query of a table, and then issues a second query with the host variable referenced in a predicate. The pseudo-code would look something like this:


Now, you might say, "I don't need COBOL (or Java or whatever) to do that. I can write a native SQL procedure that gets this done using nothing but SQL statements." True enough, but there are scenarios in which use of a SQL PL routine would not be appropriate -- or maybe not even possible. Suppose, for example, that you're working with a tool (Data Studio, perhaps, or the SPUFI interface that comes with DB2) through which one interactively issues SQL statements for execution by a DB2 subsystem. What then? That's where a DB2 11 global variable (if you're in a DB2 11 new-function mode environment) comes in handy. Here are some statements that I executed on one of our IBM test DB2 systems, by way of SPUFI:



FROM SYSIBM.SYSCOLUMNS                        
WHERE TBNAME = ROBSBLOG                       

And the result of that SELECT is:
NAME                   COLNO                  
PRODUCT_ID                 1                  
MONTH                      2                  
UNIT_SALES                 3            

Easy breezy. And, no coding required beyond the SQL statements.

Now, there are a few informational items of which you should be aware regarding the use of DB2 global variables:

The value of a global variable is associated with the thread of an application process; thus, if the value of a global variable is set by a transaction program, the global variable's value will be discarded when the transaction ends and its thread is deallocated. If a global variable's value is set in a batch job, the variable's value will persist across COMMITs issued by the batch job (though it can of course be subsequently changed by the batch job), and will be discarded when the job completes and its thread is deallocated. In the case of my SPUFI exercise mentioned above, to make those SQL statements work as desired I had to set AUTOCOMMIT in my SPUFI session to NO. Had I not done that then the auto-COMMIT after my SET statement (referring to the statement that assigned the value 'PRODUCT_SALES' to the global variable) would have resulted in the assigned value being discarded, and then the statement


would have returned the value 'HELLO' (the default value specified in my CREATE VARIABLE statement).

A DB2 global variable, once created, can be used by any authorization ID or role with the requisite privilege. An ID or role can be granted the READ or WRITE privilege (or both) for a given global variable. The owner of a global variable has, implicitly, all privileges on that variable.

Though a created global variable can be used by any authorization ID or role with the requisite privilege, a global variable's value does not extend beyond a given session (i.e., a given thread). So, if I create a global variable called VAR_X, and I set that variable to the value 'ABC' (assuming that it's a character string variable), another application process (with the right privilege) can use VAR_X, but the value 'ABC' that I assigned to VAR_X will not be picked up in any other session that references VAR_X; so, a global variable itself is globally available within a DB2 subsystem, but a value assigned to the global variable within a session does not extend to any other session.

Some global variables come with DB2 11 for z/OS. Besides global variables that you might create and use, there are several that are already there in a DB2 11 system. These built-in global variables are:


The first of these provides the IP address associated with a network-connected DB2 client application, or of a requester DB2 subsystem that establishes a connection to a remote DB2 host (for a local-to-DB2 application, the value of SYSIBM.CLIENT_IPADDR will be NULL). The other two built-in global variables are for use with the DB2-managed archiving (aka transparent archiving) feature of DB2 11.

There you have it. Yet another in a long line of DB2 for z/OS enhancements that have truly made SQL a programming language. When you get to DB2 11 in new-function mode (or if you're already there), give global variables a try-out, and think about how you might put them to work at your site.

Thursday, June 18, 2015

Are You Using DB2 for z/OS Real-Time Statistics? DB2 Is

DB2 for z/OS is making increased use of the information that it keeps in the real-time statistics tables in the DB2 catalog. You should be doing likewise. Read on to learn more.

For a long time, the only statistics to be found in DB2 "system" tables were those generated through execution of the RUNSTATS utility. Starting with DB2 Version 7 (around the 2000 time frame), DB2 could automatically and periodically (every 30 minutes, by default) write a variety of statistics about table spaces and index spaces to a pair of tables (one for table space statistics, and one for index space statistics), but only if you created those tables. With DB2 9 in new-function mode, the real-time statistics information was moved from user-created tables to a couple of new catalog tables, SYSIBM.SYSTABLESPACESTATS and SYSIBM.INDEXSPACESTATS (I wrote about that change a few years ago, in an entry posted to the blog that I maintained while working as an independent DB2 consultant, prior to re-joining IBM). In those days, real-time statistics were mainly used by DBAs, via vendor tools or by way of user-written programs (the latter often written in REXX, and sometimes involving invocation of the DB2-supplied stored procedure called DSNACCOX), to make more intelligent decisions regarding the execution of utilities such as REORG and COPY. That's still a very good use of DB2 real-time statistics, but what has interested me more of late -- and what I'm writing about here -- is DB2's increasing use of these statistics in support of new autonomic functions, and a number of newer columns in SYSIBM.SYSTABLESPACESTATS and SYSIBM.SYSINDEXSPACESTATS that serve useful purposes beyond smarter utility management. I'll cover these topics now, in that order.

Increased use of real-time statistics by DB2 itself

Rather than try to provide an exhaustive list of each and every usage by DB2 itself of its own real-time statistics, I'll highlight a few that are particularly interesting to me:
  • DB2-managed allocation of utility sort work data sets -- This was a terrific enhancement, in my opinion, and I blogged about it back in 2011. By way of this new feature, built into the DB2 10 base code and retrofitted to DB2 Versions 8 and 9 via PTFs, DB2 utilities such as REORG, LOAD, REBUILD INDEX, and RUNSTATS could dynamically, automatically, and intelligently (using real-time statistics) manage the allocation of the sort work data sets required for their execution. At a growing number of DB2 for z/OS sites, utility JCL and DB2 ZPARM parameter values have been changed to put the DB2 utilities in charge of sort work data set allocation, and every such site that has communicated with me about that change has reported positive results in terms of both utility job reliability (a higher percentage of utility jobs successfully run to completion) and efficiency. That said, at too many sites utility sort work data sets are still being allocated the old way. If that's true where you work, read that aforementioned blog entry and make the changes described therein. It would be a good investment of your time.
  • Automatic calculation by REORG of the appropriate fixed hash space for hash-organized table spaces. DB2 10 for z/OS introduced hash-organized tables -- a row-organizing mechanism that enables super-efficient data access for a query that issues a singleton SELECT containing an "equals" predicate that references a particular unique key of the target table. For hash organization of data to be effective, a table space's fixed hash area must be of a sufficient size; however, you don't want to overdo the sizing of this area, because that would waste space (on disk and in memory). A good way to get a "just right" fixed hash area for a hash-organized table space is to let DB2 figure that out, and that's just what the REORG utility will do -- using real-time statistics -- when you run REORG with AUTOESTSPACE YES (the default) for the table space in question.
  • DB2-calculated space to accommodate table row length increases caused by UPDATEs. Starting with DB2 11 for z/OS, you can set aside space in a table space's pages that will be reserved exclusively for the accommodation of row length-increasing UPDATE statements (previously, space in a table space's pages could be set aside only for accommodation of INSERTs). If in a CREATE TABLESPACE or ALTER TABLESPACE statement you specify PCTFREE FOR UPDATE -1, for the associated table space the initial amount of space set aside in each page (following a REORG or a LOAD REPLACE utility operation) for length-increasing UPDATEs will be 5%, and that amount will subsequently be adjusted by DB2 (for subsequent REORG or LOAD REPLACE jobs) based on real-time statistics information (setting the new ZPARM parameter PCTFREE_UPD to AUTO will make -1 the default value of PCTFREE FOR UPDATE in CREATE TABLESPACE and ALTER TABLESPACE statements).
  • Identification of indexes in need of pseudo-deleted index entry clean-up. Starting with DB2 11 for z/OS, DB2 gets much better at physically deleting index entries that had been marked as pseudo-deleted in the course of row-delete operations. To determine which indexes are most in need of pseudo-deleted index entry clean-up (done by a background process), DB2 periodically checks relevant values in the SYSIBM.SYSINDEXSPACESTATS catalog table. You can read more about this in section 4.2 of the IBM redbook titled, "IBM DB2 11 for z/OS Performance Topics" (downloadable at
As DB2 itself is making ever-greater use of real-time statistics, so should you -- and not just to better manage utility operations (though that, as mentioned, continues to be a good use of real-time stats). Below I'll call attention to some of my favorite columns in the two real-time statistics table spaces.

Some of my favorite SYSTABLESPACESTATS columns
  • DATASIZE -- The number of bytes occupied by rows in a table space.
  • REORGCLUSTERSENS -- Why REORG a table space to reestablish clustering sequence, if that isn't important for the table space? This column provides information to let you make that determination.
  • UPDATESIZE -- This column (new with DB2 11) lets you know of the extent to which a table space is growing (or shrinking) as a result of length-increasing (or length-decreasing) UPDATE operations.
  • LASTDATACHANGE -- When was data in the table space last modified?
  • REORGNEARINDREF and REORGFARINDREF -- If you set aside space in a table space's pages for length-increasing UPDATE operations (a new option, as previously noted, with DB2 11), did that action reduce the occurrence of indirect references that are the result of row relocations caused by update-related row-length increases?
Some of my favorite SYSINDEXSPACESTATS columns
  • LASTUSED -- A great way to identify indexes that aren't doing you any good, and could therefore be dropped (following some due diligence on your part) to free up disk space and to reduce the CPU cost of insert and delete (and some update) and utility operations.
  • REORGNUMLEVELS -- Are levels being added to, or removed from, any of your indexes?
  • DRIVETYPE -- Same information as the identically-named column in SYSTABLESPACESTATS, but for index spaces.
  • REORGLEAFFAR -- Are index page splits causing index leaf pages to be located far from where they optimally should be in an index? If so, would a larger page size make sense for this index?

I encourage you to look over all of the columns of SYSTABLESPACESTATS and SYSINDEXSPACESTATS. I'm pretty sure that you'll find some that will be helpful to you. Take advantage of this information. DB2 is.

Thursday, May 28, 2015

DB2 11 for z/OS: Setting Aside Space for Updates

A DB2 table space is well-organized when rows are physically located where they should be per the clustering index of the associated table. Because a high degree of "clustered-ness" is a desirable quality of a table space (mostly for performance reasons), a DB2 DBA will often take steps to preserve row sequencing between REORGs. One such step that is very commonly executed is the setting aside of space in a table space's pages to accommodate inserts of new rows (relevant for a table that is clustered by a key that is NOT continuously ascending -- in the case of a continuously-ascending clustering key, new rows will be inserted at the "end" of the table). This insert-accommodating space set-aside is accomplished by way of the PCTFREE option of CREATE and ALTER TABLESPACE.

PCTFREE has always been a nice tool in the DB2 DBA's physical database design tool kit, but it had a shortcoming: it didn't apply to UPDATE operations that caused a row's length to increase. If a row in a DB2 for z/OS table space becomes longer as a result of an UPDATE, and if that longer row will no longer fit into the page in which it had been stored, it will be moved to another page in the table space. Will the row's location, as indicated by the RID (row ID) for the row in entries of indexes defined on the table, be changed to reflect the update-driven move of the row to a new page? No. Index entries associated with the row will continue to point to the row's former location. When DB2, in using an index to build a query result set that will include the row that was moved because of a length-increasing UPDATE, arrives at the "original home" page to retrieve the row, it will find, in effect, a "We've moved!" sign, along with a pointer to the row's new page. And, if that moved row is again lengthened by an UPDATE and again is moved (if it became too long to go back into its "second home" page), another "We've moved!" sign will be posted, along with a pointer to the row's "thIrd home" page. And so on.

These "We've moved!" signs related to update-induced row relocations are known, in DB2 parlance, as indirect references. Why are things done this way? Because updating a row's RID in a table's indexes every time a row-lengthening UPDATE caused a row move would add too much cost to UPDATE operations (the affected RIDs will be corrected with the next table space REORG). Obviously, rows with varying-length columns (such as VARCHAR columns) can be lengthened (and potentially relocated) as a result of UPDATE operations. So, too, can rows containing only fixed-length columns. How's that? Compression, that's how -- a row in a COMPRESS YES table space might compress differently (and with more length) with different column values resulting from an UPDATE.

Because indirect references are a drag on query performance (because of the hops to "new home" pages to which lengthened-beyond-fitting rows are relocated), one would like to reduce their occurrence. But how? There wasn't a good answer to this question until DB2 11 came along and provided the new PCTFREE FOR UPDATE option for CREATE and ALTER TABLESPACE. Here's how it works: if you set PCTFREE FOR UPDATE for a table space to, say, 10 then 10% of the space in the table space's pages (following a REORG or a LOAD REPLACE) will be set aside for the sole purpose of accommodating length-increasing  UPDATEs. In other words, that space will NOT be available for newly inserted rows. Can you still set aside space for to-be-inserted row's in a table space's pages? Of course you can. If you wanted 15% of the space in pages of a table space to be set aside (again, following a REORG or a LOAD REPLACE) for new INSERTs, and 10% set aside for length-increasing UPDATEs, you'd specify the following:


Would it be appropriate to specify a PCTFREE  FOR UPDATE value for each and every one of your DB2 tables? Probably not. An unneeded, non-zero PCTFREE FOR UPDATE value will result in wasted space in a table space's pages. It's best to use this DB2 enhancement for tables that are the targets of length-increasing UPDATE operations. Is it easy to identify table spaces that are associated with such tables? Sure it is -- just check the new (with DB2 11) UPDATESIZE column of the SYSIBM.SYSTABLESPACESTATS real-time statistics table in the DB2 catalog.The value in this column shows the extent to which a table space grew (or shrank) as a result of length-changing UPDATE operations since the most recent REORG or LOAD REPLACE. Speaking of real-time stats, here's an interesting option: if you specify PCTFREE FOR UPDATE -1 for a table space, the initial space set aside in the table space's pages for length-increasing UPDATEs will be 5%, and that amount will subsequently be adjusted by DB2 based on certain real-time statistics values. This is one of a growing number of examples of DB2 using its own real-time statistics to enhance efficiency of operation -- I'll be writing more about that soon.

One more thing: there is a new (with DB2 11) ZPARM, PCTFREE_UPD, that provides the default value for PCTFREE FOR UPDATE for ALTER and CREATE TABLESPACE statements. Setting PCTFREE_UPD to AUTO would make PCTFREE FOR UPDATE -1 (described above) the default.

How do you know if you've done good by specifying a non-zero PCTFREE FOR UPDATE value for a table space? Check to see that the incidence of indirect references (i.e., "We've moved!" signs) has decreased. Indirect references for table spaces can be tracked via the NEARINDREF and FARINDREF columns of the SYSIBM.SYSTABLEPART table, and the REORGNEARINDREF and REORGFARINDREF columns of SYSIBM.SYSTABLESPACESTATS. [An indirect reference is "near" if the associated row was moved to a page within 16 pages (or SEGSIZE/2, for a segmented table space -- and universal table spaces are segmented) of its previous "home" page; otherwise, the indirect reference is considered to be "far".]

By the way, PCTFREE FOR UPDATE is expected to be particularly useful for tables that have VARCHAR columns that are initially set to NULL and are later updated to have non-null values -- that is a scenario in which you'd expect rows to become significantly longer as a result of UPDATEs.

And there you have it. Nice feature, eh? When you get to DB2 11 (or if you're already there), take advantage of it.

Wednesday, May 13, 2015

For a Large DB2 for z/OS Table, Should You Go With Partition-by-Range or Partition-by-Growth?

There are several aspects to this question: What do I mean by "large?" Is the table in question new, or does it exist already? What is the nature of the data in the table, and how will that data be accessed and maintained? I'll try to cover these various angles in this blog entry, and I hope that you will find the information provided to be useful.

What is a "large" table?

Why even ask this question? Because the relevance of the partition-by-range (PBR) versus partition-by-growth (PBG) question is largely dependent on table size. If a table is relatively small, the question is probably moot because it is unlikely that range-partitioning a smaller table will deliver much value. Partitioning by growth would, in that case, be the logical choice (for many smaller tables, given the default DSSIZE of 4G, a PBG table space will never grow beyond a single partition).

OK, so what is "smaller" and what is "larger" when you're talking about a DB2 for z/OS table? There is, of course, no hard and fast rule here. In my mind, a larger DB2 for z/OS table is one that has 1 million or more rows. That's not to say that a table with fewer than 1 million rows would never be range-partitioned -- it's just that the benefits of range-partitioning are likely to be more appealing for a table that holds (or will hold) millions of rows (or more).

When the table in question is a new one

This, to me, is the most interesting scenario, because it is the one in which the options are really wide open. I'll start be saying that you definitely want to go with a universal table space here, primarily because a number of recently delivered DB2 features and functions require the use of universal table spaces. But should the table space be PBR or PBG? A partition-by-growth table space can be as large as a partition-by-range table space, so that's not a differentiator. What, then, would be your criteria?

To me, the appeal of a PBG table space is mostly a factor of it being a labor-saving device for DB2 for z/OS DBAs. PBG table spaces have an almost "set it and forget it" quality. There is no need to identify a partitioning key, no need to determine partition limit key values, no worries about one partition getting to be much larger than others in a table space. You just choose reasonable DSSIZE and MAXPARTITION values, and you're pretty much done -- you might check back on the table space once in a while, to see if the MAXPARTITION value should be bumped up, but that's about it. Pretty sweet deal if you're a DBA.

On the other hand, PBR can deliver some unique benefits, and these should not be dismissed out of hand. Specifically:
  1. A PBR table space provides maximum partition independence from a utility perspective. You can even run the LOAD utility at the partition level for PBR table space -- something you can't do with a PBG table space. You can also create data-partitioned secondary indexes (DPSIs) on a PBR table space (not do-able for a PBG table space), and that REALLY maximizes utility-related partition independence (though it should be noted that DPSIs can negatively impact the performance of queries that do not reference a PBR table space's partitioning key).
  2. PBR table spaces enable the use of page-range screening, a technique whereby the DB2 for z/OS optimizer can limit the partitions that have to be scanned to generate a result set when a query has a predicate that references a range-partitioned table space's partitioning key (or at least the lead column or columns thereof). Page-range screening doesn't apply to PBG table spaces, because a particular row in such a table space could be in any of the table space's partitions.
  3. A PBR table space can be a great choice for a table that would be effectively partitioned on a time-period basis. Suppose, for example, that the rows most recently inserted into a table are those most likely to be retrieved from the table. In that case, date-based partitioning (e.g., having each partition hold data for a particular week) would have the effect of concentrating a table's most "popular" rows in the pages of the most current partition(s), thereby reducing GETPAGE activity associated with retaining sets of these rows. Date-based partitioning also enables very efficient purging of a partition's data (when the purge criterion is age-of-data) via a partition-level LOAD REPLACE operation with a dummy input data set (the partition's data could be first unloaded and archived, if desired).
  4. A PBR table space tends to maximize the effectiveness of parallel processing, whether of the DB2-driven query parallelization variety or in the form of user-managed parallel batch jobs. This optimization of parallel processing can be particularly pronounced for joins of tables that are partitioned on the same key and by the same limit key values.
Those are some attractive benefits, I'd say. Still, the previously mentioned DBA labor-saving advantages of PBG table spaces are not unimportant. That being the case, this is my recommendation when it comes to evaluating PBR versus PBG for a large, new table: consider first whether the advantages of PBR, listed above, are of significant value for the table in question. If they are, lean towards the PBR option. If they are not, PBG could be the right choice for the table's table space. In particular, PBG can make sense for a large table for which access will be mostly through transactions (as opposed to batch jobs), especially if those transactions will retrieve small result sets via queries for which most row filtering will occur at the index level. In that case, the advantages of range-partitioning could be of limited value.

When the table space in question is an existing one

Here, the assumption is that the table space is not currently of the universal type. When that is true, and the aim is (as it should be) to convert the table space from non-universal to universal, the PBR-or-PBG decision will usually be pretty straightforward and will be based on the easiest path to universal: you'll go with universal PBR for an existing non-universal range-partitioned table space (if it is a table-controlled, versus an index-controlled, partitioned table space), because that change can be accomplished non-disruptively with an ALTER TABLESPACE (to provide a SEGSIZE for the table space) followed by an online REORG (if you are have DB2 10 running in new-function mode, or DB2 11). Similarly, for an existing non-partitioned table space (segmented or simple, as long as it contains only one table), you'll go with universal PBG because that change can be accomplished non-disruptively with an ALTER TABLESPACE (to provide a MAXPARTITIONS value for the table space) followed by an online REORG (again, if your DB2 environment is Version 10 in new-function mode, or DB2 11).

I recently encountered an exception to this rule: if you have a non-universal, range-partitioned table space, with almost all of the data in the last of the table space's partitions (something that could happen, depending on how partition limit keys were initially set), you might decide not to go for the non-disruptive change to universal PBR, because then you'd have a PBR table space with almost all of the data in the last of the table space's partitions. Yes, with enough ALTER TABLE ALTER PARTITION actions, you could get the table's rows to be spread across many partitions (and with DB2 11, alteration of partition limit key values is a non-disruptive change), but that would involve a lot of work. You might in that case just opt to go to a PBG table space through an unload/drop/re-create/re-load process.

To sum things up: PBR and PBG have their respective advantages and disadvantages. In choosing between these flavors of universal table space, the most important thing is to put some thought into your decision. Give careful consideration to what PBR might deliver for a table, and think also of how useful PBG might be for the same table. If you weigh your options, the decision at which you ultimately arrive will likely be the right one.