Tuesday, March 31, 2015

The DB2-Managed Data Archiving Feature of DB2 11 for z/OS

Over the past year and a half, I have been talking to lots of people about DB2 11 for z/OS. In the course of these discussions and presentations, I've noticed something interesting pertaining to the DB2-managed data archiving capability delivered with this latest release of the DBMS: a) it is one of the more misunderstood features of DB2 11, and b) when people do understand what DB2-managed archiving is about, it becomes one of the DB2 11 features about which they are most enthusiastic.

If you are not real clear as to what DB2-managed data archiving can do for you, I hope that this blog post will be illuminating. I also hope that it will stoke your enthusiasm for the new functionality.

What I most want to make clear about DB2-managed data archiving is this: it greatly simplifies the implementation of a mechanism that some organizations have used for years to improve performance for applications that access certain of their DB2 tables.

Before expanding on that statement, I want to draw a distinction between DB2-managed data archiving, introduced with DB2 11 for z/OS, and system-time temporal data support, which debuted with DB2 10. They are NOT the same thing (in fact, temporal support, in the form of system time and/or business time, and DB2-managed data archiving are mutually exclusive -- you can't use one with the other). A table for which system time has been enabled has an associated history table, while a table for which DB2-managed data archiving has been enabled has an associated archive table. Rows in a history table (when system time temporal support is in effect) are NOT CURRENT -- they are the "before" images of rows that were made NON-CURRENT through delete or update operations. In contrast, rows in an archive table will typically be there not because they are non-current, but because they are unpopular.

OK, "unpopular" is not a technical term, but it serves my purpose here and helps me to build the case for using DB2-managed data archiving. Consider a scenario in which a table is clustered by a non-continuously-ascending key. Given the nature of the clustering key, newly inserted rows will not be concentrated at the "end" of the table (as would be the case if the clustering key were continuously-ascending); rather, they will be placed here and there throughout the table (perhaps to go near rows having the same account number value, for example). Now, suppose further that the rows more recently inserted into the table are the rows most likely to be retrieved by application programs. Over time, either because data is not deleted at all from the table, or because the rate of inserts exceeds the rate of deletes, the more recently inserted rows (which I call "popular" because they are the ones most often sought by application programs) are separated by an ever-increasing number of older, "colder" (i.e., "unpopular") rows. The result? To get the same set of "popular" rows for a query's result set requires more and more DB2 GETPAGEs as time goes by, and that causes in-DB2 CPU times for transactions to climb (as I pointed out in an entry I posted several years ago to the blog I maintained while working as an independent DB2 consultant). The growing numbers of "old and cold" rows in the table, besides pushing "popular" rows further from each other, also cause utilities to consume more CPU and clock time when executed for the associated table space.

As I suggested earlier, some organizations faced with this scenario came up with a mitigating work-around: they created an archive table for the problematic base table, and moved "old and cold" (but still current, and occasionally retrieved) rows from the base to the archive table (and continued that movement as newer rows eventually became unpopular due to age). They also modified code for transactions that needed to retrieve even unpopular rows, so that the programs would issue SELECTs against both the base and archive tables, and merge the result sets with UNION ALL. This archiving technique did serve to make programs accessing only popular rows more CPU-efficient (because those rows were concentrated in the newly-lean base table), but it introduced hassles for both DBAs and developers, and those hassles kept the solution from being more widely implemented.
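
For reference, the do-it-yourself arrangement typically involved SQL along these lines (the table and column names below are hypothetical -- this is just a sketch of the pattern):

  -- Programs that might need older (but still current) rows queried both
  -- tables and merged the results themselves:
  SELECT ACCT_ID, ACTIVITY_DATE, ACTIVITY_AMT
    FROM ACCT_ACTIVITY
   WHERE ACCT_ID = ?
  UNION ALL
  SELECT ACCT_ID, ACTIVITY_DATE, ACTIVITY_AMT
    FROM ACCT_ACTIVITY_ARC
   WHERE ACCT_ID = ?;

The row-moving job (typically a periodic batch process) also had to be written and maintained by the organization -- part of the hassle factor mentioned above.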

Enter DB2 11 for z/OS, and the path to this performance-enhancing archive set-up got much smoother. Now, it's this easy:
  1. A DBA creates an archive table that will be associated with a certain base table. Creation of the archive table could be through a CREATE TABLE xxx LIKE yyy statement, but in any case the archive table needs to have the same column layout as the base table (a sketch of the full set-up follows this list).
  2. The DBA alters the base table to enable DB2-managed archiving, using the archive table mentioned in step 1, above. This is done via the new (with DB2 11) ENABLE ARCHIVE USE archive-table-name option of the ALTER TABLE statement.
  3. To move "old and cold" rows from the base table to the archive table requires only that the rows be deleted from the base table -- this thanks to a built-in global variable, provided by DB2 11, called SYSIBMADM.MOVE_TO_ARCHIVE. When a program sets the value of this global variable to 'Y' and subsequently deletes a row from an archive-enabled base table, that row will be moved from the base table to its associated archive table. In other words, the "mover" program just has to delete to-be-moved rows from the base table -- it doesn't have to insert a copy of the deleted row into the archive table because DB2 takes care of that when, as mentioned, the global variable SYSIBMADM.MOVE_TO_ARCHIVE is set to 'Y'. If you want the "mover" program to be able to insert rows into the base table and update existing rows in the base table, as well as delete base table rows (which then get moved by DB2 to the archive table), have that program set SYSIBMADM.MOVE_TO_ARCHIVE to 'E' instead of 'Y'. And note that the value of SYSIBMADM.MOVE_TO_ARCHIVE, or of any DB2 global variable, for that matter, has effect for a given thread (i.e., a given session). Some people take the word "global" in "global variable" the wrong way, thinking that it is global in scope, like a ZPARM parameter. Nope. "Global" here means that a global variable is globally available within a DB2 subsystem (i.e., any program can use a given built-in or a user-created global variable). It affects only the session in which it is set.
  4. If a program is ALWAYS to access ONLY data in an archive-enabled base table, and not data in the associated archive table, its package should be bound with the new ARCHIVESENSITIVE bind option set to NO. If a program will always or sometimes access data in both an archive-enabled base table and its associated archive table, its package should be bound with ARCHIVESENSITIVE set to YES. For a program bound with ARCHIVESENSITIVE(YES), the built-in global variable SYSIBMADM.GET_ARCHIVE provides a handy behavior-controlling "switch." Suppose that a bank has a DB2 for z/OS table in which the account activity of the bank's customers is recorded. When a customer logs in to the bank's Web site, a program retrieves and displays for the customer the last three months of activity for his or her account(s). Let's assume that more than 9 times out of 10, a customer does not request additional account activity history data, so it could make good sense to archive-enable the account activity table and have activity data older than three months moved to an associated archive table. An account activity data retrieval program could then be bound with ARCHIVESENSITIVE(YES). When a customer logs in to the bank's Web site, the program sets the SYSIBMADM.GET_ARCHIVE global variable to 'N', and a SELECT is issued to retrieve account activity data from the base table. When the occasional customer actually requests information on account activity beyond the past three months (less than 10% of the time, in this example scenario), the same account activity data retrieval program could set SYSIBMADM.GET_ARCHIVE to 'Y' and issue the same SELECT statement against the account activity base table. Even though the base table contains only the past three months of account activity data, because the program set SYSIBMADM.GET_ARCHIVE to 'Y' DB2 will take care of driving the SELECT against the archive table, too, and combining the results of the queries of the two tables with a UNION ALL.
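To make those steps more concrete, here is a minimal sketch of the whole set-up, using hypothetical table and column names (ACCT_ACTIVITY, ACCT_ACTIVITY_ARC, ACTIVITY_DATE, ACCT_ID, ACTIVITY_AMT) and a hypothetical three-month archiving threshold:

  -- Step 1: create an archive table with the same column layout as the base table
  CREATE TABLE ACCT_ACTIVITY_ARC LIKE ACCT_ACTIVITY;

  -- Step 2: associate the archive table with the base table
  ALTER TABLE ACCT_ACTIVITY
    ENABLE ARCHIVE USE ACCT_ACTIVITY_ARC;

  -- Step 3: in the "mover" program, a DELETE becomes a move to the archive table
  SET SYSIBMADM.MOVE_TO_ARCHIVE = 'Y';
  DELETE FROM ACCT_ACTIVITY
   WHERE ACTIVITY_DATE < CURRENT DATE - 3 MONTHS;

  -- Step 4: a program bound with ARCHIVESENSITIVE(YES) flips this switch when
  -- it needs DB2 to drive its SELECT against the archive table as well
  SET SYSIBMADM.GET_ARCHIVE = 'Y';
  SELECT ACTIVITY_DATE, ACTIVITY_AMT
    FROM ACCT_ACTIVITY
   WHERE ACCT_ID = ?;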
And that's about the size of it. No great mystery here. This is, as I stated up front, all about making it easier -- for DBAs and for application developers -- to enhance CPU efficiency for retrieval of oft-requested rows from a table, when those "popular" rows are those that have been more recently added to the target table. You could have done this on your own, and a number of organizations did, but now DB2 11 gives you a nice assist. I hope that you will consider how DB2-managed data archiving could be used to advantage in your environment.

Friday, March 20, 2015

DB2 for z/OS: Non-Disruptively Altering a Native SQL Procedure

Not long ago, I was contacted by a DB2 for z/OS DBA who wanted to run a situation by me. On one of his organization's mainframe DB2 systems -- a system with very high transaction volumes and very stringent application availability requirements -- there was a native SQL procedure for which the ASUTIME specification had to be changed. The DBA wanted to see if I could help him to find a non-disruptive way to effect this stored procedure modification.

[Background: a native SQL procedure is a DB2 stored procedure, written in SQL procedure language (SQL PL), for which the associated package is the procedure's only executable (a native SQL procedure does not have an external-to-DB2 load module). ASUTIME is an option that can be included in a CREATE or ALTER statement for a DB2 stored procedure, and it indicates the maximum amount of mainframe processor time, in CPU service units, that can be consumed in one execution of the stored procedure. The default value for ASUTIME is NO LIMIT, and in this case the DBA needed to set an ASUTIME limit for a stored procedure because it was sometimes running too long.]

One stored procedure change mechanism that you really want to avoid, if at all possible, is drop and re-create. At some DB2 for z/OS sites, particularly when use of stored procedures is at an early stage, it is not uncommon for stored procedure changes to be accomplished through a drop and re-create process. That approach, aside from being relatively disruptive, can become unfeasible once an organization starts using nested stored procedures (referring to stored procedures that are called by other stored procedures). Native SQL procedures, in particular, make the drop-and-re-create means of stored procedure modification problematic, because an attempt to drop a stored procedure (whether native or external) will fail if that stored procedure is called by a native SQL procedure.

So, ALTER PROCEDURE is the way to go, versus drop and re-create, unless the change you want to make cannot be accomplished with ALTER (several parameter-related changes come to mind here -- ALTER PROCEDURE can be used to change the names of a native SQL procedure's parameters, but not to change the number of parameters for a stored procedure, or a parameter's usage (e.g., from IN to OUT), or a parameter's data type). You need to keep in mind, however, that changing some options of the current version of a native SQL procedure via ALTER PROCEDURE will cause the packages of programs that call the altered SQL procedure to be invalidated. Other ALTER PROCEDURE changes -- again, when it is the current version of the procedure that is modified -- cause the package of the native SQL procedure itself to be invalidated (and some current-version changes do both: they invalidate the SQL procedure's package and the packages of programs that call the SQL procedure). A table in the DB2 for z/OS SQL Reference shows the package invalidation effects of changing various options of the current version of a native SQL procedure via ALTER PROCEDURE. Here is the URL for information on ALTER PROCEDURE for a native SQL procedure, from the DB2 10 for z/OS Knowledge Center on IBM's Web site:

http://www-01.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/com.ibm.db2z10.doc.sqlref/src/tpc/db2z_sql_alterproceduresqlnative.dita

When you go to the Web page pointed to by the above URL, scroll down to Table 2, and there you will find the information pertaining to the package invalidation effects of ALTER PROCEDURE changes applied to the current version of a native SQL procedure. You will see in that table that changing ASUTIME via ALTER PROCEDURE does not invalidate (i.e., does not drive an implicit rebind or regeneration of) the package of the target native SQL procedure. This change does, however, invalidate the package of any program that calls the altered native SQL procedure. The DBA who brought his situation to my attention could have identified dependent packages (i.e., packages of programs that invoke the native SQL procedure for which the ASUTIME would be changed) via a query of the SYSIBM.SYSPACKDEP table in the DB2 catalog, and then could have issued the ALTER PROCEDURE statement and rebound the affected packages, given a brief window of time during which the related programs could be offline. That approach was not attractive to the DBA, given the fact that scheduled application outages at his site were hard to come by.
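
By the way, the dependent-package query itself is simple. A sketch, with placeholder schema and procedure names, might look like this:

  SELECT DISTINCT DCOLLID, DNAME
    FROM SYSIBM.SYSPACKDEP
   WHERE BQUALIFIER = 'MYSCHEMA'
     AND BNAME = 'MYPROC';
  -- Depending on your needs, you might also qualify on BTYPE (the column that
  -- identifies the type of object on which a package depends -- see the
  -- SYSPACKDEP description in the SQL Reference for the relevant values).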

Putting our heads together, the DBA and I came up with an alternate, non-disruptive process for effecting the desired change in the definition of the native SQL procedure. This approach relies on two, not one, ALTER PROCEDURE statements to get the job done. The first is an ALTER PROCEDURE with the ADD VERSION clause, along with the new ASUTIME value (and this ALTER PROCEDURE statement would also include the list of parameters for the procedure, any options that will have non-default values, and the body of the procedure). Because the ASUTIME change applies to the new, added version of the SQL procedure, packages associated with callers of the current version of the stored procedure are not invalidated when the ALTER PROCEDURE with ADD VERSION is executed. A second ALTER PROCEDURE statement with the ACTIVATE VERSION clause is issued to make the just-added procedure version (the one with the updated ASUTIME value) the current version, and the change process is complete. Callers of the native SQL procedure are not disrupted by the ALTER PROCEDURE statement with ACTIVATE VERSION, because that statement just causes subsequent CALLs naming the SQL procedure to use the version of the procedure that you previously added with step 1 of this 2-step process.
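
In sketch form -- with a hypothetical procedure name, parameter list, version identifier, ASUTIME value, and body -- the two statements would look something like this (when executing the first statement through a tool such as SPUFI, you would use an alternate statement terminator for the outer statement, since the procedure body contains semicolons):

  -- Step 1: add a new version that carries the desired ASUTIME limit
  ALTER PROCEDURE MYSCHEMA.MYPROC
    ADD VERSION V2 (IN P_ACCT_ID CHAR(10), IN P_AMOUNT DECIMAL(11,2))
    ASUTIME LIMIT 20000
    BEGIN
      UPDATE ACCT_BALANCE
         SET BALANCE = BALANCE + P_AMOUNT
       WHERE ACCT_ID = P_ACCT_ID;
    END

  -- Step 2: make the just-added version the current version
  ALTER PROCEDURE MYSCHEMA.MYPROC ACTIVATE VERSION V2;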

The DBA tested this 2-step native SQL procedure change process during a period of heavy activity on a DB2 for z/OS system, and as expected, no contention issues were seen (even so, his plan is to use this process during periods when the system is less busy, and that is probably a good idea).

So, if you're willing to issue two ALTER PROCEDURE statements versus one, you can non-disruptively change ASUTIME and most other characteristics of a native SQL procedure. An extra ALTER PROCEDURE statement seems to me to be a small price to pay for enhanced application availability.

Thursday, February 19, 2015

Of DB2 Connect "Gateway" Servers and DB2 for z/OS DSNL030I Authentication Messages

A few days ago I received a note in which a DBA reported strange and inconsistent information in some DSNL030I messages that were generated periodically by a DB2 for z/OS subsystem in his charge. The messages in question were tied to authentication failures associated with DDF-using applications (i.e., applications that access DB2 via network connections). Such a failure could occur if an application used an invalid password on a DB2 for z/OS connection request (and that could be the result of the password for the application's DB2 authorization ID being changed on the z/OS side, with the corresponding change to the new password not being made on the application server side, due to an oversight on someone's part).

When a failure of this nature occurred for a DB2 subsystem at the DBA's company, he would see a DSNL030I message from DB2, with a 00F30085 reason code ("The requester's password could not be verified"). Nothing strange there. What struck the DBA as odd was the information that he would see in the LUWID part of the DSNL030I message text. Now, at first glance the form of the information in the LUWID part of the message text might look weird: a three-part string of seemingly meaningless letters and numbers. In fact, the first of the three parts of that string is a client IP address, the second part is the associated port number, and the third part is just a unique sequence number generated by DB2 Connect (or by the IBM Data Server Driver Package, if that DB2 client-side software is being used). OK, but what if the first part of the LUWID string looks like this:

J56F045C

That sure doesn't look like an IP address. And it couldn't be a hex (i.e., hexadecimal) representation of an IP address, could it? Not with a 'J' in the string.

Actually, it is an IP address in hex form, with the twist being that the high-order letter is translated to a number as described in the IBM "Technote" at this URL:

http://www-01.ibm.com/support/docview.wss?uid=swg21055269

As pointed out in the Technote, the string J56F045C in the first part of the LUWID section of a DB2 for z/OS DSNL030I message would resolve to an IP address of 53.111.4.92 (and using the same scheme for substituting a numerical value for the high-order letter in the second part of the DSNL030I LUWID string, G422 would be seen to be a representation of port number 1058).

The DBA who wrote to me wasn't thrown off by the representation of the IP address in the LUWID part of a DSNL030I message, because he'd already seen the aforementioned Technote and therefore knew how to derive the actual IP address. What had the DBA perplexed was the variability he saw in these IP addresses when there should not have been variability. Sometimes, he would see the IP address of a DB2 Connect "gateway" server (more on this in a moment), and sometimes he would see the address of a client application server "upstream" from the DB2 Connect gateway server. On top of that, when the DBA saw an upstream client IP address in the LUWID section of a DSNL030I message, that address was not always consistent with regard to the actual application server for which the authentication failure had occurred. What was going on?

The inconsistent IP address information that the DBA saw in the LUWID part of DB2 DSNL030I messages is related to the fact that client application servers at his site access DB2 for z/OS via DB2 Connect gateway servers, as opposed to going directly to DB2 using the IBM Data Server Driver Package. Here's how the two situations are linked: when a client application requests a connection to a DB2 for z/OS server through a DB2 Connect gateway server, authentication is a very early step in the process of establishing that connection. IF authentication is successful, the DB2 Connect gateway server will send the client's IP address to the DB2 for z/OS subsystem. If authentication is NOT successful then the client address will not be sent by the DB2 Connect gateway server to DB2 for z/OS.

If the DB2 Connect gateway server does not send the upstream client's IP address to DB2 for z/OS when client application authentication is not successful, why did the DBA sometimes see a client IP address in the LUWID part of a DSNL030I authentication failure message? That can happen when the DB2 Connect gateway server connection associated with the client authentication failure is being reused following a previously successful client authentication (keep in mind that the DB2 Connect gateway, by default, keeps a pool of connections to a downstream DB2 subsystem that it reuses for upstream clients -- a feature that boosts efficiency versus having to frequently terminate and then re-establish connections to the DB2 for z/OS host system). In that case -- when an authentication failure occurs using a pooled connection from the DB2 Connect gateway server to DB2 for z/OS that had previously been used for a successful authentication -- you will see in the LUWID part of the DSNL030I message an upstream client IP address.

What client IP address will that be? It will be the address of the last client to successfully authenticate using the DB2 Connect gateway server connection in question. That may OR MAY NOT be the client for which authentication failed. It WILL be the IP address of the client that encountered the authentication failure IF the same client was the last one to successfully authenticate to DB2 for z/OS using the connection. If the last client to successfully authenticate to DB2 using the connection between the DB2 Connect gateway server and the DB2 subsystem is DIFFERENT from the client that encountered the authentication failure, you'll see the IP address of that SUCCESSFULLY authenticated client application in the LUWID part of the DSNL030I authentication failure message.

But sometimes the DB2 DBA saw the IP address of a DB2 Connect gateway server, instead of a client IP address, in the LUWID part of a DSNL030I authentication failure message. Why? That can happen when the first client to use a connection between the DB2 Connect gateway server and the DB2 subsystem gets an authentication failure. In that case, the LUWID part of the DSNL030I message will contain the IP address of the "adjacent" (to DB2 for z/OS) server, and that will be, given a DB2 Connect gateway server set-up, the IP address of a DB2 Connect gateway server.

So, what you know is this: the LUWID part of a DB2 for z/OS DSNL030I authentication failure message will contain an IP address. Depending on the particular circumstances of the authentication failure, the IP address in the LUWID part of the DSNL030I message will be the IP address of the client that encountered the failure, or the IP address of a different client that had previously used the connection (and successfully authenticated to DB2), or the IP address of the DB2 Connect gateway server (if no client had previously used the connection and had successfully authenticated to DB2). The bottom line: you may not see a client IP address in the LUWID part of a DSNL030I message, and even if you do, that client IP address may be different from the address of the client that encountered the authentication failure.

To ensure some consistency in DSNL030I output, the fix for DB2 APAR PM82054 causes DB2 to consistently record the IP address of the "adjacent" server in the THREAD-INFO part of the DSNL030I message when an authentication error occurs. When DB2 Connect is running on a gateway server, that IP address will be the DB2 Connect gateway server's IP address. The information in the LUWID part of the message will not be consistent, and if it does contain a client IP address that address may or may not be that of the client that encountered the authentication failure.

This is another good reason to go to an IBM Data Server Driver Package, direct-to-DB2 connection set-up, versus a DB2 Connect gateway server set-up: if an authentication error occurs, the IP address of the application server on which the Data Server Driver Package is installed will show up -- consistently -- in the THREAD-INFO part of the DSNL030I message, because that server will be the "adjacent server" to the DB2 for z/OS subsystem. Note that entitlement to deploy the IBM Data Server Driver Package is based on DB2 Connect licensing: if you're licensed for the latter, you can deploy the former, and you should deploy the former -- not only for the reason I've just mentioned (having the IP address of the server "adjacent" to DB2 for z/OS be that of an application server versus a DB2 Connect gateway server), but also for a simplified IT infrastructure, better performance (through elimination of a "hop" between application servers and DB2 for z/OS), and easier upgrades to new releases of the DB2 client code (and, speaking of ease, if you license DB2 Connect Unlimited Edition for System z, you can deploy the IBM Data Server Driver on any application server or other client that directly accesses a DB2 for z/OS system, without having to have a license file on each of those client servers -- the client licenses are managed on the DB2 for z/OS host system). On top of that, going from a DB2 Connect gateway server configuration to the IBM Data Server Driver Package direct-to-DB2 configuration typically involves little to nothing in the way of application code changes -- it should just be a matter of updating the client's connection string for the target DB2 for z/OS server. In (usually) rare cases, there could be an application dependency on DB2 Connect, such as when an application needs two-phase commit capability AND the client transaction manager uses a dual-transport processing model (IBM's WebSphere Application Server uses a single-transport processing model).


The more you know about the IBM Data Server Driver Package, the better it looks. There was a time when DB2 Connect gateway server configurations made sense, but for most DB2 for z/OS-using organizations that time has passed.

Thursday, February 12, 2015

The New IBM z13 Mainframe: A DB2 for z/OS Perspective

Last month, IBM announced the z13 -- the latest generation of the mainframe computer. The z13 is chock full of great technical features, and there are already plenty of presentations, white papers, and "redbooks" in which you can find a large amount of related information. Going through all that information is an option, but you might be thinking, "I'm a DB2 for z/OS person. What's in the z13 for me?" In this blog entry, I'll give you my take on that question.

My favorite z13 feature is the larger and less expensive memory resource available on the servers. From a technical perspective, I like what I call Big Memory because nothing boosts application performance and CPU efficiency in a DB2 for z/OS environment like expansive real storage (as I pointed out in an entry that I posted to this blog a couple of months ago). A single z13 server can be configured with as much as 10 TB of memory (up from a maximum of 3 TB on a zEC12, the previous top-of-the-line mainframe), and a single z/OS LPAR on a z13 can use up to 4 TB of memory (with z/OS V2.2, or z/OS V2.1 with some PTFs) -- that's up from 1 TB for a z/OS LPAR on a zEC12. How much memory should a z/OS LPAR have? I will tell you that for an LPAR in which a production DB2 for z/OS subsystem is running, I like to see at least 20-40 GB of real storage per engine -- and I mean total engines, zIIP as well as general-purpose (so, for example, if a z/OS LPAR with a production DB2 for z/OS subsystem has eight engines -- four general-purpose and four of the zIIP variety -- then my recommendation would be to configure that LPAR with at least 160-320 GB of memory).

How would I want to exploit that memory for DB2 performance purposes? Well, for starters I'd like to have a big buffer pool configuration -- to the tune of 30-40% of the LPAR's memory resource (so, for example, if I had a z/OS LPAR with 300 GB of memory then I'd want the aggregate size of all the buffer pools allocated for a production DB2 subsystem in that LPAR to be 90-120 GB). I'd also want to have PGFIX(YES) specified for most of these buffer pools -- certainly for the more active pools. I'd also give consideration to specifying PGSTEAL(NONE) for one or more pools, and using those to cache some really performance-critical table spaces and/or indexes in memory in their entirety. [Note: my 30-40% of memory guideline for the size of a production DB2 subsystem's buffer pool configuration assumes one production DB2 subsystem in the LPAR. If there were several production DB2 subsystems in a z/OS LPAR, you would not want each of them to have a buffer pool configuration sized at 30-40% of the LPAR's real storage resource. If there were multiple production DB2 subsystems in an LPAR, I would generally try to keep the combined size of all of the subsystems' buffer pool configurations at not much more than 50% of the LPAR's memory.]
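
To give a feel for what that looks like in practice, here is a hypothetical sketch (the buffer pool names and sizes are placeholders) of the kind of -ALTER BUFFERPOOL commands involved:

  -ALTER BUFFERPOOL(BP10) VPSIZE(2621440) PGFIX(YES)
  -ALTER BUFFERPOOL(BP15) VPSIZE(262144) PGFIX(YES) PGSTEAL(NONE)

The first command would make BP10 a roughly 10 GB pool of page-fixed 4 KB buffers; the second would set up a pool of about 1 GB for objects to be cached in their entirety. Keep in mind that a change to a pool's PGFIX specification does not take effect until the pool is deallocated and reallocated.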

I would not stop my exploitation of a z13 Big Memory environment with a large buffer pool configuration. In addition to that, I'd look at enlarging a production DB2 subsystem's dynamic statement cache, to drive the cache "hit ratio," ideally, to 95% or more (avoided "full" prepares of dynamic SQL statements can reduce CPU consumption significantly). I'd also go for a larger in-memory sort work area (specifically, I'd consider a value of SRTPOOL in ZPARM of 40-60 MB or more, and a MAXSORT_IN_MEMORY value of 20-30 MB or more). Additionally, I'd make greater use of the RELEASE(DEALLOCATE) package bind option, together with persistent threads (e.g., high-performance DBATs and CICS-DB2 protected entry threads), to improve the CPU efficiency of frequently executed programs. With the kind of LPAR memory I'm talking about, I think that I could do all of these real-storage-leveraging things and still have a demand paging rate of zero (though I wouldn't get too concerned if I had a small but non-zero demand paging rate of maybe 1 or 2 per second).

Now, I mentioned that I like the z13 memory picture from both an expanse and an expense point of view. I just covered the expanse angle. Now a look through the expense lens. The cost of memory on the z13 starts out substantially lower versus the cost of memory on a zEC12 or z196 server (the latter being the mainframe generation that preceded the zEC12). Depending on how much real storage you order for a z13, beyond what you have on a zEC12 or z196 from which you are upgrading to a z13, the memory cost can go lower still -- a LOT lower. Talk to an IBM z Systems sales representative for details about the memory deals available when upgrading your mainframe to a z13. They are impressive, if I do say so myself. Time to load up on gigabytes (or terabytes).

While I am definitely enamored with Big Memory (as DB2 for z/OS people ought to be), that's not all that I like about the z13. I'm also a fan of two enhancements that, in different but complementary ways, enable z13 processors to work smarter for enhanced application efficiency and throughput. One of these processing enhancements is called SMT -- short for simultaneous multi-threading. SMT allows multiple software threads to run on the same processor core at the same time. On a z13, two threads can execute concurrently on one SMT-supporting processor, and that is why the feature is sometimes referred to as SMT2. With SMT in effect, each of the two threads using one core will run more slowly than would be the case for a single-thread core, but throughput is boosted. My colleague Jim Elliott likes to use the example of a two-lane road with a 45 miles-per-hour speed limit versus a single-lane road with a speed limit of 60 miles per hour (and don't read anything into these speed limit numbers -- the only point is that one is lower than the other). Cars go faster on the one-lane road, but considerably more cars per unit of time travel a given distance using the two-lane road. In other words, the two-lane road with the lower speed limit has a greater carrying capacity than the one-lane road with the higher speed limit. Similarly, an SMT-exploiting processor should provide greater transactional throughput than it would if it were running in the traditional single-thread mode.

SMT is not applicable to all z13 engines. Specifically, the z13 processors that support SMT are zIIP engines and IFLs (the latter being an acronym for Integrated Facility for Linux -- processors dedicated to Linux workloads on z Systems servers, running either in native, "on the metal" LPARs or as virtual Linux systems in z/VM LPARs). That these engines support SMT is good from a DB2 for z/OS standpoint. zIIP engine utilization is often driven to a large extent by DB2 for z/OS DDF workloads (i.e., client-server applications that access DB2 data over network connections). DB2 for z/OS also utilizes zIIP engines for significant portions of utility processing, and (starting with DB2 10) for prefetch read and database write operations. Another reason z13 zIIP support for SMT is good for DB2: Java programs running in z/OS systems use zIIP engines, and z/OS is an increasingly popular environment for Java applications, and those applications very often involve access to data managed by DB2. z13 IFL support for SMT is great performance news for applications that run under Linux on z Systems, and many such applications interact with DB2 in an adjacent z/OS LPAR.

The other "work smarter" z13 processing enhancement that I like a lot is SIMD, or Single Instruction Multiple Data. With SIMD, if the same operation needs to be performed on several data elements, the operation can be performed once for all of the data elements at the same time, versus being performed for the first data element, then performed again for the second element, then again for the third, etc. Again, fellow IBMer Jim Elliott provided a nice analogy: suppose you need to get three packages of the same type from point A to point B. It would be more efficient to do that by sending all three packages in one truck, as opposed to sending three trucks carrying one package apiece. Here's the DB2 for z/OS angle: SIMD should boost the performance of two kinds of application in particular: those that are data-intensive (referring to the volume of data operated upon) and those that are compute-intensive (referring to operations performed on data elements). Put those two application characteristics together, and what do you get? Analytics (among other things). In recent years, there has been a marked increase in the use of z Systems servers for analytics applications, driven in part by the steady convergence of transactional and analytics processing. z13 SIMD technology will add fuel to this trend, and it's a trend that most definitely benefits DB2 for z/OS. Software has to be modified to take advantage of SIMD, but that work is already underway. Look for SIMD exploitation by IBM analytics tools in the near future. And, in a z/OS V2.2 system (and V2.1 with some PTFs), SIMD will be exploited by XML System Services (used for things such as schema validation when XML data is stored in DB2 for z/OS tables), by our Java SDKs (good for WebSphere Application Server), and by our COBOL and PL/I compilers.

And that, folks, is the short and sweet summary of what I most like about the z13: Big Memory (like jet fuel for DB2, and priced to sell), SMT (like multi-lane highways), and SIMD (like shipping multiple packages in one truck). There's plenty more z13 technology that is cool in its own right, but as a DB2 for z/OS specialist I'm keeping my "big three" features front-of-mind.
 

Saturday, January 31, 2015

DB2 for z/OS: Native SQL Procedures and STAY RESIDENT

A couple of days ago, I received from a DB2 for z/OS professional a question about stored procedures. This individual asked specifically about the STAY RESIDENT option of CREATE PROCEDURE (and ALTER PROCEDURE), and its effect on a native SQL procedure (i.e., a stored procedure that is written in SQL procedure language -- aka SQL PL -- and which runs in the DB2 database services address space). Less than an hour after the e-mail containing the query hit my in-box, another message from the same DB2 person arrived. The lead word in this second note was "Disregard," and that was followed by an explanation of the discovery that STAY RESIDENT is not applicable to a native SQL procedure.

So, this question-asker had quickly found that STAY RESIDENT is only relevant to an external DB2 stored procedure (referring to a stored procedure that is written in a language such as COBOL and which runs in a DB2 stored procedure address space); still, I think that the question remains interesting because it leads to another: if STAY RESIDENT YES or NO cannot be specified on a CREATE or ALTER PROCEDURE statement tied to a native SQL procedure, what happens to the executable after the native SQL procedure has been called and then runs to completion?

In answering this question, start with the nature of the executable. For an external stored procedure, the associated executable is a load module. When the external stored procedure is first invoked, its load module will be copied from a load library on disk into a stored procedure address space in the z/OS LPAR in which the DB2 subsystem is running. After the stored procedure has run to completion, its load module will remain in memory in the stored procedure address space if the stored procedure was created with (or subsequently altered to have) the STAY RESIDENT YES specification; otherwise, the load module will be deleted from memory after the stored procedure has finished executing. For an external stored procedure, therefore, STAY RESIDENT YES is a performance tuning option that can be appropriately utilized for external stored procedure programs that are compiled and linked as reentrant and reusable (if a stored procedure program is not reentrant and reusable and is defined -- properly, in that case -- with STAY RESIDENT NO, module load time can be reduced by loading from the z/OS Virtual Lookaside Facility, aka VLF).
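
For an external stored procedure, then, the specification in question is just an option of the CREATE or ALTER PROCEDURE statement -- something like this hypothetical example:

  ALTER PROCEDURE MYSCHEMA.MY_COBOL_PROC STAY RESIDENT YES;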

If we're talking about a native SQL procedure, that stored procedure's one and only executable is its DB2 package. To put it a slightly different way, for a native SQL procedure the package is the executable (as opposed to the external stored procedure situation, in which the package is only part of the executable story -- the other part being the stored procedure's load module). What can we say about the "stay resident" characteristic of a native SQL procedure's executable (its package)? We can say about that what we'd say for ANY DB2 package: it depends on the size of the DB2 skeleton pool, the nature of the thread through which the native SQL procedure is invoked, and the RELEASE specification of the SQL procedure's package. I'll expand on that statement below.


When a native SQL procedure is first called (say, following the most recent START of a DB2 subsystem), its package will be loaded from the DB2 directory table space called SPT01 (often referred to as the skeleton package table) into a part of the DB2 EDM pool called the skeleton pool. The EDM pool is in the DB2 database services address space (aka DBM1). The size of the skeleton pool within the EDM pool is determined by the value of the  EDM_SKELETON_POOL parameter in the DB2 subsystem's ZPARM module.

After the package has been loaded into the skeleton pool, it is copied from there into another part of the DBM1 address space, called an agent local pool, that is associated with the thread through which the native SQL procedure's CALL got to DB2. That memory-to-memory copy operation is part of the process of allocating the package to the thread.

When the native SQL procedure is done executing, the copy of the procedure's package in the thread's agent local pool in DBM1 will remain there, allocated to the thread, if the package was bound with RELEASE(DEALLOCATE) and if the thread persists through the commit that marks the end of a DB2 unit of work. Some thread types persist through commits, and some do not. Thread types that are persistent include CICS-DB2 protected entry threads, threads belonging to IMS pseudo-WFI regions, high-performance DBATs, and batch job threads (a DB2-accessing batch program could issue many commits, but its thread will persist until the job has finished executing).

If the native SQL procedure's package was bound (or rebound) with RELEASE(COMMIT), or if the thread through which the CALL came is non-persistent, the copy of the package in the agent local pool will be deleted. Does that mean that the package is no longer resident in memory? No. The "master" copy of the package remains in the skeleton pool section of the DB2 EDM pool, and it will again be copied from there to a thread's agent local pool as needed for execution via a thread.

So, in essence a native SQL procedure's executable will "stay resident" in memory by default. For optimal performance and CPU efficiency, the package can be bound with RELEASE(DEALLOCATE) and executed by way of a persistent thread -- that keeps the procedure's package not only in memory, but in the thread's own agent local pool.

How long will a native SQL procedure's package remain in the skeleton pool portion of the EDM pool, once it's been loaded there from the skeleton package table in the DB2 directory? A long time, if the skeleton pool is adequately sized. See, once a package is in the skeleton pool it will stay there (assuming that it's not invalidated through, for example, an ALTER of an object on which the package depends) until the DB2 subsystem is stopped (as it might be, on occasion, for application of software maintenance) or until DB2 steals the space in the skeleton pool occupied by the package in order to make room for another package that has to be brought in from SPT01. That space steal will only happen, of course, if the skeleton pool becomes full, and even then only space occupied by inactive packages is subject to stealing. Your DB2 monitor, via a statistics long report or an online display of EDM pool activity, can provide you with information that can help you to determine whether or not the skeleton pool is appropriately sized. In particular, check to make sure that you aren't getting any program failures caused by a full skeleton pool (such failures would indicate that the pool is undersized, and probably significantly so). Also check the number of free pages in the pool (I like to see a number of free pages that is at least 10% of the total number of pages in the skeleton pool), and the skeleton pool hit ratio (the percentage of package allocation requests that were satisfied out of the skeleton pool, as opposed to requiring a package load from the skeleton package table in the DB2 directory -- I like to see a hit ratio of 99% or more for a production DB2 subsystem).

There you have it. As mentioned, what's true for a native SQL procedure's package, memory-residency-wise, is true for packages in general. What's different about a native SQL procedure is the fact that there is no executable beyond the package; so, if that package stays resident in memory, the native SQL procedure stays resident in memory.

Monday, January 26, 2015

A Java Program, a COBOL Program, and a DB2 Update - An Interesting Case Study

Not long ago, an IBM colleague of mine who is a WebSphere Application Server (WAS) for z/OS specialist sent to me and to a few other IBMers (including some of our CICS and DB2 for z/OS developers) some information about an application scenario, and asked for our ideas and feedback. The scenario involved a Java program that accessed a local DB2 for z/OS subsystem (i.e., a subsystem in the same z/OS LPAR as the WAS instance) and selected a row to be updated, and then invoked a COBOL program (also in the same z/OS LPAR) that performed the actual row update operation. The IBM WAS for z/OS specialist had some particular questions about DB2 locking in this situation, and was looking for an approach that would be favorable from a concurrency perspective. Everyone chimed in with their thoughts on the matter, and in this blog entry I'll synthesize and summarize that information. I think it makes for an interesting case study.

The client with which my WAS for z/OS specialist colleague (named Edward) was working had an application requirement whereby, as I mentioned, a Java program was to select a DB2 row for update and then was to invoke a COBOL program which would perform the prescribed update operation. The COBOL program existed already in the form of a CICS transaction, and as CICS has advanced capabilities regarding interaction with Java applications, Edward wanted to assess the options available for invoking the COBOL program in its present, CICS transactional form. Edward's chief concerns were locking and data integrity (locking, of course, is a mechanism for preserving data integrity). Edward was thinking that he wanted the Java program to acquire and retain a lock on the DB2 row to be updated, so as to prevent some other program from altering the record between its selection by the Java program and its subsequent update via the Java-invoked CICS transaction; he was concerned, however, that the Java-acquired lock on the row would end up blocking the update action of the called CICS transaction program.

The impact of the concurrency issue raised by Edward hinged on whether or not the selecting Java program and the updating CICS program would be seen by DB2 for z/OS as being associated with the same "global" transaction: if "no" then a lock (row- or page-level, depending on the target table space's LOCKSIZE attribute) acquired by the Java program would indeed block the update attempt by the invoked CICS transaction; if "yes" then a Java-acquired lock would be owned by the global transaction and would therefore be inherited by the CICS program. In that latter case, then, even an X-lock acquired on the row (or the row's page) by the Java program would not impede the CICS program's subsequent update operation. [Sometimes, the choice is indeed to X-lock a selected row, to prevent it from even being read by another program prior to its "downstream" modification by a called process -- such a "hands-off" lock is sometimes acquired by way of an update statement that sets the value of a column in a row equal to its current value, thereby X-locking the row without actually changing its contents.]
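
In sketch form (with hypothetical table and column names), that "hands-off" lock technique is just an update that changes nothing:

  -- X-locks the qualifying row (or its page, depending on the table space's
  -- LOCKSIZE) without actually changing the row's contents
  UPDATE ACCOUNT
     SET BALANCE = BALANCE
   WHERE ACCT_ID = ?;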

To this question our CICS and DB2 for z/OS developers responded with some enlightening information. First, if the row-selecting Java program were to run in WAS for z/OS and then link into CICS to invoke the row-updating program, the two processes (row-select and row-update) would be seen by DB2 as two different units of work, and the result would be a lock conflict. This conflict could be avoided by having the Java program execute in the CICS Liberty environment (the Liberty Profile is a lightweight Java application development and runtime environment, one form of which is available within CICS); however, CICS Liberty itself is only one part of the solution -- there is also the matter of the JDBC driver used to access DB2 for z/OS from the Java program. If the type 4 JDBC driver is utilized, the global nature of the overall transaction will not be recognized by DB2, and the lock conflict challenge will remain. If the type 2 JDBC driver is used (and that is an option when the target DB2 subsystem is in the same z/OS LPAR as the CICS region), and if the link to the COBOL program is local (i.e., not a distributed program link to a program in another CICS region), access to DB2 from the Java program running in the CICS Liberty environment and from the COBOL program will both be associated with the same DB2 thread; therefore, any lock acquired on DB2 data by way of the row-selecting Java program will belong as well to the row-updating COBOL program.

So, a CICS-based solution satisfying the application requirement was definitely available to the organization with which Edward was working. As an alternative approach -- a plan B, if you will -- I suggested to Edward the following: have the Java program invoke the row-updating COBOL program as a DB2 stored procedure. In that case, the same DB2 thread should be used for the Java-based SELECT and the COBOL-based UPDATE. Going the stored procedure route could in some cases be quite straightforward: generally speaking, if a COBOL program that executes in a CICS region issues SQL statements but no CICS commands, it can be invoked in the form of a stored procedure with little, if any, modification.

Yet another approach would not depend on the Java and COBOL programs sharing locks. Suppose that the Java program did NOT acquire and retain a lock on the target DB2 row prior to invoking the updating COBOL program (assuming that the typical cursor stability isolation level is in effect, an S lock acquired on a target row will be released following completion of a SELECT statement). If a lock on the target row is not acquired and retained by the Java program, how can one avoid the situation in which some other program comes in and changes the row before the Java-invoked COBOL program gets to it, potentially leading to an undesired "double update" of the target row? The answer, in a nutshell, is that you don't avoid that situation. Instead, you expect that it won't happen, and you detect it if it does. This is what so-called optimistic locking is about.

Optimistic locking was introduced with DB2 9 for z/OS, and there is a good description of this feature in section 7.3 of the IBM "redbook" titled DB2 9 for z/OS Technical Overview. Basically, it works like this: you alter the target table to add what is called a row change timestamp. That timestamp value is maintained by DB2 and it does what you'd expect: it records the time at which a row is changed. With that done (and continuing with the application requirement of which I've written), the Java program would select a row (without retaining a lock on the data) and then invoke the COBOL program to update the row. In doing that, the Java program would pass to the COBOL program the value of the row's last-changed timestamp. The COBOL program would then issue an UPDATE that would search for the target row using the appropriate qualifying criteria (e.g., an account number) AND the value of the row-change timestamp retrieved by the Java program's SELECT. If the target row has already been changed by an intervening statement (i.e., a data-changing SQL statement that gets to the row between the Java SELECT and the COBOL UPDATE), the row-change timestamp value will have changed (again, DB2 maintains that value), and the COBOL UPDATE won't find the target row -- that's how you avoid the aforementioned "double-update" scenario. If you want, you could have the COBOL program in that situation select the target row using its unique key but NOT with a predicate referencing the row-change timestamp -- that would show you that the row is there (assuming the intervening data-change statement wasn't a DELETE), but that it was updated between the Java SELECT and the COBOL UPDATE. The extra SELECT to check on this occurrence might strike you as being a little expensive, but keep in mind that it's something your program will rarely have to do -- the term "optimistic locking" suggests an expectation that the target row will NOT be changed between the Java SELECT and the COBOL UPDATE. Something that's a little expensive but seldom done is usually not a big deal.
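
Here is a minimal sketch of that approach, with hypothetical table and column names:

  -- One-time set-up: add a DB2-maintained row change timestamp to the table
  ALTER TABLE ACCOUNT
    ADD COLUMN ROW_CHG_TS TIMESTAMP NOT NULL
      GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP;

  -- Java program: select the row (no retained lock) and capture the timestamp
  SELECT BALANCE, ROW CHANGE TIMESTAMP FOR T
    FROM ACCOUNT T
   WHERE ACCT_ID = ?;

  -- COBOL program: update only if the row has not been changed in the meantime
  -- (the last parameter marker gets the timestamp value captured by the SELECT)
  UPDATE ACCOUNT
     SET BALANCE = ?
   WHERE ACCT_ID = ?
     AND ROW CHANGE TIMESTAMP FOR ACCOUNT = ?;

If the UPDATE finds no row, the program knows that the target row was changed (or deleted) between the SELECT and the UPDATE, and it can react accordingly.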

There you have it: at least three ways to address the stated application requirement:
  • Use the CICS Liberty environment for the Java program, and the type 2 JDBC driver and a local link to the COBOL program in the form of a CICS transaction, or
  • Invoke the COBOL program as a DB2 stored procedure, or
  • Use the optimistic locking approach.

The first two of these options involve locking the target row via the selecting Java program (as mentioned, that could be done by updating the row to set a column equal to itself), and then invoking the updating COBOL program which will not be blocked by the Java-acquired lock because it will share ownership of that lock. The third approach does not depend on such shared lock ownership.

Faced with a similar application requirement, the approach you should take is the one that best suits your needs and your environment. Always nice to have options, I say.

Monday, December 22, 2014

DB2 for z/OS: Time to go Bold and Big with System z Memory

With 2015 right around the corner, many people are probably starting to ponder their new year's resolutions -- the things they aim to do with a fresh set of twelve months. To the usual line-up of action items ("I will get in shape," "I will read more," "I will eat more fruits and vegetables," etc.), I would suggest that you add the following: "I will be bold in using memory for the production DB2 for z/OS systems I support." If that doesn't initially grab you like the idea of learning to play a musical instrument (though that could be a good resolution), give me a chance to explain where I'm coming from.

First, I need to give credit to Martin Packer, my UK-based colleague and one of IBM's leading System z experts, for the "be bold" directive. Martin is one of a growing number of people in the worldwide DB2 for z/OS community who have seen that, with the possible exception of WebSphere Application Server for z/OS, no subsystem responds positively to increased memory resources as does DB2: the more memory you provide to a DB2 for z/OS subsystem, the better your DB2-accessing applications will perform, and the more CPU-efficient your DB2 workload will be. Martin likes to tell his clients to "be bold" in boosting the real storage resources with which a DB2 subsystem can work because he knows (and he's right) that incremental increases aren't the way to move the needle significantly with regard to DB2 performance and CPU efficiency. Big impact comes from big moves. Don't be thinking 5% or 10% more memory. Think 100% or more.

Right here, some folks might be thinking, "Maybe Catterall gets a commission on IBM sales of mainframe memory." Not so. I'm paid to help organizations succeed with DB2 for z/OS, and Big Memory is a big part of that. In multiple ways over the past decade or so, IBM has plowed the field, removing barriers that formerly would have gotten in the way of going big with respect to provisioning mainframe memory:
  • Software -- About 10 years ago, z/OS V1.2 delivered 64-bit storage addressing, allowing for 16 exabytes (that's 16 million terabytes) of virtual storage addressability (versus the 2 gigabytes we had with the old 31-bit addressing scheme). DB2 Version 8 exploited that technology by moving most DB2 storage pools above the 2 GB "bar" in the DB2 database services address space (aka DBM1).
  • Hardware -- The old zSeries line of servers (z990, z900, z800, and z890) enabled byte-addressable use of more than 2 GB of real storage back in the early 2000s. Today, a zEC12 mainframe can be configured with up to 3 terabytes of real storage.
  • Economics -- System z memory costs much less now than it did just a few years ago.
Of course, being able to provision a large amount of System z memory is of value to the extent that you can exploit that big real storage resource. DB2 has provided all kinds of help here, as pointed out below.

Buffer pools

The poster child of Big Memory exploiters, DB2 buffer pools have been above the 2 GB bar in DBM1 since Version 8. Growing your DB2 buffer pool configuration can boost performance in two ways:
  • Response time and throughput -- Take a look at an accounting long report generated by your DB2 monitor. Chances are, the largest components of in-DB2 elapsed time for your applications are read I/O-related: wait for synchronous read (on-demand reads of individual table space or index pages) and wait for "other" read (this has to do with prefetch reads: if a program requests a page that is currently scheduled for read-in via dynamic, list, or sequential prefetch, the program will wait for the prefetch read to complete); thus, these wait-for-read occurrences tend to be your primary leverage point for improving response time and throughput for DB2-accessing applications. The bigger your buffer pools, the lower the incidence of wait-for-read situations.
  • CPU efficiency -- Some System z people are under the mistaken impression that a mainframe's I/O assist processors take on all of the CPU load associated with I/O operations. Not true. The I/O assist processors certainly help, and they are one reason for System z's preeminence when it comes to handling I/O-heavy workloads, but make no mistake: every I/O operation consumes cycles of a general-purpose mainframe engine (or zIIP engine cycles, if we're talking about a zIIP-eligible workload). Synchronous read I/Os eliminated via bigger buffer pools lead to reduced in-DB2 CPU time (also known as "class 2" CPU time) for DB2-accessing applications, and avoided prefetch reads and database writes reduce the CPU time consumed by the DB2 database services address space (with DB2 10, prefetch reads and database writes became 100% zIIP-eligible, so savings here can help to avoid zIIP contention issues).
DB2 11 for z/OS anticipated configuration of extra-large buffer pools by providing support for backing page-fixed buffer pools (i.e., those defined with the PGFIX(YES) option) with 2 GB real storage page frames (larger real storage page frames -- 4 KB was once the only frame size available -- improve CPU efficiency by reducing the cost of virtual to real storage address translation). Now, you might think that 2 GB page frames would be way better for address translation efficiency than 1 MB page frames (another page frame size choice, first supported for page-fixed buffer pools with DB2 10), and they do offer an advantage here, but you're not likely to see a significant CPU efficiency gain versus 1 MB page frames unless the buffer pool in question is REALLY big. How big is really big, in this context? I don't know that we have a single hard and fast number, but a DBA at one site running DB2 11 for z/OS told me that he started to see a nice boost in CPU efficiency when he got a page-fixed buffer pool allocated across ten 2 GB page frames. That's right: a single buffer pool sized at 20 GB.
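If you want to try this out, the change is made via the -ALTER BUFFERPOOL command. What follows is just a sketch (BP5 and the buffer count are hypothetical values of mine): 5,242,880 4 KB buffers works out to a 20 GB pool, PGFIX(YES) fixes the buffers in real storage, and FRAMESIZE(2G) is the DB2 11-introduced option that requests 2 GB page frames:

   -ALTER BUFFERPOOL(BP5) VPSIZE(5242880) PGFIX(YES) FRAMESIZE(2G)

Keep in mind that a PGFIX change takes effect the next time the pool is allocated, and that the z/OS LPAR's LFAREA specification has to set aside enough real storage for the large frames you're requesting -- otherwise DB2 will back the pool with smaller frames.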

Does the thought of a single 20 GB buffer pool seem really "out there" to you? It might, given that most DB2 shops I've seen have entire buffer pool configurations (by which I mean the aggregate size of all buffer pools associated with one DB2 subsystem) that are well under 20 GB in size. Here's my message for you: it's time to start wrapping your brain around the idea of really big buffer pool configurations, and getting the ball rolling in that direction at your site. Let me give you a data point for your consideration: at an organization with which I've worked for a number of years (a large supplier to many firms in the manufacturing industry), the size of the buffer pool configuration for the production DB2 subsystem is 90 GB. And get this: every one of the buffer pools in this configuration is defined with PGFIX(YES). Are these people crazy? Are they thrashing horribly? No. Their demand paging rate (a z/OS monitor-reported metric, and my preferred indicator of pressure on an LPAR's memory resource) is ZERO. It's zero because the LPAR in question is configured with 212 GB of real storage -- there's plenty of memory for use beyond DB2 page caching. Oh, and this company also runs WebSphere Application Server, another Big Memory exploiter, in that same z/OS LPAR. It's a client-server application system in a box (the WebSphere apps access DB2 for z/OS data), and performance-wise, it screams.

I think that it won't be long before we see DB2 buffer pool configurations north of 100 GB.

Thread storage

CICS-DB2 people, in particular, have long known of the CPU efficiency gains to be had when persistent threads (those being threads that persist through commits, such as CICS-DB2 protected entry threads) are paired with DB2 packages bound with RELEASE(DEALLOCATE) (organizations using IMS Transaction Manager with DB2 get the same benefit with WFI and pseudo-WFI regions, and DB2 10 extended this performance tuning possibility to DDF-using applications via high-performance DBATs). The thing is, RELEASE(DEALLOCATE) packages executed via persistent threads make for "fatter" threads, from a virtual and real storage standpoint (fellow IBMer John Campbell's term -- I like it). That required, in the past, very selective use of RELEASE(DEALLOCATE), not only because virtual and real storage were at a premium, but also because the fatter threads consumed space in the part of the DB2 EDM pool used for packages allocated to threads (if you ran out of space in that section of the EDM pool, you'd have programs failing). Starting with DB2 10 for z/OS, that situation changed: for packages bound (or rebound) in a DB2 10 system, almost all of the virtual storage used for copies of the packages when they were allocated to threads went above the 2 GB bar in DBM1; furthermore, those packages were copied to agent local pool storage when allocated to threads, as opposed to taking up EDM pool space. The upshot? DB2 10 provided much more virtual storage "head room" (John Campbell, again) for the pairing of RELEASE(DEALLOCATE) packages and persistent threads. Being able to use more virtual storage to gain greater CPU efficiency is nice, but only if the extra virtual storage utilization is backed by a sufficient real storage resource. Getting a bigger bang from RELEASE(DEALLOCATE) packages and persistent threads is, then, another reason to go big with memory when it comes to DB2.
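In practical terms, the knobs involved are pretty simple (the collection and package names below are hypothetical). For packages executed by way of persistent threads, RELEASE(DEALLOCATE) is a bind (or rebind) option; on the DDF side, high-performance DBATs come into play when a RELEASE(DEALLOCATE) package is executed via a DBAT and the DDF PKGREL setting honors the bind option:

   REBIND PACKAGE(COLL1.PROGA) RELEASE(DEALLOCATE)
   -MODIFY DDF PKGREL(BNDOPT)

A handy aspect of the DDF side of this: -MODIFY DDF PKGREL(COMMIT) turns high-performance DBAT behavior back off when you need to (ahead of package bind or DDL activity, for example), without requiring any rebinds.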

Other ways to exploit Big Memory for better DB2 performance

How about a bigger dynamic statement cache? Especially for quick-running dynamic SQL statements (such as those that tend to dominate for many high-volume DB2 DDF-using applications), the CPU cost of statement preparation can be several times the cost of statement execution. The bigger the DB2 dynamic statement cache (sized via a parameter in ZPARM), the higher your cache hit ratio is likely to be (the percentage of cache look-ups that find the statement already in the cache), and that will result in CPU savings. I regularly see, these days, dynamic statement cache hit ratios (as reported by a DB2 monitor) that exceed 90%. If your hit ratio is less than that, consider enlarging your dynamic statement cache. You have all the virtual storage you'd want for that (the space has been above the 2 GB bar in DBM1 since DB2 Version 8) -- you just need to back that virtual storage with real storage.
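Here's a worked example of the hit ratio math, using made-up numbers (your monitor's labels may vary, but most report "short" prepares, meaning the statement was found in the cache, and "full" prepares, meaning it wasn't):

   hit ratio = short prepares / (short prepares + full prepares)
             = 950,000 / (950,000 + 50,000)
             = 95%

The relevant ZPARM parameters, for reference, are CACHEDYN (YES turns the cache on) and EDMSTMTC (the size of the cache).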

Often getting less attention than the dynamic statement cache, the DB2 sort pool presents another opportunity to use more memory to get more done with less CPU. The sort pool is the in-memory work space available to DB2 for processing a SQL-related (as opposed to utility-related) sort operation (such as one that might be required to satisfy an ORDER BY, GROUP BY, or DISTINCT specification). The more of the processing for a SQL sort that can be accomplished in the sort pool (versus using space in the DB2 work file database), the better, in terms of CPU efficiency. The default size of the sort pool (as determined by the value of the SRTPOOL parameter in ZPARM) is 10 MB (it was 2 MB before DB2 10). The largest sort pool size I've seen at a DB2 site is 48 MB. The max size is 128 MB, but don't go crazy here -- keep in mind that this value is the maximum in-memory sort work area that DB2 can use per concurrent SQL sort; so, if you had a 40 MB sort pool and 10 big SQL sort operations were going at one time, you could see 400 MB (10 X 40 MB) of virtual storage used for sort work space. Of course, lots of virtual storage for SQL sort processing is plenty do-able if (same song, nth verse) you back that virtual storage with a large real storage resource.
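Just to ground that: a statement as simple as the one below (table and column names are made up) can drive a SQL sort, and each such sort executing concurrently can use up to the SRTPOOL amount of in-memory work space:

   SELECT CUST_ID, SUM(ORDER_AMT) AS TOTAL_AMT
   FROM ORDER_HIST
   GROUP BY CUST_ID
   ORDER BY TOTAL_AMT DESC;

(Whether a given GROUP BY or ORDER BY actually requires a sort depends, of course, on the access path -- an index might provide the needed ordering.)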

And don't forget the RID pool, used by DB2 for sorting and otherwise processing lists of RIDs (row IDs, obtained from indexes) that are used for certain data access paths (such as list prefetch, and index ANDing and ORing). DB2 10 raised the default size of this pool (see your MAXRBLK setting in ZPARM) to 400 MB from the former default of 8 MB. DB2 10 also changed things so that running out of space in the RID pool for a RID list processing operation would result in that processing being continued using work file database space (versus the previous out-of-RID-pool-space action, which was to abandon RID list processing in favor of a table space scan). That was a good change, but if you see some work file spill-over for RID list processing (your DB2 monitor can show you this), you might consider making your RID pool larger -- completing a RID list processing operation using a combination of RID pool and work file space is probably better than going to a table space scan, but getting all the RID list work done in the RID pool should deliver optimal CPU efficiency.
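If you're curious about which of your queries depend on RID list processing in the first place, EXPLAIN output is one place to look. A sketch of such a query is below (it assumes a populated PLAN_TABLE, appropriately qualified): PREFETCH = 'L' indicates list prefetch, and ACCESSTYPE values such as 'M', 'MX', 'MI', and 'MU' indicate multiple-index access:

   SELECT QUERYNO, PROGNAME, ACCESSTYPE, ACCESSNAME, PREFETCH
   FROM PLAN_TABLE
   WHERE PREFETCH = 'L'
      OR ACCESSTYPE IN ('M', 'MX', 'MI', 'MU')
   ORDER BY PROGNAME, QUERYNO;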

Closing thoughts

From the start, DB2 for z/OS was architected to make good use of large server memory resources (it was one of the first exploiters of the old MVS/XA operating system, which first gave us access to 2 GB -- versus 16 MB -- of byte-addressable space on a mainframe server), and that focus continues. If you want to take the performance and CPU efficiency of your DB2 for z/OS system to the next level, go big on memory. Are you looking to have a z/OS environment in which CPU capacity is well balanced by real storage? I'd think in terms of 20 GB or more of memory for each engine (zIIP as well as general-purpose) in an LPAR.
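To put a number on that rule of thumb (purely illustrative arithmetic): an LPAR with 6 general-purpose engines and 2 zIIPs would, by that measure, be well served by something in the neighborhood of 8 x 20 GB = 160 GB of real storage.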

Keep in mind that even a large real storage resource can be over-committed. As previously noted, I like to use the demand paging rate for an LPAR as the indicator of pressure on the LPAR's real storage. In expanding DB2's use of memory (larger buffer pools, more thread storage, etc.), I would aim to keep the demand paging rate from getting out of hand. My take? If that rate is in the low single digits per second, on average, during busy processing times, it's not out of hand. Mid-single digits per second is what I see as "yellow light" territory. If I saw a demand paging rate in the higher single digits per second (or more), I'd want to either make more memory available to the LPAR (my preference) or reduce virtual storage utilization. Of course, a zero demand paging rate is great, but don't let a very small, non-zero rate get you overly bothered.

More and more organizations are seeing first hand what Big Memory can do for a DB2 subsystem and the applications that access that subsystem, and they like what they see. So, in 2015, if you haven't already done so, do what my friend Martin recommends: on the DB2 memory front, go bold and go big.