Tuesday, March 31, 2015

The DB2-Managed Data Archiving Feature of DB2 11 for z/OS

Over the past year and a half, I have been talking to lots of people about DB2 11 for z/OS. In the course of these discussions and presentations, I've noticed something interesting pertaining to the DB2-managed data archiving capability delivered with this latest release of the DBMS: a) it is one of the more misunderstood features of DB2 11, and b) when people do understand what DB2-managed archiving is about, it becomes one of the DB2 11 features about which they are most enthusiastic.

If you are not entirely clear on what DB2-managed data archiving can do for you, I hope that this blog post will be illuminating. I also hope that it will stoke your enthusiasm for the new functionality.

What I most want to make clear about DB2-managed data archiving is this: it greatly simplifies the implementation of a mechanism that some organizations have used for years to improve performance for applications that access certain of their DB2 tables.

Before expanding on that statement, I want to draw a distinction between DB2-managed data archiving, introduced with DB2 11 for z/OS, and system-time temporal data support, which debuted with DB2 10. They are NOT the same thing (in fact, temporal support, in the form of system time and/or business time, and DB2-managed data archiving are mutually exclusive -- you can't use one with the other). A table for which system time has been enabled has an associated history table, while a table for which DB2-managed data archiving has been enabled has an associated archive table. Rows in a history table (when system time temporal support is in effect) are NOT CURRENT -- they are the "before" images of rows that were made NON-CURRENT through delete or update operations. In contrast, rows in an archive table will typically be there not because they are non-current, but because they are unpopular.

OK, "unpopular" is not a technical term, but it serves my purpose here and helps me to build the case for using DB2-managed data archiving. Consider a scenario in which a table is clustered by a non-continuously-ascending key. Given the nature of the clustering key, newly inserted rows will not be concentrated at the "end" of the table (as would be the case if the clustering key were continuously-ascending); rather, they will be placed here and there throughout the table (perhaps to go near rows having the same account number value, for example). Now, suppose further that the rows more recently inserted into the table are the rows most likely to be retrieved by application programs. Over time, either because data is not deleted at all from the table, or because the rate of inserts exceeds the rate of deletes, the more recently inserted rows (which I call "popular" because they are the ones most often sought by application programs) are separated by an ever-increasing number of older, "colder" (i.e., "unpopular") rows. The result? To get the same set of "popular" rows for a query's result set requires more and more DB2 GETPAGEs as time goes by, and that causes in-DB2 CPU times for transactions to climb (as I pointed out in an entry I posted several years ago to the blog I maintained while working as an independent DB2 consultant). The growing numbers of "old and cold" rows in the table, besides pushing "popular" rows further from each other, also cause utilities to consume more CPU and clock time when executed for the associated table space.

As I suggested earlier, some organizations faced with this scenario came up with a mitigating work-around: they created an archive table for the problematic base table, and moved "old and cold" (but still current, and occasionally retrieved) rows from the base to the archive table (and continued that movement as newer rows eventually became unpopular due to age). They also modified code for transactions that needed to retrieve even unpopular rows, so that the programs would issue SELECTs against both the base and archive tables, and merge the result sets with UNION ALL. This archiving technique did serve to make programs accessing only popular rows more CPU-efficient (because those rows were concentrated in the newly-lean base table), but it introduced hassles for both DBAs and developers, and those hassles kept the solution from being more widely implemented.
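Under that do-it-yourself scheme, a program that needed both popular and unpopular rows had to merge the two tables itself, with a query along these lines (ACCT_ACTIVITY and ACCT_ACTIVITY_ARCH, and the column names, are made-up names used purely for illustration):

```sql
-- Hand-rolled archiving, pre-DB2 11: the application queries both
-- tables itself and merges the result sets with UNION ALL.
SELECT ACCT_NO, TXN_DATE, TXN_AMT
  FROM ACCT_ACTIVITY           -- base table: recent, "popular" rows
 WHERE ACCT_NO = ?
UNION ALL
SELECT ACCT_NO, TXN_DATE, TXN_AMT
  FROM ACCT_ACTIVITY_ARCH     -- archive table: "old and cold" rows
 WHERE ACCT_NO = ?
ORDER BY TXN_DATE DESC;
```

Every program that might need archived rows had to carry this two-table logic, and the "mover" job had to both insert into the archive table and delete from the base table -- exactly the hassles that kept the technique from spreading.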

Enter DB2 11 for z/OS, and the path to this performance-enhancing archive set-up got much smoother. Now, it's this easy:
  1. A DBA creates an archive table that will be associated with a certain base table. Creation of the archive table could be through a CREATE TABLE xxx LIKE yyy statement, but in any case the archive table needs to have the same column layout as the base table.
  2. The DBA alters the base table to enable DB2-managed archiving, using the archive table mentioned in step 1, above. This is done via the new (with DB2 11) ENABLE ARCHIVE USE archive-table-name option of the ALTER TABLE statement.
  3. To move "old and cold" rows from the base table to the archive table requires only that the rows be deleted from the base table, thanks to a built-in global variable, provided by DB2 11, called SYSIBMADM.MOVE_TO_ARCHIVE. When a program sets the value of this global variable to 'Y' and subsequently deletes a row from an archive-enabled base table, that row will be moved from the base table to its associated archive table. In other words, the "mover" program just has to delete to-be-moved rows from the base table -- it doesn't have to insert a copy of each deleted row into the archive table, because DB2 takes care of that when, as mentioned, the global variable SYSIBMADM.MOVE_TO_ARCHIVE is set to 'Y'. If you want the "mover" program to be able to insert rows into the base table and update existing rows in the base table, as well as delete base table rows (which then get moved by DB2 to the archive table), have that program set SYSIBMADM.MOVE_TO_ARCHIVE to 'E' instead of 'Y'. And note that the value of SYSIBMADM.MOVE_TO_ARCHIVE -- or of any DB2 global variable, for that matter -- has effect for a given thread (i.e., a given session). Some people take the word "global" in "global variable" the wrong way, thinking that it is global in scope, like a ZPARM parameter. Nope. "Global" here means that a global variable is globally available within a DB2 subsystem (i.e., any program can use a given built-in or user-created global variable); it affects only the session in which it is set.
  4. If a program is ALWAYS to access ONLY data in an archive-enabled base table, and not data in the associated archive table, its package should be bound with the new ARCHIVESENSITIVE bind option set to NO. If a program will always or sometimes access data in both an archive-enabled base table and its associated archive table, its package should be bound with ARCHIVESENSITIVE set to YES. For a program bound with ARCHIVESENSITIVE(YES), the built-in global variable SYSIBMADM.GET_ARCHIVE provides a handy behavior-controlling "switch." Suppose that a bank has a DB2 for z/OS table in which the account activity of the bank's customers is recorded. When a customer logs in to the bank's Web site, a program retrieves and displays for the customer the last three months of activity for his or her account(s). Let's assume that more than 9 times out of 10, a customer does not request additional account activity history data, so it could make good sense to archive-enable the account activity table and have activity data older than three months moved to an associated archive table. An account activity data retrieval program could then be bound with ARCHIVESENSITIVE(YES). When a customer logs in to the bank's Web site, the program sets the SYSIBMADM.GET_ARCHIVE global variable to 'N', and a SELECT is issued to retrieve account activity data from the base table. When the occasional customer actually requests information on account activity beyond the past three months (less than 10% of the time, in this example scenario), the same account activity data retrieval program could set SYSIBMADM.GET_ARCHIVE to 'Y' and issue the same SELECT statement against the account activity base table. Even though the base table contains only the past three months of account activity data, because the program set SYSIBMADM.GET_ARCHIVE to 'Y' DB2 will take care of driving the SELECT against the archive table, too, and combining the results of the queries of the two tables with a UNION ALL.
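Pulled together, the four steps above might look something like the following sketch (ACCT_ACTIVITY, ACCT_ACTIVITY_ARCH, and the column names are hypothetical -- check the DB2 11 documentation for the exact syntax applicable in your environment):

```sql
-- Steps 1 and 2 (DBA): create the archive table and associate it
-- with the base table.
CREATE TABLE ACCT_ACTIVITY_ARCH LIKE ACCT_ACTIVITY;

ALTER TABLE ACCT_ACTIVITY
  ENABLE ARCHIVE USE ACCT_ACTIVITY_ARCH;

-- Step 3 ("mover" program): with the global variable set to 'Y',
-- a DELETE from the base table becomes a move to the archive table.
SET SYSIBMADM.MOVE_TO_ARCHIVE = 'Y';
DELETE FROM ACCT_ACTIVITY
 WHERE TXN_DATE < CURRENT DATE - 3 MONTHS;

-- Step 4 (program bound with ARCHIVESENSITIVE(YES)): this switch
-- determines whether the SELECT below is driven against the base
-- table only ('N') or against base and archive via UNION ALL ('Y').
SET SYSIBMADM.GET_ARCHIVE = 'Y';
SELECT ACCT_NO, TXN_DATE, TXN_AMT
  FROM ACCT_ACTIVITY
 WHERE ACCT_NO = ?;
```

Note that the SELECT in step 4 references only the base table -- the program's SQL does not change at all depending on whether archived rows are wanted; only the value of SYSIBMADM.GET_ARCHIVE does.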
And that's about the size of it. No great mystery here. This is, as I stated up front, all about making it easier -- for DBAs and for application developers -- to enhance CPU efficiency for retrieval of oft-requested rows from a table, when those "popular" rows are those that have been more recently added to the target table. You could have done this on your own, and a number of organizations did, but now DB2 11 gives you a nice assist. I hope that you will consider how DB2-managed data archiving could be used to advantage in your environment.

12 comments:

  1. Hello Robert, Thanks for sharing the details.

    Is there any way I can get the list of the top 20 or 30 largest tables by using an SQL query in DB2 for z/OS?

    ReplyDelete
  2. Sorry about the delayed response.

    Largest by what measure? Number of rows? Physical size of table?

    Robert

    ReplyDelete
  3. Can I initially populate the archive table via DB2 LOAD?

    ReplyDelete
    Replies
    1. Sorry about the delayed response, Jim.

      Yes, you can use LOAD for a table space containing an archive table. Here is an interesting item from the DB2 11 for z/OS Utility Guide and Reference, in the section on LOAD: "LOAD REPLACE is not allowed on an archive-enabled table. (LOAD REPLACE is allowed on the table space that contains the archive table.)" So, no LOAD REPLACE for an archive-enabled table, but LOAD REPLACE is OK for the archive table that is associated with an archive-enabled table.

      Robert

      Delete
  4. Robert, is this feature available on DB2 V11 CM mode or NFM?

    Thanks!

    Jean Fang from Visa.

    ReplyDelete
    Replies
    1. New-function mode, Jean.

      As a general rule, when use of a new DB2 for z/OS feature involves using new SQL syntax (such as ALTER TABLE... ENABLE ARCHIVE), that feature will only be available when the DB2 system is running in new-function mode.

      Robert

      Delete
  5. Can you run REORG DISCARD on an archive enabled table?
    Thanks
    Henrik

    ReplyDelete
    Replies
    1. You should be able to do that, Henrik, and that would be an OK thing to do IF YOUR INTENTION IS TO REMOVE THE DISCARDED ROWS from the one "logical" table that is the combination of an archive-enabled table and its associated archive table. If what you want to do is remove the discarded rows from the archive-enabled table and place them in the associated archive table, REORG with DISCARD would not do this the way a DELETE from the archive-enabled table would, if the SYSIBMADM.MOVE_TO_ARCHIVE global variable were set to 'Y'. In that case, the deleted rows are not removed from the "one logical table" - they are simply relocated from the archive-enabled table to the archive table. If you want to do that (relocate rows) using REORG DISCARD (perhaps for efficiency reasons, if we are talking about a really large number of rows), I'd think that you would want to do that within a window in which you have blocked user and application access to the table. If you have such a window, you can do the REORG of the archive-enabled table's table space with DISCARD, and then load the discarded rows into the archive table during the window. If you follow that approach, the rows would be in the "one logical table" prior to entering the window, and they'd be there (having been relocated) on exiting the window. If you go the REORG DISCARD + LOAD approach while preserving user and application access to the table (which would require, on the archive table side, either INSERT or LOAD with SHRLEVEL CHANGE), there would be a period during which the DISCARDed rows are missing from the "one logical table" (because it would be a 2-step operation). That, in turn, could cause problems for users and/or applications accessing the "one logical table."

      Hope this helps.

      Robert

      Delete
    2. Thanks, Robert, for a very detailed answer. The reason for my question was exactly to do an efficient relocation of a huge number of rows from the archive-enabled table to the archive table.

      Delete
    3. Robert -

      I have an application where we do quarterly REORG w/discards in a small window that involves 10's of millions of rows per table. History tables exist. Unloads are done on the active tables before the REORGs and the unloaded data is LOAD RESUMEd into the history tables. In reading your response from 1/18/2016, it sounds like an archive enabled table with a REORG w/discard run against it will not automatically move the discarded rows over to the archive table. Is that correct? I was hoping the transparent archive process would allow me to eliminate the UNLOAD/LOAD RESUME process I am currently doing and just do the REORG w/discard process with transparent archive.

      Thank you.

      Delete
  6. Hi Robert, Does the archive table have to be in the same schema or can it have a different schema?

    ReplyDelete
    Replies
    1. Sorry about the delayed response.

      No, the schema name (the qualifier of the table name) for an archive table does not have to be the same as that of the associated base table. That said, I'd think that making the schema name the same would be helpful - it would be an indicator that the tables are related to each other in some way - but it is not a technical requirement.

      Robert

      Delete