Friday, November 29, 2013

A Great Analytics on System z Use Case: The Operational Data Store

A few months ago, I posted an entry to this blog on the subject of "moving queries to data," the idea being that when data to be analyzed for decision support purposes originates in a DB2 for z/OS database (as a great deal of the world's corporate and governmental data does), a very good case can be made for analyzing that data on its platform of origin -- System z -- as opposed to copying and moving the data to a different platform for querying and reporting. Quite recently I've had some communications with various individuals on this very topic, and these exchanges have led to my highlighting a particularly good use case for DB2 and System z as a data server for business intelligence applications: the operational data store, or ODS. The ODS is not a new concept, but I've found that it's not universally understood and leveraged. In this blog entry, I'll provide some information that I hope will clarify what an ODS is, for those not very familiar with the idea, and show why DB2 for z/OS can be an outstanding environment for an ODS, particularly when the operational data in question "lives" in a DB2 for z/OS system to begin with.

First, I think that it can be useful to contrast an ODS with a data warehouse. Keep in mind that having an ODS versus a data warehouse is not an either/or proposition -- an organization can make effective use of both. Speaking, then, in pretty broad-brush terms, here are some of the things that tend to differentiate these two types of analytics data stores (and when I use the term "operational database" hereafter, I'm referring to a source database that is accessed by operational, "run the business" online and/or batch applications):
  • Whereas the design of a data warehouse database will usually differ from the design of the operational database(s) from which the data warehouse data is sourced (for example, the data warehouse might feature so-called fact and dimension tables arranged in what's known as a star schema, to facilitate certain types of online analytical processing), an ODS is very often an exact duplicate of an operational database (or at least a duplicate of a subset of the tables in the operational database) -- "duplicate" referring to table and column names, and to table structure (i.e., column order and column data types); a simple sketch of this duplication follows the list. In some cases an ODS will differ from the associated operational database in terms of data retention (a company might decide to keep data for X amount of time in the operational database, and for a longer period of time in an ODS, for historical reporting purposes -- an arrangement that can work especially well if the older data is rarely, if ever, updated).
  • Whereas a data warehouse is often sourced from several databases (which may be housed on different server platforms), an ODS commonly has one associated source database.
  • Whereas data in a data warehouse is sometimes aggregated in some way, data in an ODS is typically "atomic-level" with regard to detail.
  • Whereas there is often some tolerance for delay with respect to source data updates being reflected in a data warehouse (data extract/transform/load, aka ETL, processes that run on a nightly basis are not unusual for a data warehouse), users of an ODS often expect and may demand extremely low latency in this regard -- the requirement could even be for near-real-time ODS updates with respect to the updating of corresponding records in the source operational database.
  • Whereas data in a data warehouse is often transformed in some way, to accomplish things such as making data fields (which might be code values in an operational source table) more user-friendly, data is typically NOT transformed when added to an ODS.
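To make the "duplicate" notion from the first item above concrete, here is a minimal sketch in DB2 SQL (the PROD and ODS schema names and the ORDERS table are invented for the example):

    -- Create an ODS copy of an operational table, preserving column
    -- names, column order, and column data types (hypothetical names)
    CREATE TABLE ODS.ORDERS LIKE PROD.ORDERS;

Note, too, that a table created this way can carry out the longer-retention idea mentioned above: rows aged out of the operational table can simply be allowed to remain in the ODS table.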

Given these differences, in the case of an ODS you will regularly find that instead of ETL you simply have EL (extract/load, from the source database to the ODS), and that EL process may involve near-real-time replication. That being the case, when the source operational database is managed by DB2 for z/OS, it can make all kinds of sense to use DB2 for z/OS for the ODS as well. Different organizations do exactly that, in different ways. A company might have an ODS in the same DB2 for z/OS subsystem that is used for the operational database, with the ODS tables being in a different schema (that is to say, the tables in the ODS would have different high-level qualifiers than the tables in the operational database). More commonly (in my experience), the ODS will be set up in a different DB2 subsystem. The DB2 subsystem used for the ODS could be in the same z/OS LPAR as the DB2 subsystem used for the operational database, but I more frequently find that it's in a different z/OS LPAR.
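As a simple illustration of "EL without the T," an organization might periodically copy newly added rows to the ODS with nothing more than INSERT/SELECT logic. The sketch below assumes invented table names and an ORDER_TS timestamp column used for change tracking; a true near-real-time arrangement would more likely rely on a replication tool that captures changes from the DB2 log:

    -- Hypothetical periodic EL step: copy rows added to the source
    -- table since the last extract, with no transformation of the data
    -- (:LAST_EXTRACT_TS is a host variable holding the time of the
    -- previous extract)
    INSERT INTO ODS.ORDERS
      SELECT * FROM PROD.ORDERS
      WHERE ORDER_TS > :LAST_EXTRACT_TS;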

When an ODS sourced from a DB2 for z/OS database is itself managed by DB2 for z/OS, very low-latency replication of source data changes to the ODS is facilitated, as is management of the ODS environment itself (the DBA team that takes care of the DB2 for z/OS operational database will generally have a pretty easy time managing an ODS database that essentially mirrors the source database). This ease of management extends to matters related to data security -- the same DB2 for z/OS and RACF (or equivalent) security controls used to lock down data in the operational database can be leveraged to protect ODS data from unauthorized access. Another factor that makes DB2 for z/OS a good fit for an ODS is the nature of the typical ODS workload: some longer-running, complex and/or data-intensive queries, yes, but also a relatively high volume of quick-running, in-and-out "transactional queries" -- users wanting to see, immediately, the information pertaining to this particular customer, or that particular transaction, in the course of their data analytics work. DB2 for z/OS and System z have long excelled at handling such mixed workloads, while some other platforms used for analytics work can bog down in processing a large number of concurrently executing queries. If you want to make your DB2 for z/OS system even more highly capable with regard to "operational analytics" workloads, take a look at the DB2 Analytics Accelerator for z/OS -- technology that can preserve excellent performance for high-volume, transactional queries while dramatically speeding up the execution of more complex and data-intensive queries.
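To picture that mixed workload, consider two hypothetical queries against the same ODS (table and column names are, again, invented for illustration):

    -- A quick, in-and-out "transactional query": one customer's
    -- information, retrieved via an index on the customer number
    SELECT CUST_NAME, CUST_STATUS, LAST_ORDER_DATE
      FROM ODS.CUSTOMER
      WHERE CUST_NO = :CUST_NO;

    -- A longer-running, data-intensive query that might scan and
    -- aggregate millions of rows (a good candidate for execution via
    -- the DB2 Analytics Accelerator)
    SELECT CUST_REGION, SUM(ORDER_AMT)
      FROM ODS.ORDERS
      GROUP BY CUST_REGION;

A system serving an ODS needs to deliver consistently fast results for a large number of queries of the first type, even while queries of the second type are chewing through lots of data.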

Here's another thought: when the idea of minimizing inter-platform data movement is taken to its extreme, you don't move the data at all -- you allow analytics users to query the actual operational tables in the production DB2 for z/OS database. This approach, while not commonplace, is used successfully in some cases. It is certainly technically feasible, and more feasible on the z/OS platform than on others, thanks to the highly advanced workload management capabilities of z/OS. One interesting option in this area is available to organizations that run DB2 for z/OS in data sharing mode on a Parallel Sysplex: you route the analytics queries through a subset of the data sharing group members, and route the operational transactions and batch jobs through the other members of the group. All the SQL statements hit the same DB2 tables, but because the analytics and operational SQL statements run in different DB2 subsystems (which can be in different z/OS LPARs on different System z servers in the Sysplex), the analytics and operational workloads don't compete with each other for server memory or processing resources. I have some personal experience with such a set-up, and I wrote a few blog entries that provide related information: a part 1 and part 2 entry on running OLTP and business intelligence workloads on the same DB2 for z/OS system (written while I was working as an independent DB2 consultant), and an entry that describes technology that can be used to limit different workloads to different subsets of the members of a DB2 data sharing group.
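One way to implement that kind of member subsetting, as of DB2 10 for z/OS, is through a location alias that maps to a subset of a data sharing group's members. The commands below are a sketch; the alias name (DB2BI) and port number are invented for the example, and the commands would be issued on each member that is to serve the analytics workload:

    -MODIFY DDF ALIAS(DB2BI) ADD
    -MODIFY DDF ALIAS(DB2BI) PORT(5047)
    -MODIFY DDF ALIAS(DB2BI) START

Remote analytics requesters would then connect to the DB2BI alias, rather than to the data sharing group's location name, and their queries would be processed only by the members on which the alias was started.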

Does your organization use DB2 for z/OS to manage and protect your most valuable data: the data generated by your run-the-business operational applications -- the data that you own? If so, do you have a DB2 for z/OS-based ODS that provides users with secure, high-performance access to a current version of that data with atomic-level detail? If you don't have such an ODS, consider how a data store of this nature, managed by DB2 for z/OS -- the same DBMS in which your "gold" data is housed -- could provide users throughout your business with a resource that would enhance decision-making effectiveness and improve outcomes. Lots of organizations have taken this path. It may be one that your company should take, as well.

Tuesday, November 26, 2013

DB2 for z/OS Work: the Task's the Thing

To understand how DB2 work is handled in a z/OS system, you need to have some understanding of the tasks used to manage that work. Here, "tasks" doesn't refer to things done or to be done. It refers instead to the control blocks used to represent, for dispatching purposes, application and system processes in a z/OS LPAR. I have found that several misconceptions related to DB2 for z/OS workload processing are rooted in misunderstandings of the tasks behind the workload. In particular, I've seen situations in which folks have the wrong idea about DB2 DDF address space CPU consumption, about batch- versus online-issued SQL statements, and about zIIP eligibility of DB2 work, and in each case, clarity came from an understanding of the tasks related to the DB2 activity in question. In this blog entry, I'll try to provide you with some information that I hope will be helpful to you in understanding why DB2 for z/OS work is processed as it is.

DB2 DDF address space CPU consumption

With respect to control blocks used to manage DB2 work in a z/OS system, there are at least a couple of relevant dichotomies. First, you have "system" tasks and "user" tasks. The system tasks are associated with DB2 address spaces, and they generally have to do with work of a housekeeping variety (e.g., writes of changed data and index pages to disk), as well as things pertaining to the SQL statement execution environment (e.g., thread creation and termination). Where things can get a little confusing is in the area of DB2 DDF CPU utilization (DDF being the distributed data facility -- the DB2 address space through which SQL statements from network-attached application servers and workstations flow, and through which results flow in the other direction). The confusion to which I refer stems from the fact that the DDF address space has associated with it both system tasks and user tasks, as opposed to having only the former. For SQL statements issued by "local" applications (those that attach directly to a DB2 subsystem in the same z/OS LPAR), the user tasks -- those under which SQL statements execute and to which the large majority of CPU time consumed in SQL statement execution is charged -- belong to "allied" address spaces (i.e., the address spaces in which the SQL-issuing programs run). So, for example, the user task under which a SQL statement issued by a local CICS transaction program executes is the CICS subtask TCB associated with that transaction, and the user task under which a SQL statement issued by a local batch job executes is the TCB of the batch address space (more on TCBs in a moment). Compared to the user tasks of allied address spaces connected to a local DB2 subsystem, the system tasks of the DB2 IRLM (lock manager), MSTR (system services), and DBM1 (database services) address spaces consume relatively little CPU time.

In the case of the DDF address space, you have to keep in mind that the programs issuing the SQL statements aren't running in the z/OS LPAR with DB2, and yet they must be represented locally in the z/OS LPAR so that they can be properly prioritized and dispatched by the operating system. The local representation of a remote DB2-accessing program is a so-called preemptable SRB in the DDF address space (more on SRBs to come), and because that task is the local representation of the remote program, most of the CPU time associated with execution of SQL statements issued by that program will be charged to the DDF preemptable SRB. That is why DDF CPU consumption will be significant if there are a lot of SQL statements flowing to DB2 through the DDF address space -- it's analogous to a CICS region consuming a lot of CPU time if a lot of SQL statements are sent to DB2 by transactions running in that CICS region. The DDF user tasks are charged with a lot of the CPU time related to SQL statement execution, while the DDF system tasks consume only a small amount of CPU time. You can find more information about DDF CPU consumption in an entry I posted to this blog last year.
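Pulling the task picture together, here is the situation in summary form:

    Address space              Tasks                  Typical CPU consumption
    MSTR (system services)     system tasks           relatively small
    DBM1 (database services)   system tasks           relatively small
    IRLM (lock manager)        system tasks           relatively small
    DIST (DDF)                 system tasks, plus     can be large; rises and
                               preemptable SRBs       falls with the volume of
                               representing remote    DDF-routed SQL statements
                               programs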

Batch- versus online-issued SQL statements

Not long ago, I got a note from a software analyst who was concerned that SQL statements issued by batch programs would get in the way of SQL statements issued by concurrently executing CICS transactions. He was specifically worried about "inefficient" batch-issued SQL statements (which could be thought of as long-running SQL statements, though long-running SQL statements are not necessarily inefficient in any way) getting in line for CPU cycles ahead of CICS-issued SQL statements. The main issue in this person's mind was the dispatching priority of the DB2 address spaces: a priority that was somewhat higher (as recommended) than that of the CICS regions in his system. If DB2 has a really high priority in the z/OS LPAR, won't long-running, batch-issued SQL statements negatively impact the performance of CICS transactions?

The answer to that question (unless the site's workload manager policy is unusual) is, "No" (or at least, "No, CICS transactions will not be behind these long-running, batch-issued SQL statements in the line for CPU cycles"). Again, the task's the thing. A SQL statement (long-running or otherwise) issued by a batch job runs under that job's task (though it executes in the DB2 database services address space, as do all SQL statements), and it therefore has the priority of that task. A SQL statement issued by a CICS transaction runs under that transaction's task in the associated CICS region, and so it executes with the priority of the transaction's task. Assuming that batch jobs in your environment have a lower priority than CICS transactions, SQL statements issued by batch jobs will have a lower priority relative to SQL statements issued by CICS transactions.

The priority of DB2 address spaces (which should be really high) does NOT impact the priority of SQL statements that access DB2 data. Why, then, is it important for the DB2 address spaces to have a high priority in the system? Because a LITTLE BIT of the work related to SQL statement execution is done under DB2 tasks (examples include data set open and close, and lock management), and if this work doesn't get done RIGHT AWAY as needed, the whole DB2-accessing workload can get seriously gummed up. That's why giving the DB2 address spaces a higher priority (doesn't have to be way higher) than DB2-connected CICS regions is good for CICS-DB2 throughput: it enables DB2 to take care, in a very timely manner, of the little bit of work done under DB2 tasks, so that the bulk of SQL statement processing (which, again, happens at the priority of the SQL statement-issuing programs) won't get bogged down waiting for locks to be released or threads to be created or whatever. More information on DB2 and CICS address space priority recommendations can be found in a blog entry on the topic that I posted last year.
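To make that concrete, here is a hypothetical arrangement of WLM service classes consistent with the recommendations above. The service class names, importance levels, and velocity goals are invented for illustration; what matters is the relative ordering:

    Service class   Assigned work                Example goal
    SYSSTC          IRLM                         (system-provided service
                                                 class; very high priority)
    DB2HIGH         DB2 MSTR, DBM1, DIST         importance 1, velocity 85
    CICSPROD        CICS regions/transactions    importance 2, response
                                                 time goal
    BATCHMED        Batch jobs                   importance 4, velocity 40

With goals along these lines, a batch-issued SQL statement runs at the batch job's priority and a CICS-issued SQL statement runs at the transaction's priority, even though both execute in the DB2 database services address space.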

zIIP eligibility of DB2 work

I mentioned in the second paragraph of this entry that there are a couple of dichotomies with regard to the control blocks that are used for tracking and dispatching work in a z/OS system. The first of these -- system tasks and user tasks -- I covered already. The second dichotomy I have in mind is TCBs and SRBs, or, more formally, task control blocks and service request blocks. For many years, people associated TCBs with user tasks, and SRBs with system tasks. That thinking wasn't far off the mark until the 1990s, when, needing a mechanism to manage work such as that processed through the DB2 DDF address space, IBM z/OS developers delivered a new type of SRB -- an SRB that, as said by a then-DB2 DDF developer, "acts like a TCB." This was the enclave SRB, and in particular, the preemptable SRB.

It's important to keep in mind that SQL statements that get to DB2 via DDF execute under preemptable SRBs. Here's why that's important: work that runs under such tasks is zIIP eligible (zIIPs are System z Integrated Information Processors -- the "specialty engines" that provide very economical computing capacity for certain types of work). In the case of DDF-routed SQL, the zIIP offload percentage tends to be about 60% (queries parallelized by DB2 also run under preemptable SRBs, and so are zIIP eligible, as I pointed out in a blog entry I posted back in 2010).

In my experience, people are most likely to get confused about zIIP eligibility of DB2 work when they are thinking about native SQL stored procedures. First, they may wonder why a native SQL procedure is zIIP-eligible when called by a DRDA requester (i.e., when invoked via a CALL that goes through the DDF address space), but not zIIP-eligible when called by a local-to-DB2 program (such as a CICS transaction). People can also be a little unclear on the reason why a native SQL procedure called by a DRDA requester is zIIP-eligible, while an external stored procedure (such as one written in COBOL or C) called by a DRDA requester is not. To get things straight in both of these cases, remember (again) that the task's the thing. A native SQL procedure runs under the task of the application process through which it was invoked, while an external stored procedure runs under a TCB in a WLM-managed stored procedure address space. Thus, a native SQL procedure, when called by a DRDA requester, will run under a preemptable SRB in the DDF address space (as will any SQL statement issued by a DRDA client program), and that is why the native SQL procedure will be zIIP-eligible in that situation. When the same native SQL procedure is called (for example) by a CICS transaction program, it will run under that program's task. Because that task is a TCB (in the associated CICS region), the native SQL procedure will not be zIIP-eligible when so called. Similarly, even if an external stored procedure is called by a DRDA requester, it will run under a TCB (in a WLM-managed stored procedure address space) and so will not be zIIP-eligible.
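The DDL below illustrates the distinction. The procedure names, parameters, and WLM environment name are invented for the example; the key difference is that the first CREATE PROCEDURE statement contains the procedure's logic within the statement itself (making it a native SQL procedure), while the second points to an external load module:

    -- Native SQL procedure: runs under the task of the calling process,
    -- so it is zIIP-eligible when called through DDF (names hypothetical)
    CREATE PROCEDURE PRODSCHM.UPDATE_BALANCE
      (IN P_CUST_NO INTEGER, IN P_AMT DECIMAL(11,2))
      LANGUAGE SQL
      BEGIN
        UPDATE PROD.ACCOUNT
          SET BALANCE = BALANCE + P_AMT
          WHERE CUST_NO = P_CUST_NO;
      END

    -- External stored procedure: always runs under a TCB in a
    -- WLM-managed stored procedure address space, so it is not
    -- zIIP-eligible, even when called by a DRDA requester
    CREATE PROCEDURE PRODSCHM.UPDATE_BALANCE_EXT
      (IN P_CUST_NO INTEGER, IN P_AMT DECIMAL(11,2))
      LANGUAGE COBOL
      EXTERNAL NAME UPDBALX
      PARAMETER STYLE GENERAL
      WLM ENVIRONMENT WLMENV1;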

If you'll keep in mind the type of task -- user or system, TCB or SRB -- involved in getting various types of DB2 work done, you'll have a much better understanding of how that work is managed and prioritized, and whether or not it is zIIP-eligible. I hope that the information in this blog entry will be useful to you in that regard.