Understanding Concurrent Copy for Success
By James Ratliff, Dino-Software (10-minute read)
Introduction to Concurrent Copy –
Concurrent Copy is the IBM solution that allows data to be copied or dumped while it is still active. To utilize Concurrent Copy (CC), you must have the feature activated on the Control Unit that contains your data.
For a full explanation and operational guides, please refer to the various IBM publications pertaining to Concurrent Copy and its use.
Concurrent Copy may be used for Backup, Copy or Dump operations from either DFSMShsm (DFHSM), DFSMSdss (DFDSS), or both. Data may be identified by individual dataset name or volume serial number. In either case, once the data is identified, Concurrent Copy will process the data by protecting and copying tracks. This means if a dataset has a single 80-byte record on a given track, the entire track is identified and processed. There is no record level processing in Concurrent Copy.
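As a quick illustration of the two forms (the data set names, volume serial, and DD names below are hypothetical), a DFSMSdss DUMP can identify the data logically by data set name or physically by volume, requesting Concurrent Copy with the CONCURRENT keyword in either case:

    DUMP DATASET(INCLUDE(PROD.APP1.**)) OUTDDNAME(OUT) CONCURRENT
    DUMP FULL INDDNAME(DASD1) OUTDDNAME(OUT) CONCURRENT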
There are many references in the IBM DFSMS publications identifying how to use CC; therefore, this paper gives an overview of the resources being utilized so the user gains a better understanding of why CC is more complex than often realized.
After reading and understanding this information, you should be able to define your Concurrent Copy data and initiate Concurrent Copy with a higher success rate than you may have obtained in the past.
First, a WARNING and quick overview of why CC can be a bit tricky to use.
CC utilizes a very large pool of resources during operation and, if allowed, can consume enough storage that your system may start failing other operations or slow down to the point of catastrophic failure. Although this may sound extremely harsh at first, the function was designed to allow a Point-In-Time copy of your data while allowing the data to be updated. In other words, this was the first function that allowed data to be backed up, copied or dumped without serializing the data throughout the entire operation, and it ensured the integrity of your data.
Remember: if a failure occurs, there may be no other way to capture or restore data that changed between the initial CC request and the time of failure. Since the operation may cover large amounts of data and take a long time to reach physical completion, the initial design was to protect the data (since it can be changing) above all other considerations.
Resource Considerations –
If you are using DFSMSdss Backup, Copy or Dump to process your CC tasks, the following resources come into play:
SETUP:
Note: If DFHSM operations are using CC, DFHSM will perform the necessary processing to invoke DFDSS using the cross-memory address space ADRXMAIA, as the sketch below illustrates.
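By way of illustration, one common way a DFHSM Concurrent Copy request originates is a data set backup command carrying the CC keyword, sent from a TSO batch step. The data set name below is hypothetical, and the exact CC options available depend on your DFSMShsm release, so treat this as an outline rather than a definitive example:

    //HSMBACK  EXEC PGM=IKJEFT01
    //* TSO batch step; DFHSM drives DFDSS through ADRXMAIA
    //SYSTSPRT DD  SYSOUT=*
    //SYSTSIN  DD  *
      HSENDCMD BACKDS 'PROD.PAYROLL.MASTER' CC(PREFERRED)
    /*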
If your CC operation is started from JCL using PGM=ADRDSSU, a single address space is created. This address space will contain the data buffers that will be used to move your data from the System Data Mover (SDM) to your output device.
If ADRXMAIA was specified on your PGM= statement, or your CC operation is being processed by DFHSM requests, two or more address spaces are created to handle the request. ADRXMAIA will be started, and an additional address space will be started to act as a server. The server address space will interact with SDM.
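For example (the job statement, data set names, and output DD below are hypothetical, and your installation standards will differ), a minimal batch invocation requesting Concurrent Copy through the single address space form might look like the following sketch; substituting PGM=ADRXMAIA on the EXEC statement would route the same request through the cross-memory server form:

    //CCDUMP   JOB (ACCT),'CC DUMP',CLASS=A
    //STEP1    EXEC PGM=ADRDSSU
    //SYSPRINT DD  SYSOUT=*
    //OUT      DD  DSN=BACKUP.PAYROLL.DUMP,DISP=(NEW,CATLG),UNIT=TAPE
    //SYSIN    DD  *
      DUMP DATASET(INCLUDE(PROD.PAYROLL.**)) -
           OUTDDNAME(OUT) -
           CONCURRENT
    /*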
DFDSS at some point may Enqueue the data sets (depending on the parameters in the JCL or the HSM options).
DFDSS performs Catalog Locates for the data.
DFDSS identifies the tracks that are being used to contain the data. Please note, CC operations are at the track level, not the record level. This means that if any portion of the data resides on a track, the entire track will be processed at the hardware (Control Unit / Disk).
DFDSS then sends the list of tracks to IBM's System Data Mover (SDM, which runs as the program ANTMAIN in its own address space, started during IPL).
SDM registers the intent to use CC with each Control Unit and identifies which tracks are to be monitored for updates.
Now that the data has been identified and marked for ‘protection’, DFDSS can release any Enqueues (if they were held).
DATA MOVEMENT:
At this point, several things occur.
Applications using the data may continue and update as needed.
DFDSS will request tracks from SDM.
Any unread track that is updated by an application before SDM reads it from the CU will be moved into the CU cache before the changes are written to the DASD.
SDM reads the tracks from the DASD and any available in the CU cache. Tracks read directly from the DASD will be read into the DFDSS buffer. Tracks read from the CU cache will be read into an SDM dataspace and held there until requested by DFDSS.
The data movement process will continue until all the data has been processed or the session is terminated.
CLEANUP:
Once DFDSS processes all the data needed, SDM is notified the operation has completed.
SDM will release the session on all the CUs.
CUs will unmark any leftover tracks and clear cache.
SDM will release control blocks and clear any remaining track data from its storage buffers.
DFDSS will release any storage it had obtained.
DFDSS will end. If it was invoked via JCL, the DFDSS address space or spaces will be terminated. If DFHSM invoked DFDSS, the address spaces will not be terminated at this point.
Considerations for success –
- Each DSS to SDM CC operation is considered a ‘session.’ All data being copied by the operation is managed as the same ‘session.’
- The CU has specific limits on time and on the number of tracks that may be held in the cache. If these limits are exceeded, the CU will terminate sessions. Typically, the largest session will be terminated first, but this is not guaranteed. There could be many CUs involved in a single Concurrent Copy operation. If any of the CUs are too heavily loaded and restrict the operation, the entire Concurrent Copy operation could be terminated and result in failure.
- SDM utilizes dataspaces to store 'changed' data. This means for every track that gets updated, a 64K (65,536-byte) track area is used to hold the data until it is requested by DFDSS. There is no limit to the number of dataspaces that may be obtained. Each dataspace used to hold track data is 2 gigabytes (2,147,483,648 bytes), which allows a maximum of 2,147,483,648/65,536=32,768 track areas per dataspace. If you are using CC to copy large amounts of changing data, SDM will create and manage more and more dataspaces to accommodate them.
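- To illustrate with the numbers above (the workload figure is hypothetical): if applications update 100,000 distinct tracks before SDM has read them, SDM must stage roughly 100,000 x 65,536 bytes, or about 6.5 GB of track images, spread across 100,000/32,768 = 4 data dataspaces (rounded up). Heavily updated data can therefore drive SDM storage consumption up very quickly.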
- Each SDM Data dataspace has a paired, smaller Index dataspace that is used to store session-related information.
- Dataspace cleanup is performed at the end of CC for a session, but the total space for all the dataspaces may not be cleaned and released for a long time.
- If a failure occurs, be it a cancel or an operational timeout, all the data associated with the session may be (and usually is) 'lost.' This means the Point-In-Time will have been lost, and the copied data will typically not be usable since it may or may not be complete. A restart of the job will capture a different Point-In-Time than the original.
- When DFSMShsm uses CC, there are three address spaces involved: DFSMShsm, DFSMSdss and DFSMSsdm. When attempting to diagnose a failure, all three products' activity logs and dumps (if generated) are helpful in putting together a complete picture of where and why the failure occurred.
Guidelines when using CC:
If the data absolutely MUST be captured by this particular operation, DO NOT USE CC. The potential exists for the Point-In-Time to be missed through the occurrence of an error. Use a standard Backup, Copy or Dump operation that serializes the data throughout the complete job.
When identifying the data for your DSS JCL, group the data into smaller requests if possible. This will increase your chances of success and reduce the amount of rerun work, as the sketch below shows.
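As a sketch of this guideline (the volume serials and DD names are hypothetical), rather than one DUMP command that covers every volume, submit several smaller requests, each covering a subset of the volumes, so a session failure costs only that subset:

    DUMP DATASET(INCLUDE(**)) LOGINDYNAM((PRD001),(PRD002)) -
         OUTDDNAME(OUT1) CONCURRENT

    DUMP DATASET(INCLUDE(**)) LOGINDYNAM((PRD003),(PRD004)) -
         OUTDDNAME(OUT2) CONCURRENT

Running each request as its own job or step also lets you schedule them across time slots instead of loading every CU at once.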
Submit your operation outside of peak time if possible. This ensures more resources are available and will minimize failures caused by either resource shortages or cache time-outs.
Do not reset the CU while CC is running. This would remove track protection and make the CC operation fail.
Do NOT issue a Cancel for the ANTMAIN address space while CC is running. If this occurs, ALL CC operations will be terminated and fail.
Examples of things to avoid –
The following examples are just a few of the many things that will reduce your chance of success.
- Use CC to Dump ALL your volumes in a single DFDSS command.
- Although this may sound like a reasonable thing to do to save time, it is almost guaranteed to fail at some point during the operation.
- If the track is either no longer protected at the CU, or SDM cannot find the track in its buffer area (a track may be read only once from either the DASD or the CU for a given session), an RC=8, RSN=14 is returned to DSS. This may simply be due to too many tracks being processed by the CUs and the delay involved in getting data successfully moved from the cache and properly placed in the SDM dataspace(s).
- Run DFHSM space management and submit DFDSS CC jobs at the same time. Although this is possible, the likelihood that something will fail is increased.
- At first thought, each of these 'jobs' would operate independently. However, the entire workload is processed by the various CUs and competes for the CU cache. Remember that the rest of your system workload, meaning all your users and background jobs, is also using your system resources and CU cache. The higher the workload, the more likely CC is to fail due to a shortage of resources at the system level or on the CU.
- Start one or more CC jobs, and run DEFRAG on the same volume or volumes.
- Although this is possible, it is not the best idea when you consider that a single failure during Concurrent Copy could result in the loss of your Point-In-Time data. If you have prior copies, you may recover the data from some earlier point in time, but, at the very least, it will cost you time you most likely do not have.
Summary –
Concurrent Copy is a very useful function that allows you to make Point-In-Time copies of your active data quickly, using either DFSMSdss or DFSMShsm, simply by adding a single keyword. Although its use is a bit more complex than often realized, it isn't difficult to split your requests into more manageable portions or to reschedule some of them into time slots with a better chance of success.
References:
z/OS V2R2 DFSMS Advanced Copy Services – SC23-6847-02
z/OS DFSMShsm Storage Administration – SC35-0421-14
z/OS DFSMSdss Storage Administration – SC35-0423-14
Glossary:
This glossary defines technical terms and abbreviations used in this document.
backup. The process of creating a copy of a data set to ensure against accidental loss.
CC. concurrent copy
concurrent copy. A function to increase the accessibility of data by letting you make a consistent backup or copy of data concurrent with normal application program processing.
COPY command. The DFSMSdss function that performs data set, volume, and track movement.
DASD. Direct access storage device.
DEFRAG command. The DFSMSdss function that consolidates the free space on a volume to help prevent out-of-space abends on new allocations.
DFSMS. Data Facility Storage Management Subsystem.
DFSMSdss. A DFSMS functional component, the primary data mover of DFSMS, used to copy, move, dump, and restore data sets and volumes.
DFSMShsm. A DFSMS functional component used for backing up and recovering data, and managing space on volumes in the storage hierarchy.
DUMP command. The DFSMSdss function used to back up data sets, tracks, and volumes.
IBM. International Business Machines Corporation.
SDM. A functional component of Data Facility Product (DFP) that provides data movement and control for Concurrent Copy.
James Ratliff has been in the software industry for over 30 years, contributing to projects in all phases of the development lifecycle. He has extensive expertise with the IBM System Managed Storage products, specializes in the System Data Mover, and was one of the original developers of Concurrent Copy. James is currently a Senior Developer for T-Rex with Dino-Software Corporation.
About Dino-Software
Dino-Software Corporation develops enterprise-wide solutions for the management, analysis, protection, and repair of complex z/OS mainframe environments. Dino-Software has long been acknowledged for its superiority in ICF catalog management and technical support, helping organizations ensure their business-critical assets remain online and recoverable in a disaster. Learn more about Dino-Software and its z/OS mainframe storage solutions at https://dino-software.com.