SEEDIS access to data in tar files
introduction
how SEEDIS access works
how the background cache process works
ONECACHE: locate installed SEEDIS data
CACHEYY: determine current data availability
CACHEYY inner loop: determine availability of one file
CACHEYY: submit multiple jobs to cache files
BCKALL: cache BCK data from one cache request
BACKUPX: cache data from one BCK tape
BCKCLAIM: put data into SEEDIS cache
temporary changes for testing BCKCLAIM
databases and levels tested to date
introduction
For the 100-odd databases already installed in SEEDIS, automatic (but slow) access can easily be provided to the 40 GB of tarfiles in seedis.census.gov. These databases include all 1980 Census data (stf1, stf2, stf3, and stf4) and partial 1970 Census data (fourth count at STATE and SMSA71 level only, and fifth count at ZIP code and MCD level). Also included are many other Census Bureau data files (Censuses of Agriculture, County City Data Books, etc.), and non-Census data (mortality, cancer incidence, etc.). 1960 STF data are available in tarfiles but are NOT installed in SEEDIS.
how SEEDIS access works
For databases installed in SEEDIS, access proceeds as follows:
- In the AREA module, the SEEDIS user defines a geographic level and scope, which creates a file GEOCODE. in the user's working directory. For subcounty levels, files STATE.GEO and either COUNTY.GEO or COUNTY80.GEO are also created.
- In the DATA module, the SEEDIS user selects data elements from one or more databases, which creates a file DELIST.DAT in the user's working directory.
- In the DATA module, the user types EXTRACT, which appends one cache request to the file sy$cache:[cache]allcache.ar. Each cache request is a file created by the Software Tools ar program, which is analogous to UNIX tar. It contains subsets with names dbname.dat, ftype.dat, level.dat, series.dat, state.dat, recoind.dat, county.dat, county80.dat, and symbols.tmp, which are the user's files needed to process the cache request. The temporary file symbols.tmp is a VMS command file, which will be executed later to provide the necessary global symbols.
The file series.dat contains a list of three numbers: 1,4,6. This file will direct the cache process to look for data in three locations: 1=the primary series of LBNL GSS (Gettape-Stotape System) tapes; 4=the primary series of LBNL BCK (VMS BACKUP) tapes; 6=permanent disk locations, e.g. disk$seedis001 or disk$seedis004.
Change required:
The physical tapes (series 1 and 4) are no longer available, either at LBNL or at the Census Bureau. In their place are tar archives on the LBNL mass storage system (MSS), each of which contains all the files from one GSS or BCK tape. The tar archives include some tapes from the original tape series 1 and 4, some from the duplicate tapes in series 2 and 5 (GSS and BCK respectively). At Census, SEEDIS always requests data from series 1 or 4. New routines slot2tapeno.com and tapeno2tarloc are used to locate the correct tarfile.
- Then the user's process initiates a background process owned by CACHE, to put the required data files from one cache request into the temporary SEEDIS cache located at sy$cachetemp:[cache.temp.seedata...].
- Simultaneously, the user's process determines which physical files are required. It looks for those files in two locations: (a) the temporary SEEDIS cache (b) permanent SEEDIS disk locations such as disk$seedis001:[seedis.seedata...] and disk$seedis004:[seedis.seedata...].
- If the user's process finds all the required files in the temporary cache, it extracts immediately from them, appending to the file CODATA.DAT in the user's working directory. The file DELIST.DAT is deleted.
- Otherwise, if the user's process finds all the required files in the permanent disk locations but not in the cache, it extracts immediately from the permanent disk locations. Results are appended to CODATA.DAT, and the file DELIST.DAT is deleted. Even though not needed, the background process is permitted to run to completion. The next time the same data are requested, EXTRACT will use the new files in the temporary disk cache.
- If the user's process finds some of the required files in the temporary cache and the rest in permanent disk locations, it waits for completion of the background cache process and then extracts, using only the temporary cache files. The user is advised to expect a 10-30 minute delay, and may optionally put the user process into the background by typing control-Y, then "batch", then "quit". Logging on some time later, the user will find a completed CODATA.DAT file in his/her working directory. The log file from the background batch job is left in the user's working directory.
- If the user's process finds that some required files are in neither the temporary cache nor the permanent SEEDIS disk locations, it advises the user that the data are on tape, and to expect an overnight delay. In fact, the data will never arrive in the disk cache.
- At LBNL, if the background cache process finds that some required files are in neither the temporary cache nor the permanent SEEDIS disk locations, it initiates a background process in the machine csa.lbl.gov, which has two tape drives attached. For each separate tape that is required, a message is sent to the CSA operator's console asking for that tape to be mounted. The operator cannot mount the tapes because they are no longer available at LBNL.
Change required:
At LBNL, the system could be modified to stage the necessary tar files from MSS. However, since the LBNL system will be shut down in the near future, this change will not be made.
- At the Census Bureau, the background process will similarly find the required files in neither the temporary cache nor the permanent SEEDIS disk locations. The attempt to send a message to the CSA operator's console would fail because the CSA computer is not in the Census Bureau's local VMS cluster.
Change required:
At the Census Bureau, if the required files are in neither the temporary disk cache nor the permanent disk locations (series 6), the modified background process will look for a tarfile in the location 'disk':[seedis.mss.'tapeno']gss.tar (series 1) or 'disk':[seedis.mss.'tapeno']all.tar (series 4), where 'disk' is one of the four disks dka200, dka300, dka500, and dka600. All four disks are searched; exactly one of them should contain the desired tar archive.
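The tarfile search described above can be sketched as follows. This is a minimal illustration in Python, not the DCL used by the background process; the function names `candidate_tar_paths` and `locate_tar` and the `exists` predicate are hypothetical, and tape number 50072 is used only as an example.

```python
# Disks searched at Census, as listed in the text.
DISKS = ["dka200", "dka300", "dka500", "dka600"]

# Series 1 (GSS) archives are named gss.tar; series 4 (BCK) archives all.tar.
TAR_NAME = {1: "gss.tar", 4: "all.tar"}


def candidate_tar_paths(tapeno, series):
    """Return the four VMS paths where the archive for one tape may reside."""
    if series not in TAR_NAME:
        raise ValueError("only series 1 (GSS) and 4 (BCK) are held as tar archives")
    name = TAR_NAME[series]
    return [f"{disk}:[seedis.mss.{tapeno}]{name}" for disk in DISKS]


def locate_tar(tapeno, series, exists):
    """Return the single existing archive path.

    `exists` is a caller-supplied predicate (an assumption of this sketch)
    reporting whether a path is present. Exactly one disk should hold the
    archive, so any other count is treated as an error.
    """
    found = [p for p in candidate_tar_paths(tapeno, series) if exists(p)]
    if len(found) != 1:
        raise LookupError(f"expected exactly one archive, found {len(found)}")
    return found[0]
```

Treating "zero found" and "more than one found" as the same error keeps the caller from silently picking an arbitrary copy.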
how the background cache process works
- Background cache requests are initiated by the user's process in two steps: first a record is appended to the world-writeable file sy$cache:[cache]allcache.ar, as described above. Then the user's process issues the command $submit/remote 'cachewn''cacheroot']sallcache.com to submit to the default batch queue a process SALLCACHE owned by user CACHE. The global symbols 'cachewn' and 'cacheroot' are defined when the user first enters SEEDIS.
- The batch process SALLCACHE in the default batch queue executes the command $submit/keep/noprint/que='cachebq' sy$seedis:[seedis.cache]allcache.com, which submits to the queue 'cachebq' a batch process ALLCACHE. The global symbol 'cachebq', which is equal to SEED01_BACKGROUND in the Census SEEDIS computer seedis.census.gov, is defined when the user first enters SEEDIS. The special queue 'cachebq', with a job limit of 1, is used to ensure that cache requests are processed one at a time. This permits the use of generic file names that are re-used for every cache request. ONECACHE runs in the CACHE login directory, which is sy$cache:[cache]. The ALLCACHE log file is saved in sy$cache:allcache.log.
- The process ALLCACHE submits a job RECOVER6 to the queue 'cachebq', whose purpose is to restart ALLCACHE in case there is a system crash while ALLCACHE is in progress.
- Next, ALLCACHE runs sy$seedis:[seedis]netrans.com to locate disk drives that may be on other VMS computers in the same cluster as the SEEDIS computer.
Change required:
Optionally, comment out the call to netrans.com; it is not needed in the present configuration.
- Next, ALLCACHE extracts one multi-line record from the cache request file sy$cache:[cache]allcache.ar. The record which is selected for processing is the first one which is not already in progress; namely the first record (with a label 'uniq') for which a file sy$cache:'uniq'.ar does not exist. ALLCACHE creates a file 'uniq'.ar and begins to process the corresponding cache request.
- ALLCACHE invokes the command file sy$seedis:[seedis.cache]onecache.com whose job is to process one cache request. If onecache.com completes successfully, ALLCACHE deletes the temporary file sy$cache:'uniq'.ar; and deletes the queued RECOVER6 job whose purpose was to restart ALLCACHE in case of a system crash.
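The rule ALLCACHE uses to pick the next cache request (the first record whose marker file 'uniq'.ar does not yet exist) can be sketched as below. This is an illustrative Python sketch only; the function name `next_request` and the representation of marker files as a set of labels are assumptions, not part of the SEEDIS command files.

```python
def next_request(labels, in_progress):
    """Return the first request label that is not already in progress.

    labels      -- record labels ('uniq') in the order they appear in allcache.ar
    in_progress -- set of labels for which a marker file 'uniq'.ar already exists
    """
    for uniq in labels:
        if uniq not in in_progress:
            return uniq
    return None  # every queued request is already being processed
```

Because the queue 'cachebq' has a job limit of 1, only one ALLCACHE job evaluates this rule at a time, so the check-then-create sequence does not race with another instance.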
ONECACHE: locate installed SEEDIS data
- A single cache request is a request to put into the SEEDIS disk cache sy$cachetemp:[cache.temp.seedata...] the necessary files for a specified list of databases (e.g. STF1,STF3) at a single geographic level (e.g. COUNTY80), for a specified list of file types (e.g. DAT and NDX), for a specified list of geographic areas (e.g. counties 06001 and 06003). For multi-race databases such as STF2B7R, the single request obtains all the files for a specified list of races (e.g. 00,01,02). Currently, ONECACHE is directed to search for data in a specified list of data series, namely 1,4,6. 1 refers to the primary series of GSS tapes formerly at LBNL; 4 refers to the primary series of BCK (VMS BACKUP) tapes formerly at LBNL; 6 refers to data in permanent disk locations, e.g. disk$seedis001:[seedis.seedata...].
- The temporary file allcache.ar is created by the software tools ar module, which is analogous to UNIX tar. The command $ar xv 'cacheroot']allcache.ar 'p1' extracts a single cache request from the file allcache.ar.
- The cache request is itself a software tools ar file, containing subsets with names dbname.dat, ftype.dat, level.dat, series.dat, state.dat, recoind.dat, county.dat, county80.dat, and symbols.tmp. The command $ar xv 'p1'. is used to extract files with those names. The temporary file symbols.tmp is a VMS command file, which is executed to provide the necessary global symbols to the ONECACHE process.
- The command file sy$seedis:[seedis.cache]onecache.com is used twice. It is invoked directly by the user's process to determine whether all the files required for extraction are immediately available; and it is used by the background cache process to determine what jobs need to be submitted, to bring all the required data into the cache.
- ONECACHE uses the request-specific files described above, as well as a set of permanent files in sy$cache:[cache.perm...] which contain information about the physical file locations of databases installed in SEEDIS. All those files are used as input by a fortran program sy$seedis:[seedis.cache]cachez.exe (with source in cachez.for). The task of cachez is to provide a list of physical files corresponding to the current cache request, for all three series (e.g. 1,4,6) specified in the file series.dat. In addition, cachez provides the target locations where EXTRACT will expect to find copies of the files in the SEEDIS disk cache sy$cachetemp:[cache.temp.seedata...].
- The FORTRAN program produces an AREAS line in cytemp.13 and a codata DF (data file) in cytemp.1. The command file onecache.com combines these with a custom DDF created by itself, to produce a codata file cytemp.13. The same file, sorted by VOLUME.1, LEVEL, DBNAME, STATE, COUNTY80, RECOIND, and FTYPE, is written to cytemp.14. The data part of this is written to cyytemp.1, and the temporary files cytemp.* are deleted. The file cyytemp.1 contains all necessary "permanent" information about the installed locations of SEEDIS files. It does not contain the "temporary" information about whether those files are presently in the cache, or in the permanent SEEDIS disk locations.
CACHEYY: determine current data availability
- The command file onecache.com invokes cacheyy.com, whose task is to determine which files are currently in the SEEDIS disk cache and in permanent SEEDIS disk locations. The present version of cacheyy.com behaves as follows: (a) Files present on SEEDIS disk (series 6) but not in the SEEDIS cache are copied to the SEEDIS cache. (b) Files present on tape (series 1 or 4) but not on SEEDIS disk (series 6) are copied to the SEEDIS cache.
Change required:
In Census SEEDIS, cacheyy.com will be modified as follows. (a) Same as before. (b) Same as before, except that files will be obtained not from physical tape, but from tarfiles on 'disk':[seedis.mss]gss.tar or all.tar for series 1 and 4 respectively.
- cacheyy.com first reads sy$cache:[cache]cache.ar, which contains temporary information about disk space availability in the SEEDIS cache. It initializes a number of counters (blocks,n0,...n7,etc), not all of which are used. These keep track of the system resources needed to fulfill the cache request.
- ntot is the total number of files requested in the cache request.
- n0 is the number of files that are present both on disk (series 6) and in the cache.
- n1 is the number of files that are in the cache but not on disk (series 6).
- n2 is the number of files that are on disk (series 6) but not in the cache.
- n4 is the number of files that are on GSS tape (series 1) but not on disk (series 6) or in the cache.
- n5 is the number of files that are on BCK tape (series 4) but not on GSS tape (series 1) or on disk (series 6) or in the cache.
- n3 and n6 are always zero in the current configuration.
- n7 is the number of files that are not found in any of the locations above (series 1,4,6 or in the cache). A non-zero value of n7 is an error situation.
- cacheyy.com simultaneously opens file cyytemp.1 for reading, and eight files for writing, cyytemp.21 through cyytemp.28. These files will receive directives which will be supplied to command files, in order to execute the cache request, and later the extraction process. (Depending on the operating system, it may not be possible to open more files than these simultaneously; the number should not be increased without careful testing.)
CACHEYY inner loop: determine availability of one file
- Between lines 153 and 313, cacheyy.com reads through cyytemp.1, processing each file request (each logical record) in turn. In each logical record, temporary symbols are calculated, including t_gss_tape, t_bck_tape, and t_vms_path, which are the pathnames of the file on GSS tape (series 1), BCK tape (series 4), and VMS disk (series 6) respectively.
- At line 227, cacheyy.com checks to see if the background task netrans.com has completed its task of looking for a remotely mounted disk.
Change required:
In the current configuration, this statement is not needed and may optionally be removed.
- Beginning at line 245, cacheyy.com calculates a local symbol file_status, which has a value (0,1,2,4,5,7) depending on which device(s) the required file is presently available on. The possible values are:
- 0: in cache and local disk (series 6)
- 1: in cache only
- 2: on local disk (series 6) but not in cache
- 4: on GSS tape (series 1) but not in cache or on disk (series 6)
- 5: on BCK tape (series 4) but not in cache, on disk (series 6), or on GSS tape (series 1)
- 7: (error) not in cache, disk (series 6), GSS tape (series 1), or BCK tape (series 4).
File_status is set to 4 if file_status is not (0,1,2,3) and if t_gss_tape is not blank; file_status is set to 5 if file_status is not (0,1,2,3,4) and if t_bck_tape is not blank.
- At line 291, cacheyy.com writes a record to cyytemp.25 (logical name x25), which is used to process files from BCK tapes. The blocksize of dat files is not needed for physical BCK tapes, because the VMS file attributes are preserved.
Change required:
For tarfiles made from BCK tapes, the blocksize is not preserved. At line 291, it is necessary to write t_blocksize, as was done for GSS tapes a few lines earlier.
- At line 308, the corresponding counter n0,n1,...n7 is incremented by one depending on the value of file_status.
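The classification performed in the inner loop can be sketched as follows. This is a minimal Python illustration of the status codes and counters described above, not a transcription of cacheyy.com; the function names and the tuple layout of each file record are assumptions made for the example.

```python
from collections import Counter


def file_status(in_cache, on_disk, gss_tape, bck_tape):
    """Classify one requested file using the status codes listed above.

    in_cache, on_disk  -- booleans for the SEEDIS cache and series 6 disk
    gss_tape, bck_tape -- tape pathnames (t_gss_tape, t_bck_tape); blank if none
    """
    if in_cache and on_disk:
        return 0
    if in_cache:
        return 1
    if on_disk:
        return 2
    if gss_tape.strip():   # series 1 is considered before series 4
        return 4
    if bck_tape.strip():
        return 5
    return 7               # error: file not found in any location


def count_statuses(files):
    """Tally n0..n7 and ntot for an iterable of (in_cache, on_disk, gss, bck)."""
    counts = Counter(file_status(*f) for f in files)
    tally = {f"n{i}": counts.get(i, 0) for i in range(8)}
    tally["ntot"] = sum(counts.values())
    return tally
```

In this sketch n3 and n6 are never incremented, which matches the statement that they are always zero in the current configuration.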
CACHEYY: submit multiple jobs to cache files
- At line 337, cacheyy.com checks to make sure enough space is available in the disk cache to handle the request. If not, control is transferred to "toobig:".
- At line 340, cacheyy.com uses the values of (n0,n1,n2,n4,n5) to determine which processes must be executed to put the required files in the SEEDIS cache.
- If all the required files are on local disks but not all in the cache, (ncache.lt.ntot.and.nlocal.eq.ntot) and if no special processing needs to be performed (nconvert.le.0.and.nmult.le.0), then the local symbol CHOICE is set equal to VMS, to extract from SEEDIS disks immediately. The user is advised:
(ntot) files are required.
All (ntot) are on local disk packs.
Of these, (ncache) are also in disk cache.
The remaining (ndiff) will be cached after use.
Please wait on line for completion of processing.
Otherwise CHOICE is set equal to CAC, to extract from the cache.
Change required:
None. We had talked earlier about doing away with the disk cache. This is inconvenient because some files have a slightly different format on disk and in the cache. (Slight errors in some files must be repaired via software in the caching process, and multi-race files require multiple NDX files which are created in the caching process.) It is more convenient to make the cache large enough to accommodate all the SEEDIS files that will ever be used. After all SEEDIS files are in the cache, we will have ncache=ntot for every request, and the disk cache will be used for every request. Control will pass immediately to the following section.
If future errors are discovered in the tarfiles or SEEDIS disk files, the caching software can be modified as necessary and the defective cached files deleted from the cache. When the data are next requested, corrected files will be added to the cache.
- If CHOICE=CAC and all the required files are in the cache (ncache=ntot), the user is advised:
(ntot) files are required.
All (ntot) are in disk cache.
Please wait on line for completion of processing.
Change required:
None.
- If CHOICE=CAC, and some files are required from disk (ncache.ne.ntot), but no files are required from tape (ntape=0):
(ntot) files are required.
Of these, (ncache) are in disk cache.
(n2) are on local disk packs.
Files will be copied into disk cache.
Expected delay 10-30 minutes.
If you wish to complete processing offline,
you may now hit control-Y and then type
batch
quit
Change required:
None.
- If CHOICE=CAC, and not all the required files are in the cache (ncache.ne.ntot), and files ARE required from tape (ntape.gt.0):
(ntot) files are required.
Of these, (ncache) are in disk cache.
(n2) are on local disk packs.
(ntape) are in LBL tape library.
Files will be copied into disk cache.
Expected delay 1-2 hours if tape operator on duty.
(8am-midnight PST M-F, 8am-4pm PST Sat, Sun, holidays)
If you wish to complete processing offline,
you may now hit control-Y and then type
batch
quit
Change required:
Change message to read:
(ntot) files are required.
Of these, (ncache) are in disk cache.
(n2) are on local disk packs.
(ntape) are in local tar files.
Files will be copied into disk cache.
Expected delay 10-30 minutes.
If you wish to complete processing offline,
you may now hit control-Y and then type
batch
quit
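The four cases above can be summarized in a short decision function. This is a hedged sketch in Python: the name `choose_extraction` and the short situation strings are illustrative, and the counts ncache, nlocal, ntape, nconvert, and nmult are taken as inputs because the text does not spell out how cacheyy.com derives them from n0 through n7.

```python
def choose_extraction(ntot, ncache, nlocal, ntape, nconvert=0, nmult=0):
    """Return (CHOICE, situation) following the conditions quoted above."""
    # All files on local disk, not all cached, no special processing needed:
    # extract from the permanent disks immediately.
    if ncache < ntot and nlocal == ntot and nconvert <= 0 and nmult <= 0:
        return ("VMS", "extract from local disk now; cache afterwards")
    # Everything else is extracted from the cache.
    if ncache == ntot:
        return ("CAC", "all files in disk cache")
    if ntape == 0:
        return ("CAC", "copy from local disk; expect 10-30 minutes")
    return ("CAC", "copy from local tar files; expect 10-30 minutes")
```

For example, a request for 5 files with 2 cached and all 5 on local disk yields CHOICE=VMS, while the same request with nmult greater than zero falls through to CHOICE=CAC.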
- If caching subprocesses (retrieval from disk or tape) are required, those jobs are submitted as separate batch jobs to the special queue 'cachebq' (which has a job limit of 1 so jobs will not interfere with each other). Jobs are submitted in the following order:
- If any of the required files are already in the cache (ncache.gt.0), those files must be given a current expiration date, so subsequent jobs cannot cause them to be immediately purged from the cache. For this purpose, at line 439 of cacheyy.com, a background job TOUCHALL is submitted to the queue 'cachebq'. TOUCHALL uses temporary files that were created while CACHEYY was processing cyytemp.1.
- At line 477 of cacheyy.com if any of the required files must be obtained from BCK tapes (n5.gt.0), a background job SBCK2 is submitted/remote as user CACHE to the default batch queue of csa.lbl.gov, a computer with a local 9-track tape drive. Information required to complete the caching process is added to the file backup.ar. A named subset is added to a file 'uniq'.ar, where 'uniq' is the unique identifier of this cache request (one subset of allcache.ar). This subset will be deleted by the child process when caching of the files from all BCK tapes has been completed. The presence of this subset is a signal to the user's process that EXTRACT cannot yet proceed.
Change required:
At LBNL, no change: submit/remote sbck2 to 'cachern'. At Census: submit sbck3.com (not remote). sbck3.com submits an analogous background job TARBCKALL (with the same interface) which will (1) invoke VMSTAR to extract the desired files from one or more tar archives (2) complete processing exactly as if the files had come from a physical BCK tape.
- At line 510 of cacheyy.com, if any of the required files must be obtained from GSS tapes (n4.gt.0), a background job GSSALL is submitted by user CACHE to the 'cachebq' batch queue of csa.lbl.gov. Information required to complete the caching process is added to the file gss.ar. A named subset is added to the file 'uniq'.ar. This subset will be deleted by the child process when caching of the files from all GSS tapes has been completed. The presence of this subset is a signal to the user's process that EXTRACT cannot yet proceed. The slightly different handling of BCK and GSS tape requests is historical and of no significance.
Change required:
Rather than submitting the process GSSALL, submit an analogous background job TARGSSALL (with the same interface) which will (1) invoke VMSTAR to extract the desired files from one or more tar archives (2) complete processing exactly as if the files had come from a physical GSS tape.
Presently the GSS interface is not implemented.
- At line 690 of cacheyy.com, if any of the required files must be obtained from local disks (n2.gt.0), a background job LOCAL is submitted by user CACHE to the 'cachebq' batch queue of csa.lbl.gov. Information required to complete the caching process is added to the file local.ar. A named subset is added to the file 'uniq'.ar. This subset will be deleted by the child process when caching of the files from all local SEEDIS disks has been completed. The presence of this subset is a signal to the user's process that EXTRACT cannot yet proceed.
- Portions of cacheyy.com were previously needed for LBL media which no longer exist, e.g. the Automatic Tape Library (ATL), remote disk packs, and mountable offline disk packs. These commands are never used and there is no harm in leaving them in.
- At line 778, files dblocat.'choice' and ndxlist.'choice' are created in the user's directory, which are used by EXTRACT. The file contents depend on whether data will be extracted directly from permanent SEEDIS disk files (CHOICE=VMS) or from the cache (CHOICE=CAC).
- The signal to the user's process that caching has been completed, is the insertion of a subset named "complete" into the temporary world-writeable file status.ar. The user's process keeps looping until this signal is detected; then EXTRACT is executed and all temporary files are deleted.
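The order in which cacheyy.com submits its caching jobs, as described in this section, can be sketched as follows. This is an illustrative Python sketch; the function name `jobs_to_submit` and the `census` flag are assumptions, and the Census job names follow the changes proposed above (TARGSSALL is described in the text as not yet implemented).

```python
def jobs_to_submit(ncache, n5, n4, n2, census=False):
    """List background jobs in the order cacheyy.com submits them."""
    jobs = []
    if ncache > 0:
        jobs.append("TOUCHALL")   # refresh expiration of files already cached
    if n5 > 0:
        # Files from BCK tapes (series 4); Census uses the tar-based variant.
        jobs.append("SBCK3" if census else "SBCK2")
    if n4 > 0:
        # Files from GSS tapes (series 1); TARGSSALL is the proposed Census job.
        jobs.append("TARGSSALL" if census else "GSSALL")
    if n2 > 0:
        jobs.append("LOCAL")      # files from permanent SEEDIS disks (series 6)
    return jobs
```

Submitting TOUCHALL first reflects the stated purpose of protecting already-cached files from being purged by the jobs that follow.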
BCKALL: cache BCK data from one cache request
- cacheyy.com submits a job SBCK2 as user CACHE, which submits a job BCKALL. BCKALL processes one cache request from the temporary file backup.ar. BCKALL submits a job RECOVER9 to restart itself in case of a system crash. BCKALL then runs backup.com to process all the tapes required by that request.
Change required:
cacheyy.com submits a job SBCK3 as user CACHE, which submits a job TARBCKALL. TARBCKALL processes one cache request from the temporary file backup.ar. TARBCKALL then runs tarbackup.com to process all the tapes required for that request.
- backup.com submits multiple jobs, each of which caches the required files from one BCK tape. (The requested files listed in backup.ar are sorted in tape order.) backup.com creates temporary subdirectories in sys_seedgss:[cache_files.'uniq'.'tape'], where 'uniq' is the unique ID of this cache request. Each pending single-tape job causes a subset named 'slot' to be appended to the temporary file sys_seedgss:[cache_files.'uniq']backup.ar
Change required:
tarbackup.com submits multiple jobs, each of which caches the required files from one tar archive.
- The single-tape job is created on bcktemp.16 (logical name xyz1) and the command file to submit it is b'uniq'.com (logical name xyz2). At LBL, b'uniq'.com is submitted/remote to csa1.lbl.gov, a machine which has a local tape drive.
Change required:
b'uniq'.com is submitted (not remote) to the local machine in the default queue sys$batch, which permits single-tape jobs to run in parallel. At line 140 of tarbackup.com (in the Census machine only), change "$submit/remote 'gssrn'::'cacheroot']b'p1'.com" to "$submit/queue='cachebq' 'cacheroot']b'p1'.com".
At LBL, each single-tape job b'uniq'.com submits backupx.com to the queue 'tapebq' (CSA1_ECONOMY).
Change required:
At LBL, no change. At Census, each single-tape job b'uniq'.com submits (not remote) tarbackupx.com to the default queue sys$batch, which permits single-tape jobs to run in parallel. A change is required at line 118 of tarbackup.com.
BACKUPX: cache data from one BCK tape
- b'uniq'.com creates temporary scratch files in its own subdirectory sys_seedgss:[cache_files.'uniq'.'tape']. Its log file is in sy$cache:[cache]btape.log.
- At line 39, backupx.com displays the quota of user CACHE on disk sys_seedgss.
Change required:
Disk quotas are not enabled at Census. Instead of displaying a disk quota, execute "show dev sys_seedgss".
- Beginning at line 47, backupx.com issues a request to the operator of csa.lbl.gov to mount the BCK tape identified by its 3-character slot ID e.g. G80.
Change required:
tarbackupx.com uses VMSTAR to extract from a tarfile all the files from a single subdirectory.
If both ASCII (DDF,DDX,NDX, etc) and binary (DAT) files are required, two separate passes through the complete tarfile are required, one with the binary option and one without. Each pass takes about 10 minutes. Fetching only the files needed for the user's request is barely more efficient, and causes complications in case the combined VMS command exceeds the maximum size of about 400 characters.
Accordingly, if any DAT file is requested, all the DAT files from that directory are staged in binary mode. If any NDX file is requested, all the NDX files from that directory are staged in ASCII mode. If any other file is requested, all the files from the directory are staged in ASCII mode.
The routine tarbackupx.com generates a file bkxtemp.4 in the following format:
[SEEDIS.SEEDATA.STF3]STF3.EDX
[SEEDIS.SEEDATA.STF3]STF3.DDX
[CENSUS80.STF3C.COUNTY80]S06.NDX
[CENSUS80.STF3C.COUNTY80]S36.NDX
[CENSUS80.STF3C.COUNTY80]S36.DAT
which is translated into a file bkxtemp.5 in the following format, by a PERL program sy$seedis:[seedis.cache]tarstage.pl:
$show time
$vmstar xvbf 'tarloc' -
"./census80/stf3c/county80/*.dat" -
" "
$show time
$vmstar xvf 'tarloc' -
"./seedis/seedata/stf3/*.*" -
"./census80/stf3c/county80/*.ndx" -
" "
$show time
The file bkxtemp.5 is executed to stage the required files to the scratch disk sys_seedgss.
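The grouping rule that turns the file list in bkxtemp.4 into the two vmstar passes in bkxtemp.5 can be sketched as follows. This is a Python illustration of the rule stated above, not the actual tarstage.pl (which is written in PERL and whose source is not shown here); the function name `stage_patterns` is hypothetical, and the sketch returns only the wildcard patterns, not the surrounding DCL commands.

```python
def stage_patterns(requested):
    """Group requested VMS files into wildcard patterns for two vmstar passes.

    Returns (binary_patterns, ascii_patterns), each in first-seen order:
    any DAT request stages all *.dat in that directory in binary mode,
    any NDX request stages all *.ndx in ASCII mode, and any other file
    type stages the whole directory (*.*) in ASCII mode.
    """
    binary, ascii_ = [], []
    for line in requested:
        directory, _, filename = line.strip().partition("]")
        # [SEEDIS.SEEDATA.STF3] becomes ./seedis/seedata/stf3
        path = "./" + directory.lstrip("[").lower().replace(".", "/")
        ext = filename.rpartition(".")[2].lower()
        if ext == "dat":
            target, pattern = binary, f"{path}/*.dat"
        elif ext == "ndx":
            target, pattern = ascii_, f"{path}/*.ndx"
        else:
            target, pattern = ascii_, f"{path}/*.*"
        if pattern not in target:   # each pattern is emitted once
            target.append(pattern)
    return binary, ascii_
```

Applied to the five-line bkxtemp.4 example above, this yields one binary pattern for the county80 DAT files and two ASCII patterns, matching the two vmstar commands shown in bkxtemp.5. Staging whole directories by wildcard also keeps each command short, which avoids the command-length limit mentioned earlier.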
- At LBL, backupx.com sends an e-mail message to user SEEDIS::CACHE, which is forwarded to dwmerrill@lbl.gov.
The e-mail message to SEED01::CACHE is not needed, since no operator intervention is required.
- At LBL, due to use of ACL protections, a special routine delete_directory.com is needed in backupx.com to set protections of temporary subdirectories in sys_seedgss:[cache_files].
Change required:
In tarbackupx.com, delete_directory.com is not needed.
- At LBL, series 4 BCK tapes are identified by a 3-character slot number e.g. G80.
Change required:
tarbackupx.com uses a new routine slot2tapeno.com to convert a three-character slot number, e.g. G80, into a five-digit tape number, e.g. 50042, and it invokes a new routine tapeno2tarloc.com to convert the five-digit tape number into the tarfile location. Where duplicate series 4 and 5 BCK tapes exist at LBL (and in the MSS archive), both tape numbers at Census are directed to a single tarfile.
- As the cache request for an individual tape is fulfilled, subset 'slot' is removed from the temporary ar file in sys_seedgss:[cache_files]backup.ar. When all subsets have been removed, the file is deleted. When the file has been deleted, a job srecover5 is submitted/remote to the SEEDIS computer. srecover5 submits recover5.com, which submits a job BCKCLAIM, which processes the files on the scratch disk sys_seedgss and puts them in the SEEDIS cache sy$cachetemp:[cache.temp...].
Change required:
At Census, the job srecover5 is submitted (not remote).
BCKCLAIM: put data into SEEDIS cache
- bckclaim.com runs on the local SEEDIS machine. At LBL, the temporary scratch disk is on the remote machine where the tape drive is located, namely 'gssrn'::sys_seedgss:[cache_files], where 'gssrn' is CSA1.
Change required:
At Census, the temporary scratch disk is local, at sys_seedgss:[cache_files]. A change is required in bckclaim.com.
- At line 71 of bckclaim.com, the variables t1-t5 are read; these are required to correctly process each file.
Change required:
It is necessary to read a sixth variable t6, which is the block size of the dat file.
- DAT files: at LBL, bckclaim.com simply copies *.dat files from the scratch disk sys_seedgss to the SEEDIS cache sy$cachetemp:[cache.temp...]. The block size is a VMS file attribute that is preserved when the VMS backup software retrieves the *.dat file from a physical BCK tape.
Change required:
When VMSTAR copied the physical BCK tapes to tar archives in LBL MSS, the block size attribute of the *.dat files was lost. It needs to be restored to a value that is database-specific. This is accomplished by a custom program sy$seedis:[seedis.cache]setblk.exe, using information stored in sy$cache:[cache.perm]dbname.cod.
- Block size information: At LBL, the file sy$cache:[cache.perm]dbname.cod contains block size information only for databases on GSS tapes.
Change required:
sy$cache:[cache.perm]dbname.cod must be extended to provide information also for databases stored on BCK tapes (e.g. STF3). The block size for STF3 was changed from 512 to 168; it remains to be verified that this value is valid for all levels.
- NDX files: at LBL, bckclaim.com processes *.ndx files temporarily stored on the scratch disk sys_seedgss. These have a VMS file attribute carriagecontrol=fortran, which is preserved when the VMS backup software retrieves the *.ndx file from a physical BCK tape.
Change required:
When VMSTAR copied the physical BCK tapes to tar archives in LBL MSS, the first character of each line of the *.ndx files was lost. The error occurred because the VMS file attribute of the files was incorrectly set to carriagecontrol=fortran at the time VMSTAR was run. Fortunately, the format of *.ndx files is such that the missing characters can be automatically restored. This is accomplished by a custom PERL program sy$seedis:[seedis.cache]bckndx.pl, which also changes the VAX line to point to the correct cache location of the corresponding DAT file.
temporary changes for testing BCKCLAIM
To test BCKCLAIM only, temporarily make the following changes:
- bckclaim.com line 417: activate "$goto cont2" which will suppress deletion of the temporary files in sys_seedgss.
- bckclaim.com line 448: comment out "$ar dv 'cacheroot']bckclaim.ar 'uniq'" which will suppress deletion of subset 'uniq' from bckclaim.ar.
- bckclaim.com line 459: comment out "$ar dv sy$cache:[cache]'parent'.ar 'uniq'" which will suppress deletion of subset 'uniq' from 'parent'.ar.
- bckclaim.com line 538: comment out "$$ar dv sy$cache:[cache]bckclaim.ar 'uniq'" which will suppress deletion of subset 'uniq' from bckclaim.ar.
- As user SEEDTEST, do a complete SEEDIS run which requires caching of a BCK tarfile. The files required by bckclaim.com will be preserved. In sy$cache:[cache], allcache.ar and bckclaim.ar should not be empty, and 'uniq'.ar should be present. Temporary files sys_seedgss:[cache_files.'uniq'] should be present.
- As user SEEDIS, make desired changes in bckclaim.com
- As user CACHE in the default login directory, type "submit srecover5" which will restart bckclaim.com.
- Repeat the two previous steps as many times as desired.
- When testing is completed, as user SEEDIS, restore the four suppressed commands.
- As user CACHE in the default login directory, type "submit srecover5" which will restart bckclaim.com once more, this time sweeping out the temporary files. In sy$cache:[cache], allcache.ar and bckclaim.ar should be empty, and 'uniq'.ar should be gone. The temporary files sys_seedgss:[cache_files] should be gone.
databases and levels tested to date
database  level     outcome
STF3      COUNTY80  OK
STF3      STATE     OK
STF3      PLACE80   dka300:[seedis.mss.50072]all.tar truncated
STF1      CYPL80    pending
seedis/taraccess.html 3/23/97 in:
http://parep2.lbl.gov/mdocs
http://merrill.wwh.net/mdocs
http://imap.chesapeake.net/~merrill/mdocs
merrill@crocker.com