
Version 1 (modified by MarcusWalther, 14 years ago)

Archiving concept based on the sfdb database

The archiving system at the SZGRF is based on three classes of data files:

  A. files from online data transmission protocols such as seedlink, which are usually available for a limited time in a specific directory tree;
  B. intermediate-term archive files, collected from various data sources and usually available with a delay of one to a few days;
  C. permanently archived files that have been analysed, quality checked and moved to a final location on a RAID system or CD/DVD jukebox.

The intermediate directories (B) are searched every day, N days back in time, for gaps in the data streams. N is typically on the order of 10 days. Gaps are found using the sfdb database entries (pathids between 100 and 1000), under the assumption that only continuous files (without gaps inside the file) are archived there. The system attempts to fill every gap found using data files from the type (A) archive (pathids between 0 and 100, and additionally path 1000) or, if not all data are available there, using dedicated data retrieval methods that access external data sources. When copied to the (B) archive, all data files are split at data gaps so that the sftab entries in sfdb represent the data coverage of the archive. After these N days the (B) archive should contain all available data for the data streams.
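The gap search can be pictured as an interval-coverage problem. The following is a minimal Python sketch, not the code used in ArchAtoB.py: it assumes that the sfdb entries of one stream have already been read into (start, end) time pairs, and the function name find_gaps and its tolerance parameter are hypothetical.

```python
from datetime import datetime, timedelta


def find_gaps(segments, window_start, window_end, tolerance=timedelta(0)):
    """Return the (start, end) intervals of the window not covered by any segment.

    segments: iterable of (start, end) datetime pairs, one per continuous file
    (a hypothetical stand-in for the sfdb entries of a single stream).
    """
    gaps = []
    cursor = window_start  # end of the coverage seen so far
    for start, end in sorted(segments):
        if end <= cursor:
            continue  # segment lies entirely inside known coverage
        if start >= window_end:
            break  # this and all later segments start after the window
        if start - cursor > tolerance:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if window_end - cursor > tolerance:
        gaps.append((cursor, window_end))
    return gaps


# Example: one day of data with two files and a hole between 06:00 and 08:00.
day = datetime(2010, 1, 1)
files = [
    (day, day + timedelta(hours=6)),
    (day + timedelta(hours=8), day + timedelta(hours=24)),
]
gaps = find_gaps(files, day, day + timedelta(days=1))
```

Each interval returned would then be a candidate for filling from the type (A) archive or from an external source.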

Data archive structure mapping

After a period of M days (M is on the order of 30 to 60 days) the data in (B) are copied to the final archive location in the type (C) archive. In this step the data of all streams of an archive are copied into one directory of a given maximum size. Currently, this maximum size is set to 4.5 GBytes so that the data directories can be copied onto DVDs for backup. These type (C) data directories (with pathids above 1000) contain data of all stations of the archive for the same time slice. In a final step, all data in archive (B) older than K days before the end of the most recent type (C) data are deleted. The data in the type (A) archive, the online directories, are expected to be deleted by the data-providing software (seedlink) after some time.
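One way to picture the size-limited export is as a chronological fill of directories. The sketch below is an illustration only, not the logic of archiver.py: it assumes that the total size of all streams is known per day, and the function name plan_directories and the per-day granularity are hypothetical.

```python
def plan_directories(day_sizes, max_storage=4_500_000_000):
    """Group consecutive days into directories whose total size stays within max_storage.

    day_sizes: list of (day_label, size_in_bytes) in chronological order, where
    each size is the sum over all streams of the archive for that day
    (assumed granularity, chosen for illustration).
    """
    directories = []
    current, used = [], 0
    for day, size in day_sizes:
        if size > max_storage:
            raise ValueError(f"{day}: {size} bytes exceed max_storage")
        if current and used + size > max_storage:
            # The next day would overflow the directory: close it, start a new one.
            directories.append(current)
            current, used = [], 0
        current.append(day)
        used += size
    if current:
        directories.append(current)
    return directories


sizes = [("day1", 2_000_000_000), ("day2", 2_000_000_000), ("day3", 1_000_000_000)]
plan = plan_directories(sizes)
```

Because days are never reordered, every directory covers one contiguous time slice for all stations, which matches the description of the type (C) directories above.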

The archiving software can manage a number of different archives in parallel. All archives have the same structure, but are independent of each other and have separate type (C) directories. The defining parameters of an archive are:

  • name: Name of the archive.
  • index: Index number of the archive.
  • max_storage: Maximum storage capacity (in bytes) in the type (C) directories.
  • b_rootpath: Root path to the type (B) data.
  • c_rootpath: Root path to the type (C) data.
  • wait_days: Number of days (M above) after which data are exported from the (B) archive to the (C) archive.
  • dir_prefix: Prefix string of the output directories of type (C) data and label names.
  • keep_days: Files of the (B) archive older than this number of days (K above) before the end of the most recent type (C) data are deleted.
  • usb_backups: Number of copies of type (C) directories on external USB disks.

Five archives are currently defined at the SZGRF. Their parameters are:

  • name='grsn-bh', index=1, max_storage=4500000000, b_rootpath='/r06p4/arch_bh', c_rootpath='/r06p1/datalib', wait_days=21, dir_prefix='grsn', keep_days=30, usb_backups=2
  • name='grsn-hh', index=2, max_storage=4500000000, b_rootpath='/r06p4/arch_hh', c_rootpath='/r06p3/grsn-80hz', wait_days=30, dir_prefix='hhgr', keep_days=15, usb_backups=0
  • name='grsn-lh', index=3, max_storage=640000000, b_rootpath='/r06p4/arch_lh', c_rootpath='/r06p1/datalib', wait_days=30, dir_prefix='lp', keep_days=60, usb_backups=0
  • name='krakatau', index=4, max_storage=4500000000, b_rootpath='/r08p1/foreign', c_rootpath='/r08p1/foreign', wait_days=30, dir_prefix='krak', keep_days=60, usb_backups=0
  • name='foreign-bh', index=5, max_storage=4500000000, b_rootpath='/r06p2/foreign', c_rootpath='/r06p2/foreign', wait_days=30, dir_prefix='frgn', keep_days=30, usb_backups=0
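The parameter set of an archive maps naturally onto a small record type. The following Python sketch shows one possible representation, using the values of the 'grsn-bh' archive listed above. The class ArchiveConfig and its two helper methods are hypothetical, and the date arithmetic reflects one reading of wait_days and keep_days rather than the confirmed behaviour of archiver.py.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ArchiveConfig:
    name: str
    index: int
    max_storage: int  # bytes per type (C) directory
    b_rootpath: str
    c_rootpath: str
    wait_days: int
    dir_prefix: str
    keep_days: int
    usb_backups: int

    def export_before(self, today):
        """Assumed rule: (B) data older than this date are due for export to (C)."""
        return today - timedelta(days=self.wait_days)

    def delete_before(self, last_c_end):
        """Assumed rule: (B) data older than this date may be deleted."""
        return last_c_end - timedelta(days=self.keep_days)


GRSN_BH = ArchiveConfig(
    name="grsn-bh", index=1, max_storage=4500000000,
    b_rootpath="/r06p4/arch_bh", c_rootpath="/r06p1/datalib",
    wait_days=21, dir_prefix="grsn", keep_days=30, usb_backups=2,
)

export_limit = GRSN_BH.export_before(date(2010, 3, 1))
delete_limit = GRSN_BH.delete_before(date(2010, 1, 31))
```

With these values, data before 8 February 2010 would be due for export on 1 March 2010, and (B) files before 1 January 2010 could be removed once the type (C) data reach 31 January 2010.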

Archiving routines

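The archiving functions are carried out by ArchAtoB.py and archiver.py. Both Python scripts are in the directory $SH_UTIL/sfdb. A typical crontab entry, here running ArchAtoB.py three times a day (at 03:22, 06:22 and 09:22) with a look-back of 12 days and archiver.py once a day at 15:59, would look like: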

22 3,6,9 * * * /usr/local/SH/shlink/util/sfdb/ArchAtoB.py 12 >>log/ArchAtoB.log 2>&1
59 15 * * * /usr/local/SH/shlink/util/sfdb/archiver.py >>log/archiver.log 2>&1

  • ArchAtoB.py <backdays>: Copies data files from archive (A) to archive (B) for the last <backdays> days. Loops over all channels of all defined archives.
  • archiver.py: Loops over all archives and, for each, calls the methods ExportDataToTypeC() (collects all streams of the (B) archive for the next type (C) directory or DVD) and RemoveDataFromTypeB() (removes data from the (B) archive).
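The control flow of archiver.py described above can be sketched as follows. This is a schematic only: the Archive class is a hypothetical stand-in, the method bodies are placeholders that merely record the call, and the real signatures of ExportDataToTypeC() and RemoveDataFromTypeB() are not documented here.

```python
class Archive:
    """Hypothetical stand-in for one archive; the methods only log their call."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def ExportDataToTypeC(self):
        # Placeholder: would collect (B) streams into the next (C) directory.
        self.log.append("export")

    def RemoveDataFromTypeB(self):
        # Placeholder: would delete old files from the (B) archive.
        self.log.append("remove")


def run_archiver(archives):
    """Process every archive independently: export first, then clean up."""
    for archive in archives:
        archive.ExportDataToTypeC()
        archive.RemoveDataFromTypeB()


archives = [Archive("grsn-bh"), Archive("grsn-hh")]
run_archiver(archives)
```

Running the export before the removal means the clean-up step can refer to the most recent type (C) data written in the same run.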
