Version 2 (modified by MarcusWalther, 15 years ago)

--

Automatic status check processes at SZGRF

The core of the automatic status check is the logmsg database. It holds status messages created by various processes, most of which run as cron jobs. Each message carries a severity, expressed as an alert level. The levels implemented so far are:

  • Alive (numeric value 10): alive messages of cron jobs, created when exiting the process in a normal way. The messages are used to check whether all processes are running regularly (done by program ProcessCheck.py).
  • Informational (numeric value 20): informational messages of cron jobs, e.g. that a new data directory (of type C) has been created or that an ISO file or a DVD has been written. These messages are for operators of the data centre.
  • Operational (numeric value 25): messages about exceptional conditions during data processing in the data centre. Mainly for debugging and code optimisation purposes.
  • Warning (numeric value 30): messages about conditions which are outside the tolerance limits but not severe, e.g. a small time delay in the data reception or a moderately low time quality at a station.
  • Error (numeric value 40): error conditions like a large time lag in online data streams or detected timing errors.
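
The alert levels above could be represented as an integer enumeration, so that levels compare by severity. This is only a minimal sketch: the class name AlertLevel and the helper is_problem are hypothetical and not taken from MsgUtil.py; only the numeric values come from the list above.

```python
from enum import IntEnum


class AlertLevel(IntEnum):
    """Alert levels of the logmsg database (numeric values from the list above)."""
    ALIVE = 10
    INFORMATIONAL = 20
    OPERATIONAL = 25
    WARNING = 30
    ERROR = 40


def is_problem(level):
    """Return True for levels that indicate a warning or error condition.

    Hypothetical helper: treats every level at or above WARNING as a problem.
    """
    return level >= AlertLevel.WARNING
```

Because IntEnum members behave like integers, a numeric level read from the database can be compared directly, e.g. is_problem(30).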

The messages are collected in a MySQL database named logmsg. It contains two tables, msgtab and txttab. The status of a station is determined by the occurrence of a message of a specific alert level within a given time window. The program MsgDisplay.py evaluates the status of all stations and displays it in a window. As its first parameter it takes the (integer) number of hours to look back from now, and it retrieves the messages from the logmsg database that fall within this time window. If warning or error messages are present, it marks the station in yellow or red, respectively. The status information is updated once a minute. Status changes are indicated by colour changes and, optionally, by emails sent to an address list specified as the second parameter.
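
The status evaluation described above can be sketched as follows. This is not the code of MsgDisplay.py: the function name, the (timestamp, level) tuple layout and the "ok" placeholder for a station without problems are assumptions made for illustration, as is the rule that an error takes precedence over a warning.

```python
from datetime import datetime, timedelta

WARNING = 30  # numeric alert levels as listed above
ERROR = 40


def station_colour(messages, backhours, now=None):
    """Derive a display colour for one station.

    messages  -- iterable of (timestamp, alert_level) tuples (assumed layout)
    backhours -- integer number of hours to look back from now
    """
    now = now or datetime.now()
    start = now - timedelta(hours=backhours)
    # Keep only the levels of messages inside the time window.
    levels = [level for stamp, level in messages if start <= stamp <= now]
    if any(level >= ERROR for level in levels):
        return "red"
    if any(level >= WARNING for level in levels):
        return "yellow"
    return "ok"  # placeholder; the colour of a normal station is not specified
```

With a window of 12 hours, an error message that is 20 hours old no longer affects the station colour, while a warning from one hour ago turns it yellow.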

List of modules:

  • logmsg/CreateMessage.py: utility program to insert a message into the message database logmsg (used only in C-shell-scripts and interactively for testing).
  • logmsg/MsgDisplay.py: main program to display the station states and view the corresponding messages. Is able to send emails to specified addresses on status changes.
  • logmsg/MsgUtil.py: library of routines necessary for managing the logmsg database.
  • qualcheck/ProcessCheck.py: program to check alive messages of cron jobs. Is itself used as a cron job and creates messages in logmsg when alive messages are missing.
  • qualcheck/QualCheck.py: library of quality check procedures for waveforms.
  • qualcheck/UptimeStatistics.py: executable for creating uptime statistics of stations using the sfdb database.
  • qualcheck/WaveformQuality.py: program to check the quality of the waveforms. Runs as cron job and creates messages in the logmsg database when problems are found. A more detailed description follows.
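
The idea behind ProcessCheck.py, detecting cron jobs whose alive messages are missing, can be illustrated with a short sketch. How ProcessCheck.py actually knows which processes to expect and how often is not described here; the function name and both input dictionaries are hypothetical.

```python
from datetime import datetime, timedelta


def missing_alive(last_alive, max_age, now=None):
    """Return the names of processes without a recent alive message.

    last_alive -- dict: process name -> time of its latest alive message
    max_age    -- dict: process name -> longest tolerated silence (timedelta)
    Both dictionaries are assumed inputs for this sketch.
    """
    now = now or datetime.now()
    missing = []
    for process, limit in sorted(max_age.items()):
        last = last_alive.get(process)
        # A process is reported if it never sent an alive message
        # or if its latest one is older than the tolerated silence.
        if last is None or now - last > limit:
            missing.append(process)
    return missing
```

For each name returned, a checker of this kind would then create a message in logmsg.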

Program MsgDisplay.py

Interface program to the logmsg database. Call is $DPROG/logmsg/MsgDisplay.py <backhours> [<maillist>]. <backhours> specifies how far back in time (in hours) messages are read from the database; a reasonable value is 12. <maillist> is a comma-separated list of mail addresses to which notification mails are sent on status changes. The display shows a button for each station and one for the data centre (GRF). On warning or error conditions the colour is set to yellow or red, respectively. Clicking on a station button shows the messages of the last <backhours> hours for this station.
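
A minimal sketch of how the two command line parameters could be parsed is given below. It is not the parsing code of MsgDisplay.py; the function name parse_args and the usage text are assumptions.

```python
def parse_args(argv):
    """Parse <backhours> [<maillist>] from an argv-style list.

    Returns (backhours, maillist), where maillist is a list of addresses
    and is empty if no second parameter is given.
    """
    if len(argv) < 2:
        raise SystemExit("Usage: %s <backhours> [<maillist>]" % argv[0])
    backhours = int(argv[1])  # integer number of hours to look back
    maillist = []
    if len(argv) > 2:
        # Addresses are separated by commas; ignore empty entries.
        maillist = [addr.strip() for addr in argv[2].split(",") if addr.strip()]
    return backhours, maillist
```

Called with ["MsgDisplay.py", "12", "a@example.org,b@example.org"], this yields 12 and a list of the two addresses.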

Program WaveformQuality.py

This is the main routine of the quality check procedure for the station data. Each station is assigned an instance of a parameter class and an array of quality check procedures. For both, default values are used unless specified otherwise. To override the default parameter values for a station, a dictionary of parameter keywords and values can be defined. This dictionary has to be added to the sparam dictionary. The legal parameter names so far are:

  • chanlist: tuple of channels like ('bh',) or ('bh','hh'). Channels to be checked by the quality procedures. Default is ('bh',).
  • complist: list of components to be checked. Default is zne.
  • max_small_online_lag: threshold in time delay (s) for raising a warning condition. Default is 1800.
  • max_large_online_lag: threshold in time delay (s) for raising an error condition. Default is 86400.
  • low_timequal: lower threshold in time quality for raising a warning condition. Default is 65.
  • bad_timequal: lower threshold in time quality for raising an error condition. Default is 50.
  • mingapsize: minimum size of a time gap to be detected in s. Default is 0.001.
  • maxgapnum: maximum number of data gaps to be ignored. Default is 0.
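
How per-station overrides might combine with the defaults is sketched below. The parameter names and default values are those listed above; everything else is an assumption: that sparam is keyed by station code, the station code STA1, the string form of complist, the helper names, and the strict "greater than" comparison against the lag thresholds. The real WaveformQuality.py uses a parameter class whose interface is not shown here.

```python
WARNING = 30  # numeric alert levels of the logmsg database
ERROR = 40

# Default values as documented in the parameter list above.
DEFAULTS = {
    "chanlist": ("bh",),
    "complist": "zne",            # assumed to be stored as a string
    "max_small_online_lag": 1800,
    "max_large_online_lag": 86400,
    "low_timequal": 65,
    "bad_timequal": 50,
    "mingapsize": 0.001,
    "maxgapnum": 0,
}

# Hypothetical override: station STA1 also checks 'hh' and tolerates more lag.
sparam = {
    "STA1": {"chanlist": ("bh", "hh"), "max_small_online_lag": 3600},
}


def station_params(station):
    """Return the defaults merged with the overrides of one station."""
    overrides = sparam.get(station, {})
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError("illegal parameter name(s): %s" % ", ".join(sorted(unknown)))
    params = dict(DEFAULTS)
    params.update(overrides)
    return params


def lag_alert_level(lag_s, params):
    """Map an online time lag in seconds to an alert level, or None if in tolerance."""
    if lag_s > params["max_large_online_lag"]:
        return ERROR
    if lag_s > params["max_small_online_lag"]:
        return WARNING
    return None
```

A lag of 2000 s would raise a warning for a station using the defaults, but not for STA1 with its raised threshold of 3600 s.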

Using the above parameter set, a number of quality procedures are executed. These procedures are defined in an array. By default, all available procedures are executed. To override the default and exclude some or all procedures for a station, an entry mapping the station to its array of procedures has to be defined and inserted in the sqlist dictionary.
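
The selection of procedures per station could look like the following sketch. The three check functions are empty stand-ins with hypothetical names, not the procedures of QualCheck.py, and the station codes are invented; only the role of the sqlist dictionary is taken from the text above.

```python
# Empty stand-ins for quality check procedures (hypothetical names).
def check_online_lag(station, params):
    return []


def check_time_quality(station, params):
    return []


def check_gaps(station, params):
    return []


# Default: all available procedures are executed.
ALL_PROCEDURES = [check_online_lag, check_time_quality, check_gaps]

# Per-station overrides; an empty array switches all checks off.
sqlist = {
    "STA1": [check_online_lag, check_gaps],  # skip the time quality check
    "STA2": [],                              # exclude all procedures
}


def procedures_for(station):
    """Return the array of procedures to execute for one station."""
    return sqlist.get(station, ALL_PROCEDURES)
```

A station without an entry in sqlist, e.g. a hypothetical STA3, gets the full default array.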

