Testing the log watcher

This is part II of the article. Part I is here.
All the above code has to be included in the entrypoint script so it gets executed at container start up time but it can also be tested more simply in a traditional repository installation.
First, we’ll move the code into a excutable script, e.g. entrypoint.sh, and run it in the background in a first terminal. Soon, we will notice that lots of log messages get displayed, e.g. from jobs executing into the repository:

---------------------------------------
Job Arguments:
(StandardJobArgs: docbase_name: dmtest.dmtest user_name: dmadmin job_id: 0800c35080007042 method_trace_level: 0 )
window_interval=120
queueperson=null
max_job_threads=3
 
---------------------------------------
2020-01-02T07:03:15.807617 1600[1600] 0100c3508000ab80 [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab80 is owned by user dmadmin now."
Executing query to collect XCP automatic tasks to be processed:
select r_object_id from dmi_workitem where a_wq_name = 'to be processed by job' and r_auto_method_id != '0000000000000000' and (r_runtime_state = 0 or r_runtime_state = 1) order by r_creation_date asc
Report End 2020/01/02 07:03:15
2020-01-02T07:03:16.314586 5888[5888] 0100c3508000ab90 [DM_SESSION_I_SESSION_QUIT]info: "Session 0100c3508000ab90 quit."
Thu Jan 02 07:04:44 2020 [INFORMATION] [LAUNCHER 6311] Detected during program initialization: Version: 16.4.0080.0129 Linux64
2020-01-02T07:04:44.736720 6341[6341] 0100c3508000ab91 [DM_SESSION_I_SESSION_START]info: "Session 0100c3508000ab91 started for user dmadmin."
Thu Jan 02 07:04:45 2020 [INFORMATION] [LAUNCHER 6311] Detected while preparing job dm_Initialize_WQ for execution: Agent Exec connected to server dmtest: [DM_SESSION_I_SESSION_START]info: "Session 0100c3508000ab91 started for user dmadmin."
 
 
2020-01-02T07:04:45.686337 1600[1600] 0100c3508000ab80 [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab80 is owned by user dmadmin now."
2020-01-02T07:04:45.698970 1597[1597] 0100c3508000ab7f [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab7f is owned by user dmadmin now."
Initialize_WQ Report For DocBase dmtest.dmtest As Of 2020/01/02 07:04:45
 
---------------------------------------
Job Arguments:
(StandardJobArgs: docbase_name: dmtest.dmtest user_name: dmadmin job_id: 0800c3508000218b method_trace_level: 0 )
window_interval=120
queueperson=null
max_job_threads=3
 
---------------------------------------
Starting WQInitialisation job:
2020-01-02T07:04:45.756339 1600[1600] 0100c3508000ab80 [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab80 is owned by user dmadmin now."
Executing query to collect Unassigned worktiems to be processed:
select r_object_id, r_act_def_id from dmi_workitem where a_held_by = 'dm_system' and r_runtime_state = 0 order by r_creation_date
Total no. of workqueue tasks initialized 0
Report End 2020/01/02 07:04:45
2020-01-02T07:04:46.222728 6341[6341] 0100c3508000ab91 [DM_SESSION_I_SESSION_QUIT]info: "Session 0100c3508000ab91 quit."
Thu Jan 02 07:05:14 2020 [INFORMATION] [LAUNCHER 6522] Detected during program initialization: Version: 16.4.0080.0129 Linux64
2020-01-02T07:05:14.828073 6552[6552] 0100c3508000ab92 [DM_SESSION_I_SESSION_START]info: "Session 0100c3508000ab92 started for user dmadmin."
Thu Jan 02 07:05:15 2020 [INFORMATION] [LAUNCHER 6522] Detected while preparing job dm_bpm_XCPAutoTaskMgmt for execution: Agent Exec connected to server dmtest: [DM_SESSION_I_SESSION_START]info: "Session 0100c3508000ab92 started for user dmadmin."
 
 
2020-01-02T07:05:15.714803 1600[1600] 0100c3508000ab80 [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab80 is owned by user dmadmin now."
2020-01-02T07:05:15.726601 1597[1597] 0100c3508000ab7f [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab7f is owned by user dmadmin now."
bpm_XCPAutoTaskMgmt Report For DocBase dmtest.dmtest As Of 2020/01/02 07:05:15
 
---------------------------------------
 

Then, from a second terminal, we’ll start and stop several idql sessions and observe the resulting output. We will notice the familiar lines *_START and *_QUIT from the session’s logs:

---------------------------------------
2020-01-02T07:09:16.076945 1600[1600] 0100c3508000ab80 [DM_SESSION_I_ASSUME_USER]info: "Session 0100c3508000ab80 is owned by user dmadmin now."
Executing query to collect XCP automatic tasks to be processed:
select r_object_id from dmi_workitem where a_wq_name = 'to be processed by job' and r_auto_method_id != '0000000000000000' and (r_runtime_state = 0 or r_runtime_state = 1) order by r_creation_date asc
Report End 2020/01/02 07:09:16
2020-01-02T07:09:16.584776 7907[7907] 0100c3508000ab97 [DM_SESSION_I_SESSION_QUIT]info: "Session 0100c3508000ab97 quit."
2020-01-02T07:09:44.969770 8080[8080] 0100c3508000ab98 [DM_SESSION_I_SESSION_START]info: "Session 0100c3508000ab98 started for user dmadmin."
2020-01-02T07:09:47.329406 8080[8080] 0100c3508000ab98 [DM_SESSION_I_SESSION_QUIT]info: "Session 0100c3508000ab98 quit."
...

So, inotifywatch is pretty effective as a file watcher.

Let’se see how many tail processes are currently running:

$ psg tail | grep -v gawk
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 4818 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/agentexec.log
1 4846 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab80
1 4850 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab7f
1 8375 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/job_0800c3508000218b
1 8389 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab99
1 8407 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/sysadmin/Initialize_WQDoc.txt
1 8599 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/job_0800c35080000386
1 8614 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab9a
1 8657 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab9b
1 8673 24411 9328 pts/1 8723 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/sysadmin/DataDictionaryPublisherDoc.txt

And after a while:

$ psg tail | grep -v gawk
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 4818 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/agentexec.log
1 4846 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab80
1 4850 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab7f
1 8599 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/job_0800c35080000386
1 8614 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab9a
1 8657 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab9b
1 8673 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/sysadmin/DataDictionaryPublisherDoc.txt
1 8824 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/job_0800c35080007042
1 8834 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab9c
1 8852 24411 9328 pts/1 9132 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/sysadmin/bpm_XCPAutoTaskMgmtDoc.txt

Again:

$ psg tail | grep -v gawk
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 10058 24411 9328 pts/1 10252 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/agentexec.log
1 10078 24411 9328 pts/1 10252 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/job_0800c3508000218b
1 10111 24411 9328 pts/1 10252 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab9f
1 10131 24411 9328 pts/1 10252 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab80
1 10135 24411 9328 pts/1 10252 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab7f
1 10139 24411 9328 pts/1 10252 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/sysadmin/Initialize_WQDoc.txt

And again:

$ psg tail | grep -v gawk
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 10892 24411 9328 pts/1 11022 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/agentexec.log
1 10896 24411 9328 pts/1 11022 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/job_0800c3508000218b
1 10907 24411 9328 pts/1 11022 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000aba1
1 10921 24411 9328 pts/1 11022 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab80
1 10925 24411 9328 pts/1 11022 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab7f
1 10930 24411 9328 pts/1 11022 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/sysadmin/Initialize_WQDoc.txt

And, after disabling, the aggregation of new logs:

$ echo 0 > $tail_on_off
$ psg tail | grep -v gawk
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 24676 24277 9328 pts/1 26096 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/agentexec/agentexec.log
1 24710 24277 9328 pts/1 26096 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab80
1 24714 24277 9328 pts/1 26096 S 1001 0:00 tail --follow=name --retry /app/dctm/dba/log/0000c350/dmadmin/0100c3508000ab7f

And eventually:

$ psg tail | grep -v gawk
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
<nothing>

From now on, until the aggregation is turned back on, this list will be empty as no new tail commands will be started.
The number of tail commands grows and shrinks as expected, so the process clean up is working.
The agentexec.log file is often here because jobs are started frequently and therefore this file regularly triggers the MODIFY event. dmadmin and sysadmin owned session logs come and go for each started job. The DataDictionaryPublisherDoc.txt, bpm_XCPAutoTaskMgmtDoc.txt and Initialize_WQDoc.txt are a few followed job’s logs.
In the used test system, an out of the box docbase without any running applications, the output is obviously very quiet with long pauses in between but it can become extremely dense in a busy system. When investigating problems in such systems, it can be useful to redirect the whole output to a text file and use one’s favorite editor to search it at leisure. It is also possible to restrict it to an as narrow as desired time window (if docker logs is used) in order to reduce its noise, exclude files from being watched (see the next paragraph) and even to stop aggregating new logs via the $tail_on_off file so only the currently ones are followed as long as they are active (i.e. written in).

Dynamically reconfiguring inotifywait via a parameter file

In the preceding examples, the inotifywait command has been passed a fixed, hard-coded subtree name to watch. In certain cases, it could be useful to be able to watch a different, random sub-directory or list of arbitrary files in unrelated sub-directories with a possible list of excluded files; for even more flexibility, the other inotifywait’s parameters could also be changed dynamically. In such cases, the running inotifywait command will have to be stopped and restarted to change its command-line parameters. One could imagine to check a communication file (like the $tail_on_off file above) inside the entrypoint’s main loop, as shown below. Here the diff is relative to the precedent static parameters version:

10a11,13
> # file that contains the watcher's parameters;
> export new_inotify_parameters_file=${watcher_workdir}/inotify_parameters_file
> 
15a19,20
>    inotify_params="$1"
>  
19c24
    eval $private_inotify ${inotify_params} $watcher_workdir | gawk -v tail_timeout=$((tail_timeout * 60)) -v excluded_files="$excluded_files" -v bMust_tail=1 -v tail_on_off=$tail_on_off -v heartbeat_file=$heartbeat_file -v env_private_tail=private_tail -v FS="|" -v Apo="'" 'BEGIN {
122a128,142
> process_new_inotify_parameters() {
>    # read in the new parameters from file $new_inotify_parameters_file;
>    # first line of the $new_inotify_parameters_file must contains the non-target parameters, e.g. -m -e create,modify,attrib -r --timefmt "%Y/%m/%d-%H:%M:%S" --format "%T %e %w%f";
>    new_params=$(cat $new_inotify_parameters_file)
>    rm $new_inotify_parameters_file
>  
>    # kill current watcher;
>    pkill -f $private_inotify
> 
>    # kill the current private tail commands too;
>    pkill -f $private_tail
> 
>    # restart inotify with new command-line parameters taken from $new_inotify_parameters_file;
>    follow_logs "$new_params" &
> }
138,139c158,162
< # start the watcher;
 # default inotifywait parameters;
> # the creation of this file will trigger a restart of the watcher using the new parameters;
> cat - < $new_inotify_parameters_file
>    --quiet --monitor --event create,modify,attrib --recursive --timefmt "%Y/%m/%d-%H:%M:%S" --format '%T|%e|%w%f' ${DOCUMENTUM}/dba/log --exclude $new_inotify_parameters_file
> eot
143a167
>    [[ -f $new_inotify_parameters_file ]] && process_new_inotify_parameters

To apply the patch, save the first listing into a file, e.g. inotify_static_parameters.sh, the above diff output into a file, e.g. static2dynamic.diff, and use the command below to create the script inotify_dynamic_parameters.sh:

patch inotify_static_parameters.sh static2dynamic.diff -o inotify_dynamic_parameters.sh

In order not to trigger events when it is created, the $new_inotify_parameters_file must be excluded from inotifywait’s watched directories.
If, for instance, we want later to exclude jobs’ logs in order to focus on more relevant logs, we could use the following inotifywait’s parameters:

# cat - <<-eot > $new_inotify_parameters_file
   --quiet --monitor --event create,modify,attrib --recursive --timefmt "%Y/%m/%d-%H:%M:%S" --format "%T|%e|%w%f" ${DOCUMENTUM}/dba/log --exclude '${DOCUMENTUM}/dba/log/dmtest/agentexec/job_.*' --exclude $new_inotify_parameters_file
eot

From the outside of a container:

docker exec <container> /bin/bash -c 'cat - < $new_inotify_parameters_file
   --quiet --monitor --event create,modify,attrib --recursive --timefmt "%Y/%m/%d-%H:%M:%S" --format "%T|%e|%w%f" ${DOCUMENTUM}/dba/log --exclude "${DOCUMENTUM}/dba/log/dmtest/agentexec/job_.*" --exclude $new_inotify_parameters_file
eot
'

where the new option −−exclude specifies a POSIX-compliant regular expression to be quoted so the shell does not perform its variable expansions.
After at most $pause_duration seconds, this file is detected (line 34), read (line 14), deleted (line 15) and the current watcher gets restarted in the background with the new parameters (line 24).
This approach is akin to polling the parameter file and it looks a bit unsophisticated. We use events to control the tailing through the $tail_on_off file and for the heartbeat. Why not for the parameter file ? The next paragraph will show just that.

Dynamically reconfiguring inotifywait via an event

A better way would be to use the watcher itself to check if the parameter file has changed ! After all, we shall put our money where our mouth is, right ?
In order to do this, and dedicated watcher is set up to watch the parameter file. For technical reasons (in order to watch a file, that file must already exist. Also, when the watched file is erased, inotifywait does not react to any events on that file any longer; it just sits there idly), instead of watching $new_inotify_parameters_file, it watches its parent directory. To avoid scattering files in too many locations, the parameter file will be stored along with the technical files in ${watcher_workdir} and, in order to avoid impacting the performance, it will be excluded from the main watcher (parameter –exclude $new_inotify_parameters_file, do not forget to append it otherwise the main watcher will raise events for nothing; it should not impede the parameter file to be processed though).
When a MODIFY event on this file occurs, which includes a CREATE event if the file did not pre-exist, the event is processed as shown precedently.
Here the diff compared to the preceding polled parameter file version:

143a144,151
> # the parameter file's watcher;
> param_watcher() {
>    IFS="|"
>    inotifywait --quiet --monitor --event create,modify,attrib --timefmt "%Y/%m/%d-%H:%M:%S" --format '%T|%e|%w%f' $watcher_workdir --exclude $tail_on_off $heartbeat_file | while read timestamp event target_file; do
>       [[ $target_file == $new_inotify_parameters_file ]] && process_new_inotify_parameters
>    done
> }
> 
162a171,173
> 
> # starts the parameter file's watcher;
> param_watcher &

The delta is quite short with just the function param_watcher() to start the watcher on the parameter file and process its signal by invoking the same function process_new_inotify_parameters() introduced in the polling version.
To apply the patch, save the above diff output into a file, e.g. dynamic2event.diff, and use the command below to create the script inotify_dynamic_parameters_event.sh from the polling version’s script inotify_dynamic_parameters.sh created above:

patch inotify_dynamic_parameters.sh dynamic2event.diff -o inotify_dynamic_parameters_event.sh

The dedicated watcher runs in the background and restarts the main watcher whenever the latter receives new parameters. Quite simple.

Dynamically reconfiguring inotifywait via signals

Yet another way to interact with inotifywait’s script (e.g. the container’s entrypoint) could be through signals. To this effect, we’ll need first to choose suitable signals, say SIGUSR[12], define a trap handler and implement it. And while we are at it, let’s also add switching the watcher on and off, as illustrated below:

2a3,5
> trap 'bounce_watcher' SIGUSR1
> trap 'toggle_watcher' SIGUSR2
> 
7a11,14
> # new global variable to hold the current tail on/off status;
> # used to toggle the watcher;
> export bMust_tail=1
> 
144,149c151,162
< # the parameter file's watcher;
< param_watcher() {
<    IFS="|"
<    inotifywait --quiet --monitor --event create,modify,attrib --timefmt "%Y/%m/%d-%H:%M:%S" --format '%T|%e|%w%f' $watcher_workdir --exclude $tail_on_off $heartbeat_file | while read timestamp event target_file; do
<       [[ $target_file == $new_inotify_parameters_file ]] && process_new_inotify_parameters
 # restart the watcher with new parameters;
> bounce_watcher() {
>    [[ -f $new_inotify_parameters_file ]] && process_new_inotify_parameters
>    echo "parameters reloaded"
> }
> 
> # flip the watcher on/off;
> # as file $tail_on_off is watched, modifying it will trigger the processing of the boolean bMust_tail in the gawk script;
> toggle_watcher() {
>    (( bMust_tail = (bMust_tail + 1) % 2 ))
>    echo $bMust_tail > $tail_on_off
>    echo "toggled the watcher, it is $bMust_tail now"
172,173c185,187
< # starts the parameter file's watcher;
 process_new_inotify_parameters
> 
> echo "process $$ started"
178d191
<    [[ -f $new_inotify_parameters_file ]] && process_new_inotify_parameters

To apply the patch, save the above diff output into a file, e.g. event2signals.diff, and use the command below to create the script inotify_dynamic_parameters_signals.sh from the event version’s script inotify_dynamic_parameters_event.sh created above:

patch inotify_dynamic_parameters_events.sh event2signals.diff -o inotify_dynamic_parameters_signals.sh

The loop is simpler now as the functions are directly invoked by outside signals.
To use it, just send the SIGUSR[12] signals to the container, as shown below:

# write a new configuration file:
...
# and restart the watcher:
$ docker kill --signal SIGUSR1 <container>
 
# toggle the watcher;
$ docker kill --signal SIGUSR2 <container>

If testing outside a container, the commands would be:

/bin/kill --signal SIGUSR1 pid
# or:
/bin/kill --signal SIGUSR2 pid

where pid is the pid displayed when the script starts.
We won’t indulge more on signals, see How to stop Documentum processes in a docker container, and more (part I) for many more examples of using signals to talk to containers.

Conclusion

inotifywait is simple to configure and extremely fast, pratically instantaneous, at sending notifications. Although the aggregated output looks a bit confusing when the system is loaded, there are several ways to reduce its volume to make it easier to use. It is an interesting addition to an administrator to help troubleshooting repositories and their client applications. With suitable parameters, it can be used to support any containerized or traditional installations. It is one of those no frills tools that just work (mostly) as expected. If you ever happen to need a file watcher, consider it, you won’t be disappointed and it is fun to use.