Infrastructure at your Service

David Barbarin

Debugging SQL Server containers considerations

When it comes to troubleshooting processes or to get a deeper look of how SQL Server works internally , I always used debugging tools as windbg on Windows and since SQL Server is available on Linux, I switched to strace tool.

But let’s add containers in the game now. I didn’t want to touch any based image for applications including SQL Server with debugging tools added. So, my concern was to figure out how to debug a running Docker container from a separate container. Creating a custom image for only debugging tools including strace is not a hard task and after some quick google searches, it is easy to find blogs that explain the process.

Here is my Docker file:

FROM alpine

ENV CAPTUREMODE "S"
RUN apk update && apk add strace
CMD ["sh", "-c", "
strace -t -f -p $(if [ $CAPTUREMODE == ‘Start’ ]; then echo 1; else pgrep -P 1; fi)”]

 

I picked up the alpine based image for is low footprint and my custom image is designed to trace any running container. Note that I’m using the -p parameter to attach to the concerning process and begin tracing. In the Docker file the PID environment variable is intended to target the process ID of the remote container process I want to look at. Obviously, this image may be improved by adding other parameters to implement specific filter rules for example, but this is not the focus of this write-up.

As an aside, maybe you are wondering why I have implemented such variable because usually we trace the PID 1 within a container? (the magic of Docker entry point and Linux namespace)

(Update 06.09.2019 : Anthony E. Nocentino brought me to my attention a more elegant way to get dynamically the child processes of the PID 1. In fact it works well with a running container but not a starting container  because pgrep will return child processes of the sleep command. I modified my Dockerfile above accordingly to provide a way to choose which mode you want to enter)

Well, referring to the Bob Dorr’s blog post and applying it to the container world, good chances are the PID 1 doesn’t belong to the SQL Server process itself but it is a lightweight monitor process for the SQL Server process (WATCHDOG).

If I try to trace the PID 1 of a running SQL Server container, I get the following result with nothing relevant here …

strace: Process 1 attached with 2 threads
[pid     1] 08:11:59 wait4(9,  <unfinished ...>
[pid     8] 08:11:59 ppoll([{fd=15, events=POLLIN}], 1, NULL, NULL, 8

 

… hence the idea to implement the PID variable to provide the ability to pick up the right process to trace (by default the PID 1). Here an example of a Docker command to trace the sqlservr process inside my container:

$ docker ps | grep sql
e3aca69d5dff        mcr.microsoft.com/mssql/server:2019-latest   "/opt/mssql/bin/perm…"   About an hour ago   Up About an hour    0.0.0.0:1401->1433/tcp   sql1

$ docker exec -ti sql1 top
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                 
    9 root      20   0 12.337g 770160  58540 S   1.7 12.7   1:11.78 sqlservr
    1 root      20   0  148520  23036  13376 S   0.0  0.4   0:00.62 sqlservr
  325 root      20   0    1888    920    728 S   0.0  0.0   0:00.04 strace
  376 root      20   0   38728   3184   2748 R   0.0  0.1   0:00.04 top

$ docker run --name debug -t --pid=container:sql1 \
 --net=container:sql1 \
 --cap-add sys_admin \
 --cap-add sys_ptrace \
 strace | tee trace.txt

 

The PID 9 corresponds to the SQL Server process (not the WATCHDOG). Then I started my debug container which begun to trace a bunch of activities. The last docker command uses special parameters including –pid, –net and –cap-add to run the debug container in the same pid and network namespace. Pretty cool!

Here a trimmed sample of the trace.txt file during the creation of the test database:

strace: Process 9 attached with 161 threads
[pid   373] 08:21:49 restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid   343] 08:21:49 futex(0x7f9835de2cc4, FUTEX_WAIT_PRIVATE, 5, NULL <unfinished ...>
…
[pid   162] 08:26:05 open("/var/opt/mssql/data/test.mdf", O_RDWR|O_CREAT|O_EXCL|O_DIRECT, 0660 <unfinished ...>
[pid    14] 08:26:05 clock_gettime(CLOCK_REALTIME, {tv_sec=1567671965, tv_nsec=625743500}) = 0
…
[pid   162] 08:26:05 stat("/var/opt/mssql/data/test.mdf", {st_mode=S_IFREG|0640, st_size=0, ...}) = 0
[pid    14] 08:26:05 clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
…

 

So, for a running container everything seems to work like a charm but what if I want to catch events for a starting container? . I get this opportunity to thank Anthony E. Nocentino (@nocentino) to make me indirectly want to write this blog post 🙂 Anthony is  a Microsoft Data Platform MVP who is doing incredible stuff around SQL Server containers with blogs and trainings. I would simply say just follow it :). At the first step, I got the same idea as him and he already explained very well the context of this tracing problem when SQL Server is starting and the way he found to fix it. Everything is in his blog post so there is no need to duplicate his work. The interesting point here is Anthony’s final approach consists in creating a custom container with SQL Server and strace installed. That’s exactly at the opposite of what I want to achieve but it seemed to be the better alternative so far. To be honest, it remains a good and simple approach in any cases.

But I kept motivated (thank again Anthony for that!), to find another solution I decided to continue exploring other alternatives based on my first requirements (a separate container) and I believe I found an interesting one. Just to refresh your mind, the problem here is to ensure catching all events since the startup of the SQL Server process but However running 2 containers in parallel – the SQL Server container and the debug container including strace tool – seems to be not a viable option and this is exactly what Anthony explained in the first attempt of tests section of his blog.

If you start the debug container too early before the SQL Server container process has started the sqlservr process …

docker run --rm --name debug -t --pid=container:sql1 \
 --net=container:sql1 \
 --cap-add sys_admin \
 --cap-add sys_ptrace \
 strace > trace.txt & \ docker run --name sql1 \
 --env 'ACCEPT_EULA=Y' \
 --env 'MSSQL_SA_PASSWORD=Password1' \
 --volume /t/Docker/DMK/BACKUP/test1:/var/opt/mssql/ \
 --publish 1401:1433 \
 --detach \
 mcr.microsoft.com/mssql/server:2019-latest

 

… chances are you will face an error as show below:

docker: Error response from daemon: No such container: sql1.

 

At the opposite, if you start the debug container too late (note the introduction of the sleep command)  …

$ docker run --name debug -t --pid=container:sql1 \
 --net=container:sql1 \
 --cap-add sys_admin \
 --cap-add sys_ptrace \
 strace > trace.txt & \
 sleep .001 && docker restart sql1

[1] 1171
sql1

[1]  + 1171 exit 137   docker run --name debug -t --pid=container:sql1 --net=container:sql1 --cap-ad

 

… chances are you will miss some events. Here the beginning of the trace output:

trace: Process 1 attached
13:52:53 pread64(13, "\33\n\6-\16\2rx\6\2p\3(\237\24\0\6\24*~L\16\0\n\7\6oQ\16\0\n\6"..., 131072, 31195148) = 131072
13:52:53 pread64(13, "\1\0\245\20\0\0\0\0\0\0\306\r(i\2\0\340\1\0\0\246\20\0\0\0\0\0\0\306\rhi"..., 131072, 31326220) = 131072
13:52:53 pread64(13, "\301\35\2\0b.\0\0\344\10\362\31\2\0i.\0\0\364\t\260\225\1\0\205\4\0\0l\nIQ"..., 131072, 31457292) = 131072

 

Referring to the above strace output, it is clear we didn’t catch all the events from the startup of the sqlservr process with the following missing entry for example:

execve("/opt/mssql/bin/sqlservr", ["/opt/mssql/bin/sqlservr"]

 

In a nutshell, this is not a viable way of achieving what we want. My approach, based on keeping running separate containers, consists in overriding the default entry point in order to delay directly the call of the sqlservr binary during the container runtime.

We may easily identify the Docker entry point by running the docker history command against the concerned SQL Server docker image as follow:

$ docker history c5a295efea97 --no-trunc | grep ENTRYPOINT
<missing>                                                                 2 weeks ago         /bin/sh -c #(nop)  ENTRYPOINT ["/opt/mssql/bin/permissions_check.sh"]

 

When the SQL Server container is spinning up the /opt/mssql/bin/permissions_check.sh is called. Without going into details here, this bash script executes at the final stage the CMD from the Dockerfile which the following is:

$ docker history c5a295efea97 --no-trunc | grep CMD
sha256:c5a295efea970743a71a330a1d314458272811e3524e1fc1245e6d6fe57aab90   2 weeks ago         /bin/sh -c #(nop)  CMD ["/opt/mssql/bin/sqlservr"]

 

It only remains to override the entry point to make a custom call of the sqlservr process during the container runtime with a delay of 4s for example. Here the command I used to start both the SQL Server container and the debug container in parallel and how I managed to trace all the desired events of the sqlservr process.

docker run --name sql1 \
 --env 'ACCEPT_EULA=Y' \
 --env 'MSSQL_SA_PASSWORD=Password1' \
 --volume /t/Docker/DMK/BACKUP/test1:/var/opt/mssql/ \
 --publish 1401:1433 \
 --detach \
 --entrypoint "/bin/sh" \
 mcr.microsoft.com/mssql/server:2019-latest -c 'sleep 4 && /opt/mssql/bin/sqlservr' && \
docker run --rm --name debug -t --pid=container:sql1 \
 --net=container:sql1 \
 --cap-add sys_admin \
 --cap-add sys_ptrace \
 strace > trace.txt

 

I get the following result (pretty similar to Anthony’s result in his blog post)

strace: Process 1 attached
09:12:12 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 6
09:12:15 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=6, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
09:12:15 rt_sigreturn({mask=[]})        = 6
09:12:15 clone(strace: Process 13 attached
 <unfinished ...>
[pid    13] 09:12:15 execve("/opt/mssql/bin/sqlservr", ["/opt/mssql/bin/sqlservr"], 0x5577ad6ccbc8 /* 6 vars */ <unfinished ...>
[pid     1] 09:12:15 <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f941b2499d0) = 13
[pid     1] 09:12:15 wait4(-1,  <unfinished ...>
[pid    13] 09:12:15 <... execve resumed> ) = 0
[pid    13] 09:12:15 brk(NULL)          = 0x558af526b000
[pid    13] 09:12:15 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
[pid    13] 09:12:15 readlink("/proc/self/exe", "/opt/mssql/bin/sqlservr", 4096) = 23
[pid    13] 09:12:15 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
[pid    13] 09:12:15 open("/opt/mssql/bin/tls/x86_64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
…

 

Here we go ! Feel free to comment!

Happy container debugging!

 

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

David Barbarin
David Barbarin

Principal Consultant & Microsoft Technology Leader