Infrastructure at your Service

David Barbarin

Upgrading SQL Server pods on K8s and helm charts

It has been while since my last blog. Today it is about continuing with helm charts and how to upgrade / downgrade SQL Server containers to a specific cumulative update. My first write-up in my to-do list.

blog 149 - 0 - banner

Last year, I wrote an introduction of SQL Server containers on K8s. I remembered to face some issues when testing upgrade scenarios (probably a lack of knowledge). Since then, I have discovered helm charts and I use them intensively with my environments and they also provide upgrade / rollback capabilities.

So, the question is how to upgrade an existing SQL Server container to a new cumulative update with a helm chart?

First of all, during deployment you need to specify a strategy type. There are several strategy types and most of them address upgrade scenarios with stateless applications (ramped, blue/green, canary and a/b testing). Unfortunately, with stateful applications like SGBDRs the story is not the same because persistent storage cannot be accessed by several at time. In this case K8s must first stop and remove the current pod and then spin up a new pod with the new version. “recreate” strategy type is designed to carry out this task and to address SQL Server pod upgrade scenarios.

My deployment file is as follow:

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: {{ template "mssql.fullname" . }}
  labels:
    app: {{ template "mssql.name" . }}
    chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
{{- if .Values.deployment.annotations }}
  annotations:
{{ toYaml .Values.deployment.annotations | indent 4 }}
{{- end }}
spec:
  replicas: {{ .Values.replicaCount }}
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: {{ template "mssql.name" . }}
      release: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: {{ template "mssql.name" . }}
        release: {{ .Release.Name }}
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          env:
            - name: ACCEPT_EULA
              value: "{{ .Values.acceptEula.value | upper }}"
            - name: MSSQL_PID
              value: "{{ .Values.edition.value }}"
            - name: MSSQL_SA_PASSWORD
              valueFrom:
               secretKeyRef:
                 name: {{ template "mssql.fullname" . }}-sa-secret
                 key: sapassword
            - name: MSSQL_TCP_PORT
              value: "{{ .Values.service.port.value }}"
            - name: MSSQL_LCID
              value: "{{ .Values.lcid.value }}"
            - name: MSSQL_COLLATION
              value: "{{ .Values.collation.value }}"
            - name: MSSQL_ENABLE_HADR
              value: "{{ .Values.hadr.value }}"
            {{ if .Values.resources.limits.memory }}
            - name: MSSQL_MEMORY_LIMIT_MB
              valueFrom:
                resourceFieldRef:
                  resource: limits.memory
                  divisor: 1Mi
            {{ end }}
          ports:
            - name: mssql
              containerPort: {{ .Values.service.port.value }}
          volumeMounts:
            - name: data
              mountPath: /var/opt/mssql/data
          livenessProbe:
            tcpSocket:
               port: mssql
            initialDelaySeconds: {{ .Values.livenessprobe.initialDelaySeconds }}
            periodSeconds: {{ .Values.livenessprobe.periodSeconds }}
          readinessProbe:
            tcpSocket:
               port: mssql
            initialDelaySeconds: {{ .Values.readinessprobe.initialDelaySeconds }}
            periodSeconds: {{ .Values.readinessprobe.periodSeconds }}
          resources:
{{ toYaml .Values.resources | indent 12 }}
    {{- if .Values.nodeSelector }}
      nodeSelector:
{{ toYaml .Values.nodeSelector | indent 8 }}
    {{- end }}
      volumes:
      - name: data
      {{- if .Values.persistence.enabled }}
        persistentVolumeClaim:
          {{- if .Values.persistence.existingDataClaim }}
          claimName: {{ .Values.persistence.existingDataClaim }}
          {{- else }}
          claimName: {{ template "mssql.fullname" . }}-data
          {{- end -}}
      {{- else }}
        emptyDir: {}
      {{- end }}

 

My default values (in values.yaml) are the following:

# General parameters
acceptEula: 
  value: "Y"
edition: 
  value: "Developer"
collation: 
  value: SQL_Latin1_General_CP1_CI_AS
lcid: 
  value: 1033
hadr: 
    value: 0
# User parameters
sapassword: 
  value: Password1
# Image parameters
image:
  repository: mcr.microsoft.com/mssql/server
  tag: 2017-CU12-ubuntu
  pullPolicy: IfNotPresent
# Service parameters
service:
  type: 
    value: LoadBalancer
  port: 
    value: 1460
  annotations: {}
deployment:
  annotations: {}
# Volumes & persistence parameters
persistence:
  enabled: true
  storageClass: ""
  dataAccessMode: ReadWriteOnce
  dataSize: 5Gi
# Probe parameters
livenessprobe:
  initialDelaySeconds: 20
  periodSeconds: 15
readinessprobe:
  initialDelaySeconds: 20
  periodSeconds: 15
# Resourcep parameters
resources:
  limits:
  #  cpu: 100m
    memory: 3Gi
  # requests:
  #  cpu: 100m
  #  memory: 2Gi
nodeSelector: {}

You may notice I will pull a SQL Server image from the MCR with the 2017-CU12-ubuntu tag.

Let’s now install SQL2017container release:

$ helm install --name sql2017container .

 

This command will install a helm release which includes among others a deployment, a replicaset with one pod (my SQL Server pod), a secret that contains the sa password, a persistence volume claim to persistent my database files (mapped to the /var/opt/mssql/data path inside the pod) and the service to expose the pod on port 1460 TCP.

$ helm status sql2017container
LAST DEPLOYED: Tue Mar 12 20:36:12 2019
NAMESPACE: ci
STATUS: DEPLOYED

RESOURCES:
==> v1/Secret
NAME                                        TYPE    DATA  AGE
sql2017container-dbi-mssql-linux-sa-secret  Opaque  1     7m7s

==> v1/PersistentVolumeClaim
NAME                                   STATUS  VOLUME                                    CAPACITY  ACCESS MODES  STORAGECLASS  AGE
sql2017container-dbi-mssql-linux-data  Bound   pvc-18304483-44fe-11e9-a668-ca78ebdc2a19  5Gi       RWO           default       7m7s

==> v1/Service
NAME                              TYPE          CLUSTER-IP    EXTERNAL-IP     PORT(S)         AGE
sql2017container-dbi-mssql-linux  LoadBalancer  10.0.104.244  xx.xx.xx.xx  1460:31502/TCP  7m6s

==> v1beta2/Deployment
NAME                              DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
sql2017container-dbi-mssql-linux  1        1        1           1          7m6s

==> v1/Pod(related)
NAME                                               READY  STATUS   RESTARTS  AGE
sql2017container-dbi-mssql-linux-76b4f7c8f5-mmhqt  1/1    Running  0         7m6s

 

My SQL Server pod is running with the expected version and CU:

master> select @@version AS [version];
+-----------+
| version   |
|-----------|
| Microsoft SQL Server 2017 (RTM-CU12) (KB4464082) - 14.0.3045.24 (X64)
        Oct 18 2018 23:11:05
        Copyright (C) 2017 Microsoft Corporation
        Developer Edition (64-bit) on Linux (Ubuntu 16.04.5 LTS)           |
+-----------+
(1 row affected)
Time: 0.354s

 

It’s time now to upgrade my pod with the latest CU13 (at the moment of this write-up). With helm charts this task is pretty simple. I will just upgrade my release with the new desired tag as follows:

$ helm upgrade sql2017container . --set=image.tag=2017-CU13-ubuntu
Release "sql2017container" has been upgraded. Happy Helming!

 

Let’s dig further into deployment stuff:

$ kubectl describe deployment sql2017container-dbi-mssql-linux

 

The interesting part is below:

Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  18m   deployment-controller  Scaled up replica set sql2017container-dbi-mssql-linux-76b4f7c8f5 to 1
  Normal  ScalingReplicaSet  1m    deployment-controller  Scaled down replica set sql2017container-dbi-mssql-linux-76b4f7c8f5 to 0
  Normal  ScalingReplicaSet  1m    deployment-controller  Scaled up replica set sql2017container-dbi-mssql-linux-799ff7979b to 1

 

Referring to the deployment strategy, the deployment controller has recreated a new ReplicaSet (and a new SQL Server pod) accordingly. A quick check from client tool confirms the instance has been upgraded correctly:

master> select @@version AS [version];
+-----------+
| version   |
|-----------|
| Microsoft SQL Server 2017 (RTM-CU13) (KB4466404) - 14.0.3048.4 (X64)
        Nov 30 2018 12:57:58
        Copyright (C) 2017 Microsoft Corporation
        Developer Edition (64-bit) on Linux (Ubuntu 16.04.5 LTS)           |
+-----------+
(1 row affected)
Time: 0.716s

 

Another interesting part is how SQL Server detects the new image and starts upgrading process. Let’s dump the SQL Server log pod. I just put a sample of messages from the pod log to get a picture of scripts used during the upgrade.

$ kubectl logs sql2017container-dbi-mssql-linux-799ff7979b-knqrm
2019-03-12 19:54:59.11 spid22s     Service Broker manager has started.
2019-03-12 19:54:59.44 spid6s      Database 'master' is upgrading script 'ProvisionAgentIdentity.sql' from level 234884069 to level 234884072.
2019-03-12 19:54:59.45 spid6s      Database 'master' is upgrading script 'no_op.sql' from level 234884069 to level 234884072.
2019-03-12 19:54:59.70 spid6s      Database 'master' is upgrading script 'no_op.sql' from level 234884069 to level 234884072.
….
2019-03-12 19:54:59.70 spid6s      -----------------------------------------
2019-03-12 19:54:59.70 spid6s      Starting execution of dummy.sql
2019-03-12 19:54:59.70 spid6s      -----------------------------------------
…
2019-03-12 19:55:00.24 spid6s      Starting execution of PRE_MSDB.SQL
2019-03-12 19:55:00.24 spid6s      ----------------------------------
2019-03-12 19:55:00.70 spid6s      Setting database option COMPATIBILITY_LEVEL to 100 for database 'msdb'.
2019-03-12 19:55:00.90 spid6s      -----------------------------------------
2019-03-12 19:55:00.90 spid6s      Starting execution of PRE_SQLAGENT100.SQL
2019-03-12 19:55:00.90 spid6s      -----------------------------------------
…
2019-03-12 19:55:12.09 spid6s      ----------------------------------
2019-03-12 19:55:12.09 spid6s      Starting execution of MSDB.SQL
2019-03-12 19:55:12.09 spid6s      ----------------------------------
…
2019-03-12 19:55:12.86 spid6s      -----------------------------------------
2019-03-12 19:55:12.86 spid6s      Starting execution of MSDB_VERSIONING.SQL
2019-03-12 19:55:12.86 spid6s      -----------------------------------------
…
2019-03-12 19:55:51.68 spid6s      -----------------------------------------
2019-03-12 19:55:51.68 spid6s      Starting execution of EXTENSIBILITY.SQL
2019-03-12 19:55:51.68 spid6s      -----------------------------------------
…
2019-03-12 19:56:01.51 spid6s      --------------------------------
2019-03-12 19:56:01.51 spid6s      Starting execution of Alwayson.SQL
2019-03-12 19:56:01.51 spid6s      --------------------------------
…
2019-03-12 19:56:29.17 spid6s      ------------------------------------
2019-03-12 19:56:29.17 spid6s      Moving 2005 SSIS Data to 2008 tables
2019-03-12 19:56:29.17 spid6s      ------------------------------------
…
2019-03-12 19:56:32.52 spid6s      ------------------------------------------------------
2019-03-12 19:56:32.52 spid6s      Starting execution of UPGRADE_UCP_CMDW_DISCOVERY.SQL
2019-03-12 19:56:32.52 spid6s      ------------------------------------------------------
…
2019-03-12 19:56:32.66 spid6s      ------------------------------------------------------
2019-03-12 19:56:32.66 spid6s      Starting execution of SSIS_DISCOVERY.SQL
2019-03-12 19:56:32.66 spid6s      ------------------------------------------------------
…
2019-03-12 19:56:32.83 spid6s      ------------------------------------------------------
2019-03-12 19:56:32.83 spid6s      Start provisioning of CEIPService Login
2019-03-12 19:56:32.83 spid6s      ------------------------------------------------------
…

 

A set of scripts developed by the SQL Server team runs during the SQL Server pod startup and updates different parts of the SQL Server instance.

Helm provides a command to view release history …

$ helm history sql2017container
REVISION        UPDATED                         STATUS          CHART                   DESCRIPTION
1               Tue Mar 12 20:36:12 2019        SUPERSEDED      dbi-mssql-linux-1.0.0   Install complete
2               Tue Mar 12 20:53:26 2019        DEPLOYED        dbi-mssql-linux-1.0.0   Upgrade complete

 

… and to rollback to previous release revision if anything goes wrong:

$ helm rollback sql2017container 1

 

The same process applies here. The deployment controller will recreate a ReplicaSet and a downgraded SQL Server pod to the previous version.

$ kubectl describe deployment sql2017container-dbi-mssql-linux
Events:
  Type    Reason             Age               From                   Message
  ----    ------             ----              ----                   -------
  Normal  ScalingReplicaSet  31m               deployment-controller  Scaled down replica set sql2017container-dbi-mssql-linux-76b4f7c8f5 to 0
  Normal  ScalingReplicaSet  31m               deployment-controller  Scaled up replica set sql2017container-dbi-mssql-linux-799ff7979b to 1
  Normal  ScalingReplicaSet  6m                deployment-controller  Scaled down replica set sql2017container-dbi-mssql-linux-799ff7979b to 0
  Normal  ScalingReplicaSet  6m (x2 over 49m)  deployment-controller  Scaled up replica set sql2017container-dbi-mssql-linux-76b4f7c8f5 to 1

 

Same set of TSQL scripts seem to be executed again during the SQL Server pod startup for downgrade purpose this time.

The release rollback is logged in the release history:

$ helm history sql2017container
REVISION        UPDATED                         STATUS          CHART                   DESCRIPTION
1               Tue Mar 12 20:36:12 2019        SUPERSEDED      dbi-mssql-linux-1.0.0   Install complete
2               Tue Mar 12 20:53:26 2019        SUPERSEDED      dbi-mssql-linux-1.0.0   Upgrade complete
3               Tue Mar 12 21:18:57 2019        DEPLOYED        dbi-mssql-linux-1.0.0   Rollback to 1

 

Rollback capabilities of helm charts (and implicitly of K8s) may be attractive but for database applications it will likely not fit with all upgrade scenarios. To be used sparingly … What’s next? Taking a look at the upgrade scenarios with availability groups on K8s for sure … see you on a next write-up!

 

 

David Barbarin
David Barbarin

Principal Consultant & Microsoft Technology Leader