Infrastructure at your Service

David Barbarin

SQL Server on Linux: Introduction to DBFS experimental tool

A couple of months ago, Microsoft announced two additional command line tools for SQL Server that are mssql-scripter and DBFS. The latter has drawn my attention because it exposes live data from SQL Server DMVs as virtual files in a virtual directory on Linux operating system. Microsoft has probably taken another positive step in the SQL Server’s adoption on Linux. Indeed, in a Linux world, we may get Kernel’s performance and configuration data either directly from the procfs or indirectly by using tools that involve procfs in the background.

blog 123 - sqlserver dbfs

DBFS uses the FUSE filesystem module to expose DMVs and according to Microsoft blog, reading data from each concerned DMV file is a live process that ensures to get always fresh data.

[mikedavem@sql server]$ rpm -qR $(rpm -qa | grep dbfs)
glibc
fuse
fuse-devel
freetds
rpmlib(FileDigests) <= 4.6.0-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(PayloadIsXz) <= 5.2-1

After installing DBFS, let’s start the tool by using the following command line:

[mikedavem@sql mssql]$ dbfs -m /var/opt/mssql/dmv -c /var/opt/mssql/secrets/dmvtool.conf -v      
1: Processing entry for section server in configuration file:
SUCCESSFULLY added entry for server server.

The configuration file (-c parameter) stores sensitive information to connect to the SQL Server instance (hostname, user and “insecure” password). You shall consider to protect this file with suitable permissions.

DBFS is a background process by default but you may change the behavior by using -f parameter at the startup.

[mikedavem@sql mssql]$ ps -e | grep dbfs
  2673 ?        00:00:00 dbfs

At this stage, the negative point is that to start DBFS manually if the server restarts. I had thought to wrap DBFS in a cron job that will run at the server startup but my preference would be to get the possibility to control DBFS through system as a service.

Exploring the DMV mounted filesystem

In fact, every DMV is exposed in two formats (normal file and JSON file).

[mikedavem@sql server]$ pwd
/var/opt/mssql/dmv/server
[mikedavem@sql server]$ ls -I "*.json" | wc -l
494

DBFS exposes a lot of DMVs as we notice above. We may also want to filter regarding the category plan described in the BOL by using grep.

For instance, AlwaysOn related DMVs …

[mikedavem@sql server]$ ls -I "*.json" | grep dm_hadr
dm_hadr_automatic_seeding
dm_hadr_auto_page_repair
dm_hadr_availability_group_states
dm_hadr_availability_replica_cluster_nodes
dm_hadr_availability_replica_cluster_states
dm_hadr_availability_replica_states
dm_hadr_cluster
dm_hadr_cluster_members
dm_hadr_cluster_networks
dm_hadr_database_replica_cluster_states
dm_hadr_database_replica_states
dm_hadr_instance_node_map
dm_hadr_name_id_map
dm_hadr_physical_seeding_stats
…

… or database related DMVs

[mikedavem@sql server]$ ls -I "*.json" | grep dm_db
dm_db_column_store_row_group_operational_stats
dm_db_column_store_row_group_physical_stats
dm_db_file_space_usage
dm_db_fts_index_physical_stats
dm_db_index_usage_stats
dm_db_log_space_usage
dm_db_mirroring_auto_page_repair
dm_db_mirroring_connections
dm_db_mirroring_past_actions
dm_db_missing_index_details
dm_db_missing_index_groups
dm_db_missing_index_group_stats
dm_db_partition_stats
dm_db_persisted_sku_features
dm_db_rda_migration_status
dm_db_rda_schema_update_status
dm_db_script_level
dm_db_session_space_usage
dm_db_task_space_usage 
…

The schema name suffix (sys) is not present for DMV related files.

Extracting data from DMV related files

Linux provides very powerful tools to consume data from files like tr, cut, split, sort, join, cat, grep and awk to cite few of them. You may find some examples in Github but I decided to get my own experience by attempting to address some usual DBA queries.

  • Extracting data from sys.dm_os_sys_info to retrieve CPU, memory available information on the server as well as the memory manager consumption

We may use either cut command to extract required columns from the dm_os_sys_info file.

[mikedavem@sql server]$ cut -d$'\t' -f3,4,5,7,8,9,15,27 dm_os_sys_info
cpu_count       hyperthread_ratio       physical_memory_kb      committed_kb    committed_target_kb      visible_target_kb       scheduler_count virtual_machine_type_desc
4       4       3918848 207160  3914240 3914240 4       HYPERVISOR

But the above output is not very readable and we may want to display the information differently (in column rather than in row). Using awk may be your best solution in this case:

[mikedavem@sql server]$ for ((i=1;i<$(cut -d$'\t' -f3,4,5,7,8,9,15,27 dm_os_sys_info | head -n 1 | wc -w);i++)); do awk '{print $'$i'}' <(cut -d$'\t' -f3,4,5,7,8,9,15,27 dm_os_sys_info) | tr '\n' ' ';echo;done | column -t
cpu_count			4
hyperthread_ratio	4
physical_memory_kb	3918848
committed_kb		207296
committed_target_kb	3914240
visible_target_kb	3914240
scheduler_count		4

 

  • Extracting information from sys.databases

A typical query I may see is to retrieve database information including their name, id, state, recovery model and owner. Basically , we have to join two DMVs to get all the requested information: sys.databases and sys.server_principals. In Linux world, you may also use the join command to gather information from different files but it implies to pre-sort each data file first.

But getting the desired output may be challenging. Indeed, firstly we will probably choose the grep / cut tools to extract only rows and columns we want. But using such tools will prevent to keep the column context and getting only column values from the DMV related files may be meaningless in this case as shown below :

[mikedavem@sql server]$ join -1 2 -2 3 -t$'\t' -o 2.1,2.2,2.3,2.4,2.5,1.1 \
<(grep -Ev '^[NT|##]|SERVER_ROLE' server_principals | cut -d$'\t' -f 1,3 | sort -t$'\t' -k2) \
<(cut -d$'\t' -f 1,2,4,14,22 databases | sort -t$'\t' -k3) | column -t
model				3  01  ONLINE  FULL    sa
test				6  01  ONLINE  FULL    sa
ApplixEnterprise	5  01  ONLINE  SIMPLE  sa
master				1  01  ONLINE  SIMPLE  sa
msdb				4  01  ONLINE  SIMPLE  sa
tempdb				2  01  ONLINE  SIMPLE  sa

But preserving the column context may become also your concern if you want to sort column data values. Indeed, let’s say you want to sort the above output by the database_id column value. You will quickly notice that the header file will be included by the sort operator. Definitely not the result we expect in this case. So, in order to meet our requirement, we may use again the very powerful awk command as shown below:

[mikedavem@sql server]$ join --header -1 2 -2 3 -t$'\t' -o 2.1,2.2,2.3,2.4,2.5,1.1 \
<(head -n1 server_principals | cut -d$'\t' -f 1,3;grep -Ev '^[NT|##]|SERVER_ROLE' server_principals | cut -d$'\t' -f 1,3 | sort -t$'\t' -k2) \
<(head -n 1 databases | cut -d$'\t' -f 1,2,4,14,22;cut -d$'\t' -f 1,2,4,14,22 databases | sort -t$'\t' -k3) \
| awk 'NR<2{print $0;next}{print $0 | "sort -k2"}' | column -t
name				database_id  	owner_sid	state_desc	recovery_model_desc	name
master				1				01		ONLINE      	SIMPLE				sa
tempdb				2            	01		ONLINE      	SIMPLE				sa
model				3            	01		ONLINE      	FULL				sa
msdb				4            	01		ONLINE      	SIMPLE				sa
ApplixEnterprise	5            	01		ONLINE      	SIMPLE				sa
test				6            	01		ONLINE      	FULL				sa

We finally managed to display the desired output but my feeling is we worked hard for few results and usually we have to deal with more DMVs and complex join than the previous example in the same query. Furthermore, we often have to aggregate data from DMVs to get relevant results and I may easily imagine the additional efforts to produce the corresponding scripts if we extend the scenarios we have to deal with.

There are probably other tools on Linux to make the task easier but my opinion is that DBFS should include the ability to execute custom queries already used by DBAs day-to-day to go beyond the actual capabilities. We didn’t surface JSON format in this blog. This is another way to consume data from DMV related files but in my opinion, it targets more the developers than the DBAs audience.

I noticed that I was not the first and the only one to think about those enhancements by looking at the DBFS Github and the issues section. Anyway, I like the Microsoft’s initiative to give us the ability to consume DMVs related data from the command line in a Linux world and I’m looking forward the next features about this tool!

 

 

 

Leave a Reply


five − = 4

David Barbarin
David Barbarin

Senior Consultant & Microsoft Technology Leader