Infrastructure at your Service

Franck Pachot

12c online move datafile in same filesystem.

On Linux, when you move a datafile, with “mv” within the filesystem it’s just a rename. There is no copy. In 12c you can move a datafile online where oracle takes care of the move at OS level. But be careful. Even if you are in the same filesystem, moving a datafile online does a copy of the file.

Linux mv

I have a file, /u01/oradata/test1.txt and I move it to /u01/oradata/test2.txt within the same filesystem:

mv /u01/oradata/test1.txt /u01/oradata/test2.txt

Actually, I’m running it with strace, tracing file operations for these files:

strace -e trace=file mv /u01/oradata/test1.txt /u01/oradata/test2.txt

Then I can see clearly that there is no open() call but just a rename():

execve("/usr/bin/mv", ["mv", "/u01/oradata/test1.txt", "/u01/oradata/test2.txt"], [/* 29 vars */]) = 0
stat("/u01/oradata/test2.txt", 0x7ffcfa624270) = -1 ENOENT (No such file or directory)
lstat("/u01/oradata/test1.txt", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
lstat("/u01/oradata/test2.txt", 0x7ffcfa623f20) = -1 ENOENT (No such file or directory)
rename("/u01/oradata/test1.txt", "/u01/oradata/test2.txt") = 0

If I do the same to another filesystem:

strace -e trace=file mv /u01/oradata/test2.txt /u02/oradata/test2.txt

a rename() is attempted:

execve("/usr/bin/mv", ["mv", "/u01/oradata/test2.txt", "/u02/oradata/test2.txt"], [/* 29 vars */]) = 0
stat("/u02/oradata/test2.txt", 0x7fff1e2b3340) = -1 ENOENT (No such file or directory)
lstat("/u01/oradata/test2.txt", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
lstat("/u02/oradata/test2.txt", 0x7fff1e2b2ff0) = -1 ENOENT (No such file or directory)
rename("/u01/oradata/test2.txt", "/u02/oradata/test2.txt") = -1 EXDEV (Invalid cross-device link)

but because it’s another filesystem, the “mv” command will do like a “cp”, open the source in read and the target in write, creating it if not exist:

unlink("/u02/oradata/test2.txt") = -1 ENOENT (No such file or directory)
lgetxattr("/u01/oradata/test2.txt", "security.selinux", "unconfined_u:object_r:unlabeled_t:s0", 255) = 37
open("/u01/oradata/test2.txt", O_RDONLY|O_NOFOLLOW) = 3
open("/u02/oradata/test2.txt", O_WRONLY|O_CREAT|O_EXCL, 0600) = 4
newfstatat(AT_FDCWD, "/u01/oradata/test2.txt", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
unlinkat(AT_FDCWD, "/u01/oradata/test2.txt", 0) = 0

Move datafile

Let’s do the same from the database where I’ve created the same datafile:

SQL> create tablespace DEMO datafile '/u01/oradata/test1.txt' size 5M;
Tablespace created.

I get my shadow process PID:

SQL> set define %
SQL> column spid new_value spid
SQL> select spid from v$process join v$session on paddr=addr where sid=sys_context('userenv','sid');
 
SPID
------------------------
7257
 
SQL> host ps -fp %spid
UID PID PPID C STIME TTY TIME CMD
oracle 7257 7256 93 21:35 ? 00:00:03 oracleCDB1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

and run strace on it:
SQL> host strace -o /tmp/tmp.log -e trace=file -p %spid &
I move the file online to the same filesystem:

SQL> alter database move datafile '/u01/oradata/test1.txt' to '/u01/oradata/test2.txt';
Database altered.

and read the trace:
grep /test /tmp/tmp.log


stat("/u01/oradata/test1.txt", {st_mode=S_IFREG|0640, st_size=5251072, ...}) = 0
stat("/u01/oradata/test1.txt", {st_mode=S_IFREG|0640, st_size=5251072, ...}) = 0
open("/u01/oradata/test1.txt", O_RDONLY) = 11
stat("/u01/oradata/test1.txt", {st_mode=S_IFREG|0640, st_size=5251072, ...}) = 0
open("/u01/oradata/test1.txt", O_RDONLY) = 11
stat("/u01/oradata/test1.txt", {st_mode=S_IFREG|0640, st_size=5251072, ...}) = 0
statfs("/u01/oradata/test1.txt", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=3063586, f_bfree=1014477, f_bavail=853095, f_files=786432, f_ffree=672545, f_fsid={1366374386, -1275721753}, f_namelen=255, f_frsize=4096}) = 0
open("/u01/oradata/test1.txt", O_RDONLY) = 11
open("/u01/oradata/test1.txt", O_RDWR|O_DSYNC) = 11
stat("/u01/oradata/test1.txt", {st_mode=S_IFREG|0640, st_size=5251072, ...}) = 0
stat("/u01/oradata/test1.txt", {st_mode=S_IFREG|0640, st_size=5251072, ...}) = 0
statfs("/u01/oradata/test1.txt", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=3063586, f_bfree=1014477, f_bavail=853095, f_files=786432, f_ffree=672545, f_fsid={1366374386, -1275721753}, f_namelen=255, f_frsize=4096}) = 0
open("/u01/oradata/test1.txt", O_RDONLY) = 11
stat("/u01/oradata/test1.txt", {st_mode=S_IFREG|0640, st_size=5251072, ...}) = 0
stat("/u01/oradata/test1.txt", {st_mode=S_IFREG|0640, st_size=5251072, ...}) = 0
statfs("/u01/oradata/test1.txt", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=3063586, f_bfree=1014477, f_bavail=853095, f_files=786432, f_ffree=672545, f_fsid={1366374386, -1275721753}, f_namelen=255, f_frsize=4096}) = 0
open("/u01/oradata/test1.txt", O_RDONLY) = 11

At this point the source datafile is opened in read. It continues with:

open("/u01/oradata/test2.txt", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/u01/oradata/test2.txt", 0x7ffd0201e5d8) = -1 ENOENT (No such file or directory)
open("/u01/oradata/test2.txt", O_RDWR|O_CREAT|O_EXCL|O_SYNC, 0660) = 11

and now the destination datafile in write, created if not exist (existence had been tested before). Then it starts the copy:

stat("/u01/oradata/test2.txt", {st_mode=S_IFREG|0640, st_size=5251072, ...}) = 0
stat("/u01/oradata/test2.txt", {st_mode=S_IFREG|0640, st_size=5251072, ...}) = 0
open("/u01/oradata/test2.txt", O_RDONLY) = 11
stat("/u01/oradata/test2.txt", {st_mode=S_IFREG|0640, st_size=5251072, ...}) = 0
open("/u01/oradata/test2.txt", O_RDONLY) = 11
statfs("/u01/oradata/test2.txt", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=3063586, f_bfree=1014475, f_bavail=853093, f_files=786432, f_ffree=672544, f_fsid={1366374386, -1275721753}, f_namelen=255, f_frsize=4096}) = 0
open("/u01/oradata/test2.txt", O_RDONLY) = 11
open("/u01/oradata/test2.txt", O_RDWR) = 11
open("/u01/oradata/test2.txt", O_RDWR|O_DSYNC) = 11

at the end, because I didn’t use the ‘KEEP’ option, so the source file is deleted:

stat("/u01/oradata/test1.txt", {st_mode=S_IFREG|0640, st_size=5251072, ...}) = 0
unlink("/u01/oradata/test1.txt") = 0
stat("/u01/app/oracle/diag/rdbms/cdb1/CDB1/log/test", {st_mode=S_IFDIR|0750, st_size=4096, ...}) = 0

So what?

As any online move operation, you need two times the space during the copy. Here, the source file is still used for read and write until the move finishes, so this cannot be a rename. A rename would have to put the datafile offline even for a short time. If you can afford a small downtime, then you may prefer to use the offline rename (“mv” at OS level, ALTER DATABASE RENAME FILE at database level).

 

3 Comments

  • Nice discovery on how Oracle does things behind the scenes. I wonder why they didn’t implement it via hard links.

    Using hard links you can kind-a do the move online by yourself, as it allows you to have the file at both locations.

    One cool trick with hard links is the ability to “undelete” a file that you accidentally removed, with zero downtime:

    https://www.pythian.com/blog/how-to-recover-deleted-oracle-datafiles-with-no-downtime/

     
    • Hi Christo,

      Yes they can implement a rename online with your solution. However the goal of online datafile move is not rename but move to another filesystem. Actually, they didn’t implement it to make our life easy but just because they need it with ILM feature. ADO can automatically move cold data to lower cost storage so to do it automatically they need to do it online.
      If one day they need to rename online for the implementation of an an expensive option, then we may get it ;)

      Thanks for the link to you blog. Very well explained. Just take care than when >= 11.2.0.2 the checkpoint will crash the instance when unable to write to a datafile, so you must have to be quick! Or set “_datafile_write_errors_
      crash_instance” to false. Details here: http://www.dbi-services.com/wp-insides/uploads/2016/01/1.-UKOUG_FranckPachot.pdf

      Regards,
      Franck.

       
      • Ha, I haven’t tested with anything beyond 10 I think. Thanks for sharing – didn’t know this little detail. I agree with your points in your article, I am just wondering if shutdown abort is the best option here…

        It could try to dirty buffers elsewhere .. it could “switch” the instance into read only mode… so many better choices. Lots of room for improvement.

        I wonder what the error is however… because removing a file does not cause an error for a process that already has the file open. So they must be doing a “stat” or close/reopen somewhere to notice the file is removed.

         

Leave a Reply


6 − one =

Franck Pachot
Franck Pachot

Technology Leader