Infrastructure at your Service

Morgan Patou

Documentum story – Restrict the number of jpeg renditions

As already explained in this blog, we had to remove the PDF renditions for a customer to only keep the jpeg renditions on the ADTS side because the PDF was generated by another third-party tool. If you take a look at the dmr_content items attached to a specific document (parent_id), you will understand that there is by default only one PDF content while there might be a lot more jpeg renditions. Why? The answer to that is pretty simple, the content of the PDF rendition contains all the pages of the document while the jpeg renditions are only for a single page… Therefore if an ADTS is processing a document of 25 pages, then it will create 25 jpeg renditions. Actually that’s even more than that… Indeed by default the ADTS generates two types/formats of jpeg renditions for each page which double the total number of renditions. Here are the two formats available by default:

  • jpeg_lres: (Low Resolutions) that’s actually a real size preview in low resolution of the page
  • jpeg_story: (StoryBoards) that’s a reduced size preview of the page, quite hard to read what’s written…

 

jpeg_lres is the format used by the Thumbnail Server and also by D2 for the preview widget. On the other hand, jpeg_story isn’t used at all in D2 4.5. According to EMC, it *might* has been used for D2 4.1 and previous versions but they aren’t sure about it… ;)

 

Now that this has been said, let’s go back to the title of this blog. Because of this behavior of the ADTS, it might happen that one day you will see hundreds or even thousands of dmr_content items deleted. You might think that there is something wrong, that the cleanup jobs deleted too many objects or something like that… So if this happens to you, please take a look at the format of these dmr_content! Several months ago, the cleanup jobs were inactive for a few weeks because of a bug and when we reactivated them, this happened to us and we finally found out that 95% of these items where only jpeg renditions and that this was actually the expected behavior!

 

After that, we started thinking about how we should handle these jpeg renditions for really big documents? Because having the preview of the documents available in D2 is great but then is it really needed? Generating a preview of the first page or of the 10 first pages might makes sense but would it makes sense to generate a preview for each page of a document bigger than 10 pages? 100 pages? 1 000 pages? These previews are used in D2 and you need to move from one page to the other one starting with the page 1. If you absolutely want to see the preview of the page *364* in D2, then you will need 20 minutes to reach that page in the first place… I think that downloading the document is a little bit faster ;).

 

So is it possible to only generate previews for a few pages and setup a maximum number of jpeg renditions per document? The short answer to that is: yes! And that’s the purpose of this blog, wonderful, isn’t it?!

 

First check out the configuration file that will need to be updated:

[dmadmin@content_server_01 workspace]$ iapi DOCBASE -Udmadmin -Pxxx


        EMC Documentum iapi - Interactive API interface
        (c) Copyright EMC Corp., 1992 - 2015
        All rights reserved.
        Client Library Release 7.2.0050.0084


Connecting to Server using docbase DOCBASE
[DM_SESSION_I_SESSION_START]info:  "Session 013f245a802173d6 started for user dmadmin."


Connected to Documentum Server running Release 7.2.0050.0214  Linux64.Oracle
Session id is s0
API> retrieve,c,dm_document where folder('/System/Media Server/Command Line Files') and object_name = 'storyboard_pdfstoryboard.xml'
...
093f245a801c9075
API> checkout,c,l
...
093f245a801c9075
API> getfile,c,l,/tmp/workspace/storyboard_pdfstoryboard.xml
...
/tmp/workspace/storyboard_pdfstoryboard.xml
API> flushcache,c
...
OK

 

Then retrieve a test document that will be used to see how it is working:

API> retrieve,c,dm_document where r_object_id='093f245a801a8f56'
...
093f245a801a8f56
API> getfile,c,l,/tmp/workspace/TestRenditions.docx
...
/tmp/workspace/TestRenditions.docx

 

Ok so now the two files are stored locally. I retrieved the content of the file TestRenditions.docx in order to be able to regenerate the renditions, you will see how it works later. So let’s check how many renditions this document currently has:

API> ?,c,select r_object_id, full_format, parent_id, content_size, full_content_size, set_time, set_file from dmr_content where any parent_id='093f245a801a8f56'
r_object_id       full_format         parent_id         content_size  full_content_size       set_time                   set_file                                                                              
----------------  ------------------  ----------------  ------------  ----------------------  -------------------------  ---------------------------------------------------------------------------------------
063f245a801d95cc  jpeg_lres           093f245a801a8f56         60467                   60467  7/4/2016 13:22:39          C:\Users\SYS_AD~1\AppData\Local\Temp\batchFile353776780929575961.tar
063f245a801d95cd  jpeg_lres           093f245a801a8f56        138862                  138862  7/4/2016 13:22:39          C:\Users\SYS_AD~1\AppData\Local\Temp\batchFile353776780929575961.tar
063f245a801d95ce  jpeg_lres           093f245a801a8f56         29596                   29596  7/4/2016 13:22:39          C:\Users\SYS_AD~1\AppData\Local\Temp\batchFile353776780929575961.tar
.....
063f245a8024b99e  jpeg_story          093f245a801a8f56          3392                    3392  7/4/2016 13:22:39          C:\Users\SYS_AD~1\AppData\Local\Temp\batchFile7193259325098763580.tar
063f245a8024b99f  jpeg_story          093f245a801a8f56          4718                    4718  7/4/2016 13:22:39          C:\Users\SYS_AD~1\AppData\Local\Temp\batchFile7193259325098763580.tar
063f245a8024b9a0  jpeg_story          093f245a801a8f56          1567                    1567  7/4/2016 13:22:39          C:\Users\SYS_AD~1\AppData\Local\Temp\batchFile7193259325098763580.tar
.....
063f245a8024b622  msw12               093f245a801a8f56         90535                   90535  7/4/2016 13:22:30          /app/weblogic/tmp/DOCBASE/msD2-01/DefaultFileRenamePolicy.rename8572010254156028259.docx
(52 rows affected)

API> exit
Bye

 

As explained previously in this blog, the document “TestRenditions.docx” (093f245a801a8f56) has 25 pages and therefore there are 25 (jpeg_lres) + 25 (jpeg_story) + 1 (pdf) + 1 (real document) = 52 dmr_content items attached to it and therefore 51 renditions. Now let’s see the content of the configuration file and what to do to change the number of renditions we need:

[dmadmin@content_server_01 workspace]$ pwd
/tmp/workspace
[dmadmin@content_server_01 workspace]$ cat storyboard_pdfstoryboard.xml
<PDFSTORYBOARD_MP_PROPERTIES>
    <FORMAT name="JPEG">
        <PROP name="Format" type="string">JPEG</PROP>
        <PROP name="Width" type="unsigned long" token="doc_token_width">200</PROP>
        <PROP name="Height" type="unsigned long" token="doc_token_height">200</PROP>
        <PROP name="Dpi" type="unsigned long" token="doc_token_dpi">72</PROP>
        <PROP name="KeepRatio" type="boolean">true</PROP>
        <PROP name="Password" type="string">your_password_be_here</PROP>
        <PROP name="Max Pages" type="unsigned long" token="doc_token_maxPages">-1</PROP>
        <PROP name="Frames Requested" type="unsigned long" token="doc_token_frames_requested">-1</PROP>
    </FORMAT>
</PDFSTORYBOARD_MP_PROPERTIES>
[dmadmin@content_server_01 workspace]$ 
[dmadmin@content_server_01 workspace]$ sed -i 's/doc_token_maxPages">-1</doc_token_maxPages">1</' storyboard_pdfstoryboard.xml
[dmadmin@content_server_01 workspace]$ 
[dmadmin@content_server_01 workspace]$ cat storyboard_pdfstoryboard.xml
<PDFSTORYBOARD_MP_PROPERTIES>
    <FORMAT name="JPEG">
        <PROP name="Format" type="string">JPEG</PROP>
        <PROP name="Width" type="unsigned long" token="doc_token_width">200</PROP>
        <PROP name="Height" type="unsigned long" token="doc_token_height">200</PROP>
        <PROP name="Dpi" type="unsigned long" token="doc_token_dpi">72</PROP>
        <PROP name="KeepRatio" type="boolean">true</PROP>
        <PROP name="Password" type="string">your_password_be_here</PROP>
        <PROP name="Max Pages" type="unsigned long" token="doc_token_maxPages">1</PROP>
        <PROP name="Frames Requested" type="unsigned long" token="doc_token_frames_requested">-1</PROP>
    </FORMAT>
</PDFSTORYBOARD_MP_PROPERTIES>

 

As you can see above, I just changed the value assigned to the “doc_token_maxPages” from -1 (unlimited) to 1 (1 page) and that should be it! To apply this change, we need to check in the storyboard_pdfstoryboard.xml file:

[dmadmin@content_server_01 workspace]$ iapi DOCBASE -Udmadmin -Pxxx


        EMC Documentum iapi - Interactive API interface
        (c) Copyright EMC Corp., 1992 - 2015
        All rights reserved.
        Client Library Release 7.2.0050.0084


Connecting to Server using docbase DOCBASE
[DM_SESSION_I_SESSION_START]info:  "Session 013f245a80217406 started for user dmadmin."


Connected to Documentum Server running Release 7.2.0050.0214  Linux64.Oracle
Session id is s0
API> retrieve,c,dm_document where folder('/System/Media Server/Command Line Files') and object_name = 'storyboard_pdfstoryboard.xml'
...
093f245a801c9075
API> setfile,c,l,/tmp/workspace/storyboard_pdfstoryboard.xml
...
OK
API> checkin,c,l
...
093f245a8027254e
API> flushcache,c
...
OK

 

Once this is done, we can remove all current renditions of this document (we saw above pdf, jpeg_lres and jpeg_story renditions) to only let the docx/msw12 file:

API> retrieve,c,dm_document where r_object_id='093f245a801a8f56'
...
093f245a801a8f56
API> removerendition,c,l,pdf
...
OK
API> save,c,l
...
OK
API> flushcache,c
...
OK
API> removerendition,c,l,jpeg_lres
...
OK
API> save,c,l
...
OK
API> flushcache,c
...
OK
API> removerendition,c,l,jpeg_story
...
OK
API> save,c,l
...
OK
API> flushcache,c
...
OK
API> ?,c,select r_object_id, full_format, parent_id, content_size, full_content_size, set_time, set_file from dmr_content where any parent_id='093f245a801a8f56'
r_object_id       full_format         parent_id         content_size  full_content_size       set_time                   set_file                                                                              
----------------  ------------------  ----------------  ------------  ----------------------  -------------------------  ---------------------------------------------------------------------------------------
063f245a8024b622  msw12               093f245a801a8f56         90535                   90535  7/4/2016 13:22:30          /app/weblogic/tmp/DOMAIN/msD2-01/DefaultFileRenamePolicy.rename8572010254156028259.docx
(1 row affected)

 

The last step is to request the recreation of these renditions (using a setfile), wait 20 seconds or so and then check how much renditions have been recreated:

API> retrieve,c,dm_document where r_object_id='093f245a801a8f56'
...
093f245a801a8f56
API> setfile,c,l,/tmp/workspace/TestRenditions.docx
...
OK
API> save,c,l
...
OK
API> ?,c,select r_object_id, full_format, parent_id, content_size, full_content_size, set_time, set_file from dmr_content where any parent_id='093f245a801a8f56';
r_object_id       full_format         parent_id         content_size  full_content_size       set_time                   set_file                                                                              
----------------  ------------------  ----------------  ------------  ----------------------  -------------------------  ---------------------------------------------------------------------------------------
063f245a8024b622  msw12               093f245a801a8f56         90535                   90535  11/13/2016 12:49:48        /tmp/workspace/TestRenditions.docx
063f245a8024b957  jpeg_lres           093f245a801a8f56         60467                   60467  11/13/2016 12:49:56        C:\Users\SYS_AD~1\AppData\Local\Temp\batchFile1116397023537525059.tar
063f245a8024b958  jpeg_story          093f245a801a8f56          3392                    3392  11/13/2016 12:49:57        C:\Users\SYS_AD~1\AppData\Local\Temp\batchFile6054850753334521340.tar
(3 rows affected)

 

As you can see, there is now only one jpeg rendition per format and that’s for the first page only so that’s a success! If you want to keep only X jpeg renditions per document, now you know how to do it :)

Leave a Reply

Morgan Patou
Morgan Patou

Senior Consultant