As already explained in this blog, we had to remove the PDF renditions for a customer and keep only the jpeg renditions on the ADTS side, because the PDF was generated by another third-party tool. If you take a look at the dmr_content items attached to a specific document (parent_id), you will see that there is by default only one PDF content while there might be a lot more jpeg renditions. Why? The answer is pretty simple: the PDF rendition contains all the pages of the document, while each jpeg rendition covers a single page… Therefore, if an ADTS is processing a document of 25 pages, it will create 25 jpeg renditions. Actually, it’s even more than that… By default, the ADTS generates two types/formats of jpeg renditions for each page, which doubles the total number of renditions. Here are the two formats available by default:
- jpeg_lres: (Low Resolution) a real-size preview of the page, in low resolution
- jpeg_story: (Storyboard) a reduced-size preview of the page, in which the text is quite hard to read…
jpeg_lres is the format used by the Thumbnail Server and also by D2 for the preview widget. On the other hand, jpeg_story isn’t used at all in D2 4.5. According to EMC, it *might* have been used in D2 4.1 and previous versions, but they aren’t sure about it… 😉
Now that this has been said, let’s go back to the title of this blog. Because of this behavior of the ADTS, it might happen that one day you will see hundreds or even thousands of dmr_content items deleted. You might think that there is something wrong, that the cleanup jobs deleted too many objects or something like that… So if this happens to you, please take a look at the format of these dmr_content items! Several months ago, the cleanup jobs were inactive for a few weeks because of a bug. When we reactivated them, this happened to us, and we finally found out that 95% of these items were only jpeg renditions and that this was actually the expected behavior!
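To look at the formats quickly, the rows returned by a dmr_content query can be tallied by full_format. Here is a minimal sketch, assuming the query output was saved to a text file in iapi's tabular layout, with the r_object_id in the first column and full_format in the second. The function name `tally_formats` and the file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Tally dmr_content rows by format from saved query output.
# Assumption: each data row starts with a 16-character hex r_object_id
# and carries full_format in the second whitespace-separated column.
# Header, separator and "(N rows affected)" lines are ignored.
tally_formats() {
    awk 'length($1) == 16 && $1 ~ /^[0-9a-f]+$/ { count[$2]++ }
         END { for (f in count) print f, count[f] }' "$@" | LC_ALL=C sort
}

# Hypothetical usage:
# tally_formats /tmp/workspace/deleted_contents.txt
```

If the large majority of the rows turn out to be jpeg_lres and jpeg_story, the deletions are most likely just per-page renditions.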
After that, we started thinking about how we should handle these jpeg renditions for really big documents. Having the preview of the documents available in D2 is great, but is it really needed? Generating a preview of the first page or of the first 10 pages might make sense, but would it make sense to generate a preview for each page of a document bigger than 10 pages? 100 pages? 1,000 pages? These previews are used in D2, and you need to move from one page to the next, starting with page 1. If you absolutely want to see the preview of page *364* in D2, then you will need 20 minutes to reach that page in the first place… I think that downloading the document is a little bit faster ;).
So is it possible to only generate previews for a few pages and set up a maximum number of jpeg renditions per document? The short answer to that is: yes! And that’s the purpose of this blog, wonderful, isn’t it?!
First check out the configuration file that will need to be updated:
    [dmadmin@content_server_01 workspace]$ iapi DOCBASE -Udmadmin -Pxxx
    EMC Documentum iapi - Interactive API interface
    (c) Copyright EMC Corp., 1992 - 2015
    All rights reserved.
    Client Library Release 7.2.0050.0084
    Connecting to Server using docbase DOCBASE
    [DM_SESSION_I_SESSION_START]info: "Session 013f245a802173d6 started for user dmadmin."
    Connected to Documentum Server running Release 7.2.0050.0214 Linux64.Oracle
    Session id is s0
    API> retrieve,c,dm_document where folder('/System/Media Server/Command Line Files') and object_name = 'storyboard_pdfstoryboard.xml'
    ...
    093f245a801c9075
    API> checkout,c,l
    ...
    093f245a801c9075
    API> getfile,c,l,/tmp/workspace/storyboard_pdfstoryboard.xml
    ...
    /tmp/workspace/storyboard_pdfstoryboard.xml
    API> flushcache,c
    ...
    OK
Then retrieve a test document that will be used to see how it is working:
    API> retrieve,c,dm_document where r_object_id='093f245a801a8f56'
    ...
    093f245a801a8f56
    API> getfile,c,l,/tmp/workspace/TestRenditions.docx
    ...
    /tmp/workspace/TestRenditions.docx
OK, so now the two files are stored locally. I retrieved the content of TestRenditions.docx in order to be able to regenerate the renditions later; you will see how that works below. So let’s check how many renditions this document currently has:
    API> ?,c,select r_object_id, full_format, parent_id, content_size, full_content_size, set_time, set_file from dmr_content where any parent_id='093f245a801a8f56'
    r_object_id       full_format  parent_id         content_size  full_content_size  set_time           set_file
    ----------------  -----------  ----------------  ------------  -----------------  -----------------  ----------------------------------------------------------------------
    063f245a801d95cc  jpeg_lres    093f245a801a8f56  60467         60467              7/4/2016 13:22:39  C:\Users\SYS_AD~1\AppData\Local\Temp\batchFile353776780929575961.tar
    063f245a801d95cd  jpeg_lres    093f245a801a8f56  138862        138862             7/4/2016 13:22:39  C:\Users\SYS_AD~1\AppData\Local\Temp\batchFile353776780929575961.tar
    063f245a801d95ce  jpeg_lres    093f245a801a8f56  29596         29596              7/4/2016 13:22:39  C:\Users\SYS_AD~1\AppData\Local\Temp\batchFile353776780929575961.tar
    .....
    063f245a8024b99e  jpeg_story   093f245a801a8f56  3392          3392               7/4/2016 13:22:39  C:\Users\SYS_AD~1\AppData\Local\Temp\batchFile7193259325098763580.tar
    063f245a8024b99f  jpeg_story   093f245a801a8f56  4718          4718               7/4/2016 13:22:39  C:\Users\SYS_AD~1\AppData\Local\Temp\batchFile7193259325098763580.tar
    063f245a8024b9a0  jpeg_story   093f245a801a8f56  1567          1567               7/4/2016 13:22:39  C:\Users\SYS_AD~1\AppData\Local\Temp\batchFile7193259325098763580.tar
    .....
    063f245a8024b622  msw12        093f245a801a8f56  90535         90535              7/4/2016 13:22:30  /app/weblogic/tmp/DOCBASE/msD2-01/DefaultFileRenamePolicy.rename8572010254156028259.docx
    (52 rows affected)
    API> exit
    Bye
As explained previously in this blog, the document “TestRenditions.docx” (093f245a801a8f56) has 25 pages, so there are 25 (jpeg_lres) + 25 (jpeg_story) + 1 (pdf) + 1 (real document) = 52 dmr_content items attached to it, and therefore 51 renditions. Now let’s look at the content of the configuration file and see what to change to limit the number of renditions:
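The same arithmetic can be written as a small helper, which is handy for estimating how many dmr_content objects a larger document would produce with the default settings. This is only a sketch: the function name `expected_content_count` and its defaults (two jpeg formats per page, one other rendition for the pdf) are assumptions taken from the numbers above.

```shell
#!/bin/sh
# Expected number of dmr_content objects for one document.
# Arguments: page count,
#            jpeg formats per page (default 2: jpeg_lres and jpeg_story),
#            other renditions (default 1: the pdf).
expected_content_count() {
    pages=$1
    jpeg_formats=${2:-2}
    other_renditions=${3:-1}
    # per-page jpeg renditions + other renditions + the primary content
    echo $(( pages * jpeg_formats + other_renditions + 1 ))
}

expected_content_count 25    # prints 52, matching the "52 rows affected" above
```

With the same defaults, a 1,000-page document would end up with 2,002 dmr_content objects, which is where a page limit starts to pay off.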
    [dmadmin@content_server_01 workspace]$ pwd
    /tmp/workspace
    [dmadmin@content_server_01 workspace]$ cat storyboard_pdfstoryboard.xml
    <PDFSTORYBOARD_MP_PROPERTIES>
        <FORMAT name="JPEG">
            <PROP name="Format" type="string">JPEG</PROP>
            <PROP name="Width" type="unsigned long" token="doc_token_width">200</PROP>
            <PROP name="Height" type="unsigned long" token="doc_token_height">200</PROP>
            <PROP name="Dpi" type="unsigned long" token="doc_token_dpi">72</PROP>
            <PROP name="KeepRatio" type="boolean">true</PROP>
            <PROP name="Password" type="string">your_password_be_here</PROP>
            <PROP name="Max Pages" type="unsigned long" token="doc_token_maxPages">-1</PROP>
            <PROP name="Frames Requested" type="unsigned long" token="doc_token_frames_requested">-1</PROP>
        </FORMAT>
    </PDFSTORYBOARD_MP_PROPERTIES>
    [dmadmin@content_server_01 workspace]$
    [dmadmin@content_server_01 workspace]$ sed -i 's/doc_token_maxPages">-1</doc_token_maxPages">1</' storyboard_pdfstoryboard.xml
    [dmadmin@content_server_01 workspace]$
    [dmadmin@content_server_01 workspace]$ cat storyboard_pdfstoryboard.xml
    <PDFSTORYBOARD_MP_PROPERTIES>
        <FORMAT name="JPEG">
            <PROP name="Format" type="string">JPEG</PROP>
            <PROP name="Width" type="unsigned long" token="doc_token_width">200</PROP>
            <PROP name="Height" type="unsigned long" token="doc_token_height">200</PROP>
            <PROP name="Dpi" type="unsigned long" token="doc_token_dpi">72</PROP>
            <PROP name="KeepRatio" type="boolean">true</PROP>
            <PROP name="Password" type="string">your_password_be_here</PROP>
            <PROP name="Max Pages" type="unsigned long" token="doc_token_maxPages">1</PROP>
            <PROP name="Frames Requested" type="unsigned long" token="doc_token_frames_requested">-1</PROP>
        </FORMAT>
    </PDFSTORYBOARD_MP_PROPERTIES>
As you can see above, I just changed the value assigned to the “doc_token_maxPages” from -1 (unlimited) to 1 (1 page) and that should be it! To apply this change, we need to check in the storyboard_pdfstoryboard.xml file:
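Before checking the file back in, it can be worth confirming the value that was actually written rather than reading the whole XML by eye. A minimal sketch, assuming the property stays on a single line as in the file shown above; the function name `get_max_pages` is hypothetical.

```shell
#!/bin/sh
# Print the value currently set for doc_token_maxPages in the given file.
# Assumption: the <PROP ... token="doc_token_maxPages"> element sits on one line.
get_max_pages() {
    sed -n 's/.*token="doc_token_maxPages">\(-\{0,1\}[0-9][0-9]*\)<.*/\1/p' "$1"
}

# Usage with the local copy from this post:
# get_max_pages /tmp/workspace/storyboard_pdfstoryboard.xml
```

An empty result means the pattern did not match, which is a hint that the file layout differs from the one shown here.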
    [dmadmin@content_server_01 workspace]$ iapi DOCBASE -Udmadmin -Pxxx
    EMC Documentum iapi - Interactive API interface
    (c) Copyright EMC Corp., 1992 - 2015
    All rights reserved.
    Client Library Release 7.2.0050.0084
    Connecting to Server using docbase DOCBASE
    [DM_SESSION_I_SESSION_START]info: "Session 013f245a80217406 started for user dmadmin."
    Connected to Documentum Server running Release 7.2.0050.0214 Linux64.Oracle
    Session id is s0
    API> retrieve,c,dm_document where folder('/System/Media Server/Command Line Files') and object_name = 'storyboard_pdfstoryboard.xml'
    ...
    093f245a801c9075
    API> setfile,c,l,/tmp/workspace/storyboard_pdfstoryboard.xml
    ...
    OK
    API> checkin,c,l
    ...
    093f245a8027254e
    API> flushcache,c
    ...
    OK
Once this is done, we can remove all current renditions of this document (we saw pdf, jpeg_lres and jpeg_story renditions above) to leave only the docx/msw12 file:
    API> retrieve,c,dm_document where r_object_id='093f245a801a8f56'
    ...
    093f245a801a8f56
    API> removerendition,c,l,pdf
    ...
    OK
    API> save,c,l
    ...
    OK
    API> flushcache,c
    ...
    OK
    API> removerendition,c,l,jpeg_lres
    ...
    OK
    API> save,c,l
    ...
    OK
    API> flushcache,c
    ...
    OK
    API> removerendition,c,l,jpeg_story
    ...
    OK
    API> save,c,l
    ...
    OK
    API> flushcache,c
    ...
    OK
    API> ?,c,select r_object_id, full_format, parent_id, content_size, full_content_size, set_time, set_file from dmr_content where any parent_id='093f245a801a8f56'
    r_object_id       full_format  parent_id         content_size  full_content_size  set_time           set_file
    ----------------  -----------  ----------------  ------------  -----------------  -----------------  ----------------------------------------------------------------------
    063f245a8024b622  msw12        093f245a801a8f56  90535         90535              7/4/2016 13:22:30  /app/weblogic/tmp/DOMAIN/msD2-01/DefaultFileRenamePolicy.rename8572010254156028259.docx
    (1 row affected)
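Since the same removerendition / save / flushcache sequence is repeated for each format, the commands can be generated instead of typed. The sketch below only prints the iapi commands used above for a given object ID and list of formats; the function name `removal_script` is hypothetical, and nothing is sent to the repository.

```shell
#!/bin/sh
# Print the iapi commands that remove the given rendition formats
# from one document, using the same sequence as in the session above.
# Usage: removal_script <r_object_id> <format>...
removal_script() {
    object_id=$1
    shift
    printf "retrieve,c,dm_document where r_object_id='%s'\n" "$object_id"
    for format in "$@"; do
        printf 'removerendition,c,l,%s\n' "$format"
        printf 'save,c,l\n'
        printf 'flushcache,c\n'
    done
}

removal_script 093f245a801a8f56 pdf jpeg_lres jpeg_story
```

The output can be saved to a file and fed to iapi. I believe iapi accepts an input file through its -R option, but treat that as an assumption to verify on your version.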
The last step is to request the recreation of these renditions (using a setfile), wait 20 seconds or so and then check how many renditions have been recreated:
    API> retrieve,c,dm_document where r_object_id='093f245a801a8f56'
    ...
    093f245a801a8f56
    API> setfile,c,l,/tmp/workspace/TestRenditions.docx
    ...
    OK
    API> save,c,l
    ...
    OK
    API> ?,c,select r_object_id, full_format, parent_id, content_size, full_content_size, set_time, set_file from dmr_content where any parent_id='093f245a801a8f56';
    r_object_id       full_format  parent_id         content_size  full_content_size  set_time             set_file
    ----------------  -----------  ----------------  ------------  -----------------  -------------------  ----------------------------------------------------------------------
    063f245a8024b622  msw12        093f245a801a8f56  90535         90535              11/13/2016 12:49:48  /tmp/workspace/TestRenditions.docx
    063f245a8024b957  jpeg_lres    093f245a801a8f56  60467         60467              11/13/2016 12:49:56  C:\Users\SYS_AD~1\AppData\Local\Temp\batchFile1116397023537525059.tar
    063f245a8024b958  jpeg_story   093f245a801a8f56  3392          3392               11/13/2016 12:49:57  C:\Users\SYS_AD~1\AppData\Local\Temp\batchFile6054850753334521340.tar
    (3 rows affected)
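Rather than waiting a fixed 20 seconds, the check can be repeated until the renditions show up. Here is a generic polling sketch; `wait_until` is a hypothetical helper, and the command it runs (called `renditions_ready` in the usage comment) is something you would have to write yourself around the dmr_content query.

```shell
#!/bin/sh
# Re-run a check command until it succeeds or the attempts run out.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if it never does.
wait_until() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}

# Hypothetical usage: renditions_ready would wrap the dmr_content query
# and return 0 once the expected formats are present.
# wait_until 12 5 renditions_ready 093f245a801a8f56
```

How long the ADTS needs depends on its queue and on the document, so the attempt count and delay are values to tune.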
As you can see, there is now only one jpeg rendition per format and that’s for the first page only so that’s a success! If you want to keep only X jpeg renditions per document, now you know how to do it 🙂
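For a limit other than 1, the sed command used earlier can be generalized. The sketch below is one way to do it, not an official tool: `set_max_pages` is a hypothetical helper that rewrites whatever value is currently set. It accepts -1 (unlimited) or a positive integer, and it rejects 0 only because this post did not test that value.

```shell
#!/bin/sh
# Set doc_token_maxPages to the given value in a local copy of the XML file.
# Usage: set_max_pages <file> <value>   (-1 means unlimited)
set_max_pages() {
    file=$1
    value=$2
    # Accept only -1 or a positive integer (0 is rejected as untested)
    case $value in
        -1) ;;
        ''|*[!0-9]*|0) echo "invalid page limit: $value" >&2; return 1 ;;
    esac
    tmp=$(mktemp) || return 1
    # Replace whatever value currently sits between the tag and the next "<"
    sed "s/\(token=\"doc_token_maxPages\">\)[^<]*</\1$value</" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage with the local copy from this post:
# set_max_pages /tmp/workspace/storyboard_pdfstoryboard.xml 10
```

The file still has to be checked in with setfile and checkin, as shown earlier, before the ADTS picks up the new value.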