
Morgan Patou

Documentum – Checking warnings & errors from an xPlore full re-index

When working with xPlore as a Full Text Server (indexing), there are a few ways to perform a full re-index. You can do it from the IndexAgent UI, from the Dsearch UI, from the file system (with an ids.txt file for example; this is usually meant for a “small” number of r_object_id, so it is probably not ideal) or from the docbase (mass-queue, which is not really a good way to do it either). Performing a full re-index from the xPlore Server directly is faster because you remove a few layers where the Content Server asks for an index (the index queues) and expects an answer/result. That’s why, in this blog, I will only talk about a full re-index performed from the xPlore Server directly and, more specifically, one launched from the IndexAgent UI. In each of these cases, there might be a few warnings or errors during the re-index: some might be normal (a password-protected file), others might not (a timeout because xPlore is heavily loaded).

The whole purpose of this blog is to show you how you can check these warnings/errors: no information about them is displayed directly on the UI, so you need to go and find it manually. These warnings/errors aren’t shown in the index queues since they weren’t triggered from the docbase but from the xPlore Server directly.

So first of all, you need to trigger a re-index using the IndexAgent:

  • Open the IndexAgent UI (https://<hostname>:<ia_port>/IndexAgent)
  • Login with the installation owner’s account
  • Stop the IndexAgent if it is currently running in Normal mode and then launch a re-index operation

It should look like this (for xPlore 1.6):
IA1

On the above screenshot, the green represents the success count and the blue is for the filtered count. Once completed and as shown above, you might have a few warnings/errors but you don’t have any information about them as I mentioned previously. To narrow down and facilitate the check of the warnings/errors, you need to know (approximately) the start and end time of the re-index operation: 2018-06-12 11:55 UTC to 2018-06-12 12:05 UTC for the above example. From that point, the analysis of the warnings/errors can be done in two main ways:

 

1. Using the Dsearch Admin

I will start with the way that most of you probably already know: use the Dsearch reports to see the errors/warnings. That’s not the fastest way, and clearly not the most fun either, but it is an easy way for sure…

Accessing the reports from the Dsearch Admin:

  • Open the Dsearch Admin UI (https://<hostname>:<ds_port>/dsearchadmin)
  • Login with the admin account (or any other valid account with xPlore 1.6+)
  • Navigate to: Home > Diagnostic and Utilities > Reports
  • Select the “Document Processing Error Summary” report and set the following:
    • Start from: 2018-06-12 11:55
    • To: 2018-06-12 12:05
    • Domain name (optional): leave empty if you only have one IndexAgent, otherwise you can specify the domain name (usually the same name as the docbase)
  • Click on Run to get the report

At this point, you will have a report with the number of warnings/errors per type, meaning that you do not have any information about the documents yet, you only know the number of errors for each of the pre-defined error types (=error code). For the above example, I had 8 warnings once the re-index was completed and I could see them all (seven warnings for ‘777’ and one warning for ‘770’):
IA2

Based on the information from this “Document Processing Error Summary” report, you can go deeper and find the details about the documents, but you can only do it for one type, one Error Code, at a time. Therefore, you will have to loop over all the Error Codes returned:

  • For each Error Code:
    • Select the “Document Processing Error Detail” report and set the following:
      • Start from: 2018-06-12 11:55
      • To: 2018-06-12 12:05
      • Domain name (optional): leave empty if you only have 1 IndexAgent, otherwise you can specify the domain name (usually the same name as the docbase)
      • Processing Error Code: Select the Error Code you want to see (either 777 or 770 in my case)
      • Number of Results to Display: Set here the number of items you want to display, 10, 20, …
    • Click on Run to get the report

And there you finally have the details about the documents that weren’t indexed properly because of the Error Code you chose. In my case, I selected 770 so I have only 1 document:
IA3

You can export this list to Excel if you want, to do some processing on these items for example, but you will need to do it for each Error Code and then merge the exports yourself.

 

2. Using the logs

In the above example, I used the IndexAgent to perform the re-index so I will use the IndexAgent logs to find what happened exactly. This section is really the main purpose of this blog because I assume that most people are using the Dsearch Admin reports already but probably not the logs! If you want to script the check of warnings/errors after a re-index, or if you just want to play and have fun while doing your job, then this is what you need ;).

So let’s start simple: listing all errors and warnings and keeping only the lines that contain an r_object_id.

[xplore@full_text_server_01 ~]$ cd $JBOSS_HOME/server/DctmServer_Indexagent_DocBase1/logs/
[xplore@full_text_server_01 logs]$
[xplore@full_text_server_01 logs]$ echo; egrep -i "err|warn" Indexagent_*.log* \
                                   | egrep --color "[ (<][0-9a-z]{16}[>) ]"

Indexagent_DocBase1.log:2018-06-12 11:55:26,456 WARN PrepWorkItem [full_text_server_01_9200_IndexAgent-full_text_server_01.dbi-services.com-1-full_text_server_01.dbi-services.com-StatusUpdater][DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f12345007f40e message: DOCUMENT_WARNING CPS Warning [Corrupt file].
Indexagent_DocBase1.log:2018-06-12 12:01:00,752 WARN PrepWorkItem [full_text_server_01_9260_IndexAgent-full_text_server_01.dbi-services.com-1-full_text_server_01.dbi-services.com-StatusUpdater][DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa97 message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
Indexagent_DocBase1.log:2018-06-12 12:01:00,752 WARN PrepWorkItem [full_text_server_01_9260_IndexAgent-full_text_server_01.dbi-services.com-1-full_text_server_01.dbi-services.com-StatusUpdater][DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa98 message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
Indexagent_DocBase1.log:2018-06-12 12:01:00,754 WARN PrepWorkItem [full_text_server_01_9260_IndexAgent-full_text_server_01.dbi-services.com-1-full_text_server_01.dbi-services.com-StatusUpdater][DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aa9f6 message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
Indexagent_DocBase1.log:2018-06-12 12:01:00,754 WARN PrepWorkItem [full_text_server_01_9260_IndexAgent-full_text_server_01.dbi-services.com-1-full_text_server_01.dbi-services.com-StatusUpdater][DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa9a message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
Indexagent_DocBase1.log:2018-06-12 12:01:01,038 WARN PrepWorkItem [full_text_server_01_9260_IndexAgent-full_text_server_01.dbi-services.com-1-full_text_server_01.dbi-services.com-StatusUpdater][DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa99 message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
Indexagent_DocBase1.log:2018-06-12 12:01:01,038 WARN PrepWorkItem [full_text_server_01_9260_IndexAgent-full_text_server_01.dbi-services.com-1-full_text_server_01.dbi-services.com-StatusUpdater][DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa9b message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
Indexagent_DocBase1.log:2018-06-12 12:01:01,038 WARN PrepWorkItem [full_text_server_01_9260_IndexAgent-full_text_server_01.dbi-services.com-1-full_text_server_01.dbi-services.com-StatusUpdater][DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa9d message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
Indexagent_DocBase1.log:2018-06-12 12:01:27,518 INFO ReindexBatch [Worker:Finalization Action:#6][DM_INDEX_AGENT_REINDEX_BATCH] Updating queue item 1b0f1234501327f0 with message= Incomplete batch. From a total of 45, 44 done, 0 filtered, 0 errors, and 1 warnings.
[xplore@full_text_server_01 logs]$

 

As you can see above, there is also one queue item (1b0f1234501327f0) listed because I kept everything that is 16 characters long and made of 0-9 or a-z. If you prefer, you can select only the r_object_id starting with 09 to get all dm_documents (using this: “[ (<]09[0-9a-z]{14}[>) ]” ) or you can just remove the r_object_id starting with 1b, which are the queue items.
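
As a minimal sketch of that narrower filter, the snippet below applies the “09” pattern to two shortened sample lines that stand in for real log content (the sample text and the only_documents function name are made up for illustration):

```shell
# Two shortened sample lines standing in for the IndexAgent log (illustrative only)
sample_lines='WARN Received warn callback: id: 090f12345007f40e message: DOCUMENT_WARNING
INFO Updating queue item 1b0f1234501327f0 with message= Incomplete batch.'

# Keep only the lines containing a 16-character ID that starts with 09
only_documents() {
  grep -E "[ (<]09[0-9a-z]{14}[>) ]"
}

printf '%s\n' "${sample_lines}" | only_documents
```

Only the first sample line is kept, since the queue item ID starts with 1b.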

In the above example, all the results are in the timeframe I expected them to be but it is possible that there are older or newer warnings/errors so you might want to apply another filter with the date. Since I want everything from 11:55 to 12:05 on the 12-Jun-2018, this is how I can do it (and removing the log file name too) using a time regex:

[xplore@full_text_server_01 logs]$ time_regex="2018-06-12 11:5[5-9]|2018-06-12 12:0[0-5]"
[xplore@full_text_server_01 logs]$ echo; egrep -i "err|warn" Indexagent_*.log* \
                                   | sed 's,^[^:]*:,,' \
                                   | egrep "${time_regex}" \
                                   | egrep --color "[ (<][0-9a-z]{16}[>) ]"

2018-06-12 11:55:26,456 WARN PrepWorkItem [full_text_server_01_9200_IndexAgent-full_text_server_01.dbi-services.com-1-full_text_server_01.dbi-services.com-StatusUpdater][DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f12345007f40e message: DOCUMENT_WARNING CPS Warning [Corrupt file].
2018-06-12 12:01:00,752 WARN PrepWorkItem [full_text_server_01_9260_IndexAgent-full_text_server_01.dbi-services.com-1-full_text_server_01.dbi-services.com-StatusUpdater][DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa97 message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
2018-06-12 12:01:00,752 WARN PrepWorkItem [full_text_server_01_9260_IndexAgent-full_text_server_01.dbi-services.com-1-full_text_server_01.dbi-services.com-StatusUpdater][DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa98 message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
2018-06-12 12:01:00,754 WARN PrepWorkItem [full_text_server_01_9260_IndexAgent-full_text_server_01.dbi-services.com-1-full_text_server_01.dbi-services.com-StatusUpdater][DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aa9f6 message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
2018-06-12 12:01:00,754 WARN PrepWorkItem [full_text_server_01_9260_IndexAgent-full_text_server_01.dbi-services.com-1-full_text_server_01.dbi-services.com-StatusUpdater][DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa9a message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
2018-06-12 12:01:01,038 WARN PrepWorkItem [full_text_server_01_9260_IndexAgent-full_text_server_01.dbi-services.com-1-full_text_server_01.dbi-services.com-StatusUpdater][DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa99 message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
2018-06-12 12:01:01,038 WARN PrepWorkItem [full_text_server_01_9260_IndexAgent-full_text_server_01.dbi-services.com-1-full_text_server_01.dbi-services.com-StatusUpdater][DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa9b message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
2018-06-12 12:01:01,038 WARN PrepWorkItem [full_text_server_01_9260_IndexAgent-full_text_server_01.dbi-services.com-1-full_text_server_01.dbi-services.com-StatusUpdater][DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa9d message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
2018-06-12 12:01:27,518 INFO ReindexBatch [Worker:Finalization Action:#6][DM_INDEX_AGENT_REINDEX_BATCH] Updating queue item 1b0f1234501327f0 with message= Incomplete batch. From a total of 45, 44 done, 0 filtered, 0 errors, and 1 warnings.
[xplore@full_text_server_01 logs]$
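
Writing the time regex by hand gets tedious for longer windows. As a possible alternative (a sketch, not what I used above), the timestamps can be compared as plain strings with awk, since the “YYYY-MM-DD HH:MM” format sorts chronologically. The in_window function name and the sample lines are made up for illustration, and the sketch assumes the log file name prefix has already been removed:

```shell
# Keep lines whose leading "YYYY-MM-DD HH:MM" timestamp is within [start, end].
# Assumes each line starts with the timestamp (file name prefix already removed).
in_window() {
  awk -v start="$1" -v end="$2" '{ ts = substr($0, 1, 16) } ts >= start && ts <= end'
}

# Illustrative sample lines, not real log content
printf '%s\n' \
  '2018-06-12 11:54:59,000 WARN too early' \
  '2018-06-12 11:55:26,456 WARN inside the window' \
  '2018-06-12 12:05:59,999 WARN still inside' \
  '2018-06-12 12:06:00,000 WARN too late' \
  | in_window "2018-06-12 11:55" "2018-06-12 12:05"
```

The comparison is done at the minute level, so everything up to 12:05:59 is kept, like with the time regex above.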

 

Listing only the messages for each of these warnings/errors:

[xplore@full_text_server_01 logs]$ echo; egrep -i "err|warn" Indexagent_*.log* \
                                   | sed 's,^[^:]*:,,' \
                                   | egrep "${time_regex}" \
                                   | egrep "[ (<][0-9a-z]{16}[>) ]" \
                                   | sed 's,^[^]]*],,' \
                                   | sort -u

[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f12345007f40e message: DOCUMENT_WARNING CPS Warning [Corrupt file].
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aa9f6 message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa97 message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa98 message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa99 message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa9a message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa9b message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa9d message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
[DM_INDEX_AGENT_REINDEX_BATCH] Updating queue item 1b0f1234501327f0 with message= Incomplete batch. From a total of 45, 44 done, 0 filtered, 0 errors, and 1 warnings.
[xplore@full_text_server_01 logs]$

 

Listing only the r_object_id (to resubmit them via the ids.txt for example):

[xplore@full_text_server_01 logs]$ echo; egrep -i "err|warn" Indexagent_*.log* \
                                   | sed 's,^[^:]*:,,' \
                                   | egrep "${time_regex}" \
                                   | egrep "[ (<][0-9a-z]{16}[>) ]" \
                                   | sed 's,.*[ (<]\([0-9a-z]\{16\}\)[>) ].*,\1,' \
                                   | sort -u \
                                   | grep -v "^1b"

090f12345007f40e
090f1234500aa9f6
090f1234500aaa97
090f1234500aaa98
090f1234500aaa99
090f1234500aaa9a
090f1234500aaa9b
090f1234500aaa9d
[xplore@full_text_server_01 logs]$

 

If you want to generate the iapi commands to resubmit them all:

[xplore@full_text_server_01 logs]$ echo; egrep -i "err|warn" Indexagent_*.log* \
                                   | sed 's,^[^:]*:,,' \
                                   | egrep "${time_regex}" \
                                   | egrep "[ (<][0-9a-z]{16}[>) ]" \
                                   | sed 's,.*[ (<]\([0-9a-z]\{16\}\)[>) ].*,\1,' \
                                   | sort -u \
                                   | grep -v "^1b" \
                                   | sed 's/.*/queue,c,&,dm_fulltext_index_user/'

queue,c,090f12345007f40e,dm_fulltext_index_user
queue,c,090f1234500aa9f6,dm_fulltext_index_user
queue,c,090f1234500aaa97,dm_fulltext_index_user
queue,c,090f1234500aaa98,dm_fulltext_index_user
queue,c,090f1234500aaa99,dm_fulltext_index_user
queue,c,090f1234500aaa9a,dm_fulltext_index_user
queue,c,090f1234500aaa9b,dm_fulltext_index_user
queue,c,090f1234500aaa9d,dm_fulltext_index_user
[xplore@full_text_server_01 logs]$

 

Finally, to group the warnings/errors per types:

[xplore@full_text_server_01 logs]$ echo; IFS=$'\n'; \
                                   for type in `egrep -i "err|warn" Indexagent_*.log* \
                                     | sed 's,^[^:]*:,,' \
                                     | egrep "${time_regex}" \
                                     | egrep "[ (<][0-9a-z]{16}[>) ]" \
                                     | sed 's,^[^]]*],,' \
                                     | sort -u \
                                     | sed 's,.*\(\[[^\[]*\]\).*,\1,' \
                                     | sort -u`;
                                   do
                                     echo "  --  Listing warnings/errors with the following messages: ${type}";
                                     egrep -i "err|warn" Indexagent_*.log* \
                                       | sed 's,^[^:]*:,,' \
                                       | egrep "${time_regex}" \
                                       | egrep "[ (<][0-9a-z]{16}[>) ]" \
                                       | sed 's,^[^]]*],,' \
                                       | sort -u \
                                       | grep -F "${type}";
                                     echo;
                                   done

  --  Listing warnings/errors with the following messages: [Corrupt file]
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f12345007f40e message: DOCUMENT_WARNING CPS Warning [Corrupt file].

  --  Listing warnings/errors with the following messages: [DM_INDEX_AGENT_REINDEX_BATCH]
[DM_INDEX_AGENT_REINDEX_BATCH] Updating queue item 1b0f1234501327f0 with message= Incomplete batch. From a total of 45, 44 done, 0 filtered, 0 errors, and 1 warnings.

  --  Listing warnings/errors with the following messages: [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file]
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aa9f6 message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa97 message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa98 message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa99 message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa9a message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa9b message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa9d message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].

[xplore@full_text_server_01 logs]$
[xplore@full_text_server_01 logs]$ # Or to shorten a little bit the loop command:
[xplore@full_text_server_01 logs]$
[xplore@full_text_server_01 logs]$ command='egrep -i "err|warn" Indexagent_*.log* | sed "s,^[^:]*:,,"
                                   | egrep "${time_regex}"
                                   | egrep "[ (<][0-9a-z]{16}[>) ]"
                                   | sed "s,^[^]]*],,"
                                   | sort -u'
[xplore@full_text_server_01 logs]$
[xplore@full_text_server_01 logs]$ echo; IFS=$'\n'; \
                                   for type in `eval ${command} \
                                     | sed 's,.*\(\[[^\[]*\]\).*,\1,' \
                                     | sort -u`;
                                   do
                                     echo "  --  Listing warnings/errors with the following messages: ${type}";
                                     eval ${command} \
                                       | grep -F "${type}";
                                     echo;
                                   done

  --  Listing warnings/errors with the following messages: [Corrupt file]
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f12345007f40e message: DOCUMENT_WARNING CPS Warning [Corrupt file].

  --  Listing warnings/errors with the following messages: [DM_INDEX_AGENT_REINDEX_BATCH]
[DM_INDEX_AGENT_REINDEX_BATCH] Updating queue item 1b0f1234501327f0 with message= Incomplete batch. From a total of 45, 44 done, 0 filtered, 0 errors, and 1 warnings.

  --  Listing warnings/errors with the following messages: [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file]
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aa9f6 message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa97 message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa98 message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa99 message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa9a message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa9b message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].
[DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa9d message: DOCUMENT_WARNING CPS Warning [MIME-Type (application/vnd.openxmlformats-officedocument.wordprocessingml.), Unknown file format or corrupt file].

[xplore@full_text_server_01 logs]$

 

So the above was a very simple example where a full re-index took only a few minutes because it is a very small repository. But what about a full re-index that takes days because there are several million documents? Well, the truth is that checking the logs might actually surprise you because it is usually more accurate than checking the Dsearch Admin. Yes, I said more accurate!

 

3. Accuracy of the Dsearch Admin vs the Logs

Let’s take another example with a repository containing a few TB of documents. A full re-index took 2.5 days to complete and in the commands below, I will check the status of the indexing for the 1st day: from 2018-09-19 07:00:00 UTC to 2018-09-20 06:59:59 UTC. Here is what the Dsearch Admin is giving you:

IA4

So based on this, you would expect 1 230 + 63 + 51 = 1 344 warnings/errors. So what about the logs then? I included below the DM_INDEX_AGENT_REINDEX_BATCH lines, which are the “1b” object_id (item_id) I was talking about earlier, but these aren’t documents being indexed, they are just batches:

[xplore@full_text_server_01 logs]$ time_regex="2018-09-19 0[7-9]|2018-09-19 [1-2][0-9]|2018-09-20 0[0-6]"
[xplore@full_text_server_01 logs]$ command='egrep -i "err|warn" Indexagent_*.log* | sed "s,^[^:]*:,,"
                                   | egrep "${time_regex}"
                                   | egrep "[ (<][0-9a-z]{16}[>) ]"
                                   | sed "s,^[^]]*],,"
                                   | sort -u'
[xplore@full_text_server_01 logs]$
[xplore@full_text_server_01 logs]$ echo; IFS=$'\n'; \
                                   for type in `eval ${command} \
                                     | sed 's,.*\(\[[^\[]*\]\).*,\1,' \
                                     | sort -u`;
                                   do
                                     echo "  --  Number of warnings/errors with the following messages: ${type}";
                                     eval ${command} \
                                       | grep -F "${type}" \
                                       | wc -l;
                                     echo;
                                   done

  --  Number of warnings/errors with the following messages: [Corrupt file]
51

  --  Number of warnings/errors with the following messages: [DM_INDEX_AGENT_REINDEX_BATCH]
293

  --  Number of warnings/errors with the following messages: [DM_STORAGE_E_BAD_TICKET]
7

  --  Number of warnings/errors with the following messages: [Password-protected or encrypted file]
63

  --  Number of warnings/errors with the following messages: [Unknown error during text extraction]
5

  --  Number of warnings/errors with the following messages: [Unknown error during text extraction(native code: 18, native msg: unknown error)]
1

  --  Number of warnings/errors with the following messages: [Unknown error during text extraction(native code: 257, native msg: handle is invalid)]
1053

  --  Number of warnings/errors with the following messages: [Unknown error during text extraction(native code: 30, native msg: out of memory)]
14

  --  Number of warnings/errors with the following messages: [Unknown error during text extraction(native code: 65534, native msg: unknown error)]
157

[xplore@full_text_server_01 logs]$

 

As you can see above, there is more granularity regarding the types of errors from the logs. Here are some key points in the comparison between the logs and the Dsearch Admin:

  1. In the Dsearch Admin, all messages that start with “Unknown error during text extraction” are considered as a single error type (N° 1023). Therefore, from the logs, you can add them all up: 5 + 1 + 1 053 + 14 + 157 = 1 230, which is the same number as in the Dsearch Admin. You cannot separate them in the Error Summary report of the Dsearch Admin; only the Error Detail report shows the full message, which then lets you separate them, kind of…
  2. You find the same number of “Password-protected or encrypted file” (63) and “Corrupt file” (51) warnings in the logs and in the Dsearch Admin, so no difference here
  3. You can see 7 “DM_STORAGE_E_BAD_TICKET” warnings/errors in the logs but none in the Dsearch Admin… Why is that? That’s because the Dsearch Admin does not have any Error Code for that, so these errors aren’t shown!
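
The arithmetic behind this comparison can be checked quickly in the shell. The counts are the ones from the log output above (the 293 batch lines are left out since they aren’t documents); the variable names are only there for readability:

```shell
# Counts taken from the log output above (batch lines excluded)
unknown_extraction=$((5 + 1 + 1053 + 14 + 157))
password_protected=63
corrupt_file=51
bad_ticket=7

dsearch_total=$((unknown_extraction + password_protected + corrupt_file))
logs_total=$((dsearch_total + bad_ticket))

echo "Unknown error during text extraction: ${unknown_extraction}"
echo "Visible in the Dsearch Admin: ${dsearch_total}"
echo "Found in the logs: ${logs_total}"
```

This gives 1230 for the text extraction errors, 1344 for the Dsearch Admin total and 1351 for the logs, the difference being the 7 DM_STORAGE_E_BAD_TICKET entries.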

So like I was saying at the beginning of this blog, using the Dsearch Admin is very easy but it is not fun and you might actually miss some information, while checking the logs is fun and you can be sure that you won’t miss anything (these 7 DM_STORAGE_E_BAD_TICKET errors for example)!

 

You could just as easily do the same thing in Perl or using awk, that’s just a question of preference… Anyway, you get the idea: working with the logs allows you to do pretty much what you want, but you obviously need some Linux/scripting knowledge, while working with the Dsearch Admin is simple and easy, but you have to work with what OpenText gives you and with its restrictions.
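
To give an idea of the awk route, here is a minimal sketch that counts the warnings/errors per message type in a single pass instead of looping over the types. It is only an illustration under a few assumptions: the count_by_type function name and the shortened sample lines are made up, and the message type is taken to be the last [...] group of each line, as in the loop shown earlier.

```shell
# Count warnings/errors per message type (the last [...] group of each line).
# Reads log lines on stdin; only lines containing a 16-character ID are counted.
count_by_type() {
  grep -Ei "err|warn" \
    | grep -E "[ (<][0-9a-z]{16}[>) ]" \
    | awk '
        {
          # Find the position of the last "[" on the line
          start = 0
          for (i = length($0); i > 0; i--) {
            if (substr($0, i, 1) == "[") { start = i; break }
          }
          if (start == 0) next
          rest = substr($0, start)
          stop = index(rest, "]")
          if (stop == 0) next
          count[substr(rest, 1, stop)]++
        }
        END { for (type in count) printf "%d %s\n", count[type], type }
      ' \
    | sort -k2
}

# Shortened, made-up sample lines standing in for the IndexAgent log
sample_log='2018-06-12 11:55:26,456 WARN PrepWorkItem [thread-1][DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f12345007f40e message: DOCUMENT_WARNING CPS Warning [Corrupt file].
2018-06-12 12:01:00,752 WARN PrepWorkItem [thread-1][DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa97 message: DOCUMENT_WARNING CPS Warning [Unknown file format or corrupt file].
2018-06-12 12:01:00,752 WARN PrepWorkItem [thread-1][DM_INDEX_AGENT_RECEIVED_FT_CALLBACK_WARN] Received warn callback: id: 090f1234500aaa98 message: DOCUMENT_WARNING CPS Warning [Unknown file format or corrupt file].
2018-06-12 12:01:02,000 INFO something unrelated'

printf '%s\n' "${sample_log}" | count_by_type
```

On the real logs, you would feed it the output of the file name removal and time filtering steps instead of the sample lines.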

 

 
