David Diab

Documentum – FT – XQuery is very useful

At a customer site, we ran into issues with FT search after a document migration. Some documents were not indexed correctly, or were only partially indexed, and we needed to know which documents were affected so that we could resubmit them for indexing. The FTIntegrity tool could not give us that result, so the only option was an XQuery search.

However, this post is not only about solving that issue. My goal is also to share how useful XQuery was in solving it, and what I learned from using XQuery for the first time.

XQuery Introduction

First of all, what is XQuery?

  • XQuery is a language for finding and extracting elements and attributes from XML documents
  • XQuery for XML is like SQL for databases
  • XQuery is built on XPath expressions

XQuery expressions (the usual ones, known together as FLWOR):

  • For : selects a sequence of nodes
  • Let : binds a sequence to a variable
  • Where : filters the nodes
  • Order by : sorts the nodes
  • Return : what to return (gets evaluated once for every node)

As an example, take the books.xml file below:

<?xml version="1.0" encoding="UTF-8"?>

<bookstore>

	<book category="CATEGORY 1">
		<title lang="en">PRODUCT 2</title>
		<author>AUTHOR 1</author>
		<year>2017</year>
		<price>100</price>
	</book>
	
	<book category="CATEGORY 2">
		<title lang="en">PRODUCT 1</title>
		<author>AUTHOR 1</author>
		<author>AUTHOR 2</author>
		<year>2018</year>
		<price>200</price>
	</book>
	
	<book category="CATEGORY 2">
		<title lang="en">PRODUCT 3</title>
		<author>AUTHOR 3</author>
		<year>2015</year>
		<price>20</price>
	</book>

</bookstore> 

Example 1:
Query :

for $x in doc("books.xml")/bookstore/book
where $x/price>30
return $x/title 

Result:

<title lang="en">PRODUCT 2</title>
<title lang="en">PRODUCT 1</title>

Example 2:
Query :

 
<ul>
{
for $x in doc("books.xml")/bookstore/book/title
order by $x
return <li>{data($x)}</li>
}
</ul>

Result:

<ul>
<li>PRODUCT 1</li>
<li>PRODUCT 2</li>
<li>PRODUCT 3</li>
</ul> 
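
For readers who are more at home in a general-purpose language, the two examples above can be cross-checked with a minimal Python sketch. It uses only the standard library and an inline, trimmed copy of books.xml (title and price only), so the variable names and the trimmed data are illustrative assumptions, not part of the original queries.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the books.xml sample: only title and price are kept.
BOOKS_XML = """
<bookstore>
  <book category="CATEGORY 1"><title lang="en">PRODUCT 2</title><price>100</price></book>
  <book category="CATEGORY 2"><title lang="en">PRODUCT 1</title><price>200</price></book>
  <book category="CATEGORY 2"><title lang="en">PRODUCT 3</title><price>20</price></book>
</bookstore>
"""

root = ET.fromstring(BOOKS_XML)

# Example 1: for $x in .../book where $x/price > 30 return $x/title
expensive_titles = [
    book.findtext("title")
    for book in root.findall("book")            # "for" clause
    if float(book.findtext("price")) > 30       # "where" clause
]

# Example 2: for $x in .../book/title order by $x return <li>{data($x)}</li>
sorted_titles = sorted(t.text for t in root.iter("title"))  # "for" + "order by"
ul = ET.Element("ul")
for text in sorted_titles:
    ET.SubElement(ul, "li").text = text                     # "return" clause
html_list = ET.tostring(ul, encoding="unicode")

print(expensive_titles)
print(html_list)
```

The list comprehension plays the role of for/where/return, and sorted() stands in for order by.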

How did XQuery solve my issue?

Now, let’s come back to the Documentum full-text world. In the Documentum xPlore Administrator, under “Diagnostic and Utilities” -> “Test Search”, you can search by keyword or execute an XQuery:

The default query is shown below; its meaning is explained after it:

let $j:= for $i score $s in /dmftdoc[. ftcontains 'TESTFT' with stemming using stop words default] order by $s descending 
return <b> {$i/dmftmetadata//r_object_id}  { $i/dmftmetadata//object_name } { $i/dmftmetadata//r_modifier } </b>
return subsequence($j,1,200)
  • $i is bound in turn to each dmftdoc entry containing the word “TESTFT”.
  • $s is the relevance score of that entry; the entries are sorted by descending score.
  • For each $i, the query returns the r_object_id, object_name, and r_modifier, wrapped in a <b> element.
  • The returned elements are stored in $j.
  • subsequence is an XQuery function that takes three arguments here:
      – $j: the source sequence, which can contain zero or more items.
      – 1: the position of the first item to return. Positions start at 1.
      – 200: the maximum number of items to return.
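
Because subsequence counts positions from 1 while most programming languages count from 0, the behaviour is easy to misread. The following Python sketch mimics it for integer arguments only; the function and the sample list are illustrative, and the rounding rules that XQuery applies to non-integer arguments are deliberately left out.

```python
def subsequence(items, start, length):
    """Mimic XQuery's subsequence() for integer arguments (1-based positions)."""
    first = max(start, 1)      # positions below 1 do not exist in a sequence
    end = start + length       # exclusive upper bound, still in 1-based positions
    if end <= first:
        return []
    return items[first - 1:end - 1]


hits = ["doc%d" % n for n in range(1, 6)]   # doc1 .. doc5

print(subsequence(hits, 1, 2))     # the first two items
print(subsequence(hits, 4, 200))   # a length beyond the end is not an error
```

Asking for 200 items from a shorter sequence simply returns what is available, which is why the limit in the query acts as a cap on the output.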

Starting from this default query you can write your own, but how do you find out the XML structure?
I simply run any search that returns results, then click on any result line:

The dmftdoc for this document will appear, and you can take it as a base to build your query.

For example, you can select only hidden documents (see the hidden-documents query under “Other useful XQueries”).

To solve my issue and select only documents partially indexed, I executed the below query:

let $j:= for $i score $s in /dmftdoc[.] order by $s descending 
where not(exists($i/dmftdsearchinternals/dmftsummarytokens_0))
return <d> {$i/dmftmetadata//r_object_id} { $i/dmftmetadata//object_name } { $i/dmftmetadata//a_content_type } </d>
return subsequence($j,1,1500)

The condition here is no longer on the content of the document: I am looking for all documents that do not have a dmftsummarytokens_0 element. For each of them, the r_object_id, object_name, and a_content_type are listed. I also limited the output to 1500 documents so that it stays easy to handle afterwards.
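
Once the list is returned, the object IDs usually have to be pulled out of it before the documents can be resubmitted. Below is a minimal Python sketch of that step. It assumes the <d> elements were copied from the result and wrapped by hand in a single root element (called <results> here); the sample IDs and names are made up, and your actual output may carry extra attributes or elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two <d> results wrapped in a hand-added <results> root.
SAMPLE = """
<results>
  <d><r_object_id>0900000180001234</r_object_id><object_name>report.pdf</object_name><a_content_type>pdf</a_content_type></d>
  <d><r_object_id>0900000180005678</r_object_id><object_name>notes.docx</object_name><a_content_type>msw12</a_content_type></d>
</results>
"""


def extract_object_ids(xml_text):
    """Return the r_object_id of every <d> element, in order, without duplicates."""
    root = ET.fromstring(xml_text)
    seen, ids = set(), []
    for d in root.iter("d"):
        object_id = (d.findtext("r_object_id") or "").strip()
        if object_id and object_id not in seen:
            seen.add(object_id)
            ids.append(object_id)
    return ids


object_ids = extract_object_ids(SAMPLE)
print("\n".join(object_ids))   # one ID per line
```

A one-ID-per-line list like this is a convenient starting point for whichever resubmission mechanism you use.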

Other useful XQueries:
– List documents indexed as hidden:

let $j:= for $i score $s in /dmftdoc[.] order by $s descending 
where (($i/dmftmetadata//a_is_hidden)="true")
return <d> {$i/dmftmetadata//r_object_id} { $i/dmftmetadata//object_name } { $i/dmftmetadata//a_content_type } </d>
return subsequence($j,1,150)

– List documents having content size equal to zero:

let $j:= for $i score $s in /dmftdoc[.] order by $s descending 
where (($i/dmftmetadata//r_content_size)=0)
return <d> {$i/dmftmetadata//r_object_id} { $i/dmftmetadata//object_name } { $i/dmftmetadata//a_content_type } </d>
return subsequence($j,1,150)
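
The three queries above share the same skeleton and differ only in the where condition and the limit. If you need many variants, a small helper can generate the query text to paste into Test Search. This is only a sketch: build_query and QUERY_TEMPLATE are hypothetical names, and the helper does nothing more than string substitution, so the condition must already be valid XQuery.

```python
# Skeleton shared by the queries above; doubled braces are literal braces.
QUERY_TEMPLATE = (
    "let $j:= for $i score $s in /dmftdoc[.] order by $s descending\n"
    "where {condition}\n"
    "return <d> {{$i/dmftmetadata//r_object_id}} {{ $i/dmftmetadata//object_name }} "
    "{{ $i/dmftmetadata//a_content_type }} </d>\n"
    "return subsequence($j,1,{limit})"
)


def build_query(condition, limit=150):
    """Fill the shared skeleton with a where condition and a result limit."""
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return QUERY_TEMPLATE.format(condition=condition, limit=limit)


hidden_query = build_query('(($i/dmftmetadata//a_is_hidden)="true")')
empty_query = build_query("(($i/dmftmetadata//r_content_size)=0)", limit=150)
print(hidden_query)
```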

XQuery search has helped me solve many issues. What about you? Have you already used it in this context?

David Diab

Senior Consultant