Infrastructure at your Service

Nicolas Penot

Understand Oracle Text at a glance

What is Oracle Text?

Oracle Text provides indexing, word and theme searching, and viewing capabilities for text in query applications and document classification applications.

Oracle text activation for a user

create user ORATXT identified by oratxt ;
grant ctxapp to ORATXT ;
grant execute on ctxsys.ctx_cls to ORATXT ;
grant execute on ctxsys.ctx_ddl to ORATXT ;
grant execute on ctxsys.ctx_doc to ORATXT ;
grant execute on ctxsys.ctx_output to ORATXT ;
grant execute on ctxsys.ctx_query to ORATXT ;
grant execute on ctxsys.ctx_report to ORATXT ;
grant execute on ctxsys.ctx_thes to ORATXT ;
grant execute on ctxsys.ctx_ulexer to ORATXT ;

Oracle Text configuration and usage

To design an Oracle Text application, first determine the type of queries you expect to run. This enables you to choose the most suitable index for the task. There are 4 use cases with Oracle Text:

  1. Document Collection Applications
    • The collection is typically static with no significant change in content after the initial indexing run. Documents can be of any size and of different formats, such as HTML, PDF, or Microsoft Word. These documents are stored in a document table. Searching is enabled by first indexing the document collection.
    • Queries usually consist of words or phrases. Application users can specify logical combinations of words and phrases using operators such as OR and AND. Other query operations can be used to improve the search results, such as stemming, proximity searching, and wildcarding.
    • An important factor for this type of application is retrieving documents relevant to a query while retrieving as few non-relevant documents as possible. The most relevant documents must be ranked high in the result list.
    • The queries for this type of application are best served with a CONTEXT index on your document table. To query this index, the application uses the SQL CONTAINS operator in the WHERE clause of a SELECT statement.
    • Example of searching
    • SQL> select score(1), doc_id, html_content from docs where contains(html_content, 'dbi', 1) > 0;
       
      SCORE(1) ID HTML_CONTENT
      ---------- ---------- -----------------------------------------------------------
      4 1 <HTML>dbi services provide various IT services</HTML>
      4 9 <HTML>You can become expert with dbi services</HTML>
      4 3 <HTML>The compaany dbi services is in Switzerland.</HTML>

  2. Catalog Information Applications
    • The stored catalog information consists of text information, such as book titles, and related structured information, such as price. The information is usually updated regularly to keep the online catalog up to date with the inventory.
    • Queries are usually a combination of a text component and a structured component. Results are almost always sorted by a structured component, such as date or price. Good response time is always an important factor with this type of query application.
    • Catalog applications are best served by a CTXCAT index. Query this index with the CATSEARCH operator in the WHERE clause of a SELECT statement.
    • Example of searching
    • SQL> select product, price from auction where catsearch(title, 'IT', 'order by price')> 0;
       
      PRODUCT PRICE
      ----------------------------------- ----------
      IT Advice 1 hour 499
      Course IT management 3999
      License IT monitoring 199
      IT desk 810

  3. Document Classification Applications
    • In a document classification application, an incoming stream or a set of documents is compared to a pre-defined set of rules. When a document matches one or more rules, the application performs some action. For example, assume there is an incoming stream of news articles. You can define a rule to represent the category of Finance. The rule is essentially one or more queries that select document about the subject of Finance. The rule might have the form ‘stocks or bonds or earnings’.
    • When a document arrives about a Wall Street earnings forecast and satisfies the rules for this category, the application takes an action, such as tagging the document as Finance or e-mailing one or more users.
    • To create a document classification application, create a table of rules and then create a CTXRULE index. To classify an incoming stream of text, use the MATCHES operator in the WHERE clause of a SELECT statement. See Figure 1-5 for the general flow of a classification application.
    • SQL> select category_id, category_name
      from categories
      where matches(blog_string, 'Dbi services add value to your IT infrastructure by providing experts in different technologies to cover all your needs.');
       
      QUERY_ID QUERY_STRING
      ---------- -----------------------------------
      9 Expertise
      2 Advertisement
      6 IT Services

  4. XML Search Applications
    • An XML search application performs searches over XML documents. A regular document search usually searches across a set of documents to return documents that satisfy a text predicate; an XML search often uses the structure of the XML document to restrict the search.
      Typically, only that part of the document that satisfies the search is returned. For example, instead of finding all purchase orders that contain the word electric, the user might need only purchase orders in which the comment field contains electric.

In conclusion, there is various uses cases for which Oracle Text will help you with text indexation. Before implementing, verify which will best suit your need. Also, it may be interesting to compare with an external text indexer like Solr which is also able to index your database via a JDBC driver.

I hope it may help and please do not hesitate to contact us if you have any questions or require further information.

Leave a Reply

Nicolas Penot
Nicolas Penot