In a previous blog post, we described how to install Hitachi Content Intelligence (HCI), Hitachi Vantara's solution for data indexing and search. In this post, we will see how to use HCI to perform a basic search for personally identifiable information (PII).
HCI lets you connect to multiple data sources using its default data connectors. The first step is to create a data connection. By default, multiple data connectors are available.
For our example, we will simply use the Local File System as the data repository. Note that the directory must be within the HCI install directory.
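Before pointing the connector at a folder, it can help to confirm that the folder really sits under the install directory. The following is a minimal sketch in Python, not part of HCI itself; the function name `is_within` and both paths are hypothetical and should be replaced with the locations used in your own installation.

```python
from pathlib import Path


def is_within(directory, install_dir):
    """Return True if directory is install_dir itself or nested inside it."""
    # resolve() normalizes ".." segments and symlinks before comparing.
    directory = Path(directory).resolve()
    install_dir = Path(install_dir).resolve()
    try:
        # relative_to() raises ValueError when directory is outside install_dir.
        directory.relative_to(install_dir)
        return True
    except ValueError:
        return False


# Hypothetical paths, used only for illustration.
print(is_within("/opt/hci/data/pii-demo", "/opt/hci"))
```

Resolving both paths first means a path such as `/opt/hci/../etc` is treated as outside the install directory rather than inside it.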
Below is the data connection configuration for our PII demo.
After entering the information, click on Test, then click on Create.
A new data connection will appear in your dashboard.
After creating the data connection, we will build a processing pipeline for our PII example.
Click on Processing Pipelines > Create a Pipeline. Enter a name for your pipeline (optionally a description) and click on Create.
Click on Add Stages and build your desired pipeline. For the PII search, we will use the following pipeline.
After building your pipeline, you can test it by clicking on the Test Pipeline button at the top right of your page.
Next, create an index collection to specify how you want to index your data set.
First, click on Create Index inside the Index Collections button. Create an HCI Index and use the schemaless option.
Then create content classes to extract the desired information from your data set. For our PII example, we will create three content classes: one for American Express credit cards, one for Visa credit cards, and one for Social Security numbers.
For American Express credit cards, you should add the following pattern.
Pattern for Visa credit cards.
Pattern for Social Security numbers.
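To get a feel for what such patterns match before running them inside HCI, you can try them on a sample string first. The sketch below uses Python regular expressions based on commonly cited formats (American Express: 15 digits starting with 34 or 37; Visa: 16 digits starting with 4; Social Security number: three, two, and four digits separated by hyphens). These expressions are assumptions made for illustration and are not necessarily the patterns configured in the content classes above; the names `PII_PATTERNS` and `find_pii` are hypothetical as well.

```python
import re

# Assumed patterns based on commonly cited formats; not taken from HCI.
PII_PATTERNS = {
    # 15 digits starting with 34 or 37, optionally grouped 4-6-5.
    "amex": re.compile(r"\b3[47]\d{2}[ -]?\d{6}[ -]?\d{5}\b"),
    # 16 digits starting with 4, optionally grouped 4-4-4-4.
    "visa": re.compile(r"\b4\d{3}(?:[ -]?\d{4}){3}\b"),
    # Three, two, and four digits separated by hyphens.
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text):
    """Return a dict mapping each pattern name to the list of matches in text."""
    return {name: pattern.findall(text) for name, pattern in PII_PATTERNS.items()}


# Well-known test values, not real personal data.
sample = "Amex 3782 822463 10005, Visa 4111-1111-1111-1111, SSN 078-05-1120."
print(find_pii(sample))
```

Patterns like these only check the shape of a number, so expect some false positives on real data; a stricter setup would add a checksum validation for card numbers.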
Start your workflow
When all steps are completed, you can start your workflow and wait until it finishes.
Use the HCI Search application to visualize the results.
Select your index name in the search field, and navigate through the results.
You can also display the results in charts and graphs.