Friday, 29 June 2012

This article describes how to use Lucene.Net in an ASP.NET web application. I create a sample application that builds Lucene index documents from a SQL Server database, then searches and retrieves data from those index documents. I also show how to search for multiple terms across multiple Lucene.Net document fields.

What is Lucene.net?

Lucene.Net is a high-performance Information Retrieval (IR) library, that is, a search engine library. It is a .NET port of the Java Apache Lucene library. Lucene.Net provides powerful APIs for creating full-text indexes and for adding advanced, precise search capabilities to programs. Lucene.Net is not a ready-to-use application such as a web search or file search tool; it is a framework library.

Architecture of Lucene.Net

The diagram below shows the process of creating indexes and of searching and retrieving data from Lucene index documents.
 
Figure 1: Lucene architecture

Main parts of Lucene.Net

Store Directory – The Directory is where the index is stored.

Analyzer – The Analyzer is responsible for breaking the text down into single words or terms.

IndexWriter – The IndexWriter coordinates the Analyzer and passes the results to the Directory for storage.

Document – The Document is what gets indexed by the IndexWriter. You can think of a Document as an entity that you want to retrieve.

Field – A Document contains a list of Fields that describe it. Every field has a name and a value, and the value holds the text that you want to make searchable.

IndexSearcher – The IndexSearcher performs the actual search.

Search Term – A Term is the most basic construct for searching. A Term consists of two parts, the name of a field you wish to search, and the value of the field.

Search Query – A Query is built from one or more Terms and is executed by the IndexSearcher to produce the results.

Hits – Represents the list of documents returned by a search. A Hits object can be iterated over, and it retrieves the matching documents from the index. (Hits is deprecated in Lucene.Net 2.9 and removed in 3.0, where searches return TopDocs instead.)
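The sketch below makes the relationship between these parts concrete with a deliberately simplified in-memory inverted index in plain C#. It is not the Lucene.Net API. It has no scoring, no on-disk Directory and no segment merging, and real analyzers such as StandardAnalyzer do more than this one (for example, stop-word removal). MiniIndex, Analyze, AddDocument and Search are hypothetical names chosen for illustration:

- Analyze plays the Analyzer.
- A dictionary of field name to text stands in for a Document and its Fields.
- A (field, text) pair is the Term.
- AddDocument and Search correspond to the IndexWriter and IndexSearcher steps.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical, simplified model of the parts listed above. This is NOT the
// Lucene.Net API: there is no scoring, no on-disk Directory and no merging.
public sealed class MiniIndex
{
    // Stored documents: each "Document" is a set of named "Fields" (field name -> text).
    private readonly List<Dictionary<string, string>> _docs = new List<Dictionary<string, string>>();

    // Inverted index: each (field, term) pair, i.e. a "Term", maps to the ids of
    // the documents that contain that term in that field.
    private readonly Dictionary<(string Field, string Text), SortedSet<int>> _postings =
        new Dictionary<(string Field, string Text), SortedSet<int>>();

    // "Analyzer": lowercase the text and split it on anything that is not a letter or digit.
    public static IEnumerable<string> Analyze(string text) =>
        Regex.Split(text.ToLowerInvariant(), @"[^\p{L}\p{Nd}]+").Where(t => t.Length > 0);

    // "IndexWriter.AddDocument": keep the stored fields and record every analyzed term.
    public int AddDocument(Dictionary<string, string> fields)
    {
        int id = _docs.Count;
        _docs.Add(fields);
        foreach (var field in fields)
        {
            foreach (string term in Analyze(field.Value))
            {
                var key = (field.Key, term);
                if (!_postings.TryGetValue(key, out var ids))
                {
                    ids = new SortedSet<int>();
                    _postings[key] = ids;
                }
                ids.Add(id);
            }
        }
        return id;
    }

    // "IndexSearcher" with a single "Term": return the stored documents that contain
    // the text (lowercased to match the analyzer) in the given field.
    public List<Dictionary<string, string>> Search(string field, string text)
    {
        if (!_postings.TryGetValue((field, text.ToLowerInvariant()), out var ids))
            return new List<Dictionary<string, string>>();
        return ids.Select(id => _docs[id]).ToList();
    }
}
```

Consider indexing one person with LastName "Smith" and another with "Smith-Jones". Searching LastName for "Smith" then returns both, because the analyzer split "Smith-Jones" into the terms "smith" and "jones".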

Follow these steps to create the Lucene.Net indexes and run a search:

Step: 1 Creating a database.

Here I create a database named dbLucene with three tables: Category, Designation and Person.

Category
 
Figure 2: Category table

Designation
 
Figure 3: Designation table

Person
 
Figure 4: Person table

Then insert a few records into the tables; these records are the data from which the index documents are created.

Step: 2 Create a web application.

Create a web application with two pages, Index.aspx and Search.aspx.
The Index page should contain one button (btnCreateIndex in the code below) that creates the Lucene.Net index documents from the database.
On the Search page, add three controls: a TextBox to enter the search keyword (TextBox1), a Button to run the search (btnSearch) and a GridView to display the search results (GridView1).

Step: 3 Add Lucene.Net reference

You can download the Lucene.Net DLL from the link below:


Add the reference…
 
Figure 5: Adding lucene reference to the project

Then import the namespaces in the code-behind:

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using Directory = Lucene.Net.Store.Directory;
using Version = Lucene.Net.Util.Version;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Documents;
using Lucene.Net.Analysis;
using System.Diagnostics;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

Step: 4 Creating lucene.net index documents

The following code demonstrates how to create the lucene index when you call CreatePersonsIndex method:

// Fetches the details of all persons, together with their designation and category
public DataSet GetPersons()
{
   String sqlQuery = @"SELECT dbo.Person.FirstName, dbo.Person.LastName, dbo.Designation.DesigName, dbo.Category.CategoryName, dbo.Person.Address
                    FROM dbo.Designation RIGHT OUTER JOIN
                      dbo.Person ON dbo.Designation.DesignationId = dbo.Person.DesignationId LEFT OUTER JOIN
                      dbo.Category ON dbo.Person.CategoryId = dbo.Category.CategoryId";
   return GetDataSet(sqlQuery);
}

// Runs the query and returns the results as a DataSet
public DataSet GetDataSet(string sqlQuery)
{
    DataSet ds = new DataSet();
    // Replace the connection string values with your own server and credentials
    using (SqlConnection sqlCon = new SqlConnection("Data Source=datasource;Database=dbLucene;User Id=user;Password=password"))
    using (SqlCommand sqlCmd = new SqlCommand(sqlQuery, sqlCon))
    using (SqlDataAdapter sqlAdap = new SqlDataAdapter(sqlCmd))
    {
        sqlCmd.CommandType = CommandType.Text;
        // Fill opens the connection if it is closed and closes it again afterwards
        sqlAdap.Fill(ds);
    }
    return ds;
}

// Creates the Lucene.Net index with person details
public void CreatePersonsIndex(DataSet ds)
{
    // Specify the index file location where the indexes are to be stored
    string indexFileLocation = @"D:\Lucene.Net\Data\Persons";
    // create = true: start a new, empty index (an existing index at this location is overwritten)
    Lucene.Net.Store.Directory dir = Lucene.Net.Store.FSDirectory.GetDirectory(indexFileLocation, true);
    // Use the same analyzer and version here as in the search code
    IndexWriter indexWriter = new IndexWriter(dir, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29), true);
    indexWriter.SetRAMBufferSizeMB(10.0);
    indexWriter.SetUseCompoundFile(false);
    indexWriter.SetMaxMergeDocs(10000);
    indexWriter.SetMergeFactor(100);

    try
    {
        if (ds.Tables.Count > 0)
        {
            DataTable dt = ds.Tables[0];
            foreach (DataRow dr in dt.Rows)
            {
                // Create one Document per database row
                Document doc = new Document();
                foreach (DataColumn dc in dt.Columns)
                {
                    // Add one Field per column: stored (so it can be displayed) and analyzed (so it is searchable)
                    doc.Add(new Field(dc.ColumnName, dr[dc.ColumnName].ToString(), Field.Store.YES, Field.Index.ANALYZED));
                }
                // Write the Document to the index
                indexWriter.AddDocument(doc);
            }
        }
    }
    finally
    {
        // Close the writer even if indexing fails: this flushes the index and releases the write lock
        indexWriter.Close();
    }
}

protected void btnCreateIndex_Click(object sender, EventArgs e)
{
    CreatePersonsIndex(GetPersons());
}

The Lucene.Net index files are now created in the index file location; if you open the folder, you can see them.
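The nested loops in CreatePersonsIndex map each DataRow to one Document and each DataColumn to one Field. The plain C# sketch below reproduces just that mapping, using a dictionary per row in place of a Lucene Document. It has no Lucene.Net reference, and RowMapper and ToDocuments are hypothetical names. It makes one detail visible: because the SQL query uses outer joins, DesigName or CategoryName can be NULL, and DBNull.Value.ToString() turns those values into empty strings rather than throwing.

```csharp
using System.Collections.Generic;
using System.Data;

// Hypothetical helper, not Lucene.Net: reproduces only the row-to-document mapping
// of CreatePersonsIndex, with a dictionary (field name -> text) standing in for a Document.
public static class RowMapper
{
    public static List<Dictionary<string, string>> ToDocuments(DataTable table)
    {
        var documents = new List<Dictionary<string, string>>();
        foreach (DataRow row in table.Rows)
        {
            // One document per row...
            var fields = new Dictionary<string, string>();
            foreach (DataColumn column in table.Columns)
            {
                // ...and one field per column. NULLs produced by the outer joins arrive
                // as DBNull.Value, whose ToString() is an empty string.
                fields[column.ColumnName] = row[column].ToString();
            }
            documents.Add(fields);
        }
        return documents;
    }
}
```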

Step: 5 Search and get the hits from the indexed documents 

We can implement the search functionality with the following code. When you enter search text in the text box and fire the search event, Lucene.Net searches the indexed documents using the IndexSearcher and returns hits for the records that match the Boolean query.
Here I implemented multi-field searching: the entered keyword is searched in several fields of the Lucene index document. Each word is treated as a prefix (so "Jo" also matches "John"). If the search text contains a "+", every word must match in at least one of the fields; otherwise a match on any word is enough.
The code below demonstrates the functionality; copy the methods and call the
SearchPersons method from the search button's click event.


    public void SearchPersons(string searchString)
    {
        // Results are collected as a List
        List<SearchResults> searchResults = new List<SearchResults>();

        // Specify the location where the index files are stored
        string indexFileLocation = @"D:\Lucene.Net\Data\Persons";
        Lucene.Net.Store.Directory dir = Lucene.Net.Store.FSDirectory.GetDirectory(indexFileLocation);
        // Specify the search fields; each word is searched in all of them
        string[] searchfields = new string[] { "FirstName", "LastName", "DesigName", "CategoryName" };
        IndexSearcher indexSearcher = new IndexSearcher(dir);
        try
        {
            // Build the Boolean query and get the hits
            var hits = indexSearcher.Search(QueryMaker(searchString, searchfields));

            for (int i = 0; i < hits.Length(); i++)
            {
                // Load each matching document once and copy its stored fields
                Document doc = hits.Doc(i);
                SearchResults result = new SearchResults();
                result.FirstName = doc.Get("FirstName");
                result.LastName = doc.Get("LastName");
                result.DesigName = doc.Get("DesigName");
                result.Address = doc.Get("Address");
                result.CategoryName = doc.Get("CategoryName");
                searchResults.Add(result);
            }
        }
        finally
        {
            // Release the index files even if the search throws
            indexSearcher.Close();
        }

        GridView1.DataSource = searchResults;
        GridView1.DataBind();
    }

    // Builds a Boolean query: one clause per word, each word searched in all the given fields
    public BooleanQuery QueryMaker(string searchString, string[] searchfields)
    {
        var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29, searchfields, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
        var finalQuery = new BooleanQuery();
        // Remove the characters used by this simple search syntax
        string searchText = searchString.Replace("+", "").Replace("\"", "").Replace("'", "");
        // A "+" anywhere in the input means every word must match; otherwise any word may match
        BooleanClause.Occur occur = searchString.Contains("+") ? BooleanClause.Occur.MUST : BooleanClause.Occur.SHOULD;
        // Split the search string into separate search terms by word
        string[] terms = searchText.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string term in terms)
        {
            // Drop user-typed wildcards; skip words that were only "*" (a bare "*" would fail to parse)
            string word = term.Replace("*", "");
            if (word.Length == 0)
                continue;
            // Escape query-syntax characters such as ':' or '(' so Parse does not throw,
            // then append '*' for a prefix search
            finalQuery.Add(parser.Parse(QueryParser.Escape(word) + "*"), occur);
        }
        return finalQuery;
    }

    // Creating an object to store the searched data
    public class SearchResults
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public string DesigName { get; set; }
        public string Address { get; set; }
        public string CategoryName { get; set; }
    }

    // Calls the search function when the search button is clicked
    protected void btnSearch_Click(object sender, EventArgs e)
    {
        SearchPersons(TextBox1.Text);
    }
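The plain C# sketch below, which has no Lucene.Net reference, shows what QueryMaker asks Lucene.Net for and which documents such a query matches. QuerySketch, Plan, Match and ClauseKind are hypothetical names. Plan mirrors QueryMaker's string handling and skips words that consist only of "*". Match models which documents a BooleanQuery with MUST or SHOULD clauses matches, given the document ids each clause matches on its own. In the real code, each clause is a multi-field query built by MultiFieldQueryParser, and scoring is not modelled here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not Lucene.Net: models what QueryMaker builds and which
// documents the resulting BooleanQuery matches (scoring is ignored).
public enum ClauseKind { Must, Should }

public static class QuerySketch
{
    // Mirrors QueryMaker's string handling: strip '+', '"' and '\'', split on spaces,
    // remove user-typed '*' and append one '*' (prefix search). Words that were only
    // '*' are skipped. A '+' anywhere in the input makes every clause MUST.
    public static List<(string Prefix, ClauseKind Kind)> Plan(string searchString)
    {
        string searchText = searchString.Replace("+", "").Replace("\"", "").Replace("'", "");
        ClauseKind kind = searchString.Contains("+") ? ClauseKind.Must : ClauseKind.Should;
        var clauses = new List<(string Prefix, ClauseKind Kind)>();
        foreach (string term in searchText.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries))
        {
            string word = term.Replace("*", "");
            if (word.Length > 0)
                clauses.Add((word + "*", kind));
        }
        return clauses;
    }

    // Boolean matching: if there is any MUST clause, a document must match every MUST
    // clause (SHOULD clauses would then only affect the score); with only SHOULD
    // clauses, matching one of them is enough. Each set holds the ids one clause matches.
    public static SortedSet<int> Match(IList<(ClauseKind Kind, ISet<int> Docs)> clauses)
    {
        SortedSet<int> must = null;
        var should = new SortedSet<int>();
        foreach (var (kind, docs) in clauses)
        {
            if (kind == ClauseKind.Must)
            {
                if (must == null) must = new SortedSet<int>(docs);
                else must.IntersectWith(docs);
            }
            else
            {
                should.UnionWith(docs);
            }
        }
        return must ?? should;
    }
}
```

For example, "+jo dev" yields two MUST clauses, "jo*" and "dev*", so only documents matching both prefixes are returned. By contrast, "jo dev" yields SHOULD clauses and returns documents matching either prefix.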

Summary

Searching data with Lucene.Net gives your application a fast, flexible full-text retrieval mechanism. Once you have used Lucene.Net, you will appreciate the features and flexibility it brings to the search process.
Hopefully the introduction and code samples above have whetted your appetite to learn more.

Hope this helps,

Sony.

9 comments:

  1. Very clearly explained. Thanks for sharing this!

  2. This was very instructive as to how to build indexes directly from MS SQL tables.
    However some of the syntax is no longer supported on versions later than LUCENE_29, example:
    Lucene.Net.Store.Directory dir = Lucene.Net.Store.FSDirectory.GetDirectory(indexFileLocation);

    Causes error (not just deprecated) on LUCENE_30.
    Had to figure out alternative in very badly documented environment.

    But thanks. It really helped.

  3. wow its working thanks for sharing this

  4. Very helpful....
    Thanks...

  5. Thanks for sharing
    Will you please share updating lucene.net index documents periodically
