Implementing SQL Server Full-Text Search In An ASP.NET MVC Web Application With Entity Framework

The search facility on this site has been rudimentary at best. I have recently upgraded it to take advantage of the Full-Text Search feature in SQL Server. Although it is still a basic implementation, it is a vast improvement on the search system that existed before, which was based on Entity Framework dynamically generating SQL LIKE clauses. This article describes how I got SQL Server Full-Text Search up and running.

In simple terms, Full-Text Search relies on the indexing of character data, making it a lot quicker to perform matches against the index than the scans required by the SQL statements that use the LIKE predicate. Full-Text searching is also a lot more flexible than using the LIKE predicate, which only supports the wildcard operator.

Creating a Full-Text Index

  1. The first step towards enabling Full-Text Search is to create a virtual storage location or catalog for the full-text index. This is achieved by expanding the Storage node under the database entry in SQL Server Management Studio:

    Full Text Search

    Right click and choose New Full-Text Catalog from the context menu.

  2. Give the catalog a unique name not exceeding 120 characters.

    Full Text Search

    I chose to specify that searches are accent-insensitive, which means that the full text search will see "ole" and "olé" as the same thing.

  3. Having created the catalog, refresh the storage node if needed, then right click to access the context menu and choose Properties:

    Full Text Search

  4. Select the table and columns that you would like to include in the index. Only tables that have a unique index can be included for indexing. Eligible columns include nvarchar, varchar and varbinary data types. In other words, you can index the content of files as well as text:

    Full Text Search

    I have left Track Changes at its default setting of "automatic". This means that the index will be refreshed whenever the content of the selected columns changes. In my case that's fine. I don't post new or updated content very often. You may need to implement dfferent change-tracking strategies based on the nature of your application. If your indexable content is constantly being updated, you may choose to defer the rebuilding of the index to a quiet time such as overnight. If you choose "manual", you must start or schedule this process manually to apply changes to the full-text index.

    Once you click OK, the process will start building the full text index.The process may take a while, depending on the nature and quantity of data you want to index.

Full Text Searching

Now that the index has been created, it can be used for searching. The two basic search operators are CONTAINS and FREETEXT. The second of these, FREETEXT, expands search terms to include a variety of noun and verb variations of the search term. So a FREETEXT search for "search" will also match "searching, searches, searched, searcher" and so on. This may not be what you want and it can return a lot of non-relevant results. The CONTAINS predicate returns only those results that include exact matches for the search terms. The complementary functions, FREETEXTTABLE and CONTAINSTABLE return a KEY column and a RANK column, which is particularly useful when attempting to establish relevance of results.

Full Text Searches can be enormously complex, consisting optionally of phrases, wildcards and boolean operators. SQL Server Full Text Search includes a list of "stop words", such as "and", "or", "in" etc in a number of languages. These are ignored in searches. Full Text Search also includes the NEAR operator, which when used with CONTAINSTABLE, helps to further define relevance. For example, a CONTAINSTABLE search for "cat" NEAR "dog" will return a RANK value based on how close "cat" is to "dog" in the searched content. The closer the two words, the higher the value of RANK. The further away, the lower the value. If the two terms are more than 50 words apart, the RANK value will be 0.

Requirement

The search facility offered by this site permits visitors to enter any number of words as a search phrase. The phrase is sanitised so that any punctuation or numbers are removed, leaving just words and spaces. Then the stop words are removed from the phase and the remaining words are used to construct a predicate dynamically, inserting the NEAR operator between them. Results are returned to the browser in order of relevance (RANK) in pages of 20 results each.

Entity Framework

As regular visitors might be aware, I use Entity Framework for data access on this site. However, EF doesn't provide any kind of support directly for Full Text Search. There are (at least) a couple of ways around this. One is to use command interception, introduced in EF 6, which enables you to modify the SQL generated by EF before it is executed against the database. Another, more straighforward option is to use stored procedures. That is the option I chose to use.

To start with, I needed a class to represent the data that the stored procedure returns so that Entity Framework can map the data directly to a C# object.

public class SearchResult
{
    public int ArticleId { get; set; }
    public string Headline { get; set; }
    public string Abstract { get; set; }
    public DateTime DateCreated { get; set; }
    public int Rank { get; set; }
    public string CategoryName { get; set; }
    public string ArticleTypeName { get; set; }
    public string Categories { get; set; }
    public int TotalRecords { get; set; }
}

The properties will be mapped automatically to the return values from the stored procedure which follows:

CREATE PROCEDURE Search
    @SearchTerm varchar(8000),
    @CurrentPage int = 1, 
    @PageSize int = 20

AS
BEGIN
    DECLARE @NearPredicate varchar(8000), 
            @AndPredicate varchar(8000), 
            @TotalRecords int

    SELECT 
        @NearPredicate = COALESCE(@NearPredicate + ' NEAR ', '') + items
    FROM Split(@SearchTerm, ' ') 
        LEFT JOIN sys.fulltext_system_stopwords ON items = stopword
    WHERE stopword IS NULL

    SET @AndPredicate = REPLACE(@NearPredicate, 'NEAR', 'AND')
    SET @NearPredicate = '(' + @NearPredicate + ')'

    SET @TotalRecords  = (
        SELECT 
            COUNT(*) 
        FROM 
            Articles 
        WHERE CONTAINS(*, @AndPredicate ) 
        )

    SELECT a.ArticleId, 
        a.Headline, 
        a.Abstract,
        a.DateCreated,
        ct.Rank,
        STUFF
        (
            (
                SELECT ',' + c1.CategoryName
                FROM Articles a1 
                INNER JOIN ArticleCategories ac1 ON ac1.ArticleID = a1.ArticleID
                INNER JOIN Categories c1 ON ac1.CategoryID = c1.CategoryID
                WHERE a.ArticleID = a1.ArticleID
                ORDER BY c1.CategoryName
                FOR XML PATH('')
            ), 1, 1, ''
        ) AS Categories,
        at.ArticleTypeName,
        @TotalRecords AS TotalRecords
    FROM 
        Articles a
            INNER JOIN CONTAINSTABLE (Articles, *, @NearPredicate ) AS ct ON a.ArticleId = ct.Key
            INNER JOIN ArticleTypes at ON at.ArticleTypeID = a.ArticleTypeID
    ORDER BY 
        ct.RANK DESC
    OFFSET (@CurrentPage - 1) * @PageSize ROWS FETCH NEXT @PageSize ROWS only
    
END

The procedure requires a bit of explanation. The parameters represent the search phrase, the current page of results (defaulted to 1) and the number of results to return (page size). In the body of the procedure, a couple of local variables are declared. The first represents the predicate that will be passed to the CONTAINSTABLE function, and the second represents the predicate to be passed to a simpler CONTAINS statement. The last variable captures the total number of records matching the search criteria.

The first section of code generates the predicate dynamically from the search term. The search term is passed to a user defined function named Split, which takes the content of the search term and splits it on the character specified in the second parameter (an empty space in this case), returning the resulting individual words as a temporary table. The code for the split function is below:

CREATE FUNCTION Split(@string varchar(8000), @delimiter char(1))
    RETURNS @temptable TABLE(nameIndex int identity(1,1), items varchar(8000))
AS
BEGIN
    DECLARE @index int, @slice varchar(8000)
    SELECT @index = 1
    IF LEN(@string) < 1 or @string IS NULL RETURN
    WHILE @index != 0
    BEGIN
        SET @index = CHARINDEX(@delimiter, @string)
        IF @index != 0
            SET @slice = LEFT(@string, @index-1)
        ELSE
            SET @slice = @string

        IF(LEN(@slice) > 0)
            INSERT INTO @temptable (items) VALUES (@slice)
            SET @string = RIGHT(@string, LEN(@string) - @index)
            IF LEN(@string) = 0 BREAK
    END
    RETURN
END

This temporary table is left-joined on to the system table containing the stop words, and only those that don't exist in the stop words table are selected. They are then subjected to a COALESCE function that joins the resulting words using ' NEAR ' to generate a valid predicate for the CONTAINSTABLE function. If, for example, the search term was "entity framework in mvc", the resulting value of the @NearPredicate variable will be "(entity NEAR framework NEAR mvc)". The word "in" will have been removed as it appears in the stop words table. The @AndPredicate variable is generated from the @NearPredicate variable, with the NEAR's being replaced with ANDs, resulting (using the previous example) in a value of "entity AND framework AND mvc".

The @AndPredicate variable is used in a relatively simple SELECT COUNT query, which returns the total number of items that contain "entity" and "framework" and "mvc". Finally, the actual results are returned in the last block - which is not as complicated as it might first appear.

A visual depiction of the schema can be found in a previous article - ASP.NET MVC, Entity Framework, Modifying One-to-Many and Many-to-Many Relationships. The first few fields selected are from the Articles table. Then the Rank column is selected from the table returned by the CONTAINSTABLE function (which is aliased as "ct"). There is a many-to-many relationship between Articles and Categories in my database, which is managed by a union table called ArticleCategories. I wanted to return all the categories that an article is related to as a comma separated value. I have use the FOR XML clause to return the data and join it using a comma. Then I have used the STUFF function to replace the leading comma with an empty space. The TotalRecords value is also included in the result set, and the results are ordered by rank, with the most relevant (or highest) first. Finally, OFFSET and FETCH (introduced in SQL Server 2012) are used to grab the correct "page" of data.

Client Code

The service layer that calls the stored procedure illustrates how to use the Database.SqlQuery<T> method to execute a stored procedure with Entity Framework and pass the result to a strongly typed collection:

public class SearchService : ISearchService
{
    public List<SearchResult> Search(string searchTerm, int? page)
    {
        using (DotnettingContext _context = new DotnettingContext())
        {
            var param1 = new SqlParameter("@SearchTerm", searchTerm);
            var param2 = new SqlParameter("@CurrentPage", page);
            var result = _context.Database.SqlQuery<SearchResult>("Search @SearchTerm, @CurrentPage", param1, param2).ToList();
            return result;
        }
    }
}

This is called from within the controller to help populate a ViewModel:

public ActionResult Index(string searchTerm, int? page)
{
    page = page ?? 1;
    searchTerm = Regex.Replace(searchTerm, @"[^\sa-zA-Z]", string.Empty).Trim();
    var data = _service.Search(searchTerm, page);

    var model = new SearchResultViewModel();
    model.PageTitle = "Search";

    model.KeyWords = searchTerm;
    model.Results = data;
    model.CurrentPage = page.Value;
            
    return View(model);
}

Summary

This article provied a very basic introduction to Full Text Search, and showed how to create a Full Text Search index from witin SQL Server Management Studio. It then discussed some basic queries and showed how to create a stored procedure to generate paged results from a Full Text Search. Finally it showed how to use the Database.SqlQuery method of the Entity Framework DbContext to execute the stored procedure and to pass the result back into C# code.