Simple Ruby on Rails Full Text Search Using Xapian
In my previous blog post, I wrote about the several full-text search engines that are available to us as Rails programmers and how we ultimately decided to go with Xapian.
Background
I am by no means a Xapian expert. I first heard about it less than a week ago. However, we have been very impressed with its combination of ease of installation and robust features.
Xapian is written in C++ and has been around in its current form since 2001. Its code base is derived from a previous search tool that went closed source.
- Written in C++
- Feature rich including relevance feedback, boolean search operators, stemming, wildcard searching, and similar results
Its user list includes del.icio.us and GMane, both of which index tens of millions of records, so scalability should not be an issue. Here is a quote about scalability from the site:
People often want to know how Xapian will scale. The short answer is "very well" - a previous version of the software powered BrightStation's Webtop search engine, which offered a search over around 500 million web pages (around 1.5 terabytes of database files). Searches took less than a second. The largest recent installation we're aware of is probably gmane, which currently indexes over 50 million mail messages.
On the other hand, indexing time could take a while if you have a huge set of documents. According to this article, it took about 5 hours to index the entire Wikipedia database using Xapian for offline Wikipedia access and searching.
Acts_As_Xapian is the Rails plugin for Xapian. It was written in May 2008 by Francis Irving after he experienced some frustrations with Solr. Acts_As_Xapian has a Google group at http://groups.google.com/group/acts_as_xapian.
Xapian Installation:
For Ubuntu Users:
sudo apt-get install libxapian15 libxapian-ruby1.8
For Mac / Other: (taken from the xapian.org install docs)
(version = 1.0.6 at the time of this writing)
Download the sources with wget:
wget http://oligarchy.co.uk/xapian/1.0.6/xapian-core-1.0.6.tar.gz
wget http://oligarchy.co.uk/xapian/1.0.6/xapian-bindings-1.0.6.tar.gz
Or with curl:
curl -O http://oligarchy.co.uk/xapian/1.0.6/xapian-core-1.0.6.tar.gz
curl -O http://oligarchy.co.uk/xapian/1.0.6/xapian-bindings-1.0.6.tar.gz
Unpack both archives, then build and install xapian-core:
tar zxvf xapian-core-versionnumber.tar.gz
tar zxvf xapian-bindings-versionnumber.tar.gz
cd xapian-core-versionnumber
./configure --prefix=/opt
make
sudo make install
If you don’t have root access to install Xapian, you can specify a prefix in your home directory, for example:
./configure --prefix=/home/jenny/xapian-install
Then build and install the bindings, pointing them at the xapian-config installed above:
cd ../xapian-bindings-<version>
./configure XAPIAN_CONFIG=/opt/bin/xapian-config
make
sudo make install
Acts_As_Xapian Installation:
Install the plugin:
script/plugin install git://github.com/frabcus/acts_as_xapian.git
Generate and execute the acts_as_xapian migration:
script/generate acts_as_xapian
rake db:migrate
Basic Model Code
To add fields to your index, you will need to add an “acts_as_xapian” call to the model you want to index.
class Lesson < ActiveRecord::Base
  acts_as_xapian :texts => [:name, :description]
end
In this case, only the data found in the “name” and “description” attributes of our Lesson model will be searched.
A Quick Test Of Your Index
At this point, you can quickly test that your Xapian install, plugin, and model code are all working together via the following rake commands:
To build the index:
rake xapian:rebuild_index models="Lesson" RAILS_ENV=development
(You can add more models, separated by spaces: models="Lesson User Tag")
To update index:
rake xapian:update_index RAILS_ENV=development
To test index:
rake xapian:query models="Lesson" query="golf" RAILS_ENV=development
The Basic Search
Adding this line of code to your controller returns an ActsAsXapian::Search object:
@search = ActsAsXapian::Search.new([Lesson], @lesson_search_params)
Useful methods on the search object are:
- description – a techy one, to check how the query has been parsed
- matches_estimated – a guesstimate at the total number of hits
- spelling_correction – the corrected query string if there is a correction, otherwise nil
- words_to_highlight – list of words for you to highlight, perhaps with TextHelper::highlight
- results – an array of hashes each containing:
  - :model – your Rails model, this is what you most want!
  - :weight – relevancy measure
  - :percent – the weight as a %, 0 meaning the item did not match the query at all
  - :collapse_count – number of results with the same prefix, if you specified collapse_by_prefix
So, if we want to return an ActiveRecord model object we could do something like this:
@lessons = @search.results.collect {|r| r[:model]}
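As a rough illustration of working with the results array, here is a minimal sketch in plain Ruby that keeps only the stronger matches. It does not touch Xapian at all: the FakeLesson struct, the models_above helper, the sample data, and the 50% cutoff are all hypothetical and made up for this example.

```ruby
# Stand-in for an ActiveRecord model; in a real app this would be Lesson.
FakeLesson = Struct.new(:name)

# Hypothetical helper: keep only the models whose :percent is at least
# min_percent, preserving the order of the results array.
def models_above(results, min_percent)
  results.select { |r| r[:percent] >= min_percent }.collect { |r| r[:model] }
end

# Hand-built data shaped like the hashes in @search.results.
SAMPLE_RESULTS = [
  { :model => FakeLesson.new("Golf basics"), :weight => 2.4, :percent => 92 },
  { :model => FakeLesson.new("Mini golf"),   :weight => 1.1, :percent => 41 }
]

# Print the names of the lessons that clear the cutoff.
puts models_above(SAMPLE_RESULTS, 50).collect { |lesson| lesson.name }.inspect
```

In a controller, something along the lines of models_above(@search.results, 50) should behave the same way, assuming the result hashes look as described above.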
To get the recommended spelling for any misspelled words in a search, we would do:
@corrections = @search.spelling_correction
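Since spelling_correction is nil when there is nothing to correct, the view needs a small guard before showing a suggestion. The following is only a sketch: the did_you_mean helper and its wording are hypothetical, and the query strings are invented sample input.

```ruby
# Hypothetical helper: build a "Did you mean" prompt from the value returned
# by spelling_correction. Returns nil when there is no suggestion or when the
# suggestion is identical to what the user typed.
def did_you_mean(original_query, correction)
  return nil if correction.nil? || correction == original_query
  "Did you mean: #{correction}?"
end

puts did_you_mean("gofl lessons", "golf lessons")
p did_you_mean("golf lessons", nil)
```

In the real application the second argument would presumably be @search.spelling_correction.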
Similar Results
If you have a requirement to show similar or “You may also like” results based on a search result set or a single result, this can be done with a single line of code:
@similar_lessons = ActsAsXapian::Similar.new([Lesson], @lessons).results.collect {|r| r[:model]}
Updating The Index
Unfortunately, one drawback to Acts_As_Xapian is that the index is not updated automatically when data in an indexed model is added, modified, or deleted. Instead, when model data changes, Acts_As_Xapian puts a record in the ‘acts_as_xapian_jobs’ table in your database to tell the update task what needs reindexing. A cron job or something similar is needed to call the following rake task periodically to keep the index up to date:
rake xapian:update_index
Francis, the acts_as_xapian creator, commented on this on my original blog post, Mulling Over Our Ruby On Rails Full Text Search Options:
with Xapian you can update and search simultaneously, and updates are immediate. However, only one thread can update a Xapian database at the same time. Since I wanted offline indexing anyway (as my index operation is risky, complex and slow, involving parsing Word documents, PDFs etc.), I didn’t try to find a solution that causes a second thread in the web application to, say, wait for the database lock. So acts_as_xapian currently only supports offline indexing.
Conclusion
This is a very basic tutorial meant to highlight the steps needed to get up and running with simple Xapian searches quickly. There are quite a few topics that I did not touch on such as:
- Sorting and grouping with the :values option in the model
- Advanced searching with the :terms option in the model
- Adding extended attributes from other models into a searched index (such as adding lesson comments to the Lesson index)
- Filtering searched data (in our case, we had to filter our results so only “active” lessons were returned)
- Highlighting keywords in the search results
