Trackbacks

Use the following link to trackback from your own site:
http://locomotivation.com/trackbacks?article_id=mulling-over-our-ruby-on-rails-full-text-search-options&day=15&month=07&year=2008

Comments

  • Chris Heald
    about 9 hours later:

    In response to the whole Ultrasphinx cron issue, I’d recommend monit or an equivalent to make sure your daemons don’t die. It’s also handy for keeping your mongrel/thin instances in line. It works quite decently for me.

    I found Ultrasphinx’s config prohibitive, so I use the thinking-sphinx plugin. It’s a fantastic piece of work, and coming from ferret, “just works” beautifully.

    Xapian seems to have the same issue that Sphinx does, though – offline indexing and only one client at a time writing to the index – except that Sphinx does bridge it with delta indexes for realtime index updates, and thinking-sphinx’s daemon effectively proxies between Sphinx itself and multiple app instances – a step that a Xapian implementation would seem to need as well.
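    A rough, hypothetical sketch of the main-plus-delta idea, in plain Ruby with in-memory hashes standing in for real indexes. It is not Sphinx’s or thinking-sphinx’s implementation, and the class and method names are made up for illustration. The main index is rebuilt offline in bulk, each save touches only the small delta, and a query consults both, with a delta entry shadowing the stale copy in the main index.

```ruby
# Hypothetical sketch of the main + delta index idea (not Sphinx's code).
class MainPlusDeltaIndex
  def initialize
    @main  = {} # doc_id => text; rebuilt in bulk, e.g. by a nightly job
    @delta = {} # doc_id => text; only docs changed since the last rebuild
  end

  # Slow, offline full rebuild; afterwards the delta starts empty again.
  def rebuild_main(docs)
    @main = docs.dup
    @delta = {}
  end

  # Cheap per-save update: only the small delta index is touched.
  def record_change(doc_id, text)
    @delta[doc_id] = text
  end

  # Query both; a delta entry overrides the stale copy in the main index.
  def search(term)
    @main.merge(@delta)
         .select { |_id, text| text.downcase.include?(term.downcase) }
         .keys.sort
  end
end

idx = MainPlusDeltaIndex.new
idx.rebuild_main({ 1 => "Ruby full-text search", 2 => "Java servlets" })
idx.record_change(2, "Ruby on Rails") # visible immediately, no full rebuild
idx.search("ruby") # => [1, 2]
```

    The trade-off is the one Peter runs into further down: the delta only stays cheap while few records change between full rebuilds.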

    The “you may also like” sounds really cool, though.


  • Michael
    about 10 hours later:

  • David Welton
    about 11 hours later:

    I use acts_as_tsearch with PostgreSQL’s tsearch2. It’s not very portable, and I don’t know if I’d use it in a big production system, but for my needs it works pretty well.


  • Richard Heycock
    about 11 hours later:

    Nice article. I’m using Ferret fairly heavily at the moment, and while it hasn’t caused me any problems so far, I’m quite keen to replace it with something else.

    I’ve had my eye on Xapian for a while, so it’s good to hear that others have had success with it.

    Chris: unless I misunderstand your comments, I don’t think either of your points about Xapian is true. From the web site:

    Allows simultaneous update and searching. New documents become searchable right away.

    And from the API docs:

    For efficiency reasons, when performing multiple updates to a database it is best (indeed, almost essential) to make as many modifications as memory will permit in a single pass through the database. To ensure this, Xapian batches up modifications


  • Jim
    2 days later:

    Busy week for search blog posts.

    Rein Henrichs just posted about moving from UltraSphinx to ThinkingSphinx, and Mike Hartl just moved from Ferret to UltraSphinx.


  • Francis Irving
    5 days later:

    Thanks for the nice writeup Jim!

    Chris – if you have multiple app instances, say on different front-end servers, then you can use Xapian’s remote backend (http://xapian.org/docs/remote.html); i.e. it has a daemon if you really need one.

    Richard/Chris – with Xapian you can update and search simultaneously, and updates are immediate. However, only one thread can update a Xapian database at the same time. Since I wanted offline indexing anyway (as my index operation is risky, complex and slow, involving parsing Word documents, PDFs etc.), I didn’t try to find a solution that causes a second thread in the web application to, say, wait for the database lock. So acts_as_xapian currently only supports offline indexing.
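    A minimal sketch of the single-writer, offline-indexing pattern described above, assuming the web app only records which documents changed and one separate indexer process (say, run from cron) applies those changes. This is not acts_as_xapian’s code: the queue and indexer classes are hypothetical, the hash stands in for a real Xapian database, and the in-memory queue stands in for what, across several app processes, would have to be shared storage such as a database table.

```ruby
# Hypothetical sketch: web requests only *record* changes; one indexer writes.
class IndexJobQueue
  Job = Struct.new(:doc_id, :action, :text)

  def initialize
    @jobs = []
    @lock = Mutex.new # many request threads may push concurrently
  end

  # Called from the app, e.g. from a save/destroy callback.
  def push(doc_id, action, text = nil)
    @lock.synchronize { @jobs << Job.new(doc_id, action, text) }
  end

  # Atomically hands over all pending jobs and empties the queue.
  def drain
    @lock.synchronize do
      jobs = @jobs
      @jobs = []
      jobs
    end
  end
end

# The only writer to the index, so no write-lock contention can arise.
class OfflineIndexer
  attr_reader :index

  def initialize(queue)
    @queue = queue
    @index = {} # doc_id => text; stand-in for the real search database
  end

  # Applies every pending job in order and returns how many were processed.
  def run_once
    jobs = @queue.drain
    jobs.each do |job|
      case job.action
      when :update  then @index[job.doc_id] = job.text
      when :destroy then @index.delete(job.doc_id)
      end
    end
    jobs.size
  end
end

queue = IndexJobQueue.new
queue.push(42, :update, "Annual report 2007")
queue.push(7, :destroy)
OfflineIndexer.new(queue).run_once # => 2
```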


  • Walter McGinnis
    8 days later:

    I’m the original author of the acts_as_zoom plugin which allows use of the ZOOM API for a Z39.50 server like Zebra (http://www.indexdata.com/zebra/).

    The Z39.50 standard protocol is popular in the library and museum worlds for providing behind-the-scenes, machine-to-machine access to search indexes.

    We use acts_as_zoom as a part of the Kete open source Rails app. You can find out about Kete and acts_as_zoom here:

    http://kete.net.nz/ (software community site)
    http://github.com/kete/ (source for both Kete and acts_as_zoom)

    Note that there is some newer refactoring of acts_as_zoom in the version included in Kete’s source. We’ll eventually update the plugin (or someone can fork and we’ll pull it) to include those changes.

    Cheers,
    Walter McGinnis
    Kete Project Lead

    P.S. – oh yeah, if you are interested in Ruby ZOOM API support, you’ll probably be interested in ruby-zoom project at http://ruby-zoom.rubyforge.org/


  • Mr. Khan
    10 days later:

    I run ferret on several production servers. It is a bummer you didn’t give it a try.

    The one trick is that you have to run it as a separate server if you have more than one instance of your Rails app.

    Other than that, I find it easy to use, and it integrates very well with Rails. But I haven’t tried anything else in a while; Ferret just works.


  • Matthew Rudy Jacobs
    11 days later:

    Re: acts_as_solr being discontinued. Quite a few people are working on improving the plugin.

    Notably JobsGoPublic http://github.com/jgp/acts_as_solr

    and a guy called Look http://github.com/look/acts_as_solr who is trying to combine the best commits from all of the github forks.

    We have a Solr server running off our DB machine, and will hopefully move all of our fulltext searching away from Ferret and onto Solr in the next couple of weeks.

    Will get back to you if it all goes wrong. But I hope it won’t.


  • John D. Rowell
    12 days later:

    I have to second Mr. Khan’s comment: I also use Ferret on several production servers and have only good things to say about it. Unfortunately, plenty of people looking for a quick and easy solution tried deploying Ferret without DRb (which is just silly, since the docs explain why you need it) and got corrupted indexes.

    Ferret is very fast and although it’s a bit complex it is also very flexible. Or rather, it’s simple if all you care about is indexing documents, but if you want to have very independent search fields with different indexing strategies and ranking weights, it is flexible enough to do it but will require some work and lots of reading of the API.


  • Aaron H.
    12 days later:

    I have to chime in. Ferret works great, and I find it FAR more flexible and easier to configure than Sphinx. If you are running a mongrel cluster or something similar, you have to use a DRb server as mentioned in the docs (which is also trivial to set up).

    It seems that some time in the past the Rails Envy guys tried to move Ferret to a clustered server without using DRb, and gave Ferret a bad name with high-profile blog posts about its “instability in production” that were really due to their improper setup.

    I like the RailsEnvy guys, but I find that really unfortunate because it’s a great search solution.


  • Peter D.B.
    12 days later:

    I’ve used Ferret in a production environment (yes, with DRb) and it was a nightmare. It wasn’t a bad setup, and it wasn’t bad usage; it was just Ferret and DRb, both of them. The problems didn’t start right away; they emerged at the worst possible time, when the app had been running in production for a few weeks and was holding quite a bit of data. We rebuilt the index, and a few weeks later we hit the same problem (newer versions of acts_as_ferret and Ferret had been released and installed by then), and that just kept on going. We were getting frustrated by then, and when two of us got a corrupted index on our development machines (with only one person interfacing with the index), Ferret was out.

    And we are not the only ones having those problems, by the way. A colleague (who actually recommended Ferret to us, like some of the previous posters did) started having problems on a production app that had been running fine for over a year. He didn’t want to believe Ferret was the culprit, but after a long and time-consuming search he knew he had to switch to something else; he chose acts_as_solr.

    Solr was also the first thing that came to mind, but we decided to spend a few days trying out the different full-text indexers before making a final choice.
    • acts_as_tsearch: although it was tempting to use the full-text indexing built into the database rather than a separate service, we didn’t want to force our customers/prospects onto PostgreSQL (we know some of them will insist on certain databases, so we need to keep Rails’ abstraction intact).
    • Thinking Sphinx/UltraSphinx: we came to exactly the same conclusion as Rein Henrichs (mentioned in a previous comment). Sphinx was actually quite good, but our app simply has too many insert/update/delete actions for Sphinx’s delta index to feel right. We also need to index lots of virtual attributes (data that isn’t in the database).
    • acts_as_solr: first of all, Solr isn’t bad; it’s very good, in fact. We didn’t pick it because we found something better, and because of the JVM’s memory consumption.
    • acts_as_searchable: although this indexer never got much coverage on the mailing list or in blog posts, it just blew us away. Hyperestraier is a separate full-text indexing service; it runs completely independently of your Rails app and can be used from other languages/services too. You can basically have 100 nodes running for all possible apps on your server, completely independent of each other. It’s a native C library, so its memory consumption is low. And boy, this baby is fast, and I mean really fast. It’s been running in production for over a year now in different apps and hasn’t had to be restarted even once; it’s that rock-stable. It does lack some of the features Solr has, such as faceting, so if you need those you’re out of luck, but we didn’t. I can’t emphasize it enough: if you are looking for a full-text indexer, consider Hyperestraier too; you won’t be disappointed. The existing plugin could use a little care and nourishment: for now we have to implement :after_update hooks on the related tables to rebuild a record’s index entry whenever one of its related records changes. A bit of a burden, but easily overcome. It could also use a multi-model search in its own namespace (just something like MultiModel.search(:models => [...]), instead of Ferret’s odd way of doing it), but even that is simply a matter of building on the existing plugin and has nothing to do with Hyperestraier.

    In short, acts_as_searchable (Hyperestraier) should be considered just as much of an option as Solr and Sphinx, it’s great and has been so good to us and our customers.
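    A hypothetical sketch of the MultiModel.search(:models => [...]) helper wished for above. It is not part of acts_as_searchable; it only assumes that each model class responds to a search(query) class method returning matching ids, and the two in-memory models are stand-ins for illustration.

```ruby
# Hypothetical cross-model search helper; not part of acts_as_searchable.
module MultiModel
  # Asks every listed model for matches and tags each hit with its model name.
  def self.search(query, options = {})
    options.fetch(:models).flat_map do |model|
      model.search(query).map { |id| [model.name, id] }
    end
  end
end

# Illustration-only stand-in for a full-text index: substring match on DOCS.
module InMemorySearch
  def search(query)
    self::DOCS.select { |_id, text| text.downcase.include?(query.downcase) }.keys
  end
end

class Article
  extend InMemorySearch
  DOCS = { 1 => "Full-text search in Rails", 2 => "Deploying Mongrel" }
end

class Comment
  extend InMemorySearch
  DOCS = { 7 => "Try Hyperestraier for search" }
end

MultiModel.search("search", :models => [Article, Comment])
# => [["Article", 1], ["Comment", 7]]
```

    Keeping the helper duck-typed means any plugin’s model-level search method could be plugged in, which matches the point that this is plugin work and independent of the search engine itself.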


  • jim
    12 days later:

    @peter – Thanks for the detailed analysis of your Hyperestraier / acts_as_searchable experience. As I mentioned at the end of this blog post, I had seen it mentioned before but had not heard of anybody using it on a Rails project.

    It is good to know that it is working out very well for you and gives us Ruby / Rails developers another full-text search option.


  • tc
    27 days later:

    “Kicking off a job every 3 minutes just to make sure another job is running did not seem right to me.”

    I don’t see what’s wrong with it.

    If you’ve ever watched a movie on Unix without XScreenSaver kicking in, the player was using the same method.

    Apache solves this by continuously creating new processes, and only handling a few requests per process before preemptively killing it. I suspect that’s because it has much less (no?) state to share.

    It seems like a good policy to anticipate things that can go wrong before they do. The alternative is to “force users to hook it up to their monitoring system before they even know what’s up or down”. You’ll have to do that anyway, but at least the auto-restart means you’re not totally dead before then.
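    A small, hypothetical Ruby version of the watchdog defended above: cron runs it every few minutes (for example with a */3 * * * * schedule), and it restarts the daemon only when the pidfile is missing, unreadable, or points at a dead process. The module name, the pidfile path, and the restart command are assumptions; Process.kill(0, pid) delivers no signal and only checks whether the process exists.

```ruby
# Hypothetical cron-driven watchdog: restart a daemon only if it is not running.
module Watchdog
  # True if the pidfile names a live process.
  def self.alive?(pidfile)
    pid = Integer(File.read(pidfile).strip)
    Process.kill(0, pid) # signal 0: existence check only, nothing is delivered
    true
  rescue Errno::ENOENT, ArgumentError, Errno::ESRCH
    false # no pidfile, garbage in it, or no such process
  rescue Errno::EPERM
    true  # the process exists but is owned by another user
  end

  # Yields (so the caller can restart the daemon) only when it is not alive.
  def self.ensure_running(pidfile)
    return :running if alive?(pidfile)
    yield
    :restarted
  end
end
```

    A cron-invoked script would then call something like Watchdog.ensure_running("/var/run/searchd.pid") { system("/path/to/start-search-daemon") }, where both paths are placeholders for your own setup. Because the check is cheap and idempotent, running it every three minutes costs almost nothing when the daemon is healthy.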



