Simple, Automatic Text Classification in Ruby on Rails

Posted by Daniel Butler Thu, 18 May 2006 23:27:00 GMT

Thomas Bayes, 1702-1761

Surendra Singhi of Calcutta, India, has released an extremely useful text classification plugin for Ruby on Rails. Using a Bayesian classifier, you can flag comments, email, articles, or any other chunks of text you’d like to keep a handle on, so that when you encounter more like them you can act on them automatically. Pick some categories, such as ‘spam’ and ‘not spam’, ‘good’ and ‘evil’, or even ‘ironic’ and ‘irony-free’, classify some existing text or data, and then let the classifier predict the category of an unknown text.


The ‘acts_as_classifiable’ plugin depends on the Classifier gem, which can be easily installed using gem install classifier --include-dependencies. The plugin itself is installed separately (see the comment below). Once a simple 3-column table is added to your database to store the learned classification data, you’re ready to go. Add the following magic dust to the model you want to classify:

class Comment < ActiveRecord::Base
  # Classify comments on their text and title into one of two categories.
  acts_as_classifiable :fields => ["text", "title"],
    :categories => ["Ironic", "Boring"]
end

Train the classifier by using @comment.train :ironic or @comment.train :boring on existing objects, or collections of objects that have known classifications.

When a new @comment comes along, figure out what it is automatically with @comment.classify and act accordingly.
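To make the train/classify cycle concrete, here is a minimal pure-Ruby sketch of a naive Bayes text classifier. It is not the plugin's or the Classifier gem's actual code: the TinyBayes class, its tokenizer, and the sample sentences are all made up for illustration, and it assumes Ruby 2.4 or later.

```ruby
# Hypothetical illustration only: a tiny multinomial naive Bayes classifier.
# This is NOT how acts_as_classifiable or the Classifier gem is implemented.
class TinyBayes
  def initialize(*categories)
    @counts = {}                       # per-category word counts
    categories.each { |c| @counts[c] = Hash.new(0) }
    @docs  = Hash.new(0)               # number of training texts per category
    @vocab = {}                        # every distinct word seen in training
  end

  # Record the words of a text whose category is already known.
  def train(category, text)
    raise ArgumentError, "unknown category #{category}" unless @counts.key?(category)
    @docs[category] += 1
    tokenize(text).each do |word|
      @counts[category][word] += 1
      @vocab[word] = true
    end
  end

  # Return the category with the highest log-probability score.
  def classify(text)
    words = tokenize(text)
    total_docs = @docs.values.sum.to_f
    scores = @counts.keys.map do |category|
      # Add-one smoothing keeps unseen categories and words away from log(0).
      prior = Math.log((@docs[category] + 1) / (total_docs + @counts.size))
      total_words = @counts[category].values.sum
      likelihood = words.sum do |w|
        Math.log((@counts[category][w] + 1.0) / (total_words + @vocab.size + 1))
      end
      [category, prior + likelihood]
    end
    scores.max_by { |_, score| score }.first
  end

  private

  # Very naive tokenizer: lowercase runs of letters and apostrophes.
  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

bayes = TinyBayes.new(:ironic, :boring)
bayes.train(:ironic, "Oh great, another meeting. Just what I always wanted.")
bayes.train(:ironic, "Sure, because that plan worked so well last time.")
bayes.train(:boring, "The meeting is scheduled for Tuesday at ten.")
bayes.train(:boring, "Please find the quarterly report attached.")

puts bayes.classify("Oh great, just what I wanted")  # prints "ironic"
```

The plugin wraps the same two steps behind @comment.train and @comment.classify, with the learned data kept in the database table rather than in memory as in this sketch.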

Enjoy.

Surendra’s blog entry about ‘acts_as_classifiable’


Comments

  1. churi said 40 days later:

    You still need to install the “acts_as_classifiable” plugin separately; it isn’t part of the Classifier gem. http://opensvn.csie.org/sksinghi/actsas_classifiable/
