Simple, Automatic Text Classification in Ruby on Rails
Posted by Daniel Butler Thu, 18 May 2006 23:27:00 GMT
Thomas Bayes, 1702-1761
Surendra Singhi of Calcutta, India, has released an extremely useful text classification plugin for Ruby on Rails. Using a Bayesian classifier, you can flag comments, email, articles, or whatever chunks of text you’d like to keep a handle on, so that when you encounter more like them you can act on them automatically. Pick some categories, such as ‘spam’ and ‘not spam’, ‘good’ and ‘evil’, or even ‘ironic’ and ‘irony-free’, classify some existing text or data, and then use the classifier to predict the classification of an unknown text.
The ‘acts_as_classifiable’ plugin builds on the Classifier gem, which can be easily installed using gem install classifier --include-dependencies. Once a simple 3-column table is added to your database to store the learned classification data, you’re ready to go. Add the following magic dust to the model you want to classify:
class Comment < ActiveRecord::Base
  acts_as_classifiable :fields => ["text", "title"],
                       :categories => ["Ironic", "Boring"]
end

Train the classifier by calling @comment.train :ironic or @comment.train :boring on existing objects, or collections of objects, that have known classifications.
When a new @comment comes along, figure out what it is automatically with @comment.classify and act accordingly.
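To give a feel for what training and classifying involve, here is a minimal naive Bayes sketch in plain Ruby. It is not the plugin's or the Classifier gem's actual code: the TinyBayes class, its method names, and the add-one smoothing are all assumptions made for illustration.

```ruby
# A toy naive Bayes text classifier, for illustration only.
# TinyBayes and its methods are hypothetical names, not the plugin's API.
class TinyBayes
  def initialize(*categories)
    @word_counts = {} # category => { word => times seen in that category }
    @doc_counts  = {} # category => number of training texts
    categories.each do |category|
      @word_counts[category] = Hash.new(0)
      @doc_counts[category]  = 0
    end
  end

  # Record every word of a text whose category is already known.
  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each { |word| @word_counts[category][word] += 1 }
  end

  # Return the category with the highest log-probability for the text.
  def classify(text)
    words      = tokenize(text)
    vocab_size = @word_counts.values.flat_map(&:keys).uniq.size
    total_docs = @doc_counts.values.sum

    scores = @word_counts.keys.map do |category|
      total_words = @word_counts[category].values.sum
      # Smoothed prior: how common this category was during training.
      score = Math.log((@doc_counts[category] + 1.0) / (total_docs + @word_counts.size))
      words.each do |word|
        # Add-one smoothing so an unseen word never yields a zero probability;
        # the extra +1 in the denominator reserves room for unknown words.
        score += Math.log((@word_counts[category][word] + 1.0) / (total_words + vocab_size + 1))
      end
      [category, score]
    end

    scores.max_by { |_, score| score }.first
  end

  private

  # Lowercase the text and split it into simple word tokens.
  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

bayes = TinyBayes.new(:ironic, :boring)
bayes.train(:ironic, "oh great, another monday, just what I wanted")
bayes.train(:boring, "the quarterly report is attached, please review")
p bayes.classify("great, another monday") # => :ironic
```

With only one training text per category the guess is driven entirely by word overlap; a real classifier needs many more examples before its answers are worth acting on.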
Enjoy.


You still need to install the “acts_as_classifiable” plugin separately; it isn’t part of the Classifier gem. http://opensvn.csie.org/sksinghi/actsas_classifiable/