jkraemer edited this page Sep 13, 2010 · 1 revision

Multiple Databases

For those of you who are foolhardy enough to hijack your database connections in application.rb based on some criterion, such as serving multiple domains from one application (as we do; see Spike@School), you’ll need to pay particular attention to acts_as_ferret to make it work nicely for you.

To begin with, your acts_as_ferret declarations need to be a little bit special:

acts_as_ferret :fields => {:name => {:store => :yes}, :data_blob => {}, :domain_hash => {:index => :untokenized}},
               :single_index => true, :remote => true,
               :ferret => {
                 :default_field => [:name, :data_blob]
               }

What we’re doing here is hashing the domain name and using the hash as the name of our database. This doesn’t have much to do with ferret, except that when we index a record we add the domain hash as an untokenized field. This means (I’m rusty on this) that a unique record can then be identified by the combination of id, class, and domain_hash. Unless you do this, a Page record with an id of 1 in two different databases would keep overwriting the other’s index entry, because the two wouldn’t be unique. That leads us to our next step, which is a bit of a hack of act_methods.rb:

# these properties are somewhat vital to the plugin and shouldn't
# be overwritten by the user:
aaf_configuration[:ferret].update(
  :key               => (aaf_configuration[:single_index] ? [:id, :class_name, :domain_hash] : [:id, :domain_hash]),
  :path              => aaf_configuration[:index_dir],
  :auto_flush        => true, # slower but more secure in terms of locking problems TODO disable when running in drb mode?
  :create_if_missing => true
)

As you can see, I disagree with Jens and think we need to be able to override the :key property. Since we’d have to hack this file either way, I’ve favoured just adding my domain hash bit right in there. You may instead want to remove the overwriting and supply :key as part of your acts_as_ferret statement. Perhaps in the future we’ll be able to override this without a hack, though someone else can post a patch for that. You’ll also notice that I adjusted the key for non-shared indexes (those with :single_index set to false). That’s because we have a model in our app that has its own index:

acts_as_ferret  :fields => {:name => {:store => :yes}, :domain_hash => {:index => :untokenized}}, 
                :remote => true, 
                :single_index => false, 
                :ferret => {
                  :default_field => [:name]
                }

The above is just FYI.
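
To illustrate why domain_hash has to be part of the key, here is a minimal sketch in plain Ruby. It does not use Ferret at all: the "index" is just a Hash, and add_to_index and the sample hashes 'aaa111' and 'bbb222' are made up for the example. It only shows the uniqueness argument, namely that two Page records with id 1 from different databases collide unless the domain hash is in the key.

```ruby
# Toy stand-in for a shared index: a Hash keyed by the document's key fields.
# This is NOT Ferret; it only illustrates why the key must include domain_hash.
def add_to_index(index, doc, key_fields)
  key = key_fields.map { |field| doc[field] }
  index[key] = doc # a document with the same key replaces the earlier one
  index
end

# Two records with the same id and class, living in different databases.
docs = [
  { :id => 1, :class_name => 'Page', :domain_hash => 'aaa111', :name => 'Home (site A)' },
  { :id => 1, :class_name => 'Page', :domain_hash => 'bbb222', :name => 'Home (site B)' }
]

without_domain = {}
with_domain    = {}
docs.each do |doc|
  add_to_index(without_domain, doc, [:id, :class_name])
  add_to_index(with_domain,    doc, [:id, :class_name, :domain_hash])
end

puts "key without domain_hash: #{without_domain.size} document(s) kept"
puts "key with domain_hash:    #{with_domain.size} document(s) kept"
```

With the shorter key the second record silently replaces the first; with domain_hash in the key both survive.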

Now that’s all well and dandy, but you’ll probably want to reindex all your databases if you change something major in your models. To solve this, I made a reindexing script. But first we have to hack ferret_server.rb and add a method that allows us to switch the database that ferret_server is using so we can index other databases within the same process. Here goes:

def switch_db_connection(clazz, domain_hash)
  database_config = YAML.load(File.open(File.join(RAILS_ROOT,'config/database.yml'),'r'))['login']   
  ActiveRecord::Base.establish_connection(
    :adapter  => database_config['adapter'],
    :host     => database_config['host'],
    :username => database_config['username'],
    :password => database_config['password'],
    :database => domain_hash
  )
end

As you can see, it loads the connection info from our database.yml file and then substitutes the domain_hash we supply as the database we want to connect to. I’m using a special area of our database.yml file called login, so you might want to change that to whatever you use. I placed this method just above the protected declaration in ferret_server.rb.
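
If you want to check the "shared credentials, per-domain database name" idea without touching a real connection, a rough sketch might look like the following. The YAML content, the credentials, and the helper name connection_settings_for are all made up for illustration; only the 'login' section name comes from our setup. The resulting hash has the same shape as the one passed to establish_connection above.

```ruby
require 'yaml'

# Hypothetical database.yml content; only the 'login' section name mirrors our setup.
DATABASE_YML = <<-YAML
login:
  adapter: mysql
  host: localhost
  username: app_user
  password: secret
YAML

# Take the shared credentials from one section of the config and
# swap in the domain hash as the database name.
def connection_settings_for(domain_hash, yaml_text, section = 'login')
  base = YAML.load(yaml_text)[section]
  raise ArgumentError, "no '#{section}' section in database config" if base.nil?
  {
    :adapter  => base['adapter'],
    :host     => base['host'],
    :username => base['username'],
    :password => base['password'],
    :database => domain_hash
  }
end

settings = connection_settings_for('a7fsdgf7a6sdgf7a6e5fgae', DATABASE_YML)
puts settings[:database]
```

Failing loudly when the section is missing is a choice made for this sketch; the method above would instead raise a NoMethodError on nil.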

Now here is my script. It’s ultra-crude and designed to do exactly what we want, but it’ll work for you with a few changes:

# Load our rails environment
config = Rails::Configuration.new
# Connect to our admin database, which has a list of the domains that we host
ActiveRecord::Base.establish_connection config.database_configuration['spikeadmin']
# These are the models that we want to reindex
models = [ Page,
           Calendar, 
           DailyNotices, 
           DownloadSet, 
           Download, 
           Episode, 
           Form, 
           Hyperlink, 
           LearningCaveSet, 
           LearningCave, 
           LinkSet, 
           Link, 
           NewsletterSet, 
           Newsletter, 
           Notice, 
           Series, 
           StaffProfileList, 
           StaffProfile, 
           StaffVacancyList, 
           StaffVacancy,
           ComponentInstance ]
# Get a list of all the domains (and associated domain hashes)
@entities = Entity.find(:all)
# Loop them
for entity in @entities
  # Load the yaml file section into this variable
  database_config = YAML.load(File.open(File.join(RAILS_ROOT,'config/database.yml'),'r'))['login']   
  # Connect to the database as specified by the current entity
  ActiveRecord::Base.establish_connection(
    :adapter  => database_config['adapter'],
    :host     => database_config['host'],
    :username => database_config['username'],
    :password => database_config['password'],
    :database => entity.domain_hash
  )
  # If it's the first entity on the list then we want to do a rebuild_index.
  # This is because it wipes the old index completely (including all our other sites' indexes).
  # We only want to run this once.
  if entity == @entities[0]
    puts "Rebuilding Index for #{entity.name} (#{entity.domain_hash})"
    Page.rebuild_index Calendar, 
                       DailyNotices, 
                       DownloadSet, 
                       Download, 
                       Episode, 
                       Form, 
                       Hyperlink, 
                       LearningCaveSet, 
                       LearningCave, 
                       LinkSet,
                       Link,
                       NewsletterSet,
                       Newsletter,
                       Notice,
                       Series,
                       StaffProfileList,
                       StaffProfile,
                       StaffVacancyList,
                       StaffVacancy
    # ComponentInstance is a separate index and thus a separate file
    ComponentInstance.rebuild_index
  end
  # For every entity (the first one included), we bulk add the records for which ferret_enabled? is true for that class
  puts "Bulk Adding for #{entity.name} (#{entity.domain_hash})"
  for model in models
    # This is where we invoke our special hack method on the ferret_server, because it's still
    # using the connection that rebuild_index used above. This could really be done outside
    # of this loop. I just wanted to be sure.
    model.aaf_index.switch_db_connection(entity.domain_hash)
    # Flash code that basically grabs the ids of all enabled records for the current model.
    model.bulk_index(model.find(:all).select(&:ferret_enabled?).map(&:id)) 
  end
end

It’s self-documenting, so I’ll let you peruse that at your leisure. So we have our index rebuilt across all domains; now the fun part is searching it. Here’s the code, which took me ages to figure out but which you can have for a fraction of the grey hairs:

@results = Page.find_by_contents("+domain_hash:\"#{domain_hash}\" AND #{@query}", 
                                 :lazy => [:name], 
                                 :page => @pagination, 
                                 :limit => PER_PAGE, 
                                 :models => :all)

So there’s some pagination in there, and lazy loading of the name field, but the real fun is +domain_hash:"a7fsdgf7a6sdgf7a6e5fgae", which basically means that if a record’s domain_hash doesn’t match the one supplied, the record won’t appear in the search results even though it might have keywords that match the query. We of course know the domain_hash because this is in a controller in our app, which has already sniffed the domain name and found its associated hash.
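
If you build that query string in more than one place, a small helper keeps it consistent. The sketch below is not part of acts_as_ferret; domain_scoped_query is a made-up name. It makes two assumptions that differ slightly from the call above: that escaping quotes inside the hash is worth doing, and that Ferret’s query parser treats parentheses as grouping, so that an OR in the user’s query cannot sidestep the domain restriction. Unbalanced parentheses in the user’s query are not handled here.

```ruby
# Hypothetical helper (not part of acts_as_ferret): build a query string
# that is restricted to a single domain.
def domain_scoped_query(domain_hash, user_query)
  # Escape backslashes and double quotes so the hash cannot break out of the quoted phrase.
  safe_hash = domain_hash.to_s.gsub(/(["\\])/) { "\\#{$1}" }
  # Wrap the user's query in parentheses (assumed to group terms in Ferret's
  # query language) so the domain restriction applies to the whole query.
  "+domain_hash:\"#{safe_hash}\" AND (#{user_query})"
end

puts domain_scoped_query('a7fsdgf7a6sdgf7a6e5fgae', 'newsletter OR notice')
```

The returned string would then take the place of the first argument to find_by_contents.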

That’s pretty much it. I hope you enjoy the exhale as you realise that this just saved you heaps of time! Go the Hijack!!!

Author: Brendon Muir (brendon AtT spike D0t net Dot nz)

Caveats

One thing we weren’t able to confirm, but which works perfectly in practice, is the ongoing indexing of records as a user adds a Page and so on. Jens thought that the ferret_server DRb process wouldn’t be able to know which database to connect to in order to index the record. We think otherwise. It appears that the DRb server uses whatever connection the requesting Rails app currently has active (which would have been hijacked in application.rb), so the record is indexed against the right database. Because ferret_server is supposed to be thread-safe, this shouldn’t be a problem. If anyone has any information on this that would help, please email me. We’ve tested this thoroughly in our production app and it works a treat.
