Monday, July 30, 2007

Portal Round 1

I worked today on bringing up the portal. I started from scratch with the two Rails plugins acts_as_solr and will_paginate. I installed them locally on my laptop using the RadRails plugin installer and loaded them as svn externals.
I have elected to use the Solr distribution bundled with the acts_as_solr plugin, although I cannot seem to run it from the rake task within the IDE. To run it I have to find the vendor/plugins/acts_as_solr/branches/release_0.9/solr directory within my Aptana workspace and then manually run the instance with java -Djetty.port=8982 -jar start.jar, which starts the Jetty servlet container on the port that acts_as_solr treats as the default for the development environment.
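For anyone wanting to script the manual launch rather than type it each time, here is a minimal Ruby sketch. The helper names solr_start_command and start_solr are made up for illustration and are not part of acts_as_solr. The directory and the 8982 port are the ones mentioned above, and it assumes java is on the PATH.

```ruby
# Hypothetical helper (not part of acts_as_solr): builds the same command
# line used to start the bundled Solr by hand.
def solr_start_command(port = 8982)
  ["java", "-Djetty.port=#{port}", "-jar", "start.jar"]
end

# Hypothetical helper: runs the command from inside the Solr directory,
# since start.jar is resolved relative to the current working directory.
# Blocks until the Jetty process exits.
def start_solr(solr_dir, port = 8982)
  Dir.chdir(solr_dir) { system(*solr_start_command(port)) }
end

# Example call, left commented out so loading this file does not launch Solr:
# start_solr("vendor/plugins/acts_as_solr/branches/release_0.9/solr")
```

Passing the arguments to system as separate strings avoids any shell quoting issues if the port ever comes from configuration.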
I retooled the mods-solr.xsl to use Solr-style field designations and to take advantage of the dynamic field names. Since I was thinking about it, I redid the JSON field names to follow similar conventions.

I spent most of the day banging my head against the following problems:
  1. The little I could find on integrating acts_as_solr and will_paginate did not seem to work. I eventually settled on using the WillPaginate::Collection which seems to work pretty well.
  2. Sorting result sets with acts_as_solr is beyond my googling. I still don't have it working but have learned a lot about the issue in the meantime.
    1. Syntactically the format is :order => " ", the discovery of which took me deep within the bowels of acts_as_solr. Errors about the "title_t".keys method not being found required the full tour.
    2. acts_as_solr as of release 0.9 does not use the standard method for declaration of sorting.
    3. After reading up on Lucene, I found that sorting strings pretty much requires an untokenized field. This led me to create a title_sort field of type alphaSortOnly. This is fine, but it required that I alter the supplied schema.xml, which causes me problems with svn since it is down in an svn externals directory. Also worth noting: adding the untokenized form of the title added an eyeball estimate of ~20% to my runtime when indexing records. This should be pursued.
    4. The date sorting will require either a normalized UTC date or a string that we manufacture. Since we are plucking this data out of the MODS xml it may require a bit more
    5. Either way, the sort still did not give me the expected results. I'm not fretting yet; I got pretty far today, so I'll just have to dig some more.
  3. I just started looking at facets but can imagine that will be pretty exciting as well.
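
For the pagination item above, here is a rough sketch of how one page of Solr hits can be wrapped in a WillPaginate::Collection. It is not the exact code from the portal. The helper names solr_page_options and paginate_solr are made up for illustration, the Record model is hypothetical, and the docs and total accessors on the acts_as_solr result object are assumptions that should be checked against the installed release. It also assumes the will_paginate plugin is already loaded, as it is inside a Rails app.

```ruby
# Hypothetical helper: turns a 1-based page number into the :limit/:offset
# options that a paged Solr query needs. Missing or bad pages fall back to 1.
def solr_page_options(page, per_page)
  page = [page.to_i, 1].max
  { :limit => per_page, :offset => (page - 1) * per_page }
end

# Hypothetical helper: yields the paging options to a block that runs the
# search and returns [records_for_this_page, total_hit_count], then wraps
# that page in a WillPaginate::Collection so the view helpers can render it.
def paginate_solr(page, per_page)
  page = [page.to_i, 1].max
  docs, total = yield(solr_page_options(page, per_page))
  WillPaginate::Collection.create(page, per_page, total) do |pager|
    pager.replace(docs)
  end
end

# Possible controller usage (Record and the accessor names are assumptions):
# @records = paginate_solr(params[:page], 10) do |opts|
#   hits = Record.find_by_solr(params[:q], opts)
#   [hits.docs, hits.total]
# end
```

The point of the collection is that it carries the total hit count alongside the single page of records, so the will_paginate view helper can draw page links without the full result set ever being loaded.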

1 comment:

jumpshift said...

The little I could find on integrating acts_as_solr and will_paginate did not seem to work. I eventually settled on using the WillPaginate::Collection which seems to work pretty well.

Hi, I'm trying to paginate with acts_as_solr myself. Any chance you could save me some head banging and post sample code showing how you used WillPaginate::Collection with the solr results?