Tuesday, August 7, 2007

Sorting and indexing

I have made a few changes to the portal, mostly adding support for date, title and score sorting
as well as a few minor wording changes per Katherine. I still have not had a chance to go over
Advanced Searching as much as I would like. Sometimes it seems to make sense, sometimes it does not.

I am rebuilding the indexes with some date sorting, and I need to go over the dates used for sorting. Practically, I think
we need to have one sortable date per record, and I need an algorithm for picking that date.
I think the logic should be to go through the various date fields in the originInfo in some agreed-upon order,
first looking for one flagged as the keyDate that also has a w3cdtf encoding.
Failing that, look again through the list for any keyDate value.
Failing that, look again through the list and take the first date.
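
A rough sketch of that three-pass selection, in Ruby since the portal is a Rails app. Everything named here is an assumption for illustration: the DateField struct, the DATE_ORDER list (the real order still has to be agreed on), and the idea that keyDate is a "yes" flag with w3cdtf carried in a separate encoding attribute.

```ruby
# Hypothetical stand-in for one date element pulled out of originInfo.
DateField = Struct.new(:element, :value, :key_date, :encoding)

# Assumed priority order; the real order has not been agreed on yet.
DATE_ORDER = %w[dateIssued dateCreated dateCaptured copyrightDate dateOther].freeze

# Returns the field to sort on, or nil when the record has no dates at all.
def pick_sort_date(fields)
  # Order by element priority, keeping document order among equals.
  ordered = fields.each_with_index.sort_by do |field, index|
    [DATE_ORDER.index(field.element) || DATE_ORDER.length, index]
  end.map(&:first)

  # Pass 1: a key date that is also w3cdtf encoded.
  # Pass 2: any key date.
  # Pass 3: whatever date comes first.
  ordered.find { |f| f.key_date == "yes" && f.encoding == "w3cdtf" } ||
    ordered.find { |f| f.key_date == "yes" } ||
    ordered.first
end
```

Each pass only runs if the one before it found nothing, so a well-marked date in a low-priority field still beats an unmarked date in a high-priority one. Whether that is the right trade-off is part of what needs agreeing on.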

I am running the date I get from above (actually, at the moment, just DateIssued) through a standard Java date parser
and, failing that, through the CDL temper library, which attempts to normalize the date.
Mostly what I get back is some kind of date range. I have arbitrarily chosen the middle date of that range as the sorting
date for that record.
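
To make the midpoint idea concrete, here is a minimal Ruby sketch. It is not the actual pipeline: the strict step stands in for the Java date parser, and loose_range is a hypothetical, much cruder substitute for the temper library that only picks four-digit years out of the text.

```ruby
require "date"

# Strict step: accept only a full YYYY-MM-DD date (stand-in for the Java parser).
def strict_range(text)
  match = /\A(\d{4})-(\d{2})-(\d{2})\z/.match(text.strip)
  return nil unless match
  year, month, day = match.captures.map(&:to_i)
  return nil unless Date.valid_date?(year, month, day)
  date = Date.new(year, month, day)
  [date, date]
end

# Fallback step (hypothetical): treat any four-digit numbers as years and
# span from the start of the earliest to the end of the latest.
def loose_range(text)
  years = text.scan(/\b\d{4}\b/).map(&:to_i)
  return nil if years.empty?
  [Date.new(years.min, 1, 1), Date.new(years.max, 12, 31)]
end

# Returns the middle day of the range, or nil if nothing could be parsed.
def sort_date(text)
  first, last = strict_range(text) || loose_range(text)
  return nil unless first
  first + (last - first).to_i / 2
end
```

With these assumptions a value like "1905-1910" sorts as a date in the middle of that six-year span, and an unparseable value returns nil, which leaves open the question of where undated records should land in a sorted list.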

Any comments on the above process would be appreciated.

Another struggle was getting the acts_as_solr module to sort the result sets correctly. I finally found the solution in acts_methods.rb, which documents that the fields option passed to acts_as_solr can be an array of hashes. That finally got it working; prior to the change, everything was getting a '_t' tacked on to the end, which was breaking the date sorting. I still never got it to order correctly by score, but since the default is to sort by score, in that case I just don't pass in an :order specification when I call find_by_solr.
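
The snippet below is a toy model of the naming behavior described above, not the plugin's own code. It assumes each field is indexed under its name plus a type suffix, that untyped fields default to text, and that the suffix table looks like the one shown; only the '_t' suffix is confirmed by what I saw.

```ruby
# Assumed type-to-suffix table; only "_t" is confirmed above.
SUFFIXES = {
  :text => "t", :string => "s", :date => "d", :integer => "i", :float => "f"
}.freeze

# Accepts plain symbols or one-entry hashes of name => type and returns
# the field names as this model would index them.
def indexed_names(fields)
  fields.map do |field|
    name, type = field.is_a?(Hash) ? field.first : [field, :text]
    "#{name}_#{SUFFIXES.fetch(type)}"
  end
end

# Plain symbols: every field ends up as text.
puts indexed_names([:title, :date_issued]).inspect

# Array of hashes: each field carries its own type.
puts indexed_names([{ :title => :string }, { :date_issued => :date }]).inspect
```

If the model is right, a date stored as tokenized text would not sort chronologically, which matches the broken date ordering before the change.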

Saturday, August 4, 2007


I met with Katherine and Kalika from Citrus design and had a good meeting. I am anxious to see the result of that. Ere that happens I press on. I spent the better part of Thursday working on user registration and login functionality, only to rip it out again in frustration after doing more research on my options. There are several plugins and gems out there that I think bear more research before proceeding, including one which would give us OpenID functionality.
Instead today, triggered by some conversations I had with Katherine and Jerry yesterday, I started working on some analysis of the subject headings. I believe that having some sort of hierarchical view of the collection is critical to our SEO aspirations, and I think the subject headings and dates are probably the lowest-threshold ways to get us there. To that end I produced a table of subject headings and a primitive viewer. It's useful in that it helped me find a few more small problems with the JSON xsl transform I am using, and a number of configuration issues that remain with the subject headings, such as the lack of a unifying subject field that collects the various specific fields (subject_topic etc.). Also, the normalizations required to make the headings useful should be improved and, more importantly, to follow my latest project mantra, made transparent. I am thinking right now of a list of regular expressions that can be viewed, and maybe a packager of some type as well that can bundle a collection of transforms to be applied to a given field. It's a nice idea anyway.
I'll have the headings work on the server Monday.
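
One way the viewable list of regular expressions could look, as a Ruby sketch. The Rule struct, the three rules, and both helper methods are hypothetical; the real normalizations for the headings have not been settled.

```ruby
# Hypothetical rule: a human-readable note plus a pattern and its replacement.
Rule = Struct.new(:note, :pattern, :replacement)

# Illustrative rules only; applied in order to a single field value.
SUBJECT_RULES = [
  Rule.new("collapse runs of whitespace", /\s+/, " "),
  Rule.new("even out spacing around subdivision dashes", /\s*--\s*/, " -- "),
  Rule.new("drop a trailing period", /\.\z/, "")
].freeze

# Applies every rule in order and returns the normalized value.
def normalize(value, rules)
  rules.reduce(value.strip) do |text, rule|
    text.gsub(rule.pattern, rule.replacement)
  end
end

# Renders the rule list as plain lines, so the transforms can be shown in a viewer.
def describe(rules)
  rules.map { |r| "#{r.pattern.inspect} => #{r.replacement.inspect}  (#{r.note})" }
end

puts normalize("Railroads  --California--History.", SUBJECT_RULES)
puts describe(SUBJECT_RULES)
```

Keeping the rules as data rather than burying them in code is what makes them transparent: the same list drives both the transform and its description. A rule as blunt as the trailing-period one would need checking against headings that end in abbreviations.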

Wednesday, August 1, 2007

Cleaning up, moving forward

Found a few things wrong with the xsl now that I can look at the site more efficiently.
All subjects of a given type were being concatenated into one field. There was also a spurious
name field being generated when the title contained a nonSort element. Those have been fixed and the server is being rebuilt.
I'm working on a registration and login system now. I'd be farther along, and have more time to spend on it, if I hadn't mistyped something. Every time I hit the application I'd get the following message:
We're sorry, but something went wrong.
We've been notified about this issue and we'll take a look at it shortly.
There was nothing I could do. I lost undo after an Aptana restart, so I was about to pull all the code I had done. I think the problem was an end statement that was mistyped as ens or enS, but man, I could have used a better diagnostic than that. So it goes.