Here is an example of what goes into SOLR:
<field name="title">History of the Saginaw Valley,; its resources, progress and business interests</field>
<field name="name">Fox, Truman B</field>
<field name="origin">, Daily Courier Steam Job Print1868</field>
<field name="language"></field>
<field name="physical-description">text/xml;image/tiff; 80 p. 20 cm.; reformatted digital</field>
<field name="note">Advertising matter interspersed.</field>
<field name="subject-geographic">Saginaw River Valley (Mich.)</field>
<field name="access-condition">Where applicable, subject to copyright. Other restrictions on distribution may apply. Please go to http://www.umdl.umich.edu/ for more information.</field>
Some of the fields (title and origin above, for instance) have extraneous commas left over from the concatenation rules. There are also more interesting questions, such as whether fields like access-condition should be indexed at all. But at least with XSL, the rules are not buried within Java or some other language.
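One way the stray commas could be avoided is to emit a separator only between parts that actually contain text. The following is a minimal sketch, not the transform used here: the source element name `originInfo` and its children are assumptions, and only the output field name `origin` comes from the example above.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed source element: <originInfo> with children such as
       place, publisher and date, any of which may be empty. -->
  <xsl:template match="originInfo">
    <field name="origin">
      <!-- Visit only the children that contain non-whitespace text -->
      <xsl:for-each select="*[normalize-space()]">
        <!-- position() counts within the filtered set, so the
             separator never appears before the first real value -->
        <xsl:if test="position() &gt; 1">, </xsl:if>
        <xsl:value-of select="normalize-space()"/>
      </xsl:for-each>
    </field>
  </xsl:template>

  <!-- Suppress text that no other template handles -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

With an empty place, a publisher and a date, a rule like this would produce a value of the form "publisher, date" rather than one with a leading comma and no separator before the date.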
Further details of how SOLR uses this input are controlled via its schema.xml file (visible here).
The fields created by the transform must be defined there, along with copy information that allows a field to be indexed in more than one way. For example, the line
<copyField source="subject-hierarchic" dest="subject"/>
causes a topic subject to also be indexed as a generic subject. More lines like this can cause all text to be indexed under a single default field, which is used when users do not specify a field in a query. Additional configuration controls how fields are parsed and how sentences and punctuation are handled at the document or field level.
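To make the pieces described above concrete, here is a rough sketch of how such a schema.xml might be laid out. It is not the actual file: the type names `string` and `text`, the catch-all field name `text`, and the choice to store access-condition without indexing it are all assumptions for illustration. The `<defaultSearchField>` element is the mechanism used by SOLR releases of this era; later releases set the default through the `df` query parameter instead.

```xml
<!-- Illustrative fragment only; names marked "assumed" are not from the real schema -->
<schema name="example" version="1.1">
  <types>
    <!-- Untokenized type for values kept verbatim -->
    <fieldType name="string" class="solr.StrField"/>
    <!-- Tokenized type: the analyzer chain decides how text is split
         into terms, including what happens to punctuation and case -->
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>

  <fields>
    <field name="title" type="text" indexed="true" stored="true"/>
    <field name="subject" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="subject-hierarchic" type="text" indexed="true" stored="true" multiValued="true"/>
    <!-- Assumed choice: returned with results but not searchable -->
    <field name="access-condition" type="string" indexed="false" stored="true"/>
    <!-- Assumed catch-all field, searched when a query names no field -->
    <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
  </fields>

  <!-- Index the specific subject under the generic one as well -->
  <copyField source="subject-hierarchic" dest="subject"/>
  <!-- Copy each source into the catch-all explicitly -->
  <copyField source="title" dest="text"/>
  <copyField source="subject" dest="text"/>
  <copyField source="subject-hierarchic" dest="text"/>

  <defaultSearchField>text</defaultSearchField>
</schema>
```

Each source is copied into the catch-all field by its own line here, rather than relying on one copy feeding another.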
These files need more work but they are not a bad starting point.