On Friday 21 November we had, as promised, a marathon discussion of transcription principles, followed by a transcription session as we tried to make the system work. Present were: Kyle Dase, Murray Melymick, Peter Robinson (the irrelevant), Brendan Swalm, Aaron Thacker, Adam Vazquez, Megan Wall. Barbara Bordalejo skyped in from Leuven for a time.
From left: Peter Robinson, Murray Melymik, Megan Wall, Adam Vazquez (gesturing), Kyle Dase, Brendan Swalm, Aaron Thacker (photo by Jon Bath)
PR began by outlining the problems we have with abbreviations, and (more generally) the shifts in thinking underlying our movement towards a new transcription policy. He focussed on final u/n+macron, explaining the following three cases:
- In the vast majority of instances, the n+macron simply means final n. So (thousands and thousands of times) in words like on, upon, slepen, and spellings with final -oun, like condicioun, gypoun, etc etc
- In a few cases, u+macron (which might actually appear as n+macron) is certainly an abbreviation of final n. Thus in the adverb "doun", spelt dou+macron (which might actually appear as don+macron)
- In a large number of cases, final u/n+macron might or might not indicate u+n. Thus "condicion+macron", where the final letter might be n, in which case the macron indicates nothing (case 1) or might be u, in which case the macron indicates final n (case 2)
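The three cases can be restated as a small decision rule. What follows is only a sketch: the word list is a toy stand-in built from the examples above, and the names `MacronCase` and `classify` are invented for illustration. A real decision would depend on context (for instance, whether "doun" is the adverb), which a bare word list cannot capture.

```python
from enum import Enum
from typing import Optional, Set


class MacronCase(Enum):
    REDUNDANT = 1     # minims read as n; the macron adds nothing
    ABBREVIATION = 2  # minims read as u; the macron stands for final n
    AMBIGUOUS = 3     # both readings give a plausible spelling


# Toy list of spellings taken from the examples in the text (an assumption,
# not a real lexicon of the manuscripts).
ATTESTED: Set[str] = {
    "on", "upon", "slepen", "condicion", "condicioun", "gypoun", "doun",
}


def classify(stem: str, attested: Set[str] = ATTESTED) -> Optional[MacronCase]:
    """Classify a word written as stem + two minims + macron."""
    reads_as_n = (stem + "n") in attested    # minims = n, macron = nothing
    reads_as_un = (stem + "un") in attested  # minims = u, macron = n
    if reads_as_n and reads_as_un:
        return MacronCase.AMBIGUOUS
    if reads_as_un:
        return MacronCase.ABBREVIATION
    if reads_as_n:
        return MacronCase.REDUNDANT
    return None  # neither reading is in the list


if __name__ == "__main__":
    for stem in ("o", "condiciou", "do", "condicio"):
        print(stem, classify(stem))
```

With this toy list, "o" and "condiciou" fall under case 1, "do" under case 2, and "condicio" under case 3.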
There appear to be four options for dealing with this range of situations:
- Ignore all macrons. But this would not deal with case 2, where abbreviation is present.
- Ignore all macrons except those in case 2. But this would give the completely misleading impression that scribes only use u/n macron to indicate abbreviation.
- Transcribe all macrons the same way. This would be simple and consistent, and represent what is in the manuscripts, but would tell the reader little except that there are a lot of macrons in the manuscript.
- Figure out some way of transcribing the manuscripts which takes account of the different situations.
It appears that option 4 is the way to go. That is: we want to distinguish the three cases outlined above: when n+macron means just n; when it definitely indicates abbreviation; when it might indicate abbreviation. There is a further complication, which Barbara has argued we should incorporate in our transcription. The two letters n/u are commonly written as two minims. These two minims are written in one of three ways:
- They are joined at the top, in which case they look like an n;
- They are joined at the bottom, in which case they look like a u;
- They are not joined at either, in which case they look simply like two minims.
In a perfect world, we might find case 1 every time the context demands "n", case 2 every time the context demands "u", and we would never see case 3. This is not a perfect world. A glance over any page of any manuscript shows that we find everywhere two minims which look like "n" where the context demands "u", two minims which look like "u" where the context demands "n", and two minims which look like neither. Usually we do what transcribers have always done: we transcribe by context, so if the context demands "n" we transcribe it as "n" even if what is written is clearly a "u". We make a similar decision in cases of e/o and y/thorn, which in some scribes are often completely indistinguishable.
One could argue that we should give this information, about how EVERY n/u is written, in our transcription. This would take immense resources: it would slow down transcription; it would likely increase our error rate as transcribers look closely at every u/n and fail to see other things. However, there is a strong case for recording exactly how the minims are written in the instances where there is clear ambiguity: case three above, and (possibly) case two above.
Here is how it could work in case three. Here's what we see in the manuscript (Hg):
This could be either u+macron, which would be an abbreviation of final n, or n+macron, where the macron indicates nothing. And it appears to be written as two minims, joined at neither top nor bottom. We could encode this something like:
In Bo1, the same word appears as:
This we could encode as:
In Fi, we see:
This we could encode as:
There are alternative ways of achieving the same encoding. One might use, instead of the constructs <am rend="u">ıı̄</am> and <am rend="n">ıı̄</am>, <am>ū</am> and <am>n̄</am>. While more compact than the use of the "rend" attribute, this has the problem of asserting that what is written is "u" or "n" where our point is that this is two minims joined together at the top or bottom: a subtle but critical distinction. Alternatively, Barbara suggests use of the TEI <glyph> and <g> mechanism. This has two parts:
- <glyph>: which defines a character, or combination of characters, not available in the standard Unicode character set. Hence: <glyph xml:id="umac"><glyphName>Two minims joined at the base with a macron above</glyphName></glyph>
- <g>: which inserts the character, thus: <am><g ref="umac">u</g></am><ex>un</ex>
Note that all three methods (<am rend="u">ıı̄</am>; <am>ū</am>; <am><g ref="umac">ū</g></am>) are completely interchangeable, with no loss of information in conversion from one to the other.
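The interchangeability claim can be checked with a round trip. The sketch below is not part of any project tooling: the function names are invented, the glyph id "nmac" is assumed by analogy with "umac", and only the two joined forms (u and n) are covered. Each writer produces one of the three encodings, and `read_form` recovers which letter-form was recorded.

```python
import unicodedata
import xml.etree.ElementTree as ET

MACRON = "\u0304"                 # combining macron
MINIMS = "\u0131\u0131" + MACRON  # two dotless i's (minims) with a macron

# "umac" comes from the text; "nmac" is an assumed counterpart.
GLYPH_IDS = {"u": "umac", "n": "nmac"}


def as_rend(form: str) -> str:
    """Minims plus a rend attribute, e.g. <am rend="u">...</am>."""
    am = ET.Element("am", {"rend": form})
    am.text = MINIMS
    return ET.tostring(am, encoding="unicode")


def as_letter(form: str) -> str:
    """The letter itself with a macron, e.g. <am>u+macron</am>."""
    am = ET.Element("am")
    am.text = form + MACRON
    return ET.tostring(am, encoding="unicode")


def as_glyph(form: str) -> str:
    """A <g> reference to a declared glyph inside <am>."""
    am = ET.Element("am")
    g = ET.SubElement(am, "g", {"ref": GLYPH_IDS[form]})
    g.text = form + MACRON
    return ET.tostring(am, encoding="unicode")


def read_form(xml: str) -> str:
    """Recover "u" or "n" from any of the three encodings."""
    am = ET.fromstring(xml)
    if am.get("rend") in GLYPH_IDS:
        return am.get("rend")
    g = am.find("g")
    if g is not None:
        by_id = {glyph_id: form for form, glyph_id in GLYPH_IDS.items()}
        return by_id[g.get("ref").lstrip("#")]
    # Decompose so a precomposed u-macron also yields plain "u".
    return unicodedata.normalize("NFD", am.text)[0]


if __name__ == "__main__":
    for form in ("u", "n"):
        for write in (as_rend, as_letter, as_glyph):
            encoded = write(form)
            print(encoded, "->", read_form(encoded))
```

The round trip only works because every encoding maps to the same small set of forms. Once a third form (minims joined at neither top nor bottom) is recorded, the bare-letter encoding has no obvious way to express it.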
The arguments for using this mechanism with case 3 above seem clear: one might expect analysis of the distribution of the graphetes to cast light on the ambiguity. It could be argued that we should use this mechanism too in case 2, for purposes of comparison with the distribution of these graphetes in case 3. On the other hand: we may argue that the advantages of using this mechanism for case 1, where the macron is simply redundant, do not warrant the effort of encoding these many thousand cases.
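As an illustration of the kind of distribution analysis this encoding would allow, here is a minimal sketch that counts how the minims are recorded in each witness. The sample lines are invented for the example and are not readings from any manuscript; the witness labels "A" and "B", the `rend="ii"` value for unjoined minims, and the function name `tally_forms` are likewise assumptions.

```python
from collections import Counter
import xml.etree.ElementTree as ET

MINIMS = "\u0131\u0131\u0304"  # two dotless i's with a combining macron

# Invented sample transcriptions, keyed by a placeholder witness label.
SAMPLE = {
    "A": f'condicio<am rend="u">{MINIMS}</am> reso<am rend="u">{MINIMS}</am>',
    "B": f'condicio<am rend="n">{MINIMS}</am> reso<am rend="ii">{MINIMS}</am>',
}


def tally_forms(line_xml: str) -> Counter:
    """Count the rend values of every <am> in a transcribed line."""
    # Wrap the fragment in a root element so that it parses as XML.
    root = ET.fromstring(f"<l>{line_xml}</l>")
    return Counter(am.get("rend", "unspecified") for am in root.iter("am"))


if __name__ == "__main__":
    for witness, line in SAMPLE.items():
        print(witness, dict(tally_forms(line)))
```

Run over real transcriptions, a tally like this could be split by word, by scribe, or by case 2 against case 3, which is the comparison suggested above.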