Guide To Editing with Lace
Image Zoning
Zoning allows Lace to match sections of text across pages and to record the order in which the page should be read, or whether sections should be omitted from the output. For instance, a digital edition usually should not have page titles or page numbers intrude on the primary text, and in volumes with a facing translation, the text and translation should usually not alternate. This information will be especially useful in generating TEI-encoded structured text.
Use the Zone Type dropdown button to choose the zone type, such as 'Page Number' or 'Primary Text'. Proceeding in reading order, draw a rectangle around the text to be identified as an instance of that zone type, and then repeat, selecting a new zone type if necessary. Separate columns of the same zone type should be identified separately, again in reading order. Note that the words on the right side of the web page that are enclosed in a zone will be highlighted when that zone is highlighted. Note also that words are included if the zoning rectangle touches any part of the word's bounding box: the zoning rectangle does not need to fully enclose it.
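The inclusion rule described above (a word belongs to a zone if the rectangle touches any part of its bounding box) can be sketched as a rectangle-overlap test. This is only an illustration: the `Box` class, the function names, and the pixel coordinates are hypothetical and are not taken from Lace's own code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page-image coordinates (hypothetical)."""
    left: float
    top: float
    right: float
    bottom: float


def touches(zone: Box, word: Box) -> bool:
    """True if the two rectangles share at least one point."""
    return (zone.left <= word.right and word.left <= zone.right
            and zone.top <= word.bottom and word.top <= zone.bottom)


def words_in_zone(zone, words):
    """Return the text of every word whose bounding box the zone touches."""
    return [text for text, box in words if touches(zone, box)]


zone = Box(0, 0, 100, 50)
words = [
    ("inside", Box(10, 10, 40, 30)),       # fully enclosed
    ("straddling", Box(90, 20, 130, 40)),  # only partly inside the zone
    ("outside", Box(120, 60, 160, 80)),    # no contact with the zone
]
selected = words_in_zone(zone, words)
```

Here `selected` contains both "inside" and "straddling", since partial contact is enough.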
Normally, the lines within a zone will be output as continuous text, with hyphenated words reduced to their dehyphenated form. However, sometimes the editor will prefer that the line breaks be preserved and that hyphenated words remain hyphenated, for instance when dealing with an inscription or verse. In this case, the Line Mode button should be clicked, causing it to turn blue. While the button is blue, any zones drawn will be rendered with dashed lines. Text within zones that are drawn in Line Mode will have its line breaks indicated with <tei:pb/> milestone elements, both internally and at the start and end of the zone. Clicking the same button again returns it to white. Zones drawn in this state are not in Line Mode and will once again be output as continuous text with automatic dehyphenation.
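The difference between the two output modes can be illustrated with a small sketch. It assumes a naive dehyphenation rule (a line ending in "-" is joined directly to the next line); Lace's actual rule is not documented here and may well be more careful. The function names are hypothetical, and the milestone string is simply the one named in the paragraph above.

```python
def continuous_text(lines):
    """Join lines into one string, rejoining words hyphenated at a line end."""
    out = ""
    for line in lines:
        line = line.strip()
        if out.endswith("-"):
            out = out[:-1] + line  # naive assumption: drop hyphen, join halves
        elif out:
            out += " " + line
        else:
            out = line
    return out


def line_mode_text(lines, milestone="<tei:pb/>"):
    """Keep each line intact, with a milestone at the start, between lines, and at the end."""
    parts = [milestone]
    for line in lines:
        parts.append(line.strip())
        parts.append(milestone)
    return "".join(parts)


lines = ["the hyphen-", "ated word", "ends here"]
joined = continuous_text(lines)
preserved = line_mode_text(lines)
```

With these inputs, `joined` reads as one continuous sentence, while `preserved` keeps the hyphen and marks every line boundary.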
Clicking on a zone highlights it and reveals a popup label for the zone, indicating its type and its position in the reading order. Pressing the delete key when a zone is highlighted causes it to be deleted and the reading order to be adjusted. Note that you cannot delete a zone early in a long reading order and then simply re-draw it, since a newly drawn zone always comes at the end of the reading order. In this case, it is best to use the Clear Zones button to erase all zones and start over.
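The reason a deleted zone cannot simply be re-drawn in place can be seen in a toy model of the reading order. The `ReadingOrder` class below is hypothetical and only mirrors the behaviour described above: deletion renumbers the remaining zones, and a new zone is always appended at the end.

```python
class ReadingOrder:
    """Toy model of the zone reading order (not Lace's implementation)."""

    def __init__(self):
        self.zones = []

    def draw(self, zone_type):
        """A newly drawn zone always goes to the end; return its 1-based position."""
        self.zones.append(zone_type)
        return len(self.zones)

    def delete(self, position):
        """Remove the zone at a 1-based position; later zones move up by one."""
        del self.zones[position - 1]

    def clear(self):
        """Equivalent of the Clear Zones button: start over."""
        self.zones.clear()

    def labels(self):
        return list(enumerate(self.zones, start=1))


order = ReadingOrder()
order.draw("Page Number")
order.draw("Primary Text (column 1)")
order.draw("Primary Text (column 2)")

order.delete(1)                          # remove the first zone
new_position = order.draw("Page Number")  # re-drawn zone lands last, not first
```

After these steps the page number sits at position 3 rather than position 1, which is why clearing all zones and re-drawing them in order is the practical fix.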
Text Colour Codes
When the OCR output is post-processed by the computer, the spellcheck status of each word is recorded, along with the spellchecking strategies that were applied to it. In the display, these attributes are indicated with different colours. Spellcheck is performed with dictionaries, which are large lists of known-good words. Here is how the OCR words relate to dictionary words, with an example for each colour code:
- ¹²παρελθεῖν: this word has passed spellcheck, i.e., it matches one of the words in the spellcheck dictionary. Notice that characters are allowed before and after the dictionary word, such as the '¹²' here, without disrupting the spellchecking.
- Κυρίου: this word passed spellcheck when it was transformed to its lowercase form.
- (11): this item consists of numbers or punctuation.
- κατοικήσομεν: this word has been corrected by substituting one character for another. In this case, an 'α' was replaced with an 'ο'. (The exact substitution is encoded in the HTML attribute, but is not visible to the reader.)
- εὑρήματα: this word was matched to a dictionary word when a doubled letter was reduced to a single instance of that letter. In this case, the OCR output was εὑρήμματα, a word that is not in the dictionary.
- besides, -five: in this case, there was no space between the punctuation separating two dictionary words.
- τῆς αὐλητρίδος: these words passed spellcheck only when a space was inserted between them. The original OCR output was τῆςαὐλητρίδος.
- ὥμοσεν: this word cannot be matched with a dictionary word by any of the strategies.
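The strategies listed above can be sketched as a sequence of dictionary look-ups. The following is a rough illustration only: the tiny dictionary, the function names, the status labels, and the order in which strategies are tried are all assumptions, and the punctuation-between-words case is not modelled.

```python
import unicodedata

# Toy dictionary built from the example words; a real one is far larger.
DICTIONARY = {unicodedata.normalize("NFC", w) for w in (
    "παρελθεῖν", "κυρίου", "κατοικήσομεν", "εὑρήματα", "τῆς", "αὐλητρίδος",
)}


def core(token):
    """Strip leading and trailing non-letters such as '¹²' or brackets."""
    start, end = 0, len(token)
    while start < end and not token[start].isalpha():
        start += 1
    while end > start and not token[end - 1].isalpha():
        end -= 1
    return token[start:end]


def one_substitution(word):
    """True if changing exactly one character yields a dictionary word."""
    return any(
        len(known) == len(word)
        and sum(a != b for a, b in zip(known, word)) == 1
        for known in DICTIONARY
    )


def doubled_letter(word):
    """True if reducing one doubled letter yields a dictionary word."""
    return any(
        word[i] == word[i + 1] and word[:i] + word[i + 1:] in DICTIONARY
        for i in range(len(word) - 1)
    )


def split_in_two(word):
    """True if inserting one space yields two dictionary words."""
    return any(
        word[:i] in DICTIONARY and word[i:] in DICTIONARY
        for i in range(1, len(word))
    )


def classify(token):
    """Return a status label; the order of the checks is an assumption."""
    word = core(unicodedata.normalize("NFC", token))
    if not word:
        return "numbers or punctuation"
    if word in DICTIONARY:
        return "passed"
    if word.lower() in DICTIONARY:
        return "passed in lowercase"
    if one_substitution(word):
        return "one character substituted"
    if doubled_letter(word):
        return "doubled letter reduced"
    if split_in_two(word):
        return "space inserted"
    return "no match"
```

For example, `classify("εὑρήμματα")` reports a reduced doubled letter and `classify("τῆςαὐλητρίδος")` reports an inserted space, matching the corresponding entries in the list.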
Simple Text Editing
To edit the OCR output, click on a word. The content of that word is now editable: you can type additional characters or use the backspace key to delete; alternatively you can select and delete a range of characters in the usual manner. When a word is clicked on, a tooltip pops up with the corresponding image range from the OCR'd page. It is usually easier to compare the text with this image than it is to scan the page image on the left side of the screen for the word, which is highlighted.
Once the editor is assured that the text in the word is what is in the pop-up image, he or she should press the Return (or Enter) key. This action is all that is required to save the edit in the underlying database. The editor will note that the colour of the word changes to light blue, indicating that the word has been manually verified. The editing cursor now moves to the next word on the page.
Where a text is highly accurate, editing simply entails clicking on the first word, checking that the word text corresponds to the pop-up image, and then pressing Return. The process is repeated word by word, the text being changed only when necessary.
A progress bar above the text indicates how much of the page has been verified.
Advanced Text Editing
The following advanced editing functions are available. They are all accessed through a pop-up menu that appears when an editor right-clicks on a word. (On a Mac with a single-button mouse, right-click is simulated with control-click.)
- The Insert Ref. Before menu item causes two fields to be generated inline, in which you indicate the beginning of a new work or a new section of a work in progress. The field on the left side is for the author and title. Type any keyword from these and a list of matching works will appear in a menu below. Select one of these. (At present, an extensive list of Greek works is provided, and it is not possible to type in a work yourself. An administrator can add other works to this list by modifying the file at $LACE_APP/resources/javascript/cts-greek-texts.js.) This field must always be completed, even in the middle of a work. The field on the right is used to indicate the section of the work, according to your sectioning syntax. For instance, you might use 10.2 to indicate "book 10, section 2." When the page is refreshed, this dialog collapses to a book emoji milestone, 📖, in order to conserve space. The stored text title and section can still be seen by mousing over the emoji, as can the URN used by the computer to formally identify this section. At any point, the dialog or its milestone can be deleted with the 'x' to its right.
- The Insert Word After menu item causes a manually entered word to be added to the line, to deal with cases where the OCR engine omitted a word. This word can be deleted with the x button following it. The word begins without any letters in it. Type what it should contain and press Return to save this word. Like the others, it will then be highlighted in blue. Manually entered words and lines (see below) have a dashed green outline. For the purpose of alignment with the page zones, the manually added word is positioned just to the right of the word on its left and is represented by a narrow vertical strip the height of the enclosing line. Thus in most cases this word will be included in any zoning of this area. (Note that as of version 0.5.9 it is not possible to manually generate a word at the leftmost position of a line.)
- The Add Line After menu item causes a new blank line to be inserted into the document, directly following the current one. This line is editable, but its content is not broken into words. Pressing Return in this line will save its content as expected. This allows one to add content to the page that has been missed by the OCR engine. At any time, the line may be deleted by clicking on the x at the right-hand edge of the page. This line will be assigned a position on the page just below the line above it, and is represented by a narrow horizontal strip the same width as that line. Thus in most cases it will be included in any zoning of this area. (Note that as of version 0.5.9 it is not possible to manually generate a line at the topmost position of a page.)
- If a word is split in the editor, meaning that it appears in multiple boxes rather than together in a single box, you may correct this by completing the word as it appears in the text in the first editing box and deleting the content of the subsequent box(es). If multiple words are detected as a single word, you may solve this by simply separating them with a Space, as the computer will then detect them as separate words.
- The Verify Following menu item helps speed up the word verification of a highly accurate text. It verifies and turns blue all the words following the context word until it reaches one that did not pass spellcheck. This is a very powerful feature and should only be used when those following words have been carefully read.
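The behaviour of Verify Following can be pictured with a short sketch. The `Word` record and the `verify_following` function are hypothetical, and the sketch assumes that the context word itself is left untouched and that verification stops at (and excludes) the first word that did not pass spellcheck.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Minimal stand-in for an OCR word on the page (hypothetical)."""
    text: str
    passed_spellcheck: bool
    verified: bool = False


def verify_following(words, context_index):
    """Verify the words after context_index, stopping at the first one
    that did not pass spellcheck. Returns how many words were verified."""
    count = 0
    for word in words[context_index + 1:]:
        if not word.passed_spellcheck:
            break
        word.verified = True
        count += 1
    return count


page = [
    Word("τῆς", True),
    Word("αὐλητρίδος", True),
    Word("παρελθεῖν", True),
    Word("ὥμοσεν", False),   # failed spellcheck: verification stops here
    Word("εὑρήματα", True),  # not reached
]
verified_count = verify_following(page, context_index=0)
```

Only the two words between the context word and the failed word are marked as verified; everything from the failed word onward still needs to be checked by hand.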
Using Unicode
Special reminders regarding character use in the editor:
- Unicode is a universal character set meant to provide a consistent method for encoding plain text. It standardizes text by assigning every character a unique numeric value and name. As a result, the same character is represented in the same way everywhere, which makes text easy to exchange and convert.
- The use of Unicode characters is imperative for creating a convertible, searchable document. When typing Greek characters, and especially when using a Greek keyboard, remember that the keys you press may not produce the characters that are required. For example, when inserting left or right angle brackets, you must use the specific bracket character (e.g. U+2329, Unicode category Ps) rather than the less than or greater than symbols found on your keyboard. Because of the nature of Unicode characters, formatting words as bold or italic is unnecessary.
- When a character is unavailable on the standard keyboard, it is likely to be found in an index of Unicode characters, which can be located with an online search engine.
- Font is irrelevant when using Unicode, so if a character must be pasted into the editing environment, it may appear on a different coloured background, but, provided it is Unicode, it will be detectable to the OCR engine.
- To switch your keyboard to Greek, various resources are available online for both Mac and PC users to help with typing accents, breathing marks, and iota subscripts.
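If you are unsure which character you have typed or pasted, its code point, name, and category can be looked up. The sketch below uses Python's standard `unicodedata` module and is independent of Lace; the `describe` helper is a hypothetical name introduced for illustration.

```python
import unicodedata


def describe(ch):
    """Return the code point, official name, and general category of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"


angle_bracket = describe("\u2329")  # the bracket character mentioned above
less_than = describe("<")           # the keyboard symbol that resembles it

print(angle_bracket)  # U+2329 LEFT-POINTING ANGLE BRACKET (Ps)
print(less_than)      # U+003C LESS-THAN SIGN (Sm)
```

The two characters look similar on screen but are distinct: one is opening punctuation (category Ps) and the other is a mathematical symbol (category Sm).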