Guide To Editing with Lace
Colour Codes
When OCR output is spellchecked, an HTML attribute is applied to each word to indicate its spellcheck status and which spellchecking strategies were applied to it. In the display, these attributes are shown as different colours. Spellchecking is performed with dictionaries, which are large lists of known-good words. Here is how the OCR words relate to dictionary words, with the corresponding colour codes:
¹²παρελθεῖν: this word has passed spellcheck, i.e. it matches one of the words in the spellcheck dictionary. Note that characters such as the '¹²' here are allowed before and after the dictionary word without disrupting the spellchecking.
Κυρίου: this word passed spellcheck when it was transformed to its lowercase form.
(11): this word comprises numbers or punctuation.
κατοικήσομεν: this word has been corrected by substituting one character for another. In this case, an 'α' was replaced with an 'ο'. (The exact substitution is encoded in the HTML attribute, but is not visible to the reader.)
εὑρήματα: this word was matched to a dictionary word when a pair of the same letter was replaced with only one instance of that letter. In this case, εὑρήμματα was the OCR output, a word that is not in the dictionary.
besides, -five: in this case, two dictionary words were separated only by punctuation, with no space between them.
τῆς αὐλητρίδος: these words passed spellcheck only when a space was inserted between them. The original OCR output was τῆςαὐλητρίδος.
ὥμοσεν: this word cannot be matched with a dictionary word by any of the strategies.
Additionally, when text has been identified as pertaining to the Apparatus Criticus, it is bordered with vertical blue lines, thus:
64 θυραις καθ ημεραν] θυραν Κ* (θυραις κ. η. Bᵃ) 66 αμαρτανοντες εις
εμε 8 | ασεβουσιν] + εις 8SᵃA | om με 8 (hab 2ᵇᵃ)
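The spellcheck strategies listed above can be pictured as a cascade that stops at the first strategy that succeeds. The following is only a minimal sketch under stated assumptions: the function names, the status labels, the order of the strategies and the sample dictionary are all hypothetical, and it is not Lace's actual implementation. The strategy for words separated only by punctuation is omitted for brevity.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so that equivalent encodings of Greek compare equal."""
    return unicodedata.normalize("NFC", text)


def strip_edges(token):
    """Drop leading and trailing characters that are not letters or marks."""
    def is_letter(ch):
        return unicodedata.category(ch)[0] in ("L", "M")

    start, end = 0, len(token)
    while start < end and not is_letter(token[start]):
        start += 1
    while end > start and not is_letter(token[end - 1]):
        end -= 1
    return token[start:end]


def classify(token, dictionary):
    """Return a hypothetical status label for one OCR token."""
    core = strip_edges(nfc(token))
    if not core:
        return "number-or-punctuation"
    if core in dictionary:
        return "exact"
    if core.lower() in dictionary:
        return "lowercase"
    # One character substituted for another (same length, one difference).
    for known in dictionary:
        if len(known) == len(core) and sum(a != b for a, b in zip(known, core)) == 1:
            return "substitution"
    # A doubled letter collapsed to a single instance.
    for i in range(len(core) - 1):
        if core[i] == core[i + 1] and core[:i] + core[i + 1:] in dictionary:
            return "double-letter"
    # A space inserted between two dictionary words.
    for i in range(1, len(core)):
        if core[:i] in dictionary and core[i:] in dictionary:
            return "split"
    return "unmatched"


# Assumed toy dictionary built from the examples in this guide.
SAMPLE_DICTIONARY = {nfc(w) for w in (
    "παρελθεῖν", "κυρίου", "κατοικήσομεν", "εὑρήματα", "τῆς", "αὐλητρίδος",
)}
```

Calling `classify("εὑρήμματα", SAMPLE_DICTIONARY)` walks through the cascade until the doubled μ is collapsed and the result is found in the dictionary. A real dictionary would be far larger, so the linear scan used for the substitution step would need a faster lookup.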
To edit the OCR output, click on a word. The content of that word is now editable: you can type additional characters or use the backspace key to delete; alternatively you can select and delete a range of characters in the usual manner. When a word is clicked on, a tooltip pops up with the corresponding image range from the OCR'd page. It is usually easier to compare the text with this image than it is to scan the entire page image on the left side of the screen.
Once the editor is assured that the text in the word matches what is in the pop-up image, he or she should press the Return (or Enter) key. This action is all that is required to save the edit in the underlying database. The editor will note that the colour of the word changes to light blue, indicating that the word has been manually verified. The editing cursor then moves to the next word on the page.
In the case where a text is highly accurate, editing will simply entail clicking on the first word, checking that the word text corresponds to the pop-up image and then pressing Return. The process is repeated again and again, the text being changed only when necessary.
Once the whole page has been verified thus, a 'Download' button appears at the end of the page. It is not necessary to use this: the page is stored in the database in any case.
Advanced Editing
The following advanced editing functions are available:
- If an editor verifies a word with the Control key held down while pressing Return, the edit function is applied to all applicable words on all pages of the text. So, if the word is unchanged and verified, all words that contain that string are similarly verified (sight unseen) and will appear coloured light blue, no matter what page they appear on. This is a very powerful function and should be used sparingly, especially at first. However, it may become clear that a word like Ἰσραήλ is very unlikely to be incorrectly identified, in which case verifying all of these words at once saves time. If a word has been changed, all words on all pages of the text that had the original form of the word will be changed to the edited form. Note that 'original form' means the very first form output by the OCR engine. Thus, if all words reading ἐπ’ are changed to ἐπ' (with a different final character) and then, using this function, one of those words is used to make a global change to ἐφ', this will not change words that were originally output as ἐπ’.
- Holding the Control key down while also holding the Alt key down and pressing Return causes a new work dialog box to be created for the purpose of notes.
- Holding the Alt key down while pressing Return creates a space to indicate a new section of text.
- Holding Shift down while pressing Return causes a new blank line to be inserted into the document, directly following the current one. This line, too, is editable, but its content is not broken into words. Pressing Return in this line will save its content as expected. This allows one to add content to the page that has been missed by the OCR engine.
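The rule that a global (Control + Return) edit matches on the original OCR form, not on the current text, is easy to misread, so a small model may help. This is only a sketch of one reading of the behaviour described above: the `Word` class, its fields and the `global_edit` function are hypothetical names, and the real database schema is not described in this guide.

```python
from dataclasses import dataclass


@dataclass
class Word:
    original: str           # very first form output by the OCR engine; never changes
    text: str               # current text shown in the editor
    verified: bool = False  # True once manually (or globally) verified


def global_edit(pages, clicked, new_text):
    """Apply an edit to the clicked word and to every word on every page
    whose ORIGINAL form equals the clicked word's text before the edit.
    Returns the number of words that were updated."""
    key = clicked.text  # form of the clicked word before this edit
    clicked.text = new_text
    clicked.verified = True
    updated = 1
    for page in pages:
        for word in page:
            if word is not clicked and word.original == key:
                word.text = new_text
                word.verified = True
                updated += 1
    return updated


# The example from the text: two words first output as ἐπ’ (U+2019) were
# already corrected to ἐπ' (U+0027); a third was output as ἐπ' directly.
pages = [
    [Word("ἐπ\u2019", "ἐπ'"), Word("ἐπ\u2019", "ἐπ'")],
    [Word("ἐπ'", "ἐπ'")],
]
global_edit(pages, pages[0][0], "ἐφ'")
```

After the call, the clicked word and the word on the second page read ἐφ', while the other word originally output as ἐπ’ keeps its earlier correction, as the text describes. Passing the unchanged text as `new_text` models the verify-only case.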
If a word is split in the editor, meaning that it appears in multiple boxes rather than altogether in a single box, you may correct this error by completing the word as it appears in the text in the first editing box and deleting the content of the subsequent box(es). If multiple words are detected as a single word, you may solve this by simply separating them with a space, as the computer will then detect them as separate words.
Using Unicode
Special reminders regarding character use in the editor:
- Unicode is a universal set of characters meant to act as a consistent method for encoding plain text. It standardizes text by assigning every character a universal and unique numeric value and name. This means that the same character is represented identically across systems, which makes text flexible to use and simple to convert.
- The use of Unicode characters is imperative to creating a convertible, searchable document. When using Greek characters, and especially a Greek keyboard, you must remember that the keys you use may not produce the correct symbols which Unicode requires. For example, when inserting left or right angle brackets, you must use that specific symbol (e.g. U+2329, Ps) rather than the less-than or greater-than symbols found on your keyboard. Due to the nature of Unicode characters, formatting words as bold or italic is unnecessary.
- When a character is unavailable on the standard keyboard, it is likely to be found in the index of Unicode characters, which can be found with an online search engine.
- Font is irrelevant when using Unicode, so if a character must be pasted into the editing environment, the character, provided it is Unicode, will be detectable to the OCR engine, although it will likely appear on a different coloured background.
- In order to switch your keyboard to Greek, there are various resources available online for both Mac and PC users to help with the creation of accents, breathing marks, and iota subscripts. Some programs make it possible to create an on-screen keyboard, which is controlled by the mouse. Others may prefer to use their own keyboard in Greek Polytonic form.
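When it is unclear whether a typed or pasted character is the one Unicode requires, its code point and official name can be inspected directly. The sketch below uses Python's standard `unicodedata` module. The `describe` and `find_keyboard_lookalikes` helpers and the `PREFERRED` mapping are hypothetical illustrations based on the angle-bracket example above, and are not part of Lace.

```python
import unicodedata

# Assumed mapping from keyboard characters to the symbols this guide asks for.
PREFERRED = {
    "<": "\u2329",  # LEFT-POINTING ANGLE BRACKET
    ">": "\u232a",  # RIGHT-POINTING ANGLE BRACKET
}


def describe(text):
    """Return one 'U+XXXX NAME' string for each character in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"
        for ch in text
    ]


def find_keyboard_lookalikes(text):
    """List (position, typed character, suggested character) for every
    keyboard character that has a preferred Unicode replacement."""
    return [
        (i, ch, PREFERRED[ch])
        for i, ch in enumerate(text)
        if ch in PREFERRED
    ]
```

For instance, `describe("<")` gives `['U+003C LESS-THAN SIGN']`, whereas `describe("\u2329")` gives `['U+2329 LEFT-POINTING ANGLE BRACKET']`, which makes the difference between two similar-looking characters explicit. The same check distinguishes the apostrophe U+0027 from the right single quotation mark U+2019.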