This is one of many editing pages for the OCR run you selected for this volume. Beside the volume title, a drop-down menu gives you three bands of information about the run:
- Run Info: text describing the run.
- Run Editing View: A colored band representing the sequence of pages in this volume. Black sections are unedited, blue ones are completely edited, and the spectrum of colors from red to blue represents the stages in between. Pages for which there is no output are colored gray. This view is generated dynamically and represents the current state of the volume. A caret shows this page's position in the volume.
- Run Accuracy View: Another colored band representing the sequence of pages in this volume, but in this case the colors show how accurate the OCR output of each page is, with accuracy measured as the ratio of dictionary words to total words.
Mousing over either of the last two views shows the page number corresponding to that section of the band, and clicking on that section navigates to the editing page for that page.
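The accuracy measure behind the Run Accuracy View, dictionary words per total words, can be sketched as follows. This is a minimal illustration and not Lace's own code: the function name, the tokenization rule and the tiny word list are all assumptions made for the example.

```python
import re


def dictionary_word_ratio(text, dictionary):
    """Return the fraction of words in `text` that appear in `dictionary`."""
    # Assumption: a "word" is a run of letters, compared case-insensitively.
    words = re.findall(r"[^\W\d_]+", text)
    if not words:
        return 0.0  # a page with no words gets the lowest score
    known = sum(1 for word in words if word.lower() in dictionary)
    return known / len(words)


# Hypothetical word list and OCR output containing one misread word ("quiok").
sample_dictionary = {"the", "quick", "brown", "fox"}
ratio = dictionary_word_ratio("The quiok brown fox", sample_dictionary)
```

Here three of the four words are in the word list, so the page would score 0.75. A real dictionary for the language of the volume would take the place of the sample set.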
This menu also provides the following downloads:
- Training Set File: This is a single tab-separated data table giving the image file name, bounding box and corrected text of every line in this run whose words have all been validated by a user. From this a Python script generates the line image and text pairs used to retrain an OCR engine like Ocropus or Kraken.
- XAR Backup of Editing: A complete collection of all data, including editing and zoning rectangles, useful for installing in this or another instance of Lace. This is one option for backing up your data.
- Plain Text Zip File: A zipped archive of all the texts converted to plain text. The corrected text is formatted according to the original OCR text, without the zoning information being applied.
- Training Set Images: This is a zipped archive comprising similarly-named pairs of line images and text files used to train OCR engines like Ocropus or Kraken. Unlike the Training Set File above, this training data can be used without running any intermediary program. However, these images are set at the binarization level of the collection installed in Lace, whereas the tab-separated table can be used to work with the original color pages if necessary.
- Precheck and Download TEI File: This does a last sanity check on the TEI output, then presents a page for final information before TEI download.
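As a rough sketch of how the Training Set File might be read before line images are cut out, the example below parses a tab-separated table into records and derives paired output names. It is not the script that Lace uses. The column order (image name, four bounding box coordinates, corrected text), the output naming scheme and the names `LineRecord`, `read_training_table` and `pair_names` are all assumptions; check the downloaded file for its real layout.

```python
import csv
import io
from typing import NamedTuple


class LineRecord(NamedTuple):
    image_file: str
    bbox: tuple  # assumed to be (x1, y1, x2, y2) in pixels
    text: str


def read_training_table(handle):
    """Yield one LineRecord per well-formed row of a tab-separated table."""
    # Assumed column order: image name, x1, y1, x2, y2, corrected text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 6:
            continue  # skip blank or malformed rows
        name, x1, y1, x2, y2, text = row
        yield LineRecord(name, (int(x1), int(y1), int(x2), int(y2)), text)


def pair_names(record, index):
    """Return hypothetical file names for a line image and its transcription."""
    stem = record.image_file.rsplit(".", 1)[0]
    base = f"{stem}_{index:04d}"
    return base + ".png", base + ".gt.txt"


# A one-row sample standing in for a downloaded table.
sample = "0001.png\t10\t20\t300\t60\tμῆνιν ἄειδε θεά\n"
records = list(read_training_table(io.StringIO(sample)))
names = pair_names(records[0], 0)
```

Cropping each bounding box out of the page image and writing the two files would require an imaging library and is left out here.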
Below this is a pagination bar with which you can navigate through the text's editing pages. The numbers do not necessarily indicate the printed page number or the number in the image file's name. Rather, each number is the ordinal position of the image in the collection.
There is a separate page explaining how to zone the image and edit the text.