The goals of our project:
Initial experiments with full and even formatted text showed that this
goal was as yet impractical. There are many relatively small files
with OCR-ed versions of (reasonably) formatted texts, but the
quantities were really too small to use ML techniques, and rule-based
solutions were expensive and not very successful.
- To collect data relevant to images from their context.
- Concentrate on full text, as most data was in one way or
another contained in the reports and papers in the library of the
institute. Here we had enough material to apply ML techniques.
- Proceed incrementally: do not try to offer an
all-encompassing solution, but solve each problem on its own.
- To offer the archeologist immediate solutions for retrieval
in digitized archeological texts.