Information for topics
Topic Id:
581
Partner Email:
dtdim@iinf.bas.bg
Project Title:
OCR System Supporting the Study of Ancient (e.g. Bulgarian) Texts
Abstract:
Philological research on the rich cultural heritage preserved in ancient Bulgarian texts (up to the time of Paisii, XVII century) particularly needs a specialized OCR (Optical Character Recognition) system to aid its study. OCR technologies have improved significantly in recent years; moreover, OCR systems designed specifically for the Cyrillic and Latin alphabets (for example, the Russian ABBYY) are considered among the best on the software market. In the case discussed, however, the problem goes beyond the classic retraining of an OCR system on new characters (such as those of the Glagolitic alphabet), and the possibility that such retraining proves economically inefficient is the least of the difficulties. Ancient Bulgarian writing usually consists of isolated letters, which facilitates their recognition. But these letters are usually found in manuscripts. Although a given author and/or master calligrapher usually keeps to one manner of writing, as a general rule the style changes from one page to another, or even more often. In addition, many modifiers are attached to the letters: marks, accents, aspirations, etc. Quite often the manner of writing of individual letters and/or words, such as captions, is of separate interest to art researchers. Attention is therefore directed to graphological dictionaries of the images of individual letters, as well as to semantic glossaries of the words composed of them. The latter are still being created and studied and are so far available only on paper, but it is expected that they will be transferred to electronic form in the near future. Hence an important shortcoming arises: the final verification step of OCR technologies, which relies on glossaries of samples, cannot be applied directly. The problem described is not considered limited to ancient Bulgarian texts.
Purpose of the diploma work offered:
- To make a comparative analysis (based on literature references) of the known OCR approaches/methods/technologies, in order to argue the need for a specific approach to the automatic processing of ancient (e.g. Bulgarian) texts for the purpose of their scientific study;
- To develop and investigate an OCR classifier for an ancient alphabet on the basis of neural networks;
- To propose and experiment with a computer environment that has the features of an OCR system and supports the investigation of ancient texts. The environment will include the following options:
  - Segmentation of letters from images of ancient texts and their collection in a dictionary of letters;
  - Tools for structuring the experiment of training the neural network to recognize the letters currently under investigation;
  - Support for a glossary of the recognized letters and a dictionary of the recognized words. The content accumulated in the dictionary should be used to verify the functionality of the recognizing neural network.
It is a benefit if the post-graduate shows specific interest in software experiments in the "C/C++" language.
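The first environment option, segmenting isolated letters and collecting them in a dictionary of letters, could be prototyped along the following lines. This is only a minimal sketch under stated assumptions, not the method the topic prescribes: it assumes the page has already been binarized (1 for ink, 0 for background), it uses plain 8-connected component labeling, and the names `segment_letters`, `crop` and `build_letter_dictionary` are hypothetical. It is written in Python for brevity; the same logic carries over directly to C/C++.

```python
from collections import deque


def segment_letters(image):
    """Return bounding boxes (top, left, bottom, right) of 8-connected ink blobs.

    `image` is a list of equal-length rows; 1 marks ink, 0 marks background.
    Assumption: binarization has been done beforehand.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over one blob, tracking its extent.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    # Order boxes by left edge so crops follow reading order on a single line.
    boxes.sort(key=lambda box: box[1])
    return boxes


def crop(image, box):
    """Cut the sub-image covered by one bounding box."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def build_letter_dictionary(image):
    """Hypothetical 'dictionary of letters': index -> cropped glyph image."""
    return {i: crop(image, box) for i, box in enumerate(segment_letters(image))}
```

Because the abstract notes that many modifiers (marks, accents, aspirations) are attached to the letters, a sketch like this would report each detached mark as a separate blob; grouping marks with their base letter would need an additional step that is not shown here.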
Advisor:
Assoc. Prof. Dr Dimo T. Dimov
Link:
Degree:
Bachelor
Keywords:
Artificial intelligence & Neural networks
Image processing