Arabic script is one of the most sophisticated and challenging scripts to recognize. It uses different character
shapes along with complex diacritical marks that are difficult to distinguish from the characters' dots. These
distinctive features complicate optical character recognition (OCR) and lower recognition accuracy. Several
studies in the literature have aimed to introduce high-accuracy Arabic OCR. However, improving the accuracy
of reading words remains an open issue that depends on both the dataset used and the recognition model
developed. Moreover, diacritics have received limited attention and have not been sufficiently addressed.
Experimental tests of prior models on words with diacritics have shown poor accuracy that does not exceed
60%. Consequently, this work aims to introduce a new, accurate deep-learning model for Arabic OCR that
handles words both with and without diacritical marks. It uses a dual encoder transformer
(DTrOCR), a deep-learning architecture that has demonstrated superior performance in identification
and classification tasks. The proposed DTrOCR uses multiple batch sizes. It was trained on a comprehensive,
generated Arabic word-based dataset named MFSRHRD and tested on unseen datasets. The accuracy of
recognizing Arabic words without diacritics reaches 98.5%, while for words with diacritics it reaches
89.9%.
Keywords: Arabic OCR; Multi-Batch Size; Transformer; Dual Encoder Transformer; Decoder; Feature Extraction; Self-Attention Mechanism.
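
The two accuracy figures reported above (words with and without diacritics) can be illustrated with a minimal sketch. It assumes that accuracy means exact-match word accuracy and that "diacritics" covers the Arabic marks U+064B to U+0652 plus the superscript alef U+0670; the names `strip_diacritics` and `word_accuracy` are hypothetical and are not taken from the proposed model or its evaluation code.

```python
import re

# Assumption: diacritics are the Arabic harakat U+064B..U+0652
# (fathatan through sukun) plus the superscript alef U+0670.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(word: str) -> str:
    """Return the word with the assumed diacritical marks removed."""
    return DIACRITICS.sub("", word)


def word_accuracy(predictions, references, ignore_diacritics=False):
    """Fraction of predicted words that exactly match their reference.

    With ignore_diacritics=True, both sides are stripped of diacritics
    before comparison, so only the base letters are scored.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    if ignore_diacritics:
        predictions = [strip_diacritics(p) for p in predictions]
        references = [strip_diacritics(r) for r in references]
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)
```

Scoring the same predictions in both modes separates errors in the base letters from errors that affect only the diacritical marks.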