An Enhanced Word Level Arabic OCR Based on Dual Encoder Transformer Architecture

Khulood Gaashan; Maram Bani Younes

doi:10.5455/jjcit.71-1746709575

JJCIT. 2025; 11(4): 418-431

Abstract
Arabic script is one of the most sophisticated and difficult scripts. It uses multiple character shapes with complex diacritical marks that are difficult to distinguish from the characters’ dots. These distinctive features make optical character recognition (OCR) more difficult and lead to low recognition accuracy. Several studies in the literature have aimed to introduce high-accuracy Arabic OCR. However, improving word-reading accuracy remains an open issue that depends on the dataset used and the recognition model developed. Moreover, diacritics have received limited attention and have not been sufficiently addressed. Experimental tests of prior models on words with diacritics have shown poor accuracy, not exceeding 60%. Consequently, this work introduces a new, accurate deep-learning model for Arabic OCR that handles words both with and without diacritical marks. It utilizes a dual encoder transformer (DTrOCR), a deep-learning architecture that has demonstrated superior performance in identification and classification tasks. The proposed DTrOCR uses multiple batch sizes. It has been trained on a comprehensive, generated Arabic word-based dataset named MFSRHRD and tested on unseen datasets. The accuracy of recognizing Arabic words without diacritics reaches 98.5%, while for words with diacritics it reaches 89.9%.

Keywords: Arabic OCR; Multi-Batch Size; Transformer; Dual Encoder Transformer; Decoder; Feature Extraction; Self-Attention Mechanism.

Abstract