Please use this identifier to cite or link to this item: http://hdl.handle.net/10174/20660

Title: Improving Bangla ocr output through correction algorithms
Authors: Ahmed, MD Sajib
Gonçalves, Teresa
Sarwar, Hasan
Issue Date: 2016
Publisher: IEEE Xplore
Citation: D Sajib Ahmed, Teresa Gonçalves, and Hasan Sarwar. Improving Bangla ocr output through correction algorithms. In SKIMA’2016 – 10th International Conference on Sofware, Knowledge, Information Management and Applications, Chengdu, CN, December 2016. IEEE Xplore.
Abstract: Bangla OCR (Optical Character Recognition) is a long deserving software for Bengali community all over the world. Numerous e efforts suggest that due to the inherent complex nature of Bangla alphabet and its word formation process development of high fidelity OCR producing a reasonably acceptable output still remains a challenge. One possible way of improvement is by using post processing of OCR’s output; algorithms such as Edit Distance and the use of n-grams statistical information have been used to rectify misspelled words in language processing. This work presents the first known approach to use these algorithms to replace misrecognized words produced by Bangla OCR. The assessment is made on a set of fifty documents written in Bangla script and uses a dictionary of 541,167 words. The proposed correction model can correct several words lowering the recognition error rate by 2.87% and 3.18% for the character based n- gram and edit distance algorithms respectively. The developed system suggests a list of 5 (five) alternatives for a misspelled word. It is found that in 33.82% cases, the correct word is the topmost suggestion of 5 words list for n-gram algorithm while using Edit distance algorithm the first word in the suggestion properly matches 36.31% of the cases. This work will ignite rooms of thoughts for possible improvements in character recognition endeavour.
URI: http://hdl.handle.net/10174/20660
Type: article
Appears in Collections:INF - Artigos em Livros de Actas/Proceedings

Files in This Item:

File Description SizeFormat
2016ahmed-improving.pdf549.72 kBAdobe PDFView/OpenRestrict Access. You can Request a copy!
FacebookTwitterDeliciousLinkedInDiggGoogle BookmarksMySpaceOrkut
Formato BibTex mendeley Endnote Logotipo do DeGóis 

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

 

Dspace Dspace
DSpace Software, version 1.6.2 Copyright © 2002-2008 MIT and Hewlett-Packard - Feedback
UEvora B-On Curriculum DeGois