Please use this identifier to cite or link to this item:
http://hdl.handle.net/10174/20660
|
Title: | Improving Bangla OCR output through correction algorithms |
Authors: | Ahmed, MD Sajib; Gonçalves, Teresa; Sarwar, Hasan |
Issue Date: | 2016 |
Publisher: | IEEE Xplore |
Citation: | MD Sajib Ahmed, Teresa Gonçalves, and Hasan Sarwar. Improving Bangla OCR output through correction algorithms. In SKIMA’2016 – 10th International Conference on Software, Knowledge, Information Management and Applications, Chengdu, CN, December 2016. IEEE Xplore. |
Abstract: | Bangla OCR (Optical Character Recognition) is long-overdue software for the Bengali community all over the world. Numerous efforts suggest that, owing to the inherently complex nature of the Bangla alphabet and its word formation process, developing a high-fidelity OCR system that produces reasonably acceptable output remains a challenge. One possible route to improvement is post-processing the OCR output; algorithms such as Edit Distance and n-gram statistical information have been used to rectify misspelled words in language processing. This work presents the first known approach to use these algorithms to replace misrecognized words produced by Bangla OCR. The assessment is made on a set of fifty documents written in Bangla script and uses a dictionary of 541,167 words. The proposed correction model corrects several words, lowering the recognition error rate by 2.87% and 3.18% for the character-based n-gram and edit distance algorithms respectively. The developed system suggests a list of five alternatives for a misspelled word. For the n-gram algorithm, the correct word is the topmost of the five suggestions in 33.82% of cases; for the edit distance algorithm, the first suggestion is the correct word in 36.31% of cases. This work is intended to stimulate ideas for further improvements in character recognition. |
URI: | http://hdl.handle.net/10174/20660 |
Type: | article |
Appears in Collections: | INF - Artigos em Livros de Actas/Proceedings
|
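The abstract describes ranking dictionary words against a misrecognized OCR token and returning five suggestions. The record does not include the authors' implementation, so the following is only a minimal sketch of how such a ranking might look. The function names (`edit_distance`, `char_ngrams`, `ngram_similarity`, `suggest`), the use of bigrams with boundary padding, the Dice-style overlap score, and the alphabetical tie-breaking are all assumptions made for illustration. Strings are compared per Unicode code point, which is also an assumption; Bangla conjuncts and vowel signs may call for grapheme-level handling.

```python
from typing import Iterable, List, Set


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, counted over Unicode code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_ngrams(word: str, n: int = 2) -> Set[str]:
    """Set of character n-grams, with hypothetical ^ and $ boundary marks."""
    padded = f"^{word}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice-style overlap of n-gram sets (an assumed scoring choice)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def suggest(word: str, dictionary: Iterable[str], k: int = 5) -> List[str]:
    """Return up to k dictionary words closest to `word` by edit distance."""
    words = list(dictionary)
    if word in words:
        return [word]  # already a valid word, nothing to correct
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(words, key=lambda w: (edit_distance(word, w), w))[:k]
```

Calling `suggest(token, dictionary)` with an OCR token and a word list returns at most five candidates, closest first. An n-gram variant could rank by `ngram_similarity` in descending order instead. A linear scan like this would be slow against a dictionary of 541,167 words, so a practical system would likely need an index or a candidate filter first.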