Robustness Testing of TrOCR for Multi-Condition Food Ingredient Labels Detected by YOLO

Charina Mutiara Chairunnisa, Nyayu Latifah Husni, RD. Kusumanto

Abstract


This study aimed to develop an automatic text extraction system for food ingredient labels by integrating YOLOv8 for object detection with TrOCR, a Transformer-based Optical Character Recognition (OCR) model, for text recognition. YOLOv8 was trained to detect the label area in an image so that it could be cropped, and TrOCR was then used to extract the text from each cropped bounding box. The evaluation used 16 sample input images covering several conditions: background color (monochrome and RGB), language (Bahasa Indonesia and English), and text layout (single-line and multi-line). The results indicated that TrOCR performed well on single-line text but struggled with multi-line layouts and longer text, in some cases omitting words entirely. For these complex layouts, the character error rate (CER) and word error rate (WER) reached up to 100%.
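The abstract reports CER and WER without defining them. A common definition, assumed here because the paper's exact normalization is not given in this abstract, is the Levenshtein edit distance (substitutions, deletions, and insertions) between the reference and the predicted text, divided by the length of the reference: counted in characters for CER and in whitespace-separated words for WER. The minimal sketch below shows why omitted words drive both rates upward, and why a completely empty prediction yields exactly 100%. The ingredient strings are hypothetical examples, not samples from the study.

```python
def levenshtein(ref, hyp):
    """Minimum number of substitutions, deletions and insertions turning ref into hyp.

    Works on strings (character level) or lists of words (word level).
    """
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the prediction
                prev[j - 1] + (r != h),  # substitute (cost 0 if characters/words match)
            )
        prev = curr
    return prev[-1]


def cer(reference: str, prediction: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, prediction) / len(reference)


def wer(reference: str, prediction: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference text must contain at least one word")
    return levenshtein(ref_words, prediction.split()) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical ground truth for a multi-line Bahasa Indonesia ingredient label.
    reference = "gula garam tepung terigu"
    # Prediction that drops the second line, mimicking the omitted-word failure.
    partial = "gula garam"
    print(wer(reference, partial))            # 0.5
    print(round(cer(reference, partial), 3))  # 0.583
    # Nothing recognized at all: both rates are exactly 100%.
    print(wer(reference, ""), cer(reference, ""))  # 1.0 1.0
```

Under this definition, a prediction that contains extra inserted words or characters can push CER or WER above 100%, so the normalization should be stated explicitly whenever such figures are compared across studies.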


Keywords


Ingredient Label; Optical Character Recognition; Text Extraction; Transformer; YOLOv8




DOI: http://dx.doi.org/10.24014/ijaidm.v8i3.37301
