Free Persian Word Level OCR Dataset
Created: 4 Mordad 1399 (25 July 2020)

Shotor

Word Level OCR Dataset for Persian Language

Shotor (Persian for "camel") is a free synthetic dataset for word-level OCR.

Sample Images

The current version contains 120,000 grayscale 50×100 images and their corresponding words. The words contain only letters of the alphabet (no digits or punctuation).
Note: To train a robust model, apply augmentations such as scaling, translation, and additive noise to the images.
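The augmentations mentioned in the note can be sketched with NumPy alone. This is a minimal illustration, not the pipeline used in the notebook: the function name `augment`, its parameters, the white fill value, and the assumption that each image is a 2-D `uint8` array of shape (50, 100) are all hypothetical choices made for this example.

```python
import numpy as np

def augment(image, rng, max_shift=4, scale_range=(0.9, 1.1),
            noise_std=8.0, fill=255):
    """Return a randomly scaled, shifted and noisy copy of a 2-D uint8 image.

    `fill` is the value used for pixels that fall outside the original
    image; 255 assumes a white background, which is a guess.
    """
    h, w = image.shape
    scale = rng.uniform(*scale_range)                      # random zoom factor
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)  # random shift

    # Inverse mapping: for every output pixel, find the source pixel
    # (nearest neighbour) after scaling about the centre and translating.
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    src_y = np.rint((ys - dy - cy) / scale + cy).astype(int)
    src_x = np.rint((xs - dx - cx) / scale + cx).astype(int)
    inside = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)

    out = np.full((h, w), float(fill))
    out[inside] = image[src_y[inside], src_x[inside]]

    # Additive Gaussian noise, then clip back to the valid grayscale range.
    out = out + rng.normal(0.0, noise_std, size=out.shape)
    return np.clip(out, 0, 255).astype(np.uint8)

# Example: a synthetic stand-in for one dataset image.
rng = np.random.default_rng(0)
sample = np.full((50, 100), 255, dtype=np.uint8)
sample[20:30, 40:60] = 0
augmented = augment(sample, rng)
```

Applying a fresh random augmentation each time an image is drawn during training, rather than once up front, gives the model a different variant of every word in each epoch.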
To see an example of using the Shotor dataset see this notebook:
A simple word level OCR for Persian Language using Pytorch and OpenCV

I used these resources to create the word lists:

The images have been generated using multiple fonts:

Created by: Amirabbas Asadi (amir137825@gmail.com)