
dataset and vocab txt file used not given #1

Open
rishT99 opened this issue Feb 12, 2024 · 3 comments

rishT99 commented Feb 12, 2024

Hey, I actually find your project great, but I'm confused: the dataset and the train.txt and vocab.txt files are not available in this repo. Without them I can't re-run or check what you used. Could you please share those files?

surajTade (Owner) commented

  1. The train, val, and test files contain the relative path to each image and its corresponding label, separated by a space.

  2. The hindi_vocab file contains the mapping between every unique word present in the dataset and a vocab ID.

  3. The lexicon file contains the lexicon that was used when evaluating on the test set.

  4. Inside the HindiSeg folder there should be train, val, and test folders. Inside each of these 3 folders are subfolders named after the unique writer IDs.

  5. Inside each writer ID folder, there are folders numbered 1, 2, 3, ... x, where x is the number of pages that writer wrote.

  6. Inside each of these folders there is a text file, which gives the vocab ID for image "n.jpg" on line number "n".

  7. From the hindi_vocab file we can get the actual label corresponding to that word.
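The file layout described above could be parsed along these lines. This is only a minimal sketch: the exact column order inside hindi_vocab (ID first vs. word first) and the file names are assumptions, since the files themselves aren't in the repo.

```python
def load_vocab(vocab_file):
    """Read hindi_vocab into a dict mapping vocab ID -> word.

    Assumes one 'vocab_id word' pair per line, whitespace-separated;
    the actual column order in the missing file may differ.
    """
    vocab = {}
    with open(vocab_file, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split()
            if len(parts) == 2:
                vocab_id, word = parts
                vocab[vocab_id] = word
    return vocab


def load_split(split_file):
    """Read a train/val/test file into (relative_image_path, label) pairs.

    Each line is '<relative_image_path> <label>', separated by a single
    space, per point 1 above.
    """
    samples = []
    with open(split_file, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            path, label = line.split(" ", 1)
            samples.append((path, label))
    return samples
```

With these two helpers, a per-page text file from point 6 could be read the same way as a split file, and each vocab ID looked up in the dict from `load_vocab` to recover the word label (point 7).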


rishT99 commented Feb 13, 2024

OK great, I understand.
I'm struggling to find a good dataset, and your Jupyter notebook looks spot-on! Any chance you could share the image dataset used in your project? The ZIP file path mentioned in your notebook is drive/MyDrive/hindi_project/hindi-words-dataset.zip. Excited to explore and experiment with it!

Ujwal6702 commented

Is it possible to get the dataset for the model? It would be highly helpful.

Thank you
