OCR: Add IAM corpus with unk decoding support #6

Merged (1 commit, Nov 20, 2017)


aarora8 commented Nov 15, 2017

No description provided.

hhadian (Owner) commented Nov 16, 2017

Ashish, could you please rebase against the ocr branch?
It's showing all the files that are already in that branch, and it also has conflicts.
Please update the headers for the new recipes you added.


num_targets=$(tree-info $tree_dir/tree | grep num-pdfs | awk '{print $2}')
learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
common1="required-time-offsets= height-offsets=-2,-1,0,1,2 num-filters-out=36"
Owner

You can remove required-time-offsets= altogether.

aarora8 force-pushed the iam branch 2 times, most recently from 6a93702 to f8eb4fd, on November 16, 2017 18:39
aarora8 (Author) commented Nov 16, 2017

Thanks. I rebased it against the ocr branch and updated the headers in the new recipes.

hhadian (Owner) left a comment

Some notes about the headers

@@ -29,8 +29,8 @@ alignment_subsampling_factor=1
chunk_width=340,300,200,100
hhadian (Owner) commented Nov 19, 2017

Please update the results for this recipe if they are not already updated.

@@ -33,8 +33,8 @@ alignment_subsampling_factor=1
chunk_width=340,300,200,100
Owner

Same for this recipe.
Change the description to "chainali_1a is as 1a except it uses chain alignments (using 1a system) instead of gmm alignments". Then, after one blank line, append the compare_wer.sh output for 1a and chainali_1a, together with the command itself.
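The header layout requested in this comment (description line, one blank line, then the compare_wer.sh command and its output as shell comments) can be sketched with a small bash helper. This is a hypothetical illustration under stated assumptions, not part of the recipe or of Kaldi: the function name make_recipe_header, the script path local/chain/compare_wer.sh and the experiment directories in the commented usage line are all assumed for the example.

```shell
#!/bin/bash
# Hypothetical helper (not part of Kaldi or this recipe). It prints a header block:
#   "# <description>"
#   (blank line)
#   "# <command and its arguments>"
#   the command's output, each line prefixed with "# "
# Note: the exit status is that of sed, so a failing command is not reported.
make_recipe_header() {
  local description="$1"
  shift
  echo "# $description"
  echo
  echo "# $*"
  "$@" | sed 's/^/# /'
}

# Example usage (assumed paths and experiment directories):
# make_recipe_header "chainali_1a is as 1a except it uses chain alignments (using 1a system) instead of gmm alignments" \
#   local/chain/compare_wer.sh exp/chain/cnn_1a exp/chain/cnn_chainali_1a
```

Piping through sed keeps the column layout of the comparison output intact while turning every line into a comment, so the result can be pasted directly below the shebang line of the recipe.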

@@ -0,0 +1,235 @@
#!/bin/bash

# chainali_1b uses chain model for lattice instead of gmm-hmm model. It has more cnn layers as compared to 1a
Owner

Change this to "chainali_1b is as chainali_1a except it has 3 more cnn layers."
Then add a blank line and append the compare_wer.sh output, together with the command.

@@ -0,0 +1,226 @@
#!/bin/bash
Owner

Please remove this and 1d for now. The improvements are not significant.

aarora8 (Author) commented Nov 20, 2017

Sorry. I updated the headers for run_cnn_1a.sh, run_cnn_chainali_1a.sh and run_cnn_chainali_1b.sh, and removed run_cnn_chainali_1c.sh and run_cnn_chainali_1d.sh.

hhadian (Owner) commented Nov 20, 2017

Thanks. Merging...

hhadian merged commit aa7c19a into hhadian:ocr on Nov 20, 2017
hhadian added a commit that referenced this pull request on Jan 4, 2018
* OCR: Add IAM corpus with unk decoding support (#3)

* Add a new English OCR database 'UW3'

* Some minor fixes re IAM corpus

* Fix an issue in IAM chain recipes + add a new recipe (#6)

* Some fixes based on the pull request review

* Various fixes + cleaning on IAM

* Fix LM estimation and add extended dictionary + other minor fixes

* Add README for IAM

* Add output filter for scoring

* Fix a bug re the switch to python3

* Add updated results + minor fixes

* Remove unk decoding -- gives almost no gain

* Add UW3 OCR database

* Fix cmd.sh in IAM + fix usages of train/decode_cmd in chain recipes

* Various minor fixes on UW3

* Rename iam/s5 to iam/v1

* Add README file for UW3

* Various cosmetic fixes on UW3 scripts

* Minor fixes in IAM
hhadian added a commit that referenced this pull request on Feb 22, 2018, with the same commit message as the Jan 4, 2018 commit.

2 participants