Add recognition iterator functionality #42
Merged
- Remove TessApi.cs and move its code to TessEngine.cs for clarity - Create NullPointerException and use the ThrowHelper pattern - Start implementing ResultIterator.cs
- Finish first implementation of ResultIterator - Improve exceptions - Add functionality to TessEngine - Add ResultIterable, an IEnumerable wrapper around ResultIterator
- Change documentation - Free string from memory
- Add ResultIterator copy functionality
- Standardize ResultIterator with IEnumerator - _current field - Reset throws as not supported - naming changes - Add TextSpan Deconstruct method so it can be used like a tuple:
```cs
TextSpan span = new("hello", 0.5f, PageIteratorLevel.Word);
var (text, confidence) = span;
```
Add comments to examples
- Changes to dllimport - Implement page iterator - Changes to ResultIterator code structure
- Tidy up code: Add ThrowIfAtBeginning and ThrowIfDisposed - Improve IsDisposed tracking - Fix wrong dll imports in PageIteratorApi.cs - Add ParentDependantDisposableObject to track disposal of the parent object it depends on
- The previous implementation was disposed too early if foreach was used twice
- Rename SpanInfo -> SpanLayout - Add TessEngine.IsImageSet and TessEngine.IsRecognized to track these states - Improve code documentation - Move TessPage logic to TessEngine where it belongs
- Implement LayoutTextIterator - Change TessEngine formatting
- BlockIterable to iterate over symbols of different PageIteratorLevels - Rename LayoutTextIterator -> TextMetadataIterable - Remove Pix.FromHandle() (the constructor with the same signature remains) - Change visual code structure in Pix
- Fix Block iterable - Implement BlockLevelCollection output parsing
- Add iOS dll imports - Move TesseractOcrMaui dll imports to the .Imports folder - Add a wrapper class around every dll import class, so imports are not directly accessible - Add IAverage interface so users can choose how to calculate their own average from confidences - Bump iOS project version
- in ios project
Fix linker manifest
- Add a platform class for iOS that routes the correct class to the dll imports
- Remove unneeded code - Add ITessDataProvider.GetLanguagesString() to get the available languages as a '+'-separated string
- Add BlockLevelCollection.PrintStructureToOutput() method to print out an ASCII tree structure - Add examples
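The commit log mentions BlockLevelCollection.PrintStructureToOutput() printing an ASCII tree, but its implementation is not shown here. The following is a minimal sketch of how such a tree can be rendered with `|--` and `` `-- `` connectors. `StructureNode`, `Node` and `TreePrinter.Print` are hypothetical names, and returning a string (rather than writing to an output) is an assumption made so the result can be checked.

```cs
using System;
using System.Collections.Generic;
using System.Text;

// Usage: a hypothetical TextLine -> Word -> Symbol hierarchy.
var sample = Node("TextLine", "Hi there",
    Node("Word", "Hi", Node("Symbol", "H"), Node("Symbol", "i")),
    Node("Word", "there"));
Console.Write(TreePrinter.Print(sample));

// Hypothetical helper to keep the sample tree short.
static StructureNode Node(string level, string text, params StructureNode[] children) =>
    new(level, text, children);

// Hypothetical node type; the PR's BlockLevelCollection may be shaped differently.
record StructureNode(string Level, string Text, IReadOnlyList<StructureNode> Children);

static class TreePrinter
{
    // Returns the tree as text so the caller decides where to write it.
    public static string Print(StructureNode root)
    {
        var builder = new StringBuilder();
        builder.AppendLine($"{root.Level}: {root.Text}");
        AppendChildren(builder, root, "");
        return builder.ToString();
    }

    private static void AppendChildren(StringBuilder builder, StructureNode node, string indent)
    {
        for (int i = 0; i < node.Children.Count; i++)
        {
            StructureNode child = node.Children[i];
            bool isLast = i == node.Children.Count - 1;
            builder.Append(indent)
                   .Append(isLast ? "`-- " : "|-- ")
                   .AppendLine($"{child.Level}: {child.Text}");
            // Below a non-last child, keep a "|" so later siblings stay connected.
            AppendChildren(builder, child, indent + (isLast ? "    " : "|   "));
        }
    }
}
```

Passing the indent string down the recursion is what lets each level know whether its ancestors still have siblings to connect to.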
Implement Tesseract Iterators
Implements the iterators requested in issue #41 to enable better structural analysis of recognized text.
Added IEnumerators:
ResultIterator
Iterates over the text blocks in the image at one given level at a time, for example TextLine or Word.
PageIterator
Iterates over the text layout of the image: text bounding boxes and paragraph layout at one given level. This makes it possible to draw boxes over the text in the recognized image.
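Per the commit log, both iterators follow the IEnumerator contract with a `_current` field, `Reset` not supported, and `ThrowIfAtBeginning`/`ThrowIfDisposed` guards. The sketch below illustrates that contract only. It assumes a managed array in place of the native Tesseract iterator, and `SketchResultIterator`, this `TextSpan` record and these `PageIteratorLevel` member names are hypothetical stand-ins, not the library's actual types.

```cs
using System;
using System.Collections;
using System.Collections.Generic;

// Usage: walk a hypothetical word-level result set.
var words = new[] { ("Hello", 96.5f), ("world", 91.0f) };
using var iterator = new SketchResultIterator(words, PageIteratorLevel.Word);
while (iterator.MoveNext())
{
    var (text, confidence) = iterator.Current; // two-value Deconstruct, like the PR's TextSpan
    Console.WriteLine($"{text} ({confidence:F1})");
}

// Assumed to mirror Tesseract's RIL_BLOCK ... RIL_SYMBOL levels.
enum PageIteratorLevel { Block, Para, TextLine, Word, Symbol }

// Hypothetical stand-in for the PR's TextSpan.
readonly record struct TextSpan(string Text, float Confidence, PageIteratorLevel Level)
{
    public void Deconstruct(out string text, out float confidence)
    {
        text = Text;
        confidence = Confidence;
    }
}

// Hypothetical sketch: a managed array replaces the native result iterator.
sealed class SketchResultIterator : IEnumerator<TextSpan>
{
    private readonly (string Text, float Confidence)[] _results;
    private readonly PageIteratorLevel _level;
    private int _index = -1;   // -1 means "before the first element"
    private TextSpan _current; // value produced by the last successful MoveNext()
    private bool _isDisposed;

    public SketchResultIterator((string Text, float Confidence)[] results, PageIteratorLevel level)
    {
        _results = results;
        _level = level;
    }

    public TextSpan Current
    {
        get
        {
            ThrowIfDisposed();
            ThrowIfAtBeginning();
            return _current;
        }
    }

    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        ThrowIfDisposed();
        if (_index + 1 >= _results.Length)
        {
            return false; // no more items at this level
        }
        _index++;
        var entry = _results[_index];
        _current = new TextSpan(entry.Text, entry.Confidence, _level);
        return true;
    }

    // Following the PR, Reset is not supported; create a new iterator instead.
    public void Reset() => throw new NotSupportedException("Create a new iterator instead.");

    public void Dispose() => _isDisposed = true;

    private void ThrowIfDisposed()
    {
        if (_isDisposed) throw new ObjectDisposedException(nameof(SketchResultIterator));
    }

    private void ThrowIfAtBeginning()
    {
        if (_index < 0) throw new InvalidOperationException("Call MoveNext() before reading Current.");
    }
}
```

Caching the value in `_current` inside `MoveNext()` means repeated reads of `Current` stay cheap, which matters when each read would otherwise cross into native code.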
Added IEnumerables:
ResultIterable
IEnumerable implementation wrapping ResultIterator.
PageIterable
IEnumerable implementation wrapping PageIterator.
TextMetadataIterable
Links PageIterator and ResultIterator to iterate over text layout and text values in sync.
TextStructureIterable
Links PageIterator and ResultIterator for a more thorough analysis of text structure. Returns the text structure as a tree-like data structure.
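The iterables above wrap enumerators. The commit log notes an earlier version was disposed too early when `foreach` ran twice. A minimal sketch of the usual remedy, assuming (not confirmed by the PR) that the wrapper creates a fresh enumerator on every `GetEnumerator()` call, follows, together with the lockstep pairing that TextMetadataIterable describes. `FactoryIterable`, `LinkedIterable` and `BoundingBox` are hypothetical names.

```cs
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Usage: hypothetical word boxes and word texts, already in reading order.
BoundingBox[] boxes = { new(10, 12, 48, 16), new(64, 12, 52, 16) };
string[] words = { "Hello", "world" };

var boxIterable = new FactoryIterable<BoundingBox>(() => boxes.AsEnumerable().GetEnumerator());
var wordIterable = new FactoryIterable<string>(() => words.AsEnumerable().GetEnumerator());
var metadata = new LinkedIterable<BoundingBox, string>(boxIterable, wordIterable);

foreach (var (box, word) in metadata)
{
    Console.WriteLine($"{word}: {box}");
}
// A second pass works because every foreach gets fresh enumerators.
Console.WriteLine(string.Join(" ", metadata.Select(pair => pair.Text)));

// Hypothetical layout record; the PR's SpanLayout may look different.
record BoundingBox(int X, int Y, int Width, int Height);

// Hypothetical wrapper: each GetEnumerator() call builds a new enumerator,
// so one foreach disposing its enumerator cannot break another foreach.
sealed class FactoryIterable<T> : IEnumerable<T>
{
    private readonly Func<IEnumerator<T>> _factory;

    public FactoryIterable(Func<IEnumerator<T>> factory) => _factory = factory;

    public IEnumerator<T> GetEnumerator() => _factory();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

// Hypothetical sketch of the TextMetadataIterable idea: advance a layout
// enumerator and a text enumerator in lockstep and yield them as pairs.
sealed class LinkedIterable<TLayout, TText> : IEnumerable<(TLayout Layout, TText Text)>
{
    private readonly IEnumerable<TLayout> _layout;
    private readonly IEnumerable<TText> _text;

    public LinkedIterable(IEnumerable<TLayout> layout, IEnumerable<TText> text)
    {
        _layout = layout;
        _text = text;
    }

    public IEnumerator<(TLayout Layout, TText Text)> GetEnumerator()
    {
        using var layoutEnumerator = _layout.GetEnumerator();
        using var textEnumerator = _text.GetEnumerator();
        // Stop as soon as either side is exhausted so pairs never drift apart.
        while (layoutEnumerator.MoveNext() && textEnumerator.MoveNext())
        {
            yield return (layoutEnumerator.Current, textEnumerator.Current);
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

`Enumerable.Zip` would pair the two sequences the same way; the explicit loop just makes the lockstep advance and the disposal of both enumerators visible.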
Image example
With the highest level configured as TextLine and the lowest level as Symbol, the image below produces the tree structure shown below.