
Commit

add additional bad examples
johnathanchiu committed Oct 7, 2024
1 parent ed620b8, commit c157dd5
Showing 2 changed files with 7 additions and 3 deletions.
README.md: 10 changes (7 additions, 3 deletions)
@@ -31,10 +31,8 @@ img.show()

## Examples

<p>
<img src="https://github.com/johnathanchiu/recursive-segmentation/blob/main/examples/outputs/apple_output.jpg" alt="Image 1" width="400"/>
<img src="https://github.com/johnathanchiu/recursive-segmentation/blob/main/examples/outputs/dell_output.jpg" alt="Image 2" width="400"/>
</p>

See `main.py` or `ex.ipynb` for examples of how to draw the images.

@@ -50,12 +48,18 @@ pip install -r requirements.txt

This algorithm works particularly well on well-spaced documents with many diagrams. It performs poorly on purely text-based documents, but such documents usually do not need to be segmented anyway; they can be passed directly into a RAG pipeline. It could be worthwhile to detect these pages and skip the segmentation step for them entirely.

At the moment, I am looking to build an ML model that decides where to split chunks on the page. The idea is to train a seq2seq model whose input is the sequence of image slices and whose output is a binary sequence, where a 1 marks a split in the image and a 0 marks no split. A basic training setup can be found on my other [branch](https://github.com/johnathanchiu/recursive-segmentation/tree/jchiu/model-training-code/model).
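To make this input/output format concrete, here is a minimal sketch of how a page might be cut into slices and how a predicted binary sequence might be decoded back into segments. It is not taken from the training branch. The page representation (a grayscale image as a list of pixel rows), the fixed `slice_height`, the use of horizontal slices, and the convention that a 1 at index `i` means "cut after slice `i`" are all assumptions, and `make_slices` and `labels_to_segments` are hypothetical helper names.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumption: grayscale page stored as a list of pixel rows


def make_slices(image: Image, slice_height: int) -> List[Image]:
    """Cut the page into horizontal strips of `slice_height` rows (the model input).

    The last strip is shorter when the page height is not a multiple of slice_height.
    """
    return [image[top:top + slice_height] for top in range(0, len(image), slice_height)]


def labels_to_segments(labels: List[int], slice_height: int, image_height: int) -> List[Tuple[int, int]]:
    """Decode a binary split sequence (the model output) into (top, bottom) row ranges.

    Assumed convention: labels[i] == 1 means "cut after slice i".
    Bottom bounds are exclusive and clamped to the page height.
    """
    segments = []
    start = 0  # index of the first slice in the current segment
    for i, label in enumerate(labels):
        if label == 1:
            segments.append((start * slice_height, min((i + 1) * slice_height, image_height)))
            start = i + 1
    # Any slices after the last cut form the final segment.
    if start < len(labels):
        segments.append((start * slice_height, image_height))
    return segments


page = [[255] * 4 for _ in range(80)]  # blank page: 80 rows, 4 pixels wide
slices = make_slices(page, slice_height=10)
print(len(slices))  # 8
print(labels_to_segments([0, 0, 1, 0, 0, 0, 1, 0], slice_height=10, image_height=80))
# [(0, 30), (30, 70), (70, 80)]
```

Under this convention every segment is made of whole slices, so the slice height limits how finely a cut can be placed.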

### Limitations

As with any bounding-box segmentation algorithm, the main limitation is the shape of the segments. Edge cases arise when the content of the input image is not laid out in a grid. For example, if an image contains an "L"-shaped object, that object cannot be segmented out with a bounding box: any box that encloses the "L" also covers the empty region in its corner, along with anything else placed there. If anyone has ideas on how to improve this, please feel free to suggest them!
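A toy grid makes the "L" case concrete. The sketch below is only an illustration: the cell coordinates, the second object, and the helper names `bounding_box` and `inside` are hypothetical. It computes the tightest axis-aligned box around an "L" and checks which cells of a separate object sitting in the L's empty corner fall inside that box.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]  # (row, col) on a toy pixel grid
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), all inclusive


def bounding_box(cells: Set[Cell]) -> Box:
    """Tightest axis-aligned box that encloses every cell."""
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)


def inside(box: Box, cell: Cell) -> bool:
    """True if the cell lies within the (inclusive) box."""
    top, left, bottom, right = box
    r, c = cell
    return top <= r <= bottom and left <= c <= right


# "L" shape: a vertical bar in column 0 plus a horizontal bar along row 3.
l_shape = {(r, 0) for r in range(4)} | {(3, c) for c in range(4)}
# A separate object placed in the empty corner of the "L".
other = {(0, 3), (1, 3)}

box = bounding_box(l_shape)
captured = {cell for cell in other if inside(box, cell)}
top, left, bottom, right = box
box_area = (bottom - top + 1) * (right - left + 1)
print(box)                     # (0, 0, 3, 3)
print(len(l_shape), box_area)  # 7 16
print(sorted(captured))        # [(0, 3), (1, 3)]
```

The "L" fills only 7 of the box's 16 cells, and the box also swallows the neighbouring object, so no single rectangle isolates the "L".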

For largely text-based PDFs, the results can look like this.

<img src="https://github.com/johnathanchiu/recursive-segmentation/blob/main/examples/outputs/somato_output.jpg" alt="Image 3"/>

I'm still looking for a solution, so feel free to suggest one if you have ideas.

## Contributing

Feel free to contribute to this repository through pull requests and issues. Reach out to me if you have any related ideas you'd like to discuss!
Binary file added examples/outputs/somato_output.jpg