add paper preprint & fix dates
leloykun committed May 12, 2024
1 parent 2c0f2a0 commit 2594dfb
Showing 91 changed files with 2,126 additions and 282 deletions.
9 changes: 6 additions & 3 deletions config.yml
@@ -11,12 +11,15 @@ taxonomies:

menu:
main:
- name: Services
url: services/
weight: 1
- name: Papers
url: papers/
weight: 1
weight: 2
- name: Personal Projects
url: personal-projects/
weight: 2
weight: 3
# - name: Courses
# url: courses/
# weight: 2
@@ -31,7 +34,7 @@ params:
description: "Mathematician | Machine Learning (AI) Research Scientist"
author: Franz Louis Cesista
# googleAnalyticsID: "G-XXXXX"
DateFormat: "January 2024"
DateFormat: "January 2, 2006"
defaultTheme: light
hideFooter: false
disableThemeToggle: true
26 changes: 19 additions & 7 deletions content/papers/rasg/index.md
@@ -1,12 +1,24 @@
---
title: "[Under Review]"
# date: 2012-06-01
# tags: ["Machine Learning", "Retrieval Augmented Generation", "Structured Generation", "Structured Prompting", "Supervised Finetuning", "Document Information Extraction"]
title: "Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use [Under Review]"
date: 2024-04-15
tags: ["Machine Learning", "Retrieval Augmented Generation", "Structured Generation", "Structured Prompting", "Supervised Finetuning", "Document Information Extraction"]
author: "Franz Louis Cesista"
description: "This paper is still under review. It describes a SOTA method for Document Information Extraction tasks (i.e., Key-Information Extraction and Line Items Recognition). The method can improve the performance of open-source LLMs by up to 1473% and of commercial LLMs by up to 304% on public benchmarks, beating strong, finetuned multi-modal baselines."
summary: "This paper is still under review. It describes a SOTA method for Document Information Extraction tasks (i.e., Key-Information Extraction and Line Items Recognition). The method can improve the performance of open-source LLMs by up to 1473% and of commercial LLMs by up to 304% on public benchmarks, beating strong, finetuned multi-modal baselines."
description: "Business Document Information Extraction (BDIE) is the problem of transforming a blob of unstructured information (raw text, scanned documents, etc.) into a structured format that downstream systems can parse and use. It has two main tasks: Key-Information Extraction (KIE) and Line Items Recognition (LIR). Subtasks such as Optical Character Recognition (OCR) and Table Structure Recognition (TSR) are means to these ends. In this paper, we argue that BDIE is best modeled as a Tool Use problem, where the tools are these downstream systems. We then present Retrieval Augmented Structured Generation (RASG), a novel general framework for BDIE that achieves state-of-the-art (SOTA) results on both KIE and LIR tasks on BDIE benchmarks.
The contributions of this paper are threefold: (1) We show, with ablation benchmarks, that Large Language Models (LLMs) with RASG are already competitive with or surpass current SOTA Large Multi-Modal Models (LMMMs) without RASG, such as LayoutLMv3 and RoBERTa + DETR, on BDIE benchmarks. (2) We propose a new metric class for Line Items Recognition, General Line Items Recognition Metric (GLIRM), that is more aligned with practical BDIE use cases than existing metrics such as ANLS*, DocILE, and GriTS. (3) We provide a heuristic algorithm for back-calculating the bounding boxes of predicted line items and tables (that is, the pairs of (x, y) coordinates enclosing the relevant text) without the need for vision encoders. Finally, we claim that, while LMMMs might sometimes offer marginal performance benefits, LLMs + RASG are often superior given the real-world applications and constraints of BDIE."
summary: "Business Document Information Extraction (BDIE) is the problem of transforming a blob of unstructured information (raw text, scanned documents, etc.) into a structured format that downstream systems can parse and use. It has two main tasks: Key-Information Extraction (KIE) and Line Items Recognition (LIR). Subtasks such as Optical Character Recognition (OCR) and Table Structure Recognition (TSR) are means to these ends. In this paper, we argue that BDIE is best modeled as a Tool Use problem, where the tools are these downstream systems. We then present Retrieval Augmented Structured Generation (RASG), a novel general framework for BDIE that achieves state-of-the-art (SOTA) results on both KIE and LIR tasks on BDIE benchmarks.
The contributions of this paper are threefold: (1) We show, with ablation benchmarks, that Large Language Models (LLMs) with RASG are already competitive with or surpass current SOTA Large Multi-Modal Models (LMMMs) without RASG, such as LayoutLMv3 and RoBERTa + DETR, on BDIE benchmarks. (2) We propose a new metric class for Line Items Recognition, General Line Items Recognition Metric (GLIRM), that is more aligned with practical BDIE use cases than existing metrics such as ANLS*, DocILE, and GriTS. (3) We provide a heuristic algorithm for back-calculating the bounding boxes of predicted line items and tables (that is, the pairs of (x, y) coordinates enclosing the relevant text) without the need for vision encoders. Finally, we claim that, while LMMMs might sometimes offer marginal performance benefits, LLMs + RASG are often superior given the real-world applications and constraints of BDIE."
---

This paper is still under review. It describes a SOTA method for Document Information Extraction tasks (i.e., Key-Information Extraction and Line Items Recognition). The method can improve the performance of open-source LLMs by up to 1473% and of commercial LLMs by up to 304% on public benchmarks, beating strong, finetuned *multi-modal* baselines.
Download: [Paper](/RASG-ieee-mipr.pdf)

Authors: [Franz Louis Cesista](mailto:franzlouiscesista@gmail.com), [Rui Aguiar](mailto:rui@expedock.com), [Jason Kim](mailto:jasonminsookim@gmail.com), [Paolo Acilo](mailto:paolo@expedock.com)

---

## Abstract

Business Document Information Extraction (BDIE) is the problem of transforming a blob of unstructured information (raw text, scanned documents, etc.) into a structured format that downstream systems can parse and use. It has two main tasks: Key-Information Extraction (KIE) and Line Items Recognition (LIR). Subtasks such as Optical Character Recognition (OCR) and Table Structure Recognition (TSR) are means to these ends. In this paper, we argue that BDIE is best modeled as a *Tool Use* problem, where the tools are these downstream systems. We then present Retrieval Augmented Structured Generation (RASG), a novel general framework for BDIE that achieves state-of-the-art (SOTA) results on both KIE and LIR tasks on BDIE benchmarks.

Please contact me for a copy of the paper.
The contributions of this paper are threefold: (1) We show, with ablation benchmarks, that Large Language Models (LLMs) with RASG are already competitive with or surpass current SOTA Large Multi-Modal Models (LMMMs) without RASG, such as LayoutLMv3 and RoBERTa + DETR, on BDIE benchmarks. (2) We propose a new metric class for Line Items Recognition, General Line Items Recognition Metric (GLIRM), that is more aligned with practical BDIE use cases than existing metrics such as ANLS*, DocILE, and GriTS. (3) We provide a heuristic algorithm for back-calculating the bounding boxes of predicted line items and tables (that is, the pairs of (x, y) coordinates enclosing the relevant text) without the need for vision encoders. Finally, we claim that, while LMMMs might sometimes offer marginal performance benefits, LLMs + RASG are often superior given the real-world applications and constraints of BDIE.
2 changes: 1 addition & 1 deletion content/personal-projects/codeball/index.md
@@ -1,6 +1,6 @@
---
title: "Codeball 2018"
# date: 2023-07-25
date: 2019-01-24
tags: ["Artificial Intelligence", "Rule-Based AI", "Game AI", "Python", "3D Physics Simulation"]
author: "Franz Louis Cesista"
description: "My entry for the World Finals of the Russian AI Cup 2018 - Codeball. A 3D physics-aware orchestrator of a pair of bots in a Rocket League-esque soccer game."
2 changes: 1 addition & 1 deletion content/personal-projects/codewars/index.md
@@ -1,6 +1,6 @@
---
title: "Codewars 2017"
# date: 2023-07-25
date: 2018-02-12
tags: ["Artificial Intelligence", "Particle Swarm AI", "Game AI", "Python", "K-Means Clustering", "BFS", "Potential Flows", "Fluid Dynamics"]
author: "Franz Louis Cesista"
description: "My entry for the World Finals of the Russian AI Cup 2017 - Codewars. A particle swarm-based AI that uses potential flows and fluid mechanics to direct units in a Command-and-Conquer-esque game."
2 changes: 1 addition & 1 deletion content/personal-projects/expedock-assistant/index.md
@@ -1,6 +1,6 @@
---
title: "Expedock Assistant: ChatGPT Applied to Logistics Data"
# date: 2023-07-25
date: 2023-01-31
tags: ["Machine Learning", "Tool Use", "AI Agent", "Logistics"]
author: "Franz Louis Cesista"
description: "Expedock Assistant is a chatbot that allows you to ask questions about your shipments and get answers in real time. It’s like having a personal assistant that knows everything about your business, shipments and industry."
2 changes: 1 addition & 1 deletion content/personal-projects/expedock-automl/index.md
@@ -1,6 +1,6 @@
---
title: "Expedock AutoML"
# date: 2023-07-25
date: 2022-07-25
tags: ["Machine Learning", "ML Interpretability"]
author: "Franz Louis Cesista"
description: "Expedock's AutoML Library -- fit a model, run batch inference, and get explanations in one line of code each."
@@ -1,5 +1,6 @@
---
title: "Flash Hyperbolic Attention Minimal [WIP]"
# no date until finished
# date: 2024-04-16
tags: ["Machine Learning", "C++", "CUDA", "PyTorch", "Non-Euclidean Geometry", "Flash Attention", "Hyperbolic Geometry"]
author: "Franz Louis Cesista"
@@ -1,6 +1,6 @@
---
title: "Booking Demand Prediction for Grab SEA"
# date: 2023-07-25
date: 2019-06-16
tags: ["Machine Learning", "Spatio-Temporal Forecasting", "Anomaly Detection", "Econometrics"]
author: "Franz Louis Cesista"
description: "Booking demand prediction for Grab's Southeast Asia operations. The project involves spatio-temporal forecasting, anomaly detection, and econometric modeling."
2 changes: 1 addition & 1 deletion content/personal-projects/llama.cpp/index.md
@@ -1,6 +1,6 @@
---
title: "Llama.cpp"
# date: 2023-07-25
date: 2023-07-25
tags: ["Machine Learning", "C++"]
author: "Franz Louis Cesista"
description: "A C++ implementation of Meta's Llama2 generative large-language model. I also optimized the original C implementation by Karpathy by adding parallelization on
8 changes: 8 additions & 0 deletions deprecated-content/services.md
@@ -0,0 +1,8 @@
---
title: "Applied AI Consulting Services"
# date: 2023-07-25
author: "Franz Louis Cesista"
---

- been in the trenches
- can help you set up your AI pipeline from scratch
5 changes: 5 additions & 0 deletions public/404.html
@@ -84,6 +84,11 @@
</div>
</div>
<ul id="menu">
<li>
<a href="https://leloykun.github.io/services/" title="Services">
<span>Services</span>
</a>
</li>
<li>
<a href="https://leloykun.github.io/papers/" title="Papers">
<span>Papers</span>
Binary file added public/RASG-ieee-mipr.pdf
Binary file not shown.
102 changes: 102 additions & 0 deletions public/archive/index.html
@@ -134,6 +134,11 @@
</div>
</div>
<ul id="menu">
<li>
<a href="https://leloykun.github.io/services/" title="Services">
<span>Services</span>
</a>
</li>
<li>
<a href="https://leloykun.github.io/papers/" title="Papers">
<span>Papers</span>
@@ -163,6 +168,103 @@

<header class="page-header">
</header>
<div class="archive-year">
<h2 class="archive-year-header">2024
</h2>
<div class="archive-month">
<h3 class="archive-month-header">April
</h3>
<div class="archive-posts">
<div class="archive-entry">
<h3 class="archive-entry-title">Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use [Under Review]
</h3>
<a class="entry-link" aria-label="post link to Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use [Under Review]" href="https://leloykun.github.io/papers/rasg/"></a>
</div>
</div>
</div>
</div>
<div class="archive-year">
<h2 class="archive-year-header">2023
</h2>
<div class="archive-month">
<h3 class="archive-month-header">July
</h3>
<div class="archive-posts">
<div class="archive-entry">
<h3 class="archive-entry-title">Llama.cpp
</h3>
<a class="entry-link" aria-label="post link to Llama.cpp" href="https://leloykun.github.io/personal-projects/llama.cpp/"></a>
</div>
</div>
</div>
<div class="archive-month">
<h3 class="archive-month-header">January
</h3>
<div class="archive-posts">
<div class="archive-entry">
<h3 class="archive-entry-title">Expedock Assistant: ChatGPT Applied to Logistics Data
</h3>
<a class="entry-link" aria-label="post link to Expedock Assistant: ChatGPT Applied to Logistics Data" href="https://leloykun.github.io/personal-projects/expedock-assistant/"></a>
</div>
</div>
</div>
</div>
<div class="archive-year">
<h2 class="archive-year-header">2022
</h2>
<div class="archive-month">
<h3 class="archive-month-header">July
</h3>
<div class="archive-posts">
<div class="archive-entry">
<h3 class="archive-entry-title">Expedock AutoML
</h3>
<a class="entry-link" aria-label="post link to Expedock AutoML" href="https://leloykun.github.io/personal-projects/expedock-automl/"></a>
</div>
</div>
</div>
</div>
<div class="archive-year">
<h2 class="archive-year-header">2019
</h2>
<div class="archive-month">
<h3 class="archive-month-header">June
</h3>
<div class="archive-posts">
<div class="archive-entry">
<h3 class="archive-entry-title">Booking Demand Prediction for Grab SEA
</h3>
<a class="entry-link" aria-label="post link to Booking Demand Prediction for Grab SEA" href="https://leloykun.github.io/personal-projects/grab-booking-demand-prediction/"></a>
</div>
</div>
</div>
<div class="archive-month">
<h3 class="archive-month-header">January
</h3>
<div class="archive-posts">
<div class="archive-entry">
<h3 class="archive-entry-title">Codeball 2018
</h3>
<a class="entry-link" aria-label="post link to Codeball 2018" href="https://leloykun.github.io/personal-projects/codeball/"></a>
</div>
</div>
</div>
</div>
<div class="archive-year">
<h2 class="archive-year-header">2018
</h2>
<div class="archive-month">
<h3 class="archive-month-header">February
</h3>
<div class="archive-posts">
<div class="archive-entry">
<h3 class="archive-entry-title">Codewars 2017
</h3>
<a class="entry-link" aria-label="post link to Codewars 2017" href="https://leloykun.github.io/personal-projects/codewars/"></a>
</div>
</div>
</div>
</div>
</main>

<footer class="footer">