Showing 91 changed files with 2,126 additions and 282 deletions.
@@ -1,12 +1,24 @@
---
title: "[Under Review]"
# date: 2012-06-01
# tags: ["Machine Learning", "Retrieval Augmented Generation", "Structured Generation", "Structured Prompting", "Supervised Finetuning", "Document Information Extraction"]
title: "Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use [Under Review]"
date: 2024-04-15
tags: ["Machine Learning", "Retrieval Augmented Generation", "Structured Generation", "Structured Prompting", "Supervised Finetuning", "Document Information Extraction"]
author: "Franz Louis Cesista"
description: "This paper is still under review. It describes a SOTA method for Document Information Extraction tasks (i.e., Key-Information Extraction & Line Items Recognition). The method can improve the performance of open-source LLMs by up to 1473% and of commercial LLMs by up to 304% on public benchmarks, beating strong, finetuned multi-modal baselines."
summary: "This paper is still under review. It describes a SOTA method for Document Information Extraction tasks (i.e., Key-Information Extraction & Line Items Recognition). The method can improve the performance of open-source LLMs by up to 1473% and of commercial LLMs by up to 304% on public benchmarks, beating strong, finetuned multi-modal baselines."
description: "Business Document Information Extraction (BDIE) is the problem of transforming a blob of unstructured information (raw text, scanned documents, etc.) into a structured format that downstream systems can parse and use. It has two main tasks: Key-Information Extraction (KIE) and Line Items Recognition (LIR). Subtasks such as Optical Character Recognition (OCR) and Table Structure Recognition (TSR) are means to these ends. In this paper, we argue that BDIE is best modeled as a *Tool Use* problem, where the tools are these downstream systems. We then present Retrieval Augmented Structured Generation (RASG), a novel general framework for BDIE that achieves state-of-the-art (SOTA) results on both KIE and LIR tasks on BDIE benchmarks.
The contributions of this paper are threefold: (1) We show, with ablation benchmarks, that Large Language Models (LLMs) with RASG are already competitive with or surpass current SOTA Large Multi-Modal Models (LMMMs) without RASG, such as LayoutLMv3 and RoBERTa + DETR, on BDIE benchmarks. (2) We propose a new metric class for Line Items Recognition, the General Line Items Recognition Metric (GLIRM), which is better aligned with practical BDIE use cases than existing metrics such as ANLS*, DocILE, and GriTS. (3) We provide a heuristic algorithm for backcalculating bounding boxes, that is, the pairs of (x, y) coordinates that contain the relevant text of predicted line items and tables, without the need for vision encoders. Finally, we claim that, while LMMMs might sometimes offer marginal performance benefits, LLMs + RASG is often superior given the real-world applications and constraints of BDIE."
summary: "Business Document Information Extraction (BDIE) is the problem of transforming a blob of unstructured information (raw text, scanned documents, etc.) into a structured format that downstream systems can parse and use. It has two main tasks: Key-Information Extraction (KIE) and Line Items Recognition (LIR). Subtasks such as Optical Character Recognition (OCR) and Table Structure Recognition (TSR) are means to these ends. In this paper, we argue that BDIE is best modeled as a *Tool Use* problem, where the tools are these downstream systems. We then present Retrieval Augmented Structured Generation (RASG), a novel general framework for BDIE that achieves state-of-the-art (SOTA) results on both KIE and LIR tasks on BDIE benchmarks.
The contributions of this paper are threefold: (1) We show, with ablation benchmarks, that Large Language Models (LLMs) with RASG are already competitive with or surpass current SOTA Large Multi-Modal Models (LMMMs) without RASG, such as LayoutLMv3 and RoBERTa + DETR, on BDIE benchmarks. (2) We propose a new metric class for Line Items Recognition, the General Line Items Recognition Metric (GLIRM), which is better aligned with practical BDIE use cases than existing metrics such as ANLS*, DocILE, and GriTS. (3) We provide a heuristic algorithm for backcalculating bounding boxes, that is, the pairs of (x, y) coordinates that contain the relevant text of predicted line items and tables, without the need for vision encoders. Finally, we claim that, while LMMMs might sometimes offer marginal performance benefits, LLMs + RASG is often superior given the real-world applications and constraints of BDIE."
---

This paper is still under review. It describes a SOTA method for Document Information Extraction tasks (i.e., Key-Information Extraction & Line Items Recognition). The method can improve the performance of open-source LLMs by up to 1473% and of commercial LLMs by up to 304% on public benchmarks, beating strong, finetuned *multi-modal* baselines.

Download: [Paper](/RASG-ieee-mipr.pdf)

Authors: [Franz Louis Cesista](mailto:franzlouiscesista@gmail.com), [Rui Aguiar](mailto:rui@expedock.com), [Jason Kim](mailto:jasonminsookim@gmail.com), [Paolo Acilo](mailto:paolo@expedock.com)

---

## Abstract

Business Document Information Extraction (BDIE) is the problem of transforming a blob of unstructured information (raw text, scanned documents, etc.) into a structured format that downstream systems can parse and use. It has two main tasks: Key-Information Extraction (KIE) and Line Items Recognition (LIR). Subtasks such as Optical Character Recognition (OCR) and Table Structure Recognition (TSR) are means to these ends. In this paper, we argue that BDIE is best modeled as a *Tool Use* problem, where the tools are these downstream systems. We then present Retrieval Augmented Structured Generation (RASG), a novel general framework for BDIE that achieves state-of-the-art (SOTA) results on both KIE and LIR tasks on BDIE benchmarks.

Please contact me for a copy of the paper.
The contributions of this paper are threefold: (1) We show, with ablation benchmarks, that Large Language Models (LLMs) with RASG are already competitive with or surpass current SOTA Large Multi-Modal Models (LMMMs) without RASG, such as LayoutLMv3 and RoBERTa + DETR, on BDIE benchmarks. (2) We propose a new metric class for Line Items Recognition, the General Line Items Recognition Metric (GLIRM), which is better aligned with practical BDIE use cases than existing metrics such as ANLS*, DocILE, and GriTS. (3) We provide a heuristic algorithm for backcalculating bounding boxes, that is, the pairs of (x, y) coordinates that contain the relevant text of predicted line items and tables, without the need for vision encoders. Finally, we claim that, while LMMMs might sometimes offer marginal performance benefits, LLMs + RASG is often superior given the real-world applications and constraints of BDIE.
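
As a rough illustration of the tool-use framing described above (a minimal sketch, not the paper's implementation): the downstream system's input format is exposed to the LLM as a JSON-schema "tool", retrieved examples are prepended to the prompt (the retrieval-augmented part), and the model is asked to produce output that parses against the schema (the structured-generation part). The schema, field names, and the `call_llm` placeholder below are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of "BDIE as tool use" (illustrative only): the target
# structure is declared as a JSON-schema tool, retrieved examples augment
# the prompt, and the LLM's reply must parse against that schema.
import json

# Hypothetical tool signature for an invoice-ingestion system (assumption).
SUBMIT_INVOICE_TOOL = {
    "name": "submit_invoice",
    "description": "Push extracted invoice data to a downstream billing system.",
    "parameters": {
        "type": "object",
        "properties": {
            # Key-Information Extraction (KIE) fields
            "invoice_number": {"type": "string"},
            "invoice_date": {"type": "string"},
            "total_amount": {"type": "number"},
            # Line Items Recognition (LIR): one object per table row
            "line_items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "quantity": {"type": "number"},
                        "unit_price": {"type": "number"},
                    },
                    "required": ["description"],
                },
            },
        },
        "required": ["invoice_number", "line_items"],
    },
}

def extract(ocr_text: str, retrieved_examples: list[str], call_llm) -> dict:
    """Build a retrieval-augmented prompt and parse the structured reply.

    `call_llm` is a placeholder for any text-completion or chat API.
    """
    prompt = (
        "Fill in the arguments of this tool from the document below.\n"
        f"Tool schema: {json.dumps(SUBMIT_INVOICE_TOOL)}\n\n"
        + "".join(f"Example:\n{ex}\n\n" for ex in retrieved_examples)  # retrieval augmentation
        + f"Document:\n{ocr_text}\n\n"
        "Reply with a single JSON object matching the schema."
    )
    return json.loads(call_llm(prompt))  # structured generation: reply must parse
```

The point of the schema-constrained output is that the downstream "tool" can ingest the result directly, without brittle post-processing. Contribution (3) also mentions backcalculating bounding boxes from text alone; one plausible way such a heuristic could work (again an assumption, not the paper's actual algorithm) is to fuzzy-match each predicted value against contiguous runs of OCR words and take the union of the matched words' boxes:

```python
# Illustrative bounding-box backcalculation (assumption, not the paper's
# algorithm): match a predicted value against runs of OCR words and return
# the union of their boxes, so no vision encoder is needed.
from difflib import SequenceMatcher

def union(boxes):
    """Union of (x0, y0, x1, y1) boxes."""
    return (
        min(b[0] for b in boxes), min(b[1] for b in boxes),
        max(b[2] for b in boxes), max(b[3] for b in boxes),
    )

def backcalculate_bbox(predicted: str, ocr_words: list) -> tuple | None:
    """ocr_words: list of (word, (x0, y0, x1, y1)) pairs from any OCR engine."""
    best_score, best_boxes = 0.0, None
    n = len(ocr_words)
    for i in range(n):
        for j in range(i + 1, min(i + 12, n) + 1):  # cap run length for speed
            run = ocr_words[i:j]
            text = " ".join(word for word, _ in run)
            score = SequenceMatcher(None, predicted.lower(), text.lower()).ratio()
            if score > best_score:
                best_score, best_boxes = score, [box for _, box in run]
    return union(best_boxes) if best_boxes and best_score > 0.8 else None
```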
1 change: 1 addition & 0 deletions in content/personal-projects/flash-hyperbolic-attention-minimal/index.md
2 changes: 1 addition & 1 deletion in content/personal-projects/grab-booking-demand-prediction/index.md
@@ -0,0 +1,8 @@
---
title: "Applied AI Consulting Services"
# date: 2023-07-25
author: "Franz Louis Cesista"
---

- been in the trenches
- can help you set up your AI pipeline from scratch
Binary file not shown.