Skip to content

Commit

Permalink
added latest pdf2parquet and docchunk with docling 2
Browse files Browse the repository at this point in the history
Signed-off-by: Maroun Touma <touma@us.ibm.com>
  • Loading branch information
touma-I committed Oct 31, 2024
2 parents 8b0b077 + 5877e4d commit d7224e5
Show file tree
Hide file tree
Showing 47 changed files with 4,736 additions and 4,421 deletions.
16 changes: 8 additions & 8 deletions examples/notebooks/intro/dpk_intro_1_python.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@
"\n",
"Here is the workflow\n",
"\n",
"![](https://raw.githubusercontent.com/sujee/data-prep-kit/intro-example1/examples/notebooks/intro/images/data-prep-kit-3-workflow.png)\n"
"![](https://raw.githubusercontent.com/IBM/data-prep-kit/dev/examples/notebooks/intro/images/data-prep-kit-3-workflow.png)\n"
]
},
{
Expand All @@ -27,7 +27,7 @@
"\n",
"Two options:\n",
"\n",
"- **Option 1 - Google Colab:** easiest option. no setup required. Click this link to open this on google colab. [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sujee/data-prep-kit/blob/intro-example1/examples/notebooks/intro/dpk_intro_1_python.ipynb)\n",
"- **Option 1 - Google Colab:** easiest option. no setup required. Click this link to open this on google colab. [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IBM/data-prep-kit/blob/dev/examples/notebooks/intro/dpk_intro_1_python.ipynb)\n",
"- **Option 2 - Local python dev environment:** Setup using this [guide](../../../README.md#-getting-started)\n",
"\n",
"The notebook will work as in both environments"
Expand All @@ -42,10 +42,10 @@
"source": [
"## Step-1: Inspect the Data\n",
"\n",
"We will use simple PDFs about Solar system. The files are [here](https://github.com/sujee/data-prep-kit/tree/intro-example1/examples/notebooks/intro/input/solar-system)\n",
"We will use simple PDFs about Solar system. The files are [here](https://github.com/IBM/data-prep-kit/tree/dev/examples/notebooks/intro/input/solar-system)\n",
"\n",
"- [earth.pdf](https://github.com/sujee/data-prep-kit/blob/intro-example1/examples/notebooks/intro/input/solar-system/earth.pdf)\n",
"- [mars.pdf](https://github.com/sujee/data-prep-kit/blob/intro-example1/examples/notebooks/intro/input/solar-system/mars.pdf)\n"
"- [earth.pdf](https://github.com/IBM/data-prep-kit/blob/dev/examples/notebooks/intro/input/solar-system/earth.pdf)\n",
"- [mars.pdf](https://github.com/IBM/data-prep-kit/blob/dev/examples/notebooks/intro/input/solar-system/mars.pdf)\n"
]
},
{
Expand Down Expand Up @@ -118,9 +118,9 @@
"source": [
"if RUNNING_IN_COLAB:\n",
" !mkdir -p 'input/solar-system'\n",
" !wget -O 'input/solar-system/earth.pdf' 'https://raw.githubusercontent.com/sujee/data-prep-kit/intro-example1/examples/notebooks/intro/input/solar-system/earth.pdf'\n",
" !wget -O 'input/solar-system/mars.pdf' 'https://raw.githubusercontent.com/sujee/data-prep-kit/intro-example1/examples/notebooks/intro/input/solar-system/mars.pdf'\n",
" !wget -O 'my_utils.py' 'https://raw.githubusercontent.com/sujee/data-prep-kit/intro-example1/examples/notebooks/intro/my_utils.py'"
" !wget -O 'input/solar-system/earth.pdf' 'https://raw.githubusercontent.com/IBM/data-prep-kit/dev/examples/notebooks/intro/input/solar-system/earth.pdf'\n",
" !wget -O 'input/solar-system/mars.pdf' 'https://raw.githubusercontent.com/IBM/data-prep-kit/dev/examples/notebooks/intro/input/solar-system/mars.pdf'\n",
" !wget -O 'my_utils.py' 'https://raw.githubusercontent.com/IBM/data-prep-kit/dev/examples/notebooks/intro/my_utils.py'"
]
},
{
Expand Down
Loading

0 comments on commit d7224e5

Please sign in to comment.