From 95ab258aa741bac2d86a2f81ce12a0c2682b912c Mon Sep 17 00:00:00 2001 From: bracesproul Date: Mon, 20 May 2024 12:47:30 -0700 Subject: [PATCH 1/9] docs[minor]: Unstructured MD loader doc --- .../how_to/document_loader_markdown.ipynb | 744 ++++++++++++++++++ .../src/load/import_map.ts | 2 + .../src/document_loaders/fs/unstructured.ts | 6 +- .../src/load/import_map.ts | 2 + 4 files changed, 751 insertions(+), 3 deletions(-) create mode 100644 docs/core_docs/docs/how_to/document_loader_markdown.ipynb diff --git a/docs/core_docs/docs/how_to/document_loader_markdown.ipynb b/docs/core_docs/docs/how_to/document_loader_markdown.ipynb new file mode 100644 index 000000000000..a588be101f2c --- /dev/null +++ b/docs/core_docs/docs/how_to/document_loader_markdown.ipynb @@ -0,0 +1,744 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d836a98a-ad14-4bed-af76-e1877f7ef8a4", + "metadata": {}, + "source": [ + "# How to load Markdown\n", + "\n", + "[Markdown](https://en.wikipedia.org/wiki/Markdown) is a lightweight markup language for creating formatted text using a plain-text editor.\n", + "\n", + "Here we cover how to load `Markdown` documents into LangChain [Document](https://v02.api.js.langchain.com/classes/langchain_core_documents.Document.html) objects that we can use downstream.\n", + "\n", + "We will cover:\n", + "\n", + "- Basic usage;\n", + "- Parsing of Markdown into elements such as titles, list items, and text.\n", + "\n", + "LangChain implements an [UnstructuredLoader](https://v02.api.js.langchain.com/classes/langchain_document_loaders_fs_unstructured.UnstructuredLoader.html) class.\n", + "\n", + ":::info Prerequisites\n", + "\n", + "This guide assumes familiarity with the following concepts:\n", + "\n", + "- [Documents](/docs/concepts#document)\n", + "- [Document Loaders](/docs/concepts#document-loaders)\n", + "\n", + ":::\n", + "\n", + "## Installation\n", + "\n", + "```{=mdx}\n", + "import Npm2Yarn from \"@theme/Npm2Yarn\"\n", + "\n", + "\n", + " 
@langchain/community\n", + "\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "897a69e9", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "Although Unstructured has an open source offering, you're still required to provide an API key to access the service. To get everything up and running, follow these two steps:\n", + "\n", + "1. Download & start the Docker container:\n", + " \n", + "```bash\n", + "docker run -p 8000:8000 -d --rm --name unstructured-api quay.io/unstructured-io/unstructured-api:latest --port 8000 --host 0.0.0.0\n", + "```\n", + "\n", + "2. Get a free API key & API URL [here](https://unstructured.io/api-key), and set it in your environment (as per the Unstructured website, it may take up to an hour to allocate your API key & URL.):\n", + "\n", + "```bash\n", + "export UNSTRUCTURED_API_KEY=\"...\"\n", + "# Replace with your `Full URL` from the email\n", + "export UNSTRUCTURED_API_URL=\"https://-.api.unstructuredapp.io/general/v0/general\" \n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "ea8c41f8-a8dc-48cc-b78d-7b3e2427a34c", + "metadata": {}, + "source": [ + "Basic usage will ingest a Markdown file to a single document. Here we demonstrate on LangChain's readme:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "80c50cc4-7ce9-4418-81b9-29c52c7b3627", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[\n", + " Document {\n", + " pageContent: '🦜️🔗 LangChain.js',\n", + " metadata: {\n", + " languages: [Array],\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: '⚡ Building applications with LLMs through composability ⚡',\n", + " metadata: {\n", + " languages: [Array],\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Looking for the Python version? 
Check out LangChain.',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '7ea17bcb17b10f303cbb93b4cb95de93',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'To help you ship LangChain apps to production faster, check out LangSmith.\\n' +\n", + " 'LangSmith is a unified developer platform for building, testing, and monitoring LLM applications.\\n' +\n", + " 'Fill out this form to get on the waitlist or speak with our sales team.',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '7ea17bcb17b10f303cbb93b4cb95de93',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: '⚡️ Quick Install',\n", + " metadata: {\n", + " languages: [Array],\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'You can use npm, yarn, or pnpm to install LangChain.js',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '8f698a6f3038c268bf6d65bc6065890b',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'npm install -S langchain or yarn add langchain or pnpm add langchain',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '8f698a6f3038c268bf6d65bc6065890b',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'typescript\\nimport { ChatOpenAI } from \"langchain/chat_models/openai\";',\n", + " metadata: {\n", + " languages: [Array],\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: '🌐 Supported 
Environments',\n", + " metadata: {\n", + " languages: [Array],\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'LangChain is written in TypeScript and can be used in:',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '975643d774ab3b861962f9dc13588d84',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Node.js (ESM and CommonJS) - 18.x, 19.x, 20.x',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '975643d774ab3b861962f9dc13588d84',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Cloudflare Workers',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '975643d774ab3b861962f9dc13588d84',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Vercel / Next.js (Browser, Serverless and Edge functions)',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '975643d774ab3b861962f9dc13588d84',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Supabase Edge Functions',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '975643d774ab3b861962f9dc13588d84',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Browser',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '975643d774ab3b861962f9dc13588d84',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " 
pageContent: 'Deno',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '975643d774ab3b861962f9dc13588d84',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: '🤔 What is LangChain?',\n", + " metadata: {\n", + " languages: [Array],\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'LangChain is a framework for developing applications powered by language models. It enables applications that:\\n' +\n", + " '- Are context-aware: connect a language model to sources of context (prompt instructions, few shot examples, content to ground its response in, etc.)\\n' +\n", + " '- Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: 'e2396958560b4688b2a242fbe54cd832',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'This framework consists of several parts.\\n' +\n", + " '- LangChain Libraries: The Python and JavaScript libraries. 
Contains interfaces and integrations for a myriad of components, a basic runtime for combining these components into chains and agents, and off-the-shelf implementations of chains and agents.\\n' +\n", + " '- LangChain Templates: (currently Python-only) A collection of easily deployable reference architectures for a wide variety of tasks.\\n' +\n", + " '- LangServe: (currently Python-only) A library for deploying LangChain chains as a REST API.\\n' +\n", + " '- LangSmith: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: 'e2396958560b4688b2a242fbe54cd832',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'The LangChain libraries themselves are made up of several different packages.\\n' +\n", + " '- @langchain/core: Base abstractions and LangChain Expression Language.\\n' +\n", + " '- @langchain/community: Third party integrations.\\n' +\n", + " \"- langchain: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.\",\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: 'e2396958560b4688b2a242fbe54cd832',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Integrations may also be split into their own compatible packages.',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: 'e2396958560b4688b2a242fbe54cd832',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'This library aims to assist in the development of those types of applications. 
Common examples of these applications include:',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: 'e2396958560b4688b2a242fbe54cd832',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: '❓Question Answering over specific documents',\n", + " metadata: {\n", + " languages: [Array],\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Documentation',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '2321e263d4278955b49ae7185a2e7071',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'End-to-end Example: Doc-Chatbot',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '2321e263d4278955b49ae7185a2e7071',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: '💬 Chatbots',\n", + " metadata: {\n", + " languages: [Array],\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Documentation',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '13bfe7de8241ff139f084c9528169836',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'End-to-end Example: Chat-LangChain',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '13bfe7de8241ff139f084c9528169836',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: '🚀 How does LangChain help?',\n", + " metadata: {\n", + " languages: 
[Array],\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'The main value props of the LangChain libraries are:\\n' +\n", + " '1. Components: composable tools and integrations for working with language models. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not\\n' +\n", + " '2. Off-the-shelf chains: built-in assemblages of components for accomplishing higher-level tasks',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '1967058b7817d63c366c58df67e61178',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Off-the-shelf chains make it easy to get started. Components make it easy to customize existing chains and build new ones.',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '1967058b7817d63c366c58df67e61178',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Components fall into the following modules:',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '1967058b7817d63c366c58df67e61178',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: '📃 Model I/O:',\n", + " metadata: {\n", + " languages: [Array],\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'This includes prompt management, prompt optimization, a generic interface for all LLMs, and common utilities for working with LLMs.',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '7742f15be2acbf645543557b71bee56e',\n", + " filename: 'README.md',\n", + " 
filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: '📚 Retrieval:',\n", + " metadata: {\n", + " languages: [Array],\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Data Augmented Generation involves specific types of chains that first interact with an external data source to fetch data for use in the generation step. Examples include summarization of long pieces of text and question/answering over specific data sources.',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '6a6b63610d2ca00f121f094a94d520be',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: '🤖 Agents:',\n", + " metadata: {\n", + " languages: [Array],\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. 
LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end-to-end agents.',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: 'cc022877b6536240ca7e38e6827c4dba',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: '📖 Documentation',\n", + " metadata: {\n", + " languages: [Array],\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Please see here for full documentation, which includes:',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: 'e38f3af90533af34e7e50debd571bfc1',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Getting started: installation, setting up the environment, simple examples',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: 'e38f3af90533af34e7e50debd571bfc1',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Overview of the interfaces, modules and integrations',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: 'e38f3af90533af34e7e50debd571bfc1',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Use case walkthroughs and best practice guides',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: 'e38f3af90533af34e7e50debd571bfc1',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Reference: full API docs',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: 
'e38f3af90533af34e7e50debd571bfc1',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: '💁 Contributing',\n", + " metadata: {\n", + " languages: [Array],\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '248eb0e90cb2116083e2351ddd5218b8',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'For detailed information on how to contribute, see here.',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '248eb0e90cb2116083e2351ddd5218b8',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Please report any security issues or concerns following our security guidelines.',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '248eb0e90cb2116083e2351ddd5218b8',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: '🖇️ Relationship with Python LangChain',\n", + " metadata: {\n", + " languages: [Array],\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'This is built to integrate as seamlessly as possible with the LangChain Python package. 
Specifically, this means all objects (prompts, LLMs, chains, etc) are designed in a way where they can be serialized and shared between languages.',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: '48411b9b9512447054ee50f01d3fd6ee',\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " category: 'NarrativeText'\n", + " }\n", + " }\n", + "]\n" + ] + } + ], + "source": [ + "import { UnstructuredLoader } from \"@langchain/community/document_loaders/fs/unstructured\";\n", + "\n", + "const markdownPath = \"../../../../README.md\";\n", + "\n", + "const loader = new UnstructuredLoader(markdownPath, {\n", + " apiKey: process.env.UNSTRUCTURED_API_KEY,\n", + " apiUrl: process.env.UNSTRUCTURED_API_URL,\n", + "});\n", + "\n", + "const data = await loader.load()\n", + "console.log(data);" + ] + }, + { + "cell_type": "markdown", + "id": "b7560a6e-ca5d-47e1-b176-a9c40e763ff3", + "metadata": {}, + "source": [ + "## Retain Elements\n", + "\n", + "Under the hood, Unstructured creates different \"elements\" for different chunks of text. By default we combine those together, but you can easily keep that separation by specifying `chunkingStrategy: \"by_title\"`." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a986bbce-7fd3-41d1-bc47-49f9f57c7cd1", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Number of documents: 13\n", + "\n", + "Document {\n", + " pageContent: '🦜️🔗 LangChain.js\\n' +\n", + " '\\n' +\n", + " '⚡ Building applications with LLMs through composability ⚡\\n' +\n", + " '\\n' +\n", + " 'Looking for the Python version? 
Check out LangChain.\\n' +\n", + " '\\n' +\n", + " 'To help you ship LangChain apps to production faster, check out LangSmith.\\n' +\n", + " 'LangSmith is a unified developer platform for building, testing, and monitoring LLM applications.\\n' +\n", + " 'Fill out this form to get on the waitlist or speak with our sales team.',\n", + " metadata: {\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " languages: [ 'eng' ],\n", + " orig_elements: 'eJzNUtuO0zAQ/ZVRnquSS3PjBcGyPHURgr5tV2hijxNTJ45ip0u14t8Zp1y6CCF4ACFLlufuc+bcPkRkqKfBv9cyegpREWNZosxS0RRVzmeTCiFlnmRUFZmQ0QqinjxK9Mj5D5HShgbsKRS/vX7+8uZ63S9ZIeBP4xLw9NE/6XxvQsDg0M7YkuPIbURDG919Wp1zQu5+llVGfMta7GdFsVo8MniSErZcfdWhHtYfXOj2dcROe0MRN/oRUUmYlI1o+EpilcWZaJo6azaiqXNJdfYvEKUFJvBi1kbqoQUcR6MFem0HB/fad7Dd3jjw3WTntgNh+9E6bLTR/gTn4t9CmhHFTc1w80oKSUlTpFWaFKWsVR5nFf0dpOwdcfoDvi+p2Vp7CJQoOzF+gjcn39kBjjQ5ZucZXHUkDmBnf7H3Sy5e4zQxkUfahYY/4UQqVcZJpSpspKqSMslVllWJzDdMC6XVf8jJzkJHZoSTncF1evwOPSiHdWJhnKycRRAQKHSephWIR0y961lW6/3w7Q3aAcI8aKVJgqQjGTvSBKNBz+T3ywaaLwpdgSfnlwcOEno7aG+nsCcW6iP58ohX2phlru94xtKLf9iSB/5d2Ok9smC1Y3sCNxIezpq3M5toiAER9r/a6t1n6BJ/zg==',\n", + " category: 'CompositeElement'\n", + " }\n", + "}\n", + "\n", + "\n", + "Document {\n", + " pageContent: '⚡️ Quick Install\\n' +\n", + " '\\n' +\n", + " 'You can use npm, yarn, or pnpm to install LangChain.js\\n' +\n", + " '\\n' +\n", + " 'npm install -S langchain or yarn add langchain or pnpm add langchain\\n' +\n", + " '\\n' +\n", + " 'typescript\\n' +\n", + " 'import { ChatOpenAI } from \"langchain/chat_models/openai\";\\n' +\n", + " '\\n' +\n", + " '🌐 Supported Environments\\n' +\n", + " '\\n' +\n", + " 'LangChain is written in TypeScript and can be used in:\\n' +\n", + " '\\n' +\n", + " 'Node.js (ESM and CommonJS) - 18.x, 19.x, 20.x\\n' +\n", + " '\\n' +\n", + " 'Cloudflare Workers\\n' +\n", + " '\\n' +\n", + " 'Vercel / Next.js (Browser, Serverless and Edge functions)\\n' +\n", + " '\\n' +\n", + " 'Supabase Edge Functions\\n' +\n", + " '\\n' +\n", 
+ " 'Browser\\n' +\n", + " '\\n' +\n", + " 'Deno',\n", + " metadata: {\n", + " filename: 'README.md',\n", + " filetype: 'text/markdown',\n", + " languages: [ 'eng' ],\n", + " orig_elements: 'eJzNlm1v2zYQx7/KQa9WwE1Iik/qXnWpB2RoM2wOOgx1URzJY6pVogyJTlME/e6j3KZIhgBzULjIG0Li3VH+/e/BfHNdUUc9pfyuDdUzqGzUjUUda1ZbL7R1UQetnNdMK9swVy2g6iljwIzF/7qKbUcJe5qD/1w+f/FqedSH2Ws25E+bnSHTVT5+n/tuNnSYLrZ4QVOxvKkoXVRvPy+++My+663QyNfbSCzCH9vWf4DTNGXsdsE3J563uaOqxP0XIDSxCdobSZIYd9w7JpQlLU3TaKf4YQDK7gbHB8h4m/jvYQseE2wngrTpF/AJx7SAYYRNeYU8QPtFAHhZvnzyHtt09M90W40zHEfM7SWdz0fep0otuUISLBqMjfNFjMYzI6SWFFWQj1CVGf2G++kK5uP9jD7rMgsEGMLd3Z1ad3YfpJHWsubSchGQeNRItUGPElF7wck2hy/9OWbyY7vJ69T2m2HMcA0l3/n3DaXnp/AZ4jj0sK6+AR6XNb/rh0DddDwUL2zX1c97NUpjVAEOxkh0tbOaN1qU1vG8VtYGe6CSuNvpwda+rJEzWG03MzAFWKbLdhzS/FOnvUhcdChlNC6iKBWuJVrCGMhxIaKMP6i4/1fP2+jfGhnaCT6Obc5UHhOcl4+vdhUAmMJuKjiaB0Mo1mcPKmdBvlFWK6ZMaXfNI2ojIvNORMsUHWiSf5cqZ6WOy2SDn5arVzv+k6Hvh/Tb6gk8BW6PrhbAm3kV7Ojqthgv2ymfZurvrQ4hvRLCSaUEj8YG77TzQTNriYv6B/0hPEiHk24oTdGVePhrGD/QOO0LyxRHKZivAxldS41akzXcxELPm/oxJv01jZ46OIazsrHL/i/j8HGicQErGi9p7GiadtWwDBcEcZt8boc0PdlXE9KlAoSkZh4PtUBZ5oRjTAbiSgd3oLn+XZqUYYgOy3Vgh/zrDfK+xA0rqY6GaQrGo5JM1azcgawzjeOa2CMk/przvXMayvXQEA8meEmCsxiDrkO54/iAVvtHSPiC0nA/3tt/AY+igwk=',\n", + " category: 'CompositeElement'\n", + " }\n", + "}\n", + "\n", + "\n" + ] + } + ], + "source": [ + "const loader = new UnstructuredLoader(markdownPath, {\n", + " chunkingStrategy: \"by_title\"\n", + "});\n", + "\n", + "\n", + "const data = await loader.load()\n", + "\n", + "console.log(`Number of documents: ${data.length}\\n`)\n", + "\n", + "for (const doc of data.slice(0, 2)) {\n", + " console.log(doc);\n", + " console.log(\"\\n\");\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "117dc6b0-9baa-44a2-9d1d-fc38ecf7a233", + "metadata": {}, + "source": [ + "Note that in this case we recover just one distinct element type:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "75abc139-3ded-4e8e-9f21-d0c8ec40fdfc", + "metadata": 
{}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Set(1) { 'CompositeElement' }\n" + ] + } + ], + "source": [ + "const categories = new Set(data.map((document) => document.metadata.category));\n", + "console.log(categories);" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "TypeScript", + "language": "typescript", + "name": "tslab" + }, + "language_info": { + "codemirror_mode": { + "mode": "typescript", + "name": "javascript", + "typescript": true + }, + "file_extension": ".ts", + "mimetype": "text/typescript", + "name": "typescript", + "version": "3.7.2" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/libs/langchain-anthropic/src/load/import_map.ts b/libs/langchain-anthropic/src/load/import_map.ts index eba2067309ac..93126ee0aecd 100644 --- a/libs/langchain-anthropic/src/load/import_map.ts +++ b/libs/langchain-anthropic/src/load/import_map.ts @@ -2,3 +2,5 @@ export * as index from "../index.js"; export * as experimental from "../experimental/index.js"; + + diff --git a/libs/langchain-community/src/document_loaders/fs/unstructured.ts b/libs/langchain-community/src/document_loaders/fs/unstructured.ts index 1048018e3d94..8340249d5325 100644 --- a/libs/langchain-community/src/document_loaders/fs/unstructured.ts +++ b/libs/langchain-community/src/document_loaders/fs/unstructured.ts @@ -1,7 +1,7 @@ import type { basename as BasenameT } from "node:path"; import type { readFile as ReadFileT } from "node:fs/promises"; import { Document } from "@langchain/core/documents"; -import { getEnv } from "@langchain/core/utils/env"; +import { getEnv, getEnvironmentVariable } from "@langchain/core/utils/env"; import { StringWithAutocomplete } from "@langchain/core/utils/types"; import { DirectoryLoader, @@ -181,8 +181,8 @@ export class UnstructuredLoader extends BaseDocumentLoader { } else { this.filePath = filePathOrLegacyApiUrl; const options = optionsOrLegacyFilePath; - this.apiKey = options.apiKey; - this.apiUrl 
= options.apiUrl ?? this.apiUrl; + this.apiKey = options.apiKey ?? getEnvironmentVariable("UNSTRUCTURED_API_KEY"); + this.apiUrl = options.apiUrl ?? getEnvironmentVariable("UNSTRUCTURED_API_URL") ?? this.apiUrl; this.strategy = options.strategy ?? this.strategy; this.encoding = options.encoding; this.ocrLanguages = options.ocrLanguages ?? this.ocrLanguages; diff --git a/libs/langchain-community/src/load/import_map.ts b/libs/langchain-community/src/load/import_map.ts index 5d1c44843493..efae950641de 100644 --- a/libs/langchain-community/src/load/import_map.ts +++ b/libs/langchain-community/src/load/import_map.ts @@ -71,3 +71,5 @@ export * as document_loaders__web__sort_xyz_blockchain from "../document_loaders export * as utils__event_source_parse from "../utils/event_source_parse.js"; export * as experimental__graph_transformers__llm from "../experimental/graph_transformers/llm.js"; export * as experimental__chat_models__ollama_functions from "../experimental/chat_models/ollama_functions.js"; + + From 4388595e588ab901f2708c5995cd27417dcce326 Mon Sep 17 00:00:00 2001 From: bracesproul Date: Mon, 20 May 2024 12:48:54 -0700 Subject: [PATCH 2/9] chore: lint files --- libs/langchain-anthropic/src/load/import_map.ts | 2 -- .../src/document_loaders/fs/unstructured.ts | 8 ++++++-- libs/langchain-community/src/load/import_map.ts | 2 -- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/libs/langchain-anthropic/src/load/import_map.ts b/libs/langchain-anthropic/src/load/import_map.ts index 93126ee0aecd..eba2067309ac 100644 --- a/libs/langchain-anthropic/src/load/import_map.ts +++ b/libs/langchain-anthropic/src/load/import_map.ts @@ -2,5 +2,3 @@ export * as index from "../index.js"; export * as experimental from "../experimental/index.js"; - - diff --git a/libs/langchain-community/src/document_loaders/fs/unstructured.ts b/libs/langchain-community/src/document_loaders/fs/unstructured.ts index 8340249d5325..f9040b11110a 100644 --- 
a/libs/langchain-community/src/document_loaders/fs/unstructured.ts +++ b/libs/langchain-community/src/document_loaders/fs/unstructured.ts @@ -181,8 +181,12 @@ export class UnstructuredLoader extends BaseDocumentLoader { } else { this.filePath = filePathOrLegacyApiUrl; const options = optionsOrLegacyFilePath; - this.apiKey = options.apiKey ?? getEnvironmentVariable("UNSTRUCTURED_API_KEY"); - this.apiUrl = options.apiUrl ?? getEnvironmentVariable("UNSTRUCTURED_API_URL") ?? this.apiUrl; + this.apiKey = + options.apiKey ?? getEnvironmentVariable("UNSTRUCTURED_API_KEY"); + this.apiUrl = + options.apiUrl ?? + getEnvironmentVariable("UNSTRUCTURED_API_URL") ?? + this.apiUrl; this.strategy = options.strategy ?? this.strategy; this.encoding = options.encoding; this.ocrLanguages = options.ocrLanguages ?? this.ocrLanguages; diff --git a/libs/langchain-community/src/load/import_map.ts b/libs/langchain-community/src/load/import_map.ts index efae950641de..5d1c44843493 100644 --- a/libs/langchain-community/src/load/import_map.ts +++ b/libs/langchain-community/src/load/import_map.ts @@ -71,5 +71,3 @@ export * as document_loaders__web__sort_xyz_blockchain from "../document_loaders export * as utils__event_source_parse from "../utils/event_source_parse.js"; export * as experimental__graph_transformers__llm from "../experimental/graph_transformers/llm.js"; export * as experimental__chat_models__ollama_functions from "../experimental/chat_models/ollama_functions.js"; - - From 9f9fa5fdcbcdba063a2e05658268dc6ce4affad1 Mon Sep 17 00:00:00 2001 From: bracesproul Date: Mon, 20 May 2024 12:54:13 -0700 Subject: [PATCH 3/9] html loader --- .../docs/how_to/document_loader_html.ipynb | 1089 +++++++++++++++++ 1 file changed, 1089 insertions(+) create mode 100644 docs/core_docs/docs/how_to/document_loader_html.ipynb diff --git a/docs/core_docs/docs/how_to/document_loader_html.ipynb b/docs/core_docs/docs/how_to/document_loader_html.ipynb new file mode 100644 index 000000000000..4d8871fdabf8 --- 
/dev/null +++ b/docs/core_docs/docs/how_to/document_loader_html.ipynb @@ -0,0 +1,1089 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "0c6c50fc-15e1-4767-925a-53a37c430b9b", + "metadata": {}, + "source": [ + "# How to load HTML\n", + "\n", + "The HyperText Markup Language or [HTML](https://en.wikipedia.org/wiki/HTML) is the standard markup language for documents designed to be displayed in a web browser.\n", + "\n", + "This guide covers how to load `HTML` documents into LangChain [Document](https://v02.api.js.langchain.com/classes/langchain_core_documents.Document.html) objects that we can use downstream.\n", + "\n", + "Parsing HTML files often requires specialized tools. Here we demonstrate parsing via [Unstructured](https://unstructured-io.github.io/unstructured/). Head over to the integrations page to find integrations with additional services, such as [FireCrawl](/docs/integrations/document_loaders/firecrawl).\n", + "\n", + ":::info Prerequisites\n", + "\n", + "This guide assumes familiarity with the following concepts:\n", + "\n", + "- [Documents](/docs/concepts#document)\n", + "- [Document Loaders](/docs/concepts#document-loaders)\n", + "\n", + ":::\n", + "\n", + "## Installation\n", + "\n", + "```{=mdx}\n", + "import Npm2Yarn from \"@theme/Npm2Yarn\"\n", + "\n", + "\n", + " @langchain/community\n", + "\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "868cfb85", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "Although Unstructured has an open source offering, you're still required to provide an API key to access the service. To get everything up and running, follow these two steps:\n", + "\n", + "1. Download & start the Docker container:\n", + " \n", + "```bash\n", + "docker run -p 8000:8000 -d --rm --name unstructured-api quay.io/unstructured-io/unstructured-api:latest --port 8000 --host 0.0.0.0\n", + "```\n", + "\n", + "2.
Get a free API key & API URL [here](https://unstructured.io/api-key), and set it in your environment (as per the Unstructured website, it may take up to an hour to allocate your API key & URL.):\n", + "\n", + "```bash\n", + "export UNSTRUCTURED_API_KEY=\"...\"\n", + "# Replace with your `Full URL` from the email\n", + "export UNSTRUCTURED_API_URL=\"https://-.api.unstructuredapp.io/general/v0/general\" \n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "a4d93b2e", + "metadata": {}, + "source": [ + "## Loading HTML with Unstructured" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "7d167ca3-c7c7-4ef0-b509-080629f0f482", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[\n", + " Document {\n", + " pageContent: 'Word of the Day',\n", + " metadata: {\n", + " category_depth: 0,\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: ': April 10, 2023',\n", + " metadata: {\n", + " emphasized_text_contents: [Array],\n", + " emphasized_text_tags: [Array],\n", + " languages: [Array],\n", + " parent_id: 'b845e60d85ff7d10abda4e5f9a37eec8',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'UncategorizedText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'foible',\n", + " metadata: {\n", + " category_depth: 1,\n", + " languages: [Array],\n", + " parent_id: 'b845e60d85ff7d10abda4e5f9a37eec8',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'play',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " 
}\n", + " },\n", + " Document {\n", + " pageContent: 'noun',\n", + " metadata: {\n", + " category_depth: 0,\n", + " emphasized_text_contents: [Array],\n", + " emphasized_text_tags: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'FOY-bul',\n", + " metadata: {\n", + " category_depth: 0,\n", + " emphasized_text_contents: [Array],\n", + " emphasized_text_tags: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Prev',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Next',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'What It Means',\n", + " metadata: {\n", + " category_depth: 1,\n", + " languages: [Array],\n", + " parent_id: '211c3644cd30f9bc846f0582444e9451',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Foibles are minor flaws or shortcomings in\\n' +\n", + " ' character or behavior. 
In fencing,\\n' +\n", + " \" foible refers to the part of a sword's blade\\n\" +\n", + " ' between the middle and point, which is considered the\\n' +\n", + " ' weakest part.',\n", + " metadata: {\n", + " emphasized_text_contents: [Array],\n", + " emphasized_text_tags: [Array],\n", + " languages: [Array],\n", + " parent_id: '7daafb9067bef1d8d72040b9015a168b',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: '// He was amused daily by the foibles of his\\n' +\n", + " ' eccentric neighbor.',\n", + " metadata: {\n", + " emphasized_text_contents: [Array],\n", + " emphasized_text_tags: [Array],\n", + " languages: [Array],\n", + " parent_id: '7daafb9067bef1d8d72040b9015a168b',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'See the entry >',\n", + " metadata: {\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " parent_id: '7daafb9067bef1d8d72040b9015a168b',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'foible in\\n Context',\n", + " metadata: {\n", + " category_depth: 1,\n", + " emphasized_text_contents: [Array],\n", + " emphasized_text_tags: [Array],\n", + " languages: [Array],\n", + " parent_id: '211c3644cd30f9bc846f0582444e9451',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: '\"Films about important historical moments are often\\n' +\n", + " ' marked by a heavy solemnity, a sometimes suffocating\\n' +\n", + " ' respectfulness that can make one forget that these\\n' +\n", + " ' events involved real people, human beings with\\n' +\n", + 
" ' passions and foibles.\" — Michael Ordoña,\\n' +\n", + " ' The Los Angeles Times, 20 Jan. 2023',\n", + " metadata: {\n", + " emphasized_text_contents: [Array],\n", + " emphasized_text_tags: [Array],\n", + " languages: [Array],\n", + " parent_id: 'e57874be3146e3cb252d37874a07840c',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Build your vocabulary! Get Word of the Day in\\n' +\n", + " ' your inbox every day.',\n", + " metadata: {\n", + " category_depth: 2,\n", + " languages: [Array],\n", + " parent_id: 'e57874be3146e3cb252d37874a07840c',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Test Your Vocabulary',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'What Did You Just Call Me?',\n", + " metadata: {\n", + " category_depth: 3,\n", + " emphasized_text_contents: [Array],\n", + " emphasized_text_tags: [Array],\n", + " languages: [Array],\n", + " parent_id: 'e63fbd8b85aa3915e02c1c91f1c8adf0',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: undefined,\n", + " metadata: {\n", + " category_depth: 1,\n", + " languages: [Array],\n", + " parent_id: 'a17ddfeddba8873cf88e4ff041e439c5',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Before we went to her house, Hannah told\\n' +\n", + " ' us her aunt was a\\n' +\n", + " ' flibbertigibbet.',\n", + " 
metadata: {\n", + " category_depth: 1,\n", + " emphasized_text_contents: [Array],\n", + " emphasized_text_tags: [Array],\n", + " languages: [Array],\n", + " parent_id: 'a17ddfeddba8873cf88e4ff041e439c5',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Insulting\\n Complimentary',\n", + " metadata: {\n", + " category_depth: 1,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " parent_id: 'a17ddfeddba8873cf88e4ff041e439c5',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'You know what it looks like… but what is\\n' +\n", + " ' it called?',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: 'a17ddfeddba8873cf88e4ff041e439c5',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'TAKE THE QUIZ',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Can you make 12 words with 7 letters?',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: 'eec9e49b2e4f6c99b5517e44ec9efeb4',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'PLAY',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " 
filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Did You Know?',\n", + " metadata: {\n", + " category_depth: 1,\n", + " languages: [Array],\n", + " parent_id: '7a6d7635930938a56182b0cbc12b5b88',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Many word lovers agree that the pen is mightier than the\\n' +\n", + " ' sword. But be they\\n' +\n", + " ' honed\\n' +\n", + " ' in wit or form, even the sharpest tools in the shed have\\n' +\n", + " ' their flaws. That’s where foible comes in\\n' +\n", + " ' handy. Borrowed from French in the 1600s, the word\\n' +\n", + " ' originally referred to the weakest part of a fencing\\n' +\n", + " ' sword, that part being the portion between the middle\\n' +\n", + " ' and the pointed tip. The English foible soon\\n' +\n", + " ' came to be applied not only to weaknesses in blades but\\n' +\n", + " ' also to minor failings in character. 
The French source\\n' +\n", + " ' of foible is also at a remove from the fencing\\n' +\n", + " ' arena; the French foible means \"weak,\" and it\\n' +\n", + " ' comes from the same Old French term, feble,\\n' +\n", + " ' that gave us\\n' +\n", + " ' feeble.',\n", + " metadata: {\n", + " emphasized_text_contents: [Array],\n", + " emphasized_text_tags: [Array],\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " parent_id: 'fbe2a45ff9ac86d8b3e5e311ba4aaf94',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Test Your Vocabulary with M-W Quizzes',\n", + " metadata: {\n", + " category_depth: 1,\n", + " languages: [Array],\n", + " parent_id: '7a6d7635930938a56182b0cbc12b5b88',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Famous Novels, First Lines Quiz',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Play Now',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Weather Words',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " 
category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Play Now',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Name That Animal: Volume 3',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Play Now',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Name That Hat!',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Play Now',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Name That Flower',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 
'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Play Now',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Are You Feeling Lucky?',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Play Now',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Famous Novels, First Lines Quiz',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Play Now',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Weather Words',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " 
link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Play Now',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Name That Animal: Volume 3',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Play Now',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Name That Hat!',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Play Now',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Name That Flower',\n", + " metadata: {\n", + " category_depth: 0,\n", 
+ " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Play Now',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Are You Feeling Lucky?',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Play Now',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Famous Novels, First Lines Quiz',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Play Now',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " 
pageContent: 'Weather Words',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Play Now',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Name That Animal: Volume 3',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Play Now',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Test Your Vocabulary',\n", + " metadata: {\n", + " category_depth: 1,\n", + " languages: [Array],\n", + " parent_id: '63b33a2339cd82a6aec91b7c6c4e75c2',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Unscramble the letters to create a word that refers to a\\n' +\n", + " ' particular kind of fencing sword: BRASE.',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: 'b95a085d102592f456bf277b90a7f7b9',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " 
category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'VIEW THE ANSWER',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Podcast',\n", + " metadata: {\n", + " category_depth: 1,\n", + " languages: [Array],\n", + " parent_id: '2669d693b8827c7babf22eb05c106af5',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'More Words of the Day',\n", + " metadata: {\n", + " category_depth: 1,\n", + " languages: [Array],\n", + " parent_id: '2669d693b8827c7babf22eb05c106af5',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Apr 09\\n \\n auspicious',\n", + " metadata: {\n", + " category_depth: 1,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " parent_id: '4ab7b48ab6922a6ce36b46af27bbf0ae',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Apr 08\\n' +\n", + " ' \\n' +\n", + " ' circumscribe',\n", + " metadata: {\n", + " category_depth: 1,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " parent_id: '4ab7b48ab6922a6ce36b46af27bbf0ae',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Apr 07\\n \\n equivocal',\n", + " metadata: {\n", + " category_depth: 1,\n", + " link_texts: 
[Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " parent_id: '4ab7b48ab6922a6ce36b46af27bbf0ae',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Apr 06\\n \\n seder',\n", + " metadata: {\n", + " category_depth: 1,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " parent_id: '4ab7b48ab6922a6ce36b46af27bbf0ae',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Apr 05\\n' +\n", + " ' \\n' +\n", + " ' gerrymander',\n", + " metadata: {\n", + " category_depth: 1,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " parent_id: '4ab7b48ab6922a6ce36b46af27bbf0ae',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Apr 04\\n \\n belated',\n", + " metadata: {\n", + " category_depth: 1,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " parent_id: '4ab7b48ab6922a6ce36b46af27bbf0ae',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'ListItem'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'SEE ALL WORDS OF THE DAY',\n", + " metadata: {\n", + " category_depth: 0,\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'Title'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Can you solve 4 words at once?\\n' +\n", + " ' \\n' +\n", + " ' \\n' 
+\n", + " ' \\n' +\n", + " ' Play\\n' +\n", + " ' \\n' +\n", + " ' \\n' +\n", + " ' Play\\n' +\n", + " ' \\n' +\n", + " ' \\n' +\n", + " ' \\n' +\n", + " ' \\n' +\n", + " ' \\n' +\n", + " ' Can you solve 4 words at once?\\n' +\n", + " ' \\n' +\n", + " ' \\n' +\n", + " ' \\n' +\n", + " ' Play\\n' +\n", + " ' \\n' +\n", + " ' \\n' +\n", + " ' Play',\n", + " metadata: {\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " parent_id: 'f9af15bb60bb691bd715c92448636671',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Love words? Need even more definitions?',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: 'f9af15bb60bb691bd715c92448636671',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: \"Subscribe to America's largest dictionary and get thousands\\n\" +\n", + " ' more definitions and advanced search—ad free!',\n", + " metadata: {\n", + " languages: [Array],\n", + " parent_id: 'f9af15bb60bb691bd715c92448636671',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'NarrativeText'\n", + " }\n", + " },\n", + " Document {\n", + " pageContent: 'Merriam-Webster unabridged',\n", + " metadata: {\n", + " link_texts: [Array],\n", + " link_urls: [Array],\n", + " link_start_indexes: [Array],\n", + " languages: [Array],\n", + " parent_id: 'f9af15bb60bb691bd715c92448636671',\n", + " filename: 'wordoftheday.html',\n", + " filetype: 'text/html',\n", + " category: 'NarrativeText'\n", + " }\n", + " }\n", + "]\n" + ] + } + ], + "source": [ + "import { UnstructuredLoader } from \"@langchain/community/document_loaders/fs/unstructured\";\n", + "\n", + "const filePath = 
\"../../../../libs/langchain-community/src/tools/fixtures/wordoftheday.html\"\n", + "\n", + "const loader = new UnstructuredLoader(filePath, {\n", + " apiKey: process.env.UNSTRUCTURED_API_KEY,\n", + " apiUrl: process.env.UNSTRUCTURED_API_URL,\n", + "});\n", + "\n", + "const data = await loader.load()\n", + "console.log(data);" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "TypeScript", + "language": "typescript", + "name": "tslab" + }, + "language_info": { + "codemirror_mode": { + "mode": "typescript", + "name": "javascript", + "typescript": true + }, + "file_extension": ".ts", + "mimetype": "text/typescript", + "name": "typescript", + "version": "3.7.2" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From c16a455aba39b81447fa255f15f8ec672924dc6f Mon Sep 17 00:00:00 2001 From: bracesproul Date: Mon, 20 May 2024 13:00:01 -0700 Subject: [PATCH 4/9] update firecrawl loader --- examples/src/document_loaders/firecrawl.ts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/src/document_loaders/firecrawl.ts b/examples/src/document_loaders/firecrawl.ts index d8524e1b4b6c..3eb4b46e4eb9 100644 --- a/examples/src/document_loaders/firecrawl.ts +++ b/examples/src/document_loaders/firecrawl.ts @@ -1,4 +1,4 @@ -import { FireCrawlLoader } from "langchain/document_loaders/web/firecrawl"; +import { FireCrawlLoader } from "@langchain/community/document_loaders/web/firecrawl"; const loader = new FireCrawlLoader({ url: "https://firecrawl.dev", // The URL to scrape From c8846229ab12d3d7b5216ceaedf3276fe5d8943c Mon Sep 17 00:00:00 2001 From: bracesproul Date: Mon, 20 May 2024 13:49:42 -0700 Subject: [PATCH 5/9] fix firecrawl loader --- docs/core_docs/docs/how_to/document_loader_html.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/core_docs/docs/how_to/document_loader_html.ipynb b/docs/core_docs/docs/how_to/document_loader_html.ipynb index 4d8871fdabf8..66d32f9ebbb5 100644 --- 
a/docs/core_docs/docs/how_to/document_loader_html.ipynb +++ b/docs/core_docs/docs/how_to/document_loader_html.ipynb @@ -11,7 +11,7 @@ "\n", "This covers how to load `HTML` documents into a LangChain [Document](https://v02.api.js.langchain.com/classes/langchain_core_documents.Document.html) objects that we can use downstream.\n", "\n", - "Parsing HTML files often requires specialized tools. Here we demonstrate parsing via [Unstructured](https://unstructured-io.github.io/unstructured/). Head over to the integrations page to find integrations with additional services, such as [FireCrawl](/docs/integrations/document_loaders/firecrawl).\n", + "Parsing HTML files often requires specialized tools. Here we demonstrate parsing via [Unstructured](https://unstructured-io.github.io/unstructured/). Head over to the integrations page to find integrations with additional services, such as [FireCrawl](/docs/integrations/document_loaders/web/firecrawl).\n", "\n", ":::info Prerequisites\n", "\n", From 0cd13f2e353139f46ef6887f7610f1582384e2d1 Mon Sep 17 00:00:00 2001 From: bracesproul Date: Mon, 20 May 2024 14:03:31 -0700 Subject: [PATCH 6/9] cr --- docs/core_docs/docs/how_to/document_loader_html.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/core_docs/docs/how_to/document_loader_html.ipynb b/docs/core_docs/docs/how_to/document_loader_html.ipynb index 66d32f9ebbb5..ccee10bdac40 100644 --- a/docs/core_docs/docs/how_to/document_loader_html.ipynb +++ b/docs/core_docs/docs/how_to/document_loader_html.ipynb @@ -11,7 +11,7 @@ "\n", "This covers how to load `HTML` documents into a LangChain [Document](https://v02.api.js.langchain.com/classes/langchain_core_documents.Document.html) objects that we can use downstream.\n", "\n", - "Parsing HTML files often requires specialized tools. Here we demonstrate parsing via [Unstructured](https://unstructured-io.github.io/unstructured/). 
Head over to the integrations page to find integrations with additional services, such as [FireCrawl](/docs/integrations/document_loaders/web/firecrawl).\n", + "Parsing HTML files often requires specialized tools. Here we demonstrate parsing via [Unstructured](https://unstructured-io.github.io/unstructured/). Head over to the integrations page to find integrations with additional services, such as [FireCrawl](/docs/integrations/document_loaders/web_loaders/firecrawl).\n", "\n", ":::info Prerequisites\n", "\n", From 6cc66306f064b60b1e197afc4ff37227aefbaa90 Mon Sep 17 00:00:00 2001 From: bracesproul Date: Mon, 20 May 2024 17:28:33 -0700 Subject: [PATCH 7/9] Add links to docs from how to index --- docs/core_docs/docs/how_to/index.mdx | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/core_docs/docs/how_to/index.mdx b/docs/core_docs/docs/how_to/index.mdx index cab7fc40414d..541c8efb4138 100644 --- a/docs/core_docs/docs/how_to/index.mdx +++ b/docs/core_docs/docs/how_to/index.mdx @@ -100,6 +100,9 @@ Document Loaders are responsible for loading documents from a variety of sources - [How to: load data from a directory](/docs/how_to/document_loader_directory) - [How to: load PDF files](/docs/how_to/document_loader_pdf) - [How to: write a custom document loader](/docs/how_to/document_loader_custom) +- [How to: load HTML data](/docs/how_to/document_loader_html) +- [How to: load Markdown data](/docs/how_to/document_loader_markdown) + ### Text splitters From 57134658a0c1b861d1d3c952d6070c3f8d586342 Mon Sep 17 00:00:00 2001 From: bracesproul Date: Mon, 20 May 2024 17:32:01 -0700 Subject: [PATCH 8/9] cr --- .../docs/how_to/document_loader_html.ipynb | 915 +----------------- .../how_to/document_loader_markdown.ipynb | 451 +-------- 2 files changed, 2 insertions(+), 1364 deletions(-) diff --git a/docs/core_docs/docs/how_to/document_loader_html.ipynb b/docs/core_docs/docs/how_to/document_loader_html.ipynb index ccee10bdac40..e7116a606924 100644 --- 
a/docs/core_docs/docs/how_to/document_loader_html.ipynb +++ b/docs/core_docs/docs/how_to/document_loader_html.ipynb @@ -133,919 +133,6 @@ " filetype: 'text/html',\n", " category: 'Title'\n", " }\n", - " },\n", - " Document {\n", - " pageContent: 'FOY-bul',\n", - " metadata: {\n", - " category_depth: 0,\n", - " emphasized_text_contents: [Array],\n", - " emphasized_text_tags: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Prev',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Next',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'What It Means',\n", - " metadata: {\n", - " category_depth: 1,\n", - " languages: [Array],\n", - " parent_id: '211c3644cd30f9bc846f0582444e9451',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Foibles are minor flaws or shortcomings in\\n' +\n", - " ' character or behavior. 
In fencing,\\n' +\n", - " \" foible refers to the part of a sword's blade\\n\" +\n", - " ' between the middle and point, which is considered the\\n' +\n", - " ' weakest part.',\n", - " metadata: {\n", - " emphasized_text_contents: [Array],\n", - " emphasized_text_tags: [Array],\n", - " languages: [Array],\n", - " parent_id: '7daafb9067bef1d8d72040b9015a168b',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: '// He was amused daily by the foibles of his\\n' +\n", - " ' eccentric neighbor.',\n", - " metadata: {\n", - " emphasized_text_contents: [Array],\n", - " emphasized_text_tags: [Array],\n", - " languages: [Array],\n", - " parent_id: '7daafb9067bef1d8d72040b9015a168b',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'See the entry >',\n", - " metadata: {\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " parent_id: '7daafb9067bef1d8d72040b9015a168b',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'foible in\\n Context',\n", - " metadata: {\n", - " category_depth: 1,\n", - " emphasized_text_contents: [Array],\n", - " emphasized_text_tags: [Array],\n", - " languages: [Array],\n", - " parent_id: '211c3644cd30f9bc846f0582444e9451',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: '\"Films about important historical moments are often\\n' +\n", - " ' marked by a heavy solemnity, a sometimes suffocating\\n' +\n", - " ' respectfulness that can make one forget that these\\n' +\n", - " ' events involved real people, human beings with\\n' +\n", - 
" ' passions and foibles.\" — Michael Ordoña,\\n' +\n", - " ' The Los Angeles Times, 20 Jan. 2023',\n", - " metadata: {\n", - " emphasized_text_contents: [Array],\n", - " emphasized_text_tags: [Array],\n", - " languages: [Array],\n", - " parent_id: 'e57874be3146e3cb252d37874a07840c',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Build your vocabulary! Get Word of the Day in\\n' +\n", - " ' your inbox every day.',\n", - " metadata: {\n", - " category_depth: 2,\n", - " languages: [Array],\n", - " parent_id: 'e57874be3146e3cb252d37874a07840c',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Test Your Vocabulary',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'What Did You Just Call Me?',\n", - " metadata: {\n", - " category_depth: 3,\n", - " emphasized_text_contents: [Array],\n", - " emphasized_text_tags: [Array],\n", - " languages: [Array],\n", - " parent_id: 'e63fbd8b85aa3915e02c1c91f1c8adf0',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: undefined,\n", - " metadata: {\n", - " category_depth: 1,\n", - " languages: [Array],\n", - " parent_id: 'a17ddfeddba8873cf88e4ff041e439c5',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Before we went to her house, Hannah told\\n' +\n", - " ' us her aunt was a\\n' +\n", - " ' flibbertigibbet.',\n", - " 
metadata: {\n", - " category_depth: 1,\n", - " emphasized_text_contents: [Array],\n", - " emphasized_text_tags: [Array],\n", - " languages: [Array],\n", - " parent_id: 'a17ddfeddba8873cf88e4ff041e439c5',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Insulting\\n Complimentary',\n", - " metadata: {\n", - " category_depth: 1,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " parent_id: 'a17ddfeddba8873cf88e4ff041e439c5',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'You know what it looks like… but what is\\n' +\n", - " ' it called?',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: 'a17ddfeddba8873cf88e4ff041e439c5',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'TAKE THE QUIZ',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Can you make 12 words with 7 letters?',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: 'eec9e49b2e4f6c99b5517e44ec9efeb4',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'PLAY',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " 
filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Did You Know?',\n", - " metadata: {\n", - " category_depth: 1,\n", - " languages: [Array],\n", - " parent_id: '7a6d7635930938a56182b0cbc12b5b88',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Many word lovers agree that the pen is mightier than the\\n' +\n", - " ' sword. But be they\\n' +\n", - " ' honed\\n' +\n", - " ' in wit or form, even the sharpest tools in the shed have\\n' +\n", - " ' their flaws. That’s where foible comes in\\n' +\n", - " ' handy. Borrowed from French in the 1600s, the word\\n' +\n", - " ' originally referred to the weakest part of a fencing\\n' +\n", - " ' sword, that part being the portion between the middle\\n' +\n", - " ' and the pointed tip. The English foible soon\\n' +\n", - " ' came to be applied not only to weaknesses in blades but\\n' +\n", - " ' also to minor failings in character. 
The French source\\n' +\n", - " ' of foible is also at a remove from the fencing\\n' +\n", - " ' arena; the French foible means \"weak,\" and it\\n' +\n", - " ' comes from the same Old French term, feble,\\n' +\n", - " ' that gave us\\n' +\n", - " ' feeble.',\n", - " metadata: {\n", - " emphasized_text_contents: [Array],\n", - " emphasized_text_tags: [Array],\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " parent_id: 'fbe2a45ff9ac86d8b3e5e311ba4aaf94',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Test Your Vocabulary with M-W Quizzes',\n", - " metadata: {\n", - " category_depth: 1,\n", - " languages: [Array],\n", - " parent_id: '7a6d7635930938a56182b0cbc12b5b88',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Famous Novels, First Lines Quiz',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Play Now',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Weather Words',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " 
category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Play Now',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Name That Animal: Volume 3',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Play Now',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Name That Hat!',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Play Now',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Name That Flower',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 
'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Play Now',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Are You Feeling Lucky?',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Play Now',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Famous Novels, First Lines Quiz',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Play Now',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Weather Words',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " 
link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Play Now',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Name That Animal: Volume 3',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Play Now',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Name That Hat!',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Play Now',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Name That Flower',\n", - " metadata: {\n", - " category_depth: 0,\n", 
- " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Play Now',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Are You Feeling Lucky?',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Play Now',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Famous Novels, First Lines Quiz',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Play Now',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " 
pageContent: 'Weather Words',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Play Now',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Name That Animal: Volume 3',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Play Now',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Test Your Vocabulary',\n", - " metadata: {\n", - " category_depth: 1,\n", - " languages: [Array],\n", - " parent_id: '63b33a2339cd82a6aec91b7c6c4e75c2',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Unscramble the letters to create a word that refers to a\\n' +\n", - " ' particular kind of fencing sword: BRASE.',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: 'b95a085d102592f456bf277b90a7f7b9',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " 
category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'VIEW THE ANSWER',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Podcast',\n", - " metadata: {\n", - " category_depth: 1,\n", - " languages: [Array],\n", - " parent_id: '2669d693b8827c7babf22eb05c106af5',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'More Words of the Day',\n", - " metadata: {\n", - " category_depth: 1,\n", - " languages: [Array],\n", - " parent_id: '2669d693b8827c7babf22eb05c106af5',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Apr 09\\n \\n auspicious',\n", - " metadata: {\n", - " category_depth: 1,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " parent_id: '4ab7b48ab6922a6ce36b46af27bbf0ae',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Apr 08\\n' +\n", - " ' \\n' +\n", - " ' circumscribe',\n", - " metadata: {\n", - " category_depth: 1,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " parent_id: '4ab7b48ab6922a6ce36b46af27bbf0ae',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Apr 07\\n \\n equivocal',\n", - " metadata: {\n", - " category_depth: 1,\n", - " link_texts: 
[Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " parent_id: '4ab7b48ab6922a6ce36b46af27bbf0ae',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Apr 06\\n \\n seder',\n", - " metadata: {\n", - " category_depth: 1,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " parent_id: '4ab7b48ab6922a6ce36b46af27bbf0ae',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Apr 05\\n' +\n", - " ' \\n' +\n", - " ' gerrymander',\n", - " metadata: {\n", - " category_depth: 1,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " parent_id: '4ab7b48ab6922a6ce36b46af27bbf0ae',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Apr 04\\n \\n belated',\n", - " metadata: {\n", - " category_depth: 1,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " parent_id: '4ab7b48ab6922a6ce36b46af27bbf0ae',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'SEE ALL WORDS OF THE DAY',\n", - " metadata: {\n", - " category_depth: 0,\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Can you solve 4 words at once?\\n' +\n", - " ' \\n' +\n", - " ' \\n' 
+\n", - " ' \\n' +\n", - " ' Play\\n' +\n", - " ' \\n' +\n", - " ' \\n' +\n", - " ' Play\\n' +\n", - " ' \\n' +\n", - " ' \\n' +\n", - " ' \\n' +\n", - " ' \\n' +\n", - " ' \\n' +\n", - " ' Can you solve 4 words at once?\\n' +\n", - " ' \\n' +\n", - " ' \\n' +\n", - " ' \\n' +\n", - " ' Play\\n' +\n", - " ' \\n' +\n", - " ' \\n' +\n", - " ' Play',\n", - " metadata: {\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " parent_id: 'f9af15bb60bb691bd715c92448636671',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Love words? Need even more definitions?',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: 'f9af15bb60bb691bd715c92448636671',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: \"Subscribe to America's largest dictionary and get thousands\\n\" +\n", - " ' more definitions and advanced search—ad free!',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: 'f9af15bb60bb691bd715c92448636671',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Merriam-Webster unabridged',\n", - " metadata: {\n", - " link_texts: [Array],\n", - " link_urls: [Array],\n", - " link_start_indexes: [Array],\n", - " languages: [Array],\n", - " parent_id: 'f9af15bb60bb691bd715c92448636671',\n", - " filename: 'wordoftheday.html',\n", - " filetype: 'text/html',\n", - " category: 'NarrativeText'\n", - " }\n", " }\n", "]\n" ] @@ -1062,7 +149,7 @@ "});\n", "\n", "const data = await loader.load()\n", - "console.log(data);" + "console.log(data.slice(0, 5));" ] } ], diff --git a/docs/core_docs/docs/how_to/document_loader_markdown.ipynb 
b/docs/core_docs/docs/how_to/document_loader_markdown.ipynb index a588be101f2c..a72849c739df 100644 --- a/docs/core_docs/docs/how_to/document_loader_markdown.ipynb +++ b/docs/core_docs/docs/how_to/document_loader_markdown.ipynb @@ -129,455 +129,6 @@ " filetype: 'text/markdown',\n", " category: 'Title'\n", " }\n", - " },\n", - " Document {\n", - " pageContent: 'You can use npm, yarn, or pnpm to install LangChain.js',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: '8f698a6f3038c268bf6d65bc6065890b',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'npm install -S langchain or yarn add langchain or pnpm add langchain',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: '8f698a6f3038c268bf6d65bc6065890b',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'typescript\\nimport { ChatOpenAI } from \"langchain/chat_models/openai\";',\n", - " metadata: {\n", - " languages: [Array],\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: '🌐 Supported Environments',\n", - " metadata: {\n", - " languages: [Array],\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'LangChain is written in TypeScript and can be used in:',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: '975643d774ab3b861962f9dc13588d84',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Node.js (ESM and CommonJS) - 18.x, 19.x, 20.x',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: 
'975643d774ab3b861962f9dc13588d84',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Cloudflare Workers',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: '975643d774ab3b861962f9dc13588d84',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Vercel / Next.js (Browser, Serverless and Edge functions)',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: '975643d774ab3b861962f9dc13588d84',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Supabase Edge Functions',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: '975643d774ab3b861962f9dc13588d84',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Browser',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: '975643d774ab3b861962f9dc13588d84',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Deno',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: '975643d774ab3b861962f9dc13588d84',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: '🤔 What is LangChain?',\n", - " metadata: {\n", - " languages: [Array],\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'LangChain is a framework for developing applications powered by language models. 
It enables applications that:\\n' +\n", - " '- Are context-aware: connect a language model to sources of context (prompt instructions, few shot examples, content to ground its response in, etc.)\\n' +\n", - " '- Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: 'e2396958560b4688b2a242fbe54cd832',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'This framework consists of several parts.\\n' +\n", - " '- LangChain Libraries: The Python and JavaScript libraries. Contains interfaces and integrations for a myriad of components, a basic runtime for combining these components into chains and agents, and off-the-shelf implementations of chains and agents.\\n' +\n", - " '- LangChain Templates: (currently Python-only) A collection of easily deployable reference architectures for a wide variety of tasks.\\n' +\n", - " '- LangServe: (currently Python-only) A library for deploying LangChain chains as a REST API.\\n' +\n", - " '- LangSmith: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: 'e2396958560b4688b2a242fbe54cd832',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'The LangChain libraries themselves are made up of several different packages.\\n' +\n", - " '- @langchain/core: Base abstractions and LangChain Expression Language.\\n' +\n", - " '- @langchain/community: Third party integrations.\\n' +\n", - " \"- langchain: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.\",\n", - " metadata: {\n", - " 
languages: [Array],\n", - " parent_id: 'e2396958560b4688b2a242fbe54cd832',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Integrations may also be split into their own compatible packages.',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: 'e2396958560b4688b2a242fbe54cd832',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'This library aims to assist in the development of those types of applications. Common examples of these applications include:',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: 'e2396958560b4688b2a242fbe54cd832',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: '❓Question Answering over specific documents',\n", - " metadata: {\n", - " languages: [Array],\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Documentation',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: '2321e263d4278955b49ae7185a2e7071',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'End-to-end Example: Doc-Chatbot',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: '2321e263d4278955b49ae7185a2e7071',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: '💬 Chatbots',\n", - " metadata: {\n", - " languages: [Array],\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 
'Documentation',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: '13bfe7de8241ff139f084c9528169836',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'End-to-end Example: Chat-LangChain',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: '13bfe7de8241ff139f084c9528169836',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: '🚀 How does LangChain help?',\n", - " metadata: {\n", - " languages: [Array],\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'The main value props of the LangChain libraries are:\\n' +\n", - " '1. Components: composable tools and integrations for working with language models. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not\\n' +\n", - " '2. Off-the-shelf chains: built-in assemblages of components for accomplishing higher-level tasks',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: '1967058b7817d63c366c58df67e61178',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Off-the-shelf chains make it easy to get started. 
Components make it easy to customize existing chains and build new ones.',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: '1967058b7817d63c366c58df67e61178',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Components fall into the following modules:',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: '1967058b7817d63c366c58df67e61178',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: '📃 Model I/O:',\n", - " metadata: {\n", - " languages: [Array],\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'This includes prompt management, prompt optimization, a generic interface for all LLMs, and common utilities for working with LLMs.',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: '7742f15be2acbf645543557b71bee56e',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: '📚 Retrieval:',\n", - " metadata: {\n", - " languages: [Array],\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Data Augmented Generation involves specific types of chains that first interact with an external data source to fetch data for use in the generation step. 
Examples include summarization of long pieces of text and question/answering over specific data sources.',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: '6a6b63610d2ca00f121f094a94d520be',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: '🤖 Agents:',\n", - " metadata: {\n", - " languages: [Array],\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end-to-end agents.',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: 'cc022877b6536240ca7e38e6827c4dba',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: '📖 Documentation',\n", - " metadata: {\n", - " languages: [Array],\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Please see here for full documentation, which includes:',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: 'e38f3af90533af34e7e50debd571bfc1',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Getting started: installation, setting up the environment, simple examples',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: 'e38f3af90533af34e7e50debd571bfc1',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 
'Overview of the interfaces, modules and integrations',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: 'e38f3af90533af34e7e50debd571bfc1',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Use case walkthroughs and best practice guides',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: 'e38f3af90533af34e7e50debd571bfc1',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Reference: full API docs',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: 'e38f3af90533af34e7e50debd571bfc1',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'ListItem'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: '💁 Contributing',\n", - " metadata: {\n", - " languages: [Array],\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: '248eb0e90cb2116083e2351ddd5218b8',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'For detailed information on how to contribute, see here.',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: '248eb0e90cb2116083e2351ddd5218b8',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'Please report any security issues or concerns following our security guidelines.',\n", 
- " metadata: {\n", - " languages: [Array],\n", - " parent_id: '248eb0e90cb2116083e2351ddd5218b8',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'NarrativeText'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: '🖇️ Relationship with Python LangChain',\n", - " metadata: {\n", - " languages: [Array],\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'Title'\n", - " }\n", - " },\n", - " Document {\n", - " pageContent: 'This is built to integrate as seamlessly as possible with the LangChain Python package. Specifically, this means all objects (prompts, LLMs, chains, etc) are designed in a way where they can be serialized and shared between languages.',\n", - " metadata: {\n", - " languages: [Array],\n", - " parent_id: '48411b9b9512447054ee50f01d3fd6ee',\n", - " filename: 'README.md',\n", - " filetype: 'text/markdown',\n", - " category: 'NarrativeText'\n", - " }\n", " }\n", "]\n" ] @@ -594,7 +145,7 @@ "});\n", "\n", "const data = await loader.load()\n", - "console.log(data);" + "console.log(data.slice(0, 5));" ] }, { From af4e9d384d6ebd09d138d20efc6e4926859056a4 Mon Sep 17 00:00:00 2001 From: bracesproul Date: Mon, 20 May 2024 17:32:31 -0700 Subject: [PATCH 9/9] chore: lint files --- docs/core_docs/docs/how_to/index.mdx | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/core_docs/docs/how_to/index.mdx b/docs/core_docs/docs/how_to/index.mdx index 541c8efb4138..5d5e27e74481 100644 --- a/docs/core_docs/docs/how_to/index.mdx +++ b/docs/core_docs/docs/how_to/index.mdx @@ -103,7 +103,6 @@ Document Loaders are responsible for loading documents from a variety of sources - [How to: load HTML data](/docs/how_to/document_loader_html) - [How to: load Markdown data](/docs/how_to/document_loader_markdown) - ### Text splitters Text Splitters take a document and split into chunks that can be used for retrieval.