feat: Enhance the triplets extraction in the knowledge graph by the batch size #2091

Appointat · 2024-10-23T03:47:11Z

Description

This PR calls the extractor asynchronously to speed up triplet extraction from chunk text, processing the chunks in batches. The batch size is set in .env via TRIPLET_EXTRACTION_BATCH_SIZE (default: 20).
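As a rough illustration of the setting, the sketch below reads the batch size from the environment and falls back to 20. The helper name `read_batch_size` and the validation are assumptions made for this example; the project's actual config loading is not shown in this PR description.

```python
import os


def read_batch_size(default: int = 20) -> int:
    """Return TRIPLET_EXTRACTION_BATCH_SIZE from the environment, or the default.

    Hypothetical helper for illustration only; not the project's loader.
    """
    raw = os.getenv("TRIPLET_EXTRACTION_BATCH_SIZE")
    if raw is None or not raw.strip():
        return default
    value = int(raw)
    if value < 1:
        # A batch of zero or fewer chunks would never make progress.
        raise ValueError("TRIPLET_EXTRACTION_BATCH_SIZE must be >= 1")
    return value
```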

How Has This Been Tested?

I ran the app server with TRIPLET_EXTRACTION_BATCH_SIZE set to different values (1, 5, 100), and the running time varied.

Snapshots:

        batch_size = self._triplet_extraction_batch_size

        for i in range(0, len(chunks), batch_size):
            batch_chunks = chunks[i : i + batch_size]

            extraction_tasks = [
                self._graph_extractor.extract(chunk.content) for chunk in batch_chunks
            ]
            async_graphs: List[List[MemoryGraph]] = await asyncio.gather(
                *extraction_tasks
            )

            for chunk, graphs in zip(batch_chunks, async_graphs):
                for graph in graphs:
                    if document_graph_enabled:
                        # append the chunk id to the edge
                        for edge in graph.edges():
                            edge.set_prop("_chunk_id", chunk.chunk_id)
                            graph.append_edge(edge=edge)

                    # upsert the graph
                    self._graph_store_apdater.upsert_graph(graph)

                    # chunk -> include -> entity
                    if document_graph_enabled:
                        for vertex in graph.vertices():
                            self._graph_store_apdater.upsert_chunk_include_entity(
                                chunk=chunk, entity=vertex
                            )
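The batching pattern in the snapshot can be tried in isolation. The following is a minimal, self-contained sketch: `extract` is a hypothetical stand-in for the extractor call (it only sleeps and returns a placeholder), and `extract_in_batches` is a name chosen for this example, not a method from the PR.

```python
import asyncio
from typing import List


async def extract(text: str) -> List[str]:
    """Hypothetical stand-in for an extractor call such as an LLM request."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return [f"triplet-from:{text}"]


async def extract_in_batches(texts: List[str], batch_size: int) -> List[List[str]]:
    """Run extract() over texts, at most batch_size calls at a time."""
    results: List[List[str]] = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        # asyncio.gather runs the batch concurrently and returns the
        # results in the same order as the awaitables passed in.
        results.extend(await asyncio.gather(*(extract(t) for t in batch)))
    return results
```

Because `asyncio.gather` preserves input order, pairing results with their chunks via `zip`, as the snapshot does, stays correct even when the calls finish out of order.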

Checklist:

My code follows the style guidelines of this project
I have already rebased the commits and made the commit messages conform to the project standard.
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
Any dependent changes have been merged and published in downstream modules

Co-authored-by: Appointat <appointat@shu.edu.cn>

Appointat · 2024-10-23T03:48:20Z

@Aries-ckt @fanzhidongyzby Could you please review it and add some tags? Thanks.

Aries-ckt

LGTM.

fanzhidongyzby

The read and write order of chunk_history in _graph_extractor needs to be adjusted, otherwise it will lead to inconsistency between text block recall and serial semantics.

Appointat · 2024-10-28T08:34:37Z

> The read and write order of chunk_history in _graph_extractor needs to be adjusted, otherwise it will lead to inconsistency between text block recall and serial semantics.

Thank you for your comment, I fixed it just now.
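The fix itself is not shown in this thread. One possible way to make the history reads and writes deterministic under concurrency is sketched below: read the context for every chunk in a batch before any task starts, run the extraction concurrently, then write the batch to the history in its original order. `History`, `extract_with_context`, and this `batch_extract` are hypothetical names for illustration and are not the project's chunk_history implementation.

```python
import asyncio
from typing import Any, Dict, List


class History:
    """Hypothetical in-memory stand-in for a chunk history store."""

    def __init__(self) -> None:
        self.items: List[str] = []

    def read(self) -> List[str]:
        return list(self.items)  # snapshot, so later writes do not leak in

    def write(self, text: str) -> None:
        self.items.append(text)


async def extract_with_context(text: str, context: List[str]) -> Dict[str, Any]:
    await asyncio.sleep(0)  # placeholder for the real extraction call
    return {"text": text, "context": context}


async def batch_extract(
    texts: List[str], history: History, batch_size: int
) -> List[Dict[str, Any]]:
    results: List[Dict[str, Any]] = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        # 1. Read the context for the whole batch before any task runs.
        contexts = [history.read() for _ in batch]
        # 2. Extract concurrently; results come back in input order.
        graphs = await asyncio.gather(
            *(extract_with_context(t, c) for t, c in zip(batch, contexts))
        )
        # 3. Write the batch to the history in its original order.
        for t in batch:
            history.write(t)
        results.extend(graphs)
    return results
```

In this arrangement, chunks in the same batch do not see each other in their context, which differs from a fully serial run; with a batch size of 1 the behaviour matches serial execution.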

fanzhidongyzby

Please refine the code according to the following comments.

.env.template

dbgpt/storage/knowledge_graph/community_summary.py

dbgpt/rag/transformer/graph_extractor.py

Appointat · 2024-10-30T02:10:40Z

fanzhidongyzby

lgtm

Aries-ckt

LGTM

Appointat added 2 commits October 23, 2024 11:30

feat: Improve triplet extraction batch size and handling

89db275

Co-authored-by: Appointat <appointat@shu.edu.cn>

feat: Improve triplet extraction batch size and handling

f8e3ed1

Co-authored-by: Appointat <appointat@shu.edu.cn>

github-actions bot added the enhancement label Oct 23, 2024

Aries-ckt added the hacktoberfest label Oct 23, 2024

Aries-ckt previously approved these changes Oct 24, 2024

fanzhidongyzby requested changes Oct 24, 2024

Appointat and others added 4 commits October 28, 2024 16:30

refactor: Add batch_extract method to ExtractorBase

a57029e

Co-authored-by: Appointat <appointat@shu.edu.cn>

refactor: refactor: Add batch_extract method to GraphExtractor

3fc7640

Co-authored-by: Appointat <appointat@shu.edu.cn>

refactor: Add batch_extract method to LLMExtractor

fee90cc

Co-authored-by: Appointat <appointat@shu.edu.cn>

refactor: Refactor CommunitySummaryKnowledgeGraph batch extraction me…

ccd2cdf

…thod Co-authored-by: Appointat <appointat@shu.edu.cn>

Appointat dismissed Aries-ckt’s stale review via ccd2cdf October 28, 2024 08:32

fanzhidongyzby reviewed Oct 29, 2024

.env.template (outdated, resolved)

dbgpt/storage/knowledge_graph/community_summary.py (resolved)

dbgpt/rag/transformer/graph_extractor.py (outdated, resolved)

Appointat added 9 commits October 29, 2024 12:15

refactor: Update knowledge graph extraction batch size

3f65e49

refactor: Update knowledge graph extraction batch size

a253542

Refactor batch extraction methods in GraphExtractor and LLMExtractor

c565600

Refactor knowledge graph extraction batch size and method in Communit…

a4e602e

…ySummaryKnowledgeGraph

refactor: Refactor batch extraction methods in GraphExtractor and LLM…

7d4d7f4

…Extractor

feat: Refactor knowledge graph extraction batch size and method in Tu…

5aaa393

…GraphStoreAdapter

refactor: Update knowledge graph extraction batch size and method in …

e8b82db

…CommunitySummaryKnowledgeGraph

Refactor method signature in TuGraphStoreAdapter

0b87218

Refactor markdown format in community_summary.py

e6f6d33

Appointat added 5 commits October 30, 2024 18:07

fix: Refactor graph store configuration and enable/disable graph search

1ff3184

chore: format the code

a8f9321

fix: Refactor TuGraphStoreAdapter to improve graph retrieval logic

7e3c3c7

fix

0c263bf

Refactor markdown format in community_summary.py

f0216d7

fanzhidongyzby approved these changes Nov 5, 2024

Aries-ckt approved these changes Nov 5, 2024

Aries-ckt merged commit 25d47ce into eosphoros-ai:main Nov 5, 2024
4 checks passed

Appointat deleted the feat/async_triplets_extraction branch November 5, 2024 12:17
