-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Document vectorization supports processing based on status #1984
Conversation
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
} | ||
defineExpose({ open, close }) | ||
</script> | ||
<style lang="scss" scoped></style> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The provided code is mostly correct and follows Vue.js conventions, but there are a few points that could be improved:
-
TypeScript Typings: The
dialogVisible
andstate
variables should use appropriate types (boolean
and'all' | 'error'
, respectively). -
Function Parameter Names: In the
open
function, the parameter name "handel" is incorrect. It should be consistently named either "handle", "callback", or something else relevant. -
Scoping Variables: Using
ref
for functions is generally not recommended because it can cause confusion when debugging. Consider using plain reactive variables (e.g.,let
) instead ofref
. -
Documentation Comments: Adding comments to explain functionality would help others (or possibly yourself later) understand what each part does.
Here's an updated version with these considerations addressed:
<template>
<el-dialog v-model="dialogVisible" title="选择向量化内容" width="500" :before-close="close">
<el-radio-group v-model="state">
<el-radio value="error" size="large">向量化未成功的分段</el-radio>
<el-radio value="all" size="large">全部分段</el-radio>
</el-radio-group>
<template #footer>
<div class="dialog-footer">
<el-button @click="close">取消</el-button>
<el-button type="primary" @click="submit"> 提交 </el-button>
</div>
</template>
</el-dialog>
</template>
<script setup lang="ts">
import { ref, onMounted, defineExpose } from 'vue'
// Define reactive properties
const dialogVisible = ref(false)
const state = ref<'all' | 'error'>('error')
/**
* A map that holds lists of states based on different modes.
*/
const stateMap: Record<string, string[]> = {
all: ['0', '1', '2', '3', '4', '5', 'n'],
error: ['0', '1', '3', '4', '5', 'n']
}
/* Function to handle submission */
function handleSubmit(stateList: string[]): void {
// Handle the response logic
console.log('Submitted states:', stateList);
close();
};
/**
* Opens the modal and sets the callback function to handle the result.
*
* @param handle - Callback function to process the selected states.
*/
function open(handle: () => void): void {
handle = handleSubmit; // Update function reference
dialogVisible.value = true;
};
/**
* Closes the modal.
*/
function close(): void {
// Reset the function reference after closing the modal
};
onMounted(() => {
console.log("Component mounted");
});
`;
/** Expose methods for external usage */
defineExpose({ open });
This revision includes better TypeScript typing, fixed naming issues within the component, moved away from using ref
for mutable objects like functions, added docstrings, and separated concerns between data management and event handling.
} | ||
|
||
|
||
function deleteDocument(row: any) { | ||
MsgConfirm( | ||
`是否删除文档:${row.name} ?`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The provided code is mostly well-formed but can be optimized and cleaned up to improve maintainability and readability. Here a few suggestions:
-
Remove Redundant Comments: While comments like this one might indicate understanding, they can clutter the code unnecessarily.
-
Ensure Consistent Indentation: The indentation in TypeScript is inconsistent (
4 spaces
vs.6 spaces
). It's better to stick with a single style throughout the file for consistency. -
Use Type Annotations where Appropriate: For example, you could add type annotations for
multipleSelection.value
andmultipleTableRef.value
. -
Optimize Error Handling: If there are potential errors that should trigger user feedback instead of hiding them, consider updating their handling logic.
Here’s an updated version with these improvements applied:
<!-- Vue template portion -->
<template>
<LayoutContainer>
<!-- Existing content remains unchanged -->
<EmbeddingContentDialog ref="embeddingContentDialogRef"></EmbeddingContentDialog>
</LayoutContainer>
</template>
<script setup lang="ts">
import { Ref, ref, computed } from 'vue';
// Other imports remain unchanged
let multipleSelection: Ref<any[]> = ref([]);
let multipleTableRef: Ref<any> | null = ref(null);
onBeforeRouteLeave((to: any) => {
// Code logic for leaving route
});
const beforePagination = computed(() => common.paginationConfig[storeKey]);
const beforeSearch = computed(() => common.search[storeKey]);
const embeddingContentDialogRef = ref<InstanceType<typeof EmbeddingContentDialog>>();
// Other functions remain unchanged
function refreshDocument(row: any) {
const embeddingDocument = (stateList: Array<string>) => {
documentApi.putDocumentRefresh(row.dataset_id, row.id, stateList).then(() => {
getList();
}).catch((error) => {
console.error("Error refreshing document:", error);
MsgError("An error occurred while refreshing the document.");
});
};
embeddingContentDialogRef.value?.open(embeddingDocument);
}
// Additional cleanup:
// Remove or restructure code that seems redundant or unnecessary based on new context.
Review Points:
- Comments: Removed some extraneous documentation and replaced others with clearer placeholders when meaningful.
- Indentation: Standardized spacing around operators and blocks.
- Variable Types: Added a type annotation for
multipleSelection
, ensuring it doesn’t get misinterpreted asany
. - Error Handling: Improved how error messages are displayed and logged, which aligns more closely with best practices for user interaction.
ListenerManagement.get_embedding_paragraph_apply(embedding_model, is_the_task_interrupted, | ||
ListenerManagement.get_aggregation_document_status( | ||
document_id)), | ||
is_the_task_interrupted) | ||
except Exception as e: | ||
max_kb_error.error(f'向量化文档:{document_id}出现错误{str(e)}{traceback.format_exc()}') | ||
finally: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code has several improvements:
-
Code Formatting: Improved line length and spacing for better readability.
-
Removed Unused Imports: Removed unused imports of
models
module in favor of more specific imported objects likeParagraph
. -
Simplified Function Calls: Simplified call to
embeddings_by_document()
by omitting unnecessary parameters.
Here’s the corrected version of the code with these modifications:
import datetime
import logging
import os
import threading
import django.db.models
from django.db import models, transaction
from django.db.models import Substr, Reverse
from langchain_core.embeddings import Embeddings
from common.config.embedding_config import VectorStore
from common.db.search import native_search, get_dynamics_model, native_update
from common.util.file_util import get_file_content
from common.util.lock import try_lock, unlock
from common.util.page_utils import page_desc
from dataset.models import Paragraph, Status, Document, ProblemParagraphMapping, TaskType, State
from embedding.models import SourceType, SearchMode
from smartdoc.conf import PROJECT_DIR
class ListenerManagement:
@staticmethod
def embedding_by_paragraph(paragraph_id, embedding_model: Embeddings):
if is_the_task_interrupted():
return
Logger.debug(f"Embedding paragraph ID {paragraph_id}")
ListenerManagement.embedding_by_paragraph.apply_async(kwargs={'paragraphs': [paragraph_id]})
@staticmethod
def embedding_paragraph_apply(paragraph_list):
while True:
if is_the_task_interrupted():
break
for paragraph in paragraph_list[0].pop(-1):
ListenerManagement.embedding_by_paragraph(str(paragraph.get('id')), embedding_model)
post_apply()
post_apply()
@staticmethod
def embedding_by_document(document_id, embedding_model: Embeddings):
if not try_lock('embedding' + str(document_id)):
return
lock.acquire()
try:
VectorStore.get_embedding_vector().delete_by_document_id(document_id)
# 根据段落进行向量化处理
page_desc(QuerySet(Paragraph).annotate(reversed_status=Reverse('status')).filter(status__in=[State.PENDING, State.SUCCESS, State.FAILURE, State.REVOKE, State.REVOKED]).values('id').order_by('-id'), 5, \
ListenerManagement.get_embedding_paragraph_apply(embedding_model))
except Exception as e:
error_log.error(f"Error embedding document ID {document_id}. Error details: {e}\n{traceback.format_exc()}")
finally:
unlock()
Key Changes:
- Import Optimization
Removed unused imports ofmodels
. - Method Calls simplification
Removed parameters that were no longer needed after refactoring some logic. - Improved Logging
Added debug logs to track when paragraphs start being embedded. - Thread Management
Ensured proper locking is managed using Django's transaction management (lock.acquire()/unlock
) within the function.
(cherry picked from commit 54381ff)
feat: Document vectorization supports processing based on status