Better orion language detection #337
Merged: zeeshanakram3 merged 13 commits into Joystream:master from ikprk:feat/custom-migration-language-detection-api on Jul 10, 2024
Changes from 10 commits
13 commits
b089185 New model to track orion language processing (ikprk)
23d8f11 New function to detect language (ikprk)
54e2844 New custom migration (ikprk)
06b09c1 Add cursor tracker and video orion language to offchain export (ikprk)
71c4997 Running `custom-migration` command (ikprk)
4f49891 Add manger to trigger video language updates (ikprk)
72c3f64 Adjust orion video language manager to support video update (ikprk)
32d7c5b Revert "Running `custom-migration` command" (ikprk)
c42d1ac Second run of `create-migrations` command (ikprk)
7945106 CR fixes (ikprk)
8311af2 move 'orion_offchain_cursor' table to admin schema (zeeshanakram3)
8e2a2d3 bump package version and add change log (zeeshanakram3)
61db7d0 fix: bug in case detected language is undefined (zeeshanakram3)
@@ -0,0 +1,11 @@
module.exports = class Data1719233585592 {
  name = 'Data1719233585592'

  async up(db) {
    await db.query(`CREATE TABLE "orion_offchain_cursor" ("cursor_name" character varying NOT NULL, "value" bigint NOT NULL, CONSTRAINT "PK_7083797352af5a21224b6c8ccbc" PRIMARY KEY ("cursor_name"))`)
  }

  async down(db) {
    await db.query(`DROP TABLE "orion_offchain_cursor"`)
  }
}
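The table created above holds one row per named cursor. As an illustration only, here is a minimal sketch of how a caller might read and advance such a cursor with parameterized SQL. `QueryRunnerLike`, `readCursor` and `writeCursor` are hypothetical names that do not appear in this PR, and the table is referenced unqualified, exactly as in this migration.

```typescript
// Hypothetical minimal interface; anything with a compatible query(sql, params) method fits.
interface QueryRunnerLike {
  query(sql: string, params?: unknown[]): Promise<unknown>
}

// Read the cursor, defaulting to 0 when no row exists yet.
// bigint columns may come back as strings, so the value is coerced with Number().
async function readCursor(db: QueryRunnerLike, name: string): Promise<number> {
  const rows = (await db.query(
    'SELECT "value" FROM "orion_offchain_cursor" WHERE "cursor_name" = $1',
    [name]
  )) as { value: string | number }[]
  return rows.length ? Number(rows[0].value) : 0
}

// Insert the cursor row, or overwrite its value if the row already exists.
async function writeCursor(db: QueryRunnerLike, name: string, value: number): Promise<void> {
  await db.query(
    `INSERT INTO "orion_offchain_cursor" ("cursor_name", "value")
     VALUES ($1, $2)
     ON CONFLICT ("cursor_name") DO UPDATE SET "value" = EXCLUDED."value"`,
    [name, value]
  )
}
```

Passing the cursor name as `$1` avoids interpolating it into the SQL string, and the `ON CONFLICT` clause relies on `cursor_name` being the primary key.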
4 changes: 2 additions & 2 deletions
db/migrations/1709641962433-Views.js → db/migrations/1719233585692-Views.js
@@ -0,0 +1,20 @@
import { Entity, Column, PrimaryColumn } from 'typeorm'

@Entity()
export class OrionOffchainCursor {
  constructor(props?: Partial<OrionOffchainCursor>) {
    Object.assign(this, props)
  }

  /**
   * Name of the offchain cursor
   */
  @PrimaryColumn()
  cursorName!: string

  /**
   * Value of the cursor
   */
  @Column('int8', { nullable: false })
  value!: number
}
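One thing worth keeping in mind with this entity: `value` is typed as `number` but stored as `int8`, and the Postgres driver commonly hands 64-bit integers back as strings, which is presumably why the migration script in this PR coerces the raw value with a unary plus. A small sketch of a stricter normalization step follows; `parseCursorValue` is a hypothetical helper, not something defined in this PR.

```typescript
// Hypothetical helper: normalize a raw int8 column value into a safe, non-negative JS integer.
function parseCursorValue(raw: string | number | null | undefined): number {
  if (raw === null || raw === undefined) return 0 // no cursor row yet
  if (typeof raw === 'string' && raw.trim() === '') {
    throw new Error('Invalid cursor value: empty string')
  }
  const n = typeof raw === 'number' ? raw : Number(raw)
  // Rejects NaN, fractions, and values beyond Number.MAX_SAFE_INTEGER.
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new Error(`Invalid cursor value: ${String(raw)}`)
  }
  return n
}
```

For a cursor that counts processed videos, the safe-integer limit is unlikely to matter in practice; the check mainly guards against malformed rows.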
@@ -1,2 +1,3 @@
export * from './generated'
export { NextEntityId } from './NextEntityId'
export { OrionOffchainCursor } from './OrionOffchainCursor'
@@ -0,0 +1,64 @@
import { EntityManager } from 'typeorm'
import {
  detectVideoLanguageWithProvider,
  updateVideoLanguages,
  VIDEO_ORION_LANGUAGE_CURSOR_NAME,
} from './customMigrations/setOrionLanguageProvider'
import { globalEm } from './globalEm'

export class OrionVideoLanguageManager {
  private videoToDetect: Set<string> = new Set()

  async init(intervalMs: number): Promise<void> {
    if (!VIDEO_ORION_LANGUAGE_CURSOR_NAME) {
      return
    }

    this.updateLoop(intervalMs)
      .then(() => {
        /* Do nothing */
      })
      .catch((err) => {
        console.error(err)
        process.exit(-1)
      })
  }

  scheduleVideoForDetection(id: string | null | undefined) {
    if (id) {
      this.videoToDetect.add(id)
    }
  }

  async updateScheduledVideoLanguage(em: EntityManager) {
    if (!this.videoToDetect.size) {
      return
    }

    const videos = await em.query(`
      SELECT id, title, description
      FROM admin.video
      WHERE id in (${[...this.videoToDetect.values()].map((id) => `'${id}'`).join(',')})
    `)

    await updateVideoLanguages(em, videos)
    this.videoToDetect.clear()
  }

  async updateOrionVideoLanguage() {
    return detectVideoLanguageWithProvider()
  }

  private async updateLoop(intervalMs: number): Promise<void> {
    const em = await globalEm
    while (true) {
      await this.updateScheduledVideoLanguage(em).catch((e) => {
        console.log(`Updating scheduled videos Orion language with provider failed`, e)
      })
      await this.updateOrionVideoLanguage().catch((e) => {
        console.log(`Updating Orion language with provider failed`, e)
      })
      await new Promise((resolve) => setTimeout(resolve, intervalMs))
    }
  }
}
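Two details of `updateScheduledVideoLanguage` might be worth a second look: the ids are interpolated into the SQL string, and `clear()` runs after two awaits, so an id scheduled while detection is in flight would be dropped without being processed. The sketch below shows one possible alternative under stated assumptions. `ScheduledVideoQueue`, `QueryRunnerLike` and `VideoRow` are hypothetical names, and the `ANY($1::text[])` form assumes a Postgres driver that serializes JavaScript arrays as array parameters.

```typescript
// Hypothetical minimal interface; anything with a compatible query(sql, params) method fits.
interface QueryRunnerLike {
  query(sql: string, params?: unknown[]): Promise<unknown>
}

type VideoRow = { id: string; title: string; description: string }

class ScheduledVideoQueue {
  private pending = new Set<string>()

  schedule(id: string | null | undefined): void {
    if (id) this.pending.add(id)
  }

  pendingCount(): number {
    return this.pending.size
  }

  // Process a snapshot of the pending ids. Ids are removed only after the handler
  // succeeds, and ids scheduled while the handler is awaiting stay queued.
  async drain(db: QueryRunnerLike, handle: (rows: VideoRow[]) => Promise<void>): Promise<number> {
    const ids = Array.from(this.pending)
    if (!ids.length) return 0

    // The id list is passed as a single array parameter instead of being interpolated.
    const rows = (await db.query(
      'SELECT id, title, description FROM admin.video WHERE id = ANY($1::text[])',
      [ids]
    )) as VideoRow[]

    await handle(rows)
    ids.forEach((id) => this.pending.delete(id))
    return ids.length
  }
}
```

If the handler throws, nothing is removed from the queue, which matches the retry behaviour of the original loop on its next iteration.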
This file was deleted.
@@ -0,0 +1,75 @@
import { EntityManager } from 'typeorm'
import { OrionOffchainCursor } from '../../model'
import { globalEm } from '../globalEm'
import { predictLanguageWithProvider } from '../language'

const batchSize = 5_000 // Adjust the batch size based on your database and network performance

type VideoUpdateType = {
  id: string
  title: string
  description: string
}

export const VIDEO_ORION_LANGUAGE_CURSOR_NAME = 'video_orion_language'

export async function updateVideoLanguages(em: EntityManager, videos: VideoUpdateType[]) {
  const mappedVideos = videos.map((video) => `${video.title} ${video.description}`)

  const predictionForVideos = await predictLanguageWithProvider(mappedVideos)

  const videosWithDetections = videos.map((video, index) => ({
    ...video,
    detectedLanguage: predictionForVideos[index],
  }))

  const query = `
    UPDATE admin.video AS v SET
      orion_language = c.orion_language
    FROM (VALUES ${videosWithDetections
      .map((_, idx) => `($${idx * 2 + 1}, $${idx * 2 + 2})`)
      .join(',')}) AS c(orion_language, id)
    WHERE c.id = v.id;
  `

  const queryParams = videosWithDetections.flatMap((update) => [update.detectedLanguage, update.id])

  // Execute batch update
  await em.query(query, queryParams)
}

export async function detectVideoLanguageWithProvider() {
  const em: EntityManager = await globalEm
  let cursorEntity: { value: number }[] = await em.query(
    `SELECT value FROM orion_offchain_cursor WHERE cursor_name='${VIDEO_ORION_LANGUAGE_CURSOR_NAME}'`
  )
  while (true) {
    const cursor = +(cursorEntity[0]?.value ?? 0)

    const videos: VideoUpdateType[] = await em.query(`
      SELECT id, title, description
      FROM admin.video
      ORDER BY id::INTEGER ASC
      OFFSET ${cursor}
      LIMIT ${batchSize}
    `)

    if (!videos.length) {
      console.log('No more videos!')
      break
    }

    await updateVideoLanguages(em, videos)
    const newCursor = new OrionOffchainCursor({
      cursorName: VIDEO_ORION_LANGUAGE_CURSOR_NAME,
      value: cursor + Math.min(batchSize, videos.length),
    })
    await em.save(newCursor)
    cursorEntity = [newCursor]
    console.log(
      `Updated languages for videos in range ${cursor}-${
        cursor + Math.min(batchSize, videos.length)
      }`
    )
  }
}
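The batch `UPDATE ... FROM (VALUES ...)` in `updateVideoLanguages` pairs each row with two positional placeholders, `$1, $2` for the first row, `$3, $4` for the second, and so on, with the parameters flattened in the same order. The sketch below isolates that step as a pure function so the numbering is easy to check. `buildLanguageUpdate` and `Detection` are hypothetical names, and skipping rows with no detected language is just one possible policy; it is not necessarily what the later "detected language is undefined" fix in this PR does.

```typescript
type Detection = { id: string; detectedLanguage: string | undefined }

// Build the parameterized batch UPDATE, skipping rows without a detected language.
// Returns null when nothing is left, because an empty VALUES list is not valid SQL.
function buildLanguageUpdate(rows: Detection[]): { query: string; params: string[] } | null {
  const usable = rows.filter(
    (r): r is { id: string; detectedLanguage: string } => r.detectedLanguage !== undefined
  )
  if (!usable.length) return null

  // Row i (0-based) uses placeholders $(2i+1) for the language and $(2i+2) for the id.
  const placeholders = usable.map((_, idx) => `($${idx * 2 + 1}, $${idx * 2 + 2})`).join(',')

  // Flatten in the same (language, id) order as the placeholders.
  const params: string[] = []
  for (const r of usable) {
    params.push(r.detectedLanguage, r.id)
  }

  const query =
    'UPDATE admin.video AS v SET orion_language = c.orion_language ' +
    `FROM (VALUES ${placeholders}) AS c(orion_language, id) ` +
    'WHERE c.id = v.id'
  return { query, params }
}
```

A caller would run `em.query(result.query, result.params)` only when the result is not null, which also covers the case where the preceding `SELECT` returns no rows.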
I think it's better to create this in the admin schema.
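For illustration, a schema-qualified variant of the migration could look roughly like the sketch below. It is not the follow-up commit itself, which is not shown in this view. The class name, the `MigrationDb` interface and the unnamed primary key constraint are assumptions, and the sketch presumes the `admin` schema already exists, as the `admin.video` queries elsewhere in the PR suggest.

```typescript
// Hypothetical minimal interface for the object passed to up() and down().
interface MigrationDb {
  query(sql: string): Promise<unknown>
}

// Hypothetical variant of the migration that creates the table in the "admin" schema.
class OrionOffchainCursorInAdminSchema {
  name = 'OrionOffchainCursorInAdminSchema'

  async up(db: MigrationDb): Promise<void> {
    await db.query(
      `CREATE TABLE "admin"."orion_offchain_cursor" ("cursor_name" character varying NOT NULL, "value" bigint NOT NULL, PRIMARY KEY ("cursor_name"))`
    )
  }

  async down(db: MigrationDb): Promise<void> {
    await db.query(`DROP TABLE "admin"."orion_offchain_cursor"`)
  }
}
```

With the table in `admin`, raw queries against it would need the same qualifier, for example `admin.orion_offchain_cursor` in the cursor lookup.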