Subtitle Downloading #1117

bakshiutkarsha · 2020-05-15T06:54:15Z

should resolve #877

codecov · 2020-05-15T06:59:13Z

Codecov Report

Merging #1117 into master will increase coverage by 0.33%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1117      +/-   ##
==========================================
+ Coverage   68.65%   68.98%   +0.33%     
==========================================
  Files          25       25              
  Lines        2249     2270      +21     
  Branches      441      441              
==========================================
+ Hits         1544     1566      +22     
+ Misses        522      521       -1     
  Partials      183      183

Impacted Files               Coverage Δ
src/S3.ts                    75.67% <ø> (+1.99%) ⬆️
src/util/misc.ts             71.84% <100.00%> (ø)
src/util/saveArticles.ts     81.81% <100.00%> (+0.93%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b9c3fff...9634b73.

kelson42 · 2020-05-15T08:10:07Z

@bakshiutkarsha What is the status of this PR?! It is not marked as WIP or draft... but nobody is assigned to review it?!

kelson42

@bakshiutkarsha Please make sure that the development process is respected properly. Missing the review assignment, like all the other kinds of things that get forgotten in the process (not marking as draft, not reviewing, forgetting to merge, forgetting to rebase properly, etc.), just slows down the whole process and creates overhead (often on my side).

Concerning the testing, the instructions were:

Here is what your test should look like IMO: (1) you have a small piece of wikicode for a video with a subtitle, (2) you send it to https://en.wikipedia.org/api/rest_v1/#/Transforms/post_transform_wikitext_to_html, (3) you get the result, rewrite it, and verify that the track URL is correct.

As far as I can see, (3) is there but not (1) and (2), which makes the whole test relatively weak. What will happen if for some reason the HTML generated by MediaWiki changes?! The answer is that it might go through unnoticed!
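
A minimal TypeScript sketch of steps (1) to (3) as described above. The helper names (`buildTransformRequest`, `extractTrackSources`) and the sample wikitext are hypothetical, and the request path `/transform/wikitext/to/html` is assumed to be the operation behind the documentation anchor linked in the comment. Sending the request is left to whatever HTTP client the test suite already uses; the actual test in this PR may be organized differently.

```typescript
// Hypothetical helpers; only the endpoint family comes from the review comment.
interface TransformRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Steps (1) and (2): wrap a small piece of wikitext in a request for the
// wikitext-to-HTML transform (path assumed from the linked API docs).
function buildTransformRequest(wikitext: string, host = 'en.wikipedia.org'): TransformRequest {
  return {
    url: `https://${host}/api/rest_v1/transform/wikitext/to/html`,
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ wikitext }),
  };
}

// Step (3): after the scraper has rewritten the returned HTML, list the
// src attribute of every <track> element so the test can check each URL.
function extractTrackSources(html: string): string[] {
  const sources: string[] = [];
  const trackTag = /<track\b[^>]*\bsrc="([^"]*)"/g;
  let match: RegExpExecArray | null;
  while ((match = trackTag.exec(html)) !== null) {
    sources.push(match[1]);
  }
  return sources;
}

// Hypothetical sample wikitext for a video file.
const transformRequest = buildTransformRequest('[[File:Example.webm|thumb]]');
```

Because the HTML comes from the live transform rather than from a fixture written by hand, a change in the markup generated by MediaWiki would show up as a failing assertion on the extracted track URLs.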

bakshiutkarsha · 2020-05-16T07:32:23Z

@kelson42 I will make sure that in the future these things don't affect the development timeline and that the process is respected properly.

midik · 2020-05-16T14:27:44Z

src/util/saveArticles.ts

    let mediaDependencies: Array<{ url: string, path: string }> = [];
    let doc = domino.createDocument(html);
-    const tmRet = await treatMedias(doc, mw, dump, articleId);
+    const tmRet = await treatMedias(doc, mw, dump, articleId, downloader, zimCreator);


Not sure it's a good idea to carry downloader and zimCreator 3 levels down into treatMedias. Normally, we return a set of links to the upper level and then handle them by iterating over mediaDependencies here. This way we don't deal with the ZIM file in the treat*() functions, which is in keeping with the OOP spirit. I believe the function called processArticleHtml should know nothing about an entity like the ZIM file, so I feel there's a better way to handle this ;)
Could we handle subtitles outside this function?

bakshiutkarsha · 2020-05-17

I am not sure I understood this correctly. I agree with the point that downloader and zimCreator got dragged 3 levels down, but trackEle is tightly coupled with videoEle, so IMO it makes sense to have them in the same place. But let me know if you suggest otherwise.

midik · 2020-05-17

It probably makes sense to keep the status quo, because putting things in order here in processArticleHtml() would require a lot of changes. Let's table this until we get the resources to catch up on technical debt.

kelson42 · 2020-05-18

I reopen this because I'm not happy about that either. We need to architect this in a smarter way.

bakshiutkarsha · 2020-05-26

Reverted the change that dragged downloader and zimCreator 3 levels down and handled this in a different way, as suggested.
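
The alternative raised in this thread, where the treat*() helpers only return links and the caller collects them into mediaDependencies, might look roughly like the sketch below. `treatSubtitleTracks`, `TrackLike`, `collectMediaDependencies`, and the `subtitles/` path layout are hypothetical stand-ins, not the code that was merged.

```typescript
// Shape taken from the mediaDependencies declaration in the diff above.
interface MediaDependency {
  url: string;
  path: string;
}

// Hypothetical stand-in for a <track> element; only `src` matters here.
interface TrackLike {
  src: string;
}

// Rewrites each track to a local path and returns what must be downloaded.
// It receives neither downloader nor zimCreator: fetching and writing to
// the ZIM file stay with the caller.
function treatSubtitleTracks(tracks: TrackLike[], articleId: string): MediaDependency[] {
  const dependencies: MediaDependency[] = [];
  tracks.forEach((track, index) => {
    const url = track.src;
    const path = `subtitles/${articleId}_${index}.vtt`; // assumed layout
    track.src = path;
    dependencies.push({ url, path });
  });
  return dependencies;
}

// Caller side, in the spirit of processArticleHtml: collect the links and
// hand them to whatever later iterates over mediaDependencies.
function collectMediaDependencies(tracks: TrackLike[], articleId: string): MediaDependency[] {
  const mediaDependencies: MediaDependency[] = [];
  mediaDependencies.push(...treatSubtitleTracks(tracks, articleId));
  return mediaDependencies;
}
```

Keeping the helper free of I/O also makes it straightforward to unit test: it can be called with plain objects and its return value compared against expected URLs and paths.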

kelson42 · 2020-05-18T07:16:04Z

I reopen this because I'm not happy about that either. We need to architect this in a smarter way.

midik · 2020-05-18T07:41:49Z

BTW, here's the fix for the bug that we're probably facing on v14

kelson42 · 2020-05-18T07:50:09Z

Nice, happy if the bug is not on our side... and happy again to see this will be part of the next v14 release (my guess).

midik

minorities

midik

I've no more feedback here at this point

kelson42

CI must pass

kelson42 · 2020-05-31T07:26:50Z

test/unit/s3.test.ts

@@ -30,7 +30,7 @@ _test('S3 checks', async(t) => {
    const imageExist = await s3.downloadIfPossible('bm.wikipedia.org/static/images/project-logos/bmwiki-2x.png', 'https://bm.wikipedia.org/static/images/project-logos/bmwiki-2x.png');
    t.assert(!!imageExist, 'Image exists in S3');
    // Checking the data related to image matches
-    t.equals(imageExist.headers.Metadata.etag, '"aeff-54a391a807034"', 'Etag matches');
+    t.equals(imageExist.headers.Metadata.etag, '"a740-5a6b0464619c2"', 'Etag matches');


This is really weak... and consequently it broke only a few months after being implemented. An automated test should not have this ETag hardcoded. If the whole lifecycle is tested properly, you should not need to do that. Should be pretty easy to fix.

bakshiutkarsha · 2020-06-03

I have removed this case, because the lifecycle is already being tested.
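
To illustrate the point about testing the lifecycle instead of pinning an ETag literal, here is a hedged sketch using an in-memory stand-in. `InMemoryCache` and `runLifecycleCheck` are hypothetical, the signature deliberately differs from the real `s3.downloadIfPossible(key, url)` shown in the diff, and the ETag comparison is an assumption about how the cache decides whether an object is still fresh.

```typescript
interface StoredObject {
  body: string;
  etag: string;
}

// Hypothetical in-memory stand-in for the S3-backed cache.
class InMemoryCache {
  private readonly objects = new Map<string, StoredObject>();

  upload(key: string, body: string, etag: string): void {
    this.objects.set(key, { body, etag });
  }

  // Returns the cached object only when its ETag matches the upstream one.
  downloadIfPossible(key: string, upstreamEtag: string): StoredObject | undefined {
    const cached = this.objects.get(key);
    return cached !== undefined && cached.etag === upstreamEtag ? cached : undefined;
  }
}

// Exercises miss -> upload -> hit -> stale miss. The ETag is whatever the
// upstream currently reports, so the check keeps passing when it changes.
function runLifecycleCheck(cache: InMemoryCache, key: string, body: string, upstreamEtag: string): boolean {
  const missBeforeUpload = cache.downloadIfPossible(key, upstreamEtag) === undefined;
  cache.upload(key, body, upstreamEtag);
  const hit = cache.downloadIfPossible(key, upstreamEtag);
  const hitAfterUpload = hit !== undefined && hit.etag === upstreamEtag && hit.body === body;
  const missWhenStale = cache.downloadIfPossible(key, `${upstreamEtag}-changed`) === undefined;
  return missBeforeUpload && hitAfterUpload && missWhenStale;
}
```

The check never mentions a specific ETag value, so a re-uploaded logo with a new ETag would not break it, unlike the literal comparison replaced in the diff above.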

kelson42 · 2020-06-06T07:37:19Z

@bakshiutkarsha It needs a few changes and the CI should pass.

kelson42 · 2020-06-08T15:13:27Z

test/unit/saveArticles.test.ts

+test('treat multiple subtitles in one video', async(t) => {
+    const { downloader, mw, dump } = await setupScrapeClasses({ format: '' });
+
+    // Wikicode is taken from article "User:Charliechlorine/sandbox" which has multiple(4) subtitles in this video


My last comment was confusing:

'treat one subtitle' should test exactly which URL is produced

'treat multiple subtitles in one video' should do the same

I would add a global test, testing the whole HTML with testHtmlRewritingE2e

kelson42 · 2020-06-14T11:56:20Z

@bakshiutkarsha Can you please rebase on master?

bakshiutkarsha · 2020-06-15T05:12:54Z

'treat one subtitle' should test exactly which URL - DONE
'treat multiple subtitles in one video' should do the same - DONE
I would add a global test, testing the whole HTML with testHtmlRewritingE2e - DONE, but can you confirm that this is what you wanted?

kelson42

You should have one test for treatSubtitle(), one test for treatVideo(), and one E2E test for a video wikitext using testHtmlRewritingE2e().

Your current usage of testHtmlRewritingE2e() for subtitles is useless because you test it against the value produced by your own code... so this is probably always going to pass. A unit test should test against hardcoded values (so here, HTML).
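
The difference between a circular expectation and a hardcoded one can be sketched as follows. `localSubtitlePath` and `rewriteTrackSources` are hypothetical simplifications, as are the sample URL and the `./subtitles/` layout; they are not the functions under review.

```typescript
// Hypothetical rule mapping a remote subtitle URL to a local path.
function localSubtitlePath(url: string): string {
  const fileName = url.split('/').pop() || '';
  return `./subtitles/${fileName}`;
}

// Hypothetical rewriter: points every <track src="..."> at its local path.
function rewriteTrackSources(html: string): string {
  return html.replace(
    /(<track\b[^>]*\bsrc=")([^"]*)(")/g,
    (_match, before: string, src: string, after: string) =>
      `${before}${localSubtitlePath(src)}${after}`,
  );
}

const sampleHtml =
  '<video><track kind="subtitles" src="https://example.org/w/sub_en.vtt" srclang="en"></video>';
const rewrittenHtml = rewriteTrackSources(sampleHtml);

// Circular: building the expectation with localSubtitlePath() itself would
// pass even if that function were wrong, so it is not used here.
// Hardcoded: the expected HTML is written out by hand.
const expectedHtml =
  '<video><track kind="subtitles" src="./subtitles/sub_en.vtt" srclang="en"></video>';

if (rewrittenHtml !== expectedHtml) {
  throw new Error(`unexpected rewrite: ${rewrittenHtml}`);
}
```

With the expectation written out literally, a regression in the path rule or in the rewriting itself changes `rewrittenHtml` but not `expectedHtml`, so the comparison fails as intended.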

bakshiutkarsha · 2020-06-15T07:40:30Z

The HTML I am getting from the API is not exactly the same as the online version. For example, the API gives me HTML starting from a figure tag, and there is no such tag in the online version. So, what should be done in this case?

kelson42 force-pushed the issue/877-new branch from 435c4c1 to 9ea0be2 · May 15, 2020 13:11

bakshiutkarsha requested review from kelson42 and midik · May 16, 2020 01:51

kelson42 requested changes · May 16, 2020

midik suggested changes · May 16, 2020

midik reviewed · May 17, 2020

bakshiutkarsha force-pushed the issue/877-new branch from 9ea0be2 to f8297c4 · May 18, 2020 06:09

bakshiutkarsha requested review from midik and kelson42 · May 18, 2020 06:22

midik reviewed · May 18, 2020

kelson42 requested changes · May 18, 2020

bakshiutkarsha marked this pull request as draft · May 23, 2020 08:00

bakshiutkarsha force-pushed the issue/877-new branch from 7d71384 to e40387f · May 25, 2020 20:13

bakshiutkarsha marked this pull request as ready for review · May 25, 2020 20:14

bakshiutkarsha requested review from kelson42 and midik · May 25, 2020 20:14

midik reviewed · May 25, 2020

bakshiutkarsha requested a review from midik · May 26, 2020 06:52

kelson42 force-pushed the issue/877-new branch from eb771c3 to 27545a5 · May 28, 2020 06:48

midik approved these changes · May 28, 2020

kelson42 requested changes · May 28, 2020

bakshiutkarsha force-pushed the issue/877-new branch from 27545a5 to 0e391a5 · May 31, 2020 06:05

kelson42 requested changes · May 31, 2020

midik mentioned this pull request in "fix handling page lead section (arch)" #1138 (closed) · Jun 2, 2020

bakshiutkarsha requested a review from kelson42 · June 3, 2020 02:32

bakshiutkarsha force-pushed the issue/877-new branch from a025e0a to 5f26b54 · June 8, 2020 05:35

kelson42 requested changes · Jun 8, 2020

kelson42 requested changes · Jun 8, 2020

bakshiutkarsha added 16 commits · June 15, 2020 03:56

10d7bf6  simplified tests
1deba55  for loop changes
c2055ef  test case changed and review changes
f6eedda  early retrun added
b721ab7  codefactor suggestions added
890bf62  loggers removed
faeee03  lint fix
5998e85  with new architecture of subtitle download
6c24532  test cases modified
8faf5a5  rebased with master
a31ae2e  changes added
2437cbc  try-catch block added
21b15ba  multiple subtitles handled
151eade  new chages
f34ddfe  comment added
ddb15cb  test cases changed for subtitles

bakshiutkarsha force-pushed the issue/877-new branch from 5f26b54 to ddb15cb · June 15, 2020 05:04

361668f  trailing spaces resolved

bakshiutkarsha requested a review from kelson42 · June 15, 2020 05:13

9634b73  Remove unused code

kelson42 requested changes · Jun 15, 2020

kelson42 merged commit 39a5e62 into master · Jun 15, 2020

kelson42 deleted the issue/877-new branch · June 15, 2020 08:54
