
Integrate the wikitext task in the webapp #675

Merged 21 commits into develop · Jun 17, 2024
Conversation

@JulienVig (Collaborator) commented May 15, 2024

  • Rework the validator's test and inference methods to let users stop them, and fix tfjs memory leaks

@JulienVig JulienVig added bug Something isn't working web client Related to the browser environment discojs Related to Disco.js labels May 15, 2024
@JulienVig JulienVig self-assigned this May 15, 2024
@JulienVig JulienVig marked this pull request as ready for review June 11, 2024 16:17
@JulienVig JulienVig requested a review from tharvik June 11, 2024 16:17
@tharvik (Collaborator) left a comment


woohoo, GPT in the browser, well done!

testing the model is a bit weird, as there is no specific handler for that (got 5000% accuracy, such an impressive model). but that's alright for now IMO, we can add it in a later iteration

Review comments (all resolved):

  • discojs-web/src/memory/memory.ts
  • discojs/src/memory/model_type.ts
  • discojs/src/models/gpt/index.ts
  • discojs/src/models/tokenizer.ts
  • discojs/src/task/training_information.ts
  • webapp/src/store/memory.ts
@JulienVig (Collaborator, Author)

Did you really get 5000% accuracy? If that's the case, it's alarming; I don't see how it could happen

@tharvik (Collaborator) commented Jun 13, 2024

> Did you really get 5000% accuracy? If that's the case, it's alarming; I don't see how it could happen

yep, it's fluctuating around 5000% ± 1000%, and it uses a huge amount of memory (before crashing my tab). the reproduction I have is training wikitext once with "wiki.train.tokens", then testing it with "wiki.test.tokens". I'm using Firefox fwiw. before crashing, it shows the following line in the console:
High memory usage in GPU: 1199.58 MB, most likely due to a memory leak
and indeed, my system memory fills up quite rapidly (which triggers an OOM, which kills the tab)
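The crash tharvik describes is the classic tfjs leak: intermediate tensors allocated inside a loop are never disposed. A minimal standalone sketch of the pattern, using a tiny hand-rolled registry instead of tfjs so it runs on its own; all names here are illustrative, and in real tfjs code `tf.memory().numTensors` plays the role of the counter while `tf.tidy()` or explicit `.dispose()` is the fix:

```typescript
// A standalone mimic of the leak pattern behind the warning above.
// FakeTensor and liveTensors are illustrative stand-ins for tfjs.

let liveTensors = 0; // stands in for tf.memory().numTensors

class FakeTensor {
  constructor() {
    liveTensors++; // every tensor op allocates GPU/CPU memory
  }
  square(): FakeTensor {
    return new FakeTensor(); // ops return *new* tensors
  }
  dispose(): void {
    liveTensors--; // memory is only reclaimed explicitly
  }
}

// Leaky loop: each iteration allocates an intermediate that is never
// freed, so memory grows linearly with the number of batches processed.
function leakyEval(steps: number): number {
  const input = new FakeTensor();
  for (let i = 0; i < steps; i++) {
    input.square(); // intermediate result leaks
  }
  input.dispose();
  return liveTensors;
}

// Fixed loop: dispose each intermediate once consumed, the moral
// equivalent of wrapping the loop body in tf.tidy().
function fixedEval(steps: number): number {
  const input = new FakeTensor();
  for (let i = 0; i < steps; i++) {
    const out = input.square();
    out.dispose();
  }
  input.dispose();
  return liveTensors;
}

const leaked = leakyEval(100);    // 100 tensors still alive
liveTensors = 0;
const remaining = fixedEval(100); // all memory reclaimed
```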

@JulienVig (Collaborator, Author)

Indeed, I'll try to fix that before merging the PR; testing the model takes up more than 50 GB

@JulienVig (Collaborator, Author) commented Jun 13, 2024

@tharvik all fixed, I reworked the validator:

  • removed the graph_informant (it was only used to query the validator accuracy)
  • fixed the memory leak (tfjs sucks)
  • fixed the logic mistake that yielded 5000% accuracy
  • renamed assess and predict to test and inference, respectively
  • made these methods generators to let users stop them. I had to turn the mapAsync into an iterator while loop to allow stopping, do you know if we could combine the functional style and generators somehow?

Until an LLM UI is implemented, running inference is useless as it doesn't display anything
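The generator-based test method described above might look roughly like this. A minimal sketch, synchronous and with plain numbers instead of async tfjs tensors for brevity; `Batch` and `testModel` are hypothetical names, not the actual Disco.js API:

```typescript
// Minimal sketch of a stoppable, generator-based validator.
// Batch and testModel are hypothetical names for illustration only.

type Batch = { correct: number; total: number };

// Yields the running accuracy after each batch. Because it is a
// generator, evaluation is lazy: the caller can stop at any yield
// simply by not requesting the next value.
function* testModel(batches: Iterable<Batch>): Generator<number> {
  let correct = 0;
  let total = 0;
  for (const batch of batches) {
    correct += batch.correct;
    total += batch.total;
    yield correct / total; // always in [0, 1], never 5000%
  }
}

const batches: Batch[] = [
  { correct: 8, total: 10 },
  { correct: 9, total: 10 },
  { correct: 7, total: 10 },
];

// Consume only two batches, then stop early (a user pressing "stop").
const seen: number[] = [];
for (const acc of testModel(batches)) {
  seen.push(acc);
  if (seen.length === 2) break; // the third batch is never evaluated
}
```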

@tharvik (Collaborator) commented Jun 14, 2024

> @tharvik all fixed, I reworked the validator:

superb, it tests nicely now 🥳 (tfjs is shitty indeed)
and good idea to update the validator to be a real generator, that's more in line w/ training and more usable.

> • made these methods generators to let users stop them. I had to turn the mapAsync into an iterator while loop to allow stopping, do you know if we could combine the functional style and generators somehow?

hum, no, you sadly have to have a while loop, as tf.data.Dataset is not using generators. however, as datasets are lazy, you can have a dataset shaped with mapAsync and a small while loop taking care of transforming it into a generator (maybe a bit more readable).
generators are quite powerful, but if libraries aren't exposing them, there is no way to stop processing in the middle of a function. it can only return between lines, and in this case there is a huge await dataset.mapAsync(...).toArray(), so it can only stop before or after that.
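The suggestion above, a lazy pipeline shaped with mapAsync plus a small while loop that turns it into a generator, can be sketched as follows. tf.data's LazyIterator is mimicked with a plain synchronous iterator so the snippet stands alone; all names are illustrative:

```typescript
// LazyIter mimics the next()-based interface of tf.data's LazyIterator,
// synchronously and without tfjs so the sketch runs standalone.
interface LazyIter<T> {
  next(): { value: T; done: boolean };
}

function fromArray<T>(xs: T[]): LazyIter<T> {
  let i = 0;
  return { next: () => ({ value: xs[i], done: i++ >= xs.length }) };
}

// "mapAsync"-style lazy transform: nothing is computed until next().
function mapLazy<T, U>(it: LazyIter<T>, f: (x: T) => U): LazyIter<U> {
  return {
    next: () => {
      const r = it.next();
      if (r.done) return { value: undefined as unknown as U, done: true };
      return { value: f(r.value), done: false };
    },
  };
}

// The small while loop turning the lazy pipeline into a generator:
// every yield is a point where the caller can stop the whole pipeline.
function* toGenerator<T>(it: LazyIter<T>): Generator<T> {
  while (true) {
    const { value, done } = it.next();
    if (done) return;
    yield value;
  }
}

const pipeline = mapLazy(fromArray([1, 2, 3, 4]), (x) => x * x);
const out: number[] = [];
for (const v of toGenerator(pipeline)) {
  out.push(v);
  if (v >= 9) break; // stopping here means 16 is never computed
}
```

Because both the transform and the iterator are lazy, breaking out of the consuming loop really does stop all upstream work, which is the point of the rework.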

> Until an LLM UI is implemented, running inference is useless as it doesn't display anything

yep, I'm trying to draft up something basic in my PR, let's see what comes out of it :)

@JulienVig JulienVig merged commit 111981d into develop Jun 17, 2024
23 checks passed
@JulienVig JulienVig deleted the 669-wikitext-web-julien branch June 17, 2024 07:07