* use prior text and response from failed substitutions between [] instead of just the iteration number (line 89, dialogue_answerer.py)
* remove dead code
* re-fine-tune phi to get better performance
* delete rules and memory from discourse_answerer
* This is wrong - from wafl_llm:
<|end|><|assistant|><|user|> Hi!<|end|><|assistant|>
The user turn is sandwiched between assistant markers. It should be:
<|end|><|assistant|> Hi!<|end|><|user|>
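  - a minimal sketch of the fix (hedged: Phi-style markers as above; append_assistant_turn is a hypothetical helper, not project code):

      def append_assistant_turn(history: str, reply: str) -> str:
          # Wrong (what was produced): the user marker is wedged between
          # two assistant markers:
          #   history + "<|end|><|assistant|><|user|> " + reply + "<|end|><|assistant|>"
          # Intended: the reply belongs to the assistant turn; the user
          # marker only opens the next turn.
          return history + "<|end|><|assistant|> " + reply + "<|end|><|user|>"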
/* make interruptible speech optional
* use entailment score to flag a rule for execution before the answer.
* get the full model list from the wafl_llm backend. Only specify the connection host and port in wafl
* the answer from the indexed files should be directed from a rule.
- facts and rules should live at the highest level of the retrieval
/* apply entailer to rule retrieval:
/ if more than one rule is retrieved, then the one
/ that is entailed by the query should be chosen
/* Add tqdm to indexing.
/* Make it index when wafl starts, not at the first use/login
/* The prior items with timestamps might not be necessary.
/ - Just implement a queue with a fixed size
* add entailer to wafl_llm
/* why do I need to re-initialise the retrievers after unpickling the knowledge?
- maybe you should save the retrievers in the knowledge object separately?
- It was gensim that was not serializable. Took it out
/* knowledge cache does not cache the rules or facts
* multiple knowledge bases, one for internal facts and one for each indexed paths
* perhaps a way to structure the prompt using <> tags. The memory items need to be distinct.
* use poetry
/* why is the cache not working? The system re-loads the knowledge every time
/* dependabot!!!
/* update readme with index.
/* interruptible speech
* upload to hetzner and make it work for some retrieval tasks
* develop more rules + use-cases for voice and other
/* add control over which llm to use from the frontend
/ - add list of models in the backend
/* add quantization of llm to wafl_llm config
/* write docs about it on wafl
/* add option so use llama.cpp from wafl_llm
/* add option to have None as a model setting in wafl_llm
/* add pdf to indexing
* add json to indexing
/* add metadata to indexing items
/make the backend run with ollama as well (no: too slow)
/None of the knowledge is loaded from the web interface. Why?
/- you have just changed the load_knowledge function to make it async.
wafl:
/- create indices
- allow files/folders to be indexed (modify rules.yaml and then re-index)
- add keywords in retrieval from tfidf
- silence output when someone speaks
- multiple models in wafl-llm, with selection from frontend
training:
- retrain phi3
- add tokens <execute> and <remember> to the training data
- add some prior conversation to the training data, taken from other examples
- add more unused rules in the prompt
* after a rule is deleted, you should also prune the conversation above.
The system can get confused if the conversation becomes too long.
- re-train the system with prior conversations before calling the rule
* substitute utterances in base_interface with the conversation class
* add config file for model names
- llm model name
- whisper model name
/* add version name
/* let user decide port for frontend
/* update docs about port
/* push new version
/* update pypi with wafl and wafl-llm
/* clean code for llm eval and make it public
/* update huggingface readme
* read overleaf paper
* on wafl_llm make it so only some LLMs are supported
* change speaker model with newer one
1) train on more steps (see the sketch after this list)
a) try 3 epochs, save each
b) use lr=1e-6
c) use batch_size=4
d) do not use 4 bit original model, use 16 bit (on the GPU)
2) evaluate result
3) Upload to hf
4) create a test set of 50 elements for the paper. Find a way to test it. Repeat from 1)
5) refactor code
6) maybe change voice model
7) write paper
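A hedged sketch of the settings in point 1), using the Hugging Face Trainer; the checkpoint name is a placeholder and the dataset is elided, not the project's actual values:

    import torch
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              Trainer, TrainingArguments)

    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/phi-2",          # placeholder checkpoint
        torch_dtype=torch.float16,  # d) 16-bit weights on the GPU, not 4-bit
        device_map="cuda",
    )
    tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

    args = TrainingArguments(
        output_dir="checkpoints",
        num_train_epochs=3,             # a) try 3 epochs...
        save_strategy="epoch",          # a) ...saving each one
        learning_rate=1e-6,             # b)
        per_device_train_batch_size=4,  # c)
    )
    # trainer = Trainer(model=model, args=args, train_dataset=...)  # dataset elided
    # trainer.train()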
### TODO
* script to add wrong <tags> when none are needed
On the to_modify set:
* sometimes the user answers yes (after "do you confirm?") and the dialogue does not have "user: yes"
On the accepted set:
* CHANGE <|USER|>\n into user: (some of the elements are in the wrong format)
* Perhaps change <memory>function()</memory> into <memory|retrieve><execute>function()</execute></memory> (the memory should store the results of the function)
* Create a first paragraph with the summary of the conversation: the conversation must always be grounded in the summary (USE LLM TO CREATE THE SUMMARY)
* The LLM wrote text after </execute|run|running> hallucinating the result of the execution. Think about how to deal with that.
* all the rules that say "two levels of retrieval" should have the trigger rewritten to something more specific
* change "bot" into "assistant" some of times
* some sentences are between [] and should be removed
* put the items in <memory> so far in the conversation summary. If it is a function then you need to simulate the relevant output using the LLM
* sometimes at the end of the conversation the bot says "Process finished with exit code 0". Erase this
* add ability to index files and files in entire folders
* if the bot uses a function to retrieve information, you should add <memory>. This is symmetrical to <memory> with a function call when necessary.
* some tags like <output> should end the training item text
* todo User -> user, or at least be internally consistent
* find a way to use HuggingFaceH4/ultrachat_200k as a starting point for each item
- each item should be easy to copy into a csv.
- Separate the items with special tokens/lines
* Create a dataset with about 500 elements
- use huggingface chat dataset as a starting point for
- themes
- conversation guide in prompt
- use LLM to create corresponding python code
* retriever in create_prompt
* change num_replicas back to 10 in remote_llm_connector
/* create actions from command line
/* add condition of when to stop to the actions
Actions:
#### Find way to delete cache in remote llm connector
#### Put colors in action output (and dummy interface)
#### Add green for when an expectation is matched
#### write docs about actions
#### push new version to main
* Perhaps the expectation pattern could be built into the rules themselves
/* BUG: the prior memory leaks even when re-loading the interface!!!
* clean single_file_knowledge: it still divides facts, questions, and incomplete for rule retrieval.
Use just one retriever and threshold for all
/* push docker image to docker hub
/* update all to the vast.ai
/* write new docs
/* new version on github!
/* make it easy to run the llm on the server (something more than docker perhaps)?
/* re-train the whisper model using the distilled version
/* make rules reloadable
/* nicer UI?
/ * New icons
/ * darker left bar
/* update tests
/* lots of duplicates in facts! Avoid that
/ * use timestamp for facts (or an index in terms of conversation item)
/ * select only most n recent timestamps
/ * do not add facts that are already in the list (before cluster_facts)
/* redeploy locally and on the server
/* new version on github
/* add rules for
/ shopping lists
/ trains and music
* add yaml like in the github issue
* test that testcases work (only local entailer)
/* update wafl init: It should create the project the modern way.
* use deci-lm
* make sure system works with audio too
* aggregate rules into a tree using a rule builder (like in the old system)
* perhaps one use-case is for diary entries: what is my diary for next week requires today's date first
/* I am not sure the system is cancelling code that has been executed. Check the whole pipeline of prior_functions
/* when an import throws an exception, add import <module> to the code and try again
/ * if the import does not exist, return the code as is without <execute> substitution
/* don't use replicas, use a beam decoder where <execute> and <remember> are pushed upwards. (this means no sampling - perhaps there is a better way)
/ * do it in the local llm connector first
/ * use sequence_bias in generate() together with epsilon_cutoff
/ (for example if the <execute> token is not likely its prob should not be increased)
DOESN'T WORK: It needs to use beam search, but I want to keep sampling with temperature
/ * ALTERNATIVELY increase the number of replicas to 6?
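A sketch of the sequence_bias idea above; sequence_bias and epsilon_cutoff are real generate() parameters in Hugging Face transformers, while the checkpoint and the <execute> token string are assumptions:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder
    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    # Nudge the ids of "<execute>" upwards; epsilon_cutoff keeps tokens
    # whose probability is below the threshold unreachable, so an
    # unlikely <execute> is not forced into the output.
    execute_ids = tuple(tokenizer("<execute>", add_special_tokens=False).input_ids)
    inputs = tokenizer("user: how much is 2+2?\nbot:", return_tensors="pt")
    out = model.generate(
        **inputs,
        do_sample=True,            # keeps sampling with temperature
        temperature=0.7,
        epsilon_cutoff=3e-4,
        sequence_bias={execute_ids: 2.0},
        max_new_tokens=32,
    )
    print(tokenizer.decode(out[0]))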
/* quantize the model to 4 bits
TOO SLOW on 3090
/* merge remote and local llm connector. Both should derive from the same class with common functions
/**** make it so the computer does not repeat! reset conversation when the bot repeats itself
/* only one rule at the time!!
/ * if a rule is executed, it is then consumed
* bug: the system kept executing "The bot predicts:"
**** what to do with conversational collapse?
- the system just repeats the last utterance
- how much is 2+2, what is the real name of bon jovi, how tall is mt everest
- the collapse is due to <execute>NUMBER</execute> returning unknown (execute becomes more likely after one prior <execute>)
- the system is also more likely to return unknown after one unknown. Select the answer that has no unknowns?
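  - a tiny sketch of that last idea (hedged; pick_reply is a hypothetical helper):

      def pick_reply(candidates: list) -> str:
          # prefer the first sampled reply that did not collapse to unknown
          good = [c for c in candidates if "unknown" not in c.lower()]
          return good[0] if good else candidates[0]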
* solve math expression execute (imports do not work in eval, and exec needs a print on stdout)
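  - a minimal sketch for that problem: run the snippet with exec() in a scratch namespace (imports work there, unlike in eval) and capture what it prints on stdout:

      import contextlib, io

      def run_math_code(code: str) -> str:
          buffer = io.StringIO()
          namespace = {}
          with contextlib.redirect_stdout(buffer):
              exec(code, namespace)  # exec accepts import statements
          return buffer.getvalue().strip()

      print(run_math_code("import math\nprint(math.sqrt(2))"))  # 1.4142...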
* add errors when loading config file (add log to stderr)
* add a memory that the execute command was called/not called.
* no more than one rule (with two rules it already gets confused)
* better ui
* better documentation
* better tests
# the dimension of the sentence embedding model (384) should be in config
* multi-step interactions for function execution are still hard.
- perhaps the rules need to stay in the prior discourse for longer
- the same rule appears after the first execution, therefore the bot thinks it has already executed it
- user: follow rules...
user: compute stuff
bot: answer
user: compute stuff
bot: answer
user: follow rules... (this was at the beginning in the prior discourse)
user: do it again
maybe you need a tag for user: follow rule. Maybe a superuser tag that is removed from the output (but stays in the interface)?
#### keep prior rules for a couple of turns
#### log execute in green and memory in blue
#### keep the temp low, 0.2 (otherwise it doesn't follow multi-point rules well)
/* Put the answer of <execute> in the facts and re-compute the answer
/* fine-tune the llm to follow the rules
/ - create dataset of about 50 examples
/ - fine tune only last layer
This does not work
------------------------------------------------
* use perplexity in arbiter to determine what to do
- use a perplexity budget?
* the answer filter is too fickle
- wrong transcriptions
- code is transcribed as prior text
- It needs examples!! => CREATE A LIST OF EXAMPLES TO BE RETRIEVED FOR THE FILTER
- examples: code -> code
- "this is what i came up with:" -> same
* add commands to navigate the conversational tree
- go back (?)
- "I am asking you that question"
/* why is <|EOS|> added in the code generated by asking: "write a function in python"
- the error was in wafl_llm, every # was replaced with <|EOS|> (legacy from MPT)
/* write a function in C++ does not trigger the rule about writing in a language different from python
- this is because the rules cannot specify when not to be activated: the entailer blocks the retrieval
write a function in c++ does not entail write a function that is not in python
* write a function in python does not work
- the system stops at what is the name of the function, there is no reply after that
- this is because the system fails at "what is the goal of the function"
* IMPLEMENT AN ASK USER, ASK BOT, GENERATE, VERIFY, otherwise it's just a string assignment
- "can you code" does not retrieve anything
- the system creates a new rule, even if a good rule already exists
/* what is the weather like/what about tomorrow -> every new query gets a reply about the weather
/* items are not updated in the web interface:
/ - new utterances by the bot are not added to the list
/ - they only appear after the user has typed something
/ - should you wait to update list?
/ - should you yield all conversations in the conversations events and then say them all at once?
/* add a filter dialogue answerer on top of everything (top-answerer) !!!!
/ - add filter ability to all interfaces
/ - add filter to web interface when it runs from command line
------------------------------------------------
* remember: entailment is related to mutual information.
* If the system generates a rule that has a question implying the trigger -> the answer to that question is unknown
* make it answer: who is the mayor of london, who was the first james bond
* find a way to re-load the knowledge bases from the answer bridges while the system is running
* make it so the system does not repeat questions that are in the trigger
* slim down the corpus task_creator.csv:
- do not repeat instructions for every item, just use instructions at the beginning.
* solve issue about intermediate item taking the value of the prior textarea instead of "typing..."
* implement notebook style web interface
/ * will you need to remove user: bot: from utterances?
* change the interface so you can navigate it
* allow for web components to write output
* test with output from matplotlib
/* use temperature in generate: the system tends to repeat itself and is terrible.
/* avoid newline in textarea after pressing enter
/* sometimes (when I say "nothing/no/...") the conversation stops.
there is no answer and all the next replies are the queries themselves
/* why does the weather not work?
/* make system faster to load locally. Why does it load functions.py 4 times? why the long wait?
/* add a small wafl_home to the init, with folders and everything
/* make stand-alone connectors (no need for server-side)
/* modify config for all connectors
/ * check remote connectors work!
/ * create note about local speaker for python 3.11
* rewrite documentation !!!!
* talk about how the system improvises rules
* explain arbiter pipeline
* update config description (possibly with stand-alone connection) !!!!
* add init-default (with modified wafl_home) and init (as empty)
* push to main!
* push wafl-llm to main
* and docker
* add import from files in rules space
/* split bullet lists into lists for mapping onto a query
/ * add something for merging as well
/* If the inference fails, should you try to improvise?
/* make it so it is possible to navigate among levels of tasks
/ * e.g.: what is the name of the list? that's not what i meant
/ * INTERRUPTIONS: nevermind, nothing, no one, not what i meant
/ * questions: answering a question with a question should trigger a new resolution tree
-> resolution: before answer_is_informative() I added an entailment check on whether the answer makes sense / the user doesn't want to answer
/* add #using ../
/* why does sqrt throw an exception when eval(code)?
/* make all tests pass
/* REMOVE computer FROM UTTERANCES!! (after activation)
/* modify wafl_llm to have a config file for the models to use/download
/* add a standard variable for the dialogue history to be used in rules space
/* only files declared with #using can be used in rules
* retrievers for in-context learning
/* all answerers and extractors
/* create adversarial examples for task extractor
* make this work:
- Do I need an umbrella?
- Tell me if I need an umbrella
The issue is that the relevant rule is not retrieved. The entailer gives ~0 between: the user says "do I need an umbrella" :- the user wants to know the weather
Find a good way to entail what you need.
- Issues were:
- The retriever is not very good. There are a lot of other rules scooped up when asking for a task
- Interruption rule must not be used for tasks
- Text-generation tasks are understood as questions, therefore no text generation
* Maybe you need to add text_generation task right before searching the answer in rules
/* add feedback from frontend
* Feedback texts should be used in-context for dialogue and task extractor
* Use retriever to create dynamic prompts
* make the text disappear in the input after you write
* input should be frozen when the bot is thinking/speaking
* add a check for rules that are too generic.
- For example, "the user wants to add something"
- The rule priority should be weighed down according to
how easy it is to trigger it from different utterances.
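  - a hedged sketch of that check; score_fn and the probe utterances are assumptions, not the project's retriever API:

      PROBES = ["what time is it", "play some music", "write a poem"]

      def genericity_penalty(trigger: str, score_fn) -> float:
          # score_fn(a, b) -> similarity in [0, 1] (hypothetical signature);
          # a trigger that matches unrelated probes is too generic
          return sum(score_fn(trigger, p) for p in PROBES) / len(PROBES)

      def adjusted_score(trigger: str, utterance: str, score_fn) -> float:
          return score_fn(trigger, utterance) - genericity_penalty(trigger, score_fn)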
* better web UI
* add a way to change the rules on the fly
* create a battery of tests for questions that interrupt a question
from the bot
/* Make tests pass
/* RETRIEVAL for prompts
/* Add RETRIEVE as a command
* add all the rules you have to the retrievable examples
* only first level for #using
* flexible config for faster answers:
* make it so the task extractor can be skipped
* same for task creator
* allow for facts to be checked by the llm directly
* create command line instructions "wafl list all the files"
* why does "[computer] computer what should I buy" does not trigger a rule in wafl_home?
* if utterance would trigger rule but task is unknown, then trigger the rule
* interruptions should always be called. Make it so it is impossible to forget to call them
* give the system the ability to create rules to solve tasks
* use <internal_thoughts></internal_thoughts> in prompt for dialogue
/* add interaction on lists (actions for each item in the list)
/* allow code creation from task creator
/* Does it make sense to have the task_extractor work only when the user issues a command?
/ - Use LLM only after entailment with "The user asks to do something"
/* prompt generation only if it is task
/* otherwise = should return the result of a rule that is being executed
/* implement rules in case the task has not immediate trigger
/* is the sound wave sent to the wafl-llm whisper_handler correct? is it corrupted?
/ - remember that the sound wave was corrupted the other way round
/ - you might want to use base64 encoding for the sending part as well
/ - save sound file from whisper_handler.py and listen to it
* why is it so slow??
/- would .generate(do_sample=False) accelerate? in llm_handler.py
/ - play around with early stopping and max_length. Use eos_token_id properly
- would shorter prompt accelerate? maybe you can retrieve the examples that are most relevant
/ - USE past_key_values as argument in generate.
/ This argument is returned when use_cache=True.
/ Save first past_key_values and then use it as argument in generate.
-> It does not do much for a single query
/ - try to go line-by-line in the code to see where the problem is
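A sketch of the past_key_values idea above (hedged: the exact generate() kwargs vary across transformers versions, and as noted it did not do much for a single query):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder
    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    # encode the fixed part of the prompt once and keep the cache
    prefix = "rules and examples that never change"
    with torch.no_grad():
        cached = model(**tokenizer(prefix, return_tensors="pt"),
                       use_cache=True).past_key_values

    # re-use the cache so the prefix is not re-encoded for this query
    # (generate() mutates the cache, so copy it if it is shared)
    full = tokenizer(prefix + "\nuser: hi\nbot:", return_tensors="pt")
    out = model.generate(**full, past_key_values=cached, max_new_tokens=16)
    print(tokenizer.decode(out[0]))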
/* thank you does not close the conversation because of the entailer in inference_answerer
/* refactor sentence-transformer to backend
* policy should only be about finding the correct rule to apply.
* no y/n stuff, only choose between candidate rules and none of the above.
* erase y/n policy, only rule policy remains
/* refactor whisper to backend
/* refactor speaker
/* better frontend (smaller facts and choices)
/* Computer what is your name
/ -> Correct remember
/ -> answer is "I don't know". why?!
/ explanation: question -> task from question -> answer from task
/ solution: there should be a way to use the chitchat answerer to answer questions
* make it so one can change rules on the fly (reload when changed)
* add definitions to arbiter (what is tea...?)
* add the ability to ask questions to activate a trigger rule.
* Make chit chat work!
* should you take it out of arbiter? Maybe it should be the last item in ListAnswerer
in conversational_events
* if answer is unknown in item = function() then that line should be considered False
* add main.wafl
* it should divide into
* the user wants to do something
* the user asks for information
* anything else
* add a way for the answerer to access gpt information
* it should say "I believe ..."
* the rules should be chosen by gptj according to the prior conversation
(possibly from bw_inference _look_for_answer_in_rules)
* add docker compose for running server + interface
* add a flask interface for api
* create errors within parser
* dependency does not exist
* . is allowed in dependency, / is not
* Clean conversation summary
* [computer] is in the summary
* Sorry? is in the summary
* repetitions are in the summary
* sentences that are said by the bot are treated the same as if they were said by the user
* maybe questions can be asked at a summary level? Not just sentence by sentence.
* rules should be added and REMOVED!
/* all sentences that are said when the interface is deactivated should not be appended to the conversation
/* Write choices, tasks and remember in web interface
/* erase conversation after deactivation in voice interface
/* refactor everything to have a chat log with all the choices, retrieved facts
in the same prompt for chitchat
/* All answerers should have access to the same list of utterances, choices, and facts.
The only difference should be the final line of prompt.
/* MAKE TEST PASS
/* ADD CACHE TO ALL CONNECTORS (it cannot be done easily for async functions)
/* 1024 is the limit in wafl_llm (deepspeed); change this to 2048
/* make sure 2048 is the limit when using gptj
/* only last three conversation items in task extractor
/* Complete the task extractor + policy guidance on the whole of the answerer/inference tree
/ * Create test for different policies
/ * create test for "I didn't mean that"
/ * You need to log the rules that have been chosen.
This needs to be part of the policy decision.
Dialogue alone is not enough.
/ * create test for "do it again"
* FIND WAY TO EXTRACT TASK FROM DISCOURSE in arbiter_answerer
/* Selected_answer() returns unknown if all answers are None
/* add task recognition to choices and make tests pass
/* make alarm work on wafl_home.
/ * Make async work on wafl
/ * Do not forget to reinstate "what time is it" rule
/* Try to use only GPTJ. Answer questions with dialogue and story.
Give few examples where the answer is unknown.
/* volume threshold should be sampled continuously
/* change {"%%"} into something more typeable (possibly "% %")
/* add torchserve handler as a wafl init
/* Add GENERATE
/ * Use it to get "1" from "one minute"
/* time needs to be pre-processed to trigger rules (5 past seven -> 7,05)
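  - a minimal sketch of that pre-processing; only the "<minutes> past <hour>" pattern is covered, and the word list is an assumption:

      import re

      NUMBER = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
                "eleven": 11, "twelve": 12, "quarter": 15, "half": 30}

      def to_int(token):
          return int(token) if token.isdigit() else NUMBER.get(token)

      def normalize_time(text: str) -> str:
          def repl(match):
              minutes, hour = to_int(match.group(1)), to_int(match.group(2))
              if minutes is None or hour is None:
                  return match.group(0)  # leave unrecognized phrases alone
              return f"{hour},{minutes:02d}"
          return re.sub(r"\b(\w+) past (\w+)\b", repl, text)

      print(normalize_time("wake me up at 5 past seven"))  # -> "... 7,05"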
/* add event loop here to test_scheduler
/* create way to add rules
This and next week:
/* rules in folder
/* distill gpt2 from gpt-JT onto the CoQA dataset
-> not done, using gpt-jt directly
First week of the year:
/* Make test conversations work
* Debugging with picture as output
-> not doing it now
Second week of the year:
* User-defined events
/* Scheduler
Third week of the year:
* Refactoring
* Write up docs
Fourth week:
* Write demo paper
* Demo paper should include website with code editor (and connection to GPU)
/* rules and functions should be in a folder.
/ * Think about ability to install
/* add ability to create rules through text "->"
* main conversation loop should be scheduler
* Add functions that can trigger rules
* InferenceAnswerer can be broken down into simpler answerers
* within backward inference there is the need for an answerer (when tasks are interrupted)
* Do lists as hard-coded
* y/n questions are never searched in working memory. Is this the right behavior?
/* test_testcases blocks the tests. RESOLVE THIS!!
* fine tune entailer to the tasks in this system (the bot says: "", the user asks ""...)
* functions use inference from depth_level = 1. This can induce infinite recursion
* if query is not question and answer is False then the system should say "I cannot do it because"
* After because there should be the answer to the bot asking itself why
* The way to do it is to have a narrator connected to the logs
* you need better readable logs
* you need a way to translate the logs into a coherent text
* THIS WILL ADD INTROSPECTION!!
* move all thresholds to variables.py
* rules are sorted by retriever but not by entailer!! Do that
* add error detection in parser.
* for example ( without a closing ). same for {
/* find way to connect to local ngrok from github
/* entailment should be :- instead of <-
/* take entailer and qa out of the __init__ in entailer.py and qa.py
/* implement functions within each dependency
/* remove Batches output in prediction
/* fact retrieval should work in python space as well
/* y/n questions should only accept yes or no (and loop if there is something else)
/* Why does remember not work??!!!
/* separate items added to list with "and"
/* The same answer cannot belong to more than one question in the same task! (it's an approx but needed)
/* Create answering class on top of the conversation
/ * create arbiter class
/ * This should solve the test_executables failing tests
/ * is the dialogue part really needed? if not, you can use a simple qa system
/* Use entailer for common sense? Creak sense does not work very well
/* Make infinite recursion impossible (set max limit or check for repetitions)
/* Do the conversational memory (start with test_working_memory.py)
/ * done but you need to refine the interaction with the narrator class (events are split manually in qa.py)
/ * RUN TESTS!!!
/ * START WITH test_conversation (many tests are failing)
/ * The issue is in "the user says" (line 85 in qa.py)
/ - The system should be able to understand whether the user or the bot is speaking
/ - possibly if the question is from the user (like in working memory) add "user says"
/ - what would you add when the fact comes from the knowledge base?
/ - should you change the hypothesis "when -> says?" (lines 91 and 101)
/
/ * USE LOGGER IN CONVERSATION() TO SPEED UP DEBUGGING!
/* numbers in speaker should be translated to English
/* remove computer as first word
/* Confidence in listener results
/* lists should filter the items
/* a list cannot contain itself, you should check the name
/* Yes/No questions should be more flexible
/* voice thresholds should be in config (write test about them)
/* ADD VERSION NUMBER WHEN STARTING UP
/* check tests.test_working_memory.TestWorkingMemory.test_working_memory_works_for_yes_questions
/* computer name triggers a "faulty" sound
/* functions.py need a more clever way of handling hidden arguments
- should all functions have a hidden arg?
- make it possible to have more files
* use a different voice model (this one from hf?)
* speech to text
- use your own beam decoder + n-gram to improve quality
- use newer model?
- filter out filler sounds
* Unify conversation/utils.py "the user says/asks" and the presupposition replacement "to the bot" in entailer.py
* Detect who is talking to whom. Some rules can only be activated by the user speaking to the bot
- Use get_sequence_probability_given_prompt()
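  - a hedged sketch of what that function could compute (the name comes from the note above; this body is an assumption): sum the log-probs of the continuation tokens under a causal LM:

      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder
      tokenizer = AutoTokenizer.from_pretrained("gpt2")

      def sequence_logprob_given_prompt(prompt: str, sequence: str) -> float:
          prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
          full_ids = tokenizer(prompt + sequence, return_tensors="pt").input_ids
          with torch.no_grad():
              logits = model(full_ids).logits
          log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
          targets = full_ids[0, 1:]
          start = prompt_len - 1  # score only the continuation tokens
          return log_probs[start:].gather(1, targets[start:, None]).sum().item()

      # a higher score for "the user is speaking to the bot" than for the
      # alternatives would flag the utterance as directed at the assistant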
* add in knowledge facts about "the user". You don't need to remember everything that is said
* Is there a need for a "Main Task" ? One that oversees everything?
* Python hooks need to be in a class, like with Tests
* rms threshold should be average of background noise
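  - a small sketch of that idea: keep a running average of background RMS and flag frames well above it (the 1.5 factor and the smoothing constant are assumptions):

      import numpy as np

      class RmsGate:
          def __init__(self, factor=1.5, alpha=0.05):
              self.background = None  # running noise-floor estimate
              self.factor, self.alpha = factor, alpha

          def is_speech(self, frame: np.ndarray) -> bool:
              rms = float(np.sqrt(np.mean(frame.astype(np.float64) ** 2)))
              if self.background is None:
                  self.background = rms
                  return False
              speaking = rms > self.factor * self.background
              if not speaking:  # only quiet frames update the noise floor
                  self.background = (1 - self.alpha) * self.background + self.alpha * rms
              return speaking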
* add delete last item
* Change the speaker voice
* Create unit tests for conversation activation/deactivation
* The answer to the question can be found in the conversation from the bot.
The bot can speak to itself and then answer the user
* train your own retriever using the conversations in dailydialogue? MAYBE NOT
* If the query is a question YOU NEED A QA RETRIEVER. take MULTI_QA instead of MSMARCO
- also if the rule.effect is a question
* If a function calls another one within functions.py then there is an argument missing! inference is not there in the code
* yes/no questions should *never* trigger an interruption.
It's not just yes/no answers, the deal is with the question!
* Select by Levenshtein distance before text_retriever (lev_retriever?)
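  - a minimal sketch of that lev_retriever, with difflib as a stand-in for a true Levenshtein implementation:

      import difflib

      def lev_prefilter(query, candidates, keep=20):
          # cheap edit-distance-style ranking before the neural retriever
          scored = sorted(
              candidates,
              key=lambda c: difflib.SequenceMatcher(None, query, c).ratio(),
              reverse=True,
          )
          return scored[:keep]  # only these go on to text_retriever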
* train bart for qa/facts
* train your own retriever
* Why is everything interpreted as "hello" or "hi"? You need a better retriever
- Change retriever with the one you liked
* allow {variable} to be interpreted as code/call for another task (at least add tests)
* allow some type of introspection
* finish config
- add hotwords to config.json
* upload to github, with tests and logs
* validate the user code (is REMEMBR spelled correctly?)
* should yes/no filter be in retriever instead of knowledge?
* Create interfaces for google voice, other apis
* A goal oriented bot would scan all the rules to find how to obtain the goal.
The user can be simulated using a generative model for dialogue.
* for yes/no or limited choice questions there should not be ambiguity. The machine should match the closest item and
if there is no item close enough ask again.
* New voice! It's ridiculous to have to put up with a memory leak from picotts
- Use fairseq voices
* why is Alberto not recognized as a name by the retriever? need a better retriever than MSMARCO distill
* Working memory should really be working knowledge
* refactor BackwardInference
!* make it so if the user does not know the answer, one can continue inference?
- Or should you try to do the inference first??
!* Implement a standard sign for code. Should it be '''> ?
!* Implement FORGET (the whole Fact should disappear from Knowledge)
* Do not allow arbitrary input (at least for voice)
* Working memory is unnecessarily complicated. It can just contain the story and some method to automatically fill it.
!* Why did it say "no" on "can you please add bananas to the shopping list?"
* Better QA for yes/no question (maybe add SNLI to qa?)
* Add math expressions to fact-checker (some, any, every)
* USER REQUEST: Multiple items to the shopping list in one go: apples AND bananas AND vegetables
* create tests for voice
* Parser should allow for empty lines within rules
* Implement Server with HTML page (docker-compose up)
* Refactor code and clean up
* Investigate interplay between substitutions and already_matched
- Maybe one can avoid having the same answer for the same question at the same depth (and the same rule)
* Say "This is true" or "this is false" if a statement matches (My name is Alberto -> True)
* working memory only within the same level of rules with same activation
GET WORKING MEMORY FOR FACTS WORKING WITH SHOPPING LIST!!
/* Add working memory for python-space
- Maybe exclude WM answer if it is the same as the prior answer
/* Use entailment to make generated qa more accurate
/- Upload qa system to huggingface and pip
/-**** Use conversation_qa -> refactor qa.py