Please update documentation on various things #299
Comments
On the Discord, someone mentioned that such documentation could be handed to the community. To that end, I think the best way to do this is a GitHub wiki page that anyone can edit, where everyone can post their command chains and insights for their own purposes. Personally, I have only made 3 runs so far, none of them fully complete. And even if my understanding of things seems to be quite helpful to a lot of people, I still don't feel I am in a position to rewrite the README.md or to error-check my statements with absolute certainty.
Hello @ballerburg9005, First, thank you for the detailed information and for the work you invested in writing up your personal insights. I totally understand your confusion, how the community can struggle to decide which configuration to use, and how unclear RAVE's inner workings can be. RAVE was originally a research product that gained a lot of attention over the years, but because of limited time and resources the documentation did not keep up with the various modifications of the original repository. Furthermore, the actual behavior of RAVE with given data, a given use, and a customized training is hard to predict, so writing precise documentation is not an easy task. Still, IRCAM and Forum IRCAM have put some resources into helping us document how to use RAVE a little better: for example, you will find a tutorial explaining the configurations and which one to choose on the Forum IRCAM webpage. A video version will also be released very soon. I will read through your personal experience in detail to see whether some information is missing that could help you and other users avoid wasting time and have a nicer experience. I am closing this issue here, but do not hesitate to contact me on the Discord for additional or more specific questions. Thanks again!
On the Discord, people are puzzled about various things, especially when it comes to mixing different configs.
For example, the README.md suggests that you can add --config discrete --config causal to decrease latency.
But when you check the config files, it seems that discrete overrides the encoder and many other things. These things are rather mysterious to someone not deeply involved with RAVE. For one, it seems obvious that wasserstein needs the wasserstein encoder, and the discrete config overrides this with the discrete encoder. In my mind, this would lead to botched, unpredictable, and undesired results. But no one even realizes this unless they inspect the config files further, because there are no errors during training. Furthermore, does the order of the config command-line arguments matter for the parameters that conflict with each other?
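To make the kind of conflict described above easier to spot, here is a minimal sketch (not part of RAVE) that scans several config texts for keys set by more than one of them. It assumes simple one-line `key = value` bindings and ignores multi-line values and includes; the file names and keys in the usage example are made up for illustration and are not taken from the real RAVE configs.

```python
import re
from collections import defaultdict

# Matches simple single-line bindings such as `model.RAVE.n_channels = 2`.
# Assumption: multi-line values and include statements are not handled.
BINDING = re.compile(r"^\s*([A-Za-z_][\w./]*)\s*=\s*(.+?)\s*$")


def parse_bindings(text):
    """Return {key: value} for the one-line bindings found in a config text."""
    bindings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # naive comment stripping
        match = BINDING.match(line)
        if match:
            bindings[match.group(1)] = match.group(2)
    return bindings


def find_conflicts(configs):
    """configs: list of (name, text) pairs in command-line order.

    Returns {key: [(name, value), ...]} for every key that more than
    one config sets, keeping the order in which the configs were given.
    """
    seen = defaultdict(list)
    for name, text in configs:
        for key, value in parse_bindings(text).items():
            seen[key].append((name, value))
    return {key: hits for key, hits in seen.items() if len(hits) > 1}


if __name__ == "__main__":
    # Hypothetical config contents, for illustration only.
    configs = [
        ("a.gin", "model.encoder = @EncoderA\nmodel.latent_size = 16\n"),
        ("b.gin", "model.encoder = @EncoderB\n"),
    ]
    for key, hits in find_conflicts(configs).items():
        print(key, "is set by", ", ".join(f"{n} ({v})" for n, v in hits))
```

Each reported key is a candidate for a silent override. Which value actually wins depends on how the configs are parsed; as far as I understand it, bindings parsed later replace earlier ones, which would be why the order question matters.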
What about the dozens of other config files?
Most importantly, what are the penalties for using each config? For example, it has been explained that discrete results in lower latency and works better with msprior (not mentioned in the docs). But the penalty supposedly is that you can't really manipulate the latent space with nn~ anymore (also not mentioned). Does "causal" have any such major penalties, and do the other configs? It could be worth documenting even simple things, such as approximate differences in training speed.
Which configs are truly incompatible with msprior, with the RAVE prior, with 2 channels, and with each other? For example, I used v3 and then couldn't make it work with 2 channels. Then I used wasserstein with 2 channels and discrete and couldn't make it work with msprior (I glanced over the source code and didn't see anything relating to 2 channels). Then again, a lot of things failed with equally mysterious error messages, such as 2-channel export, although there are solutions mentioned in the GitHub issues for this (e.g. model.RAVE.n_channels = 2 in config.gin, and plenty of others). That makes you wonder why this bug has remained unfixed for so long: is 2-channel support even really working?
Does RAVE prior training even still work (or only with the v1 config)? We just couldn't make it work and used msprior instead.
Then in msprior, the documented configs no longer exist, and it is not obvious what the new config files do or which of them is the equivalent of the choices described in the docs.
Then, when training msprior, it notes/complains that some features are disabled without discrete. So is choosing discrete, or other config options, really important or "sort of required" for prior training? Does msprior even work well with wasserstein, v2, v3, or anything else? What is discrete_v3 intended for (is it just v3 of discrete with v2 missing, or does it fix an issue when combining v3 with discrete), and does msprior work properly with it, or only the RAVE prior, or does it not work well with any prior?
People have often used Phase 1 override from 200k to 1M. When and why is this necessary or a good idea?
How long should you train with which configs? Does changing the batch size affect the results much? Does the attention window accurately reflect the length of the sounds the model outputs, or should you add something like 3 extra seconds, or even double it, if you want a certain length in your output?
I am not sure those are the most important questions, but I am just politely asking for an overhaul of the documentation.