I decided to abandon this framework for the time being #226
Comments
Could you share your dataset? Maybe something is wrong with it?
I cannot share it because it is not open source. What I did was:
It would help tremendously to have a good tutorial on how to create good input files, so that they are not a source of error.
How long is your dataset? I made myself a 5-hour Mongolian dataset and trained successfully. The only things I had to change were lowering fmin and updating the vocabulary for Mongolian. I also resampled the audio files to 22050 Hz to keep them compatible with LJSpeech.
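As a minimal illustration of that resampling step, the sketch below converts a folder of wav files to 22050 Hz. It assumes librosa and soundfile are installed; the folder names are placeholders, not paths from this project.

```python
# Minimal sketch: resample all wavs in a folder to 22050 Hz so they match
# LJSpeech's sample rate. Folder names below are placeholders.
import os
import librosa
import soundfile as sf

SRC_DIR = "wavs_original"
DST_DIR = "wavs_22050"
TARGET_SR = 22050

os.makedirs(DST_DIR, exist_ok=True)
for name in os.listdir(SRC_DIR):
    if not name.lower().endswith(".wav"):
        continue
    # librosa resamples on load when an explicit sr is given
    wav, _ = librosa.load(os.path.join(SRC_DIR, name), sr=TARGET_SR)
    sf.write(os.path.join(DST_DIR, name), wav, TARGET_SR)
```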
About 2.2 hours. Yet several people claimed to have had success with datasets even below 1 hour.
Could you share at least a few audio samples from your dataset?
What are you aiming at in those files?
Maybe look for obvious errors? OK, I give up :)
Which errors are you talking about? Not only can I not share them for legal reasons, it also wouldn't help anyone else. It would help tremendously to have a good tutorial, covering all kinds of languages and input sources, on how to create good input files so that they are not a source of error.
Hi, first off, thanks @tugstugi for your assistance, much appreciated :) @ErfolgreichCharismatisch I am sorry you feel that way. Let me just correct a few misunderstandings here and there:
At the end of this long boring comment, I will simply give my quick notes that you may find helpful:
Thanks for trying our work; we hope to have some positive feedback from your end in the future!
Dear @Rayhane-mamah, put yourself in my shoes, or any other beginner's. What kind of tutorial would help you get started with your own data?
@ErfolgreichCharismatisch I think this is a good one to start with. You are going too fast. Why didn't you use his data first, try to understand it, and then use yours? Like you, I'm a newbie. I started by getting his code and making it run. At the beginning it did not run, for some reason (I had changed nothing, though). I searched for the problems; they turned out to be problems with my machine, such as not having a GPU and some packages not being installed correctly. Once it ran, I tried changing things a little at a time, observed the differences, and learned what each part of the code is used for. Now I can run the project even with Korean (and only 30 minutes of training data; I'm trying to reduce that further). Be patient, friend, you can do it. There is no problem with the code. (I have not checked out the updated version.)
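As a minimal sketch of the machine-level checks mentioned above (missing GPU, missing packages), one could run something like the following before touching the code. The package list here is only an illustrative guess; the authoritative list is the repository's requirements.txt, and the project is assumed to target TensorFlow 1.x.

```python
# Quick environment sanity check before blaming the code.
# The package list is illustrative only; see the repo's requirements.txt for the real one.
import importlib

for pkg in ["tensorflow", "numpy", "librosa", "matplotlib"]:
    try:
        importlib.import_module(pkg)
        print(f"{pkg}: OK")
    except ImportError as err:
        print(f"{pkg}: MISSING ({err})")

import tensorflow as tf

# TF 1.x style GPU check (assumed, since the project targets TF 1.x);
# on TF 2.x use tf.config.list_physical_devices('GPU') instead.
print("GPU available:", tf.test.is_gpu_available())
```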
Hi, @Rayhane-mamah
@Hayes515 Please create your own thread. |
@tdplaza Great that it works for you. How do you go about creating a new corpus? Please be detailed.
Testing this model can be challenging at first. I went through that, too. But when it comes to speech synthesis, this framework is one of the best places on GitHub to learn about it, and I know of no one else who would make a repository of this level available non-commercially.
After getting the code to run fine, I realized that to apply it to my own corpus I had to prepare the metadata.csv file properly and write a module to preprocess the text. The metadata.csv file carries two pieces of information: the wav file name and the text. See how the program reads and processes this information in the build_from_path function. My dataset's transcript is in a different format, so instead of converting the transcript to the metadata.csv format, I changed the function to read my transcript.txt and return exactly what it expects (the text, the wav path, and an index).

As for text processing, I realized the processing module has one task: transforming the input text into a sequence array (the inverse function, sequence to text, is not used for training; it is only used for logging, so you can bypass it). Before the transformation, the english_cleaner converts numbers, special characters, currency amounts, and so on to text. So I had to find a module that transforms numbers to text and handles currency and special characters, and integrate it into another module that transforms Korean text into sequences.

When the code did not run fine, I had to find where the problem was. I used the melspectrogram function to convert a wav to a mel and inv_mel_spectrogram to convert a mel back to a wav. I applied them directly to the mels generated by the preprocessing step (the .npy files) to test whether preprocessing ran well. I also used these functions to convert a wav file to an audio array, the audio array to a mel array, and the mel back to a wav file, to check that they behave correctly. These are some simple tricks; there are many more things you can do to debug, and you have to do it yourself.

For example, if your GPU has little memory, you have to work out how many batches it can process at once: roughly [80, 1200] x 32 x 48, where 80 is the number of mel channels. Everything here is adjustable: you can decrease the number of mel channels, drop utterances with too many mel frames, decrease the batch size, or cut down the number of samples per batch. Try changing them slowly, see how each affects mel quality, and choose what is best for your machine.
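For illustration only, here is a minimal sketch of two of the steps described above: turning a custom transcript into metadata.csv-style lines, and the wav -> mel -> wav round trip used to check preprocessing. The tab-separated transcript format, the file names, and the locations of audio.melspectrogram, audio.inv_mel_spectrogram, and hparams are assumptions and may differ in your checkout.

```python
# Part 1: convert a hypothetical tab-separated transcript.txt into
# "wav_name|text" lines, the two fields described in the comment above.
with open("transcript.txt", encoding="utf-8") as fin, \
     open("metadata.csv", "w", encoding="utf-8") as fout:
    for line in fin:
        wav_name, text = line.rstrip("\n").split("\t", 1)
        fout.write(f"{wav_name}|{text}\n")

# Part 2: wav -> mel -> wav round trip to verify the audio pipeline,
# plus the same check on a mel produced by the preprocessing step.
import numpy as np
import librosa
import soundfile as sf

from datasets import audio      # assumed location of melspectrogram / inv_mel_spectrogram
from hparams import hparams     # assumed location of the project's hyperparameters

wav, _ = librosa.load("wavs/sample-0001.wav", sr=hparams.sample_rate)

mel = audio.melspectrogram(wav, hparams)
rec = audio.inv_mel_spectrogram(mel, hparams)
sf.write("roundtrip_check.wav", rec, hparams.sample_rate)

# Stored mels may be saved as [frames, mel_channels]; transpose if needed.
mel_npy = np.load("training_data/mels/mel-sample-0001.npy")
rec2 = audio.inv_mel_spectrogram(mel_npy.T, hparams)
sf.write("preprocessed_check.wav", rec2, hparams.sample_rate)
```

If the reconstruction from the .npy mel sounds wrong while the direct round trip sounds fine, the problem is most likely in the preprocessing parameters rather than in training.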
This is awesome @tdplaza. Do you want to add anything, @Rayhane-mamah?
@ErfolgreichCharismatisch Alternatively, you could pay Rayhane-mamah for a consultation, if he has the time, where he'll be able to answer any of your questions. I'm not aware of Rayhane-mamah's rates, but for enterprise-level support, especially in ML, 200-500 USD/hr is not something extraordinary. Please let us know if you prefer the consultation, so that the community doesn't spend more time here, as you'll get all the needed answers in a private consultation.
I agree with @gsoul - I've been really impressed by how many hours Rayhane-mamah has put into this. There were Saturdays when my mail account was just flooded with repo notifications because he had seemingly been answering issues for 5+ hours straight. That's unpaid time that could just as well be spent with family, earning money, or developing :) instead of answering. Generally, we are now in the luxurious situation where deep learning enthusiasts rush into the field and produce lots of open-source material.
@gsoul, with all due respect, it seems to me that you are confusing asking for a tutorial with demanding the documentation of a paid product. I don't owe you anything either. So why don't you just watch your tone and actually contribute something useful, like @tdplaza? I honestly cannot show any appreciation for something that doesn't work for me. Also, where is your appreciation? You came here just to lecture me, unsolicited, when you didn't do anything to deserve that position in the first place - nobody does. We both know that Rayhane-mamah did not publish this only because he is such a great guy. He wants to put this on his résumé. And it would work far better for him if he made entry simpler: more forks, more exposure, more job offers. Also, how would a private consultation benefit anyone else?
@m-toman I am pretty sure this is a great framework, but if I cannot make it work and people don't piece together a tutorial, I couldn't care less how much work he put in. And if you were honest, you would say the same. Precisely because he used to support people so often, probably with the same answers, there is an even bigger incentive to expand the wiki and point beginners to it instead of repeating himself. Again, why don't you - being qualified, with a PhD in the field - expand the wiki with how you made this framework work? http://www.speech.zone is quite impressive, actually.
I'm not involved with this framework except that I fixed a small bug. So I don't see why exactly my free time (to write the thing) should be worth less than yours (to figure things out). Generally, I haven't worked with it much beyond running the default LJ training (which more or less just worked as described).
@m-toman It is not about balancing each other's effort and time invested. I wouldn't mind you only playing with your daughter and not showing up here again to pester those who actually care about this project enough to help beginners.
Alright, this has gone on long enough. Dear @ErfolgreichCharismatisch, we happily accept all sorts of criticism as long as it's constructive and delivered in a polite manner; I personally encourage such feedback, as it helps me improve my work and, with it, other works based on it. However, we do NOT tolerate any form of disrespect, something you have shown on multiple occasions. Community is one of the most important aspects of open source, and it would be a shame for a bad actor to ruin this experience for the entire group. I am therefore revoking your access to commenting or opening any further issues on this repository. As stated earlier, your remarks will most certainly help improve this project, and we will make sure to make our work easier for others to use. While I believe your intentions are genuinely good, your execution seems to be the worst. To make sure I am not being unfair (and because feedback is usually beneficial to all of us), here are the remarks on your attitude that led me to take such drastic measures:
Please also keep in mind that most open-source projects you will find out there are not 100% what you're looking for, and that you will need to make your own modifications (which translates to time) to make them suit your needs. Please also do not expect continuous support from contributors, as most of them are doing hobby projects on the side because they are passionate about what they do and want to share that passion with others. With that said, if you do not like our work, you are still free to use others' work; no one is forcing you to use ours, I believe? In fact, here are some other awesome contributions that you can use:

I apologize to anyone offended by this "issue" and thank @tdplaza @m-toman @gsoul @Piligram for their assistance and contributions! @ErfolgreichCharismatisch Please avoid bringing a negative attitude into others' repos, as it is no fun for anyone.

Other than that, @Hayes515: a batch_size of 2 with WaveNet is usually not a big issue; it will just take longer to converge, but I believe I have stabilized the gradients as well as possible to let the model reach a proper minimum. Of course, if you do face problems with it, please open an issue and we'll look into it. Thanks for reaching out! :)
Reasons:
- Average loss getting lower? -> means nothing.
- Loss getting lower? -> means nothing.
- Step 45000? -> means nothing.
How do you even determine anything? I am disappointed, honestly.