WIP: A new steps/data/reverberate_data_dir.py script #706

tomkocse · 2016-04-17T16:31:02Z

No description provided.

vijayaditya · 2016-04-19T20:30:26Z

egs/wsj/s5/steps/data/reverberate_data_dir.py

+
+def GetArgs():
+    # we add compulsary arguments as named arguments for readability
+    parser = argparse.ArgumentParser(description="Generate corrupted data"


"Reverberate the data directory with an option to add isotropic and point source noises"

In the help message say that this script only deals with single channel wave files. If multi-channel noise/rir/speech files are provided one of the channels will be randomly picked.

vijayaditya · 2016-04-20T19:17:23Z

egs/wsj/s5/steps/data/reverberate_data_dir.py

+# The noise list would have the following format:
+# --noise-id <string,compulsary> --noise-type <choices = (isotropic, point source),compulsary> --bg-fg-type <choices=(background|foreground), default=background> --rir-file <str, compulsary if isotropic, should not be specified if point-source> < location=(support Kaldi IO strings) >
+def CorruptWav(wav_scp, durations, output_dir, room_list, noise_list, snr_string, num_replica, prefix, speech_rvb_probability, noise_adding_probability, max_noises_added):
+    rooms = list_cyclic_iterator(room_list, random_seed = 1)


Let the user specify the random seed.

Vijay, I'm not sure if Python, like Perl, sets the random seed from the
current time, but if so we should probably by default set the seed to a
constant in any Python program that uses random numbers, for
reproducibility. We already do this in perl scripts that use random
numbers.
Dan

On Wed, Apr 20, 2016 at 12:17 PM, Vijayaditya Peddinti <
notifications@github.com> wrote:

In egs/wsj/s5/steps/data/reverberate_data_dir.py
#706 (comment):

    for line in open(file, 'r'):
        parts = line.split()
        if assert2fields:
            assert(len(parts) == 2)
        dict[parts[0]] = value_processor(parts[1:])
    return dict

+# This is the major function to generate pipeline command for the corruption
+# The rir list would have the following format:
+# --rir-id <string,compulsary> --room-id <string,compulsary> --receiver-position-id <string,optional> --source-position-id <string,optional> --rt-60 < <float,optional> --drr <float, optional> < location(support Kaldi IO strings) >
+# The noise list would have the following format:
+# --noise-id <string,compulsary> --noise-type <choices = (isotropic, point source),compulsary> --bg-fg-type <choices=(background|foreground), default=background> --rir-file <str, compulsary if isotropic, should not be specified if point-source> < location=(support Kaldi IO strings) >
+def CorruptWav(wav_scp, durations, output_dir, room_list, noise_list, snr_string, num_replica, prefix, speech_rvb_probability, noise_adding_probability, max_noises_added):
+    rooms = list_cyclic_iterator(room_list, random_seed = 1)

let the user specify the random seed


I have been using a consistent default value in the Python scripts I wrote, but I haven't checked all the scripts. I will create an issue requesting a review of the existing Python scripts. We would also have to take care of the inline Python scripts.

It seems to depend on the OS.

random.seed([x])
Initialize the basic random number generator. Optional argument x can be any hashable object. If x is omitted or None, current system time is used; current system time is also used to initialize the generator when the module is first imported. If randomness sources are provided by the operating system, they are used instead of the system time (see the os.urandom() function for details on availability). Reference
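A minimal sketch of the approach discussed above: expose the seed as an option with a constant default, so runs are reproducible unless the user asks otherwise. The option name `--random-seed` and the helper `pick_items` are assumptions made for illustration and are not taken from the script under review.

```python
import argparse
import random


def get_args(argv=None):
    # Hypothetical option name; the default is a constant so repeated runs match.
    parser = argparse.ArgumentParser(description="Seeded sampling sketch")
    parser.add_argument("--random-seed", type=int, default=0,
                        help="seed for the random number generator")
    return parser.parse_args(argv)


def pick_items(items, count, seed):
    # A private Random instance avoids touching the module-level generator.
    rng = random.Random(seed)
    return [rng.choice(items) for _ in range(count)]


args = get_args([])
first = pick_items(["room1", "room2", "room3"], 5, args.random_seed)
second = pick_items(["room1", "room2", "room3"], 5, args.random_seed)
```

With the same seed, `first` and `second` contain the same sequence; passing a different `--random-seed` would typically give a different one.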

…specified random seed; always handle isotropic noise as background noise

tomkocse · 2016-04-22T14:13:08Z

In the current commit, background noise is always extended to the length of the speech and added at the start of the speech; foreground noise is never extended and is added at a random point in the speech. For a more detailed noise-adding mechanism, I guess we would have to add more fields to the noise information list so that the user has better control over the noise addition.
Sampling of RIRs according to their assigned probabilities is not yet handled.
The --max-noises-added option is not yet modified, as it is difficult to control the number of overlapping noises occurring at the same time. I need to be clearer about the noise-adding mechanism before I can go on.
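The placement rule described in this comment can be sketched as follows. This is an illustration only: the function names are hypothetical, lengths are in samples, and clipping a foreground noise so that it fits inside the speech is an assumption the comment does not state.

```python
import random


def place_noise(speech_len, noise_len, is_background, rng):
    """Return (start, length) in samples for one noise segment."""
    if is_background:
        # Background noise starts with the speech and is extended to cover it.
        return 0, speech_len
    # Foreground noise keeps its own length (clipped to the speech length,
    # which is an assumption) and starts at a random point in the speech.
    length = min(noise_len, speech_len)
    start = rng.randint(0, speech_len - length)
    return start, length


def extend(noise, length):
    """Repeat a noise signal until it is exactly `length` samples long."""
    repeats = -(-length // len(noise))  # ceiling division
    return (noise * repeats)[:length]


rng = random.Random(0)
bg = place_noise(16000, 4000, True, rng)
fg_start, fg_len = place_noise(16000, 4000, False, rng)
tiled = extend([1, 2, 3], 7)
```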

vimalmanohar · 2016-04-24T18:49:47Z

egs/wsj/s5/steps/data/reverberate_data_dir.py

    if os.path.isfile(input_dir + "/reco2file_and_channel"):
-        ReplicateFileType2(input_dir + "/reco2file_and_channel", output_dir + "/reco2file_and_channel", num_replica, prefix)
+        AddPrefixToFields(input_dir + "/reco2file_and_channel", output_dir + "/reco2file_and_channel", num_replica, prefix, field = [0,1])

    train_lib.RunKaldiCommand("utils/validate_data_dir.sh --no-feats {output_dir}"


You need to add --no-text in case there is no text.
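One way to act on this, sketched under the assumption that the presence of a `text` file in the output directory decides whether `--no-text` is needed. The helper name `validate_command` is hypothetical; it only builds the command string and does not run it.

```python
import os


def validate_command(output_dir):
    # --no-feats is always passed, as in the diff above.
    options = ["--no-feats"]
    # Only require a transcript when the data directory actually has one.
    if not os.path.isfile(os.path.join(output_dir, "text")):
        options.append("--no-text")
    return "utils/validate_data_dir.sh {0} {1}".format(" ".join(options), output_dir)
```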

danpovey · 2016-05-01T23:42:56Z

@vijayaditya, what is the status of this pull request? Is it OK as far as you're concerned?
Does it need to be committed soon or should it be marked WIP?

vijayaditya · 2016-07-22T20:37:10Z

@tomkocse Could you please update one of the recipes that uses data reverberation to use this new script? I think ASpIRE is the only recipe in the master branch that uses reverberation, so you can update that.

(For now prepare a basic RIR list and noise list using the information available in data/impulses_noises/)

tomkocse · 2016-07-23T00:57:31Z

@vijayaditya Do you mean making the rir_list and noise_list for the real RIRs and noises in the ASpIRE recipe?

vijayaditya · 2016-07-23T01:17:47Z

Yes.

Vijay


tomkocse · 2016-07-23T15:48:52Z

@vijayaditya Can you point me to the location of your recent rerun of the ASpIRE experiment? That would save me from repeating most of the procedure.

tomkocse · 2016-07-25T10:40:00Z

@vijayaditya I have finished modifying the ASpIRE recipe to use the new steps/data/reverberate_data_dir.py. I am verifying whether I can reproduce your result.

vijayaditya · 2016-07-25T15:35:27Z

You need not worry about the replication of results right now as we do not
have spare GPUs. We can test it out later.

Please test the following:

1. Sampling of noises and RIRs is being done appropriately. To test this, create non-uniform distributions, sample using your script, and see if the relative frequencies in the sample are similar to the specified PMF.
2. Test if the specified noise, SNR and RIR files are being used properly by manually listening to the output. Also ensure that the times and durations of the point-source noises are correct.
3. Ensure there is no mismatch between the room-id of the isotropic noises selected and the RIRs.

--Vijay
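The first check in the list above could be sketched like this. The sampler is a stand-in written for illustration (inverse-CDF sampling with a fixed seed), not the sampling code from reverberate_data_dir.py, and the item names and probabilities are made up.

```python
import random
from collections import Counter


def sample_by_probability(items, probabilities, count, seed=0):
    """Draw `count` items using inverse-CDF sampling over `probabilities`."""
    rng = random.Random(seed)
    draws = []
    for _ in range(count):
        threshold = rng.random()
        cumulative = 0.0
        chosen = items[-1]  # fallback guards against rounding error
        for item, prob in zip(items, probabilities):
            cumulative += prob
            if threshold < cumulative:
                chosen = item
                break
        draws.append(chosen)
    return draws


def relative_frequencies(draws):
    counts = Counter(draws)
    return {item: counts[item] / float(len(draws)) for item in counts}


# Deliberately non-uniform PMF, as suggested in the review comment.
pmf = {"rir_a": 0.7, "rir_b": 0.2, "rir_c": 0.1}
draws = sample_by_probability(list(pmf), list(pmf.values()), 20000)
freqs = relative_frequencies(draws)
max_error = max(abs(freqs.get(k, 0.0) - p) for k, p in pmf.items())
```

If sampling is done correctly, `max_error` should shrink as the number of draws grows; a large gap for any item points to a sampling bug.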


…ng bugs in reverberate_data_dir.py; add aspire_prep_rir_noise_list.py for generating rir_list and noise_list for aspire

vijayaditya · 2016-07-26T15:11:22Z

egs/aspire/s5/local/multi_condition/aspire_prep_rir_noise_list.py

+    return args
+
+
+# This function generate the rir_list file for the aspire real RIR


This function generates the rir_list file for the real RIRs being used in ASpIRE experiments. It assumes the availability of data/impulses_noises directory prepared by local/multi_condition/prepare_impulses_noises.sh.

vijayaditya · 2016-07-26T15:53:37Z

I agree that your file pattern based pairing might be working. I am suggesting an easier way to generate these pairings as this info is already present in the data/impulses_noises/info file.

I see a problem in the handling of isotropic noises. Sorry, I should have noticed this sooner. Isotropic noises are assigned to a room and not to an RIR (i.e., a room and microphone position). This makes sense, as the characteristics of the noise are supposed to be similar in all directions.

@sw005320 Could you please confirm whether the isotropic noise recordings available in the RWCP and REVERB-2014 RIR databases can be used with any RIR recorded in the same room?

--Vijay
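A minimal sketch of room-level linkage, assuming each RIR and each isotropic noise carries a room id. The dictionary keys and the helper name `pick_isotropic_noise` are hypothetical and chosen only for this example.

```python
import random


def pick_isotropic_noise(rir, noises, rng):
    """Pick an isotropic noise recorded in the same room as `rir`, or None."""
    candidates = [n for n in noises
                  if n["noise_type"] == "isotropic" and n["room_id"] == rir["room_id"]]
    return rng.choice(candidates) if candidates else None


rirs = [
    {"rir_id": "00001", "room_id": "roomA"},
    {"rir_id": "00002", "room_id": "roomA"},
    {"rir_id": "00003", "room_id": "roomB"},
]
noises = [
    {"noise_id": "iso1", "noise_type": "isotropic", "room_id": "roomA"},
    {"noise_id": "iso2", "noise_type": "isotropic", "room_id": "roomB"},
]
rng = random.Random(0)
picked = [pick_isotropic_noise(r, noises, rng) for r in rirs]
```

Both RIRs recorded in `roomA` share the same isotropic noise here, which is the point of linking the noise to the room rather than to a single RIR.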

sw005320 · 2016-07-26T20:39:17Z

Yes, RWCP has the isotropic noise recordings.
For example, in http://www.openslr.org/resources/13/RWCP.tar.gz, you can find some noises at RWCP\nospeech\drysrc.
The REVERB isotropic noises are at the 'NOISE' directories in each-condition directory of http://reverb2014.dereverberation.com/tools/reverb_tools_for_Generate_SimData.tgz

vijayaditya · 2016-07-26T23:26:42Z

@sw005320 Thanks !

vijayaditya · 2016-07-26T23:27:57Z

@tomkocse Is it possible to prioritize this PR ? Could we try to get this PR in shape by tomorrow ?

tomkocse · 2016-07-27T01:40:31Z

@vijay I will change the isotropic noise linkage from a particular RIR to a room.

…r; Support using string as room id

vijayaditya · 2016-07-27T14:01:12Z

egs/aspire/s5/local/multi_condition/aspire_prep_rir_noise_list.py

    for rir in rir_files:
      filename = rir.split('/')[-1]
      if "noise" not in filename:
-        rir_list_file.write('--rir-id {0} --room-id {1} {2}\n'.format(str(rir_id).zfill(5), str(room_id).zfill(3), rir))
+        parts = filename.split('_')


Rather than parsing the file name for the necessary parameters, @tomkocse will later submit a modified list generation function which will use information available in data/impulses_noises/info/. This is however not a high priority change as we will preprocess the individual databases in the future and these scripts would not be part of any recipe.
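For reference, a sketch of writing and reading one rir_list line. The formatter mirrors the `write()` call shown in the diff, and the option names follow the rir list format quoted earlier in this thread; the function names and the example wav path are hypothetical. Whether the real script parses lines this way is not shown here, so treat the parser as an assumption.

```python
import argparse
import shlex


def format_rir_line(rir_id, room_id, location):
    # Mirrors the write() call in the diff: zero-padded ids, then the location.
    return "--rir-id {0} --room-id {1} {2}".format(
        str(rir_id).zfill(5), str(room_id).zfill(3), location)


def parse_rir_line(line):
    # Option names follow the rir list format quoted earlier in this thread.
    parser = argparse.ArgumentParser()
    parser.add_argument("--rir-id", type=str, required=True)
    parser.add_argument("--room-id", type=str, required=True)
    parser.add_argument("--receiver-position-id", type=str, default=None)
    parser.add_argument("--source-position-id", type=str, default=None)
    parser.add_argument("--rt-60", type=float, default=None)
    parser.add_argument("--drr", type=float, default=None)
    parser.add_argument("location", type=str)
    return parser.parse_args(shlex.split(line))


# Hypothetical file name, used only to exercise the two helpers.
line = format_rir_line(7, 2, "data/impulses_noises/example_rir.wav")
entry = parse_rir_line(line)
```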

vijayaditya · 2016-07-27T14:43:55Z

There are minor changes necessary in this PR; however, these are not high priority. @tomkocse will continue work on those changes in a different PR. I would like to merge these scripts now, as this is a high-priority requirement for some of the experiments we are working on. These scripts are isolated from the rest of the code base and are not used in any checked-in recipe other than ASpIRE.

@danpovey, @jtrmal could you please perform a second review and perform the merge once @tomkocse resolves the conflicts?

update function names; split snrs to background and foreground; user specified random seed; always handle isotropic noise as background noise
Pick the RIRs and noises according to assigned probabilities.
Modify wav-reverberate.cc according to the new steps/data/reverberate_data_dir.py
Change the functions in signal.cc to extend the length of the convolved signal, the correct length should be original signal length + rir length - 1; add the shift option to wav-reverberate.cc
Adding more comments and remove duplicate function in reverberate_data_dir.py
Change option --max-noises-added to --max-noises-per-minute
Adding data_lib.py; adding more comments, splitting large function in reverberate_data_dir.py
adding AddPointSourceNoise()
Fixing spelling mistake and modifying comments
Modify the aspire recipe to use the new reverberate_data_dir.py; fixing bugs in reverberate_data_dir.py; add aspire_prep_rir_noise_list.py for generating rir_list and noise_list for aspire
Changing isotropic noise linkage to a room instead of a particular rir; Support using string as room id
Change comments in wav-reverberate.cc
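The length rule mentioned in the commit message (original signal length + rir length - 1) is the length of a full linear convolution. The sketch below uses a plain direct-form loop to show it; it is not how signal.cc implements the convolution, and the sample values are made up.

```python
def convolve_full(signal, rir):
    """Direct-form full convolution; output has len(signal) + len(rir) - 1 samples."""
    output = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            output[i + j] += s * h
    return output


signal = [1.0, 2.0, 3.0]
rir = [1.0, 0.5]
reverberated = convolve_full(signal, rir)  # 3 + 2 - 1 = 4 samples
```

Truncating the result to `len(signal)` would cut off the reverberation tail of the last samples, which is why the longer length is the correct one.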

tomkocse · 2016-07-27T16:24:04Z

Please go to #927 for the continuation of this PR

Commit 356fa82: A new steps/data/reverberate_data_dir.py script

tomkocse mentioned this pull request Apr 18, 2016: [WIP] Simulated RIR reverberation in SWB #667 (Closed)

vijayaditya reviewed Apr 19, 2016

vijayaditya mentioned this pull request Apr 19, 2016: Enhancements to wav-reverberate command #716 (Closed)

vijayaditya reviewed Apr 20, 2016

Commit 8671e59: update function names; split snrs to background and foreground; user specified random seed; always handle isotropic noise as background noise

vimalmanohar reviewed Apr 24, 2016

Commit 617982b: Modify the aspire recipe to use the new reverberate_data_dir.py; fixing bugs in reverberate_data_dir.py; add aspire_prep_rir_noise_list.py for generating rir_list and noise_list for aspire

vijayaditya reviewed Jul 26, 2016

Commit 93a2295: Changing isotropic noise linkage to a room instead of a particular rir; Support using string as room id

vijayaditya reviewed Jul 27, 2016

tomkocse added 3 commits July 27, 2016 10:47

Commit cbe5762: Change comments in wav-reverberate.cc

Commit 4431af6: Merge branch 'new_rvb' of github.com:tomkocse/kaldi into new_rvb

tomkocse mentioned this pull request Jul 27, 2016: Creating a new steps/data/reverberate_data_dir.py for corrupting spee… #927 (Merged)

vijayaditya closed this Jul 27, 2016
