Joining Sound Event Detection and Localization Through Spatial Segregation

Supplementary materials to

Trowitzsch, I., Schymura, C., Kolossa, D., Obermayer, K. (2019), "Joining Sound Event Detection and Localization Through Spatial Segregation", submitted, IEEE Transactions on Audio, Speech, Language Processing, https://arxiv.org/abs/1904.00055.

  • Models
    • Fullstream sound event detection models
    • Segregated sound event detection models
    • Spatial segregation model
  • Code
    • for training fullstream and segregated sound event detection models
    • for testing fullstream and segregated sound event detection models
    • for evaluating test results and plotting graphs
  • Scene parameter lists
  • Feature set descriptions

Prerequisites for using the code

Auditory Machine Learning Training and Testing Pipeline

The code needs the Auditory Machine Learning Training and Testing Pipeline (AMLTTP) installed to run, and AMLTTP in turn makes use of other software modules of the Two!Ears Computational Framework. You will need to download and install both AMLTTP and the Two!Ears framework.

In your Two!Ears "main" directory, first edit TwoEarsPath.xml so that it points to your respective directories.

Once Matlab has started, the Two!Ears main folder needs to be added to the Matlab path. This is accomplished by executing the following command:

>> addpath( '<path-to-your-TwoEars-Main-directory>' )

NIGENS sounds database

Training and testing are performed on the sounds of the NIGENS database. Please download it from

Trowitzsch, Ivo, et al (2019), "NIGENS general sound events database", Zenodo. http://doi.org/10.5281/zenodo.2535878

Training models

First edit train_fullstream and train_segId to set nigensPath and dataCachePath (lines 5 and 6) to your respective paths. Then training can be executed:

>> train_segId()
>> train_fullstream()
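
For orientation, the two path variables near the top of the training scripts might be set like this (the directory names below are placeholders for your local setup):

nigensPath = '/path/to/NIGENS/';            % location of the downloaded NIGENS sound files
dataCachePath = '/path/to/amlttp_cache/';   % data cache -- plan for a few terabytes of free space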

Computationally, it is advisable to execute train_segId first and train_fullstream afterwards; the other way around will take a bit longer. Either order will take very long due to the necessary preprocessing (scene rendering, segregation, feature construction, etc.) and will use a lot of disk space (a few terabytes) for the data cache.

It is fine to cancel preprocessing of the data at any time. Due to the caching mechanism of AMLTTP, preprocessing will resume on the next run.

Code functionality check

To check that the code actually works, without having to process all data and use that much disk space, execute

>> train_segId( 9:10, 1:4, [1,11,21,31] )
>> train_fullstream( 9:10, 1:4, [1,11,21,31] )

This will train on only ten sound files, only four scenes, and only four classes. The obtained models will of course not be able to generalize reasonably.

Testing models

You either need to train models first (see above), or unzip our trained models (fullstream_detection_models.zip and segregated_detection_models.zip; extract each to a directory named after the archive).
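
If you prefer to do the extraction from within Matlab, one way would be (assuming the archives lie in the current folder):

>> unzip( 'fullstream_detection_models.zip', 'fullstream_detection_models' )
>> unzip( 'segregated_detection_models.zip', 'segregated_detection_models' )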

Edit gen_fullstream_testdata, gen_segId_testdata, test_fullstream and test_on_segId to set nigensPath and dataCachePath to your respective paths (analogous to the training scripts above). Then testing can be executed:

>> gen_segId_testdata()
>> gen_fullstream_testdata()
>> test_fullstream()
>> test_on_segId()
>> 
>> % Testing also with loc-error and nsrcs-error, if wanted:
>> gen_segId_testdata( [], [], [], 5, 0 )    % 5deg sigma location error
>> test_on_segId( [], [], [], 5, 0 )
>> gen_segId_testdata( [], [], [], 10, 0 )   % 10deg sigma location error
>> test_on_segId( [], [], [], 10, 0 )
>> gen_segId_testdata( [], [], [], 20, 0 )   % 20deg sigma location error
>> test_on_segId( [], [], [], 20, 0 )
>> gen_segId_testdata( [], [], [], 45, 0 )   % 45deg sigma location error
>> test_on_segId( [], [], [], 45, 0 )
>> gen_segId_testdata( [], [], [], 1000, 0 ) % random localization
>> test_on_segId( [], [], [], 1000, 0 )
>> gen_segId_testdata( [], [], [], 0, -1 )   % source count error := -1
>> test_on_segId( [], [], [], 0, -1 )
>> gen_segId_testdata( [], [], [], 0, -2 )   % source count error := -2
>> test_on_segId( [], [], [], 0, -2 )
>> gen_segId_testdata( [], [], [], 0, +1 )   % source count error := +1
>> test_on_segId( [], [], [], 0, +1 )
>> gen_segId_testdata( [], [], [], 0, +2 )   % source count error := +2
>> test_on_segId( [], [], [], 0, +2 )
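
Since these calls only vary in their last two arguments, they can also be scripted as a loop; a minimal sketch, equivalent to the calls above:

% location-error conditions (sigma in degrees; 1000 stands for random localization)
for locErrSigma = [5 10 20 45 1000]
    gen_segId_testdata( [], [], [], locErrSigma, 0 );
    test_on_segId( [], [], [], locErrSigma, 0 );
end
% source-count-error conditions
for nsrcsError = [-1 -2 1 2]
    gen_segId_testdata( [], [], [], 0, nsrcsError );
    test_on_segId( [], [], [], 0, nsrcsError );
end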

Computationally, it is advisable to execute gen_segId_testdata first and gen_fullstream_testdata afterwards; the other way around will take a bit longer. Either order will take very long due to the necessary preprocessing (scene rendering, segregation, feature construction, etc.) and will use a lot of disk space (a few terabytes) for the data cache.

It is fine to cancel preprocessing of the data at any time. Due to the caching mechanism of AMLTTP, preprocessing will resume on the next run.

Code functionality check

To check that the code actually works, without having to process all data and use that much disk space, execute

>> gen_segId_testdata( 11, 1:4, [1,11,21,31] )
>> gen_fullstream_testdata( 11, 1:4, [1,11,21,31] )
>> test_on_segId( 11, 1:4, [1,11,21,31] )
>> test_fullstream( 11, 1:4, [1,11,21,31] )

This will test on only five sound files, only four scenes, and only four classes.

Evaluation

To run evaluation directly on the test data produced by us, just run:

>> eval_mc7_gt()
>> eval_mc7_locError()
>> eval_mc7_nsrcsError()

To run evaluation on test data produced by you (whether using our trained models or your own), run the following. Note that this requires test data for all test scenes, not only the reduced functionality-check set from above:

>> testEval_collect( '../testdata/fullstream.test' )
>> testEval_collect( '../testdata/segId.on.segId_0-0.test' )
>> eval_mc7_gt( true )
>> % The following requires having tested also with loc-error and nsrcs-error.
>> testEval_collect( '../testdata/segId.on.segId_5-0.test' )
>> testEval_collect( '../testdata/segId.on.segId_10-0.test' )
>> testEval_collect( '../testdata/segId.on.segId_20-0.test' )
>> testEval_collect( '../testdata/segId.on.segId_45-0.test' )
>> testEval_collect( '../testdata/segId.on.segId_1000-0.test' )
>> testEval_collect( '../testdata/segId.on.segId_0--1.test' )
>> testEval_collect( '../testdata/segId.on.segId_0--2.test' )
>> testEval_collect( '../testdata/segId.on.segId_0-1.test' )
>> testEval_collect( '../testdata/segId.on.segId_0-2.test' )
>> eval_mc7_locError( true )
>> eval_mc7_nsrcsError( true )
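
The collected file names follow the pattern segId.on.segId_<locErrorSigma>-<nsrcsError>.test, so the nine testEval_collect calls for the error conditions above could equivalently be generated in a loop:

% <locErrorSigma>-<nsrcsError> pairs, as used above
conditions = [ 5 0; 10 0; 20 0; 45 0; 1000 0; 0 -1; 0 -2; 0 1; 0 2 ];
for ii = 1 : size( conditions, 1 )
    testEval_collect( sprintf( '../testdata/segId.on.segId_%d-%d.test', ...
                               conditions(ii,1), conditions(ii,2) ) );
end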

After the first run of eval_mc7_gt on your data, the true parameter can be omitted.

License

The contained materials are published under the GNU GENERAL PUBLIC LICENSE, Version 3.

Credits

If you use any contained material for your own work, please acknowledge our work by citing

Trowitzsch, I., Schymura, C., Kolossa, D., Obermayer, K. (2019), "Joining Sound Event Detection and Localization Through Spatial Segregation", submitted, IEEE Transactions on Audio, Speech, Language Processing, https://arxiv.org/abs/1904.00055.

Furthermore, if you modify the code and use the results produced with it, please additionally cite

Trowitzsch, Ivo, et al (2019). "Auditory Machine Learning Training and Testing Pipeline: AMLTTP v3.0". Zenodo. http://doi.org/10.5281/zenodo.2575086

Thank you.
