Make force alignment accessible from pocketsphinx_batch and the ps_decoder API #144

dhdaines · 2018-09-09T03:09:31Z

This provides a simple (maybe too simple) API for doing force alignment, as well as a command-line interface for it via pocketsphinx_batch. It works like any other kind of search:

```c
/* name is whatever you like; text is the transcription */
ps_set_align(decoder, name, text);
ps_set_search(decoder, name);
/* now decode as usual, and get the alignment using ps_seg_iter() */
```

In pocketsphinx_batch there are three new options, -alignctl, -aligndir and -alignext, which give the control file listing the transcription files (one file per utterance), the directory containing them, and their file extension, respectively.
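
As an illustration only, here is a minimal sketch of how one control-file entry could be turned into a transcription path. It assumes -aligndir and -alignext combine as `<dir>/<entry><ext>`, the same way pocketsphinx_batch treats -cepdir and -cepext; `align_path` is a hypothetical helper, not code from this PR.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (assumed <dir>/<entry><ext> convention): join the
 * -aligndir value, one -alignctl entry and the -alignext value, e.g.
 * ("trans", "utt1", ".txt") -> "trans/utt1.txt".
 * An empty or NULL dir means the entry is used as-is.
 * Returns 0 on success, -1 if the result does not fit in buf. */
int align_path(char *buf, size_t bufsize,
               const char *dir, const char *entry, const char *ext)
{
    int n;

    assert(buf != NULL && bufsize > 0 && entry != NULL);
    if (ext == NULL)
        ext = "";
    if (dir != NULL && strlen(dir) > 0)
        n = snprintf(buf, bufsize, "%s/%s%s", dir, entry, ext);
    else
        n = snprintf(buf, bufsize, "%s%s", entry, ext);
    /* snprintf returns the length it wanted to write, so >= bufsize means truncation. */
    return (n < 0 || (size_t)n >= bufsize) ? -1 : 0;
}
```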

The transcription is expected to be a sequence of whitespace-separated words. The code adds the <s> and </s> tokens for you, which may or may not be the right thing to do (perhaps we should add them only if they aren't already present).
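
The "add them only if they aren't present" variant could look like the following minimal sketch. `wrap_transcript` is a hypothetical helper that writes into a caller-supplied buffer and does not use any pocketsphinx data structures.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_WORDS 256

/* Append one word to out, preceded by a space unless it is the first. */
static int append_word(char *out, size_t outsize, size_t *pos,
                       const char *w, size_t len)
{
    size_t sep = (*pos > 0) ? 1 : 0;
    if (*pos + sep + len + 1 > outsize) /* +1 for the final NUL */
        return -1;
    if (sep)
        out[(*pos)++] = ' ';
    memcpy(out + *pos, w, len);
    *pos += len;
    out[*pos] = '\0';
    return 0;
}

/* Hypothetical helper: normalize a whitespace-separated transcription to
 * single spaces, adding "<s>" and "</s>" only where they are missing.
 * Returns 0 on success, -1 if out is too small or there are too many words. */
int wrap_transcript(const char *text, char *out, size_t outsize)
{
    const char *words[MAX_WORDS];
    size_t lens[MAX_WORDS];
    size_t n = 0, pos = 0, i;
    const char *p = text;
    int has_start, has_end;

    assert(text != NULL && out != NULL && outsize > 0);
    out[0] = '\0';
    /* Split on whitespace without modifying the input. */
    while (*p != '\0') {
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;
        if (n == MAX_WORDS)
            return -1;
        words[n] = p;
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        lens[n] = (size_t)(p - words[n]);
        n++;
    }
    has_start = n > 0 && lens[0] == 3 && strncmp(words[0], "<s>", 3) == 0;
    has_end = n > 0 && lens[n - 1] == 4 && strncmp(words[n - 1], "</s>", 4) == 0;

    if (!has_start && append_word(out, outsize, &pos, "<s>", 3) < 0)
        return -1;
    for (i = 0; i < n; i++)
        if (append_word(out, outsize, &pos, words[i], lens[i]) < 0)
            return -1;
    if (!has_end && append_word(out, outsize, &pos, "</s>", 4) < 0)
        return -1;
    return 0;
}
```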

This only produces word alignments, even though the underlying search can do more than that (e.g. phone alignments), because that is all the ps_seg_iter interface allows. We should probably fix that. In the near term I will add TextGrid output to the batch interface, so that we can get the phone segmentation that way and also be drop-in compatible with the Montreal Forced Aligner.
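
For a rough idea of what TextGrid output involves, here is a sketch that writes word segments as a single Praat IntervalTier in the long text format. It assumes segments carry start and end frame indices as reported by `ps_seg_frames()`, that the end frame is inclusive, and that the frame rate is the default of 100 frames per second (`-frate 100`). The `word_seg_t` struct and `write_textgrid` function are hypothetical. Praat expects the intervals of a tier to be contiguous, so gaps become empty intervals.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical segment record, e.g. filled from ps_seg_word() and
 * ps_seg_frames(); ef is assumed to be inclusive. */
typedef struct {
    const char *word;
    int sf, ef;
} word_seg_t;

/* Write a TextGrid string literal; Praat escapes " by doubling it. */
static void write_quoted(FILE *fh, const char *s)
{
    fputc('"', fh);
    if (strchr(s, '"') == NULL) {
        fputs(s, fh); /* common case: nothing to escape */
    } else {
        for (; *s != '\0'; s++) {
            if (*s == '"')
                fputc('"', fh);
            fputc(*s, fh);
        }
    }
    fputc('"', fh);
}

static void write_interval(FILE *fh, int idx, double xmin, double xmax,
                           const char *text)
{
    fprintf(fh, "        intervals [%d]:\n", idx);
    fprintf(fh, "            xmin = %.3f\n", xmin);
    fprintf(fh, "            xmax = %.3f\n", xmax);
    fprintf(fh, "            text = ");
    write_quoted(fh, text);
    fputc('\n', fh);
}

/* Write segs (sorted, non-overlapping) as one "words" tier covering nfr
 * frames at frate frames per second. */
void write_textgrid(FILE *fh, const word_seg_t *segs, int nseg,
                    int nfr, int frate)
{
    double total;
    int i, cur, nint = 0, idx = 1;

    assert(fh != NULL && frate > 0);
    total = (double)nfr / frate;
    /* First pass: count intervals, including empty gap fillers. */
    for (cur = 0, i = 0; i < nseg; i++) {
        if (segs[i].sf > cur)
            nint++;
        nint++;
        cur = segs[i].ef + 1;
    }
    if (cur < nfr)
        nint++;

    fprintf(fh, "File type = \"ooTextFile\"\nObject class = \"TextGrid\"\n\n");
    fprintf(fh, "xmin = 0\nxmax = %.3f\ntiers? <exists>\nsize = 1\nitem []:\n", total);
    fprintf(fh, "    item [1]:\n        class = \"IntervalTier\"\n        name = \"words\"\n");
    fprintf(fh, "        xmin = 0\n        xmax = %.3f\n        intervals: size = %d\n",
            total, nint);

    /* Second pass: emit intervals, converting frames to seconds. */
    for (cur = 0, i = 0; i < nseg; i++) {
        if (segs[i].sf > cur)
            write_interval(fh, idx++, (double)cur / frate,
                           (double)segs[i].sf / frate, "");
        write_interval(fh, idx++, (double)segs[i].sf / frate,
                       (double)(segs[i].ef + 1) / frate, segs[i].word);
        cur = segs[i].ef + 1;
    }
    if (cur < nfr)
        write_interval(fh, idx++, (double)cur / frate, total, "");
}
```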


nshmyrev · 2018-09-09T07:02:24Z

Welcome back, David!

dhdaines · 2018-09-10T01:49:50Z

Hi! Thanks!


dhdaines · 2018-12-24T18:45:44Z

Hi! I realized that this branch doesn't quite do what the user would expect from force alignment. The issue is that the state_align search wasn't actually designed for force alignment; it just aligns a state sequence to a feature sequence. I originally wrote it for two purposes:

To obtain exact phone alignments from ASR output
To collect state occupation counts for speaker adaptation
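
To make "aligning a state sequence to a feature sequence" concrete, here is a minimal sketch of Viterbi alignment over a strictly left-to-right chain of states. At each frame, the path either stays in the current state or advances to the next one. The sketch assumes that per-frame acoustic log-scores have already been computed into a frames × states matrix. It illustrates the idea and is not pocketsphinx's state_align_search code; `align_states` is a hypothetical name. Because the state sequence is fixed in advance, a search like this has no way to insert optional silences or to choose between pronunciations.

```c
#include <assert.h>
#include <float.h>
#include <stdlib.h>

/* Align a fixed left-to-right sequence of nstates states to nframes frames.
 * score[t * nstates + s] is the log-score of state s at frame t.
 * Writes the state index for each frame into path[] and returns the total
 * log-score, or -DBL_MAX if there are fewer frames than states. */
double align_states(const double *score, int nframes, int nstates, int *path)
{
    double *best, total;
    int *back; /* 1 if (t, s) was entered from state s-1, 0 if from s */
    int t, s;

    assert(nframes > 0 && nstates > 0);
    if (nframes < nstates)
        return -DBL_MAX;
    best = malloc(sizeof(*best) * nframes * nstates);
    back = malloc(sizeof(*back) * nframes * nstates);
    assert(best != NULL && back != NULL);

    for (t = 0; t < nframes; t++) {
        for (s = 0; s < nstates; s++) {
            int i = t * nstates + s;
            double stay = -DBL_MAX, enter = -DBL_MAX;
            if (t == 0) {
                /* The path must start in the first state. */
                best[i] = (s == 0) ? score[i] : -DBL_MAX;
                back[i] = 0;
                continue;
            }
            stay = best[i - nstates];
            if (s > 0)
                enter = best[i - nstates - 1];
            back[i] = enter > stay;
            best[i] = back[i] ? enter : stay;
            if (best[i] > -DBL_MAX) /* leave unreachable cells at -DBL_MAX */
                best[i] += score[i];
        }
    }
    /* The path must end in the last state; trace back from there. */
    total = best[(nframes - 1) * nstates + nstates - 1];
    s = nstates - 1;
    for (t = nframes - 1; t >= 0; t--) {
        path[t] = s;
        if (back[t * nstates + s])
            s--;
    }
    free(best);
    free(back);
    return total;
}
```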

So there are some things that force alignment should do which this search doesn't, specifically:

It won't insert optional silences between words or at the start or end of the utterance
It won't choose between alternate pronunciations

On the other hand, I have gotten very good results using FSG search for word-level force alignment, because it is already equipped to do both of the things mentioned above.
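
For word-level alignment with FSG search, the grammar itself can be a simple linear chain of the transcript words. The sketch below writes such a chain in the Sphinx FSG text format, and the result could then be passed via `-fsg`. It assumes that the FSG search adds the optional filler/silence self-loops and alternate pronunciations itself when `-fsgusefiller` and `-fsgusealtpron` are enabled (believed to be their defaults), so the grammar does not need to encode them. `write_align_fsg` is a hypothetical helper.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a linear FSG with one state per word boundary
 * that accepts exactly the whitespace-separated words in text.
 * Returns the number of words written. */
int write_align_fsg(FILE *fh, const char *name, const char *text)
{
    const char *p;
    int nwords = 0, i = 0;

    assert(fh != NULL && name != NULL && strlen(name) > 0 && text != NULL);
    /* Count words first so that NUM_STATES and FINAL_STATE are known. */
    for (p = text; *p != '\0';) {
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;
        nwords++;
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
    }
    fprintf(fh, "FSG_BEGIN %s\n", name);
    fprintf(fh, "NUM_STATES %d\n", nwords + 1);
    fprintf(fh, "START_STATE 0\n");
    fprintf(fh, "FINAL_STATE %d\n", nwords);
    /* One transition with probability 1.0 per word: state i -> i+1. */
    for (p = text; *p != '\0';) {
        const char *start;
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;
        start = p;
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        fprintf(fh, "TRANSITION %d %d 1.0 %.*s\n", i, i + 1,
                (int)(p - start), start);
        i++;
    }
    fprintf(fh, "FSG_END\n");
    return nwords;
}
```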

The other issue is that VAD and noise removal must be turned off for force alignment; otherwise the output timestamps won't necessarily correspond to positions in the input audio.

Since the state alignment search isn't useful for this on its own, I would like to change the meaning of the "alignment" interface in pocketsphinx_batch so that it does traditional force alignment with FSG search. In addition, I would probably add something to force noise removal and VAD off if -adcin is enabled, because this behaviour is very unexpected to the user in this case (even if it is super useful for ASR).

Hook up the force alignment code to the ps_decoder API, and add a min…

e734d3b

…imal command line interface in pocketsphinx_batch which allows you to do force alignment of transcripts. Test it out on test/data/librivox, also there is a unit test.

nshmyrev merged commit f54f6c3 into cmusphinx:master Sep 9, 2018

