kaldi-offline-transcriber issues
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues
Feed author: TANEL ALUMÄE

Issue #11: Installing pyfst (2016-02-17)
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/11
*Created by: siilats*

On OS X you need:

```shell
CPPFLAGS="-I/home/speech/tools/kaldi-trunk/tools/openfst/include -L/home/speech/tools/kaldi-trunk/tools/openfst/lib -stdlib=libstdc++" \
  pip install pyfst
```

(CPPFLAGS has to be set on the same command line as `pip install`, otherwise the flags never reach the build.)
Issue #8: No rule to make target 'build/output/intervjuu201306211256.txt'. Stop (2015-11-03)
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/8
*Created by: vince62s*
I built the system exactly as in the README. `make .init` seems fine. Then I downloaded the demo audio and ran:

```shell
make build/output/intervjuu201306211256.txt
```

which gives me the error in the subject line. Something wrong?
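For what it's worth, the error can be reproduced outside this project. The toy Makefile below is an assumption about the real one (whose output rule chains back to an audio file under src-audio/): "No rule to make target" typically just means the source .mp3 was never placed there, because a pattern rule is rejected when its prerequisite neither exists nor can be built.

```shell
# Toy reproduction, not this project's real Makefile: a pattern rule whose
# prerequisite (the source audio) is missing makes the target unbuildable.
demo=$(mktemp -d) && cd "$demo"
printf 'build/output/%%.txt: src-audio/%%.mp3\n\t@mkdir -p build/output\n\t@echo "transcribing $<" > $@\n' > Makefile
make build/output/test.txt || true   # fails: src-audio/test.mp3 is missing
mkdir -p src-audio && touch src-audio/test.mp3
make build/output/test.txt           # now the pattern rule applies
```

So the first thing to check is whether the demo mp3 actually landed in the directory the Makefile expects.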
Issue #20: Can't find parse_options.sh (2020-10-28)
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/20
*Created by: anderleich*
I cannot find the _parse_options.sh_ script used by the main script _speech2text.sh_.
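A note on where it usually comes from (an assumption about this setup, not stated in the issue): `parse_options.sh` ships with Kaldi itself, and scripts typically reach it through a `utils` symlink into a Kaldi egs directory. A sketch, assuming a Kaldi checkout at `$KALDI_ROOT`:

```shell
# parse_options.sh is part of Kaldi, not this repository; most setups reach it
# via a symlink to the wsj egs utils directory. ${KALDI_ROOT} is assumed to
# point at an existing Kaldi checkout (fallback path here is a placeholder).
ln -sfn "${KALDI_ROOT:-/opt/kaldi}/egs/wsj/s5/utils" utils
test -f utils/parse_options.sh && echo "parse_options.sh found" \
  || echo "not found; set KALDI_ROOT first"
```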
Issue #9: Question about diarization (2016-04-10)
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/9
*Created by: vince62s*
Hi Tanel,

I have a question about your diarization.sh script. In the lines below, are these files language-independent? Do they come with the package from LIUM? If not, how do we generate them?

Thanks
```shell
# define the directory where the results will be saved
datadir=`dirname $uem`
# define where the UBM GMM is
ubm=models/ubm.gmm
# define where the speech / non-speech set of GMMs is
# pmsgmm=./model/sms.gmms
pmsgmm=models/sms.gmms
# define where the silence set of GMMs is
sgmm=models/s.gmms
# define where the gender and bandwidth set of GMMs (4 models) is
# (female studio, male studio, female telephone, male telephone)
ggmm=models/gender.gmms
```
Issue #10: generating alignments (2020-10-28)
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/10
*Created by: yasheshgaur*
Hi,

Kaldi scripts usually generate alignments along with lattices: you get both lat.*.gz and ali.*.gz files. In the offline transcriber we only get the lattices as output. Is there any way to also generate alignments?
Thanks!

Issue #22: Skip RNNLM model (2019-10-10)
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/22
*Created by: anjul1008*
hi all,

I have done the ARPA LM adaptation part with nnet3, but while decoding it requires an RNNLM model in this code segment:
```make
build/trans/%/$(ACOUSTIC_MODEL)_pruned_rescored_main_rnnlm_unk/decode/log: build/trans/%/$(ACOUSTIC_MODEL)_pruned_rescored_main_unk/decode/log build/fst/data/rnnlm_unk
	$(info ************* RNN resocroing starting here ***************)
	rm -rf build/trans/$*/$(ACOUSTIC_MODEL)_pruned_rescored_main_rnnlm_unk
	mkdir -p build/trans/$*/$(ACOUSTIC_MODEL)_pruned_rescored_main_rnnlm_unk
	(cd build/trans/$*/$(ACOUSTIC_MODEL)_pruned_rescored_main_rnnlm_unk; for f in ../../../fst/$(ACOUSTIC_MODEL)/*; do ln -s $$f; done)
	rnnlm/lmrescore_pruned.sh \
		--skip-scoring true \
		--max-ngram-order 3 \
		build/fst/data/largelm_unk \
		build/fst/data/rnnlm_unk \
		build/trans/$* \
		build/trans/$*/$(ACOUSTIC_MODEL)_pruned_rescored_main_unk/decode \
		build/trans/$*/$(ACOUSTIC_MODEL)_pruned_rescored_main_rnnlm_unk/decode
	cp -r --preserve=links build/trans/$*/$(ACOUSTIC_MODEL)_pruned_unk/graph build/trans/$*/$(ACOUSTIC_MODEL)_pruned_rescored_main_rnnlm_unk/
```
I don't have an RNNLM model; is there any way to skip the RNNLM part?

Issue #15: about the model files: gender.gmms, s.gmms, sms.gmms, ubm.gmm (2017-06-15)
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/15
*Created by: shunfeichen*
Hello,
Recently I have wanted to use LIUM and Kaldi for ASR, but the original LIUM models (gender.gmms, s.gmms, sms.gmms, ubm.gmm) are trained on French. How can I train these four models on my English corpus?

Issue #7: Makefile:106: recipe for target 'build/fst/data/largelm' failed (2016-03-19)
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/7
*Created by: raitraidma*
Hello.

I was trying to set up a VM using Vagrant. I managed to pass all previous steps in the README, but `make .init` gives an error after running a while:
```
HCLGa is not stochastic
add-self-loops --self-loop-scale=0.1 --reorder=true build/fst/nnet2_online_ivector/final.mdl
rm -rf build/fst/data/largelm
mkdir -p build/fst/data/largelm
utils/build_const_arpa_lm.sh \
		language_model/pruned.vestlused-dev.splitw2.arpa.gz build/fst/data/prunedlm build/fst/data/largelm
arpa-to-const-arpa --bos-symbol=199694 --eos-symbol=199695 --unk-symbol=29439 'gunzip -c language_model/pruned.vestlused-dev.splitw2.arpa.gz | utils/map_arpa_lm.pl build/fst/data/largelm/words.txt|' build/fst/data/largelm/G.carpa
utils/map_arpa_lm.pl: Processing "\data\"
utils/map_arpa_lm.pl: Processing "\1-grams:\"
LOG (arpa-to-const-arpa:Read():const-arpa-lm.cc:310) Reading "\data\" section.
LOG (arpa-to-const-arpa:Read():const-arpa-lm.cc:357) Reading "\1-grams:" section.
utils/map_arpa_lm.pl: Processing "\2-grams:\"
LOG (arpa-to-const-arpa:Read():const-arpa-lm.cc:357) Reading "\2-grams:" section.
utils/map_arpa_lm.pl: Processing "\3-grams:\"
LOG (arpa-to-const-arpa:Read():const-arpa-lm.cc:357) Reading "\3-grams:" section.
utils/build_const_arpa_lm.sh: line 48: 18956 Killed arpa-to-const-arpa --bos-symbol=$bos --eos-symbol=$eos --unk-symbol=$unk "gunzip -c $arpa_lm | utils/map_arpa_lm.pl $new_lang/words.txt|" $new_lang/G.carpa
Makefile:106: recipe for target 'build/fst/data/largelm' failed
make: *** [build/fst/data/largelm] Error 1
```
I don't know much about Makefiles, but to me it seems that when the Makefile recipe `build/fst/data/largelm` is called, it calls `utils/build_const_arpa_lm.sh`, but `utils/build_const_arpa_lm.sh` does not get the arguments (at least variables like $new_lang are not changed).
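One reading of the log, not confirmed in the thread: the `18956 Killed` line is what the shell prints when the kernel OOM killer terminates a process, and `arpa-to-const-arpa` can need several gigabytes of RAM for a large ARPA LM, so the Vagrant VM may simply be under-provisioned. A quick check before retrying (Linux only):

```shell
# Report total and currently available memory; if MemTotal is small (e.g. a
# default ~1 GB VM), arpa-to-const-arpa on a large LM is a likely OOM victim.
awk '/^MemTotal|^MemAvailable/ {printf "%s %.1f GB\n", $1, $2/1024/1024}' /proc/meminfo
```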
Issue #21: File missing on make (2020-09-20)
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/21
*Created by: ngopee*
Hi,
I get this error on `make .init`:
```
utils/prepare_lang.sh: expected --unk-fst build/fst/data/unk_lang_model/unk_fst.txt to exist as a file
Makefile:123: recipe for target 'build/fst/data/prunedlm_unk' failed
make: *** [build/fst/data/prunedlm_unk] Error 1
```
Any idea what would cause this file to be missing?

Issue #16: Multi-core, multi-threading - possible? (2020-10-28)
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/16
*Created by: lkraav*
An 8-core machine could plow through diarization faster if it were parallelized. What's the biggest complexity stopping us from having it?
Issue #19: Unable to login to cloud.canister.io with docker login (2018-10-11)
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/19
*Created by: zmaslem*
`docker login` is unavailable; I am getting an error: request canceled (Client.Timeout exceeded while awaiting headers).
Is it going to be available?
Issue #14: Drop pyfst, migrate to openfst python bindings? (2017-09-28)
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/14
*Created by: lkraav*
Recent Kaldi requires recent openfst-1.6.x, but pyfst has trouble building against it.
```
fst/_fst.cpp:30878:191: note: candidate is:
In file included from /usr/include/fst/script/draw.h:10:0,
from fst/_fst.cpp:293:
/usr/include/fst/script/draw-impl.h:29:3: note: fst::FstDrawer<Arc>::FstDrawer(const fst::Fst<Arc>&, const fst::SymbolTable*, const fst::SymbolTable*, const fst::SymbolTable*, bool, const string&, float, float, bool, bool, float, float, int, int, const string&, bool) [with Arc = fst::ArcTpl<fst::LogWeightTpl<float> >; std::string = std::basic_string<char>]
FstDrawer(const Fst<Arc> &fst, const SymbolTable *isyms,
^
/usr/include/fst/script/draw-impl.h:29:3: note: candidate expects 16 arguments, 15 provided
```
openfst has had built-in Python bindings since 1.5. Is it within reach to drop pyfst and migrate over?

Issue #2: transcription of sample intervjuu201306211256.mp3 fails (2014-12-02)
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/2
*Created by: archibaldhaddock*
Hello,

I've just tried to decode intervjuu201306211256.mp3. Everything seems to work fine until the "final pass of acoustic scoring", where I get this message:

```
steps/decode_nnet.sh: missing file build/trans/intervjuu201306211256/nnet5c1_pruned/final.nnet
make: *** [build/trans/intervjuu201306211256/nnet5c1_pruned/decode/log] Erreur 1
```
The complete output of the command `make build/output/intervjuu201306211256.txt` can be found here:
http://wikisend.com/download/446498/make.interview.out
And the `make .init` output is here:
http://wikisend.com/download/376740/make.init.out
Thanks a lot !
Issue #17: kaldi-5.2 compatibility? (2020-10-28)
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/17
*Created by: lkraav*
I updated my Kaldi copy to the current latest 5.1.114 on that branch and everything seems to work well. @alumae, have you tested on 5.2 already?
Issue #12: fixes ? (2020-10-28)
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/12
*Created by: vince62s*
Hi,

I think you have the same section twice in your Makefile:

```make
build/audio/segmented/%: build/diarization/%/show.seg
```

Also, I do not fully understand this line:

```make
build/trans/%/nnet2_online_ivector_pruned/decode/log: build/fst/nnet2_online_ivector/final.mdl build/fst/nnet2_online_ivector/graph_prunedlm build/trans/%/spk2utt build/trans/%/mfcc
```

I think `build/trans/%/spk2utt` is redundant, because it is already pulled in by `build/trans/%/mfcc`.
Cheers,

Issue #6: Attempting to adapt to English (2019-02-04)
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/6
*Created by: aolney*
I'm a Kaldi noob but interested in using your setup for English. I looked at your other project and the Kaldi discussion boards, and this model seems like a good fit:
http://kaldi-asr.org/downloads/build/8/trunk/
However, I'm not sure how to adapt your Makefile to use the new model. It seems I would need to at least swap out these lines:
```
# Main language model (should be slightly pruned), used for rescoring
LM ?=language_model/pruned.vestlused-dev.splitw2.arpa.gz
# More aggressively pruned LM, used in decoding
PRUNED_LM ?=language_model/pruned6.vestlused-dev.splitw2.arpa.gz
COMPOUNDER_LM ?=language_model/compounder-pruned.vestlused-dev.splitw.arpa.gz
# Vocabulary in dict format (no pronouncation probs for now)
VOCAB?=language_model/vestlused-dev.splitw2.dict
```
but I'm not finding comparable files in Fisher.
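A hypothetical Makefile.options sketch for an English swap (every language_model/ filename below is a placeholder I made up; nothing with these names ships with the Fisher build, so you would build or prune your own ARPA LMs and dictionary first):

```shell
# Hypothetical Makefile.options for an English setup; all paths are placeholders.
# Note that COMPOUNDER_LM exists for Estonian compound-word restoration, so an
# English setup may not need a meaningful one at all.
cat > Makefile.options <<'EOF'
LM=language_model/english.pruned.arpa.gz
PRUNED_LM=language_model/english.pruned6.arpa.gz
COMPOUNDER_LM=language_model/english.compounder.arpa.gz
VOCAB=language_model/english.dict
EOF
```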
Issue #18: README: Speaker ID process clarification (2017-09-28)
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/18
*Created by: lkraav*
Perhaps the README could clarify what the expected process output is when the speaker ID feature is enabled? What is supposed to look different in the text output compared to disabling speaker ID? Is it possible to give speakers names via some transcription configuration file, or is that post-text-editing work?

Issue #5: Missing file scripts/ctm2srt.py (2014-12-05)
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/5
*Created by: riebling*
In the latest version I see that the Makefile has a section like this:

```make
%/decode/srt: %/decode/.ctm
	echo `dirname $*`
	cat $*/decode/score_$(LM_SCALE)/*.ctm | perl -npe 's/(.*)-(S\d+)---(\S+)/\1_\3_\2/' | python scripts/unsegment-ctm.py | LC_ALL=C sort -k 1,1 -k 3,3n -k 4,4n | python scripts/ctm2srt.py > $*/decode/`basename \`dirname $*\``.srt
```
But it seems the script scripts/ctm2srt.py is not present in this repository.
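Since the script is absent, here is a rough stand-in, not the original scripts/ctm2srt.py; it rests on two assumptions: one SRT cue per CTM line, and CTM fields being `utt chan start dur word`:

```shell
# Minimal awk-based stand-in for the missing scripts/ctm2srt.py (a guess at
# its behaviour): read CTM lines (utt chan start dur word) on stdin, emit one
# numbered SRT cue per word with start/end timestamps.
ctm2srt() {
  awk '{
    n++
    s=$3; e=$3+$4
    printf "%d\n%s --> %s\n%s\n\n", n, ts(s), ts(e), $5
  }
  function ts(t,  h,m,sec,ms) {
    h=int(t/3600); m=int((t-h*3600)/60); sec=int(t%60); ms=int((t-int(t))*1000)
    return sprintf("%02d:%02d:%02d,%03d", h, m, sec, ms)
  }'
}
printf "utt1 1 0.00 0.50 tere\nutt1 1 0.50 0.40 hommikust\n" | ctm2srt
```

A real replacement would presumably merge consecutive words into multi-word cues, but the timestamp arithmetic is the same.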
Issue #13: Problem with Makefile.options file (2016-09-14)
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/13
*Created by: Wickee*
I created a Makefile.options file with the following:
`KALDI_ROOT=/home/$USER/tools/kaldi`
But when running the Makefile, the symlinks it creates (namely sid, steps and utils) are broken. When I examined the symlinks, instead of the expansion of `$USER` I see `SER`. I do not know why this happens, since I am not well versed in programming in general and the Linux shell in particular. When I change Makefile.options to use the expanded username instead of the variable, the script runs fine.
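The likely cause (my reading, not confirmed in the thread): Make itself treats `$U` in `$USER` as a variable reference, which is typically empty, leaving only the literal `SER`; writing `$$USER` defers expansion to the shell. A minimal demonstration:

```shell
# Make expands `$U` (an undefined, hence empty, Make variable) and keeps the
# trailing "SER"; `$$USER` passes a literal $USER through to the shell.
printf 'all:\n\t@echo "single: $USER"\n\t@echo "double: $$USER"\n' > /tmp/userdemo.mk
make -f /tmp/userdemo.mk
```

The same rule applies inside Makefile.options, since it is parsed by Make, which is why the expanded username works while `$USER` does not.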
Issue #4: Implement one-pass decoding using Kaldi's nnet2 online setup (2014-12-02)
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/4
*Created by: alumae*
Offer a faster alternative to 3-pass decoding using Kaldi's new "nnet2" online decoding functionality. It should offer about the same accuracy (compare WER).