kaldi-offline-transcriber issues
https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues
2015-11-03T18:52:34Z

https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/8
No rule to make target 'build/output/intervjuu201306211256.txt'. Stop
2015-11-03T18:52:34Z
TANEL ALUMÄE
*Created by: vince62s*
I built the system exactly as described in the readme.
`make .init` seems fine.
Then I downloaded the demo audio, and
`make build/output/intervjuu201306211256.txt`
gives me the error in the subject line.
Is something wrong?

https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/20
Can't find parse_options.sh
2020-10-28T13:46:42Z
TANEL ALUMÄE
*Created by: anderleich*
I cannot find the _parse_options.sh_ script used by the main script _speech2text.sh_.

https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/9
Question about diarization
2016-04-10T17:44:41Z
TANEL ALUMÄE
*Created by: vince62s*
Hi Tanel,
I have a question about your diarization.sh script.
Are the model files referenced in the lines below language-independent?
Do they come with the LIUM package?
If not, how do we generate them?
Thanks
# define the directory where the results will be saved
datadir=`dirname $uem`
# define where the UBM GMM is
ubm=models/ubm.gmm
# define where the speech / non-speech set of GMMs is
# pmsgmm=./model/sms.gmms
pmsgmm=models/sms.gmms
# define where the silence set of GMMs is
sgmm=models/s.gmms
# define where the gender and bandwidth set of GMMs (4 models) is
# (female studio, male studio, female telephone, male telephone)
ggmm=models/gender.gmms

https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/10
generating alignments
2020-10-28T13:45:55Z
TANEL ALUMÄE
*Created by: yasheshgaur*
Hi,
Kaldi scripts usually generate alignments along with lattices, so you get both `lat.*.gz` and `ali.*.gz` files.
The offline transcriber, however, only outputs the lattices. Is there any way to also generate alignments?
Thanks!

https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/22
Skip RNNLM model
2019-10-10T06:01:33Z
TANEL ALUMÄE
*Created by: anjul1008*
Hi all,
I have done the ARPA LM adaptation part with nnet3, but decoding requires an RNNLM model in this Makefile rule:
build/trans/%/$(ACOUSTIC_MODEL)_pruned_rescored_main_rnnlm_unk/decode/log: build/trans/%/$(ACOUSTIC_MODEL)_pruned_rescored_main_unk/decode/log build/fst/data/rnnlm_unk
$(info ************* RNN resocroing starting here ***************)
rm -rf build/trans/$*/$(ACOUSTIC_MODEL)_pruned_rescored_main_rnnlm_unk
mkdir -p build/trans/$*/$(ACOUSTIC_MODEL)_pruned_rescored_main_rnnlm_unk
(cd build/trans/$*/$(ACOUSTIC_MODEL)_pruned_rescored_main_rnnlm_unk; for f in ../../../fst/$(ACOUSTIC_MODEL)/*; do ln -s $$f; done)
rnnlm/lmrescore_pruned.sh \
--skip-scoring true \
--max-ngram-order 3 \
build/fst/data/largelm_unk \
build/fst/data/rnnlm_unk \
build/trans/$* \
build/trans/$*/$(ACOUSTIC_MODEL)_pruned_rescored_main_unk/decode \
build/trans/$*/$(ACOUSTIC_MODEL)_pruned_rescored_main_rnnlm_unk/decode
cp -r --preserve=links build/trans/$*/$(ACOUSTIC_MODEL)_pruned_unk/graph build/trans/$*/$(ACOUSTIC_MODEL)_pruned_rescored_main_rnnlm_unk/
I don't have an RNNLM model. Is there any way to skip the RNNLM part?

https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/15
about the model files:gender.gmms,s.gmms, sms.gmms, ubm.gmm
2017-06-15T01:54:07Z
TANEL ALUMÄE
*Created by: shunfeichen*
Hello,
I want to use LIUM and Kaldi for ASR. However, the original LIUM models (gender.gmms, s.gmms, sms.gmms, ubm.gmm) were trained on French. How can I train these four models on my English corpus?

https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/7
Makefile:106: recipe for target 'build/fst/data/largelm' failed
2016-03-19T16:57:28Z
TANEL ALUMÄE
*Created by: raitraidma*
Hello.
I was trying to set up a VM using Vagrant. I got through all the previous steps in the readme, but `make .init` fails with an error after running for a while:
`HCLGa is not stochastic
add-self-loops --self-loop-scale=0.1 --reorder=true build/fst/nnet2_online_ivector/final.mdl
rm -rf build/fst/data/largelm
mkdir -p build/fst/data/largelm
utils/build_const_arpa_lm.sh \
language_model/pruned.vestlused-dev.splitw2.arpa.gz build/fst/data/prunedlm build/fst/data/largelm
arpa-to-const-arpa --bos-symbol=199694 --eos-symbol=199695 --unk-symbol=29439 'gunzip -c language_model/pruned.vestlused-dev.splitw2.arpa.gz | utils/map_arpa_lm.pl build/fst/data/largelm/words.txt|' build/fst/data/largelm/G.carpa
utils/map_arpa_lm.pl: Processing "\data\"
utils/map_arpa_lm.pl: Processing "\1-grams:\"
LOG (arpa-to-const-arpa:Read():const-arpa-lm.cc:310) Reading "\data\" section.
LOG (arpa-to-const-arpa:Read():const-arpa-lm.cc:357) Reading "\1-grams:" section.
utils/map_arpa_lm.pl: Processing "\2-grams:\"
LOG (arpa-to-const-arpa:Read():const-arpa-lm.cc:357) Reading "\2-grams:" section.
utils/map_arpa_lm.pl: Processing "\3-grams:\"
LOG (arpa-to-const-arpa:Read():const-arpa-lm.cc:357) Reading "\3-grams:" section.
utils/build_const_arpa_lm.sh: line 48: 18956 Killed arpa-to-const-arpa --bos-symbol=$bos --eos-symbol=$eos --unk-symbol=$unk "gunzip -c $arpa_lm | utils/map_arpa_lm.pl $new_lang/words.txt|" $new_lang/G.carpa
Makefile:106: recipe for target 'build/fst/data/largelm' failed
make: *** [build/fst/data/largelm] Error 1`
I don't know much about Makefiles, but it seems to me that when the Makefile recipe for `build/fst/data/largelm` runs, it calls `utils/build_const_arpa_lm.sh`, but the script does not receive its arguments. (At least, variables like `$new_lang` are not expanded in the output.)
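One possible reading of the log above, offered as an assumption rather than a confirmed diagnosis: the `Killed` message on line 48 is how bash reports a child process that was terminated by SIGKILL, and bash prints the command as it is written in the script, which would explain why `$new_lang` appears unexpanded even though the earlier `arpa-to-const-arpa` line shows real paths. On a memory-constrained VM, a common source of SIGKILL is the kernel's out-of-memory killer. The Python sketch below only illustrates how a SIGKILL'd child can be recognized from its exit status; the helper name `describe_exit` is hypothetical and not part of this repository.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a process return code into a short human-readable description."""
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        number, prefix = -returncode, "terminated by"
    elif returncode > 128:
        # A shell reports the same event as 128 + signal number (137 for SIGKILL).
        number, prefix = returncode - 128, "shell-style status for"
    else:
        return f"exited with status {returncode}"
    try:
        return f"{prefix} {signal.Signals(number).name}"
    except ValueError:
        # Not a known signal number on this platform.
        return f"exited with status {returncode}"


if __name__ == "__main__":
    # Start a child that would sleep for a minute, then kill it with SIGKILL,
    # the same uncatchable signal the out-of-memory killer uses (POSIX only).
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    child.send_signal(signal.SIGKILL)
    child.wait()
    print(describe_exit(child.returncode))
```

If this reading applies, the kernel log on the VM (for example via `dmesg`) would typically contain an out-of-memory message, and giving the Vagrant box more memory would be the first thing to try.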

https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/21
File missing on make
2020-09-20T08:52:27Z
TANEL ALUMÄE
*Created by: ngopee*
Hi,
I get this error on `make .init`:
```
utils/prepare_lang.sh: expected --unk-fst build/fst/data/unk_lang_model/unk_fst.txt to exist as a file
Makefile:123: recipe for target 'build/fst/data/prunedlm_unk' failed
make: *** [build/fst/data/prunedlm_unk] Error 1
```
Any idea what would cause this file to be missing?

https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/16
Multi-core, multi-threading - possible?
2020-10-28T13:47:36Z
TANEL ALUMÄE
*Created by: lkraav*
An 8-core machine could plow through diarization faster if it were parallelized. What's the biggest complexity stopping us from having it?

https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/19
Unable to login to cloud.canister.io with docker login
2018-10-11T06:38:46Z
TANEL ALUMÄE
*Created by: zmaslem*
`docker login` is unavailable; I get an error: request canceled (Client.Timeout exceeded while awaiting headers)
Is it going to be available?

https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/14
Drop pyfst, migrate to openfst python bindings?
2017-09-28T11:51:40Z
TANEL ALUMÄE
*Created by: lkraav*
Recent Kaldi requires a recent OpenFst (1.6.x), but pyfst has trouble building with it.
```
fst/_fst.cpp:30878:191: note: candidate is:
In file included from /usr/include/fst/script/draw.h:10:0,
from fst/_fst.cpp:293:
/usr/include/fst/script/draw-impl.h:29:3: note: fst::FstDrawer<Arc>::FstDrawer(const fst::Fst<Arc>&, const fst::SymbolTable*, const fst::SymbolTable*, const fst::SymbolTable*, bool, const string&, float, float, bool, bool, float, float, int, int, const string&, bool) [with Arc = fst::ArcTpl<fst::LogWeightTpl<float> >; std::string = std::basic_string<char>]
FstDrawer(const Fst<Arc> &fst, const SymbolTable *isyms,
^
/usr/include/fst/script/draw-impl.h:29:3: note: candidate expects 16 arguments, 15 provided
```
OpenFst has had built-in Python bindings since 1.5. Is it within reach to drop pyfst and migrate over?

https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/2
transcription of sample intervjuu201306211256.mp3 fails
2014-12-02T10:18:05Z
TANEL ALUMÄE
*Created by: archibaldhaddock*
Hello,
I've just tried to decode intervjuu201306211256.mp3.
Everything seems to work fine until the "final pass of acoustic scoring", where I get this message:
steps/decode_nnet.sh: missing file build/trans/intervjuu201306211256/nnet5c1_pruned/final.nnet
make: *** [build/trans/intervjuu201306211256/nnet5c1_pruned/decode/log] Erreur 1
The complete output of the command "make build/output/intervjuu201306211256.txt" can be found here:
http://wikisend.com/download/446498/make.interview.out
And the `make .init` output is here:
http://wikisend.com/download/376740/make.init.out
Thanks a lot!

https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/17
kaldi-5.2 compatibility?
2020-10-28T13:46:31Z
TANEL ALUMÄE
*Created by: lkraav*
I updated my Kaldi copy to the current latest, 5.1.114 on that branch, and everything seems to work well. @alumae, have you tested on 5.2 already?

https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/12
fixes ?
2020-10-28T13:46:11Z
TANEL ALUMÄE
*Created by: vince62s*
Hi,
I think the same rule appears twice in your Makefile:
build/audio/segmented/%: build/diarization/%/show.seg
Also, I do not fully understand this line:
build/trans/%/nnet2_online_ivector_pruned/decode/log: build/fst/nnet2_online_ivector/final.mdl build/fst/nnet2_online_ivector/graph_prunedlm build/trans/%/spk2utt build/trans/%/mfcc
I think build/trans/%/spk2utt is redundant here, because it is already pulled in by build/trans/%/mfcc.
Cheers,

https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/5
Missing file scripts/ctm2srt.py
2014-12-05T17:25:37Z
TANEL ALUMÄE
*Created by: riebling*
In the latest version I see that the Makefile has a section like this:
%/decode/srt: %/decode/.ctm
echo `dirname $*`
cat $*/decode/score_$(LM_SCALE)/*.ctm | perl -npe 's/(.*)-(S\d+)---(\S+)/\1_\3_\2/' | python scripts/unsegment-ctm.py | LC_ALL=C sort -k 1,1 -k 3,3n -k 4,4n | python scripts/ctm2srt.py > $*/decode/`basename \`dirname $*\``.srt
But it seems the script scripts/ctm2srt.py is not present in this repository.
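For readers wondering what such a script would have to do, here is a minimal sketch of a CTM-to-SRT converter. It is not the missing `scripts/ctm2srt.py`. It assumes standard CTM lines (`<recording> <channel> <start> <duration> <word> [<confidence>]`, times in seconds) that are already sorted by time, and it groups words into subtitle cues with a made-up rule: a new cue starts after a pause or once a word limit is reached. The names `ctm_to_srt`, `format_timestamp`, `max_words`, and `max_gap` are illustrative assumptions.

```python
import sys


def format_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def ctm_to_srt(lines, max_words=8, max_gap=1.0):
    """Convert time-sorted CTM lines to SRT text.

    Cue grouping is an assumption of this sketch: a new cue starts after
    max_words words or when the pause before a word exceeds max_gap seconds.
    """
    cues = []  # each cue is [start, end, [words]]
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, duration, word = float(fields[2]), float(fields[3]), fields[4]
        end = start + duration
        if cues and len(cues[-1][2]) < max_words and start - cues[-1][1] <= max_gap:
            # Extend the current cue with this word.
            cues[-1][1] = end
            cues[-1][2].append(word)
        else:
            cues.append([start, end, [word]])

    blocks = []
    for index, (start, end, words) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{' '.join(words)}\n"
        )
    # Joining with a newline leaves the blank line SRT expects between cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Reads CTM on stdin and writes SRT on stdout, like the last pipeline stage in the rule above.
    sys.stdout.write(ctm_to_srt(sys.stdin))
```

Reading stdin and writing stdout keeps the sketch usable in the same position of the pipe as the script named in the Makefile rule.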

https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/4
Implement one-pass decoding using Kaldi's nnet2 online setup
2014-12-02T10:17:36Z
TANEL ALUMÄE
*Created by: alumae*
Offer a faster alternative to 3-pass decoding using Kaldi's new "nnet2" online decoding functionality. It should offer about the same accuracy (compare WER).

https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/1
final.mdl missing
2014-08-08T14:42:42Z
TANEL ALUMÄE
*Created by: skpvox*
I'm receiving the following error when running it on the example audio file:
(Diarization output has been omitted).
Any ideas?
```
[...]
rm -rf build/audio/segmented/intervjuu201306211256
mkdir -p build/audio/segmented/intervjuu201306211256
cat build/diarization/intervjuu201306211256/show.seg | cut -f 3,4,8 -d " " | \
while read LINE ; do \
start=`echo $LINE | cut -f 1 -d " " | perl -npe '$_=$_/100.0'`; \
len=`echo $LINE | cut -f 2 -d " " | perl -npe '$_=$_/100.0'`; \
sp_id=`echo $LINE | cut -f 3 -d " "`; \
timeformatted=`echo "$start $len" | perl -ne '@t=split(); $start=$t[0]; $len=$t[1]; $end=$start+$len; printf("%08.3f-%08.3f\n", $start,$end);'` ; \
sox build/audio/base/intervjuu201306211256.wav --norm build/audio/segmented/intervjuu201306211256/intervjuu201306211256_${timeformatted}_${sp_id}.wav trim $start $len ; \
done
sox WARN dither: dither clipped 1 samples; decrease volume?
sox WARN dither: dither clipped 1 samples; decrease volume?
sox WARN dither: dither clipped 1 samples; decrease volume?
sox WARN dither: dither clipped 1 samples; decrease volume?
mkdir -p `dirname build/trans/intervjuu201306211256/wav.scp`
/bin/ls build/audio/segmented/intervjuu201306211256/*.wav | \
perl -npe 'chomp; $orig=$_; s/.*\/(.*)_(\d+\.\d+-\d+\.\d+)_(S\d+)\.wav/\1-\3---\2/; $_=$_ . " $orig\n";' | LC_ALL=C sort > build/trans/intervjuu201306211256/wav.scp
cat build/trans/intervjuu201306211256/wav.scp | perl -npe 's/\s+.*//; s/((.*)---.*)/\1 \2/' > build/trans/intervjuu201306211256/utt2spk
utils/utt2spk_to_spk2utt.pl build/trans/intervjuu201306211256/utt2spk > build/trans/intervjuu201306211256/spk2utt
rm -rf build/trans/intervjuu201306211256/mfcc
steps/make_mfcc.sh --mfcc-config conf/mfcc.conf --cmd "$train_cmd" --nj 1 \
build/trans/intervjuu201306211256 build/trans/intervjuu201306211256/exp/make_mfcc build/trans/intervjuu201306211256/mfcc || exit 1
steps/make_mfcc.sh --mfcc-config conf/mfcc.conf --cmd run.pl --nj 1 build/trans/intervjuu201306211256 build/trans/intervjuu201306211256/exp/make_mfcc build/trans/intervjuu201306211256/mfcc
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
Succeeded creating MFCC features for intervjuu201306211256
steps/compute_cmvn_stats.sh build/trans/intervjuu201306211256 build/trans/intervjuu201306211256/exp/make_mfcc build/trans/intervjuu201306211256/mfcc || exit 1;
steps/compute_cmvn_stats.sh build/trans/intervjuu201306211256 build/trans/intervjuu201306211256/exp/make_mfcc build/trans/intervjuu201306211256/mfcc
Succeeded creating CMVN stats for intervjuu201306211256
rm -rf build/trans/intervjuu201306211256/tri3b_mmi_pruned
mkdir -p build/trans/intervjuu201306211256/tri3b_mmi_pruned
(cd build/trans/intervjuu201306211256/tri3b_mmi_pruned; for f in ../../../fst/tri3b_mmi/*; do ln -s $f; done)
steps/decode_fmllr.sh --num-threads 10 --config conf/decode.conf --skip-scoring true --nj 1 --cmd "$decode_cmd" \
--alignment-model build/fst/tri3b/final.alimdl --adapt-model build/fst/tri3b/final.mdl \
build/fst/tri3b/graph_prunedlm build/trans/intervjuu201306211256 `dirname build/trans/intervjuu201306211256/tri3b_mmi_pruned/decode/log`
steps/decode_fmllr.sh --num-threads 10 --config conf/decode.conf --skip-scoring true --nj 1 --cmd run.pl --alignment-model build/fst/tri3b/final.alimdl --adapt-model build/fst/tri3b/final.mdl build/fst/tri3b/graph_prunedlm build/trans/intervjuu201306211256 build/trans/intervjuu201306211256/tri3b_mmi_pruned/decode
cat: build/trans/intervjuu201306211256/text: No such file or directory
steps/decode.sh --parallel-opts --scoring-opts --num-threads 10 --skip-scoring true --acwt 0.083333 --nj 1 --cmd run.pl --beam 10.0 --model build/fst/tri3b/final.alimdl --max-arcs -1 --max-active 2000 build/fst/tri3b/graph_prunedlm build/trans/intervjuu201306211256 build/trans/intervjuu201306211256/tri3b_mmi_pruned/decode.si
decode.sh: feature type is lda
steps/decode_fmllr.sh: no such file build/trans/intervjuu201306211256/tri3b_mmi_pruned/final.mdl
make: *** [build/trans/intervjuu201306211256/tri3b_mmi_pruned/decode/log] Error 1
```
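To make the naming steps in the log easier to follow, here is a small Python sketch that mirrors the shell and perl one-liners above. It converts a segment's start and length from `show.seg` (fields 3 and 4, in hundredths of a second, as implied by the division by 100.0) into the segment file name, then derives the utterance ID and speaker ID that end up in `wav.scp` and `utt2spk`. It illustrates the conventions visible in the log and is not code from the repository; the function names are hypothetical.

```python
import re


def segment_filename(show: str, start_frames: int, length_frames: int, speaker: str) -> str:
    """Build the segment file name used by the sox step in the log."""
    start = start_frames / 100.0
    end = start + length_frames / 100.0
    # Same layout as the perl printf("%08.3f-%08.3f") call.
    return f"{show}_{start:08.3f}-{end:08.3f}_{speaker}.wav"


# Same pattern as the perl substitution that builds wav.scp, with an optional directory part.
_SEGMENT_RE = re.compile(r"(?:.*/)?(.*)_(\d+\.\d+-\d+\.\d+)_(S\d+)\.wav$")


def utterance_and_speaker(path: str):
    """Map a segment path to (utterance ID, speaker ID)."""
    match = _SEGMENT_RE.match(path)
    if match is None:
        raise ValueError(f"not a segment file name: {path}")
    show, times, speaker = match.groups()
    # utt2spk takes everything before '---' as the speaker ID.
    speaker_id = f"{show}-{speaker}"
    return f"{speaker_id}---{times}", speaker_id


if __name__ == "__main__":
    name = segment_filename("intervjuu201306211256", 0, 1234, "S0")
    print(name)
    print(utterance_and_speaker("build/audio/segmented/intervjuu201306211256/" + name))
```

Putting the speaker ID first in the utterance ID means every utterance ID begins with its speaker ID, so the `LC_ALL=C sort` in the log groups utterances by speaker.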

https://koodivaramu.eesti.ee/taltechnlp/kaldi-offline-transcriber/-/issues/3
segmentation of phones?
2014-08-08T14:42:27Z
TANEL ALUMÄE
*Created by: drokia2*
In the readme it says that the kaldi-offline-transcriber does speech segmentation. Does it do this at the phone level? If so, how can I get it to output phones when I input an mp3 or other sound file? Thanks. I dug around for a bit, but it seemed that it only outputs the words of the speech.