Interview
Project 295B
Genesys Project
Dictionary
12/20/10 : Project Start (Location : VisualHTK2 - Alphabet)
hmm9
test.scp
monophones1
dict
wdnet
te-cue+dict_MA.mfc
Include “HTKLib” folder
Include all “.c” files from the folder
Add “#define ARCH "WIN32"” in “esignal.c”
Remove “HGraf.c”
Remove “HGraf.null.c”
Program argument : “HVite -H hmm9/macros -H hmm9/hmmdefs -S test.scp -i recout_m.mlf -w wdnet -n 10 10 -p 0.0 -s 5.0 dict monophones1”
After these steps, the “math” problem goes away.
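In the HVite command above, -s 5.0 is the grammar scale factor and -p 0.0 is the word insertion log probability. As the HTK Book describes these options, the language-model log probability is multiplied by the scale factor and the insertion penalty is added at each word end. A minimal sketch of that combination; word_end_score is a hypothetical name, not an HTK function:

```c
#include <assert.h>

/* Hypothetical helper: combined log score at a word end, following the
   HTK Book description of HVite's -s (grammar scale) and -p (word
   insertion log probability) options. Not HTK's actual code. */
static double word_end_score(double acoustic_ll, double lm_logprob,
                             double gram_scale, double ins_penalty)
{
    assert(gram_scale >= 0.0);  /* a negative LM scale makes no sense */
    return acoustic_ll + gram_scale * lm_logprob + ins_penalty;
}
```

With -p 0.0 the penalty has no effect; a negative value makes each extra word more expensive and so discourages insertions.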
DoRecognition() in HVite.c
Recognition batch file : recog.bat
Monophones
hmm9, test.scp, wdnet, dict, monophones1, te-*.mfc
hmm9, monophones1 : one time job
test.scp : updated by genscrs.exe
dict : updated by step_03.bat
te-*.mfc : updated by step_08.bat
Triphones
hmm15, test.scp, rdnet, dict, tiedlist, te-*.mfc
genscrs.exe : Reads “.wav” files in a directory and generates the following files :
“myprompts.txt”
“codetr.scp”
“codete.scp”
“train.scp”
“test.scp”
“adapt.scp”
“promptsadapt.txt”
“promptstest.txt”
“lmtext.txt”
“labfile.txt”
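genscrs.exe itself is not shown in these notes. Assuming “codetr.scp” follows the HTK Book tutorial layout, where each line pairs a source “.wav” file with its target “.mfc” file for HCopy, the per-file step could look like this. make_code_line is a hypothetical name, and the directory listing is left out:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build one HCopy script line "<wav> <mfc>" by
   swapping the ".wav" extension for ".mfc". Returns 0 on success,
   -1 if the name does not end in ".wav" or the buffer is too small. */
static int make_code_line(const char *wav, char *out, size_t outsz)
{
    size_t n;
    int written;
    assert(wav != NULL && out != NULL);
    n = strlen(wav);
    if (n < 4 || strcmp(wav + n - 4, ".wav") != 0)
        return -1;
    /* full wav name, a space, then the stem with the new extension */
    written = snprintf(out, outsz, "%s %.*s.mfc", wav, (int)(n - 4), wav);
    if (written < 0 || (size_t)written >= outsz)
        return -1;
    return 0;
}
```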
step_00.bat :
perl prompts2wlist myprompts.txt wlist
addmissingwords.exe : Opens “wlist” and “missingwords”, and updates “wlist”
out : “wlist”
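Assuming “wlist” and “missingwords” are both sorted in byte order (the HTK Book tutorial builds “wlist” as a sorted list of unique words), the update done by addmissingwords.exe amounts to a merge that drops duplicates. A sketch of that core step on in-memory arrays; file I/O is omitted and merge_wordlists is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical core of addmissingwords.exe: merge two sorted word
   arrays into `out`, dropping duplicates so the result stays sorted
   and unique. Returns the number of words written; `out` must have
   room for na + nb entries. */
static size_t merge_wordlists(const char **a, size_t na,
                              const char **b, size_t nb,
                              const char **out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na || j < nb) {
        const char *next;
        if (j >= nb || (i < na && strcmp(a[i], b[j]) <= 0))
            next = a[i++];
        else
            next = b[j++];
        /* skip a word equal to the one just written */
        if (k == 0 || strcmp(out[k - 1], next) != 0)
            out[k++] = next;
    }
    assert(k <= na + nb);
    return k;
}
```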
step_01.bat :
out : “dlog”, “dict”, “monophones1”, “dict_nosp”

HDMan -m -w wlist -g global.ded -n monophones1 -l dlog dict cmudict.0.7a_htk_100418 names

HDMan -m -w wlist -n monophones1 -l dlog dict_nosp cmudict.0.7a_htk_100418 names
htkbook : Step 2 - the Dictionary
step_02.bat :
step_03.bat :
step_04.bat :
step_05.bat :
step_06.bat :
step_07.bat :
step_08.bat :
step_09.bat :
step_10.bat :
step_11.bat :
step_12.bat :
step_12.exe
out : “hmm4”

HCompV -C config -f 0.01 -m -S trans.scp -M hmm0 proto
htkbook : Step
step_13.bat :
step_14.bat :
out : “hmm6”, “hmm7”

HERest -A -D -T 1 -C config -I phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm5/macros -H hmm5/hmmdefs -M hmm6 monophones1
step_15.bat : add “silence sil” in “dict”
step_16.bat :
step_17.bat :
out : “hmm8”, “hmm9”

HERest -C config -I aligned.mlf -S train.scp -H hmm7/macros -H hmm7/hmmdefs -M hmm8 monophones1

HERest -C config -I aligned.mlf -t 250.0 150.0 2000.0 -S train.scp -H hmm8/macros -H hmm8/hmmdefs -M hmm9 monophones1
htkbook : Step
step_18.bat :
step_19.bat :
out : “result_m”

HResults -I testscript.mlf monophones1 recout_m.mlf > result_m
htkbook : Step
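HResults reports word-level percent correct and accuracy. Using the HTK Book definitions, with N reference words, D deletions, S substitutions and I insertions: H = N - D - S, %Corr = 100 * H / N and Acc = 100 * (H - I) / N. A small sketch of that arithmetic; hresults_scores and Scores are hypothetical names:

```c
#include <assert.h>

/* Word-level scores as defined for HResults in the HTK Book.
   n: reference words, d: deletions, s: substitutions, i: insertions. */
typedef struct {
    double corr;  /* %Corr: ignores insertions */
    double acc;   /* Acc: also penalises insertions */
} Scores;

static Scores hresults_scores(int n, int d, int s, int i)
{
    Scores r;
    int h;
    assert(n > 0 && d >= 0 && s >= 0 && i >= 0 && d + s <= n);
    h = n - d - s;                     /* correctly recognised words */
    r.corr = 100.0 * h / n;
    r.acc = 100.0 * (h - i) / n;
    return r;
}
```

For example, N = 200, D = 4, S = 6, I = 10 gives %Corr = 95.0 and Acc = 90.0.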
step~19.bat
step~19nt.bat : no HCopy
step_20.bat :
step_21.bat :
out : “mktri.hed”

perl maketrihed monophones1 triphones1
htkbook : Step
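In the HTK Book tutorial, maketrihed writes “mktri.hed” as a CL command that clones the monophones into the triphone list, followed by one TI command per monophone that ties the transition matrices of all models sharing that centre phone. A C sketch producing the same kind of output; format_ti_line and write_mktri_hed are hypothetical names, not part of HTK:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one TI command tying the transition matrices of every
   triphone and biphone whose centre phone is `ph`. */
static int format_ti_line(char *buf, size_t bufsz, const char *ph)
{
    int n = snprintf(buf, bufsz, "TI T_%s {(*-%s+*,%s+*,*-%s).transP}\n",
                     ph, ph, ph, ph);
    return (n >= 0 && (size_t)n < bufsz) ? 0 : -1;
}

/* Write the whole edit script: CL first, then one TI per monophone. */
static int write_mktri_hed(FILE *fp, const char *triphone_list,
                           const char **phones, int nphones)
{
    char line[256];
    int k;
    assert(fp != NULL && strlen(triphone_list) > 0);
    fprintf(fp, "CL %s\n", triphone_list);
    for (k = 0; k < nphones; k++) {
        if (format_ti_line(line, sizeof line, phones[k]) != 0)
            return -1;  /* phone name too long for the buffer */
        fputs(line, fp);
    }
    return 0;
}
```

Called with “triphones1” and the single phone “ah”, it writes CL triphones1 followed by TI T_ah {(*-ah+*,ah+*,*-ah).transP}.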
step_22.bat :
out : “hmm10”

HHEd -H hmm9/macros -H hmm9/hmmdefs -M hmm10 mktri.hed monophones1
htkbook : Step
step_23.bat :
out : “hmm11”, “hmm12”

HERest -C config -I wintri.mlf -t 250.0 150.0 1000.0 -s stats -S train.scp -H hmm10/macros -H hmm10/hmmdefs -M hmm11 triphones1

HERest -C config -I wintri.mlf -t 250.0 150.0 1000.0 -s stats -S train.scp -H hmm11/macros -H hmm11/hmmdefs -M hmm12 triphones1
htkbook : Step
step_24.bat :
step_25.bat :
step_26.bat :
out : “fulllist”, “flog”, “cmudict-tri”

HDMan -m -w wlist -b sp -n fulllist -g global.ded -l flog cmudict-tri cmudict.0.7a_htk_091207t names
htkbook : Step 10 - Making Tied-State Triphones
step_27.bat :
step_28.bat :
out : “hmm13”, “log”, “tiedlist”, “trees”

HHEd -B -H hmm12/macros -H hmm12/hmmdefs -M hmm13 tree.hed triphones1 > log
htkbook : Step 10 - Making Tied-State Triphones
step_29.bat :
out : “hmm14”, “hmm15”

HERest -C config -I wintri.mlf -t 250.0 150.0 1000.0 -s hmm14/stats -S train.scp -H hmm13/macros -H hmm13/hmmdefs -M hmm14 tiedlist

HERest -C config -I wintri.mlf -t 250.0 150.0 1000.0 -s hmm15/stats -S train.scp -H hmm14/macros -H hmm14/hmmdefs -M hmm15 tiedlist
htkbook : Step
step_30g.bat :
step_31.bat :
step~31g.bat
step_32.bat :
step_33.bat :
step_34.bat :
Align – Extra alignment information for state/model-level traceback
me = se->spdf.cpdf+1;
me->mpdf->ckind = 1; (?)
m->id->name : HMM name
int pri->nToks; – Maximum tokens to propagate (0==1)
Boolean pri->models; – Keep track of model history
int pri->id; – Unique observation identifier
pri->genMaxNode; – Most likely node in network
pri->wordMaxNode; – Most likely word end node in network
pri->genMaxTok; – Most likely token
pri->wordMaxTok; – Most likely word end token
pri->obs; – Current observation
pri->pNoRef; – Head of PathNoRef linked list?
pri->pYesRef; – Head of PathYesRef linked list?
pri->npth; – Current number of path records
pri->net->final.inst->exit – The recognized final node (used for traceback)
Token – Tokens are reasonably standard except for the extra Align field; apart from its likelihood scores, a Token only holds Path info and the Align field.
NetNode vri->genMaxNode – FINAL RESULT
Token vri->genMaxTok – FINAL RESULT
NetNode *genMaxNode; – Most likely node in network
NetNode *wordMaxNode; – Most likely word end node in network
Token genMaxTok; – Most likely token
Token wordMaxTok; – Most likely word end token
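The recognized word sequence is recovered by following path records backwards from the best final token. The sketch below uses a hypothetical, much reduced SimplePath record rather than HRec's real Path struct (which carries more fields), to show only the traceback idea; trace_back is also a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for HRec's Path record:
   each word end creates a record that points back to the previous one. */
typedef struct SimplePath {
    const char *word;         /* word recognised at this word end */
    int frame;                /* frame at which the word ended */
    struct SimplePath *prev;  /* previous word end, NULL at the start */
} SimplePath;

/* Walk prev links from the best final token's path and return the
   words in time order. Returns the word count (at most max). */
static size_t trace_back(const SimplePath *last, const char **words,
                         size_t max)
{
    size_t n = 0, i;
    const SimplePath *p;
    assert(words != NULL);
    for (p = last; p != NULL && n < max; p = p->prev)
        words[n++] = p->word;          /* collected newest first */
    for (i = 0; i < n / 2; i++) {      /* reverse into time order */
        const char *t = words[i];
        words[i] = words[n - 1 - i];
        words[n - 1 - i] = t;
    }
    return n;
}
```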
HMMSet hset; the HMM set (in HRec.c: pri->psi->hset)
Vocab vocab; the dictionary.
Lattice *wdNet; the word level recognition network.
PSetInfo *psi; Private data used by HRec.
void ConvDiagC(HMMSet *hset, Boolean convData); – in Initialise(); now ignored?
LogFloat DOutP(Vector x, int vecSize, MixPDF *mp); : Log prob of x in given mixture - Diagonal Case
SVector GetMean(HMMSet *hset, Source *src, Token *tok);
MixPDF *GetMixPDF(HMMSet *hset, Source *src, Token *tok); : mp->ckind = DIAGC; hset->hsKind = SHAREDHS;
SVector GetVariance(HMMSet *hset, Source *src, Token *tok);
LogFloat IDOutP(Vector x, int vecSize, MixPDF *mp); : Log prob of x in given mixture - Inverse Diagonal Case
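For a diagonal-covariance Gaussian, log N(x) = -0.5 * (gConst + sum_i (x_i - mu_i)^2 / var_i), where gConst = n*log(2*pi) + sum_i log(var_i) can be computed once per mixture. When the variances are stored inverted (the inverse-diagonal case, which ConvDiagC presumably prepares when convData is set), the division becomes a multiplication. A sketch with hypothetical names, 1-indexed like HTK vectors, taking gConst as an argument so no math library is needed:

```c
#include <assert.h>

/* Diagonal case: var[1..n] holds variances; gconst is
   n*log(2*pi) + sum(log var[i]), precomputed by the caller. */
static double diag_log_prob(const float *x, const float *mean,
                            const float *var, int n, double gconst)
{
    double sum = gconst, d;
    int i;
    for (i = 1; i <= n; i++) {
        assert(var[i] > 0.0f);
        d = (double)x[i] - mean[i];
        sum += d * d / var[i];
    }
    return -0.5 * sum;
}

/* Inverse-diagonal case: ivar[1..n] holds 1/variance, so each term
   is a multiplication instead of a division. gconst is unchanged. */
static double invdiag_log_prob(const float *x, const float *mean,
                               const float *ivar, int n, double gconst)
{
    double sum = gconst, d;
    int i;
    for (i = 1; i <= n; i++) {
        d = (double)x[i] - mean[i];
        sum += d * d * ivar[i];
    }
    return -0.5 * sum;
}

/* One-off conversion from variances to inverse variances. */
static void invert_variances(float *var, int n)
{
    int i;
    for (i = 1; i <= n; i++) {
        assert(var[i] > 0.0f);
        var[i] = 1.0f / var[i];
    }
}
```

For a hardware port the inverse form is attractive because it needs a multiplier rather than a divider per dimension.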
InitPronHolders(); : First creates context arrays and pronunciation instances
PrintChain(net, hset); - ExpandWordNet → PrintChain → PrintNode
Wave2FBank – Perform filterbank analysis on speech
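Inside a filterbank analysis such as Wave2FBank, each FFT magnitude is split between two neighbouring triangular filters. Assuming the channel map and weights have already been computed (in HTK, an FBankInfo structure holds similar tables), the binning step can be sketched as follows; bin_spectrum and its parameters are hypothetical, and bins outside the filter range are assumed not to be passed in:

```c
#include <assert.h>

/* Simplified binning step of a triangular filterbank. For FFT bin k
   (1-indexed), lo_chan[k] is the lower of the two filters it falls
   between (0 = below the first filter) and lo_wt[k] is the share of
   the bin's magnitude that filter receives; the upper filter gets the
   remainder. fbank[1..nchans] is overwritten with the channel sums. */
static void bin_spectrum(const float *mag, int nbins,
                         const int *lo_chan, const float *lo_wt,
                         float *fbank, int nchans)
{
    int k, c;
    for (c = 1; c <= nchans; c++)
        fbank[c] = 0.0f;
    for (k = 1; k <= nbins; k++) {
        float lower;
        c = lo_chan[k];
        assert(c >= 0 && c <= nchans);
        lower = lo_wt[k] * mag[k];
        if (c > 0)
            fbank[c] += lower;            /* falling edge of filter c */
        if (c < nchans)
            fbank[c + 1] += mag[k] - lower; /* rising edge of c + 1 */
    }
}
```

Log compression, and for MFCCs a DCT, follow this step.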
110518 : vcore_platform done
110518 : vcore is done (slimmest EDK core)
110426 : HCopy() porting…
110420 : IModule done.
110409 : One phase of DOutP succeeded.
110407 : Memory Unit Verilog implementation
110407 : Progress Report
110401 : CoreGen FPO
110331 : Struggling with floating point units.
110330 : FPGA design for (ONE~ZERO)
110329 : [PHASE3] eASRfpga (ONE~ZERO) project done : all results identical to HTK 3.4.1
110328 : eASRfpga project
110326 : Analyzing HNet.c again.
110325 : sphinx3 installed and tested.
110321 : [PHASE2] EDK 11.4 test done with AC97 recording / playback
110305 : Windows 7 update
110302 : Genesys board shipped
110225 : viterbi.c
110221 : eRemote project
110219 : SD-el project, easr has timestamped.
110216 : htk_dict_cmu modified to today's version (110216) to reduce the unrealistic chance of having many pronunciations
110216 : TIMIT is being tested / steps are reviewed.
110214 : Steps Review
110212 : [PHASE1] LG TV remote done; easrs skeleton project done.
110202 : New Porting
eVitep - hvite.c (done) / hrec.c
110128 : Path
110127 : vri->genTok, vri->genNode;
110122 : eVite2 porting - started
110118 : eVite : Viterbi Token Propagation
101223 : “recog.bat”
101223 : Procedure updates
Loaded speech data (observation) is stored in “pri->obs->fv[1][1]…[13]”.
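HTK vectors are 1-indexed, which is why the features start at fv[1][1]. As far as I recall, HTK's CreateVector allocates one extra slot and stores the vector length in element 0; the sketch below mimics that layout with hypothetical names (Vec, vec_create, vec_size) and should not be taken as HTK's exact code:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical 1-indexed float vector in the style of HTK's Vector:
   slot 0 stores the length as an int, elements live in v[1]..v[n]. */
typedef float *Vec;

static Vec vec_create(int n)
{
    Vec v;
    assert(n > 0 && sizeof(int) <= sizeof(float));
    v = malloc((size_t)(n + 1) * sizeof(float));
    if (v != NULL)
        *(int *)v = n;  /* length kept in element 0 */
    return v;
}

static int vec_size(Vec v)
{
    return *(int *)v;
}
```

A 13-dimensional stream such as fv[1] would then be created with vec_create(13) and read as v[1] through v[13].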
What is a 'pdf', and what impact does it have on the HMM calculation?
How many utterances are needed for speaker-independent ASR?
How speaker-dependent is speaker-dependent ASR?
How can we measure how well the samples are trained? Is there some kind of distance measure we can use for that?
How could we determine whether the system is well trained or not?
Which phonemes need more training?
Eliot Lee : Senior Researcher (dr.eliot@gmail.com)