Interview

Project 295B

Genesys Project

Dictionary

Alphabet Project

12/20/10 : Project Start (Location : VisualHTK2 - Alphabet)

Ingredients

  • prompts2wlist
  • missingwords
  • cmudict.0.7.a_htk_100410
  • names
  • gram
  • prompts2mlf
  • mkphones0.led
  • config_wav
  • config
  • proto
  • sil.hed
  • mlf2textgrid
  • tr-*.wav
  • te-*.wav
  • mktri.led
  • maketrihed
  • mkclscript.prl
  • tree.hed1
  • tree.hed2
  • cmudict.0.7.a_htk_091207t

HVite ingredients

  • hmm9
  • test.scp
  • monophones1
  • dict
  • wdnet
  • te-cue+dict_MA.mfc

Project Settings

  • Include the “HTKLib” folder
  • Include all “.c” files from the folder
  • Add “#define ARCH "WIN32"” in “esignal.c”
  • Remove “HGraf.c”
  • Remove “HGraf.null.c”
  • Program argument : “HVite -H hmm9/macros -H hmm9/hmmdefs -S test.scp -i recout_m.mlf -w wdnet -n 10 10 -p 0.0 -s 5.0 dict monophones1”
  • After these steps, the “math” build problem will be gone.
  • DoRecognition() in HVite.c
  • Recognition batch file : recog.bat
    • Monophones
      • hmm9, test.scp, wdnet, dict, monophones1, te-*.mfc
        • hmm9, monophones1 : one time job
        • test.scp : updated by genscrs.exe
        • dict : updated by step_03.bat
        • te-*.mfc : updated by step_08.bat
    • Triphones
      • hmm15, test.scp, rdnet, dict, tiedlist, te-*.mfc
        • hmm15, tiedlist : one time job

Procedure

  1. genscrs.exe : Reads the “.wav” files in a directory and generates the following files :
    • “myprompts.txt”
    • “codetr.scp”
    • “codete.scp”
    • “train.scp”
    • “test.scp”
    • “adapt.scp”
    • “promptsadapt.txt”
    • “promptstest.txt”
    • “lmtext.txt”
    • “labfile.txt”
  2. step_00.bat :
    • perl prompts2wlist myprompts.txt wlist
    • addmissingwords.exe : Open “wlist” and “missingwords”, and update “wlist”
    • out : “wlist”
  3. step_01.bat :
    • out : “dlog”, “dict”, “monophones1”, “dict_nosp”
    • :!: HDMan -m -w wlist -g global.ded -n monophones1 -l dlog dict cmudict.0.7a_htk_100418 names
    • :!: HDMan -m -w wlist -n monophones1 -l dlog dict_nosp cmudict.0.7a_htk_100418 names
    • htkbook : Step 2 - the Dictionary
  4. step_02.bat :
    • step03.exe : Add “SENT-END”, “SENT-START”
    • out : “dict”
  5. step_03.bat :
    • addenter.exe : Add “!ENTER”, “!EXIT” in wlist and dict
    • out : “wlist”, “bg.wdnet”, “bigfn”, “dict”, “wdnet”
    • :!: HParse gram wdnet
    • htkbook : Step 1 - the Task Grammar
  6. step_04.bat :
  7. step_05.bat :
  8. step_06.bat :
    • out : “testscript.mlf”, “words.mlf”
    • :!: HLEd -l '*' -d dict -i phones0.mlf mkphones0.led words.mlf
    • htkbook : Step 4 - Creating the Transcription Files
  9. step_07.bat :
    • out : “phones0.mlf”
  10. step_08.bat :
    • codetr.scp
    • codete.scp
    • out : “*.mfc”
    • :!: HCopy -T 1 -C config -S codetr.scp
    • htkbook : Step 5 - Coding the Data
  11. step_09.bat :
    • out : “hmm0/proto”, “hmm0/vFloors”
    • :!: HCompV -C config -f 0.01 -m -S trans.scp -M hmm0 proto
    • htkbook : Step 6 - Creating Flat Start Monophones
  12. step_10.bat :
    • step_10.exe
    • out : “monophones1”, “monophones0”, “hmmdefs”, “macros”
    • :!: HCompV -C config -f 0.01 -m -S trans.scp -M hmm0 proto
    • htkbook : Step
  13. step_11.bat :
    • out : “hmm1”, “hmm2”, “hmm3”
    • :!: HERest -A -D -T 1 -C config -I phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 monophones0
    • htkbook : Step
  14. step_12.bat :
    • step_12.exe
    • out : “hmm4”
    • :!: HCompV -C config -f 0.01 -m -S trans.scp -M hmm0 proto
    • htkbook : Step
  15. step_13.bat :
    • out : “hmm5”
    • :!: HHEd -H hmm4/macros -H hmm4/hmmdefs -M hmm5 sil.hed monophones1
    • htkbook : Step 7 - Fixing the Silence Models
  16. step_14.bat :
    • out : “hmm6”, “hmm7”
    • :!: HERest -A -D -T 1 -C config -I phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm5/macros -H hmm5/hmmdefs -M hmm6 monophones1
  17. step_15.bat : add “silence sil” in “dict”
    • step_15.exe
    • out : “dict”
  18. step_16.bat :
    • out : “aligned.mlf”
    • :!: HVite -o SWT -b silence -C config -a -H hmm7/macros -H hmm7/hmmdefs -i aligned.mlf -m -t 250.0 -y lab -I words.mlf -S train.scp dict monophones1
    • htkbook : Step 8 - Realigning the Training Data
  19. step_17.bat :
    • out : “hmm8”, “hmm9”
    • :!: HERest -C config -I aligned.mlf -S train.scp -H hmm7/macros -H hmm7/hmmdefs -M hmm8 monophones1
    • :!: HERest -C config -I aligned.mlf -t 250.0 150.0 2000.0 -S train.scp -H hmm8/macros -H hmm8/hmmdefs -M hmm9 monophones1
    • htkbook : Step
  20. step_18.bat :
    • out : “recout_m.mlf”, “recout_mp.mlf”, “recout_ms.mlf”, “recout_m.mlf.txt”, “recout_mp.mlf.txt”, “recout_ms.mlf.txt”
  21. step_19.bat :
    • out : “result_m”
    • :!: HResults -I testscript.mlf monophones1 recout_m.mlf > result_m
    • htkbook : Step
  22. step~19.bat
  23. step~19nt.bat : no HCopy
  24. step_20.bat :
    • out : “wintri.mlf”, “triphones1”
    • :!: HLEd -n triphones1 -i wintri.mlf mktri.led aligned.mlf
    • htkbook : Step
  25. step_21.bat :
    • out : “mktri.hed”
    • :!: perl maketrihed monophones1 triphones1
    • htkbook : Step
  26. step_22.bat :
    • out : “hmm10”
    • :!: HHEd -H hmm9/macros -H hmm9/hmmdefs -M hmm10 mktri.hed monophones1
    • htkbook : Step
  27. step_23.bat :
    • out : “hmm11”, “hmm12”
    • :!: HERest -C config -I wintri.mlf -t 250.0 150.0 1000.0 -s stats -S train.scp -H hmm10/macros -H hmm10/hmmdefs -M hmm11 triphones1
    • :!: HERest -C config -I wintri.mlf -t 250.0 150.0 1000.0 -s stats -S train.scp -H hmm11/macros -H hmm11/hmmdefs -M hmm12 triphones1
    • htkbook : Step
  28. step_24.bat :
    • out : “tree.hed”
    • :!: perl mkclscript.prl TB 350 monophones1 > tree.hed
    • htkbook : Step 10 - Making Tied-State Triphones
  29. step_25.bat :
    • step_25.exe : Merge “tree.hed1” and “tree.hed2” to “tree.hed”
    • out : tree.hed
    • htkbook : Step
  30. step_26.bat :
    • out : fulllist, flog, cmudict-tri
    • :!: HDMan -m -w wlist -b sp -n fulllist -g global.ded -l flog cmudict-tri cmudict.0.7a_htk_091207t names
    • htkbook : Step 10 - Making Tied-State Triphones
  31. step_27.bat :
    • step_27.exe : Merge “fulllist” and “triphones1” into “fulllist”.
    • htkbook : Step
  32. step_28.bat :
    • out : “hmm13”, “log”, “tiedlist”, “trees”
    • :!: HHEd -B -H hmm12/macros -H hmm12/hmmdefs -M hmm13 tree.hed triphones1 > log
    • htkbook : Step 10 - Making Tied-State Triphones
  33. step_29.bat :
    • out : “hmm14”, “hmm15”
    • :!: HERest -C config -I wintri.mlf -t 250.0 150.0 1000.0 -s hmm14/stats -S train.scp -H hmm13/macros -H hmm13/hmmdefs -M hmm14 tiedlist
    • :!: HERest -C config -I wintri.mlf -t 250.0 150.0 1000.0 -s hmm15/stats -S train.scp -H hmm14/macros -H hmm14/hmmdefs -M hmm15 tiedlist
    • htkbook : Step
  34. step_30g.bat :
    • out : “recout_t.mlf”, “recout_tp.mlf”, “recout_ts.mlf”
    • :!: HVite -T 1 -H hmm15/macros -H hmm15/hmmdefs -S test.scp -i recout_t.mlf -s 5.0 -p 0.0 -n 5 5 -w wdnet dict tiedlist
    • htkbook : Step
  35. step_31.bat :
  36. step~31g.bat
  37. step_32.bat :
  38. step_33.bat :
  39. step_34.bat :
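
The first step of the procedure, generating the script files from the recorded “.wav” files, can be sketched as follows. This is only an illustration of the idea and not the genscrs.exe source: the function name make_scripts is hypothetical, and it assumes that training and test recordings are told apart by the “tr-” and “te-” prefixes from the ingredient list. The one fact taken from HTK itself is that an HCopy script file holds one “source target” pair per line, while the training and test scripts list the coded files only.

```python
def make_scripts(wav_names):
    """Build HTK script-file contents from a list of wav file names (hypothetical helper).

    Returns a dict mapping script file name to its list of lines.
    """
    scripts = {"codetr.scp": [], "codete.scp": [], "train.scp": [], "test.scp": []}
    for wav in sorted(wav_names):
        if not wav.endswith(".wav"):
            continue                          # ignore anything that is not a recording
        mfc = wav[:-len(".wav")] + ".mfc"     # coded feature file produced by HCopy
        if wav.startswith("tr-"):
            scripts["codetr.scp"].append(f"{wav} {mfc}")   # HCopy: source target
            scripts["train.scp"].append(mfc)
        elif wav.startswith("te-"):
            scripts["codete.scp"].append(f"{wav} {mfc}")
            scripts["test.scp"].append(mfc)
    return scripts


example = make_scripts(["tr-one.wav", "te-one.wav", "tr-two.wav", "readme.txt"])
for name, lines in example.items():
    print(name, lines)
```

Writing each list to disk with one entry per line gives files that step_08.bat (HCopy) and the later HERest and HVite steps could consume.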

API Params

A

  • Align – Extra alignment information for state/model level traceback

H

  • hci→xc ; /* Number of cross word contexts */
    • hss.mp→ckind;

I

  • LogFloat inst→wdlk; Max likelihood of t=0 path to word end node;
  • NetNode inst→node;
  • inst→node→type;
  • inst→node→info.pron→word→wordName→name; : WORD NAME

L

M

N

O

P

  • Path path;
  • PRecInfo pri;
    • int pri→nToks; Maximum tokens to propagate (0==1)
    • Boolean pri→models; Keep track of model history;
    • int pri→id; Unique observation identifier
    • pri→genMaxNode; – Most likely node in network
    • pri→wordMaxNode; – Most likely word end node in network
    • pri→genMaxTok; – Most likely token
    • pri→wordMaxTok; – Most likely word end token
    • pri→obs; –;
    • pri→pNoRef; – Head of PathNoRef linked list?
    • pri→pYesRef; – Head of PathYesRef linked list?
    • pri→npth; – Current number of path records
    • pri→net→final.inst→exit – ⇐ This is the recognized final node. (Used for tracing back)
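
The note on tracing back from the final node can be illustrated with a simplified model of the path records. This is a sketch under assumptions, not the HRec.c data structure: PathRecord is a hypothetical class that keeps only a word label and a link to the previous record, whereas the real Path also carries likelihoods and node pointers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PathRecord:
    """Simplified stand-in for one word-level path record (hypothetical)."""
    word: str
    prev: Optional["PathRecord"] = None   # record of the preceding word, if any


def trace_back(final: Optional[PathRecord]) -> List[str]:
    """Follow prev links from the final record and return words in spoken order."""
    words = []
    rec = final
    while rec is not None:
        words.append(rec.word)
        rec = rec.prev
    words.reverse()                        # links run from last word to first
    return words


first = PathRecord("ONE")
second = PathRecord("TWO", prev=first)
last = PathRecord("THREE", prev=second)
print(trace_back(last))
```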

R

S

T

  • Token ; – Tokens are reasonably standard except for the extra Align field; otherwise a Token only has Path info.
  • TokenSet *res, cmp, *cur;

V

  • Vocab vocab;
  • NetNode vri→genMaxNode – FINAL RESULT
  • Token vri→genMaxTok – FINAL RESULT
  • NetNode *genMaxNode; – Most likely node in network
  • NetNode *wordMaxNode; – Most likely word end node in network
  • Token genMaxTok; – Most likely token
  • Token wordMaxTok; – Most likely word end token

W

Global Variables

  • Observation obs; current observation.
  • HMMSet hset; the HMMset; in HRec.c, pri→psi→hset;
  • Vocab vocab; the dictionary.
  • Lattice *wdNet; the word level recognition network.
  • PSetInfo *psi; Private data used by HRec.
    • psi→hset;
  • VRecInfo *vri; Visible HRec Info.
    • vri→frame;

API Functions

A

B

C

D

  • Dispose (MemHeap *x, void *p); Free item p from memory heap x
  • LogFloat DOutP (Vector x, int vecSize, MixPDF *mp); : Log prob of x in given mixture - Diagonal Case

E

F

G

  • SVector GetMean (HMMSet *hset, Source *src, Token *tok);
  • MixPDF *GetMixPDF (HMMSet *hset, Source *src, Token *tok); mp→ckind = DIAGC; hset→hsKind = SHAREDHS;
  • SVector GetVariance (HMMSet *hset, Source *src, Token *tok);

I

  • LogFloat IDOutP(Vector x, int vecSize, MixPDF *mp); Log prob of x in given mixture - Inverse Diagonal Case
  • InitPronHolders(); First create context arrays and pronunciation instances;
  • Initialise(); Set up global data structure;
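
The quantity computed by DOutP and IDOutP, listed above, is the log density of a Gaussian with a diagonal covariance. The sketch below shows the arithmetic under stated assumptions: it is plain Python rather than the HTK code, the names g_const, d_outp and id_outp are hypothetical, and the constant term is recomputed on every call here, whereas a real implementation would normally store it with the mixture.

```python
import math


def g_const(variances):
    """Constant part of the density: n*log(2*pi) + sum(log(var_i))."""
    return len(variances) * math.log(2.0 * math.pi) + sum(math.log(v) for v in variances)


def d_outp(x, mean, var):
    """Log prob of x under a diagonal Gaussian given its variances."""
    total = g_const(var)
    for xi, mi, vi in zip(x, mean, var):
        total += (xi - mi) ** 2 / vi
    return -0.5 * total


def id_outp(x, mean, inv_var):
    """Same log prob when inverse variances are stored, so no division per frame."""
    total = g_const([1.0 / iv for iv in inv_var])
    for xi, mi, ivi in zip(x, mean, inv_var):
        total += (xi - mi) ** 2 * ivi
    return -0.5 * total


x, mean, var = [1.0, -0.5], [0.0, 0.0], [2.0, 0.5]
print(d_outp(x, mean, var))
print(id_outp(x, mean, [1.0 / v for v in var]))
```

Both calls print the same value; storing inverse variances only trades a division for a multiplication in the inner loop.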

M

P

R

S

T

W

  • Wave2FBank – Perform filterbank analysis on speech
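
Filterbank analysis places its channels evenly on the mel scale rather than in Hz. The sketch below shows only that placement step and is not a port of Wave2FBank: the function names are hypothetical, the mel formula 1127 * ln(1 + f/700) is the commonly used form, and the channel count and band edges in the example are arbitrary assumptions.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels."""
    return 1127.0 * math.log(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def filter_centres(num_chans, lo_hz, hi_hz):
    """Centre frequencies (Hz) of num_chans filters spaced evenly in mels."""
    lo_mel, hi_mel = hz_to_mel(lo_hz), hz_to_mel(hi_hz)
    step = (hi_mel - lo_mel) / (num_chans + 1)   # band edges are not centres
    return [mel_to_hz(lo_mel + step * k) for k in range(1, num_chans + 1)]


centres = filter_centres(26, 0.0, 8000.0)
print([round(c) for c in centres])
```

The centres are close together at low frequencies and spread out at high frequencies, which is the point of the mel warping.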

HTK Tools

Progress

  • 110518 : vcore_platform done
  • 110518 : vcore is done (slimmest edk core)
  • 110502 : HCopy() done 110502_ehcopy_feature_done.zip
  • 110426 : HCopy() porting…
  • 110420 : IModule done.
  • 110409 : One phase of DOutP success.
  • 110407 : Memory Unit Verilog implementation
  • 110407 : Progress Report
  • 110401 : CoreGen FPO
  • 110331 : Struggling with floating point units.
  • 110330 : FPGA design for(ONE~ZERO)
  • 110329 : [PHASE3] eASRfpga (ONE~ZERO) project done : All same results with HTK3.4.1
  • 110328 : eASRfpga project
  • 110326 : HNet.c analyzing again.
  • 110325 : sphinx3 installed and tested.
  • 110321 : [PHASE2] EDK 11.4 test done with AC97 recording / playback;
  • 110308 : EDK studying (Genesys_BSB_Design)
  • 110305 : Windows 7 update
  • 110302 : Genesys board shipped
  • 110225 : Added a link to here from wikipedia page http://en.wikipedia.org/wiki/Viterbi_algorithm#Implementations
  • 110225 : viterbi.c
  • 110221 : eRemote project
  • 110219 : SD-el project, easr has timestamped.
  • 110216 : htk_dict_cmu_is modified to the version of today 110216 = reducing the unrealistic chance of having many pronunciations
  • 110216 : TIMIT is being tested / steps are reviewed.
  • 110214 : Steps Review
  • 110212 : [PHASE1] LG tv remote done; easrs skeleton project done;
  • 110207 : Review ReadLattice (), ExpandWordNet ();
  • 110205 : Porting : ReadObservation();
  • 110203 : Porting : StepWord2();
  • 110202 : New Porting eVitep - hvite.c (done) / hrec.c
  • 110128 : Path
  • 110127 : vri→genTok, vri→genNode;
  • 110126 : StepHMM1 (NetNode *node);
  • 110125 : StepInst2 (NetNode *node); CUE-SEE Project
  • 110124 : ExpandWordNet ();
  • 110123 : ReadLattice ();
  • 110122 : eVite2 porting - started
  • 110118 : eVite : Viterbi Token Propagation
  • 101223 : “recog.bat”
  • 101223 : Procedure updates

Research Topics

Glossary

A BC

H

I

M

S

T

V

Reports

References

Hall of Fame

FAQ

  • Loaded speech data (Observation) is stored in “pri→obs→fv[1][1]…[13]”
  • What is a 'pdf' and what impact does it have on the HMM calculation?
  • How many utterances are needed for speaker-independent ASR?
  • How speaker-dependent is a speaker-dependent ASR?
  • How can we measure how well trained the samples are? Is there some kind of distance that we can use for that?
  • How can we determine whether the system is well trained or not?
  • Which phonemes need more training?

News

Note

Members

  • Eliot Lee : Senior Researcher (dr.eliot@gmail.com)

295 Report

 
start.txt · Last modified: 2012/01/06 12:57 by admin
 
Except where otherwise noted, content on this wiki is licensed under the following license:CC Attribution-Share Alike 3.0 Unported