The Grand Janitor's Blog: March 2006

Well, here are a couple of nasty issues we have been tackling in legacy Sphinx 3:
1. Getting Sphinx 3 to work in some applications: first with the Galaxy/Communicator framework (David Huggins-Daines is behind that), then with the speech component we will contribute to the CALO Project (Yitao Sun is behind that). A lot of the recent check-ins are related to this.

2. For my part, I have been trying to make Sphinx 3 and the CMU-Cambridge LM Toolkit work with vocabularies of more than 65536 words. (The new limit is around 4 billion.) The Sphinx 3 work is complete. The CMU-Cambridge LM Toolkit, on the other hand, essentially needs an upgrade.
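The two numbers above are powers of two: 65536 is 2^16, the number of distinct values an unsigned 16-bit word ID can hold, and "around 4 billion" is 2^32. That suggests the word IDs were widened from 16 to 32 bits. The sketch below assumes exactly that (the post does not say how the IDs are stored) and uses Python's `struct` module to show where a fixed-width ID runs out; the helper `fits` is hypothetical.

```python
import struct

# Assumption: word IDs are stored as fixed-width unsigned integers.
# With the "<" prefix, struct uses standard sizes: "H" is unsigned
# 16-bit and "I" is unsigned 32-bit.
OLD_LIMIT = 2 ** 16   # 65536 distinct IDs (0 .. 65535)
NEW_LIMIT = 2 ** 32   # 4294967296, "around 4 billion"

def fits(word_id: int, fmt: str) -> bool:
    """Return True if word_id can be packed into the given struct field."""
    try:
        struct.pack("<" + fmt, word_id)
        return True
    except struct.error:
        return False

# The 65537th word (ID 65536) no longer fits in 16 bits, but fits in 32.
print(fits(OLD_LIMIT - 1, "H"))  # True
print(fits(OLD_LIMIT, "H"))      # False
print(fits(OLD_LIMIT, "I"))      # True
print(fits(NEW_LIMIT, "I"))      # False
```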

You may ask: is there still any work going on with CMU-Cambridge LM Toolkit V2?
Here is my answer: if you want something to happen, you just need to go ahead and change it. This is true in Sphinx 3 and it is also true in CMU-Cambridge LM Toolkit V2. I have gathered code from Dave, Prof. Yannick Esteve and a couple of other contributors. I definitely think some kind of alpha release will be out in the May to June time frame.

Let us see how it goes. :-)

Arthur

Sphinx 3.6 Release Candidate I is now released! You can find it under
"Latest File Releases" on CMU Sphinx's SourceForge project page:
http://sourceforge.net/projects/cmusphinx

Here are the release notes:

2006-03-22  Arthur Chan (archan@cs.cmu.edu) at Carnegie Mellon
University

Sphinx 3.6 Release Candidate I
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The corresponding SphinxTrain tag is
SPHINX3_6_CMU_INTERNAL_RELEASE, which can be checked out using
the command:
cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cmusphinx/ co -r SPHINX3_6_CMU_INTERNAL_RELEASE SphinxTrain

A Summary of Sphinx 3.6 RC I
----------------------------

Sphinx 3.6 is a gently refactored version of Sphinx 3.5. Our
goal is to further consolidate and unify the code bases in Sphinx 3.

Despite the modest goal, this release still contains several
interesting new features. Their details can be found in the
"New Features" section below. Here is a brief summary:

1. Further speed-up of CI-GMMS in the 4-level GMM Computation
Schemes (4LGC)
2. Multiple regression classes for MAP adaptation in SphinxTrain
3. Better support for using LMs in Sphinx 3.X
4. FSG search is now supported. This is adapted from Sphinx 2.
5. Support for full triphone search in flat-lexicon search
6. Some support for character sets other than ASCII. Models in
multiple languages, among them ones encoded in GB2312, are now tested in Sphinx 3.X.

We hope you enjoy this release candidate. In the future, we will
continue to improve the quality of CMU Sphinx and its related
software.

New Features
------------
-Speaker Adaptation:
a. Multiple (phoneme-based) regression classes are now supported.

-GMM Computation
a. Improvements to CI-GMMS are now incorporated.
  i. One can set an upper limit on the number of CD
  senones computed in each frame with -maxcdsenpf.
  ii. The best Gaussian index (BGI) is now stored and can
  be used as a mechanism to speed up GMM computation.
  iii. A tightening factor (-tighten_factor) is introduced to
  smooth between the fixed naive down-sampling technique and CI-GMMS.
b. Support for SCHMM and FCHMM
  i. decode fully supports computation of SCHMM in S3 format.
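CI-based GMM selection (CI-GMMS) uses cheap CI senone scores to decide which CD senones deserve a full GMM evaluation. The sketch below is a simplified illustration under stated assumptions, not Sphinx 3's implementation: CI phones scoring within a beam of the best are treated as active, CD senones of active phones are evaluated in full up to a per-frame cap (`max_cd_per_frame`, a hypothetical stand-in for -maxcdsenpf), and every other CD senone backs off to its parent CI score. The function name, the dictionary layout, and the rule of keeping the senones with the best parent scores when the cap is hit are all assumptions.

```python
from typing import Callable, Dict

def select_cd_scores(
    ci_scores: Dict[str, float],        # log-domain CI senone score per CI phone
    cd_parent: Dict[int, str],          # CD senone id -> its base CI phone
    score_cd: Callable[[int], float],   # full (expensive) GMM evaluation
    beam: float,                        # log-domain beam width (positive)
    max_cd_per_frame: int,              # hypothetical stand-in for -maxcdsenpf
) -> Dict[int, float]:
    """Score all CD senones for one frame, computing only a selected subset."""
    best = max(ci_scores.values())
    # CI phones whose score lies within the beam of the best are "active".
    active = {ph for ph, s in ci_scores.items() if s >= best - beam}
    # Candidates: CD senones of active phones, best parent CI score first,
    # so that the per-frame cap keeps the most promising ones.
    candidates = sorted(
        (sid for sid, ph in cd_parent.items() if ph in active),
        key=lambda sid: ci_scores[cd_parent[sid]],
        reverse=True,
    )
    computed = set(candidates[:max_cd_per_frame])
    scores: Dict[int, float] = {}
    for sid, ph in cd_parent.items():
        # Selected senones get a full score; the rest back off to the CI score.
        scores[sid] = score_cd(sid) if sid in computed else ci_scores[ph]
    return scores
```

Raising the cap trades speed for accuracy: with an unlimited cap, every CD senone of an active phone is computed exactly.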

-Language Model
a. Reading an LM in ARPA text format, in addition to the DMP
format, is now supported. Users now have the option of bypassing
lm3g2dmp.
b. The live decoding API now supports switching language models.
c. Class-based LMs are now fully supported. See also the "Bug fixes" section.
d. lm_convert is introduced and supersedes the
functionality of lm3g2dmp. lm_convert can convert an LM
from TXT format to DMP format and vice versa.
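For readers who have not looked inside an ARPA text LM: it starts with a `\data\` header that declares one `ngram N=count` line per order, followed by one `\N-grams:` section per order whose lines hold a log10 probability, the N words, and an optional log10 back-off weight, and it ends with `\end\`. The sketch below is a minimal reader for that layout, assuming well-formed input; it is not Sphinx 3's reader and does no error handling, and `read_arpa` is a hypothetical name. The binary DMP format is not covered.

```python
from typing import Dict, Optional, Tuple

NGram = Tuple[str, ...]
Table = Dict[NGram, Tuple[float, Optional[float]]]

def read_arpa(text: str) -> Tuple[Dict[int, int], Dict[int, Table]]:
    """Parse ARPA-format text into (declared counts, n-gram tables).

    Each table maps a word tuple to (log10 probability, log10 back-off or None).
    """
    counts: Dict[int, int] = {}
    tables: Dict[int, Table] = {}
    order = 0  # 0 while in the \data\ header, N inside the \N-grams: section
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "\\data\\":
            continue
        if line == "\\end\\":
            break
        if line.startswith("ngram "):
            n, c = line[len("ngram "):].split("=")
            counts[int(n)] = int(c)
        elif line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])
            tables[order] = {}
        elif order > 0:
            fields = line.split()
            prob = float(fields[0])
            words = tuple(fields[1:1 + order])
            backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
            tables[order][words] = (prob, backoff)
    return counts, tables
```

Comparing `len(tables[n])` with `counts[n]` is a cheap sanity check that a file was not truncated.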

-Search
Changes we made in the different search algorithms are detailed below.
In 3.6, a collection of algorithms can be used under a
single executable, decode. decode_anytopo is still kept
for backward compatibility.
decode now supports three modes of search:
Mode 2 (FSG): FSG search (adapted from Sphinx 2).
Mode 3 (FLAT): Flat-lexicon search (the original search in decode_anytopo
in 3.X, X < 6).
Mode 4 (TREE): Tree-lexicon search (the original search in decode
in 3.X, X < 6).
Some of these functionalities are only applicable to one
particular search; we mark them with FSG, FLAT and TREE.

a. The switch -bt_wsil controls whether silence should be used as
the ending word; it can be turned off. (FLAT, TREE)

b. In FLAT, full triphones can be used instead of multiplexed triphones.

c. FSG is a newly added routine in 3.6, adapted from Sphinx 2.5.

-Frontend
a. -dither is now supported in live_pretend and live_decode.
The initial seed can be set with the switch -seed.
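Dithering adds a tiny amount of random noise to the waveform so that stretches of digital silence do not produce zero-energy frames, whose log energy would be undefined. A minimal sketch, assuming uniform integer noise in [-1, 1] on 16-bit samples (the actual distribution and amplitude used by live_pretend and live_decode may differ); the `seed` argument plays the role described for -seed, making runs reproducible. The function name `dither` is hypothetical.

```python
import random
from typing import List

def dither(samples: List[int], seed: int, amplitude: int = 1) -> List[int]:
    """Add small uniform integer noise to 16-bit PCM samples.

    Assumption: noise is uniform in [-amplitude, amplitude]; Sphinx 3's
    real dither may use a different distribution or amplitude.
    """
    rng = random.Random(seed)  # private, seeded generator: same seed, same output
    out = []
    for s in samples:
        v = s + rng.randint(-amplitude, amplitude)
        out.append(max(-32768, min(32767, v)))  # clip to the 16-bit range
    return out
```

Using a private `random.Random` instance rather than the global generator keeps the noise reproducible even if other code draws random numbers in between.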

-Miscellaneous 

a. One can turn on the built-in letter-to-sound rules in dict.c by using -lts_mismatch.

b. Sphinx 3.6 has been tested with acoustic and language models built from English and Mandarin Chinese data.

c. allphone can now generate a match file and a matchseg file, just like the decode* recognizers.

Bug fixes 
--------- 

-Miscellaneous memory leaks fixed in the tree search (mode 4) 

-The initialization routine for class-based LMs had the word insertion
penalty and the language model weight in swapped order.

-An assertion in vithist.c is now an error message. Instead of stopping the whole
program, decoding fails for that sentence alone. We suspect
that this is the problem that caused the memory wipe-out in Sphinx 3.4 and 3.5.

-Number of CI phones can now be at most 32767 (up from 127) 

-[bug report #1236322]: libutil\str2words special character bugs.  
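The class-based LM bug above matters because the language weight and the word insertion penalty play very different roles in a hypothesis score. A common way to combine them in the log domain is sketched below, as an assumption rather than Sphinx 3's exact code: each word's LM log probability is scaled by the language weight, and the log of the insertion penalty is added once per word. The function name `path_score` and the use of natural logs are illustrative choices.

```python
import math
from typing import List, Tuple

def path_score(words: List[Tuple[float, float]], lw: float, wip: float) -> float:
    """Combine per-word (acoustic, lm) log scores along one hypothesis.

    total = sum(acoustic) + lw * sum(lm) + n_words * log(wip)
    """
    log_wip = math.log(wip)
    return sum(ac + lw * lm + log_wip for ac, lm in words)
```

Passing the two values in swapped order, as the old initialization effectively did, still returns a number, just the wrong one, which is why this kind of bug is easy to miss.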

Behavior Changes 
---------------- 
-The endpointer (ep) now computes logs in the same base as the rest of s3.
-Multi-stream GMM computation no longer truncates the pdf to 8 bits.
This avoids confusion for programmers.
-Except in allphone and align, when .cont. is given to the switch
-senmgau, the code automatically uses the fast GMM computation routine.
To make sure that the multi-stream GMM computation is used,
specify .s3cont. instead.
-The dag executable did not account for the language weight.
This has been fixed.

-(See also Bug fixes.) decode now returns an error message when vithist
is fed a history of -1. Instead of asserting, the recognizer prints a
warning message. Usually this means that the beam widths need to be increased.
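The change above turns a fatal assertion into a per-sentence failure. A minimal sketch of that pattern with a toy history table: each entry records a word and the index of its predecessor, with -1 marking the sentence start, and an invalid final index (such as -1, when no word exit survived the beams) produces a warning and a `None` result instead of stopping the program. `Entry` and `backtrace` are hypothetical names, not the actual vithist.c structures.

```python
import logging
from typing import List, NamedTuple, Optional

class Entry(NamedTuple):
    word: str
    prev: int  # index of the predecessor entry; -1 marks the sentence start

def backtrace(table: List[Entry], final: int) -> Optional[List[str]]:
    """Recover the word sequence ending at entry `final`.

    Returns None after a warning, instead of asserting, when `final` is not
    a valid entry, so only this utterance fails. Assumes predecessors always
    have smaller indices, as in a Viterbi history, so the loop terminates.
    """
    if final < 0 or final >= len(table):
        logging.warning("no valid history entry (%d); beams may be too narrow", final)
        return None
    words = []
    i = final
    while i >= 0:
        words.append(table[i].word)
        i = table[i].prev
    return list(reversed(words))
```

The caller can then skip the failed sentence and move on to the next one.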

Functions still under test 
----------------------------------------- 

-Encoding conversion in lm_convert. 
-LIUM contribution: LMs can now be represented in AT&T FSM format.

Known bugs 
------------------------------------------ 
-In confidence estimation, the forward and backward posterior
probability computations do not match.

-In allphone, the matchseg file sometimes contains very low
scores.

-The regression test for the second-stage search still has bugs.

Corresponding changes in SphinxTrain 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 

Please note that SphinxTrain is distributed as a separate package. You can
get it with

cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cmusphinx/ co -r SPHINX3_6_CMU_INTERNAL_RELEASE SphinxTrain

i.e., checking out the code tagged as SPHINX3_6_CMU_INTERNAL_RELEASE   

-Support for MAP and multiple-class MLLR adaptation.

-Support for BBI tree generation


Thursday, March 30, 2006

What are we up to in these days?

Friday, March 17, 2006

Add Google Search in my page.