Friday, December 15, 2006

Left CMU

Hi Guys,
It was a sad decision. After a lot of soul-searching, I decided to leave CMU and join a startup company called Scanscout. I must be out of my mind!!

Anyway, my new job requires knowledge of speech recognition, information retrieval and video processing. These are all a good fit for me. I can tell you I am having a lot of fun!

Sphinx, in particular the trio of Sphinx 3.X, SphinxTrain and CMULMTKV3, is now maintained by David Huggins-Daines and Evandro Gouvea. I still keep a nominal maintainership, but these two are the true heroes of the story now.

However, feel free to chat with me about anything related to language processing. I am more than happy to be there.

Regards,
Arthur Chan

Friday, June 16, 2006

Sphinx 3.6 is officially released

Sphinx 3.6 Official Release 
^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
 
The Sphinx 3.6 official release includes all the changes found in Sphinx 3.6 RC I.
 
From 3.6 RC I to 3.6 official:  
 
New Features:  
-Added support for Sphinx 2-style semi-continuous HMMs in Sphinx 3.
-Added sphinx3_continuous, which performs on-line decoding on both Windows and Linux platforms.
-Synchronized the frontend with Sphinx 2, adding an implementation of VTLN. (i.e. -warp_type = inverse_linear, piecewise_linear, affine)
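
As an illustration, the warping could be requested on the decoder command line roughly as follows. This is only a sketch: the warp-factor switch (assumed here to be -warp_params) and its value are my assumptions, and the bracketed part stands for the usual model and control flags.

sphinx3_decode -warp_type piecewise_linear -warp_params 1.1 [usual model and control flags]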
 
Changes:  
-Prefix "sphinx3_" has been added to programs align, allphone, astar, dag, decode, decode_anytopo, ep to avoid confusion in some unix systems. 
 
For Developers:  
-All public headers (*.h) are now placed under $root/include instead of the same directories as their source .c files.
-The directory libutil has been renamed libs3util.
-Sphinx 3, as well as all other modules in the CMU Sphinx project, is now versioned with Subversion.
 
Bug Fixes:  
-[1459402] A serious memory relocation problem has been fixed.
-In RC I, -dither was not properly implemented; this has been fixed.
 
Known Problem:  
-When the model contains NaN values, the decoder produces abnormal output. This issue has now been resolved in SphinxTrain.
 
Sphinx 3.6 Release Candidate I 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
The corresponding SphinxTrain tag is SPHINX3_6_CMU_INTERNAL_RELEASE.
One can check out the matching SphinxTrain for the Sphinx 3.6 release with the command:
svn co https://svn.sourceforge.net/svnroot/cmusphinx/tags/SPHINX3_6_CMU_INTERNAL_RELEASE/SphinxTrain 
 
A Summary of Sphinx 3.6 RC I 
---------------------------- 
 
Sphinx 3.6 is a gently refactored version of Sphinx 3.5. Our programming is defensive, and we aim only at further consolidating and unifying the code bases in Sphinx 3.
 
Despite this defensive approach, there are still several interesting new features to be found in this release. Their details can be found in the "New Features" section below. Here is a brief summary:
 
1, Further speed-up of CIGMMS in the 4-level GMM Computation Schemes (4LGC)
2, Multiple regression classes and MAP adaptation in SphinxTrain
3, Better support for using LMs in Sphinx 3.X.
4, FSG search is now supported. This is adapted from Sphinx 2.
5, Support of full triphone search in flat lexicon search.
6, Some support for character sets other than ASCII. Models in multiple languages are now tested in Sphinx 3.X.
 
We hope you enjoy this release candidate. In the future, we will continue to improve the quality of CMU Sphinx and CMU Sphinx's related software.
 
New Features  
------------ 
-Speaker Adaptation: 
a, Multiple regression classes (phoneme-based) are now supported.
 
-GMM Computation  
a, Improvements to CIGMMS are now incorporated. (A command-line sketch follows this subsection.)
i, One can specify the upper limit on the number of CD senones to be computed in each frame by specifying -maxcdsenpf.
ii, The best Gaussian index (BGI) is now stored and can be used as a mechanism to speed up GMM computation.
iii, A tightening factor (-tighten_factor) is introduced to smooth between the fixed naive down-sampling technique and CIGMMS.
b, Support of SCHMM and FCHMM
i, decode will fully support computation of SCHMM.
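
As a rough illustration of combining the fast-GMM switches above, a command line might look like the following. The numeric values are arbitrary examples rather than recommendations, and the bracketed part stands for the usual model and control flags.

sphinx3_decode -maxcdsenpf 1000 -tighten_factor 0.4 [usual model and control flags]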
 
-Language Model  
a, Reading an LM in ARPA text format is now supported. Users now have an option to bypass the use of lm3g2dmp.
b, The live decoding API now supports switching of language models.
c, Full support of class-based LMs. See also the Bug Fixes section.
d, lm_convert is introduced. lm_convert supersedes the functionality of lm3g2dmp: not only can it convert an LM from TXT format to DMP format, it can also do the reverse.
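
A sketch of the round trip with lm_convert (the switch names -i, -o, -ifmt and -ofmt are my assumption of its options; consult the program's usage message for the authoritative list):

lm_convert -i my.lm -ifmt TXT -o my.lm.DMP -ofmt DMP
lm_convert -i my.lm.DMP -ifmt DMP -o my.arpa.lm -ofmt TXT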
 
-Search  
This part details the changes we made in the different search algorithms.
In 3.6, the collection of algorithms can all be used under a single executable, decode. decode_anytopo is still reserved for backward compatibility purposes.
decode now supports three modes of search:
Mode 2 (FSG): FSG search. (Adapted from Sphinx 2)
Mode 3 (FLAT): Flat-lexicon search. (The original search in decode_anytopo in 3.X (X < 6))
Mode 4 (TREE): Tree-lexicon search. (The original search in decode in 3.X (X < 6))
 
Some of these functionalities are only applicable in one particular search. We mark them with FSG, FLAT and TREE.
 
a, One can now turn off -bt_wsil to control whether silence should be used as the ending word. (FLAT, TREE)
b, In FLAT, full triphones can be used instead of multiplexed triphones.
c, FSG search is newly added in 3.6, adapted from Sphinx 2.5.
 
-Frontend 
a, -dither is now supported in live_pretend and live_decode; the initial seed can be set with the switch -seed. (Jerry Wolf will be very happy about this feature.)
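
For instance, to make dithering reproducible across runs, one might add something like the following to the usual arguments (a sketch; the exact value syntax for -dither is assumed):

live_pretend [usual control and model arguments] -dither yes -seed 1234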
 
-Miscellaneous 
a, One can turn on the built-in letter-to-sound rules in dict.c by using -lts_mismatch.
b, The current Sphinx 3.6 is tested to work on setups in English, Mandarin Chinese and French.
c, Changes in allphone: allphone can now generate a match and a matchseg file just like the decode* recognizers.
 
Bug fixes 
--------- 
-Miscellaneous memory leaks fixed in the tree search (mode 4).
-The class-based LM initialization routine used to swap the order of the word insertion penalty and the language model weight. This is now fixed.
-An assertion in vithist.c has been turned into an error message. Instead of stopping the whole program, decoding will now just fail for that sentence. We suspect this is the problem which caused possible memory wipe-outs in Sphinx 3.4 & 3.5.
-The number of CI phones can now be at most 32767 (instead of 127).
-[1236322]: libutil\str2words special character bugs.
 
Behavior Changes 
---------------- 
-The endpointer (ep) now uses the s3 log base in its computations.
 
-Multi-stream GMM computation will no longer truncate the pdf to 8 bits. This avoids programmer confusion.
 
-Except in allphone and align, when .cont. is used with -senmgau, the code will automatically use the fast GMM computation routines. To make sure the multi-stream GMM computation is in effect, one needs to specify .s3cont.
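
For instance, to force the multi-stream (s3-style) GMM computation, one would pass something like the following (illustrative only; other flags as usual):

sphinx3_decode -senmgau .s3cont. [usual model and control flags]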
 
-The executable dag had not accounted for the language weight. This issue is now fixed.
 
-(See also Bug Fixes) decode will now return an error message when vithist is fed a history of -1. Instead of asserting, the recognizer will dump a warning message. Usually that means the beam widths need to be increased.
 
Functions still under test  
----------------------------------------- 
-Encoding conversion in lm_convert.  
-LIUM contribution: LMs can now be represented in AT&T FSM format.
 
Known bugs 
------------------------------------------ 
-In confidence estimation, the computations of the forward and backward posterior probabilities are mismatched.
-In allphone, the scores generated in the matchseg file are sometimes very low.
-The regression test on the second-stage search still has bugs.
 
Corresponding changes in SphinxTrain 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
Please note that SphinxTrain is distributed as a separate package and you can get it by: 
svn co https://svn.sourceforge.net/svnroot/cmusphinx/tags/SPHINX3_6_CMU_INTERNAL_RELEASE/SphinxTrain 
 
-Support for generation of MAP and multiple-class MLLR adaptation.
-Support for BBI tree generation

Sunday, April 16, 2006

Use more than 65536 words in Sphinx 3.X

You can take a look at this page. It is still at an experimental stage, so use it "as is". Kindly give me feedback after using it.

-a

Thursday, March 30, 2006

What are we up to in these days?

Well, a couple of nasty issues in legacy Sphinx 3:
1, Getting Sphinx 3 used by some applications: first making it work with the Galaxy/Communicator framework (David Huggins-Daines is behind that), then making it work with the speech component we will contribute to the CALO Project (Yitao Sun is behind that). A lot of check-ins are based on that.

2, For me, I have been trying to make Sphinx 3 and the CMU-Cambridge LM toolkit work with vocabularies of more than 65536 words. (The new limit is around 4 billion.) The Sphinx 3 work is completed. The CMU-Cambridge LM toolkit essentially requires me to upgrade it.

You may ask, is there still any work on the CMU-Cambridge LM Toolkit V2?
This is my answer: if you want something to happen, you just need to go forward and change it. This is true of Sphinx 3 and it is also true of the CMU-Cambridge LM Toolkit V2. I have gathered code from Dave, Prof. Yannick Esteve and a couple of contributors. I definitely think some kind of alpha release will be there in the May-June time frame.

Let us see how it goes. :-)

Arthur
Sphinx 3.6 Release Candidate I is now released! You can find it under

"Latest File Releases" in the CMU Sphinx's sourceforge web page.
http://sourceforge.net/projects/cmusphinx

Here are the release notes:

2006-03-22  Arthur Chan (archan@cs.cmu.edu) at Carnegie Mellon
University

Sphinx 3.6 Release Candidate I
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The corresponding SphinxTrain tag is
SPHINX3_6_CMU_INTERNAL_RELEASE, which can be checked out using
the command:
cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cmusphinx/ co -r
SPHINX3_6_CMU_INTERNAL_RELEASE SphinxTrain

A Summary of Sphinx 3.6 RCI
---------------------------

Sphinx 3.6 is a gently refactored version of Sphinx 3.5. Our
goal is further consolidating and unifying our code-bases in Sphinx 3.

Despite the modest goal, there are still several
interesting and new features that can be found in this
release. Their details could be found in the "New Feature"
section below. Here is a brief summary:

1. Further speed-up of CIGMMS in the 4 level GMM Computation
Schemes (4LGC)
2. Multiple regression classes and MAP adaptation in SphinxTrain
3. Better support for using LM in Sphinx 3.X.
4. FSG search is now supported. This is adapted from Sphinx 2.
5. Support of full triphone search in flat lexicon search.
6. Some support for character sets other than ASCII. Models in
multiple languages are now tested in Sphinx 3.X, among them GB2312.

We hope you enjoy this release candidate. In the future, we will
continue to improve the quality of CMU Sphinx and CMU Sphinx's
related software.

New Features
------------
-Speaker Adaptation:
a. Multiple regression classes (phoneme-based) are now supported.

-GMM Computation
a. Improvements to CIGMMS are now incorporated.
i. One can specify the upper limit on the number of CD
senones to be computed in each frame by specifying -maxcdsenpf.
ii. The best Gaussian index (BGI) is now stored and can
be used as a mechanism to speed up GMM computation.
iii. A tightening factor (-tighten_factor) is introduced to
smooth between the fixed naive down-sampling technique and CIGMMS.
b. Support of SCHMM and FCHMM
i. decode fully supports computation of SCHMM in S3 format.

-Language Model
a. Reading an LM in ARPA text format, in addition to the DMP
format, is now supported. Users now have an option to bypass
the use of lm3g2dmp.
b. live decoding API now supports switching of language models.
c. Full support of class-based LMs. See also the Bug Fixes section.
d. lm_convert is introduced. lm_convert supersedes the
functionality of lm3g2dmp. lm_convert can convert an LM
from TXT format to DMP format and vice versa.

-Search
Changes we made in different search algorithms are detailed below.
In 3.6, a collection of algorithms can be used under a
single executable, decode. decode_anytopo is still reserved
for backward compatibility purposes.
decode now supports three modes of search:
Mode 2 (FSG): FSG search. (Adapted from Sphinx 2)
Mode 3 (FLAT): Flat-lexicon search. (The original search in decode_anytopo in 3.X (X < 6))
Mode 4 (TREE): Tree-lexicon search. (The original search in decode in 3.X (X < 6))
Some of these functionalities are only applicable
in one particular search. We mark them with FSG, FLAT and TREE.

a. One can turn off -bt_wsil to control whether silence should be used as
the ending word. (FLAT, TREE)

b. In FLAT, full triphones could be used instead of multiplexed triphones.

c. FSG search is newly added in 3.6, adapted from Sphinx 2.5.

-Frontend

a. -dither is now supported in live_pretend and live_decode. The initial seed can be set with the switch -seed.

-Miscellaneous

a. One can turn on built-in letter-to-sound rules in dict.c by using -lts_mismatch.

b. The current Sphinx 3.6 is tested to work with acoustic and language models created from data in English and Mandarin Chinese.

c. allphone can now generate a match and a matchseg just like the decode* recognizers.

Bug fixes
---------

-Miscellaneous memory leaks fixed in the tree search (mode 4)

-The class-based LM initialization routine used to swap the order of the word insertion
penalty and the language model weight. This is now fixed.

-An assertion in vithist.c is now an error message. Instead of causing the whole
program to stop, decoding will fail for that sentence alone. We suspect
that this is the problem which caused memory wipe-outs in Sphinx 3.4 & 3.5.

-Number of CI phones can now be at most 32767 (up from 127)

-[bug report #1236322]: libutil\str2words special character bugs.

Behavior Changes
----------------
-Endpointer (ep) now uses computation of log using the same base as s3.
-Multi-stream GMM computation will no longer truncate the pdf to 8 bits.
This avoids programmer confusion.
-Except in allphone and align, when .cont. is used with the switch
-senmgau, the code will automatically use the fast GMM computation routine.
To make sure that the multi-stream GMM computation is in effect,
specify .s3cont.
-The executable dag did not account for the language weight.
This issue has been fixed.

-(See also Bug Fixes) decode will now return an error message when vithist
is fed a history of -1. Instead of asserting, the recognizer will print a
warning message. Usually this means that the beam widths need to be increased.

Functions still under test
-----------------------------------------

-Encoding conversion in lm_convert.
-LIUM contribution: LMs can now be represented in AT&T FSM format.

Known bugs
------------------------------------------
-In confidence estimation, the computations of the forward and
backward posterior probabilities are mismatched.

-In allphone, the scores generated in the matchseg file are
sometimes very low.

-Regression test on second-stage search still has bugs.

Corresponding changes in SphinxTrain
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Please note that SphinxTrain is distributed as a separate package. You can
get it by

cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cmusphinx/ co -r
SPHINX3_6_CMU_INTERNAL_RELEASE SphinxTrain

i.e., checking out the code tagged as SPHINX3_6_CMU_INTERNAL_RELEASE

-Support for generation of MAP and multiple-class MLLR adaptation.

-Support for BBI tree generation

Friday, March 17, 2006

Added Google Search to my page.

It is a very convenient little thing. If you search for keywords such as Sphinx 4 and Sphinx 3.x, you'll find a lot of results which web searching won't give you. Check it out.
Arthur
Testing. This is my first blog. -Arthur Chan