Thursday, March 30, 2006

Sphinx 3.6 Release Candidate I is now released! You can find it under

"Latest File Releases" on the CMU Sphinx SourceForge page:
http://sourceforge.net/projects/cmusphinx

Here are the release notes:

2006-03-22  Arthur Chan (archan@cs.cmu.edu) at Carnegie Mellon
University

Sphinx 3.6 Release Candidate I
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The corresponding SphinxTrain tag is
SPHINX3_6_CMU_INTERNAL_RELEASE, which can be checked out using
the command:
cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cmusphinx/ co -r
SPHINX3_6_CMU_INTERNAL_RELEASE SphinxTrain

A Summary of Sphinx 3.6 RCI
---------------------------

Sphinx 3.6 is a gently refactored version of Sphinx 3.5. Our
goal is to further consolidate and unify the code base of Sphinx 3.

Despite this modest goal, several interesting new features can
be found in this release. Their details are given in the "New
Features" section below. Here is a brief summary:

1. Further speed-up of CIGMMS in the 4-level GMM Computation
Scheme (4LGC)
2. Multiple regression classes for MAP adaptation in SphinxTrain
3. Better support for using language models in Sphinx 3.X
4. FSG search is now supported. This is adapted from Sphinx 2.
5. Support for full triphones in the flat-lexicon search
6. Some support for character sets other than ASCII. Models in
multiple languages have now been tested in Sphinx 3.X, among them
models using the GB2312 encoding.

We hope you enjoy this release candidate. We will continue to
improve the quality of CMU Sphinx and its related software.

New Features
------------
-Speaker Adaptation
a. Multiple regression classes (phoneme-based) are now supported.

-GMM Computation
a. Improvements to CIGMMS are now incorporated (a sample
invocation is shown after this list).
i. One can specify an upper limit on the number of CD
senones computed in each frame with the switch -maxcdsenpf.
ii. The best Gaussian index (BGI) is now stored and can
be used as a mechanism to speed up GMM computation.
iii. A tightening factor (-tighten_factor) is introduced to
smooth between the naive fixed down-sampling technique and CI-GMMS.
b. Support for SCHMM and FCHMM
i. decode now fully supports computation of SCHMMs in S3 format.
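
For example, these controls might be enabled on the decode command
line as sketched below; the numeric values are purely illustrative,
and the usual model, dictionary, LM and control arguments are elided:

decode [model, dictionary, LM and control arguments] \
    -maxcdsenpf 1500 -tighten_factor 0.4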

-Language Model
a. Reading an LM in ARPA text format, in addition to the DMP
format, is now supported. Users now have the option to bypass
lm3g2dmp.
b. The live decoding API now supports switching of language models.
c. Full support for class-based LMs. See also the Bug Fixes section.
d. lm_convert is introduced. lm_convert supersedes the
functionality of lm3g2dmp and can convert an LM from TXT
format to DMP format and vice versa (a sample invocation is
shown after this list).
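
For instance, conversions in both directions might look like the
lines below. The switch names (-i, -ifmt, -o, -ofmt) and file names
are assumptions for illustration; consult lm_convert's own usage
message for the exact spelling:

lm_convert -i mylm.arpa -ifmt TXT -o mylm.DMP -ofmt DMP
lm_convert -i mylm.DMP -ifmt DMP -o mylm.arpa -ofmt TXT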

-Search
Changes made to the different search algorithms are detailed below.
In 3.6, a collection of algorithms can be used under a single
executable, decode. decode_anytopo is still kept for backward
compatibility.
decode now supports three modes of search:
Mode 2 (FSG): FSG search, adapted from Sphinx 2.
Mode 3 (FLAT): Flat-lexicon search (the original search in
decode_anytopo in 3.X, X < 6).
Mode 4 (TREE): Tree-lexicon search (the original search in decode
in 3.X, X < 6).
Some of the functionalities below apply only to one particular
search; they are marked with FSG, FLAT and TREE.

a. One can turn off -bt_wsil, which controls whether silence is
used as the ending word. (FLAT, TREE)

b. In FLAT, full triphones can be used instead of multiplexed triphones.

c. FSG search is newly added in 3.6, adapted from Sphinx 2.5 (a
sample grammar is shown after this list).
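
As an illustration, the FSG mode consumes Sphinx-2 style
finite-state grammar files. A minimal sketch of such a grammar
(the grammar name, states and words are made up for this example) is:

FSG_BEGIN digits
NUM_STATES 3
START_STATE 0
FINAL_STATE 2
TRANSITION 0 1 0.5 one
TRANSITION 0 1 0.5 two
TRANSITION 1 2 1.0 please
FSG_END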

-Frontend
a. -dither is now supported in live_pretend and live_decode.
The initial random seed can be set with the switch -seed (a sample
invocation is shown below).
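
For example, dithering with a reproducible seed might be requested
as follows; this assumes -dither is a yes/no switch, the seed value
is arbitrary, and the usual control-file arguments are elided:

live_pretend [control file, input directory and argument file] \
    -dither yes -seed 1234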

-Miscellaneous

a. One can turn on the built-in letter-to-sound rules in dict.c by using -lts_mismatch.

b. The current Sphinx 3.6 has been tested with acoustic and language models built from English and Mandarin Chinese data.

c. allphone can now generate match and matchseg files just like the decode* recognizers.

Bug fixes
---------

-Miscellaneous memory leaks fixed in the tree search (mode 4)

-The initialization routine for class-based LMs swapped the order of the
word insertion penalty and the language model weight; this has been fixed.

-An assertion in vithist.c is now an error message. Instead of stopping the
whole program, decoding will fail for that sentence alone. We suspect
this was the problem that caused the memory wipe-outs in Sphinx 3.4 and 3.5.

-Number of CI phones can now be at most 32767 (up from 127)

-[bug report #1236322]: libutil\str2words special character bugs.

Behavior Changes
----------------
-The endpointer (ep) now computes logarithms using the same base as s3.
-Multi-stream GMM computation will no longer truncate the pdf to 8 bits.
This avoids programmer confusion.
-Except in allphone and align, when .cont. is used with the switch
-senmgau, the code automatically uses the fast GMM computation routines.
To make sure that multi-stream GMM computation is in effect,
specify .s3cont. (a sample invocation is shown after this section).
-The executable dag did not account for the language weight.
This has been fixed.

-(See also Bug Fixes) decode now returns an error message when vithist
is fed a history of -1. Instead of asserting, the recognizer prints a
warning message. Usually this means that the beam widths need to be
increased.
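
For example, to force the multi-stream (s3-style) GMM computation
with a continuous model, the switch might be given as below; all
other decode arguments are elided:

decode [other arguments] -senmgau .cont.     (fast GMM computation routines)
decode [other arguments] -senmgau .s3cont.   (multi-stream GMM computation)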

Functions still under test
--------------------------

-Encoding conversion in lm_convert.
-LIUM contribution: an LM can now be represented in AT&T FSM format.

Known bugs
----------
-In confidence estimation, the computations of forward and
backward posterior probabilities are mismatched.

-In allphone, the scores generated in the matchseg file are
sometimes very low.

-Regression test on second-stage search still has bugs.

Corresponding changes in SphinxTrain
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Please note that SphinxTrain is distributed as a separate package. You can
get it by

cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cmusphinx/ co -r
SPHINX3_6_CMU_INTERNAL_RELEASE SphinxTrain

i.e., checking out the code tagged as SPHINX3_6_CMU_INTERNAL_RELEASE

-Support for generation of MAP-adapted models and multiple-class MLLR.

-Support for BBI tree generation
