tag:blogger.com,1999:blog-242860772024-02-02T23:24:22.878-08:00The Grand Janitor's BlogSpeech Recognition, Programming and Random Musings of Arthur ChanArthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.comBlogger113125tag:blogger.com,1999:blog-24286077.post-40729230344664501542013-11-16T20:32:00.002-08:002013-11-26T10:24:01.002-08:00"The Grand Janitor Blog" is MovingAfter 8 years of using Blogger, I finally had enough sense to get a dot com. Blogger just has too many idiosyncrasies, which make it hard to use and expand. <br />
<br />
<div>
</div>
<div>
You can find my new blog, "The Grand Janitor Blog V2", at www.thegrandjanitor.com. I have already written one post there. I hope you enjoy it. </div>
<div>
<br /></div>
<div>
Arthur</div>
Arthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com0tag:blogger.com,1999:blog-24286077.post-90473599797975291822013-09-17T09:29:00.000-07:002013-09-17T09:29:28.945-07:00Future Plan for "The Grand Janitor Blog"I have been crazily busy, so blogging has been rather slow for me. Still, I have a stronger and stronger feeling that my understanding is getting closer to the state of the art of speech recognition. And for now, to talk about the state of the art of speech recognition, we have to talk about the whole deep neural network trend.<br />
<br />
There is nothing conceptually new in the use of the hybrid HMM-DBN-DNN. It was proposed under the name HMM-ANN in the past. What is new is an algorithm that allows fast training of multi-layered neural networks. It is mainly due to Hinton's breakthrough in 2006, which suggests that a DBN-DNN can first be initialized from pretrained RBMs. <br />
<br />
I am naturally very interested in this new trend. Results from IBM, Microsoft and Google show that the DBN-DNN is not the toy model we saw over the last two decades. <br />
<br />
Well, that's all for my excitement about DBN; I still have tons of things to learn. Back to the "Grand Janitor Blog": I tried to improve the blog layout 4 months ago, and I have to say I felt very frustrated by Blogger, so I have finally decided to move to WordPress.<br />
<br />
I hope to move within the next month or so. I will write a more proper announcement later on.<br />
<br />
ArthurArthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com1tag:blogger.com,1999:blog-24286077.post-72263390853748541732013-06-25T15:55:00.001-07:002013-06-25T15:56:24.912-07:00Apology, Updates and Misc.There have been some questions on LinkedIn about the whereabouts of this blog. As you may have noticed, I haven't posted any updates for a while. I have been crazy busy with work at Voci (Good!) and with many life challenges, just like everyone. I am having a lot of fun with programming, as I am working with two of my favorite languages, C and Python. Life is not bad at all. <br />
<br />
My apologies to all readers, though; it can be tough to blog sometimes. Hopefully, this situation will change later this year.<br />
<br />
A couple of worthwhile news items in ASR: <a href="http://www.bloomberg.com/news/2013-06-12/goldman-sachs-s-trial-win-in-dragon-systems-deal-upheld.html">Goldman Sachs won the trial in the Dragon lawsuit</a>. There is also VB's piece on <a href="http://spectrum.ieee.org/tech-talk/computing/software/microsoft-boosts-speech-recognition-for-its-smartphone">MS doubling the speed of their recognizer</a>. <br />
<br />
I don't know what to make of the lawsuit; I only feel a bit sad. Dragon has been the home of many elite speech programmers/developers/researchers. Many old-timers of speech were there. Most of them sigh about the whole L&H fiasco. If I were them, I would feel the same. In fact, once you know a bit of ASR history, you will notice that the fall of L&H gave rise to one you-know-its-name player nowadays. So in a way, the fates of two generations of ASR guys were altered.<br />
<br />
As for the MS piece, we are following another trend these days, which is the emergence of DBN. Is it surprising? Probably not; it's rather easy to speed up neural network calculation. (Training is harder, but that's where DBN is strong compared to previous NN approaches.)<br />
<br />
On Sphinx, I will point out one recent bug report contributed by Ricky Chan, which exposed a problem in bw's MMIE training. I have yet to try it, but I believe Nick has already incorporated it into the open-source code base.<br />
<br />
Another item which Nick has been stressing lately is to use Python, instead of Perl, as the scripting language of SphinxTrain. I think that's a good trend. I like Perl and use one-liners and map/grep types of programs a lot. Generally though, it's hard to find a concrete coding standard for Perl, whereas Python seems to be cleaner and naturally leads to OOP. This is an important issue: Perl programmers and Perl programming styles seem to be spawned from many different types of languages. The original (bad) C programmer would fondly use globals and write functions with 10 arguments. The original C++ programmer might expect language support for OOP but find that "it is just a hash". These style differences can make Perl training scripts hard to maintain.<br />
<br />
That's why I like Python more. Even a very bad script seems to turn itself into a more maintainable script. There is also a good pathway for connecting Python and C. (Cython is probably the best.)<br />
<br />
In any case, that's what I have this time. I owe all of you many articles. Let's see if I can write some in the near future.<br />
<br />
Arthur<br />
<br />
<br />Arthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com2tag:blogger.com,1999:blog-24286077.post-76977496293046829572013-05-06T19:11:00.003-07:002013-05-06T19:12:24.379-07:00Translation of "Looking forward (only 263 weeks left)"<br />
As requested by Pranav, a good friend of Sphinx, I translated one of my articles, "Looking forward (only 263 weeks left)", from my Chinese blog "333 weeks" (<a href="http://333weeks.blogspot.com/2013/05/263.html">original</a>). So here it is, enjoy!<br />
<br />
"April was a long long month.<br />
<br />
I spent most of my time solving technical problems. With great help from colleagues, we finally got all issues resolved. I also started to put some time into new tasks. The Boston Marathon Explosion was tough for everyone, but we kind of have closure now. As for investment, mine is in pace with the S&P. The weather is also getting better. Do we finally feel spring again?<br />
<br />
I think the interesting part of April is that I spent more time writing, be it blog posts or articles. I wrote quite a bit even when I was busy. I mentioned the Selection of Cumulomaniac. At this stage, I am copyediting and proofreading the drafts. It's a good thing to write and blog as I love to connect with like-minded people."<br />
<br />
Arthur<br />
<br />Arthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com0tag:blogger.com,1999:blog-24286077.post-39941060060997880212013-05-04T11:26:00.001-07:002013-05-04T11:26:29.517-07:00My Chinese Blogs : Cumulomaniac and 333 Weeks<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4D7bQa9IsLIY50WEa0xdTTSgdO8E-ym1Py2N47-33G_xpmNq_5viJ94vAegzCzSbfpoUUfB7XmngIx24ax3ooWzrJ0_jhew9a-VRC-9P_GBpDhtNMg3SFQ3j9JiTHTUq2gt9E/s1600/IMG_1247.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4D7bQa9IsLIY50WEa0xdTTSgdO8E-ym1Py2N47-33G_xpmNq_5viJ94vAegzCzSbfpoUUfB7XmngIx24ax3ooWzrJ0_jhew9a-VRC-9P_GBpDhtNMg3SFQ3j9JiTHTUq2gt9E/s320/IMG_1247.JPG" width="240" /></a></div>
<h3>
Foreword</h3>
<div>
I haven't updated this blog for a while. April was a long month and the whole Boston Marathon Explosion was difficult for me. I ended up spending quite a bit of time working on my other blogs, such as <a href="http://cumulomaniac.blogspot.com/">Cumulomaniac</a> and <a href="http://333weeks.blogspot.com/">333 weeks</a>. If you click through to them, you will see they are all in Chinese. In the past, that was okay, but recently more and more friends of mine have been asking me what the whole thing is about. So it might deserve some explanation. </div>
<div>
<br /></div>
<h3>
Cumulomaniac</h3>
<div>
Cumulomaniac is more of my personal photography and writing blog. From time to time, I take pictures of clouds around Boston and share them with my friends in Hong Kong. You know Hong Kong? It's probably hard for my American friends to even start to imagine if they have only watched Jackie Chan's movies: I used to live in a 500-square-foot room with a pitifully small bathroom and kitchen. It's called 500 sq ft but it feels like 300 sq ft because over the years my family has piled tons of stuff there. </div>
<div>
</div>
<div>
The place I lived in Hong Kong, called Sham Shui Po, is located close to a flea market and a computer shopping mall. That's perhaps why I fell in love with gadgets in the first place. </div>
<div>
<br /></div>
<div>
For this context, the most important thing you should know is that there is no skyline in Hong Kong. So it was a big change when I first came to the States. I guess that is a reason to share "my sky" with my friends. </div>
<h3>
Startup Employee 333 Weeks </h3>
<div>
As you might know, I am working at yet another startup, Voci, with some great minds who graduated from Carnegie Mellon. When I took up the job, I decided to stay with this company for a while. I set the time to be 333 weeks. So the blog Startup Employee 333 Weeks chronicles my story in the company. </div>
<div>
<br /></div>
<div>
I chose to write in Chinese because it is yet another blog topic which has been discussed to death by American bloggers. In Hong Kong/China though, there are still many people who live in a bureaucratic system and spend their lives as employees of big companies; they might not be very familiar with how "startupers" work and live. There is also much misunderstanding about startups among people who work in normal, traditional jobs. </div>
<div>
<br /></div>
<div>
My focus in 333 Weeks is usually project management, communication and the issues you face when you work in a startup. Those are what we programmers call <i>"soft stuff"</i>, so I seldom like to bring them up in The Grand Janitor's Blog. </div>
<div>
<br /></div>
<h3>
Why Didn't You Write Them in English?</h3>
<div>
I gave partial answers in the above paragraphs. In general, my rule of blog writing is to make sure my messages are targeted at a well-defined niche group. The Grand Janitor is really for speech professionals while 333 weeks is written for aspiring startup guys. So that pretty much sums up why you didn't see my other messages in the past.<br /><br />Another (rather obvious) reason is my English. My English writing has never quite caught up with my Chinese writing. Don't get me wrong. I write English way faster than Chinese. I also write a lot. The issue is that I never feel I can embellish my articles with English phrases as I do with Chinese phrases. </div>
<div>
<br /></div>
<div>
This has changed quite a bit recently as I feel my English writing has improved. (Maybe because I have hung out with a bunch of comedians lately. :) ) But I still feel some topics are better written in a certain language. </div>
<div>
<br /></div>
<div>
Hopefully this can change in the near future. In fact, Pranav Jawale, a good friend of Sphinx, has recently become interested in one article I wrote in 333 weeks, and I am going to translate it soon. </div>
<div>
<br /></div>
<div>
If you are interested in any articles I wrote in Chinese, feel free to tell me. I can always translate them and put them on GJB. </div>
<div>
<br /></div>
<div>
Arthur</div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
Arthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com0tag:blogger.com,1999:blog-24286077.post-13774234242212678492013-04-20T16:45:00.001-07:002013-04-20T16:45:06.243-07:00The Boston Marathon Explosion : AfterthoughtIt has been a crazy week. Life was crazy for Bostonians ...... and perhaps all Americans. It was only 5 days from the explosion to the capture of the suspect. I still feel disoriented from the whole event. <div>
<br /></div>
<div>
I feel the warmth from friends and family: there were more than 20 messages from Facebook, LinkedIn and Twitter, from all over the world, asking about my situation in Boston. They are all friends who have never been to Boston, so they don't know that Copley Square is a well-known shopping area and only the few who are affluent enough would live there. That said, I was lucky enough to decide not to return books to BPL Central that day. But I was shocked by the whole thing. Some describe it as the most devastating terrorist attack since 9-11. I have to concur. Even though we can't establish a direct link between the two suspects and any terrorist organization yet, the event was at least inspired by online instructions on how to make an improvised pressure cooker bomb. </div>
<div>
<br /></div>
<div>
Even now, no one can clearly explain the motives of the suspects. Family members are giving confusing answers on the psychological profiles of the suspects. It's hard to judge at this point and maybe we should hear more from the authorities. </div>
<div>
<br /></div>
<div>
My condolences to the families of all victims, to the transit police officer who died at the front line, and to all who were injured. I sincerely hope the Boston authorities can soon help us understand why the tragedy happened. </div>
<div>
<br /></div>
<div>
Arthur</div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
Arthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com1tag:blogger.com,1999:blog-24286077.post-87403364218834734542013-04-04T08:16:00.002-07:002013-04-04T08:19:23.946-07:00Wednesday Speech-related LinksCMUSphinx:<br />
<br />
<a href="http://cmusphinx.sourceforge.net/2013/03/speech-recognition-on-kindle-touch-with-cmusphinx/">CMUSphinx on Kindle Touch</a> (cmusphinx.org Yay!)<br />
<br />
Business<br />
<br />
<a href="http://www.speechtechmag.com/Articles/News/Industry-News/Nuance-Unveils-Voice-Ads-88739.aspx">Nuance Unveils Voice Ad</a><br />
<br />
<a href="http://www.cnbc.com/id/100610757">Why Carl Icahn's Buying a Stake in Nuance</a><br />
<br />
This is indeed a big development for the ASR industry because it creates a rather constant revenue stream compared to sales of software or professional services.<br />
<br />
ArthurArthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com0tag:blogger.com,1999:blog-24286077.post-38597108439751222612013-04-04T08:09:00.001-07:002013-04-04T08:09:06.958-07:00Sphinx on Kindle <a href="http://cmusphinx.sourceforge.net/2013/03/speech-recognition-on-kindle-touch-with-cmusphinx/">http://cmusphinx.sourceforge.net/2013/03/speech-recognition-on-kindle-touch-with-cmusphinx/</a><br />
<br />
Love the current trend that Sphinx is everywhere.Arthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com0tag:blogger.com,1999:blog-24286077.post-69156466878458659422013-04-02T12:28:00.000-07:002013-04-02T12:44:03.676-07:00Grand Janitor's Blog February and March Summary<span style="font-family: Arial, Helvetica, sans-serif;">I wasn't very productive in blogging for the last two months. Here are a couple of worthy blog posts and news items you might find interesting.</span><br />
<br />
<ul>
<li><a href="http://grandjanitor.blogspot.com/2013/02/a-look-on-sphinx3s-initialization.html" style="background-color: white; line-height: 12.727272033691406px; text-indent: -15px;"><span style="color: blue; font-family: Arial, Helvetica, sans-serif;">A look on Sphinx3's initialization</span></a></li>
</ul>
<ul>
<li><a href="http://grandjanitor.blogspot.com/2013/03/sphinxbase-08-and-sphinxtrain-108.html" style="background-color: white; line-height: 12.727272033691406px; text-indent: -15px;"><span style="color: blue; font-family: Arial, Helvetica, sans-serif;">sphinxbase 0.8 and SphinxTrain 1.08</span></a></li>
</ul>
<ul>
<li><a href="http://grandjanitor.blogspot.com/2013/03/landscape-of-open-source-speech.html" style="background-color: white; line-height: 12.727272033691406px; text-indent: -15px;"><span style="color: blue; font-family: Arial, Helvetica, sans-serif;">Landscape of Open Source Speech Recognition Software (Part II)</span></a></li>
</ul>
<ul>
<li><a href="http://grandjanitor.blogspot.com/2013/03/good-asr-training-system.html" style="text-indent: -15px;"><span style="color: blue; font-family: Arial, Helvetica, sans-serif;">Good ASR Training System</span></a></li>
</ul>
<ul>
<li><a href="http://grandjanitor.blogspot.com/2013/03/c-vs-c.html" style="background-color: white; line-height: 12.727272033691406px; text-indent: -15px;"><span style="color: blue; font-family: Arial, Helvetica, sans-serif;">C++ vs C</span></a></li>
</ul>
<span style="font-family: Arial, Helvetica, sans-serif;">GJB also reached the milestone of <a href="http://grandjanitor.blogspot.com/2013/03/the-100th-post-why-grand-janitors-blog.html">100 posts</a>, thanks for your support !</span><span style="background-color: white; color: #222222; font-size: 13.63636302947998px; line-height: 16.363636016845703px;"><span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></span><span style="background-color: white; color: #222222; line-height: 16.363636016845703px;"><span style="font-family: Arial, Helvetica, sans-serif;">Newsworthy:</span></span><span style="background-color: white; color: #222222; line-height: 16.363636016845703px;"><span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></span><u><span style="color: blue;"><br /></span></u><br />
<u><span style="color: blue;"><a href="http://www.cmswire.com/cms/customer-experience/google-buys-neural-net-startup-boosting-its-speech-recognition-computer-vision-chops-020044.php" style="background-color: white;">Google Buys Neural Net Startup, Boosting Its Speech Recognition, Computer Vision Chops</a></span></u><br />
<u><span style="color: blue;"><br /><a href="http://www.theverge.com/2013/3/21/4132116/bing-streaming-mode-windows-phone-demo" style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 16.363636016845703px;">Future Windows Phone speech recognition revealed in leaked video</a></span></u><br />
<u><span style="color: blue;"><br /><a href="http://googleblog.blogspot.com/2013/03/google-keepsave-whats-on-your-mind.html?m=1" style="background-color: white; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 16.363636016845703px;">Google Keep</a></span><span style="background-color: white; color: #222222; font-size: 13.63636302947998px; line-height: 16.363636016845703px;"><span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></span></u><span style="background-color: white; color: #222222; font-size: 13.63636302947998px; line-height: 16.363636016845703px;"><span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></span><span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white; color: #222222; line-height: 16.363636016845703px;">Feel free to connect with me on </span><a href="https://plus.google.com/111459257194409198620/posts" style="background-color: white; color: #888888; line-height: 16.363636016845703px; text-decoration: none;">Plus</a><span style="background-color: white; color: #222222; line-height: 16.363636016845703px;">, LinkedIn and </span><a href="https://twitter.com/grandjanitor" style="background-color: white; color: #888888; line-height: 16.363636016845703px; text-decoration: none;">Twitter</a><span style="background-color: white; color: #222222; line-height: 16.363636016845703px;">.</span></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">Arthur</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
Arthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com0tag:blogger.com,1999:blog-24286077.post-56727645355968923392013-03-27T08:25:00.004-07:002013-03-27T08:35:26.146-07:00GJB Wednesday Speech-related Links/Commentaries (DragonTV, Siri vs Xiao i Robot, Coding with Voice)<br />
<div class="separator" style="clear: both; text-align: center;">
<object class="BLOGGER-youtube-video" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0" data-thumbnail-src="http://img.youtube.com/vi/Ku0H10_G1X4/0.jpg" height="266" width="320"><param name="movie" value="http://youtube.googleapis.com/v/Ku0H10_G1X4&source=uds" /><param name="bgcolor" value="#FFFFFF" /><param name="allowFullScreen" value="true" /><embed width="320" height="266" src="http://youtube.googleapis.com/v/Ku0H10_G1X4&source=uds" type="application/x-shockwave-flash" allowfullscreen="true"></embed></object></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div>
<a href="http://techcrunch.com/2013/03/27/apple-appears-in-court-in-china-to-defend-against-siri-patent-infringement-claim/">Apple Appears In Court In China To Defend Against Siri Patent Infringement Claim</a> (Techcrunch)</div>
<div>
<br /></div>
<div>
ZhiZhen company (智臻網絡科技) from Shanghai is suing Apple for infringing their patents. (<a href="http://www.shanghaidaily.com/nsp/District/2012/06/25/Xiao%2Bi%2BRobot%2Bstorms%2Bthe%2Bonline%2Bworld/">The original Shanghai Daily article</a>) According to the news, back in 2006, ZhiZhen had already developed the engine for Xiao i Robot (小i機械人). A video from 8 months ago is embedded above. </div>
<div>
<br /></div>
<div>
Technically, it is quite possible that a Siri-like system could have been built in 2006. (Take a look at Olympus/Ravenclaw.) Of course, the Siri-like interface you see here was certainly built after the advent of the smartphone (which, by my definition, means after the iPhone was released). So overall, it's a bit hard to say who is right. </div>
<div>
<br /></div>
<div>
Of course, when interpreting news from China, it's tempting to use slightly different logic. In the TC article, the OP (Etherington) suggested that the whole lawsuit could be state-orchestrated. It could be related to Beijing's recent attacks on Apple. </div>
<div>
<br /></div>
<div>
I don't really buy the OP's argument. Apple is constantly sued in China (and all over the world). It is hard to link the two events together. </div>
<div>
<br /></div>
<div>
<br />
<br />
<div>
<a href="http://www.gizmag.com/panasonic-smart-tv-2013-dragon-speech-recognition/26801/">Dragon TV brings speech recognition to Panasonic’s 2013 Smart TVs</a> (DigitalVersus)</div>
<div>
<br /></div>
<div>
This is definitely not the Siri for TV.<br />
<br />
Oh well, Siri is not just speech recognition; there is also smart interpretation at the sentence level: scheduling, making appointments, doing the right search. Those by themselves are challenges. In fact, I believe Nuance only provides the ASR engine for Apple. (Can't find the link, I read it from Matthew Siegler.)</div>
<div>
<br /></div>
<div>
In the scenario of TV, what annoys users most is probably switching channels and searching for programs. If I built a TV, I would also eliminate any set-top boxes. (So cable companies would hate me a lot.) </div>
<div>
<br /></div>
<div>
Looking at the technology profiles of all the big companies, Apple seems to own all the technologies needed. It also takes quite a lot of design (with taste) to realize such a device. </div>
</div>
<br />
<h3>
Using Python to code by Voice</h3>
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='320' height='266' src='https://www.youtube.com/embed/8SkdfdXWYaI?feature=player_embedded' frameborder='0'></iframe></div>
<br />
Here is an interesting look at how ASR can be used in coding. Some notes/highlights:<br />
<ul>
<li>The speaker, Tavis Rudd, had RSI 2 years ago. After a climbing accident, he decided to code using voice instead. Now that his RSI has recovered, he claims he still uses voice 40-60% of the time. </li>
<li>2000 voice commands, which are not necessarily English words. The speaker used <a href="https://code.google.com/p/dragonfly/">Dragonfly</a> to control Emacs on Windows.</li>
<li>How do variables work? It turns out most variables are actually English phrases. There are specific commands to get these phrases delimited by different characters. </li>
<li>The speaker said "it's not very hard" for others to repeat. I believe there will be some amount of customization. It took him around 3 months. That's pretty much how much time a solution engineer needs to tune an ASR system. </li>
<li>The best language to program by voice: Lisp. </li>
</ul>
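The phrase-to-variable commands in the notes above can be pictured with a small sketch. This is not Rudd's actual grammar and it does not use the Dragonfly API; the command names (snake, camel, dashed, dotted) and the function <code>format_phrase</code> are hypothetical, chosen only to show how a dictated English phrase might be joined with different delimiters.

```python
# Hypothetical sketch: turn a dictated phrase into an identifier.
# The command names are illustrative and are not taken from the talk.

def format_phrase(command: str, phrase: str) -> str:
    """Join the words of a spoken phrase according to a formatting command."""
    words = phrase.lower().split()
    if not words:
        return ""
    if command == "snake":
        return "_".join(words)
    if command == "dashed":
        return "-".join(words)
    if command == "dotted":
        return ".".join(words)
    if command == "camel":
        # First word stays lowercase, later words get an initial capital.
        return words[0] + "".join(w.capitalize() for w in words[1:])
    raise ValueError(f"unknown formatting command: {command}")


if __name__ == "__main__":
    for cmd in ("snake", "camel", "dashed", "dotted"):
        print(cmd, "->", format_phrase(cmd, "acoustic model path"))
```

In a real voice-coding setup the recognizer would supply both the command and the phrase; here they are passed in as plain strings.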
<div>
One more thing. Rudd also believes it will be very tough to do the same thing with CMUSphinx. </div>
<div>
<br /></div>
<div>
Ah...... models, models, models. </div>
<div>
<br />
<br />
<br />
<h3>
Earlier on Grand Janitor's Blog</h3>
<div>
Some quick notes on what a "Good training system" should look like: (<a href="http://grandjanitor.blogspot.com/2013/03/good-asr-training-system.html">link</a>).</div>
<div>
GJB reaches the 100th post! (<a href="http://grandjanitor.blogspot.com/2013/03/the-100th-post-why-grand-janitors-blog.html">link</a>)</div>
<div>
<br /></div>
</div>
<div>
Arthur</div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
Arthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com2tag:blogger.com,1999:blog-24286077.post-10203305150511898062013-03-26T09:51:00.003-07:002013-03-26T09:51:44.519-07:00Tuesday's Links (Meetings and more)Geeky:<br />
<br />
<a href="http://asserttrue.blogspot.com/2013/03/is-depression-really-biochemical.html">Is Depression Really Biochemical</a> (AssertTrue)<br />
<br />
<a href="http://blog.vivekhaldar.com/post/46019327375/meetings-are-mutexes">Meetings are Mutexes</a> (Vivek Haldar)<br />
<br />
So true. It doesn't even count all the time you spend preparing for a meeting.<br />
<br />
<a href="http://blog.regehr.org/archives/917?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+EmbeddedInAcademia+%28Embedded+in+Academia%29&utm_content=Netvibes">Exhaustive Testing is Not a Proof of Correctness</a><br />
<br />
True, but hey, writing regression tests is never a bad thing. If you rely only on your brain for testing, it is bound to fail one way or another.<br />
<br />
Apple :<br />
<br />
<a href="http://appleinsider.com/articles/13/03/26/apples-iphone-5-debuts-on-t-mobile-april-12-with-99-upfront-payment-plan">Apple's iPhone 5 debuts on T-Mobile April 12 with $99 upfront payment plan</a><br />
<a href="http://thedoghousediaries.com/4974?utm_source=loopinsight.com&utm_medium=referral&utm_campaign=Feed">iWatchHumor</a> (DogHouseDiaries)<br />
<br />
Yahoo:<br />
<br />
<a href="http://www.mondaynote.com/2013/03/24/yahoo-the-marissa-mayer-turnaround/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+monday-note+%28Monday+Note%29&utm_content=Netvibes">Yahoo The Marissa Mayer Turnaround</a><br />
<br />
Out of all the commentaries on Marissa Mayer's reign, I think Jean-Louis Gassée goes straight to the point, and I agree with him the most. You cannot use a one-size-fits-all policy. So WFH is not always appropriate either.<br />
<br />
Management:<br />
<br />
<a href="http://dilbert.com/blog/entry/the_managementfree_organization/">The Management-free Organization</a>Arthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com0tag:blogger.com,1999:blog-24286077.post-82797809690885190382013-03-25T12:46:00.001-07:002013-09-17T09:30:19.826-07:00Good ASR Training SystemThe term "speech recognition" is a misnomer. <br />
<br />
Why do I say that? I explained this point in an old article, "<a href="http://arthur-chan.blogspot.com/2009/04/do-we-have-true-open-source-dictation.html">Do We Have True Open Source Dictation?</a>", which I wrote back in 2005. To recap, a speech recognition system consists of a Viterbi decoder, an acoustic model and a language model. You could have a great recognizer but bad accuracy if the models are bad. <br />
<br />
So how does that relate to you, a developer/researcher of ASR? The answer is that ASR training tools and processes usually become a core asset in your inventory. In fact, I can tell you that when I need to work on acoustic model training, I need to work on it full time, and it's one of the most absorbing things I have done. <br />
<br />
Why is that? When you look at the development cycles of all the tasks in making an ASR system, training is the longest. With the wrong tool, it is also the most error-prone. As an example, just take a look at the Sphinx forum: you will find that the majority of non-Sphinx4 questions are related to training, such as "I can't find the path of a certain file" or "the whole thing just got stuck in the middle". <br />
<br />
Many first-time users complain with frustration (and occasionally disgust) about why it is so difficult to train a model. The frustration probably stems from the expectation: "Shouldn't it be well-defined?" The answer is again no. In fact, how a model should be built (or even which model should be built) is always subject to change. It's also one of the two subfields in ASR, at least IMO, which are still creative and exciting in research. (The other one: noisy speech recognition.) What an open source software suite like Sphinx provides is a standard recipe for everyone. <br />
<br />
That said, is there something we can do better in an ASR training system? There is a lot, I would say. Here are some suggestions:<br />
<ol>
<li>A training experiment should be created, moved and copied with ease,</li>
<li>A training experiment should be exactly repeatable given the input is exactly the same,</li>
<li>The experimenter should be able to verify the correctness of an experiment before an experiment starts. </li>
</ol>
<div>
<span style="font-size: large;">Ease of Creation of an Experiment</span></div>
<div>
<span style="font-size: large;"><br /></span>
</div>
You can think of a training experiment as a recipe ...... but not exactly. When we read a recipe and follow it again, we humans make mistakes. <br />
<br />
But hey! We are working with computers. Why do we need to fix small things in the recipe at all? So in a computer experiment, what we are shooting for is an experiment which can be easily created and moved around. <br />
<br />
What does that mean? It basically means there should be no executables which are hardwired to one particular environment. There should also be no hardware/architecture assumptions in the training implementation. If there are, they should be hidden. <br />
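One way to picture "nothing hardwired" is to keep every path in the experiment description relative to the experiment's root, and to resolve it only at run time. The sketch below is an illustration under that assumption; the class <code>Experiment</code>, its methods and the file name <code>etc/train.fileids</code> are hypothetical and do not describe how SphinxTrain is organized.

```python
# A minimal sketch, assuming an experiment is just a directory plus a
# table of relative paths. All names here are hypothetical.
import shutil
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Experiment:
    root: Path                                  # the only machine-specific value
    files: dict = field(default_factory=dict)   # logical name -> relative path

    def resolve(self, name: str) -> Path:
        """Return the absolute location of a logical input or output."""
        return self.root / self.files[name]

    def copy_to(self, new_root: Path) -> "Experiment":
        """Copy the whole experiment; nothing inside needs to be edited."""
        shutil.copytree(self.root, new_root)
        return Experiment(root=new_root, files=dict(self.files))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "exp1"
        (root / "etc").mkdir(parents=True)
        (root / "etc" / "train.fileids").write_text("utt001\nutt002\n")

        exp = Experiment(root, {"fileids": "etc/train.fileids"})
        clone = exp.copy_to(Path(tmp) / "exp2")
        print(clone.resolve("fileids").read_text().split())
```

Because only <code>root</code> differs between the original and the copy, moving an experiment to another directory or machine does not require touching any other setting.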
<br />
<br />
<span style="font-size: large;">Repeatability of an Experiment</span><br />
<br />
<br />
Similar to the previous point, should we allow differences between runs of the same training experiment? The answer should be no. So one trick you hear from experienced experimenters is that you should fix the seed of the random number generators. This avoids minute differences between different runs of an experiment. <br />
<br />
Here someone would ask: shouldn't we allow a small difference between experiments? We are essentially running a physical experiment. <br />
<br />
I think that's a valid approach. But to be conscientious, you might want to run a certain experiment many times to calculate an average. In a way, this is my problem with this thinking: it is <i>slower</i> to repeat an experiment. For example, what if you see your experiment has a 1% absolute drop? Do you let it go? Or do you just chalk it up as noise? Once you allow yourself to not repeat an experiment <i>exactly</i>, there will be tons of questions you should ask.<br />
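The seeding trick can be sketched with Python's standard <code>random</code> module. The function <code>init_means</code> is a hypothetical stand-in for any randomized step in training (for example, an initialization); it is not taken from any Sphinx tool.

```python
# A minimal sketch of seeding, using a toy "initialization" step.
# init_means is a hypothetical stand-in for any randomized training step.
import random


def init_means(num_states: int, dim: int, seed: int) -> list:
    """Draw initial mean vectors from a generator owned by this experiment."""
    rng = random.Random(seed)  # private generator: no hidden global state
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_states)]


if __name__ == "__main__":
    run_a = init_means(num_states=3, dim=2, seed=1234)
    run_b = init_means(num_states=3, dim=2, seed=1234)
    run_c = init_means(num_states=3, dim=2, seed=4321)
    print("same seed gives same numbers:", run_a == run_b)
    print("different seed gives same numbers:", run_a == run_c)
```

Storing the seed alongside the rest of the experiment settings is what makes the run repeatable later; a private generator also keeps other code from disturbing the sequence.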
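If run-to-run differences are allowed, every comparison needs several baseline runs before a drop can be called real. The sketch below shows one simple way to ask that question; the word error rates are made-up illustrative numbers, and the two-standard-deviation threshold is a rule of thumb assumed here, not a recommendation from any toolkit.

```python
# A minimal sketch with made-up word error rates (WER, in percent).
import statistics


def looks_like_noise(baseline_runs, new_result, num_stdev=2.0):
    """True if new_result lies within num_stdev sample standard deviations
    of the mean of the baseline runs."""
    mean = statistics.mean(baseline_runs)
    stdev = statistics.stdev(baseline_runs)  # sample standard deviation
    return abs(new_result - mean) <= num_stdev * stdev


if __name__ == "__main__":
    # Hypothetical WERs from five unseeded runs of the same recipe.
    baseline = [20.1, 20.6, 19.8, 20.9, 20.3]
    print(looks_like_noise(baseline, 21.0))
    print(looks_like_noise(baseline, 23.0))
```

Note the cost: five full training runs just to establish the baseline spread, which is exactly the slowness described above.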
<br />
<br />
<span style="font-size: large;">Verifiability of an Experiment</span><br />
<br />
Running an experiment sometimes takes days, so how do you make sure the run is correct? I would say you should first make sure trivial issues such as missing paths, missing models, or incorrect settings are screened out and corrected.<br />
<br />
One of my bosses used to make a strong point of asking me to verify input paths every single time. This is a good habit and it pays dividends. Can we do similar things in our training systems?<br />
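<br />
One way such a check could look, as a rough sketch: a pre-flight function that walks over the inputs of an experiment and reports anything missing or empty before any training starts. The function name <code>preflight</code> and the setting names in the note below are hypothetical.<br />

```python
import os


def preflight(inputs):
    """Return a list of problems found before the experiment starts.

    `inputs` maps a (hypothetical) setting name to the path it points at.
    """
    problems = []
    for name, path in sorted(inputs.items()):
        if not os.path.exists(path):
            problems.append("%s: missing path %s" % (name, path))
        elif os.path.isfile(path) and os.path.getsize(path) == 0:
            problems.append("%s: empty file %s" % (name, path))
    return problems
```

A recipe could call <code>preflight({"dictionary": ..., "acoustic_model": ...})</code> as its first step and refuse to start if the returned list is not empty, which is much cheaper than discovering a bad path a day into the run.<br />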
<br />
<h2>
<span style="font-weight: normal;">Applying It to Open Source</span></h2>
<div>
What I mentioned above is highly influenced by my experience in the field. I have personally found that sites which have great infrastructure for transferring experiments between developers are the strongest and fastest growing. </div>
<div>
<br /></div>
<div>
Putting all these ideas into open source would mean a very different development paradigm. For example, do we want to have a centralized experiment database which everyone shares? Do we want to put common resources such as existing parameterized inputs (such as MFCC) somewhere in common for everyone? Should we integrate the retrieval of these inputs into our experiment recipes? </div>
<div>
<br /></div>
<div>
Those are important questions. In a way, I think they are the most important type of question we should ask in open source, because regardless of all the volunteer effort, the performance of open source models is still lagging behind that of commercial models. I believe it is an issue of methodology. </div>
<div>
<br /></div>
<div>
Arthur</div>
<div>
<br /></div>
<div>
<br /></div>
<br />
<br />
<br />
<br />
<br />Arthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com2tag:blogger.com,1999:blog-24286077.post-71757145499595507622013-03-25T09:10:00.000-07:002013-03-25T09:10:25.572-07:00Monday's Links (Brain-Computer Interface, Apple and more)Geeky:<br />
<br />
<a href="http://calnewport.com/blog/2013/03/24/how-to-write-six-important-papers-a-year-without-breaking-a-sweat-the-deep-immersion-approach-to-deep-work/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+StudyHacks+%28Study+Hacks%29&utm_content=Netvibes">How to Write Six Important Papers a Year without Breaking a Sweat: The Deep Immersion Approach to Deep Work</a><br />
<a href="http://www.slate.com/articles/technology/future_tense/2013/03/brain_computer_interface_could_allow_next_gen_apps_to_market_your_brainwaves.2.html">It’s Like They’re Reading My Mind</a> (Slate)<br />
<br />
Apple:<br />
<br />
<a href="http://www.businessinsider.com/apple-buys-indoor-mapping-company-wifislam-2013-3">Apple Buys Indoor Mapping Company WifiSLAM</a> (Business Insider)<br />
<a href="http://www.latimes.com/business/la-fi-hiltzik-20130322,0,1228878.column?utm_source=loopinsight.com&utm_medium=referral&utm_campaign=Feed">How Apple Invites Facile Analysis</a> (LA Times)<br />
<a href="http://www.asymco.com/2013/03/22/so-long-break-even/?utm_source=loopinsight.com&utm_medium=referral&utm_campaign=Feed">So long, break-even</a> (Horace Dediu)<br />
<br />
After big channels picked up Richards' story:<br />
<br />
<a href="http://money.cnn.com/2013/03/25/technology/innovation/sexism-startup/index.html">Startups have a sexism problem</a><br />
<br />
Fun:<br />
<a href="http://www.slate.com/blogs/future_tense/2013/03/25/r2_d2_day_sign_a_white_house_petition_to_honor_the_selfless_not_selfish.html">R2-D2 Day ...... for real!</a>Arthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com0tag:blogger.com,1999:blog-24286077.post-55316218586549546292013-03-23T12:02:00.002-07:002013-03-24T10:13:02.919-07:00The 100th Post: Why The Grand Janitor's Blog?Since I decided to revamp The Grand Janitor's Blog last December, it has been 100 posts. (I cheat a bit, so "not since then".)<br />
<br />
It's funny to describe time with the number of articles you write. In blogging though, that makes complete sense. <br />
<br />
I have started several blogs in the past. Only 2 of them survive (<a href="http://cumulomaniac.blogspot.com/">Cumulomanic</a> and "<a href="http://333weeks.blogspot.com/">Start-Up Employees 333 weeks</a>", both in Chinese). When you cannot maintain your blog for more than 50 posts, your blog just dies, or simply disappears into oblivion.<br />
<br />
Yet I made it. So here's an important question to ask: what makes me keep going?<br />
<br />
I believe the answer is very simple. So far there are no bloggers who work on the niche of speech recognition: none on automatic speech recognition (ASR) systems, even though there has been much progress. None on engines, even though much work has been done in open source. None on applications, even though great projects such as Simon are out there. <br />
<br />
Nor has there been discussion on how open source speech recognition can be applied to the commercial world, even though dozens of companies are now based on Sphinx (e.g. my employer <a href="http://vocitec.com/">Voci</a>, <a href="http://www.englishcentral.com/">EnglishCentral</a> and <a href="http://nexiwave.com/">Nexiwave</a>), and they are filling the startup space. <br />
<br />
And how will the latest technologies, such as deep neural networks (DNN) and weighted finite state transducers (WFST), affect us? I can see them in academic conferences, journals or sometimes tradeshows...... but not in a blog.<br />
<br />
But blogging, as we all know, is probably the most prominent way people get their news these days. <br />
<br />
And news about speech recognition, once you understand it, is <i>fascinating. </i><br />
<br />
The only blog which comes close is Nickolay's blog: <a href="http://nshmyrev.blogspot.com/">nsh</a>. When I was trying to recover as a speech recognition programmer, nsh was a great help. So thank you, Nick, thank you.<br />
<br />
But there is only one nsh. There is still a lot of fascinating stuff to talk about...... Right?<br />
<br />
So that is probably the reason why I keep on working: <i>I want to invent something I want</i>: a kind of information hub on speech recognition technology, commercial/open source, applications/engines, theory/implementations, the ideals/the realities. <br />
<br />
I want to bring my unique perspective: I was in academia, then in industrial research, and am now in the startup world, so I know people's mindsets in each group quite well.<br />
<br />
I also want to connect with all of you. We are working on one of the most exciting technologies in the world. Not everyone understands that. It will take time for all of us to explain to our friends and families what speech recognition can really do and why it matters. <br />
<br />
In any case, I hope you enjoy this blog. Feel free to connect with me on <a href="https://plus.google.com/111459257194409198620/posts">Plus</a>, LinkedIn and <a href="https://twitter.com/grandjanitor">Twitter</a>.<br />
<br />
Arthur<br />
<br />Arthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com2tag:blogger.com,1999:blog-24286077.post-22424070808532547262013-03-22T21:00:00.002-07:002013-03-25T14:37:20.598-07:00C++ vs CI have been mainly a C programmer. Because of work, though, I have been working with many codebases which are written in C++. <br />
<br />
Many programmers will tell you C++ is a necessary evil. I agree. Using C to emulate object-oriented features such as polymorphism, inheritance or even the idea of objects is not easy. It also easily confuses novice programmers.<br />
<br />
So why does C++ frustrate many programmers, then? I guess my major complaint is that its standard has been evolving and many compilers cannot catch up with the latest version. <br />
<br />
For example, it can be very hard to get gcc 4.7 to compile code which compiles under gcc 4.2. Chances are some of the language features used are outdated and will generate compiler errors.<br />
<br />
On the other hand, C exhibits much greater stability across compilers. If you look at the C portion of the triplet (PocketSphinx, SphinxTrain, Sphinxbase), i.e. 99% of the code, most of it just compiles across different generations of gcc. This makes things easier to maintain.<br />
<br />
ArthurArthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com0tag:blogger.com,1999:blog-24286077.post-21990456555398138562013-03-22T10:21:00.003-07:002013-03-22T21:15:49.507-07:00Friday's ReadingsGeeky:<br />
<br />
<a href="http://gcc.gnu.org/gcc-4.8/changes.html">GCC 4.8.0 released</a><br />
<a href="http://royal.pingdom.com/2013/03/21/browser-wars-2013/">Browser War Revisited</a><br />
<a href="http://www.networkworld.com/community/blog/darpa-wants-unique-automated-tools-rapidly-make-computers-smarter">DARPA wants unique automated tools to rapidly make computers smarter</a><br />
<br />
Non-Geeky:<br />
<a href="http://techcrunch.com/2013/03/21/just-as-ceo-heins-predicted-blackberry-world-now-plays-home-to-over-100000-apps/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Techcrunch+%28TechCrunch%29&utm_content=Netvibes">Just As CEO Heins Predicted, BlackBerry World Now Plays Home To Over 100,000 Apps</a><br />
<a href="http://thenextweb.com/apple/2013/03/21/apple-updates-podcasts-app-with-custom-stations-on-the-go-playlists-and-less-skeuomorphic-design/">Apple updates Podcasts app with custom stations, on-the-go playlists and less ‘skeuomorphic’ design</a><br />
<br />
The whole PyCon2013's Fork the Dongle business:<br />
<br />
The story:<br />
<a href="http://www.bbc.co.uk/news/technology-21896442">'Sexist joke' web developer whistle-blower fired</a> (BBC) and then......<br />
<br />
<span style="color: #0000ee; text-decoration: underline;">Breaking: Adria Richards fired by SendGrid for calling out developers on Twitter</span><br />
<div>
<br /></div>
<div>
Different views:</div>
<br />
From someone who worked with Richards before: <a href="http://amandablumwords.wordpress.com/2013/03/21/3/?utm_source=loopinsight.com&utm_medium=referral&utm_campaign=Feed%3A+loopinsight%2FKqJb+%28The+Loop%29&utm_content=Netvibes">Adria Richards, PyCon, and How We All Lost</a><br />
The apology from PlayHaven's developer: <a href="https://news.ycombinator.com/item?id=5398681">Apology from the developer</a><br />
Rachel Sklar from BI: <a href="http://www.businessinsider.com/rachel-sklar-on-adria-richards-and-sendgrid-2013-3">Rachel Sklar's take</a><br />
Someone thinks this is a good time to sell their T-shirt: <a href="http://forkmydongle.com/">Fork My Dongle T-Shirt</a><br />
Is PyCon2013 so bad? (Short answer: no) <a href="http://peak5390.wordpress.com/2013/03/21/what-really-happened-at-pycon-2013/">What really happened at PyCon 2013</a><br />
<br />
Your view:<br />
<a href="http://www.independent.co.uk/voices/iv-drip/poll-pycon-playhaven-anonymous-adria-richards-and-online-sexism-where-did-it-all-go-wrong-8545741.html">POLL: PYCON, PLAYHAVEN, ANONYMOUS, ADRIA RICHARDS AND ONLINE SEXISM. WHERE DID IT ALL GO WRONG?</a><br />
<br />
Frankly, if you want to support women in our industry, donate to this awesome 9 year old.<br />
<a href="http://www.kickstarter.com/projects/susanwilson/9-year-old-building-an-rpg-to-prove-her-brothers-w">9 Year Old Building an RPG to Prove Her Brothers Wrong!</a><br />
<br />
Arthur<br />
<br />Arthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com0tag:blogger.com,1999:blog-24286077.post-15593668879779141942013-03-22T09:52:00.000-07:002013-03-22T23:12:16.814-07:00Friday Speech-related Links<a href="http://www.theverge.com/2013/3/21/4132116/bing-streaming-mode-windows-phone-demo">Future Windows Phone speech recognition revealed in leaked video</a><br />
<br />
Whether or not you like Softie, they have been innovative in speech recognition these past few years. I am looking forward to their integration of DBN into many of their products.<br />
<br />
<a href="http://techcrunch.com/2013/03/21/german-language-learning-startup-babbel-buys-disrupt-finalist-playsay-to-target-the-us-market/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Techcrunch+%28TechCrunch%29&utm_content=Netvibes">German Language Learning Startup Babbel Buys Disrupt Finalist PlaySay To Target The U.S. Market</a><br />
<br />
Not exactly ASR, but language learning has been a mainstay. Look at EnglishCentral: they have been around for a while and are still alive and kicking.<br />
<br />
<a href="http://sujitpal.blogspot.com/2013/03/the-wikipedia-bob-alice-hmm-example.html">HMM with scikit-learn</a><br />
<br />
When I first learned about HMMs, I was always hoping to use a scripting language to train the simplest HMM. scikit-learn is one such piece of software.<br />
<br />
<a href="http://googleblog.blogspot.com/2013/03/google-keepsave-whats-on-your-mind.html?m=1">Google Keep</a><br />
<br />
Voice memos are a huge market. But mobile continuous speech recognition is a very challenging task. Yet, with Google's technology, I think it should be better than its competitor, Evernote.<br />
<br />
Arthur<br />
<br />
<br />Arthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com0tag:blogger.com,1999:blog-24286077.post-40630241781165172342013-03-21T09:22:00.002-07:002013-03-21T09:23:09.351-07:00Thursday Links (FuzzBuzz programming, Samsung, Amazon and more)Geeky:<br />
<br />
<a href="http://asserttrue.blogspot.com/2013/03/placebo-surgery.html">Placebo Surgery</a> : Still think acupuncture is a thing?<br />
<br />
<a href="http://prog21.dadgum.com/169.html">Expertise, the Death of Fun, and What to Do About It</a> by James Hague<br />
<br />
Indeed, it gets harder to learn. My two cents: always keep notes on your work. See every mistake as an opportunity to learn. And always learn new things, never stop. <br />
<br />
<a href="http://imranontech.com/2007/01/24/using-fizzbuzz-to-find-developers-who-grok-coding/">FizzBuzz programming</a> (2007)<br />
<br />
It's sad that it is true.<br />
<br />
Technology in general:<br />
<br />
<a href="http://appleinsider.com/articles/13/03/21/new-samsung-smart-watch-will-be-companys-third-stab-at-wrist-accessory">Samsung smartwatch product</a><br />
<br />
I still look forward to Apple's product more. I guess since I was there when the iPhone came out, it's rather hard not to say Samsung plagiarizes.......<br />
<br />
The Economics of Amazon Prime (<a href="http://business.time.com/2013/03/18/amazon-prime-bigger-more-powerful-more-profitable-than-anyone-imagined/">link</a>)<br />
<br />
When I go to Amazon, using Prime has indeed become an option, especially for the thousands of ebooks which cost less than $2.99. Buying ten of them is very close to the monthly subscription fee of Amazon Prime.<br />
<br />
Starbucks and Square don't seem to "mix" well (<a href="http://www.fastcompany.com/3005410/industries-watch/starbuckss-shoddy-square-rollout-baffles-baristas-confuses-customers">link</a>)<br />
<br />
Other newsworthy:<br />
<br />
<a href="http://www.nytimes.com/2013/03/19/business/as-crop-prices-surge-investment-firms-and-farmers-vie-for-land.html?_r=0">As Crop Prices Surge, Investment Firms and Farmers Vie for Land</a><br />
<br />
Crop prices have reversed course. If you are interested in the restaurant business (like me), this has a huge impact on the whole food chain.<br />
<br />
<a href="http://abnormalreturns.com/the-many-failures-of-the-personal-finance-industry/">The many failures of the personal finance industry</a><br />
<br />
Many geeky friends of mine do not have a good sense of personal finance. This is a good link for understanding the industry.<br />
<br />
Arthur<br />
<br />Arthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com0tag:blogger.com,1999:blog-24286077.post-89132453676812079582013-03-21T09:04:00.003-07:002013-03-21T09:04:57.152-07:00Thursday Speech-related Readings<a href="http://www.informationweek.com/healthcare/admin-systems/speech-recognition-stumbles-at-leeds-hos/240151225">Speech Recognition Stumbles at Leeds Hospital</a><br />
<br />
I wonder who the vendor is.<br />
<br />
<a href="http://www.slate.com/blogs/future_tense/2013/03/19/google_peanut_gallery_demonstrates_voice_recognition.html">Google Peanut Gallery (Slate)</a><br />
<br />
Interesting showcase again. Google always has pretty impressive speech technology.<br />
<br />
<a href="http://business.time.com/2013/03/18/amazon-prime-bigger-more-powerful-more-profitable-than-anyone-imagined/">Where Siri Has Trouble Hearing, a Crowd of Humans Could Help</a><br />
<br />
Combining fragments of recognition is a rather interesting idea, though it's probably not new. I am glad it is taking off though.<br />
<br />
<a href="http://www.cmswire.com/cms/customer-experience/google-buys-neural-net-startup-boosting-its-speech-recognition-computer-vision-chops-020044.php">Google Buys Neural Net Startup, Boosting Its Speech Recognition, Computer Vision Chops</a><br />
<br />
This is huge. Once again, it says something about the power of the DNN approach. It is probably the real focus for the next 5 years.<br />
<br />
<a href="http://techcrunch.com/2013/03/14/duolingo-adds-offline-mode-and-speech-recognition-to-its-mobile-app/">Duolingo Adds Offline Mode And Speech Recognition To Its Mobile App</a><br />
<br />
I always wonder how the algorithm works. Confidence-based verification algorithms have always been tough to get to work. But then again, the whole deal with reCAPTCHA is really trying to differentiate between humans and machines. So it's probably not as complicated as I thought.<br />
<br />
Some notes on DNS 12: <a href="http://news.idg.no/cw/art.cfm?id=85E862BD-EEC6-F139-3CFE77A734E5F956">link</a><br />
<br />
The whole sentence mode is the more interesting part. Does it make users more frustrated though? I am curious.<br />
<br />
Arthur<br />
<br />
Arthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com0tag:blogger.com,1999:blog-24286077.post-61045378809056171532013-03-20T09:14:00.001-07:002013-03-20T09:14:18.443-07:00Wednesday Links (STEM Jobs)Martin Fowler on <a href="http://martinfowler.com/bliki/TypeInstanceHomonym.html">Homonyms in Design</a><br />
Peter Bell on <a href="http://blog.pbell.com/2013/03/19/innovation-debt/">Innovation Debt</a><br />
Mark Suster's "<a href="http://www.bothsidesofthetable.com/2009/11/04/is-it-time-for-you-to-earn-or-to-learn/">Is it Time to Earn or to Learn?</a>"<br />
<br />
STEM Jobs Series by Daniel Lemire (read from <a href="http://blog.vivekhaldar.com/post/45825204067/stem-jobs">Vivek Haldar's blog</a>)<br />
<br />
<a href="http://math-blog.com/2013/01/21/what-is-really-hot-in-stem-jobs/">What is really hot in STEM jobs?</a><br />
<a href="http://math-blog.com/2013/03/04/the-catch-22-stem-job-market/">The Catch-22 of STEM Job Market</a><br />
<a href="http://math-blog.com/2013/03/11/what-do-stem-employers-want/">What do STEM job employers want?</a><br />
<br />
Also the NYT's comment from Prof. Peter Cappelli:<br />
<a href="http://www.nytimes.com/roomfordebate/2012/07/09/does-a-skills-gap-contribute-to-unemployment/if-theres-a-skills-gap-blame-it-on-the-employer">If There’s a Gap, Blame It on the Employer</a><br />
<br />
ArthurArthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com0tag:blogger.com,1999:blog-24286077.post-20317103158449132013-03-19T21:57:00.001-07:002013-03-19T22:04:32.369-07:00Landscape of Open Source Speech Recognition Software (II : Simon)Around December last year, I wrote an <a href="http://grandjanitor.blogspot.com/2012/12/landscape-of-open-source-speech.html">article</a> on open source speech recognizers. I covered HTK, Kaldi and Julius. One thing you should know: just like CMUSphinx, all of these packages contain their own implementation of the Viterbi algorithm. So when you ask someone who is in the field of speech recognition, they will usually say the open source speech recognizers are Sphinx, HTK, Kaldi and Julius. <br />
<br />
That's how I usually view speech recognition too. After years of working in the industry though, I have started to realize that this view of speech recognizer = Viterbi algorithm could be constraining. In fact, from the user's point of view, a good speech application system should be a combination of<br />
<br />
a recognizer + good models + good GUI.<br />
<br />
I like to call the former type of "speech recognizer" "<b>speech recognition engines</b>" and the latter type "<b>speech recognition applications</b>". Both types of "speech recognizers" are worthwhile. From the users' point of view, it might just be a technicality to differentiate them. <br />
<br />
When I was recovering as a speech recognition programmer (another name I am throwing around :) ), one thing I noticed is that there is much effort going into writing "<b>speech recognition applications</b>". It is a good trend because most people from academia really didn't spend much time writing good speech applications. And in open source, we badly need good applications such as dictation machines, IVR and C&C. <br />
<br />
One effort which really impressed me is <a href="http://simon-listens.blogspot.com/">Simon</a>. It is odd, because most of the time I only care about engine-level software. But in the case of Simon, you can see that a couple of its features really solve real-life problems and are integrated into the bigger theme of open source speech recognition.<br />
<br />
<br />
<ul>
<li>In 0.4.0, Simon starts to integrate with Sphinx. So if someone wants to develop it commercially, they can.</li>
<li>The Simon team also deliberately built context switching into the application, which is good work as well. In general, if you always use a huge dictionary, you are just over-recognizing words in a certain context. </li>
<li>Last but not least, I like the fact that it integrates itself with Voxforge. Voxforge is the open source answer to the large speech databases of commercial speech companies. So integration with Voxforge will ensure an increasing amount of data for your application.</li>
</ul>
<div>
So kudos to the Simon team! I believe this is the right kind of thinking to start a good speech application. </div>
<div>
<br /></div>
<div>
Arthur</div>
<div>
<br /></div>
Arthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com4tag:blogger.com,1999:blog-24286077.post-54732562026457613192013-03-19T20:48:00.003-07:002013-03-19T22:10:08.156-07:00sphinxbase 0.8 and SphinxTrain 1.08I have done some analysis on sphinxbase 0.8 and SphinxTrain 1.0.8 to understand whether they are very different from sphinxbase 0.7 and SphinxTrain 1.0.7. I don't see big differences, but it is still a good idea to upgrade.<br />
<br />
<br />
<ul>
<li>(sphinxbase) The bug in cmd_ln.c is a must-fix. Basically, the freeing was wrong for all ARG_STRING_LIST arguments. So chances are you will get a crash when someone specifies a wrong argument name and cmd_ln.c forces an exit, because that eventually leads to a call to cmd_ln_val_free. </li>
<li>(sphinxbase) There were also a couple of changes in the fsg tools. Mostly I feel those are rewrites. </li>
<li>(SphinxTrain) SphinxTrain, on the other hand, has new tools such as the g2p framework. Those are mostly openfst-based tools, and it's worthwhile to have them in SphinxTrain. </li>
</ul>
<div>
One final note here: CMUSphinx in general has a tendency to turn to C++. C++ is something I love and hate. It can sometimes be nasty, especially when dealing with compilation. At the same time, using C to emulate OOP features is quite painful. So my hope is that we use a subset of C++ which is robust across different compiler versions. </div>
<div>
<br /></div>
<div>
Arthur </div>
<br />
<br />
<br />Arthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com0tag:blogger.com,1999:blog-24286077.post-51291112628409799262013-03-18T15:36:00.003-07:002013-03-18T15:37:47.302-07:00Python multiprocessingAs my readers may have noticed, I haven't updated this blog lately because I have a pretty heavy workload. It doesn't help that I was sick in the middle of March as well. Excuses aside though, I am happy to come back. Even if I can't write much about Sphinx and programming, I think it's still worth it to keep posting links.<br />
<br />
I have also come across requests to write in more detail about individual parts of Sphinx. I love these requests, so feel free to send me more. Of course, it usually takes me some time to fully grok a certain part of Sphinx before I can describe it in an approachable way. So until then, I can only ask for your patience.<br />
<br />
Recently I have come across parallel processing a lot and was intrigued by how it works in practice. In Python, a natural choice is to use the multiprocessing library. So here is a simple example of how you can run multiple processes in Python. It is very useful on modern CPUs, which have multiple cores.<br />
<br />
Here is an example program on how that could be done:<br />
<br />
<pre style="background-image: URL(https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhyo6bqOf_-RKRbNEy0wphIwhPoQ4yq4DGqGNHr9UX1TSYGuW5sRmNJweQy2zZw9LsdvItDTve5ZLDlCDpcqwE0u2N4vkrJNoE2xnDj5FUn0nKiR3xkfUZtTv5MiS3IT4URcumyJA/s320/codebg.gif); background: #f0f0f0; border: 1px dashed #CCCCCC; color: black; font-family: arial; font-size: 12px; height: auto; line-height: 20px; overflow: auto; padding: 0px; text-align: left; width: 99%;"><code style="color: black; word-wrap: normal;"> 1: import multiprocessing
 2: import subprocess
 3: jobs = []
 4: for i in range(N):  # N and process() are defined elsewhere
 5:     p = multiprocessing.Process(target=process,
 6:                                 name='TASK' + str(i),
 7:                                 args=(i, ......
 8:                                       )
 9:                                 )
10:     jobs.append(p)
11:     p.start()  # start the job right away
12: for j in jobs:
13:     if j.is_alive():
14:         print('Waiting for job %s' % (j.name))
15:     j.join()  # block until this job has finished </code></pre>
<br />
<br />
The program is fairly trivial. Interestingly enough, it is also quite similar to the multithreading version in Python. Lines 5 to 11 are where the tasks are started, and lines 12 to 15 simply wait for the tasks to finish. <br />
<br />
This feels a little less elegant than using Pool, because Pool provides a waiting mechanism for the entire pool of tasks. Right now, I am essentially waiting for whichever jobs are still running by the time job 1 is finished. <br />
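<br />
For comparison, here is a minimal sketch of what the Pool version might look like. It assumes Python 3.3 or later (for the <code>with</code> form), and <code>process</code> and <code>run_all</code> are hypothetical stand-ins, not the real task.<br />

```python
import multiprocessing


def process(i):
    """Hypothetical task; stands in for the real per-job work."""
    return i * i


def run_all(n, workers=2):
    # map() blocks until every task has finished, so there is no
    # per-job join loop. The with-block then shuts the pool down.
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(process, range(n))


if __name__ == "__main__":
    print(run_all(4))
```

The <code>__main__</code> guard matters on platforms where child processes re-import the main module; without it, each child could try to create its own pool.<br />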
<br />
Is it worthwhile to go down another path, namely thread-based programming? One thing I learned in this exercise is that in older versions of Python, a multi-threaded program can be paradoxically slower than the single-threaded one. (See this <a href="http://eli.thegreenplace.net/2012/01/16/python-parallelizing-cpu-bound-tasks-with-multiprocessing/">link </a>from Eli Bendersky.) This may be less of an issue in recent versions of Python, though.<br />
<br />
Arthur<br />
<br />
<br />Arthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com2tag:blogger.com,1999:blog-24286077.post-2888525969661356902013-02-28T08:59:00.001-08:002013-02-28T08:59:53.634-08:00Readings at Feb 28, 2013<a href="http://asserttrue.blogspot.com/2013/02/taeubers-paradox-and-life-expectancy.html">Taeuber's Paradox and the Life Expectancy Brick Wall</a> by Kas Thomas<br />
<br />
<a href="http://prog21.dadgum.com/167.html">Simplicity is Wonderful, But Not a Requirement</a> by James Hague<br />
<br />
Yeah. I knew a professor who always wanted to rewrite speech recognition systems so that they would be easier to use for research. Ahh...... modern speech recognition systems are complex anyway. Not making mistakes is already very hard, not to mention building a good research system which is easy to use for <i>everyone</i>. (Remember, everyone has a different research goal.)<br />
<br />
ArthurArthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com2tag:blogger.com,1999:blog-24286077.post-67831882895544244182013-02-25T08:03:00.004-08:002013-02-25T08:03:36.751-08:00On sh*tty job. I read "<a href="http://www.smaggle.com/2013/02/25/hating-shitty-job-worse/">Why Hating Your Shitty Job Only Makes It Worse</a>". There is something positive about the article, but I can't completely agree with the author.<br />
<br />
Part of the dilemma of working in a traditional office is that inevitably some kind of a*holes and bad systems will appear in your life. The question is whether you want to ignore them or not. You should be keenly aware of your working conditions and make a rational decision about staying or leaving.<br />
<br />
Arthur<br />
<br />
<br />Arthur Chanhttp://www.blogger.com/profile/18162527494132410362noreply@blogger.com0