Wednesday, March 27, 2013

GJB Wednesday Speech-related Links/Commentaries (DragonTV, Siri vs Xiao i Robot, Coding with Voice)




ZhiZhen company (智臻網絡科技) from Shanghai is suing Apple for infringing their patents.  (The original Shanghai Daily article) From the news, back in 2006, ZhiZhen has already developed the engine for Xiao i Robot (小i機械人).  A video 8 months ago (as below). 

Technically, it is quite possible that a Siri-like system can be built at 2006.  (Take a Look at Olympus/Ravenclaw.)  Of course, the Siri-like interface you see here is certainly built in the advent of smartphone (, which by my definition, after iPhone is released).   So overall speaking, it's a bit hard to say who is right.  

Of course, when interpreting news from China, it's tempting to use slightly different logic. In the TC article, OP (Etherington) suggested that the whole lawsuit could be state-orchestrated. It could be related to recent Beijing's attack of Apple. 

I don't really buy the OP's argument, Apples is constantly sued in China (or over the world).  It is hard to link the two events together.  




This is definitely not the Siri for TV.

Oh well, Siri is not just speech recognition, there is also the smart interpretation in the sentence level: scheduling, making appointments, do the right search.   Those by themselves are challenges.    In fact, I believe Nuance only provides the ASR engine for Apple. (Can't find the link, I read it from Matthew Siegler.)

In the scenario of TV,  what annoys users most are probably switching channels and  searching programs.  If I built a TV, I would also eliminate the any set-top boxes. (So cable companies will hate me a lot). 

With the technology profile of all big companies, Apple seems to own all technologies need.  It also takes quite a lot of design (with taste) to realize such a device. 

Using Python to code by Voice



Here is an interesting look of how ASR can be used in coding.   Some notes/highlights:
  • The speaker, Travis Rudd, had RSI 2 years ago.  After a climbing accident, He decided to code using voice instead.  Now his RSI is recovered, he claims he is still using it for 40-60%. 
  • 2000 voice commands, which are not necessarily English words.   The author used Dragonfly to control emacs in windows.
  • How does variables work?  Turns out most variables are actually English phrases. There are specific commands to get these phrases delimited by different characters. 
  • The speaker said "it's not very hard" for others to repeat.  I believe there will be some amount of customizations.  It takes him around 3 months.  That's pretty much how much time a solution engineer needs to take to tune an ASR system. 
  • The best language to program in voice : Lisp. 
One more thing.   Rudd also believe it will be very tough to do the same thing with CMUSphinx.  

Ah...... models, models, models. 



Earlier on Grand Janitor's Blog

Some quick notes on what a "Good training system" should look like: (link).
GJB reaches the 100th post! (link)

Arthur







2 comments:

Pranav Jawale said...

There was an EU project on interactive TV. See http://dicit.fbk.eu and
http://www.lms.lnt.de/research/projects/interactive_tv_frontend/

I wonder what happened to the technology once the project ended.

Arthur Chan said...

Hey Pranav,

Thanks for your comment. Let me check it out your link and see if I dig up something.

Arthur