Computational Linguistics

A couple of interesting things happened recently in fields related to Computational Linguistics that I thought were worth linking to. ACM Queue had an interview with Mike Cohen of Google (previously of Nuance Communications) discussing recent advances and changes in speech recognition technology. Pluggd, with its hotly discussed demo of HearHere, uses speech recognition and some sort of topic clustering to show a time heatmap of where your search keyword occurs inside a podcast.
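To make the heatmap idea concrete, here is a minimal sketch. It assumes a hypothetical transcript format, a list of (start time in seconds, word) pairs like those a speech recognizer might emit, and it only does exact keyword matching. Pluggd's topic clustering is not modelled, and this is not how HearHere actually works. The sketch only shows the basic time-binning step.

```python
def keyword_heatmap(transcript, keyword, duration, bin_seconds=60):
    """Count keyword hits per fixed-width time bin.

    transcript: list of (start_time_in_seconds, word) pairs, a hypothetical
    format assumed to come from a speech recognizer.
    Returns one count per bin; higher counts mean "hotter" regions.
    """
    # Ceiling division so the last partial bin is included; at least one bin.
    n_bins = max(1, -(-int(duration) // bin_seconds))
    counts = [0] * n_bins
    target = keyword.lower()
    for start, word in transcript:
        if word.lower() == target:  # case-insensitive exact match only
            # Clamp timestamps past the end into the last bin.
            idx = min(int(start // bin_seconds), n_bins - 1)
            counts[idx] += 1
    return counts
```

For a three-minute episode with hits at 5 s, 12 s and 150 s, `keyword_heatmap(..., duration=180)` yields three one-minute bins with counts 2, 0 and 1. Those counts could then be mapped to colours along the episode's timeline.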

With paper books, you are pretty much stuck. E-books, on the other hand, with the right combination of software and open formats, may soon prove to be just the thing to keep you reading and learning in the new language. And with the language learning market attracting billions of dollars, you can be sure somebody will find a way to make the most of the possibilities e-books offer.

FreshNotes (currently in alpha) uses basic named entity extraction and maybe information extraction to produce a website that lets you search and navigate relationships between people and/or topics. The interface, but of course it is all pre-baked at the moment. From the CL point of view, I can see that the system has very few smarts yet. Only people’s full names are detected; I don’t see any sign of coreference resolution; and relationships are determined by the proximity of the names and possibly by how often they co-occur.
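The proximity-plus-frequency approach described above can be sketched in a few lines. This is an illustration under stated assumptions, not FreshNotes' actual code: a crude regex (two or more consecutive capitalized words) stands in for real named entity extraction, and "proximity" is assumed to mean "within a fixed number of characters". The function name and window size are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Crude stand-in for named entity extraction (an assumption): two or more
# consecutive capitalized words are treated as a person's full name.
NAME_RE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def name_cooccurrence(text, window=50):
    """Count how often pairs of distinct names appear within `window` characters.

    The count acts as a naive relationship strength: no coreference
    resolution, so "Jobs" or "he" are never linked back to "Steve Jobs".
    """
    mentions = [(m.start(), m.group()) for m in NAME_RE.finditer(text)]
    pairs = Counter()
    for (pos_a, a), (pos_b, b) in combinations(mentions, 2):
        if a != b and abs(pos_b - pos_a) <= window:
            # Sort so (A, B) and (B, A) are counted as the same relationship.
            pairs[tuple(sorted((a, b)))] += 1
    return pairs
```

On "Bill Gates met Steve Jobs. Later, Steve Jobs spoke with Eric Schmidt." this links Bill Gates and Steve Jobs twice (once per mention of Jobs), Steve Jobs and Eric Schmidt twice, and Bill Gates and Eric Schmidt not at all, because they are more than 50 characters apart. That is exactly the kind of shallow result you would expect from this technique.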

I have written about UIMA, IBM’s Natural Language Processing framework, before. Since then, I have made a couple of attempts to get a feel for it. Unfortunately, it kept feeling uncomfortable and confusing. Finally, I figured out why. UIMA’s extensive documentation assumes that you are already committed to the framework, so it makes sure you understand the full architecture before it lets you near the tutorial. The tutorial itself starts somewhere around section 4.

Even the basic techniques from the computational linguistics field can make for interesting and intriguing applications. Gutenkarte takes public domain books, extracts the geographic names present in the text and plots them on a map. The result is an automatic clustering of place references, both visually and (within a single click) textually. The site itself is self-explanatory, but there is a good write-up on the larger context of the idea in the if:book blog entry.
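One simple way to get from text to map markers is a gazetteer lookup. The sketch below is an assumption about the general technique, not Gutenkarte's actual pipeline. It uses a tiny hypothetical gazetteer with approximate coordinates and counts whole-word mentions, so that each place could be plotted with a marker sized by how often the book mentions it.

```python
import re
from collections import Counter

# Tiny hypothetical gazetteer: place name -> approximate (latitude, longitude).
# A real system would use a large gazetteer and disambiguate names such as
# "Paris, Texas" versus "Paris, France".
GAZETTEER = {
    "London": (51.51, -0.13),
    "Paris": (48.86, 2.35),
    "New York": (40.71, -74.01),
}


def extract_places(text, gazetteer=GAZETTEER):
    """Return (place, lat, lon, mention_count) for every place found in text."""
    counts = Counter()
    for name in gazetteer:
        # Whole-word match so that "Paris" does not match inside "Parisian".
        pattern = r"\b" + re.escape(name) + r"\b"
        counts[name] = len(re.findall(pattern, text))
    # Keep only places that were actually mentioned, in gazetteer order.
    return [(name, *gazetteer[name], n) for name, n in counts.items() if n > 0]
```

For "He left London for Paris, then returned to London. Parisian fashion bored him." this returns London with two mentions and Paris with one. "Parisian" is not counted, and New York is omitted. Exact string lookup like this is the "basic technique" end of the spectrum; it misses variant spellings and happily tags a character named "Paris" as a city.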

Computational Linguistics – News update for Oct 9, 2006

How e-books could revolutionize language-learning

FreshNotes: Web 2.0 company using computational linguistics

UIMA’s expectations of the user

Creative use of the Named Entity Recognition techniques