Computational Linguistics

Just a link to an interesting article by Sunayana on Natural Language Processing as applied to problems in India. She makes an interesting point: because NLP is so underdeveloped in India, even undergraduate-level projects may be contributing to the cutting edge of research. This is similar to what was mentioned in the podcast about Somali speech synthesis for clinicians that I pointed out a while ago. (Update for July 18: Sparked by my link, there is now a link-rich counterargument at ResNotebook.)

Dan Farber has written a good article on Powerset. It mostly talks about their grandiose marketing plans and how NLP (Natural Language Processing) will change the world. However, it also has a reasonable explanation of what they are doing, with fairly transparent references to (expanded) WordNet, named entity recognition, event extraction and semantic web technologies. It is also interesting that the article tries to give the impression that Google is not using any of these techniques, while the quotes hint at more similarities than differences.

As a starting NLP/CL researcher, I find it really hard to wade through the fragmented community’s research efforts, software and evaluation methods. I am sure that eventually I will settle into my specific area and know most of the important work. However, I want a better view of the general field now. I would rather not spend forever thinking one way only to find out that, with a better preliminary understanding, I could have achieved better results using other methods.

I am trying to use the Stanford NLP parser for my research and I need to look at the trees it produces for large, complex sentences. I have found several packages for laying out the output as trees, but they all seem to be targeted at visualizing smaller sentences, suitable for illustrating a point in a published paper. (Figure: sample output of a Graphviz layout for the Stanford Parser’s output.) My trees are large.
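For readers who want to try something similar, here is a minimal sketch of one way to turn a Penn Treebank style bracketed tree into Graphviz DOT text. It is not necessarily how the layout mentioned above was produced. The function names `parse_ptb` and `to_dot` are hypothetical, and the code assumes a single tree with a labelled root (such as `ROOT`) in which literal brackets in the sentence are already escaped as `-LRB-` and `-RRB-`.

```python
def parse_ptb(text):
    """Parse one bracketed tree into nested (label, children) tuples.

    Assumption: every "(" is followed by a label, as in "(ROOT (S ...))".
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1                      # consume ")"
            return (label, children)
        word = tokens[pos]                # a leaf: the word itself
        pos += 1
        return (word, [])

    return read()


def to_dot(tree):
    """Emit Graphviz DOT text with one node per constituent or word."""
    lines = ["digraph parse {", "  node [shape=plaintext];"]
    counter = 0

    def walk(node):
        nonlocal counter
        label, children = node
        node_id = f"n{counter}"
        counter += 1
        # Escape backslashes and quotes so the label stays a valid DOT string.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  {node_id} [label="{escaped}"];')
        for child in children:
            child_id = walk(child)
            lines.append(f"  {node_id} -> {child_id};")
        return node_id

    walk(tree)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample input, hand-written for illustration.
sample = "(ROOT (S (NP (PRP$ My) (NNS trees)) (VP (VBP are) (ADJP (JJ large)))))"
print(to_dot(parse_ptb(sample)))
```

Saving the printed text to a file such as `tree.dot` and running `dot -Tsvg tree.dot -o tree.svg` should render it. For very wide trees, adding `rankdir=LR;` inside the graph may make the result easier to read, though that is a matter of taste.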

I have written about Spock, the supposedly computational-linguistics-heavy search engine, before. I have to admit that I could not see what people were getting so excited about. Recently, I received an invitation to Spock’s beta, and I am still not very excited. Like the constantly shifting FreshNotes/Knover, this company started out being all about using information extraction to build content and relations, and then quickly moved to mostly human-powered content.

Link: NLP – The Indian perspective

More details emerge on Powerset’s engine

“State of the art” NLP Wiki

Laying out Penn Treebank output of Stanford parser

I have 3 Spock beta invitations to give away