Outer Thoughts (of Alexandre Rafalovitch)

I get much more spam than comments on this blog, so I figured I could do something about it. I have now installed the reCAPTCHA verification plugin. It is intended to serve two purposes: to slow down the spammers by requiring them to enter some text, and to help digitize public domain books, as this is where the verification texts come from. There should be no problems using the plugin, but if you encounter any, do let me know.

Dan Farber has written a good article on Powerset. It mostly talks about their grandiose marketing plans and how NLP (Natural Language Processing) will change the world. However, it also has a reasonable explanation of what they are doing, with fairly transparent references to (expanded) WordNet, named entity recognition, event extraction and semantic web technologies. It is also interesting that the article tries to give the impression that Google is not using any of these techniques, while the quotes hint at more similarities than differences.

I don’t swear! I find that if I use up the swear words in day-to-day situations, I will have nothing to use in the critical moments when I actually need to let off steam. Interestingly, when I do get those moments, I still do not really swear. But I need to know that such a release vent exists. So, I was relieved (if a bit surprised) to find that a competition was held on swear words and expressions in Esperanto, with prizes for the top three places, and that enough candidates were submitted to need judges.

As a starting NLP/CL researcher, I find it really hard to wade through the fragmented community’s research efforts, software and evaluation methods. I am sure that eventually I will settle down into my specific area and will know most of the important works. However, I want to have a better view of the general field now. I would rather not spend ages thinking one way only to find out that, with a better preliminary understanding, I could have achieved better results using other methods.

I am trying to use the Stanford NLP parser for my research and I need to look at the trees it produces for large, complex sentences. I have found several packages for laying out the output as trees, but they all seem to be targeted at visualizing smaller sentences, suitable for illustrating a point in a published paper.

[Figure: Sample output of Graphviz layout for Stanford Parser’s output]

My trees are large.
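To make the idea concrete, here is a rough sketch of how a bracketed Penn Treebank string could be turned into Graphviz DOT text. This is not necessarily how the figure above was produced; the function names `parse_penn` and `to_dot` and the sample sentence are made up for illustration, and the parser assumes one well-formed tree per string.

```python
import re


def parse_penn(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1  # consume "("
        # Some treebank files wrap the tree in an unlabeled outer bracket.
        if tokens[pos] == "(":
            label = ""
        else:
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append((tokens[pos], []))  # a leaf word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def to_dot(tree):
    """Render a (label, children) tree as Graphviz DOT text."""
    lines = ["digraph tree {", "  node [shape=plaintext];"]
    counter = 0

    def walk(n):
        nonlocal counter
        label, children = n
        my_id = counter
        counter += 1
        # Escape backslashes and quotes so labels stay valid DOT strings.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append('  n%d [label="%s"];' % (my_id, escaped))
        for child in children:
            child_id = walk(child)
            lines.append("  n%d -> n%d;" % (my_id, child_id))
        return my_id

    walk(tree)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample in the bracketed style the parser prints.
sample = "(ROOT (S (NP (PRP$ My) (NNS trees)) (VP (VBP are) (ADJP (JJ large)))))"
dot_text = to_dot(parse_penn(sample))
```

Saving `dot_text` to a file such as `tree.dot` and running `dot -Tsvg tree.dot -o tree.svg` should give a zoomable picture, which tends to be easier to handle than a bitmap when the tree is large.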

I have installed reCAPTCHA

More details emerge on Powerset’s engine

Swearing in Esperanto

“State of the art” NLP Wiki

Laying out Penn Treebank output of Stanford parser