UIMA is a new-ish framework on the block, competing/cooperating with the GATE framework for NLP, annotation and search. Jon Udell recorded a screencast with a couple of IBM-ers to show off and explain UIMA.
While the screencast moves a little slowly for anyone familiar with sentence tokenizing principles, it is still interesting to see how it all hangs together.
The only problem I see with UIMA is the confusion around its licensing.
If I ever do get to write a PhD, I will have to make sure to run it through this detector (as covered well in New Scientist). Seriously though, this sounds like a great way to show off computational linguistics (or, more specifically, data/text mining) experiments. Hopefully such projects will make the field more visible and more interesting to others.
On the project itself, I wonder how it would deal with papers produced by people for whom English is not their first language.
I have a lay interest in the field of computational linguistics. So I want to read the current thoughts of the people in the field. But where are they all? You would think they would blog; after all, they are good with computers and they are good with languages.
But no! I can find teachers' blogs, I can find doctors' blogs, I can find CEOs' blogs. But I cannot find any computational linguists' blogs.