Oops: there goes the blog in 2012

I knew I was neglecting my blog in 2012, but I did not realize just how much until I received WordPress’ year in review for 2012 (feel free to take a peek at it). The line that stopped me dead was “In 2012, there was 1 new post”. Sure enough, one post it was.

Well, this blog might be comatose, but I am not dead. Quite the opposite: I am so busy that there is very little time left for crafting articles.

My guest post about the UNCORPORA project on the TAUS blog

I was asked to guest blog for TAUS about my research/work project UNCORPORA. The article has now gone live. It might interest people who care about UN languages or natural language processing and, for those who follow the links, XML geeks.

Making up with ANTLR

I like ANTLR! It is a specialized tool that can be applied to many difficult tasks when regular expressions get all Dust Puppy-like, and I have used it in the past with great success. But every time I put this particular tool aside, I know that picking it back up will be like making up after a bad breakup. Things feel familiar, but you are still so uncomfortable that you cannot get anything working.

Conjunctions in named entities

A recent article on LingPipe discussed conjoined named entities such as “Johnson and Johnson” and “Wallace and Gromit”. It suggests that one way of treating these is as frozen expressions. I assume that means relying on statistical measures to see whether the multi-word expression repeats often enough to be treated as a unit. In the United Nations corpus, things can get even more interesting. Let’s look at a relatively easy example: draft resolution A/56/L.
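To make the "statistical measures" idea a little more concrete, here is a minimal sketch that scores every "X and Y" pattern with the Dice coefficient, a common association measure. It is only an illustration of the general idea, not the method from the LingPipe article. The function name `conjunction_dice` and the toy token list are made up for the example.

```python
from collections import Counter

def conjunction_dice(tokens, conj="and"):
    """Score each 'X <conj> Y' pattern with the Dice coefficient.

    A score near 1.0 means X and Y almost always occur together in the
    conjunction, which hints at a frozen expression.
    """
    unigrams = Counter(tokens)
    # Count every (left, right) pair that has the conjunction in between.
    pairs = Counter(
        (left, right)
        for left, mid, right in zip(tokens, tokens[1:], tokens[2:])
        if mid == conj
    )
    # Dice = 2 * count(X conj Y) / (count(X) + count(Y)).
    # Caveat: a pair with identical halves ("Johnson and Johnson") tops out
    # at 0.5 here, because each pattern occurrence counts the word twice.
    return {
        (left, right): 2 * n / (unigrams[left] + unigrams[right])
        for (left, right), n in pairs.items()
    }

# Hypothetical toy corpus, already tokenized.
tokens = (
    "Wallace and Gromit met Johnson . "
    "Wallace and Gromit left . "
    "Johnson and Smith stayed"
).split()
scores = conjunction_dice(tokens)
```

On this toy input, the pair that only ever appears together scores higher than the pair whose left half also appears on its own. A real corpus would need proper tokenization and a frequency cutoff before the scores mean much.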

Visualizing CiteULike collections

I am collecting my reading and reference material in CiteULike. I like the service because it can capture details from multiple sources. It also lets you discover what other interesting people have collected by navigating the graph of tags, people and bookmarks. Nice as CiteULike is, it is fairly difficult to get an overall picture of one’s own collection. It is especially difficult to see quickly whether there are people who serve as hubs by collaborating with several different groups.
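The "hub" question can be sketched in a few lines, assuming the collection has already been exported as a list of author lists, one per paper. The input format, the author names and both function names are assumptions for illustration. Nothing here uses a CiteULike API. The idea is to count how many separate clusters an author's co-authors fall into once the author is ignored.

```python
from itertools import combinations

def coauthor_graph(papers):
    """Build an undirected co-authorship graph: author -> set of co-authors."""
    graph = {}
    for authors in papers:
        for name in authors:
            graph.setdefault(name, set())  # keep solo authors in the graph
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def groups_bridged(graph, author):
    """Count clusters among an author's direct co-authors.

    Two co-authors land in the same cluster if they are linked through
    other direct co-authors of the same author, without passing through
    the author.
    """
    unseen = set(graph[author])
    groups = 0
    while unseen:
        groups += 1
        stack = [unseen.pop()]
        while stack:
            node = stack.pop()
            for neighbour in graph[node]:
                if neighbour in unseen:
                    unseen.remove(neighbour)
                    stack.append(neighbour)
    return groups

# Hypothetical collection: each inner list is the author list of one paper.
papers = [["Ann", "Bob", "Cy"], ["Ann", "Dee"], ["Dee", "Eve"], ["Bob", "Cy"]]
graph = coauthor_graph(papers)
hubs = sorted(name for name in graph if groups_bridged(graph, name) >= 2)
```

Authors with a count of two or more connect co-authors who do not otherwise write together, which is one rough reading of "hub". Name disambiguation, which real bibliographic data badly needs, is left out entirely.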