Oops: there goes the blog in 2012

I knew I was neglecting my blog in 2012, but I did not realize just how much until I received WordPress’ year in review for 2012 (feel free to take a peek at it). The line that stopped me dead was “In 2012, there was 1 new post”. Sure enough: one post it was.

Well, this blog might be comatose, but I am not dead. In fact, quite the opposite: I am so busy that there is very little time left for crafting articles.

Conjunctions in named entities

A recent article on the LingPipe blog discussed conjoined named entities such as Johnson and Johnson and Wallace and Gromit. The authors suggest that one way of handling them is to treat each as a frozen expression. I assume that means relying on statistical measures to notice that such a multi-word expression recurs often enough to be treated as a unit. In the United Nations corpus, things can get even more interesting. Let’s look at a relatively easy example: draft resolution A/56/L.
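To make the “recurs often enough” reading concrete, here is a minimal sketch of the simplest statistical measure, a raw frequency count. It is not LingPipe’s method. The assumptions are mine: each conjunct is a single capitalized word, and the threshold `min_count` is a hypothetical parameter that a real system would tune or replace with an association measure.

```python
import re
from collections import Counter

# Assumption: each conjunct is one capitalized word. Longer conjuncts, such as
# multi-word country names, are deliberately not handled by this sketch.
CONJ_PATTERN = re.compile(r"\b([A-Z][a-z]+) and ([A-Z][a-z]+)\b")


def frozen_conjunctions(texts, min_count=3):
    """Return 'X and Y' phrases seen at least min_count times across texts.

    min_count is a hypothetical frequency threshold, not a recommended value.
    """
    counts = Counter()
    for text in texts:
        for left, right in CONJ_PATTERN.findall(text):
            counts[f"{left} and {right}"] += 1
    # Keep only the phrases frequent enough to be treated as a single unit.
    return {phrase: n for phrase, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    corpus = [
        "Johnson and Johnson announced results.",
        "Analysts praised Johnson and Johnson again.",
        "Shares of Johnson and Johnson rose.",
        "Wallace and Gromit returned to cinemas.",
    ]
    print(frozen_conjunctions(corpus))
```

On this toy corpus only “Johnson and Johnson” clears the threshold; “Wallace and Gromit” would need more occurrences. Raw counts are crude (frequent but compositional pairs also pass), which is exactly where the longer conjuncts in UN documents start to cause trouble.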

Visualizing CiteULike collections

I am collecting my reading and reference material in CiteULike. I like the service because it can capture details from multiple sources. It also lets me discover what other interesting people have collected by navigating the graph of tags, people, and bookmarks. Nice as CiteULike is, it is fairly difficult to get an overall picture of one’s own collection. It is especially difficult to see quickly whether there are people who serve as hubs by collaborating with several different groups.
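One way to look for such hubs is to build a co-authorship graph from the collection and ask, for each person, how many separate groups their co-authors would fall into if that person were removed. Below is a minimal sketch under stated assumptions: the author lists have already been extracted (for example, from an export of the collection; that parsing is not shown), and “hub” is approximated as someone who bridges two or more otherwise disconnected groups. The function names are hypothetical; centrality measures such as betweenness are a common alternative.

```python
def coauthor_graph(papers):
    """Build an undirected co-authorship graph from a list of author lists."""
    graph = {}
    for authors in papers:
        for a in authors:
            # setdefault also creates nodes for authors of single-author papers.
            graph.setdefault(a, set()).update(b for b in authors if b != a)
    return graph


def groups_bridged(graph, person):
    """Count how many separate groups person's co-authors form without person."""
    unseen = set(graph[person])
    groups = 0
    while unseen:
        groups += 1
        start = unseen.pop()
        visited = {start}
        stack = [start]
        # Depth-first search that never passes through the removed person.
        while stack:
            node = stack.pop()
            for nxt in graph[node]:
                if nxt != person and nxt not in visited:
                    visited.add(nxt)
                    stack.append(nxt)
        unseen -= visited  # co-authors reached here belong to this group
    return groups


def hubs(papers, min_groups=2):
    """Return (person, groups) pairs for people bridging at least min_groups groups."""
    graph = coauthor_graph(papers)
    scores = {p: groups_bridged(graph, p) for p in graph}
    return sorted(
        ((p, s) for p, s in scores.items() if s >= min_groups),
        key=lambda item: (-item[1], item[0]),
    )


if __name__ == "__main__":
    papers = [["Ann", "Bob"], ["Bob", "Cid"], ["Cid", "Ann"], ["Bob", "Dee"], ["Dee", "Eve"]]
    print(hubs(papers))  # Bob and Dee each connect two otherwise separate groups
```

The design choice here is interpretability: a score of 3 reads directly as “this person links three groups I would otherwise see as unrelated”, which is closer to the question above than an abstract centrality value.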

New mailing list to discuss junction of NLP and Software Engineering

Dr. René Witte has just created a new mailing list (SENLP) to discuss both applying NLP techniques to Software Engineering and general Software Engineering issues in developing NLP systems. I am interested in both topics. I spent three years in senior technical support at BEA and could see how applying NLP techniques to the written notes in support cases could have improved the quality of technical support. I never got to do any of that, but some interest remains.

Where are all the legal computational linguistics resources?

I am frustrated. I know my corpus (resolutions of the United Nations General Assembly) has a lot in common with the biomedical and legal domains. I can find interesting articles in the biomedical domain dealing with similar issues: complex tokenization, long named entity mentions (though mine are much longer), and so on. But I see nothing in the legal domain. I have just gone through all of the JURIX proceedings as well as all of Artificial Intelligence and Law, and all I found were two to four articles worth following up.
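As a small illustration of the tokenization problem: a generic tokenizer splits a UN document symbol such as A/RES/56/1 on every slash and dot, scattering one identifier across many tokens. Here is a minimal regex-based sketch that keeps such symbols whole. The symbol pattern is my own approximation (uppercase letters followed by slash-separated segments that may contain dots), not an official UN symbol grammar, and the example strings are illustrative only.

```python
import re

# Approximate UN document symbol: uppercase series letters, then one or more
# "/segment" parts, where a segment may contain internal dots ("L.12", "C.3").
# A dot is only absorbed when followed by a letter or digit, so a
# sentence-final period stays a separate token.
UN_SYMBOL = r"[A-Z]+(?:/[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*)+"

# Alternation order matters: try symbols first, then words (with internal
# hyphens or apostrophes), then any single punctuation character.
TOKEN_RE = re.compile(rf"{UN_SYMBOL}|\w+(?:[-']\w+)*|[^\w\s]")


def tokenize(text):
    """Split text into tokens, keeping UN-style document symbols intact."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize("The Assembly adopted resolution A/RES/56/1."))
    print(tokenize("See A/C.3/56/L.12, paragraph 2."))
```

A plain word tokenizer would break the first symbol into pieces such as `A`, `/`, `RES`, `/`, `56`, `/`, `1`; treating the symbol as one token makes it usable as an anchor for the long named entity mentions that surround it.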