Where are all the legal computational linguistics resources?

January 14, 2009

Computational Linguistics, Ideas, My PhD research

I am frustrated. I know my corpus (resolutions of the United Nations General Assembly) has a lot in common with the biomedical and legal domains. And I can find interesting articles in the biomedical domain dealing with similar issues of complex tokenization, long named entity mentions (though mine are much longer), etc. But I see nothing in the legal domain.

I have just gone through all of the JURIX proceedings as well as all of [Artificial Intelligence and Law][3], and all I got is [between 2 and 4 articles worth following up][4].

There must be somebody actually trying to parse real legal texts and figuring out how to deal with complex organisation, person and group names. But all I can see is articles dealing with levels from ontology and up.

There might even be money in it!

One of the crazy business ideas I had was to parse all the web-based terms of use and privacy notices and annotate or crowd-vote them for how bad they are. Then, before creating a web-based account, I could check the site against the database/parser, and it would highlight and rate the passages I really should pay attention to (e.g. we sell your contact details to every spammer we know). Since the language of those notices is often ritualistically formulaic, extracting an interesting and useful summary would actually be simpler than it looks.
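To make the "formulaic language" point concrete, here is a minimal sketch of how such flagging might look, assuming a handful of hand-written patterns. The rule labels, severity numbers, and regular expressions are all invented for illustration; a real system would need a far larger rule set (or crowd-sourced annotations) and a proper sentence splitter.

```python
import re

# Hypothetical rules: (label, severity 1-5, pattern). Illustrative only,
# not taken from any real analysis of terms-of-use documents.
RULES = [
    ("shares data with third parties", 4,
     re.compile(r"\b(share|sell|disclose|rent)\b.{0,60}"
                r"\b(third[- ]part(y|ies)|partners|affiliates)\b", re.I)),
    ("may change terms without notice", 3,
     re.compile(r"\b(modify|change|amend)\b.{0,60}"
                r"\bwithout (prior )?notice\b", re.I)),
    ("broad licence to your content", 3,
     re.compile(r"\b(perpetual|irrevocable)\b.{0,60}\blicen[cs]e\b", re.I)),
]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def flag_passages(text):
    """Return (severity, label, sentence) tuples, most severe first."""
    hits = []
    for sentence in split_sentences(text):
        for label, severity, pattern in RULES:
            if pattern.search(sentence):
                hits.append((severity, label, sentence))
    return sorted(hits, key=lambda hit: -hit[0])


def notice_score(text):
    """Crude overall rating: the sum of severities of all flagged passages."""
    return sum(severity for severity, _, _ in flag_passages(text))
```

Calling `flag_passages(notice_text)` on the text of a notice would give the passages to highlight, and `notice_score` a single number to compare sites by. Summing severities is the simplest possible rating; crowd votes could replace or reweight it.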

And the business model would center on an automatic notification option for when a notice from a subscribed website sneakily changed and became much worse. That way one would pay for the peace of mind that there were no unexpected changes to the service rules.
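The change-detection half could be sketched roughly as follows, assuming two snapshots of the same notice are already stored as plain text. The function names are hypothetical, and deciding whether a change is "much worse" is left out: this only finds which sentences were added or removed, which is what a rating step would then look at.

```python
import difflib
import hashlib
import re


def fingerprint(text):
    """Hash of the notice with whitespace and case normalised."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def changed_sentences(old, new):
    """Return (added, removed) sentence lists between two snapshots."""
    # Cheap check first: identical fingerprints mean nothing to report.
    if fingerprint(old) == fingerprint(new):
        return [], []
    added, removed = [], []
    for line in difflib.ndiff(sentences(old), sentences(new)):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
    return added, removed
```

A subscriber would only need to be notified when `added` is non-empty and the new sentences rate badly; pure reformatting of the page produces the same fingerprint and is ignored.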

[3]: http://www.springerlink.com/content/100239/ "Digital edition of 'Artificial Intelligence and Law' journal"
[4]: http://www.citeulike.org/user/arafalov/tag/legal "My article set from legal domain"