I am frustrated. I know my corpus (resolutions of the United Nations General Assembly) has a lot in common with the biomedical and legal domains. I can find interesting articles in the biomedical domain dealing with similar issues: complex tokenization, long named entity mentions (though mine are much longer), etc. But I see nothing in the legal domain.
There must be somebody actually trying to parse real legal texts and figuring out how to deal with complex names of organisations, people and groups. But all I can find are articles working at the ontology level and up.
There might even be money in it!
The business model would center on an automatic notification option: if a notice on a subscribed website sneakily changed and became much worse, the subscriber would be alerted. That way one would pay for peace of mind, knowing there were no unexpected changes to a service's rules.