I am frustrated. I know my corpus (resolutions of the United Nations General Assembly) has a lot in common with the biomedical and legal domains. And I can find interesting articles in the biomedical domain dealing with similar issues of complex tokenization, long named entity mentions (though mine are much longer), etc. But I see nothing in the legal domain. I have just gone through all of the Jurix proceedings as well as all of Artificial Intelligence and Law, and all I found were between 2 and 4 articles worth following up.
I have written before about converting Microsoft Word files into text or HTML using OpenOffice. However, the wizards I described in that article crashed once the number of files reached several hundred. I wrote some macros to do the conversion, but they were scary-looking and fragile. Fortunately, I have now found a tool that does the same job better and with more flexibility. DocConverter by Danny Brewer and Dan Horwood converts a whole directory of files at a time, from any format OpenOffice understands to any other.
They say at BarCamp that if you don’t like the session you are in, feel free to go to a better one. No hard feelings. But what do you do if you show up for an announced moderated discussion session and the moderator does not? That’s what happened to us in the last (5:15pm) slot of the second day of BarCampNYC3. So, after waiting 10 minutes past the start time, I decided to step in and moderate.
Arthur C. Clarke once famously wrote, “Any sufficiently advanced technology is indistinguishable from magic”. In the same vein, many people feel that any sufficiently established bureaucracy is like black magic, sorcery even. Certainly, it often takes otherworldly skills to follow the logic of modern tax return instructions. Yet bureaucracy often has its place and reason. Laws protect exploitable minorities; procedures serve to avoid known problems; cross-referencing forms are filled in triplicate to allow for audit and protection against falsification.