I have (nearly) finished developing a mini-website in six languages (Arabic, Chinese, English, French, Russian, Spanish). The layout was the same for all of them, so ideally it would have been driven by a content management system. Unfortunately, that was not an option in this case, as I was not given enough time to set up the infrastructure.
As I know next to nothing about at least two of those languages (Arabic and Chinese), I had to keep rechecking the content provided to ensure the right text ended up in the right place on each page.
I like ANTLR! It is a specialized tool that can be applied to many difficult tasks when regular expressions get all Dust Puppy-like. I have used it in the past with great success.
But every time I put this particular tool aside, I know that picking it back up will be like making up after a bad breakup. Things feel familiar, but you are still so uncomfortable that you cannot get anything working.
A recent article on lingpipe discussed conjoined named entities such as Johnson and Johnson, and Wallace and Gromit. They suggest that one way of treating such an entity is as a frozen expression. I assume that means relying on statistical measures: if the multi-word expression repeats often enough, it is treated as a unit.
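To make the "repeats often enough" idea concrete, here is a minimal sketch of one possible measure. It is my own illustration, not the method from the lingpipe article: for every "X and Y" pattern it checks what share of the occurrences of X and of Y fall inside that pattern. The function name `frozen_candidates`, the thresholds and the sample text are all hypothetical.

```python
from collections import Counter


def frozen_candidates(tokens, min_count=2, min_ratio=0.8):
    """Return {"X and Y": ratio} for conjunctions that behave like a unit.

    ratio is the smaller of two shares: how many occurrences of X, and how
    many of Y, appear inside the "X and Y" pattern. The thresholds are
    arbitrary values chosen for illustration.
    """
    unigrams = Counter(tokens)
    # Count every (X, Y) pair that is joined by a literal "and".
    pairs = Counter(
        (a, c)
        for a, b, c in zip(tokens, tokens[1:], tokens[2:])
        if b == "and"
    )
    result = {}
    for (x, y), n in pairs.items():
        if n < min_count:
            continue
        # "Johnson and Johnson" uses up two "Johnson" tokens per occurrence.
        used = 2 * n if x == y else n
        ratio = min(used / unigrams[x], used / unigrams[y])
        if ratio >= min_ratio:
            result[f"{x} and {y}"] = ratio
    return result


sample = (
    "Wallace and Gromit met Johnson and Johnson . "
    "Wallace and Gromit like cheese and crackers . "
    "Johnson and Johnson sells cheese ."
).split()
print(frozen_candidates(sample))
```

On this toy input, "cheese and crackers" is dropped because it occurs only once, while the two names survive because their parts never appear outside the conjunction. A real corpus would need proper tokenization and a more robust association measure.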
In the United Nations corpus, things can get even more interesting. Let’s look at a relatively easy example: draft resolution A/56/L.
Homegrown visualization is not the only way to quickly navigate CiteULike references. There are other tools that display bibliographies in interesting ways.
One such tool is Exhibit, one of the graduates of the SIMILE project. It lets you build a very interactive web page driven by just HTML and JavaScript, with no server-side component required. I really like SIMILE’s tools, even though development seems to have slowed somewhat recently.
There is an example of how to import and display BibTeX within Exhibit.
I am collecting my reading and reference material in CiteULike. I like the service because it can capture details from multiple sources. It also lets you discover what other interesting people have collected, by navigating the graph of tags, people and bookmarks.
Nice as CiteULike is, it is fairly difficult to get an overall picture of one’s own collection. It is especially difficult to see quickly if there are people who serve as hubs by collaborating with multiple different groups.
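One rough way to spot such hubs, sketched below, is to build a co-author graph and count how many separate groups each author's co-authors fall into. This is only an illustration under assumptions: the input is a plain list of author lists (which would have to be extracted from an export of the collection first), the names are made up, and `build_coauthor_graph`, `coauthor_groups` and `find_hubs` are hypothetical helpers rather than part of any CiteULike API.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthor_graph(papers):
    """Map each author to the set of people they have published with."""
    graph = defaultdict(set)
    for authors in papers:
        for author in authors:
            graph[author]  # make sure solo authors appear too
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def coauthor_groups(graph, author):
    """Split an author's co-authors into groups that also know each other.

    These are the connected components of the subgraph formed by the
    author's co-authors alone, with the author left out.
    """
    neighbours = graph.get(author, set())
    seen = set()
    groups = []
    for start in neighbours:
        if start in seen:
            continue
        group = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend((graph[node] & neighbours) - seen)
        groups.append(group)
    return groups


def find_hubs(papers, min_groups=2):
    """Return {author: group count} for authors who bridge several groups."""
    graph = build_coauthor_graph(papers)
    counts = {a: len(coauthor_groups(graph, a)) for a in graph}
    return {a: n for a, n in sorted(counts.items()) if n >= min_groups}


# Hypothetical data: each inner list holds the authors of one paper.
papers = [
    ["Ann", "Bob", "Cat"],
    ["Ann", "Dan"],
    ["Dan", "Eve"],
    ["Bob", "Cat"],
]
print(find_hubs(papers))
```

Here Ann counts as a hub because her co-authors split into two groups that never published together (Bob and Cat on one side, Dan on the other). Name disambiguation, which is usually the hard part with real bibliographies, is ignored entirely.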