Javadoc custom doclets – fun, frustration and forward motion

Javadoc is the default - and often the only - documentation for open source Java projects. It is generated automatically and can simply be dumped on any public-facing server as a bunch of static files, or even bundled with the distribution if size is not an issue.

However, as a project grows, several issues with Javadoc documentation become apparent. The main issue is that Javadoc (yes, even the JDK 8 one) uses frames and JavaScript for navigating packages and classes, which breaks direct linking to the content as well as discoverability by search engines. Yes, there are NO FRAMES links, but then the navigation becomes really cumbersome. The second issue is that the generated Javadoc uses rather old HTML standards and is really not designed for Search Engine Optimization, which means that search engines usually end up discovering random entry points by following somebody’s old blog post.

This is especially problematic for projects with multiple versions, which is of course every actively developed project. Usually, the Javadocs are placed at URLs that include the version. Combined with the issues above, this usually means that search engines will find some random old version of a particular document and link to that. It’s up to users to figure out the magic keywords to find the latest Javadoc version and - most of the time - they just go back to browsing the project’s main website to find the latest/relevant information.

Finally, for large projects, the Javadocs themselves become rather large and hard to navigate. If they are split into package groups (like Lucene and Solr’s are), some of the cross-group navigation is lost. If they are all together, the number of classes becomes overwhelming.

Now, there is supposed to be an easy answer to that. Don’t like the default Javadoc - make a custom doclet. Javadoc parses the Java source files - you do the rest. The sources for the default doclet are published, Javadoc supports doclet-specific options and so on. There even used to be a somewhat-thriving community of custom doclet generators.

Not so fast!

It turns out that, through ignorance, negligence or other means, writing custom doclets has become quite a hard problem. The website information is outdated, the sources are somewhat hard to find and there are some gotchas hidden in the code.

To be precise, it’s quite easy to write a custom doclet as long as it does not generate HTML: you can write pre-processors and exporters/indexers. It’s modifying the standard doclet to - for example - generate frame-less navigation that is next to impossible.

I have, however, managed to get going in the right direction, and below are a number of hints for getting started with custom doclets on JDK 8:

  • The doclet overview is part of JDK 8 documentation.
  • The latest source can be downloaded straight from the repository as a zip/bz2/gz. Make sure to download at the level linked and not just the formats/html subdirectory. This is part of the gotcha that - I believe - makes this approach not feasible for JDKs earlier than the open-sourced version 8.
  • The most important class to start from is RootDoc (JavaDoc) as that’s what holds the whole parsed tree (and yes, Javadoc
  • Java 8’s own Javadoc is a good example of the output, including a demonstration of the HTML-polluting profiles.

This should be enough if you want to analyze/export/search the source code using Javadoc as a fully-compliant fancy source parser. Read on if you actually want to modify the HTML output.

  • Once you have downloaded the source tree above, you need to change all the packages to your own prefix. An example perl command for *nix/Mac would be (note that no backups are made):

    perl -pi -e 's/com\.sun\.tools\.doclets/com.outerthoughts.html5doclet/g' `find . -name "*.java"`

  • The codebase for the doclet and all supporting classes is quite complicated, especially since the code makes very liberal use of reflection and Method.invoke(). The best way to understand it is to run it in an IDE and set debug breakpoints. To simplify a quick browse, and for those who are not actually emitting HTML, I have generated the Javadoc and source for the relevant packages and put them online.

  • If you are planning to make a main doclet, make sure to comment out the trap method (AbstractDoclet near line 59), which is the reason why custom doclets died. Apparently, sometime around JDK 5, Sun was going to make doclets absolutely AWESOME. So, very temporarily, they disabled any custom doclets except their own while they finished that framework. Nothing is as permanent as a temporary solution. Until, of course, the whole codebase goes open source.
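
For anyone without perl at hand (Windows, for example), the package rename from the one-liner above can be sketched in plain Java. This is only an illustration under stated assumptions: the class name PackageRenamer and its methods are hypothetical and are not part of the repository, and just like the perl version it rewrites files in place without backups and does not move any directories.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: rewrites the doclet package prefix in every .java file under a directory. */
final class PackageRenamer {
    // Prefixes taken from the perl one-liner in the post.
    static final String OLD_PREFIX = "com.sun.tools.doclets";
    static final String NEW_PREFIX = "com.outerthoughts.html5doclet";

    /** Pure string transformation. String.replace is literal, so the dots are not wildcards. */
    static String rename(String source) {
        return source.replace(OLD_PREFIX, NEW_PREFIX);
    }

    /** Rewrites matching files in place (no backups) and returns how many files changed. */
    static int renameTree(Path root) throws IOException {
        List<Path> javaFiles;
        try (Stream<Path> walk = Files.walk(root)) {
            javaFiles = walk
                    .filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".java"))
                    .collect(Collectors.toList());
        }
        int changed = 0;
        for (Path file : javaFiles) {
            // ISO-8859-1 maps every byte to one char and back, so untouched bytes survive as-is.
            String before = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
            String after = rename(before);
            if (!after.equals(before)) {
                Files.write(file, after.getBytes(StandardCharsets.ISO_8859_1));
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the root of the downloaded source tree.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Files rewritten: " + renameTree(root));
    }
}
```

Keeping rename() separate from the file walking makes it easy to try on a single string before letting it loose on the whole tree.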

I have created a project repo that starts from the copied files and makes them run under a custom package hierarchy. Notice that the license is GPL 2, as per original files. The repository will continue to evolve to (try to) generate frameless output (as well as some Solr search integration), so it might be something to keep track of.

One particular item of interest could be the self-start class that shows how to trigger Javadoc from code to be platform and path independent and to allow for intelligent pre-processing and parameter generation.
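
To give a rough idea of what such a self-start class can look like, here is a minimal sketch. It is not the class from the repository: it assumes the standard javax.tools API (ToolProvider.getSystemDocumentationTool(), available since Java 8) rather than whatever the repository actually uses, and the names DocletLauncher, buildArguments and run are hypothetical. The options used (-doclet, -docletpath, -sourcepath) are regular javadoc command-line options.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import javax.tools.DocumentationTool;
import javax.tools.ToolProvider;

/** Hypothetical launcher: builds javadoc arguments in code and runs the tool in-process. */
final class DocletLauncher {

    /** Assembles the argument list; File.pathSeparator keeps the doclet path platform independent. */
    static List<String> buildArguments(String docletClass, List<File> docletPath,
                                       File sourceRoot, List<String> packages) {
        List<String> args = new ArrayList<>();
        args.add("-doclet");
        args.add(docletClass);
        if (!docletPath.isEmpty()) {
            args.add("-docletpath");
            args.add(docletPath.stream()
                    .map(File::getPath)
                    .collect(Collectors.joining(File.pathSeparator)));
        }
        args.add("-sourcepath");
        args.add(sourceRoot.getPath());
        args.addAll(packages); // package names to document come last
        return args;
    }

    /** Runs javadoc inside the current JVM; requires a JDK, since a plain JRE has no tool. */
    static int run(List<String> args) {
        DocumentationTool tool = ToolProvider.getSystemDocumentationTool();
        if (tool == null) {
            throw new IllegalStateException("No javadoc tool found; run this on a JDK");
        }
        // Passing null streams makes the tool use System.in, System.out and System.err.
        return tool.run(null, null, null, args.toArray(new String[0]));
    }

    public static void main(String[] cmd) {
        // Assumption: arguments are <doclet class> <source root> <package>...
        List<String> packages = new ArrayList<>();
        for (int i = 2; i < cmd.length; i++) {
            packages.add(cmd[i]);
        }
        List<String> args = buildArguments(cmd[0], new ArrayList<File>(), new File(cmd[1]), packages);
        System.exit(run(args));
    }
}
```

Because the arguments are built as a plain list before the tool runs, this is also the natural place to hook in any pre-processing or generated parameters.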

Also, in the repo, I had to move the resource files to a dedicated directory to ensure they work with the Idea+Maven structure. See the project commit history for details.

There is still a lot of work to be done. But even getting this far was a long road of dead or out-of-date URLs, unexplainable exceptions and hidden traps. So, I am hoping this post will save others the pain and restart the ecosystem of custom Javadoc doclets.