Michael Baum reports on the survey of system administrators regarding their troubleshooting activities. It is an interesting summary, but something is missing.
There seem to be a lot of questions about how problems are handled now, with the predictable answers: basic power tools like grep, Perl and Ethereal. What I don't see are any questions on how to fix the problem going forward.
By now we have pretty much established that until developers themselves try to support and troubleshoot their own products in production (or get loud enough feedback), they will not understand how to make those products easier to manage after deployment.
Surveys of the "how do you deal with it now" kind should always include questions on why commercial solutions are not suitable (usually because of installation or licensing difficulties) and on what the companies creating the products could do to make things easier in the long run.
I know some companies are slowly doing this on their own (e.g. DTrace from Sun), but I think progress would be faster if organisations such as LOPSA or NaSPA backed the effort. After all, these problems will not go away by themselves; if anything, they will get worse.
And if system administrators want to join forces with other technical people facing the same problems, they should pay more attention to technical support staff and to forensic analysts. Both groups also have to find a needle of important information in a mountain of obscure, disjointed and overwhelming data. Forensic analysis tools are progressing especially fast these days, driven by very high-profile security concerns (most of the February issue of Communications of the ACM is about this very topic).
BlogicBlogger Over and Out