There are times when I absolutely love being the admin for our group’s high-performance computer. But there are also times when I would rather clean toilets all day. This post will hopefully explain a few of the things I hate about being an admin.
- Debug Support: I’m not your personal debugger. I know nothing about the code that you are writing, and therefore I shouldn’t be expected to help you debug your model. That said, I will usually help where I can. But don’t just send me an email saying “My model won’t compile. I need help”. If you really want help, send me detailed information about the problem you are having. Better yet, send me a copy of your source code. If you give me little to no information, then expect little to no support.
- Root Access: No, I will not give you root access to our system, especially when you won’t give me a reason why you need it. I don’t care how much experience you have with computers or clusters. If you really want root access, you have to request it from my advisor, who paid for the system. Regardless, you should have no need for root access. If you need some software or a library installed, I will install it system-wide.
- Backup?: If I send you an email that states that the high-capacity storage server has no backup and you need to take care of backing up your own data (and I describe your backup options in the email), don’t email me a month later asking if there is a backup for your files. Sure, the server is set up as a RAID-6 array and that provides “backup” if only 1 or 2 drives die. But a complete crash of the controller or 3+ drives means that everything is gone.
- I’m no Magician: Expanding on the backup point above, if I tell you that the files are not backed up and the entire server then crashes, I can’t recover your files. When the drives die and the array has to be reconfigured, all data is lost. There is nothing that I can do to recover your files. You should have backed up when I suggested it months prior.
- Special Privileges: Unless you specifically request extra resources on the server and the owner approves them, I will not give you extra resources. If, for some reason, you need to run more jobs (or use more nodes) than is allowed at one time, you need to request these extra resources through the proper channels. I will ultimately be the one to make the changes, but I am not the one who decides if you can have the extra resources. Also, all special privileges will be temporary!
- Extra Space?: We set up quotas for a reason. Nobody on our server, under any circumstances, will be granted a higher quota limit on their /home folder. If you have files that take up too much space, place them on our high-capacity storage server. If you temporarily need more space on the high-capacity storage server, see Special Privileges above.
- Google!: When you run into a problem, don’t immediately send me an email requesting support. Do your best to solve the problem on your own. Use resources such as Google, StackOverflow, forums, etc. to attempt to solve the problem yourself. I shouldn’t get an email from a user saying “My job crashed for some unknown reason”. Then I look into it, and the problem is that you ran out of disk space. Check logs, use the internet, do your best to solve the problem yourself; you might learn something along the way. Then if you still can’t solve it, I’ll be happy to help.
- Information is Key: If you send me an email requesting support, you have to give me more information than “It crashed”. If the problem is that a job in the queue crashed, then at the very least tell me the job number. Ideally, when requesting support, you should give me as much information as you possibly can. I would rather get 100 support requests a week that give me too much information than 10 requests a week that give me no information.
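For anyone wondering what a useful support request looks like, here is a minimal sketch in Python. It is only an illustration, not a tool we actually run: the function names `disk_report` and `build_support_request` and the layout of the report are made up for this example. It checks free disk space first (the usual culprit) and then bundles the job number, a description, and the tail of a log into one message. One caveat: `shutil.disk_usage` reports on the whole filesystem, not your personal quota, so a full quota may not show up here.

```python
import os
import platform
import shutil
from datetime import datetime


def disk_report(path="."):
    """Return total, used, and free space (in GiB) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return {
        "path": path,
        "total_gib": round(usage.total / gib, 2),
        "used_gib": round(usage.used / gib, 2),
        "free_gib": round(usage.free / gib, 2),
    }


def build_support_request(job_id, description, log_tail="", path="."):
    """Assemble a plain-text support request (hypothetical format)."""
    disk = disk_report(path)
    lines = [
        f"Date: {datetime.now().isoformat(timespec='seconds')}",
        # Fall back to "unknown" if the USER variable is not set.
        f"User: {os.environ.get('USER', 'unknown')}",
        f"Host: {platform.node()}",
        f"Job number: {job_id}",
        f"Directory checked: {disk['path']}",
        f"Disk space: {disk['free_gib']} GiB free of {disk['total_gib']} GiB",
        "",
        "What happened:",
        description,
    ]
    if log_tail:
        lines += ["", "Last lines of the log:", log_tail]
    return "\n".join(lines)


if __name__ == "__main__":
    # Example values only; substitute your real job number and log output.
    print(build_support_request(
        "12345",
        "Job exited after 10 minutes with no output.",
        log_tail="No space left on device",
    ))
```

If the free space comes back at or near zero, you have probably found the problem before ever writing the email.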
With all of that said, I will do my best to help you in any way possible when you experience a problem that you cannot solve on your own.
Any other admins out there who have some frustrating stories to share? Leave them in the comments.