Developer *Fail* Moments

2012/08/27

On each OpenMovie project we manage to get ourselves into some fairly awkward technical difficulties, and even with the best intentions things backfire and break in strange and hard-to-foresee ways.

Being an open project, at least we don’t have to pretend to be “professional” and can share some of the low-lights of the project, as we did for Big Buck Bunny and Sintel :)

 

In no particular order…

  • Once, when NFS went mad on the server, Sergey just rebooted it. All file systems were forced to be checked at boot time. Checking both the 5TB and 3TB RAIDs took several hours. No internet and no production SVN meanwhile.

 

  • Sergey made some tweaks to file system layouts on all systems and made a typo in the file system mounting rules. The next day nobody was even able to boot their computers.

  • Campbell decides to give someone else the fun of fixing our broken (but awesome) 8-core Xeon. We get it back from the shop “fixed” with only 2 memory slots, limiting the RAM and making it fail at rendering the majority of our scenes.

 

  • For some reason the Samba server (the server that lets Mac and Windows users access our shared folder) was misconfigured, and files were being created with wrong permissions. Trying to fix this, Sergey used a mask of “.*” to change permissions for all files, including hidden ones. Who knew that this mask would also match the “..” folder, which is the parent folder? We ended up with every file on the server granting full access to anybody.

 

  • In the middle of the project a typo was discovered in the automated Ubuntu installation script: it created a 25GB partition for data and a couple of hundred gigabytes for swap.

  • Campbell wanted to download all versions of a file and whipped up a clever script to grab the whole history at once,
    …turns out that making 120 connections to Blender’s SVN server is enough to get the Blender Institute blacklisted from connecting to our own source code repository. Now artists are blocked from Blender updates and devs can’t apply changes to Blender. Resolved the next day by changing our router’s MAC address to get a new IP.

 

  • Sergey powers up all the remote DELL renderfarm nodes and manages to overload their PDU (Power Distribution Unit) 4 days before the premiere. Our main renderfarm is totally broken… what now?…

 

  • Campbell & Koen want to copy some footage onto a USB disk, but the server is configured to cache gigabytes. Koen is illegally parked and wants to rush the files out to DELL, but the disk won’t unmount. Killing all processes that use the disk to force an unmount manages to kill _every_ process on the server, including our render farm and the studio internet connection. In the panic some shelves in our server room also got knocked over by accident, breaking glassware… We ended up copying most of the remaining files online.

 

  • Kjartan asks for a render hack to simplify a specific scene, but the workaround only gets applied to one of our farms, causing 5+ hour render times.

 

  • Francesco adds computers to the renderfarm without copying over the server’s SSH key, causing the renderfarm to lock up trying to log in to the system.

 

  • Our fallback system for storing footage during the shoot mysteriously fails to boot on the morning of the shoot,
    and to make things worse, our real-life sysadmins on set are playing “extras” in the first scene and have to get makeup and costumes done. We manage without the backup system.
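
For the curious, the “.*” permissions mishap from the list is easy to reproduce without touching a real server. This is only a rough sketch: it uses Python’s `fnmatch` as a stand-in for shell-style masks (not the actual command Sergey ran), and the directory listing is made up. In many shells of that era a bare “.*” expanded to include “.” and “..”, which is how a recursive permission change can climb into the parent folder.

```python
from fnmatch import fnmatchcase


def entries_matching(mask, names):
    """Return the names that a shell-style mask would select."""
    return [name for name in names if fnmatchcase(name, mask)]


# Hypothetical directory listing, including the two special entries
# every directory has: "." (itself) and ".." (its parent).
listing = [".", "..", ".bashrc", ".cache", "footage", "render.blend"]

# The naive mask: anything starting with a dot, which includes "..".
naive = entries_matching(".*", listing)

# A more careful mask: a dot followed by a character that is not a dot.
safer = entries_matching(".[!.]*", listing)

print("naive:", naive)
print("safer:", safer)
```

The more careful mask skips “.” and “..”, although it would also skip a file with a name like “..foo”, so it is a compromise rather than a complete fix.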

 

Things are eerily relaxed here at the studio a day before the premiere; with luck we’ll manage without too many problems this time.

– Campbell