In my previous post, I talked about some of the issues I saw with the idea of doing away with operations teams and merging their responsibilities into the development team's tasks [as practised at companies like Amazon]. Justin Rudd, who is a developer at Amazon, gives his first-hand perspective on this practice in his blog post entitled Expanding on the Pain, where he writes:
Since I am a current employee of Amazon in the software development area, I probably shouldn’t be saying this, but…
First a few clarifications - there is no dedicated operations team for Amazon as a whole that is correct. But each team is allowed to staff as they see fit. There are teams within Amazon that have support teams that do handle quite a bit of the day to day load. And their systems tend to be more “smooth” because this is what that team does - keep the system up and running and automate keeping the system up and running so they can sleep at night.
There are also teams dedicated to networking, box failures, etc. So don’t think that developers have to figure out networking issues all the time (although there are sometimes where networking doesn’t see a problem but it is affecting a service).
Now for those teams that do not have a support team (and I am on one of them), at 3 in the morning you tend to do the quickest thing possible to get the problem rectified. Do you get creative? After being in bed for 3 hours (if you’re lucky) and having a VP yell at you on the phone that this issue is THE most important issue there is or having someone yell at you that they are going to send staff home, how creative do you think you can be? Let me tell you, not that creative. You’re going to solve the problem, make the VP happy (or get the factory staff back to work), and go back to bed with a little post it note to look for root cause of the problem in the AM.
Now 1 of 2 things happens. If you have a support team, you let them know about what happened, you explain the symptoms that you saw, how you fixed it, etc. They take your seed of an idea, plant it, nurture it, and grow it.
If you don’t have a support team and you are lucky, in the morning there won’t be another THE most important thing to work on and you can look at the problem with some sleep and some creativity. But the reality is - a lot of teams don’t have that luxury. So what happens? You end up cronning your solution which may be to bounce your applications every 6 hours or run a perl script that updates a field at just the right place in the database, etc.
We all have every intention of fixing it, but remember that VP that was screaming about how this issue had to be fixed? Well now that it isn’t an issue anymore and it’s off his radar screen, he has new features that he wants pushed into your code. And those new features are much more important than you fixing the issue from the night before because the VP really doesn’t care if you get sleep or not at night.