I learned plenty of lessons in my first two months as a full-time software engineer, but the one that stood out most was about priorities.
Back in college, I also had priorities, such as getting good grades, having fun, and doing my coding projects. However, I almost never had to choose between them, because there were so few considerations and potential consequences. If I felt like coding, I would open my text editor and start coding. When I got bored, I would open Chrome and watch YouTube or Twitch. If I didn’t study well for a module, no worries: I would just rush through the lecture notes two days before the exam.
When I entered the real software engineering world, however, things were different.
Feature vs Reliability
In the real world, there is a lot more at stake and many more considerations to weigh. Different roles and teams in a company have different priorities, and those priorities must align before engineers can properly start working on a feature. Product managers (PMs) care most about the product and the needs of its users. Their priority is to ship features that are useful to those users, and if nothing goes wrong, this is usually the default priority for engineers as well.
However, there are other priorities too. From the Site Reliability Engineering (SRE) team’s perspective, the top priority is the stability and reliability of the entire system: when the system goes down, features are meaningless. Usually this does not conflict with what PMs want, because the engineers are skilled enough to ship features while keeping the system healthy. Sometimes, though, this alignment has to be adjusted because of other factors.
Time-Sensitive Business Requirements vs Reliability
One typical factor is urgent business requirements. In most cases, business requirements drive the engineering department, and they can be time-sensitive. For example, when launching in a new market, the product has to ship fast to capture market share. Likewise, when government regulations change, the engineering team has to respond quickly to stay compliant. All these time-sensitive requirements mean that the PMs and the operations team can end up with priorities that conflict with the SRE team’s. With a limited number of engineers, you can only choose to either ship fast or ship with careful engineering.
When such a conflict arises, it is important for all parties to understand the different perspectives, as well as the pros and cons of each priority. If we ship fast, what are the implications for the entire system? If we ship slowly, what are the implications for the business? Whatever the decision, the task for the engineer stays the same: ship the feature. The only differences are the delivery timeline and the JIRA ticket duration. As an engineer, it is then your responsibility to work within those constraints and deliver the best you can. If the decision is to ship fast, the engineer is also responsible for highlighting the resulting technical debt to the other parties and paying it down afterwards.
Bug Fix – “What is the Priority?”
The other kind of priority sits more within the engineering scope: bug fixes. In college, it was simple. When you saw a bug, you stopped whatever you were doing, fixed the bug, deployed, and continued where you left off. The real world is different.
During my first week, I was eager to fix any bug I saw in the product. So when I saw a bug reported for my (internal) product, I immediately stopped what I was doing and worked on a fix. But when I requested to deploy a hot-fix, the questions I got were: “Why a hot-fix? Does it affect a lot of users? How serious is it? What is the priority?” Only then did I realize that it was not urgent and that my colleagues could afford to wait until the next day.
Why Not Fix It Immediately?
You might ask, “Why not fix it immediately, since you already have the fix?” The answer is that we need to balance engineering resources. Each hot-fix takes extra engineering time for reviews and for the ad-hoc steps required by the hot-fix deployment pipeline. Moreover, every hot-fix carries the inherent risk of changing the production system outside the normal release cycle. When you weigh these costs against the impact of the bug, it becomes clear that we cannot deploy a hot-fix for every bug we see.
That is why we also assign priorities to open production issues (OPIs) and bugs. The priority number decides whether to deploy a hot-fix, include the fix in the next scheduled deployment, or add it to the backlog. All these workflows and rules are in place to make sure that everyone focuses on what actually matters, instead of jumping into fixing things that are less important.
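To make the triage rule concrete, here is a minimal Python sketch of mapping a priority number to one of the three outcomes. The priority scale (1 as the most urgent) and the cut-off values are hypothetical assumptions for illustration only; every team defines its own levels and thresholds.

```python
from enum import Enum


class Action(Enum):
    """Possible outcomes of triaging an OPI or bug."""
    HOT_FIX = "deploy a hot-fix"
    NEXT_DEPLOYMENT = "fix in the next scheduled deployment"
    BACKLOG = "add to backlog"


def triage(priority: int) -> Action:
    """Map a priority number to an action.

    Assumption (hypothetical scale): 1 is the most urgent and larger numbers
    are less urgent. The cut-off values below are illustrative, not a standard.
    """
    if priority < 1:
        raise ValueError("priority must be a positive integer")
    if priority == 1:
        # Severe, widespread impact: worth the extra review effort and production risk.
        return Action.HOT_FIX
    if priority <= 3:
        # Real but tolerable impact: users can wait for the regular release.
        return Action.NEXT_DEPLOYMENT
    # Low impact: schedule it alongside other planned work.
    return Action.BACKLOG


if __name__ == "__main__":
    for p in (1, 2, 5):
        print(f"P{p}: {triage(p).value}")
```

The point of encoding the rule this way is that the decision depends on the agreed priority, not on whether a fix happens to be ready, which is exactly the trap I fell into in my first week.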