The Engineering Manager

Balancing logistics, operations, strategy, and culture.

Aug 26, 2024

This essay is part of the series Essential Roles within Development Teams.

An engineering manager (EM) is accountable for the success of their team—where success means delivering on a team’s objectives—objectives which the manager must first ascertain, prioritize, and communicate. The responsibilities of an EM are aligned with ensuring reliable, efficient, and sustainable team progress and continuous improvement. The best managers curate an environment wherein engineers can effectively collaborate to do their best work. In their role as coach, they serve as models of good behavior, encourage active self-reflection and constructive feedback, align incentives between individuals and the team, and reinforce accountability through personal relationships and the careful review of operational metrics.

Responsibilities

The engineering manager is a challenging role that combines responsibilities in logistics, operations, strategy, and culture. This is in contrast to the tech lead, which is almost exclusively operational in nature.

In this essay I distinguish between logistical and operational responsibilities much as one might distinguish between software development and operations: logistics (development) is oriented towards creating the best possible circumstances for the efficient delivery of some value (or value-generating process), whereas operations involves the actual work and tactics required to deliver that value on an ongoing basis.

Software “logistics” (development): design, code, test, profile.
Software operations: deploy, run, monitor, remediate.
Team logistics: plan (long-term), budget, prioritize, communicate (expectation setting).
Team operations: plan (short-term), execute, cross-train, debrief/retrospect.

If teams are staffed with tech leads, in the long-term steady-state, the engineering manager can remain primarily focused on their logistical, strategic, and cultural responsibilities and can often take on the manager role for multiple teams.

Logistics

Managers have to consider the immediate and long term needs of their team and ensure all of those needs are sufficiently met. This typically includes budgeting for staff as well as hiring and onboarding procedures. A manager may also have to manage expenses associated with cloud infrastructure or plan and orchestrate on-site gatherings for remote employees.

Managers have to absorb and filter information about the company’s business, organization, key objectives, products, and markets, discerning what is actionable and what isn’t. With this context, engineering managers then develop and prioritize team-level objectives that balance short- and long-term technical and business goals, usually in partnership with a product lead, i.e. a Director of Product or product manager (PM) staffed on the same team. The resulting roadmap then has to be presented to the team. This process is typically repeated at least once a quarter.

An EM is the principal connection between the team and the rest of the company. Beyond taking in information, they must also periodically represent and justify the value of the team’s work to the outside organization. They are also the first point of contact for the team for instigating cross-team communication and coordination.

Finally, managers must proactively explain the manner in which individuals on the team will be evaluated and develop these evaluations continuously, providing recurring feedback to their reports.

Most of these responsibilities require excellent verbal and written communication skills to be performed well.

Operations

When a manager is heavily involved in the operations of a team, it’s a signal of dysfunction. Within this domain, an EM’s principal duty should be holding the tech lead accountable, although there may be initial or interim periods during which the EM will be more involved in establishing procedures.

If an EM is to be deeply involved in any day-to-day operational activity, it should be orchestrating the team’s scrum process. This is largely an administrative responsibility and therefore something the EM or PM can readily own[1]. I intend to publish a dedicated essay on this topic, but in the meantime a robust scrum process looks something like:

Daily standups
- Status reports
Pre-planning (“backlog grooming”)
- Reviewing recently planned work and assigning initial cost estimates
Sprint planning
- Short-term work prioritization (e.g. 2-week to 1-month horizon)
Retrospectives
- Periodic team debriefing opportunity
Quarterly planning
- Medium- to long-term work prioritization (e.g. 3-12+ month horizon)

With the exception of pre-planning, the EM is well-suited to running all of the above scrum rituals[2]. Being more involved in this capacity also enables the manager to guide work assignments (improving bus factor) and adjust the level of process to the specifics of the work being performed, biasing the team towards action and speed of delivery.

Managers should also facilitate ad-hoc meetings as often as is useful to the team. For example, members of the technical staff should be encouraged to make short, informal presentations within “technical context share” meetings—the EM can facilitate a backlog of topics for this discussion. Additionally, it is the EM’s responsibility to ensure that any serious unplanned event is followed up by a postmortem and blameless incident retrospective.

Finally, in the sphere of operations, a manager should be constantly developing their situational awareness, ultimately synthesizing and periodically reflecting back to the team the “big picture” of where they are and where they are going. This requires the manager to review status reports within and outside the team and periodically review the output of “fitness functions”, whether that be in the form of dashboards, reports, alerts, or raw data. The dimensions of these ought to at least span across product insights (e.g. user retention), application performance (e.g. median response latency), and DORA metrics (e.g. MTTR and deployment frequency)[3].
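To make two of the DORA metrics mentioned above concrete, here is a minimal sketch of how deployment frequency and MTTR (mean time to restore) might be computed from raw event data. The function names, the 14-day window, and the sample timestamps are all hypothetical, chosen only for illustration; a real team would pull this data from its deployment and incident tooling.

```python
from datetime import datetime, timedelta
from statistics import mean


def deployment_frequency(deploys: list[datetime], window_days: int) -> float:
    """Average number of deployments per day over the reporting window."""
    return len(deploys) / window_days


def mean_time_to_restore(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean of (restored - started) across incidents; needs at least one incident."""
    durations = [(restored - started).total_seconds() for started, restored in incidents]
    return timedelta(seconds=mean(durations))


# Hypothetical sample data: 7 deploys in a 14-day window, 2 incidents.
deploys = [datetime(2024, 8, day) for day in (1, 2, 5, 6, 7, 8, 12)]
incidents = [
    (datetime(2024, 8, 2, 10, 0), datetime(2024, 8, 2, 10, 30)),  # 30 minutes
    (datetime(2024, 8, 7, 14, 0), datetime(2024, 8, 7, 15, 30)),  # 90 minutes
]

freq = deployment_frequency(deploys, window_days=14)
mttr = mean_time_to_restore(incidents)
print(f"Deployment frequency: {freq:.2f} per day")
print(f"MTTR: {mttr}")
```

The point of a sketch like this is less the arithmetic than the habit: a manager who knows how a number on a dashboard is derived is better placed to question it when it moves.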

Strategy

Many managers overly focus on operations at the expense of thinking more strategically. Delegation is a challenging skill to develop and deploy and people tend to gravitate towards work in which they have confidence in their abilities. The immediacy of operational work has a kind of reassurance tied to it (“this must be helpful”). Nevertheless, teams quickly get stuck in local maxima when managers fail to take responsibility for strategic, long-term planning.

Strategy takes many forms. It could be developing a hiring roadmap, a technical roadmap, or a career plan for the manager’s reports. The relative value of this work is only knowable from reflective System 2 thinking, which requires a manager to extract themselves from the day-to-day, reactive, tactical operations of the team[4]. Strategic, slow thinking develops heightened awareness of unknowns and implicit assumptions. It enables the manager to ask better and often uncomfortable questions in 1-on-1s and team retrospectives. Embodying the 10th Man during planning and a forensic mindset during retrospectives, even when it risks conflict, is the responsibility of the EM. By sensitively and skillfully bringing attention to difficult topics and conflicting points of view[5], even and especially when emotions are present, a manager helps the team build confidence in their own ability to navigate tension and therein collaborate and communicate at their full potential[6].

The landscape of strategic frameworks which managers can navigate and borrow from is a rich topic. My favorite author on this subject is the inimitable Will Larson. In his essay Magnitudes of Exploration, he crystalizes a strategy for addressing the relative tradeoffs between standardization, exploring new technologies, and migrations to enable 10X improvements, the core of any engineering strategy:

Mostly standardize, exploration should drive order of magnitude improvement, and limit concurrent explorations.

Later on in the essay he writes (emphasis mine):

I am unaware of any successful technology company that doesn’t rely heavily on a relatively small set of standardized technologies. The combination of high leverage and low risk provided by standardization is fairly unique.

You might expect standardization to be widely adopted in the industry. Even without a deep understanding of the topic, it would seem obvious that this should be the default approach in engineering organizations. My experience over the past 12 years has been that it is fairly uncommon within companies, let alone within the industry more broadly[7]. In practice, when standardization is happening within a company it looks like project templates, shared core libraries, developer handbooks, service catalogs, department-level ADRs, training programs, and architectural working groups or review boards. Standardization is challenging as it requires top-down direction and coordination between teams.

The last thing I’ll say about strategy (also inspired by Larson’s blog) relates to the concept of evolutionary architectures as developed by Ford, Parsons, and Kua. Most of the strategies and principles explored in their book have less to do with software design (i.e. the shape of the code as written) than they have to do with the software design process, i.e. how teams make decisions. The desired end result, an evolutionary architecture, is a system which facilitates its own modification.

Some examples from Larson’s notes on the book:

“Build sacrificial architectures.” Assume that you’ll make tradeoffs that won’t last forever, and be okay with occasionally throwing away your implementations.

“Prefer evolvable over predictable.” If you optimize for the known challenges for an architecture, you’ll get stuck because there are at least as many unknown challenges as known challenges. It’s better to be able to respond to problems quickly than to cleanly address what you’re currently aware of.

“Make decisions reversible” by making it easy to undo deploys and such. Prefer immediately shifting traffic off a broken new version to slowly deploying a previous revision. Prefer flipping flags to disable new features over deployment, etc.
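As a rough illustration of the last point, flipping a flag rather than redeploying, here is a minimal sketch of a feature-flag kill switch. The `FeatureFlags` class, the `new_discount_engine` flag, and the `checkout_total_cents` function are hypothetical names invented for this example; a real system would typically read flags from a configuration service rather than an in-memory dictionary.

```python
class FeatureFlags:
    """Hypothetical in-memory flag store, standing in for a real config service."""

    def __init__(self, flags: dict[str, bool]) -> None:
        self._flags = dict(flags)

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so new code paths are opt-in.
        return self._flags.get(name, False)

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled


def checkout_total_cents(prices_cents: list[int], flags: FeatureFlags) -> int:
    """Sum prices, applying a 10% discount only when the new feature is on."""
    total = sum(prices_cents)
    if flags.is_enabled("new_discount_engine"):
        return total * 90 // 100  # new, riskier code path
    return total  # old, known-good code path


flags = FeatureFlags({"new_discount_engine": True})
with_feature = checkout_total_cents([1000, 2000], flags)

# Something looks wrong in production: flip the flag instead of redeploying.
flags.set("new_discount_engine", False)
rolled_back = checkout_total_cents([1000, 2000], flags)

print(with_feature, rolled_back)
```

The old code path stays in place until the new one has proven itself, which is what makes the decision cheap to reverse.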

EMs don’t need to be deeply tapped into the technical developments of their team in order to promote evolutionary thinking and have a dramatic impact on the architectural decisions being made by their tech leads. Simply asking questions like “do you think we’re prematurely over-investing in this capability?” and “would this design adapt well if we decided in 10 weeks we needed to optimize more for X than Y?” can trigger productive reflection and design improvements in the planning stages of a project.

Culture

Success depends on teams learning and subsequently making improvements based on those learnings. The fastest way individuals and teams learn is by making mistakes, so it’s the responsibility of a manager to cultivate an environment where mistakes are viewed as valuable learning opportunities and where it’s OK to make calculated bets in the absence of perfect information. Like a scientist captivated by outlier data, it is precisely and only when reality doesn’t match our expectations that we are capable of learning anything. If most experiments end in “failure,” a rational response would be to reduce the cost and turn-around-time of each experiment. Practically speaking, this is difficult to achieve because individuals within our society are often not familiar with a culture of experimentation and hypothesis testing[8]. To that end, there are many things managers can do to help their teams embrace an experimentalist mindset:

Celebrate incremental progress.
Encourage and model a bias towards action[9].
Distinguish between One-way vs Two-way door decisions.
Make experiments “safe to fail” by limiting their blast radius.
Model blameless communication.
Back up the team’s product lead by leaning into product design processes.

In practice, a culture of experimentation is perfectly aligned with a culture of excellence. As teams improve, they discover new, better approaches to doing their work which sets their aspirations ever higher. Great managers can help their teams find their way into this virtuous cycle.

[1] This is in the absence of a dedicated “scrum lead,” a role which is a good opportunity for growth for junior team members.

[2] While engineering work is complicated and rarely falls into nice, organized boxes, this outline is a useful set of guidelines to structure work that performs best within a culture of flexibility and adaptability.

[3] It’s a good idea to pair metrics of effectiveness with metrics of efficiency (e.g. ticket close rate vs ticket creation rate).

[4] A manager must focus on training their tech lead(s) whenever they find they are unable to hold their leads accountable for the bulk of operations.

[5] This is often done by adopting a mindset of curiosity and blamelessness.

[6] Beyond basic things like the simple passage of time and cultivating an environment of honesty, transparency, fairness, kindness, and consistency, it’s the confrontation and productive defusing of tension (i.e. “radical candor”) that builds the foundation of deeper trust on any team.

[7] This could be a reflection of my own personal history or the ecosystem broadly. Over the past decade of low interest rates, excessive company growth, and the associated overheated hiring market, tech companies were prone to spoiling their engineers and forgoing efforts at improving operational efficiency (like top-down standardization). The pendulum swung very far in the direction of “team autonomy” and “move fast and break things,” which we are still reckoning with in 2024 even as the pendulum begins to right itself after post-COVID inflation and interest rate hikes. Regardless, there is a parallel and closely related issue of the relative absence and fear of leadership in our cancel-culture-prone society that makes any kind of top-down direction difficult to implement.

[8] Our society’s educational system is oriented around a style of teaching and testing which reinforces a culture of expertise/elitism.

[9] Human behavior is mimetic and leaders are imitated more often than they realize.

jc foust
