Large Teams and Object Oriented Programming

Principles matter. However, like any other situation where principles matter, the principles of object oriented programming occasionally require a little extra thought to apply when things get difficult. In particular, as your team grows, a lot of what you learn early on in OOP has to be adjusted to the reality of working on a larger team. While this isn’t exactly a compromise of the principles of OOP, it is definitely a learning experience that will deepen your understanding of proper practices, usually after it frustrates the crap out of you.

Like an object model with many different types, the interactions of a large team become more complex in a thoroughly non-linear fashion. Eventually, this complexity will require you to adjust the way you interact with the system, lest the whole thing grind to a halt. If you don’t, the resulting stability issues will make it hard to get anything done. For interpersonal interactions, most organizations divide responsibility among a set of smaller teams to limit the number of challenging interactions. In an OOP codebase, a similar approach is applied, but it doesn’t always work the way you might expect. Managing a large team of people working on the same codebase can often result in some interesting problems within the codebase itself.

OOP can be a useful paradigm when building an application. However, the principles of OOP as taught in a textbook are not necessarily the way things get implemented in messy, real-world scenarios. As your team grows, team dynamics and interactions tend to force changes in how code is written, designed, tested, and managed. While textbooks will teach you general principles in a fairly clear-cut manner, the application of those principles in the real world, especially on a larger team, might not be so clear-cut.

Episode Breakdown

The basic reasons for team and codebase growth.

It goes without saying that the reason you tend to see large teams working on large codebases is that there is economic value in doing so. It’s unlikely that a large team will remain employed very long if that’s not the case. Further, given this economic value, larger teams and codebases have to prove their value regularly, or they are at risk of being cut. This means that deliverables must be met and systems must remain stable, which tends to place schedule and feature set pressure on the team. Complex systems tend to evolve from simple systems over time. This additional complexity is often required to deal with hidden complexity in the real-world systems that the code is supposed to address, and it tends to require more and more people to deal with it over time.

YAGNI is often provably false.

YAGNI means “You ain’t gonna need it”. In general, it’s a good rule: it suggests that you shouldn’t gold-plate your code for future scenarios that you can envision, but which have not yet come into scope. This meshes well with the agile notion that you only build the things you need and don’t “borrow trouble from the future”. In larger teams, however, this can break down: if something will be needed in the near term, it can be hard to tell how soon. Worse, if it is needed and isn’t implemented in time, large swaths of the team can be blocked from getting their work done.

To fix this, as teams get larger, you are probably going to have to do more planning up front, with an eye towards identifying potential blocking features on your project’s critical path, well before they actually block work from getting done. For instance, let’s say that you are adding an enterprise message bus to your application. In a small team, you might simply add it and migrate existing code to take advantage of it, merging it into the main branch at a reasonably safe time. In a larger team, there might well never be a time when merging this work in doesn’t disrupt at least some of your teammates. Management will have to plan the rollout of such critical features (and the projects that need them) in a way that clears the calendar for releasing them.

File conflict (real or imagined) will drive lots of development practices.

If you only have a couple of people working on a team, merge conflicts when checking into source control should be fairly rare, especially if you are keeping your code clean and following other OOP principles. However, when you have 30 people working on the same codebase, often with very different deadlines, it’s inevitable that merge conflicts will occur. If these conflicts occur deep within the system, they can result in a lot of work testing for regressions. You may occasionally find that the only way to avoid this is to create separate implementations first and combine them in another story later.

Deprecation of old code is much trickier.

Getting rid of obsolete code that is deep within your system is a lot easier with a small team. You can often remove code without consulting too many people, and you can usually time its release so as to avoid causing major problems for other team members. In a large team, however, there are likely to be a number of long-lived code branches out in source control that use the code you want to get rid of. Further compounding the problem, in large systems, you’ll typically find that you don’t know all the ways in which any particular piece of code is being used. Larger systems are more likely to use reusable component libraries that are shared across projects and are also more likely to use metaprogramming techniques (such as reflection) that make it more difficult to determine where components are used.

You might also find that it is very difficult to regression test (in a timely fashion, at least) to determine whether a deep change has broken large pieces of the system. You will likely need to adopt a staged approach to removing large chunks of code, allowing other parts of the team to remove or adjust references until something can be safely removed. If your system is complex enough, your deprecation strategy should also include the use of feature flags, so that you can quickly switch between the older and newer implementations, in case something gets rolled to production with serious regressions that aren’t caught by testing. Remember that QA is part of your team too, and subject to exactly the same headaches as everyone else.
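The feature-flag approach described above can be sketched in a few lines. This is a minimal illustration only; the flag name, the pricing functions, and the dictionary-based flag store are all hypothetical stand-ins for whatever flag system your team actually uses:

```python
# Hypothetical feature-flag gate for a staged deprecation. In a real
# system the flag would come from configuration, not a module-level dict.
FEATURE_FLAGS = {"use_new_pricing": False}  # flipped per environment

def calculate_price_legacy(order):
    # Old implementation slated for removal.
    return sum(item["price"] for item in order)

def calculate_price_v2(order):
    # New implementation being rolled out; accounts for quantity.
    return sum(item["price"] * item.get("qty", 1) for item in order)

def calculate_price(order):
    # Callers only use this entry point; the flag lets you revert to the
    # legacy path instantly if a regression reaches production.
    if FEATURE_FLAGS["use_new_pricing"]:
        return calculate_price_v2(order)
    return calculate_price_legacy(order)
```

Because both implementations stay live behind the flag during the rollout, QA can test each path, and the old code is only deleted once the new path has proven itself in production.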

Code duplication may save your hind end.

While the usual rule many people follow is to allow code to be duplicated in up to three places before refactoring out similar functionality, you may want to take that advice with a grain of salt if you have a larger team. For the same reasons that you can’t easily remove deprecated code quickly, you probably also can’t drastically refactor duplicate code. Furthermore, you may not want to, at least not in a single step. Remember that if you do refactor to remove duplication, you are increasing the surface area that needs to be addressed by testing. You may not want to do it all at once, simply to make it easier to troubleshoot if any production issues come up.

Inheritance is replaced with genetic disease.

Object inheritance is one way that duplication is often removed in systems that are built using OOP. Unless carefully managed, however, this approach to removing duplication can quickly become brittle. If your team has been around long enough, is large enough, and manages a large enough codebase, it’s almost certain that the ugliest area in your codebase is hard to change due to an inheritance hierarchy having been built around it, often with clever “hacks” to support more use cases. With few exceptions, most concerns that can be addressed by inheritance are better suited to being addressed using composition, code generation, aspect oriented programming, or nearly any other approach. Bear in mind the advice given earlier about removing obsolete code and about duplication. Both may be necessary to get control of deeply nested inheritance hierarchies without disrupting the rest of the team or causing excessive testing.
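As a rough sketch of what trading inheritance for composition looks like, consider a report that delegates formatting to a collaborator instead of requiring a subclass per output format. The `Report` and exporter classes here are invented for illustration, not taken from any particular codebase:

```python
import json

class CsvExporter:
    """One interchangeable formatting strategy."""
    def export(self, rows):
        return "\n".join(",".join(str(v) for v in row) for row in rows)

class JsonExporter:
    """Another strategy; no shared base class is required."""
    def export(self, rows):
        return json.dumps(rows)

class Report:
    # Instead of a CsvReport/JsonReport subclass for every format (and
    # the hierarchy that accretes around them), the format is injected.
    def __init__(self, rows, exporter):
        self.rows = rows
        self.exporter = exporter

    def render(self):
        return self.exporter.export(self.rows)
```

Adding a new output format now means adding one small class rather than extending a hierarchy, which keeps the change local and limits the regression surface for the rest of the team.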

Knowledge gaps and silos will exist in the code no matter what you do.

If a codebase and team are large enough, there will be knowledge silos on your teams, despite your best efforts to avoid them. These silos will likely manifest in wildly different approaches being used for similar code in disparate parts of the codebase. Further, these knowledge silos can result in some “interesting” uses of existing code, to say the least. You may well find that something you built for a single purpose a year ago gets used in a situation you never expected, in a way you never intended, and in a way that only fails in a subtle manner.

When you find these problems, you often can’t fix them immediately. Rather, you’ll need to follow the same deprecation approaches we outlined previously, as well as spending some time determining whether the new use case is one you want to support. Further, while removing knowledge silos is something you might try to do on a smaller team, on a larger team, you simply have to accept that they will be there, and act accordingly. This can require that you spend a lot more time making sure that documentation is in order, as well as limiting the types of situations that your components are designed to handle, in order to make it less likely that they will be misused. You’ll note that this doesn’t perfectly match the way most OOP-based systems envision reusable components, but the fact is that you are safer in large systems having more components that each cover fewer use cases.

Either the lava flow pattern will be present or you’ll miss deadlines.

Refactoring on large teams is tricky (as mentioned in every point in this outline). As your team’s practices evolve, you’ll often find that previous approaches are insufficient. This happens on any team that has even a tiny amount of introspection, and shouldn’t surprise you when it happens in yours. As you learn more about the best way to approach building systems, you’ll often find that newer code implements newer paradigms, while slightly older code implements slightly older paradigms, and really old code implements really old paradigms.

This antipattern is known as the “lava flow” pattern and tends to develop in larger codebases over time. Further, while you’d probably like to get rid of it, you can’t do so very quickly. You will have to make peace with keeping things around, even though they aren’t optimal, simply because deadlines have to be hit. As a result, you may find that you have to slowly refactor and “choke out” the older code, piece by piece. It’s probable that the oldest pattern in the lava flow will be the one that hangs around the longest.

App scaling problems will start to impact testability and require duplication.

If your app is large, with a large team, it’s likely to experience scaling issues. These scaling issues will likely differ in different parts of your application, and you may find that adjusting shared code for one use case ruins its scalability for another. While duplication is probably not ideal, you may find that in the short term, the best choice is actually to duplicate the code and then adjust the pieces separately. Code reuse is great, but you may find that reuse stops being a good idea as the consuming code evolves. This tends to manifest as large numbers of regressions due to relatively minor changes to the reused component. Only reuse code as long as the reuse makes sense. Once it stops, refactor to duplication and move on. As your team gets larger (along with the codebase and use cases), the odds increase that consumers of a particular component will evolve such that the component is no longer really appropriate for the current use case. Accept that and deal with it accordingly.
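A hedged sketch of what “refactoring to duplication” looks like in practice; the shared helper and its two consumers below are invented for illustration. The idea is simply that two small, purpose-specific copies can be safer than one shared function sprouting a flag for every caller’s quirks:

```python
# Before: one shared function serving both the web tier and the batch
# jobs, growing a parameter for each caller's special case (hypothetical).
def load_user(user_id, use_cache=True, include_audit=False):
    ...  # every change here risks regressions in *both* consumers

# After "refactoring to duplication": two small, purpose-specific copies
# that can now evolve (and scale) independently.
def load_user_for_web(user_id):
    # Web path: cache aggressively, skip audit fields for latency.
    return {"id": user_id, "cached": True}

def load_user_for_batch(user_id):
    # Batch path: bypass the cache, pull audit fields for reporting.
    return {"id": user_id, "cached": False, "audit": []}
```

Each copy now has a single consumer, so tuning one path for scale can’t quietly break the other.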

Standards compliance will need to be tooling-enforced, instead of via code review.

While code reviews are great for reducing knowledge silos and enforcing code standards, as teams get larger, their effectiveness for enforcement purposes breaks down. Instead, code reviews will evolve into a way to educate the team and remove silos, with standards compliance increasingly enforced by tooling. Because compliance is enforced by tooling, the way design happens will start to change, mainly to avoid the overhead the tooling imposes. Depending on how and where standards are enforced, certain approaches to refactoring will be more or less appealing to your team. You need to be aware of how this will impact your team’s choices.
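As one example of tooling-enforced standards, a pre-commit configuration can run formatters and linters before code ever reaches review. This sketch assumes a Python codebase using the real black and flake8 tools; the pinned revisions are illustrative, not a recommendation:

```yaml
# .pre-commit-config.yaml - runs on every commit, so pure style issues
# are fixed before a human reviewer ever sees the change.
repos:
  - repo: https://github.com/psf/black
    rev: 24.3.0        # pin a specific formatter version for the team
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/flake8
    rev: 7.0.0         # linter enforcing the team's agreed style rules
    hooks:
      - id: flake8
```

With checks like these in the commit path, review time shifts from arguing about formatting to discussing design, which is exactly the evolution described above.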

Tricks of the Trade

Sometimes you just have to get something working and worry about refining it later. The tricky part is actually making the time to come back and refine. It can be easy to forget, or to get busy on another project or area of the code, and never return to clean up once you get a thing working. Find a way that works for you to keep track of the things that you need to refine. {I have my notebook and TODOs} It will be different for everyone, but make sure you track it. You may not be able to set aside specific time for refinement, but you can build it into your other estimates as you keep working in a codebase. Few things are as frustrating as going back to code you wrote a few months ago and seeing a mess of things you need to clean up.

Editor’s Notes:

Windows update messed up the mic settings on Beej’s computer and we didn’t know it until after recording.
