Thursday, December 9, 2010

'Taking it too far' v 'Taking too much time'

Coderetreat is about spending the day writing perfect code. During our day-to-day programming tasks, whether they are for pay or our side-projects, we always have the end goal of getting something done. This inevitably causes us to cut corners, writing less-than-perfect code. Why do we do this? For one, it is because writing the code that we 'know we should write' will take too long. We have to weigh the value of merciless refactoring and ultimate clean code with the reality of the benefit we get from delivering something. At coderetreat, though, the pressure to finish goes away, deleting the code makes it so that you can't finish the problem. This allows us the freedom to act as though we have an infinite amount of time.

As I've traveled around and had the opportunity to work with people of widely varying experience and skill level, I've noticed that the big difference is which corners get cut. For people with less experience, say 1-5 years, the 'get it done' code is significantly different, less clean, than the code written by those with a lot more experience, say 15-20 years, yet written in the same amount of time. This makes sense and isn't a knock against beginners, it is just a good indication that there is always more to learn. The gap between what we actually write and what we think we would write given enough time is always there, just at a different level. As we gain more experience, that gap simply moves upward, so the code is cleaner given the same constraints.

I've facilitated a lot of coderetreats. I've watched people of widely varying skill levels work on the same problem. Some will write very readable code, others will write less readable code. My role as a facilitator is to ask the questions about the code, questioning the refactoring decisions, and, more often, helping people understand why they stopped refactoring.

Suppose you have this method, a very common one used to figure out whether a living cell will die in Conway's Game of Life:

Commonly a team arrives at this and moves to the next test. I'll ask if they aren't skimping a bit on the refactoring and get a response similar to "well, we don't want to go too far" or "this is very readable." I'll generally add a little code:

That is clearly more readable; most teams that see this small change agree. My question is "why did you stop?" The readability also exposes a few other problems, mostly around the method names, but that is a topic for a different blog post. So, the question remains "why is the first form 'good enough'?" After all, the extracted version is more readable, captures the domain better (underpopulated vs a cryptic number comparison) and is not much more work to extract, especially with basic refactoring support.

[Just a note here that this is not the absolute end state of the system, but an intermediate step along the way of evolving the design. If teams get that far, this usually ends up being extracted into another object that resolves some of the design issues inherent in it.]

One hypothesis is that we often mix up the idea of "going to far" with "taking too much time." After all, I would be surprised if someone would say that the extracted version truly is taking it too far. However, I do believe that some people would argue that it might take them too much time to do this all the time. I've worked with people who didn't have a good understanding of their toolsets, so this sort of extraction would take more time: manually extracting it doesn't take a lot of time, but it is definitely non-zero. So, creating code like this is time-consuming on a larger scale, so we don't do it. This, though, is really about "taking too much time."

Another hypothesis is that we aren't comfortable with more abstractions. For various reasons, most of us don't have a great familiarity and comfort with abstractions. One difference I've noticed between people with varying experience levels is a view on abstractions and how they handle them. I like to say that programming is based a lot on just being adept at abstraction wrangling, and my experiences with people has shown me that this has more than a hint of truth to it. As we encounter the creation and manipulation of abstractions, we get a better sense of where they are hiding in the system and how to be effective in drawing them out into the open. A common argument against heavy abstraction is that you then have to jump around the system to see what is 'really' going on. For people who are truly comfortable with it, though, you either don't need to see what is 'really' going on, relying on the names to truly reveal their intent. And, when you do need to see what a specific part of the system is doing, navigation can be a trivial act, due to a clear signpost guiding you to your destination.

These are just musings of mine. I don't have the answers; these thoughts are based on my experiences pairing with different people and watching people of various levels writing the same problem over and over at coderetreats. I'd love to hear any ideas that others have either in comments or as blog posts. I think this is an important topic that can help us understand ourselves better both as a means of accurate self-assessment and developing more effective training mechanisms.


  1. In this given example I doubt that the extra abstraction would hold many programmers back. But doing the extraction might make you have to come up with (or rather, define) a name for whatever is being done. Finding a label for the operation might be far more time consuming than just sticking with ones mental image of a ”not between these two”-syntax. (Whatever that looks like.)

    If previous discussions have not used the name or label in conversation it might take you a while to get it right. I can quickly think of two reasons why a wording is not just ready to be entered; previous conversations have not been at this level of detail; or, as I've seen many, many times working in Sweden, the words that have been casually used in the preceding conversations needs to be translated into English before used as a method name.

    So I essentially agree with the hypothesis that we mix up “going to far” and “taking too much time”. And I also think it is a toolset problem, but I don't think it is the editing tool.

  2. Dovetailing with the concept of practice, in such a situation as a Code Retreat, one should take every opportunity to "go too far", seeing as one tends to stick with "good enough" during one's day job. When you work on a side project, attend a Code Retreat, or otherwise write code for something other than your current project, go hog wild! Go too far! Perhaps you'll see that, in your day job, you really haven't been going far enough.

  3. Feels to me that professionals "see" things right away and apply refactoring, while less experienced coders need to think more what they are doing, and applying that kind of refactoring would require thinking and attention. You also learn to do things in a certain way when you code similar things over and over again which lowers the burden of coding to your cognition. It's then for example easier to concentrate on abstraction.

  4. "Why have I stopped?" is a wonderful question to ask yourself when you're refactoring.

    Great post. I'm lucky that in my day job I have a lot of luxury in terms of how far I want to take refactoring. (I say lucky, I'm self-employed and choose a life-style that doesn't require huge earnings).

    These small things are just so satisfying (for me) to draw out.

    Oddly I don't think I lose as much in productivity as I had imagined in the past. On a large project I eventually gain because the code is so much easier to work with when re-visited.

    I think @Robin is spot on- the tricky part is that you have to name what it is that you're doing. It's very similar to the pain that TDD puts you in - forcing clarity where you'd like to just muddle through.

  5. I agree with JB that during practice, coderetreats, etc. that is when you should definitely be pushing your limits, because that's how you'll grow. If you take it (refactoring, or any other practice) to the extreme during practice, then at work you'll have the knowledge you need to determine if or when a tradeoff should be made to write it "good enough" or to produce the cleanest code you're able to.

  6. Good insight, Corey. I especially liked your comments about abstraction wrangling, so much so that my comment grew into a blog post about it.

  7. I think that you are taking it a bit too far here. You've added 2 methods that you will have to test separately and that are only used by the will_die_with? method. Also, the LoC has increased from 3 to 11, increasing the time it will take to read the code and find the method you are looking for.

    In my opinion, if you really want to improve this example, you're better of using properly named variables, like this:

  8. Thanks for that post, Corey.

    Your interest in improvement, specifically improved readability, brings to my mind the tension between two notions of expression. For the moment I'll call these "textual" and "notational". Natural language provides the model for the textual approach. Mathematical formulas are the model for the notational approach.

    COBOL and APL strike me as convenient mnemonics for these two visions. Is that harsh? Emulating COBOL is not what coders yearn for when they fire up Ruby, right? Forget I brought up COBOL, then. What I see you reaching for as you strive for code that is more readable is code that is more like English, less like math.

    I've noticed that the tension between these two qualities exists within each and every language. COBOL, Ruby, etc. have math-like parts, and J (at the notational extreme) has textual parts. In fact, here are three lines of J code that are pretty darn textual:

          will_die =: underpopulated or overpopulated
          underpopulated=: number_of_neighbors is_of_lesser_magnitude_than the_minimum_neighbors_to_live
          overpopulated =: number_of_neighbors is_of_greater_magnitude_than the_maximum_neighbors_to_live

    This code goes further than your refactored Ruby code in pushing aside math notation. In this it's highly unusual for J code, but implementing it would be straightforward.

    That code still has assignment symbols (=:) and all those pesky underscores, so (if we take English as our measure) it isn't quite so readable as it would be without them. We could change this J code so that these lingering notational things disappear. Doing so would not, I'm thinking, leave us with a program that is more readable as a whole. At that point we'd have a whole lot of code devoted to supporting English structural properties. It would turn the task of writing Conway's Life into writing a language that allows programming to occur in a subset of English.

    Is it safe to say nobody ever has time for that? I think it is.

    The main thing that makes J so attractive to me as a language is how it encourages notational expression. Here's an idea: whether something is "more readable" depends largely on what you're yearning for, natural language or formal notation. Because computer programs are such formal objects, formal notation appeals to me a lot.

    Of course I have noticed few share my preference. While not a complete mystery to me, I still wonder why.

  9. I have to disagree. This refactor triples the line count with no practical improvement in comprehension to my eyes. Perhaps worse, it helps lock in a particular survival metric, where there are discrete upper and lower bounds. Given my druthers, I'd have done something like:

    def will_die_with?(number_of_neighbors)

    Paving the way to extract the (2..3) range to allow it to vary, if necessary.

  10. "When you work on a side project, attend a Code Retreat, or otherwise write code for something other than your current project, go hog wild! Go too far! Perhaps you'll see that, in your day job, you really haven't been going far enough."

    Good advice! I also think that exercises such as Object Calisthenics [1], coding without ifs, Extract Till You Drop [2] etc. are useful in trying to go too far - which is probably farther that you think it is.


  11. I think that the level of abstraction is informed by the understanding of the goals and foresight about the scope and immediacy of future change.

    The purposes of refactoring have always been about readability, maintainability and flexibility to admit change. Which are requirements because code needs to be debugged and needs will change. If code is simple enough to be declared bug-free and not likely to change, those requirements are weaker.

    In this case, one has a good understanding of the problem domain (it's a known problem, many implementations exist) and already has a great deal of foresight about expected change (Life is part of a spectrum of cellular automata), which is primarily why I would argue about stopping at the first example.

    Because knowing that the game of Life is also about the effect of alteration of rules away from B3/S23, I would say this does not go far enough in some ways if you are building a cellular automata which will change in the future, since you may want to explore rule changes - and this refactoring puts you on a path of a hardcoded structure of will_die_with being made of two branches (over and under). These would likely be thrown away with most foreseen structural rules changes, and thus are pointless. Because you know this already and they also do not stand alone from a testability perspective, I would assume they would not be public and question what they really enable in terms of readability.

    So if I were to foresee a need for flexibilty in the will_die_with, I would argue that it doesn't go any distance in a useful direction to accomodate cellular automata with different rules, and thus is not profitable in terms of return on effort.

    On the other hand, the first example is a good stopping point, since it does not build anything more which would be thrown away if one were to return to refactor to evolve the ruleset, and is already very understandable.

    It is foresight which tells us that the first is "good enough", not a problem with abstractions.

  12. @Tom-Eric

    Regarding testing the methods separately, there is already test coverage for the original method that covers the cases included. This lead to one of two common cases (sometimes both over time):
    these methods often are extracted as private methods, so they are not tested separately; or,
    they eventually are part of a larger extraction that warrants moving the existing tests over.

    In any case, the functionality contained in them is sufficiently different than the functionality in the coordinating method (will_die?) that they warrant consideration as first-class parts of the system.

    Regarding the increased time to find what you are looking for, one of the points in my post is that the readability, coupled with an expertise in your editor's navigation functionality, should cause that amount of time to be trivial. The discussion around the reasons people say 'it is easier to find if it is all in one place' is part of the point of my blog post.

    Your gist definitely is another way to improve the readability, although it still couples the rule for underpopulation with the rule for overpopulation with the rule of cell evolution. Personally, I find that including 3 different things in the same method is a bit too much coupling. Often times, I'll fall back on SRP and create methods based on your example having 3 axes of change.

  13. @Arttu

    For me, this isn't a 'professionals' vs 'less experience,' as I think that sets up a false dichotomy. 'Professionals' can be at varying levels of experience.

  14. @Donald Ball

    The issue I would have with your refactoring is that it is focused on code golf, rather than expressing the rules of the domain.

    Also, it doesn't work, as the rule is that it dies with less than 2 and greater than 3. Adding a negation operator won't help, as it creates a friction between the positive statement in the name of the method and a negative-oriented algorithm.

  15. @Cade Roux

    Good points and thoughts on the different approaches.

    I agree that it is not enough of a refactoring. I put an updated note to explain that this was just one step in the evolution of the design.

    I have seen this approach be a very good step towards a very flexible design. It tends to be a good guidepost for people to start to notice places where further extractions can be done.

    As I said in my updated note, though, this is just a snapshot from an ongoing design evolution during coderetreat. It isn't intended to be an example of a final state.

  16. I like this post because it considers time as an important factor of "good enough" quality.

    Since "good" = quality and "enough" = stop, we all have to make a decision whether we should continue to add more value at whatever cost is incurred (time, in this case).

    Is my code good enough to teach our intern about methods? Probably. Is it good enough to impress Cory? Maybe not.

    Still, the point of a retreat (it seems to me) is to practice exploring the notions of how we define "good" and "enough" in real time while solving a real problem in front of people we respect.

  17. @jonbox


    I think you highlight well the importance of when, where and who we are working with. An intern would need a different level of rigorousness, due to a lesser experience with abstractions. Giving them vague guidance without concrete rules potentially could just confuse. As people gain experience, the rules become guidelines and end up turning into feelings guided by technique.

    Also, the point of the exercise is important. There is a difference between working with someone to accomplish a specific feature-oriented task and working with them to gain better understanding.

    Thanks for the comment.

  18. "why did you stop?"

    How about "because it was descriptive, succinct and accurate".

    I think your refactoring has obfuscated the solution. 12 months down the road these methods may get moved around in the file causing the developer to jump around to figure out what the business rules are.

    Do the simplest thing. Don't overcomplicate.

  19. Craig,

    Thanks for the opinion.

    "Do the simplest thing. Don't overcomplicate." I would argue that the refactored version is simpler than the first one, as the step opens up for a better understanding of the system as a whole. It also keeps the code at a consistent level of abstraction, so easier to parse. It also clarifies what those cryptic 2 and 3 represent.

    As I put in the note, this is a step in the whole process, but one that leads to a more understandable solution based on what the rules really are.

    "descriptive, succinct and accurate" is a subjective term that is one of the things I'm wondering about. I've often had people say that a 100-line method that contains all the business logic is "descriptive, succinct and accurate," since all the code is there for 'easy' reading. Until, that is, I have another team member walk through it and explain what it is doing.

    I would hold that the refactoring step does not obfuscate, since it makes it clear what the rules are: underpopulated or overpopulated.

  20. Congratulations, you just made the code 3x the size with no benefit at all!

  21. Richard,

    Thanks for the congrats! I appreciate it.

    As for 'no benefit at all' I think that is arguable. I'd love to hear your thoughts regarding the actual code, rather than just a blanket statement, though. The majority of the other commenters added real opinions. :)

    But, I'll take the congratulations. It always makes me feel a bit better.

  22. Corey,

    As I read your post and first encountered your will_die_with? method, I thought it was perfect. I've implemented the Game of Life in the past and so I vaguely recalled what the 2 and 3 represented and thought the method was doing exactly all that it needed to do.

    Once I saw the result of your refactoring, the rules of Life immediately came back to me. The clarity gained with your refactoring is well worth the increase in code size.

    I find one of the constant struggles that I deal with is trying to balance succinctness with explicitness. It's good to write as little code as possible. But it's better to explicitly model your domain concepts in code so that future readers (including myself) don't have to waste time trying to figure out *why* the code was written the way it was.

    My guess is that people like Richard only care about "communicating" with computers and not human beings (including themselves). Any idiot can write code a computer can understand...

    The way that you handle comments from trolls like him shows what a classy person you are. Keep up the great work!

  23. Thanks, Jason.

    I would shy away from things like "any idiot can write...." It masks that different people have different views and understandings and relationship with their code.

    I'm glad you took a couple minutes to think about the refactoring. I find that it sometimes does take a bit of thought to appreciate these refactorings.

    I also find it important to remember that all refactorings are just way stations on the way forward in our design. Looking at them as final destinations leads to a different view on what is 'best'.

  24. Corey,

    Excellent points.

    Isn't the "any idiot can write..." line a quote from somebody much more famous than I? That doesn't make it a nice thing to say, but when I saw the time and effort you put into explaining your thoughts being belittled by somebody who obviously didn't give it the slightest bit of consideration, the last thing I was worried about was being nice.

    I would be really interested in seeing the object that you mentioned that these methods can get extracted into to see how the design can evolve even further. If only you'd host a code retreat in San Diego... (hint, hint). =)

  25. :) Thanks. I appreciate it.

    I prefer the line
    "It is easy to get it working if you write crap" :)

    Also, a coderetreat in san diego isn't completely out of the question! I am looking at the schedule for next year, and I was thinking of trying to get some places that I've not been before. :)

  26. Great post.

    Two things I took away here that maybe were not your primary focus:

    Naming is really important;
    There is a distinction between code the writer understands vs code written so that other people can understand.

    The more concise the code, the greater number of people that can read it with less effort. Obviously, naming is a big part of having that clean code.

  27. Thanks, Dustin,

    Yeah, naming is incredibly important. At coderetreat, we focus a lot on naming as one of the core guidelines for simple design.


Please don't post anonymously; I won't use your name/url for anything insidious.
Comments by Anonymous are not guaranteed to make it through moderation.
Constructive comments are always appreciated.