Legacy Coderetreat: Part 3 – Golden Master

Golden Master on legacy code

Blog post series

This blog post is part of a series about the legacy coderetreat and legacy code techniques you can apply during your work. See the other posts in the series for more sessions about legacy code.


Whenever you start dealing with an existing software system you need a basic safety net. This safety net will make sure that when you make large changes to the code, the changes do not affect the existing functionality. You can read more about this concept in the generic session Part 2 – From Nothing to System Tests.

As mentioned in the above entry, we want to start with a basic safety net that lets us test the code in a generic way, focusing only on inputs and outputs, without changing the production code. We will treat the system as a black box (we do not care about the internal behaviour of the system, only about the inputs and outputs of the whole system) and we will test only the outputs for given inputs.

The Golden Master technique is very useful when a clear input and output can easily be obtained at the system level. There are cases where the technique can only be applied with difficulty, or not at all; we will discuss these situations as well.


In audio mastering, a golden master was a model disk used as a reference to press disks in the old vinyl industry. This disk was cut in metal and contained the sound transferred from a microphone (see here for more details). The software world borrowed this name for a fixed reference of a system output, paired with a system input.


For any given situation we need to think about whether the system tests generated with the Golden Master are enough, or whether we need to start adding other types of tests: unit tests, component tests, integration tests, security tests, etc.

The Golden Master technique is based on the following steps:

  1. Find the way the system delivers its outputs
    Check for clear outputs of the system: console, data layer, logger, file system, etc.
  2. Find a way to capture the output of the system without changing the production code
Think about whether that output can be “stolen” without changing the production code. For example, you could “steal” the console output by redirecting the console stream to an in-memory stream. Another example would be injecting an implementation of a data layer interface that writes to a different data source than the production one.
  3. Find a pattern of the outputs
The output of the system could be text, a tree of data structures, or another type of stream. Based on this output type you can decide whether you can move on to the next step.
  4. Generate enough random inputs and persist the tuple input/output
If you can persist the outputs to a simple data repository, think about what the inputs for the corresponding outputs are. The magic here consists in finding a good balance of random or pseudo-random inputs: we need inputs that cover most of the system with tests, but at the same time the tests must run in seconds. For example, in the case of a text-generating algorithm we need to decide whether to feed the system 1,000 or one million entries; maybe 10,000 inputs are enough. We will check the test coverage during the last stage anyway.
We need to persist each input-output pair. The output is the Golden Master: the reference data we will always check our SUT against.
  5. Write a system test to check the SUT against the previously persisted data
Now that we have a way to “steal” the outputs and an idea of how to generate enough input-output pairs (but not too many), we can call the system and check it against the recorded outputs for a given input. We need to check that the test touches the SUT and that it passes. We also need to check that the test runs fast enough, in seconds.
  6. Commit the test
Whenever a test is green, commit the code to a local source control system. Why? So that we can easily revert to a stable, testable system.
    Important: Do not forget to commit also the golden masters (files, database, etc)!
  7. Check test behaviour and coverage
    In this stage I tend to do two things:

    • Use a test coverage tool to see where the system tests touch the SUT
    • Start changing the SUT in order to see the golden master test go red.
If the test does not go red, I know that the code base is not covered by tests in that area and I should not touch it during the next stages, until I have a basic safety net. After this step I always revert to the previous commit, no matter how small the change to the SUT was.
  8. If not enough behaviours are covered, go to 3
If during the last stage we found behaviours that were not covered by the golden master test, we need to write more tests with other inputs and outputs. We continue until all the visible behaviours that need to be covered by tests are covered.
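The core of the steps above can be sketched in Java. This is a minimal sketch, not the code from the code cast: `runLegacySystem` is a hypothetical stand-in for a SUT that prints to the console, and the golden master file name is an assumption. It shows how to “steal” the console output by swapping `System.out` for an in-memory stream, feed the SUT a seeded (reproducible) pseudo-random input, and record or compare the golden master.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

public class GoldenMasterSketch {

    // Hypothetical stand-in for the legacy SUT: it writes directly to the console.
    static void runLegacySystem(long seed) {
        Random random = new Random(seed); // fixed seed => reproducible "random" input
        for (int i = 0; i < 5; i++) {
            System.out.println("roll: " + random.nextInt(6));
        }
    }

    // Capture everything the SUT prints, without changing the production code.
    static String captureOutput(long seed) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        System.setOut(new PrintStream(buffer));
        try {
            runLegacySystem(seed);
        } finally {
            System.setOut(original); // always restore the real console
        }
        return buffer.toString();
    }

    public static void main(String[] args) throws Exception {
        Path master = Paths.get("golden_master_seed42.txt"); // assumed file name
        String output = captureOutput(42L);
        if (!Files.exists(master)) {
            // First run: record the golden master (and commit it to source control!).
            Files.write(master, output.getBytes());
        } else {
            // Later runs: the system test checks the SUT against the golden master.
            String expected = new String(Files.readAllBytes(master));
            System.out.println(output.equals(expected) ? "PASS" : "FAIL");
        }
    }
}
```

Because the seed is fixed, every run produces the same input sequence, which is what makes the recorded output usable as a reference.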


After this session we will have a basic safety net composed of system tests. These system tests check the SUT against the golden masters. A very important artifact is the set of golden masters themselves, persisted in some form. All the system tests must be run during the subsequent refactoring phases.


This technique is very useful, but it can be very tricky as well. One needs a lot of imagination and craft to succeed in redirecting the outputs so they can be persisted. We often need to think outside the box about the SUT and how we can redirect output streams towards in-memory streams. Each situation is different, so craft and imagination are essential when using this technique.

The generated system tests can be misleading. Treat them with care, as they may not test as much as you might think. Do not do any major refactoring when only these tests are available. We will discuss in the next sessions which other tests can be added so that we have a more trustworthy test battery.

It is very important to figure out whether this technique can be used at all. If the system is so tightly coupled that we cannot redirect the outputs, we need to use another technique that is more feasible for the given case. An example of a tightly coupled resource would be DateTime used inside the system: with golden masters we would need to keep changing the time on the system's machine, and this might not be doable.
Another case where this technique cannot be used is when randomly generated data in the middle of the SUT keeps us from having one or more golden masters.
There can be many other cases where this technique is not useful, and they need to be treated with care.
Nevertheless, we still have the option of checking only the non-random parts of the outputs. For example, if the golden master is a text file, we could use a regular expression to check the stable data and ignore the random information in the golden master.
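As a sketch of that idea, assuming a hypothetical output format where each line starts with a timestamp: the volatile fragment is masked with a placeholder before comparison, so the stable part of two runs can still be matched against each other (or against the golden master).

```java
public class MaskedComparison {

    // Replace volatile fragments (here: a "yyyy-MM-dd HH:mm:ss" timestamp,
    // an assumed format) with a placeholder so the rest of the output
    // can still be compared against the golden master.
    static String mask(String output) {
        return output.replaceAll(
                "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}", "<TIMESTAMP>");
    }

    public static void main(String[] args) {
        String run1 = "2014-05-01 10:00:01 order accepted";
        String run2 = "2014-05-02 17:33:09 order accepted";
        // After masking, only the stable part remains, and it matches.
        System.out.println(mask(run1).equals(mask(run2))); // prints "true"
    }
}
```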

Start redirecting the output only when needed; maybe some part is repeated and we need just a subset of the output. For example, when redirecting a console we might want to capture only one step of the console output. To do that, we can start the console redirection only when needed, not necessarily at the beginning of the call to the SUT.


I first found out about this technique from J.B. Rainsberger when I attended the second ever Legacy Coderetreat, the Belgium edition. I thought of it as a very fast and useful technique when dealing with existing code. After the first session I fiddled around with it and learned more about when it can be used and when it is better to use other techniques.

I do not know who really invented this technique, so I would be extremely grateful if you would tell me who is the initial author.

Code Cast

Find here a code cast in Java about Golden Master.
To check a more generic approach to system tests check out the other session Part 2 – From Nothing to System Tests and its corresponding code cast.

Image credit: http://upload.wikimedia.org/wikipedia/commons/8/84/GoldeneLP.jpg



4 Thoughts on “Legacy Coderetreat: Part 3 – Golden Master”

  1. I’d refer to this technique as ‘Approval Testing’, since you ‘Approve’ the Golden Master when you create a test case. The name was coined by Llewellyn Falco – I’ve written more about it in my blog:


• Thanks Emily for your comment. I understand that Llewellyn’s framework can be extremely useful, but even he uses the term Golden Master in his framework. As far as I know, this name was used long before “Approval Tests”, which is one of the main reasons why I will keep using it in the future.

      Llewellyn’s framework seems extremely useful and I will check it in the near future.

  2. Nice write-up with great detail on the legacy code retreat problem Adi.

I don’t understand why you say “if it can be used”; to me it can always be used. Of course any legacy system has loads of untestable dependencies like dates, access to 3rd party systems, etc. But just like you inject the randomizer and swap the system output to test Trivia, in another system it will simply be other seams you use. Or am I too optimistic here?

Not having to write the assertions is the real blockbuster here in my opinion. Intelligent diffing is another one, as Llewellyn Falco has captured it in ApprovalTests. I suspect tooling is going to explode in this area, as theoretically you could just instruct “capture all outputs” to generate the golden master (I mean in a one-liner).

As for random data, I think that is a very special case found in the trivia exercise. In fact, to me it is not random at all; it just happens to be the most convenient way of enumerating enough input arguments. My experience and speculations about that: http://martinsson-johan.blogspot.fr/2014/05/golden-master-and-test-data.html

    • Thank you for the comment, Johan.
Sometimes it is just too complicated to make it work if you want to test your system as a black box. I gave some examples with outputs that depend on strange criteria that are hard to understand. In such cases I prefer writing several smaller black-box system tests, each focused on a smaller part of the system. If you want increased confidence that you will not introduce defects, you do not want to touch the code in any way; you treat it as a black box. Yes, there are cases when you can inject behaviour, but those cases are for better designed systems and not for the balls of mud I am used to. In any case I would not create seams in a legacy system without having a safety net beforehand. In this article I explained how you can create that safety net, so creating seams is out of the question for me, because I want increased confidence that I will not introduce defects.

I agree that you can write Golden Masters with tools. I have not used Approval Tests yet, but it seems like a good tool. Anyway, the concept is more important than the tools, and I wanted to show in the blog post and in the code cast how one can use Golden Master as a technique to create the first coarse safety net for a black-box system.

Yes, the data is not quite random in trivia. But you need to balance the time it takes to understand the pattern against the time it takes to create a safety net first and then focus on the problem you want to fix in the existing code. I prefer not to read the code too much, and to understand it through tests instead.
