As a software developer or tester, it’s important to know about the testing pyramid because it helps you decide which types of testing each project requires. Not all tests are the same: some are fast while others are slow, and some impersonate an end user while others simply test small units of logic. After this post you’ll understand the different types of tests and why pyramids are bigger at the bottom :P

A first look at the testing pyramid

Let’s take a look at the test pyramid.

[Figure: the testing pyramid]

First let me explain the axes: the vertical axis is the level of the test. A high level test exercises a big group of code, or even the system in its entirety, while a low level test exercises the nitty-gritty internals of very small bits of code. At the bottom of the pyramid we have the lowest level tests, which are usually unit tests that each test a single class. When we say these tests are low level, we mean that they are tied closely to the specifics of the code and how it is written.
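To make “low level” concrete, here is a minimal sketch of a unit test using Python’s built-in unittest module. The post isn’t tied to any language, so Python is only an illustration, and the PriceCalculator class and its discount rule are hypothetical. The test exercises one class in isolation, with no collaborators and no I/O.

```python
import unittest


class PriceCalculator:
    """Hypothetical class under test: applies a percentage discount."""

    def apply_discount(self, price: float, percent: float) -> float:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        return round(price * (1 - percent / 100), 2)


class PriceCalculatorTest(unittest.TestCase):
    """Unit tests: one class, no collaborators, no I/O."""

    def test_applies_discount(self):
        self.assertEqual(PriceCalculator().apply_discount(200.0, 25), 150.0)

    def test_zero_discount_keeps_price(self):
        self.assertEqual(PriceCalculator().apply_discount(99.99, 0), 99.99)

    def test_rejects_invalid_percentage(self):
        with self.assertRaises(ValueError):
            PriceCalculator().apply_discount(10.0, 150)
```

Saved as, say, test_price.py, it runs with python -m unittest. A suite like this typically finishes almost instantly, which is what makes it cheap to run constantly.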

The pyramid is wider at the bottom to show that we should have many more low level tests than high level tests; we’ll see why soon.

Next, let’s jump straight to the top of the pyramid; we’ll come back to the middle later. The top of the pyramid represents the high level tests. The highest level tests are end user tests, which test the entire system from the point of view of an end user. At this level we usually check that the entire system behaves correctly when we pretend that we are a user of the system.

If we are building a website, this end user test might take the form of manual testing, where someone clicks through the important flows by hand, or it might be automated using Selenium or Behat. If we are building an API, these tests would make real calls to the API to check that it returns the right results. They sit at the top of the pyramid because they are very high level: we ignore the lower level implementation details of the code, and in fact we ignore the codebase itself. If you think about it, we don’t even care what programming language the application is written in! At this level the only important thing is that the application behaves as it should for the end user.
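An automated end user test for an API treats the system as a black box and talks to it only over HTTP. The sketch below assumes a hypothetical /health endpoint that returns {"status": "ok"}. To keep it self-contained, a tiny stand-in server built on Python’s http.server plays the role of the deployed API; in a real setup you would point base_url at the system under test and drop the stand-in.

```python
import json
import threading
import unittest
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a deployed API; real tests target the real system."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep test output quiet


class ApiEndUserTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Port 0 asks the OS for any free port.
        cls.server = ThreadingHTTPServer(("127.0.0.1", 0), HealthHandler)
        cls.base_url = f"http://127.0.0.1:{cls.server.server_address[1]}"
        threading.Thread(target=cls.server.serve_forever, daemon=True).start()

    @classmethod
    def tearDownClass(cls):
        cls.server.shutdown()
        cls.server.server_close()

    def test_health_endpoint_returns_ok(self):
        # The test methods only speak HTTP: URLs in, status codes and JSON out.
        with urllib.request.urlopen(f"{self.base_url}/health") as response:
            self.assertEqual(response.status, 200)
            self.assertEqual(json.load(response), {"status": "ok"})

    def test_unknown_path_returns_404(self):
        with self.assertRaises(urllib.error.HTTPError) as ctx:
            urllib.request.urlopen(f"{self.base_url}/missing")
        self.assertEqual(ctx.exception.code, 404)
```

Notice that the two test methods never touch application code, only URLs and responses, which is why the implementation language doesn’t matter at this level.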

The middle of the pyramid

So, what’s in the middle? The answer to that question is: “all the rest of the tests”! There are so many different ways to test an application that I can’t possibly put them all on the pyramid. Let me give you a few examples though so you know how the different types of tests would fit in.

As a first example, imagine a test that covers a little more code at once, say a group of classes, to make sure they work with each other. This test would be at a higher level than a unit test because it exercises multiple classes. However, it is still testing code directly, so it is still fairly low level. People use different names for these tests; I’m going to call them “code level integration tests” because they check that multiple classes work as a whole when we bring them together.
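A code level integration test might look like the sketch below. A hypothetical Cart class is wired to a real (also hypothetical) PriceCalculator instead of a mock, so the test checks that the two classes work together.

```python
import unittest


class PriceCalculator:
    """Hypothetical collaborator: applies a percentage discount."""

    def apply_discount(self, price: float, percent: float) -> float:
        return round(price * (1 - percent / 100), 2)


class Cart:
    """Hypothetical class that depends on PriceCalculator."""

    def __init__(self, calculator: PriceCalculator):
        self.calculator = calculator
        self.items = []  # list of (price, discount_percent) pairs

    def add(self, price: float, discount_percent: float = 0) -> None:
        self.items.append((price, discount_percent))

    def total(self) -> float:
        return round(sum(self.calculator.apply_discount(p, d) for p, d in self.items), 2)


class CartIntegrationTest(unittest.TestCase):
    """Code level integration test: real Cart wired to a real PriceCalculator, no mocks."""

    def test_total_applies_each_items_discount(self):
        cart = Cart(PriceCalculator())
        cart.add(100.0, discount_percent=10)  # 90.0 after discount
        cart.add(50.0)                        # no discount
        self.assertEqual(cart.total(), 140.0)
```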

Let’s look at another example. Imagine that we used the MVC pattern and decided to write tests for the controllers that mock nothing except the database. Where would these tests belong on the pyramid? They exercise a lot of code together, so they would sit well above unit tests. However, they still don’t fully simulate a real user, because they are code-based tests. Since they come closer to a full end user experience than the code level integration tests that exercise small groups of classes, we’ll put these controller tests around here in the pyramid.
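Here is a sketch of such a controller test, assuming a hypothetical order-placing flow: OrderController calls OrderService, which uses PriceCalculator and a repository. Only the database is replaced, here by a hand-written in-memory fake (InMemoryOrderRepository, also hypothetical) rather than a mocking library; everything else is real code.

```python
import unittest


class InMemoryOrderRepository:
    """Hypothetical fake that replaces the real database in tests."""

    def __init__(self):
        self.saved = []

    def save(self, order: dict) -> int:
        self.saved.append(order)
        return len(self.saved)  # fake auto-increment id


class PriceCalculator:
    def apply_discount(self, price: float, percent: float) -> float:
        return round(price * (1 - percent / 100), 2)


class OrderService:
    def __init__(self, calculator, repository):
        self.calculator = calculator
        self.repository = repository

    def place_order(self, price: float, discount_percent: float) -> dict:
        total = self.calculator.apply_discount(price, discount_percent)
        order_id = self.repository.save({"total": total})
        return {"id": order_id, "total": total}


class OrderController:
    """Hypothetical MVC controller: turns a request dict into a response dict."""

    def __init__(self, service: OrderService):
        self.service = service

    def create(self, request: dict) -> dict:
        try:
            price = float(request["price"])
        except (KeyError, ValueError):
            return {"status": 400, "body": {"error": "invalid price"}}
        order = self.service.place_order(price, float(request.get("discount", 0)))
        return {"status": 201, "body": order}


class OrderControllerTest(unittest.TestCase):
    """Everything is real except the database, which is the in-memory fake."""

    def setUp(self):
        self.repository = InMemoryOrderRepository()
        service = OrderService(PriceCalculator(), self.repository)
        self.controller = OrderController(service)

    def test_create_order_persists_discounted_total(self):
        response = self.controller.create({"price": "80", "discount": "25"})
        self.assertEqual(response["status"], 201)
        self.assertEqual(response["body"], {"id": 1, "total": 60.0})
        self.assertEqual(self.repository.saved, [{"total": 60.0}])

    def test_invalid_price_is_rejected(self):
        response = self.controller.create({"price": "abc"})
        self.assertEqual(response["status"], 400)
        self.assertEqual(self.repository.saved, [])
```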

[Figure: controller tests placed on the pyramid]

Again, the important thing is just that you understand what makes a certain type of test “high level” and what makes it “low level”. For example, if we decided to unit test our controllers and mock all the other classes, those tests would be much lower level than controller tests that mock only the database. Since there are thousands of ways to test software, what matters is that you can recognize the difference between a high level test and a low level test, so that you can arrange the tests in your project into your own testing pyramid.
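For contrast, this sketch unit tests the same kind of hypothetical OrderController but replaces its service with a unittest.mock.Mock. The test only checks the controller’s own logic (parsing the request, delegating, building the response), which is what makes it lower level.

```python
import unittest
from unittest.mock import Mock


class OrderController:
    """Hypothetical MVC controller with a typical create action."""

    def __init__(self, service):
        self.service = service

    def create(self, request: dict) -> dict:
        try:
            price = float(request["price"])
        except (KeyError, ValueError):
            return {"status": 400, "body": {"error": "invalid price"}}
        order = self.service.place_order(price, float(request.get("discount", 0)))
        return {"status": 201, "body": order}


class OrderControllerUnitTest(unittest.TestCase):
    """Lower level: the service, and everything behind it, is a mock."""

    def test_delegates_to_service_and_wraps_result(self):
        service = Mock()
        service.place_order.return_value = {"id": 7, "total": 60.0}
        response = OrderController(service).create({"price": "80", "discount": "25"})
        service.place_order.assert_called_once_with(80.0, 25.0)
        self.assertEqual(response, {"status": 201, "body": {"id": 7, "total": 60.0}})

    def test_invalid_price_never_reaches_service(self):
        service = Mock()
        response = OrderController(service).create({})
        self.assertEqual(response["status"], 400)
        service.place_order.assert_not_called()
```

This test would still pass if the discount calculation were wrong, because that code never runs here; catching that is the job of the higher level controller test.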

Which type of test is the best?

So which type of test is the best? Before we answer, let’s look at the extremes. Imagine that you have full unit test coverage for your product but no end user tests.

If all these tests pass in pre-production, would you trust that your system will work correctly when you put it into production? Personally I wouldn’t, because it is entirely possible that all your unit tests are green but the code doesn’t work when the pieces are put together. Perhaps a class name contains a typo, the class’s own test uses the same misspelled name so it passes, but the code that calls the class uses the correct spelling. Or maybe your production server runs a different version of your programming language from the one your code and unit tests were written against.

So if we rely exclusively on unit tests we are missing one small but vital piece: we can’t guarantee that the application plays well with the infrastructure around it and therefore that it works for the end user.

Now let’s look at the other extreme. Imagine that we have completely covered the end user test cases for our system: our product is a website, and we have written end user test suites that simulate every use case a user could come up against. If these tests pass in pre-production, would you trust that your product will work correctly when you put it into production? In this case we can trust the tests more because they actually simulate a user; in other words, they are at a higher level and so they test more of the stack together.

Higher level tests seem better, but are they?

So it looks like higher level tests are better than lower level tests, because they more closely resemble a complete end user experience and therefore give a much stronger guarantee that the application works as expected. But there are some very big problems with this. First, tests at the top of the pyramid take far, far longer to execute than tests at the bottom: they need to go through the entire stack to simulate an end user, while tests at the bottom are extremely quick because they only exercise code directly.

The second problem is that when a failure occurs in an end user test it is much harder to track down the problem, whereas a unit test will show you exactly where the problem is.

So let’s look at it again: which type of test should we write? In the end the lower level test always wins, except when we want to test that small “missing vital piece” we mentioned earlier: the integration with the infrastructure around the code.

Lower level tests win

Let’s discuss this for a moment. Why should we have more lower level tests than higher level tests, when we have just said that higher level tests are better? The answer, of course, is that lower level tests are much faster to execute, so we get very quick feedback on whether our product is working properly.

The other benefit is that unit tests are much better at pinpointing the location of a problem. The idea behind the test pyramid is that we have only a small number of high level tests. We might have a few tests that simulate the most important and common user flows, but their main objective is to check that the application code plays well with the infrastructure around it. Then, as we move down the pyramid, we have more and more tests covering more and more use cases.

So at the top we test only what we cannot test lower down, namely the infrastructure, and as we move down the pyramid we begin testing the application logic itself.

Because unit tests are so quick, it is easy to run them over and over while developing. Indeed, they are a huge help in making sure that you haven’t broken anything while developing and refactoring. You could also run the end user tests while developing, but they usually take so much longer that running them frequently is simply not practical.

Conclusion

This is the reason that the base of the pyramid is wider than the levels above it, it means that we should aim to have more lower level tests than higher level tests.

There is a place for both high and low level tests, since each has its own advantages and disadvantages. When writing tests, though, always prefer lower level tests over higher level ones, unless of course you want to test the integration of all the moving parts by doing the “ultimate” test and pretending to be a user. Remember that any high level tests should test only the integration; if you find yourself wanting to test lots of different edge cases in a high level test, it is probably better to test those edge cases in a lower level test instead.
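As an illustration, suppose one end user test checks that a user can order an item with an ordinary quantity. The many edge cases of quantity validation can then live in a fast unit test like the sketch below. The parse_quantity function and its “1 to 99” rule are hypothetical; unittest’s subTest lets a single test method report each case separately.

```python
import unittest


def parse_quantity(raw: str) -> int:
    """Hypothetical input validator: a whole number from 1 to 99."""
    value = int(raw.strip())  # raises ValueError for non-numeric input
    if not 1 <= value <= 99:
        raise ValueError(f"quantity out of range: {value}")
    return value


class ParseQuantityTest(unittest.TestCase):
    """Edge cases live here, in fast unit tests, not in the end user suite."""

    def test_valid_values(self):
        for raw, expected in [("1", 1), (" 42 ", 42), ("99", 99)]:
            with self.subTest(raw=raw):
                self.assertEqual(parse_quantity(raw), expected)

    def test_invalid_values(self):
        for raw in ["0", "100", "-3", "", "abc", "2.5"]:
            with self.subTest(raw=raw):
                with self.assertRaises(ValueError):
                    parse_quantity(raw)
```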

How this applies to Docker

When we split our architecture into microservices, testing gets easier. All of a sudden it is very easy to test each microservice on its own. A test that covers a single microservice on its own sits somewhere in the middle of the pyramid: it is high level, but it still doesn’t consider the whole architecture together as a user would.

With tools such as docker-compose it is now very easy to spin up one or more microservices and run high level tests against them, which is great news.
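One practical detail when running high level tests against services started with docker-compose: docker-compose up -d returns once the containers have started, which is not necessarily when the service inside them is ready to accept connections. A common workaround is to poll the published port before the tests begin. The helper below is a minimal sketch of that idea using only Python’s standard library; the name wait_for_port and any host or port you pass it are assumptions, not part of docker-compose.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Hypothetical helper: poll until a TCP connection to host:port succeeds.

    Returns True as soon as a connection is accepted, or False if the
    timeout (in seconds) expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # not accepting connections yet; retry shortly
    return False
```

For example, a test suite might call wait_for_port("localhost", 8080) in its setup and stop early if it returns False, where 8080 stands in for whatever port your service publishes.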

Now each microservice can have:

  • Its own suite of unit tests
  • Its own suite of tests that exercise the boundary of the microservice (acceptance tests?)
  • A test together with all the other microservices to ensure they all play nicely together

Because Docker encourages separation into microservices, and because tools like docker-compose make it really easy to spin up services for testing, testing these services is now easier than ever.