Docker is growing fast and is quickly gaining a strong reputation among development teams for its ability to deploy local development environments consistently and reliably. But are companies really using it in production and at scale? Does Swarm have all the features needed to orchestrate containers in production?
Let’s find out…
Kubernetes vs Swarm
Kubernetes is more mature than Swarm, and the project knows it (its tagline is “Production-Grade Container Orchestration”).
Swarm is much easier to use than Kubernetes, which makes it more accessible for people who are getting started with container orchestration. Swarm aims to provide enough features to be able to fully orchestrate a production environment without needing Kubernetes or any other orchestration tool.
Let’s look at what the industry is up to.
According to the ClusterHQ survey, 29% of the people surveyed use Swarm, while 43% use Kubernetes. According to the folks over at Datadog, nearly 30% of large companies have adopted Docker. However, it’s not clear whether these companies are using Docker in production. They might have “adopted” Docker in a non-production environment instead, or they might even be using Kubernetes to orchestrate their Docker containers.
It looks like the industry is still undecided, although Kubernetes is being used in a lot of bigger companies already.
Is Swarm Ready For Production Now?
Yes. Swarm has been around for over a year, and the release of Docker 1.12 came with all the tools needed to orchestrate Docker containers at scale. Many companies are already using Swarm in production. The main argument against using Swarm in production at a larger company today is its immaturity compared to Kubernetes. However, Kubernetes is a complex system, and setting up a production-grade Kubernetes cluster is not nearly as easy as setting up a production-grade swarm. For this reason, many smaller companies are opting for Swarm instead of Kubernetes.
The upcoming release of Docker 1.13 will address some “usability issues” that 1.12 had, mostly by making the “legwork” of automatically deploying services much easier with the release of docker stacks. It will also iron out some other wrinkles and give us tools like docker secrets. However, Swarm is usable and stable right now, and any improvements only make it better :)
Up to now, databases have been the hardest thing to containerize because of the movable nature of services: containers in a service aren’t guaranteed to land on the same host when they are restarted, so data written to a host-local volume can be left behind. There are solutions to this. For example, Flocker lets you create volumes that are backed by many common storage backends, such as an Amazon EBS volume. Any data that you write to such a volume is sent over the network and stored on the Amazon EBS volume, not on the Docker host itself.
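To make the rescheduling problem concrete, here is a toy Python model. It is only a sketch and does not use Docker’s or Flocker’s APIs: the Host and NetworkVolume classes and the run_container function are hypothetical names invented for illustration. It shows a container writing data on one host, being restarted on another, and what it can still read in each case.

```python
# Toy model (not Docker's or Flocker's API): all names here are hypothetical.

class Host:
    def __init__(self, name):
        self.name = name
        self.local_disk = {}  # data written to a host-local volume lives here


class NetworkVolume:
    """Stands in for a network-backed volume, e.g. one stored on Amazon EBS."""

    def __init__(self):
        self.data = {}


def run_container(host, key, value=None, network_volume=None):
    """Run a container on `host`: optionally write `value`, then read `key` back."""
    if network_volume is not None:
        storage = network_volume.data  # same storage no matter which host runs us
    else:
        storage = host.local_disk      # storage tied to this particular host
    if value is not None:
        storage[key] = value
    return storage.get(key)


host_a, host_b = Host("a"), Host("b")

# Host-local volume: write on host A, then the container restarts on host B.
run_container(host_a, "orders", value=42)
local_after_move = run_container(host_b, "orders")

# Network-backed volume: same move, but the data follows the container.
shared = NetworkVolume()
run_container(host_a, "orders", value=42, network_volume=shared)
network_after_move = run_container(host_b, "orders", network_volume=shared)

print(local_after_move)    # None
print(network_after_move)  # 42
```

In the host-local case the data is still sitting on host A, but the restarted container can no longer see it. A real volume driver also has to deal with attaching, detaching, and network latency, none of which this sketch models.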
In theory, Flocker allows you to containerize even your persistent database and deploy it to a distributed cluster just like your application. However, there are other considerations to take into account. For example: does your database support two different instances reading and writing the same data on disk? In the case of MySQL it’s not possible, although MongoDB does allow this kind of concurrency. If you’re using a database where multiple instances cannot work on the same volume at the same time, then it cannot be scaled horizontally beyond a single instance this way. In most cases this leads companies to keep managing the database in the “traditional” way, outside of Docker.
Are Databases Ready To Be Containerized In Production?
The quick answer: If you can avoid it, don’t attempt it.
Even though it is possible to get multiple instances of MongoDB accessing a shared multi-host volume, this setup is probably not a good choice for production, especially when other, more battle-tested options like sharding and database replication exist for “standard” deployments of MongoDB.
If you want to use MySQL, then the answer is a definite “no”. Neither MyISAM nor InnoDB is ready to have multiple instances concurrently use the same data, and they might never be. For now, stick to “standard” deployments of MySQL in production and use traditional replication and master-slave techniques when you need database redundancy and improved performance.
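As a rough illustration of the master-slave idea, the Python sketch below sends writes to the master and spreads reads across the slaves in round-robin order. It is a minimal sketch under stated assumptions, not how any particular MySQL driver or proxy works: the ReplicatedDatabaseRouter class and the host names are hypothetical, the write detection is a naive check on the first SQL keyword, and replication lag is ignored.

```python
import itertools


class ReplicatedDatabaseRouter:
    """Hypothetical router: writes go to the master, reads rotate over slaves."""

    # Naive list of statement keywords that are treated as writes.
    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master for reads if no slaves are configured.
        self._read_cycle = itertools.cycle(slaves or [master])

    def route(self, sql):
        """Return the name of the server that should run this statement."""
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._read_cycle)


# Host names below are placeholders, not real servers.
router = ReplicatedDatabaseRouter("db-master", ["db-slave-1", "db-slave-2"])
targets = [
    router.route(query)
    for query in [
        "INSERT INTO users (name) VALUES ('ann')",
        "SELECT * FROM users",
        "SELECT * FROM users",
        "SELECT * FROM users",
    ]
]
print(targets)  # ['db-master', 'db-slave-1', 'db-slave-2', 'db-slave-1']
```

Each server in this setup keeps its own copy of the data on its own disk, which is why it sidesteps the shared-volume problem described above. In practice this routing is usually handled by the database driver or a proxy rather than by hand-written code.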
Alternative databases that are designed to work in replicated environments like Kubernetes and Swarm do exist, though. Vitess was developed by YouTube to scale up their MySQL database when they outgrew even traditional replication and sharding.
If you’re wondering if Swarm is ready for production, then you’re in luck: it is, and you won’t be the first to use it in production either. However, being a young technology means that it is improving and evolving rapidly, so if you opt for Swarm in production, expect things to change as time goes on, which is both good and bad. Small usability nuisances are corrected quickly and new features are being added constantly.
If you’re wondering if dockerizing databases in a production environment makes sense, forget about it for now. Unless you’ve got so much traffic that even a well-oiled “traditional” replicated database can’t handle it, trying to containerize the database will likely cause more headaches than happiness, no matter what orchestration tool you use.
Do you have any stories about Swarm or Kubernetes in production? Share them below! If you have any questions, just ask.