Version Control and Hosting
Coders like to code.
It can be easy to get into the habit of simply opening up an editor and banging out as much code as possible.
This is particularly true if you're working on a personal project or you're the sole developer.
It can be even more tempting if you're a fast coder or have a boss who wants fixes and solutions right away.
But if you are slinging new code into production without a proper version control system, you're not really doing software development; you're doing "Cowboy Coding."
How Version Control Works
Version control, also called revision control, versioning, or source control, is a method for tracking the revisions made to documents, code, or other files.
Version control systems (VCS), or version control software, can be standalone applications or can be built into document editing applications (like Word or Google Docs).
What Does Version Control Do?
Version control software allows developers, editors, and other team members to view previous versions of files, as well as restore earlier versions.
Version control maintains a master copy of the code base. Many version control systems allow for several parallel copies of the entire code base to exist simultaneously.
Each software developer has their own copy of the code base: they can make revisions without influencing the master source code.
These revisions are brought in at an appropriate time and merged into the master source code.
How this merging happens depends on the version control system (VCS) in use.
Reasons to Use Version Control
Not convinced yet that you need a version control system?
Here are the reasons why version control is worth using:
- Freedom to make mistakes
- Freedom to try something new
- A full history of revisions made to your code base
- Fewer unanswered questions
- Paper trail of what was done and why
- Easier collaboration between team members.
Freedom to Make Mistakes
Do you ever use the UNDO button (CTRL-Z) while working? Of course you do. It's one of the most important features of modern computers.
What the UNDO button gives you is the freedom to make mistakes. This is one of the advantages you get from version control — in fact, it might be the most important advantage.
Freedom to Try Something New
With version control, you can try something out — a new solution, a new feature, a bug fix.
If it doesn't work you can simply revert your code to an earlier point or discard the proposed revisions.
Those revisions will not have been merged into the master source code. (It is kind of like saving points in a video game.)
This is helpful for two reasons:
- You will inevitably make mistakes, so you might as well have a way to correct them easily.
- Once you know you have a way to reverse mistakes, it becomes much easier to venture into unknown territory and take risks with novel solutions or untested ideas.
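To make the save-point idea concrete, here is a minimal Python sketch. The `SnapshotHistory` class and its method names are invented for illustration, and it assumes the simplest possible storage (a full copy per save); real version control systems store changes far more efficiently.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotHistory:
    """Toy model: every save keeps a full copy of the text."""
    snapshots: list = field(default_factory=list)

    def save(self, text: str) -> int:
        """Store a snapshot and return its revision number."""
        self.snapshots.append(text)
        return len(self.snapshots) - 1

    def revert(self, revision: int) -> str:
        """Return the text exactly as it was at the given revision."""
        return self.snapshots[revision]


history = SnapshotHistory()
good = history.save("print('hello')")
history.save("print('hello' +)")   # a risky experiment that broke things
restored = history.revert(good)    # back to the last known-good state
```

Because the broken experiment is just another snapshot, nothing is lost by trying it, which is the freedom described above.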
Full History of the Revisions Made to Your Code Base
Have you ever worked on a project over a long period of time and then someone who uses it says, "Didn't the exit button use to trigger a save warning before closing the application?"
If a system exists for a long enough period of time, it is inevitable that some features will be changed and removed.
Usually, there was some reason for having the feature in the first place (even with features that are eventually removed).
However, there was also a reason why a given feature was removed (even if the reason was that someone did so accidentally).
Fewer Unanswered Questions
Later on, when someone shows up and asks about some feature that used to be there, you can try really hard to remember what happened.
Or, if you have version control, you can go look up past revisions and come back with definitive answers about:
- What that feature used to do
- When it was removed
- Why it was removed.
This is particularly helpful if you have to:
- Re-implement the feature (sometimes you can just re-implement the code that was removed!)
- Defend its continued exclusion from your production-ready applications.
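As a rough illustration of that kind of lookup, the sketch below searches a small in-memory commit log. The `Commit` fields, the sample entries, and `find_removal` are all hypothetical; a real VCS exposes the same information through its own log and diff tools.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Commit:
    author: str
    when: date
    message: str          # the "why", written by the author
    removed: tuple = ()   # names of features this commit took out


log = [
    Commit("dana", date(2020, 1, 10), "Add save warning to exit button"),
    Commit("lee", date(2021, 6, 2),
           "Remove save warning; autosave makes it redundant",
           removed=("save_warning",)),
]


def find_removal(commits, feature):
    """Return the first commit that removed `feature`, or None."""
    for commit in commits:
        if feature in commit.removed:
            return commit
    return None


hit = find_removal(log, "save_warning")
```

Here `hit.when` answers "when was it removed" and `hit.message` answers "why", with no guesswork involved.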
Paper Trail of What Was Done and Why
This is closely related to version history, but it is more about developers and less about features.
Your paper trail is not (usually) a literal paper trail, but version control allows you to see things like:
- What revisions were made
- When revisions were made
- Who made the revisions.
This is helpful when trying to piece together why things are the way they are. You can assign credit or blame or just figure out who to ask about some specific feature or implementation.
Usually, version controlled repositories are stored in multiple locations.
This saves your projects from having a single machine as a catastrophic single point of failure.
Facilitate Easier Collaboration Between Team Members
If only one person is working on a project, you might be able to get away without using any version control system (though this is still really a bad idea).
However, if multiple people are working on a project together, the risk of people writing over each other's revisions or creating incompatible code (also known as merge conflicts) is very high.
As such, one indispensable feature of version control systems (VCS) is the ability to check for mutually incompatible revisions to the master code base to ensure that everything works together.
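One common way to detect incompatible revisions is a three-way comparison against the shared starting point. The sketch below applies that idea per file. It is a simplification, since real systems usually compare lines within a file, and the function and variable names are illustrative only.

```python
def three_way_merge(base, ours, theirs):
    """Merge two sets of edits made from the same starting point.

    Each argument maps a file name to its content.
    Returns (merged, conflicts); conflicting files are left out of `merged`.
    """
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:        # both sides agree, or neither touched the file
            result = o
        elif o == b:      # only "theirs" changed it
            result = t
        elif t == b:      # only "ours" changed it
            result = o
        else:             # both changed it in different ways
            conflicts.append(name)
            continue
        if result is not None:   # None means the file was deleted
            merged[name] = result
    return merged, conflicts


base = {"menu.py": "v1", "exit.py": "v1"}
alice = {"menu.py": "v2-alice", "exit.py": "v1"}
bob = {"menu.py": "v2-bob", "exit.py": "v2-bob"}

merged, conflicts = three_way_merge(base, alice, bob)
```

In this run `exit.py` merges cleanly because only one person changed it, while `menu.py` is reported as a conflict for a human to resolve.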
Deployment and Version Control
How do you move files from your local development machine to your test environment and then, finally, the production environments?
Some people just keep an FTP window open and drop files in as they change them.
This is unwise. It is too easy to leave a needed file out, and if there is an unexpected problem on the server, it becomes difficult to reverse your revisions.
Pushing Revisions All At Once
If you are using certain types of version control (especially Git), you can simply push your revisions all at once to a remote server. It does not matter what environment — development, test, or production — the server handles.
If any of your revisions cause a problem at any point in the future, you can easily roll back the revisions so that things begin to function again.
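The contrast with dropping files in one at a time can be sketched as follows. This toy `Server` class is an assumption made for illustration, not how any particular hosting service works: each push is treated as one complete release, so rolling back means returning to the previous complete release.

```python
class Server:
    """Toy deployment target that keeps every pushed release."""

    def __init__(self):
        self.releases = []   # each entry is a complete {path: content} snapshot

    def push(self, files):
        """Deploy a whole set of files as a single release."""
        self.releases.append(dict(files))

    def rollback(self):
        """Discard the newest release, if an older one exists."""
        if len(self.releases) > 1:
            self.releases.pop()

    @property
    def live(self):
        """The files currently being served."""
        return self.releases[-1] if self.releases else {}


server = Server()
server.push({"index.html": "v1"})
server.push({"index.html": "v2", "about.html": "new page"})
server.rollback()   # v2 caused a problem, so return to v1 in one step
```

After the rollback, the half-finished `about.html` is gone along with the broken `index.html`, because the release was deployed and reverted as a unit.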
Types of Version Control Systems (VCS)
There are basically two types of version control systems:
- Centralized version control systems
- Decentralized version control systems.
Let's take an in-depth look below.
Centralized Version Control Systems
Centralized version control systems follow a client-server model.
In these systems, a single, master ("central") set of source code sits on a server. Individual files that are being worked on are checked out by developers.
The working copy is then "locked." Others are either warned that they should not make revisions to the file or even prevented from editing the files (or both).
Developers then push the revisions they made to these files back to the central source code, which is the version used for code/software deployments to production environments.
A Sample Centralized VCS Workflow
In a centralized version control system, there is a central server (or repository) that acts as the source of truth.
This is also the set of code that is typically kept in a production-ready state.
This means that, at any given time, the code could be shipped to a production environment without negative ramifications.
When you need to work on something, you find the files that you need to work on. You then "check out" these files, which means that:
- You pull a copy to your local machine, where you can work on it
- The files themselves are locked against editing by others on your team
When you are finished making changes, you can commit them, including a note on what you did.
Unlike decentralized systems where you merge in your changes (we will talk more on merges in a bit), you simply push your changes to the central server. This releases the locks you have on those files.
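The check-out, lock, and commit cycle described above can be sketched in a few lines of Python. The `CentralServer` class is hypothetical and models only the locking behavior; it is not the interface of any real centralized VCS.

```python
class CentralServer:
    """Toy central repository with per-file locks."""

    def __init__(self, files):
        self.files = dict(files)   # the single master copy
        self.locks = {}            # path -> user currently holding the lock
        self.log = []              # (user, path, note) for every commit

    def check_out(self, user, path):
        """Lock `path` for `user` and return a working copy of it."""
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            raise PermissionError(f"{path} is locked by {holder}")
        self.locks[path] = user
        return self.files[path]

    def commit(self, user, path, content, note):
        """Store the new content, record the note, and release the lock."""
        if self.locks.get(path) != user:
            raise PermissionError(f"{user} has not checked out {path}")
        self.files[path] = content
        self.log.append((user, path, note))
        del self.locks[path]


server = CentralServer({"report.txt": "draft"})
server.check_out("alice", "report.txt")
try:
    server.check_out("bob", "report.txt")
    bob_blocked = False
except PermissionError:
    bob_blocked = True      # Bob must wait until Alice commits

server.commit("alice", "report.txt", "final", note="Finish report")
bob_copy = server.check_out("bob", "report.txt")   # the lock is free again
```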
Decentralized (or Distributed) Version Control Systems
Decentralized/distributed version control systems are those where the software developers involved have:
- A complete copy of the entire code base (as opposed to a working copy of select files)
- A history of revisions made.
Source of Truth, Users, and Nodes
No single user or node is more important than any other, though there is usually one repository that is designated as the origin. (Think of a repository as a folder with historical information attached.)
The origin is similar to the "central" source code in a centralized VCS.
Individual changes are, when ready, merged into the source of truth (typically labeled as the master branch).
Because of the asynchronous and independent method by which decentralized VCS work, merge conflicts must be resolved by the developers before merging occurs.
This is how irreconcilable differences between the work of two or more developers are prevented from breaking the master branch.
A Sample Decentralized VCS Workflow
In this section, we will cover the process of using a decentralized version control system.
The branching and merging required make the use of such systems slightly more complicated than their centralized counterparts.
You can get started in one of two ways:
- You can initialize a new repository on your dev machine
- You can clone an existing repository.
Regardless of which option you choose, you will end up with a full copy of the source code on your computer.
Different versions of the code are called branches. The branch that serves as the source of truth, and that is shipped to production, is called the master branch. When using a distributed VCS, it is good practice to keep the master branch in a state ready for production deployment at all times.
Every time you want to make a change to one or more files, you create a new branch. As its name implies, a branch is an offshoot of the main code.
The number of changes you include on a branch can vary.
You might make just a small change, or you might keep months of changes on a single branch.
Typically, you would (at the very least) ensure that all of the changes are related to a single feature.
The process of saving a change is called committing.
Each commit that you make requires you to add notes on what you did — your VCS should automatically note that you were the person who committed the change and when.
Over time, you will be able to see a log of all commits made, when they were made, and by whom.
Commits have the bonus feature of allowing you to roll back your changes just a little bit at a time.
(This assumes you have created multiple commits and not just one big commit at the end of your project.)
You can think of commits as divisions of branches.
While branches hold changes related to a given feature, commits are the smaller changes that, added together, become the full feature update.
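One way to picture the relationship is to treat each commit as a record that points at its parent, and each branch as a named pointer to the newest commit. The sketch below follows that assumption; the `Repo` class and its methods are invented for illustration and leave out file contents and timestamps entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    message: str
    author: str
    parent: Optional["Commit"] = None


class Repo:
    def __init__(self):
        self.branches = {"master": None}   # branch name -> newest commit

    def branch(self, name, start="master"):
        """Create a branch that begins where `start` currently is."""
        self.branches[name] = self.branches[start]

    def commit(self, branch, message, author):
        """Add a commit on top of `branch`."""
        self.branches[branch] = Commit(message, author, self.branches[branch])

    def log(self, branch):
        """Commit messages on `branch`, newest first."""
        node, messages = self.branches[branch], []
        while node is not None:
            messages.append(node.message)
            node = node.parent
        return messages

    def roll_back(self, branch):
        """Drop the newest commit on `branch`, if there is one."""
        head = self.branches[branch]
        if head is not None:
            self.branches[branch] = head.parent


repo = Repo()
repo.commit("master", "Initial commit", "sam")
repo.branch("feature/form")
repo.commit("feature/form", "Add form", "sam")
repo.commit("feature/form", "Validate form input", "sam")
repo.roll_back("feature/form")   # undo only the validation step
```

The master branch never moves during this work, which is what keeps it ready for production while the feature is still in progress.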
Branches are also helpful for sharing your work.
For example, let's say that you are working with several others, and you are all contributing to a single repository.
If you want to share your work (perhaps you want to get the code you have written reviewed), you can just push the branch you have been working on instead of the entire repository.
Shipping Your Work
When you are ready to ship your work, you can begin the process of merging, where someone (typically not yourself) merges your feature branch into the master branch.
The general process is as follows:
- You push your branch up to the central repository and request that it be pulled into the master branch
- Someone else reviews your branch and if everything looks okay, they finalize the merge.
Note that version control systems will only allow the reviewer to merge if your proposed changes do not conflict with anything that has already been merged into the master branch.
If this is not the case, you will have to resolve the merge conflicts and update your request.
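The gatekeeping described above can be sketched as a simple check. This is a hypothetical model: it assumes a conflict means "the same file was changed on both sides," which is coarser than the line-level comparison real systems perform, and the names `MergeRequest` and `can_merge` are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class MergeRequest:
    branch: str
    changed_files: set
    approved: bool = False


def conflicts_with(request, merged_since_branching):
    """Files touched by the request and by work merged after it branched."""
    return request.changed_files & merged_since_branching


def can_merge(request, merged_since_branching):
    """A request may be merged only if it is approved and conflict-free."""
    return request.approved and not conflicts_with(request, merged_since_branching)


request = MergeRequest("feature/login", {"login.py", "auth.py"})
request.approved = True   # the reviewer signed off

clean = can_merge(request, merged_since_branching={"README.md"})
blocked = can_merge(request, merged_since_branching={"auth.py"})
```

In the second case someone else's change to `auth.py` reached master first, so the author has to resolve the overlap and update the request before it can be merged.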
Comparing and Contrasting Distributed (Decentralized) vs. Centralized Version Control Systems
What are the primary differences between a decentralized/distributed version control system and a centralized version control system?
The most obvious difference between centralized and decentralized VCS is in terms of access and convenience.
Downsides of a Centralized System
You can think of a centralized system as being akin to accessing a shared Dropbox folder through a web browser.
Conversely, accessing a distributed system is the equivalent of syncing a shared, community Dropbox folder to your own computer.
With a centralized system, before your users can begin editing, they need to:
- Access the central source files
- Download the working copy they need
- Check out the working copy so that the files are locked and unable to be edited by others.
Files in a Distributed System
With a distributed system, the files are already right where you need them.
This is because one of the first steps of getting a distributed system set up is to clone all the files, as well as the version history, to your local development workstation.
Cloning a repository is analogous to copying a file — remember, however, repositories possess additional historical information.
When you are ready to begin working, all you have to do is open up the files you've "pulled" to your computer.
Having all the files you need locally is a huge advantage in terms of speed and efficiency.
The only time you need to communicate with the server is to pull changes from it or push changes back to it.
Decisive Advantages and Disadvantages of a Distributed System
This asynchronous method also allows users to make several revisions locally before deciding on the next step:
- Pushing their revisions out to everyone else working on the project (by pushing to the origin repository and having the revisions pulled in)
- Sending their revisions to select team members for review before making them visible to the entire team.
However, one big downside of a distributed VCS is the amount of space a local repository might require.
Depending on the size of your project, individual repositories that you have cloned to your computer can end up taking a lot of space.
This problem is amplified if you have to clone multiple repositories for one or more projects.
Why Are These Disadvantages?
When you add up the text files, image files, videos, and revision history a project can contain, this can be problematic, especially for those on budget workstations.
For users with such limitations, a centralized VCS might be a better option, since users only have to pull down the files they need, not the entire set of source code and accompanying revision history.
Distributed Version Control System Options
When choosing a version control system (VCS), what are the options available to you?
Which one should you choose?
In the following sections, we will cover several popular distributed version control systems, as well as several popular centralized version control systems.
Hopefully, this helps you choose an option that fits your needs. If not, this list should help you jump start your search for the option that works!
Let's start with some of the most popular distributed options available.
Bazaar is the version control system sponsored by Canonical, written in Python.
For users familiar with Concurrent Versions System (CVS) or Subversion (SVN), Bazaar commands will appear similar.
Bazaar, unlike some of the other distributed VCS, allows you to use it with or without a central repository or server where the master source code set lives.
It also integrates well with other VCS — you can commit changes to SVN, and you can read files that are tracked by Git or Mercurial.
You can also export Bazaar history to many other systems.
Fossil is a cross-platform, distributed version control system that also includes features for:
- Bug tracking
Fossil ships with a built-in web interface that displays detailed change history and project status information.
The goal of this interface is to reduce the complexity inherently involved with project tracking and to improve a user's situational awareness in the code base.
Similarities to Bazaar
Like Bazaar, Fossil does not require you to use a central server, though if you do, the collaboration between your team members will be easier.
Fossil utilizes SQLite databases to store its content.
Git is a version control system created by the "father of Linux," Linus Torvalds.
Though Git features prominently in the software development world, it can be used to track changes in any type of file set.
Over all else, Git prioritizes performance.
This is important when distributed version control systems require:
- The initial pulling of all project files (not just the ones being worked on)
- Data integrity
- Support for non-linear workflows.
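On the data integrity point, Git identifies every stored file by a hash of its content, so any change to the content produces a different identifier. The sketch below reproduces that calculation for a single file, assuming the default SHA-1 object format (newer repositories can opt into SHA-256). The function name `git_blob_id` is ours, not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a file with this content."""
    # Git hashes a short header ("blob", the size, a NUL byte) plus the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = b"hello world\n"
tampered = b"hello w0rld\n"

same_id = git_blob_id(original) == git_blob_id(original)
changed_id = git_blob_id(original) != git_blob_id(tampered)
```

Because the identifier is derived from the content, a corrupted or altered file no longer matches the ID recorded in the history, which is how the damage gets noticed.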
Git on Different Platforms
Though Git was originally developed for Linux, it is a cross-platform solution.
Typically, each project is managed in an individual repository. (Remember, a repository is essentially a folder with a log of changes.)
Files for large projects are sometimes split into multiple repositories.
Git is typically used in conjunction with some type of web-based hosting service.
This is the method by which multiple collaborators can share their work, as well as pull down the original source code and the changes made by their peers.
In addition to supporting all of the version control and source code management features of Git, GitHub offers:
- Access control tools
- Bug tracking tools
- Feature requests management
- Task management/productivity tools
You can even generate and host simple web pages using GitHub.
Though GitHub offers both public and private repositories, utilizing a private repository incurs fees (whereas a public repository is free of charge).
This is in line with GitHub's dedication to open source code.
Bitbucket is Atlassian's contribution to the world of web-based hosting for Git (and Mercurial) users.
In addition to its free accounts, Bitbucket offers more feature-rich commercial plans.
For some users, Bitbucket is a better option than GitHub, since Bitbucket does not charge anything if you use a private repository.
Free accounts get an unlimited number of private repositories, though the number of contributors is capped.
Bitbucket is typically seen as the option for professional developers working with proprietary source code.
Its primary use is for code and code review, though Bitbucket does offer some extras like:
- Static website features.
GitLab Self-Hosted and Fully-Hosted Plans
GitLab provides four different self-hosted plans:
- Core: for small teams or personal projects (Core is completely free to use)
- Starter: for personal projects or small teams who want professional support
- Premium: for teams needing high availability, high performance, or 24/7 support
- Ultimate: for large enterprises needing additional security and compliance functionality.
If you are not interested in self-hosting, you can opt for the fully-hosted version of GitLab. For each self-hosted plan, there is a corresponding hosted plan:
- Core → Free
- Starter → Bronze
- Premium → Silver
- Ultimate → Gold
Feature Parity Between GitLab Plans
GitLab ensures feature parity between its self-hosted and fully-hosted plans (that is, the features offered to those on the Starter plan are the same as those on the Bronze plan).
Need a Private Repository?
For those of you who need a private repository (or multiple private repositories), you might strongly consider GitLab.
For these situations, GitLab is cheaper than GitHub and faster than Bitbucket (though obviously, your mileage may vary depending on variables specific to your situation).
Mercurial is a cross-platform distributed version control system that is:
- Highly performant
- Easily scalable
- Capable of handling both plain text and binary files
- Advanced in its branching and merging capabilities.
Despite the complexity that such features might introduce, Mercurial's developers still strive to ship a conceptually simple product with an easy-to-use, integrated web interface.
Though the command line is the primary method by which a user interacts with Mercurial, there are many graphical user interface (GUI) extensions available, and many integrated development environments (IDE) offer built-in Mercurial integration support.
Centralized Version Control System Options
The following version control systems are some of the most popular centralized options available.
Concurrent Versions System (CVS)
Concurrent Versions System (CVS) is free version control software.
CVS originated as a series of shell scripts released in mid-1986.
CVS is no longer maintained (the last time the developers shipped a new release was 2008), but you will still find some people using CVS.
When using CVS, note that the terminology it uses is slightly different from that used by other version control systems.
For example, a set of related files is called a module, while the series of modules a CVS server manages is called the repository.
CVS calls the files that get checked out by developers the working copy, sandbox, or workspace.
Revisions to the working copy are sent to the repository via commits, while updating is the process of acquiring the changes now present in the repository.
Apache's Subversion (SVN) is an open source versioning/revision control system.
We mentioned that Concurrent Versions System (CVS) still has some users, but CVS has not been updated since 2008.
Subversion was designed to act as, and is frequently used as, a (mostly) compatible successor to CVS.
What Makes Subversion Worthwhile?
While distributed systems like Git seem to get most of the attention in the world of version control systems, Subversion is commonly used, especially in the open-source community.
Subversion was originally developed in 2000 as an alternative to CVS, but with bug fixes and additional features not found in CVS.
One of the biggest perks of Subversion is its built-in, fine-grained permissions system.
You can limit access to files and directories on a per-user basis.
Furthermore, Subversion is a good option for those who want binary files and other assets stored in the same repositories as the source code (even more so if you have a large number of said binary files).
Ease-of-Use and Target Market
Do not discount the fact that there is a learning curve when it comes to version control systems.
Subversion can be easier for people (especially non-technical users) to learn and understand than other version control systems.
Subversion is also a good option for businesses operating in heavily-regulated industries.
While you can certainly hack any version control system to maintain the audit trails you need to ensure that your company is compliant with the appropriate regulations, SVN, as an enterprise-grade system, comes with the feature set necessary to make this process easier for you.
Team Foundation Server (TFS)
Team Foundation Server (TFS) is Microsoft's contribution to the world of version control systems.
TFS also includes features for:
- Requirements management
- Project management
- Testing and release management capabilities.
Essentially, TFS contains everything you need to manage all aspects of the software development lifecycle.
What is TFS Used With?
TFS can be used with many different integrated development environments (IDEs).
It is built especially for use with Visual Studio or Eclipse.
You can self-host TFS, or you can subscribe to the hosted version called Visual Studio Team Services.
Furthermore, TFS is one of the few products that boast built-in extensibility.
You can certainly hack other systems to behave in ways that go against their design, but TFS makes this process much easier.
There are many different version control systems out there, and while they all implement version control slightly differently, the important thing is for you to adopt one.
The difference between Git, CVS, and SVN is not as large as the difference between having and not having a version control system.
Don't risk catastrophic loss of your source code — adopt a version control system today!