November 23, 2014

A Crash Course in Subversion

  • April 22, 2005
  • By Garrett Rooney

If you're already familiar with version control, Subversion is reasonably simple to use. The workflow is quite similar to that of several other version control systems (notably CVS), so you shouldn't have too much trouble transitioning to Subversion. This chapter, from the Apress book Practical Subversion, begins with a simple overview of Subversion and then dives into the specifics you need to know to use the software. Along the way, I compare Subversion commands to the equivalent commands in other version control systems, such as CVS and Perforce.

Conceptually, Subversion's design is similar to that of CVS. There is a single central repository that holds all versions of each file that is under Subversion's control. You (and others) can interact with the repository in two different ways, either by checking out a particular revision of the versioned data into a local working copy or by acting directly on the repository itself, without the need for an intermediate working copy. Generally, you'll check out a local working copy, make changes, and then commit those changes back into the central repository.

Locking vs. Nonlocking

An important difference between Subversion and many other version control systems is that, like CVS, Subversion operates in a nonlocking mode. That means that if two users have checked out working copies that contain the same file, nothing prohibits both of them from making changes to that file. For users of systems such as Visual SourceSafe, this may seem odd, as there is no way to ensure that the two users' changes to the file don't conflict with each other. In truth, this is by design.

In the vast majority of cases, the two users' changes don't conflict. Even if the two users change the same file, it's likely that they'll change separate parts of the file, and those disparate changes can easily be merged together later. In this kind of situation, allowing one user to lock the file would result in unneeded contention, with one user forced to wait until the other has completed his changes. Even worse is the situation in which the second user changes the file despite the fact that the file is locked. When the first user completes his change and unlocks the file, the second user is stuck merging the changes together manually, introducing an element of human error into something that the computer can handle far better.

Worse yet are the problems of stale locks. In a version control system that uses locks, there's always the danger of a user taking out a lock on a file and not returning it by unlocking the file when she's done. Every developer has run into something like this at some point. You begin work on a new bug or feature, and in your first stab at the solution you end up editing a file. Because you're making changes to the file, you take out the lock on it to ensure that nobody else changes it out from under you. At this point you can get into trouble in several ways. Perhaps once you get further into the solution, you realize that you were wrong to change that file, so you return the file to its previous state and move on to another solution, without unlocking the file. Perhaps your focus moves to some other issue and your work on the first problem sits there for a long period of time—and all the while you're holding the lock. Eventually, someone else is going to need to edit that same file, and to do so he'll need to find you and ask you to remove the lock before he can proceed. Worse, perhaps he'll try to work around the version control system and edit the file anyway, which leads to more complicated merging issues in the future. Even worse, what if you're on vacation or have left the company when this happens? An administrator will have to intercede and break the lock, creating an even greater chance of someone's work getting lost in the shuffle.

So in the typical case in which there are no conflicts, the nonlocking strategy used by Subversion is a clear win. But what about the rare case in which changes really do conflict? Then the first user to complete his change commits that change to the repository. When the second user tries to commit, she'll be told that her working copy is out of date and that she must update before she can commit. The act of updating will give Subversion a chance to show that the changes conflicted, and the user will be required to resolve the conflict.

This may seem similar to what would happen in the locking case, except for a couple of critical differences. First, the conflict forces the second user to stop and deal with the differences, avoiding the chance that she might just copy her version over the first version and destroy the first change in the process. Second, Subversion can help with the merging process by placing conflict markers in the file and providing access to the old, new, and local versions so the user can easily compare them with some other tool.

If you've never used a version control system that makes use of conflict markers, the best way to understand them is through an example. Suppose you have a file in your working copy, hello.c, that looks like this:

#include <stdio.h>
int
main (int argc, char *argv [])
{
   printf ("hello world \n");
   return 0;
}

Then say you change the hello world string to Hello World, and before checking in your changes you update your working copy and find that someone else has already changed that line of the file. The copy of hello.c in your working copy will end up looking something like this:

#include <stdio.h>
int
main (int argc, char *argv [])
{
<<<<<<< .mine
   printf ("Hello World \n");
=======
   printf ("hello world!\n");
>>>>>>> .r5
   return 0;
}

The <<<<<<<, =======, and >>>>>>> lines are used to indicate which of your changes conflicted. In this case, it means that your version of the section of hello.c that you changed looks like printf ("Hello World \n");, but in a newer version of the file that has already been checked into the repository, that line was changed to printf ("hello world!\n");.

Of course, all of this only works if the file in question is in a format that Subversion understands well enough that it can merge the changes automatically. At the moment, that means the file must be textual in nature. Changes to binary files such as image files, sound files, Word documents, and so forth can't be merged automatically. Any conflicts with such files will have to be handled manually by the user. To assist in that merging, Subversion provides you with copies of the original version of the file you checked out, your modified version, and the new version from the repository, so you can compare them using some other tool.

Note: Historically, most version control systems were designed to handle plain-text content, for example, a computer program's source code. As a result, they developed formats for storing historical data that were designed with plain text in mind. For example, RCS files work in terms of a textual file, adding or removing lines from the file in each new revision. For a binary file, which doesn't have "lines" at all, this breaks down, so systems based on these formats usually end up dealing with binary data by storing each revision separately, meaning that each time you make a change you use up space in the repository equal to the size of the file you modified. In addition, these systems often include other features, such as keyword replacement or end-of-line conversion, which not only make no sense for binary files but can actually damage them: a binary file format probably won't survive intact if you replace all instances of $Id$ with a new string, or all the newline bytes with carriage return/linefeed combinations.

In addition to helping you handle the situation in which a conflict does occur, the use of a nonlocking model helps in another way: It removes the false sense of security that a locking model gives you. In the majority of cases, when you make a change to one part of a program, the effect of that change isn't isolated to just that file. For example, if you're changing a header file in a C program, you're really affecting all the files that include that header. Locking access to that one file doesn't buy much safety, because your changes can still quite easily conflict with any number of potential changes in other parts of the program. Locking gives you the illusion that it's safe to make changes, but in reality you need the same amount of communication among developers that you'd need in the nonlocking mode. Locking just makes it easier to forget that.

Now, none of this is meant to imply that the only possible solution to the version control problem is a nonlocking system. There are certainly situations in which locking is a valuable tool, perhaps with files that truly shouldn't be modified except by certain key individuals, or perhaps when you're working with binary files that can't be easily merged. The Subversion developers have recognized the need for such a feature and intend to address it in the future. For some clues as to what the locking system might look like, take a look at the locking-plan.txt file in the notes directory of the Subversion distribution. Unfortunately, the user interface and technical issues of such a feature are complex enough that it has been deferred until after Subversion 1.0 is released.




