
Programmers' Canvas


07 May 2003 12:46



A pattern language for software development

Introduction

Source code should be respected; after all, code is the fruit of a development team's labor. Some teams manage their code well, and some need improvement. What's the secret?

A Pattern Language for Software Development

Software development can be viewed as a manufacturing process. The materials include 'gray matter' from various knowledge workers, who contribute ideas, insight, and know-how. The machinery includes development environments, email, compilers, source-code control systems, and other tools. The end-products are the files that are stored on disk.

Programmers' Canvas focuses on source code management, which is just one aspect of developing software. It is loosely written in pattern form. The pattern's name refers to a team of artists working on a mural. Programmers' Canvas is not specific to any company, product, programming language, or tool set. The pattern is primarily based on the activities of programming teams at two software companies. The companies operate in different software markets and their products range from $100 consumer products to $100,000 enterprise applications.

The Internet has vastly accelerated the pace at which software is written and deployed. It was not much of a factor when this pattern was first written down, but the fundamentals remain the same. Web development is still file-based, just like COBOL development, so even web developers can learn something from this pattern. Indeed, the breakneck pace of web development makes Programmers' Canvas even more relevant: mistakes are more likely to occur when developers are under intense time pressure.

This pattern was written by a programmer, for programmers who are often too busy to think or read about this seemingly mundane topic. But if source code management were truly uninteresting, I wouldn't have bothered to write this document. This pattern is not a masterpiece, so the author apologizes for imperfect grammar, the lack of detailed examples, and brash assumptions made about the reader and software developers in general.

Benefits

Programmers' Canvas describes how multiple programmers can work on the same source code without interfering with one another. The pattern also recommends ways to optimize the use of programmers' time.

All successful software projects use at least part of the pattern. Implementing the pattern in full realizes the following benefits, which are covered in more detail in the following section:

  • Improved productivity - Repetition can be automated via scripts. Almost every aspect of this pattern can be scripted. Programmers spend more time programming by spending less time (a) merging by hand and (b) waiting for each other to release file locks.
  • Improved quality - Since developers stay up to date with the code line, even when they're not ready to check in, they find conflicts with other programmers earlier. Also, the pattern reduces the frequency of 'code stomp' incidents.

Motivations

Programmers' Canvas has two motivations:

  1. Programmers need to share a common code base. The code needs to be available for browsing, testing, and modification.
  2. Programmers need to retrieve a snapshot, at any point in time, of source code that compiles and runs. Since check-ins occur randomly, the most current code base is often unstable.

Forces

All software projects try to balance conflicting forces. Successful projects manage them well. The following forces are relevant to Programmers' Canvas:

  1. Time to market. Missing market windows often means the end of a project. Focus more on results and less on the process.
  2. Programmers are expensive. Make effective use of their time and put up no unnecessary barriers to productivity.
  3. Quality. The product must function with only tolerable bugs (hopefully, bugs that customers rarely see).

Related Patterns

This pattern language is far from the last word on configuration management. It is based only on my own experiences. There are many works on configuration management available on the Internet.

Products

There are many development-related products on the market.  Unfortunately, most of them are unpleasant to use and unjustifiably expensive.  Here is a list of products that the author uses.

DevGuy's Programmers' Canvas Toolkit

DevGuy's free Open Source Programmers' Canvas Toolkit contains everything you need to create your own development environment, from setting up CVS source control to running nightly builds.

CVS

CVS, the Concurrent Versions System, is a free, Open Source, full-featured, rock-solid source code management system. CVS supports the concurrent model of development recommended by Programmers' Canvas better than any other source-code management system.

TortoiseCVS

TortoiseCVS is a free Open Source Windows Explorer shell extension that provides easy access to most of CVS's features.

WinCVS

WinCVS (aka CVSGUI) is a free Open Source graphical user interface for CVS on several platforms.

cvsWeb

cvsWeb is a free Open Source tool that allows developers to browse CVS repositories via a web interface.

ViewCVS

ViewCVS provides similar functionality to cvsWeb, with more features, and it is also free and Open Source.

WinMerge

WinMerge is a free Open Source visual text file differencing and merging tool for Win32 platforms.

CVS Conflict Editor

CVS Conflict Editor is a free Open Source Win32 application that helps developers resolve merge conflicts from CVS and diff3.

Apache Ant

Apache Ant is a free Open Source Java-based build tool. In theory, it is kind of like Make, but without Make's wrinkles.

NAnt

NAnt is a free Open Source tool like Apache Ant but implemented on the .NET platform (which runs on Linux via Mono).

Terminology

This paper uses the words 'programmer' and 'developer' interchangeably. They are considered synonyms.

Patterns

Programmers' Canvas consists of four patterns:

Build early and often
Work in private sandboxes
Stay in sync
Prefer merging over locking

Build Early and Often

'Building' means compiling source code into a runnable form. 'Building early' encompasses many steps, and this paper covers only a few of them. Install the tools: compilers, editors, and so on. Install a source-code control system (CVS, VCS, PVCS, SourceSafe, RCS, SCCS, etc.) and make sure everyone knows how to use it. Create a sample application, check it in, and build it. Make sure everyone on the project knows how to build the sample. Place the tools on the network (along with a checklist; HTML is great for this) so that new team members can set up their workstations and build the product with minimal assistance.

Build the product often; the more frequently, the better. The build frequency depends on how often programmers check in their files. If no suitable tools are available, it is worth the effort to write scripts to automate the builds. Building all of a company's projects in a single build does not scale well: as the projects mature, build times will grow beyond tolerable limits. Think ahead about dividing the projects along functional boundaries. Independent sub-projects can then be compiled on separate CPUs or machines.
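The build script below is a minimal sketch, assuming the product has already been divided into independent sub-projects that each build with a single command. The sub-project names and the `make -C ...` commands are hypothetical placeholders. Each build runs in its own process, so the operating system can spread them across CPUs; spreading builds across machines would need a remote-execution tool, which is not shown.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sub-projects, split along functional boundaries. Each entry maps
# a name to the command that builds it; these commands are placeholders.
SUBPROJECTS = {
    "core": ["make", "-C", "core"],
    "ui": ["make", "-C", "ui"],
    "server": ["make", "-C", "server"],
}

def build_one(name, command):
    """Run one sub-project's build; return (name, succeeded, combined output)."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True)
    except OSError as exc:  # e.g. the build tool is not installed
        return name, False, str(exc)
    return name, proc.returncode == 0, proc.stdout + proc.stderr

def build_all(projects, workers=4):
    """Build independent sub-projects concurrently, one process per sub-project."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(build_one, name, cmd) for name, cmd in projects.items()]
        return {name: (ok, log) for name, ok, log in (f.result() for f in futures)}

def failed_projects(results):
    """Names of sub-projects whose build failed, for the buildmaster's report."""
    return sorted(name for name, (ok, _) in results.items() if not ok)
```

A nightly script could call `failed_projects(build_all(SUBPROJECTS))` and, when the list is not empty, send the collected logs to the buildmaster.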

Building the code line ensures that everything still compiles and links. The code doesn't atrophy -- broken source code is not tolerated. The frequent and predictable builds represent the "heartbeat" of the project [1]. When the build breaks, the heartbeat stops. Build problems must be addressed and resolved immediately, before programmers can check in new code. Create a 'buildmaster' position whose responsibilities include checking on the build and contacting developers when something breaks. Some groups rotate the buildmaster role, allowing every programmer to get a feel for the process.

Create a set of minimum acceptance tests that measure the new build's stability. Some projects even require that acceptance tests pass before code can be checked in. Acceptance tests evolve as the project deadline approaches. The most successful acceptance tests are automated, because they are easy to run (thus encouraging use) and catch problems that are easily overlooked. Automated tests can be implemented as console applications (white-box testing) or as scripts that manipulate objects in the user interface directly (black-box testing). It is easier to write console-based test programs than GUI scripts, because the user interface typically changes frequently while APIs stay relatively static. GUI scripts are, in general, difficult to write even when the user interface is static, because failure cases are not always easy to detect. Here, architecture plays a key role. If the product is written using scriptable objects (e.g., COM- or CORBA-based development), then tests can be written in easy-to-use scripting languages. This allows expensive programmers to write reusable core components in complex languages such as C++ or Java and test them in simpler languages such as Perl or VBScript. This can be an effective combination, since test scripts are often thrown away or rewritten, while core components have a much longer life span.
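Here is a minimal sketch of a console-based (white-box) acceptance test program, assuming the component under test can be called directly from the test. `word_count` is a hypothetical stand-in for a real core-component API, and `check` and `run_tests` are illustrative helpers rather than part of any test framework.

```python
# Hypothetical core-component API under test. In a real project this would be
# a call into the product's own library rather than a local function.
def word_count(text):
    return len(text.split())

def check(condition, message):
    """Fail the current test; unlike assert, this still works under python -O."""
    if not condition:
        raise AssertionError(message)

def test_empty_input():
    check(word_count("") == 0, "empty text should contain no words")

def test_simple_sentence():
    check(word_count("build early and often") == 4, "expected four words")

def run_tests(tests):
    """Run each test, print PASS/FAIL, and return the number of failures."""
    failures = 0
    for test in tests:
        try:
            test()
            print("PASS", test.__name__)
        except Exception as exc:  # a failed check or an unexpected crash
            failures += 1
            print("FAIL", test.__name__, exc)
    return failures
```

A console driver would typically finish with `sys.exit(1 if run_tests([...]) else 0)`, so that the build script can detect failures from the exit status. Established frameworks such as Python's `unittest` do the same job with more features.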

In addition to stability, acceptance tests can measure other aspects of the source code. Acceptance testing often includes performance benchmarks and regression tests (tests that make sure known bugs stay fixed). Conflicting forces act upon acceptance tests: the tests need to be thorough, while developers want them to be fast so they can get their hands on the latest build. Therefore, implement several test suites: some that look at the product 'at a glance', and others that are more thorough (and time-consuming).

Returning to the subject of building the product, it is recommended that developers adhere to a check-in deadline. The team must agree to a deadline when changes are submitted for the next build. Programmers will try to check in after the deadline -- it will happen, so plan for it. Some source-code control systems provide a programmatic interface to restrict check-ins -- scripts can be written to take advantage of these features. Starting the build 30 minutes after the deadline is a good idea, but don't broadcast this information to a large audience.
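A check-in gate combining the build-related rules above (no check-ins while the build is broken, and a deadline followed by a quiet grace period) could look like the sketch below. The 30-minute grace period comes from the text; the function name and the way the build status is obtained are assumptions. CVS, for instance, runs the programs listed in its CVSROOT/commitinfo administrative file before each commit and refuses the commit when one of them exits with a nonzero status, so a script built around a function like this could be hooked in there.

```python
from datetime import datetime, timedelta

# Assumed policy: the build starts 30 minutes after the announced deadline,
# so check-ins during that unannounced grace period still make this build.
GRACE = timedelta(minutes=30)

def may_check_in(now: datetime, deadline: datetime, build_is_green: bool,
                 grace: timedelta = GRACE):
    """Return (allowed, reason) for a check-in attempted at `now`.

    `deadline` is the announced check-in deadline for the next build.
    """
    if not build_is_green:
        return False, "the build is broken: fix it before checking in new code"
    if now <= deadline:
        return True, "on time"
    if now <= deadline + grace:
        return True, "late, but before the build starts"
    return False, "too late for this build: wait until the build has finished"
```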

Before the source code files are gathered to perform a build, a 'label' is placed on all of the files. Most source code control systems support labels. For example, the labels could be named after the date of the build: 'Nov0296'. Labels accumulate and are never replaced, so they capture the source code's history. To track down bugs, and for other purposes, older versions of the product can be retrieved and rebuilt.
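The date-based naming scheme can be sketched as follows. `build_label` reproduces the 'Nov0296' style from the example above, and `add_label` enforces the rule that labels are never replaced; both function names are illustrative. The month names are spelled out rather than taken from `strftime("%b")`, whose output depends on the locale.

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def build_label(day: date) -> str:
    """Label in the 'Nov0296' style: month abbreviation, two-digit day and year."""
    return f"{MONTHS[day.month - 1]}{day.day:02d}{day.year % 100:02d}"

def add_label(history: list, day: date) -> str:
    """Append a new label; labels accumulate and an existing one is never replaced."""
    label = build_label(day)
    if label in history:
        raise ValueError(f"label {label} already exists; labels are never replaced")
    history.append(label)
    return label
```

With CVS, the label would then be applied with `cvs tag` (for example, `cvs tag Nov0296` in a freshly checked-out tree). Date-only labels allow just one labeled build per day, and two-digit years repeat every century; adding a four-digit year or a build number avoids both.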

Work in Private Sandboxes

Programmers modify and test code in private 'sandboxes.' When a new sandbox is created, it is filled with files from a snapshot. Sandboxes should, in general, never get the latest code, because it might not compile. Or, it might compile but it might have nasty bugs. The sandbox ensures that programmers are insulated from each other's changes. Code in a sandbox should be checked in only when it compiles -- otherwise, the build will fail, alarming the buildmaster.
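Creating a sandbox from a snapshot is mostly a matter of copying. The sketch below assumes snapshots are complete directory trees on a network share; the `.snapshot` marker file, which records which snapshot a sandbox came from, is a hypothetical convention that a later synchronization script could read.

```python
import shutil
from pathlib import Path

def create_sandbox(snapshot_dir, sandbox_dir):
    """Fill a new private sandbox from a blessed snapshot, never from the latest code."""
    snapshot_dir, sandbox_dir = Path(snapshot_dir), Path(sandbox_dir)
    if sandbox_dir.exists():
        raise FileExistsError(f"{sandbox_dir} already exists; not overwriting a sandbox")
    shutil.copytree(snapshot_dir, sandbox_dir)
    # Record which snapshot this sandbox was created from (hypothetical marker file).
    (sandbox_dir / ".snapshot").write_text(snapshot_dir.name + "\n")
    return sandbox_dir
```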

Sandboxes are susceptible to several tricky problems. For example, they can get out of date -- this problem is covered by the next section.

Also, how can developers collaborate on new code without affecting the integrity of the builds? Developers need a 'shared sandbox.' Shared sandboxes can be implemented with what is called a branch. Most source code control systems support branching. For example, say you branch version 3 of the file 'string.cpp.' When string.cpp is changed and checked in on the branch, the changes will not be visible to programmers working on the main code line. Source code control systems can automatically merge the branch back into the main code line. Insist that long-lived branches abide by the build early and often rule.

Stay In Sync

Programmers stay relatively up to date with the current code base. This ensures that the team stays in sync, taking advantage of new features and bug fixes as soon as they are checked in. This also helps programmers find merge conflicts earlier in the process.

Source code control systems are often a bottleneck when many programmers want to update their sandboxes at the same time. Copy recent snapshots somewhere on the network. Place binary files from successful builds on the network so that programmers don't have to spend their time compiling and linking. Create scripts to perform these steps.

Copying binary files accounts for most of the time taken to update a sandbox. Eventually, the delay will exceed tolerable limits. Programmers don't need all of the binaries -- in many cases, executables and dynamic link libraries will suffice. Programmers should be able to 'subscribe' to binaries in different ways. Write tools that support this capability.
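The subscription idea might be sketched like this, assuming binaries from each successful build are published as a directory tree on the network. The subscription patterns and function names are hypothetical; a file is copied only if it matches a pattern and the sandbox copy is missing, a different size, or older.

```python
import fnmatch
import shutil
from pathlib import Path

# A hypothetical subscription: glob patterns for the build outputs this
# programmer wants (here only executables and DLLs, not debug symbols).
DEFAULT_SUBSCRIPTION = ("*.exe", "*.dll")

def is_subscribed(path, patterns):
    return any(fnmatch.fnmatch(path.name, pattern) for pattern in patterns)

def is_current(src, dst):
    """Treat the sandbox copy as current if it has the same size and is not older."""
    if not dst.exists():
        return False
    s, d = src.stat(), dst.stat()
    return s.st_size == d.st_size and d.st_mtime >= s.st_mtime

def pull_binaries(build_dir, sandbox_bin, patterns=DEFAULT_SUBSCRIPTION):
    """Copy subscribed binaries from a successful build that are missing or stale."""
    build_dir, sandbox_bin = Path(build_dir), Path(sandbox_bin)
    copied = []
    for src in sorted(build_dir.rglob("*")):
        if not src.is_file() or not is_subscribed(src, patterns):
            continue
        rel = src.relative_to(build_dir)
        dst = sandbox_bin / rel
        if is_current(src, dst):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 keeps the timestamp that is_current compares
        copied.append(rel.as_posix())
    return copied
```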

Prefer Merging over Locking

This is the most controversial aspect of Programmers' Canvas. I considered omitting this topic, but merging is very important. Merging is a required skill for all programmers. However, automated merging is on par with sorcery - most programmers, logic-minded as they are, think it doesn't work. If you're thinking, 'Sorcery!' at this moment, I humbly request your time and your patience. I have not yet met a programmer who, after trying it a few times, did not make automated merging part of their routine.

Programming teams face issues similar to those that multithreaded programs encounter. Multithreaded applications use locks, such as semaphores and mutexes, to avoid compromising the integrity of shared resources. However, these locks diminish the effectiveness of parallelism, since they cause one thread to wait for another. Similarly, a file lock prevents two programmers from modifying the same file at the same time. A file lock is a programmer-efficiency bottleneck waiting to happen.

If two developers need to modify unrelated parts of a file, it is unnecessary for one programmer to wait for the other.  Real contention occurs when two programmers change the same part of a file at (roughly) the same time. But this is generally a rare occurrence.  Contention rates depend on the programming language, the size of the project, the layout of the source code, and the overall software architecture. Almost all files are at risk of contention, but the probability is usually low for any given file at any particular time.  Files which interface between components (in C++, these are called header files) tend to have high contention rates.

Contention can even occur in scenarios that involve only one developer. Developers often have multiple sandboxes when they work on independent changes. A developer who has multiple sandboxes can stomp on his own changes when the same file is modified in each sandbox (the author is guilty of this offense). If a project consists of independent software components, then individual files tend to be modified by only one or two programmers. But once two programmers are responsible for the same file(s), then contention is likely. Software projects could try to impose a 'one file, one owner' policy, but programmers will make exceptions in the name of expediency.

When a developer locks a file to change it, sometimes the change takes longer than expected. An hour becomes a day.  A day becomes two.  Meanwhile, another developer needs to change the same file.  But she cannot, because the file is locked.  The lock can be broken, but what happens to the developer who holds the lock?  He has to merge his changes back in by hand, and that's very tedious.  The result is that both developers lose significant amounts of time.  Also, notice that even in locking scenarios, merging is not always avoidable.  Merging is a fact of life and developers need to get used to it.

The traditional method of modifying a file is 'lock-change-check in.' But some developers use a different method. Instead of locking a file before changing it, lock the file after modifying it. This method is summarized as 'change-lock-merge-check in.' You may have noticed that there is a new step called 'merge.' You may be wondering how this could be better, if you have to do more work. The answer is that merging can be performed by the computer.

Automated merging programs use a technique called three-file merge, which compares each modified version of a file against the common ancestor both were derived from. The freely available diff3 program and related tools can be found on the web by searching for diff3. Some source code control systems can perform a three-file merge. Of course, automatic merging is not foolproof. There are times when the merge cannot be automated. For example, what is the best course of action when one programmer deletes a function, but another programmer modifies the same function? In this case, the automated merge tool would notify the programmer of a conflicting change. Merge conflicts must be resolved by hand. In many cases, however, conflicts do not arise. Isn't that better than the traditional method in which the programmer always merges by hand?
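To make the idea concrete, here is a simplified line-based three-file merge. It is a sketch, not a reimplementation of diff3: it uses `difflib.SequenceMatcher` to find which lines of the common ancestor each side changed, applies changes that only one side made, and emits diff3-style conflict markers where both sides changed the same region differently. Changes that merely touch each other are also treated as conflicts, which is conservative.

```python
import difflib

def changes(base, other):
    """Hunks (start, end, new_lines): base[start:end] was replaced by new_lines."""
    sm = difflib.SequenceMatcher(None, base, other, autojunk=False)
    return [(i1, i2, other[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def apply_hunks(base, hunks, start, end):
    """Rebuild base[start:end] with one side's sorted hunks applied."""
    out, pos = [], start
    for s, e, lines in hunks:
        out.extend(base[pos:s])
        out.extend(lines)
        pos = e
    out.extend(base[pos:end])
    return out

def merge3(base, ours, theirs):
    """Three-file merge of line lists; returns (merged_lines, conflict_count)."""
    hunks = [(s, e, 0, l) for s, e, l in changes(base, ours)]
    hunks += [(s, e, 1, l) for s, e, l in changes(base, theirs)]
    hunks.sort(key=lambda h: (h[0], h[1], h[2]))
    merged, conflicts, pos, k = [], 0, 0, 0
    while k < len(hunks):
        # Group hunks whose ancestor ranges overlap or touch; decide them together.
        start, end = hunks[k][0], hunks[k][1]
        group = [hunks[k]]
        k += 1
        while k < len(hunks) and hunks[k][0] <= end:
            end = max(end, hunks[k][1])
            group.append(hunks[k])
            k += 1
        merged.extend(base[pos:start])  # lines neither side touched
        mine = apply_hunks(base, [(s, e, l) for s, e, side, l in group if side == 0], start, end)
        yours = apply_hunks(base, [(s, e, l) for s, e, side, l in group if side == 1], start, end)
        original = base[start:end]
        if mine == yours or yours == original:
            merged.extend(mine)   # same change on both sides, or only ours changed
        elif mine == original:
            merged.extend(yours)  # only theirs changed
        else:
            conflicts += 1        # both changed this region differently: merge by hand
            merged += ["<<<<<<< ours", *mine, "=======", *yours, ">>>>>>> theirs"]
        pos = end
    merged.extend(base[pos:])
    return merged, conflicts
```

For example, `merge3(["f1", "f2", "g"], ["g"], ["f1", "F2", "g"])` reports one conflict: one side deleted the lines `f1` and `f2` while the other modified `f2`, which is the delete-versus-modify case described above.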

Binary files cannot be merged. These files must be modified the usual way, by checking the file out, modifying it, and then checking it back in. Some products, such as Delphi and PowerBuilder, are not text-based but instead place source code into a few binary files. These languages are not merge-friendly, but some products (possibly including recent versions of Delphi and PowerBuilder) have built-in check-out, check-in, and merge facilities to support team development.

Merge situations need to be detected. This can be done by keeping track of the previous version of a file. One common solution is to place a version number in each file -- most source control systems support this feature. Another successful route is copying the file to a safe location - perhaps somewhere in the sandbox. Both approaches have benefits and drawbacks. The revision string approach works only with text files, and even some text files cannot contain embedded comments. Copying the previous file consumes more disk space and requires bookkeeping, but tools can perform the bookkeeping automatically. The author has tried both and prefers the file copy method, because it allows all file types to be handled in a uniform fashion. In addition, having a local copy of the file allows changes to be identified and 'undone' without logging into the source-code control system. Finally, having a previous copy of the file is essential for combating the 'sandbox contamination' problem that will be described later.
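The file copy method can be sketched as follows, assuming a hypothetical `.pristine` directory inside the sandbox that holds the version each file started from. Comparing bytes works the same way for text and binary files, and the saved copy also supports 'undo' without contacting the source-code control system. How `latest` (the repository's current contents of the file) is retrieved depends on the tool and is left out.

```python
import shutil
from pathlib import Path

PRISTINE = ".pristine"  # hypothetical folder inside the sandbox for saved copies

def saved_copy(sandbox, rel):
    return Path(sandbox) / PRISTINE / rel

def save_base(sandbox, rel):
    """Save the version of `rel` you started from, before changing it."""
    dst = saved_copy(sandbox, rel)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(Path(sandbox) / rel, dst)

def is_modified(sandbox, rel):
    """Did I change this file? Works for text and binary files alike."""
    return (Path(sandbox) / rel).read_bytes() != saved_copy(sandbox, rel).read_bytes()

def needs_merge(sandbox, rel, latest):
    """Has a different version been checked in since I saved my base copy?"""
    return saved_copy(sandbox, rel).read_bytes() != latest

def undo(sandbox, rel):
    """Throw away local changes without contacting the source control system."""
    shutil.copy2(saved_copy(sandbox, rel), Path(sandbox) / rel)
```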

Without automated merging, it is very easy for programmers to 'code stomp' - in other words, to inadvertently lose another programmer's changes. I have worked on several teams that did not merge, because they thought they were saving time and playing it safe. But they were doing neither -- instead, precious time was spent resolving code stomps. Even worse, code stomps went unnoticed for days and weeks on end -- it's possible that some changes even slipped into oblivion. When automated merging was introduced, the number of 'code stomp' incidents went down considerably.

Programmers' Canvas does not forbid locking files. Files must be locked before checking them in. Binary and other file types that cannot be merged need to be locked before they are changed.

Sandbox Contamination

The difference between merging and traditional locking is that, on average, files stay locked for a much shorter period of time when merging is employed. Merging also helps avoid a common problem called 'sandbox contamination.'

Sandbox contamination occurs when unblessed code is copied into a sandbox.

The latest code has not been blessed by the buildmaster. It may need files that don't yet exist in the sandbox (maybe they haven't been checked in yet). The code may not compile, or (worse) might build fine but may not run. Sandbox contamination occurs when merging code, but the problem also happens when employing the more traditional 'check-out, change, check-in' technique.

Merging reduces the extent of the contamination problem. In other words, when developers make merging part of their routine, their sandboxes will be infected less frequently. The secret is to merge with the latest snapshot before checking in. Synchronizing may involve merging with files in the snapshot, but there aren't any contamination problems, since those files were blessed by the buildmaster. This is another reason to stay in sync, as described earlier. However, it is impossible to avoid merging with the latest code completely, because this happens whenever two or more programmers change the same file before the next snapshot is available.

The scope of the contamination problem can be reduced by synchronizing with the last snapshot before checking in.

The solution to sandbox contamination is simple. Copy the file in its pre-merged form back to the sandbox. The latest code is not introduced into the sandbox. However, there is a problem with this solution:

If a previously merged file is changed before the next snapshot is available, three-file merge will fail.

The three-file merge program will create a file with the wrong contents. The file in the sandbox and the checked-in file do not share a common ancestor. The checked-in file contains the 'contamination' (in other words, the latest code), and the sandbox does not. The three-file merge program will dutifully delete the 'contamination', causing a classic code stomp. There are two solutions to this problem. If the file has not been changed by another programmer since the previous check-in, then the developer can 'rollback' her previous changes and run the three-file merge (the steps that follow do not include this solution, because it is difficult to automate). However, if another developer has changed the file in the interim, then the changes must be merged in by hand.

If you want to avoid sandbox contamination, get accustomed to merging.

The steps required to safely change the (hypothetical) file 'A' follow. Although the flowchart seems daunting, it is fairly complete. Tools can hide most of the complexity from developers. The other 'solution' to sandbox contamination (the most common solution) is to ignore it, and hope for the best.

  1. Get the latest snapshot

    Assuming you want to change 'A'…

  2. Save a copy of 'A' to a safe location
  3. Change 'A' and test - this could take several days or weeks
  4. If a new snapshot is available,
    a) Get the latest snapshot
    b) If the file saved in step 2 differs from 'A' as it appears in the latest snapshot, then merge your changes into 'A' as it appears in the latest snapshot - store the resulting file in your sandbox
    c) Copy the version of 'A' as it appears in the latest snapshot to a safe location (overwriting the file in step 2)
    d) Go to step 3

    Assuming you are ready to check in 'A'…

  5. Check out 'A'
  6. If the latest version of 'A' differs from 'A' saved in step 2 (or 4c), then you need to merge; otherwise, proceed to the next step
    a) Save your version of 'A' to a safe location
    b) Merge changes into latest version of 'A'
  7. Check in 'A'
  8. If you did not merge in step 6 then…
    a) Copy checked-in version of 'A' back to your sandbox
    b) Go to step 2
  9. If you merged in step 6 then…
    a) Copy saved version of 'A' (step 6a) back to your sandbox
    b) Copy saved version of 'A' (step 6a) to a safe location, overwriting file saved in step 2 (or 4c)

    Assuming you want to change 'A'…

  10. Modify 'A' and test
  11. If a new snapshot is available, go to step 20.

    Assuming you are ready to check in 'A'…

  12. Save a copy of 'A' to a safe location
  13. Check out 'A'
  14. Use a tool to view the differences between your 'A' and the 'A' saved in step 9b (or 18); e.g., 'diff' in UNIX or 'windiff' in Microsoft Windows
  15. Merge the changes by hand into the latest version of 'A'
  16. Check in 'A'
  17. Copy saved version of 'A' (step 12) back to your sandbox
  18. Copy saved version of 'A' (step 12) to a safe location, overwriting file saved in step 9b (or 18)
  19. Go to step 10

    'A' is being modified, and a new snapshot is available:

  20. Use a tool to view the differences between your 'A' and the 'A' saved in step 9b (or 18); e.g., 'diff' in UNIX or 'windiff' in Microsoft Windows
  21. Merge the changes by hand into the version of 'A' that appears in the snapshot - the resulting file should go into your sandbox
  22. Copy the version of 'A' as it appears in the latest snapshot to a safe location (overwriting the file in step 9b or 18)
  23. Go to step 3
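The check-in portion of the steps above (decide whether a merge is needed, check in, then decide what goes back into the sandbox and the safe location) can be expressed compactly. The sketch below models file contents as strings and takes the merge program as a parameter, for example a three-file merge such as diff3; the `CheckIn` record and `check_in_a` function are hypothetical names, and checking out, saving copies, and checking in are reduced to returning the relevant versions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class CheckIn:
    checked_in: str  # the version of 'A' stored in the repository
    sandbox: str     # what stays in your sandbox afterwards
    saved_base: str  # the copy kept in the safe location for the next comparison
    merged: bool     # True means 'A' now needs the steps for a previously merged file

def check_in_a(mine: str, saved_base: str, latest: str,
               merge: Callable[[str, str, str], str]) -> CheckIn:
    """Check in 'A' while keeping unblessed code out of the sandbox."""
    if latest == saved_base:
        # Nobody else changed 'A': check in your version as-is, then keep the
        # checked-in version in the sandbox and as the new saved copy.
        return CheckIn(checked_in=mine, sandbox=mine, saved_base=mine, merged=False)
    # Someone else changed 'A': keep your version (step 6a) and check in the
    # three-file merge of (saved base, your version, latest version).
    result = merge(saved_base, mine, latest)
    # Your pre-merge version goes back into the sandbox and becomes the saved
    # copy (step 9b), so the other programmer's change does not contaminate it.
    return CheckIn(checked_in=result, sandbox=mine, saved_base=mine, merged=True)
```

When `merged` is true, later changes must be merged by hand against the saved copy, as in the second half of the steps, because the checked-in file and the sandbox file no longer share a common ancestor.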

Code branches, described earlier, can cause automated merging to falter. Programmers often want to apply an identical change to two or more branches of code. From the programmer's point of view, the operation is straightforward. However, the automated merge will succeed only if the edits are made to a version of the file that is checked into both branches. Otherwise, merging must be performed by hand.

[Figure: version graph (versions.gif)]

Note that V must be an ancestor of both V1 and V2 for the automated merge to succeed; otherwise, merging must be performed by hand. If a 3-file merge were attempted, where

  Vo = Original file
  V1 = Change 1
  V2 = Change 2

the changes between Vo and V would be added to V2's branch. Presumably this is undesirable, since those changes were never checked into V2's branch.

This is an example of the limitation of 3-file merge programs.  More sophisticated methods are available, although they are not foolproof either.  Robert Sartin (sartin@tivoli.com) writes:

"I wrote a review of Aide-de-Camp years ago.  The tool is now sold by TRUE Software, as ADC/Pro.  It recorded all changes as "change sets" and you could create a file that contained exactly the change sets you requested.  The theoretical view was you could merge arbitrary changes made at distant points on divergent branches of development.   During the week I had to test the software, we never managed to break it.  We did testing using the then current version of X11 and Motif with our change sets including some heavily customized local versions of each.  More recently Continuus (Continuus/CM), PureAtria (ClearGuide), and Platinum (CCC/Harvest) have added similar features to their products.  The late-comers tend to do it through what Ovum now calls "change packages" which are really just collections of particular revisions of particular files.

SCCS can also include (-i) or exclude (-x) individual deltas in a get. This gives much of the ability to merge in changes independent of the revision graph. I've found it useful, especially in combination with the "-m" option, which includes the SID that introduced a line. That makes it easier to handle screw-ups.

I have a concern about the computer blindly getting it wrong.

The latest Workshop on Software Configuration Management (SCM-7 last May in Boston) had some relevant presentations:

  • "Change Sets Versus Change Packages: Comparing Implementations of Change-Based SCM", D.W. Weber, Continuus. Not surprisingly (given the vendor affiliation) it favors change packages and points out the risks of change sets.
  • "Towards a Uniform Version Model for Software Configuration Management", Reidar Conradi and Bernhard Westfechtel, both with European universities.  This paper presents uniform terminology that covers the different styles of version control systems and a taxonomy of some current systems.

"

CVS vs. Visual SourceSafe, etc.

Most source code control systems are biased towards file locking.  CVS, the Concurrent Versions System, does not require files to be locked -- in fact it discourages file locking.  CVS provides automated 3-file merging.  There are actually two ways to lock files in CVS (admin -l and edit -c) but you shouldn't use them unless you have to.

It is possible to use other products without locking files.  I have used SourceSafe in this fashion, changing file attributes from read-only to writable, turning off DevStudio's Visual SourceSafe integration, manually detecting the need to merge via scripts, and utilizing diff3.  I built tools that made this a comfortable process.  However, this type of usage goes against the grain and you're likely to spend more time fighting your fellow developers instead of doing actual development work.

The latest version of Visual SourceSafe now allows "shared" checkouts.  SourceSafe can even merge files automatically.  I hope that this means more developers will abandon the use of exclusive checkouts.

I won't switch back to SourceSafe, because I despise having read-only files in my sandbox.  Read-only attributes just get in the way and add zero value (but if you really want files to be read-only, CVS and WinCVS can do it).  CVS is able to detect whether you have changed a particular file without using file attributes, and you can ask it which files have been changed at any time.  CVS has some other nice features, such as check-in email notifications, watch lists, and Internet support.  And VSS can't beat CVS's price (free).  I have experienced first-hand SourceSafe's propensity to corrupt files and have spoken to other SourceSafe users about the same, yet I have never seen CVS corrupt a file after two years of use.  Plus I think SourceSafe's branch implementation is unintuitive and just plain wrong.  
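CVS's ability to report changed files without read-only attributes is easy to script. The sketch below runs `cvs -n -q update` (the global `-n` option tells CVS not to change any files) and groups the file names by the one-letter status code CVS prints in front of each one. The function names are hypothetical, and only `parse_update_output` can be exercised without a CVS client and repository.

```python
import subprocess

# Status letters that `cvs update` prints before a file name, e.g. "M main.c":
# U/P: newer revision in the repository   A/R: scheduled for add/remove
# M: modified in your sandbox   C: conflict   ?: file not under CVS control
KNOWN_CODES = set("UPARMC?")

def parse_update_output(text):
    """Group file names by status letter, ignoring CVS's informational lines."""
    status = {}
    for line in text.splitlines():
        if len(line) > 2 and line[1] == " " and line[0] in KNOWN_CODES:
            status.setdefault(line[0], []).append(line[2:])
    return status

def sandbox_status(sandbox):
    """Ask CVS which files differ, without changing anything (needs a cvs client)."""
    proc = subprocess.run(["cvs", "-n", "-q", "update"], cwd=sandbox,
                          capture_output=True, text=True)
    return parse_update_output(proc.stdout)
```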

Conclusion

There are few silver bullets in the software industry. Programmers' Canvas is not one of them. This pattern language will not make your code write itself or run faster. However, by implementing at least some of these patterns, real benefits can be realized in terms of higher programmer efficiency. Most developers have an uneasy time transitioning from traditional file-locking to automated merging, so one-on-one assistance and encouragement from team leaders is often required.  Some aspects of Programmers' Canvas can be found in commercial products, but it's likely that some custom tools will need to be built to fit the philosophies of a particular development group.

Related Links

Branching Patterns


[1] Brooks, Frederick P., Jr. The Mythical Man-Month: Essays on Software Engineering, Anniversary Edition. Menlo Park, California: Addison-Wesley, 1995, p. 270.

(c) 1996-2003 devguy.com