[caiman-discuss] Adding the ability to restart DC from a checkpoint

Jean McCormack Jean.McCormack at Sun.COM
Fri Jan 25 09:12:04 PST 2008


In the DC meeting yesterday we discussed the future user experience for
the Distro Constructor. The first thing I'm looking at is the ability to
restart the DC build at different checkpoints or steps in the process.

Three ways of specifying the restart were considered:
1) The user edits the manifest file to specify the point at which the
   build should start
2) A command line option
3) An interactive option for the command

After I consulted with Frank Ludolph, #2 (the command line option) was
decided upon. His suggestion was this:
dist_const -resume [step]

dist_const -resume would resume the build from the step that failed in
the previous build.
dist_const -resume step would resume the build from the step specified.
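
For illustration only, here is a rough Python sketch of how the two forms
could be told apart on the command line. Nothing here is decided:
parse_resume and the LAST_FAILED sentinel are made-up names, and treating
steps as integers is an assumption on my part.

```python
import argparse

# Hypothetical sentinel meaning "-resume was given without a step".
LAST_FAILED = object()


def parse_resume(argv):
    """Return None (fresh build), LAST_FAILED, or an integer step."""
    parser = argparse.ArgumentParser(prog="dist_const")
    parser.add_argument(
        "-resume",
        nargs="?",          # the step argument is optional
        const=LAST_FAILED,  # value used when -resume has no step
        default=None,       # value used when -resume is absent
        metavar="step",
        help="resume from the failed step, or from the given step",
    )
    args = parser.parse_args(argv)
    if args.resume is None or args.resume is LAST_FAILED:
        return args.resume
    try:
        # Assumption: steps are identified by integers.
        return int(args.resume)
    except ValueError:
        parser.error("step must be a number, got %r" % args.resume)
```

With this, parse_resume(["-resume"]) yields the sentinel and
parse_resume(["-resume", "3"]) yields the integer 3, so the caller can
decide whether to look up the failed step or use the one given.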

Some technical thoughts behind this new option:

- To keep the build from running into problems because the user changed
  the manifest between the two runs, we would not have them specify a
  new manifest file.
- The build does still need the manifest information somehow, so my
  thought was that during a build we would copy the current manifest
  file to .step<step number>. When the step completes successfully, this
  file would be deleted. A file left behind would then serve as a marker
  telling the -resume case where to restart, and would contain all the
  information for the restarted build.
- dist_const -resume step would check that the step specified is <= the
  failed step. Restarting at a later step (failed step + n) is not
  allowed.
- We could do some checking to make sure that the user hasn't modified
  .step<number>, since changes to it have the potential to cause havoc
  in the build. Depending on where you were in the build process, some
  modifications would be OK and others not. I'm not sure the extra
  complication is worth it. How do others feel about this?
- The messages coming from the DC would be worded so that the user knows
  which step in the process failed. That's the next step in this work.
- The .step<number> files would be cleaned up at the start of every
  build and at the end of every successful build.
- dist_const -resume doesn't make sense after a completely successful
  build, but dist_const -resume step does. If the user has a build that
  completes successfully but produces something that doesn't work, they
  could rerun the build from any step they think is appropriate.
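
To make the marker idea above a bit more concrete, here is a minimal
Python sketch of the .step<number> lifecycle and the "<= failed step"
check. It is only a sketch under assumptions of mine: the function names
(clean_markers, begin_step, end_step, failed_step, resolve_resume) are
invented, steps are assumed to be integers, and the markers are assumed
to live in a single build directory. It only works out which step to
restart at; it does not attempt the modification check.

```python
import os
import re
import shutil

# Matches marker files named .step<number> (naming taken from the proposal).
MARKER_RE = re.compile(r"^\.step(\d+)$")


def clean_markers(build_dir):
    """Remove all markers: start of any build, end of a successful one."""
    for name in os.listdir(build_dir):
        if MARKER_RE.match(name):
            os.remove(os.path.join(build_dir, name))


def begin_step(build_dir, step, manifest):
    """Copy the manifest in use to .step<step> and return the marker path."""
    marker = os.path.join(build_dir, ".step%d" % step)
    shutil.copyfile(manifest, marker)
    return marker


def end_step(marker):
    """Delete the marker once its step has completed successfully."""
    os.remove(marker)


def failed_step(build_dir):
    """Return the step number of the leftover marker, or None if none."""
    matches = (MARKER_RE.match(name) for name in os.listdir(build_dir))
    steps = [int(m.group(1)) for m in matches if m]
    return max(steps) if steps else None


def resolve_resume(build_dir, requested=None):
    """Work out which step to restart at, enforcing requested <= failed."""
    failed = failed_step(build_dir)
    if requested is None:
        # Plain "-resume": only meaningful if a step actually failed.
        if failed is None:
            raise ValueError("no failed step to resume from")
        return failed
    if failed is not None and requested > failed:
        raise ValueError(
            "cannot resume at step %d, build failed at step %d"
            % (requested, failed)
        )
    # After a fully successful build there is no marker, so any step
    # the user asks for is accepted.
    return requested
```

A build loop would call clean_markers once at the start, wrap each step
in begin_step/end_step, and call clean_markers again after the last step
succeeds, so that a leftover marker always identifies the failed step.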

Any comments?

Jean





