This repository has been archived by the owner on Feb 23, 2018. It is now read-only.

Github Move

paulp edited this page Jun 12, 2011 · 7 revisions

Intro

We intend to move the canonical scala repository from svn / epfl.ch to git / github.com.

Background info for understanding preconditions:

The scala build process requires a bootstrap compiler, called "starr" for "stable reference Iforrrrrget", to get the ball rolling. When certain fundamental compiler features change, a starr which is aware of the change must be present or the build cannot be bootstrapped. So over the years many starrs have been checked into the repository. The "starr files", at present, are:

  • lib/scala-library.jar
  • lib/scala-compiler.jar
  • lib/scala-library-src.jar

With each new starr, the total size of the repository jumps by approximately the size of those jars, currently 20MB. In a git mirror which preserves all the svn history:

% git log --format=oneline  -- lib/scala-library.jar |wc -l
       140
% du -hs scala-full
    1.0G	scala-full

Without historical starrs the repository will be a fraction of that. Since rewriting the git history renders all clones incompatible, we want to do it as few times as possible. It has already been done once: github.com/paulp/scala-full is the complete one, while github.com/scala/scala purged many starrs but has since reaccumulated a dozen or so. We will do it only once more, and not until we can make it unnecessary to ever do it again, as described herein.

Automatic Starr Downloading

Every revision of scala which acts as starr in the history (i.e. the 140 revisions indicated above) needs to be published as versioned maven/ivy artifacts so they can be expressed as standard dependencies. They can be named or versioned to remain distinct from the regular scala-library and scala-compiler, and probably should be: they will only be used to bootstrap compiler builds.

In order for people to be able to check out older versions of scala and build them, we need to travel back in time to those revisions and inject on-demand starr downloading into the build process. The exact mechanism is yet to be decided, but it should be as simple as possible and require the minimum possible changes to existing files.

Proposal: store a metadata file which is changed whenever a new starr is introduced.

# starr.properties
repo=http://www.typesafe.com/scala-starr/
starr=r24749
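
To make the role of this file concrete, here is a minimal sketch of how a build helper might read it and derive a download location. The function names and the `<artifact>-<starr>.jar` URL layout are assumptions for illustration; the proposal does not fix either.

```python
from urllib.parse import urljoin


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def starr_jar_url(props, artifact):
    """Build a download URL; the '<artifact>-<starr>.jar' layout is hypothetical."""
    return urljoin(props["repo"], f"{artifact}-{props['starr']}.jar")


example = """# starr.properties
repo=http://www.typesafe.com/scala-starr/
starr=r24749
"""
props = parse_properties(example)
print(starr_jar_url(props, "scala-library"))
```

Because the file holds only two keys, a commit that introduces a new starr becomes a one-line text diff instead of a multi-megabyte binary change.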

Create an xml file at the beginning of time with the following. (Wherever you see "library" think "library, compiler, and optionally src")

  // starr-build.xml
  // pretend it's xml
  if (lib/scala-library.jar exists and has the right version) proceed
  else { 
    if (lib/starr/scala-library-${starr}.jar does not exist) { download it }
    rm -f lib/scala-library.jar ;
    // the link target is resolved relative to lib/, so it omits the lib/ prefix
    ln -s starr/scala-library-${starr}.jar lib/scala-library.jar
    proceed
  }

(lib/starr is an svn-and-git-ignored local cache of starrs. This could use the ivy or maven cache if it matters.)
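
The pseudocode above could be fleshed out roughly as follows. This is a sketch under assumptions, not the eventual ant implementation: `ensure_starr` and `fetch` are hypothetical names, "has the right version" is modelled as "the symlink already points at the cached jar for this starr", and the repo URL is assumed to end in a slash.

```python
import os
import urllib.request

ARTIFACTS = ("scala-library", "scala-compiler")  # the optional src jar is omitted here


def fetch(url, dest):
    """Default downloader; an ivy or maven cache lookup could be swapped in."""
    urllib.request.urlretrieve(url, dest)


def ensure_starr(lib_dir, repo, starr, download=fetch):
    """Point lib/<artifact>.jar at the cached jar for `starr`, downloading on a cache miss."""
    cache_dir = os.path.join(lib_dir, "starr")
    os.makedirs(cache_dir, exist_ok=True)
    for artifact in ARTIFACTS:
        versioned = f"{artifact}-{starr}.jar"
        cached = os.path.join(cache_dir, versioned)
        link = os.path.join(lib_dir, f"{artifact}.jar")
        # Symlink targets resolve relative to the link's directory (lib/).
        target = os.path.join("starr", versioned)
        up_to_date = (
            os.path.islink(link)
            and os.readlink(link) == target
            and os.path.exists(cached)
        )
        if up_to_date:
            continue
        if not os.path.exists(cached):
            download(repo + versioned, cached)  # assumes repo ends with "/"
        if os.path.lexists(link):
            os.remove(link)
        os.symlink(target, link)
```

Calling `ensure_starr("lib", props["repo"], props["starr"])` before the real build would then be the only step that touches the network, and only the first time a given starr is needed.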

Add a line to scala's build.xml which runs starr-build.xml before anything else.

** To Be Determined **

The starr jars one finds in svn are binary blobs. They might have been built by anyone and contain anything. Is it more important to be faithful to the history (i.e. we could put those exact blobs up as maven artifacts) or to be consistent even if it risks changing the outcome of a build? In the latter case we would determine what the version of starr ought to be at that point in time and rebuild all starrs in exactly the same way.

I vote for the second, but I can understand preferring the first.

The Rewrite

Once we have the above in hand, we can undertake the rewriting.

  1. Start with a clean git-svn mirror containing all history and all files.
  2. Using git filter-branch or equivalent, inject starr-build.xml, starr.properties, and the build.xml hook at the beginning of time.
  3. Then rewrite every commit which touches "starr files" not to check in binaries.
  4. Instead, those commits update starr.properties with that version.
  5. Jump through the git hoops to really and truly purge the binaries.
  6. Push it to scala/scala and flip the switch.
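
Steps 3 and 4 amount to a per-commit transformation, which can be modelled without git at all. The sketch below treats a commit as a revision name plus a dict of changed paths. That is a deliberately simplified stand-in for what a filter-branch script would operate on, and using the commit's own revision as the starr name is an assumption about what "that version" means.

```python
STARR_FILES = frozenset({
    "lib/scala-library.jar",
    "lib/scala-compiler.jar",
    "lib/scala-library-src.jar",
})


def properties_text(repo, starr):
    """Render the two-line starr.properties content."""
    return f"repo={repo}\nstarr={starr}\n"


def rewrite_commit(revision, changes, repo):
    """Drop starr binaries from one commit and record the starr in starr.properties.

    `changes` maps path -> new content for a single commit (a simplified
    model, not git's real object format).
    """
    kept = {path: data for path, data in changes.items() if path not in STARR_FILES}
    if len(kept) != len(changes):  # the commit touched at least one starr file
        kept["starr.properties"] = properties_text(repo, revision)
    return kept


def rewrite_history(commits, repo):
    """Apply rewrite_commit to every (revision, changes) pair, oldest first."""
    return [(rev, rewrite_commit(rev, changes, repo)) for rev, changes in commits]
```

Commits that never touched a starr file pass through unchanged, so only the starr-introducing revisions differ in content from their svn originals, although every commit hash after the first rewritten one changes.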

As soon as we have the compiler being built with sbt 0.10, we can express starr as a simple dependency of the locker project. But history is important and a git repo which is 20x bigger than necessary is unacceptable, so we need to integrate it into ant even if we moved to sbt today.

What did I miss?
