Pijul

Where can I learn more about "partial checkouts"?


#1

In “The road ahead” there is this brief feature description:

Partial repository checkouts. This is one of the coolest features of Pijul, which will hopefully allow it to scale to much bigger repositories than others.

This would be a pretty huge deal for monorepos. There's lots of custom tooling to ease the management of Git monorepos, e.g. https://lernajs.io/, but certain parts of Git's architecture are hard to get around.

Has anything been written anywhere about how Pijul's partial checkouts would work?


#2

I guess it is something like

svn checkout http://example.com/svn/repo/trunk/subdirectory

You can check out any subdirectory of the project, keep its history, and still pull updates from the project (svn update).


#3

No, but I can explain more here. There are two levels of implementation:

  1. Because Pijul patches commute™ and include a globally unique identifier of the files they apply to, it is fairly easy to pull just the patches that apply to a subset of the repository. This allows one to work on that subset and make patches against it. Now, remember that branches/repositories behave as sets (actual mathematical sets) of patches, ordered only by the dependencies explicitly mentioned in the patches themselves. This means that when the author pushes to a central monorepo, that monorepo will already have the dependencies (they were pulled from it in the first place), which is the only condition required to merge patches. Hence, the monorepo just computes the union of its set of patches with the new patches.

    This level is not super hard to implement, but has some tricky bits: what if, for instance, we want to get just one directory, but the patches required to build that directory also build other parts? One solution to that problem would be to have a list of paths to output, apply these “wider” patches anyway, and output just the part we’re interested in.

    Another solution could be to split the patches before applying them, but then we'd have to maintain a list of partially applied patches, which could be quite messy and require extra data structures that could end up costing more disk space.

  2. Another implementation level is to help users write patches that apply to just one subset of the repository. I don't know if others agree, but this is what I'd like to do with nested repositories: creating a nested repository creates an empty .pijul directory, which just means that running pijul record from inside the nested repository will record a patch covering only the nested repository's path. Obviously, this default behaviour could be overridden with a command-line option.
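The first implementation level can be sketched as plain set operations. This is a minimal Python model under stated assumptions, not Pijul's actual implementation: the `Patch` class, the string file IDs and all function names are hypothetical, and "applying" a patch is reduced to recording which files it touches. It shows a partial pull (patches touching the wanted files, plus their dependency closure), a push as a set union guarded by a dependency check, and the "apply the wider patches anyway, output only the wanted paths" approach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    name: str                        # hypothetical patch identifier
    files: frozenset                 # IDs of the files this patch touches
    deps: frozenset = frozenset()    # names of the patches it depends on


def dependency_closure(wanted, by_name):
    """Return the patches in `wanted` plus everything they transitively depend on."""
    result = {}
    stack = list(wanted)
    while stack:
        patch = stack.pop()
        if patch.name not in result:
            result[patch.name] = patch
            stack.extend(by_name[d] for d in patch.deps)
    return set(result.values())


def partial_pull(remote, file_ids):
    """Pull only the patches touching `file_ids`, plus their dependencies."""
    by_name = {p.name: p for p in remote}
    touching = [p for p in remote if p.files & file_ids]
    return dependency_closure(touching, by_name)


def push(central, local):
    """Merge by set union; the only precondition is that all dependencies are present."""
    known = {p.name for p in central | local}
    missing = {d for p in local for d in p.deps} - known
    if missing:
        raise ValueError(f"missing dependencies: {sorted(missing)}")
    return central | local


def visible_paths(patches, path_of, wanted_dirs):
    """Apply "wider" patches anyway, but output only the files under `wanted_dirs`."""
    files = set().union(*(p.files for p in patches))
    paths = (path_of[f] for f in files)
    return sorted(
        path for path in paths
        if any(path == d or path.startswith(d.rstrip("/") + "/") for d in wanted_dirs)
    )


# Hypothetical example: B is a "wider" patch touching both lib and app.
a = Patch("A", frozenset({"docs"}))
b = Patch("B", frozenset({"lib", "app"}), frozenset({"A"}))
c = Patch("C", frozenset({"app"}), frozenset({"B"}))
d = Patch("D", frozenset({"other"}))
central = {a, b, c, d}
path_of = {"docs": "docs/README", "lib": "lib/core.rs",
           "app": "app/main.rs", "other": "other/x.rs"}

local = partial_pull(central, {"app"})
print(sorted(p.name for p in local))             # ['A', 'B', 'C']
print(visible_paths(local, path_of, ["app"]))    # ['app/main.rs']

new = Patch("E", frozenset({"app"}), frozenset({"C"}))
merged = push(central, local | {new})
print(sorted(p.name for p in merged))            # ['A', 'B', 'C', 'D', 'E']
```

Because the push is a set union, the order in which patches arrive does not matter; in this model only the dependency check can reject a push.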
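The nested-repository default of the second level could behave roughly like the following sketch. It assumes that every directory containing a .pijul is a repository root and that the list of changed files is already known; `files_to_record` and its `whole_repo` parameter are hypothetical stand-ins for pijul record and the command-line override mentioned above, not real Pijul options.

```python
from pathlib import PurePosixPath


def enclosing_roots(cwd, repo_roots):
    """Repository roots containing `cwd`, sorted from outermost to innermost."""
    cwd = PurePosixPath(cwd)
    roots = [PurePosixPath(r) for r in repo_roots]
    return sorted((r for r in roots if r == cwd or r in cwd.parents),
                  key=lambda r: len(r.parts))


def files_to_record(changed, cwd, repo_roots, whole_repo=False):
    """Keep only changes under the innermost repository (or the outermost one if overridden)."""
    roots = enclosing_roots(cwd, repo_roots)
    if not roots:
        raise ValueError(f"{cwd} is not inside a repository")
    scope = roots[0] if whole_repo else roots[-1]
    return sorted(c for c in changed
                  if PurePosixPath(c) == scope or scope in PurePosixPath(c).parents)


# Hypothetical layout: a monorepo with a nested repository at libs/parser.
roots = ["/work/mono", "/work/mono/libs/parser"]
changed = ["/work/mono/app/main.rs",
           "/work/mono/libs/parser/src/lib.rs",
           "/work/mono/libs/parser/Cargo.toml"]
cwd = "/work/mono/libs/parser/src"

print(files_to_record(changed, cwd, roots))
# ['/work/mono/libs/parser/Cargo.toml', '/work/mono/libs/parser/src/lib.rs']
print(files_to_record(changed, cwd, roots, whole_repo=True))
# ['/work/mono/app/main.rs', '/work/mono/libs/parser/Cargo.toml', '/work/mono/libs/parser/src/lib.rs']
```

Picking the innermost enclosing root is what keeps a record made from inside the nested repository within its path; the override just widens the scope to the outermost root.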