While trying to clean the Nest with my new Nest cleaner (not yet ready, still pretty inefficient), I had the opportunity to measure disk space usage “in production”, i.e. for a real repository.
So, the pijul repository has 25 branches (there is one branch per user branch, and an extra one when the branch has pending patches), and uses 39Mb, which is a lot.
The current implementation of branches is (roughly) a table of type
Db<String, (Db<PatchId, ()>, Graph)>. Moreover, forks happen in O(log n) time (where n is the number of Sanakirja pages referenced at least twice), but edits to one of the branches start duplicating it.
My idea to solve this problem is to have just one graph containing all the edges, and branches would then be just sets of patches.
Only the patchsets would get forked (in the Sanakirja sense), and they are much smaller (and further compressible by
This would additionally allow to apply or unapply a patch from another branch in time independent from the patch size, and would make time travel as fast as a standard “revert”.
The tradeoff is that obsolete edges (which includes old versions of deleted edges, but not only) would have to stay in the database.
Also, a conversion tool would be really easy to write.
What do you think?