Patching patches


#1

Here is a feature that someone suggested to me today: patches that operate on other patches rather than on the direct text.

One potential implementation (but I have other ideas as well) could be to represent some patches as plain text files, added to the repository as normal files (for example in a patches directory at the top level), and patchable themselves. When a patch is patched, a cascade of unrecords + application of the new version would follow (the cascade is needed because this would obviously open the door to patches of patches of … of patches).
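
To make the cascade a bit more concrete, here is a minimal sketch in Rust. It is not pijul code: the `Modifies` map, `unrecord_order` and `reapply_order` are hypothetical names, and the sketch assumes that patches of patches form an acyclic graph in which each patch file lists the other patch files it modifies.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical model: `modifies[p]` lists the patch files that patch `p`
/// changes. Patches that only touch ordinary text have no entry.
type Modifies<'a> = HashMap<&'a str, Vec<&'a str>>;

/// Order in which to unrecord when a new version of `root` arrives:
/// a patch is listed only after every patch file it modifies.
fn unrecord_order(modifies: &Modifies, root: &str) -> Vec<String> {
    fn visit(m: &Modifies, node: &str, seen: &mut HashSet<String>, out: &mut Vec<String>) {
        if !seen.insert(node.to_string()) {
            return; // already reached through another path
        }
        for child in m.get(node).into_iter().flatten() {
            visit(m, child, seen, out);
        }
        // Post-order: everything downstream of `node` is already in `out`.
        out.push(node.to_string());
    }

    let mut seen = HashSet::new();
    let mut out = Vec::new();
    visit(modifies, root, &mut seen, &mut out);
    out
}

/// The new versions are then applied in the opposite order.
fn reapply_order(modifies: &Modifies, root: &str) -> Vec<String> {
    let mut order = unrecord_order(modifies, root);
    order.reverse();
    order
}
```

For a chain where M patches P and P patches Q, `unrecord_order(&modifies, "M")` gives Q, P, M, and `reapply_order` gives M, P, Q.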

What do you think?


#2

Intriguing. But are there real life use cases strong enough to justify the added complexity?

I was thinking: in GitHub you can review commits and ask for changes; with pijul and this feature, those changes would be made to the patch itself rather than as a new patch. Cleaner history when finally merged, but is it necessary in a patch-based system? One could just undo the patch, modify and record again.


#3

This comes from an actual real life use case (not for code though).


#4

I am sorry, but I don’t get the feature. Is the idea that, when you patch a patch, this changes the history? Like a sharable git commit --amend?


#5

Yes, I think so. I don’t know how big the difference would be between this and the current implementation of this feature (which uses dependencies).

The use case is to keep two levels (patches to patches to the text) in the history, instead of one (patches to the text).


#6

To be honest, I don’t really see the point; and I fear it would make pijul more complex or confusing than necessary. But it may be because I can’t really get my head around this proposal.


#7

If you are going to implement it, please consider not patches of patches, but patches of the whole graph of diffs and maybe, if storing code as ASTs is implemented, the graph of symbols in the ASTs, connected to each other by temporal links. And why do we need metapatches? What is the use case?


#8

It’s an industrial use case, where people want to see their patches and patch them. I can’t say more for now, sorry, but I’ll hopefully announce it formally soon.


#9

Now that I’ve actually used pijul, receiving feedback from you and @lthms about the tags command, I’m starting to see the usefulness of patching patches.

This time, when I received your feedback, I just undid my patch, worked on what needed changing, then recorded from scratch. The history is neat, but the process that led there is lost. Even if we end up including discussions in the repo itself, the actual code that was reviewed is gone.

The other route would have been piling up patches. But this dirties history, unless we manually manage semantic dependencies (which is already possible in pijul, but honestly seems quite a bother; also, semantic deps are currently indistinguishable from automatic ones).

By patching patches, one could store both the reviews AND the reviewed patches in the history of the repo, maybe making them foldable when viewing the log. Which is so cool!

So, about the implementation, I don’t think we need to do anything too strange. Aren’t they just a subcase of semantic dependencies? Namely, the one in which the patches conflict, so they must be applied in sequential order, with conflicts resolved automatically in favour of the younger one. Maybe this can be the occasion for deeply restructuring show-dependencies. Semantic dependencies can be either non-conflicting (basically a spontaneous branch) or conflicting (basically patched patches; fixing a typo 2 years later, or similar things, belongs to this category too). Non-semantic dependencies are conflicting patches created “by chance” while working on different goals; their current implementation should work for patched patches too, the only difference being that in the latter case we tell pijul they are semantic. Though actually there is a subtle but important difference: semantic conflicts are applied to the initial patch, which remains the master; non-semantic conflicts overshadow the initial patch, becoming the new master.

In short, it seems it’s just a matter of changing the metadata field of dependencies to one which links to the related patches and explains their relationships (semantic non-conflicting, semantic conflicting, non-semantic conflicting).
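
As a rough illustration of what such a metadata field could look like, here is a sketch in Rust. None of these types exist in libpijul; `Relation`, `Dependency` and the two helper methods are names made up for this example, and the patch hash is just a placeholder string.

```rust
/// The three relationships listed above (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Relation {
    /// Declared by the author, no conflict: basically a spontaneous branch.
    SemanticNonConflicting,
    /// Declared by the author, conflicting: basically a patched patch.
    SemanticConflicting,
    /// Conflict created "by chance" while working on different goals.
    NonSemanticConflicting,
}

/// A dependency that links to the related patch and explains the relationship.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Dependency {
    /// Identifier of the patch depended on (placeholder representation).
    target: String,
    relation: Relation,
}

impl Dependency {
    /// A patched patch: applied to the initial patch, which remains the master.
    fn is_patch_of_patch(&self) -> bool {
        self.relation == Relation::SemanticConflicting
    }

    /// A non-semantic conflict overshadows the initial patch.
    fn overshadows_initial(&self) -> bool {
        self.relation == Relation::NonSemanticConflicting
    }
}
```

A log viewer could then fold every dependency for which `is_patch_of_patch()` is true under the patch it targets.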


#10

Even though I am currently using primarily git for my development, I also have independently discovered an interesting use case for patches for patches, so I thought it might be worth sharing.

I am developing SIT (Serverless Information Tracker, http://sit.sh). SIT organizes all updates in an event sourcing way (each record is just a record of something that happened, to be processed by a reducer).

SIT’s first application is in issue tracking. One of its features is called “merge requests” – similar to pull requests on GitHub but self-contained, in that the patches are included in those records. For trivial cases, this is perfectly fine. However, there’s no way to “push-force” a new patch into the issue, only to add a new version of it, so for longer patches that are being discussed and worked on, I’d need to include the whole long patch every time I update it. As a potential solution to this problem, I’ve considered allowing a patch to a patch to be recorded instead, which could significantly reduce the need to duplicate the bulk of the patch being worked on.

Just my 2c!


#11

Once Upon A Time ™, there was a revision control system called CM-Lite (“configuration management lite”, developed and used internally by Transarc during their development of the Andrew File System and other projects). Its usage model was patch-centric (despite its centralized, RCS-based nature!) but layered in exactly the “patching patches” kind of way. That is, the state of a checkout was determined by a set of (human-chosen) topics and a version for each topic. Together, a topic and a corresponding version could be mapped to an actual series of patch hunks; naturally, a checkout was just the effect of unioning these hunks, essentially. IIRC, commits to a branch could add a new topic (and initial patch and version), add a new version to an existing topic, and/or change the branch’s current version associated with a topic. A generic pull operation would bring in new topics at the remote’s chosen version, advance shared topics when they could be “fast-forwarded”, indicate conflicts of diverged versions within a topic, and identify conflicting topics (which could not be applied given the rest of the state of the system). I think there was some temporal ordering to topics, so it was less “big bag of patches” than darcs or pijul, but I don’t see that as essential to the design.

In any case, the result, in practice, was very cool: topics were loosely akin to “feature branches”: they tended to be used for “aspects” of the program under development. Unlike branches, tho’, they were long-lived objects, subject to revision over timelines measured in years: errors in the original work could be fixed “in situ”. The upshot was that one could readily see the long-term temporal evolution of a particular, logical piece of the software as development continued. It was easy to see what topics were present in some branch and not in others, and, for shared ones, how their histories related. Notably, this facilitated long-lived divergences riding atop a common core of code. Customers’ customizations would be given their own topics, could be developed over a long term, and optionally merged back with their complete, longitudinal development history without cluttering the initial (topic) view presented to the developers.

Tragically, CM-Lite did not escape IBM’s acquisition of Transarc like OpenAFS did, so there’s essentially no trace of it on the Internet.


#12

I was not aware of CM-Lite. Thanks for sharing this piece of history!

@pmeunier Are “recursive patches” related to this issue?


#13

@nwf: I too would like to thank you for sharing this. Great story!

@lthms: it is related. Our patch hash representation was unsafe anyway (because of #[repr(packed)]), so I had to upgrade it, and I used the opportunity to go a bit further and output patches as JSON. This is still work in progress, but the libpijul side is done; only a few CLI tweaks remain before we can test it.


#14

I am sorry, but I don’t really get what your patch actually implements, besides removing repr(packed) (which is, by itself, awesome!). Why “recursive”?


#15

It also implements a new hash format called “Recursive”. That hash format is actually a patch described in a file introduced by another patch. The format is free; at the moment only JSON patches are implemented.


#16

I cannot find anything on the Internet related to this “recursive” hash format. Is it something you have invented for pijul?


#17

I’ve not really invented anything here; they’re just recursive path identifiers: instead of a hash, they’re a reference to another hash (which could itself be a recursive one), and a line number.
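
To illustrate the shape of such an identifier (this is only a sketch, not the actual libpijul representation), one could write it as a recursive enum. The names `HashRef`, `Plain`, `Recursive`, `root_hash` and `depth` are assumptions made for this example, and the plain hash is kept as an opaque string.

```rust
/// Either an ordinary hash, or a reference to another identifier plus a
/// line number (hypothetical names, for illustration only).
#[derive(Debug, Clone, PartialEq, Eq)]
enum HashRef {
    /// An ordinary patch hash, kept as an opaque string here.
    Plain(String),
    /// A reference to another hash (possibly recursive itself) and a line.
    Recursive { parent: Box<HashRef>, line: usize },
}

impl HashRef {
    /// Follow the references down to the ordinary hash at the bottom.
    fn root_hash(&self) -> &str {
        match self {
            HashRef::Plain(h) => h.as_str(),
            HashRef::Recursive { parent, .. } => parent.root_hash(),
        }
    }

    /// Number of levels of indirection: 0 for an ordinary hash.
    fn depth(&self) -> usize {
        match self {
            HashRef::Plain(_) => 0,
            HashRef::Recursive { parent, .. } => 1 + parent.depth(),
        }
    }
}
```

With this shape, an identifier that points at line 3 of something that itself points at line 12 of the patch `abc` has depth 2 and root hash `abc`.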


#18

I am sorry if I am being annoying, I just want to understand. Can you motivate this change? I have some trouble getting the point (which exists, I don’t doubt that at all!).


#19

Is what you’re doing related to Merkle Trees maybe?

@lthms if that’s what he implemented, then it helps keep track of the integrity of the repo with fewer hashing steps. Given a good hash, this helps a ton with performance in a folder and files structure (like a repo). Dropbox uses something like that, IIRC.

@pmeunier how far am I from the truth? :sweat_smile: