The problem
I have been using pijul for some private projects for a while now, and several usage scenarios keep coming up. I want to start a discussion about one of them in particular: channels composed of other channels.
What pijul does very well from the get-go is keeping the histories of separate data separate. When I always record changes to files in src/ separately from changes in doc/, the two histories never depend on each other and I can operate on either. So far so good. What does not work well, though, is communicating these separate threads of history to me, the user.
This becomes especially visible when I incorporate threads of changes from other channels that I want to try out but may want to get rid of later. I have a hard time pinpointing the exact thread of changes that I want to unrecord again, especially because the threads sooner or later share a common prefix of changes with my channel (they revolve around the same set of files with a common history). When I unrecord those threads, I obviously want to keep the common prefix, but nothing in the log tells me where the divergence starts.
Now, to my suggestion!
The solution – Channels of channels
Pijul channels should work slightly differently. Instead of only holding changes in their log, they should also hold other local channels and blend in all changes from their logs.
What is to be gained from this?
Although pijul works perfectly well without using channels to capture branches, in my experience it falls considerably short in communicating which set of changes belongs to which feature. I think channels are necessary for communicating this, and I see evidence for that in the fact that the Nest implements discussions via channels for exactly this purpose.
Right now, when I want to try out a new feature, I can pull its most recent change and pijul pulls all the necessary dependencies for me. The problem starts when there is no single most recent change but several that are independent of each other or only share common ancestors. Then I need to know all of them to get every part of the new feature. The only way to communicate the complete set of changes to another developer is to put the feature into a separate channel and point them to it.
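To illustrate the "several most recent changes" problem, here is a toy sketch in Python. It is not pijul's data model: the change names and the DEPS table are made up, and dependencies are just sets of names. It only shows that naming one head of a feature yields that head's dependencies, not the rest of the feature.

```python
# Toy model, not pijul's data structures: each change id maps to the
# set of change ids it depends on. All names here are made up.
DEPS = {
    "base": set(),
    "parser": {"base"},    # first head of the feature
    "printer": {"base"},   # second head, independent of "parser"
    "docs": set(),         # third head, no dependencies at all
}


def closure(heads, deps):
    """Return the heads plus everything they transitively depend on."""
    seen = set()
    stack = list(heads)
    while stack:
        change = stack.pop()
        if change not in seen:
            seen.add(change)
            stack.extend(deps[change])
    return seen


# Naming a single head brings in only that head's own dependencies.
single = closure({"parser"}, DEPS)
# The whole feature is only reached when every head is named.
whole = closure({"parser", "printer", "docs"}, DEPS)
# What somebody who only knew about "parser" would be missing.
missing = whole - single
```

A channel is what gives the set of heads a name, so that nobody has to remember them individually.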
Now, git (like every other VCS I know of) forces you to dissolve a feature into its elementary set of changes when pulling it into your current channel. So in git, too, I have trouble removing all the bits of a feature that I no longer want in my code base. With channels of channels, the incorporated feature channel would remain a single identifiable element that I can address in order to remove or inspect it.
What’s more, a channel of channels could be a live structure. Additions to the feature channel could be immediately reflected in my main channel that incorporates it. If I didn’t want that, I could put a tag of the feature channel into my main channel, such that updates would need to occur manually.
This would give the user powerful tools to handle named sets of state and incorporate them into a single history graph. Right now, everything immediately falls apart into single changes when pulled into a channel.
This directly addresses discussions on the Nest around (un-)recording sets of changes. Its immediate benefit would be that the features and development stages of a project could be given an arbitrary semantic structure.
What semantics would such a channel have?
-
A channel that has its own changes and now also hosts a feature channel just applies the set of all changes to the working tree. No special semantics are necessary. Changes that appear in both channels are applied once. It works like pulling the feature on the fly. All the caching infrastructure that makes re-pulls fast should work in principle. The log would maintain the complete set of changes from all incorporated channels, with an additional field to mark which channels a change is part of. In this regard, the log of a channel would still be flat.
-
The references to other channels could be kept in the log, identified by their current state hash.
-
pijul record already offers recording to another channel, so even recording a new change to the feature channel is possible from the merged working tree. This means that channels of channels are not a read-only concept but naturally extend to writing as well.
-
The natural protection would apply. After recording changes to main that depend on changes from feature, it is no longer possible to record dependent changes to feature without first pushing the changes from main, just as it is not possible to push a change to a channel without its necessary dependencies. This would immediately highlight that the user is starting to mix the histories of two channels, which is likely unintended.
-
When a change is added to main that depends on a change from feature, the change it depends on now belongs to both channels, main and feature. If feature is later unrecorded, that dependency remains part of the main channel. A warning could be shown when recording a change on main that depends on a change that is purely in feature. On the other hand, this is a likely case, because an accepted feature is usually meant to become part of the history of main.
-
A feature channel could be added to a main channel with record --ref-channel <feature> or pull --as-ref <feature>.
-
It could be removed by identifying it in the log and unrecord --reset'ing it.
-
The dependency on a feature channel can be resolved by using unrecord without --reset, which would keep the changes of the feature channel in the log but would remove the reference to it.
-
A feature channel could be identified in the log by its current state hash. This hash would change on every update of the feature channel, but it would make it possible to clearly communicate a particular feature channel state to other developers, if needed.
-
When all changes are kept flat in the log, pijul log could mark which changes appear in which channel (like how git shows which branch head is on which commit).
-
No special precautions should be necessary to prevent channels from forming circular dependencies on each other. It should just work as a result of the commutativity of changes.
-
Everything above should directly apply to tags from other channels. They should likely never automatically update. Thus, recording the last state of a channel, as opposed to the channel itself, could be used to imply that we are not interested in automatic updates to incorporated feature channels.
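To make the proposed semantics a bit more tangible, here is a minimal Python sketch of a flat log in which every change carries the set of channels it belongs to. Everything in it is hypothetical: the function names, the "main" and "feature" labels and the change ids are made up, dependencies between changes are ignored, and none of this reflects how pijul stores its log today.

```python
def blended_log(own, refs):
    """Build a flat log: change id -> set of channel names it is part of.

    `own` lists the changes recorded on main itself, `refs` maps each
    incorporated channel name to its list of changes. A change that
    appears in several channels gets a single entry.
    """
    log = {}
    for change in own:
        log.setdefault(change, set()).add("main")
    for name, changes in refs.items():
        for change in changes:
            log.setdefault(change, set()).add(name)
    return log


def unrecord_reset(log, name):
    """Drop the reference `name` and every change that belonged only to it."""
    return {c: chans - {name} for c, chans in log.items() if chans != {name}}


def unrecord_keep(log, name):
    """Drop the reference `name` but keep its changes as part of main."""
    return {c: (chans - {name}) | {"main"} for c, chans in log.items()}


# p1 and p2 form the common prefix, m1 is main-only, f1 and f2 are feature-only.
log = blended_log(["p1", "p2", "m1"], {"feature": ["p1", "p2", "f1", "f2"]})
after_reset = unrecord_reset(log, "feature")
after_keep = unrecord_keep(log, "feature")
```

With the membership field in place, removing the feature keeps the common prefix without me having to find the point of divergence by hand. A real implementation would additionally have to refuse the removal, or warn, when a change on main depends on a change that is only in feature.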
What could go wrong?
The stronger semantics that come with channels of channels can, of course, be used to make a code repository go haywire:
-
Creating huge amounts of almost trivial micro-channels and working with “aggregator channels” will degenerate into a worse version of normal channels that only know changes.
-
Mindlessly recording all kinds of channels in circles will lead to a situation where all channels depend on each other. This is a degenerate version of a single channel, and the naming of the channels becomes arbitrary, because no single channel has a meaning without the other channels it depends on. Only all of them together carry the meaning of the repository, which makes the individual channels superfluous.
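The circular case can be sketched in a few lines of Python. This is only an illustration under my own assumptions: channels are plain sets of change ids, references are lists of channel names, and the resolve function is hypothetical. It shows both that resolving a cycle terminates when already visited channels are skipped, and that every channel in the cycle ends up with the same content.

```python
def resolve(name, own, refs, seen=None):
    """Collect the changes of `name` and of every channel it references.

    `seen` remembers channels that were already visited, so a circular
    reference is simply skipped instead of looping forever.
    """
    seen = set() if seen is None else seen
    if name in seen:
        return set()
    seen.add(name)
    changes = set(own[name])
    for ref in refs.get(name, ()):
        changes |= resolve(ref, own, refs, seen)
    return changes


# Three channels referencing each other in a circle: a -> b -> c -> a.
own = {"a": {"a1"}, "b": {"b1"}, "c": {"c1"}}
refs = {"a": ["b"], "b": ["c"], "c": ["a"]}

resolved = {name: resolve(name, own, refs) for name in own}
```

All three entries of resolved hold the same set of changes, which is the degenerate single channel described above.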
Technical challenges
-
In the log, changes need an additional field that marks which channel they belong to. This is used in unrecord --reset <channel-ref> to identify the changes that are not part of the common prefix and need to be unrecorded.
-
The log must be able to hold two types of channel references. I see no great challenge in making them fit into the current format for changes.
-
“Live references” refer to actual local channels inside a repository. Whenever such a channel gets touched, the current channel must update the working tree, just like after a pijul pull. pijul log retrieves the current state hash while showing the log.
-
“Fixed references” refer to a particular state hash that does not change. Such a reference can only be unrecorded.
-
The two types of channel references are not needed to recreate the set of changes they represent. The set of changes is added to the main log immediately at the time of addition (and updated at the time of change). The references serve purely as markers for updates and to identify which changes are free to be unrecorded. Push and pull should work and perform almost exactly as before.
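The difference between the two reference types could be sketched like this in Python. Please read it as a thought experiment rather than a design: the Repo class and its methods are made up, and state_hash is only a stand-in (a SHA-256 digest over the sorted change ids), not the way pijul actually computes channel states.

```python
import hashlib


def state_hash(changes):
    """Stand-in for a channel state id; NOT pijul's actual algorithm."""
    joined = "\n".join(sorted(changes))
    return hashlib.sha256(joined.encode()).hexdigest()[:12]


class Repo:
    """Hypothetical repository holding channels and remembered states."""

    def __init__(self):
        self.channels = {}   # channel name -> set of change ids
        self.snapshots = {}  # state hash -> frozen set of change ids

    def record(self, channel, change):
        self.channels.setdefault(channel, set()).add(change)

    def live_ref(self, channel):
        # A live reference only stores the channel name.
        return ("live", channel)

    def fixed_ref(self, channel):
        # A fixed reference pins the state the channel has right now.
        changes = frozenset(self.channels[channel])
        digest = state_hash(changes)
        self.snapshots[digest] = changes
        return ("fixed", digest)

    def resolve(self, ref):
        kind, target = ref
        if kind == "live":
            return set(self.channels[target])
        return set(self.snapshots[target])


repo = Repo()
repo.record("feature", "f1")
live = repo.live_ref("feature")
fixed = repo.fixed_ref("feature")
repo.record("feature", "f2")  # the feature channel moves on
```

After the second record, resolving live yields both f1 and f2, while fixed still yields only f1, which is the "no automatic updates" behaviour I would expect from tags.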