Phenomenological Pijul (or Pijul from the outside)

There has been a lot of confusion lately about what all the elements of pijul do, why they have such uncommon names, and how to wield pijul at all. So this topic tries to explain pijul by looking at it from the outside: how it appears and how it handles.

So, you start out with an empty working tree in a fresh repository like this:

$ pijul init repo; cd repo

First, we need to clarify the language, so here are some definitions. (Words in bold are defined somewhere in this text.)

Working tree

This is ultimately the thing that you are interested in, because it lets you access the data that you want to see right now. It is a manifestation of the current state of your repository: all repository state manifests into data that is represented by files in a hierarchy of directories, known as your working tree. (This is not a conceptual limitation. One could imagine a mode of work in which pijul does not manifest its current state into a working tree, but for our purposes, this is always the case.)

Pijul treats your working tree differently than Git does. Git’s inventor Linus Torvalds once described it as: “The content tracker from hell”. And he went on to make it very clear that Git is first and foremost a content tracker. Manifesting content (data in our language) into a working tree is an addon to its storage format. This is why Git makes a clear distinction between blobs, which hold data but have no file name inside your working tree, and tree objects, which must ultimately refer to blobs to represent actual data and which assign a file name (and access bits) to a blob.
Pijul, however, works directly on your working tree. The distinction becomes clear in two observations:

  1. Git has trouble tracking file renames, additions and deletions (during merges) and must rely on many heuristics to work around problems that appear in common use cases. The reason is that, ultimately, Git’s storage model has no notion of successive changes. It does not know how your working tree was transformed from one commit to the next. It just knows that last time it looked like that and now it looks like this.
    In Pijul, these modifications of the working tree are recorded explicitly, just as changes to a file’s contents are.
  2. In Git, adding a previously untracked file and telling Git to include a file in the next commit share the same interface, namely git add.
    Pijul uses two interfaces for this purpose. pijul add marks a new file to be tracked from now on (and has to be done only once per file). pijul record stores the modifications of the currently tracked files as a new change.
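
To make the first observation concrete, here is a toy sketch in Python. It is not how Git or Pijul are implemented; the file names and the guess_renames helper are made up for illustration. It only contrasts inferring a rename from two snapshots with having the rename written down.

```python
# Toy model only: neither Git's nor Pijul's real data structures.

# Snapshot view: two pictures of the tree, mapping file name -> content.
before = {"notes.txt": "hello\n"}
after = {"readme.txt": "hello\n"}
edited = {"readme.txt": "hello, world\n"}  # renamed AND edited

def guess_renames(old, new):
    """Heuristic: a vanished name and a new name with equal content count as a rename."""
    gone = {name: text for name, text in old.items() if name not in new}
    came = {name: text for name, text in new.items() if name not in old}
    return [(o, n) for o, o_text in gone.items()
            for n, n_text in came.items() if o_text == n_text]

# Operation view: the modification itself is what gets stored.
recorded = [("rename", "notes.txt", "readme.txt")]

print(guess_renames(before, after))   # the heuristic finds the rename
print(guess_renames(before, edited))  # the heuristic finds nothing
print(recorded)                       # nothing to guess
```

The naive heuristic already fails once the file is edited in the same step, which is the kind of case where real tools need smarter guessing.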

Repository

This is the conglomerate of all the wisdom that pijul has acquired about your data. It contains all possible manifestations of your data (i.e. states) as a big graph of changes but also holds knowledge about remote repositories and their state, and other useful pieces that are important for your interaction with your data through pijul (the program).

State

The only means to interact with your data is through your working tree, and a working tree is always a manifestation of a state. So what is a state?

A state is a sub-graph of the graph of changes that is stored in your repository. Thus, of all your wisdom about your data, it represents a (usually very carefully crafted) selection of it.
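
A rough way to picture this, as a toy Python sketch: the change names and the is_state helper are invented here, and the rule that a selection must contain the dependencies of its members is my reading of how dependencies work, not a quote from pijul's internals.

```python
# Hypothetical change names mapped to the changes they depend on.
repository = {
    "add-file-a": set(),
    "edit-file-a": {"add-file-a"},
    "add-file-b": set(),
}

def is_state(selection, graph):
    """A selection works as a state if it contains every dependency of its members."""
    return all(graph[change] <= selection for change in selection)

print(is_state({"add-file-a", "add-file-b"}, repository))   # True
print(is_state({"add-file-a", "edit-file-a"}, repository))  # True
print(is_state({"edit-file-a"}, repository))                # False: dependency missing
```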

The central service pijul provides is helping you to efficiently manage all the states that you could possibly be interested in. For example, in common version control systems, which track data modifications in a strict linear order, one would like to go back in the history (of one’s modifications), merge another timeline into the current one, split off into an alternative timeline, or reorder/modify the history altogether.

State is central to turning your data into something you can manipulate (by manifestation into a working tree). That is why your main handle to work with state is called a channel. This means that in your working tree, you are always in some channel. Note that a channel is not equal to a state, but it always represents some state.

$ pijul channel
* main

And you are!

Channel

There are multiple angles from which to look at channels. The most obvious is that a channel always represents a state. It also keeps a linearised representation of its state, which is accessed through pijul log. Locally, this allows you to have a notion of time and explore a “history” of changes that led to the current state that it represents. But most importantly, it allows you to identify individual changes that you can remove from it (undoing the change), that you can hand to your collaborators or that you can amalgamate into a new channel (both via pijul push).
Lastly, it also holds a list of named states that are particularly interesting to you, called tags.

It is important to note right here that although the channel is your main handle to work with state, this doesn’t mean you need loads of channels to keep up with all the different states that you are interested in. The channel is merely the interface between you, pijul and the working tree that you are currently interested in.
One of the ways to interact with a channel is by adding new (read: unknown) changes to a channel, thus, modifying the state it represents.
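
As a mental model only, a channel can be pictured like the toy Python class below. The class and the change names are hypothetical, and the real thing does much more (for instance, this toy does not check whether other changes depend on the one being removed).

```python
class Channel:
    """Toy channel: an ordered log whose set of entries is the state it represents."""

    def __init__(self, name):
        self.name = name
        self.log = []  # linearised view of the state, oldest first

    def state(self):
        # The state itself has no order: it is just the set of changes.
        return frozenset(self.log)

    def add(self, change):
        if change not in self.log:
            self.log.append(change)

    def remove(self, change):
        self.log.remove(change)

main = Channel("main")
main.add("change-1")
main.add("change-2")
before = main.state()

main.remove("change-1")        # undo one individual change
print(main.log)                # ['change-2']
print(before == main.state())  # False: the channel now represents another state
```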

The simplest way to add a new change is by modifying your working tree, and recording the change:

$ echo a >a
$ pijul add a
$ pijul record
Error: No identity configured, yet. Please use `pijul key` to create one.

Bummer! Pijul tries to be fit for 21st century problems. That is, the identity of an author is treated very seriously. (As you treat your identity seriously IRL, too.) The way pijul tackles identity is by requiring an author to create a public/private key pair that will henceforth identify them to pijul. What seems superfluous for local usage becomes important once people start collaborating. While this solves some problems regarding identity (search for “malleable identities” on the Nest), it introduces new problems (losing access to the private key, key compromise, multiple hosts etc.). Yet, that’s how pijul currently works.

$ pijul key generate <user>
$ pijul record
<interactive>...
Hash: ...

So, after creating an identity, we can record our modifications. This does two things:

  1. It creates a change that contains a faithful representation of our modifications and is identified as the “Hash: …” in the output.
    In this case, that means it records the fact that we created a new file in the working tree and want pijul to track it for us. It also means that it records that a new line with an “a” is added to the file.
  2. It records (hence the name) the change to the current channel, modifying the state this channel now represents.
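
Those two steps can be sketched in Python as follows. This is a toy under loud assumptions: the edit tuples are invented, and SHA-256 over JSON merely stands in for whatever pijul actually hashes and how.

```python
import hashlib
import json

def make_change(edits):
    """Step 1: bundle the modifications and derive an identifier from their content."""
    payload = json.dumps(edits, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

channel = []  # toy channel: just a list of change identifiers

# The modifications from the session above: a new tracked file "a" with one line "a".
edits = [["add-file", "a"], ["add-line", "a", "a"]]
change_id = make_change(edits)

# Step 2: record the change to the current channel.
channel.append(change_id)

print(len(change_id))                   # 64 hex characters
print(make_change(edits) == change_id)  # True: same content, same identifier
```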

Change

Changes are unique pieces of wisdom about your data, and they can be uniquely identified via a hash.
While changes represent a particular piece of knowledge about how you modified your working tree, they do not float completely freely in space. To make them actually useful, a change also knows about an author, a message describing the semantics of the change and a set of dependencies on other changes that this one builds on. To make it clear once more: A change does not describe the state of your working tree (as a Git commit does). A change describes how you modified your working tree, starting from earlier modifications (which are called dependencies).
This distinction can be observed as follows:

  1. Git manifests data into a working tree by expanding the data structures of the last commit, mapping blobs to files, where the mapping between content and file name is stored in tree objects. This is (almost) zero cost.
    Pijul manifests data into a working tree by traversing all changes recorded in a channel back to an (empty) root change, building the current state from them. This is real hard work.
  2. Git computes a diff by comparing two recorded states of the working tree. This is real hard work.
    Pijul displays the diff by returning the content of all recorded changes, that are in one state but not the other. This is (almost) zero cost.
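
The second observation can be illustrated with a small Python toy. The snapshots and change names are made up, and difflib only stands in for a snapshot-style diff: one side has to compare contents, the other only looks up which changes are in one state but not the other.

```python
import difflib

# Snapshot style: the difference has to be computed by comparing contents.
old_snapshot = ["a\n"]
new_snapshot = ["a\n", "b\n"]
computed = [line for line in difflib.ndiff(old_snapshot, new_snapshot)
            if line[0] in "+-"]

# Change style: each state is a set of (hypothetical) change identifiers,
# and each change already carries its own content.
changes = {"c1": "+ a\n", "c2": "+ b\n"}
old_state = {"c1"}
new_state = {"c1", "c2"}
looked_up = [changes[c] for c in sorted(new_state - old_state)]

print(computed)
print(looked_up)
```

Both lists end up holding the same added line, but only the first one required looking at file contents.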

to be continued… (Please keep discussions focussed on the improvement of this text. Create new topics for questions regarding its understanding or for in-depth discussions.)


The part about state is missing the part about its naming, because at least one of the command help texts (log) says there is an option --state and also --channel, but not --tag.
This section also alludes to going back in history in common VCS, but doesn’t actually say that can be done in pijul. Why mention it, unless it’s to say it can or can’t be done? It would be a place to say that pijul is “change management” if it’s not “version control”.

I know a lot of people come from Git, but I don’t, so comparing to Git is confusing. Under the change section, you could compare to Darcs or SVN or both (or none at all since you are just explaining this tool).

Thanks. As a long-time git user, that was very informative.

The docs say (more than once, I believe) that pijul channels aren’t needed as often as git branches. But I haven’t found any examples of how I would, for example, work on 2 features and 2 defects simultaneously without mixing any of that, without using channels, and how that would integrate with a CI system. So, if that exists, a pointer to that would be helpful. To me anyway. Or an explanation here.


Hi! You can use channels if you want: we wouldn’t have implemented them if the goal were to avoid them entirely.

However, as indeed written many times in many places:

In Pijul, independent patches can be applied in any order

I know that the consequences of that property take time to understand, but this means that if your features are independent, you can push them independently, without using branches.
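
A toy Python illustration of that property, with invented change names and a deliberately simplistic apply_all helper (this is not Pijul's algorithm, just the shape of the claim): two changes that do not depend on each other give the same result in either order.

```python
from itertools import permutations

# Hypothetical changes: name -> (dependencies, file touched, resulting content).
changes = {
    "feature-x": (set(), "x.txt", "x\n"),
    "fix-y": (set(), "y.txt", "y\n"),
}

def apply_all(order):
    """Apply changes in the given order, refusing any whose dependencies are missing."""
    tree, applied = {}, set()
    for name in order:
        deps, path, content = changes[name]
        if not deps <= applied:
            raise ValueError(f"{name} is missing a dependency")
        tree[path] = content
        applied.add(name)
    return tree

results = [apply_all(order) for order in permutations(changes)]
print(all(tree == results[0] for tree in results))  # True
```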

If your work is highly planned and predictable, or if you actually need to maintain multiple versions, then branches are useful. But otherwise, patches are lighter, easier and more flexible. You’ll have patches too when using branches; you just have one extra thing to worry about (branches).

This needs to be required reading, perhaps even put somewhere on the main website (everything before the “Bummer! Pijul tries …” part is already written in a descriptive tone). It will help newbies like me for sure.

Something like this would be perfect in the docs… I’m having a bit of trouble wrapping my head around pijul and this helped quite a bit :)

I had a request or two about this, speaking as someone who’s long used git (which I switched to from subversion, which I switched to from cvs, which…). I’ve tinkered with pijul but haven’t used it on a serious project yet.

Something I find really infuriating about git, especially when trying to help someone new to git learn it, is that it has an internal data model that’s bizarre and strangely out-of-sync with what a user wants to do with it. Users don’t edit blobs, they edit files. They think in terms of changes to (groups of) files, not blobs. As a user, for the most part, most of the time, you’re thinking about and working with text files. To really use git effectively, though, especially if you find yourself in one of its many pitfalls or edge cases, you have to break your brain a bit and stop thinking in terms of files. That’s weird and hard. It took me years to really feel like I wasn’t constantly in danger of destroying my git repos and I still lean heavily on documentation and a cheat sheet I frequently add to.

pijul seems so much better in this regard, and that’s one of the many reasons I’d like to start adopting it at least for personal projects. I have to say, though, that while the OP took on the admirable (and much needed!) task of presenting pijul from a user’s perspective, to my naive way of thinking it drifted into pijul’s internal way of thinking pretty quickly. There’s nothing wrong with this and it’s great, please don’t get me wrong. I guess what I’m suggesting is that there’s more space for documents like these that staunchly stick with the “I’m a user and I spend all my time editing files” perspective as much as possible.

Like, imagine an FAQ:

  1. I have a bunch of files in a directory and I want to track changes I make to them with pijul. Where do I start?
  2. I’m planning a new project and I want to start with pijul right off the bat before I start creating files. What should I do?
  3. I have a project I set up with pijul, and I’ve edited a bunch of project files. I want to store those changes as a unit that I can give a name. How do I do that?
  4. I changed a bunch of my files but don’t remember all the changes I made. What can I do to sort this out?
  5. I created a subfolder in my project’s directory. How do I tell pijul to keep track of that?
  6. I was tracking a subfolder in my project but I accidentally deleted it! Can pijul restore it, and if so how?

Etc. What I’d love to see is a document explaining pijul CLI commands with an eye towards teaching me what I’d need to know to answer those and similar questions myself as they come up, and getting me to the point where I feel confident that I fully understand what’s going on with my files (and the state relating to them) after every pijul command. I feel like that’d be difficult to write for git because the CLI commands are so unintuitive that your only recourse is to memorize a bunch of things and constantly read documentation. But pijul is very “pure” conceptually, so to speak, and I feel like this is doable.

One thing I think is important: though technically speaking the files are a manifestation of the repository state, as you say, this is a counterintuitive way of thinking for a user who thinks in terms of files on their computer’s hard drive. Especially for beginners, that might even be a frightening prospect. So, as a matter of perspective, it’s worth considering writing such a document with the assumption that users (especially beginners) think of their directories of files as the “single source of truth” (SSOT), and of whatever is being stored in pijul as metadata, backup data, history, or what have you, which often lags behind the SSOT. The perspective matters, because while actively working on a project, what’s currently in the files on the hard drive is almost always what a user cares most about preserving. If those current files have changes that have not been recorded in pijul, then from the user perspective pijul’s internal state is wrong, and the files are not a manifestation of pijul’s internal state. The user needs to get used to being aware of this and recording their changes regularly, and one of the purposes such a document can serve is reinforcing this lesson.

Anyway sorry for the long post. I hope something in there is useful. Thanks again for this thread!