Semantics of "missing contexts" conflicts

Here’s a question related to my current effort to iron out the remaining bugs/unimplemented things in libpijul: how should we represent conflicts arising from missing contexts?

A missing context is the situation that arises when Bob adds a line to a block of text, while Alice deletes that block, and then pulls Bob’s patch.

The current solution

The current solution is to resurrect all the context until finding an alive one, i.e. if Bob changes “AC” to “ABC”, while Alice deletes A and/or C, we mark A and C as “zombies”, i.e. lines on which there is no consensus to tell whether they are fully dead or fully alive.

The main problem of this solution is that we might resurrect lines that are not involved in the conflict. Imagine the situation where the repository initially contains lines A_0, A_1, … A_n, then Alice produces a sequence of patches P_0, P_1, P_{n-1}, where for all i, P_i deletes A_i. Bob pulls all those patches, then adds a line B after A_n, while Alice makes patch P_n deleting A_n.

When Alice pulls Bob’s patch, the current implementation will resurrect all lines of the file, even though both agreed on their deletion.

Solution B: unrecording the other side of the conflict

Another solution is to unrecord the patch that deleted the edge (in this situation, that would be P_n), output the file. We would do this in an aborted child transaction, so that the unrecorded patch would still be applied. It is unclear how to articulate that with the pending patch. Maybe:

  1. apply the pending patch,
  2. start a child txn
  3. unrecord the hunk that deleted the line
  4. abort
  5. unrecord the pending patch (in the parent txn)

This sounds quite intuitive from a user’s perspective. In some cases though, unrecording P_n might reach a point where another context has to be restored, and we might have to unrecord many patches recursively.

Solution C: change the zombies

Yet another simpler solution is to change the zombies to being the lines introduced by Bob’s patch, and to look for an alive context without resurrecting anything.

This is computationally equivalent to the first solution (linear in the size of the conflict, where “the conflict” has the counter-intuitive definition of “all the deleted lines between the lowest alive line and the current hunk”).

This solution, however, doesn’t have the same drawbacks from the user’s point of view.

I’m a little bit scared of solution B, presumably because I haven’t looked too closely at it. Is it possible to define solution B only in terms of the pristine graph, so as to ensure that the state of pristine does indeed only depend on which patches are present at a given time in the current branch? Defining the behavior in terms of txns makes this property non-obvious.

Have you thought about this again? I’m inclined to try the solution where we don’t mark a conflict there, and see how confusing it is in practice. As long as we keep associativity, we’re not being actually naughty. Otherwise, can you state again the argument why we should mark a conflict? Having that mathematicized might help see the correct solution.

The argument is actually simple: I often use darcs on papers, and when I change a line in a paragraph (including the last and first line of that paragraph), and my coauthor deletes that paragraph (or comments it out, which implies deleting it, as far as Pijul is concerned) five minutes to the deadline, I’d like to be informed that there won’t be a new line hanging out somewhere in the middle of the paper, without having to read the whole thing.

Obviously, nothing prevents me from writing new random lines cut in the middle of a sentence anywhere, but I’ve already been through the situation I described.

I tend to use M-x visual-line-mode, but I agree that latex files with their non-semantic lines are a kind of worst case. Thanks to your example, I see that there might be two kinds of conflicts:

  • hard conflicts, where no textual contents can be given for a file that would allow for an associative semantic
  • soft (or transient) conflicts, where a canonical contents exists, but the merge is merely likely to need careful reviewing.

The difference is that in the case of soft conflicts, the behavior of pijul can depend on preferences, order or merge or semantic clues without compromising associativity and compatibility between repositories, as long as whatever marking / signalling of these pijul does happens only in the working copy.

Edit: this may have been what you meant from the start, in which case we can choose any of the three solutions freely, as long as it remains local (ie, whatever we add to make output yield warnings/conflict markers is not exprted in patches).

You’re right, in any case, we need to preserve the connectivity (or pseudo-connectivity) of the graph, so this choice is completely independent from the application algorithm (except if we choose solution B, which is a little scary).