Pijul and data versioning

Hello there!

I’m working on OSRD, a GPL tool to visualize, edit, and simulate railway infrastructure, financed by Europe and SNCF.

We would very much like to have a usable versioning feature, which would enable our users to visualize, combine, and evaluate changes to the railway infrastructure.

This is, of course, very, very difficult, for a number of reasons:

  • there aren’t many options for versioning data (LakeFS?)
  • there’s basically no option for versioning structured data
  • the UX of combining multiple change sets (which may be local projects) into a bigger picture is troublesome with git-like concepts
  • interactively handling git-like conflicts through a rich web UI sounds like a nightmare
  • ensuring merge correctness is troublesome

As far as we know, Pijul does not yet fit the bill:

  • it has no support for structured data
  • it probably hasn’t seen a lot of use with this amount of data (between 1 GB and 10 GB)
  • it can only work with the local filesystem, not remote databases / KVs

Still, we think Pijul’s model, clean reference implementation, and availability as a Rust library are key assets that might make it realistic to get there. But on this question, our opinion does not matter as much as yours.

So what do you think? Could Pijul, given time and resources, be used to version data?

We’d love to chat about the matter with anyone interested!

Victor Collod
SNCF Réseau

At the basic level, Pijul looks at bytes, so any level of structure is possible above that. The only thing that needs to be changed is the diff algorithm (and possibly the rendering of conflicts).
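
To make "changing the diff algorithm" a bit more concrete, here is a minimal Python sketch of a diff that works on keyed records instead of lines of bytes. Everything in it (`diff_records`, the edit tuples, the track records) is hypothetical and is not Pijul's API; it only illustrates the kind of structure-aware diff that could sit above the byte level.

```python
from typing import Any

Record = dict[str, Any]


def diff_records(old: dict[str, Record], new: dict[str, Record]) -> list[tuple]:
    """Return the edits that turn `old` into `new`, keyed by record id.

    Hypothetical edit shapes (not a Pijul format):
      ("delete", key)
      ("insert", key, record)
      ("set", key, field, value)   # a removed field is recorded as value None
    """
    edits: list[tuple] = []
    # Records present only in the old snapshot were deleted.
    for key in sorted(old.keys() - new.keys()):
        edits.append(("delete", key))
    # Records present only in the new snapshot were inserted.
    for key in sorted(new.keys() - old.keys()):
        edits.append(("insert", key, new[key]))
    # Records present in both are compared field by field.
    for key in sorted(old.keys() & new.keys()):
        before, after = old[key], new[key]
        for field in sorted(before.keys() | after.keys()):
            if before.get(field) != after.get(field):
                edits.append(("set", key, field, after.get(field)))
    return edits


# Made-up sample data, loosely inspired by railway track sections.
old = {
    "track-1": {"length_m": 1200, "speed_limit": 160},
    "track-2": {"length_m": 800, "speed_limit": 120},
}
new = {
    "track-1": {"length_m": 1200, "speed_limit": 140},
    "track-3": {"length_m": 500, "speed_limit": 100},
}
edits = diff_records(old, new)
```

Two users changing different fields of the same record produce distinct `set` edits here, which is the kind of granularity a line-based diff of a serialized file would not give you.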

Pijul has actually seen use with that amount of data, and the internal data structures don’t care much about the size of data: the complexity is in the number of edits, not in their size.

Also, very large patches are detachable from their contents, to make it easy to version large binary files, in the cases where merge algorithms don’t really make sense.
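
As a rough illustration of what "detachable from their contents" can mean, the Python sketch below keeps only a hash and a size in the change itself, while the payload lives in a separate store that may or may not be present locally. The names (`Change`, `ContentStore`, `record`) are invented for this example, and this is an assumption about the general idea, not Pijul's actual change format.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """Metadata of a change; the payload itself is stored elsewhere."""
    path: str
    content_hash: str  # SHA-256 hex digest of the payload
    size: int          # payload length in bytes


class ContentStore:
    """Hypothetical in-memory store for detached payloads."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> Optional[bytes]:
        # Returns None when the payload has been detached.
        return self._blobs.get(digest)

    def drop(self, digest: str) -> None:
        self._blobs.pop(digest, None)


def record(store: ContentStore, path: str, data: bytes) -> Change:
    """Store the payload and return a change that only references it."""
    return Change(path=path, content_hash=store.put(data), size=len(data))


store = ContentStore()
payload = b"\x00" * 1024
change = record(store, "maps/region.bin", payload)

# Detach the payload: the change is still a valid, small piece of history.
store.drop(change.content_hash)
```

The change stays small and can still be ordered against other changes after the payload is dropped; only rendering the file's contents would need the payload to be fetched again.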

The storage backend is very generic: I have prototypes of things where the algorithm runs in a web browser and the backend is on the server.

That said, Pijul is a CRDT: using it over the wire doesn’t make too much sense.
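
On the storage backend being generic, here is a small Python sketch of the general pattern: the versioning logic talks to a narrow key-value interface, and anything implementing that interface can sit behind it. `Backend`, `MemoryBackend`, and `append_edit` are hypothetical names chosen for this example and do not correspond to Pijul's actual traits or functions.

```python
from typing import Optional, Protocol


class Backend(Protocol):
    """Hypothetical minimal interface the versioning logic depends on."""

    def get(self, key: bytes) -> Optional[bytes]: ...

    def put(self, key: bytes, value: bytes) -> None: ...


class MemoryBackend:
    """In-memory implementation; a database or KV store could expose the same methods."""

    def __init__(self) -> None:
        self._data: dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value


def append_edit(backend: Backend, log_key: bytes, edit: bytes) -> int:
    """Append a length-prefixed edit to the log under `log_key`.

    Returns the new size of the log in bytes. Only `get` and `put` are used,
    so the function is independent of where the data actually lives.
    """
    current = backend.get(log_key) or b""
    updated = current + len(edit).to_bytes(4, "big") + edit
    backend.put(log_key, updated)
    return len(updated)


backend = MemoryBackend()
first_size = append_edit(backend, b"log", b"abc")   # 4-byte prefix + 3 bytes
second_size = append_edit(backend, b"log", b"de")   # plus 4-byte prefix + 2 bytes
```

Since the cost is driven by the number of edits rather than their size, the interface only needs to be good at small reads and writes.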