Is Pijul the right tool for me? I want to version-control my personal backups

Hi 🙂

Other users and I will benefit from knowing whether to learn and use Pijul. Here’s my use-case:

  • I have files on my computer (music, videos, documents, etc.)
  • I have a hard drive that contains a copy of my computer’s files
  • I change files on my computer by downloading, creating, modifying, or deleting them
  • I connect my hard drive and update the backup (which at the moment means wiping the entire hard drive, copying all my files back onto it, and then going to sleep so that when I wake up I have a synced backup)

I’d love to feel confident that my computer and my hard drive sync only what has changed. A cool side effect of version-controlling my whole computer would be the ability to revert silly changes or see what has changed.

I know Git-Annex exists, but I am assuming it works through snapshots and not through patches. I love the idea of Pijul’s simplicity. The whole “intuitive version control for anyone, including non-coders” is very attractive to me.

My hesitation comes in when I realize that Pijul keeps an internal representation of my files in a tree. If building that tree means copying my files into a version-control-friendly representation, my data could get duplicated and I could run out of space on my computer. Is that the case?

What about deleted files? I read the manual but still don’t really know if files are version-controlled in such a way that you can ‘undelete’ a file. If so, that’s great news for version control but terrible news for my storage (the trash bin is never effectively emptied!).

I know what I am asking is ironic: I want version control and at the same time want to forget entirely about deleted files (ideally I wouldn’t have to make the tradeoff of “storage vs. total version control”, but here I am, in a situation with limited resources).

I just wonder what Pijul’s answer to my use case is.

As it stands now, Pijul would be a bad fit for your use case.
The database in the .pijul folder takes up a lot of space. For Pijul itself, the repository is about 34 MB: 31 MB for .pijul and about 3 MB for all the source code. That is a factor of ten, for the relatively short history of Pijul.

Perhaps you should look into rsync or other backup tools.
Also, there is Snowtrack for artists (lots of big binary files).

That is actually a misunderstanding. The pristine does take up some space, but that only grows linearly with history. In the case of Pijul itself, the .pijul folder contains patches (15M, almost entirely compressed contents, most of it now deleted, as in most projects I believe), and a short “summary” of the application (16M, stored uncompressed for speed).
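For anyone who wants to check numbers like these on their own repository, here is a minimal Python sketch. It only adds up file sizes on disk and assumes nothing about what is inside the .pijul folder; the names `dir_size` and `size_breakdown` are made up for this example.

```python
import os

def dir_size(path):
    """Total size in bytes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total

def size_breakdown(repo_root, meta_dir=".pijul"):
    """Return (metadata_bytes, working_copy_bytes) for a repository root."""
    meta = dir_size(os.path.join(repo_root, meta_dir))
    everything = dir_size(repo_root)
    return meta, everything - meta
```

Calling `size_breakdown(".")` from the root of a repository returns the two byte counts, so the ratio between them is easy to compute.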

For binary files, we do have some specific features to help, such as patches made with rsync diff, which are roughly the same size as what is exchanged on the network when syncing with rsync. We also have partial patches, allowing you to avoid downloading dead parts of your files.
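To give an idea of why an rsync-style diff stays small, here is a simplified Python sketch of block matching. It is not Pijul’s implementation and not rsync’s exact algorithm: real rsync uses a rolling checksum so that testing every offset is cheap, whereas this version simply rehashes at each position. The function names and the tiny block size are chosen only for illustration.

```python
import hashlib

BLOCK = 4  # tiny block size for demonstration; real tools use much larger blocks

def make_delta(old: bytes, new: bytes, block: int = BLOCK):
    """Describe `new` as ('copy', offset_in_old) and ('data', literal_bytes) operations."""
    # Index every full, block-aligned block of the old data by its hash.
    index = {}
    for off in range(0, len(old) - block + 1, block):
        index.setdefault(hashlib.sha1(old[off:off + block]).digest(), off)

    ops, literal, i = [], bytearray(), 0
    while i < len(new):
        chunk = new[i:i + block]
        off = index.get(hashlib.sha1(chunk).digest()) if len(chunk) == block else None
        if off is not None and old[off:off + block] == chunk:
            # The block already exists in the old data: flush pending literals, emit a copy.
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", off))
            i += block
        else:
            # No match at this offset: keep one literal byte and try the next offset.
            literal.append(new[i])
            i += 1
    if literal:
        ops.append(("data", bytes(literal)))
    return ops

def apply_delta(old: bytes, ops, block: int = BLOCK) -> bytes:
    """Rebuild the new data from the old data plus the delta operations."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg:arg + block] if kind == "copy" else arg
    return bytes(out)
```

Only the `data` operations carry actual bytes, so when a large file changes in a few places, the delta holds little more than the changed bytes plus a list of block offsets.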

@joyously this Discourse is for discussing Pijul. Of course, comparisons with other tools are welcome if they contribute to the technical debate, for example by comparing algorithms.

Our view here is that version control is mostly about merging changes, not just storing history. Pijul’s algorithm tries to model that properly and intuitively.

Indeed, you have to choose whether you want to keep history or not. Keeping history will obviously be more expensive; there is no way around that. Pijul does duplicate files, like every single version control system (including the “for artists” ones, or other reincarnations of SVN or Git). One thing you could do is to use Pijul with the working copy on your main hard drive and the pristine on the backup drive. The pristine stores data compressed, but if you change your files a lot, that history will be saved. Pijul deduplicates as much as possible, but can’t go beyond what compression allows.

One thing we could do is to delete the “data part” of dead binary files, which would prevent you from ever recovering them. This isn’t technically hard, but it would need more time than we have. If we had an industrial use case, or a paying user for that feature, we could justify the time.

If you only want a storage solution and don’t care about history, then rsync might indeed work better. Or ZFS, but then you will also store some history.
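To make the “sync only what changed, keep no history” option concrete, here is a rough Python sketch of a one-way mirror. It is not rsync, only an illustration of the same basic idea: a file is copied only when it is missing on the backup or its size or modification time differs, and files that no longer exist in the source are deleted from the backup. The names `needs_copy` and `mirror` are made up for this example, and symlinks or a file being replaced by a directory of the same name are not handled.

```python
import os
import shutil

def needs_copy(src, dst):
    """Quick check: copy when dst is missing or differs in size or modification time."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)

def mirror(src_root, dst_root):
    """Make dst_root match src_root, touching only what changed. Returns (copied, deleted)."""
    copied, deleted = [], []
    for root, _dirs, files in os.walk(src_root):
        rel = os.path.relpath(root, src_root)
        os.makedirs(os.path.join(dst_root, rel), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(dst_root, rel, name)
            if needs_copy(src, dst):
                shutil.copy2(src, dst)  # copy2 also preserves the modification time
                copied.append(os.path.normpath(os.path.join(rel, name)))
    # Second pass, bottom-up: remove whatever no longer exists in the source.
    for root, dirs, files in os.walk(dst_root, topdown=False):
        rel = os.path.relpath(root, dst_root)
        for name in files:
            if not os.path.exists(os.path.join(src_root, rel, name)):
                os.remove(os.path.join(root, name))
                deleted.append(os.path.normpath(os.path.join(rel, name)))
        for name in dirs:
            if not os.path.isdir(os.path.join(src_root, rel, name)):
                os.rmdir(os.path.join(root, name))
    return copied, deleted
```

Running `mirror` a second time without changing anything copies and deletes nothing, which is the behaviour you want instead of wiping the drive and copying everything again. Note that a deleted file really is gone from the backup after the next run; that is the price of keeping no history.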

IDK what OS you are using, but ZFS is probably the most appropriate tool for your use case. It does require some familiarity with Unix systems administration, though. I have found it very helpful for keeping snapshots and backups.