A safe API for sanakirja

Hi all,

I was a bit bothered by the amount of unsafe code needed to interface to sanakirja, so I started thinking a bit about how to provide a safe API. The goal would be to do this without changing the on-disk format, just the interface.

The first step involves various changes to the main sanakirja types and traits in order to capture all the invalidation semantics correctly using lifetimes. I have some thoughts on that here, and I’d appreciate any comments, particularly if I’ve made some wrong assumptions about sanakirja’s internals.

The second step would involve using custom derive to make it easy for users to implement Representable for their own types.

2 Likes

I’m bothered by the same thing. There are two separate issues:

  • Unsafe code inside Sanakirja. Unfortunately, there isn’t much we can do about this. Unfortunately, your suggested API for load_page cannot work, because we often need to have several pages loaded at the same time, and mutate one of them. In particular, Sanakirja does copy-on-write most of the time, so it has to have at least one page to read and one page to write in the same transaction. Rust doesn’t let them have the same lifetime.

  • The need for unsafe functions when calling Sanakirja, in libpijul/src/backend.rs for instance. This is due to the fact that Rust lacks associated type constructors to account for the fact that the return value of a txn.get has the lifetime of the borrowed txn. A workaround would be to store only one type in the databases, for instance &[u8], but this is IMHO uglier than unsafe code here and there, limited to one module.

Yet another issue is that Sanakirja doesn’t really fit into the Rust memory model. In Sanakirja, it is perfectly safe to have mutable borrows to different databases in the same lifetime. However, the fact that these borrows are on different databases cannot be enforced by the typesystem, and cannot be enforced at runtime either.

All that said, I agree it would be cool to get rid of all the unsafes.
If at least the associated type constructor issue were fixed in the Rust compiler, that would eliminate all unsafe code in libpijul.

Thanks for the information! Let’s assume for now that we’re not worried about the unsafe code inside Sanakirja, and only the external interface.

I don’t understand your point about needing associated type constructors. I mean, I can see how they would make things neater, but I don’t see why they’re necessary. In particular, in the version the I posted, I “lifted” the lifetimes to the Representable trait, so I think that calling db.get returning a value that borrows the txn gives it the right lifetime. You can see examples of it at the bottom of the main function.

Anyway, I didn’t notice that my proposal allows multiple mutable references to the same database, so that’s definitely a huge soundness hole. Here are a couple of half-baked ideas to fix that:

  1. As far as I can tell, the only issue is with largish values that are stored “on-page”, right? Small values pose no problem because we can just copy them instead of borrowing the memory, and large values stored on their own pages have stable locations for the whole duration of the transaction. So we could introduce a StablyRepresentable trait that’s implemented for everything that has a stable location. A Db would implement StablyRepresentable if both K and V do. If a Db implements StablyRepresentable, then we can have multiple simultaneous mutable references to it, because nothing that it returns will ever be invalidated.
    I think this scheme might already be enough for libpijul, because the only dbs-inside-a-db seem to be the ones in branches, and those all store small values, so they could be stably representable.

  2. I wonder if there’s a solution using the copy-on-write semantics together with “child” transactions. The idea is that you could have a Txn and a MutTxn going at the same time. You could read from the Txn but not the MutTxn, and you could write to the MutTxn (without invalidating anything you’d read from the Txn). Now, if at some point you want to read from the “fresh” data then you create a “child” MutTxn, which has the effect of freezing the parent one. Then you can read from the parent while writing to the child.
    For libpijul, this would be useful when applying patches, because first you update the branches and then you need to read the updated branches in order to update all the inodes.

This is a really good point, although it might be slightly inconvenient from the user’s point of view (but certainly less so than using unsafes).

I don’t believe you can do that, because a MutTxn borrows its parent (that’s intentional). But I had never thought of using a Txn as a parent, so the interface might be relaxed there. This sounds unsafe, but let me get back to that after some sleep.

Ok, I finally have something to actually show here: I wrote a wrapper around sanakirja here, that attempts to expose a reasonable safe API. The main principle is that in order to read anything, you need to take a snapshot. There’s also one place – specifically, write-access to the root_dbs – that using runtime exclusivity checking (similar to what RefCell does).