Generalized birthday bounds

There exist efficient collisions in the version identifier described in the manual, from either Wagner’s A Generalized Birthday Problem (2002) or On the (in)security of ROS (2020).

In essence, there is no benefit gained from the scalar multiplication, so an adversary needs only a collision among many additions multiplications in a finite field, which winds up not unlike XOR.

Your application needs a cryptographic accumulator, for which a group of unknown order typically works better than others. At least some implementations exist but I donno their quality.

We’ve recent improvements against class groups, so you’ll need an RSA group with a trusted unknown modulus. RSA trusted setups wind up fucking evil, so I’d suggest using the RSA-2048 challenge until Ethereum Research runs a new trusted setup.

As your accumulator does not require membership proofs, there might exist some interesting variations upon the pairing based accumulators that avoid trusted setups, but not sure off the top of my head.

As an aside, these generalized birthday bounds were hard to fix in threshold/multi Schnorr signatures [1, 2] and blind Schnorr signatures [3].

Where is that description you found? These papers seem unrelated to the current state algorithm.

An adversary starts with a status C and wants two sequences of patches with hashes a_i and b_i so that (a_1 * … * a_n) C passes review and (b_1 * … * b_n) C is malicious. (Edit: Incorrectly wrote + instead of * at first)

If n >= 256 then the second papers gives a fast polynomial time algorithm for this, and Wagner’s older paper gives a practical but subexponential time algorithm for smaller n. (Edit: it being * makes this more complex)

It’s simply that patches’ hashes are de facto being combined by addition in the scalar field of the elliptic curve, even if you’re actually applying them via sequential scalar multiplications. Your security argument assumes the adversary constructs patches one at a time, but they could easily construct all 2 * 256 patches together.

I’m not sure I follow you. The current state is exponentiated with exponent next_patch_hash, not merely multiplied.

If not, this is a bug.

As a rule, all mathematicians write elliptic curves additively, or else use some other notation. In other words, the discrete logarithm problem should be called the discrete division problem in elliptic curves. Almost all younger cryptography implementors who took real classes on elliptic curves prefer additive notation too.

A Schnorr group is an older construction of a group with a hard discrete logarithm problem. As a political/pedagogical move, the early elliptic curve cryptography papers wrote the elliptic curves as multiplicative groups, and discussed hardness of the discrete logarithm problem, so many protocol oriented cryptographers who avoid implementation details still write elliptic curves multiplicatively.

You need to know the conventions of whatever library you’re using, but anything modern like curve25519-dalek or arkworks-rs uses additive notation.

There is however an issue with my writing addition above, so I’ll go edit those comment to keep some sanity to the thread…

Alright fixed, apologies for doing quick comments without checking the details before, anyways…

We’ve an adversary who wants multiplicative collisions in the scalar field. I do not know right now if the ROS paper gives a polynomial time algorithm, but I’m fairly sure Wagner’s paper still gives a subexponential time algorithm, although not entirely sure how dangerous.

In principle, if you’ve elliptic curve with a scalar field of prime order q then if q has only some large prime divisors then attacking this might become hard, except your q only has around 250ish bits, and we’d typically choose q around 1500 bits for DLOG security here.

Ain’t quite as trivial a break as I initially thought, but this should still wind up broken.

I’ve messaged one of the ROS paper authors so we can have an adult in the room. :wink:

Alright, accumulators have stronger properties than required: An accumulator would permit succinct proofs that a patch was included under a version id.

We’re only fishing for a collision resistance property here though, so yes some 1500 bit elliptic curves likely make this secure. Crazy!

I do still think attacks should exist upon constructions with typical elliptic curves.

One thing I don’t understand is that the same attack seems to be possible on elliptic curve signatures, right?

It only impacts MPCs for signatures that face “parallel sessions”, not regular signatures where a signer signs in one step.

Example: Alice and Eve secret share a Schnorr key (say a scalar for use in ed25519). Eve opens 256 parallel signing sessions with Alice. Alice sends her nince contribution for phase 1 of each of these. Eve runs a computation considering Alice nonce contributions, which tells Even how to construct her own phase 1 nonce contributions so that Alice’s contributions create a forgery. We’ve proven secure a protocol that prevents this. but the proofs are difficult.

Anyways, there exists an easier way to do what you want here:

Instead of using scalar multiplications, each patch could be hashed-to-curve and then add them up.

A hash-to-curve is a hash function that outputs curve points so that their discrete log/division with respect to other points remains unknown.

An example is curve25519_dalek::RistrettoPoint::from_hash. There exists an IRTF effort to standardize hash-to-curve for most curves, but the old slow algorithm is to seed a PRNG and then decode each 32 bytes of output as a curve point until one decodes correctly.

If HtC is a hash-to-curve then you’ve collision resistance in the map from patch_1,..,patch_n to HtC(patch_1) + HtC(patch_2) + ... + HtC(patch_n) across all n, which is what you wanted here.

It’s not an accumulator however, so someone could claim a shallow clone repository state corresponds to X = Y - HtC(bad), then show themselves adding the bad patch, and finally claim this corresponds to some other state Y.

Also, all these designs have an issue that shallow clones cannot verify their version id, but an accumulator makes this less bad maybe. I’ve no idea how pijul handled shallow clones though.

Thanks for taking the time to explain! I’d be interested in your “parallel sessions” proof, if you have a link to a paper.

Pijul doesn’t really have shallow clones at the moment, but this is something I’d like to add. There are two related mechanisms:

  • Patches are separable from their contents: you still need to get the operations from the patch, but the actual bytes are only downloaded if they’re still alive. An example of this is when working on large binary files (video games are the main thing I was thinking about), you may find yourself recording many full versions of the file. Pijul allows you to only download a patch saying “Alice replaced 2Gb coming from patch X with 2 new Gb”, and not the actual 2Gb. For many versions in a row, that can be helpful. This is verified in the following way: the content is hashed, the hash is included in a “summary” describing only the operations, and the patch’s hash is that summary’s hash.

  • The other one is a major project I have for the future: I want to use tags to compress parts of the storage. This is already how tags work, but tags are entirely standalone, and this project would “chain” them together to save disk space. One issue with this is that regardless of cryptography, it is hard to prove that applying N patches gives the claimed result without running the sequence of applications. This would still be very helpful to navigate history and to save disk space, and you only need to either trust the server, or verify the claim once. Unfortunately, I have so many other projects around Pijul that I don’t want to implement this one before people actually face the disk space problem.

Also, I agree that accumulators sound fantastic.

Above the [1, 2] and [3] link papers that resolve parallel sessions problems. Also one later proof exists. We’ve long known of the generalized birthday problem for blind Schnorr but Cryptology ePrint Archive: Report 2018/417 - On the Security of Two-Round Multi-Signatures broke almost all pre-2018 threshold/multi- Schnorr MPCs, which sparked [1,2]. As the discoverer of one of these proofs, I doubt they’re too relevant here…

Afaik version control users still typically care first & foremost about the whole repository hash, so pijul surely still represents those somehow, no? Also, these whole repository hashes would capture the patch history too like in git?

At least the elliptic curve summation of patches hashed-to-the-curve gives an inexpensive to transform representative name for a repository state though. Are you saying shallow clones might know all the patch headers, but not all the patch bodies?

Any thoughts about situations where the history might be a security issue, like if I invite a user to a chat then maybe I do not want them to learn the history. Is it sensible to invite a user to a collaborative document, but want to hide the edit history before they join? Is this just better done with some snapshot or even a new repository?

Accumulators alone do not unify the best of both worlds here because an adversary can still give a false accumulator value, but…

An accumulator could key a PRF thus giving a random oracle, so it could be used to sample the document history or text, but doing so does not give much since we’re looking for a malicious needle in a haystack, not just some proof that some old commits exist. A rateless erasure code could magnify this somewhat, but no idea if that’s useful, very dubious…