
Introducing Sindri

· 6 min read

Introducing Sindri — A Modernized ZK DevOps and Proving Stack

We're proud to unveil Sindri, a modernized zero-knowledge proof (ZKP) DevOps and proving stack, designed with the developer in mind. Sindri represents our contribution to the developer ecosystem and a step towards transforming ZKPs from a niche cryptographic concept into an accessible, universal utility.

Sindri has already generated countless proofs across a myriad of popular proving systems for ZK-enabled apps. We've opened our beta for developers. Feel free to sign up here!

Persistent Problems with ZK Development Today

ZK development is mired in pitfalls from development through production. Many developers fall into the trap of thinking the hard work is done once they've decided on a proving framework, fine-tuned their circuits, and perhaps even self-provisioned "beefy" hosted machines from a cloud provider. The truth is that this is when many of the most complex, recurring tasks begin, and existing solutions fall short.

Today's developers grapple with an array of pervasive ZK-related DevOps and infrastructure challenges, ranging from managing large proving keys and network latency to handling cold starts and versioning. Other important issues include cost control, integrating hardware acceleration, cloud infrastructure, and instance type selection, and establishing intelligent queue logic for workloads. Additionally, constant oversight of infrastructure and ensuring availability are critical. Falling short in any of these areas could lead to failure, yet proficiency in each is now a basic necessity. This underscores the urgent need for modernized tools for today's ZK developers.

(If this sounds familiar to you, email us at hello@sindri.app, and let's work together on how we can help you!)

Sindri: Edge Cryptography

Sindri is our contribution towards transformative change at the DevOps and infrastructure level required to unlock greater developer freedom. Our initial product release allows any developer to easily access and scale production-caliber ZK proving from anywhere on the planet, across any stage in their development cycle, across the DSLs and proving systems they love, and across any ecosystem (blockchain, web2, etc.) with near-zero lines of code through the Sindri SDK and API.

Sindri is a paradigm shift toward what we call "Hybrid Edge Cryptography": offloading intensive cryptographic computation from the source device while keeping the results localized, verifiable, and high-performance for any device that requires them. Embracing this paradigm means continuously driving innovation toward the edge while maintaining the flexibility to shift workloads dynamically to serverless, high-performance operators as required. We've engineered Sindri to be modular, portable, and powerful to fit the current and future needs of this paradigm, and we'll share more soon.

For those eager to learn more about Sindri's architecture today, we encourage you to take a spin through our documentation.

Blockchain is the Substrate for ZK — the Future is Much Bigger

While ZK has clearly found its early product-market fit in blockchain, it is just the beginning of the Sindri story. Our vision is to become the bedrock orchestration layer for ZKP development, integrating seamlessly with various proving systems, hardware vendors, and critical proving participants, and catering to an ever-expanding array of applications and use cases across digital ecosystems:

The Emerging Service Layer: We anticipate the emergence of a service layer, or "Halo Applications," where applications provide multifaceted, auxiliary services to blockchains and rollups leveraging ZK. The earliest of these can be seen in the form of breakthrough technology such as coprocessors and zkML. Expansion at this layer is inevitable and will not only enrich blockchain functionality but also bridge gaps between various digital ecosystems.

AI, Trust, and the Expansion: We're entrusting AI with greater decision-making power as it becomes more advanced. Enhanced cryptographic trust mechanisms are key components for reliably auditing and verifying AI actions, ensuring transparency and accountability. They also enable broader, permissionless access to these AI systems. Sindri is at the forefront of this shift, building streamlined infrastructure for secure, efficient AI interactions.

Sindri Circuit Hub: An organized repository for open source circuits, offering a convenient starting point for circuit developers, aiming to enable greater composability within the circuit developer ecosystem. We will be seeding the Hub with the building blocks for verifiable PageRank in Noir, compression with Gnark, verifiable machine learning inference using Circom, and a float radius tutorial using the Axiom team's Halo2 tooling. We hope these inspire builders looking to take advantage of the unique use cases of ZK technology and we'll dig into each in more depth soon.

Permeation into Web2: The attributes that make ZKP appealing in blockchain environments can be cross-purposed into the Web2 landscape. We're actively engaged in conversations with organizations that seek to bring ZKP into the fray of environments that require high-integrity source data and computation.

Our Team and Advisors

Our team comprises PhD-level mathematicians, engineers, and builders with rich backgrounds in cryptography, network intrusion detection, software development and design, algorithmic optimization, deep learning, and digital signal processing. As developers, operators, and former academics, we bring deep expertise in secure and trustless networks and decades of experience building critical infrastructure. Our team's current and prior work includes creating groundbreaking machine learning techniques that enhanced information retrieval and set new industry standards, pioneering the first use of unsupervised machine learning for particle identification in high-energy nuclear physics, and algorithmic optimization for zero-knowledge processing.

$5M Seed Funding: Accelerating Our Vision

We are also thrilled to announce our $5M Seed funding, led by CoinFund with participation from strategics and angels across chip design and fabrication, signal processing, high-performance computation, and renowned figures in the fields of cryptography and blockchain. This investment will fuel our team expansion, developer network growth, and network seeding.

We're excited to join the CoinFund portfolio of world-class companies advancing zero-knowledge technology and the integration of machine learning, on and off-chain. Together, we're dedicated to innovating with enhanced developer tools and pushing the boundaries of ZK technology, aiming to make blockchains more intelligent and the world's data verifiable.

Join us in shaping the ZK development landscape. For career opportunities, visit our Careers Page. To integrate with Sindri and for partnership opportunities, contact us at hello@sindri.app.

Discover more about Sindri here and stay updated by following us on Sindri's X account.

Friendly Introduction to Sindri's API

· 8 min read

Zero-Knowledge Programs

Inspired by this thread.

Suppose you have a function that takes in two variables and returns some result:

\Large{ \text{result} = doSomething(x, y) }

There are a few consequences of executing doSomething the traditional way:

  1. Anyone wishing to verify result will need both inputs x and y
  2. They will also have to expend compute resources to run doSomething

In a trustless setting, you may want to keep some of the variables private or you may want to convince anyone that the result is legitimate without having to rerun the program. Zero-knowledge proofs modify the execution of your program to look more like this:

\Large{ \text{result}, \text{proof} = doSomething^{ZK}(\text{public}\ x,\ \text{private}\ y) }

so that:

  1. broadcasting result and proof reveals nothing about y
  2. verifying result only requires proof and x
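The cheap-verification half of this picture can be illustrated without any cryptography. The sketch below is a toy in Python (our own example, not part of Sindri's stack) using Freivalds' algorithm: a verifier checks a claimed matrix product far faster than recomputing it. Unlike a real ZKP it hides nothing about the inputs; it only demonstrates that verifying can be much cheaper than rerunning.

```python
import random

random.seed(0)  # deterministic for this demo

def matmul(A, B):
    """Naive O(n^3) multiply -- the expensive 'doSomething' a verifier wants to avoid."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def freivalds_verify(A, B, C, rounds=20):
    """Probabilistically check A @ B == C in O(n^2) work per round.

    Tests A(Bv) == Cv for random 0/1 vectors v; a wrong C slips
    through a single round with probability at most 1/2."""
    n = len(A)
    for _ in range(rounds):
        v = [random.randint(0, 1) for _ in range(n)]
        Bv = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        ABv = [sum(A[i][j] * Bv[j] for j in range(n)) for i in range(n)]
        Cv = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
        if ABv != Cv:
            return False
    return True

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul(A, B)
assert freivalds_verify(A, B, C)          # honest result accepted
bad = [row[:] for row in C]
bad[0][0] += 1
assert not freivalds_verify(A, B, bad)    # tampered result rejected
```

A full ZK proof system adds the second property on top of this one: the check succeeds without revealing the private inputs at all.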

A major caveat is that doSomething^ZK takes much longer to execute. Research on zero-knowledge proofs has proliferated in the last few years, and each line of work attempts to optimize at least one of the following:

  1. The runtime of doSomething^ZK
  2. The verification time given result and proof
  3. The size of proof

Proofs

In this section, we present a high-level overview of the KZG polynomial commitment scheme in order to explain why doSomething^ZK generally takes so long. KZG is a popular protocol choice for blockchain developers because the proof size is small and the verification work is minimized. Furthermore, while proving still needs to happen off-chain, verification is enabled on-chain.

One way to convince someone that you’ve run code is to show your work by publishing the execution trace, i.e. all of the intermediate steps and variable values. You could convince any verifier that you ran the program at the price of giving the verifier full knowledge of private inputs. Additionally, the size of such a proof is unappealing. However, this transformation is still a useful first step towards a zero-knowledge proof.

The diagram above depicts the process of transforming pseudocode into an execution trace, which is sometimes called arithmetization. We will call the right hand side a circuit: it is a matrix with each entry given by a formula representing some intermediate value of the program. For a concrete input pair x and y, filling in the actual values of the circuit is known as witness generation.
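As a concrete illustration, here is witness generation in Python for a hypothetical four-gate program of our own choosing; each dictionary entry corresponds to one wire value in the filled-in circuit for a specific input pair.

```python
def generate_witness(x, y):
    """Fill in the circuit's values for concrete inputs x and y.

    Toy program: out = x*x + x*y + 5, broken into one gate per line
    so that every intermediate wire value appears in the trace."""
    t1 = x * x        # gate 1: multiplication
    t2 = x * y        # gate 2: multiplication
    t3 = t1 + t2      # gate 3: addition
    out = t3 + 5      # gate 4: addition of a constant
    return {"x": x, "y": y, "t1": t1, "t2": t2, "t3": t3, "out": out}

witness = generate_witness(3, 4)
assert witness["out"] == 3 * 3 + 3 * 4 + 5   # 26
```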

The next step turns each column of the circuit into a polynomial. The benefit of this step is not initially obvious. In the diagram below, we have a vector of length n, which turns into a polynomial of degree n − 1, and it takes the same amount of data to describe both the vector and the polynomial. However, we do not intend to publish the entire polynomial for a column. Rather, we will convince a verifier that we know the global structure of the polynomial and that we can predict how it will behave in any local neighborhood.
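A minimal Python sketch of this column-to-polynomial step, using Lagrange interpolation over a toy 97-element field (real systems use prime fields of roughly 256 bits; the helper names here are our own, not Sindri's):

```python
P_MOD = 97  # toy prime field; production systems use ~256-bit primes

def poly_mul_linear(poly, a):
    """Multiply a polynomial (ascending coefficients) by (x - a), mod P_MOD."""
    out = [0] * (len(poly) + 1)
    for k, c in enumerate(poly):
        out[k + 1] = (out[k + 1] + c) % P_MOD
        out[k] = (out[k] - a * c) % P_MOD
    return out

def interpolate(values):
    """Return coefficients of the unique polynomial f with f(i) = values[i]."""
    n = len(values)
    coeffs = [0] * n
    for i, y in enumerate(values):
        # Build the Lagrange basis polynomial L_i(x) = prod_{j != i} (x - j).
        basis = [1]
        denom = 1
        for j in range(n):
            if j != i:
                basis = poly_mul_linear(basis, j)
                denom = denom * (i - j) % P_MOD
        scale = y * pow(denom, -1, P_MOD) % P_MOD  # modular inverse (Python 3.8+)
        for k, c in enumerate(basis):
            coeffs[k] = (coeffs[k] + scale * c) % P_MOD
    return coeffs

def evaluate(coeffs, x):
    """Horner evaluation mod P_MOD."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P_MOD
    return acc

column = [3, 1, 4, 1, 5]          # one column of the witness matrix
f = interpolate(column)           # 5 values -> degree-4 polynomial
assert all(evaluate(f, i) == v for i, v in enumerate(column))
```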

The first phase of the polynomial argument, the global structure, functions similarly to a hash. Random hash collisions are unlikely, meaning the hash output is a sufficiently identifying property of a polynomial. Additionally, the hash is irreversible. So, an adversary cannot learn much about the pre-image given only the hash output.

The second phase of the polynomial argument, the local structure, opens up the behavior of the polynomial around a random point (note that the prover does not get to choose this point). Consider first the claim: “I have a polynomial and when I plug 12 into it, I get 42 back.” In order to convince a hostile listener that I have full knowledge of this polynomial, I'll need to supply more information. As the diagram below indicates, if I know P(12) = 42, then P(x) − 42 will cross the x-axis at 12. There is a programmatic way (polynomial division) to retrieve a polynomial of degree n − 1 that crosses the x-axis in all the same places as P(x) − 42, except at the root 12. In short, in my role as the prover I need to supply Q so that Q(x)·(x − 12) = P(x) − 42. This is a very convincing argument to the verifier that I know the polynomial P(x). But, to stick to the principles of ZK, I need to withhold both P(x) and Q(x). We already have a hashed version of P, so we'll supply the hashed version of Q as well.
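The quotient-polynomial step can be sketched in a few lines of Python over the same toy prime field (illustrative only; a real KZG opening works over the curve's ~256-bit scalar field and publishes a commitment to Q rather than Q itself):

```python
P_MOD = 97  # toy prime field stand-in for a ~256-bit field

def evaluate(coeffs, x):
    """Horner evaluation of a polynomial (ascending coefficients) mod P_MOD."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P_MOD
    return acc

def quotient(coeffs, z, v):
    """Given P (ascending coeffs) with P(z) = v, return Q such that
    Q(x) * (x - z) == P(x) - v, via synthetic division by (x - z)."""
    a = coeffs[:]
    a[0] = (a[0] - v) % P_MOD       # P(x) - v has a root at z
    n = len(a) - 1
    q = [0] * n
    q[n - 1] = a[n] % P_MOD
    for k in range(n - 1, 0, -1):
        q[k - 1] = (a[k] + z * q[k]) % P_MOD
    remainder = (a[0] + z * q[0]) % P_MOD
    assert remainder == 0, "P(z) != v: no valid quotient exists"
    return q

P = [7, 0, 3, 2]              # P(x) = 2x^3 + 3x^2 + 7
z = 12
v = evaluate(P, z)            # the claimed opening value P(12)
Q = quotient(P, z, v)
# Spot-check the identity Q(x) * (x - z) == P(x) - v at several points.
for x in range(10):
    assert evaluate(Q, x) * (x - z) % P_MOD == (evaluate(P, x) - v) % P_MOD
```

Note that a dishonest prover claiming a wrong value v cannot produce such a Q: the `remainder == 0` assertion is exactly the statement that z is a root of P(x) − v.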

The final act is for the verifier to evaluate the proof they have received. The prover sent along commitments for the two polynomials P and Q, in addition to the claim that P evaluates to 42 when you plug in 12. The verifier is not able to directly check that the polynomial equation Q(x)·(x − 12) = P(x) − 42 holds, but they can check a “hashed” version of this equality, very similar to the way the commitments for P and Q were obtained.

That concludes the steps in a ZK proof. The “not quite zero-knowledge” comment in the diagram above deserves elaboration. If you were to rerun doSomething^ZK many times and reveal a different evaluation point of the polynomial each time (the (12, 42) pair), then an adversary would gain a little information about your polynomial P each time. It would take them only n such openings to completely determine P and work backwards to the private y value. KZG does not initially qualify as a zero-knowledge protocol, but there are straightforward revisions which grant this property.

The discussion above oversimplified a lot of the “moon math” that makes the final verify equation un-cheat-able. Because of that, the complexity might be unclear. Here are some more specific comments:

  1. The commitment function, or hash, is not fast. Another name for this operation is “multi-scalar multiplication,” or MSM. Each of the coefficients of your polynomial is represented with around 250 bits, and the τ values are elliptic curve points which require around 380 bits to represent. While MSM is O(n/log(n)), n is already large enough for this to be a heavy computational burden. Additionally, KZG requires more calls to MSM than the operation described next, so this is the bottleneck.
  2. Calculating Q(x) requires polynomial division. Fast Fourier transform (FFT) techniques place this computation in O(n·log(n)). In addition to the complexity introduced by large n (the degree of your polynomial), recall that the coefficients are each around 250 bits.

Sindri's API

Hopefully the discussion above has convinced you of the difficulties of producing ZK proofs. If you have your own doSomething and want to convert it to doSomething^ZK, then there are three major options available to you:

  1. Perform all the work yourself with your own machines. There are many open source libraries available to you that already provide accelerated proving implementations on certain hardware options. Selecting the implementation that best suits your hardware setup will require a technical deep dive. Additionally, you should not underestimate the work it will take to scale this acceleration beyond your initial prototype and reconfigure the entire stack as needed to stay up to date with a rapidly evolving backdrop.
  2. Plug doSomething into a ZK-VM that will outsource the entire process. As long as you write your program in a way the virtual machine can understand, it will automatically convert it into a doSomething^ZK circuit and host proofs for you. There may be limitations placed on the development of doSomething. For instance, a VM might have trouble with a multi-threaded program, which is very hard to prove. The biggest perk of competitive VM solutions is that proof acceleration is baked into their product, so you reap the benefits of an incredibly technical field with zero headaches. However, a crucial consideration with this option is that you have very limited control. The field of ZKP is still evolving, and the form doSomething^ZK takes currently may become obsolete given research advances. You must place a lot of trust in your VM solution to keep pace with those advances while providing stability to your own application.
  3. Compute the doSomething^ZK circuit definition and upload it to Sindri. Sindri's API offers a middle ground between the previous two options. You'll need to learn a little about the ZK space in order to pick the proving scheme and arithmetization that is right for your use case. Additionally, you'll have to roll up your sleeves and provide the circuit definition for us. After that, the responsibility of running proofs quickly and reliably falls to Sindri's infrastructure.


Sagittal MSM: A Streamlined MSM Implementation

· 4 min read
Lead Developer

TL;DR — Our streamlined approach, which has a deterministic ordering of bucket updates, 3 orders of magnitude fewer buckets, and removes the logic for the double/double-add, queuing, scheduling, and conflict detection/resolution, results in at least a 90% reduction in the number of EC point additions when compared to the pseudo-naive approach.

Sindri Labs developed a streamlined implementation of the multi-scalar multiplication (MSM) algorithm, which computes the inner product of a vector of large (e.g., 256-bit) scalars with a vector of points on an elliptic curve (EC) whose coordinates are in the finite field Fq, with q roughly 384 bits in size. Our improvements are targeted at GPU and FPGA implementations.

The well-known algorithm of Pippenger (the bucket algorithm) allows the computation to use a remarkably small number of EC point additions, but at a cost: proposed implementations require maintaining on the order of 2^20 partial sums (buckets) of EC points. Moreover, the updates to the partial sums are performed in a random order and can cause stalls in feeding the EC point addition pipeline if point additions involving the same bucket are not separated in time by at least the length of the pipeline. Conflict resolution strategies require relatively complex logic.
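For intuition, here is a toy Python version of the bucket algorithm in which plain integers stand in for EC points, so integer addition plays the role of EC point addition (window size and bit widths are shrunk far below production values, and the code is our own sketch, not Sindri's implementation):

```python
WINDOW = 4  # real implementations use much wider windows (hence ~2^20 buckets)

def bucket_msm(scalars, points, bits=16):
    """Compute sum(s * g) using only additions, via Pippenger-style buckets.

    Integers stand in for EC points: `+` is "EC point addition" and the
    final shift stands in for repeated point doubling."""
    total = 0
    for w in range(0, bits, WINDOW):
        # One window: drop each point into a bucket chosen by its scalar's digit.
        buckets = [0] * (1 << WINDOW)
        for s, g in zip(scalars, points):
            digit = (s >> w) & ((1 << WINDOW) - 1)
            buckets[digit] += g              # one "point addition" per point
        # Aggregate sum_j j * buckets[j] with the running-sum trick.
        running, window_sum = 0, 0
        for b in reversed(buckets[1:]):
            running += b
            window_sum += running
        total += window_sum << w             # shift = repeated doubling
    return total

scalars = [13, 200, 77, 1024]
points = [5, 9, 2, 11]
assert bucket_msm(scalars, points) == sum(s * g for s, g in zip(scalars, points))
```

In the real algorithm the bucket updates arrive in scalar-dependent (effectively random) order, which is exactly what creates the pipeline-conflict problem described above.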

Our proposed solution falls short of the theoretical minimum number of EC point additions required for an MSM with vectors of length in excess of 2^20, but several advantages make up for that.

Our solution:

  • Has a simple, deterministic ordering of bucket updates. This guarantees no conflicts and obviates the need for logic to detect and resolve them (queues and schedulers) that the Pippenger approach requires.
  • Removes the logic of the double or double-add method for performing an efficient scalar multiply using additions.
  • Utilizes projective coordinates in Montgomery form.
  • Reduces the memory footprint for the buckets by three orders of magnitude: it uses 2^8 buckets instead of 2^20. With so few buckets, aggregating them in the final phase is of negligible cost, so we can move the finalized buckets back to the host (CPU) and aggregate them there. On an FPGA implementation, this frees block RAM because we do not have to implement the aggregation algorithm on the device.
  • Allows for the precomputation of EC point additions. This is a one-time computation for all proofs that use the same common reference string (CRS). It increases the size of the CRS, so it comes at the expense of either a higher data transfer rate to the device or a larger amount of device RAM. The amount of precomputed EC point additions is an adjustable parameter.
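The bullet points above can be sketched in miniature. In this toy Python version (integers again standing in for EC points, names and parameters our own, not the production design), precomputing shifted copies of each point, which would live in the enlarged CRS, lets every 8-bit digit of every scalar fall into one shared set of 2^8 buckets, updated in a fixed, conflict-free order:

```python
WINDOW = 8        # 2^8 buckets total
SCALAR_BITS = 16  # 253 in the real setting

def precompute(points):
    """For each point G, store [G, 2^8 * G, 2^16 * G, ...].

    The integer shift stands in for repeated EC doubling; this is done
    once per common reference string, trading CRS size for runtime."""
    windows = SCALAR_BITS // WINDOW
    return [[g << (WINDOW * j) for j in range(windows)] for g in points]

def single_bucket_set_msm(scalars, pre):
    buckets = [0] * (1 << WINDOW)  # one shared set of 2^8 buckets, not 2^20
    for s, shifted in zip(scalars, pre):
        for j, g in enumerate(shifted):  # deterministic order: no conflicts
            digit = (s >> (WINDOW * j)) & ((1 << WINDOW) - 1)
            buckets[digit] += g
    # Tiny final aggregation -- cheap enough to hand back to the CPU host.
    running, result = 0, 0
    for b in reversed(buckets[1:]):
        running += b
        result += running
    return result

scalars = [13, 40000, 777]
points = [5, 9, 2]
pre = precompute(points)
assert single_bucket_set_msm(scalars, pre) == sum(s * g for s, g in zip(scalars, points))
```

Because the precomputation absorbs the per-window doublings, the inner loop consists purely of additions into a small, fixed bucket array, which is the property that simplifies the control logic.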

The combined effect of a reduced memory footprint and simpler control logic offers an opportunity to realize multiple EC point addition pipelines on the FPGA, with a consequent improvement from parallelization. We also anticipate that our EC point addition pipeline will be shorter than the schedule of partial-sum updates, which means we can break up particular bottlenecks in the pipeline where doing so allows for a higher clock rate.

Furthermore, this streamlined approach is more adaptable to GPUs and may open the door to more performant GPU zk-prover systems: with queues, scheduling functions, conflict resolution functions, caching, and double/double-add logic all omitted, our approach allows for much greater GPU utilization.

Comparison Against the Pseudo-Naive Approach

Pseudo-naive approach problem definition

Each three-dimensional point, G_k, of an array of points G needs to be scaled by adding it to itself a corresponding number of times, s_k, taken from an array s of the same length, N, as G. The sum of all the scaled points, R_f, needs to be computed and returned.

Let

\large{ R_f = \sum_{k=0}^{N-1} s_k \cdot G_k }

where N is on the order of 100 million, G_k is a three-element array (x, y, z) of 377-bit integers, s_k is a single 253-bit integer, and s_k · G_k is an elliptic curve point scalar multiplication that utilizes the double/double-add method, which uses at most 2·⌈log₂ s_k⌉ EC point additions (about 1.5·⌈log₂ s_k⌉ on average).
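A toy Python version of this baseline, with integers standing in for EC points and an explicit counter for the "EC point additions" (a doubling counted as one addition of a point to itself):

```python
def double_and_add_msm(scalars, points, bits=253):
    """Pseudo-naive MSM: a separate double/double-add scalar multiply
    per term, folding each result into a running total."""
    additions = 0
    total = 0
    for s, g in zip(scalars, points):
        acc = 0
        for bit in reversed(range(bits)):
            if acc != 0:
                acc = acc + acc      # double
                additions += 1
            if (s >> bit) & 1:
                acc = acc + g        # add
                additions += 1
        total += acc                 # fold this term into the total
        additions += 1
    return total, additions

scalars = [13, 200, 77]
points = [5, 9, 2]
total, adds = double_and_add_msm(scalars, points, bits=8)
assert total == sum(s * g for s, g in zip(scalars, points))
```

Counting `additions` for full-width scalars makes the roughly 2·⌈log₂ s_k⌉ additions per term concrete, and gives the baseline against which the bucket-based reduction below is measured.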

Comparison

Our approach results in at least a 90% reduction in the number of EC point additions when compared to the pseudo-naive approach described above.

Want to chat? Give us a visit at Sindri Labs.

References

Aasaraai, Kaveh, et al. FPGA Acceleration of Multi-Scalar Multiplication: CycloneMSM. 2022, https://eprint.iacr.org/2022/1396.

Lu, Tao, et al. CuZK: Accelerating Zero-Knowledge Proof with a Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs. 2022, https://eprint.iacr.org/2022/1321.

Xavier, Charles F. pipeMSM: Hardware Acceleration for Multi-Scalar Multiplication. 2022, https://eprint.iacr.org/2022/999.