LightPIR. Privacy-Preserving Route Discovery for Lightning (paper summary and analysis)
Lightning is currently source-routed. This means that each sender does a local route search on the full network graph. This may become unsustainable as Lightning grows grows. Naively outsourcing route discovery to dedicated servers harms privacy: the servers know who is paying whom. The LightPIR paper proposes a solution. The authors combine private information retrieval with all-pairs-shortest-path pre-computation with hub labeling, optimized for real LN topology. In this post, I summarize the LightPIR protocol and outline the potential first steps to turn it from a research prototype to a real-world implementation.
Multi-hop routing is a key feature of Lightning. Payments between users with no shared channels allows Lightning to truly perform as a network: every (public) channel provides liquidity not only its counterparts but also to everyone else. Currently, Alice (the sender) defines the entire route to Bob (the receiver). To do so, Alice stores a snapshot of the public network based on P2P gossip.
As LN grows, it may become infeasible for each node to store a full network snapshot. Although the current LN size (18410 nodes and 82128 channels at the time of this writing, as per ACINQ) doesn’t sound like a lot by modern hardware standards, if we expect Lightning to grow by orders of magnitude, source routing may become challenging. Indeed, we want Lightning to work even on an IoT device with a tiny battery, little storage, and sporadic Internet connection. This justifies the need for alternative efficient route discovery algorithms.
A naive trust-based solution
Let’s start with a naive trust-based solution and then improve it in both efficiency and privacy. Imagine we have specialized nodes (servers) to do route discovery. Generally, we can evaluate a client-server protocol using many metrics:
- server-side computation;
- server-side storage;
- client-side computation;
- client-side storage;
There are two approaches to server-based route discovery:
- a server calculates a route from Alice to Bob when Alice asks for it (just-in-time);
- a server pre-calculates shortest routes for each pair of nodes and looks up the appropriate route when requested. This approach is called “all pairs shortest paths”, or APSP.
The just-in-time approach is less demanding but may introduce higher latency for the client. APSP requires more computation and storage from the server.
The protocols we’ve considered so far are not private. The server knows what route a client requests (that is, who is paying whom). In contrast, the current source-based route discovery is relatively private. Our goal is to come up with a protocol that would combine the efficiency of APSP approaches with the privacy of client-side route discovery.
Private information retrieval (addressing privacy)
Let’s put LN specifics aside for a moment.
Consider a server that stores a database
DB, which is an array of boolean values.
The client wants to obtain the
A straightforward protocol would be: the client sends
i to the server, and the server replies with
Clearly, the server knows which element the client has requested.
Can a client obtain
DB[i] without the server knowing
Theoretically, the only solution is to send the whole database to the client.
This is hardly practical.
Instead of the problem we know is practically unsolvable, let’s bend the rules a bit and consider a related problem under additional security assumptions.
Here is where private information retrieval (PIR) comes in. There are two branches of PIR: computational (CPIR) and information-theoretical (IT-PIR). CPIR allows for PIR with a single server but is slower and involves advanced cryptography (keyword: fully homomorphic encryption). IT-PIR, on the other hand, is fast, uses cheap cryptography, but only works with multiple servers. LightPIR is based on IT-PIR, so we’ll use the term PIR to refer to IT-PIR unless stated otherwise.
Consider two servers holding identical database copies. A simple PIR protocol works under an additional security assumption: the servers don’t collude.
- the client generates a random string
rand another string
r'that only differs from
- the client sends
rto the first server;
- the first server selects elements at indices
r[j]=1and responds with a single bit
dthat is a XOR of those elements;
- the client sends
r'to the second server;
- the second server responds analogously with a single bit
- the client locally computes
d XOR d' = D[i].
In other words, the client obtains two bits that are XOR’s of nearly identical subsets of the database elements.
The only difference between those subsets is that one includes
D[i] and the other doesn’t.
XOR’ing them together, the client discovers
This simple PIR algorithm is not very efficient.
Server-side computation cost and the communication cost are linear in the database size.
Even though each response is a single bit long, the request
r is as long as the database itself.
Advanced PIR protocols are more efficient. One idea is to consider the database as a matrix instead of a linear array. This allows to improve communication complexity by increasing the number of servers. For instance, with four non-colluding servers, the client only sends request of length of square root of the number of elements. For more details, see this introductory lecture.
Hub labeling (addressing efficiency)
Any digital map service finds a route across a continent in seconds. Isn’t it amazing, considering that a road graph may contain millions of points?
Turns out, there is a lot of space for optimizations in route discovery algorithms that exploit the properties of particular graphs. In particular, roads are not equally likely to occur on a random route. Most routes (except very short ones) follow this pattern:
- drive to the nearest highway entrance;
- drive along the highway;
- exit the highway and drive to your destination.
Route discovery algorithms have been heavily optimized based on the fact that highways are involved in a disproportionately large share of long routes. A key optimization is called hub labeling. Informally, a hub is a node that is part of many sufficiently long shortest paths. (Each highlighted word in the previous sentence is of course formally defined.) Remember an earlier APSP method, where the server pre-computes routes for each pair of nodes? Let’s now modify APSP with hubs in mind.
Let each server store, for each node, a list of its hubs and the shortest route to each of them. Finding a route from Alice to Bob involves these steps:
- find a node that both Alice and Bob consider their hub;
- look up the shortest route from Alice to the hub;
- look up the shortest route from the hub to Bob;
- concatenate the two partial routes to get the full route.
An attentive reader might spot an issue here. What happens if Alice and Bob have no hubs in common? To avoid this scenario, hub sets are chosen to intersect all shortest paths of non-trivial length. This task is equivalent to the set cover problem, which is NP-complete. Heuristic algorithms do the job well enough for practical applications.
A (heuristic) hub set labeling algorithm computes shortest paths between any pair of nodes and assign hub labels. The hub sets generated by a heuristic are valid (i.e., they allow for lookup-and-concatenate route discovery) but may be suboptimal (for example, include more hubs than necessary). Still, the result is good enough for practical applications.
LightPIR: putting this all together
At long last, let’s discuss the contributions of the LightPIR paper! The authors combine PIR and hub labeling (HL) for route discovery in payment channel networks. Assuming three types of nodes (a content provider, servers, and clients), the protocol goes as follows:
- the content provider (CP) creates a network snapshot;
- CP populates the graph with hub sets and shortest routes to hubs;
- CP and sends its copies to the servers;
- clients query multiple servers using a PIR protocol.
The novelty of the paper is an improved HL algorithm tailored to the LN topology. The LN-optimized HL algorithm goes as follows:
Knodes with the highest degree and put them in the hub set for all nodes;
- exclude those nodes from the graph;
- find the remaining hubs based on the pruned graph.
Intuitively, the LightPIR exploits the fact that the LN is even more centralized than highway networks.
Instead of repeatedly “discovering”, for instance, that the ACINQ node (the most connected public node at the time of this writing) belongs to the hub set for Alice, and for Bob, and for Carol, etc, the proposed algorithm simply adds ACINQ to all hub sets.
The same applies to
K most connected nodes, which are then temporarily excluded from the graph.
Additional “node-specific” hubs are then discovered on the pruned graph.
How many most-connected nodes shall we pre-select into all hub sets?
The authors run experiments on historic LN snapshots and conclude that
K should be around 100.
A note on dataset validity
Initially, I had doubts about the dataset the paper is based on. The data source is Christian Decker’s Lightning Network Gossip repository, which is trustworthy. However, the following phrase raised suspicion:
although the network size almost doubles from March 2019 to January 2021, the number of nodes in the largest SCC remains pretty consistent across all snapshots (≈ 2500 nodes)
This seemingly contradicted other sources (like Bitcoin Visuals), which report a consistent growth in the number of Lightning nodes during that period. Moreover, in the snapshots I took for my own research in 2021, 99% of nodes belonged to the main connected component.
In fact, there is no contradiction. The LightPIR paper uses a directed graph model and only considers the main strongly connected component (SCC). The main SCC is the largest set of nodes where each pair of nodes has a directed route. The (non-stront) connected component, in contrast, doesn’t take directionality into account, and hence may be much larger than the main SCC.
Even if there is no contradiction, it’s unclear why the authors chose to only consider the main SCC, which contains only between 39% and 72% of public nodes. More thought is required to more accurately reflect the real-world network structure. Would the results change if we consider the main non-strongly connected component?
Assuming LightPIR is a good idea, how should it be converted into a practically implementable proposal for Lightning? I see a few issues that should be addressed.
PIR protocols are based on the assumption that servers don’t collude. The paper that introduced PIR (Chor et al., 1995) justifies this assumption from the reputational standpoint:
We assume that the servers do not collude in trying to violate the user’s privacy. <…> A detected violation of the privacy guarantees will result in severe damage to the server. It is as if a bank were caught in fraud. <…> In the rare case where the user values its potential loss as more substantial than the server’s risk, the user should not use a PIR scheme in which privacy depends on a noncollusion assumption.
This seemingly contradicts the permissionless nature of Lightning. However, Lightning nodes already have a somewhat persistent identity and reputation. This is especially relevant for Lightning Service Providers, or LSPs (a vague term for large nodes providing LN services professionally). If LSPs put their reputation on the line and are punished for colluding, the scheme might work in practice. The key question then is: how to provably detect collusion?
IT-PIR vs CPIR
As mentioned earlier, there are two kinds of PIR: information-theoretical (IT-PIR) and computational (CPIR). The IT-PIR approach, which LightPIR is based on, achieves exactly zero information leakage even if the attacker has infinite computing power. CPIR, on the other hand, allows for a single-server PIR, assuming the attacker’s resources are bounded. The paper doesn’t justify the choice of IT-PIR as opposed to CPIR.
The aforementioned 1995 paper writes on the topic of non-collusion assumption:
The single-server computational PIR scheme of Kushilevitz and Ostrovsky (1997) addresses this concern.
Wouldn’t CPIR approach be more suitable for Lightning? A bound on attacker’s resources is the assumption under which most real-world systems, including Bitcoin and Lightning, operate anyway. What are the downsides of CPIR in this context?
A single source of network graph data
The LightPIR model assumes a single source of truth about network topology, copiled by a dedicated entity (the content provider). Lightning, in contrast, has no canonical network view: nodes compose their own graphs based on gossip. (Yet, everyone is very likely to be aware of the most active and connected nodes.)
There might be two ways to reconcile theory and practice. First, we could amend the theory to allow database copies to be slightly different. The servers then could independently compile their graphs based on gossip. Second, LN nodes could opt-in for a centralized data source. For instance, a well-known LSP (think 1ML, LNBIG, or ACINQ) would periodically publish fresh network snapshots. Servers (maintained by, for instance, by wallet providers) would download the snapshot and announce to their clients that they provide PIR-based route discovery based on the graph from a certain date. Clients would then query a random subset of wallet providers to privately retrieve routes.
A common route quality metric
LightPIR assumes that all clients use the same metric to evaluate route quality. In the simplest case (ignoring fees), the model provides shortest routes. If we use fees as edge costs, we’d be talking about cheapest routes. On the one hand, the cheapest routes model may capture the desires of many clients. On the other hand, this assumption may open clients to attacks where the adversary attracts payments by advertising low fees. On top of that, total fees may be just one of the components of route quality function. Some clients may want to also optimize for:
- route length (longer routes increase the chance of payment failure);
- success probability (related to but not fully defined by route length);
- avoiding specific nodes (known adversarial nodes, nodes located in a certain region, etc).
We may bridge this gap between theory and practice in one of two ways (or both):
- amend the theory to allow for user-defined route quality metrics;
- use the protocol as is for users who aim for a simple metric (such as minimal fees).
The model doesn’t account for amounts (and fees?)
The graph model as presented in the paper is quite simplified. First, it doesn’t account for payment amounts and channel capacities. Second, it doesn’t account for fees (though I’m not exactly sure).
Also, the key challenge in scaling LN routing is accommodating dynamic parts of the graph. The nodes and channels are somewhat static, whereas fee and policy updates may be frequent. How should LightPIR be modified to not only account for fees but also to reflect ongoing fee updates?
LightPIR is a route discovery protocol for payment channel networks that combines private information retrieval and optimized hub labeling. It builds on prior work in route discovery for road networks and achieves higher efficiency by exploiting LN’s centralized topology. Simulations based on historical LN snapshots indicate how to best parameterize the algorithm.
LightPIR is an example of research that applies prior scientific findings in new contexts. In this case, results on private database lookup and routing for road networks have been applied to payment channel networks. This approach is valuable and underappreciated. Most likely, there are lots of valuable ideas in scientific literature from long before Bitcoin came along, waiting to be applied in modern development. However, at least in the case of LightPIR, more effort is required to turn this protocol into an implementation-ready proposal.
Other papers on routing in payment channel networks:
Some more practice-oriented proposals:
Unifying these and other related ideas into a practical protocol for Lightning route discovery remains a direction for future work.
Notice a subtlety here: we consider Bob’s hub set but we need a (partial) route to Bob, not from Bob. This doesn’t matter if all routes are bidirectional. Does this explain why the authors only consider the main strongly connected component (see below)? ↩
It’s not exactly clear from the paper whether the CP or the servers do the hub labeling, but I would imagine it makes little sense to repeat the work on each server, as the result will be the same. ↩
It’s not at all surprising that the optimal number of most-connected nodes
Kis nearly the same across all snapshots (Figure 3): running the same algorithm on nearly identical inputs can be expected to produce nearly identical results. ↩
The smallest and largest snapshots contained 3480 and 6376 nodes, respectively. ↩
Section 2 introduces a directed graph with weighted edges, but Section 7 states that the authors “do not yet take into account fees”, and that their work is optimized for “sending a fixed amount of Bitcoin”. Also, Section 4.2.2 says that the labeling algorithm “requires that the path weights are unique integers, but this is easily achieved with insignificant perturbations of edge weights (in our case we initially set the weights to be the channel base fees and then perturbed them)”. It’s unclear to me whether edges have weights after all, and if so, whether weights equal fees. Even if base fees are considered as edge weights, an accurate model of LN fees must also account for their proportional component, which means accounting for amounts, capacities, and balances. ↩