Store channels per-peer (ChannelManager)

Jun 21, 2022

https://github.com/lightningdevkit/rust-lightning/pull/1507

Host: ariard - PR author: ViktorTigerstrom

The PR branch HEAD was ae665ce at the time of this review club meeting.

Notes

The core component of LDK is the ChannelManager. This structure gathers the channels states and the pending messages (forward HTLCs, outbound payments, inbound payments) to be applied triggering a state transition.
Beyond, ChannelManager holds on-chain data (e.g best block seen) and user setting (node config) as those fields alter the processing of the state transitions (e.g reject OpenChannel because funding_satoshis is under user’s min_funding_satoshis).
Zooming, 2 notable members of ChannelManager are channel_state and per_peer_state. ChannelHodler stores currently few different elements such as the channels, the HTLC to be forwarded and the LDK internal processing events. PeerState only stores the per-peer features flag for now.
As of today, ChannelHolder is protected by a single Mutex requiring in-order processing of the events, and thus only one channel can execute state transition at the same time. #1507 and the split-off #1542 are WIP enabling better parallelization of LDK channel processing.

Questions

1) What’s the current processing flow from receiving an inbound message in handle_message() to sending to the appropriate Channel’s method ? How many locks are taken and when ? 2) This PR is moving the HashMap<[u8; 32], Channel<Signer>> from ChannelHolder to PeerState. What’s the purpose of this move ? Does it change anything for a LDK user ? 3) Why introducing a new id_to_peer in ChannelManager ? 4) What is backward compatibility and what is LDK’s project policy on structure serialization backward compatibility ? 5) The PR is using FairRwLock to protect PeerState. What are the advantages of FairRwLock compare to Rust’s traditional RwLock. 6) What could be the future internal refactoring after this PR to achieve #422 ? 7) Apart of HTLC/message processing, what are the other perfomance bottlenecks of a LN node ?