Roadmap

Improve Security

Summary

Much work remains to harden Urbit to the point where it can not only preserve its memory safety and cryptographic safety, but also fend off denial of service attacks by dedicated assailants. We will raise the bar for security incrementally, making the system resilient against increasingly determined and well-resourced attackers.

Securing Urbit breaks down coarsely into:

  • securing the software supply chain ("the components, activities, and practices involved in the creation and deployment of software"),
  • mitigating denial of service attacks ("malicious attempts to overwhelm an online service and render it unusable"), and
  • preventing penetration (e.g. corrupting memory to hijack the virtual machine).

We will first secure the system against an individual or small group intending to casually DoS the network by spamming packets. This work will largely consist of many small modifications to process incoming packets more efficiently, improve monitoring and incident response, and ban malicious IP addresses and Urbit ships.

Tlon is developing an incident response plan, and basic logging is being developed, which is the first step in DoS protection. The next step in DoS protection is to validate incoming Ames packets in Vere -- also a high priority.

Concurrently, we will tighten down intra-Arvo security. Once two known issues are addressed -- by adding a basic userspace permissioning system and closing a vulnerability known as the "zapgal type hole" -- Arvo should be secure against penetration, assuming a secure runtime. Mitigating DoS in Arvo and all its userspace agents will require more work, including interaction with Vere.

Along with tightening Arvo security and basic DoS protection, basic software supply chain protection is being established. This entails using best practices for managing permissions to interact with infrastructure nodes (live galaxies and stars) and places that distribute software, including Urbit's GitHub repository and the HTTP endpoints from which executables -- Vere binaries and "pills" (Nock bootloaders) -- are downloaded. We will also need to build protections against DNS spoofing that would allow an attacker to feed a ship bunk PKI data from a fake Ethereum node.

Defending the system against an experienced team determined to find memory corruption in the runtime to gain remote code execution will require the most lead time. We can start on this soon by improving the quality of the architecture and code in the most critical parts of the runtime; we also plan to experiment with "design for verification", i.e. structuring the code to be amenable to pen-and-paper correctness proofs, so that we can begin preparing the code for external audit and penetration testing.

The Nock interpreter (bytecode interpreter, jet dashboard, and allocator) is the most difficult part of the system to secure. The first step toward guaranteeing its security may be to simplify its architecture so it can be more effectively analyzed and verified. Splitting out the I/O drivers into their own processes and dropping privileges will add some defense in depth to Vere and increase flexibility of implementation; that work is planned for relatively soon.

Here is a list of security tasks, ordered within each category roughly by increasing level of vulnerability:

  • PKI Security:
    • Azimuth PKI contracts: audited, mature
    • JavaScript client security: somewhat mature, but unverified
  • Arvo security:
    • Ames cryptosuite: audited, somewhat mature, missing forward secrecy
    • basic kernel security: small attack surface
    • kernel protection against agents: requires userspace permissioning and zapgal type hole closing; otherwise, small attack surface
  • Vere security:
    • deserialization ("+cue"): unverified but small
    • jet memory safety: unverified
    • IPC safety: unverified
    • tools (|meld, |pack, |mass, |trim) safety: unverified
    • allocator soundness: unverified, somewhat difficult to verify
    • jet dashboard soundness: unverified, difficult to verify
    • bytecode interpreter soundness: unverified, difficult to verify
    • I/O driver security: likely vulnerable, should be split out into subprocesses, maybe rewritten in a memory-safe language
    • dropping Unix process privileges: not implemented
    • encryption at rest for event log and snapshot: not implemented
  • Software Supply Chain Security:
    • in-Urbit software distribution: protected by Ames authenticated encryption, but no code signing, so it's vulnerable when redistributed by another ship
    • GitHub repo permissions: established recently
    • infrastructure nodes (live galaxies) permissions: established recently
    • protection against Ethereum node DNS spoofing: none
    • protection against malicious pill or Vere binary distribution: none
    • kernel source code attestation: none
  • Denial of Service Protection:
    • Vere packet-level DoS protection: minimal (just basic authenticated encryption)
    • Arvo DoS protection: minimal (just |ruin)
    • incident response plan: none
    • telemetry for DoS-related packet statistics: none

Projects

Completed

Drop Pokes to Dead Agents

Adds an interface between Ames and Gall that lets Gall tell Ames to drop an incoming %plea packet in case the plea targets an agent that isn't running. The sender will retry every two minutes indefinitely, so if the agent starts running, the %plea will eventually go through, with minimal state on the receiving ship.
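
The drop-and-retry behavior described above can be modeled with a toy simulation. This is only a sketch: the `Receiver` class, `hear_plea`, and `simulate` are hypothetical names invented for illustration, not the actual Ames/Gall interface, and the real sender retries indefinitely rather than up to a fixed horizon.

```python
RETRY_SECONDS = 120  # "every two minutes", per the description above


class Receiver:
    """Toy stand-in for the receiving ship (hypothetical, not real Ames/Gall)."""

    def __init__(self):
        self.running_agents = set()
        self.delivered = []

    def hear_plea(self, agent, payload):
        """Return True (acknowledged) if delivered, False if dropped."""
        if agent not in self.running_agents:
            # Dropped: nothing is queued, so the receiver keeps no state.
            return False
        self.delivered.append((agent, payload))
        return True


def simulate(receiver, agent, payload, start_agent_at, horizon):
    """Resend the plea every RETRY_SECONDS until acknowledged.

    Returns the simulated time of delivery, or None if the agent never
    started within `horizon` seconds.
    """
    for now in range(0, horizon + 1, RETRY_SECONDS):
        if now >= start_agent_at:
            receiver.running_agents.add(agent)
        if receiver.hear_plea(agent, payload):
            return now
    return None
```

In this model, an agent that starts 200 seconds in receives the plea on the retry at 240 seconds, and the receiver stores nothing for the attempts it dropped.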

Future

Arvo/Vere Error Handling

duration

TBD

owner

TBD

There are a number of cases where Arvo can crash in a way that leaves it in an inconsistent state (e.g. Clay does not always handle Ames crashes properly), or where it gets stuck on something and fails to make progress.

Future

Dropping Privileges

duration

TBD

owner

TBD

According to the principle of least privilege, each Vere process should only have privileges from the host operating system that it needs for normal operation. This is easier to understand, and likely easier to implement well, with multiprocess I/O.
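
As a rough illustration of what dropping privileges involves on a POSIX host, the sketch below shows the usual order of operations: clear supplementary groups, then change the group ID, then the user ID, then verify. It is written in Python for brevity and is not Vere code. The `drop_privileges` function and its `osmod` parameter are hypothetical, with `osmod` existing only so the call order can be exercised without root.

```python
import os


def drop_privileges(uid, gid, osmod=os):
    """Permanently switch the process to an unprivileged uid/gid.

    Order matters: supplementary groups and the gid can only be changed
    while the process is still privileged, so setuid() must come last.
    """
    osmod.setgroups([])   # drop supplementary groups inherited from root
    osmod.setgid(gid)     # change group while still privileged
    osmod.setuid(uid)     # finally give up the privileged user id

    # Verify that the drop took effect before continuing.
    if (osmod.getuid() != uid or osmod.geteuid() != uid
            or osmod.getgid() != gid):
        raise RuntimeError("failed to drop privileges")
```

A process would typically call something like this only after acquiring the resources that need elevated rights (for example, binding a low-numbered port). With multiprocess I/O, each driver process could in principle apply its own, narrower drop.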

Future

Forward Secrecy Ratchet in Ames

duration

3-6 Months

owner

TBD

Ames has no forward secrecy other than manual on-chain key rotation. This means that if an attacker obtains the symmetric key shared by two ships, or one of their private keys, they could decrypt the whole history of communication between those ships.
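
To make the idea of a ratchet concrete, here is a minimal sketch of a generic symmetric hash ratchet. The roadmap does not specify which ratchet design Ames would adopt, so this is an assumption for illustration only: `ratchet_step` and `SymmetricRatchet` are hypothetical names, and HMAC-SHA256 is used as a stand-in key derivation function.

```python
import hashlib
import hmac


def ratchet_step(chain_key):
    """Derive (message_key, next_chain_key) from the current chain key.

    Distinct constants give two independent outputs; HMAC-SHA256 is
    assumed to be one-way, so the old chain key cannot be recovered
    from either output.
    """
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key


class SymmetricRatchet:
    """Hands out one fresh key per message and forgets the previous state."""

    def __init__(self, initial_key):
        self._chain_key = initial_key

    def next_message_key(self):
        message_key, self._chain_key = ratchet_step(self._chain_key)
        # The previous chain key is overwritten here; a real implementation
        # would also need to erase it (and used message keys) from memory.
        return message_key
```

Two ships that start from the same shared key derive the same sequence of message keys. Under this scheme, an attacker who steals the current chain key cannot work backwards to the keys that protected earlier messages, provided the older keys were actually erased.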

Future

Logging

duration

TBD

owner

TBD

Adding a basic logging system to Urbit (probably logging to files as a first step, or maybe syslog) would reduce hosting costs, improve release cycle times, and reduce the amount of time it takes devs to debug live ships.

Future

Multiprocess Event Log

duration

TBD

owner

TBD

As with breaking up the Urth process, we should split event log reading and writing into its own process, which will communicate with the Mars Nock runner process.

Future

Multiprocess I/O

duration

2-3 Months

owner

TBD

We intend to split the Urth I/O process into multiple processes -- one dispatcher process and one process for each I/O driver.

Future

Network DoS Protection

duration

TBD

owner

TBD

A standard protection against DDoS attacks is to track and rate-limit IP addresses sending malicious packets. A basic version of this functionality could be added to Vere without too much trouble and would give Urbit a modicum of DoS resilience.
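
One simple way to track and limit misbehaving addresses is a sliding-window counter of bad packets per IP, with a temporary block once a threshold is crossed. The sketch below is a Python illustration, not Vere code. The `BadPacketLimiter` class, its thresholds, and its method names are all assumptions made for the example.

```python
import time
from collections import defaultdict, deque


class BadPacketLimiter:
    """Blocks an IP after too many invalid packets in a short window."""

    def __init__(self, max_failures=3, window=10.0, block_for=60.0):
        self.max_failures = max_failures   # bad packets tolerated per window
        self.window = window               # window length in seconds
        self.block_for = block_for         # block duration in seconds
        self.failures = defaultdict(deque) # ip -> recent failure timestamps
        self.blocked_until = {}            # ip -> time the block expires

    def is_blocked(self, ip, now=None):
        """Return True if packets from `ip` should be discarded unread."""
        now = time.monotonic() if now is None else now
        until = self.blocked_until.get(ip)
        if until is None:
            return False
        if now >= until:
            del self.blocked_until[ip]     # block has expired
            return False
        return True

    def record_failure(self, ip, now=None):
        """Call when a packet from `ip` fails validation."""
        now = time.monotonic() if now is None else now
        recent = self.failures[ip]
        recent.append(now)
        # Forget failures that have fallen out of the window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        if len(recent) >= self.max_failures:
            self.blocked_until[ip] = now + self.block_for
            recent.clear()
```

A packet handler would check `is_blocked` before doing any expensive work and call `record_failure` whenever validation fails. Source addresses on UDP can be spoofed, so a scheme like this is a partial mitigation at best.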

Future

Outer HMACs on Packets for DDoS Protection

duration

1-3 Months

owner

TBD

Ames packets need to be decrypted to be authenticated. Wrapping the packet in an HMAC would let the receiver discard unauthenticated packets faster, improving DoS resilience.
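
The idea can be sketched as follows: the sender prepends an HMAC tag computed over the encrypted packet, and the receiver checks that tag before attempting decryption. This is an illustration under stated assumptions, not the Ames packet format. The `wrap` and `unwrap` functions, the choice of HMAC-SHA256, the tag placement, and how the two ships agree on the HMAC key are all hypothetical.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def wrap(key, ciphertext):
    """Prepend an HMAC tag computed over the already-encrypted packet."""
    tag = hmac.new(key, ciphertext, hashlib.sha256).digest()
    return tag + ciphertext


def unwrap(key, packet):
    """Return the inner ciphertext if the outer tag is valid, else None.

    The check runs before any decryption, so forged packets are rejected
    at the cost of a single hash computation.
    """
    if len(packet) < TAG_LEN:
        return None
    tag, ciphertext = packet[:TAG_LEN], packet[TAG_LEN:]
    expected = hmac.new(key, ciphertext, hashlib.sha256).digest()
    # compare_digest avoids leaking how many tag bytes matched via timing.
    if not hmac.compare_digest(tag, expected):
        return None
    return ciphertext
```

A receiver would call `unwrap` first and hand only the packets that pass on to the decryption path.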

Future

Zapgal Security

duration

1 Month

owner

TBD

Turning off the `!<` ("zapgal") rune in userspace will shrink the kernel's attack surface, making the kernel less vulnerable to malicious applications.