Running an Urbit hosting company needs to have low enough unit costs per ship to have the possibility of profit. Some of the costs of hosting have to do with Urbit resource usage, especially RAM. Other costs stem from maintenance burden and difficulties with supervising Urbit processes from hosting environments.
Demand paging in the runtime promises to reduce RAM usage significantly. Establishing "quiescence" should also reduce costs, by letting a host move a ship's data out of RAM into disk-based storage when not in use.
Consolidate Packet Re-Send Timers
Ames sets a lot of timers to remind it to send packets again if they don't get acknowledged fast enough. Reducing the number of timers lowers disk write usage, improves quiescence (which should eventually let hosting providers use less RAM and thereby lower costs), and should improve overall performance on publisher ships.
Demand paging refers to the ability to load only needed pages of memory into RAM, leaving other pages on disk, to reduce memory use. Operating systems almost always include this feature. Urbit does not include it yet, but it will need to, since Urbit is a "single-level store".
Event Log Truncation
The event log grows indefinitely, using more disk space over time. Once event logs can be truncated, disk space can be reclaimed, using a roughly constant amount of disk space over time (the size of the Arvo snapshot, which grows much more slowly than the event log). Reducing disk space in this way is important for reducing cost and maintenance burden for running ships.
Exit Code Scheme for Mars
Adding a scheme of Unix process exit codes to the Mars process, and handling them in the Urth process (sometimes by printing them with more detail), will make the system easier to debug and maintain.
Generalized Deferral Mechanism
Arvo's Behn vane (kernel module) currently serves two purposes: setting timers, and deferring tasks to later events. Deferral could be split out into a separate feature, which could aid both in refactoring Behn to be easier to verify and optimize.
Adding a basic logging system to Urbit (probably logging to files as a first step, or maybe syslog) would reduce hosting costs, improve release cycle times, and reduce the amount of time it takes devs to debug live ships.
A more memory-efficient implementation of `|meld` should reduce dangerous memory pressure. This would reduce maintenance burden.
Multiprocess Event Log
Similarly to breaking up the Urth process, we should split event log reading and writing into its own process, which will communicate with the Mars Nock runner process.
The "quick boot" project aims to bring initial Urbit boot time down from ~10 minutes to well under a minute.
Separating IPC Streams
Urth/Mars communication all happens within a single IPC connection. Separating communication that's unrelated to the event loop (such as logging, scry requests and responses, spinner status notifications, and snapshot notifications) into separate IPC streams should improve performance and increase flexibility for alternate Mars implementations.
Urbit's timer system could be better in several ways.