I've been enjoying reading more about databases over the last few years, ever since joining Databricks. Many CS PhDs working on AI would benefit a lot from spending more time on this topic.
I find the Lakebase design for serverless Postgres very elegant, so I spent some time explaining how it works in this blog.
The blog starts by explaining how databases really persist data (with a write-ahead log and data files that are updated asynchronously), and how Lakebase separates storage and compute by externalizing those two components. It ends with how the Lakebase architecture naturally leads to LTAP, enabling OLTP and analytical workloads against a single governed copy of data.
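As a rough illustration of the write-ahead-log idea, here is a toy sketch in Python. It is not Lakebase or Postgres code: the class name TinyStore, the file names, and the JSON record format are all made up for illustration, and a real system logs page-level changes rather than whole key-value pairs. The point is only the ordering: a write is durable once it is in the log, and the data file catches up later.

```python
import json
import os
import tempfile


class TinyStore:
    """Hypothetical toy key-value store: durable via a log, data file updated lazily."""

    def __init__(self, directory):
        self.wal_path = os.path.join(directory, "wal.log")
        self.data_path = os.path.join(directory, "data.json")
        self.table = {}
        self._recover()
        self.wal = open(self.wal_path, "a", encoding="utf-8")

    def _recover(self):
        # Start from the last checkpointed data file, if there is one.
        if os.path.exists(self.data_path):
            with open(self.data_path, encoding="utf-8") as f:
                self.table = json.load(f)
        # Replay the log to redo writes the data file has not seen yet.
        if os.path.exists(self.wal_path):
            with open(self.wal_path, encoding="utf-8") as f:
                for line in f:
                    record = json.loads(line)
                    self.table[record["key"]] = record["value"]

    def put(self, key, value):
        # 1. Append to the log and force it to disk: this is the commit point.
        self.wal.write(json.dumps({"key": key, "value": value}) + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())
        # 2. Update memory only; the data file is not touched here.
        self.table[key] = value

    def checkpoint(self):
        # Done in the background in a real system: persist the data file
        # atomically, then start a fresh log.
        tmp_path = self.data_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.table, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.data_path)
        self.wal.close()
        self.wal = open(self.wal_path, "w", encoding="utf-8")

    def close(self):
        self.wal.close()


def demo():
    with tempfile.TemporaryDirectory() as directory:
        store = TinyStore(directory)
        store.put("a", 1)
        store.checkpoint()   # "a" is now in the data file
        store.put("b", 2)    # "b" exists only in the log
        store.close()        # simulate a crash: no checkpoint for "b"

        recovered = TinyStore(directory)  # data file + log replay
        result = dict(recovered.table)
        recovered.close()
        return result


print(demo())  # {'a': 1, 'b': 2}
```

In this sketch the log and the data file both sit on local disk next to the process. The separation of storage and compute described above amounts to moving those two pieces out of the compute node and into external services.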
My goal was to make it readable by anyone curious about how these systems work, not just database and storage experts. That turned out to be a lot more challenging than I first thought. Database storage is one of the most complex areas in computer science (the ARIES paper cited in the blog was the hardest paper I've personally ever had to read). The first draft had too little detail and I couldn't land the ideas. The second had too much, and I'd have lost anyone who isn't already a storage expert. This is the third draft, and I'd love feedback on whether the depth feels right.
https://www.databricks.com/blog/lakebase-ltap-rethinking-database-storage