Yesterday
- I released a new alpha version of sqlite-utils last night—the 128th release of that package since I started building it back in 2018. sqlite-utils is two things in one package: …
This week
- Why Strong Consistency? Eventual consistency makes your life harder. When I started at AWS in 2008, we ran the EC2 control plane on a tree of MySQL databases: a primary to handle writes, a secondary to take over from the primary, a handful of read replicas to scale reads, and...
Two weeks ago
- I’ve been curious about how far you can push object storage as a foundation for database-like systems. In previous posts, I explored moving JSON data from PostgreSQL to Parquet on S3 and building MVCC-style tables with constant-time deletes using S3’s conditional writes. These...
About a month ago
- DSQL: Simplifying Architectures Complexity is a choice. While we were designing and building Aurora DSQL, we spent a lot of time thinking about our experience building and running database-backed systems. We saw that building great, fast, cost-effective, highly-available,...
about 1 month ago
- Fixing UUIDv7 (for database use-cases) How do I even balance a V7? RFC9562 defines UUID Version 7. This has made a lot of people very angry and been widely regarded as a bad move. More seriously, UUIDv7 has received a lot of criticism, despite seemingly achieving what it set...
- In the previous post, I explored a Parquet on S3 design with tombstones for constant-time deletes and a CAS-updated manifest for snapshot isolation. This post extends that design. The focus is on in-file delete operations where we replace a Parquet row group and publish a new...
about 2 months ago
- Locality, and Temporal-Spatial Hypothesis Good fences make good neighbors? Last week at PGConf NYC, I had the pleasure of hearing Andres Freund talking about the great work he’s been doing to bring async IO to Postgres 18. One particular result caught my eye: a large difference...
- Parquet is excellent for analytical workloads. It offers columnar layout, aggressive compression, and predicate pushdown, but deletes require rewriting entire files. Systems like Apache Iceberg and Delta Lake solve this by adding metadata layers that track delete files separately from data...
- PostgreSQL handles large JSON payloads reasonably well until you start updating or deleting them frequently. Once payloads cross the 8 KB TOAST threshold and churn becomes high, autovacuum can dominate your I/O budget and cause other issues. I have been exploring the idea of...
- Holy moly, September was hectic, mostly good and definitely memorable. Family came over for a visit from Poland, we got married, we travelled to northern Italy, and the most recent meetup I organised was a huge success. It was intense and I’m ready for a chill and quiet...
- Understanding query planner quirks yielded a ~35% speedup....
2 months ago
- I normally skip presentations because I prefer reading, but Building the Hundred-Year Web Service (YouTube) was worth the time. Note that despite “htmx” featuring in the title, very little of the presentation is actually about htmx. It is about choosing and using technology in...
3 months ago
- One of the first schema decisions you face when designing a database table is: Should I use an INT or a UUID as the primary key? Most developers default to an auto-incrementing integer. It’s simple, compact, and familiar. But UUIDs (a.k.a. GUIDs) are increasingly popular — and...
- As programmers we spend a lot of time shuttling data back and forth between different systems and transforming it from one format to another. Sometimes it gets pretty miserable!
- Dynamo, DynamoDB, and Aurora DSQL Names are hard, ok? People often ask me about the architectural relationship between Amazon Dynamo (as described in the classic 2007 SOSP paper), Amazon DynamoDB (the serverless distributed NoSQL database from AWS), and Aurora DSQL (the...
- This one will be quick. Imagine this, you get a report from your bug tracker: Sophie got an error when viewing the diff after her most recent push to her contribution to the @unison/cloud project on Unison Share (BTW, contributions are like pull requests, but for Unison code)...
4 months ago
- This article is about a code-transformation technique I used to get 100x-300x performance improvements on a particularly slow bit of code which was loading Unison code from Postgres in Unison Share. I haven't seen it documented anywhere else, so wanted to share the trick! It's a...
- 1. Mental Model: D1 = SQLite running inside your Worker process; not a separate database server, zero network latency; one logical database, replicated globally by Cloudflare; env.DB injected at runtime via binding system. 2. Basic Setup: Wrangler Co...
- Quick Mental Model: KV Namespace = the actual database/storage (has ID like abc123def456); Binding = variable name in your code (like TODO, USERS); Key-Value Store = simple hash map, not a relational database; Eventually Consistent = changes take tim...
- PostgreSQL’s hash partitioning distributes rows across partitions using deterministic hash functions. When you query through the parent table, PostgreSQL must perform catalog lookups to route each query to the correct partition. This results in measurable overhead for...
- Introduction to CoreNN, an open source vector database that scales to 1 billion vectors on a single machine with high recall and throughput....
- An explainer on how modern, fast, and accurate vector search works, with interactive demos....
- An analysis of DiskANN, a newer graph-based ANN index built for cheaper disk while still retaining high recall and throughput....