This week
- This is the fourth chapter of my SQLAlchemy 2 in Practice book. If you'd like to support my work, I encourage you to buy this book, either directly from my store or on Amazon. Thank you! Continuing with the topic of relationships, this chapter is dedicated to the many-to-many...
Last week
- This is the third chapter of my SQLAlchemy 2 in Practice book. In the previous chapter you learned how to execute a variety of queries on the products...
Two weeks ago
- This is the second chapter of my SQLAlchemy 2 in Practice book. This chapter provides an overview of the most basic usage of the SQLAlchemy library to...
About a month ago
- How do teams choose vector databases and search engines? People wrack their brains choosing between Elasticsearch, OpenSearch, Solr, Vespa, Pinecone, Turbopuffer, Weaviate...
- Welcome! This is the start of a journey which I hope will provide you with many new tricks to improve how you work with relational databases in your Python applications. Given that this is a hands-on book, this first chapter is dedicated to helping you set up your system with a...
About 1 month ago
- This is a follow-up to Interesting Bits of Postgres Grammar. Since then, I’ve been continuing my work on the Squawk language server and column naming became one of the many rabbit holes. Overview: If you label your columns with an alias, select 1 as id, then the name is obvious....
About 2 months ago
- 17 Feb, 2026 Update (Feb 18, 2026): After a productive discussion on Reddit and additional benchmarking, I found that the solutions I originally proposed (batched writes or using a synchronous connection) don't actually help. The real issue is simpler and more fundamental than I...
3 months ago
- Happy New Year and thanks for your support in 2025!...
4 months ago
- Skipping expensive per-row subqueries to speed up my average query ~17%....
- What Does a Database for SSDs Look Like? Maybe not what you think. Over on X, Ben Dicken asked: What does a relational database designed specifically for local SSDs look like? Postgres, MySQL, SQLite and many others were invented in the 90s and 00s, the era of spinning disks. A...
- Through market forces, embeddings became the singular framework through which we understood RAG. It's the wrong lens for thinking about the problem...
5 months ago
- Why Strong Consistency? Eventual consistency makes your life harder. When I started at AWS in 2008, we ran the EC2 control plane on a tree of MySQL databases: a primary to handle writes, a secondary to take over from the primary, a handful of read replicas to scale reads, and...
- I’m taking a few weeks' pause from my HNSW development (now working on another data structure, news soon). At this point, the new type I added to Redis is stable and complete enough that it’s the perfect moment to reflect on what I learned about HNSWs and turn it into a blog...
- I’ve been curious about how far you can push object storage as a foundation for database-like systems. In previous posts, I explored moving JSON data from PostgreSQL to Parquet on S3 and building MVCC-style tables with constant-time deletes using S3’s conditional writes. These...
- DSQL: Simplifying Architectures. Complexity is a choice. While we were designing and building Aurora DSQL, we spent a lot of time thinking about our experience building and running database-backed systems. We saw that building great, fast, cost-effective, highly-available,...
6 months ago
- Fixing UUIDv7 (for database use-cases). How do I even balance a V7? RFC 9562 defines UUID Version 7. This has made a lot of people very angry and been widely regarded as a bad move. More seriously, UUIDv7 has received a lot of criticism, despite seemingly achieving what it set...
- In the previous post, I explored a Parquet on S3 design with tombstones for constant-time deletes and a CAS-updated manifest for snapshot isolation. This post extends that design. The focus is on in-file delete operations where we replace a Parquet row group and publish a new...
- Locality, and Temporal-Spatial Hypothesis. Good fences make good neighbors? Last week at PGConf NYC, I had the pleasure of hearing Andres Freund talking about the great work he’s been doing to bring async IO to Postgres 18. One particular result caught my eye: a large difference...
- Parquet is excellent for analytical workloads: columnar layout, aggressive compression, predicate pushdown. But deletes require rewriting entire files. Systems like Apache Iceberg and Delta Lake solve this by adding metadata layers that track delete files separately from data...
- PostgreSQL handles large JSON payloads reasonably well until you start updating or deleting them frequently. Once payloads cross the 8 KB TOAST threshold and churn becomes high, autovacuum can dominate your I/O budget and cause other issues. I have been exploring the idea of...
- Holy moly, September was hectic, mostly good, and definitely memorable. Family came over for a visit from Poland, we got married, we travelled to northern Italy, and the most recent meetup I organised was a huge success. It was intense and I’m ready for a chill and quiet...
7 months ago
- Understanding query planner quirks yielded a ~35% speedup....