This week
- Locality, and Temporal-Spatial Hypothesis Good fences make good neighbors? Last week at PGConf NYC, I had the pleasure of hearing Andres Freund talking about the great work he’s been doing to bring async IO to Postgres 18. One particular result caught my eye: a large difference...
- Parquet is excellent for analytical workloads: columnar layout, aggressive compression, predicate pushdown. But deletes require rewriting entire files. Systems like Apache Iceberg and Delta Lake solve this by adding metadata layers that track delete files separately from data...
- PostgreSQL handles large JSON payloads reasonably well until you start updating or deleting them frequently. Once payloads cross the 8 KB TOAST threshold and churn becomes high, autovacuum can dominate your I/O budget and cause other issues. I have been exploring the idea of...
Last week
- Holy moly, September was hectic, mostly good and definitely memorable. Family came over for a visit from Poland, we got married, we travelled to northern Italy, and the most recent meetup I organised was a huge success. It was intense and I’m ready for a chill and quiet...
- Understanding query planner quirks yielded a ~35% speedup....
About a month ago
- I normally skip presentations because I prefer reading, but Building the Hundred-Year Web Service (YouTube) was worth the time. Note that despite “htmx” featuring in the title, very little of the presentation is actually about htmx. It is about choosing and using technology in...
- One of the first schema decisions you face when designing a database table is: Should I use an INT or a UUID as the primary key? Most developers default to an auto-incrementing integer. It’s simple, compact, and familiar. But UUIDs (a.k.a. GUIDs) are increasingly popular — and...
About 2 months ago
- As programmers we spend a lot of time shuttling data back and forth between different systems and transforming it from one format to another. Sometimes it gets pretty miserable!
2 months ago
- 1. Mental Model: D1 = SQLite running inside your Worker process. Not a separate database server - zero network latency. One logical database, replicated globally by Cloudflare. env.DB injected at runtime via binding system. 2. Basic Setup: Wrangler Co...
- Quick Mental Model: KV Namespace = the actual database/storage (has an ID like abc123def456); Binding = variable name in your code (like TODO, USERS); Key-Value Store = simple hash map, not a relational database; Eventually Consistent = changes take tim...
- PostgreSQL’s hash partitioning distributes rows across partitions using deterministic hash functions. When you query through the parent table, PostgreSQL must perform catalog lookups to route each query to the correct partition. This results in measurable overhead for...
- Introduction to CoreNN, an open source vector database that scales to 1 billion vectors on a single machine with high recall and throughput....
- An explainer on how modern, fast, and accurate vector search works, with interactive demos....
- An analysis of DiskANN, a newer graph-based ANN index built for cheaper disk while still retaining high recall and throughput....
- Making a server do less work is often a good thing. Here we’ll reduce the number of events a Meteor instance has to process to handle real-time publications.
3 months ago
- I'm a big fan of Redis. It's such an amazing idea to go beyond the get-set paradigm and provide a convenient API for more complex data structures: maps, sets, lists, streams, bloom filters, etc. I'm also a big fan of relational databases and their universal language, SQL....
- How Datomic differs from other databases, and why the absence of an optimizer is sometimes better than its presence...
- Today I’m celebrating my two-year work anniversary at Weaviate, a vector database company. To celebrate, I want to reflect on what I’ve learned about vector databases and search during this time. Here are some of the things I’ve learned and some common misconceptions I see: BM25...
4 months ago
- I’ve been working on Squawk for a while, it’s a linter for PostgreSQL, and it now uses a handmade parser. So let’s explore some interesting bits from the Postgres grammar. Custom Operators Very few operators are defined in the grammar itself and lots of Postgres features rely on...
- One of my pet peeves with testing in web development is mocking. So many devs tend to be purists who insist that the unit under test should be the only thing being tested, and that's why they mock everything. In projects that use Prisma or any other ORM, developers mock the ORM...
5 months ago
- Well, it happened. Decision Drill is a stateful server process. It was deliberately designed to hold data for at most a month, limiting the impact of failed data integrity. Way too many services start out by implicitly promising to keep data intact essentially forever. Sometimes...
- A few days ago, I wrote about a surprising planner behavior with CTEs, DELETE, and LIMIT in PostgreSQL, a piece I hastily put together on a bus ride. That post clearly only scratched the surface of a deeper issue that I’ve since spent way too many hours exploring. So here are...
- What happens when you embed geospatial capabilities in generalist data tools? More people engaging with geo data. I just returned from the inaugural Cloud-Native Geospatial conference. It was fantastic, and I highly recommend you jump in if Jed and team organize another. One of the...
- On Wednesday May 7th, I’ll stream myself live-coding a vector database with John Berryman. As part of that, I want to establish some baseline ideas / concepts before digging into the most popular vector search data structure - Hierarchical Navigable Small Worlds (HNSW). HNSW...