Last week
- How I wrangled millions of raw vessel position reports into a structured H3 grid — and what that grid makes possible....
about 2 months ago
- After retrieving BM25 or any ranked search results, you might not realize it, but you have new information about the...
- I mostly link to written material here, but I’ve recently listened to two excellent podcasts that I can recommend. Anyone who regularly reads these fragments knows that I’m a big fan of Simon Willison; his (also very fragmentary) posts have earned a regular spot in my RSS...
2 months ago
- To evaluate search, we typically build a judgment list. We transform clickstream data into evaluation data. This labels a result...
3 months ago
- Look at this math and grasp at its majesty: P(R) = P(R | BM25) * P(R | Emb), that is, Prob(Relevance) = lexical * embedding. OK, what’s so special about that? That’s an AND. A probabilistic way of combining scores so that when BOTH “things happen”, the final result becomes true. Here when...
- Good vector search means more than embeddings. Embeddings don’t know when a result matches or doesn’t match. Similarity floors don’t work...
- I’ve been using the Irish energy provider Energia for 5 years or so (as of writing, 2026) and they used to have a useful insights dashboard that let me analyse my power usage. Well, they seem to have removed it so I built a handy dashboard that anyone can use. It’s at...
- It’s convenient to have a lexical score normalized from 0 to 1. Sadly, BM25 scores tend to be all over the...
- You may know BM25 lets you tune two parameters: k1, how quickly to saturate document term frequency’s contribution; b, how...
- Rare terms have high inverse document frequency (IDF). BM25 scoring treats high-IDF terms as more relevant. Why? We assume...
- In the previous tip, we discussed how pointwise 1-5 labels fall apart. The expert rater gives only nit-picky...
6 months ago
- I have a weird relationship with statistics: on one hand, I try not to look at it too often. Maybe once or twice a year. It’s because analytics is not actionable: what difference does it make if a thousand people saw my article or ten thousand? I mean, sure, you might try to...
- A free introductory search course for anyone who wants better search without all the hard work...
7 months ago
- After the LLM judge hype curve crashes, what will come after?...
8 months ago
- Kicking the tires on an initial, naive agentic search with some thoughts on how it could be improved further...
9 months ago
- Jeff Kaufman shared some data around contra dance attendance as a function of requirements on wearing surgical masks. He compares this data to survey data, which is a useful way to validate in both directions. I found the plot compelling for a different reason – depending on how...
- I recently read You do not need “analytics” for your blog because you are neither a military surveillance unit nor a commodity trading company by Leon Paternoster. It’s a well-argued piece, and I agree with the general thrust… but I also won’t be removing analytics from my site...
10 months ago
- An analysis of DiskANN, a newer graph-based ANN index built for cheaper disk while still retaining high recall and throughput....