- Redis sorted sets are probably the most widely deployed example. Redis uses a skiplist for range queries and ordered iteration, paired with a hash table for O(1) lookups. Together they cover the full API at the right complexity for each operation.
Skiplists also win over balanced BSTs when it comes to concurrent access. Lock-free implementations are much simpler to reason about and get right. ConcurrentSkipListMap has been in the standard library since Java 6 for exactly this reason, and it holds up well under high contention.
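A minimal, self-contained sketch (class name `CslmDemo` is mine) of the ordered operations that make ConcurrentSkipListMap more than a concurrent hash map:

```java
import java.util.concurrent.ConcurrentSkipListMap;

class CslmDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentSkipListMap<Integer, String> map = new ConcurrentSkipListMap<>();

        // Two writers, no external locking needed.
        Thread evens = new Thread(() -> { for (int i = 0; i < 1000; i += 2) map.put(i, "even"); });
        Thread odds  = new Thread(() -> { for (int i = 1; i < 1000; i += 2) map.put(i, "odd"); });
        evens.start(); odds.start();
        evens.join(); odds.join();

        // Ordered operations a hash-based concurrent map cannot offer:
        System.out.println(map.firstKey());              // smallest key: 0
        System.out.println(map.ceilingKey(500));         // least key >= 500: 500
        System.out.println(map.subMap(100, 200).size()); // range query: 100 entries
    }
}
```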
- Yep, and it was simple in Redis to augment them with the "span" in order to support ranking, that is, given an element, tell me its position in the ordered collection.
- Rebalancing is what really kills you. A CAS loop on a flat list is pretty straightforward, you get it working and move on. But rotations? You've got threads mid-insert on nodes you're about to move around. It gets ugly fast. Skiplists just sidestep the whole thing since level assignment is basically a coin flip, nothing you need to keep consistent. Cache locality is worse, sure, but honestly on write-heavy paths I've never seen that be the actual bottleneck.
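The "CAS loop on a flat list" can be sketched as an insert-only sorted list, which is roughly the bottom lane of a lock-free skiplist. This is an illustrative sketch (class names are mine); real implementations also need Harris-style marked pointers to support deletion:

```java
import java.util.concurrent.atomic.AtomicReference;

// Insert-only lock-free sorted list: the bottom lane of a lock-free skiplist.
// Deletion needs Harris-style marked pointers and is deliberately omitted.
class LockFreeSortedList {
    static final class Node {
        final int key;
        final AtomicReference<Node> next = new AtomicReference<>();
        Node(int key) { this.key = key; }
    }
    private final Node head = new Node(Integer.MIN_VALUE); // sentinel

    boolean insert(int key) {
        Node fresh = new Node(key);
        while (true) {                                     // CAS retry loop
            Node pred = head, curr = head.next.get();
            while (curr != null && curr.key < key) { pred = curr; curr = curr.next.get(); }
            if (curr != null && curr.key == key) return false; // duplicate
            fresh.next.set(curr);
            // Link in only if no other thread slipped in between pred and curr.
            if (pred.next.compareAndSet(curr, fresh)) return true;
            // CAS failed: someone changed the list here; retry from the head.
        }
    }

    boolean contains(int key) {
        Node curr = head.next.get();
        while (curr != null && curr.key < key) curr = curr.next.get();
        return curr != null && curr.key == key;
    }
}
```

Note there is nothing like a rotation anywhere: a failed CAS just retries the local splice, which is the whole point of the comment above.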
- > Skiplists also win over balanced BSTs when it comes to concurrent access.
Balanced skiplists search better than plain skiplists, which may skew (but balancing itself is expensive). Also, I've found that finger search (especially with a doubly-linked skiplist with a million+ entries) instead of always looking for elements from the root/head is an even bigger win.
> ConcurrentSkipListMap has been in the standard library since Java 6 for exactly this reason and it holds up well under high contention
An amusing observation I lifted from OCaml's implementation (which in turn quotes Donald Knuth): MSBs of PRNG values have more "randomness" than LSBs (randomness affects "balance"): https://github.com/ocaml/ocaml/blob/389121d3/runtime/lf_skip...
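A sketch of what that means for level generation: draw the geometric level from the leading (high) bits of the PRNG word instead of the trailing bits, since weak generators (LCGs in particular) have short periods in their low bits. Class name `LevelGen` is mine:

```java
import java.util.Random;

class LevelGen {
    static final int MAX_LEVEL = 32;

    // Geometric(p = 1/2) level drawn from the HIGH bits of the PRNG word.
    // Counting trailing (low) bits of an LCG output would skew the level
    // distribution; counting leading bits sidesteps that.
    static int randomLevel(Random rng) {
        long r = rng.nextLong();
        // Leading zeros of ~r = leading ones of r = number of "heads" flipped.
        int level = 1 + Long.numberOfLeadingZeros(~r);
        return Math.min(level, MAX_LEVEL);
    }
}
```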
---
Some neat refs from our codebase:
- Skip lists: Done right (2016), https://ticki.github.io/blog/skip-lists-done-right / https://archive.is/kwhnG
- An analysis of skip lists, https://eugene-eeo.github.io/blog/skip-lists.html / https://archive.is/ffCDr
- Skip lists, http://web.archive.org/web/20070212103148/http://eternallyco... / https://archive.is/nl3G8
- (I used to work at SingleStore, and now work at Antithesis)
SingleStore (f.k.a. MemSQL) used lock-free skiplists extensively as the backing storage of their rowstore tables and indexes. Adam Prout (ex CTO) wrote about it here: https://www.singlestore.com/blog/what-is-skiplist-why-skipli...
When SingleStore added a Columnar storage option (LSM tree), L0 was simply a rowstore table. Since rowstore was already a highly optimized, durable, and large-scale storage engine, it allowed L0 to absorb a highly concurrent transactional write workload. This capability was a key part of SingleStore's ability to handle HTAP workloads. If you want to learn more, take a look at this paper which documents the entire system end-to-end: https://dl.acm.org/doi/10.1145/3514221.3526055
- At the intersection of these two topics, does Antithesis have any capabilities around simulating memory ordering to validate lock free algorithms?
- We support thread-pausing via instrumentation. This can cause threads to observe different interleavings, which can help uncover bugs in concurrent algorithms. At this time, we don't perform specific memory model fault injection or fuzzing.
- Some more links that are inside the article:
- More info about skiplists: https://arxiv.org/pdf/2403.04582
- Performance comparison with B-tree ?: https://db.cs.cmu.edu/papers/2018/mod342-wangA.pdf
- Another blog post from Antithesis about writing their own db: https://antithesis.com/blog/2025/testing_pangolin/
Also I find it a bit hard to understand the performance outcome of this setup.
I know formats like parquet and databases like ClickHouse work better when duplicating data instead of doing joins. I guess BigQuery is similar.
The article is great but would be also interesting to learn how performance actually worked out with this.
- Back in 2014, I did an analysis of (single threaded) CPU-efficiency and RAM-efficiency of various data-structures (skiplists, slablists, avl-trees, rb-trees, b-trees):
https://nickziv.wordpress.com/wp-content/uploads/2014/02/vis...
I used whatever I could find on the internet at the time, so the comparison compares both algorithm and implementation (they were all written in C, but even slight changes to the C code can change performance -- uuavl performs much better than all other avl variants, for example). I suspect that a differently-programmed skip-list would not have performed quite so poorly.
The general conclusion from all this is that any data-structure that can organize itself _around_ page-sizes and cache-sizes will perform very well compared to structures that cannot.
- For this problem, I’d consider a different approach. You have a fuzzer, and based on some seed it’s generating lots of records. You then need to query a specific record (or set of records) based on the leaf.
I’d just store a table of records with the leaf, associated with the seed. A good fuzzer is entirely deterministic. So you should be able to regenerate the entire run from simply knowing the seed. Just store a table of {leaf, seed}. Then gather all the seeds which generated the leaf you’re interested in and rerun the fuzzer for those seeds at query time to figure out what choices were made.
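A toy sketch of that {leaf, seed} scheme (all names are mine; the "fuzzer" here is a stand-in deterministic generator, not any real tool):

```java
import java.util.*;

class SeedIndex {
    // Stand-in deterministic "fuzzer": the same seed always yields the same records.
    static List<String> run(long seed) {
        Random rng = new Random(seed);
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 5; i++) records.add("record-" + rng.nextInt(100));
        return records;
    }

    // Write path: store only {leaf -> seeds}, not the records themselves.
    static Map<String, List<Long>> index(long[] seeds) {
        Map<String, List<Long>> leafToSeeds = new HashMap<>();
        for (long seed : seeds)
            for (String leaf : run(seed))
                leafToSeeds.computeIfAbsent(leaf, k -> new ArrayList<>()).add(seed);
        return leafToSeeds;
    }

    // Read path: regenerate full runs only for the seeds that produced the leaf.
    static List<List<String>> query(Map<String, List<Long>> idx, String leaf) {
        List<List<String>> runs = new ArrayList<>();
        for (long seed : idx.getOrDefault(leaf, List.of())) runs.add(run(seed));
        return runs;
    }
}
```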
- (I work at Antithesis)
Yes, this is (more or less) how we regenerate the system state, when necessary. But keep in mind that the fuzzing target is a network of containers, plus a whole Linux userland, plus the kernel. And these workloads often run for many minutes in each timeline. Regenerating the entire state from t=0 would be far too computationally intensive on the "read path", when all you want are the logs leading up to some event. We only do it on the "write path", when there's a need to interact with the system by creating new branching timelines. And even then, we have some smart snapshotting so that you're not always paying the full time cost from t=0; we trade off more memory usage for lower latency.
Oh one other thing: the "fuzzer" component itself is not fully deterministic. It can't be, because it also has to forward arbitrary user input into the simulation component (which is deterministic). If you decide to rewind to some moment and run a shell command, that's an input which can't be recovered from a fixed random seed. So in practice we explicitly store all the inputs that were fed in.
- On practical machines they aren't good for much. To access a value in a skip list you have to dereference way more pointers than in a b+ tree. On paper they're about the same, but in practice the b+ tree will tend to outperform. You get way more work done per IO operation.
- Skiplists are designed for fast intersection, not for single value lookup (assuming a sane design that's not based on linked lists, that's just an educational device that's never used in practice).
They are extremely good at intersections, as you can use the skip pointers in clever ways to skip ahead and eliminate whole swathes of values. You can kinda do that with b-trees[1] as well, but skip lists can beat them out in many cases.
It's highly dependent on the shape of the data though. For random numbers, it's probably an even match, but for postings lists and the like (where skiplists are often used), they perform extremely well as these often have runs of semiconsecutive values being intersected.
[1] I'll argue that if you squint a bit, a skiplist arguably is a sort of degenerate b-tree.
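The skip-ahead idea can be sketched on plain sorted arrays: gallop in doubling steps rather than scanning one element at a time, which is structurally what a skiplist's upper lanes buy you (names are mine; real postings formats store explicit skip data rather than galloping blind):

```java
import java.util.*;

class SkipIntersect {
    // Intersect two sorted postings lists, galloping ahead in doubling steps
    // to eliminate whole swathes of values. Pays off when the lists contain
    // long runs of semi-consecutive ids.
    static int[] intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) { out.add(a[i]); i++; j++; }
            else if (a[i] < b[j]) i = gallop(a, i, b[j]);
            else j = gallop(b, j, a[i]);
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    // First index >= target: double the step, then binary search the last gap.
    static int gallop(int[] xs, int lo, int target) {
        int step = 1, hi = lo;
        while (hi < xs.length && xs[hi] < target) { lo = hi; hi += step; step *= 2; }
        hi = Math.min(hi, xs.length);
        int pos = Arrays.binarySearch(xs, lo, hi, target);
        return pos >= 0 ? pos : -pos - 1;
    }
}
```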
- Treaps can handle parallel set operations very efficiently:
- I don't understand how they differ in this regard from range trees, which they essentially are, just their method of construction is different.
Things like BSP trees are very good at intersections indeed, and have been used for things since time immemorial, but I think the skiplist/tree tradeoff is not that different in this domain.
- Skiplists have some nice properties - the code is fairly short and easy to understand, for one. Qt's QMap used to be skip list based, here's the rationale given for it: https://doc.qt.io/archives/qq/qq19-containers.html#associati...
- It seems like Qt went from red-black tree to skip list in Qt4 and back to red-black tree in Qt5.
- yeah, it turns out that complex code, when it's properly encapsulated and implemented in a bug-free manner, is not such a cost after all.
A correct skiplist is easier to NIH than a correct red-black tree (which for me was the final boss of the DS class in college), but has performance edge cases a red-black tree doesn't, if you treat it like a search tree.
- I think it was more about binary size. There are a few sentences in the Qt containers documentation about them being "optimized to minimize code expansion".
- I mean that could mean a lot of things. By default, in C++, an std::vector<float> and std::vector<int> are two entirely separate classes with distinct methods getting compiled, even though in this particular case the methods would be identical. Moreover, since templates are in headers, every object file gets their own compiled copy.
I'm sure there's some cleverness in there to mitigate the problem somewhat, but the problem still fundamentally exists
You can mitigate this manually to some degree, by making the generic classes call out to non-generic functions, of which there's guaranteed to be just 1 copy. I'm guessing that's what Qt does (among other things) when mentioning this
- Sorry if this is a dumb question but wouldn't vector<float> and vector<int> generate different code on get/set? Since floats go in different registers/need different instructions to load/store them to memory? Or is it something like, actually all of the std::vector functions technically operate on T&, and manipulating the pointers can use the same code, and really it's the optimizer that has to think about float vs int registers or something
- Well, you can do two or three main things:
- Use an algorithm and implementation that needs a small amount of code for each instance (that is where the skip list is useful)
- Have a shared "out of line" (not inline) implementation for bulky parts of the implementation, where possible
- Support the compiler + linker feature to merge identical functions by making code identical between template instantiations of similar enough types
- Plenty, but these days mostly if you either (a) want a simple implementation or (b) don't have to worry much about cache locality. The problem is that (b) doesn't really exist outside of retrocomputing and embedded systems these days. It's still one of my favourite data structures, just because it's a clever way to get most of the benefits of more complicated datastructures on small systems with minimal code.
- Occasionally in pure python "asymptotically reasonable, simple implementation, terrible memory locality" is the right pick for a data structure. You want to avoid an extra linear term, you don't care too much about the constant factor, and the cache doesn't really matter because the language is so slow that it's not really noticeable.
Not too common, but it happens.
- Only somewhat related but there is supposedly a SIMD/GPU-friendly skiplist algo written about here: https://csaws.cs.technion.ac.il/~erez/Papers/GPUSkiplist.pdf
- Backpointers to earlier epochs in append-only cryptographic data structures like key transparency logs. If the client last fetched epoch 1000, and the server reports the current epoch is 3000, the server can return log(2000) intermediate epochs, following skip pointers, to provide a hash chain from epoch 3000 to 1000.
https://github.com/foks-proj/go-foks/blob/main/lib/merkle/bi... https://github.com/foks-proj/go-foks/blob/main/lib/merkle/cl...
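A sketch of the traversal (names are mine, and the exact pointer layout in FOKS may differ): assume epoch e stores back-pointers to e - 2^k for every k with 2^k <= e, and greedily take the largest jump that doesn't overshoot the target:

```java
import java.util.*;

class EpochChain {
    // Walk back-pointers from `from` down to `to`, always taking the largest
    // power-of-two jump that does not overshoot: O(log(from - to)) hops,
    // each verifiable as one link in the hash chain.
    static List<Long> path(long from, long to) {
        List<Long> hops = new ArrayList<>();
        long e = from;
        while (e > to) {
            long jump = Long.highestOneBit(e - to); // largest 2^k <= e - to
            e -= jump;
            hops.add(e);
        }
        return hops;
    }
}
```

For the example above, walking from epoch 3000 to 1000 takes only a handful of hops rather than 2000.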
- >What are skiplists good for
In practice, I've found, not much. Their appeal comes from being simpler to implement than self-balancing trees while claiming to offer the same performance.
But they completely lack a mechanism for rebalancing, and are incredibly pointer heavy (in this implementation at least), and inserts/deletes can involve an ungodly amount of pointer patching.
While I think there are some append-heavy access patterns where it can come up on top, I have found that the gap between using a BST, a hashtable, or just putting stuff in an array and just sorting it when needed is very small.
- Lock-free BSTs or b-trees exist only in research papers, but lock-free skiplists are straightforward to implement.
- You're right, but it's not popular to go against the premise of a post.
- skiplists form the basis of in-memory tables used by LSM trees, which are themselves the basis of most modern DBs (written post 2005).
- Not sure if it's true that LSM trees are so dominant, it was a bit of a fad a few years ago during the NoSQL hype. They can be a good default for concurrent write-heavy workloads like analytics or logging, but they can be tricky to tune.
The good ol' copy-on-write memory-mapped B-tree is still widely used, even in newer key-value stores like redb (I am more familiar with the Rust ecosystem), which claims to outperform rocksdb (the go-to LSM-tree KVS) on most metrics other than batch writes [1] (although probably biased).
LMDB is still widely used (one of the main classic B-tree kvs), and Postgres, SQLite and MongoDB (WiredTiger), among others, are still backed by B-Tree key-value stores. The key-value storage backends tend to be relatively easy to swap, and I don't know of major efforts to migrate popular databases to a LSM tree backend.
[1] https://www.redb.org/post/2023/06/16/1-0-stable-release/
- In the age of agentic programming and the ever increasing pressure to ship faster, I'm afraid this kind of knowledge will become more and more fringe, even more so than it is today. Who has the time to think through the intricacies of parallel data structures? Clearly we'll just throw more hardware at problems, write yet another service/api/http endpoint and move on to the next hype. The LLM figures out the algorithms and we soon lose the skills to develop new ones. And tell each other the scifi BS myth that "AI" will invent new data structures in the future so we don't even need humans in the loop.
- AI is like a genie: be careful what you wish for or you'll get what you asked for.
Lately at work I've done C++ optimization tricks like inplace_map, inplace_string, placement new to inline map-like iterators inside a view adapter's iterators and putting that byte buffer as the first member of the class to not incur std::max_align_t padding with the other members. At a higher architecture level, I wrote a data model binding library that can serialize JSON, YAML and CBOR documents to an output iterator one byte at a time without incurring heap allocation in most cases.
This is because I work on an embedded system with 640 KiB of SRAM and given the sheer amount of run-time data it will have to handle and produce, I'm wary not only about heap usage, but also heap fragmentation.
AI will readily identify such tricks, it can even help implement them, but unless constrained otherwise AI will pick the most expedient solution that answers the question (note that I didn't say answers the problem).
- This is also the reason why we have two polar opposite views on AI. “Slop generator” vs “Next best thing since sliced bread”.
With SOTA models it all depends on how you drive them.
- All my old software before AI was self documenting and didn't need comments -- it just was obvious. Today my prompts never make slop. I'm a really good driver.
- I think the opposite is the case. We increasingly need to care more about performance and know how to leverage hardware better.
The market is telling us that through increased hardware prices.
LLMs being very powerful means that we need to start being smarter about allocating resources. Should chat apps really eat up gigabytes of RAM and be entitled to cores, when we could use that for inference?
- People underestimate the effect of knowledge accumulation that happens when learning from high quality sources imo.
LLMs aren't even close to the level of knowledge distillation capacity a human has yet.
- Or..? A golden era for people who want to think of new things and test out their ideas quickly by having AI code it up.
- The skiptree is a great example of constraint-driven invention - you didn't set out to build a new data structure, you had a specific constraint (BigQuery's poor point lookups) and the solution naturally emerged from combining two existing ideas. The part about writing a JavaScript compiler to generate kilobyte-scale SQL queries is underappreciated, too. Most engineers would have switched databases, but building a compiler when output is too complex to write by hand is almost always the right call.
- In my personal projects, I've used it to insert/delete transactions in a ledger. I wanted to be able to update/query the account balance fast. Like the article says, "fold operations".
- Skiplist operations are local for the most part, which makes it easier to write thread-safe code for than b-trees in practice. Anecdotally, they were a nice implementation problem for my Java class in uni. But I liked working with b-lists more.
Skip trees/graphs sound interesting, but I can't think of any use case for them off the top of my head.
- Random access with similar performance to a balanced binary tree, and ordered iteration as simple as a linked list. It's a nice combination. (Of course, so is a binary search of a sorted array, which I lean more towards these days unless doing a lot of random insertions and deletions throughout the life of the mapping).
- Could someone provide intuitive understanding for why the "express lanes" in a skip list are created probabilistically?
My first instinctive idea would be that there is an optimal distance, maybe based on absolute distance or by function of list size or frequency of access or whatever. Leaving the promotion to randomness is counter intuitive to me.
- The same reason a naive BST (a non-self-balancing one) doesn't work. You need to be able to add and remove elements without any knowledge of future operations and without malicious input being able to force a degenerate structure which is equivalent to a flat linked list. It's a bit similar to how a treap achieves a somewhat balanced tree using randomness instead of deterministic rebalancing algorithms.
If you knew all the items ahead of time and didn't require adding/removing them incrementally, you could build an optimal skiplist without randomness. But in such situations you likely don't need a skip list or BST at all. Binary search on a sorted list will do.
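To make the "optimal skiplist without randomness" concrete, here's a sketch (class name `StaticLevels` is mine) of the deterministic level assignment for a static sorted array, the shape the coin flips merely approximate:

```java
class StaticLevels {
    // For a static sorted array, element i can get level 1 + trailing-zeros(i+1):
    // every 2nd element reaches level 2, every 4th level 3, and so on --
    // a perfectly balanced skiplist. Randomness in a dynamic skiplist exists
    // only to approximate this shape without knowing the inserts in advance.
    static int level(int i) {
        return 1 + Integer.numberOfTrailingZeros(i + 1);
    }
}
```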
- There likely is an optimal stride, as a function of the platform and the data you are storing. There is also a worst-case stride. You probably don't know what the good and bad strides are, and because they are data-dependent, they can change over time. Randomness gives you a very high probability of avoiding very bad case performance. And (to rephrase what another comment says) avoiding a constant stride avoids unlucky 'resonances' where the interval interacts with some other fixed property of your system or data.
- If it has a constant stride then that stride can align with something upstream and result in horrible performance
- I guess I'm missing the point of this, so I'm probably looking at it wrong.
If you're saving multiple ancestors in ancestors_between why not save all of them?
And if the goal is to access the info for all of the ancestors, what makes it more likely than average that your ancestors aren't on the same layer as yourself (i.e. that e.g. half of them are in ancestors_between)?
Because, if a fixed ratio (50 or even 10%) of an arbitrary node's ancestors is at the same layer, isn't complexity still the same (only reduced by a constant factor)?
- A major global bank operated all trading, especially the complex stuff, off of a globally replicated skip list.
- Would you mind sharing more?
- FTA:
> Skiplists to the rescue! Or rather, a weird thing we invented called a “skiptree”…
I can't help but wonder. The article makes no mention of b-trees of any kind. To me, this sounded like the obvious first step.
If their main requirement was to do sequential access to load data, and their problem was how to speed up tree traversal on an ad-hoc tree data structure that was too deep, then I wonder if their problem was simply having tree nodes with too few children. A B+ tree specifically sounds to be right on the nose for the task.
- IIUC, the tree structure is the very thing they're trying to store and query -- it's not merely an artefact of a data structure that they're using to store something else (the way a B+ tree or binary tree used to store, say, a set of product names would be). If their tree has long paths, it has long paths.
- Author here. That’s right — the challenge was representing a tree in an existing SQL database that we’d chosen for its pricing model. I think encoding a B-tree in SQL would be a lot less fun even than what we did.
- The nice thing with skip lists is that all 'rungs' of the list are stored in an array for a given node, so there's a lot less pointer chasing.
The not-so-nice thing about it is that you have to pre-guess how deep your tree will be and allocate that many slots, or otherwise it'll degrade into a linked list when you run out of rungs, or you end up wasting space.
- You don't really have to pre-guess the number of rungs -- you can just have a linked list of them and add a new rung "on top" when necessary, because you always start at the top rung, and only ever move sequentially along a rung, or step down to the rung immediately below. The overhead of pointer chasing is pretty small because there will only be O(log n) rungs; you move between rungs roughly often as you move along them.
Also you can freely choose the "compaction rate" (base of that logarithm) to be something other than 2 -- e.g., choosing 4 means half as many rungs (but ~twice as much wandering along each rung).
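A toy skiplist along these lines (names are mine; duplicates and deletion are omitted), where the head's forward array simply grows when an insert draws a new top rung, so nothing needs pre-guessing:

```java
import java.util.Random;

// Minimal skiplist whose height grows on demand: the head's forward array is
// extended whenever an insert draws a level above the current maximum.
class GrowableSkipList {
    static final class Node {
        final int key;
        Node[] next;
        Node(int key, int level) { this.key = key; this.next = new Node[level]; }
    }
    private final Random rng = new Random();
    private Node head = new Node(Integer.MIN_VALUE, 1); // sentinel

    private int randomLevel() {
        int level = 1;
        while (rng.nextBoolean()) level++;   // geometric, p = 1/2
        return level;
    }

    void insert(int key) {
        int level = randomLevel();
        if (level > head.next.length) {      // add new rungs "on top"
            Node[] wider = new Node[level];
            System.arraycopy(head.next, 0, wider, 0, head.next.length);
            head.next = wider;
        }
        Node fresh = new Node(key, level);
        Node curr = head;
        for (int lvl = head.next.length - 1; lvl >= 0; lvl--) {
            while (curr.next[lvl] != null && curr.next[lvl].key < key) curr = curr.next[lvl];
            if (lvl < level) {               // splice on rungs the new node reaches
                fresh.next[lvl] = curr.next[lvl];
                curr.next[lvl] = fresh;
            }
        }
    }

    boolean contains(int key) {
        Node curr = head;
        for (int lvl = head.next.length - 1; lvl >= 0; lvl--) {
            while (curr.next[lvl] != null && curr.next[lvl].key < key) curr = curr.next[lvl];
            if (curr.next[lvl] != null && curr.next[lvl].key == key) return true;
        }
        return false;
    }
}
```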
- Cyrus uses skip lists for its internal db structs.
- What they really wanted was an HTAP. https://en.wikipedia.org/wiki/Hybrid_transactional/analytica...
- > Every insert would need to write to both systems, and since we want to analyze the data online (while new writes are streaming in) keeping the two databases consistent would require something like two-phase commit (2PC).
Not convincing. One can write the bulk data, which is at first unused - no need to sync anything. Then one writes to the tree DB, where each node only stores a key to the relevant data.
- It was really cool to mention its name during tech interviews but not anymore I guess.
- [flagged]
- Why come here to just post AI comments? I don't get the point.
- If you need a graph db, use a graph db.