• This article is a fun read.

    If you enjoyed this, or if you need more control over some memory allocations in Go, please have a look at this package I wrote. I would love to have some feedback or have someone else use it.

    https://github.com/fmstephe/memorymanager

    It bypasses the GC altogether by allocating its own memory separately from the runtime. It also disallows pointer types in allocations, but replaces them with a Reference[T] type, which offers the same functionality. Freeing memory is manual though - so you can't rely on anything being garbage collected.

    These custom allocators in Go tend to be arenas intended to support groups of allocations which live and die together. But the offheap package was intended to build large, long-lived data structures with zero garbage collection cost. Things like large in-memory caches or databases.
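
    To make the handle idea concrete, here is a rough, hypothetical sketch of allocating through references instead of pointers. The Store/Alloc/Get names are made up for illustration and are not the memorymanager package's actual API; a plain slice stands in for the off-heap memory so the snippet compiles on its own.

        package refsketch

        // Reference is a hypothetical handle used in place of *T; the real
        // package's Reference[T] may be shaped differently.
        type Reference[T any] struct{ idx int }

        // Store stands in for an off-heap block; a real off-heap allocator
        // would manage memory outside the Go heap and need an explicit Free.
        type Store[T any] struct{ slots []T }

        func (s *Store[T]) Alloc(v T) Reference[T] {
            s.slots = append(s.slots, v)
            return Reference[T]{idx: len(s.slots) - 1}
        }

        func (s *Store[T]) Get(r Reference[T]) *T { return &s.slots[r.idx] }

        // Node stores no raw pointers, only references, which is the property
        // described above: the GC never has to trace through this data.
        type Node struct {
            Value int
            Next  Reference[Node]
        }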

    • Do you think the problem that is addressed by offheap could also have been addressed with a generational garbage collector?
      • For the problems that arena allocators solve (relatively short-lived allocations which die soon), yes. A generational collector would allow for faster allocation rates (a thread-local bump allocator, sketched below, would become easy to use).

        But very long-lived data structures, like caches and in-memory databases, still need to be marked during full-heap garbage collection cycles. These are less frequent with a generational collector, though.
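
        A minimal sketch of what makes bump allocation attractive: once you have a block, an allocation is just a cursor increment. Names and block size here are illustrative, not from the article.

            package bumpsketch

            const blockSize = 1 << 20 // 1 MiB per block, illustrative

            // Arena hands out byte slices by bumping an offset. Memory is
            // reclaimed all at once by dropping the arena, which is what makes
            // each individual allocation so cheap.
            type Arena struct {
                buf []byte
                off int
            }

            func (a *Arena) Alloc(n int) []byte {
                if a.off+n > len(a.buf) {
                    size := blockSize
                    if n > size {
                        size = n
                    }
                    a.buf = make([]byte, size) // earlier blocks stay alive via slices already handed out
                    a.off = 0
                }
                p := a.buf[a.off : a.off+n : a.off+n] // three-index slice so callers can't append into neighbours
                a.off += n
                return p
            }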

  • I've been doing some performance tuning in Go lately to really squeak out performance, and ended up with a very similar arena design, except using byte slices for buf and chunks instead of unsafe pointers. I think I tried that too and it wasn't any faster and was a whole lot uglier, but I'll have to double-check before saying that with 100% confidence.

    A couple other easy wins -

    If you start with a small slice and find that some payloads append large amounts, write your own append that is preemptively more aggressive in bumping capacity before calling the builtin append.
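
    Something like this, assuming a byte payload (the growth factor is arbitrary; tune it to your workload):

        package appendsketch

        // appendAggressive grows capacity up front so a burst of large appends
        // doesn't hit the runtime's growth path over and over.
        func appendAggressive(dst, src []byte) []byte {
            if need := len(dst) + len(src); need > cap(dst) {
                // Jump straight to double what we need instead of letting the
                // builtin grow in smaller steps.
                grown := make([]byte, len(dst), 2*need)
                copy(grown, dst)
                dst = grown
            }
            return append(dst, src...)
        }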

    unsafe.String is rather new and great for passing strings out of byte slices without allocating. Just read the warnings carefully and understand what you're doing.
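
    For example (only safe if buf is never modified again afterwards, which is exactly the warning in the docs):

        package stringsketch

        import "unsafe"

        // bytesToString views buf as a string without copying. The caller must
        // guarantee buf is not mutated for as long as the string is alive.
        func bytesToString(buf []byte) string {
            if len(buf) == 0 {
                return ""
            }
            return unsafe.String(&buf[0], len(buf))
        }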

    • The append(slice, slice2...) code is all well and good, but it's going to hit the growth path quite often. When you know the second slice is going to be large, it's often faster to allocate a new slice with the right capacity and no elements and then append both slices to it: there are no expansion costs, the values just get copied in, and it also produces less garbage to be collected.
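
      i.e. something along these lines (generic only for illustration):

          package concatsketch

          // concat allocates the result once at the right capacity, so neither
          // append triggers a grow-and-copy.
          func concat[T any](a, b []T) []T {
              out := make([]T, 0, len(a)+len(b))
              out = append(out, a...)
              out = append(out, b...)
              return out
          }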

      I have done a few other things in the past where I had slice-likes which took two slices and pointed to one and then the other, with a function mapping indices as if they were appended. It costs a bit on access, but it saves the initial allocation if you don't intend to iterate through the entire thing or only do so once.
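
      A rough sketch of that kind of slice-like view (hypothetical, not a library type):

          package viewsketch

          // joined presents two slices as if they were appended, without copying.
          // Reads pay an extra branch; the upside is no allocation up front.
          type joined[T any] struct {
              a, b []T
          }

          func (j joined[T]) Len() int { return len(j.a) + len(j.b) }

          func (j joined[T]) At(i int) T {
              if i < len(j.a) {
                  return j.a[i]
              }
              return j.b[i-len(j.a)]
          }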

      The base library in Go does not do much for optimising this sort of thing; it's not a dominant operation in most applications, so I can see why we don't have more advanced data structures and algorithms. You have to be quite heavily into needing different performance characteristics to outperform the built ins with custom code or a library. All parts of Go's simplicity push that seems to assume people don't need anything else other than Array Lists and hash maps.

      • > All parts of Go's simplicity push that seems to assume people don't need anything else other than Array Lists and hash maps.

        You can see some of this in the work of the progenitors of Go.

        Quoth pikestyle, from Rob Pike:

        Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.

              The following data structures are a complete list for almost all practical programs:
        
        array, linked list, hash table, binary tree

        Of course, you must also be prepared to collect these into compound data structures. For instance, a symbol table might be implemented as a hash table containing linked lists of arrays of characters.

        https://doc.cat-v.org/bell_labs/pikestyle

      • I think the way to look at it is that there's a lot of programs that don't need more than slices, maps, and combinations of said.

        We know this is true, because dynamic scripting languages lean even more heavily into that, plus they do it with generally worse performance, and there are still many programs that those languages are perfectly suited for. Not every language has to have all the highest performance features pushed to the n'th degree.

        If you sit down with a set of needs that requires that high-performance stuff, that intensely needs custom data structures beyond those, and then you choose Go, the mistake isn't that Go doesn't have every last high-performance option; the mistake is that you shouldn't have chosen Go, any more than you should sit down with those requirements and choose Perl. Though you may be able to get farther with Go, it was still the wrong choice on your part. I certainly look a bit askance at everyone who says "I'm going to write a high performance database to compete in the commercial database market!" and chooses Go... which is a surprisingly large group. That's not where this article comes from, but it's where a lot of the other articles about super-optimizing Go come from... but the real answer is, they probably shouldn't have chosen Go.

        On the other hand, there's definitely always been a contingent of people out there who sit down to write a website that will get several dozen hits an hour and think they need to grab Rust and start banging out high performance async code without any framework assistance (can't afford the slowdown of abstraction, you know), with custom data structures and custom database code and "should I use mmap to access the file I'm storing all the data in, or should I use io_uring?", when in fact a pure-Python Django website backed by a conventional database that they forgot to even index properly would have humanly-indistinguishable performance. Engineers screw up their performance requirements analysis all over the place. It's probably one of the bigger and more consequential mistakes made by engineers that we rarely talk about here.

        • I'd suggest a lot of the reason behind choosing Go may well be the GC by default, which can simplify a lot, leaving only a subset of memory use cases to optimise.

          Maybe an alternative would be to choose D, as it also has a GC and allows more controlled memory layout than Go, with parallel allocations both in the GC and outwith the GC.

          Assuming the only other offering is Rust, I'd interpret such choices as said folks not being convinced that the pain from using Rust is worth the gain for their particular use case.

        • I learned programming from a Java 7 book in high school. When I went through the Tour of Go, I found myself shocked that anybody would want to write software in a language like this. Where is inheritance? Where is the data structures package? I couldn't even find a standard linked list. What if I have a list where I need to make lots of insertions in the middle?

          My 15-year-old self would be shocked at my day-to-day as an engineer now.

          • Coming from Java or any heavy OO language is probably a bit of a shock. I've never really missed inheritance with the way it does composition. I think one of the 'mantras' of Go is having the writer know the cost of what they're doing, at the expense of a lot of helpers. The issue is it's a bit inconsistent IMO. Like, they make you iterate to copy a map, but not to copy a slice. It feels like it should lean more heavily one way or the other.
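
            To spell out that asymmetry (maps.Clone has since been added in Go 1.21, but the built-in support is still lopsided):

                package copysketch

                import "maps"

                func copies(src []int, m map[string]int) ([]int, map[string]int, map[string]int) {
                    // Slices: one builtin call.
                    s2 := make([]int, len(src))
                    copy(s2, src)

                    // Maps: either iterate yourself...
                    m2 := make(map[string]int, len(m))
                    for k, v := range m {
                        m2[k] = v
                    }
                    // ...or, since Go 1.21, use the maps package.
                    m3 := maps.Clone(m)
                    return s2, m2, m3
                }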

            I -would- definitely like more in the standard data structures and algos realm. I used to be more active on golang-nuts, and a lot of the replies to such ideas were "it's so easy to write yourself, look at this 5-liner, why make a package for it." Initially I kinda understood, but years in, after rewriting the same things over and over, it would be nice to one-line a lot of this.

          • > Where is inheritance?

            It's been widely considered a mistake for about two decades now...

            > I couldn't even find a standard linked list.

            Even? If you need a linked list, then you have a 0.01% use case and shouldn't expect such a niche data structure to be easily available. That said, https://pkg.go.dev/container/list

            > What if I have a list where I need to make lots of insertions in the middle?

            Then you should use an array. If you're not making full use of the pointer-y nature of a linked list, you shouldn't be using it.

            > Where is the data structures package?

            Go only got generics in 2022 so the standard library is lacking in ergonomic data structures.

            • Exactly right. My books (and later, college courses) emphasized linked lists and inheritance as fundamental concepts of programming, but the reality as a working engineer is totally different.

              When I first encountered Go, I was still a learner and the lack of these things in the language and standard libraries shocked me. But it turns out that they were writing a language more for practical software engineering than for outdated curricula. At the end of the day, structs, slices, and maps cover 99% of what you need!

      • > You have to be quite heavily into needing different performance characteristics to outperform the built ins with custom code or a library. All parts of Go's simplicity push that seems to assume people don't need anything else other than Array Lists and hash maps.

        Every 2000s-era Enterprise Java project I worked on was written by people that used ArrayList<T> for everything. My classmates' programs were much the same way in college. I wonder if the Golang authors observed similarly and came to this conclusion.

    • > ... small slice and find that some payloads append large amounts, write your own append that is preemptively more aggressive in bumping capacity before calling the builtin append ...

      gVisor netstack (userspace TCP/IP) uses a copy-on-write, reference-counted, tiered-pool for their arena-like alloc needs: https://gvisor.dev/blog/2022/10/24/buffer-pooling/ / https://archive.vn/YzB1C

      The downside of their approach is that the client code is no longer dealing with just []byte (although both the View and the Buffer type can vend out []byte slices).

  • Off topic, but I love the minimap on the side -- for pages where I might be jumping around the content (long, technical articles, to refer back to something I read earlier but forgot) -- how can I get that on my site? Way cool.
  • tl;dr for anyone who may be put off by the article length:

    OP built an arena allocator in Go using unsafe to speed allocator operations up, especially for cases when you're allocating a bunch of stuff that you know lives and dies together. The main issue they ran into is that Go's GC needs to know the layout of your data (specifically, where pointers are) to work correctly, and if you just allocate raw bytes with unsafe.Pointer, the GC might mistakenly free things pointed to from your arena because it can't see those pointers properly.

    But to make it work even with pointers (as long as they point to other stuff in the same arena), you keep the whole arena alive if any part of it is still referenced. That means (1) keeping a slice (chunks) pointing to all the big memory blocks the arena got from the system, and (2) using reflect.StructOf to create new types for these blocks that include an extra pointer field at the end (pointing back to the Arena). So if the GC finds any pointer into a chunk, it'll also find the back-pointer, therefore mark the arena as alive, and therefore keep the chunks slice alive.

    Then they get into a bunch of really interesting optimizations to remove various internal checks and write barriers using funky techniques you might not've seen before.
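
    A rough sketch of that back-pointer trick (not the article's actual code; the chunk size and field names here are illustrative):

        package arenasketch

        import (
            "reflect"
            "unsafe"
        )

        type Arena struct {
            chunks []unsafe.Pointer // keeps every chunk reachable while the arena is alive
        }

        const chunkBytes = 1 << 16 // illustrative chunk payload size

        // newChunk allocates struct{ Data [chunkBytes]byte; Arena *Arena } via
        // reflect.StructOf. Because the type's layout has a pointer field, the
        // GC scans it, finds the back-pointer, and so keeps the arena (and with
        // it the chunks slice) alive whenever any pointer into the chunk survives.
        func (a *Arena) newChunk() unsafe.Pointer {
            t := reflect.StructOf([]reflect.StructField{
                {Name: "Data", Type: reflect.ArrayOf(chunkBytes, reflect.TypeOf(byte(0)))},
                {Name: "Arena", Type: reflect.TypeOf((*Arena)(nil))},
            })
            v := reflect.New(t)
            v.Elem().Field(1).Set(reflect.ValueOf(a)) // install the back-pointer
            p := v.UnsafePointer()
            a.chunks = append(a.chunks, p)
            return p
        }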

  • Related: discussion around adding "memory regions" to the standard library: https://go.dev/issue/70257

    (Previous arena proposal: https://go.dev/issue/51317)

  • Interesting stuff! For folks building off-heap or arena-style allocators in Go—how do you usually test or benchmark memory safety and GC interactions in practice?
  • > Go prioritizes not breaking the ecosystem; this allows to assume that Hyrum’s Law will protect certain observable behaviors of the runtime, from which we may infer what can or cannot break easily.

    If this assertion is correct, then effectively Go as a language is an evolutionary dead end. Not sure if I would find Go fascinating in this case.

    • It's quite a leap from "certain observable behaviors of the runtime cannot change" to "Go is a dead end".

      Go regularly makes runtime changes and language changes, see https://go.dev/blog/. Some highlights:

      - Iterators, i.e., range-over-function

      - Generics

      - For loops: fixed variable capture

      - Optimized execution tracing

      - Changing the ABI from stack-based to register-based.

    • They introduced generics into the language whilst maintaining compatibility, and breaking changes between language versions are painful in large code bases.
    • They also changed maps' iteration order to be deliberately randomized, rather than the unspecified but fairly predictable order it had before.
    • They broke the for loop variable capture behavior in 1.22, mainly to make it match what people expected (example below).

      https://go.dev/blog/loopvar-preview
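
      The classic illustration of the change: with the old semantics the goroutines share one loop variable and typically all print the last value; with 1.22 semantics each iteration gets its own variable.

          package main

          import (
              "fmt"
              "sync"
          )

          func main() {
              var wg sync.WaitGroup
              for _, v := range []int{1, 2, 3} {
                  wg.Add(1)
                  go func() {
                      defer wg.Done()
                      fmt.Println(v) // pre-1.22: usually 3, 3, 3; with 1.22 semantics: 1, 2, 3 in some order
                  }()
              }
              wg.Wait()
          }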

      • Small but significant point: you can easily avoid the new behavior. IIRC, if you had a pre-1.22 project and didn't change anything, it still compiles as before. So if you relied on that behavior (which would be very weird, but who knows), backwards compatibility is still there for you.
        • It defaults to the new behavior once your module declares go 1.22 or later. The choice isn't a global compiler flag: the semantics follow each module's (even each file's) declared language version, so libraries you include keep the behavior of the version they declare.
  • Just a quick meta note. This article is really lengthy; I don't have time to read this level of detail for the background. For example, the "Mark and Sweep" section takes up more than 4 pages on my laptop screen, and that section starts more than 5 pages into the article. Is this the result of having AI help to write sections, making them too comprehensive? It's easy to generate content, but the editing decisions to keep the important parts haven't been made. I just want to know the part about the Arena allocator; I don't need a tutorial on garbage collection as well.
    • This is an interesting comment. The author has been consistently making lengthy posts since 2021; there's no reason to believe he is using AI, as it doesn't look like his writing style has changed.

      However, the reader has changed, and readers are notoriously lazy. Now, instead of asking for a "tl;dr", the reader might incredulously assume the writer is using AI. This is an interesting side effect.

      FWIW: The Mark and Sweep section is specifically about Go's internal implementation of Mark and Sweep and is relevant context for the design decisions made in his arena. It is not generic AI slop about Mark and Sweep GCs.

    • I skimmed 60% and it doesn’t look like AI (slop) to me.

      I expect that from SEO spam, not something niche like this.