• > Abstractions. They don’t exist in assembler. Memory is read from registers and the stack and written to registers and the stack.

    [...] But my application-coded debugging brain kept looking at abstractions like they would provide all the answers. I rationally knew that the abstractions wouldn’t help, but my instincts hadn’t gotten the message.

    That feels like the wrong takeaway to me. Assembly still runs on abstractions: you're ignoring the CPU microcode, the physical interaction with the memory modules, etc. When the CPU communicates with other devices, that has more in common with network calls and invoking the "high-level APIs" of those devices. For user-space assembly, the entire kernel is abstracted away and system calls are essentially "stdlib functions".

    So I think it has a different execution model, something like "everything is addressable byte strings and operates on addressable byte strings". But you can get that execution model occasionally in high-level languages as well, e.g. in file handling or networking code. (Or in entire languages built around it like brainfuck)

    So I think assembly is just located a few levels lower in the abstraction pile, but it's still abstractions all the way down...
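    The "system calls are essentially stdlib functions" point can be poked at directly from C too. A minimal Linux-only sketch (assuming glibc's syscall() wrapper and the SYS_write constant): bypass stdio entirely and talk to the kernel's "API" the way user-space assembly would:

    ```c
    #define _GNU_SOURCE
    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void) {
        /* No printf, no stdio buffering: hand the kernel a file
           descriptor, a buffer, and a length, just as asm would
           before issuing the syscall instruction. */
        const char msg[] = "hello from a raw syscall\n";
        syscall(SYS_write, 1, msg, sizeof msg - 1);
        return 0;
    }
    ```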

    • No, assembly doesn't always inherently deal with abstractions. It depends on the system involved. I don't really count "microcode" as an abstraction, it's essentially part of the hardware and doesn't even exist on many embedded CPUs. The assembly instructions for all intents and purposes operate directly on the hardware. If you wanted to get really absurd with it, you could say that all of it is an abstraction of electrons.

      Embedded CPU assembly is what I've done most often for the last 40 years, and there aren't really any abstractions at all - not even microcode. You have a few KB of ROM and maybe a few KB of RAM, an ALU, registers, peripherals, and that's it - no APIs, no kernel, no system calls, no stdlib. Just the instructions you burn into the ROM.

    • > Assembly still runs on abstractions: You're ignoring the CPU microcode ...

      Yes and no. There's no way to "get" to these. Arguably assembly is an abstraction on top of opcodes (hex codes, or binary if you want to see it that way), but the assembly instructions are the lowest level we get to access. As a programmer you don't get to access the microcode implementing an amd64 architecture, and you cannot decide to use that microcode directly.

      Otherwise it's just electricity. Then it's just electrons.

      So it's not false that it's all abstractions but it doesn't help much to view it that way.

  • Asm is simple enough that "mental execution" is far easier, if more tedious, than in HLLs, especially those with lots of hidden side-effects. The concept of a function doesn't really exist (and this is even more true when working with RISCs that don't have implicit stack management instructions), and although there are instructions that make it more convenient to do HLL-style call and return, it's just as easy to write a "function" that returns to its caller's caller (or further), switches to a different task or thread, etc. If you're going to learn Asm, then IMHO you should try to exploit this freedom in control flow and leverage the rest of the machine's ability, since merely being a human compiler is not particularly enlightening nor useful.
    • > Asm is simple enough

      The general conceptual model of "asm" is simple.

      Some instruction sets and architectures are hideous, though.

      > merely being a human compiler is not particularly enlightening nor useful.

      I don't think I can agree with that. At least it teaches you what the compiler is doing. And abiding by conventions (HLL-esque control flow, but also things like "put the return value in r0" and "put constant pools after the function") can definitely make it easier to make sense of the code. (Although you might share a constant pool across a module or something, if the instructions reach far enough.)

      Not to say that you can't do interesting things, or can't ever beat the compiler. One of the things I most enjoyed discovering, in mid-00s-era THUMB (i.e. 16-bit ARM) code, is that the compiler was implementing switch statements with tables of 32-bit constants that it would load and jump to indirectly. I didn't get around to it, but I figured I could mechanically replace these with a computed jump into a "table" of 16-bit unconditional branches (except for very long functions, though this helped bring the branch distances under the thresholds).

    • I agree entirely, great insight! I'd like to add that assembly is best enjoyed in a suitable environment for it, where "APIs" are just memory writes and interrupts. Game programming for the C64 is way more fun than dealing with linux syscalls, for example. A lower level interface enables all the fun assembler tricks, and limited resources require you to be clever.
    • Then you goto hell…
    • > Asm is simple enough that "mental execution" is far easier, if more tedious, than in HLLs

      Ya totally, I can also keep 32 registers, a memory file, and a stack pointer all in my head at once ...fellow human... (In 2026 I might actually be an LLM, in which case I really can keep all that context in my "head"!)

      • There's an interesting new API for human cortex v1.0 that allows for a much larger context window: it's called pen and paper.
        • For real! I occasionally write assembly because, for some reason, I kind of enjoy it, and also to keep my brain sharp. But yes, there is no way I could do it without pencil and paper (unless I’m on a site like CPUlator that visually shows everything that’s happening).
        • What do the words "mental execution" mean?
      • 8 registers are sufficient; if you forget what one holds, looking back at the previous write to it is enough.

        Contrast this with trying to figure out all the nested implicit actions that a single line of some HLL like C++ will do.

  • I was lucky to learn asm on a very simple 8-bit CPU (the 6502). It had a very limited register set (3 registers) and instruction count. If you really want to dive into the asm topic, I think you should find a small, easy CPU model and use an emulator to run your code.
  • I think lots of commenters are being unintentionally pedantic. It’s clear that there are different types of abstractions one is concerned with when programming at the application level. Yes, it’s all abstractions on top of subatomic probability fields, but no one is thinking at even the atomic level when they step through the machine code execution with a debugger.
    • The one abstraction you do have to keep in mind with assembler (more when writing than reading, though) is the cache hierarchy. The days of equal cost to read or write any memory location are long gone. Even in the old 8-bit days some memory was faster to access than the rest (e.g. the 6502 zero page).

      The flags are another abstraction that might not mean what they say. The 6502 N flag and the BPL/BMI instructions really just test bit 7 and aren't concerned with whether the value is truly negative or positive.
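      That bit-7 behaviour is easy to check for yourself; a small sketch in C (n_flag is just an illustration, not a real emulator API):

      ```c
      #include <stdio.h>
      #include <stdint.h>

      /* The 6502's N flag is literally bit 7 of the result --
         "negative" only if you choose a signed reading of the byte. */
      static int n_flag(uint8_t value) {
          return (value >> 7) & 1;   /* what BMI/BPL actually branch on */
      }

      int main(void) {
          printf("%d\n", n_flag(0x80));  /* 128 unsigned, -128 signed: N set */
          printf("%d\n", n_flag(0x7F));  /* 127 either way: N clear */
          return 0;
      }
      ```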

      • Ooof I remember the bank switching on PIC microcontrollers was particularly awful. I still got it to work, but it wasn't very fun.
  • Not sure what to take away from this. __abstract works because GCC allows it as an alias to __abstract__, not because parsing the syntax is forgiving.

    Abstractions do exist (disagreeing with the single other post in here) and they also exist in most flavours of assembly, because assembly itself is still an abstraction for machine code. A very thin one, sure, but assemblers will generally provide a fair amount of syntactic sugar on top, if you want to make use of it.

    Protip: your functions should be padded with instructions that'll trap if you miss a return.

    • >Protip: your functions should be padded with instructions that'll trap if you miss a return.

      Galaxy brained protip: instead of a trap, use return instructions as padding, that way it will just work correctly!

      Some compilers insert trap instructions when aligning the start of functions, mainly because the empty space has to be filled with something, and it's better for that filler to trap if this unreachable code is ever jumped to. But if you have to add the padding manually, it doesn't really help, since the padding is even easier to forget than the return.

  • Coming from Pascal to C as a high schooler, my biggest WTF moment happened when I forgot a ';' after a struct in a header. The compiler kept complaining about the code below the #include, and for the life of me I couldn't figure it out. It took me another hour to reason that the include must be concatenating the files into invalid code.
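    What the preprocessor produces can be reproduced in a single file. A sketch (make_origin is a made-up name) of how the missing ';' silently glues the struct onto the next declaration as its return type:

    ```c
    #include <stdio.h>

    /* After #include pastes the header in, the missing ';' makes the
       struct definition the return type of whatever comes next. */
    struct point {
        int x;
        int y;
    }                        /* <-- the forgotten semicolon */
    make_origin(void)        /* ...so this now returns struct point */
    {
        struct point p = { 0, 0 };
        return p;
    }

    int main(void) {
        struct point p = make_origin();
        printf("%d %d\n", p.x, p.y);
        return 0;
    }
    ```

    And when the next declaration already has its own type, you instead get an error like "two or more data types in declaration specifiers", pointing somewhere below the #include.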
    • Ah, that's nostalgic.

      I haven't done serious work in C in quite some time. I wonder if modern compilers are better at reporting that sort of thing.

  • Neat. The author is about to stumble onto a secret.

    > In Sum: Abstractions. They don’t exist in assembler. Memory is read from registers and the stack and written to registers and the stack.

    Abstractions do not exist, period. They are patterns, but these patterns aren’t isolated from each other. This is how a hacker is born: through this deconstruction.

    It’s just like the fact that electrons and protons don’t really exist, but the patterns in energy gradients are consistent enough to give them names and model their relationships. There are still points where these models fail (QM and GR at the Planck scale, or just the classical-quantum boundary). It’s gradients all the way down, and even that is an abstraction layer.

    Equipped with this understanding you can make an exploit like Rowhammer.

    https://en.wikipedia.org/wiki/Row_hammer

    • Abstractions pretty much exist and in assembler they matter even more because the code is so terse.

      Now, there are abstractions (which exist in your brain, whatever the language) and tools to represent abstractions (in ASM you've got macros and JSR/RET; both pretty leaky).

      • That wasn’t my point. You almost got there when you wrote “there are abstractions (which exist in your brain, whatever the language)”. And your point about leaky abstractions is exactly the indication that they exist in your mind, not out there.

        My point is that we settle with what we see for convenience/utility and base our models on that. We build real things on top of these models. Then the result meets reality. If only that transition were so simple.

        When an effect jumps unexpectedly between layers of abstraction we call it an abstraction leak. As you mentioned. The correct response is to re-examine these leaks and make other frameworks to cover the edge cases, not to blame the world.

        Hackers actively seek these “leaks” by suspending assumptions that arise out of the abstractions that humans tend to rely on.

        I’m not surprised that my OP got downvoted. It can be very upsetting when one’s conceptual frameworks are challenged without prescription. No one even mentioned the specific example that I referenced. Well, if they can’t parse it, they don’t deserve it. Keeps me in the market.

  • My unsolicited friendly advice to software folks who are curious about assembly languages is: ask yourself what is it that you expect to get out of it.

    If you want a better understanding of the architecture, reading the documentation from the hardware vendor will serve you better.

    If you want your code to be faster, there will almost certainly be better ways to go about it. C++ is plenty fast in 99% of situations. So much so that it is what hardware vendors use to write the vast majority of their high-performance libraries.

    If you are just curious and are doing it for fun, sure, go ahead and gnaw your way in. Before you do, why not have a look at how hand-written assembly is used in the rare niches where it can still be found? Chances are that you will find C/C++ with a few assembly intrinsics thrown in far more often than long stretches of code in plain assembly. Contain that assembly in little functions that you can call from your main code.
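    For a taste of that style, a hedged sketch (x86-64 with SSE2 assumed; on other architectures the intrinsics header and names differ) of a tiny intrinsics kernel inside otherwise ordinary C:

    ```c
    #include <stdio.h>
    #include <emmintrin.h>   /* SSE2 intrinsics, baseline on x86-64 */

    /* Add four 32-bit ints at once; the compiler emits a single
       paddd instruction for the core operation. */
    static void add4(const int *a, const int *b, int *out) {
        __m128i va = _mm_loadu_si128((const __m128i *)a);
        __m128i vb = _mm_loadu_si128((const __m128i *)b);
        _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
    }

    int main(void) {
        int a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, out[4];
        add4(a, b, out);
        printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]);
        return 0;
    }
    ```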

    For bonus brownie points, here is a piece of trivia: the language is called assembly and the tool that translates it into executable machine code is called the assembler.

    • > For bonus brownie points, here is a piece of trivia: the language is called assembly and the tool that translates it into executable machine code is called the assembler.

      IBM has a long history of using "assembler" as a shorthand way to refer to the language. IBM was dominant enough historically that you'd find that usage in all sorts of other places. It's bad terminology, but it's not wrong.