IPv6 zones in URLs are a mistake

132 points by xena 19 hours ago | 116 comments

Sohcahtoa82
It gets worse than that.
The Python `ipaddress` library has an `ip_address` function that returns either an IPv4Address or an IPv6Address if the passed string is a valid IPv4 or IPv6 address, or raises a ValueError if the address is invalid.
I've seen code that uses that function to determine if a user-supplied string is a valid IP before passing it to a command line. At first glance, that seems fine, but some shell metacharacters are valid in the IPv6 zone ID.
`fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned` is a valid IPv6 IP, and if you did `ping fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned`, you'd have the output of `whoami` written to /tmp/pwned.
Obviously, people shouldn't be writing code that puts user input into a shell call without the proper method of execution (i.e., shell=False when using subprocess.Popen), but people often think "I validated it, it's fine" and then get popped because their validation wasn't as good as they thought it was.
EDIT: In case it isn't clear, `${PATH:0:1}` is necessary in the attack payload because a `/` is invalid in a zone ID. `${PATH:0:1}` is a tricky way to get a `/` character by just grabbing the first character of your PATH environment variable.
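To make the failure mode concrete, here is a minimal Python sketch. It assumes Python 3.9+ (where `ipaddress` started accepting scope IDs) and a Linux/BSD-style `ping -c 1`; `PAYLOAD`, `is_valid_ip`, `strict_ip` and `ping_once` are hypothetical names for illustration. The naive check accepts the payload, while the stricter one only lets through zone IDs that name an interface actually present on this host (interface names only, not numeric indices), and the command goes out as an argv list so no shell ever sees it.

```python
import ipaddress
import socket
import subprocess

PAYLOAD = "fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned"


def is_valid_ip(text: str) -> bool:
    """The naive check: does ipaddress.ip_address() accept the string?"""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def strict_ip(text: str) -> str:
    """Stricter sketch: only accept zone IDs that name a local interface."""
    addr = ipaddress.ip_address(text)  # raises ValueError on non-addresses
    # IPv4Address has no scope_id attribute, hence the getattr default.
    zone = getattr(addr, "scope_id", None)
    if zone is not None:
        local_names = {name for _index, name in socket.if_nameindex()}
        if zone not in local_names:
            raise ValueError(f"unknown zone ID: {zone!r}")
    return str(addr)


def ping_once(text: str) -> subprocess.CompletedProcess:
    # argv list + the default shell=False: the address is a single argument,
    # so ';', '>' and '${...}' are never interpreted by a shell.
    return subprocess.run(["ping", "-c", "1", strict_ip(text)],
                          capture_output=True, text=True)
```

Here `is_valid_ip(PAYLOAD)` returns `True`, which is exactly the trap described above, while `strict_ip(PAYLOAD)` raises `ValueError`. A string that survives `ipaddress` always starts with a digit, a hex digit or `:`, so it also can't be mistaken for a `-` option.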
- rtpg
  > `fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned` is a valid IPv6 IP, and if you did `ping fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned`, you'd have the output of `whoami` written to /tmp/pwned.
  Is this really a Python problem? `subprocess.run`, for example, defaults to `shell=False`, so you'd have to set `shell=True` and, on top of that, be building up argv yourself?
  The "default" API for `subprocess.run` has you doing `subprocess.run(["ping", ip])` which... I think just entirely avoids this problem?
  There's def a general sort of "oh people will just copy/paste stuff into a shell" or the whole shell script arg escaping mess. Just feels like Python is not really doing anything bad here.
  btown
  Never underestimate the power of an LLM that's spent its entire context passing its own self-generated strings to `bash`, to think "maybe the quickest way to get this done is to pass a self-generated string to `bash`."
  AshamedCaptain
  do note that even if you don't do shell expansion you're still subject to "smart" programs interpreting a single argv that starts with a dash as a parameter and its argument. I'm sure there's going to be a CVE about this at some point if there hasn't already.
- deepsun
  I would argue that the command line is for human input, so the failure already happened when they composed a `ping` shell command programmatically.
  Granted, a lot of software works like that, but the command line was invented as a human interface, we just bungee-strapped a computer instead.
- edoceo
  Maybe the crazy part is also what counts as a valid IPv6 string. And for safety, mostly never pass anything to the shell.
evgpbfhnr
And it gets even more fun when browsers such as Firefox implemented this, then decided "no, we won't do it" and removed the feature -- now there's no way to access your router's web interface over a link-local address...
(rationale being that whatwg said no: https://github.com/whatwg/url/issues/392 ; firefox bug https://bugzilla.mozilla.org/show_bug.cgi?id=700999 )
The "solution" is to use a proxy such as https://github.com/twisteroidambassador/prettysocks/tree/ipv... which incidentally encodes the % as an `s` and handles special URLs like http://fe80--1ff-fe23-4567-890as3.ipv6-literal.net for you through the SOCKS DNS resolution feature... I've never found anything else that works recently -_-
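The `ipv6-literal.net` names are a mechanical rewrite of the address: `:` becomes `-` and the `%` before the zone becomes `s`, so `fe80::1ff:fe23:4567:890a%3` turns into the hostname above. A rough Python sketch of both directions, assuming lowercase input and a zone made of DNS-safe characters (Microsoft's scheme uses numeric interface indices); `to_literal_name` and `from_literal_name` are made-up names:

```python
SUFFIX = ".ipv6-literal.net"


def to_literal_name(addr: str) -> str:
    """fe80::1ff:fe23:4567:890a%3 -> fe80--1ff-fe23-4567-890as3.ipv6-literal.net"""
    return addr.replace(":", "-").replace("%", "s") + SUFFIX


def from_literal_name(name: str) -> str:
    """Inverse of to_literal_name (assumes lowercase, DNS-safe zone)."""
    if not name.endswith(SUFFIX):
        raise ValueError(f"not an {SUFFIX} name: {name!r}")
    label = name[: -len(SUFFIX)]
    # 's' is not a hex digit, so the first 's' can only be the zone separator;
    # the zone itself is kept verbatim (it may contain 's' or '-').
    addr_part, sep, zone = label.partition("s")
    addr = addr_part.replace("-", ":")
    return f"{addr}%{zone}" if sep else addr
```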
- Dagger2
  I very much didn't test it, but this patch might do the job on Firefox (provided there's no code in the UI doing extra validation on top):
      --- a/netwerk/base/nsURLHelper.cpp
      +++ b/netwerk/base/nsURLHelper.cpp
      @@ -928,3 +928,3 @@ bool net_IsValidIPv4Addr(const nsACString& aAddr) {
       bool net_IsValidIPv6Addr(const nsACString& aAddr) {
      -  return mozilla::net::rust_net_is_valid_ipv6_addr(&aAddr);
      +  return true;
       }
  Or if you actually wanted to do some validation, pass the address to getaddrinfo():
      bool net_IsValidIPv6Addr(const nsACString& aAddr) {
        struct addrinfo *res, hints = {.ai_flags = AI_NUMERICHOST};
        int err = getaddrinfo(aAddr.get(), nullptr, &hints, &res);
        if (err) return false;
        bool isValid = res[0].ai_family != AF_INET;
        freeaddrinfo(res);
        return isValid;
      }
  This way it's valid if your OS considers it a valid address.
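The same "ask the OS" idea is easy to try from Python, which exposes `getaddrinfo()` and the `AI_NUMERICHOST` flag (so no DNS lookup ever happens). A minimal sketch, with `is_os_valid_ipv6` as a hypothetical name; it checks for `AF_INET6` directly rather than "not `AF_INET`", and as with the C++ version, which zone IDs pass depends on the platform's resolver:

```python
import socket


def is_os_valid_ipv6(addr: str) -> bool:
    """Ask the platform's getaddrinfo() whether `addr` is a numeric IPv6 address."""
    try:
        infos = socket.getaddrinfo(addr, None, flags=socket.AI_NUMERICHOST)
    except (socket.gaierror, UnicodeError):
        # gaierror: not a numeric address (or a zone this OS doesn't accept);
        # UnicodeError: Python's IDNA encoding of the host string failed.
        return False
    return infos[0][0] == socket.AF_INET6
```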
- zamadatix
  Could you do the same trick just putting a temporary entry in the hosts file?
AshamedCaptain
You complain about URL encoding ? Enter UNC encoding ...
https://devblogs.microsoft.com/oldnewthing/20100915-00/?p=12...
> \\fe80--1ff-fe23-4567-890as3.ipv6-literal.net\share
- Dagger2
  The most amazing part about this is that Microsoft used a public domain for it and then lost the domain registration.
  skissane
  And don't even care to make a serious effort to get it back. I suspect if they tried using the UDRP with a claim "we lost it by accident, cybersecurity risk, current owner is just squatting on it without actively using it" – they'd have quite decent odds of success, given the attitudes of the average UDRP arbitrator. The current holder would of course argue "you lost it more than a decade ago, you should be estopped by the passage of time" – but again, the average UDRP arbitrator would likely weigh the "cybersecurity risk" argument higher.
- jujube3
  Only a chopped unc would use that notation.
dcow
Hasn’t everyone moved on to ULAs now?
To explain, IPv6 link-local addresses are like using a MAC address to send packets. You wouldn’t ever host services on an LL address, and things that do are doing it wrong. Every v6 router should advertise a ULA prefix to all downstream clients. If you want to connect to your router’s web UI, you’d use its unique local address, not its link-local one, and avoid all of these problems. This is exactly why zones were deemed a mistake and replaced by ULAs, and this was 10 years ago… at least!
- neilalexander
  Having services be accessible on a link-local address and then advertising that service via mDNS is a completely legitimate use-case that works extremely well and is extremely common with Apple devices amongst others. The advantage being that it still works just the same even without a router handing out addresses or if you just connect two devices directly to each other.
  Also what gives you the impression that zones were “deemed a mistake”? They may be awkward in URIs but they are very much not a mistake, they are a deliberate part of ensuring that each link has its own link-local subnet without any ambiguity. It solves the problem of what the operating system should do if you need to access a link-local address that shows up via more than one network interface, which is a very real problem with unscoped IPv4 link-local addresses.
  Finally, ULAs don’t and were never intended to replace link-local addresses, they serve a different purpose entirely.
- topranks
  ULAs are standards compliant, but tbh it's a layer of complexity I'd rather not have.
  Just give me GUAs and be done with it.
  cassianoleal
  GUAs are dependent on the PD you get from your ISP. Change ISPs, all your IPs change. ISP decides to change the PD, all your IPs change...
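For reference, dcow's ULA suggestion is a prefix you generate once and keep, so unlike the PD-derived GUAs cassianoleal describes, it doesn't move when the ISP changes. RFC 4193 carves fd00::/8 into /48s by appending a 40-bit pseudo-random Global ID, leaving a 16-bit Subnet ID for /64s. A Python sketch, using `secrets.randbits` as the random source rather than the SHA-1-based procedure the RFC suggests (function names are illustrative):

```python
import ipaddress
import secrets


def random_ula_prefix() -> ipaddress.IPv6Network:
    """fd00::/8 (L bit set: locally assigned) + 40-bit random Global ID = a /48."""
    global_id = secrets.randbits(40)
    base = (0xFD << 120) | (global_id << 80)
    return ipaddress.IPv6Network((base, 48))


def ula_subnet(prefix: ipaddress.IPv6Network, subnet_id: int) -> ipaddress.IPv6Network:
    """Pick the /64 numbered `subnet_id` (16-bit Subnet ID) inside a ULA /48."""
    if prefix.prefixlen != 48 or not 0 <= subnet_id < 2**16:
        raise ValueError("expected a /48 and a 16-bit subnet ID")
    base = int(prefix.network_address) | (subnet_id << 64)
    return ipaddress.IPv6Network((base, 64))
```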
Tharre
"IPv6 is weird. One of the more strange parts of the standard is that every interface's link local addresses are in fe80::whatever`."
How is IPv6 weird here, it's the exact same thing in IPv4, no? If you have two different network interfaces, you have to identify which is which somehow, either by assigning a specific IP range to it or by adding some kind of identifier.
Making zones part of addresses in the first place was probably a mistake, I agree, but the problem of address conflicts when users can choose arbitrary addresses certainly isn't a design flaw of IPv6.
- WhyNotHugo
  It's not the same as IPv4. IPv4 doesn't solve this problem. If eth0 and eth1 are both 169.254.0.20 on two different networks, you can't specify that you want to ping 169.254.0.1 on a specific interface. There's no way to disambiguate both destinations.
  fragmede
  https://linux.die.net/man/8/ping
  ping takes a -I argument to specify which interface to use.
  sznio
  except in ipv4 getting a link-local address means "I fucked up DHCP" and isn't really meant to be a feature. It didn't really work in IPv4 land and, as per the OP, doesn't work in IPv6 land either. Just give everything a proper address and leave link-local to mDNS or whatever it was meant to support.
  throw0101a
  > except in ipv4 getting a link-local address means "I fucked up DHCP" […]
  No, it means "there is no infrastructure on this link segment". No router (to send out IPv6 RAs), and (as you say) no (working) DHCP server.
  Being able to have network connectivity automatically in this scenario can still be handy. If mDNS is running on things, then the user doesn't even have to bother manually setting an address: the link light comes on and you have connectivity to the local segment.
- trumpdong
  A link-local address necessarily needs a way to specify a link, and the link is local to the sending host and not something the receiving host knows. I suppose they could have used the upper address bits, but the sending host would need to know to convert them to 0 when sending the packet out on the wire, and with the interface ID when receiving.
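That is also how it looks from the sockets API: the zone never goes on the wire, it ends up in the `sin6_scope_id` field of the sender's socket address, which tells the local stack which link to use. A minimal Python sketch (`link_local_sockaddr` is an illustrative name; Python represents an `AF_INET6` socket address as a `(host, port, flowinfo, scope_id)` tuple):

```python
import socket


def link_local_sockaddr(addr: str, ifname: str, port: int) -> tuple:
    """Build an AF_INET6 socket address for a link-local peer on one interface."""
    # if_nametoindex raises OSError if there's no such interface.
    scope_id = socket.if_nametoindex(ifname)
    # (host, port, flowinfo, scope_id): the scope is purely local information;
    # it picks the outgoing link but is not carried in the packet.
    return (addr, port, 0, scope_id)
```

For example, `socket.socket(socket.AF_INET6, socket.SOCK_STREAM).connect(link_local_sockaddr("fe80::4", "eth0", 80))` would target the fe80::4 on `eth0` specifically (the interface name is an assumption).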
- masfuerte
  There aren't address conflicts. And users aren't choosing this, it's part of the IPv6 spec. Each interface has a unique address, but you can't tell from looking at an address which network it lives on.
  ivlad
  Not really.
  Nothing prevents host from configuring a static link-local address, like fe80::1234. Not only that, some networks choose to have some standard link local address as a default gateway. For example, a router or a L3 switch can have fe80::1 on its downstream interfaces. This way, all hosts on all networks have fe80::1 as the default gateway and the router will have fe80::1 address on multiple interfaces.
  Furthermore, you can (and some say, should) use link local addresses on transit links between your network devices, eg, between layers of switches in a hyperscale-sized data center network. Typically, the addresses will be deterministically configured, for example, consider
  -(e1.0)[switch1](e1.1)---(e2.48)[switch2](e2.25)-(eth0)[server1]
  We have server1 connected to top-of-rack switch2 connected to aggregator switch1. Link between switch1 and switch2 is point-to-point transit. You can use exclusively link local addresses there. There are a few approaches:
  - e2.48 gets fe80::2, e1.1 gets fe80::1: all upstream ports are always fe80::2 in every network, and all downstream ports are always fe80::1. A good thing is that link configuration is the same on all switches regardless of the Clos layer.
  - switch1 serial number is 1001, switch2 serial number is 2002. Then, e2.48 gets fe80::2002, e1.1 gets fe80::1001. This way, all interfaces on a switch N have address fe80::N
  You can then set up a BGP session between the link-local addresses, and it will always be either fe80::1 <-> fe80::2 or fe80::N <-> fe80::M. Switches also have a loopback address for ping and other ICMP traffic. Either approach has advantages and disadvantages.
  This is discussed in more detail in RFC 6164, and a higher-level overview is provided in RFC 7404.
- josephcsible
  I think the weirdness comes from the use of multiple addresses at once, specifically fe80::whatever addresses always being present and getting used even on normal setups when everything's working fine and a global address is configured, as opposed to 169.254.whatever addresses, which most networks never intend to use and so usually only show up when something is wrong.
  nine_k
  Isn't 127/8 always present in IPv4, without ill consequences?
  josephcsible
  I meant it's one address per interface, and loopback has always been its own interface.
  trumpdong
  One address per host is more common in serious networks that don't have endless IP addresses (10/8 block) allocated to them.
  dcrazy
  There is no problem with allocating a different 127.0.0.0/8 address to every interface on your host, because 127.0.0.0/8 is only ever accessible to the host itself. So even if you have multi-homed a single routable IPv4 address to 2 NICs on your server (for redundancy), you can still assign 127.0.0.1 to the first and 127.0.0.2 to the second, which you can then use to bind a port to a specific interface in the pair. (I don’t know if anyone actually does this.)
  trumpdong
  How would the receiving host know which 127 address you imagined belongs to it?
  dcrazy
  What do you mean “receiving host?” 127/8 is reserved for loopback. If you bind a socket to an interface with an address in that range, you can only use it to communicate with yourself. The sending and receiving hosts are the same.
  trumpdong
  I mean the host that receives the packet. Weren't you suggesting to use 127/8 as an alternative to link-local addresses?
  dcrazy
  No, I was saying that you can assign different loopback addresses to different interfaces even if the interfaces have been assigned the same routable IP address. This lets you distinguish them.
  On my Mac, however, loopback addresses are only assigned to the `lo0` interface, not to physical interfaces. I don’t know anything about how other platforms handle it, so I caveated my explanation with “maybe nobody does this in practice.”
- farfatched
  The title of the post suggests the issue is allowing that syntax in URLs.
  Is there an equivalent syntax for IPv4 addresses?
  toast0
  No, IPv4 doesn't have the concept of addresses that are scoped to a particular interface.
  RFC 3927, which standardizes the use of 169.254.0.0/16 for IPv4 link-local addressing, was published in 2005; it mentions scoped addresses but does not offer any solutions.
  However, nothing really relies on ipv4 link local addressing, and most networks don't use it. It's a conceptual problem that these are interface scoped addresses and there's no (standard) way to specify them to applications, but it doesn't cause actual problems.
  On the other hand, ipv6 neighbor discovery uses ipv6 link local addresses, so they have to work. And you might try to use them for other things... but then you need to pass through the scope. It's kind of ugh when it causes problems.
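Python's `ipaddress` module (3.9+) shows the asymmetry toast0 describes: both 169.254.0.0/16 and fe80::/10 are recognised as link-local, but only the IPv6 form has anywhere to put the scope. A small sketch (`describe` is just an illustrative helper):

```python
import ipaddress


def describe(text: str) -> str:
    """Summarise how ipaddress parses `text`, or say it was rejected."""
    try:
        addr = ipaddress.ip_address(text)
    except ValueError:
        return "rejected"
    # Only IPv6Address has a scope_id attribute (Python 3.9+).
    zone = getattr(addr, "scope_id", None)
    return f"v{addr.version} link_local={addr.is_link_local} zone={zone}"
```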
- themafia
  > you have to identify which is which somehow
  The _routing_ system does. You have the same problem if you have multiple public IPs on a machine. Your local routing will not automatically return packets back through the address they came in on. They will go to the _default_ route. So if you have this configuration, you need to set up either the routing tables or the firewall to re-route packets "back out" the proper interface or IP address.
  This is strictly a routing problem and not an addressing problem.
KingMachiavelli
IPv6 zones are great. IPv6 neighbor discovery is great. Link-local doesn’t need SLAAC or a DHCPv6 server.
It seems clear that browsers would need to special case this and implement OS specific rules for what a valid Zone ID is.
Browsers have overloaded the URL/search/AI magic bar for years. I don’t quite see how IPv6 Zone ID are that different. They already auto append or auto hide the www subdomain.
- thayne
  Since the zone ID is device-specific, it doesn't make sense to send it over the wire in HTML links or a Host header, but it does seem like it could be used in the URL bar, at least.
sedatk
That's a bit of a stretch. First, IPv4 can't handle this scenario at all. It's an IPv6 feature. So, let's just be thankful that this exists. Amen.
Second, if you don't want to use interface IDs, you can just enable ULAs on your networks, and routing will take you to the correct interface.
- mananaysiempre
  It’s not exclusively an IPv6 feature: RFC 3927 defines link-local IPv4 addresses, to be assigned randomly from 169.254.0.0/16 after a bit of ceremony to detect collisions.
  Ideally, you’d be able to connect a PC and a printer with an Ethernet cable, they would both (having failed to find a better alternative) allocate a link-local address for themselves, and then the PC would use DNS-SD over mDNS to discover the printer and show it to you. Similar story with PCs exporting their media files over the network, a—say—set-top box, and a switch they’re all plugged into.
  And for some combinations of parts this actually works. It’s just that the functionality is not always well-exposed by the OS, that a switch + DHCP server in a box (in practice, a consumer router) can work just as well with no configuration as an unmanaged switch can, and that people are not that interested in local-only wired networks anymore.
  There’s also the “having failed to find a better alternative” part: unlike with IPv6, the RFC does not endorse always assigning a link-local address as the second one next to a static or DHCP-provided one, I’m guessing for software compatibility. Thus you really only see 169.254.* in your interface configuration when DHCP is borked, and it’s kind of useless in that case.
  dcrazy
  The IPv6 feature isn’t link-local addresses, it’s being able to specify the interface to bind to as part of the address specification. This lets you demand that your IPv6-based tool use your wired Ethernet connection, for example.
  AshamedCaptain
  You cannot use zones for global addresses, so zones are indeed mostly a feature of link-local addresses only.
  dcow
  That is not a design goal of IPv6. It’s a terrible leak in the abstraction.
  dcrazy
  How is it not a design goal? Why else would this syntax have been invented?
  themafia
  > to be assigned randomly from 169.254.0.0/16
  Yes, but the question is, "what if an address in this range is assigned to _two_ interfaces at the same time?" Now your local routing information base cannot distinguish which interface to use when trying to reach other hosts in that same network. So, it's fair to say, it's not a feature even available in IPv4.
  The second difference is IPv6 is almost always going to have link local addresses assigned and machines with multiple network interfaces are the norm rather than the exception.
  xp84
  I literally had to interrogate an LLM to explain what this was about, because to me, indeed, when I see 169.254 I think "Ah, someone unplugged something critical and the network is now completely down." I didn't even know that in ipv6 land there are any reasons to use link-local addresses for anything. I mean, there still basically isn't a reason for 99.99% of people, I think. But it's interesting.
  I also didn't realize that part of the idea behind these LL things was one of the rounds of wishful networking ideas of the 90s or 2000s, kind of a cousin of UPnP and mDNS in that way (in increasing order of eventual usefulness).
  Considered completely in a vacuum, especially ignoring the WAN, I can see how it seemed silly that if you plugged three computers and a printer into a switch, rolling random IP addresses like this could have allowed things to be discoverable and to function locally (I thought mDNS or "Bonjour"/"Rendezvous" as Apple called it came much later, but I know my PCs could "see" each other with NetBIOS or whatever long before mDNS was invented).
  RiverCrochet
  Link-local addresses (LLAs) are needed in IPv6 because IPv6 doesn't have broadcast. IPv6 uses multicast instead.
  Broadcasts go to all IPv4 addresses in the subnet; multicasts only go to those who subscribed to a multicast group. To subscribe to an IPv6 multicast group you need an IPv6 address. So all IPv6 interfaces will have at least one self-generated LLA.
  One thing that IPv6 uses multicast heavily for is NDP, which is the IPv6 version of ARP. This is how IP addresses on your LAN/WLAN are converted to MAC addresses which is required info for the NIC in your node to talk to another node on your Ethernet LAN/WLAN.
  End users don't typically have to use LLAs directly but you can use them if you want to 100% ensure things won't leave your LAN as routers don't forward LLAs.
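One concrete example of that multicast use: NDP doesn't broadcast "who has X?", it sends the Neighbor Solicitation to X's solicited-node multicast group, which per RFC 4291 is the ff02::1:ff00:0/104 prefix plus the low 24 bits of the target address. A Python sketch of the mapping (`solicited_node` is an illustrative name):

```python
import ipaddress

SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node(addr: str) -> ipaddress.IPv6Address:
    """Map an IPv6 unicast address to its solicited-node multicast group."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_BASE | low24)
```

Addresses that share their last 24 bits (for instance a link-local and a global address built from the same interface ID) map to the same group.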
  Dagger2
  mDNS on link-locals is what makes the "plug computers and printers into switch" case work. It would have been NetBIOS originally but mDNS is how it's done today.
yjftsjthsd-h
> In order to disambiguate what's the host and what's the port, you typically format the IPv6 address in square brackets, so fe80::4 on port 80 would look like this:
> [fe80::4]:80
I really do wish they'd just stuck with dots. Or if we must upend things, commit to the bit and change the character to separate ports.
- sedatk
  > I really do wish they'd just stuck with dots
  Then it would get confused with domain names (e.g. babe.cafe).
  yjftsjthsd-h
  Ah, right, because we threw in hex. That's fair, but then I return to: If we're doing that, we should have changed the port separator.
  kccqzy
  Using hex is probably a mistake. We didn’t use hex for IPv4 and that worked really well.
  nickburns
  If it ain't broke!
  NewJazz
  https://github.com/cloudnative-pg/cloudnative-pg/issues/1071...
  https://github.com/Qualys/Qualys-Helm-Charts/issues/4
  yjftsjthsd-h
  I contend that needing [] to disambiguate is absolutely broken.
  wpollock
  I wonder how many readers realized your joke here. For the ones who didn't, the 4-byte "magic number" that identifies Java .class files spells "CAFEBABE" in hex.
  Doxin
  It's also just generally used as an easily spotted value in a hex editor, similar to DEADBEEF, ABADBABE, CAFED00D, and probably a bunch more variations on the concept. CAFEBABE seems especially prolific, getting used for -- among other things -- the poison value for memory pools in Plan 9 and the Mach-O universal object file magic number[0]
  [0] https://en.wikipedia.org/wiki/Hexspeak
  userbinator
  Dots and decimal, like 47806.51966.0.0.0.4919.57005.48879
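userbinator's dotted-decimal spelling is a mechanical conversion: expand the address to its eight 16-bit groups and print each one in decimal. A throwaway Python sketch (`to_dotted_decimal` is a made-up name):

```python
import ipaddress


def to_dotted_decimal(addr: str) -> str:
    """Render an IPv6 address as eight decimal 16-bit groups joined by dots."""
    groups = ipaddress.IPv6Address(addr).exploded.split(":")
    return ".".join(str(int(group, 16)) for group in groups)
```

`to_dotted_decimal("babe:cafe::1337:dead:beef")` reproduces the example above.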
  theamk
  reserve a TLD, like ".v6", and you are done.
  URL parsers don't break, the amount of code to change is not that big, and many of the user-space applications can keep working with no changes at all, as long as they use high-level network libraries.
  If you really hate this for some reason, use some other characters. How about underscores (_), for example? Those are not valid in hostnames, so there is no chance of confusion.
  Choosing colon when URLs were already using it is either very stupid or very mean.
  ianburrell
  There is the .arpa domain used for reverse lookups; ip6.arpa is already used for that. But combining it with Microsoft's ipv6-literal would give ipv6-literal.arpa.
- anyfoo
  Yeah. I think that's actually my one, biggest gripe about IPv6, those damn colons. And those damn brackets that were made to mitigate the colons, that just cause more problems:
  Just yesterday I tried to use rsync (like I do all the time, in my mind there's no reason to use scp when rsync does everything better), but this time I needed to specify an IPv6 address. On the (admittedly ancient) rsync version that comes with macOS, this doesn't work:
  rsync foo 'user@[fe80::4]:/tmp'
  Note that I already had to put the second argument in quotes, because otherwise the shell treats the square brackets as a filename glob.
  But even then rsync just complains, because rsync itself separates host from path through colon. I think the only workaround is to do something like `rsync -e 'ssh user@[fe80::4] ...'`... but I just used an updated rsync from homebrew, which is of course the saner method. Still, just another colon/bracket-caused issue.
  ablob
  Isn't this just an issue with rsync? (or rather your ancient version of it) I think you'd run into the same issues when using an IPv4 address and port combination. It was rsync's choice to use a colon as the host/path separator without accounting for IPv6. You'd be complaining all the same about other separator choices if rsync just happened to pick the same one.
  Nonetheless, I do agree that the choice of colons isn't great, since it makes their meaning ambiguous.
  anyfoo
  Absolutely it is. But still, the colons and brackets often make things awkward, leading not only to such compatibility bugs but to general usability issues. Colons and brackets are just too overloaded within both destination specifiers (e.g. for ports, paths...) and shell syntax, and probably other things, whereas the dot '.' rarely is.
  I'm an avid user of IPv6 by the way, I don't share a lot of the criticism. For me personally it's a net positive. But this is a wart where I wish they went a different direction.
jchw
I ran into some of these issues when working on IPv6 validation in a library. I found that if you just call system functions like inet_pton, you would also get OS-dependent restrictions on what zone identifiers are valid! This isn't ideal so I wound up just making an IPv4/IPv6 parser with a very liberal zone ID production. Said library also supported URLs, and I did not implement it to parse the IPv6 literal as percent encoded in this edge case, but it winds up working both ways anyways. Is this good? Maybe not: maybe it would've been better to pick a strict subset instead. However, whether or not that would be better depends on specific use cases. Unfortunately, there is just no perfect answer sometimes.
epistasis
> And with the right scope it looks like this:
```
    [fe80::4%eth0]:80
```
> Now let's get URL encoding into the mix. ...
About here I felt my heart start to beat really fast and I started to hyperventilate.
I'll just accept that this is as much of a nightmare as it seems.
- wolletd
  I wonder why IPv6 didn't catch on! It's just unergonomic and ugly!
  At work, I have a rare case of a useful application of IPv6: setting IPv4 addresses. We have multiple embedded devices in one product which all got the same default IPv4. But their serials map to their MACs which map to their link-local IPv6.
  So workers scan the serial and I connect to all devices at once via their IPv6 address. Then, I set their individual IPv4 address and that's all I do via IPv6.
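The MAC-to-link-local step presumably relies on the modified EUI-64 interface IDs from RFC 4291: insert ff:fe in the middle of the MAC, flip the universal/local bit, and prepend fe80::/64. That is an assumption about these particular devices (many modern OSes use random or RFC 7217 stable-privacy interface IDs instead), but a Python sketch of the mapping looks like this (`mac_to_link_local` is a hypothetical helper):

```python
import ipaddress


def mac_to_link_local(mac: str) -> ipaddress.IPv6Address:
    """fe80::/64 + modified EUI-64 interface ID derived from a 48-bit MAC."""
    octets = bytes(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected a 48-bit MAC, got {mac!r}")
    # EUI-48 -> EUI-64: insert ff:fe between the OUI and the device part.
    iid = bytearray(octets[:3] + b"\xff\xfe" + octets[3:])
    iid[0] ^= 0x02  # flip the universal/local bit ("modified" EUI-64)
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + bytes(iid))
```

Reaching the device still needs a zone, e.g. `fe80::211:22ff:fe33:4455%eth0` for whichever interface it is plugged into.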
  anyfoo
  Why don't you just use the IPv6 address directly then? Phrased differently, what's better about IPv4 in your particular case that makes it worthwhile to only use IPv6 for "bootstrapping" IPv4?
  I must say, I rather enjoy both IPv6's autoconfiguration and the fact that my non-link-local addresses are actually unique (and, if I want, routable).
- sundbry
  It's not hard, just don't use those addresses in your application.
  Link-local ipv6 addresses are not designed for the use case of serving web applications.
gerdesj
"so if you have a packet destined to fe80::4, how do you disambiguate it?"
Routing tables get you to the destination, but I think the question is about which source address to use, i.e. which network card/interface to use as the source; after all, they are all in fe80::.
For a destination in fe80:: the OS will pick the one on the right interface (in effect the IPv6 version of ARP).
You never use an fe80:: address as the source for a destination beyond the local link, because those are link-local addresses. You'll send to the default gateway/GoLR/etc. unless you have more explicit routes, and set your source address to your IPv6 "identity", which might be one of many.
Anyway, here's your problem:
"But if you try to parse this as a URL in Go, you get an error:"
Go needs fixing!
- Dagger2
  Routing tables don't work here, because the routing table looks something like:
      fe80::/64 dev eth0 proto kernel metric 256 pref medium
      fe80::/64 dev eth0.11 proto kernel metric 256 pref medium
      fe80::/64 dev eth0.13 proto kernel metric 256 pref medium
      fe80::/64 dev eth0.14 proto kernel metric 256 pref medium
      fe80::/64 dev eth0.15 proto kernel metric 256 pref medium
      fe80::/64 dev eth0.16 proto kernel metric 256 pref medium
  Which interface's fe80::4 are you talking about? They all have an fe80::4.
- themafia
  > in effect the IPv6 version of ARP
  NDP. That's a discovery protocol not an elimination protocol. There's no guarantee that a link local address isn't available on multiple networks.
  > the OS will pick the one
  Linux will simply pick the first entry in the routing table. It may make this appear as if it's working by default or some underlying magic; however, it's literally just the very first entry that matches.
Dagger2
I've never really got why this is so complicated. My interpretation of [] syntax in URLs is "[ enters into a raw address mode", "] exits the raw address mode" and "the characters between the brackets are opaque address characters to be passed to getaddrinfo()".
(It basically has to be this way, or the URL syntax would need to be updated to support future address families with their own address formats. New address families can be loaded at runtime, including ones that didn't exist at the time your current software was compiled -- and this is handled properly by the BSD socket API -- so hardcoding possible address formats is incorrect.)
The _only_ character that needs special handling is ], and if you're willing to declare that you can't be bothered to support link-local addresses at all then declaring that you'll support anything except addresses containing a "]" should be far easier.
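Dagger2's "raw address mode" rule is short enough to sketch. Assuming an authority component without userinfo, and with `split_authority` as a hypothetical name, everything between `[` and the first `]` is handed back untouched (ready for `getaddrinfo()`), and only what follows is parsed as a port:

```python
from typing import Optional, Tuple


def split_authority(authority: str) -> Tuple[str, Optional[int]]:
    """Split 'host:port' or '[opaque literal]:port' into (host, port)."""
    if authority.startswith("["):
        end = authority.find("]")
        if end == -1:
            raise ValueError("unterminated '['")
        # Opaque: no percent-decoding, no validation of the bracketed text.
        host, rest = authority[1:end], authority[end + 1:]
        if rest and not rest.startswith(":"):
            raise ValueError("unexpected text after ']'")
        port_text = rest[1:]
    else:
        host, _, port_text = authority.partition(":")
    return host, (int(port_text) if port_text else None)
```

The only literal this can't carry is one that itself contains `]`, which is the single restriction the comment mentions.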
GalaxyNova
No they are not. There is frankly no other way to handle the cases that IPv6 zones do.
lxgr
Are URLs of link local addresses a common thing with IPv6? I don’t think I’ve ever encountered one myself (but my home network supports ULAs and more importantly DNS).
- nickcw
  Link local addresses are exactly that. They don't route and they are for low level stuff like adding stuff to the routing table or BGP.
  If you want to do this properly, you configure Unique Local Addresses (ULAs) out of the range fc00::/7. These are the equivalent of 192.168/16, 172.16/12 or 10/8, and they can be routed.
  Trying to run services on fe80: addresses is a mistake IMHO
- singpolyma3
  No. A well set up network never needs them at all. But I can see the usefulness
- _bernd
  Say you want to provision a "smart device" with just a computer and no router.
  These link-local addresses are quite handy. But sadly, parsing them in modern browsers has been a flame war ever since. I assume that's the reason why we don't see them used that often.
  Another nice use case is to use these link local addresses in cloud environments...
  thedougd
  mDNS should work here even without a reflector.
  _bernd
  Nobody is talking about mdns here. I don't know where you got this from.
- trumpdong
  Not common, but should be permissible if you want any kind of consistency in your software.
- Dagger2
  They'd be more common if browsers didn't completely break handling them.
johneth
I've just been implementing a bunch of URL-related utility functions in Go. Decided the most pragmatic solution of handling IPv6 addresses in URL hosts is to outright reject zone identifiers because of the ambiguity in how to parse / serialize them, and the inconsistent ways others have done it (or most of the time, not done it).
RFC 3986 says "This syntax does not support IPv6 scoped addressing zone identifiers." Makes sense because '%' is a reserved character for percent encoding (hence the %25 that Go's net/url expects).
The URL Standard explicitly states "Support for <zone_id> is intentionally omitted."
neild
> In theory, there is guidance for how to properly handle IPv6 zones in user interfaces in RFC 9884, but there's no such guidance for URLs.
RFC 6874: Representing IPv6 Zone Identifiers in Address Literals and Uniform Resource Identifiers (https://www.rfc-editor.org/rfc/rfc6874.html)
Which says that, yes, you need to %-encode the %, so a URL containing a host of fe80::4%eth0 becomes http://[fe80::4%25eth0]/. Yes, that's ugly. Sorry.
> TL;DR: computers were a mistake.
I agree entirely.
(For what it's worth, I am a maintainer of Go's net/url package, and I believe net/url correctly handles zone ids in URLs. It's always possible there's something wrong I'm not aware of. Please let me know if there is!)
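For anyone handling the RFC 6874 form (since obsoleted by RFC 9844, as agwa points out), decoding it comes down to splitting the bracketed literal on the first `%25` and percent-decoding the rest. A Python sketch using only the standard library; `zone_from_url` is an illustrative name, and it deliberately rejects the bare-`%` spelling, which isn't RFC 6874 syntax:

```python
import ipaddress
from typing import Optional, Tuple
from urllib.parse import unquote, urlsplit


def zone_from_url(url: str) -> Tuple[str, Optional[str]]:
    """Return (address, zone) from a URL such as http://[fe80::4%25eth0]/."""
    host = urlsplit(url).netloc.rpartition("@")[2]
    if not host.startswith("[") or "]" not in host:
        raise ValueError("host is not an IP literal")
    literal = host[1:host.index("]")]
    addr, sep, zone = literal.partition("%25")
    if "%" in addr or (sep and not zone):
        raise ValueError("zone ID must be written as '%25' + a non-empty ID")
    ipaddress.IPv6Address(addr)  # raises ValueError if the address part is bad
    return addr, (unquote(zone) if sep else None)
```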
- agwa
  That RFC is obsoleted by https://datatracker.ietf.org/doc/html/rfc9844 which removes all guidance around URIs:
  > This document completely obsoletes [RFC6874], which implementors of web browsers have determined is impracticable to support [LINK-LOCAL-URI], and replaces it with a generic UI requirement. Note that obsoleting [RFC6874] reverts the change that it made to the URI syntax defined by [RFC3986], so [RFC3986] is no longer updated by [RFC6874]. As far as is known, this change will have no significant impact on non-browser deployments of URIs.
  neild
  Fair enough, but that leaves us with no way to represent zone IDs in URLs at all. Neither http://[fe80::4%eth0]/ nor http://[fe80::4%25eth0]/ is valid under RFC 3986.
  Given that net/url has supported RFC 6874 since before RFC 9844 came along, our choices are:
  * Keep supporting the RFC 6874 syntax.
  * Drop support for it, require strict RFC 3986, have no support for zone IDs in URLs at all. Breaks existing users, utterly infeasible.
  * Stop supporting RFC 6874 and start supporting an unescaped % as the zone ID separator, which conforms to no standard I know of. Also breaks existing users, infeasible.
  * Some sort of hybrid where we try to support both %25 and % as a separator? Ugh.
  Of these, keeping the existing support as-is until or unless a new standard comes along seems like the best option.
  agwa
  Yeah, I agree. No criticism of Go's behavior is intended; just pointing out that the RFC is technically dead.
- xena
  I have published a fix to the post, it should be live within a minute. Thanks!
  https://github.com/Xe/site/commit/f846b489092412b8c1ef70bebd...
  arcanemachiner
  The sibling comment to yours may be useful:
  https://news.ycombinator.com/item?id=48405808
  xena
  i hate computers
OptionOfT
In Rust there is the same problem. The `url` crate's `url::Url` does not support `%<zone_id>`.
`http::Uri` does, and it accepts both `%` and `%25`.
https://play.rust-lang.org/?version=stable&mode=debug&editio...
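Accepting both `%` and `%25` as the separator (the hybrid option neild calls "Ugh", and the behavior reported above for `http::Uri`) is ambiguous whenever the zone ID itself begins with "25". A minimal Python illustration; the two functions are hypothetical and do not describe how any particular library resolves the ambiguity.

```python
from typing import Optional


def zone_rfc6874(literal: str) -> Optional[str]:
    """Zone ID if "%25" is taken as the separator (RFC 6874 reading).

    Percent-decoding of the zone itself is omitted for brevity.
    """
    _, sep, zone = literal.partition("%25")
    return zone if sep else None


def zone_bare_percent(literal: str) -> Optional[str]:
    """Zone ID if a bare "%" is taken as the separator, as in the plain
    text form "fe80::1%eth0" used outside URLs."""
    _, sep, zone = literal.partition("%")
    return zone if sep else None


# The same bracket contents, read two ways. With Windows-style numeric zones
# this is the difference between interface 12 and interface 2512.
literal = "fe80::1%2512"
print(zone_rfc6874(literal))       # 12
print(zone_bare_percent(literal))  # 2512
```

A parser that tries to support both spellings has to pick one reading for inputs like this, and whichever it picks will be wrong for someone.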
jeroenhd
This just proves that Go's URL formatting was a mistake. IPv6 addresses existed long before Go decided how they should be formatted as strings. Python has a similar problem with parts of its standard library.
This is what happens when language and standard library designers ignore a spec like IPv6 for a couple of decades.
ghhhibhc
Nothing is more idiomatic Go than ignoring inconvenient edge cases.
- pavon
  Who says Go's handling of the corner case is incorrect? The original IPv6 RFCs didn't address the case at all. Then in 2013 RFC6874[1] clarified that the % in the zone identifier MUST be percent encoded when used in a URI, just like Go requires. Then in 2025 this RFC was obsoleted by RFC 9844[2], which only talks about UI behavior and says nothing about URIs, basically reverting things back to the undefined state prior to 2013. What a fucking mess.
  [1] https://www.rfc-editor.org/info/rfc6874/
  [2] https://www.rfc-editor.org/info/rfc9844/
- contingencies
  Added to https://github.com/globalcitizen/taoup/
teunispeters
I use IPv6 ULAs to deal with this (for IoT link-local-only connections). Mind you, I also made sure scoped routing etc. worked too, as part of the unit test suite.
- sedatk
  ULA isn’t link-local though. It’s privately routable.
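Python's standard `ipaddress` module (the same one discussed at the top of the thread) makes this distinction concrete: fe80::/10 is link-local, while ULAs come from fc00::/7 (in practice fd00::/8) and are private but not link-local. The addresses below are arbitrary examples.

```python
import ipaddress

link_local = ipaddress.ip_address("fe80::1")
ula = ipaddress.ip_address("fd12:3456:789a::1")

# Link-local: only meaningful on one link, hence the need for a zone.
print(link_local.is_link_local)  # True
# ULA: not link-local, but still not globally routable.
print(ula.is_link_local)         # False
print(ula.is_private)            # True
print(ula in ipaddress.ip_network("fc00::/7"))  # True
```

Because a ULA is not tied to a single link, it needs no zone ID, which is why it avoids the URL problem entirely.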
hamandcheese
Couldn't the need for Zones have been solved with ARP-like probing? I.e. if you don't know on which interface to route a link local address, try pinging the address from each interface, and see which one responds.
- neilalexander
  No, because you could feasibly end up with neighbour entries for the same address via multiple interfaces and then you are no further forward.
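A toy model of that failure mode: link-local addresses only have to be unique per link, so the same address can answer on several interfaces (for example, two routers that are each configured as fe80::1 on their own segment). The interface names and the `PROBE_RESULTS` table are made up for illustration; no real neighbor discovery happens here.

```python
# Hypothetical probe results: which interfaces got a Neighbor Advertisement
# back for each link-local target. fe80::1 exists on both segments.
PROBE_RESULTS = {
    "fe80::1": ["eth0", "wlan0"],
    "fe80::abcd": ["eth0"],
}


def pick_interface(target: str) -> str:
    """Return the single interface that answered, or fail if it's ambiguous."""
    answered = PROBE_RESULTS.get(target, [])
    if len(answered) == 1:
        return answered[0]
    if not answered:
        raise LookupError(f"{target}: no interface answered")
    # Two or more answers: probing alone can't tell which link was meant.
    raise LookupError(f"{target}: answered on {answered}, still need a zone")
```

`pick_interface("fe80::abcd")` returns `eth0`, but `pick_interface("fe80::1")` can only give up, which is exactly the case the zone ID exists to resolve.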
pilif
This response to the post announcing the article is very telling:
https://oldbytes.space/@mrrmot/116694151801834138
OptionX
I thought fe80::whatever was only for link-local, and link-local was only for one-to-one communication with the router for SLAAC.
After that you'd get a unique local address, and that would then be used for normal routing needs.
Did I get that wrong?
- anyfoo
  You can use link local for whatever you want, I don't think there's a restriction, is there?
  Even though it's rare, I actually do use it if I want to talk to another host on a very specific interface. Sometimes there are multiple paths.
  OptionX
  I went to check and I think I get it now: link-local is routable (switchable?) but only at the local level, but then you might ask why bother with SLAAC at all. It's because routers can't route anything with a link-local source or destination, since those addresses aren't globally unique, so if you need to talk to anything past layer 2 you need a unique local address (or a global one).
  anyfoo
  Yeah, but you can still talk to other hosts on the same link, not just the router, using any higher-layer protocol. Link-local addresses are not routable, but if you want to talk on the same network segment, that's fine.
  dcrazy
  Case in point, when I SSH from my laptop to my desktop using mDNS hostnames, I see “Last connection from: fe80::<something>”.
jasonjayr
Also, thank you, Windows, for not having consistent interface IDs after reboot. I had to rewrite a configuration file with PowerShell at every startup to handle this.
- v1ne
  Interfaces are persistent in Windows, that's why they get assigned such silly names as "LAN interface (42)".
  If the mapping between the logical and physical interfaces changes, that probably means that your NICs lack proper IDs to differentiate them or the bus topology is somehow not stably sorted. I wouldn't blame the OS for this.
  jasonjayr
  Don't want to go too far off topic; but the interface in question was a uniquely named internal network interface for a Hyper-V VM. Considering it's MS all the way down, I'd expect them to get it right.
j16sdiz
I would go further and argue IPv6 zones are a mistake.
I know they solve some real problems, but the cost of solving them is just too high. Careful planning and manual configuration can avoid those problems.
IPv6 promised a 128-bit address space, but we got 128 bits plus an arbitrary-length string instead. This forces a lot of complication onto application developers.
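The "128 bits plus a string" point is visible in Python's `ipaddress` module, which accepts a zone in `IPv6Address` from Python 3.9 on: the zone is stored alongside the address, not inside its 16 bytes. `eth0` is just an example interface name.

```python
import ipaddress

addr = ipaddress.IPv6Address("fe80::1%eth0")

print(len(addr.packed))  # 16: the zone is not part of the 128-bit address
print(addr.scope_id)     # eth0: kept as a separate string
print(str(addr))         # fe80::1%eth0
```

At the socket level the zone becomes a 32-bit interface index (`sin6_scope_id` in `sockaddr_in6`), so the arbitrary-length string mostly lives in the text and UI layers, which is exactly where URLs sit.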
manytimesaway
Ads on a blog you selfpost on HN is a new low.
JackSlateur
TL;DR: computers were a mistake.
- bigstrat2003
  Honestly, I can't really say he's wrong...
nickburns
More strange. Stranger. This is strange. Stranger? Who are you?
singpolyma3
I don't even understand what's being complained about here. If you want a % in a URI you need to encode it. It's not rocket science.
- spartanatreyu
  Except that % is already used to encode something else.
  Now if someone else receives a URI, is there going to be any confusion about how many times it needs to be decoded?
  If the answer is yes, then we have a problem.
  (and by looking at the other comments in this thread, the answer is most definitely yes)
  singpolyma3
  It only needs to be decoded once? The raw % is in the host and will be recovered by this process. Same as any other URL encode/decode situation.
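Both sides of this exchange can be seen with `urllib.parse.unquote`: decoding the RFC 6874 host once recovers the zone, but decoding it a second time corrupts a Windows-style numeric zone such as 12, whose text form looks like a percent-escape. A name like eth0 happens to survive only because `%et` is not a valid escape.

```python
from urllib.parse import unquote

host = "fe80::1%2512"  # RFC 6874 form of fe80::1%12 (zone = interface index 12)

once = unquote(host)
twice = unquote(once)

print(once)         # fe80::1%12: correct
print(repr(twice))  # 'fe80::1\x12': "%12" was decoded again into a control character

# "eth0" survives double decoding only because "%et" is not a valid escape.
print(unquote(unquote("fe80::1%25eth0")))  # fe80::1%eth0
```

So "decode once" is correct, but any layer that decodes one extra time silently changes which interface is meant, and only for some zone IDs.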