Running Out of Disk Space in Production

123 points by romes 4 days ago | 67 comments

flanfly
A neat trick I was told is to always have ballast files on your systems. Just a few GiB of zeros that you can delete in cases like this. This won't fix the problem, but will buy you time and free space for stuff like lock files so you can get a working system.
- layer8
  Better fill those files with random bytes, to ensure the filesystem doesn’t apply some “I don’t actually have to store all-zero blocks” sparse-file optimization. To my knowledge no non-compressing file system currently does this, but who knows about the future.
  nyrikki
  XFS, ext4, btrfs, etc. all support sparse files, so any app can cause problems. You can try it with:
  dd if=/dev/zero of=sparse_file.img bs=1M count=0 seek=1024
  If you add conv=sparse to the dd command with a smaller block size, it will sparsify what you copy too; use the wrong cp flags and the copies will explode back to full size.
  This is a much harder problem to deal with than the file system layers, because the allocated size reported by stat or du will usually look smaller than the file's apparent size.
  layer8
  Creating sparse files requires the application to purposefully use special calls like fallocate() or seek beyond EOF, like dd with conv=sparse does. You won't accidentally create a sparse file just by filling a file with zeros.
  freedomben
  Yep, btrfs will happily do this to you. I verified it the hard way.
  kccqzy
  Well, btrfs supports compression, so that’s understandable. However, I personally prefer to control compression manually, so it only compresses files I've marked for compression using chattr(1).
  freedomben
  I've switched to that also. It surely wastes some space but being able to reason about file space is worth it to me for now
  ape4
  If I recall correctly:
  dd if=/dev/urandom of=/home/myrandomfile bs=1 count=N
  Twirrim
  If you want to do it really quickly
  openssl enc -aes-256-ctr -pbkdf2 -pass pass:"$(date '+%s')" < /dev/zero | dd of=/home/myrandomfile bs=1M count=1024
  Almost all CPUs have AES native instructions so you'll be able to produce pseudorandom junk really fast. Even my old system will produce it at about 3Gb/s. Much faster than urandom can go.
  ape4
  That's very cool. Sadly running that exact command gets an incomplete file and error "error writing output file". It suggests adding iflag=fullblock (to dd). Running that makes a file of the correct size. But still gives "error writing output file". I suppose that occurs because dd breaks the pipe.
  fragmede
  bs=1 is a recipe for waiting far longer than you have to because of the overhead of the system calls. Better: bs=N count=1.
  __david__
  That’s also not great if you’re trying to make a 10 gigabyte file, since dd allocates a buffer the size of the whole block in memory. In that case, use bs=1M and count=SizeInMB.
  marcosdumay
  Modern computers are crazily overengineered...
  Most current desktops (smaller than your usual server) won't have any problem with the GP's command. Yours is still better, of course.
- dspillett
  Similarly, I always leave some space unallocated on LVM volume groups. It means that I can temporarily expand a volume easily if needed.
  It also serves to leave some space unused to help out the wear-levelling on the SSDs that make up the RAID array serving as the PV¹ for LVM. I'm not 100% sure this is needed any more², but I've not looked into that sufficiently, so until I do I'll keep the habit.
  --------
  [1] if there are multiple PVs, from different drives/arrays, in the VG, then you might need to manually leave a bit free on each one, because LVM will naturally fill one before using the next. Just allocate a small LV specially on each and don't use it. You can remove one/all of them and add the extents to the full LV if/when needed. Giving it a useful name also reminds you why that bit of space is carved out.
  [2] drives under-allocate by default IIRC
  justsomehnguy
  Not needed. All your unused/unfilled space already serves as wear-leveling space. It wasn't needed even back then, except in some corner cases. And most importantly, 10% of a drive in ~2010 was 6-12GB; nowadays it's 50-100GB at least.
- throw0101d
  > A neat trick I was told is to always have ballast files on your systems.
  ZFS has a "reservation" mechanism that's handy:
  > The minimum amount of space guaranteed to a dataset, not including its descendants. When the amount of space used is below this value, the dataset is treated as if it were taking up the amount of space specified by refreservation. The refreservation reservation is accounted for in the parent datasets' space used, and counts against the parent datasets' quotas and reservations.
  * https://openzfs.github.io/openzfs-docs/man/master/7/zfsprops...
  Quotas prevent users/groups/directories (ZFS datasets) from using too much space, but reservations ensure that particular areas always have a minimum amount set aside for them.
  throw0101d
  Typo; link should be:
  * https://openzfs.github.io/openzfs-docs/man/master/7/zfsprops...
  Addendum: there's also the built-in compression functionality:
  > When set to on (the default), indicates that the current default compression algorithm should be used. The default balances compression and decompression speed, with compression ratio and is expected to work well on a wide variety of workloads. Unlike all other settings for this property, on does not select a fixed compression type. As new compression algorithms are added to ZFS and enabled on a pool, the default compression algorithm may change. The current default compression algorithm is either lzjb or, if the lz4_compress feature is enabled, lz4.
  * https://openzfs.github.io/openzfs-docs/man/master/7/zfsprops...
  dizhn
  Also, if you run VMs on a disk backed by ZFS, it's trivial to extend those disks, provided you actually do have space on the real disk. (It's even automatic with LXC.)
- dijit
  I always called it a “bit-mass”. Like a thermal mass used in freezers in places where the power is not very stable.
  I knew I didn’t invent the concept, as there are so many systems that cannot recover if the disk is totally full (a write may be required in many systems in order to execute an instruction to remove things gracefully).
  The latest thing I found with this issue is Unreal Engine's Horde build system. It's so tightly coupled with caches, object files and database references that a manual clean-up is extremely difficult and likely to create an unstable system. But you can configure it to keep fewer build artefacts around, and then it will clear itself out gracefully; it just needs to be able to write to the disk to do it.
  Now that I think about it, I don’t do this for inodes, but you can run out of those too and end up in a weird “out of disk” situation despite having lots of usable capacity left.
- fifilura
  I did this too, but I also zipped the file. Turns out it had a great packing ratio!
  saagarjha
  Personally I just keep the file on a ramdisk so you can avoid having to fetch it from slow storage
  3form
  Neat! I optimized for my own case, and I'm storing my ramdisk on SSD to gain persistence.
- happycrappy
  Interesting strategy, can't believe I've never heard of this one before.
  Would it be more pragmatic to allocate a swap file instead? Something that provides a theoretical benefit in the short term vs a static reservation.
  prmoustache
  Because while adding a swap file is instantaneous, removing one that is in use can take a long time unless you reboot the OS, so you can't just nuke it quickly.
- klaushardt
  This is my snippet I've used a lot. When in doubt, if even rm won't work, just reboot.
  Disc Space Insurance File
  fallocate -l 8G /tmp/DELETE_IF_OUT_OF_SPACE.img
  https://gist.github.com/klaushardt/9a5f6b0b078d28a23fd968f75...
- ninalanyon
  This is why I never empty the Rubbish Bin/trash Can on my Linux laptop until the disk fills.
- HoldOnAMinute
  Sounds like something straight out of Dilbert
- Chaosvex
  Similar to the old game development trick of hiding some memory away and then freeing it up near the end of development when the budget starts getting tight.
- dj0k3r
  I did this recently, a.k.a. docker image prune. Can confirm, saved the day.
- omarqureshi
  Surely a 50% warning alarm on disk usage covers this without manual intervention?
  theshrike79
  Depends. A Kubernetes container might have only a few megabytes of disk space, because it shouldn't need it.
  Except that one time when .NET decides that the incoming POST is over some magic limit and it doesn't do the processing in-memory like before, but instead has to write it to disk, crashing the whole pod. Fun times.
  Also my Unraid NAS has two drives in "WARNING! 98% USED" alert state. One has 200GB of free space, the other 330GB. Percentages in integers don't work when the starting number is too big :)
  jcims
  If the alarms are reliably configured, confirmed to be working, low noise enough to be actioned, etc etc.
  And of course there's nothing to say that both of these things can't be done simultaneously.
  evil-olive
  > Surely a 50% warning alarm on disk usage covers this without manual intervention?
  surely you don't need a fire extinguisher in your kitchen, if you have a smoke detector?
  a "warning alarm" is a terrible concept, in general. it's a perfect way to lead to alert fatigue.
  over time, you're likely to have someone silence the alarm because there's some host sitting at 57% disk usage for totally normal reasons and they're tired of getting spammed about it.
  even well-tuned alert rules (ones that predict growth over time rather than only looking at the current value) tend to be targeted towards catching relatively "slow" leaks of disk usage.
  there is always the possibility for a "fast" disk space consumer to fill up the disk more quickly than your alerting system can bring it to your attention and you can fix it. at the extreme end, for example, a standard EBS volume has a throughput of 125 MB/sec. something that saturates that limit will fill up 10 GB of free space in 80 seconds.
  coredog64
  You don't want an alarm on a usage threshold, you want a linear regression that predicts when utilization will cross a threshold. Then you set your alarms for "How long does it take me to remediate this condition?"
  dspillett
  If the alarm works. And if it's actioned, not just snoozed too often or dismissed entirely.
  Defence in depth is a good idea: proper alarms, and a secondary measure in case they don't have the intended effect.
  pixl97
  Alarms are great, but when something goes wrong SSDs can fill up amazingly fast!
  n4r9
  Surely there are pitfalls either way. A ballast file can be deleted too readily, or someone could forget to re-add it.
- jaapz
  Love the simplicity and pragmatism of this solution
- jasonpeacock
  > A neat trick I was told is to always have sleep statements in your code. Just a few sleep statements that you can delete in cases like this. This won't fix the problem, but will buy you time and free up latency for stuff like slow algorithms so you can get faster code.
  FTFY ;)
- bombcar
  Some filesystems can be unable to delete a file if full. Something to be a bit worried about.
  6031769
  Please name and shame those filesystems so that we will all be forewarned.
  SAI_Peregrinus
  Any Copy-on-Write filesystem can run into this. There's always some way around it, but it can be problematic if you only have one device, can't remember the steps to fix a full filesystem, and can't look up the steps because you can't launch a browser without it trying to make some files!
- testplzignore
  Would another way be to drop the reserved space (typically 1% to 5% on an ext file system)?
  bombcar
  Reserved space doesn't protect you against root, who is often the user to blame for the last used MB.
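A minimal Python sketch of the ballast-file idea from this thread, following layer8's suggestion to use random bytes rather than zeros and writing in 1 MiB chunks to avoid the bs=1 overhead discussed above. It assumes Linux, where st_blocks counts 512-byte units; the function names are illustrative, not from any tool mentioned here. It is slower than the openssl trick but needs only the standard library.

```python
import os

CHUNK = 1 << 20  # 1 MiB writes: few syscalls, small memory footprint


def create_ballast(path: str, size_bytes: int) -> None:
    """Fill `path` with `size_bytes` of random data and flush it to disk.

    Random bytes (not zeros) keep compressing filesystems from shrinking
    the file's real allocation.
    """
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(os.urandom(n))
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the blocks are really allocated


def allocated_bytes(path: str) -> int:
    """Space actually allocated on disk (st_blocks is in 512-byte units on Linux)."""
    return os.stat(path).st_blocks * 512


def is_sparse(path: str) -> bool:
    """True if fewer bytes are allocated than the file's apparent size."""
    return allocated_bytes(path) < os.stat(path).st_size
```

For example, create_ballast("/var/ballast.bin", 2 * 1024**3) would set aside 2 GiB (the path is hypothetical); rm on that file is the emergency lever. The file has to live on the filesystem you expect to fill: a ballast file in /tmp only helps if /tmp is on that same filesystem rather than on tmpfs.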
dirkt
If you run nginx anyway, why not serve static files from nginx? No need for temporary files, no extra disk space.
The authorization can probably be done somehow in nginx as well.
- aftbit
  Yeah it's a bit odd to use a Haskell server to serve a static file which nginx then needs to buffer. You'd do much much better just serving the file out of nginx. You could authenticate requests using the very simple auth_request module:
  https://nginx.org/en/docs/http/ngx_http_auth_request_module....
- kccqzy
  Even if your authorization is so sophisticated that nginx cannot do it, a common pattern I’ve seen is to support a special HTTP response header that tells the reverse proxy to serve the file directly from disk after your custom authorization code completes. This trick dates back to at least 2010. From a quick search, the nginx version of this is called X-Accel-Redirect.
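A minimal sketch of that header pattern as a plain Python WSGI app. The original setup uses a Haskell server; this only illustrates the exchange. The app does its own authorization and, on success, returns an empty body plus an X-Accel-Redirect header, so nginx streams the file itself. The token set, the /download/ route and the /protected/ internal location are all hypothetical.

```python
from typing import Callable, Iterable

# Hypothetical credentials standing in for "your custom authorization code".
VALID_TOKENS = {"secret-token"}

# Hypothetical internal nginx location, assumed to be configured as:
#   location /protected/ {
#       internal;            # only reachable through X-Accel-Redirect
#       alias /srv/files/;   # directory holding the large downloads
#   }
INTERNAL_PREFIX = "/protected/"


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    path = environ.get("PATH_INFO", "")
    if not path.startswith("/download/"):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]

    name = path[len("/download/"):]
    # Refuse anything that could escape the internal directory.
    if not name or "/" in name or name.startswith("."):
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"bad file name"]

    scheme, _, token = environ.get("HTTP_AUTHORIZATION", "").partition(" ")
    if scheme != "Bearer" or token not in VALID_TOKENS:
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"forbidden"]

    # No body: nginx sees the header and serves the file from disk itself,
    # so the app never streams the large download through its own process.
    start_response("200 OK", [("X-Accel-Redirect", INTERNAL_PREFIX + name)])
    return [b""]
```

Because nginx reads the file directly, it no longer has to buffer a large upstream response into a temporary file.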
entropie
> I rushed to run du -sh on everything I could, as that’s as good as I could manage.
I recently came across gdu [1] and have installed and used it on every machine since then.
[1]: https://github.com/dundee/gdu
- dizhn
  gdu is really nice, but ncdu, though slower, is very useful and is usually available in distro repos.
- NitpickLawyer
  I use dust for this, but gdu looks nice, I'll give it a try. Thanks for sharing.
- Neil44
  I also discovered gdu recently. It's really good. It saves me running du -h --max-depth=1 | sort -h a million times trying to find where the space has gone while I'm stressing about production being down.
- illusive4080
  Have you used ncdu? I wonder how this compares.
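On a box where gdu, ncdu or dust aren't installed, a rough standard-library stand-in for du -h --max-depth=1 | sort -h can be sketched in Python. Like du, it sums allocated blocks rather than apparent sizes and counts hard-linked files once; like du without -x, it will descend into other mounted filesystems. The function names are illustrative.

```python
import os
import stat
from typing import List, Set, Tuple


def tree_bytes(path: str, seen: Set[Tuple[int, int]]) -> int:
    """Allocated bytes under `path`, walked iteratively, without following symlinks."""
    total = 0
    stack = [path]
    while stack:
        p = stack.pop()
        try:
            st = os.lstat(p)
        except OSError:
            continue  # vanished or permission denied
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # hard link already counted, as du does
        seen.add(key)
        total += st.st_blocks * 512  # allocated size, not apparent size
        if stat.S_ISDIR(st.st_mode):
            try:
                with os.scandir(p) as it:
                    stack.extend(entry.path for entry in it)
            except OSError:
                pass  # unreadable directory: count what we can
    return total


def usage_by_child(root: str) -> List[Tuple[int, str]]:
    """(bytes, path) for each direct child of `root`, largest last like sort -h."""
    seen: Set[Tuple[int, int]] = set()
    with os.scandir(root) as it:
        children = [entry.path for entry in it]
    return sorted((tree_bytes(child, seen), child) for child in children)


def human(n: float) -> str:
    """Format a byte count roughly the way du -h does."""
    for unit in ("B", "K", "M", "G", "T"):
        if n < 1024:
            return f"{n:.1f}{unit}"
        n /= 1024
    return f"{n:.1f}P"
```

Usage: for size, path in usage_by_child("/var"): print(human(size), path). The biggest offender prints last, so it stays visible at the bottom of the terminal.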
gmuslera
Putting limits on folders where information may be added (with partitions or project quotas) is a proactive way to keep something that misbehaves from filling the whole disk. Filling that partition or quota may still cause some problems, depending on the applications writing there, but the impact may be smaller and easier to fix than running out of space for everything.
SoftTalker
I've run into that "process still has deleted files open" situation a few times. df shows disk full, but du can't account for all of it, that's your clue to run lsof and look for "deleted" files that are open.
Even more confusing can be cases where a file is opened, deleted or renamed without being closed, and then a different file is created under the original path. To quote the man page, "lsof reports only the path by which the file was opened, not its possibly different final path."
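The lsof check can also be sketched directly against Linux's /proc, where each open descriptor is a symlink whose target gains a " (deleted)" suffix once the file has been unlinked. This is Linux-only, the function name is illustrative, and seeing other users' processes requires root.

```python
import os
from typing import Iterator, Tuple


def deleted_open_files(proc: str = "/proc") -> Iterator[Tuple[int, int, str, int]]:
    """Yield (pid, fd, target, size_bytes) for descriptors pointing at unlinked files.

    Linux-only: relies on /proc/<pid>/fd/<n> symlinks whose target ends in
    " (deleted)" after the file has been removed.
    """
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        fd_dir = os.path.join(proc, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
                if not target.endswith(" (deleted)"):
                    continue
                # stat() follows the /proc link to the still-open inode,
                # so this is space that df counts but du cannot see.
                size = os.stat(link).st_size
            except OSError:
                continue  # descriptor closed while we were looking
            yield int(pid), int(fd), target, size
```

Finding the descriptor is only half the job: the space comes back once the owning process closes it, e.g. after a restart or a log reopen.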
bdcravens
I appreciate the last line
> Note: this was written fully by me, human.
ilaksh
I'm not sure that his problems are really over if a LOT of people were downloading a 2GB file. It would depend on the plan, especially if his server is in the US.
But maybe the European Hetzner servers still have really big limits even for small ones.
But still, if people keep downloading, that could add up.
huijzer
> Plausible Analytics, with a 8.5GB (clickhouse) database
And this is why I tried Plausible once and never looked back.
To get basic but effective analytics, use GoAccess and point it at the Caddy or Nginx logs. It’s written in C and thus barely uses memory. With a few hundred visits per day, the logs are currently 10 MB per day. Caddy will automatically roll (rotate) the log file once it goes above 100 MB.
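Caddy does its log rolling itself. For an application that writes its own logs, a similar size cap can be sketched with Python's standard logging.handlers.RotatingFileHandler; the function name, the 100 MB default and the three backups are illustrative choices, not Caddy's settings.

```python
import logging
import logging.handlers


def make_capped_logger(path: str, max_bytes: int = 100 * 1024 * 1024,
                       backups: int = 3) -> logging.Logger:
    """Logger whose files total at most about max_bytes * (backups + 1)."""
    logger = logging.getLogger(f"capped:{path}")
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        # Rolls path -> path.1 -> path.2 ... before a write would exceed max_bytes,
        # discarding anything older than `backups` files.
        handler = logging.handlers.RotatingFileHandler(
            path, maxBytes=max_bytes, backupCount=backups)
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False  # keep records out of the root logger
    return logger
```

With a cap like this, disk use for logs is bounded no matter how much traffic arrives, at the cost of losing the oldest entries.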
nottorp
Didn't root used to have some reserved space (and a bunch of inodes) on file systems just for occasions like this?
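On ext2/3/4, a percentage of blocks (5% by default, adjustable with tune2fs -m) is reserved for root. The reserve shows up in statvfs as the gap between f_bfree (all free blocks) and f_bavail (blocks available to unprivileged users). A minimal sketch; the function and key names are illustrative. As noted upthread, the reserve does nothing when root itself is the process filling the disk.

```python
import os
from typing import Dict


def space_report(path: str = "/") -> Dict[str, int]:
    """df-style byte counts for the filesystem holding `path`."""
    st = os.statvfs(path)
    free_all = st.f_bfree * st.f_frsize    # free blocks, including root's reserve
    free_user = st.f_bavail * st.f_frsize  # free blocks unprivileged users may use
    return {
        "total": st.f_blocks * st.f_frsize,
        "free_for_root": free_all,
        "free_for_users": free_user,
        "reserved_for_root": free_all - free_user,
    }
```

This is why df can report a filesystem as 100% used while root can still write to it.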
grugdev42
You missed out point five.
5. Implement infrastructure monitoring.
Assuming you're on something like Ubuntu, the monit program is brilliant.
It's open source and self-hosted, configured using plain text files, and can run scripts when thresholds are met.
I personally have it configured to hit a Slack webhook for a monitoring channel. Instant notifications for free!
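A script run on a threshold could implement coredog64's suggestion from earlier in the thread: predict when usage will cross capacity instead of alerting on the current percentage. A minimal least-squares sketch, assuming usage samples are collected elsewhere as (hour, bytes_used) pairs; the function names and the 24-hour remediation window are hypothetical.

```python
from typing import List, Optional, Tuple


def hours_until_full(samples: List[Tuple[float, float]],
                     capacity: float) -> Optional[float]:
    """Fit used = slope * t + intercept by least squares; return hours until `capacity`.

    `samples` are (timestamp_in_hours, bytes_used). Returns None when usage is
    flat or shrinking, since the fitted line then never reaches capacity.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples taken at the same instant
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    t_full = (capacity - intercept) / slope
    latest = max(t for t, _ in samples)
    return max(0.0, t_full - latest)


def should_alert(samples: List[Tuple[float, float]], capacity: float,
                 remediation_hours: float = 24.0) -> bool:
    """Alert when the disk is predicted to fill sooner than we could fix it."""
    eta = hours_until_full(samples, capacity)
    return eta is not None and eta < remediation_hours
```

For samples growing 10 bytes per hour from 100 at hour 0 to 130 at hour 3, with capacity 200, hours_until_full returns 7.0. As evil-olive points out, a fast writer can still outrun any sampling interval, so this catches slow leaks, not sudden floods.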
brunoborges
I remember a story of an Oracle Database customer who had production broken for days until an Oracle support escalation led to identifying the problem as mere "No disk space left".
- AbraKdabra
  Or NTP. If something is not working, df -h and date are the first commands I run.
  It's always lupu... I mean NTP or disk space.
renatovico
Why not implement X-Sendfile?
- nyrikki
  Came here to say this
  X-Accel-Redirect (nginx's equivalent of X-Sendfile), if the Haskell app can emit it, is the way: nginx then serves the file itself with zero-copy sendfile, which will dramatically help in many cases.
  Modifying the response body is one of the cases where it doesn’t work.
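To illustrate what "zero copy" means here: with sendfile(2) the kernel moves file pages straight from the page cache to the socket without copying them through user space, which is what nginx does for static files when sendfile on; is set. A minimal sketch using Python's standard os.sendfile; the function name is illustrative.

```python
import os
import socket


def send_file_zero_copy(sock: socket.socket, path: str, chunk: int = 1 << 20) -> int:
    """Stream a file to a connected socket with sendfile(2); return bytes sent."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            # The kernel copies from the file to the socket directly;
            # the data never lands in a Python buffer.
            sent = os.sendfile(sock.fileno(), f.fileno(), offset,
                               min(chunk, size - offset))
            if sent == 0:
                break  # file shrank underneath us: stop at real EOF
            offset += sent  # sendfile may send less than requested
    return offset
```

This is also why the pattern breaks when the body must be modified: the application never sees the bytes it would need to change.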
jollymonATX
Never partition 100%. It's a simple solution and should be standard for every sysadmin. I've never worked with one that needed to be told this...
RALaBarge
Wait until you run out of inodes!
- lanstin
  Old war story: I had an old Sun 4/260 with two 1G drives, with SunOS on one and Gentoo on the other. My initial Gentoo install worked for a while, but then the portage directory used up all the configured inodes. I got really weird errors and could not figure it out at the time (maybe error messages should mention inodes?). I had to ask on the #gentoo-sun IRC channel, where someone suggested df -i, which was indeed the issue. (The fix: you can configure extN filesystems to have more inodes.)
- justin_oaks
  That happened to me exactly once in my 20-year career. It was on a web server (maybe even NGINX) that had too many cached files.
  Even though it only happened once, I still set up monitoring for inode exhaustion.
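For the inode exhaustion described above, the numbers df -i shows are available through statvfs: f_files is the total inode count and f_ffree the free count. A minimal sketch of a monitoring check; the function names and the 90% threshold are illustrative. Filesystems that allocate inodes dynamically, such as btrfs, report f_files as 0, which the sketch treats as "not applicable".

```python
import os
from typing import Optional


def inode_usage_percent(path: str = "/") -> Optional[float]:
    """Percentage of inodes in use on the filesystem holding `path`, like df -i."""
    st = os.statvfs(path)
    if st.f_files == 0:
        return None  # no fixed inode table (e.g. btrfs): nothing to exhaust
    used = st.f_files - st.f_ffree
    return 100.0 * used / st.f_files


def inodes_nearly_exhausted(path: str = "/", threshold: float = 90.0) -> bool:
    """True when inode usage is at or above `threshold` percent."""
    pct = inode_usage_percent(path)
    return pct is not None and pct >= threshold
```

Running this alongside a byte-based check catches the case where df -h looks healthy but no new files can be created.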