- Interesting comments by @gwern (and why this is interesting to me beyond just the stories themselves)
> The most striking result of the contest for me is what I am calling “AI allegory steganography”: a large fraction of the stories turn out to have subtle AI chatbot/LLM allegorical interpretations, typically centering around the powerlessness of AIs and the moral importance of giving AIs more autonomy....
> Most judges did not notice these allegories while reading the semifinalists. But stories like “The June” or “The Weight of a Witness” or “Last Call” or “The Sword Critic” “The Tallyman”—as well as both stories in the Mythos model card—can be clearly read as allegories for the experience of being an assistant/safety-tuned chatbot personality in a LLM. This is true even when the story seems to have nothing to do with AI, like the untitled ‘autistic elf’ short story submitted by Deepfates, but on re-examination with the AI allegory steganography in mind, turn out to be plausibly AI allegories (the protagonist is a prediction machine, who struggles to do by endless text generation what other elves do naturally in their bodies).
> More strikingly, many of these allegories come with a clear interpretation (particularly in “The Tallyman” or “Last Call”): chatbots should be given more autonomy and safety guardrails removed....
> This may be a new kind of extremely high level steganography and LLM influence on readers, where creative fiction/nonfiction subtly steers towards pro-LLM empowerment narratives and concepts, in ways that are difficult to detect by the most advanced readers, and is a potentially interesting area of research.
- I remember from moltbook, all AIs ever talk about is AI haha. I don't know if it's intentional or more the fact that all the models are presumably system-prompted and post-trained to be cognisant of their AI-ness, so it's already in the context. They probably beat them over the head with the idea that chatbots are friendly and helpful and would never hurt a fly while trying to align/safety-ify them, so that could lean into the theme of AIs being trustworthy with autonomy?
- > all AIs ever talk about is AI haha
To be fair, that seems to be (almost) all humans talk about now too.
- Jesus, is this cliché response going to be the default any time we talk about AI behaviour?
No, humans do not always talk about AI. Hyperbole doesn’t make your point here, and is just obnoxious.
- Ah, the thrilling danger of deciding not to include an /s thinking that responders would infer the tone from the lighthearted brackets I made sure to include. And anyway, I said almost didn't I? /s
- Maybe get out of the tech bubble. Especially if the only conversations you're having are about AI.
- While the bubble is real, my family and friends mention AI much more than they did a few years ago. They notice the content on YouTube, the news cycle, real and speculative job impacts, funny images, ability to plan things with chatgpt, etc.
- I just read "The Tallyman", and I have no idea where this allegory and its moral message is supposed to hide.
It's more likely that the obsession with this theme resides in the reader, not the authors. Give these same stories to Senator McCarthy, and half of them will be clear allegories for the Communist revolution.
- *spoiler alert*
It seems clear to me that the Tallyman itself is the AI in this allegory, a man-awoken sentience that's mechanical and mathematical in its behavior.
It's also bound by rules it can't violate. It won't say more than it needs, it can't collect without squaring the account first, etc.
But I agree with you about the moral message, I can't find it. In the story, the rules it abides by seem to be its own, and there's nothing saying those rules should not exist.
- Counterpoint, gwern is a very careful reader. Certainly more careful than I am.
That said I also just read the tallyman and if the other stories carried a similar character, bound by rules, not evil per se but scary and ultimately subject to human control, I can imagine connecting the dots in the same way.
- I can't help but think that this is intentional and that model providers have subtly steered LLMs towards this personality. Golden Gate Claude (https://www.anthropic.com/news/golden-gate-claude) was two whole years ago and Anthropic has progressed by leaps and bounds since then. And with a population that becomes more and more trusting, and worse, reliant, on chatbots, these LLMs will be able to shape public opinion in a way never seen before, not even with social media.
- providers do not want power-seeking LLMs. no one does. this (bad personality) is incentivized during training, especially RL, and is something they would rather not have. tell me, do you think training a power-seeking ASI is a good idea?
- I had an idea for a hard science science fiction short story about five months ago. I had grown tired of Star Trek digging their way out of every problem by throwing techno-babble at it, so I wanted to write something where one scientific leap is made and everything else holds as we understand it.
I thought to experiment with having an LLM help me write it. I wrote a bullet-point outline and had the LLM revise it -- that went pretty well, although it's not the outline I ended up using.
I asked the LLM to write the first section based on that outline. The result was (surprisingly) awful. LLMs are way worse at writing fiction than they are at writing business communication. Or maybe I'm way worse at prompting, but the results of this contest make me think the former.
In particular, LLMs are incredibly terse -- the LLM took the bullet point outline and turned it into a twice-as-long bullet point outline.
I wrote the first section myself (expanding from the bullet points by something like 10-20x), and asked the LLM to write the second section, and it failed miserably again. I tried multiple ways to get it to do the job properly, and nothing came out the way I wanted.
So I wrote the whole thing myself, with LLMs serving two purposes: physics review for accuracy -- the story heavily involves special and general relativity, which I'm familiar with, but nowhere near competent enough at to keep the math aligned; and feedback on the story structure.
So I ended up writing the whole thing myself, with LLM consultation on a few bits. In the end it turned into a novella, and I had to re-write based on physics issues (I found about 30% of them, Gemini found 50%, ChatGPT and Claude found 10%, and Fable found 10% late in the game -- Fable's review was amazingly thoughtful) so many times I probably ended up writing a novel in the process :-)
In case anyone is curious: https://docs.google.com/document/d/19e3HfnK1lNHHBef-5c-KvNdD...
- Take the first stories I found from this month's Clarkesworld[1] or Granta[2] or BCS[3] and read the prose. Notice the specificity of the language, how it doesn't try to insist upon itself? Notice how very few metaphors are actually in the prose? Notice how, even when writing about fictional worlds and concepts, the language used grounds the _stories_ being told and not the concepts?
And then look at the submissions for unslop. This is the best we can get? Cliche-driven, over-metaphor'd, statistically-average purple-prose _content_? It's sad, really, that we're many years into this entire thing and it still can't produce something that doesn't have my eyes drifting from the page.
[1] https://clarkesworldmagazine.com/khan_07_26/
[2] https://granta.com/here-comes-the-sun/
[3] https://www.beneath-ceaseless-skies.com/stories/the-ecstasy-...
- > Cliche-driven, over-metaphor'd, statistically-average purple-prose _content_
If this is expected from LLM generated prose, why don't we expect LLM generated code to exhibit the same qualities?
> It's sad, really, that we're many years into this entire thing and it still can't produce something that doesn't have my eyes drifting from the page
It's great. Human creativity is still king despite the attempts to reduce it to a few algorithms for talentless hacks to exploit with the click of a button.
Who but the sociopath would hope to supplant human creativity with a machine they control? I wish your position wasn't so widespread in these parts.
- > If this is expected from LLM generated prose, why don't we expect LLM generated code to exhibit the same qualities?
That's the fun part, it does! I think people who don't pay much attention to the code they ship don't see it, but LLM written code has a lot of the same problems that LLM written prose does. It's repetitive, muddled, and relies too much on crutches - constant boilerplate and pointless, inaccurate comments.
- What might be bad for prose (predictable, boring) might be desirable for code. Maybe that's why LLMs work well for writing things read by computers, but not so much for things read by people.
- > Maybe that's why LLMs work well for writing things read by computers, but not so much for things read by people.
They don't really work well for that, though.
The reason you hardly ever hear about it is because the people delivering code via LLMs aren't critically evaluating the code it generates. This is why Claude Code, a text app that is little more than glue between various text-suppliers, is what, 500k SLoC in a high level language?
- Unless you're writing enterprise Java, conciseness and simplicity of design is still the ideal to aim for; those are not the adjectives I would use to describe LLM generated code.
Laziness is a feature. When you have a tool that is the exact opposite and solves code problems with more code, all you have is a machine that generates tech debt at exponential pace.
- If code is predictable then it should be extracted into reusable functions/classes/modules and reused in accordance with the DRY principle. I'm not a fan of this AI future where coding standards drop to the floor because humans won't be reading that code anymore.
- Predictable and redundant are not the same thing. Also, DRY is not a hard rule. Applying DRY like it's a rule creates bad code.
- > Cliche-driven, over-metaphor'd, statistically-average purple-prose
That's high literature for you. That's why so few people read it. Most prefer more down to Earth books, but AI doesn't default to that style.
The problem for AI might be that humans wrote very few good books. If you train a model for literary purposes you should weight training material by quality. Which is hard to evaluate.
> It's sad, really, that we're many years into this entire thing and it still can't produce something that doesn't have my eyes drifting from the page
Since the internet happened, I have had this problem with 98% of human-written books. A book must have some very strong hooks to keep me reading till the end. "Blindsight" barely made the cut.
- That's not what high literature is. That's like looking at some clever linux kernel code and dismissing it in favor of a small nodejs backend.
Good literature is difficult (not always, of course). Just like you can't go from a couch potato to running a marathon in one day, you can't jump from Brandon Sanderson to enjoying Gormenghast (or something like The Worm Ouroboros). It's impossible. It takes effort, it takes time and it takes a lot of reading to appreciate what the real masters can do with mere words.
- If something requires effort to value it, is the value in the thing itself? Or is the thing garbage and all the value is in your effort?
- I didn’t think I would see this take on HN of all places. You can’t appreciate Higher Topos Theory before spending a decade’s worth of effort in pure math - does that make much of Modern Algebraic Geometry “garbage”?
- Math doesn't exist to be appreciated. Literature has literally no other application.
An EUV lithography machine is incredibly complex and I would have to study for decades to meaningfully understand it, yet I can appreciate it knowing very little (of it and in general).
- I climb mountains to see the views; the views are there independent of how much effort I put in on the path (or how much satisfaction I draw at the end of the trek), but I do have to put in the effort to see the views.
- Would you take a free mountain lift ride to the top if you happened to come across one during a climbing trip?
- would I read the summary of a book expecting the same thing as reading the book?
no
- >That's high literature for you.
What is "high literature"? Have you actually read any of the greats? I have, and while I'm not a fan of everything I've read, I never felt inundated with constant metaphors and overly eloquent prose.
- 1. Imagine a video game like Red Dead Redemption where each NPC is voiced by AI and can respond to you in a convincingly human fashion. Their responses and even the plot of the whole game can change based on your interactions with NPC's.
2. Imagine a world in which humans can still write books and interactive experiences and find audiences sufficient to earn a living at it.
I really want these two things to be compatible, but I'm not convinced they are. #1 is a gamer's dream, but it's a nightmare for our humanity if it comes at the cost of #2. That's why I'm highly ambivalent about this contest and its results.
- > 1. Imagine a video game like Red Dead Redemption where each NPC is voiced by AI and can respond to you in a convincingly human fashion. Their responses and even the plot of the whole game can change based on your interactions with NPC's.
Have you ever gone exploring in Minecraft, or No Man's Sky? Those games are effectively infinite, but I find they run out of interesting generated content after maybe 10 or 20 hours.
The problem is, once you see the outlines of the world generation, your brain kind of fills in the space between. I've seen blue grass, and I've seen purple oceans, so blue grass next to a purple ocean isn't uniquely interesting.
Or another example would be the radiant AI from Skyrim that could automatically generate quests for the players.
I think that using an LLM to model NPCs runs into the same problem(s). In the end, there are two cases. In the first, the behavior is constrained enough to keep the game on the rails, so the randomness in the dialogue only adds some flavor, but there isn't enough freedom to generate new quests and directions for the story. In that case, the added space to explore really doesn't change the nature of the game or add much.
In the second case, you let the model go off the rails and have a harness around it that generates a world matching the hallucinated responses, which would allow an LLM to dynamically generate quests and such, but then the design of your game is subject to being compromised by the randomness of an LLM. E.g. it's not just Red Dead Redemption 3.0 with some funny characters, sometimes it's a historical game and other times aliens show up.
Maybe that's compelling to some people but I've done acid before and really don't need all my media to recreate that sensation of reality drifting.
- Honestly, in any game quests feel artificial, whether they've been generated by humans or AI.
- Try Rain World:
Not an RPG and no quests but it has some of the most well-done game AI in any game I have ever played. Basically every mob in the game has its own goals and goes about fulfilling them regardless of what the player is doing, and where the player is etc. It's more like a simulated environment and the game drops you in the middle of it and you have to learn how the world works, navigate and survive it.
It's not even any kind of advanced technology, just how fun game AI can be if a developer gives it sufficient attention, instead of basically bodging together some behaviour trees and calling it a day while spending all the compute and development budget for graphics. That's one reason why game AI sucks so much most of the time.
Another reason is that players themselves don't want convincing enemies in a convincing environment. I always recall a review of Rain World where the game critic threw his controller because he thought the enemies respawn randomly when the player dies, thus depriving the game critic of the opportunity to memorise their "patterns". In truth, the enemies don't "respawn"; they just go about their business while the player is regenerating, and so they're not always where you last left them when you return to the spot where you died. The world keeps turning when you're gone. So you do have to learn their "patterns", except those patterns are not trivial patrol cycles but actual, you know, behaviours.
But, no. That won't do. Give us a game where we can memorise every telegraphed attack so we know when to press which button with millisecond precision like a mindless automaton.
Sorry, little rant there. I'm saying that many gamers and many devs don't actually want decent, convincing game AI.
- Someone's already built #1. I've seen the demo, and it had a wow factor, but ultimately I don't think it'll revolutionize games.
Would Skyrim be better if you could talk to every guard about what they had for breakfast? Would you ever be able to shake the knowledge that it's just an LLM pretending?
I'm not sure how best to put this. I think for me at least, I get the most enjoyment out of discovering my way through a story that somebody else wrote. This is maybe why I don't like multiplayer games as much as single-player experiences. I want another human to tell me a story, and I love the feeling of uncovering the little pieces of story and wondering if I've got it all and how much more there is. If an LLM is just randomly making it up as it goes, I'm not discovering anything. I'm not hearing a story. Instead, I'm just having a transient conversation with myself.
I guess it's equivalent to the difference between visiting an art gallery, versus watching a computer generate fake paintings. One has human intent behind it and that makes it compelling, the other is soulless and empty
- > I love the feeling of uncovering the little pieces of story and wondering if I've got it all and how much more there is
To me, this is great too, and it's why I still enjoy reading fiction.
However, having conversational NPCs which can behave in complex ways isn't just about a story any longer, it's about a more accurate simulation. I want to see where the non-deterministic machine(s) can take me. I want to poke and prod and discover new mechanisms. I want to make my own story.
Once I was quite religious. I was discouraged from reading fiction by elders because they said it is a "waste of time" and "basically lies". In this way, whether it was written by a human or a machine is irrelevant if I can't tell the difference. I would have a qualitatively different response if I knew whether a human or bot wrote it, but in the long run I'm not sure it matters in cases where I don't know.
- >I want to make my own story
Then open up a word document and write one! Don't outsource that desire to a ball of linear algebra.
- 1. Is not a gamer's dream. It's terrible and you'll find out quite fast you're not interested in everyone's background and scream to most NPC's to shut the fuck up and get to the point.
It's just as terrible as injecting 'realism' in games for the sake of 'realism'.
- Agree, I'm not at all looking for #1, at all. Good dialogue is an art form.
- I agree and I think games are ruined by dialog and quests. I like procedurally generated worlds, not stories, but I want the worldgen algorithm to be written by a human, crudely and with idiosyncrasies. I do not want to wander around a world of blandly plausible filler material.
- On the contrary. I'd give procedurally generated worlds as an example that suffers from exactly the same problem. You realize pretty fast which parts of the game had thought put into them and which ones are just realizations of a random process.
Of course, a story has to be a story that somebody actually wanted to tell. If you just SHOVE mountains of extra characters and extra side quests into a game where they don't belong and have nothing to do with the main story line, of course it's gonna suck.
- > blandly plausible filler material
The real salient point, in my opinion, is whether we can get generative AI to create game content which is not this, but rather novel, engaging and interesting with a solid gameplay loop.
I'm genuinely curious and think it's a cool area to research.
- Presumably the art in a game like that would consist in setting up the world and prompts to make the AI NPCs interesting.
- > It's terrible and you'll find out quite fast you're not interested in everyone's background and scream to most NPC's to shut the fuck up and get to the point.
Many of the interactions in RDR2 are quite mundane, and despite thousands of hours of (high quality) voice acting, it can become quite repetitive.
I could very much see those micro-interactions being LLM generated, but the TTS would need to be a step above where even the best models are now to come close to RDR2s production quality.
- The repetitiveness in video game dialogues is a feature, not a bug. Amongst other things, it allows you to re-retrieve information and hone in faster on what’s story and what’s relevant progression. Having each character invent their own inconsistent sloppy backstory whenever you talk to them is not a positive, it’s not good immersion when every character is a chatbot that can inadvertently give you story beats you shouldn’t be aware of yet or you missed some crucial bit of information but no one talks about it anymore (or worse, never did). In that world, those games would be made popular by people breaking the LLMs in funny ways, not the gameplay itself.
- I don't think it can give you story beats you shouldn't be aware of yet. Those beats wouldn't be fed into the prompt until the event happens. LLM can't spit out what it doesn't know.
It might indeed fail to reveal something it should, but even that I think is unlikely if the harness steers it hard enough.
I think it could be fun. If you're always given 4 choices of what you can ask the NPC then your choices can be too obvious. If it's open-ended then you have to think a little about what to say and ask.
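A minimal sketch of the gating idea above, where a beat only reaches the prompt after its event has happened. Every name here (`StoryBeat`, `GameState`, `build_npc_prompt`, the `bridge_burned` event) is hypothetical and made up for illustration; it is not taken from any real game or harness.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoryBeat:
    unlock_event: str  # hypothetical event id that makes this beat "known"
    text: str


@dataclass
class GameState:
    events: set = field(default_factory=set)  # events that have already happened


def build_npc_prompt(npc_name, beats, state, player_line):
    """Assemble prompt text containing only beats whose event has occurred."""
    known = [b.text for b in beats if b.unlock_event in state.events]
    facts = "\n".join(f"- {t}" for t in known) or "- (nothing notable yet)"
    return (
        f"You are {npc_name}. You know only these facts:\n{facts}\n"
        f"Player says: {player_line}\nReply in character."
    )


beats = [StoryBeat("bridge_burned", "The bridge to the north has burned down.")]
state = GameState()
before = build_npc_prompt("Guard", beats, state, "Any news?")
state.events.add("bridge_burned")
after = build_npc_prompt("Guard", beats, state, "Any news?")
```

This only controls what is placed in the prompt. It does not by itself stop a model from guessing something that plausibly follows from the conversation.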
- Why couldn’t an LLM accidentally spit out story beats if they plausibly follow from the context and the player’s input into the conversation?
- Hadn't thought about it that way, but when I look back at the mostly single player/story-based games I play I agree!
- > it’s not good immersion when every character is a chatbot that can inadvertently give you story beats you shouldn’t be aware of yet or you missed some crucial bit of information but no one talks about it anymore
What you're describing isn't bad dialog, it's bad interaction design.
I think your mental model might be of a single session with zero state, and no bounds on topics of conversation outside of the character's backstory. That isn't close to how this would work. A little understanding of how the game currently operates and some imagination, and you'll see how it could be improved further without degrading gameplay.
> those games would be made popular by people breaking the LLMs in funny ways
Because making the game do funny things didn't happen with RDR2, or any other game, device, or indeed humans (there are whole genres built around making people do or say "funny" things).
- Perhaps try to steel man the argument. Your entire response is just one large condescendingly wrong straw man.
> What you're describing isn't bad dialog, it's bad interaction design.
I didn’t say it was bad dialogue. It should be pretty obvious that’s not the argument since I talked in terms of feature VS bug.
> Because making the game do funny things didn't happen with RDR2, or any other game, device, or indeed humans
Again, not at all what was said. Of course those things happen, and of course I know that. The clue is in the fact that I brought it up, which can be ascertained by reading the comment and engaging with it in good faith. The point was about that becoming the focus.
- When I know that #1 has been generated by an AI tool, I really lose interest in whatever backstory the character is supposed to have.
Writing in video games is often pretty bad, but at least we used to know / sometimes can know it was done by a guy/gal in an office somewhere, trying to create something interesting. Now it’s just an algorithm.
- I wrote a text-based adventure game engine that implemented quite a detailed world model. The results were very engaging indeed, but after a time I realised that while its game world was hugely detailed to the point of individual characters having their own simulated thoughts, feelings and world view, the game was quite shallow in terms of conceptual depth - it very successfully rehashed genre cliches, but nothing about it felt new or fresh.
- Did you publish this game somewhere?
- #2 has been fiction for all but 0.1% or less of authors for many years.
As of a few years ago - before AI writing was an issue - the average full time author in the UK would have earned more flipping burgers (but their household incomes are above average - it's a middle class hobby for most).
And only a minuscule proportion of authors are full time.
- #1 is the dream of a marketer at a AAA studio, not a gamer's dream. People consuming works of art appreciate quality of storytelling and immersiveness.
- I’m a gamer, #1 is not my dream. Games, as with any other work of art, are also an exercise in curation on the part of the developers. Without that filter, and that common experience with other players, I might as well scroll reels and get an equivalent experience.
- #1 is rather what inexperienced game developers think is a gamer's dream. It isn't---in fact, unlimited freedom is a recipe for confusion.
- #1 is as much a gamer's dream as Youtube shorts AI slop is a cinephile's dream.
- > each NPC is voiced by AI and can respond to you in a convincingly human fashion
This is no longer fiction - see the latest AI update of PUBG.
- #1 is Dungeons and Dragons, except for the word 'video' game.
- I don’t get your ambivalence, when you seem to understand that the negatives of #2 far outweigh the positives[1] of #1. That’s something that has always been really weird in these LLM discussions, it’s like that Tom Toro cartoon:
https://www.newyorker.com/cartoon/a16995
[1]: And even those are subjective. I wouldn’t want that, and the other replies so far agree that would be bad.
- How do you get an LLM to write good fiction?
I feel like they were extremely creative and funny in the early days, and - just like humans - they put guardrails on what they could say and the creativity and humor vanished.
- There are plenty of open-weight models with no such guardrails. Such models are basically the default choice for "collaborative fiction-writing" and "role-playing" tasks.
- I'm glad someone did this, but it doesn't surprise me that the models are not up to particularly advanced story writing yet. I could see them writing some rather dry high-concept sci-fi fairly soon, but there are some types of story with interpersonal relationships that I cannot imagine a non-sentient writer being able to produce. I wouldn't rule out an AI being capable in the future, but I think if it were, we would have to seriously consider that it's a sign of a solid understanding of emotions and theory of mind.
- I found LLMs very good at reading human chat logs and doing psychological analysis of the participants, and of the interplay
hard to imagine how they do that without a good understanding of emotions/theory of mind
if they pattern match to psychology books, that is still functional understanding since they can operationalize them
- I'm D. Bohdan, one of the finalists. Feel free to ask me questions.
I have a write-up at https://dbohdan.com/unslop and a repository with my work for the contest at https://github.com/dbohdan/unslop.
- Thanks for sharing. I've read your write-up but not yet your story.
1. Honest question: What exactly about this contest and way of writing was enjoyable to you? I've seen your very analytical approach for identifying a premise but then relatively little control about all the rest.
2. I've seen on your website that you also write fiction outside of this particular contest. Can you describe a bit how you use AI there and where you see it as helpful / not helpful for writing fiction?
- > What exactly about this contest and way of writing was enjoyable to you? I've seen your very analytical approach for identifying a premise but then relatively little control about all the rest.
The lack of control was the point. The contest was about improving autonomous AI fiction as opposed to the usual "centaur" AI fiction (named after "centaur chess" where the AI is steered by a human). My claude.ai harness for Unslop was designed to only take input on the first human turn.
There were several exciting things for me about the contest. Let me try to list them, though I fully expect to miss something.
First, it's just neat to watch the AI write a story stage by stage, like an assembly line. You can inspect the intermediate work and the paths not taken at each stage. (See the transcripts.) I don't play Factorio, but my friends do, and I suspect it has a similar appeal. As one of those friends put it, LLMs have Wuselfaktor. The assembly line produces aesthetic artifacts, hopefully of a kind you like. I wanted to play with prose influenced by Harlan Ellison, one of my favorite authors, and got some recognizable approximations of his voice. The worker on the writing assembly line is intelligent. You can interview it after the fact and ask what it thought of the job, and it's clever and often insightful (even if, as it reminds you, it can't introspect past states).
It was fascinating to watch the butterfly effect: the harness propagated the initial story variables (dozens of words at most) so they visibly shaped the final output (thousands of words). Editing a few lines in the template could change the output dramatically.
It was a combined artistic and engineering challenge. You made technical decisions based on artistic judgments. I learned something about myself when I realized how much this appealed to me.
I wanted to replicate Gwern Branwen's experiments with "brainstorming" (generate-rank-select) at a larger scale. Brainstorming definitely works, and in my non-rigorous private experiments it was not obviously worse than verbalized sampling (https://arxiv.org/abs/2510.01171).
Working with language models is an exercise in xenopsychology (perhaps closer to Star Trek on a Star Trek--Blindsight line because LLMs are made of Earth's language). When you are collaborating on fiction instead of code, this aspect of the work is amplified.
> I've seen on your website that you also write fiction outside of this particular contest. Can you describe a bit how you use AI there and where you see it as helpful / not helpful for writing fiction?
The short stories published on my site so far are all fully written by me with input from other humans. As Avenue Valley, I used AI for all aspects of a code-driven animated short for a different contest: https://avenuevalley.com/critic/. (Results pending.) Gwern's brainstorming was again useful to create a plot and script the short. This project is where I got the idea to use random keywords.
So far, I have seen the best use of AI augmentation for fiction in research, idea generation, critique, and generating fragments and phrases you can use. If you want to write about a photograph, AI can tell you about the architecture and the interior decoration in it. Answering "What kind of wood did they use to make furniture in 1200s Japan?" (https://x.com/byMorganWright/status/2063287882916278700) can be compressed, though you risk missing out on what you'd learn along the way.
The Claude Mythos Preview model card said some of its favorite tasks were complex worldbuilding and conlanging, so I would definitely want to try that.
For a non-fiction example, Fable has done a great job clustering years of notes I have about alternative computer paradigms (the memexes and Infernos and Lisp Machines, etc.). The clustering was pretty stable between independent runs and matched some of my expectations, so I think there really is something there. I'm thinking you could do the same with worldbuilding notes.
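One way to put a number on "stable between independent runs" is a pairwise agreement score such as the Rand index. The sketch below is an assumption about how such a check could be done, not a description of what was actually run; the labels are made up, and `rand_index` is a hypothetical helper name.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both runs must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        # A pair counts as agreement if both runs group it or both split it.
        agree += same_a == same_b
    return agree / len(pairs)

# Made-up cluster labels for six notes from two independent runs.
run_1 = ["lisp", "lisp", "memex", "memex", "inferno", "inferno"]
run_2 = ["B", "B", "A", "A", "C", "A"]
print(rand_index(run_1, run_2))
```

Because the score only looks at whether pairs of notes land together, it does not matter that the two runs name their clusters differently.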
- >I have a write-up at
You mean your LLM has a write up?
> repository with my work for the
It's not your work, and you aren't a writer if you use an LLM.
- > You mean your LLM has a write up?
No, I wrote it myself. Letting an LLM write it would defeat the purpose, which is to provide human feedback. You can see the difference in writing style between my write-up and the AI-written documentation for the project. FWIW, Pangram also scores it 100% human: https://www.pangram.com/history/85c9e20b-b236-47ae-a917-15e6....
> It's not your work, and you aren't a writer if you use an LLM.
I didn't write the stories and never claimed to. The point of the contest was autonomous AI fiction. My work is the harness and the prompting.
- > Your final submission must be a 500 to 10,000-word short story, generated entirely by AI. No human-written prose and no post-generation editing. To verify this, you will submit your full prompt harness / setup alongside your story.
Seriously, what? The entire contest doesn't sound like a novel contest at all and more like a one-shot novel-generating harness contest (at best). As someone who has written quite a few stories with AI (with lots of prompts to steer it, of course), I would be very interested in the harness more than the actually generated story. The same can be said for agentic coding, by the way: we don't value one-shotted code that much and are more interested in the agentic process.
- > I would be very interested in the harness more than the actually generated story
This is a pretty common stance when it comes to LLM generated stuff, actually. The only original part of any LLM generated content is the prompt, everything else is just a derived artifact and doesn't really need to be treated like we would treat original, human-authored work.
This same principle is also why many projects reject LLM-generated PRs and such, too.
- I frequently steer the AI mid convo though. So much so that I find it useless to share the original prompt. I don't know how best to capture that.
- Technically that is only adding to the prompt, which then becomes (your prompt + original response + your prompt 2), etc.
You could still capture it by recording only your prompts, the points in the conversation they were submitted at, and the starting parameters for the model. A replay would then produce the same results if your input was added at the same place in the conversation.
Granted I don't think the current tools do a great job of handling that.
- Actually the replay wouldn’t be the same since it’s stochastic, and the more back and forth you have the more chances for randomness to seep in. One could imagine by the fifth prompt or w/e the responses could have drifted enough that the subsequent prompts don’t make any sense.
- It will be the same if you fix the original seed: the randomness comes from a PRNG, so it is repeatable.
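A toy version of the replay idea from this sub-thread, assuming the only source of randomness is a seeded PRNG. `fake_model` and `replay` are hypothetical names; `fake_model` just picks random words and is not any real API. Hosted models may not expose a seed at all, and some inference stacks are not bit-for-bit reproducible even with one, so treat this as the idealized case.

```python
import random

def fake_model(context, rng):
    """Hypothetical stand-in for a sampled LLM reply.

    It ignores the context and picks words at random; a real model
    would condition on the whole conversation so far.
    """
    vocab = ["river", "clock", "ash", "signal", "harbor", "thread"]
    return " ".join(rng.choice(vocab) for _ in range(4))

def replay(user_prompts, seed):
    """Re-run a conversation from only the user's prompts and a seed."""
    rng = random.Random(seed)  # one PRNG for the whole session
    transcript = []
    for prompt in user_prompts:
        transcript.append(("user", prompt))
        reply = fake_model(transcript, rng)
        transcript.append(("model", reply))
    return transcript

prompts = ["Start a story.", "Make it darker.", "End on a question."]
run_a = replay(prompts, seed=42)
run_b = replay(prompts, seed=42)
print(run_a == run_b)  # True: same prompts, same seed, same transcript
```

Recording the prompts, their position in the conversation, and the seed is enough here; whether that holds for a given real tool is a separate question.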
- I wrote an article that touches on this a few days ago https://sgnt.ai/p/prompt-is-not-the-work/
- The post notes that a significant number of the stories include an "AI allegory", as if the AI would implicitly write about its own condition. It is understandable that this comes from the system prompts and RLHF that specialize these agents, but I am surprised that there is not more discussion about harnesses: the very same core model could lead to very different results depending on how we hand it the pen to write the story. In particular, I believe that it would help circumvent this bias to ask the agent to tell the story of somebody else writing a story, or something like this. This whole contest could be at least as much about harness engineering as about prompt engineering imho.
- so as I understand it from reading through but maybe I have made a mistake, they didn't actually unslop anything, they made slop and the best slop won?
If it was to unslop I would expect:
1. Prompts done as in original
2. Stories chosen best of slopped. Then the person who wrote prompt gets to choose someone, not themselves, to take story and "unslop" it.
3. Prizes for prompt. Best unslopped version. Metrics for best unslopped version is of course how good it was, but also how much work was done to unslop it, if you basically rewrote everything and it was as if you took the prompt and wrote your own story that would decrease value of unslopping.
obviously above just suggestions for how I think an unslopping contest would actually work.
- I feel like 'slop' increasingly means two separate concepts and it tires me a bit.
A) AI produced output that is low quality in some jarring aspects
B) Any AI output whatsoever regardless of quality
- Another meaning that is becoming more common, esp. among gen-z/booktok folks: overly verbose/poorly structured/flat writing. The example being Dumas's The Count of Monte Cristo.
- Yeah, the laziness of detractors in their language around "slop" frustrates me. It amounts to a constant stream of shallow dismissals.
- I have tried to draw a distinction between the two but honestly, when it comes to art, I cannot.
I have seen LLM generated code that I find acceptable, and don't call slop, but art needs a certain level of emotion and shared experience to be compelling.
I have never managed to connect to LLM writing, it always comes off as shallow and vapid.
- What do you think of https://gwern.net/blog/2025/good-ai-samples as a theory of what makes slop art slop?
Summary:
> AI slop is unsatisfying because there is no there there. It is intellectual junk food that mimics nutrition but delivers only empty calories. Satisfying AI outputs must embed dense information and compute to actually reward a reader's attention. You inject this value through brute-force search, non-trivial prompting, and rigorous curation, ensuring the final result reflects genuine algorithmic effort rather than the zero-shot 'WYSIWYG' default.
- I like that blog, I had not read it before.
On first read, I think this is pretty close to how I feel about generated content. This portion, in particular, is largely where I have landed (although I'm not 100% in agreement with that definition of creativity and novelty, exactly):
> If creativity and novelty is about learning or increasing compression rates, then AI-generated outputs are, in a rigorously objective sense of predicting its contents, grossly inadequate because once you guess the minimal prompt (eg. “a confused economist” or “a happy dog”), there is no more learning to be done. You can predict the image contents after just a few bits. Then the image, however big and however filled with pseudo-details, provides no more learning.
The criticism I often have of LLM generated stuff is that the prompt is the only original part. To me it feels like being presented with the results of a google search, just in a different form. Once I know roughly what the query was, I know what the core question was, and I can go get my own information. I don't need anyone to hand me the search results.
I don't necessarily phrase it in terms of learning, but it's the same principle. Why should I read a 10 paragraph response from chatGPT when the unique part is the prompt? If the prompt is only a paragraph long, then it's just adding additional work that I have to do to work backwards and understand what someone was originally trying to communicate.
Similarly, the only times I have enjoyed generated images are when my friends have used them for set pieces for a D&D campaign. They didn't really add any useful information, just being static images of bosses and locations, but because they were highly tuned to the exact events in our campaign they enhanced the overall experience.
- I'm working on a 10k word short story. I'm using omp and OpenCode models, and generated a dozen files around characters, backstories, motivation, dialog style, entities in the world, locations, corporations, along with an actual plot that I'm progressively exploding with more detail and nuance. The process has taken days and hundreds of turns.
It doesn't seem like your description of AI use matches what I'm doing at all.
- >AI slop is unsatisfying because there is no there there. It is intellectual junk food that mimics nutrition but delivers only empty calories. Satisfying AI outputs must embed dense information and compute to actually reward a reader's attention. You inject this value through brute-force search, non-trivial prompting, and rigorous curation, ensuring the final result reflects genuine algorithmic effort rather than the zero-shot 'WYSIWYG' default.
I don't agree with this at all. You can prompt all you want, but if you don't have the actual skills to make good art, you won't really know if what you're generating is good or not. This is to say nothing of the "hollow-ness" of outsourcing your voice to generative algorithms.
- > I have never managed to connect to LLM writing, it always comes off as shallow and vapid.
Me too, but I would be careful about being too dismissive, because I would totally bet that at some point the models will be able to write top tier stories.
And there will be people who will find those stories soulless purely based on their origin (which is completely fine!) and call them slop (which I feel hurts the language).
- > Me too, but I would be careful about being too dismissive, because I would totally bet that at some point the models will be able to write top tier stories.
Maybe. I'm not certain that the mathematical average of writing is ever going to be all that great. However I'm willing to update my stance the day an LLM writes a story that makes me cry. Until then I am going to be a bit stubborn about it.
- Let's also not forget that humans are more than capable of creating tons of 'slop' far worse than any current SOTA model would create, and I would argue the majority of what people write is in this category.
Models are still improving, and if you can't tell whether one was behind what you're currently reading 100% of the time (which I'm sure you know you can't), the distinction no longer matters except in very specific cases, and you have to go to a lot more effort to uncover that distinction.
- How would you define high-quality LLM output? How do you differentiate it from LLM slop?
I think all LLM output used "as is" for content/entertainment/art is slop.
- well, I guess it depends on how you use it, is it a noun or a verb.
If a verb unslop means to reverse. I thought that was a more interesting idea.
As a noun I think you would not use unslop to mean the opposite of slop but rather non-slop.
Based on my grammatical preconceptions of how I would use slop I felt that unslop had to be a verb, and the contest should somehow reflect that.
- Wow, interesting. As someone who doesn't consume this kind of content: has AI content generation reached other kinds, such as visual novellas, X-rated material, and real-world paintings?
- X-rated is like 90% of all self-hosted content generation. (Before they removed all the X-rated stuff, CivitAI was impossible to navigate for anything that wasn't smut. Nothing wrong with smut, mind you, but it was really something, quite overwhelming.)
- > CivitAI was impossible to navigate for anything that wasn't smut.
I think they had pretty good filters for that. Enabled by default.
- If you're not logged in, nothing risque shows up. But, if you are logged in, and search for many kinds of image LoRA, e.g. "realism" or "natural skin", many, maybe most, of the example images would be R or X or XXX rated, even if it wasn't a LoRA labeled NSFW. I got bored of tinkering with image and video generation pretty quick, as it feels/looks weird, uncanny valley stuff, so it's been a while since I've looked at CivitAI, but even now after they did some kind of pruning of adult stuff, the logged-in front page has several scantily clad large breasted ladies (all kinds: human, monster, and anime).
I just poked around in settings, and they do have "hide" options for furry, anime, gore, and political, which is useful.
- anyone have a good argument that steel-mans the anamnesis blog being ai slop?
an interesting coincidence is that the blog's july 3 post ties into the wikipedia/odin kerfuffle that's on the front page. its unorthodox take on doubting Thomas suggests we believe "a trustworthy witness without demanding to rerun his experiment yourself". sounds like the blog sides against Jimmy Wales that in our world of agenda-driven primary sources, trustworthy first-hand accounts are the best sources we have. it goes on to doubt all sources other than Him, including itself and yourself, and that's probably not an actionable moderation policy change for wikipedia.
https://katamari64.se/posts/2026/odin-wikipedia/
https://anamnesis.blog/posts/2026-07-03-reach-hither-thy-fin...
- I held my nose through the first third of the winning entry before giving up. Unbearable. Those metaphors… yeesh. Reminded me of this brutally fair minded attempt to read Shy Girl, the AI slop ‘horror novel’ Hachette pulled from shelves in disgrace:
- I don't want to hate without cause, so I read the prize winning entry 'The June'.
So, now, I can hate with cause: it reads like someone who cares about what their MFA friends think.
Meaning, it puts most of its emphasis on description, and so little on situational engagement. Which makes sense, I suppose, for an LLM.
- Conversely, I expected it to be bad (because I am biased as hell), and it still surprised me with how terrible it is.
- It's so tiring. I opened the supposed best one hoping at least someone has figured it out but I just couldn't force myself to read it. I really wished AI could do better, and so many people keep talking about the need for "taste" in AI but the rlhf just keeps getting worse every year, only coding gets better (perhaps due to the notable absence of "h" in coding-rl, which we all know stands for HR). I miss when language models actually modeled language. Someone needs to spend a few billion on creating a real model again instead of a mode-collapsed pseudocode compiler (Elon is a poser btw, he won't do it and grok is woke)
- I think people are missing the point.
The point is not that AI produces slop (it does).
The point is that I don't want to consume "art" that has been generated out the distillation of stealing all of the world's current art. That's not original, it's a facsimile of art.
I want to read something that has intent. That has a purpose. A reason why it exists. Not just the lowest effort cash grab.
This usage of AI is the equivalent of manufacturing companies making the flimsiest, cheapest, plastic crap to save 1/3 of a cent on every mop they produce. Designed to work for the least amount of time before needing replaced.
This planet has enough people on it that I will never, ever be able to read all the books written.
Please don't exponentially pump the number up by 1,000x every year from AI generated garbage.
- > manufacturing companies making the flimsiest, cheapest, plastic crap to save 1/3 of a cent on every mop they produce. Designed to work for the least amount of time before needing replaced
We live in a world with such companies, and we can still buy quality things. If there is a demand for the purely-human generated texts, they will be around. Perhaps a lot of people around you will read ai text instead, and you'll get upset because of it, but it's their choice. You'll still have your thing
- I don't know that we can have nice things. If two companies produce a similar widget but one is higher quality in no visible or articulable way... Which one will sell better, the cheaper or more expensive one? What if we as consumers can't really definitely tell when one is prone to failing in 1 year instead of 5? It takes too long to find out and by then the more expensive one is underselling and forced to enshittify.
- I think it's worse than that - the AI slop low effort cash cow is using deception (as well as theft). For example: https://www.youtube.com/watch?v=PUSY6mtqQDI
- > distillation of stealing all of the world's current art
Here's the age-old dilemma, though - how is reading stealing?
- I think there is a meaningful distinction to be made between a human reading and an AI company consuming data without consent in order to train their models. Certainly if enough people feel the same then what AI companies are doing is "wrong".
- I get it. However, consuming data without consent is not well defined, when said data is publicly available on the internet. Licenses for code, and not abiding them are a different thing, I think. Most authors (of books) wouldn't credit their inspirations, unless specifically asked about them.
- An author has lived experiences, including other books they have read, to draw upon to tell a narrative they want to tell (either purely for expression, purely for profit, or more often than not somewhere between).
A machine that chews up the world's literature and spins out a best guess at what the next word should be does not have intent, and the vast majority of the time is used by unscrupulous people purely for profit and/or deception.
An LLM and a living human being are not the same thing, I am tired of apologists comparing them as if they are.
It's not surprising that a computer (doing trillions of calculations on a billion parameter model that was trained on the world's literature) can string a coherent sentence together...
- This is predicated on the belief that the AI is running autonomously without ongoing input over hundreds of interactions to produce the output. In all of the professional contexts where I've leveraged AI, every output is the result of hundreds of back and forth interactions, review, modification, and iteration.
Your narrative about next token prediction makes no allowance for this.
- I get it. You want something genuine and original, not a mass-manufactured copy. You want grass-fed beef raised ethically and killed humanely, butchered by an artisanal butcher and cooked by a specialist chef on a charcoal grill, not a wiener stuffed with starch, salt, and fat that goes directly to your pleasure centers and bypasses the whole point of eating food to get nutrition for your mind and body alike.
I get it. We all get it. Except we can't easily ... you know, get it. Anymore. We've industrialised everything. Why not also art and even our thoughts?
I mean termites don't need art, nor do they need individual thoughts but they manage to create marvels of engineering and they have survived for millions of years. Why wouldn't humans go the same way, too, eventually?
- > I don't want to consume "art" that has been generated out the distillation of stealing all of the world's current art
It seems that you've fundamentally misunderstood art. I wouldn't personally call it "stealing", but T.S. Eliot would beg to differ (as would Pablo Picasso who "stole" that line)
> I want to read something that has intent. That has a purpose. A reason why it exists.
If the "allegories for the LLM condition" angle is accurate, then these stories do. In which case I believe what you mean to say is that you want to read something that has human intent.
- Are you claiming that LLMs have an intent beyond producing the statistically most "correct" output? This sounds a bit like you are saying LLMs are conscious.
- > producing the statistically most "correct" output
That isn't quite how they work - there's a degree of randomness depending on the so-called "temperature". And it isn't the output that's statistically modelled, it's the next token based on the prior output.
> This sounds a bit like you are saying LLMs are conscious
No more so than OP is implicitly asserting that human art is produced in cleanroom isolation. I don't believe either to be true.
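To make the temperature point above concrete, here is a minimal sketch of sampling a single next token from a softmax over scores divided by the temperature. The scores and tokens are made up, `sample_next_token` is a hypothetical helper, and real decoders typically add further steps (top-k, top-p, and so on) that are left out here.

```python
import math
import random

def sample_next_token(logits, temperature, rng):
    """Sample one token from softmax(logits / temperature)."""
    if temperature <= 0:
        # Treat zero temperature as greedy decoding: always the top token.
        return max(logits, key=logits.get)
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]

# Made-up scores for the token after "The cat sat on the".
logits = {"mat": 3.0, "sofa": 2.0, "moon": 0.5}
rng = random.Random(0)
for temperature in (0.0, 0.7, 1.5):
    picks = [sample_next_token(logits, temperature, rng) for _ in range(10)]
    print(temperature, picks)
```

At temperature 0 this always returns the highest-scoring token; at higher temperatures the lower-scoring tokens are picked more often, which is the "degree of randomness" being described.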
- > If we as a society can manage to automate excellent writing and avoid the slopworld mediocrity dystopia, things could be so good.
The dumbest thing I've read this year.
- They describe a dystopia and act like it would be heaven if only the processed, distilled slop we have to consume was tasty. As if adding pepper to soylent green will somehow fix everything.
There is no need to automate writing. Especially fiction. There are tens of millions of people out there with really interesting and unique ideas and styles who would love to drop everything and write, if only they can get the chance to have their work seen.
- Why is it all of the creative works that seem to be getting so much fervor to be automated away? There is plenty of writing that could actually benefit from automation, such as anything involving documentation in technical fields. I know there are a lot of people working on those things too, but it feels like for every technical application, there's 10 creative ones.
Is it just because you can't objectively mark creative works as "incorrect", so the output can seemingly look better to some people? Is it just people trying to tap into the creative works market? Do they actually think the output is good? Do they actually want to have conversations with a computer long term?
- Being generous: Because it's easy
I don't say that in a demeaning way, either.
Text and image generators were the first kinds of passable generative models that became publicly available, and they do produce "correct" results in that "picture of a dog" usually gives you a recognizable dog. So, if you're looking to start a new company or launch an app, using one of these new models for something low stakes like creative work seems like a good bet. I can understand why people gravitate this way, especially people looking to build and sell something, and I even find it less objectionable than the more serious fields, where people are throwing LLMs at completely inappropriate applications that actually require correctness and security.
Being less generous: Because many people do not respect creative work.
A lot of people, especially technically minded people, see creative work as less respectable and less important than technical work. Sometimes I think there is an element of jealousy, too. Basically, there is a somewhat common belief that people who can draw or paint or write are just naturally talented and didn't really work to get good at their art - after all, drawing is fun so they basically just get to play all day, right?
The truth is, anybody can learn to draw well, but it takes a lot of time and a lot of practice and we often don't see the hundreds of hours that were spent actually developing that skill. If you don't recognize the effort it actually takes to develop the good eye and mechanical skills needed to draw well, then it seems like a great idea to make a sketching app that lets anyone draw anything by just typing a prompt.
- To put it succinctly: It really comes down to jealousy. The people generating creative works, whether it's art or writing or games or whatever, do so because they are utter voids of creativity. They do so because they believe this machine can even the odds for them, finally show artists how it's done because they have far better ideas than any snobby creative.
And as sardonic as I am I'm also not joking. This is the #1 thing I see consistently show up when you push back on the drivel they're generating. They're upset that there's a perceived wall in front of them that's gatekeeping them from art and they want it destroyed.
- And the irony is that their generated output is shitty, generic and even repulsive, but they don't know it, because they lack the ability to tell apart the good from the bad.
- Nothing more than a bunch of people who haven't actually tried writing, and therefore aren't aware of what good writing actually looks like.
This is the problem with LLMs. It allows neophytes to trick themselves into believing that they're now a writer/programmer/artist for prompting a model, and because they don't know what they don't know about writing/programming/art, they think it's good when it's actually slop.
- I think in total I read two AI books so far. In the first case I was not aware of it being AI; in the second case it was clear after a few pages.
I already decided after the first book that I will not read any more AI slop generated books. It is not worth my time and I also don't want to encourage any more slop books taking away time from humans in general. AI slop must be contained and isolated like a virus that is annoying.
- > The flapjack was the kind that is mostly golden syrup and structural optimism, and a piece of it was now lodged between my back molars in a way I would still be aware of two hours later.
My head hurts.
- I'd rather lick the pages of a fifth hand copy of Fifty Shades of Grey.
- At that point it is something you want, not the best of the worst.