• Watching the OpenClaw/Molbot craze has been entertaining. I wouldn't use it - too much code, changing too quickly, with too little regard for security - but it has inspired me.

    I often have ideas while cleaning around, cooking, etc. Claude Code (with Opus 4.5) is very capable. I've long wanted to get Claude Code working hands-free.

    So I took an afternoon and rolled my own STT-TTS voice stack for Claude Code. The voice stack runs locally on my M4 Pro and is extremely fast.

    For Speech to Text, Parakeet v3 TDT: https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3

    For Text to Speech, Pocket TTS: https://github.com/kyutai-labs/pocket-tts

    Custom MCP to hook this into Claude Code, with a little bit of hacking around to get my AirPods' stem click to be captured.

    I'm having Claude narrate its thought process and everything it's doing in short, frequent messages, and I can interrupt it at any time with a stem click, which starts listening to me and sends the message once a sufficiently long pause is detected.

    I stream the Claude Code session via AirPlay to my living room TV, so that I don't have to get close to the laptop if I need extra details about what it's doing.

    Yesterday, I had it debug a custom WhatsApp integration (via [1]) hands-free while brushing my teeth. It can use `osascript` for OS integration, browse the web via Claude Code's builtin tools...

    My back is thankful. This is really fun.

    [1]: https://github.com/jlucaso1/whatsapp-rust

    • On one hand, I think this project is super cool and something I would use and/or would have loved to build myself for my own use.

      On the other hand, it makes me wonder if we’re just heading for a future where everyone is just always working, at all times, even while doing other things.

      “Wow look at our daughter taking her first steps! She’s doing so… wait hold on… No, Claude. I said to name the class “potatoes”, not “‘pot’ followed by eight ‘O’s,” you dumb robot!”

      • I don't disagree, but I think there is the otherside of that same coin... What if we could do other stuff while remaining productive.

        Rather than the example of missing first steps, what if we had, "Ok Claude, prepare a few slides for my presentation, I'm going to watch my childs mid-day recital..." maybe you get a success/failure ping and maybe even need to step out for part of the event, but in another world you couldn't have gone at all.

      • we kind of already are with our phones and Slack, the difference at this point is negligible. i personally won't have airpods in 24/7 with my kid (or ever) so if i were doing something like this, it would be through my phone, which is already something i use fairly often. not too much difference there IMO (at least anecdotally speaking)
        • I don't know what kind of work you do on a daily basis. But, the difference between sending a Slack message and sending a message to kick off an agent to chain a bunch of tasks together is a vastly lower activation barrier. I think many people will jump over that lower barrier out of FOMO, to avoid being outcompeted by those who already jumped.

          As an IC though, me sending a slack message is perhaps less impactful than a PL responding to a report :)

    • I got into a bike accident yesterday and injured both of my arms. Fortunately the damage wasn't too severe, but it was bad enough that using a computer is rather difficult. So now I'm spending some of my idle time playing around with different options for voice control. Like you I am a little wary of OpenClaw so I might try something similar to your setup as an alternative. So far I have gotten to the point where I can use voice dictation in notepad to write comments and commands, but copying and pasting the text is enough of a struggle (compounded by the fact that my cat is competing with me for the keyboard and I am in no state to fend her off) that I am aiming to push things a bit further. Sucks being injured but having a nice distraction to keep my mind occupied has so far been a great way to pass the time.
    • repo?
  • Skimmed the repo, this is basically the irreducible core of an agent: small loop, provider abstraction, tool dispatch, and chat gateways . The LOC reduction (99%, from 400k to 4k) mostly comes from leaving out RAG pipelines, planners, multi-agent orchestration, UIs, and production ops.
    • baby
      RAG seems odd when you can just have a coding agent manage memory by managing folders. Multi agent also feels weird when you have subagents.
      • Yeah, vector embeddings based RAG has fallen out of fashion somewhat.

        It was great when LLMs had 4,000 or 8,000 token context windows and the biggest challenge was efficiently figuring out the most likely chunks of text to feed into that window to answer a question.

        These days LLMS all have 100,000+ context windows, which means you don't have to be nearly as selective. They're also exceptionally good at running search tools - give them grep or rg or even `select * from t where body like ...` and they'll almost certainly be able to find the information they need after a few loops.

        Vector embeddings give you fuzzy search, so "dog" also matches "puppy" - but a good LLM with a search tool will search for "dog" and then try a second search for "puppy" if the first one doesn't return the results it needs.

        • The fundamental problem wit RAG is that it extracts only surface level features, "31+24" won't embed close to "55", while "not happy" will be close to "happy". Another issue is that embedding similarity does not indicate logical dependency, you won't retrieve the callers of a function with RAG, you need a LLM or code for that. Third issue is chunking, to embed you need to chunk, but if you chunk you exclude information that might be essential.

          The best way to search I think is a coding agent with grep and file system access, and that is because the agent can adapt and explore instead of one shotting it.

          I am making my own search tool based on the principle of LoD (level of detail) - any large text input can be trimmed down to about 10KB size by doing clever trimming, for example you could trim the middle of a paragraph keeping the start and end, or you could trim the middle of a large file. Then an agent can zoom in and out of a large file. It skims structure first, then drills into the relevant sections. Using it for analyzing logs, repos, zip files, long PDFs, and coding agent sessions which can run into MB size. Depending on content type we can do different types of compression for code and tree structured data. There is also a "tall narrow cut" (like cut -c -50 on a file).

          The promise is - any size input fit into 10KB "glances" and the model can find things more efficiently this way without loading the whole thing.

          • Ok 2 hours later here is the release: https://github.com/horiacristescu/nub
            • This is a very cool idea. I’ve been dragging CC around very large code bases with a lot of docs and stuff. it does great but can be a swing and a miss.. have been wondering if there is a more efficient / effective way. This got me thinking. Thanks for sharing!
        • Context rot is still a problem though, so maybe vector search will stick around in some form. Perhaps we will end up with a tool called `vector grep` or `vg` that handles the vectorized search independent of the agent.
      • I've been leaning towards multi agent because sub agent relies on the main agent having all the power and using it responsibly.
      • Totally useless indeed.
      • Interesting.

        I guess RAG is faster? But I'm realizing I'm outdated now.

        • lxgr
          No, RAG is definitely preferable once your memory size grows above a few hundred lines of text (which you can just dump into the context for most current models), since you're no longer fighting context limits and needle-in-a-haystack LLM retrieval performance problems.
          • > once your memory size grows above a few hundred lines of text (which you can just dump into the context for most current models)

            A few hundred lines of text is nothing for current LLMs.

            You can dump the entire contents of The Great Gatsby into any of the frontier LLMs and it’s only around 70K tokens. This is less than 1/3 of common context window sizes. That’s even true for models I run locally on modest hardware now.

            The days of chunking everything into paragraphs or pages and building complex workflows to store embeddings, search, and rerank in a big complex pipeline are going away for many common use cases. Having LLMs use simpler tools like grep based on an array of similar search terms and then evaluating what comes up is faster in many cases and doesn’t require elaborate pipelines built around specific context lengths.

            • lxgr
              Yes, but how good will the recall performance be? Just because your prompt fits into context doesn't mean that the model won't be overwhelmed by it.

              When I last tried this with some Gemini models, they couldn't reliably identify specific scenes in a 50K word novel unless I trimmed down the context to a few thousands of words.

              > Having LLMs use simpler tools like grep based on an array of similar search terms and then evaluating what comes up is faster in many cases

              Sure, but then you're dependent on (you or the model) picking the right phrases to search for. With embeddings, you get much better search performance.

              • > Yes, but how good will the recall performance be? Just because your prompt fits into context doesn't mean that the model won't be overwhelmed by it.

                With current models it's very good.

                Anthropic used a needle-in-haystack example with The Great Gatsby to demonstrate the performance of their large context windows all the way back in 2023: https://www.anthropic.com/news/100k-context-windows

                Everything has become even better in the nearly 3 years since then.

                > Sure, but then you're dependent on (you or the model) picking the right phrases to search for. With embeddings, you get much better search performance.

                How do are those embeddings generated?

                You're dependent on the embedding model to generate embeddings the way you expect.

                • That doesn’t match my experience, both in test and actual usage scenarios.

                  Gemini 3 Pro fails to satisfy pretty straightforward semantic content lookup requests for PDFs longer than a hundred pages for me, for example.

                  • > for PDFs longer than a hundred pages for me

                    Your original comment that I responded to said a "few hundred lines of text", not hundred page PDFs.

        • I think it still has a place of your agent is part of a bigger application that you are running and you want to quickly get something in your models context for a quick turnaround
    • Unless I'm misunderstanding what they are, planners seem kind of important.
      • As you mentioned, that depends on what you mean by planners.

        An LLM will implicitly decompose a prompt into tasks and then sequentially execute them, calling the appropriate tools. The architecture diagram helpfully visualizes this [0]

        Here though, planners means autonomous planners that exist as higher level infrastructure, that does external task decomposition, persistent state, tool scheduling, error recovery/replanning, and branching/search. Think a task like “Prompt: “Scan repo for auth bugs, run tests, open PR with fixes, notify Slack.” that just runs continuously 24/7, that would be beyond what nanobot could do. However, something like “find all the receipts in my emails for this year, then zip and email them to my accountant for my tax return” is something nanobot would do.

        [0] https://github.com/HKUDS/nanobot/blob/main/nanobot_arch.png

        • Sure, instruction tuned models implicitly plan, but they can easily lose the plot on long contexts. If you're going to have an agent running continuously and accumulating memory (parsing results from tool use, web fetches, previous history, etc.), then plan decomposition, persistence and error recovery seems like a good idea, so you can start subagents with fresh contexts for task items and they stay on task or can recover without starting everything over again. Also seems better for cost since input and output contexts are more bounded.
      • I don’t know what these planners do, but I’ve had reasonably good luck asking a coding agent to write a design doc and then reviewing it a few times.
    • RAG is broken when you have too much data.
      • Specifically when the document number reaches around 10k+, a phenomenon called "Semantic Collapse" occurs.

        https://dho.stanford.edu/wp-content/uploads/Legal_RAG_Halluc...

        • So you're telling me rampancy ( https://www.halopedia.org/Rampancy ) is real.
        • > Specifically when the document number reaches around 10k+

          Where are you getting this? just read the paper and not seeing it -- interested to learn more

          • The RAG GP used suffered from semantic collapse.
      • Gemini with Google search is RAG using all public data, and it isn't broken.
        • It's not tool use with natural language search queries? That's what I'd expect.
          • It's RAG via tool use, where the storage and retreival method is an implementation detail.

            I'm not a huge fan of the term RAG though because if you squint almost all tool use could be considered RAG.

            But if you stick with RAG being a form of "knowledge search" then I think Google search easily fits.

          • It is tool use with natural language search queries but going down a layer they are searched on a vector DB, very similar to RAG. Essentially Google RankBrain is the very far ancestor to RAG before compute and scaling.
      • Cant you make thresholds higher?

        Hmm... I guess not, you might want all that data.

        Super interesting topic. Learning a lot.

  • The 4k LOC claim is interesting but I think the real insight is about what you remove rather than what you keep. Looking at the codebase, they've essentially bet that LLMs with 100k+ context windows make most RAG pipelines redundant - just give the agent grep/rg and let it iterate.

    What's clever is treating memory as filesystem ops rather than vector stores. For codebases this works great since code has natural structure (imports, function calls) that grep understands. The question is whether this scales to truly unstructured knowledge where semantic similarity matters.

    Would love to see benchmarks comparing retrieval accuracy vs a proper embedding pipeline on something like personal notes or research papers.

    • Didn’t openclaw switch to vector based because it used way less tokens as it always loaded all memories? Seems way more efficient
  • Okay so is this ”inspired” by nanoclaw that was featured here two days ago?
  • What are people using these things for? The use cases I've seen look a bit contrived and I could ask Claude or ChatGPT to do it directly
    • Here’s a copy of a post I made on Farcaster where I’m unconvinced it’s actually being used at all:

      I've used OpenClaw for 2 full days and 3 evenings now. I simply don't believe people are using this for anything majorly productive.

      I really, really want to like it. I see glimpses of the future in it. I generally try to be a positive guy. But after spending $200 on Claude Max, running with Opus 4.5 most of the time, I'm just so irritated and agitated... IT'S JUST SO BAD IN SO MANY WAYS.

      1. It goes off on these huge 10min tangents that are the equivalent of climbing out of your window and flying around the world just to get out of your bed. The /abort command works maybe 1 time out of 100, so I end up having to REBOOT THE SERVER so as not to waste tokens!

      2. No matter how many times I tell it not to do things with side effects without checking in with me first, it insists on doing bizarre things like trying to sign up for new accounts people when it hits an inconvenient snag with the account we're using, or it tried emailing and chatting to support agents because it can't figure out something it could easily have asked ME for help with, etc.

      3. Which reminds me that its memory is awful. I have to remind it to remind itself. It doesn't understand what it's doing half the time (e.g. it forgets the password it generated for something). It forgets things regularly; this could be because I keep having to reboot the server.

      4. It forgets critical things after compaction because the algorithm is awful. There I am, typing away, and suddenly it's like the Men in Black paid a visit and the last 30min didn't happen. Surely just throwing away the oldest 75% of tokens would be more effective than whatever it's doing? Because it completely loses track of what we're doing and what I asked it NOT to do, I end up with problem (1) again.

      5. When it does remember things, it spreads those memories all over the place in different locations and forgets to keep them consistent. So after a reboot it gets confused about what is the truth.

      • i've never had situations where i prompt and had to go out for coffee or a walk or drive. one shotting - your first prompt. perhaps.

        but like a person - when the possibility of going off in the wron g direction is so high, i've always had 1 - 2 line prompts, small iterations much more appealing. The only times i've had to rollback would be when i run out of credits, and a new model cant deal with the half baked context, errors, refactoring.

      • there's an entire cohort on HN who still claim AI is utterly and completely useless despite in your face evidence. Literally people making a similar claim word for word who say that they don't understand the hype that they used AI themselves and it's shit.

        Meanwhile my entire company uses AI and the on the ground reality for me versus the cohort above is so much at odds with each other we're both claiming the other side is insane.

        I haven't used these bots yet but I want to see the full story. Not just one guys take and one guys personal experience. The hype exists because there are success stories. I want to hear those as well.

        • The comment above was saying OpenClaw was useless relative to their other heavy AI usage.

          The person you’re criticizing says they’re a heavy AI user. The take was about OpenClaw, not AI.

          • Yeah, and he's basically asking for more OpenClaw success stories.
        • I don’t know how you came to that conclusion from my comment. I’m talking about a particular product named OpenClaw, representing a new style of doing work; not AI in general.

          I dropped $200 on Claude Max in my personal capacity to test OpenClaw because I use Opus 4.5 all day in Cursor on an enterprise subscription… because it works for those problems.

          • >I don’t know how you came to that conclusion from my comment. I’m talking about a particular product named OpenClaw, representing a new style of doing work; not AI in general.

            Right, I'm saying AI in general is an example of the unreliability of peoples experiences on openclaw. If people are so unreliable about the narrative of AI, I don't trust the narrative of openclaw which on this thread in particular is very negative and in stark contrast to the hype.

            >I dropped $200 on Claude Max in my personal capacity to test OpenClaw because I use Opus 4.5 all day in Cursor on an enterprise subscription… because it works for those problems.

            The comment wasn't directed at you personally. I'm just saying I want to see counter examples of openclaw succeeding, not just examples of it failing. Frankly on this thread there's Zero success stories which I find sort of strange.

        • What do you use AI for?

          Pretty much everyone in my company also uses AI. But everyone sees the same downsides.

          • Yep. But on HN, there's a huge cohort of people saying AI is useless.

            Everyone sees the downsides but the upside is the one everyone is in denial about. It's like yeah, there's downsides but why is literally everyone using it?

        • You’re correct. Any statement by HN users that something is useless has no value because they say that about useful things too.

          Moltbot has the shape of the future but doesn’t feel like it to me. Sort of like Langchain once was. Demonstrated some new paradigm shift but is itself flawed so may not be the implementation that lasts. Time will tell.

          The only thing here to say is “put it in a VM and try it”. It’s easy to try.

        • There's people saying AI isn't living up its hype / valuation, I don't see many saying "utterly useless".

          And there's plenty who worship at the altar of Claude.

          • >There's people saying AI isn't living up its hype / valuation, I don't see many saying "utterly useless".

            There's more people saying AI doesn't live up to the hype. The people who are saying it's utterly useless is still quite large on HN. It's just that most of them are midway through changing their story because reality is smashing them in the face.

            >And there's plenty who worship at the altar of Claude.

            I mean who doesn't use it? No one claims it's perfect or a god of code. But if you're not using it you're behind.

            • > There's more people saying AI doesn't live up to the hype.

              It is possible they are correct and nothing you have written suggests otherwise.

              > The people who are saying it's utterly useless is still quite large on HN.

              Are these people's opinions less valid than your own? Are you angry your opinion might be a minority on this one website?

              > It's just that most of them are midway through changing their story because reality is smashing them in the face.

              You made this up.

              > But if you're not using it you're behind.

              Yeah, well, you know, that’s just, like, your opinion, man

    • Disclaimer: Haven't used any of these (was going to try OpenClaw but found too many issues). I think the biggest value-add is agency. Chat interfaces like Claude/ChatGPT are reactive, but agents can be proactive. They don't need to wait for you to initiate a conversation.

      What I've always wanted: a morning briefing that pulls in my calendar (CalDAV), open Todoist items, weather, and relevant news. The first three are trivial API work. The news part is where it gets interesting and more difficult - RSS feeds and news APIs are firehoses. But an LLM that knows your interests could actually filter effectively. E.g., I want tech news but don't care about Android (iPhone user) or MacOS (Linux user). That kind of nuanced filtering is hard to express as traditional rules but trivial for an LLM.

      • But can't you do the same using appropriate MCP servers with any of the LLM providers? Even just a generic browser MCP is probably enough to do most of these things. And ChatGPT has Tasks that are also proactive/scheduled. Not sure if Claude has something similar.

        If all you want to do is schedule a task there are much easier solutions, like a few lines of python, instead of installing something so heavy in a vm that comes with a whole bunch of security nightmares?

        • > But can't you do the same just using appropriate MCP servers with any of the LLM providers?

          Yeah, absolutely. And that was going to be my approach for a personal AI assistant side project. No need to reinvent the wheel writing a Todoist integration when MCPs exist.

          The difference is where it runs. ChatGPT Tasks and MCP through the Claude/OpenAI web interfaces run on their infrastructure, which means no access to your local network — your Home Assistant instance, your NAS, your printer. A self-hosted agent on a mac mini or your old laptop can talk to all of that.

          But I think the big value-add here might be "disposable automation". You could set up a Home Assistant automation to check the weather and notify you when rain is coming because you're drying clothes on the clothesline outside. That's 5 minutes of config for something you might need once. Telling your AI assistant "hey, I've got laundry on the line. Let me know if rain's coming and remind me to grab the clothes before it gets dark" takes 10 seconds and you never think about it again. The agent has access to weather forecasts, maybe even your smart home weather station in Home Assistant, and it can create a sub-agent, which polls those once every x minutes and pings your phone when it needs to.

          • But if you run e.g. Claude/Codex/opencode/etc locally you also have access to your local machine and network? What is the difference?
        • OpenClaw allow the LLM to make their own schedule, spawn subagents, and make their own tool.

          Yes, basically just some "appropriate MCP servers" can do. but OpenClaw sell it as a whole preconfigured package.

      • I have a few cron jobs that basically are `opencode run` with a context file and it works very well.

        At some point OpenClaw will take over in terms of it's benefits but it doesn't feel close yet for the simplicity of just run the job every so often and have OpenCode decide what it needs to do.

        Currently it shoots me a notification if my trip to work is likely to be delayed. Could I do it manually well sure.

      • But this could be done for 1/100 the cost by only delegating the news-filtering part to an LLM API. No reason not to have an LLM write you the code, too! But putting it in front of task scheduling and API fetching — turning those from simple, consistent tasks to expensive, nondeterministic ones — just makes no sense.
        • Like I said, the first examples are fairly trivial, and you absolutely don't need an LLM for those. A good agent architecture lets the LLM orchestrate but the actual API calls are deterministic (through tool use / MCPs).

          My point was specifically about the news filtering part, which was something I had tried in the past but never managed to solve to my satisfaction.

          The agent's job in the end for a morning briefing would be:

            - grab weather, calendar, Todoist data using APIs or MCP  
            - grab news from select sources via RSS or similar, then filter relevant news based on my interests and things it has learned about me  
            - synthesize the information above
          
          The steps that explicitly require an LLM are the last two. The value is in the personalization through memory and my feedback but also the ability for the LLM to synthesize the information - not just regurgitate it. Here's what I mean: I have a task to mow the lawn on my Todoist scheduled for today, but the weather forecast says it's going to be a bit windy and rain all day. At the end of the briefing, the assistant can proactively offer to move the Todoist task to tomorrow when it will be nicer outside because it knows the forecast. Or it might offer to move it to the day after tomorrow, because it also knows I have to attend my nephew's birthday party tomorrow.
      • That’s ChatGPT Pulse
    • I spun up an Debian stable ec2 vm (using an agent + aws cli + aws-vault of course) to host openclaw, giving it full root access, and I talk to it on discord.

      It's a little slow sometimes, but it's the first time I've felt like I have an independent agent that can handle things kind of.

      The only two things I did were 1. Ask it to create a Monero address so I could send it money, and have it notify me whenever money is sent to that address. It spun up its own monerod daemon which was really heavy and it ran out of space. So I had to get it to use the Monero wallet instead, but had to manually intervene to shut down the monerod daemon and kill the process and restart openclaw. In the end it worked and still works. 2. I simply asked it "@ me the the silver price every day around 8am ET" and it just figured out how to do it and schedule it. To my understanding it has its own cron functionality using a json file. 3. Write and host some python scripts I can ping externally to send me a notification

      I've had it done other misc stuff, but ChatGPT is almost always better for queries, and coding agents + Zed is much better for coding. But with a cheap enough vm and using openrouter plus glm 4.7 or flash, it can do some quirky fun stuff. I see the advantage as mainly having control of a system where it can have long term state (like files, processes, etc) and manage context itself. It is more like glue and it's full mastery and control of a Linux system gives it a lot of flexibility.

      Think of it more as agent+os which you aren't getting with raw Claude or ChatGPT.

      I've done nothing that interesting with it, it's absolutely a security nightmare, but it's really fun!

    • I couldn't really use OpenClaw (it was too slow and buggy), but having an agent that can autonomously do things for you and have the whole context of your life would be massively helpful. It would be like having a personal assistant, and I can see the draw there.
    • I have no idea. the single thing I can think of is that it can have a memory.. but you can do that with even less code. Just get a VPS. create a folder and run CC in it, tell it to save things into MD files. You can access it via your phone using termux.
      • You could, but Claude Code's memory system works well for specialized tasks like coding - not so much for a general-purpose assistant. It stores everything in flat markdown files, which means you're pulling in the full file regardless of relevance. That costs tokens and dilutes the context the model actually needs.

        An embedding-based memory system (letta, mem0, or a self-built PostgreSQL + pgvector setup) lets you retrieve selectively and only grab what's relevant to the current query. Much better fit for anything beyond a narrow use case. Your assistant doesn't need to know your location and address when you're asking it to look up whether sharks are indeed older than trees, but it probably should know where you live when you ask it about the weather, or good Thai restaurants near you.

    • Yeah, I don't get it either. Deploy a VM that runs an LLM so that I can talk to it via Telegram... I could just talk to it through an app or a web interface. I'm not even trying to be snarky, like what the hell even is the use case?
      • Difference is that openclaw is not LLM but engine that spawns up agent that interact with LLM and the system its installed on.

        It can have full access to the system it’s running on. So it can browse internet via browser, run cli commands, api’s via skills etc.

        Idea is to act like a Jarvis personal assistant. You tell what to do via chat e.g telegram, then it does it for you.

      • It's not even an LLM it's just to pipe api calls.
    • One significant advantage over Claude/ChatGPT is that your own agent will be able to access many websites that block cloud-hosted agents via robots.txt and/or IP filters. This is unfortunately getting more common.

      Another is that you have access to and control over its memory much more directly, since it's entirely based on text files on your machine. Much less vendor lock-in.

  • Yeah I mean idk, my takeaway from OpenClaw was pretty much the same - why use someone's insane vibecoded 400k LoC CLI wrapper with 50k lines of "docs" (AI slop; and another 50k Chinese translation of the same AI slop) when I can just Claude Code myself a custom wrapper in 30 mins that has exactly what I need and won't take 4 seconds to respond to a CLI call.

    But my reaction to this project is again: Why would I use this instead of "vibecoding" it myself. It won't have exactly what I need, and the cost to create my own version is measured in minutes.

    I suspect many people will slowly come to understand this intrinsic nature of "vibecoded software" soon - the only valuable one is one you've made yourself, to solve your own problems. They are not products and never will be.

    • px43
      "Open source" is no longer about "Hey I built this tool and everyone should use it". It's about "Hey I did this thing and it works for me, here's the lessons I learned along the way", at which point anyone can pull in what they need, discard what they don't, and build out their own bespoke tool sets for whatever job they're trying to accomplish.

      No one is trying to get you to use openclaw or nanobot, but now that they exist in the world, our agents can use the knowledge to build better tooling for us as individuals. If the projects get a lot of stars, they become part of the global training set that every coding agent is trained against, and the utility of the tooling continues to increase.

      I've been running two openclaw agents, and they both made their own branchs, and modified their memory tooling to accommodate their respective tasks etc. They regularly check for upstream things that might be interesting to pull in, especially security related stuff.

      It feels like pretty soon, no one is going to just have a bunch of apps on their phone written by other people. They're going to have a small set of apps custom built for exactly the things they're trying to do day to day.

      • > Open source" is no longer about "Hey I built this tool and everyone should use it".

        Was open source ever about that? I thought it was "Hey I built this tool and I'm putting it on internet if anyone wants to use it" often accompanied by a license saying "no warranties".

        > It feels like pretty soon, no one is going to just have a bunch of apps on their phone written by other people. They're going to have a small set of apps custom built for exactly the things they're trying to do day to day

        I think today's AI tools like Agents are for people who are programmers but don't want to program, not ones who aren't programmers and don't want to program. As in, "no one is going to..." is a very broad statement to make for an average person who just uses apps on thier phone. Your average person will not start vibe coding their own apps just because they can (because they couldn't care less).

      • "If the projects get a lot of stars, they become part of the global training set that every coding agent is trained against, and the utility of the tooling continues to increase."

        OpenClaw currently has 1.8k issues, 400k lines of code, had an RCE exploit discovered just a few days ago, it takes 5 seconds to get a response when I type "openclaw" in my CLI and most of the top skills are malware. I'm pretty sure training on that repository is the equivalent to eating a cyanide pill for a coding model.

        I actually agree with your take that custom apps will take over a subset of established software for some users at some point, but I don't think models poisoning themselves with recklessly vibecoded bloatware is how we get there at all.

      • Are you me?? I'm literally building highly personalized and/or idiosyncratic software with claude to solve personal and professional problems.

        Thanks to tauri, I've now made two desktop apps and one mobile app for the first time in the last two months.

        None of this was nearly as feasible just a year ago

    • It is not about making it yourself but a tradeoff between how much it can be controlled and how much has seen the real world. Adding requirements learned by mistakes of others is slower in self-controlled development vs an open collaboration vs a company managing it. This is the reason vibe-coded(initial requirements) projects feels good to start but tough to evolve(with real learnings).

      Vibe-coded projects are high-velocity but low-entropy. They start fast, but without the "real-world learnings" baked into collaborative projects, they often plateau as soon as the problem complexity exceeds the creator's immediate focus.

    • So, as an OpenClaw disliker, the agent harness at the core of it (pi) is really good, it's super minimal and well designed. It's designed to be composed using custom functionality, it's easy to hack, whereas Claude Code is bloated and totally opinionated.

      The thing people are losing their shit over with OpenClaw is the autonomy. That's the common thread between it, Ralph and Gastown that is hype-inducing. It's got a lot of problems but there's a nugget of value there (just like Steve Yegge's stuff)

      • The core "design" not bad, but the "code" quality is .. mid.

        They are basically keep breaking different feature on every release.

    • What I read is the unlimited token count. You get the most out of this when having it run on an autonomous loop where your interaction is much more minimal? But pinging the thing every minute in a loop is going to terminate your token limit so running the LLM locally is the way to get infinite tokens.

      The problem is local models aren't as good as the ones in the cloud. I think the success stories are people who spent like 2-4k on a beefy system to run OpenClaw or these chatbots locally.

      The commands they run are, I assume like detailed versions of prompts that are essentially: "build my website." "Invest in stocks." And then watch it run for days.

      When using claude code it's essentially a partnership. You need to constantly manage it and curate it for safety but also so the token count doesn't go overboard. With a fully autonomous agent and unlimited token count you can assign it to tasks where this doesn't matter as much. Did the agent screw up and write bad code? The point is you can have the system prompt engage in self correction.

    • I mean, in not vibecoding it yourself you are already saving tokens... Personally, I see no benefit in having an instance of something like this... so, I wouldn't spend tokens, and I wouldn't spend server-time, or any other resource into it, but a lot of people seem to have found a really nice alternative to actually having to use their brains during the day.
      • > a lot of people seem to have found a really nice alternative to actually having to use their brains during the day.

        Or have they have found a way to use their brains on what they deem as more useful, and less on what is rote?

        • I see this retort pasted everywhere. What exactly are you referring to? I think it's fair to assume any competent person never spends their brain in what may be considered as rote in the first place. If one was doing that, well it's unfortunate.

          I just keep coming to the conclusion about devs who use agents or other AI tooling extensively: these are programmers who did not like to program.

        • Yeah, I guess I just don't really have a lot of meaningful things to take care of.
      • I do see the potential in something like OpenClaw, personally, but more as a kind of interface for a collection of small isolated automations that _could_ be loosely connected via some type of memory bank (whether that's a RAG or just text files or a database or whatever). Not all of these will require LLMs and certainly none of them will require vibecoding at all if you have infinite time; But the reality is I don't have infinite time, and if I have 300 small ideas and I can only implement my like 10 of them a week by myself, I'd personally rather automate 30 more than just not have them at all, you know?

        But I am talking about shell scripts here, cronjobs, maybe small background services. And I would never dare publish these as public applications or products. Both because I feel no pride about having "made" these - because, you know, I haven't, the AI did - and because they just aren't public facing interfaces.

        I think the main issue at the moment is that so many devs are pretending that these vibecoded projects are "products". They are not. They are tailor-made, non-recyclable throwaway software for one person: The creator. I just see no world at the moment where I have any plausible reason to use someone else's vibecoded software.

        • Our team doesn't use things like OpenClaw. We use Windmill, which is a workflow engine that can use AI to program scripts and workflows. 90% of our automated flows are just vanilla python or nodejs. We re-use 10% of scripts in different flows. We do have LLM nodes and other AI nodes, and although windmill totally supports AI tool calling/Agentic use, we DON'T let AI agents decide the next step. Boring? Maybe. Dependable? Yes.
  • Can this be sandboxed? I've been running OpenClaw in a VM on macOS, which seems more resource intensive than necessary.
  • Is this something I run for my company in Slack, where employees send messages and the LLM processes the text, uses the functions I created to handle different tasks, and then responds back?
  • Not bad, but I’m a bit skeptical. Is it mainly about the way of working in IM?
  • What are your solutions for if your AI bot wants to leak your credentials?
  • Has anyone managed to get the WhatsApp integration working and chatting that way?
  • can anyone breakdown a comparison of multi-agent vs subagent?

    looking for pro's and cons.

  • What? OpenClaw has 450kLoC? Why?
  • I'd like to see one of these in Rust (over Python, Node, etc) and in Apple's container environment.
  • The main novelty I see in openclaw is the amount of channels and how easy it is to set them up. This just has whatsapp, telegram & feishu
  • Bottom Line HAL‑AI‑2 is a real system. Nanobot is a toy. They are not peers. They are not even in the same category. Nanobot is useful only as a conceptual sketch of an agent loop. HAL‑AI‑2 is the substrate you’ve been building toward for months.