Ask HN: How should I convert Microsoft Word documents to Markdown?
5 points by lkrubner 1 day ago | 7 comments
  • mackatsol
    Pandoc is awesome for this: `pandoc input.docx -o output.md`. There's more you can do, with style sheets and so on, which you will likely have to dig into to get the tables and multiple columns to come out the way you want. You can also extract media files from inside a docx file: `pandoc --extract-media=. input.docx -o output.md`
  • qup
    I'd give an LLM a shot before I ruled it out.

    I had it generating .docx the other day and it did pretty well, so I assume it understands the format just fine.

    And they're excellent at markdown.

  • kha1n3vol3
    Start with pandoc before reinventing the wheel.
  • ramoz
    Pandoc might be able to do this. I found this:

    https://gist.github.com/plembo/409a8d7b1bae66622dbcd26337bbb...

  • dhruvyads
    Claude Code can do these types of conversions really well, unless you're trying to convert in bulk.
  • snailshare
    Pandoc can do this, I think.
  • verdverm
    Native support: https://techcommunity.microsoft.com/blog/onedriveblog/introd...

    Microsoft OSS Python: https://github.com/microsoft/markitdown

    There seem to be many add-ons that enable this, as well as pandoc, as others have suggested.
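
Since several replies point to pandoc and one raises bulk conversion, here is a minimal sketch of how the `pandoc` commands quoted above could be wrapped to convert a whole folder. It assumes pandoc is installed and on the PATH. The function names and the per-document `<name>_media` folder layout are hypothetical choices for illustration, not something from the thread or from pandoc itself.

```python
import subprocess
from pathlib import Path


def build_pandoc_command(src: Path, out_dir: Path) -> list[str]:
    """Build (but do not run) the pandoc command for one .docx file."""
    out_file = out_dir / (src.stem + ".md")
    # Hypothetical layout: one media folder per document, next to the output.
    media_dir = out_dir / (src.stem + "_media")
    return [
        "pandoc",
        f"--extract-media={media_dir}",  # same flag as in the comment above
        str(src),
        "-o",
        str(out_file),
    ]


def convert_folder(in_dir: Path, out_dir: Path) -> list[Path]:
    """Run pandoc on every .docx directly inside in_dir; return output paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(in_dir.glob("*.docx")):
        cmd = build_pandoc_command(src, out_dir)
        # check=True raises CalledProcessError if pandoc exits with an error.
        subprocess.run(cmd, check=True)
        converted.append(Path(cmd[-1]))
    return converted
```

Calling `convert_folder(Path("word_docs"), Path("markdown_out"))` would then process each file in turn. Tables and multi-column layouts may still need the extra pandoc options mentioned above, so it is probably worth checking a few outputs by hand before trusting a large batch.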
