• I’ve lately been trying to build something similar. Where did you acquire the case law database you built this tool on top of?
    • Thanks for asking! We're using a multi-source approach:

      Case law: Google Scholar + CourtListener's bulk data (great coverage of federal and state appellate decisions).

      Statutes & regulations: Currently using Justia for state statutes, but working on scraping directly from state legislature sites. U.S. Code from the Office of the Law Revision Counsel's XML releases, and eCFR's APIs for federal regulations.

    • Same question. This is a massive undertaking to do well, and the incumbents, Lexis and Westlaw, gate their APIs behind their own crappy agents. Unsure about vLex or Bloomberg. What’s the scope? U.S. federal? State? Administrative hearings?

      Part of the value of the incumbents is the case treatment (Shepardizing) and the case issue categories. The world is due for a disruptor, but sadly court data is terrible to get: bad formats, paywalls, slow publishing times, and in some cases physically going to the courthouse.

  • One problem with AI in law is AI's tendency to hallucinate non-existent cases and invent citations. This should be easy to solve and if you've done so, congratulations.
    • To vet correctly, you now need a Fortune 500 legal department. Good luck.
  • How exactly do you handle hallucinations? Hallucinations need not just be in the citations, right? And what if there are hallucinations without any citations?
    • You're right - hallucinations aren't limited to citations. We see a few failure modes:

      Fabricated citations: Case doesn't exist at all

      Wrong citation: Case exists but doesn't say what the model claims

      Misattributed holdings: Real case, real holding, but applied incorrectly to the legal question

      From our internal testing, proper context engineering significantly reduces hallucination across the board.

      Once we ground the model in the relevant source documents, hallucination rates drop substantially.
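      The citation-checking half of this can be sketched roughly as follows. The regex covers only a few common reporters, and the set-based lookup stands in for a real case-law index; both are simplified illustrations, not our production code:

```python
import re

# Simplified reporter-citation pattern: "373 U.S. 83", "410 F.3d 1283",
# "123 F. Supp. 2d 456", etc. A real extractor covers far more reporters.
CITATION_RE = re.compile(
    r"\b(\d{1,4})\s+(U\.S\.|S\. Ct\.|F\. Supp\. \d?d?|F\.\d?d?)\s+(\d{1,4})\b"
)


def extract_citations(text: str) -> list[tuple[str, str, str]]:
    """Pull (volume, reporter, page) tuples out of model output."""
    return CITATION_RE.findall(text)


def verify_citations(text: str, known_cases: set[str]) -> list[tuple[str, bool]]:
    """Flag citations that do not resolve to a case in the index.

    `known_cases` stands in for a lookup against the case-law database;
    a False flag means the model cited something we cannot find.
    """
    report = []
    for vol, rep, page in extract_citations(text):
        key = f"{vol} {rep} {page}"
        report.append((key, key in known_cases))
    return report
```

      Note this only catches the "case doesn't exist" failure mode; catching "case exists but doesn't say that" requires comparing the claimed holding against the retrieved opinion text.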

    • The lawyer can handle hallucinations by reading the underlying case. For example, "Brady exempts the prosecution from turning over embarrassing evidence. See Brady v. Maryland, 373 U.S. 83 (1963)." If you're a lawyer, you know Brady doesn't say this at all. To be sure, you have to read the case. Errors in the citation are like typos. They must still be corrected, but an occasional typo is not the end of the world.
    • It’s called grounding, and ChatGPT and Gemini both do it by linking to the appropriate sources.
  • How do we know it’s not just a crappy wrapper? What’s the difference between just uploading documents into a general purpose LLM and asking it to cite sources?

    I would also add as feedback that it’s kind of scammy to use the word “open” and a “.org” domain like this when you’re running a for-profit business. It’s not illegal, but it feels unethical. Just because OpenAI made fake non-profit status popular doesn’t mean you have to follow that path.

    > This free tier will be subsidized by our enterprise functions

    I assume you are not in any way a non-profit organization.