10 points by archivarix 21 hours ago | 2 comments
- Search engine for YouTube content that's no longer on YouTube: deleted, removed, region-blocked, DMCA'd. ~1.5B videos indexed from 2005 onwards by aggregating archive sources: Internet Archive Wayback Machine (CDX + HEAD-spread discovery), Common Crawl. What you get for any video ID: metadata (title, description, channel, upload date, duration, view counts, tags), thumbnails, original captions when the archive captured them, and reconstructed URLs to play the archived video file when available. Channel discovery reconciles legacy username/handle eras to a single canonical identity (lots of channels renamed themselves a dozen times; that part was painful).
- Seems pretty cool. So this is a recent project, and you haven’t been working on this since 2005, right?
Have you considered also indexing videos that haven’t been deleted?
- [flagged]
- [flagged]
- [flagged]
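
For anyone wondering what the channel reconciliation mentioned in the post could look like, here is a minimal sketch. It is not the project's actual code: it assumes the archive captures give pairs of identifiers that were seen to refer to the same channel (a legacy `user/...` name, a `c/...` custom URL, an `@handle`, a `UC...` channel ID) and merges them with a plain union-find. The `AliasResolver` class, its method names, and every identifier in the usage lines are made up for illustration.

```python
class AliasResolver:
    """Union-find over channel identifiers (hypothetical helper, not the project's code)."""

    def __init__(self):
        self._parent = {}

    def _find(self, key):
        # Unknown identifiers start out as their own root.
        self._parent.setdefault(key, key)
        root = key
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the walk straight at the root.
        while self._parent[key] != root:
            self._parent[key], key = root, self._parent[key]
        return root

    def link(self, a, b):
        """Record that identifiers a and b were observed for the same channel."""
        ra, rb = self._find(a), self._find(b)
        if ra == rb:
            return
        # Assumption: a "UC..." channel ID is the preferred canonical form,
        # so it becomes the root whenever one side has it.
        if rb.startswith("UC") and not ra.startswith("UC"):
            ra, rb = rb, ra
        self._parent[rb] = ra

    def canonical(self, key):
        """Return the canonical identifier for key (itself if never linked)."""
        return self._find(key)


# Illustrative observations only; none of these are real channels.
resolver = AliasResolver()
resolver.link("user/oldname", "c/midname")       # rename seen in one capture
resolver.link("c/midname", "UCexample0001")      # later capture exposes the channel ID
resolver.link("@newhandle", "UCexample0001")     # handle-era capture
print(resolver.canonical("user/oldname"))
```

The union-find is a swapped-in technique chosen because rename chains arrive as pairwise links in arbitrary order; a real pipeline would also need to handle names that were released and reused by a different channel, which this sketch ignores.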