Suggestion: clarify FAQ #2 + storage/CDN idea to i
Posted by eliekh05 on .
Hi Vimm’s team,
I wanted to suggest a revision to FAQ #2 because checking only file size isn’t always reliable from a data integrity perspective:
Instead of:
“Check your file size with the size shown in The Vault and if they don’t match, try downloading it again.”
It could say something like:
“File size alone can’t guarantee your download is valid — an HTTP download can silently end up corrupted while still matching the expected size (for example when a browser cache, proxy, or resumed partial download stitches in stale bytes). If possible, compare your file’s hash (CRC32, MD5, SHA-1) with the original hash provided by the Vault or trusted sources. This confirms the actual file content is correct, not just the size.”
Why: many users rely on download managers or browsers that might “resume” partial downloads incorrectly. File size mismatches are an obvious sign of corruption, but matching size with wrong content happens more often than people expect — so hashes are the safest check.
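To make the hash check concrete, here is a minimal sketch in Python of how a user (or a helper tool) could verify a downloaded file against a published digest. The function names and the streaming chunk size are illustrative, not from any existing Vault tooling:

```python
import hashlib

def file_hash(path, algo="sha1", chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large ROMs never need to fit in RAM."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, expected_hex, algo="sha1"):
    """Compare case-insensitively; hash lists often use upper-case hex."""
    return file_hash(path, algo).lower() == expected_hex.strip().lower()
```

Any algorithm hashlib supports works the same way (`"md5"`, `"sha256"`, etc.), so the snippet covers whichever digest a hash database happens to publish.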
Second idea:
Have you considered offloading part of the download load to object storage or CDNs like Amazon S3, Cloudflare R2, Backblaze B2, or Wasabi?
• Pros: faster global delivery, better resilience, lower bandwidth spikes
• You could even let the community help: some users could donate to fund storage or bandwidth, or you could build read-only mirrors and load-balance traffic
• For really large archives, BitTorrent or multi-source HTTP (Range requests spread across mirrors) could help too
These options might significantly improve download reliability and speed, especially as the collection keeps growing and demand increases.
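As a rough illustration of the multi-source idea, the core of a mirror-aware downloader is just splitting a file into contiguous byte ranges (one HTTP Range request per mirror) and reassembling them. This is only a sketch of the range math, assuming mirrors that support `Range: bytes=start-end` (inclusive) headers:

```python
def split_ranges(total_size, n_sources):
    """Split [0, total_size) into contiguous (start, end) byte ranges,
    one per source, formatted for HTTP 'bytes=start-end' (end inclusive)."""
    if n_sources < 1 or total_size < 1:
        return []
    base, extra = divmod(total_size, n_sources)
    ranges, start = [], 0
    for i in range(n_sources):
        # Spread the remainder over the first `extra` sources.
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue
        ranges.append((start, start + length - 1))
        start += length
    return ranges
```

For example, a 10-byte file over three mirrors splits into `(0, 3)`, `(4, 6)`, `(7, 9)`; the chunks can then be fetched in parallel and hash-checked after reassembly.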
Thanks so much for keeping the site alive — it’s one of the most important preservation projects out there!
