r/DataHoarder

Convert JPEG to TIFF for archiving photos?

I found a host of family photos that were taken on a digital camera in the 2000s, and they are all JPEGs. I have them in my cloud storage, but I want to put them on my new HDD, which I'm using to back up all my photos, media, etc. Is there any benefit to converting them to TIFF (I heard it's better for archiving), or is there no point since the photos were originally taken as JPEGs?

reddit.com
u/pasta-pesto — 3 hours ago

I finally decided what I want to hoard.

I've been hanging out here for a few months after a chance link in another sub showed me this place. The things you guys do to hold on to data are so cool. I feel like archiving is so important in an era where all kinds of media are being faked, destroyed, or removed for the sake of money. So I've been quietly reading and thinking about what I want to save.

I finally decided that I want to archive the media of my childhood. Things like cartoons and kids' shows aren't at the top of the list for many people. But I want to be able to share the things I loved, the things that made me who I am, with my kids one day. It's also a type of media that has degraded rapidly in quality lately, with stuff like YouTube Kids getting flooded with bizarre and unmoderated slop.

I could use some advice on storage. I've read a little about M-Discs and I think that might work for me. I want something that's "set it and forget it", not "I have to check this every month to make sure it hasn't died for some esoteric reason". I don't know if they're any good for frequent reading, but it's fine if not. I figure I can archive with them and then copy onto DVD/Blu-ray when I want the data available years in the future.

I also need some suggestions on how to include captions. I have some audio processing issues, so being able to read captions has always been important to me. Captions are also an educational tool for learning how words are spelled and pronounced. I want to make sure captions are included as much as possible, even if I have to burn them into the video itself.
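
There are two usual ways to keep captions with a video: a soft subtitle track, which can be toggled on and off and is added without touching the picture, or burned-in (hardcoded) captions, which are always visible but require re-encoding the video. A minimal sketch that only builds the ffmpeg command lines, assuming ffmpeg is installed, the captions already exist as an .srt file, and the file names used are hypothetical:

```python
import shlex

def soft_subs_cmd(video: str, srt: str, out_mkv: str) -> list[str]:
    """Add an .srt file as a selectable subtitle track; nothing is re-encoded."""
    return [
        "ffmpeg", "-i", video, "-i", srt,
        "-map", "0:v", "-map", "0:a?",  # video and (if present) audio from the source
        "-map", "1",                    # the subtitle stream from the .srt file
        "-c", "copy",                   # copy every stream as-is
        out_mkv,
    ]

def burned_subs_cmd(video: str, srt: str, out_file: str) -> list[str]:
    """Draw the captions into the picture; the video stream is re-encoded."""
    return [
        "ffmpeg", "-i", video,
        "-vf", f"subtitles={srt}",      # render captions with the subtitles filter
        "-c:a", "copy",                 # audio is copied unchanged
        out_file,
    ]

if __name__ == "__main__":
    # Hypothetical file names; the commands are printed, not executed.
    print(shlex.join(soft_subs_cmd("episode01.mp4", "episode01.srt", "episode01.mkv")))
    print(shlex.join(burned_subs_cmd("episode01.mp4", "episode01.srt", "episode01_burned.mp4")))
```

MKV is used for the soft-subtitle output because it stores SRT text tracks directly. The `subtitles` filter needs an ffmpeg build with libass, and file names containing colons or quotes need extra escaping inside the filter string.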

u/baar-ur — 8 hours ago

Creating .zim for Malaysia's online dictionary, Dewan Bahasa dan Pustaka

[X-Post from /r/Kiwix]

Hey there, I need help making a .zim file of the website of our government body that oversees the national language dictionary. Think of it like the easily accessible online Cambridge Dictionary or Merriam-Webster. The site link is below:

https://prpm.dbp.gov.my/

Since it requires the user to search for a word before displaying its page, I'm guessing this would need some sort of workaround that I'm not familiar with in order to scrape the word database. I have tried the Zimit website, and it only gives me the front page, around 400 KB XD (please forgive my noobness).

My request: Is it possible to archive this website to .zim? If so, would you be kind enough to point me in the right direction?

My reason: I want our language's dictionary website to be accessible to students in schools in deep rural areas, where Internet access can be limited and patchy. Setting up offline Kiwix Wikipedia has been tremendous for us, and the next step is to have a dictionary that bridges the gap between English and Bahasa Melayu, so students can use the English Wikipedia as well as the Malay Wikipedia.

u/AmirulAshraf — 4 hours ago

CBS Radio News audio

Looks like they have the archives going back at least to *2009*

URL format for the top of the hour newscasts is https://audio.cbsradionewsfeed.com/YYYY/MM/DD/HH/Hourly-HH.mp3 (where HH is 01 to 24 for time in Eastern Time) and https://audio.cbsradionewsfeed.com/YYYY/MM/DD/HH/Update-HH.mp3 for the bottom of the hour news brief.
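
The URL pattern is regular enough to generate every candidate address for a date range. A minimal sketch that follows the stated format literally (hour folders and file suffixes 01 through 24); it does not download anything, and some URLs may not exist, so a downloader built on it should expect and skip 404s:

```python
from datetime import date, timedelta

BASE = "https://audio.cbsradionewsfeed.com"

def newscast_urls(day: date) -> list[str]:
    """Top-of-hour (Hourly) and bottom-of-hour (Update) URLs for one day.

    Pattern: YYYY/MM/DD/HH/Hourly-HH.mp3 and .../Update-HH.mp3, HH = 01..24.
    """
    urls = []
    for hour in range(1, 25):
        hh = f"{hour:02d}"
        prefix = f"{BASE}/{day:%Y/%m/%d}/{hh}"
        urls.append(f"{prefix}/Hourly-{hh}.mp3")
        urls.append(f"{prefix}/Update-{hh}.mp3")
    return urls

def date_range(start: date, end: date):
    """Yield every date from start to end, inclusive."""
    d = start
    while d <= end:
        yield d
        d += timedelta(days=1)
```

Generating URLs per day keeps the job easy to resume: a day whose 48 files are all present locally can simply be skipped on the next run.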

I wonder what the best way to download it all might be, assuming nobody is already doing it. It would probably go in directories by year and month, with the day and hour in the filename to avoid too many subfolders, and then onto the Internet Archive and/or Usenet. It would probably be about 100 MB a day though, 4 gigs a year, 70 gigs for the whole thing, and while the actualities are great, there are also a lot of mundane stories about stuff like holiday travel and random fires in the West. Any interest in this?
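
A sketch of the year/month layout with the day and hour in the file name, plus the storage arithmetic. The exact naming scheme is a hypothetical example. Note that 100 MB a day works out to roughly 36.5 GB a year, not 4 GB; the 4 GB a year and 70 GB total figures would instead correspond to about 11 MB a day, so the average size of one day's files is the number worth measuring first:

```python
from datetime import date
from pathlib import PurePosixPath

def local_path(day: date, hour: int, kind: str) -> PurePosixPath:
    """Map one newscast to YYYY/MM/YYYY-MM-DD_HH_<kind>.mp3 (hypothetical scheme)."""
    if kind not in ("Hourly", "Update"):
        raise ValueError(f"unknown kind: {kind}")
    return PurePosixPath(f"{day:%Y}", f"{day:%m}", f"{day:%Y-%m-%d}_{hour:02d}_{kind}.mp3")

def estimate_gb(mb_per_day: float, days: int) -> float:
    """Total storage in decimal GB for a given average MB per day."""
    return mb_per_day * days / 1000

if __name__ == "__main__":
    print(local_path(date(2009, 3, 7), 13, "Hourly"))  # 2009/03/2009-03-07_13_Hourly.mp3
    print(f"{estimate_gb(100, 365):.1f} GB per year at 100 MB/day")  # 36.5
```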

u/Binders-Full — 4 hours ago

Advice and pointers welcome.

Hi, I'm in the initial stages of planning a long-term scientific project that will involve storing multiple video files. The current plan is to run two or even three high-speed/high-definition cameras for up to 10 hours a day and store the recordings to create a dataset, so the recordings are the purpose of the project.

Does anyone know of a capacity calculator I can use to estimate how much storage I'll need per week/month/year? I'd also welcome recommendations for a rugged storage enclosure that is resistant to low temperatures. This enclosure will hold the initial copies of the recordings, but it will be the first part of a redundant storage array with probably two nodes. If possible, I'd like both nodes mirrored, with one offsite that will later also become the back-end storage for a website. Any recommendations on that, or on file formats for the recordings that allow compression without losing any resolution, are welcome. Thank you in advance.
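
A storage estimate for continuous recording is just average bitrate × recording time × number of cameras. A minimal calculator sketch; the bitrates in the example loop are placeholders, since high-speed cameras vary enormously with resolution, frame rate, and codec, so plug in the figure from your cameras' actual recording settings:

```python
def storage_tb(bitrate_mbps: float, cameras: int, hours_per_day: float, days: int) -> float:
    """Decimal terabytes needed for `days` of recording.

    bitrate_mbps is the average encoded bitrate of ONE camera, in megabits/s.
    """
    bytes_per_second = bitrate_mbps * 1_000_000 / 8   # bits -> bytes
    seconds = hours_per_day * 3600 * days
    return bytes_per_second * seconds * cameras / 1e12

if __name__ == "__main__":
    # Placeholder bitrates, not measurements from any particular camera.
    for mbps in (50, 200, 1000):
        for label, days in (("week", 7), ("month", 30), ("year", 365)):
            tb = storage_tb(mbps, 3, 10, days)
            print(f"{mbps:>5} Mb/s x 3 cams, 10 h/day: {tb:8.2f} TB per {label}")
```

For example, 100 Mb/s per camera with three cameras at 10 hours a day comes to 1.35 TB per day, which is why the bitrate choice dominates everything else in the budget.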

u/Utwig_Chenjesu — 6 hours ago

Seagate Data Recovery

I recently lost a drive that contained my Plex library. Sensibly, it was backed up on a 26TB Seagate external; sadly, that now looks like it's dead. Luckily it's within warranty, so it will be replaced. Seagate also offer a Data Recovery Service, which would save an awful lot of time and effort rebuilding my movie database. However, should some copyrighted material have found its way onto that drive, will Seagate care? Or will I get a scary letter/knock on the door from the Feds?

u/Arcal — 13 hours ago

I have a bunch of old laptop HDDs that I am using as external hard drives. Is there a cheap case I can buy for them on the internet?

They all have the standard 2.5" SATA connector. And I'd prefer one without a built-in adapter, just an opening to plug my own adapter into.

u/Jax_King55 — 9 hours ago

Fixing a Pending Sector Count Without a Full Wipe?

My WD Passport drive has a Current Pending Sector Count (CPSC) of 4 at the moment. For several reasons, I can't just back up and do a full wipe:

- The drive is a slow 4800 RPM SMR drive that I have to copy to in small bursts, because a large copy without "time to breathe" tanks the write speed... So copying everything back to the drive would take eons.

- Even if I were willing to sink the time in, I don't have 4.3 TB of spare space lying around to back up everything on the drive.

So my real question is: since these pending sectors are known, is there a way to force a write to their specific locations so the drive can finally determine whether they're dead or not? Normal writes have only brought the count down from 6 to 4, seemingly at random, over several months.
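
One commonly used approach on Linux is hdparm's `--write-sector`, which overwrites a single LBA with zeros (destroying whatever was stored there, which is already unreadable), with the failing LBA taken from the SMART self-test log (`smartctl -l selftest`). Whether ATA commands pass through the Passport's USB bridge is not guaranteed, and whichever file owns that sector will contain zeros there afterwards. Many large drives put 4096-byte physical sectors behind 512-byte logical ones (512e), so the sketch below expands one LBA to every logical sector sharing its physical sector and only prints the commands as a dry run. The sector sizes, LBA, and device path are assumptions; check yours with `smartctl -i` or `hdparm -I`:

```python
import shlex

def physical_span(lba: int, logical: int = 512, physical: int = 4096) -> list[int]:
    """All logical LBAs that share a physical sector with `lba`."""
    per_physical = physical // logical        # 8 on a 512e drive
    first = lba - (lba % per_physical)        # round down to the physical boundary
    return list(range(first, first + per_physical))

def rewrite_commands(lba: int, device: str, logical: int = 512, physical: int = 4096) -> list[str]:
    """Dry run: hdparm commands that zero every logical sector in that span.

    DESTRUCTIVE if executed: the data in those sectors is replaced with zeros.
    """
    return [
        shlex.join(["hdparm", "--write-sector", str(n), "--yes-i-know-what-i-am-doing", device])
        for n in physical_span(lba, logical, physical)
    ]

if __name__ == "__main__":
    # Hypothetical LBA and device path, for illustration only.
    for cmd in rewrite_commands(1_000_005, "/dev/sdX"):
        print(cmd)
```

After the writes, the pending count should drop, and Reallocated_Sector_Ct rises only if the drive decides a sector really is bad.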

u/Hawthm_the_Coward — 14 hours ago

Backing Up Colbert’s YouTube Channel

Hi all, is anyone here aware of any efforts to back up The Late Show's YouTube channel? Now that the show has finished, I have a feeling CBS will try to kill it as quickly as possible…

u/abthegeek93 — 24 hours ago

Best way to handle growing YouTube videos archive?

Heya,

I have around 5+ TB of YouTube videos from my "Watch Later", "Liked" and other playlists that I've archived over the years, and now I need a bit more space on my NAS.

Hard drive prices in Austria are still rather high (and rising...), so I can't really build another 5-drive NAS just yet. I was already looking at 18TB drives to expand my current storage, but that'll cost quite a bit... I do, however, already have the enclosures.

So... I was wondering if there may be a public archive for YT videos I can submit these to so I know they'll be in good hands at least :)

Thanks!

u/EpicLPer — 19 hours ago

Toshiba 2TB - good recommendation?

I've been doing research into HDDs, and yes, I'm planning to use the 3-2-1 method. But to start with, I need to know what to use, and I've seen a lot of people complain about WD and Seagate drives failing. I know all HDDs can fail at some point, but from my research it seems that WD and Seagate are less reliable than Toshiba?

Help please!

u/SnowyDeerling — 15 hours ago

Android's USB MTP always crashes when I try to scan my media folders. So I built an open-source C++/Rust storage analyzer that maps 10,000+ files instantly.

If you've ever tried to figure out what is eating up your Android's storage before doing a massive data dump, you know the pain. Trying to view a /DCIM folder with 10,000+ files over a standard USB cable usually makes Windows Explorer or macOS Finder load forever, freeze, or crash, because MTP is fundamentally broken for high file counts.

I got tired of waiting 4+ minutes just to see my folder sizes, so I built an open-source analyzer called SocketSweep that bypasses MTP entirely.

How it works (The Architecture): Instead of going through MTP to read the filesystem, it uses a multi-language stack to pull the file tree at bare-metal speeds:

  • The Engine: It pushes a native C++17 daemon to /data/local/tmp via ADB. Because it runs under the shell context, it executes POSIX filesystem traversals natively on the device. (Zero root required).
  • The Bridge: It pipes the raw JSON tree data back to your PC over a local TCP socket tunnel, bridged via adb forward.
  • The UI: A Rust/Tauri desktop app consumes the TCP stream concurrently and maps your entire storage into an interactive React Treemap.
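
The bridge step can be sketched from the PC side. This is a minimal illustration, not SocketSweep's actual protocol: it assumes a port has already been forwarded with `adb forward tcp:<port> tcp:<port>`, that the daemon sends newline-delimited JSON (one record per line), and that it closes the connection when the scan is done. The port number is hypothetical, and Python stands in for the Rust side purely for readability:

```python
import json
import socket

def read_tree(host: str = "127.0.0.1", port: int = 38500, timeout: float = 10.0) -> list[dict]:
    """Read newline-delimited JSON records from a forwarded TCP port until EOF.

    Assumes (hypothetically) one JSON object per line; 38500 is an arbitrary port.
    """
    records = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="utf-8") as stream:
            for line in stream:  # ends when the daemon closes the socket
                line = line.strip()
                if line:
                    records.append(json.loads(line))
    return records
```

Reading line by line means a UI could start drawing before the scan finishes; a real client would hand each record on as it arrives instead of collecting a list.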

The result: You can visually hunt down your biggest folders and delete the junk instantly. A 4-minute MTP "Loading..." hang becomes a 1.2-second instant scan.
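
Turning a flat file listing into the folder sizes a treemap displays comes down to adding each file's size to every ancestor directory. A minimal sketch, assuming the scan yields (path, size in bytes) pairs; that record shape is hypothetical, not SocketSweep's format:

```python
from collections import defaultdict
from pathlib import PurePosixPath

def folder_totals(files: list[tuple[str, int]]) -> dict[str, int]:
    """Roll file sizes up into every ancestor folder (bytes).

    Input is assumed to be (absolute POSIX path, size in bytes) pairs.
    """
    totals: dict[str, int] = defaultdict(int)
    for path, size in files:
        for parent in PurePosixPath(path).parents:  # e.g. /sdcard/DCIM, /sdcard, /
            totals[str(parent)] += size
    return dict(totals)

def biggest(totals: dict[str, int], n: int = 5) -> list[tuple[str, int]]:
    """The n largest folders, largest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Doing this aggregation on the device side would shrink what crosses the socket; doing it on the PC keeps the daemon simple.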

Right now, the first release is compiled for macOS (Windows/Linux builds via GitHub Actions are next, but you can build from source). It is GPL-3.0.

(Note: For Android 11+, the app automatically uses an appops ADB command to grant itself Scoped Storage bypass permissions so it can read your full /sdcard without issues).

Let me know what you guys think of the architecture!

u/Cuber2113 — 1 day ago

MD5 checksum automation tools

Hi all,

Note - reposting this from the account I actually use for these things. My apologies.

I'm working on a pro-bono archiving project for a filmmaker, so I don't have institutional support to lean on for this. It involves about 30 large DPX sequences (folders with thousands of individual frames scanned from 16mm film at 4K resolution). I was supplied MD5 checksums for each frame. Obviously I need to do due diligence and verify them, but equally obvious is how long that will take to run. (And she wants backup drives made, which doubles the time…) Adding to the problem, I only have access to the computers and (spinning) hard drives a few days a week. What tools or automation strategies can anyone recommend to keep this project from sprawling out over months? (Mac environment.)
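
When the machines are only available a few days a week, a resumable verification loop helps: every verified file is appended to a progress log, so the next session skips it. A minimal sketch, assuming the supplied checksums are in the common `md5sum`-style format (`<32 hex digits>  <path relative to the sequence folder>` per line); the manifest and log names are hypothetical. Files are read one at a time in large chunks, since parallel reads on a single spinning disk usually make it seek more rather than finish sooner:

```python
import hashlib
from pathlib import Path

CHUNK = 8 * 1024 * 1024  # 8 MiB reads keep a spinning disk streaming

def md5_of(path: Path) -> str:
    """MD5 of one file, read in chunks so large frames never sit fully in memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        while chunk := f.read(CHUNK):
            h.update(chunk)
    return h.hexdigest()

def verify(manifest: Path, root: Path, progress: Path) -> list[str]:
    """Check every manifest entry under `root`; return the paths that failed.

    Assumes md5sum-style lines. Paths already in `progress` are skipped.
    """
    done = set(progress.read_text().splitlines()) if progress.exists() else set()
    failures = []
    with progress.open("a") as log:
        for line in manifest.read_text().splitlines():
            if not line.strip():
                continue
            expected, rel = line.split(maxsplit=1)
            rel = rel.lstrip("*")  # md5sum marks binary mode with '*'
            if rel in done:
                continue
            target = root / rel
            if not target.exists() or md5_of(target) != expected.lower():
                failures.append(rel)  # not logged, so it is rechecked next run
            else:
                log.write(rel + "\n")
                log.flush()  # survive an interrupted session
    return failures
```

Usage, with hypothetical paths: `verify(Path("reel01.md5"), Path("/Volumes/Scans/Reel01"), Path("reel01.progress"))` returns the mismatched or missing frames, and rerunning it next session picks up where the last one stopped. Running one process per physical drive, rather than several on the same drive, keeps each disk reading sequentially.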

Thanks,

Jeff

u/JmartinChicago — 14 hours ago

Personal Information Management System

I work in construction. Because of the number of documents generated in the process, I had to comply with rigid ISO standards. I thought: why not do the same at home?

Disclaimer: I'm an architectural engineer, not a network architect. This is a synthesis of ISO standards for information management, adapted from practice. I lost enough files to learn how to build a redundant architecture.

This is the result. Done in Affinity.

u/hosamzidan — 23 hours ago

Download all your Saved Posts Collections on Instagram (OPEN SOURCE)

I'm a professional procrastinator - when I distract myself from work by scrolling social media, I manage to build huge collections of saved posts on different themes that I never come back to, because I have no way to organize them.

There was no easy online way to download them all, so I created my own set of programs and decided to share them with you today.

  1. Saved Posts Scraper (Tampermonkey Script). Also works on profiles, the Explore page, etc. It auto-scrolls until the page is fully loaded (no more loading indicator at the bottom), then captures the URL of each post, which you can download as a .txt file or copy to your clipboard
    https://github.com/doncezart/IGbulkCollector

  2. Bulk Instagram Downloader (Python Program). Takes the list of URLs and downloads all the media: videos, photos, carousels. It also generates a JSON file with metadata about those posts (author, caption, post type, and more). This helps if you have your own media galleries or websites where you want to automate uploads or include that metadata. There's also a dashboard for viewing your JSON in a decent-looking GUI
    https://github.com/doncezart/IGbulkDL
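
Between the two steps, it can help to clean the exported URL list before downloading, since long scroll sessions can capture the same post twice or carry tracking query strings. A minimal sketch that is not part of either project, assuming the list holds one URL per line and posts use the usual instagram.com/p/<code>/ or /reel/<code>/ shape:

```python
import re

# Assumed URL shape: https://www.instagram.com/p/<code>/ or /reel/<code>/
POST_URL = re.compile(r"https?://(?:www\.)?instagram\.com/(p|reel)/([\w-]+)")

def clean_urls(lines: list[str]) -> list[str]:
    """Deduplicate post URLs (ignoring query strings), keeping first-seen order."""
    seen = set()
    out = []
    for line in lines:
        m = POST_URL.search(line.strip())
        if not m:
            continue  # skip blank lines and non-post links
        kind, code = m.groups()
        if code in seen:
            continue
        seen.add(code)
        out.append(f"https://www.instagram.com/{kind}/{code}/")
    return out
```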

Well that's it, good luck

u/goldieczr — 14 hours ago

Best strategy for saving PDFs as Markdown?

I have a few thousand PDFs. This is cool, but I want to be able to do stuff with all of this info, rather than just open it in a PDF Reader. Ideally, I want to be able to load it into an Obsidian Vault, but this requires extracting the text and converting it into markdown. But I'm not having much luck with this. The biggest problems are figuring out how to handle footnotes and endnotes (citations), as well as reliably capturing images, figures, etc.
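
Obsidian understands Markdown footnotes (`[^label]` in the text and `[^label]: note` at the end), so one workable target is to turn each page's note markers and bottom-of-page notes into that syntax after text extraction. A minimal heuristic sketch, assuming the extractor has already split each page into body text and footer lines and that markers survive as Unicode superscript digits; real PDFs vary a lot, so treat it as a starting point. Labels are prefixed with the page number because many books restart note numbering on every page, while footnote labels must be unique within one Markdown file:

```python
import re

SUPERSCRIPT_DIGITS = "⁰¹²³⁴⁵⁶⁷⁸⁹"
TO_ASCII = str.maketrans(SUPERSCRIPT_DIGITS, "0123456789")
MARKER = re.compile(f"[{SUPERSCRIPT_DIGITS}]+")    # assumed marker form: ¹, ², ¹²
NOTE_START = re.compile(r"^(\d+)[.)]?\s+(.*)$")   # "1. text", "1) text" or "1 text"

def page_to_markdown(page: int, body: str, footer_lines: list[str]) -> str:
    """Convert superscript markers and footer notes on one page to Markdown footnotes."""
    def label(num: str) -> str:
        return f"[^p{page}-{num}]"

    body_md = MARKER.sub(lambda m: label(m.group(0).translate(TO_ASCII)), body)

    notes = []
    for raw in footer_lines:
        line = raw.strip()
        if not line:
            continue
        m = NOTE_START.match(line)
        if m:
            notes.append(f"{label(m.group(1))}: {m.group(2)}")
        elif notes:
            notes[-1] += " " + line  # continuation of the previous note
    return body_md + ("\n\n" + "\n".join(notes) if notes else "")
```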

I've had a quick look online, and most discussions just say capturing footnotes is "hard". And then there is a lot of discussion about capturing graph data, etc. which is less important to me.

There must be other people who would rather store their texts as Markdown than as PDF, but I can't seem to find anybody working on solutions to this problem. Does anybody here have any ideas or achieved something like this?

u/DJ_Beardsquirt — 17 hours ago

How to download a private playlist from my college's channel

I took a course, and the videos are in a private playlist that only students can access through a portal. I want to download the playlist for future use. Is there any way I can do that?

u/ContributionFirst454 — 18 hours ago

Need advice from storage wizards

I know this has probably been asked to death, but I could really use some help. I've been getting into hoarding game installers this past year. I really enjoy building up my own version of Steam, and it's nice to have something to work on in the background.

But now I'm realizing 8TB is weenie-hut junior storage, and I'm also realizing I missed the cheap $/TB era. What am I even supposed to do? I don't know where to buy reliable hard drives other than Amazon, Best Buy, Walmart, or the sellers' websites.

I think I can squeeze out another year of this hobby if I get somewhere between 16 and 28 TB, but the most I can afford for a while is $400-500. Is there a strategy you more experienced data hoarders use to keep prices low? Will the fact that I need reasonable read/write speeds for downloading and using the installers make it harder? Is the second-hand market risky?
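
The budget and capacity goals translate directly into a price-per-terabyte ceiling to compare listings against: 16 TB on $500 needs drives at or under about $31/TB, while 28 TB on $400 needs about $14/TB. A quick sketch; the listing names and prices you feed it are your own, nothing here is real pricing data:

```python
def max_price_per_tb(budget: float, capacity_tb: float) -> float:
    """Highest $/TB that still reaches capacity_tb within budget."""
    return budget / capacity_tb

def best_listings(listings: dict[str, tuple[float, float]], budget: float) -> list[tuple[str, float]]:
    """Rank (price, TB) listings by $/TB, keeping only those within budget."""
    ranked = [(name, price / tb) for name, (price, tb) in listings.items() if price <= budget]
    return sorted(ranked, key=lambda item: item[1])

if __name__ == "__main__":
    print(f"16 TB for $500: <= ${max_price_per_tb(500, 16):.2f}/TB")  # 31.25
    print(f"28 TB for $400: <= ${max_price_per_tb(400, 28):.2f}/TB")  # 14.29
```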

Any advice helps, sorry if this comes across as a struggle session, I've just been financially locked out of a small hobby of mine and I miss it. Thanks!

u/ToastedBulbasaur — 1 day ago