Face Recognition Couldn't See My Brown Family. So We Found One That Could.


I had 155,919 photos spread across five devices. Three iPhone exports, a MacBook backup, folders from a Vista laptop I haven’t touched since 2012, and whatever survived a Windows XP machine that once lived under my desk. All of it dumped onto a clunky hard drive carrying years of documents and memories.

I wanted them organized. That was the whole ask. Put them in folders by year and month, kill the duplicates, make it something I could actually scroll through without seeing the same beach photo from 2017 six times because three devices each thought they owned it.

So I asked Claude to help me build the system. And what followed was days of problem-solving that taught me more about how I’ve been storing my life than any photo album ever could.

The Duplicates

The first pass found that 87,890 of those files were duplicates. Fifty-six percent. More than half my photo library was copies of copies. iPhones syncing to MacBooks syncing to OneDrive syncing to export folders I’d made in a panic before switching phones. Every device faithfully preserving what every other device had already preserved.

We wrote a PowerShell script that hashed every file, compared them, and kept one. The number that survived: 68,029.
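
For anyone who wants to see the idea in code, here is a minimal Python sketch of hash-based deduplication. It is not the PowerShell script described above, only an illustration of the same approach: two files with identical bytes produce the same digest, so the first one seen is kept and the rest are flagged. The function names and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> tuple[list[Path], list[Path]]:
    """Split the files under root into (keepers, duplicates) by content."""
    seen: dict[str, Path] = {}
    keepers: list[Path] = []
    duplicates: list[Path] = []
    # Sorting makes the choice of which copy to keep repeatable.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = file_digest(path)
        if digest in seen:
            duplicates.append(path)  # same bytes as a file already kept
        else:
            seen[digest] = path
            keepers.append(path)
    return keepers, duplicates
```

The sketch only reports duplicates and deletes nothing, which leaves room to review the list first. The arithmetic in this section checks out the same way: 155,919 files minus 87,890 duplicates leaves 68,029.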

The Wrong Years

Here’s the thing nobody tells you about iPhone photo exports. When you dump your camera roll, every photo lands in a folder stamped with the export date, not the date you took it. So a July 2018 export of my entire camera roll created a folder called “2018” that was full of photos from 2015, 2016, 2017. Thousands of them.

The original organization script had trusted those folder names. It put everything where the folder said it belonged. Which meant 19,447 photos were filed under the wrong year.

We had to go back in, read the EXIF data buried inside each file, compare the actual capture date to the folder it was sitting in, and move everything. 7,396 photos moved out of 2018 alone, most of them landing in 2016 where they’d always belonged. A year that had zero photos in it suddenly had 7,470.

My 2016 came back.

The Faces

I wanted to find every photo of my daughters. Ammu and Pattu, twins. Laddu, our youngest. I wanted to see all three of them across time, gathered in one place.

The first face recognition tool we tried, OpenCV, couldn’t do it. It clustered random white faces from random photos with confidence, because that’s what the training data knew. My largely brown family barely registered. Saathi, Laddu, my mother Geetha, my niece Sruthi, all missed or mismatched. Laddu’s baby photos kept getting confused with Sruthi’s. The model wasn’t built for us. We threw it out.

Google Photos was no help either. Their API locked down face recognition data in 2025. You can tag faces in the app, but you can never export those tags. Your own metadata, held hostage.

We landed on InsightFace with ArcFace, a model trained on a far more diverse dataset. It scanned 68,108 photos, found 127,754 faces, and matched them against reference photos I provided. Ammu: 11,201 photos. Pattu: 8,477. Saathi: 6,087. Laddu: 6,040. Geetha: 1,881.

Numbers that felt, for a moment, like a census of love.

The Stumbles Along the Way

Nothing about this was smooth. My machine runs Python 3.14, and half the AI packages don’t ship pre-built for it yet. We had to install Microsoft Visual C++ Build Tools just to compile the face recognition library. Sessions crashed and lost all their context, so we had to build a memory system that saved progress between conversations. Then there was my collection of K-pop idol photos, which has been building steadily since 2022. The first celebrity detection pass was too aggressive and nearly sorted my own Instagram posts into the “not family” pile.

Every fix created a new problem. Every new problem taught me something about the data.

What I Actually Wanted

Somewhere around the EXIF correction, when 2016 filled up with photos I hadn’t seen in years, toddler photos of the girls I’d thought were lost in the wrong folder, I realized I hadn’t been organizing files. I’d been organizing memory.

The mess on my hard drive wasn’t a technical problem. It was the accumulated weight of a life lived across devices and continents and identities. Photos from Chennai mixed with saved images from the adoption paperwork mixed with screenshots I’d forgotten I’d taken. All of it piled together because that’s how life actually arrives. Not sorted. Not labeled. Just continuous.

What I wanted wasn’t a clean folder structure. I wanted to be able to find my daughters’ faces across sixteen years. I wanted to see my mother in photos I’d forgotten I had. I wanted to braid those images with the words I’ve been writing on my blog, the essays about belonging and identity and the long work of building a family across cultures. I wanted to make something my kids could hold.


A photo album isn’t storage. It’s an argument about what mattered.

The same instinct surfaced elsewhere this year. The database I built for Saathi inside his own engine was the same impulse applied to a different mess of accumulated data. What InsightFace finally gave me was the split between the librarian, who finds, and the historian, who knows what was at stake. The model can find Ammu in 11,201 photos. It cannot say which one is the one I want to keep.

The Tools We Built

For anyone who wants to do this, here’s what we ended up with:

An organize-and-deduplicate PowerShell script that takes a chaotic photo dump, sorts it into Year/Month folders, removes duplicates by file hash, and names everything by date and source device. It handled 155,919 files in one pass.

A batch-add PowerShell script that lets you throw new photos at the existing collection. It deduplicates against what’s already organized and slots new files into the right folders. I’ve already used it once, on 10,000 images downloaded from Google Photos.

A Python script for face recognition that scans your entire library for faces, clusters them, and matches against reference photos you provide. Drop a clear photo of someone into the reference folder, run the script, and it finds every photo of that person across your whole collection.
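
To make the batch-add idea concrete, here is a small Python sketch. It is not the PowerShell script itself. It hashes what is already in the library, skips any incoming file whose bytes are already there, and copies the rest into Year/Month folders. The function names, the date-source-hash file naming, and the use of the file's modification time as the date are all assumptions for this example; a real run would prefer the EXIF capture date.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def content_hash(path: Path) -> str:
    """SHA-256 of the file's bytes (reads the whole file; fine for a sketch)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def index_library(library: Path) -> set[str]:
    """Hashes of every file already organized in the library."""
    return {content_hash(p) for p in library.rglob("*") if p.is_file()}


def add_batch(batch: Path, library: Path, source: str) -> tuple[int, int]:
    """Copy new files from batch into library/Year/Month; return (added, skipped)."""
    known = index_library(library)
    added = skipped = 0
    for path in sorted(p for p in batch.rglob("*") if p.is_file()):
        digest = content_hash(path)
        if digest in known:
            skipped += 1  # already in the library, or repeated within the batch
            continue
        # Assumption: modification time stands in for the capture date here.
        taken = datetime.fromtimestamp(path.stat().st_mtime)
        folder = library / f"{taken:%Y}" / f"{taken:%m}"
        folder.mkdir(parents=True, exist_ok=True)
        # Hypothetical naming scheme: date, source device, short hash.
        name = f"{taken:%Y%m%d_%H%M%S}_{source}_{digest[:8]}{path.suffix.lower()}"
        shutil.copy2(path, folder / name)
        known.add(digest)
        added += 1
    return added, skipped
```

Because the short hash is part of the file name, two different photos taken in the same second do not overwrite each other, and running the same batch twice adds nothing the second time.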

All of it running locally on a Windows laptop with no cloud dependency. Your photos stay yours.

What’s Next

I’m going to use the face tagger to pull together photo reels for each of my girls. Ammu across sixteen years. Pattu across sixteen years. Laddu across every year she’s been alive. We already started on Laddu’s birthday book, pulling notes from my blog posts over the years and pairing them with photos from each year, building something she can hold in her hands and know she was seen, every single year, in words and in pictures.

The photos are organized now. The work of turning them into something worth keeping has just started.

