I need your help: Seeking anonymized WHOOP tracking data and your insights on the most reliable metrics!
Hey everyone!
I'm a student currently doing research on software that takes tracked raw data and automatically finds correlations (e.g., "How did late alcohol consumption on Friday affect my HRV on Tuesday?").
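To make that kind of delayed-effect question concrete, here is a minimal sketch of a plain lagged correlation. It is an illustration only: the function names `pearson` and `lagged_correlations` and the sample numbers are made up, and the actual model is meant to go well beyond this baseline.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; NaN if undefined."""
    n = len(xs)
    if n < 2:
        return float("nan")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")  # a constant series has no defined correlation
    return cov / math.sqrt(vx * vy)


def lagged_correlations(cause, effect, max_lag):
    """Correlate cause[t] with effect[t + lag] for lag = 0..max_lag.

    Assumes both lists hold one value per day and have the same length.
    """
    result = {}
    for lag in range(max_lag + 1):
        xs = cause[: len(cause) - lag]  # days that still have a partner `lag` days later
        ys = effect[lag:]
        result[lag] = pearson(xs, ys)
    return result


# Made-up example: HRV drops the day after drinking.
drinks = [0, 2, 0, 0, 3, 0, 1, 0]
hrv = [70, 70, 60, 70, 70, 55, 70, 65]
by_lag = lagged_correlations(drinks, hrv, max_lag=3)
strongest_lag = min(by_lag, key=lambda lag: by_lag[lag])  # most negative correlation
```

With real data the interesting part is which lag shows the strongest relationship, and whether it survives once other factors (sleep, training load) are taken into account.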
Long-term, the algorithm should be able to retrospectively label massive historical datasets entirely on its own (sports, leisure, work, etc.) and make predictions for new situations.
The tech behind it: Unlike typical cloud apps, the whole thing runs completely offline and locally on your phone. I'm experimenting with architectures from current LLM research and applying them to sensor data. The idea is to have a model that makes excellent predictions with just a few data points while being able to handle completely new situations.
My problem: I simply lack the data to train the very first prototype.
My request: I would be incredibly grateful if anyone were willing to provide me with their anonymized tracking data. High-resolution data (gyroscope, accelerometer, HRV, steps, mood) is especially exciting. Also, since algorithms often extract much better insights from a few highly meaningful metrics than from a flood of noisy data, I'd love to know which specific metrics have proven to be the most reliable predictors in your own self-tracking experiments.
My privacy promise:
- Your data won't be looked at manually; it only flows directly into the training script.
- It will be completely deleted after training.
- I want absolutely NO private data like names or raw GPS coordinates.
Here’s how you can safely anonymize your CSV data in 30 seconds: just copy this prompt into ChatGPT/Claude. The AI will write a Python script that runs locally on your machine and deletes all sensitive data before you send me anything:
"Write a Python script for me that anonymizes a CSV containing my tracking data entirely locally. Delete all columns with identifying information (ask me first which ones those are).
For the GPS columns (Lat/Lon), build a selection menu into the script that offers me two options for anonymization:
Option 1 (Distance): Create a variable HOME_LOCATION = (lat, lon). Replace the original GPS data with a new column distance_from_home_meters.
Option 2 (Offline Context Categories): The script must use absolutely NO web APIs at runtime to guarantee maximum privacy. Instead, use a local offline map (e.g., via osmnx, pyrosm, or a downloaded .osm.pbf file) to convert the coordinates into general environment categories like 'Forest', 'Residential Area', 'Commercial Area', etc.
- Explain to me in the script (as print output or a comment) how I can download and integrate the local map of my region beforehand.
- Create a variable HOME_LOCATION = (lat, lon). If a GPS point lies within a 50-meter radius of this location, strictly set the category to 'My Home'.
- If a GPS point lies outside the downloaded offline map (e.g., because I was on vacation and didn't download that region), set the value for this category to NaN.
In both cases, the raw coordinates (Lat/Lon) must be completely deleted from the dataset after the conversion. Save the result as a new CSV at the end."
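If you're curious what Option 1 roughly boils down to, here is a minimal standard-library sketch. The column names `lat` and `lon`, the placeholder `HOME_LOCATION`, and the helper names are assumptions for illustration; the script the AI writes for you will likely look different and should be adapted to your actual export.

```python
import csv
import math

HOME_LOCATION = (52.5200, 13.4050)  # placeholder (lat, lon); replace with your own
EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def anonymize_rows(rows, lat_col="lat", lon_col="lon", drop_cols=()):
    """Yield row copies with lat/lon replaced by distance_from_home_meters."""
    removed = set(drop_cols) | {lat_col, lon_col}
    for row in rows:
        out = {k: v for k, v in row.items() if k not in removed}
        try:
            dist = haversine_m(float(row[lat_col]), float(row[lon_col]), *HOME_LOCATION)
            out["distance_from_home_meters"] = round(dist)
        except (KeyError, ValueError, TypeError):
            out["distance_from_home_meters"] = ""  # missing or unparsable GPS fix
        yield out


def anonymize_csv(src_path, dst_path, drop_cols=()):
    """Read src_path, write an anonymized copy to dst_path (no network access)."""
    with open(src_path, newline="") as src:
        rows = list(anonymize_rows(csv.DictReader(src), drop_cols=drop_cols))
    if not rows:
        return
    with open(dst_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

Something like `anonymize_csv("export.csv", "export_anonymized.csv", drop_cols=("name", "email"))` would then produce the file to upload; both file names and the dropped columns are just examples.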
Feel free to drop your anonymized dataset here: https://driveuploader.com/upload/dnstx3Wi4U/
Thank you!