TLDR: I ran a correlation analysis between all of my daily self-tracked metrics. Of the top 50 statistically significant results, 60% are “obvious” relationships, while the rest are surprising, meaningful, and actionable.
My journey with self-tracking
I’ve been keeping a daily log of the day’s activities for approximately the past 4 years. I track my fitness, diet, sleep, mindfulness, mood, goals, good habits, bad habits, time spent, and personality traits.
For a full dive into how I started quantifying my life, the categories of data that I track, and the reasons why I do it, please check out my most recent post.
Why I analyze my quantified self data
I don’t collect data just for the sake of collecting data; trust me, I have more interesting hobbies than that. I collect this data to discover meaningful relationships between the aspects of my life that I care about, and to learn new things about myself that I can use for future decisions, both big and small.
Besides uncovering insights for future decision-making, it’s just fun. I’ve always been a data-driven person, and I’ve enjoyed visualizing data and finding the hidden meaning in it. It’s one thing to experience that feeling with any data set, but a completely different one when the data set has such direct implications for your day-to-day.
Seriously, if you could answer questions like “What makes me happy?” or “When do I feel my most healthy?” in an objective and unbiased way, wouldn’t you want to do that?
The analysis that I do on my personal data
- Stats: Basic computations like “Going to the gym 4 times a week on average”
- Trends: Movement of certain metrics over time, like “Coffee consumption down 20% over the past year”
- Correlations: Calculating relationships between metrics, like “On days you work out, your mood in the evening tends to be higher”
- Timespan Insights: Looking at a time range to find what makes it unique and answer questions like “How has the COVID lockdown changed my day-to-day?”
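The first two kinds of analysis are simple enough to sketch in a few lines of pandas. This is a minimal sketch, not the exact code behind the analysis in this post: it assumes a daily log loaded into a pandas Series with a DatetimeIndex (one row per day), and the function names `weekly_average` and `percent_change_last_year` are hypothetical.

```python
import pandas as pd


def weekly_average(daily: pd.Series) -> float:
    """Average weekly total of a daily metric, e.g. gym visits per week.

    Assumes `daily` has a DatetimeIndex with one row per day and 0/1 or count values.
    """
    weekly_totals = daily.resample("W").sum()  # calendar weeks ending on Sunday
    return float(weekly_totals.mean())


def percent_change_last_year(daily: pd.Series) -> float:
    """Percent change of the daily mean: last 365 days vs. the 365 days before that."""
    end = daily.index.max()
    one_year = pd.Timedelta(days=365)
    recent = daily[daily.index > end - one_year]
    previous = daily[(daily.index > end - 2 * one_year) & (daily.index <= end - one_year)]
    return float((recent.mean() - previous.mean()) / previous.mean() * 100.0)


# Usage with made-up data: gym Monday to Thursday for two weeks starting Monday 2024-01-01.
days = pd.date_range("2024-01-01", periods=14, freq="D")
gym = pd.Series([1, 1, 1, 1, 0, 0, 0] * 2, index=days)
print(weekly_average(gym))  # 4.0
```

A negative result from `percent_change_last_year` reads as “down X% over the past year”. Comparing full 365-day windows keeps seasonal habits (summer workouts, holiday drinking) from skewing the trend.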
I’m going to focus on correlations today, mostly because I find them fascinating and insightful, and they make some pretty charts that I’m excited to share with you.
Poke around my data
I have an interactive visualization on my data blog where you can see the analysis that I do on my data. Here’s a screenshot (since I can’t embed the whole thing here), but if you’re interested you can go check it out here.
Seriously, the data vis is pretty fun. Click here to go look at the analysis of about 30 health, fitness, diet, and mood data sources.
The data in the charts above is probably much more fascinating to me than to you, as it helps tell the story of the past couple of years of my life.
With that in mind, I’ve spent a lot of time diving into all of these correlations. Here are the top conclusions I’ve come to.
The metrics cluster up
I’ve found “families” of metrics that tend to move together and often fall in the same category of “good” or “bad” metrics. For example, on days or weeks when I don’t eat a healthy diet, I’m much more likely to also be sleeping poorly, working out less, drinking more alcohol, and feeling worse about myself, all at the same time! Likewise, some of my good habits, like doing yoga, meditating, flossing, and eating healthy, tend to be recorded on the same day.
What can I say, I prefer to live life at one extreme or the other. It’s not as fun in the middle!
Jokes aside, that’s actually quite a valuable insight. Since recognizing this, I’ve gotten more serious about my morning routine, making sure I start the day off on the right foot.
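One way to find these families automatically is hierarchical clustering on a correlation-based distance. The sketch below is an illustration under stated assumptions, not the method used for this post: it assumes each daily metric is a column of a pandas DataFrame, uses 1 minus Pearson r as the distance so that metrics moving in opposite directions (good vs. bad habits) land in different families, and picks average linkage and a 0.5 cutoff arbitrarily. `cluster_metrics` is a hypothetical name.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_metrics(df: pd.DataFrame, max_distance: float = 0.5) -> dict[int, list[str]]:
    """Group metrics into 'families' using hierarchical clustering on 1 - Pearson r.

    Metrics that rise and fall together (r near 1) end up close; metrics that move in
    opposite directions (r near -1) end up far apart, in different families.
    """
    corr = df.corr().fillna(0.0)  # constant columns give NaN; treat them as unrelated
    dist = np.clip(1.0 - corr.to_numpy(), 0.0, 2.0)
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)  # square matrix -> condensed vector
    tree = linkage(condensed, method="average")
    labels = fcluster(tree, t=max_distance, criterion="distance")
    families: dict[int, list[str]] = {}
    for metric, label in zip(df.columns, labels):
        families.setdefault(int(label), []).append(metric)
    return families
```

With average linkage and a cutoff of 0.5, a family roughly corresponds to metrics whose average pairwise r is above 0.5; raising the cutoff produces fewer, looser families.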
Not always clear what causes what
Because of the above insight on metric clusters, I’m interested to see whether a particular metric could be the trigger for a “good” or “bad” cluster of data for the day. My current analysis only looks for correlation, not causation, so I can’t determine what those triggers might be… yet…
I’ll be diving into that in a future analysis. If you’re reading this and have some resources for me to look into to do a causal analysis, please reach out!
Obvious vs surprising relationships
I plugged all of my data (77 metrics at the time) into my correlation notebook and ran the correlation analysis on the daily timeframe. Even when looking at only the last year of data and filtering for a p-value of <= 0.05, I had 197 statistically significant relationships.
Wow, that’s a lot! I decided to cut it down to the top 50 from that set and analyze each one.
What I found was that some of them seemed “obvious” because the metrics are very clearly related (like going to the gym and exercise time for the day, or time awake in the morning and total time asleep that night), but a fair share of them surprised me.
30 of the top 50 relationships were between metrics that are directly linked to one another, while the other 20 were insightful and showed me relationships in my data that I wouldn’t have assumed without this type of analysis.
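For anyone who wants to reproduce this kind of pipeline, here is a minimal sketch of the steps described above: compute Pearson r and its p-value for every pair of metrics with `scipy.stats.pearsonr`, keep the pairs with p <= 0.05, and sort by strength. It is not the notebook used for this post, and the function names are hypothetical. One caveat worth building in: 77 metrics give 77 × 76 / 2 = 2,926 pairs, so even if no metrics were truly related, roughly 5% of them (about 146) would be expected to clear p <= 0.05 by chance. The extra `q` column (a Benjamini-Hochberg false discovery rate adjustment) is an addition that was not part of the original analysis; it offers a stricter filter if you want one. For yes/no metrics coded as 0/1, Pearson r is the same as the phi or point-biserial coefficient.

```python
import itertools

import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def benjamini_hochberg(p_values) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Each q-value is the smallest adjusted value at its rank or any larger rank.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q


def pairwise_correlations(df: pd.DataFrame, alpha: float = 0.05, top_n: int = 50) -> pd.DataFrame:
    """Pearson r and p-value for every pair of metrics, filtered to p <= alpha, strongest first."""
    rows = []
    for a, b in itertools.combinations(df.columns, 2):
        pair = df[[a, b]].dropna()  # only days where both metrics were logged
        if len(pair) < 3 or pair[a].nunique() < 2 or pair[b].nunique() < 2:
            continue  # r is undefined for constant metrics or tiny samples
        r, p = pearsonr(pair[a], pair[b])
        rows.append({"metric_a": a, "metric_b": b, "r": float(r), "p": float(p), "n": len(pair)})
    if not rows:
        return pd.DataFrame(columns=["metric_a", "metric_b", "r", "p", "n", "q"])
    result = pd.DataFrame(rows)
    result["q"] = benjamini_hochberg(result["p"])  # adjusted over ALL tested pairs
    significant = result[result["p"] <= alpha]
    strongest = significant["r"].abs().sort_values(ascending=False).index
    return significant.loc[strongest].head(top_n).reset_index(drop=True)


def describe(row: pd.Series) -> str:
    """Turn one result row into the 'When X is up, Y is up/down' wording used in this post."""
    direction = "up" if row["r"] > 0 else "down"
    return f"When {row['metric_a']} is up, {row['metric_b']} is {direction}"
```

To mirror the setup above, you could load a daily CSV (a hypothetical `daily_log.csv` with a `date` column) using `pd.read_csv("daily_log.csv", parse_dates=["date"], index_col="date")`, keep only the last 365 days with `daily[daily.index > daily.index.max() - pd.Timedelta(days=365)]`, and pass that frame to `pairwise_correlations`. Sorting out the “obvious” pairs from the insightful ones is still a manual step.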
The insightful relationships
The “obvious” relationships wouldn’t be very fun to talk about, but here are the top 20 insightful relationships that I found:
- When Anxiety is up, Confidence is down
- When Sleep quality is up, Mood in the morning is up
- When Athleticism is up, Confidence is up
- When Drank Alcohol is up, Spent time with [unnamed friend] is up
- When Kind to people is up, Confidence is up
- When Ate Veggies is up, Worked out is up
- When Spent time with the girlfriend is up, Drank Alcohol is down
- When Drank Alcohol is up, Consumed Dairy is up
- When Flossed Teeth is up, Early Night is up
- When Used marijuana is up, Audiobook is up
- When Used marijuana is up, Worked out is up
- When Had Sex is up, Sleep quality is up
- When Worked on side projects is up, Felt Humble is up
- When Meditated is up, Felt Brave is up
- When Work-life balance is up, Consumed Bread is up
- When Spent time with the girlfriend is up, Coffee Consumption is up
- When Spent time with the girlfriend is up, Ate Veggies is up
- When Flossed Teeth is up, Money spent on Alcohol is down
- When Drank Alcohol is up, Did Yoga is down
Applications of correlation analysis
As you can see from the relationships above, there’s a mixed bag of insights.
Some are logical, some confirm the consequences of good or bad habits, and some point to habits that appear to make me feel a certain way. And some are just plain funny (don’t worry, unnamed friend, we can still hang out even though you’re a bad influence on me. You know who you are).
For those who are into biohacking or self-experimentation, this kind of analysis can be quite revealing about the effects of those experiments. One Quantified Self (QS) community member even reported curing himself of Crohn's disease by experimenting with various supplements, nutrition, and fitness regimens, and correlating those experiments with his symptoms.
What’s next for me
This analysis has truly been a ton of fun, and it’s inspired me to keep digging into my data. Here’s my game plan for the next couple of analyses and write-ups.
- Integrated Data: This whole data set is manually tracked, but I’m going to be extracting some other interesting Fitness, Time, Music, and Productivity data to add to the mix.
- Before/After: The correlations in this post only compare a given day/week/month/year against the same timeframe in other metrics. I’m planning on expanding that to answer questions like “When I make this decision today, what will be the effect tomorrow?”
- Clustering: As I mentioned, some of these metrics appear to move together. I’m going to be making a fancy cluster visualization to see the relationships all in one place.
- Causal: Instead of just looking for correlations, I’m going to see if any metrics might be the originating factor for other behaviors.
- Aggregations: I track 20 different diet-related metrics, but individually they don’t tell me how my diet is doing overall. I’ll be experimenting with a way to normalize and combine different kinds of data streams into one easy-to-digest metric.
- Timespan Insights: The analysis we did here today was one where we zoomed out and looked at relationships. The timespan insight would be the opposite, where we zoom in on a period of time and visualize how it differs from the rest of the data.
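Two items on this list, Aggregations and Before/After, can be sketched together. This is a hypothetical illustration, not the planned implementation: it averages z-scores to combine several metrics into one composite score (one simple normalization choice among many), and it correlates one metric on day t with another on day t + 1 by shifting the second series. The metric names and both function names are made up, and the lag logic assumes a gap-free daily index.

```python
import numpy as np
import pandas as pd


def composite_score(df: pd.DataFrame, signs: dict[str, int]) -> pd.Series:
    """Combine several daily metrics into one score by averaging their z-scores.

    `signs` maps each column to +1 (higher is better) or -1 (higher is worse), so that
    a higher composite score always means a 'better' day.
    """
    cols = list(signs)
    z = (df[cols] - df[cols].mean()) / df[cols].std(ddof=0)  # put every metric on one scale
    weighted = z * pd.Series(signs)  # flip the 'bad' metrics
    return weighted.mean(axis=1)


def lagged_correlation(today: pd.Series, later: pd.Series, lag_days: int = 1) -> float:
    """Pearson r between `today` on day t and `later` on day t + lag_days.

    Assumes both series share a gap-free daily DatetimeIndex (use .asfreq("D") first).
    """
    return float(today.corr(later.shift(-lag_days)))


# Usage with made-up data: a diet score where fast food counts against you.
idx = pd.date_range("2024-01-01", periods=3, freq="D")
diet = pd.DataFrame({"ate_veggies": [1.0, 2.0, 3.0], "ate_fast_food": [3.0, 2.0, 1.0]}, index=idx)
print(composite_score(diet, {"ate_veggies": 1, "ate_fast_food": -1}))
```

Z-scores weight every metric equally regardless of its units; if some diet metrics matter more to you, a weighted average is an easy extension.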
If you enjoyed this post and are interested in future writings and online visualization tools, feel free to drop your email here and I’ll let you know when they are up.
Have data of your own that you would like to visualize?
If you have your own self-tracked data (in a CSV format) that you’d like to do similar correlation analysis for, you can do so on this site! Check out this data exploration notebook for a fun and insightful time!