



General questions/prompts:
- What do you think this graph shows us?
- What does this graph make you wonder?
- Does any part of this graph surprise you?
- How would you describe the relationship between the different variables involved?
- What title would you give this graph?
- Can you think of any further questions or future investigations? How would you go about getting the data you need for answers?
Specific questions/prompts:
A bit of fun with some of our favorite Spurious Correlations from https://www.tylervigen.com/spurious-correlations.
These absurd examples illustrate that completely unrelated phenomena can follow eerily similar patterns. Our instinct is to wonder whether the two variables really are connected in some way, so that a change in one causes a change in the other.
Good experimental design when collecting the data helps us avoid being misled. Tyler explains:
"I have 25,153 variables in my database. I compare all these variables against each other to find ones that randomly match up. That's 632,673,409 correlation calculations! This is called “data dredging.” Instead of starting with a hypothesis and testing it, I instead abused the data to see what correlations shake out. It’s a dangerous way to go about analysis, because any sufficiently large dataset will yield strong correlations completely at random."
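To see why data dredging is so unreliable, here is a minimal sketch in Python. It is not Tyler's code or data: the function names `pearson` and `dredge`, the number of series (200), the number of data points per series (10) and the random seed are all assumptions chosen for illustration. The sketch generates series of pure random noise, compares every pair, and reports the strongest correlation it finds.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def dredge(num_variables=200, num_points=10, seed=0):
    """Generate unrelated random series and return the most correlated pair.

    All parameter values are illustrative assumptions, not taken from
    the Spurious Correlations database.
    """
    rng = random.Random(seed)
    # Every series is independent noise, so no pair is truly related.
    series = [[rng.gauss(0, 1) for _ in range(num_points)]
              for _ in range(num_variables)]
    # Compare each unordered pair of series once.
    best = max(itertools.combinations(range(num_variables), 2),
               key=lambda pair: abs(pearson(series[pair[0]], series[pair[1]])))
    r = pearson(series[best[0]], series[best[1]])
    num_pairs = num_variables * (num_variables - 1) // 2
    return best, r, num_pairs


best_pair, best_r, num_pairs = dredge()
print(f"Compared {num_pairs} pairs of unrelated random series.")
print(f"Strongest correlation: r = {best_r:.2f} between series {best_pair}")
```

Even though every series here is random, the strongest pair is likely to show a correlation that would look convincing on a graph. Try increasing `num_variables` or decreasing `num_points` and see how the strongest correlation changes.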
Most importantly, even when a correlation between two variables appears very convincing, even obvious, it does not prove that one causes the other. Running a controlled experiment is the best way to prove causation, but that isn't always possible. Imagine giving humans a chemical that you suspect is toxic, just to prove that it is dangerous! In cases like these, good data handling and statistical methods help build a strong argument for causation.