tags : Machine Learning, Statistics

FAQ

“Correlation does not imply causation” meme

  • “Correlation does not imply causation”: true, and commonly said
  • But causation does not imply correlation either. Hidden variables can mask a real causal effect.

What are associations?

  • Correlation is a limited measure of association
    • Variables can be associated and still have no (linear) correlation
  • Associations are bidirectional: an association between variables runs in both directions
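
A minimal sketch of "associated but uncorrelated", using only the Python standard library. The data are an assumed toy example (y = x² on a grid symmetric around zero), and `pearson` is a helper written for this note, not a library function. y is completely determined by x, yet the linear correlation is essentially zero; the dependence only shows up once x is folded with abs().

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Assumed toy data: x on a grid symmetric around zero, y fully determined by x.
xs = [i / 100 for i in range(-100, 101)]
ys = [x * x for x in xs]

r_linear = pearson(xs, ys)                    # essentially zero
r_folded = pearson([abs(x) for x in xs], ys)  # strongly positive

print(f"corr(x, y)   = {r_linear:.3f}")
print(f"corr(|x|, y) = {r_folded:.3f}")
```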

Basics

What is Causal Inference

  • Asks what happens when some intervention is made
  • Can only be done if we have a causal model

Causal Prediction

  • Different from normal prediction
  • Predicting the effect
    • Being able to predict the consequences of an intervention.
    • E.g. the movement of the trees and the wind are statistically associated, but nothing in the data tells you that wind causes the trees to move.
  • What if I do this?
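
A rough sketch of the wind/trees example, assuming a causal model in which wind moves the trees and not the reverse. The function `simulate`, the coefficient 0.8, the noise scale 0.3 and the sample size are all made up for illustration. Observationally the two variables are associated in both directions; under intervention, setting the wind changes the trees, but shaking the trees leaves the wind alone.

```python
import random


def simulate(n, rng, do_wind=None, do_tree=None):
    """Draw n (wind, tree) pairs from the assumed model wind -> tree.

    Passing do_wind or do_tree overrides that variable's own mechanism,
    which is what an intervention means here.
    """
    rows = []
    for _ in range(n):
        if do_wind is None:
            wind = rng.gauss(0, 1)
        else:
            wind = do_wind
        if do_tree is None:
            tree = 0.8 * wind + rng.gauss(0, 0.3)  # trees respond to wind
        else:
            tree = do_tree                         # we shake the trees ourselves
        rows.append((wind, tree))
    return rows


def mean(values):
    return sum(values) / len(values)


rng = random.Random(0)
shaken = simulate(5000, rng, do_tree=3.0)  # "What if I shake the trees?"
blown = simulate(5000, rng, do_wind=3.0)   # "What if I set the wind?"

mean_wind_when_trees_shaken = mean([wind for wind, _ in shaken])
mean_tree_when_wind_set = mean([tree for _, tree in blown])

print("mean wind after shaking trees:", round(mean_wind_when_trees_shaken, 2))
print("mean tree movement after setting wind:", round(mean_tree_when_wind_set, 2))
```

The asymmetry comes entirely from the assumed model, not from the observational data, which is the point of the bullet above.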

Causal Imputation

  • Knowing the cause
    • Being able to construct some unobserved counterfactual outcome
  • What if I had done something else?
  • Description (of population)
    • The sample is caused by things
  • Design (of research project)
    • Things need to be drawn with a causal logic, which helps us design and calculate around them
    • Thinking about why the sample differs from the population

Models

DAGs

  • Example
    • Here A influences the treatment X
    • Here B influences the outcome Y
    • Here C influences both X and Y (a confound; we’d want to control for it)
  • We can ask multiple questions of this model
  • DAGs are intuition pumps: get your head out of the data and into the science
  • They give you a strategy for choosing which control variables to adjust for
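
A hedged simulation of the example DAG (A → X, B → Y, C → X, C → Y, X → Y), standard library only. The linear form, every coefficient, the sample size and the helpers `slope` and `residuals` are assumptions made for illustration. Regressing Y on X alone is biased by the confound C; removing the part of X and Y explained by C first (equivalent to including C as a control in a linear regression) recovers the assumed effect.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs (with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def residuals(ys, xs):
    """What is left of ys after removing its linear dependence on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = slope(xs, ys)
    return [(y - mean_y) - b * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(1)
n = 20000
TRUE_EFFECT = 1.0  # assumed causal effect of X on Y

A = [rng.gauss(0, 1) for _ in range(n)]
B = [rng.gauss(0, 1) for _ in range(n)]
C = [rng.gauss(0, 1) for _ in range(n)]
X = [a + c + rng.gauss(0, 1) for a, c in zip(A, C)]              # A -> X, C -> X
Y = [TRUE_EFFECT * x + b + 2.0 * c + rng.gauss(0, 1)             # X -> Y, B -> Y, C -> Y
     for x, b, c in zip(X, B, C)]

naive = slope(X, Y)                                  # ignores the confound C
adjusted = slope(residuals(X, C), residuals(Y, C))   # controls for C

print(f"naive slope:    {naive:.2f}")
print(f"adjusted slope: {adjusted:.2f}  (assumed true effect: {TRUE_EFFECT})")
```

A and B are left out of the adjustment on purpose: in this DAG neither opens a back-door path between X and Y, so only C needs controlling.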

GOLEMS

Statistical models used to produce scientific insight

  • They require additional scientific (causal) models
  • The reasons are not found in the data, but rather in the causes of the data
    • i.e. we should not try to infer the reasons from the data alone
  • No causes in (the data), no causes out (from the data)

Decision Tree/Flowchart

  • We use it for selecting an appropriate statistical procedure.

  • But using a decision tree to select the statistical procedure is not very helpful for a research scientist, because it is quite limiting
  • Each of these procedures can have Bayesian and frequentist versions
  • Mostly useful in industrial testing