tags : Book,Math

Chapter 1: Statistics!

Some basic definitions. This chapter is not really math heavy, but it drives home the fact that if you get the samples wrong, you’re fucked. The exercises in this chapter were easy, so I didn’t put in the effort to document them; the book also has solutions to all problems.

Sample

It is usually a subgroup of the population, but in a census the whole population is the sample. The sample size is usually denoted as n and the population size as N. The sample size is always a definite number.

Statistics

Descriptive Statistics

Summarizing and presenting the data that was measured. A statistic is a statement of descriptive stats: a numerical summary of a sample.

Inferential Statistics

Making statements about the population based on measurements of a smaller sample. A parameter is a numerical summary of a population, and it is what inferential stats tries to estimate.
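To make the statistic vs parameter split concrete, here is a small Python sketch. The heights are made-up numbers and every name in it (population, sample, parameter, statistic) is only illustrative.

```python
import random
import statistics

rng = random.Random(11)  # fixed seed so the numbers are reproducible

# Made-up population: N = 5000 heights in cm (an assumption for illustration).
population = [rng.randint(150, 200) for _ in range(5000)]

# A sample of n = 50 drawn from that population.
sample = rng.sample(population, 50)

parameter = statistics.fmean(population)  # numerical summary of the population
statistic = statistics.fmean(sample)      # numerical summary of the sample

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

In real life you usually only get to see the statistic and have to infer the parameter from it.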

Good and bad samples

A good sample is a smaller group that is representative of the population. All valid samples are chosen through probability means. Following are some ways to collect samples, ordered by preference:

Random Samples

Random doesn’t mean unplanned; even collecting random samples needs proper planning. For this, you need a list and some way to select random subjects from the list for your sample.
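A minimal sketch of the "list plus a way to select" idea in Python. The ID list is a hypothetical stand-in for a real list of the population, and the seed is only there to make the draw repeatable.

```python
import random

# Hypothetical list of the population: member IDs 1..500.
population = list(range(1, 501))

rng = random.Random(42)  # fixed seed, assumption for reproducibility

# Draw 25 distinct members; every member has the same chance of being picked.
sample = rng.sample(population, k=25)

print(sorted(sample))
```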

Systematic Samples

Best explained through an example: standing outside the grocery store all day, you survey every 40th person. That is a systematic sample with k=40.
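The grocery store example could look roughly like this in Python. The shopper list is made up, and picking a random starting point within the first k people is my assumption, not something these notes spell out.

```python
import random

def systematic_sample(frame, k, rng):
    """Take every k-th item, starting from a random position in the first k."""
    start = rng.randrange(k)  # assumption: random start somewhere in 0..k-1
    return frame[start::k]

# Hypothetical stream of 400 shoppers, in order of arrival.
shoppers = [f"shopper_{i}" for i in range(1, 401)]

picked = systematic_sample(shoppers, k=40, rng=random.Random(0))
print(picked)
```

With 400 shoppers and k=40 this always gives 10 people, whatever the starting point is.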

Cluster Samples

You subdivide the population into a large number of subunits (clusters) and then construct a random sample of clusters. This one makes a big assumption: that the individuals in each cluster are representative of the whole population. A cluster sample cannot be analyzed in all the same ways as random or systematic samples.
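A rough Python sketch of one common way to do cluster sampling: randomly pick whole clusters and take everyone in them. The classrooms and students are hypothetical data invented for the example.

```python
import random

# Hypothetical population: 20 classrooms (clusters) with 5 students each.
clusters = {
    f"class_{c}": [f"class_{c}_student_{s}" for s in range(5)]
    for c in range(20)
}

rng = random.Random(1)  # fixed seed, assumption for reproducibility

# Randomly choose 4 whole clusters, then include every member of each one.
chosen = rng.sample(sorted(clusters), k=4)
cluster_sample = [member for name in chosen for member in clusters[name]]

print(chosen)
print(len(cluster_sample), "students in the sample")
```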

Stratified Samples

This needs some analysis of the data you’ll be working with. If you can identify subgroups (strata) that have something in common related to what you’re trying to study, you want to ensure that your sample has the same mix of those groups as the population. Eg. with 45% girls and 55% boys in a school and a sample of 400, take 45% x 400 = 180 girls and 55% x 400 = 220 boys. Each mini sample should be constructed using other random sampling methods.
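The school example worked out as a Python sketch. The school of 900 girls and 1100 boys is made up so that the mix is 45% / 55%, and stratified_sample is a hypothetical helper, not a library function.

```python
import random

def stratified_sample(strata, total_n, rng):
    """Sample each stratum in proportion to its share of the population."""
    pop_size = sum(len(members) for members in strata.values())
    picked = {}
    for name, members in strata.items():
        # Proportional allocation; with awkward proportions the rounded
        # sizes may not add up exactly to total_n.
        n = round(total_n * len(members) / pop_size)
        picked[name] = rng.sample(members, n)  # simple random sample per stratum
    return picked

# Hypothetical school: 900 girls (45%) and 1100 boys (55%).
school = {
    "girls": [f"g{i}" for i in range(900)],
    "boys": [f"b{i}" for i in range(1100)],
}

picked = stratified_sample(school, total_n=400, rng=random.Random(7))
print({name: len(members) for name, members in picked.items()})
```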

Census

A census sample contains every member of the population.

Quantitative and Qualitative

Quantitative

Numeric. Sometimes it’s hard to differentiate between discrete and continuous, but it’s important to identify the difference when you need to graph them.

  • Discrete: how many
  • Continuous: how much

Qualitative

Non-numeric

Errors

In stats, errors are not like programming errors; they are the discrepancy between your findings and reality.

Sampling error

These are part of the sampling process. They cannot be eliminated, but they can be minimized by increasing the sample size.
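A small simulation sketch of that last point, in Python. The population is randomly generated made-up data, and typical_error is a hypothetical helper that averages how far the sample mean lands from the true mean over many repeated samples.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed, assumption for reproducibility

# Made-up population of 10,000 values centered around 50.
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.fmean(population)

def typical_error(n, trials=200):
    """Average absolute gap between sample mean and true mean for samples of size n."""
    gaps = [
        abs(statistics.fmean(rng.sample(population, n)) - true_mean)
        for _ in range(trials)
    ]
    return statistics.fmean(gaps)

small = typical_error(10)
large = typical_error(1000)
print(f"n=10:   typical error {small:.3f}")
print(f"n=1000: typical error {large:.3f}")
```

The error for the larger samples comes out much smaller, but it never hits exactly zero unless you take a census.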

Non-sampling error

When you mess up in collecting the data, analyzing the data, etc.

Variables

Variables are the questions and data points are the answers. Eg. birth weight is the variable and 5 kg is a data point. Sometimes the variable type is also called the data type.

In either an observational study or an experimental study there are two kinds of variables:

  • Explanatory variables/factors: Suspected causes
  • Response variables: Suspected effects/results

Explanatory variables that make the results/response variables questionable:

  • Lurking variables: A hidden variable that isn’t measured but affects the outcome. A careful randomized experimental study can get rid of these.

  • Confounding variables: You know what they are, but you cannot untangle their effect from what you actually wanted. Try to rule these out if possible before experimentation.

Gathering data

Observational study

Often a retrospective study: you observe and measure without assigning any treatment. Lurking variables are the reason an observational study can never establish causation, no matter how strong an association you find.

Experiment

Here we can manipulate the explanatory variables. Each level of the assigned explanatory variable is known as a treatment. If we have a well-run randomized experiment, we can establish causation. Eg. in a study where you give your 3 children different toys, the explanatory variable is the toy and the treatments are the different toys.

Basics of designing experiments

Completely Randomized Design

Randomly assign members to the various treatment groups. This is called randomization.
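Randomization can be sketched in a few lines of Python. The subject names and treatment labels are made up, and completely_randomized is a hypothetical helper that shuffles everyone and deals them out to the groups in turn.

```python
import random

def completely_randomized(subjects, treatments, rng):
    """Shuffle the subjects, then deal them round-robin into treatment groups."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical experiment: 12 subjects, 3 treatments.
subjects = [f"subject_{i}" for i in range(12)]
groups = completely_randomized(subjects, ["toy_A", "toy_B", "toy_C"], random.Random(9))

for treatment, members in groups.items():
    print(treatment, members)
```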

Randomized Block Design

When there is a confounding variable that you can detect, divide the subjects into blocks according to that variable before conducting the experiment, then randomize within each block. This variable is called the blocking variable, i.e. the confounding variable became the blocking variable here.

“Block what you can, randomize what you cannot”
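A hedged Python sketch of blocking followed by randomizing within each block. Age group as the blocking variable, the subject tuples and the randomized_block helper are all assumptions made up for the example.

```python
import random
from collections import defaultdict

def randomized_block(subjects, block_of, treatments, rng):
    """Group subjects by the blocking variable, then randomize inside each block."""
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of(subject)].append(subject)

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # randomize only within this block
        for i, subject in enumerate(members):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

# Hypothetical subjects as (name, age_group); age_group is the blocking variable.
subjects = [(f"s{i}", "young" if i < 6 else "old") for i in range(12)]

plan = randomized_block(
    subjects,
    block_of=lambda s: s[1],
    treatments=["drug", "placebo"],
    rng=random.Random(5),
)
for subject, treatment in sorted(plan.items()):
    print(subject, treatment)
```

Each age group ends up with the same number of subjects on each treatment, so age can no longer be tangled up with the treatment effect.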

Matched pairs

A type of randomized block design where each block contains two subjects that are as alike as possible (eg. twins), which minimizes the fear of lurking variables. Another special type matches each subject’s experimental result with itself (eg. the delta between a before and an after measurement).
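For the "matched with itself" case, a tiny Python sketch, assuming the delta means after minus before for the same subject. The measurements are made-up numbers.

```python
import statistics

# Made-up before/after measurements for the same 5 subjects, in the same order.
before = [82, 75, 90, 68, 77]
after = [78, 74, 85, 70, 72]

# One delta per subject: each subject is paired with itself.
differences = [a - b for b, a in zip(before, after)]
mean_difference = statistics.fmean(differences)

print(differences)
print(mean_difference)
```

The analysis then works on the list of differences rather than on the two columns separately.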

Control groups and placebo

In experiments where the placebo effect may be involved, the group that gets the placebo is called the control group.

Rounding Numbers

It was surprising to me that I was on the wrong side when it came to rounding numbers! The most important thing is to round in one step, and to round only in the last step of the calculation.

  • Draw a line mentally at the point where you want to round.
  • If the digit just to the right of the line is 0-4, throw away everything to the right of the line.
  • If the digit just to the right of the line is 5-9, raise the digit to the left by one and throw away everything to the right of the line.

Eg. 1.2|4768 ~ 1.2 but 1.2|7432 ~ 1.3
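The rule above can be sketched in Python with the decimal module. Note that Python's built-in round() rounds exact halves to the nearest even digit, which is a different rule from the one in these notes, so the sketch uses ROUND_HALF_UP instead. round_half_up is a hypothetical helper name, and values are passed as strings to avoid float surprises.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value, places):
    """Round a decimal string to `places` decimals: 0-4 goes down, 5-9 goes up."""
    quantum = Decimal(1).scaleb(-places)  # places=1 gives Decimal('0.1')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)

print(round_half_up("1.24768", 1))  # the first example above
print(round_half_up("1.27432", 1))  # the second example above

# Why "round in one step" matters: rounding 1.24768 in two steps drifts upward.
two_step = round_half_up(str(round_half_up("1.24768", 2)), 1)
print(two_step)

# Built-in round() uses a different rule for exact halves.
print(round(2.5), round_half_up("2.5", 0))
```

Rounding 1.24768 to two places first gives 1.25, and rounding that again gives 1.3 instead of the correct 1.2.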