Definitions

We'll put our definitions here along with __**page numbers**__ from the text if appropriate.

 * **Measurement (quantitative)** - a variable that takes on numeric values. **p. 13**
 * **Nouns of Statistics** - the mean, median, mode, range, and outliers of the data collected are considered to be the nouns of statistics.
 * **Process of Statistical Investigation** - techniques and tools that aid in the collection, organization, summarization, and interpretation of a collection of information referred to as data. **p. 1**
 * **Stacked Bar Graph** - shows data in a relative or "part-to-whole" perspective where each individual segment represents a particular part and all segments combined make up the whole. **pg. 18**
 * **Variability** - the spread of the data; includes the range of the data, any groups or clusters of data, any gaps in the data, a center, or a most common value. **p. 42-43**
 * **Negative association** - on a scatter plot, an association is negative IF as one variable increases, the other one decreases. The data would slope downward to the right. **p. 74**
 * **Strength** - the strength of an association is determined by how closely the data fits the "invisible" trend line. If the data is tight to the trend line, it is **strong**. If the data is loosely gathered around the trend line, it is **moderate**. If the data is scattered and far away from the trend line (showing no pattern), it is **weak**. **p. 74**
 * **Data** - outcomes of our variables; the information you collect can be numerical or categorical.
 * **Upper Quartile** - the median of the upper half of the data. **pg. 116**
 * **Lower Extreme** - the minimum, or smallest, value in a data set. **pg. 118**
 * **Upper Extreme** - the maximum, or largest, value in a data set. **pg. 118**
I put these here as these ideas will be studied in more detail later in the term.
 * **Mean Deviation** - the mean of the absolute values of the deviation scores (or distances) from the mean. **pg. 129**
 * **Standard Deviation** - another measure of dispersion, computed from the squared deviations from the mean: instead of finding the absolute value of the differences, the differences are squared. Make sure that you take the square root of the variance to measure the spread of the data. This is also referred to as the population standard deviation. **pg. 130**
 * **Variance** - the mean of the squared deviations for the data set. This is the value before you take the square root to find the standard deviation. **pg. 130**
 * **Probability** - a method of measuring the likelihood that certain events will or will not occur. Therefore, probability deals with events that we can anticipate but that we may never see. **p. 140**
 * **Likelihood Continuum** - a scale that ranges from Impossible to Certain, where events can be placed to determine probability. **p. 140**
 * **Representativeness** - the tendency to infer whole population traits to particular outcomes (assuming that the two guys in our class like to play combat games and placing that event on the low end of the continuum). **p. 141**
 * **Availability** - the tendency to relate to our own personal experiences as we estimate likelihoods (because I like diet soda, I think most women do, too - so I would place that event toward the high end of the continuum). **p. 141**
 * **Random** - applied to situations that may have a known number of possible outcomes, but the actual outcome is uncertain. **p. 144**
 * **Experiment** - any activity where an observation or measurement can be made and recorded as data. **p. 150**
 * **Outcome set/sample space** - all the possible outcomes for the experiment. For example, the outcome set for the coin tossing experiment is the set {heads, tails}. **p. 150**
 * **Probability of an outcome** - the number assigned to measure the likelihood of an outcome. It can be assigned by observing the relative frequency of occurrence of the outcome across many trials of the experiment. For example, if we were flipping a coin, the probability of obtaining a head, often abbreviated P(Heads), should be 1/2 because we expect the outcome of heads in about half of the coin flips we make. **p. 150**
 * **Theoretical Probability** - as the number of conducted trials of an experiment increases, the relative frequency associated with a particular outcome tends to approach and stabilize at a specific value. This specific value is called the **theoretical probability** of that outcome. When mathematicians speak about the "probability of an outcome," they are usually referring to the theoretical probability associated with that outcome. **pg. 157** Relative frequency of occurrence for each of the outcomes over the total number of outcomes.
 * **Event** - a collection of one or more outcomes; any subset of the outcome set. **pg. 158**
 * **Two-stage experiment** - an experiment that involves two simple experiments that are carried out concurrently or in succession. We usually use a probability tree diagram or an area model to represent this type of experiment. **pg. 184**
 * **Probability tree diagram** - shows the outcome set for conducting a trial and the probabilities associated with each of the possible outcomes. **pg. 184**
 * **Area model** - an alternative to the use of a tree diagram where you begin with a square whose area is defined to be 1 square unit. **pg. 191**
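To make the two-stage experiment and probability tree diagram definitions above a little more concrete, here is a minimal Python sketch. It is only an illustration: the fair coin, the helper name `two_stage_outcomes`, and the "at least one head" event are assumptions we picked for the example, not something taken from the text. It also assumes the two stages do not affect each other, so the probabilities along each branch of the tree can simply be multiplied.

```python
from fractions import Fraction
from itertools import product

# Assumed simple experiment: one toss of a fair coin.
coin = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}


def two_stage_outcomes(first, second):
    """Build the outcome set of a two-stage experiment.

    Each key is a pair (first-stage outcome, second-stage outcome) and each
    value is the product of the probabilities along that branch of the tree.
    Assumes the two stages are independent.
    """
    return {
        (a, b): first[a] * second[b]
        for a, b in product(first, second)
    }


# Two coin tosses carried out in succession.
tree = two_stage_outcomes(coin, coin)

# An event is any subset of the outcome set; its probability is the sum
# of the probabilities of the outcomes it contains.
at_least_one_head = [outcome for outcome in tree if "heads" in outcome]
p_event = sum(tree[outcome] for outcome in at_least_one_head)

for outcome, probability in tree.items():
    print(outcome, probability)
print("P(at least one head) =", p_event)
```

Using `Fraction` keeps the probabilities exact, so it is easy to check that the four branches add up to 1, just as the entries of a probability table should.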
 * **Variable** - any characteristic that can be assigned a number or category. **pg. 13**
 * **Numerical Data** - data that can be represented by numbers. Ex: age, distance, amounts, etc. **p. 13**
 * **Categorical Data** - data that would be represented in terms of categories. Ex: color, food, types of things, etc. **p. 13**
 * **Count** - a third type of variable that could be thought of as a subset of the measurement variable. The values would be numeric, but specifically **whole numbers**. **p. 13**
 * **Verbs of Statistics** - gathering, organizing, analyzing, and interpreting the data collected.
 * **Binary Categorical Data** - there are only two possible response categories (Ex. yes or no, male or female). This data is usually displayed using a back-to-back bar graph. **p. 16**
 * **Read Question** - requires the respondent to read information from the table to determine a solution. Can be used to focus students' attention on the structure and organization of the data presented in the summarizing display (read the data right off the graph). **pg. 32**
 * **Derive Question** - some type of computation involving information read from the table must be performed in order to determine a solution. Requires students to use mathematical concepts and skills to "read between the data". **pg. 33**
 * **Interpret Question** - requires an extension, prediction, or inference to "read beyond the data". **pg. 33**
 * **Dot Plot (line plot)** - a simple way to visually display the distribution of a small set of measurement data. **pg. 41**
 * **Grouped Frequency Distribution** - an alternative method for organizing large data sets by condensing the data using a grouping strategy. In this method, each data value is assigned to an interval or category and then presented in a condensed form. **pg. 55**
 * **Histogram** - used to organize large numerical data sets. A graphical display of the information found in a grouped frequency table. **p. 55**
 * **Stem-and-Leaf Plot** - a visual display of the distribution of the elements of a set of measurement data (used to organize numerical data). **p. 43**
 * **Conditional** - information about one variable given that certain conditions are known. **pg. 67**
 * **Contingency Tables** - these are used to show relationships between two categorical variables. **p. 67**
 * **Scatter Plot** - a useful display for visualizing and examining relationships; consists of 2 axes, where one of the variables under consideration is represented on the horizontal axis and the other on the vertical axis, with the pairs of data appearing as points on the resulting coordinate plane. **pg. 72** The horizontal or x axis is used for the predictor or independent variable; the vertical or y axis is used for the predicted or dependent variable. **pg. 74**
 * **Positive association** - on a scatter plot, an association is positive IF as one variable increases, the other one increases also. The data would slope upward to the right. **p. 74**
 * **Frequency** - how many times an outcome occurs. Frequency is NOT the data!!
 * **Lower Quartile** - the median of the lower half of the data. **pg. 116**
 * **Box-and-Whisker Plot (aka Box Plot)** - a summarizing display that defines 4 intervals, where each interval contains roughly one-quarter of the values of the data set. This display can be constructed with 5 values: the minimum and maximum data values, the median, the upper quartile, and the lower quartile. **pgs. 117-118**
 * **Interquartile Range** - the difference between the upper quartile and the lower quartile. **pg. 118**
 * **Marginal Data** - information that can be found in the margins of a table. To find a fraction that is marginal data, you take a piece of information from the margin and divide it by the total amount (people, m&m's) surveyed.
 * **Line chart/Line graph** - a commonly used graphical display. Like the scatter plot, the line graph provides a visual of the relationship between two paired variables but, in the case of the line graph, the predictor is always a measure of time.
 * **Mean** - the average of all the numbers in a list of numerical data. This can be calculated by finding the sum of all of the numbers and then dividing by how many numbers are in the set of data. **pg. 96**
 * **Median** - the value of the middle term or the midpoint of the two middle terms of a data set when the data are arranged in increasing (or decreasing) order. **pg. 103**
 * **Mode** - the data element that occurs most often in a set of data. It applies to all categorical variables but has limited usefulness as a measure of central tendency for measurement variables. **pg. 105**
 * **Range** - the difference between the minimum and maximum values in a data set. For example, if the maximum value was 23 and the minimum value was 13, the range for that data set would be 10. **pg. 112**
 * **Outlier** - a data value that is widely separated from the rest of the data. For example, if your data set was 1, 1, 2, 2, 3, 3, 4, 37, the outlier would be 37. Because the term "outlier" is frequently interchanged with the term "extreme value," statisticians often use the following rule to determine if an extreme value is, in fact, an outlier: an **outlier** is any number in the data set that is more than 1.5 interquartile ranges above the upper quartile or more than 1.5 interquartile ranges below the lower quartile. **pg. 124**
 * **Outcome** - one of the possible results from the experiment. **p. 150**
 * **Experimental Probabilities** - probabilities of the outcomes associated with an experiment, estimated by computing the relative frequency of occurrence of each outcome over many trials of the experiment. **pg. 157**
 * **Probability Table** - a table that displays the possible outcomes for a given experiment as well as the probability of each of the possible outcomes. Each of the probability values is a number between 0 and 1 inclusive, and the sum of the probabilities in the table is 1. **pg. 158**
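Several of the definitions above involve arithmetic that is easier to follow with numbers in front of you, so here is a hedged Python sketch using the data set from the Outlier entry (1, 1, 2, 2, 3, 3, 4, 37). It computes the five values behind a box-and-whisker plot, the interquartile range, the 1.5 interquartile range outlier rule, and the mean deviation, variance, and population standard deviation. The helper name `five_number_summary` is our own, and leaving the middle value out of both halves when the data set has an odd number of values is an assumption; check the text for the convention it uses.

```python
from statistics import mean, median, pstdev, pvariance

# Data set taken from the Outlier entry above.
data = [1, 1, 2, 2, 3, 3, 4, 37]


def five_number_summary(values):
    """Return (lower extreme, lower quartile, median, upper quartile, upper extreme).

    Quartiles are the medians of the lower and upper halves of the sorted data.
    Assumption: with an odd number of values, the middle value is left out
    of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + (n % 2):]
    return (
        ordered[0],
        median(lower_half),
        median(ordered),
        median(upper_half),
        ordered[-1],
    )


low, q1, med, q3, high = five_number_summary(data)
data_range = high - low
iqr = q3 - q1

# Outlier rule: more than 1.5 interquartile ranges beyond either quartile.
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

# Measures of dispersion around the mean.
center = mean(data)
mean_deviation = mean([abs(x - center) for x in data])
variance = pvariance(data)   # mean of the squared deviations
std_dev = pstdev(data)       # square root of the variance

print("five-number summary:", (low, q1, med, q3, high))
print("range:", data_range, "IQR:", iqr, "outliers:", outliers)
print("mean deviation:", mean_deviation)
print("variance:", variance, "standard deviation:", std_dev)
```

For this data set the rule flags 37 as the only outlier, which matches the example in the Outlier entry. Notice how much that one value inflates the range and the standard deviation compared with the interquartile range.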