# Collecting Data

Data should always be collected for a clear purpose. Data can be either:

- Primary data – data you collect yourself using a survey or experiment; or
- Secondary data – data that someone else has already collected. You can find secondary data in books or on the internet.

As a rule of thumb, a sample should contain at least 30 items for the results to be reasonably reliable.

# Representing and Interpreting Data

Graphs and diagrams should be drawn on the appropriate paper:

- Bar charts, scatter diagrams and line graphs on squared or graph paper.
- Pie charts on plain paper.

# Guidelines for Constructing Graphs and Charts

When creating graphs and charts you should always:

- Use a sharp pencil and ruler.
- Label both axes and give a title.
- Put the independent variable on the *x*-axis and the dependent variable on the *y*-axis. For example, when graphing the temperature of a cooling liquid, time goes on the *x*-axis and temperature on the *y*-axis, because the temperature of the liquid depends on the time of the reading.
- Write scale numbers on the grid lines, not in the spaces, unless drawing a bar chart of discrete data.
- Use equally spaced intervals.
- Use convenient and consistent scales.
- Mark points with a small cross, not a dot.
- Draw graphs on squared or graph paper.
- Draw graphs of a sensible size (many students tend to make them too small).
- If an axis does not start from zero, show a break on the axis with a zigzag line.

## Different types of graphs

### Bar Charts

These are the diagrams most frequently used in areas of the curriculum other than mathematics. The way in which the graph is drawn depends on the type of data to be processed.

**Non-numeric and discrete data**

Graphs should be drawn with **gaps between the bars** if the data categories are not numerical (colours, makes of car, names of pop stars, etc.) or if the data are discrete (numerical data that can only take particular values, such as shoe size or year group). When there are gaps between the bars, the labels on the horizontal axis go beneath the bars. All bars and all gaps should be the same width.

The number labels on the vertical axis should be on the lines.

When the data are discrete the numbers on the horizontal scale can be placed between the lines.

Bar chart for discrete, non-numeric data.

Bar chart for discrete, numeric data.

**Continuous data**

Data are described as continuous if they can take any value within a range (e.g. height and weight are continuous, since in principle any value could be measured). If the data collected are continuous, there should be **no gaps between the bars**. Continuous data are usually grouped into class intervals, so the scale along the *x*-axis is continuous.

Where the data are continuous, the numbers on the horizontal scale should be placed on the line.
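Grouping continuous data into class intervals is the step before drawing the bars. The Python sketch below shows one way to do it with equal-width classes; the function name `group_into_classes`, the class width and the heights are hypothetical, chosen only for illustration. Each class includes its lower bound and excludes its upper bound, so neighbouring bars share an edge and there are no gaps.

```python
def group_into_classes(values, start, width):
    """Count values into equal-width classes [start, start + width), [start + width, start + 2*width), ...

    Returns a list of ((lower, upper), frequency) pairs.
    """
    if width <= 0:
        raise ValueError("class width must be positive")
    if min(values) < start:
        raise ValueError("every value must be at least `start`")
    # Number of classes needed so that the largest value has a class.
    n_classes = int((max(values) - start) // width) + 1
    counts = [0] * n_classes
    for v in values:
        # Lower bound inclusive, upper bound exclusive, so neighbouring classes share an edge.
        counts[int((v - start) // width)] += 1
    return [((start + i * width, start + (i + 1) * width), count)
            for i, count in enumerate(counts)]

# Hypothetical heights in cm, used only to illustrate the grouping.
heights = [142.5, 151.0, 149.9, 160.2, 155.5, 163.0, 158.1, 147.3]
for (lower, upper), freq in group_into_classes(heights, start=140, width=10):
    print(f"{lower} <= h < {upper}: {freq}")
```

A value that lands exactly on a boundary, such as 150, is counted in the class that starts at 150, matching the "numbers on the line" convention for continuous scales.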

In Microsoft Excel, the default is to have gaps between the bars. To change this:

- Right-click on one of the bars.
- Select **Format Data Series**.
- Change the **Gap Width** from 150% to 0%.
- Click **Close**.

### Line Graphs

Line graphs should only be used when the order of the categories matters (for example, times or dates). Points are joined when the graph shows a trend, or when values between the plotted points are meaningful. For example, a patient's temperature measured at regular intervals shows a pattern, but the line between two readings is an estimate rather than a measured value.
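Reading a value off the straight line joining two plotted points is linear interpolation. The Python sketch below assumes the readings are (time, value) pairs sorted by time with no repeated times; the function name `read_between` and the temperatures are hypothetical.

```python
def read_between(points, x):
    """Estimate y at x by linear interpolation between neighbouring (x, y) points.

    Assumes points are sorted by x, the x values are distinct,
    and x lies within the plotted range.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            # Fraction of the way from the left point to the right point.
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x is outside the range of the plotted points")

# Hypothetical patient temperatures (°C) taken every 2 hours.
readings = [(0, 37.0), (2, 37.8), (4, 38.4), (6, 37.6)]
print(read_between(readings, 3))  # halfway between the 2-hour and 4-hour readings
```

The result is only as good as the assumption that the temperature changed steadily between readings, which is why such values are estimates.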

**Plotting Points**

When drawing a diagram on which points have to be plotted, remember that the numbers written on the axes must be on the lines not in the spaces. Points should be plotted with crosses and not dots.

### Pie Charts

These are typically used to compare categories as fractions of the whole data. The way in which you are expected to work out angles for a pie chart will depend on the complexity of the question. If the numbers involved are simple it will be possible to calculate simple fractions of 360°.

However, with numbers that do not convert readily to a simple fraction, first work out the share of 360° for **one** item, then multiply this by its frequency.

For example, 180 pupils were asked to name their favourite core subject.

Each pupil is represented by 360° ÷ 180 = 2° of the pie chart.

Pie charts should have each sector labelled and an overall title. Alternatively, a key could be provided.
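The "one item first" method can be written as a short Python sketch. The subject frequencies below are made up for illustration (they are not the survey's results); they simply total 180 so that each pupil is worth 2°. The function name `pie_angles` is hypothetical.

```python
def pie_angles(frequencies):
    """Return the sector angle in degrees for each category.

    Works out the angle for one item (360 / total), then multiplies by each frequency.
    """
    total = sum(frequencies.values())
    degrees_per_item = 360 / total
    return {category: freq * degrees_per_item for category, freq in frequencies.items()}

# Hypothetical results for 180 pupils (not the original survey data).
favourites = {"English": 50, "Mathematics": 70, "Science": 60}
angles = pie_angles(favourites)
print(angles)  # {'English': 100.0, 'Mathematics': 140.0, 'Science': 120.0}
```

A quick check is that the angles add up to 360°.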

### Scatter Diagrams

These are typically used to see whether one measurement varies with another. Each measurement is plotted on its own axis, one on the *x*-axis and one on the *y*-axis. Scatter diagrams show whether there is a **correlation** (link) between the two data sets. If possible, a ‘line of best fit’ should be drawn. It should go through the middle of the data, passing as close to as many points as possible, so that it can be used to make estimates for particular cases.

The line of best fit is not added automatically in Excel. It can be added by picking **Layout 3** from the Design menu.

The ‘line of best fit’ does not have to go through (0,0) and should be one straight line.
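Drawing the line by eye is what is expected on paper; a spreadsheet's linear trendline instead uses the least-squares method. The Python sketch below computes a least-squares gradient and intercept; the function name `line_of_best_fit` and the temperature and sales figures are hypothetical.

```python
def line_of_best_fit(xs, ys):
    """Least-squares straight line y = m*x + c through paired data.

    Assumes xs and ys have the same length (at least 2) and xs are not all equal.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Gradient: sum of products of deviations divided by sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The least-squares line always passes through the mean point (mean_x, mean_y).
    c = mean_y - m * mean_x
    return m, c

# Hypothetical data: daily temperature (°C) and ice cream sales.
temps = [14, 16, 18, 20, 22, 24]
sales = [120, 150, 185, 210, 245, 270]
m, c = line_of_best_fit(temps, sales)
print(f"sales ≈ {m:.2f} × temperature + {c:.2f}")
print(f"estimate at 19 °C: {m * 19 + c:.0f}")
```

Here the intercept comes out negative, a reminder that the line need not pass through (0,0). Estimates are safest within the range of the plotted data.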

A positive correlation occurs when one variable tends to increase as the other increases, for example ice cream sales and temperature. However, you need to make sure there is a sensible connection between the two variables. Plotting mobile phone use against house prices may give two sets of data that both increase, but are they connected? The two variables (measurements) should relate to the same ‘item’, such as ice cream sales and the temperature on that day.

A negative correlation occurs when one variable tends to decrease as the other increases. There is no correlation when the points show no linear relationship.
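Correlation is judged by eye here. A common numerical measure, not part of this guide, is Pearson's correlation coefficient *r*, which lies between -1 and 1: close to 1 for strong positive correlation, close to -1 for strong negative correlation and near 0 when there is little linear relationship. The Python sketch below computes *r* directly; the function names, the data and the 0.5 cut-off in `describe` are hypothetical choices for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data (between -1 and 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe(r, threshold=0.5):
    """Rough verbal label; the 0.5 cut-off is an arbitrary choice for illustration."""
    if r >= threshold:
        return "positive correlation"
    if r <= -threshold:
        return "negative correlation"
    return "no clear correlation"

# Hypothetical paired data.
temps = [14, 16, 18, 20, 22, 24]
ice_creams = [120, 150, 185, 210, 245, 270]    # rises with temperature
hot_drinks = [90, 82, 70, 66, 55, 48]          # falls as temperature rises
print(describe(pearson_r(temps, ice_creams)))  # positive correlation
print(describe(pearson_r(temps, hot_drinks)))  # negative correlation
```

A value of *r* close to 1 for mobile phone use against house prices would still not show that the two are connected, so the check for a sensible link between the variables still applies.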