There are many different methods for collecting data. First, we need a group of people or objects that we can observe and gain information from; this is called the population. For a factory testing the things it makes, the population would be every product that is created. For an election to choose a party leader, the population would be all of the people who vote. Clearly, a population can range from very small to very large, and a very large population can make things quite tricky! Because of this, statisticians may study a sample, which is a smaller group chosen from the population. The size of the sample and the way it is chosen will depend on many different factors.
Going back to the idea of an election, it would take a huge length of time to ask every voter in a country individually when making predictions, so a sample is taken instead. However, the people in the sample should not all come from the same area or the same age group, as they may have very similar opinions. There are many different ways to choose a sample, but one of the best is a random sample, which simply picks the number needed from the population with no preferences at all, so that every member has the same chance of being chosen.
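As a rough illustration of a random sample, here is a minimal Python sketch. The population of 10,000 numbered voters, the sample size of 100 and the function name `simple_random_sample` are all made up for this example; only `random.Random` and its `sample` method come from Python's standard library.

```python
import random

# Hypothetical population: 10,000 voters, each identified by a number.
# (Invented for illustration; a real study would use a real list of voters.)
population = list(range(1, 10001))


def simple_random_sample(items, n, seed=None):
    """Pick n distinct items so that every item has the same chance of being chosen."""
    rng = random.Random(seed)  # a fixed seed makes the example repeatable
    return rng.sample(items, n)


sample = simple_random_sample(population, 100, seed=1)
print(len(sample))  # number of voters picked
print(sample[:5])   # the first few voter numbers in the sample
```

Because the selection ignores where a voter lives or how old they are, no area or age group is favoured over another.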
Using statistics
Statistics are only as good as the way in which we collect them. It is therefore very important to collect them properly, so that the conclusions we draw are not based on false evidence. One of the best ways to get a good set of statistics is to make the sample as large as is practical. Provided the sample is chosen fairly, a larger sample generally gives greater accuracy, and the statistics we have will be closer to the truth for the entire population.
Say that we wanted to record the colours of the cars that travel down a road in one day. It would take a long time to sit outside and note every single car that passes by (all of these cars together would be the population), so we would look to take a sample. If we took a sample of just one car we would have a very poor statistic. That one car would be a certain colour, say blue, and we would be forced to say that 100% of the cars that passed were blue. Of course, it would be very unlikely that every car that passed was blue! Therefore, we need to increase the sample size and use it to get an idea of the most popular car colours in that area.
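The effect of sample size can be explored with a small simulation. This is only a sketch: the "true" colour shares in `TRUE_SHARES` are invented numbers, not real traffic data, and `sample_shares` is a hypothetical helper written for this example.

```python
import random
from collections import Counter

# Assumed "true" colour shares for all the cars in one day.
# These numbers are invented purely for illustration.
TRUE_SHARES = {
    "blue": 0.30,
    "black": 0.25,
    "red": 0.20,
    "white": 0.15,
    "green": 0.05,
    "yellow": 0.05,
}


def sample_shares(n, seed=0):
    """Observe n randomly chosen cars and return the share of each colour seen."""
    rng = random.Random(seed)
    colours = rng.choices(
        list(TRUE_SHARES), weights=list(TRUE_SHARES.values()), k=n
    )
    counts = Counter(colours)
    return {colour: counts[colour] / n for colour in TRUE_SHARES}


# Compare a sample of one car with larger samples.
for n in (1, 10, 1000):
    print(n, sample_shares(n))
```

With a sample of one car, a single colour always gets a share of 100%, just as in the blue car example. As the sample grows, the shares tend to move closer to the assumed true values.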
Raw data
Raw data is information that has been collected but not yet organised in any way. The data is usually still in the order in which events happened and may look very random at first sight. For example, if we were looking at the colours of cars on a road, the raw data for the first ten cars might look something like this:
Red, blue, blue, black, blue, yellow, white, red, green, black
There is no real pattern in this data: the colours are in the order in which the cars came down the road and have not been sorted in any way.
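One simple way to start organising raw data like the list above is to tally how often each colour appears. The sketch below uses Python's `collections.Counter`; the variable names `raw_data` and `tally` are just chosen for this example.

```python
from collections import Counter

# The ten observations, in the order the cars came down the road.
raw_data = ["Red", "blue", "blue", "black", "blue",
            "yellow", "white", "red", "green", "black"]

# Lower-case each entry so that "Red" and "red" count as the same colour.
tally = Counter(colour.lower() for colour in raw_data)

# List the colours from most common to least common.
for colour, count in tally.most_common():
    print(f"{colour}: {count}")
```

The tally shows that blue appears three times, red and black twice each, and yellow, white and green once each, which is much easier to read than the raw list.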