Understanding your Data: Statistical Analysis

Library Student Team
4 min read · Jan 30, 2024

Getting started with data analysis for your dissertation or final year project can seem daunting, especially if you thought you’d left maths and statistics behind in school - but don’t worry, we’re here to help! In order to choose the right statistical testing procedure, you need to really understand what your data means and what you’re trying to show. Once you’ve worked that out, doing the stats is the easy bit! Have a read through our step-by-step guide and if you’re still stuck, visit our webpage on Specialist Library Support for more resources.

Hands typing on laptop displaying various charts
Photo Credit: Ruthson Zimmerman, March 22 2019

Step 1: Dependent vs Independent Variables

Make sure you’re clear on the definitions of ‘dependent variable’ and ‘independent variable’.

A dependent variable is the thing that you measure, i.e. what is affected during your experiment.

An independent variable is the thing that you change or control during your experiment.

For each research question, make it clear in your notes which variables are dependent and which are independent, as you’ll need to know how many of each you have.

Here’s an example of a research question: “Does social media usage differ between adolescent boys and girls?”

For this research question, the dependent variable is social media usage, and the independent variable is gender, split into two groups (boys and girls).

Here’s another example of a research question: “Does lecture attendance, time spent revising and exam anxiety predict exam performance?”

For this research question, the dependent variable is exam performance, and the independent variables are lecture attendance, time spent revising and exam anxiety.

Step 2: What type of data do you have?

The type of data you have affects which statistical tests are available to you, as some can only be used for categorical data whereas others can be used for continuous data. Here are some definitions of data types.

Nominal data: Uses categories with no natural order e.g. gender, eye colour, nationality

Ordinal data: Ranked categories e.g. strongly agree, agree, neutral, disagree, strongly disagree

Interval data: Continuous data that can be placed on a scale with equal distances between values e.g. test scores

Ratio data: Similar to interval data but with a true zero, where 0 means none of the quantity e.g. time, distance (0s = no time has elapsed)
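To see why the data type matters, here is a minimal Python sketch of which summary makes sense for each type. The responses are invented purely for illustration, and the variable names are our own assumptions rather than anything from a real dataset.

```python
from statistics import mean, median_low, mode

# Hypothetical survey responses, invented for illustration only.
gender = ["girl", "boy", "girl", "girl", "boy"]                         # nominal
agreement = ["agree", "neutral", "strongly agree", "agree", "disagree"]  # ordinal
test_scores = [62, 71, 55, 80, 67]                                      # interval
revision_hours = [4.0, 6.5, 0.0, 9.0, 5.5]                              # ratio

# Nominal: categories have no order, so only counts and the mode make sense.
most_common_gender = mode(gender)

# Ordinal: categories are ranked, so a median is meaningful.
# Map each label to its rank, then take the middle response.
scale = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
ranks = [scale.index(answer) for answer in agreement]
median_agreement = scale[median_low(ranks)]

# Interval and ratio: equal distances between values, so a mean is meaningful.
mean_score = mean(test_scores)
mean_hours = mean(revision_hours)

print(most_common_gender, median_agreement, mean_score, mean_hours)
```

Notice that there is no sensible "mean gender", and that the ordinal median only works because the categories have an agreed order.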

Hands over pile of notes and smartphone, highlighting information
Photo Credits: Firmbee, May 29 2015

Step 3: Is your data parametric?

Data is parametric when it has no significant outliers and your dependent variable is normally distributed with equal variances across groups. You can check this by looking at the kurtosis, skewness and variance in the descriptive statistics, and by looking at the graphs: do they roughly look like a bell-shaped curve? You can also test these assumptions formally. Use the Shapiro-Wilk test of normality to check the distribution and Levene’s test for homogeneity of variances to check that the variances are equal. If your data fails these assumptions, that’s fine, but it means you’ll need to use the non-parametric versions of the tests. Non-parametric data is data that isn’t normally distributed or doesn’t have homogeneity of variance.

Also bear in mind that parametric tests should only be used on interval/ratio data (and only if the assumptions above are also met), whereas non-parametric tests should be used on nominal/ordinal data.
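If you’d like to see what the descriptive checks in this step involve, here is a minimal Python sketch using only the standard library. The helper names (`skewness`, `excess_kurtosis`) and the two small samples are our own assumptions for illustration. The formal Shapiro-Wilk and Levene’s tests need a statistics package and are not reproduced here.

```python
from statistics import fmean, pvariance, variance

def skewness(values):
    """Population skewness: 0 for a perfectly symmetric sample."""
    m = fmean(values)
    sd = pvariance(values, m) ** 0.5
    return fmean([((x - m) / sd) ** 3 for x in values])

def excess_kurtosis(values):
    """Population kurtosis minus 3, so a normal distribution scores about 0."""
    m = fmean(values)
    sd = pvariance(values, m) ** 0.5
    return fmean([((x - m) / sd) ** 4 for x in values]) - 3.0

# Hypothetical hours of social media use per day, invented for illustration.
boys = [2.0, 3.0, 4.0, 5.0, 6.0]
girls = [1.0, 3.0, 4.0, 5.0, 7.0]

for name, sample in (("boys", boys), ("girls", girls)):
    print(name, "skewness:", skewness(sample),
          "excess kurtosis:", excess_kurtosis(sample))

# A ratio far from 1 hints that the groups may not have equal variances.
variance_ratio = variance(girls) / variance(boys)
print("variance ratio:", variance_ratio)
```

Skewness and excess kurtosis close to 0 are consistent with a bell-shaped curve, but with samples this small they are only a rough guide, so treat them alongside your graphs rather than instead of them.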

Step 4: What are you trying to show?

Before you can begin analysing your data, you need to think about what you’re looking for. Are you measuring group differences, or are you looking for the relationship between variables? Use your research questions to decide: break down the key words to work out exactly what you’re investigating. For example, the research question “Does social media usage differ between adolescent boys and girls?” is examining social media usage in group 1 (boys) vs group 2 (girls), so it is looking at group differences. In contrast, the research question “Does lecture attendance, time spent revising and exam anxiety predict exam performance?” is investigating the relationship between 3 predictors (lecture attendance, time spent revising and exam anxiety) and 1 outcome variable (exam performance).

Step 5: What type of experimental design did you use?

A between-subjects design involves data that is independent (or unpaired). This is where the sets of data are from different groups of people. If the same individuals are in multiple conditions but at different points in time, it is described as a within-subjects design and involves repeated measures (or paired data). For example, are you investigating the effect of a drug on people with one condition vs people with another condition (unpaired), or are you investigating the effect of a drug on one group of people at different times, or at different dosages (paired)? It is also possible to have a mixed design where both within-subjects and between-subjects factors are measured.
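To show why the design matters, here is a minimal Python sketch that computes a t statistic two ways for the same numbers. The function names and the scores are hypothetical, the unpaired version is Welch’s form of the t statistic (which does not assume equal variances), and p-values are left out because they need a statistics package.

```python
from math import sqrt
from statistics import fmean, variance

def independent_t(group_a, group_b):
    """Welch's t statistic for two unrelated groups (between-subjects)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (fmean(group_a) - fmean(group_b)) / se

def paired_t(before, after):
    """Paired t statistic: analyses each person's own change (within-subjects)."""
    if len(before) != len(after):
        raise ValueError("paired data needs one 'before' and one 'after' value per person")
    diffs = [a - b for b, a in zip(before, after)]
    return fmean(diffs) / (sqrt(variance(diffs)) / sqrt(len(diffs)))

# Hypothetical scores, invented for illustration only.
before = [10, 12, 14, 16, 18]
after = [11, 14, 15, 18, 19]

# Same people measured twice: compare each person with themselves.
print("paired t:", paired_t(before, after))

# Treating the same numbers as two unrelated groups ignores that pairing.
print("independent t:", independent_t(after, before))
```

With these made-up numbers the paired statistic is much larger, because every person improved by a similar amount even though people differ a lot from each other. Analysing paired data as if it were unpaired would hide that pattern.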

Flowchart depicting which statistical test to use based on type of question

Once you’ve worked through the above steps and have decided exactly what you’re looking for, you can begin to choose which test to use. Use the flowchart above to decide. There are some really great YouTube tutorials demonstrating how to perform these tests in SPSS or GraphPad, as well as some handy guides that are specific to the software, so have a google for help on how to perform your chosen tests.
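If it helps to see the decision written out, here is a rough Python sketch of how your answers to the steps above can be turned into a test name. It is not a copy of our flowchart: the function `suggest_test`, its parameters and the mapping are our own illustration based on commonly taught pairings, and it doesn’t cover mixed designs.

```python
def suggest_test(goal, parametric, design="between", groups=2, predictors=1):
    """Return the name of a commonly taught test for the given answers.

    goal: "difference" (group differences) or "relationship"
    parametric: True if the Step 3 assumptions hold
    design: "between" (unpaired) or "within" (paired / repeated measures)
    groups: number of groups or conditions being compared
    predictors: number of independent variables (relationship questions)
    """
    if goal == "relationship":
        if predictors > 1:
            # Regression has its own assumptions, which you should check separately.
            return "multiple regression"
        return "Pearson correlation" if parametric else "Spearman correlation"
    if goal != "difference":
        raise ValueError("goal must be 'difference' or 'relationship'")
    if design not in ("between", "within"):
        raise ValueError("design must be 'between' or 'within'")

    # Keys are (design, exactly two groups?, parametric?).
    table = {
        ("between", True, True): "independent-samples t-test",
        ("between", True, False): "Mann-Whitney U test",
        ("within", True, True): "paired-samples t-test",
        ("within", True, False): "Wilcoxon signed-rank test",
        ("between", False, True): "one-way ANOVA",
        ("between", False, False): "Kruskal-Wallis test",
        ("within", False, True): "repeated-measures ANOVA",
        ("within", False, False): "Friedman test",
    }
    return table[(design, groups == 2, parametric)]

# The two research questions used as examples in this guide.
print(suggest_test("difference", parametric=True, design="between", groups=2))
print(suggest_test("relationship", parametric=True, predictors=3))
```

Treat the result as a starting point for your reading rather than a final answer, and check the assumptions of whichever test you end up using.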
