Developing University Students’ Statistical Reasoning via a Research-informed Course

Statistical reasoning (SR) is an essential skill for students learning statistics. This study aims to develop university students' SR through an appropriately designed statistics course. Employing a quasi-experimental design, the study evaluated changes in students' SR as they participated in a research-informed statistics course designed to integrate statistical reasoning. The results showed that the course helped students improve their SR, with the mean score of the experimental class increasing by 23% from the pre-test to the post-test. Specifically, the SR scores of the experimental class increased by 8% on measures of central tendency and by 34% on measures of variability. Suggestions about curriculum and teaching are included, such as offering opportunities for statistical investigations with real data; going deeper into building and understanding formulas; training lecturers to teach statistics using approaches that differ from traditional teaching methods; and evaluating students' statistical learning with reliable tools.


INTRODUCTION
Statistics is an important part of the curriculum; however, learners encounter many difficulties when learning statistics, especially with statistical reasoning (SR). Some research in psychology has documented the error of "outcome-oriented" thinking, which means using probability models to decide whether an individual event will occur rather than considering a series of events (Konold, 1989). Later work by Konold and colleagues documented inconsistencies in students' reasoning when they answered the same assessment questions, suggesting that the context of a problem can influence whether students use reasoning or intuitive strategies (see Konold et al., 1997). Thus, people of all ages, even some experienced researchers, can make erroneous inferences about statistical ideas, and such misconceptions are difficult to change.
Although SR is a necessary skill for students, there is a scarcity of research on this topic in Vietnam. A study by Nguyen and Vu (2015) shows that high school students' statistical reasoning is still limited, especially regarding the ability to reason with data representation and data distribution. The reason is that teaching and learning statistics in high schools focuses only on computational processes and disregards reasoning. In addition, students have few opportunities to practice statistical surveys, so they have difficulty applying statistical reasoning to solve problems arising from real life. Another study by Tran (2017) developed questions to probe the statistical literacy, reasoning, and thinking of university students in medical and pharmaceutical programs when estimating confidence intervals, in order to design better teaching and assessment activities. In 2021, Hoang also focused on training statistical literacy, reasoning and thinking for high school students to meet the requirements of the 2018 Mathematics Education Program. Based on students' level of completion, the teacher can differentiate or evaluate their level of statistical awareness. Recently, research by Pham and Tran (2023) on developing statistical reasoning for high school students through practical problems shows that a statistics teaching activity can simultaneously develop all three elements of statistical literacy, reasoning and thinking.
In summary, the above studies mainly propose tasks for developing students' statistical reasoning. However, these studies have not yet used experimental or quasi-experimental methods to evaluate the effectiveness of their programs. In addition, most studies focus on high school students, and only a few on university students. Therefore, we conducted this study using a quasi-experimental method with university students to evaluate the changes in their statistical reasoning.

Statistical reasoning
Statistical reasoning (SR) is the way people reason with statistical ideas and make sense of statistical information. Statistical reasoning may involve connecting one concept to another, or it may connect ideas about statistics and probability. Statistical reasoning also means understanding and being able to explain statistical processes and statistical results (Garfield, 2002). Based on the definitions by Garfield and Ben-Zvi (2008) and delMas (2004), Sabbag (2017) summarized that SR evaluates students' ability to make connections between statistical concepts, create mental representations of statistical problems, and explain relationships among statistical concepts. SR often entails more than one statistical concept and requires students to make connections between them.

Developing students' statistical reasoning
Researchers have long been interested in teaching and learning statistics in undergraduate classrooms (e.g., Zieffler, 2006). Few studies among college students have examined specific activities or interventions; others explored the use of technological tools or teaching methods (e.g., Noll, 2007). Some statisticians who teach statistics have focused on studying student learning in their classes (e.g., Chance, 2002; Lee et al., 2002; Zeleke & Wachtel, 2002; Wild et al., 1997). Most of these studies involve the researcher's own classroom, which may be a single class or a combination of classes within the same school.

Statistical reasoning learning environment
Based on a review of relevant research, Ben-Zvi et al. (2018) provided a framework of factors to consider when designing effective and active statistics learning environments that develop learners' statistical reasoning. These principles include:
• Focus on developing understanding of statistical ideas and concepts rather than presenting statistics as tools and processes;
• Use well-designed tasks to develop statistical reasoning, promoting student learning through collaboration, interaction, discussion and interesting problem solving;
• Use real data to engage students in making and testing conjectures based on data analysis;
• Integrate appropriate technology tools into statistics teaching to allow students to test conjectures, explore and analyze data, and develop statistical reasoning;
• Promote statistical reasoning and the exchange of important statistical ideas in the classroom;
• Use assessments to monitor students' statistical learning and to evaluate instructional plans and progress.

Using technology to develop statistical reasoning
Technology is an important tool in exploring data, performing statistical analysis, and helping learners visualize abstract concepts. Using technology in the statistics classroom can help learners spend more time on statistical reasoning by reducing the need to perform calculations or draw graphics. Students can undertake statistical projects and statistical investigations. In addition to data exploration, technology is also used to explore complex statistical ideas or processes through simulation. Using computer simulations can allow learners to visualize relationships. Computer software provides opportunities for students to learn about modeling, allowing students to build their own models to describe data and create explorable simulations (Pratt et al., 2011). Garfield and Franklin (2011) argue that teachers need to provide resources so that learners can use technology when learning statistics, or provide output from computer software when learners cannot access technology, allowing students to learn how to interpret and analyze data and thereby draw appropriate conclusions in a statistical context.

Effects of teaching and learning statistics in groups
Keeler and Steinhorst (1995), Giraud (1997) and Magel (1998) examined different cooperative learning methods in teaching statistics and found quite positive results. Keeler and Steinhorst (1995) found that when students worked in pairs, the results were often better, and more students continued to study compared to the previous semester. Giraud (1997) found that using collaborative groups to do exercises led to higher test results than lectures. Similarly, Magel (1998) found that implementing cooperative groups in a large class also improved test results compared to classes in the previous semester that did not use group work.
Meletiou and Lee (2002) organized their curriculum according to the Project - Activity - Collaborative Learning - Exercise model, emphasizing statistical reasoning combined with investigation of conjectures and discovery of results using data. Students were assessed on their understanding at the beginning and end of the course. Increased understanding was observed through tasks that require statistical reasoning, for example judging whether a data set could have been randomly drawn from a particular unit of inquiry.

Developing student reasoning in a statistics course
Zieffler (2006) focused on the development of students' reasoning about bivariate data through an introductory statistics course. He was interested in three research questions: (1) What is the nature, or pattern of change, in students' development in reasoning about bivariate data?; (2) Is the sequencing of bivariate data within a course associated with changes in the pattern of change in students' reasoning about bivariate data?; and (3) Are changes in students' reasoning about the foundational concepts of distribution associated with changes in the pattern of change in students' reasoning about bivariate data? To measure change in students' reasoning, a scale from the Assessment Resource Tools for Improving Statistical Thinking (ARTIST) project was administered to 113 students in four sections of a course, four times during the semester. Students' reasoning about distributions was also assessed four times during the course using ten items from the Comprehensive Assessment of Outcomes in a First Statistics course (CAOS). Data were analyzed using linear mixed-effects model methodology. He found that most students had developed this type of reasoning before formally studying bivariate data. That result may be explained by the course's development of general statistical reasoning, which helped students reason well about bivariate data before formally studying the topic. He suggests using similar, longer-term studies to model student growth throughout training. Scheaffer (2007) made the same suggestion in reporting on statistics in mathematics education research.
In Slauson's study (2008), the impact of hands-on experience compared to the traditional teaching method on the content of measures of variability (standard deviation, sampling distribution, standard error, confidence interval and margin of error) was considered in an introductory statistics course at the undergraduate level. The hands-on experience method focuses on students using existing knowledge to make predictions, collecting and analyzing data to test the predictions, and then evaluating the predictions based on the results. The study used the CAOS instrument to evaluate the results before and after the experiment. The results showed an improvement in statistical reasoning about standard deviation in the experimental group, but not in the control group. However, there was no statistically significant improvement in reasoning about sampling distributions for the experimental group. Besides, there was no improvement in understanding standard errors in either group, and some students scored lower on this topic in the post-test than in the pre-test. Qualitative data analysis shows that, to understand standard errors, students need to understand the relationship between sampling distributions and measures of variability, as well as the connection between the concepts of probability and variability.
Thus, studies focusing on the teaching and learning of statistics at the undergraduate level continue to point out the difficulties that students encounter in learning and using statistics, as well as suggest some modest successes.These studies also illustrate problems faced by university statistics instructors such as how to incorporate collaborative and active learning in a large class or how to choose an effective software tool.

Research methods and participants
This study uses a quasi-experimental method to evaluate the effectiveness of a research-informed statistics course in changing students' statistical reasoning. The research was conducted with 98 third-year students in two classes, 3A and 3B, of the Mathematics Education major at Hue University's College of Education. At the time of the research, the students had not enrolled in any statistics courses at the university level, so their statistical knowledge was what they had learned in high school. The two classes have similar ability and size (Class 3A: 48 students, Class 3B: 50 students), so Class 3B was randomly selected to become the control group, while Class 3A became the experimental group.

Research materials
We used the CAOS 4, a product of the statistics education research group (delMas et al., 2006) from the ARTIST project, funded by the US National Science Foundation (NSF). The CAOS instrument is designed to measure students' basic reasoning about descriptive statistics, probability, bivariate data, and basic conclusions of statistics. This tool has been tested three times with 1470 students and has a Cronbach's alpha reliability coefficient of 0.82 (delMas et al., 2007). Thus, this tool is reliable for assessing important learning outcomes in statistics.
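As an illustration of the reliability coefficient mentioned above, the following minimal sketch computes Cronbach's alpha from a matrix of item scores using the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the four-student score matrix are hypothetical and are not CAOS data; the sketch only shows how such a coefficient is obtained.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a score matrix: one row per student, one column per item."""
    k = len(rows[0])                                    # number of items
    item_vars = [variance(col) for col in zip(*rows)]   # sample variance of each item
    total_var = variance([sum(r) for r in rows])        # variance of students' total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 scores for four students on three items (illustration only).
demo_scores = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
print(round(cronbach_alpha(demo_scores), 2))  # 0.75
```

Values closer to 1 indicate that the items vary together, which is the sense in which a coefficient of 0.82 is read as good internal consistency.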
We selected only questions that test statistical reasoning, removing questions about statistical knowledge and keeping the questions most specific to the concepts taught in the course. We selected eight questions from the CAOS 4 test and adapted their contexts. Of those eight questions, we converted two into open questions that require students to explain the reasons for their choice. These questions ask students to explain the relationship between box plots and their statistical choices (Question 2), and between a given data series and their statistical choices (Question 5) (Figure 1). Thus, the pre-test and post-test on statistical reasoning include eight questions corresponding to two content areas:
- Questions assessing SR related to characteristics of measures of central tendency (Questions 3, 6, and 8). These questions require students to recognize the connection between statistical objects (charts and data tables, histograms) and characteristics of measures of central tendency, such as the mean, median or mode, thereby making the right choice for the situations raised in the question.
- Questions assessing SR related to characteristics of measures of variability (Questions 1, 2, 4, 5 and 7). These questions require students to recognize the connection between statistical objects (column charts, boxplots, data series) and characteristics of measures of variability, such as the variance and standard deviation, thereby making the appropriate choice for the situations mentioned in the question.
To evaluate students' SR regarding bivariate correlation relationships, we used a real-life problem as an introductory situation at the beginning of the lesson and reused it to test students' knowledge at the end of the third lesson (Figure 2).

VIETNAM JOURNAL OF EDUCATION
 73 

Figure 2. Situation on bivariate correlation relationships and linear regression equations

Research and data collection process
We conducted the research over five weeks of the first semester of the 2023-2024 school year, with two hours per week. In the first week, we introduced the project and students individually took the pre-test on SR. During the next three weeks, we sequentially taught lessons on measures of central tendency, measures of variability, and correlation coefficients and linear regression.
We taught the control class in a traditional way, that is, stating definitions and concepts, providing calculation procedures and formulas for students, and stating the meaning of measures, using the textbook Statistics and Applications by Dang Hung Thang (1999).
For the experimental class, the course was designed to develop students' SR, that is, informed by research such as Ben-Zvi and colleagues (2018) and others (Tran & Tarr, 2018; Tran et al., 2023a, 2023b). We focused on using visual representations, relationships between two or more statistical objects, and statistical investigations to help students understand statistical concepts and their meanings in real-world situations. In addition, we regularly presented situations that use real data and require students to use software (e.g., Excel) and websites that support statistical calculations (e.g., https://grapher.nz/). We divided each class, experimental and control, into 12 groups of 4 to 5 students to improve the effectiveness of the class.
In the last week, students individually took the post-test on SR with the same eight questions as the pre-test.


Data analysis
To analyze the data, we used quantitative and qualitative analysis methods. Quantitative data are the results of evaluating students' responses to the questions. Each response was coded 0 if it was incorrect and 1 if it was correct. We calculated the average score for each group of questions and for the entire test, and then compared the scores of the two classes, 3A and 3B. In particular, we compared the pre-tests of the two classes, and the pre-test and post-test of each class, using a t-test for two samples in SPSS.
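The scoring and comparison steps described above can be sketched as follows. This is a minimal illustration rather than the SPSS procedure used in the study: the names `total_score` and `pooled_t`, the answer key, and the two small samples are hypothetical. The sketch assumes equal variances and covers only the independent-samples form; the text does not state which variant was used for the pre-test/post-test comparison, so that case is not sketched. A p-value would additionally require the t distribution, which statistical software supplies.

```python
from math import sqrt
from statistics import mean, variance


def total_score(answers, key):
    """Code each response 1 if it matches the key and 0 otherwise, then sum."""
    return sum(1 for a, k in zip(answers, key) if a == k)


def pooled_t(sample_a, sample_b):
    """Student's t statistic and degrees of freedom for two independent
    samples, assuming equal population variances."""
    na, nb = len(sample_a), len(sample_b)
    pooled_var = ((na - 1) * variance(sample_a)
                  + (nb - 1) * variance(sample_b)) / (na + nb - 2)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical answer key and one student's responses (illustration only).
key = ["A", "B", "B"]
print(total_score(["A", "C", "B"], key))  # 2

# Hypothetical total scores for two small groups (illustration only).
t, df = pooled_t([5, 6, 7, 8], [3, 4, 5, 6])
print(round(t, 3), df)
```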
Qualitative data were collected through students' explanations for the two open questions. For example, in Question 5, which asks students to choose a route for Giang, students must know how to use measures of variability to consider the problem, thereby making the correct choice, rural roads. When analyzing students' answers, we found four levels of increasing SR as follows:
- Level 0: incorrect answer or no explanation;
- Level 1: correct answer but no explanation or incorrect explanation;
- Level 2: correct answer, explained with outliers, variability, certainty, standard deviation, … but without calculating from the data as proof;
- Level 3: correct answer, with measures of variability (mainly variance, standard deviation) calculated as proof.
In addition, we analyzed the work of the 12 groups in the experimental class on the tasks about the correlation coefficient and linear regression to evaluate their level of development of SR on bivariate relationships.

SR at the initial time of the two classes
To examine the SR of students in Classes 3A and 3B in the first week, a t-test for two independent samples was performed on the results of the SR pre-test. The result shows that there is no statistically significant difference in average scores between the two classes (t = -0.773; p = 0.443). In other words, these two classes had equivalent SR when starting the study (see Table 2).
This result also holds when comparing the average SR scores of the two classes in each question group (see Table 3). Thus, the two classes had equivalent SR at the beginning of the study.

Students' SR changes from the pre-test to post-test

Experimental class
 75  The mean SR scores on the measures of central tendency of the experimental class increased from 2.59 to 2.8 (maximum score of 3), while the average SR score on the measures of variability increased from 3.2 to 4.3 (maximum score of 5).The SR on the measures of variability of the experimental class increased more than the SR on the measures of central tendency.Specifically, regarding the SR on measures of central tendency, Class A increased by an average of 0.21 points, or 8%.Regarding the SR on the measure of variability, Class A increased by an average of 1.1 points, or 34%.In sum, the overall SR of Class A increased by 1.31 points, or 23%.Next, to evaluate the development of students' SR in the experimental class after participating in three lessons, we used a t-test for two samples to compare the results of the test before and after the experiment.The result shows that there was a statistically significant difference in the SR scores after the study ( 6.235; 0.000 tp == ).This improvement is also clearly shown in each group of questions corresponding to each lesson (Table 4).
Thus, with the two designed lessons, the students' SR on measures of central tendency and measures of variability has developed and this difference is statistically significant.

Control class
In the control class, 3B, the average SR score in each group of questions increased, but the increase was not statistically significant. The t-test for two samples between the pre-test and post-test on the SR of the control class shows no statistically significant growth (t = 1.580; p = 0.121) (see Table 5). Therefore, the traditional method did not appear to help students develop statistical reasoning, that is, to connect two or more statistical concepts.
The above results show that while the control class had no change in SR, the experimental class had a statistically significant development in SR.The following qualitative analysis will help illustrate the changes in SR of the experimental class while participating in the research.

SR on measures of variability in open questions
We also analyzed responses to Question 5 in the SR test, requiring students to choose the correct answer and justify their choice.
The pre-test and post-test responses to Question 5 were coded as shown in Table 6. In addition to the increase in correct answers (from 69.6% to 77.3%), the quality of answers also improved. Correct answers with incorrect explanations or no explanations were markedly reduced in the post-test. Students knew how to choose the correct answer with a reasonable explanation, although it was incomplete because they did not use calculations from the data as proof. In particular, 11.4% of students provided correct answers with accurate explanations.
The following is an example of the work of student VTL. This student could not provide an answer in the pre-test. In the post-test, however, this student chose the correct answer, even though the standard deviation of the travel times on the two routes had not been calculated.

Figure 3. VTL student's answer to Question 5
In another example, NQT chose the correct answer in the pre-test but did not provide an explanation. By the post-test, this student was able to provide the correct answer with a complete explanation.

SR in correlation and regression
The responses to Q1 at the beginning of the lesson showed that six groups disagreed with the statement (50%), one group did not answer (8.3%), and five groups agreed with the statement (41.7%).The groups who disagreed with the statement mainly used outlier examples to argue that foot length has nothing to do with height.

Figure 5. Group 3 pre-lesson answer to Q1
The groups who agreed with the statement reasoned either through observation (without supporting data) or by dividing the foot length data into groups and averaging the data within each group. The results show that the groups' SR related to correlation and regression was still low at the beginning of the lesson.

Figure 6. Group 5 pre-lesson response to Q1
At the end of the lesson, when answering Q1 again, all 12 groups agreed with the statement and constructed a linear regression line with a positive coefficient a to support their assertion. The results show that with carefully designed and appropriate lesson plans, students' SR related to bivariate relationships developed. All groups understood the meaning of correlation coefficients and regression equations and were thereby able to provide accurate answers and predictions for specific statistical situations.
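The kind of computation the groups carried out can be sketched as a least-squares fit of a line y = ax + b together with the Pearson correlation coefficient r. The foot length and height values below are invented for illustration and are not the data from Figure 2; the names `fit_line` and `predict` are likewise hypothetical.

```python
from math import sqrt
from statistics import mean


def fit_line(x, y):
    """Least-squares slope a, intercept b, and Pearson correlation r for y = a*x + b."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    r = sxy / sqrt(sxx * syy)
    return a, b, r


def predict(a, b, x_new):
    """Predicted y for a new x using the fitted line."""
    return a * x_new + b


# Hypothetical foot lengths (cm) and heights (cm), for illustration only.
foot = [22, 23, 24, 25, 26]
height = [150, 156, 160, 167, 170]

a, b, r = fit_line(foot, height)
print(f"height = {a:.1f} * foot + {b:.1f}, r = {r:.3f}")
print(f"predicted height for a 24.5 cm foot: {predict(a, b, 24.5):.1f} cm")
```

A positive slope a together with r close to 1 is the numerical counterpart of agreeing that longer feet tend to go with greater height, and the fitted line is what supports a prediction for a foot length that is not in the data.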

CONCLUSION
The results show that the experimental class, taught through a course designed to develop SR rather than through calculation procedures and formulas, developed its SR, and this result is statistically significant. Furthermore, the experimental class improved more than the control class, whose gain was not statistically significant. This result corroborates that of Slauson (2008), who found an improvement in SR about standard deviation in the experimental group but not in the control group. However, the current study focused on measures of central tendency and measures of variability other than standard deviation. There is also a similar development of SR related to bivariate correlation and linear regression equations. This result further strengthens the findings of Zieffler (2006) that a course or lesson plan that is appropriately and carefully designed helps students develop their SR. Compared with studies in Vietnam, the results of this study move beyond proposing learning tasks to develop students' SR by including real data. In addition, this study used a quasi-experimental method to evaluate the changes in students' SR.
The current research has several limitations. The test included only eight assessment questions, and other important statistical concepts and other relationships between two variables were not tested. In the future, researchers could expand the study by including more questions to evaluate student performance in other statistical topics and to observe the process of change in other areas more comprehensively.
Based on the findings of this study, we suggest that statistics teaching focus on developing SR. That is, courses need to be designed appropriately, with carefully planned activities conducted in groups. Instructors could refrain from teaching statistics by providing calculation procedures and formulas, and instead select activities that pique students' curiosity and interest by using more practical scenarios, especially situations that require connecting two or more statistical concepts. Through these situations, students can learn the meaning of concepts. In addition, instructors need to apply statistical software to reduce the burden of calculating and processing data for students, and use the software to provide visual images and charts that help students understand statistical concepts. To accomplish these goals, synchronous changes are needed. First, changes are needed in curriculum and textbooks (cf. Tran et al., 2016; Tran & Tarr, 2018) so that they offer opportunities for statistical investigations with real data. Second, it is necessary to reduce the memorization of formulas and focus instead on building and understanding them. Next, it is necessary to train lecturers to teach statistics using methods that differ from traditional teaching methods. These changes rely on teachers' professional knowledge to teach in a different way (Nguyen & Tran, 2023; Tran & O'Connor, 2023). The above changes will help students develop their SR and thereby prepare them for today's information-rich society.

Figure 1. Question 5

Figure 4. NQT student's answer to Question 5

Figure 7. Group 3 post-lesson answer to Q1
For Q2, seven groups predicted the heights based on observations without any data. The remaining five groups did not provide an answer.

Figure 8. Group 5 pre-lesson answer to Q2
However, after finishing the lesson, all 12 groups relied on the linear regression equation to predict height from a known foot length.

Figure 9. Group 5 post-lesson answer to Q2

Table 3. Means of the SR pre-test for each SR component

Table 4. Means of SR on pre-test and post-test of Class 3A

Table 5. Means of SR pre-test and post-test of Class 3B

Table 6. Distribution of answers according to the level of Class 3A