Article: Evaluating the use of BERT and Llama to analyse classroom dialogue for teachers' learning of dialogic pedagogy
| Title | Evaluating the use of BERT and Llama to analyse classroom dialogue for teachers' learning of dialogic pedagogy |
|---|---|
| Authors | Wang, Deliang; Chen, Gaowei |
| Keywords | APT; artificial intelligence; classroom dialogue; large language model; teacher learning |
| Issue Date | 12-May-2025 |
| Publisher | Wiley |
| Citation | British Journal of Educational Technology, 2025 |
| Abstract | Classroom dialogue is crucial for effective teaching and learning, prompting many professional development (PD) programmes to focus on dialogic pedagogy. Traditionally, these programmes rely on manual analysis of classroom practice, which delays feedback to teachers. To address this, artificial intelligence (AI) has been employed for rapid dialogue analysis. However, AI models have seldom been applied in practice, and research has tended to prioritise state-of-the-art performance over educational impact. This study explores whether higher accuracy in AI models translates into better educational outcomes. We evaluated the performance of two language models, BERT and Llama3, in dialogic analysis and assessed how their performance difference affected teachers' learning within a PD programme. After fine-tuning BERT and engineering prompts for Llama3, we found that BERT achieved substantially higher accuracy in analysing dialogic moves. Sixty preservice teachers were randomly assigned to either a BERT group or a Llama3 group, and both groups took part in a workshop on the academically productive talk (APT) framework. The BERT group used the fine-tuned BERT model to support their learning, while the Llama3 group used the Llama3 model. Statistical analysis showed significant improvements in both groups' knowledge of, and motivation to learn, the APT framework, and both groups reported high levels of satisfaction. Notably, no significant differences were found between the two groups in posttest knowledge, motivation, or satisfaction. Interviews further elucidated how both models facilitated teachers' learning of the APT framework. This study validates the use of AI in teacher training and is among the first to investigate the relationship between AI accuracy and educational outcomes. Practitioner notes. What is already known about this topic: Given the significance of classroom dialogue, many teacher professional development programmes focusing on dialogic pedagogy have been implemented. To provide timely feedback to teachers, AI techniques are increasingly used to analyse classroom dialogue. However, only a small proportion of studies have investigated the impact of AI models in practice, with most focusing on pursuing state-of-the-art performance. It is unclear whether more accurate AI models necessarily lead to more positive educational outcomes. What this paper adds: This study evaluated the performance of two AI-powered language models, BERT and Llama3, in dialogic move analysis through fine-tuning and prompt engineering, respectively. BERT exhibited significantly higher accuracy than Llama3. Through an experimental study, this paper revealed that teachers using either the more accurate BERT model or the less accurate Llama3 model showed substantial improvements in their knowledge of, and motivation to learn, the APT framework and reported high levels of satisfaction. The performance difference between BERT and Llama3 did not lead to significant differences in teachers' knowledge, learning motivation, or satisfaction while learning the APT framework. Implications for practice and/or policy: Deep learning models and large language models can be integrated into professional development programmes to effectively facilitate teachers' learning of dialogic pedagogy. AI models with moderate performance can also produce impressive outcomes and provide a satisfactory experience. In some scenarios, the manner in which teachers collaborate with AI may be more pivotal than the AI's accuracy. |
| Persistent Identifier | http://hdl.handle.net/10722/361951 |
| ISSN | 0007-1013 (print); 1467-8535 (online) |
| Journal metrics | 2023 Impact Factor: 6.7; 2023 SCImago Journal Rank (SJR): 2.425 |

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Wang, Deliang | - |
| dc.contributor.author | Chen, Gaowei | - |
| dc.date.accessioned | 2025-09-17T00:32:16Z | - |
| dc.date.available | 2025-09-17T00:32:16Z | - |
| dc.date.issued | 2025-05-12 | - |
| dc.identifier.citation | British Journal of Educational Technology, 2025 | - |
| dc.identifier.issn | 0007-1013 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/361951 | - |
| dc.language | eng | - |
| dc.publisher | Wiley | - |
| dc.relation.ispartof | British Journal of Educational Technology | - |
| dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
| dc.subject | APT | - |
| dc.subject | artificial intelligence | - |
| dc.subject | classroom dialogue | - |
| dc.subject | large language model | - |
| dc.subject | teacher learning | - |
| dc.title | Evaluating the use of BERT and Llama to analyse classroom dialogue for teachers' learning of dialogic pedagogy | - |
| dc.type | Article | - |
| dc.description.nature | published_or_final_version | - |
| dc.identifier.doi | 10.1111/bjet.13604 | - |
| dc.identifier.scopus | eid_2-s2.0-105004850753 | - |
| dc.identifier.eissn | 1467-8535 | - |
| dc.identifier.issnl | 0007-1013 | - |
