A review of
“Large Language Models in Teaching and Learning: Reflections on Implementing an AI Chatbot in Higher Education”
📄 Full paper: https://arxiv.org/pdf/2603.17773
The idea in one line
AI can make learning faster and more accessible.
It does not guarantee that people actually understand more.
This paper is valuable because it moves away from speculation. Instead of asking what AI could do in education, the authors implemented it inside a real course and observed what actually happens.
Who is behind this research
The study was conducted by
Fiammetta Caccavale, Carina L. Gargalo, Julian Kager, Magdalena Skowyra, Steen Larsen, Krist Gernaey, and Ulrich Krühne, all affiliated with DTU (Technical University of Denmark), with a focus on engineering and applied sciences.
What they actually did
They built a chatbot called ChatGMP, designed to simulate the role of a company in a learning exercise. Instead of students interacting with a human instructor acting as the company, they interacted with the AI.
The system used retrieval-augmented generation, meaning it could access course-specific material and ground its answers in the content students were supposed to learn.
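To make the retrieval-augmented setup concrete, here is a minimal sketch in Python. It is not ChatGMP's implementation, which this review does not detail. The course snippets, function names, and prompt wording are hypothetical, and simple bag-of-words cosine similarity stands in for the embedding search a production system would typically use. The call to the language model itself is left out.

```python
import math
import re
from collections import Counter

# Hypothetical course snippets (assumption); ChatGMP's real knowledge base is not described here.
COURSE_DOCS = [
    "GMP requires that every batch record is reviewed before product release.",
    "A deviation must be documented and investigated with a root cause analysis.",
    "Cleaning validation shows that equipment cleaning removes residues to acceptable limits.",
]


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words term counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k course snippets most similar to the question."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: list[str]) -> str:
    """Assemble a prompt grounded in retrieved course material (hypothetical wording)."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs))
    return (
        "You are playing the role of the company in a course exercise.\n"
        "Answer using only the course material below. If it is not covered, say so.\n"
        f"Course material:\n{context}\n\nStudent question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("How should a deviation be investigated?", COURSE_DOCS))
```

Grounding answers this way narrows the room for invented content but does not eliminate it, since the model can still misread or over-extend the retrieved passages.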
To evaluate the impact, the researchers designed a controlled experiment where students experienced both types of interaction: human and AI. They collected data through surveys, performance analysis, and behavioral observation.
What they found
Students found the AI easier to interact with. It was always available, responded instantly, and removed the pressure that often comes with asking questions in front of a human. That alone increased engagement. Students were more willing to explore and ask questions.
But when the task required deeper reasoning, the picture changed. Students still leaned toward human interaction when they needed clarity, judgment, or confirmation that their understanding was correct.
The AI was helpful, but it was not fully trusted.
Another issue is accuracy. The paper highlights that AI can produce answers that sound correct but are not. In a learning environment, that creates risk because students may not have enough context to challenge those answers.
Over time, this can lead to a subtle shift: students rely more on the system but may engage less with the underlying concepts.
The real insight
The most important takeaway is not about performance metrics. It is about behavior.
AI reduces friction. It makes it easier to move forward, easier to get answers, and easier to stay engaged. But learning is not just about moving faster. It requires effort, reflection, and sometimes confusion.
When AI removes too much friction, it can also reduce the depth of thinking. This creates a tradeoff. Efficiency goes up, but cognitive engagement can go down.
The same dynamic is already visible inside companies. Employees using AI tools complete tasks faster and with less effort. But the risk is similar. If the system becomes the default source of answers, people may stop questioning or validating what they receive.
Over time, that affects decision quality.
The paper is essentially a small-scale version of what is happening at the organizational level.
What leaders should take from this
AI should not be evaluated only on speed or productivity. It should be evaluated on whether it improves the quality of thinking and decisions.
That requires designing systems where AI supports reasoning instead of replacing it. It also requires training people to question outputs, not just use them.
Without that layer, AI can create efficiency while quietly weakening understanding.
Frequently Asked Questions
What does this mean for a Chief AI Officer?
This research signals that deploying AI in knowledge-intensive environments requires measuring actual learning outcomes, not just user satisfaction. That distinction will shape your ROI conversations with the board. You will need to design pilots that separate improved access from improved comprehension, because stakeholders will expect evidence that AI investments produce measurable competency gains rather than convenience alone.
Should we replace human instructors with AI chatbots in our corporate training programs?
Not yet. The DTU study found that students found the AI easier to interact with, yet they still turned to humans when they needed deeper reasoning, and ease of use does not show that they learned better. That gap is what your organization needs to measure before scaling. Before replacing human instructors, run a controlled pilot in one department that tracks both engagement metrics and actual performance, to see whether ease of use translates into skill retention in your specific context.
How can our organization assess whether AI in learning is actually working?
The DTU team’s approach, which combines surveys, performance analysis, and behavioral observation, provides a template your L&D team should adopt before rolling out AI tutoring broadly. Organizations assessing AI for learning should establish baseline comprehension metrics before deployment, then measure whether learners who used the AI chatbot show deeper understanding on assessments that require application and reasoning, not just knowledge recall. Silicon Valley Certification Hub offers frameworks for validating AI learning outcomes that align with industry competency standards, ensuring your investment builds measurable organizational capability.
What should we do in the next 90 days if we want to pilot AI in our learning programs?
Design a small controlled experiment with a single course or training module in which half your learners use an AI chatbot and half receive traditional instruction, then measure comprehension through practical assessments rather than satisfaction surveys. At the same time, define what “better learning” means for your organization, whether that is faster time-to-competency, higher retention rates, or improved job performance, because this research shows that improved access does not automatically translate into business outcomes.
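A pilot like this can be analyzed as a simple two-arm comparison. The Python below is a minimal sketch under stated assumptions: learner names and scores are hypothetical, every learner takes the same practical assessment, and Welch's t-test (which does not assume equal variance between groups) is one common choice of comparison, not a method taken from the paper.

```python
import math
import random
import statistics


def assign_groups(learners: list[str], seed: int = 7) -> tuple[list[str], list[str]]:
    """Randomly split learners into an AI-chatbot arm and a traditional-instruction arm."""
    shuffled = list(learners)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic for mean(a) - mean(b) and its Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    if va + vb == 0:
        raise ValueError("both groups have zero variance; t is undefined")
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical learners; not data from the paper.
    learners = ["ana", "ben", "chen", "dara", "eli", "fay", "gus", "hana"]
    ai_arm, traditional_arm = assign_groups(learners)
    print("AI arm:", ai_arm, "| traditional arm:", traditional_arm)
    # Hypothetical practical-assessment scores (0-100) collected after the module.
    ai_scores = [72.0, 68.0, 75.0, 70.0]
    trad_scores = [70.0, 66.0, 71.0, 69.0]
    t, df = welch_t(ai_scores, trad_scores)
    print(f"Welch t = {t:.2f}, df = {df:.1f}")
```

With SciPy installed, `scipy.stats.ttest_ind(ai_scores, trad_scores, equal_var=False)` runs the same Welch test and also returns a p-value. With pilot groups this small, a non-significant result says little either way, so treat the numbers as a prompt for closer review rather than a verdict.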