At Internshala Trainings, our online training platform that helps learners acquire new-age skills, we have been grappling with the question of how to measure the effectiveness of our trainings for learners.
We wanted to define a single metric that would help us answer the question ‘How many of my learners really learned how much of what I wanted to teach?’ and become our guiding star to continuously improve the quality of our individual trainings and the platform.
In a classroom training, this can be measured via assessment scores and marks in practicals; doing so in an online environment, however, poses many challenges. While MOOCs have been around for more than a decade now, the most commonly used metrics are completion rates and user ratings, which only partially and indirectly answer the above question. We wanted to develop a metric which goes to the heart of learning, is scientifically more rigorous, and can differentiate between what is merely entertaining and what is engaging & effective.
With a bit of research, a bit of tinkering with an existing framework, and marrying it with another scientific concept, we came up with a framework that we call the ‘Internshala Model of Online Training Evaluation’ and a metric called the ‘Learning Effectiveness Index (LEI)’. This article details the framework and the metric at a conceptual level, with the objective of inviting comments, reactions, and feedback from the community.
1. Starting point – We started with the Kirkpatrick Model for Training Evaluation, which was developed in the 1950s to evaluate the effectiveness of instructor-led industrial training programs. This model evaluates a training program on 4 levels –
Level 1 – Reaction: The degree to which participants find the training favorable, engaging, and relevant to their job-related goals, i.e. did they like it?
Level 2 – Learning: The degree to which participants acquire the intended knowledge, skills, attitude, confidence, and commitment based on their participation in the training, i.e. did they learn the subject?
Level 3 – Behavior: The degree to which participants apply what they learned during the training on the job, i.e. can they apply the acquired knowledge independently outside the training environment?
Level 4 – Results: The degree to which targeted program outcomes occur and contribute to the organization’s highest-level result, i.e. did they achieve the larger goal for which they did this training (such as finding a job, getting a raise, becoming more productive, etc.)?
We particularly liked this framework because it goes beyond just user ratings and makes a valid attempt to measure the learning outcome, which is the true purpose of a training.
2. Developing instruments to measure each of the levels – Once we had clarity on what we wanted to measure, it was easier to think about ways in which it could be measured in our context. We came up with the following –
To measure Reaction – We used our existing feedback form, which every learner fills after completing the training, rating us on a 5-star scale. We made some tweaks to ask for feedback on the overall experience as well as on individual components such as videos, exercises, support, etc., to get more granular data on what is working and what isn’t. So, if every learner rates each of our trainings 5 stars, the LEI score for this level would be 100%; if they rate us 4, it would be 80%; and so on.
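As a minimal sketch of that conversion (the function name and the straight star-to-percentage mapping are our own illustration, not a published spec):

```python
def reaction_score(ratings, max_stars=5):
    """Convert 5-star feedback ratings into a 0-100% Reaction score (illustrative sketch)."""
    if not ratings:
        return 0.0
    return sum(ratings) / (len(ratings) * max_stars) * 100

# Example: three learners rate a training 5, 4 and 4 stars
print(round(reaction_score([5, 4, 4]), 1))  # 86.7
```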
To measure Learning – We use the final test that every learner takes upon completing the training. One significant change we made was to account for prior knowledge via a pre-test that the learner takes before the start of the training; we then measure what % of the knowledge gap the training was able to cover. So, if you scored 30 in the pre-test and 70 in the final test, the training helped you bridge 40/70, i.e. ~57%, of the gap.
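A sketch of the gap-coverage calculation, assuming both tests are scored out of 100 (the function name and edge-case handling are ours):

```python
def learning_score(pre_score, post_score, max_score=100):
    """% of the knowledge gap (max_score - pre_score) closed by the training."""
    gap = max_score - pre_score
    if gap <= 0:
        return 100.0  # learner already scored full marks on the pre-test
    return max(post_score - pre_score, 0) / gap * 100

# Example from the article: pre-test 30, final test 70 -> 40/70 ~= 57%
print(round(learning_score(30, 70), 1))  # 57.1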
To measure Behavior – Each of our trainings comes with a project (say, developing an e-commerce website in a Web Development training) that a learner is expected to work on independently and submit during the training. For each project problem statement, we are developing a rubric with various parameters (say, database design, modularity of the code, functionality, etc.) carrying different weights, against which each submission will be evaluated and given a final score.
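A rough sketch of such a weighted-rubric score; the parameter names and weights below are invented for illustration, not our actual rubric:

```python
# Hypothetical rubric for a Web Development project (weights sum to 1)
rubric = {
    "database_design": 0.30,
    "code_modularity": 0.30,
    "functionality":   0.40,
}

def behavior_score(parameter_scores, rubric):
    """Weighted average of per-parameter scores (each 0-100)."""
    return sum(rubric[p] * parameter_scores[p] for p in rubric)

# Example submission scored by an evaluator
scores = {"database_design": 70, "code_modularity": 80, "functionality": 90}
print(behavior_score(scores, rubric))  # 81.0
```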
And finally, to measure Results – We developed a pre and post survey of expectations. In the pre-survey, we ask a learner what result she expects after completing the training (this could range from ‘getting an internship in this field’ and ‘good marks in an exam’ to ‘building a project for myself’, etc.). We reach out to the same set of learners after 6 months (because results like exam marks or an internship may not materialize immediately) and ask them to what extent they were able to achieve the results they had expected.
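We have not spelled out here exactly how the survey responses roll up into a score; one simple assumption is that each respondent reports the fraction of her expected result that was achieved, averaged across respondents:

```python
def results_score(achievements):
    """Average self-reported achievement (each a fraction 0-1), as a 0-100% Results score.
    Purely illustrative -- the aggregation rule is an assumption, not the article's spec."""
    if not achievements:
        return 0.0
    return sum(achievements) / len(achievements) * 100

# Example: three learners report full, half and no achievement of their expected result
print(results_score([1.0, 0.5, 0.0]))  # 50.0
```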
3. Assigning weights to each of the levels using AHP – One key question that is not answered by the original Kirkpatrick model is the relative importance of each of the levels in arriving at an overall score for the training. For example, if one training scores 80% on Reaction and 40% on Results vs. another that scores 60% on Reaction and 50% on Results – which of the two is better overall? This was particularly important for us to answer since we wanted to come up with a single metric (LEI).
We solved this by using a beautiful decision-making framework called the Analytic Hierarchy Process (AHP). At the heart of AHP is pairwise comparison, the argument being that it is always easier to answer which of two options you feel (subjective) or know (objective) is more important, and by how much, than to compare multiple items together. The outcomes of the pairwise comparisons are then combined to arrive at relative weights. This worked-out example may make it easier for you to understand how AHP works.
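To make the mechanics concrete, here is a minimal sketch of how a pairwise comparison matrix turns into weights, using the standard column-normalisation approximation of AHP; the comparison values below are made up for illustration and are not our actual judgements:

```python
import numpy as np

# Pairwise comparison matrix over the 4 Kirkpatrick levels
# (Reaction, Learning, Behavior, Results). Entry [i][j] says how many
# times more important level i is judged to be than level j.
A = np.array([
    [1,   1/3, 1/5, 1/7],
    [3,   1,   1/3, 1/5],
    [5,   3,   1,   1/3],
    [7,   5,   3,   1  ],
])

# Normalise each column by its sum, then average across each row --
# a common shortcut for approximating the principal eigenvector.
weights = (A / A.sum(axis=0)).mean(axis=1)

print(dict(zip(["Reaction", "Learning", "Behavior", "Results"], weights.round(3))))
```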
We got all the stakeholders (product, content, subject matter experts, customer delight team, marketing, etc.) in a room, asked them to rank the 4 levels relative to each other using AHP, and aggregated their responses to arrive at the weights for each of the levels that we would use for Internshala Trainings. I will keep the actual weights we came up with a secret 🙂
4. Bringing it all together to define training- and platform-level LEI – So now we know what to measure and how. We also know the weight of each of the levels. The last thing that remains is to multiply these together with training completion rates at the different levels (not everyone who starts the training will get to the test stage, fewer will submit the project, and a minuscule % will take part in the post-expectations survey) to come up with the Learning Effectiveness Index (LEI) for that training. Aggregating the LEI numbers across different trainings (weighted by enrollment numbers) gives a platform-level LEI which, hopefully, truly answers ‘How many of my learners really learned how much of what I wanted to teach?’ and will become our guiding star to continuously improve the quality of our individual trainings and the platform as a whole :).
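Putting the pieces together, here is a sketch of the calculation as described above; the weights, level scores, completion rates, and enrollment numbers are placeholders, and the exact combination rule is a simplified reading of the framework rather than a definitive formula:

```python
def training_lei(level_weights, level_scores, completion_rates):
    """LEI for one training: sum over levels of weight * level score * completion rate.
    All inputs are fractions between 0 and 1."""
    return sum(w * s * c for w, s, c in zip(level_weights, level_scores, completion_rates))

def platform_lei(trainings):
    """Enrollment-weighted average of training-level LEIs.
    `trainings` is a list of (enrollments, lei) pairs."""
    total = sum(n for n, _ in trainings)
    return sum(n * lei for n, lei in trainings) / total

# Made-up numbers for two trainings (order: Reaction, Learning, Behavior, Results)
weights = [0.10, 0.25, 0.30, 0.35]
lei_a = training_lei(weights, [0.85, 0.57, 0.81, 0.60], [0.90, 0.70, 0.40, 0.10])
lei_b = training_lei(weights, [0.80, 0.65, 0.75, 0.55], [0.85, 0.60, 0.35, 0.08])

print(round(platform_lei([(5000, lei_a), (2000, lei_b)]), 3))
```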
5. Limitations and next steps – This, of course, is a work in progress, and we expect to encounter new challenges as we gradually implement it platform-wide. One key limitation is that neither the tests nor the project submissions are invigilated. Another is the design of the test, where we currently ask only multiple-choice objective questions. I hope to keep updating this thread as we get more data after implementing it across the first few programs.