Resource Allocation Analytics

Our Customer - Peoples-uni

The Project - Which students are most likely to finish a course?

Project Brief

The People's Open Access Education Initiative (Peoples-uni) is a charity and needs to allocate its available resources as efficiently as possible. Its most valuable resource is teaching time. To make the most efficient use of this time, we need to know which students are most likely to complete a course successfully and which students might need targeted attention to help them complete it.


Because Peoples-uni is a charity, the project had tight budgetary constraints.

The Peoples-uni Open Online Courses web site was new, and the underlying Moodle database was sparsely populated at the time of the analysis.


Moodle Reporting

Moodle provides effective reporting on student activity and progress.

However, the interface is not designed to extract data in a form suited to predictive analytics. Fortunately, Moodle does provide an SQL reporting interface, which allows the flexibility to build any data extracts that may be required.

Reverse Engineering and SQL Data Extraction

There were 328 tables in the Moodle database at the time of the analysis.

An overview of the database schema was required, so MySQL Workbench was used to reverse engineer the modules covering enrollments and student demographic information.


This made it possible to write SQL scripts that could be loaded into Moodle to extract the data required for analysis.
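As an illustration of the kind of extract described above, the sketch below joins enrollment, user and completion records into one row per enrollment with a completed/not completed flag. It is not the script used in the project. The table and column names follow Moodle's default mdl_ naming as an assumption and should be checked against the actual schema; the in-memory SQLite database and the sample rows are hypothetical stand-ins for the live database.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for four Moodle tables.
# Real Moodle tables have many more columns; names here are assumptions.
SCHEMA = """
CREATE TABLE mdl_user (id INTEGER PRIMARY KEY, email TEXT, country TEXT);
CREATE TABLE mdl_enrol (id INTEGER PRIMARY KEY, courseid INTEGER);
CREATE TABLE mdl_user_enrolments (
    id INTEGER PRIMARY KEY, enrolid INTEGER, userid INTEGER);
CREATE TABLE mdl_course_completions (
    id INTEGER PRIMARY KEY, userid INTEGER, course INTEGER,
    timecompleted INTEGER);
"""

# One row per enrollment, with a 0/1 flag for course completion.
EXTRACT = """
SELECT u.id AS userid,
       u.email,
       u.country,
       e.courseid,
       CASE WHEN cc.timecompleted IS NOT NULL THEN 1 ELSE 0 END AS completed
FROM mdl_user_enrolments ue
JOIN mdl_enrol e ON e.id = ue.enrolid
JOIN mdl_user u ON u.id = ue.userid
LEFT JOIN mdl_course_completions cc
       ON cc.userid = u.id AND cc.course = e.courseid
ORDER BY u.id, e.courseid
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO mdl_user VALUES (?, ?, ?)",
        [(1, "a@example.org", "NG"), (2, "b@example.com", "GB")],
    )
    conn.executemany(
        "INSERT INTO mdl_enrol VALUES (?, ?)", [(10, 100), (11, 101)]
    )
    conn.executemany(
        "INSERT INTO mdl_user_enrolments VALUES (?, ?, ?)",
        [(1, 10, 1), (2, 11, 1), (3, 10, 2)],
    )
    conn.executemany(
        "INSERT INTO mdl_course_completions VALUES (?, ?, ?, ?)",
        [(1, 1, 100, 1400000000), (2, 2, 100, None)],
    )
    return conn


def extract_enrollments(conn):
    """Run the extract and return a list of tuples."""
    return conn.execute(EXTRACT).fetchall()
```

The LEFT JOIN keeps enrollments that have no completion record, so students who never finished still appear in the extract with completed set to 0.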

R Statistics

Data Wrangling: Feature Extraction and Aggregation

Because the web site was new, there was little data to work with, so getting the maximum information out of that data was paramount. To this end, data was extracted from the users' profile information and aggregated into groups that might provide further insight into user behaviour. This included aggregating regional, occupational and email domain data.
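The aggregation step can be sketched as follows. The analysis itself was carried out in R; this Python version is only an illustration, and the grouping rules, the country-to-region lookup and the field names (email, country) are hypothetical rather than the groupings actually used.

```python
from collections import Counter

# Hypothetical lookup from ISO country code to a coarser region.
REGION_BY_COUNTRY = {"NG": "Africa", "KE": "Africa", "GB": "Europe", "IN": "Asia"}

# Assumed set of common free webmail providers.
WEBMAIL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com"}


def email_domain_group(email):
    """Bucket an address by the kind of domain it comes from."""
    domain = email.rsplit("@", 1)[-1].lower()
    if domain in WEBMAIL_DOMAINS:
        return "webmail"
    if domain.endswith(".edu") or ".ac." in domain:
        return "academic"
    if domain.endswith(".gov") or ".gov." in domain:
        return "government"
    return "other"


def region(country_code):
    """Map a country code to a region, defaulting to 'Other'."""
    return REGION_BY_COUNTRY.get(country_code.upper(), "Other")


def add_features(profiles):
    """Return copies of the profile dicts with two derived categorical fields."""
    return [
        dict(p, email_group=email_domain_group(p["email"]), region=region(p["country"]))
        for p in profiles
    ]


def count_by(rows, key):
    """Count rows per level of one categorical field."""
    return Counter(r[key] for r in rows)
```

Collapsing many sparse values (individual domains, individual countries) into a handful of groups gives each level enough observations to be useful when the data set is small.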


A number of graphs were created to help the customer visualise the enrollment demographics before the final analysis.


The Solution

The available enrollment data was dominated by categorical variables, which made the more common prediction models ineffective.

The final solution used the random forest algorithm to produce meaningful predictions.
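To show how a random forest works on purely categorical predictors, here is a minimal sketch written from scratch in Python. It is not the model built for the project, which was developed in R. The function names, the binary "equals this level or not" splits, the tree depth and the number of trees are all assumptions chosen for brevity, and labels are assumed to be 0 (did not complete) or 1 (completed).

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label (first seen wins a tie)."""
    return Counter(labels).most_common(1)[0][0]


def grow_tree(rows, labels, features, rng, depth=0, max_depth=4):
    """Grow one tree; each node tries a random subset of the features."""
    if depth >= max_depth or len(set(labels)) == 1:
        return majority(labels)
    k = max(1, int(len(features) ** 0.5))
    best = None
    for f in rng.sample(features, k):
        for v in sorted({r[f] for r in rows}):
            # Binary split: rows where feature f equals level v, and the rest.
            left = [i for i, r in enumerate(rows) if r[f] == v]
            right = [i for i, r in enumerate(rows) if r[f] != v]
            if not left or not right:
                continue
            score = sum(
                len(idx) * gini([labels[i] for i in idx]) for idx in (left, right)
            )
            if best is None or score < best[0]:
                best = (score, f, v, left, right)
    if best is None:
        # Simplification: none of the sampled features can split this node.
        return majority(labels)
    _, f, v, left, right = best

    def branch(idx):
        return grow_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            features, rng, depth + 1, max_depth,
        )

    return (f, v, branch(left), branch(right))


def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        f, v, yes, no = tree
        tree = yes if row[f] == v else no
    return tree


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Fit each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    features = sorted(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(
            grow_tree([rows[i] for i in idx], [labels[i] for i in idx], features, rng)
        )
    return forest


def predict_proba(forest, row):
    """Fraction of trees voting 1 (completed) for this row."""
    return sum(predict_tree(t, row) for t in forest) / len(forest)
```

Calling fit_forest on a list of dicts of categorical fields and a matching list of 0/1 labels returns the forest, and predict_proba then gives the share of trees that predict completion for a new student. Because every split only asks whether a field equals a given level, no numeric encoding of the categories is needed.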

Please note that some data labels have been removed.