WGU Capstone


Scenario

An educational agency intends to develop an application that predicts graduate school admission chances using a deep learning process. By estimating these chances for its students, the agency enables them to set more informed goals. The application should take inputs such as University Rating, Letter of Recommendation Strength, and Research Experience to predict the likelihood of admission.
However, the agency encountered a challenge: crucial columns—namely, GRE Scores, Undergraduate GPA, and TOEFL Scores—were absent from their dataset. Fortunately, the educational agency still possesses old survey data—namely, Average Daily Study Hours, English Self-Confidence Scale, Address (proximity), and Age—that may correlate with the admission chances. By leveraging State Knowledge Graph (SKG) of the data, a Large Language Model (LLM) will recommend adding new columns that probably align with the admission chances. This addition is expected to enhance the deep learning process.
During the evaluation phase, we will assess how the deep-learning predictions perform with the newly suggested columns of data compared to their performance without it.

Dataset

Content

The dataset has different important things that are needed when applying for Master's programs.
These things include:

  • GRE Scores ( out of 340 )
  • TOEFL Scores ( out of 120 )
  • University Rating ( out of 5 )
  • Statement of Purpose Strength ( out of 5 )
  • Letter of Recommendation Strength ( out of 5 )
  • Undergraduate GPA ( out of 10 )
  • Research Experience ( either 0 or 1 )
  • Chance of Admit ( ranging from 0 to 1 )
  • Source

    https://www.kaggle.com/datasets/mohansacharya/graduate-admissions
    Acharya, M. S., Armaan, A., & Antony, A. S. (2019, February). A comparison of regression models for prediction of graduate admissions. In 2019 international conference on computational intelligence in data science (ICCIDS) (pp. 1-5). IEEE.

    Deep learning trial 1: training Artificial Neural Network (ANN) with missing data

    The scnerio dictates the input data for deep learning are missing three important columns: GRE Scores, Undergraduate GPA, and TOEFL Scores.
    Before we enhance our data by leveraging SKG later, let's see how the admission prediction model with these missing data performs for the comparison.

    ChatGPT analyzes SKG: expanding connections

    Now that we have seen how the prediction model with the missing data performs, the next step is to include data to the dataset that may help the model perform better.
    According to the scenario, in addition to the original dataset, there are old surveys that include information about Average Daily Study Hours, English Self-Confidence Scale, Address (proximity), and Age. Each student answered these surveys.
    Let's see the current SKG, then ask ChatGPT to expand connections in such SKG.

    SKG with the missing data

    Result:

    Python result
    Note that weighted edges are based on the Pearson correlation coefficient of a dataset from Deep learning trial 1.
    Ask ChatGPT
    ChatGPT answer


    Try it yourself:
    1. Remove lines of ChatGPT's previous response between "=== what LLM suggests Start ===" and "=== what LLM suggests End ===" in the following code.
    2. Copy and paste a new response generated by you (ChatGPT often gives different answers each time) into the previous response's place.

    SKG with the expanded connections

    Result:

    Python result
    Note that LLM has analyzed/updated the SKG by finding possible correlations between student's survey data and admission chances. Actual correlations are likely different from weights of new connections because these weights are values guessed by LLM.

    Deep learning trial 2: training ANN with a reinforced dataset

    Now that the expanded SKG has suggested that "Average Daily Study Hours" and "English Self-Confidence Scale" (among other survey data) may have the strongest correlations with admission chances, let's add those variables to the dataset.
    Since the scenario "Fortunately, there was survey data that were answered by each student" is hypothetical, we need to create new data for "Average Daily Study Hours" and "English Self-Confidence Scale". Based on the missing data, new columns have been created/added to the dataset: "Average Daily Study Hours" has a Pearson product-moment correlation coefficient of 0.94 with Undergraduate GPA and 0.79 with GRE Scores, while "English Self-Confidence Scale" has a correlation coefficient of 0.96 with TOEFL Scores. Note that while these correlation coefficients are arbitrary values, they are derived from the "missing" data because it is only natural that if you study more, your grades will likely improve, and if you're confident in English, you'll likely achieve a better English test score.

    Evaluation: Deep learning trial 1 vs trial 2

  • Deep learning trial 1: The input data is missing GRE Scores, Undergraduate GPA, and TOEFL Scores.
  • Deep learning trial 2: The same input data as trial 1, but it incorporates Average Daily Study Hours and the English Self-Confidence Scale.
  • Trial 2 outperforms Trial 1 in terms of Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE), indicating better predictive accuracy. Trial 2 also has a higher variance regression score, suggesting that it explains more of the variance in the data. Overall, Trial 2 is the more effective model for predicting admission chances.

    Deep Learning TrialMAEMSERMSEVariance Regression Score
    Trial 10.0619978065252303960.007203339089107540.084872487232951640.6707320647852236
    Trial 20.047999281287193290.0045441610768688830.067410392944032630.7863313792402611

    Conclusion

    We have achieved a better prediction model by leveraging SKG. In this case, the SKG was quite simple, consisting of only one object node with several states. We could easily expand connections in the SKG ourselves.
    However, in reality, many things are interconnected in complex ways, and sometimes the right solutions don't immediately surface. For instance, if you're unable to use your car for an appointment, SKG could suggest considering alternative transportation options such as a taxi or flight, or even postponing the appointment.
    SKG is a useful structure that helps identify meaningful connections in the environment (see Advantage of SKG 1: expanding connections).

    The application: predict your admission chance

    It is based on the trained model from Deep learning trial 2.

    Result: