H-1B visa petition case status prediction model using Azure Machine Learning

This post walks through an example of a classification model built with Azure Machine Learning.

Building an Azure ML experiment (classification model) that predicts from the input data involves these steps:
1) Data preparation
2) Choosing the Algorithm
3) Train, Score & Evaluate the model
4) Deploying the trained model 

The dataset used for this experiment is H-1B Visa Petitions 2011-2016 from Kaggle.
Import the dataset into RStudio or Power BI first to get a better understanding of the data.

Based on the inputs (job title, job type, annual wage, year the visa petition was submitted, and work location), the prediction model should classify the case status.

Step 1:

As mentioned above, data preparation is the first and most important step.
In this experiment, the following data preparation activities are performed to improve prediction accuracy:
  i)   Records (observations) are filtered to keep only those needed for this experiment
  ii)  Unnecessary columns are removed
  iii) Data types are changed
  iv)  Class imbalance is handled by replicating records of the under-represented class

To learn more about the data preparation activities that are essential for a machine learning prediction model, refer here.

Apply SQL Transformation to filter the records and to convert the Case status label to a binary value:
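As a rough illustration of this filter-and-relabel step, the sketch below runs a comparable query with Python's built-in sqlite3 module. It assumes the module's SQLite-style syntax and its convention of naming the first input t1. The column names follow the Kaggle CSV header as an assumption, and the sample rows are invented. It is not the exact query used in the experiment.

```python
import sqlite3

# Invented sample rows; column names are assumed from the Kaggle CSV header.
rows = [
    ("CERTIFIED", "SOFTWARE ENGINEER", "Y", 95000.0, 2016, "SEATTLE, WASHINGTON"),
    ("DENIED", "BUSINESS ANALYST", "N", 52000.0, 2015, "AUSTIN, TEXAS"),
    ("WITHDRAWN", "DATA ANALYST", "Y", 70000.0, 2014, "NEWARK, NEW JERSEY"),
    ("CERTIFIED-WITHDRAWN", "ACCOUNTANT", "Y", 61000.0, 2013, "MIAMI, FLORIDA"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1 (CASE_STATUS TEXT, JOB_TITLE TEXT, FULL_TIME_POSITION TEXT,"
    " PREVAILING_WAGE REAL, YEAR INTEGER, WORKSITE TEXT)"
)
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?, ?, ?)", rows)

# Keep only the two classes of interest and map the label to 1 / 0.
QUERY = """
SELECT JOB_TITLE, FULL_TIME_POSITION, PREVAILING_WAGE, YEAR, WORKSITE,
       CASE WHEN CASE_STATUS = 'CERTIFIED' THEN 1 ELSE 0 END AS CASE_STATUS
FROM t1
WHERE CASE_STATUS IN ('CERTIFIED', 'DENIED')
"""
filtered = conn.execute(QUERY).fetchall()
conn.close()

for row in filtered:
    print(row)
```

Only the SELECT statement would go into the module itself; the table setup exists just to make the sketch runnable on its own.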

Select Columns in Dataset to exclude the unnecessary columns

Edit Metadata to change the column types

Split the data so that 70% is used for training and 30% for testing.

Execute R Script to balance the data, because very few records have Case status label = 'Denied'
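For readers who want to reproduce the 70/30 split outside the designer, here is a minimal sketch in plain Python. The function name `split_rows` and the fixed seed are illustrative choices, not settings taken from the experiment.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and cut it into (train, test) parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


records = list(range(10))  # stand-in for the prepared dataset rows
train, test = split_rows(records)
print(len(train), len(test))
```

Shuffling before cutting matters when the source file is ordered, for example by year, because an unshuffled cut would put the latest years only in the test part.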
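The balancing idea, replicating rows of the rare class until the classes are the same size, can be sketched as follows. This is a Python illustration of the general technique (random oversampling with replacement), not the R script used in the experiment. The helper name `oversample_minority` and the toy rows are hypothetical.

```python
import random


def oversample_minority(rows, label_index=-1, seed=0):
    """Replicate rows of smaller classes until every class matches the largest."""
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_index], []).append(row)
    target = max(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw extra copies at random (with replacement) from the same class.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Toy training rows: the last field is the label (1 = certified, 0 = denied).
train_rows = [("A", 1)] * 8 + [("B", 0)] * 2
balanced = oversample_minority(train_rows)
counts = {label: sum(1 for r in balanced if r[-1] == label) for label in (0, 1)}
print(counts)
```

Replicating rows is usually applied to the training part only, so that duplicated records do not inflate the scores measured on the test part.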

Step 2:

The second step is choosing the right algorithm. For an expert data scientist, this should be an easy task. Otherwise, we can build the experiment with two algorithms that, to our knowledge, suit the problem, and then evaluate both models to find which algorithm is more suitable for this prediction experiment.

For more help with algorithms, refer to the Machine learning algorithm cheat sheet for Microsoft Azure Machine Learning Studio.

Step 3:

Step three is to train, score, and evaluate the model.

After running the experiment, click on Evaluate Model and visualize the output.

For a two-class classification model, the ROC curve, accuracy, precision, recall, and the confusion matrix values indicate how well the model performs.

The closer the ROC curve is to the upper-left corner, and the farther it lies above the diagonal line, the better the model.
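The ROC curve is often summarized by the area under it (AUC). As a minimal sketch, AUC can be computed as the fraction of (positive, negative) pairs in which the positive record gets the higher score, with ties counting as half. The labels and scores below are invented for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 0, 1, 0, 0]              # 1 = CERTIFIED, 0 = DENIED
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # scored probabilities (made up)
print(auc(labels, scores))
```

A value of 1.0 means every positive outranks every negative, while 0.5 corresponds to the diagonal line, which is no better than guessing.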

Confusion matrix: 

Positive label -  1 indicates Case Status = CERTIFIED
Negative label -  0 indicates Case Status = DENIED
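To show how accuracy, precision, and recall follow from the confusion matrix, here is a small sketch using the label convention above (1 = CERTIFIED, 0 = DENIED). The actual and predicted values are invented, and the helper names are illustrative.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn), treating label 1 as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(actual, predicted):
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


actual = [1, 1, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 1, 0, 0, 1, 1, 0]
print(confusion_counts(actual, predicted))
print(summarize(actual, predicted))
```

With a rare DENIED class, accuracy alone can look good even when most denials are missed, so it helps to read the individual matrix cells as well.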

Refer here to learn more about evaluating a classification model.

Once we are satisfied with the training experiment, we can click on Train Model and save its output as a trained model. We can then convert the experiment into a predictive experiment by clicking SET UP WEB SERVICE at the bottom of the canvas.

In the predictive experiment, notice the web service input and output.

Because the trained model is connected to Score Model, we can connect the web service input directly to Score Model, bypassing the data transformations.

Also, to remove the response variable (Case status) from the input, add a transformation before Score Model to exclude that column.

Step 4:

Now run the predictive experiment once and click on DEPLOY WEB SERVICE.

Once it is deployed, we can test the experiment by passing input values.

Also, click on Request/Response API to get sample code in C#, Python, or R for invoking this experiment as a web service.
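As a hedged sketch of what such a call can look like in Python, the code below builds a Request/Response POST using only the standard library. The URL and API key are placeholders. The `input1` name, the column list, and the payload layout are assumptions, so copy the real values from the sample code that the service page generates.

```python
import json
import urllib.request

# Placeholders: take the real endpoint and key from the web service page.
SERVICE_URL = "https://example.invalid/execute?api-version=2.0&details=true"
API_KEY = "<your-api-key>"

# Assumed input columns; the response variable is excluded in the predictive experiment.
COLUMNS = ["JOB_TITLE", "FULL_TIME_POSITION", "PREVAILING_WAGE", "YEAR", "WORKSITE"]


def build_request(values, url=SERVICE_URL, api_key=API_KEY):
    """Build (but do not send) a POST request carrying one input row."""
    payload = {
        "Inputs": {"input1": {"ColumnNames": COLUMNS, "Values": [values]}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


request = build_request(
    ["SOFTWARE ENGINEER", "Y", "95000", "2016", "SEATTLE, WASHINGTON"]
)
print(request.get_method(), request.full_url)
# To call a deployed service: urllib.request.urlopen(request).read()
```

The request is only built here, not sent, so the sketch runs without a deployed service. The response from a real call would contain the scored label and probability for the submitted row.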
