Investigate the use of logistic regression on a subset of the Kaggle Credit Card Fraud data set (www.kaggle.com/dalpozz/creditcardfraud). Note that in this data set, fraud records are far fewer than normal records.
Your first task is to construct subsets of the Kaggle data set. Construct three subsets of 100K, 20K, and 10K records, each containing both normal and fraud data (make sure you maximize the number of fraud records in each subset). From each subset, construct a training data set and a testing data set (using 80% of the data for the former and 20% for the latter) to build and test the logistic regression model.
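As a rough illustration of the subset construction and 80/20 split described above, here is a minimal Python sketch using only the standard library. It assumes each record is a dict whose `Class` field is 1 for fraud and 0 for normal; the function names `build_subset` and `stratified_split` are hypothetical, and splitting each class separately (so that fraud appears in both the training and the testing set) is an assumption, not something the assignment requires. The demo rows are synthetic stand-ins, not Kaggle data.

```python
import random

def build_subset(rows, size, label_key="Class", seed=0):
    """Keep every fraud row, then fill up to `size` with randomly sampled normal rows."""
    rng = random.Random(seed)
    fraud = [r for r in rows if r[label_key] == 1]
    normal = [r for r in rows if r[label_key] == 0]
    if len(fraud) > size:
        # More fraud rows than the subset can hold: sample them down.
        fraud = rng.sample(fraud, size)
    n_normal = min(size - len(fraud), len(normal))
    subset = fraud + rng.sample(normal, n_normal)
    rng.shuffle(subset)
    return subset

def stratified_split(rows, train_frac=0.8, label_key="Class", seed=0):
    """Split each class separately (assumption) so both sets contain fraud rows."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):
        group = [r for r in rows if r[label_key] == label]
        rng.shuffle(group)
        cut = round(train_frac * len(group))
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

# Synthetic stand-in data: 50 fraud rows among 5000 records.
rows = [{"id": i, "Class": 1 if i < 50 else 0} for i in range(5000)]
subset = build_subset(rows, size=1000)
train, test = stratified_split(subset)
```

With the real data, the same two calls would be repeated for sizes 100000, 20000, and 10000. Because every fraud row is kept each time, the fraud share rises as the subset shrinks, which is worth keeping in mind when comparing results across the three subsets.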
Tasks:
1. Perform Logistic Regression on the three data subsets (100K, 20K, 10K). Show your results using a cross-table. Discuss your results for each of the data sets.
2. Perform Ridge Logistic Regression and Lasso Logistic Regression on the three data subsets. Hint: http://ricardoscr.github.io/how-to-use-ridge-and-lasso-in-r.html. Show your results using a cross-table and discuss the results in comparison to (1).
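For both tasks, the cross-table can be read as a 2x2 count of actual versus predicted class. The following is a minimal sketch in plain Python, assuming labels are coded 1 for fraud and 0 for normal; `cross_table` and `precision_recall` are hypothetical helper names, and the label lists are made-up values for illustration, not model output.

```python
from collections import Counter

def cross_table(actual, predicted):
    """Count (actual, predicted) pairs for binary labels, with 1 = fraud."""
    counts = Counter(zip(actual, predicted))
    return {
        "TN": counts[(0, 0)],  # normal, predicted normal
        "FP": counts[(0, 1)],  # normal, predicted fraud
        "FN": counts[(1, 0)],  # fraud, predicted normal
        "TP": counts[(1, 1)],  # fraud, predicted fraud
    }

def precision_recall(table):
    """Precision and recall for the fraud class; 0.0 when the denominator is zero."""
    predicted_fraud = table["TP"] + table["FP"]
    actual_fraud = table["TP"] + table["FN"]
    precision = table["TP"] / predicted_fraud if predicted_fraud else 0.0
    recall = table["TP"] / actual_fraud if actual_fraud else 0.0
    return precision, recall

# Made-up labels for illustration only.
actual = [0, 0, 0, 0, 1, 1, 1, 0]
predicted = [0, 0, 1, 0, 1, 0, 1, 0]
table = cross_table(actual, predicted)
precision, recall = precision_recall(table)
```

Since fraud is rare, overall accuracy can look high even when most fraud is missed, so the FN and TP cells (and the recall derived from them) are probably the more informative basis for comparing plain, ridge, and lasso models.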