This is a hands-on exercise in using the SVC API of scikit-learn to train an SVM with the linear kernel and with the RBF kernel, respectively, on a binary classification dataset.
The dataset is a preprocessed version of the Adult dataset from the UC Irvine Machine Learning Repository, and it consists of a training set (train.csv) and a test set (test.csv).
Each file (the training set or the test set) is a text file in which each line represents one labeled data instance, as follows:
label index1:value1 index2:value2 ...
where “label” denotes the class label of the instance, “indexT” denotes the T-th feature, and “valueT” denotes the value of the T-th feature of the instance.
Example from the training set (1 record):
-1 3:1 11:1 14:1 19:1 39:1 42:1 55:1 64:1 67:1 73:1 75:1 76:1 80:1 83:1
This is a sparse format, in which only non-zero feature values are stored for each instance. For example, suppose each data instance in a dataset has 5 dimensions (features). A data instance whose label is “+1” and whose input vector is [2 0 2.5 4.3 0] is then represented by the line
+1 1:2 3:2.5 4:4.3
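To make the mapping between the sparse line and the dense vector concrete, here is a minimal sketch in plain Python. The helper name parse_sparse_line is hypothetical (it is not part of scikit-learn), and it assumes 1-based feature indices and integer labels, as in the example above.

```python
def parse_sparse_line(line, n_features):
    """Convert one 'label index:value ...' line into (label, dense_vector)."""
    parts = line.split()
    label = int(parts[0])            # int() accepts both "+1" and "-1"
    dense = [0.0] * n_features       # features not listed in the line stay zero
    for item in parts[1:]:
        index, value = item.split(":")
        dense[int(index) - 1] = float(value)  # indices in the file are 1-based
    return label, dense


label, vector = parse_sparse_line("+1 1:2 3:2.5 4:4.3", n_features=5)
print(label, vector)  # prints: 1 [2.0, 0.0, 2.5, 4.3, 0.0]
```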
Hint: scikit-learn provides an API (“sklearn.datasets.load_svmlight_file”) to load such a sparse data format.
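As a rough illustration of that loader, the sketch below reads two toy records from an in-memory buffer instead of a file. The toy records and n_features=5 are assumptions made for the example. With the real data you would pass the path "train.csv" instead of the buffer.

```python
from io import BytesIO

from sklearn.datasets import load_svmlight_file

# Two toy records in the same "label index:value ..." format (illustrative only).
toy_data = BytesIO(b"+1 1:2 3:2.5 4:4.3\n-1 2:1 5:1\n")

# zero_based=False states explicitly that feature indices start at 1.
# X is a SciPy sparse matrix; y is a NumPy array of labels.
X, y = load_svmlight_file(toy_data, n_features=5, zero_based=False)

print(X.shape)      # (number of instances, number of features)
print(X.toarray())  # dense view, only sensible for small examples
print(y)
```

When the training and test files are loaded separately, passing the same n_features to both calls keeps the two matrices the same width, even if the highest-numbered feature never appears in one of the files.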
Now the question is:
Regarding the linear kernel, show the 3-fold cross-validation results, in terms of classification accuracy on the training set, for each value of the parameter C in {0.01, 0.05, 0.1, 0.5, 1}. Note that for all the other parameters.
Please give me the code that shows the results.
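One possible way to produce these results is sketched below, using SVC together with cross_val_score from scikit-learn. The function name linear_svm_cv_accuracy is hypothetical. All SVC parameters other than kernel and C are left at their defaults, which is an assumption, since the note about the other parameters is cut off in the question. The script prints results only if train.csv is present in the working directory.

```python
import os

from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

C_VALUES = [0.01, 0.05, 0.1, 0.5, 1]


def linear_svm_cv_accuracy(X, y, c_values=C_VALUES, n_folds=3):
    """Return {C: mean k-fold cross-validation accuracy} for a linear-kernel SVC."""
    results = {}
    for c in c_values:
        # Assumption: every parameter except kernel and C keeps its default value.
        clf = SVC(kernel="linear", C=c)
        scores = cross_val_score(clf, X, y, cv=n_folds, scoring="accuracy")
        results[c] = scores.mean()
    return results


if os.path.exists("train.csv"):
    X_train, y_train = load_svmlight_file("train.csv")
    for c, accuracy in linear_svm_cv_accuracy(X_train, y_train).items():
        print(f"C = {c}: mean 3-fold CV accuracy = {accuracy:.4f}")
```

With an integer cv and a classifier, cross_val_score uses stratified folds without shuffling, so repeated runs on the same file should give the same numbers. To see the accuracy of each individual fold rather than the mean, print scores inside the loop.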