Research talk:Revision scoring as a service/Work log/2016-02-08
Monday, February 8, 2016
$ make models/wikidata.reverted.all.rf.model
cut datasets/wikidata.features_reverted.all.nonbot.500k_2015.tsv -f2- | \
    revscoring train_test \
        revscoring.scorer_models.RF \
        wb_vandalism.feature_lists.experimental.all \
        --version 0.0.1 \
        -p 'max_features="log2"' \
        -p 'criterion="entropy"' \
        -p 'min_samples_leaf=1' \
        -p 'n_estimators=80' \
        -s 'pr' -s 'roc' \
        -s 'recall_at_fpr(max_fpr=0.10)' \
        -s 'filter_rate_at_recall(min_recall=0.90)' \
        -s 'filter_rate_at_recall(min_recall=0.75)' \
        --balance-sample-weight \
        --center --scale \
        --label-type=bool > \
    models/wikidata.reverted.all.rf.model
2016-02-08 18:44:01,006 INFO:revscoring.utilities.train_test -- Training model...
2016-02-08 18:44:50,121 INFO:revscoring.utilities.train_test -- Testing model...
ScikitLearnClassifier
 - type: RF
 - params: min_samples_leaf=1, warm_start=false, class_weight=null, min_weight_fraction_leaf=0.0, oob_score=false, verbose=0, min_samples_split=2, n_estimators=80, max_features="log2", bootstrap=true, center=true, criterion="entropy", max_depth=null, balanced_sample_weight=true, max_leaf_nodes=null, random_state=null, scale=true, n_jobs=1
 - version: 0.0.1
 - trained: 2016-02-08T18:44:50.110700

         ~False    ~True
-----  --------  -------
False     80971       11
True         83       17

Accuracy: 0.9988406798056289
ROC-AUC: 0.968
Filter rate @ 0.9 recall: threshold=0.013, filter_rate=0.982, recall=0.94
Recall @ 0.1 false-positive rate: threshold=0.713, recall=0.04, fpr=0.0
Filter rate @ 0.75 recall: threshold=0.075, filter_rate=0.996, recall=0.76
PR-AUC: 0.413
Well, that doesn't look bad. It seems like we can filter out 98.2% of edits and still expect a high recall of 0.94. Our ROC-AUC (0.968) looks pretty good, but the PR-AUC (0.413) is much lower, which is hard to avoid given the extremely low prevalence of vandalism (100 reverted edits out of 81,082 in the test set, about 0.12%). It also seems like we couldn't really do a ClueBot-like strategy, auto-reverting only at a very low false-positive rate, and expect a very high recall: the recall reported for the false-positive-rate statistic is just 0.04. It'll be interesting to plot these results next to those of the other models. I'll kick off the next feature extraction and model generation. --EpochFail (talk) 19:34, 8 February 2016 (UTC)
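To make the "filter rate at recall" statistic concrete, here is a minimal sketch of how such a number could be computed from scored edits. It is not revscoring's own implementation. It assumes that the filter rate is the fraction of edits scoring below the threshold (the edits a patroller could skip) and that recall is the fraction of reverted edits scoring at or above it. The function name and the toy scores are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def filter_rate_at_recall(
    scores: Sequence[float],
    labels: Sequence[bool],
    min_recall: float,
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, filter_rate, recall) for the highest threshold
    whose recall is still at least min_recall, or None if none qualifies.

    Assumption: an edit is "flagged" when its score is >= threshold, and
    filter_rate = 1 - (flagged edits / all edits).
    """
    total = len(scores)
    positives = sum(1 for y in labels if y)
    if positives == 0:
        raise ValueError("recall is undefined without positive examples")

    best = None
    # Thresholds are visited in ascending order, so recall can only fall and
    # filter rate can only rise; the last qualifying threshold is the best.
    for threshold in sorted(set(scores)):
        flagged = [y for s, y in zip(scores, labels) if s >= threshold]
        recall = sum(1 for y in flagged if y) / positives
        filter_rate = 1 - len(flagged) / total
        if recall >= min_recall:
            best = (threshold, filter_rate, recall)
    return best


# Hypothetical toy data: ten edits, two of which were reverted.
SCORES = [0.01, 0.02, 0.02, 0.05, 0.10, 0.30, 0.40, 0.60, 0.80, 0.90]
LABELS = [False, False, False, False, False, True, False, False, True, False]

if __name__ == "__main__":
    for min_recall in (1.0, 0.5):
        print(min_recall, filter_rate_at_recall(SCORES, LABELS, min_recall))
```

Read this way, "filter_rate=0.982, recall=0.94" means that reviewing only the top-scoring 1.8% of edits would still catch 94% of the reverted ones.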
General and user features
Just finished training the model. Here's what I get:
$ make models/wikidata.reverted.general_and_user.rf.model
cut datasets/wikidata.features_reverted.general_and_user.nonbot.500k_2015.tsv -f2- | \
    revscoring train_test \
        revscoring.scorer_models.RF \
        wb_vandalism.feature_lists.experimental.general_and_user \
        --version 0.0.1 \
        -p 'max_features="log2"' \
        -p 'criterion="entropy"' \
        -p 'min_samples_leaf=1' \
        -p 'n_estimators=80' \
        -s 'pr' -s 'roc' \
        -s 'recall_at_fpr(max_fpr=0.10)' \
        -s 'filter_rate_at_recall(min_recall=0.90)' \
        -s 'filter_rate_at_recall(min_recall=0.75)' \
        --balance-sample-weight \
        --center --scale \
        --label-type=bool > \
    models/wikidata.reverted.general_and_user.rf.model
2016-02-09 15:40:32,005 INFO:revscoring.utilities.train_test -- Training model...
2016-02-09 15:41:25,816 INFO:revscoring.utilities.train_test -- Testing model...
ScikitLearnClassifier
 - type: RF
 - params: max_depth=null, n_estimators=80, balanced_sample_weight=true, class_weight=null, min_samples_split=2, warm_start=false, oob_score=false, verbose=0, random_state=null, max_features="log2", min_samples_leaf=1, criterion="entropy", n_jobs=1, center=true, max_leaf_nodes=null, min_weight_fraction_leaf=0.0, bootstrap=true, scale=true
 - version: 0.0.1
 - trained: 2016-02-09T15:41:25.813516

         ~False    ~True
-----  --------  -------
False     99003       14
True         93       30

Accuracy: 0.9989207181763163
ROC-AUC: 0.965
Filter rate @ 0.9 recall: threshold=0.025, filter_rate=0.992, recall=0.902
Recall @ 0.1 false-positive rate: threshold=None, recall=None, fpr=None
Filter rate @ 0.75 recall: threshold=0.087, filter_rate=0.996, recall=0.764
PR-AUC: 0.457
So, that's comparable to the "all" feature set (ROC-AUC 0.965 vs. 0.968, PR-AUC 0.457 vs. 0.413), which suggests that the user features supply most of the signal we get beyond the "general" features. --EpochFail (talk) 17:33, 9 February 2016 (UTC)
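As a side-by-side check, the two confusion matrices reported on this page can be reduced to rates with a few lines of Python. This is only a sketch: it assumes the rows of each matrix are the actual labels and the "~" columns are the predictions (the reported recall values are consistent with 100 and 123 actual positives respectively), and the `Confusion` class is a hypothetical helper, not part of revscoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from a 2x2 confusion matrix (rows = actual, columns = predicted)."""
    tn: int  # actual False, predicted False
    fp: int  # actual False, predicted True
    fn: int  # actual True, predicted False
    tp: int  # actual True, predicted True

    @property
    def total(self) -> int:
        return self.tn + self.fp + self.fn + self.tp

    @property
    def accuracy(self) -> float:
        return (self.tn + self.tp) / self.total

    @property
    def prevalence(self) -> float:
        # Share of test edits that were actually reverted.
        return (self.fn + self.tp) / self.total

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)


# Counts copied from the two test reports on this page.
MODELS = {
    "all": Confusion(tn=80971, fp=11, fn=83, tp=17),
    "general_and_user": Confusion(tn=99003, fp=14, fn=93, tp=30),
}

if __name__ == "__main__":
    for name, cm in MODELS.items():
        print(
            f"{name}: n={cm.total} accuracy={cm.accuracy:.4f} "
            f"prevalence={cm.prevalence:.4%} "
            f"precision={cm.precision:.3f} recall={cm.recall:.3f}"
        )
```

The two test sets differ in size (81,082 vs. 99,140 edits), so rates are the fair comparison, not raw counts. With prevalence near 0.1%, accuracy is dominated by the true negatives, which is why the threshold-based statistics are more informative here.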