Phil 9.6.16

7:00 – 4:30 ASRC

  • Have everyone’s schedule for proposal but Shimei.
  • Saw an interesting Gamasutra article, Behaviourism, In-game Economies and the Steam Community Market, which led me to get Hooked: How to Build Habit-Forming Products. That should be useful for gamifying the UI.
  • Working on section 1.6 – What the rest of the proposal looks like. Kinda done?
  • Just found a blog post that mentions this reviewer guideline for registered reports. A registered report is kind of like a study proposal: the research methods of a paper are submitted before the study is done. Interesting. Need to make sure that my proposal fits with this…
  • Back to WEKA and the analysis of the physician data.
    • Overall stats – 30 ‘good’, 12 junk, per these rules in RatingObj2 in the GoogleCSE2 project:
      // Labels a rating "junk" if any one of the rules below fires, otherwise "good".
      public String junkOrGood(){
          // Person characterization flagged as inappropriate
          if(personCharacterization.equals(INAPPROPRIATE)){
              return "junk";
          }
          // Machine-generated source
          if(sourceType.equals(MACHINE_GENERATED)){
              return "junk";
          }
          // Low or minimal quality
          if(qualityCharacterization.equals(LOW) || qualityCharacterization.equals(MINIMAL)){
              return "junk";
          }
          // Not credible, or distrusted to any degree
          if(trustworthiness.equals(NOT_CREDIBLE) || trustworthiness.equals(DISTRUSTWORTHY) || trustworthiness.equals(VERY_DISTRUSTWORTHY)){
              return "junk";
          }
          return "good";
      }
    • The results below are from the second pass, using just the text. It turns out that in the first pass the classifiers were targeting the meta information as the best predictor. And of course they were right, since the junk/good label comes from those ratings. Pulled out the meta information and got the following. I do want to try some of the other meta information as well, like trustworthiness, and see if there’s anything that makes sense. Note that this corpus is just HTML pages that were successfully downloaded and scanned. No MS Word or PDF.
    • NaiveBayes:
      Time taken to build model: 0.01 seconds
      
      === Stratified cross-validation ===
      === Summary ===
      
      Correctly Classified Instances 33 78.5714 %
      Incorrectly Classified Instances 9 21.4286 %
      Kappa statistic 0.5116
      Mean absolute error 0.2143
      Root mean squared error 0.4629
      Relative absolute error 51.8311 %
      Root relative squared error 102.1856 %
      Total Number of Instances 42 
      
      === Detailed Accuracy By Class ===
      
                       TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                       0.750    0.200    0.600      0.750    0.667      0.519    0.747     0.509     junk
                       0.800    0.250    0.889      0.800    0.842      0.519    0.810     0.876     good
      Weighted Avg.    0.786    0.236    0.806      0.786    0.792      0.519    0.792     0.771
      
      === Confusion Matrix ===
      
        a  b   <-- classified as
        9  3 |  a = junk
        6 24 |  b = good
    • SGD (stochastic gradient descent):
      === Stratified cross-validation ===
      === Summary ===
      
      Correctly Classified Instances 35 83.3333 %
      Incorrectly Classified Instances 7 16.6667 %
      Kappa statistic 0.637 
      Mean absolute error 0.1667
      Root mean squared error 0.4082
      Relative absolute error 40.3131 %
      Root relative squared error 90.1193 %
      Total Number of Instances 42 
      
      === Detailed Accuracy By Class ===
      
                       TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                       0.917    0.200    0.647      0.917    0.759      0.660    0.858     0.617     junk
                       0.800    0.083    0.960      0.800    0.873      0.660    0.858     0.911     good
      Weighted Avg.    0.833    0.117    0.871      0.833    0.840      0.660    0.858     0.827
      
      === Confusion Matrix ===
      
        a  b   <-- classified as
       11  1 |  a = junk
        6 24 |  b = good
    • SMO (sequential minimal optimization algorithm for training a support vector classifier):
      Time taken to build model: 0.02 seconds
      
      === Stratified cross-validation ===
      === Summary ===
      
      Correctly Classified Instances 32 76.1905 %
      Incorrectly Classified Instances 10 23.8095 %
      Kappa statistic 0.5139
      Mean absolute error 0.2381
      Root mean squared error 0.488 
      Relative absolute error 57.5901 %
      Root relative squared error 107.7131 %
      Total Number of Instances 42 
      
      === Detailed Accuracy By Class ===
      
                       TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                       0.917    0.300    0.550      0.917    0.687      0.558    0.808     0.528     junk
                       0.700    0.083    0.955      0.700    0.808      0.558    0.808     0.882     good
      Weighted Avg.    0.762    0.145    0.839      0.762    0.773      0.558    0.808     0.781
      
      === Confusion Matrix ===
      
        a  b   <-- classified as
       11  1 |  a = junk
        9 21 |  b = good
    • Multilayer Perceptron took a long time but didn’t produce any results?
    • Attribute Selected Classifier – J48 (dimensionality of training and test data is reduced by attribute selection before being passed on to a classifier):
      Time taken to build model: 1.41 seconds
      
      === Stratified cross-validation ===
      === Summary ===
      
      Correctly Classified Instances 34 80.9524 %
      Incorrectly Classified Instances 8 19.0476 %
      Kappa statistic 0.4815
      Mean absolute error 0.2238
      Root mean squared error 0.3805
      Relative absolute error 54.1364 %
      Root relative squared error 83.9928 %
      Total Number of Instances 42 
      
      === Detailed Accuracy By Class ===
      
                       TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                       0.500    0.067    0.750      0.500    0.600      0.499    0.729     0.682     junk
                       0.933    0.500    0.824      0.933    0.875      0.499    0.729     0.823     good
      Weighted Avg.    0.810    0.376    0.803      0.810    0.796      0.499    0.729     0.783
      
      === Confusion Matrix ===
      
        a  b   <-- classified as
        6  6 |  a = junk
        2 28 |  b = good
    • Discussion with Aaron about the upcoming epics for machine learning. I think a lot of this is going to be about classifying data well for subsequent learning.
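
A quick sanity check on the WEKA numbers above: the accuracy, kappa, precision, recall and F-measure values can be recomputed from the confusion matrices alone. The following is a minimal sketch and is not part of the GoogleCSE2 project. The class name ConfusionMetrics and its method names are made up for illustration, and it assumes the WEKA layout where rows are the actual class and columns are the predicted class.

```java
// Hypothetical helper (not from the GoogleCSE2 project): recomputes summary
// statistics from a confusion matrix laid out the way WEKA prints it,
// with rows = actual class and columns = predicted class.
final class ConfusionMetrics {

    /** Fraction of instances on the diagonal (correctly classified). */
    public static double accuracy(int[][] m) {
        double correct = 0, total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) {
                    correct += m[i][j];
                }
            }
        }
        return correct / total;
    }

    /** Cohen's kappa: observed agreement corrected for chance agreement. */
    public static double kappa(int[][] m) {
        int n = m.length;
        double total = 0;
        double[] rowSum = new double[n];
        double[] colSum = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                rowSum[i] += m[i][j];
                colSum[j] += m[i][j];
                total += m[i][j];
            }
        }
        // Agreement expected by chance, from the row and column marginals.
        double expected = 0;
        for (int k = 0; k < n; k++) {
            expected += rowSum[k] * colSum[k];
        }
        expected /= total * total;
        double observed = accuracy(m);
        return (observed - expected) / (1.0 - expected);
    }

    /** Of everything predicted as class c, the fraction that really is c. */
    public static double precision(int[][] m, int c) {
        double predicted = 0;
        for (int[] row : m) {
            predicted += row[c];
        }
        return m[c][c] / predicted;
    }

    /** Of everything that really is class c, the fraction predicted as c. */
    public static double recall(int[][] m, int c) {
        double actual = 0;
        for (int count : m[c]) {
            actual += count;
        }
        return m[c][c] / actual;
    }

    /** Harmonic mean of precision and recall for class c. */
    public static double fMeasure(int[][] m, int c) {
        double p = precision(m, c);
        double r = recall(m, c);
        return 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // NaiveBayes confusion matrix from the run above: a = junk, b = good.
        int[][] naiveBayes = {{9, 3}, {6, 24}};
        System.out.printf("accuracy=%.4f kappa=%.4f%n",
                accuracy(naiveBayes), kappa(naiveBayes));
        System.out.printf("junk: precision=%.3f recall=%.3f F=%.3f%n",
                precision(naiveBayes, 0), recall(naiveBayes, 0), fMeasure(naiveBayes, 0));
    }
}
```

For the NaiveBayes matrix this works out to a kappa of about 0.5116 and, for the junk class, precision 0.600, recall 0.750 and F-measure 0.667, which agrees with the WEKA output. Swapping in the SGD, SMO or J48 matrices checks the other runs the same way.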