Phil 4.27.16

7:00 – 5:30 VTX

  • Finished A fistful of bitcoins: characterizing payments among men with no names
    • Reading the discussion about ‘peeling’, I wonder: if someone returns to a story repeatedly, would an adversary be able to learn anything useful in a similar way? And if Bitcoin were used to pay for stories, would tracking transactions reveal anything as well? One of the nice things about using aliases for BC addresses is that, other than the initial mapping, the address can be hidden in the system.
    • Page 93: ...even the most motivated Bitcoin users (i.e., criminals) are engaging in idioms of use that allow us to erode their anonymity.
      • This is an important point. As with biometrics at the small scale, we are identifiable through our behaviors, in this case through idioms or patterns of usage.
  • Rating app
    • Add people – done
    • Add John’s suggestions – done
    • Build and deploy – Done. Waiting on Andy.
  • Write up TF_IDF story
    • Basic capability – 11 points
      • The initial part of the effort is to scan over the collection of documents and produce a list of words ordered by TF-IDF. This means iterating over all the documents and producing a Set<String> of words that are then run over the set of documents. The output should be an Excel file that lists the documents in the corpus and the list of words.
        • Documents should be listed in a file (xml?) as URIs. HTML docs can be read by jsoup, PDF by PDFBox.
        • The TF-IDF algorithm is discussed here: https://guendouz.wordpress.com/2015/02/17/implementation-of-tf-idf-in-java/
    • Pull pages from approved flags – 3 points
      • The second part of the effort is to use Jeremy’s REST interface to extract the URLs of ‘cleared’ flags to use as the input to the app, either via the input file or via a call from within the app (though there may be certs issues).
    • Report with new term recommendations – 3 points
      • Using the rating app, we should be able to try these new terms and see if they improve results. One of the items that’s already stored in the QueryObject2 will need to be returned from the DB so we can see if we’re getting cleaner results.
  • LanguageModelNetworks
    • Read in a spreadsheet (xls and xlsx)
    • Write out spreadsheets (with a page containing the run information):
      • File
      • User
      • Date run
      • Settings used
    • Allow for manipulation of row and column values (in this case, papers and codes, but the possibilities are endless)
      • Select the value to manipulate (reset should be an option)
      • Spinner/entry field to set changes (original value in label)
      • ‘Calculate’ button
      • Sorted list(s) of rows and columns. (indicate +/- change in rank)
    • Reset all button
    • Normalize all button
    • Progress for today! Lots of wiring up to do though: LMT
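
For the TF-IDF story above, here is a rough sketch of the basic capability to help with sizing. It is not the implementation from the linked post. It assumes the common formulation (term frequency = count / document length, IDF = ln(N / number of documents containing the term)), and it assumes that the single corpus-wide word list is ordered by each word's best score in any one document, which the story does not actually specify. The class and method names (TfIdfSketch, tokenize, rankTerms) are made up for illustration. Reading HTML/PDF through jsoup or PDFBox and writing the Excel file are left out, so the documents here are plain in-memory strings.

```java
import java.util.*;

/** Minimal TF-IDF sketch. All names here are hypothetical, not from the rating app. */
class TfIdfSketch {

    /** Lower-case the text and split on runs of non-letter characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Term frequency: occurrences of the term divided by the document length. */
    static double tf(List<String> doc, String term) {
        if (doc.isEmpty()) return 0.0;
        return (double) Collections.frequency(doc, term) / doc.size();
    }

    /** Inverse document frequency: ln(N / number of documents containing the term). */
    static double idf(List<List<String>> docs, String term) {
        long containing = docs.stream().filter(d -> d.contains(term)).count();
        return containing == 0 ? 0.0 : Math.log((double) docs.size() / containing);
    }

    /**
     * Build the vocabulary as a Set<String>, run each word over every document,
     * and keep its best TF-IDF score (assumption: max over documents).
     * Returns the words sorted by descending score.
     */
    static List<Map.Entry<String, Double>> rankTerms(List<List<String>> docs) {
        Set<String> vocabulary = new TreeSet<>();
        docs.forEach(vocabulary::addAll);

        List<Map.Entry<String, Double>> ranked = new ArrayList<>();
        for (String term : vocabulary) {
            double termIdf = idf(docs, term);
            double best = 0.0;
            for (List<String> doc : docs) {
                best = Math.max(best, tf(doc, term) * termIdf);
            }
            ranked.add(new AbstractMap.SimpleEntry<>(term, best));
        }
        ranked.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        return ranked;
    }

    public static void main(String[] args) {
        // Toy corpus standing in for the text extracted from the listed URIs.
        List<List<String>> docs = Arrays.asList(
                tokenize("Bitcoin wallet, bitcoin!"),
                tokenize("wallet address"),
                tokenize("wallet story"));
        for (Map.Entry<String, Double> e : rankTerms(docs)) {
            System.out.printf("%-10s %.4f%n", e.getKey(), e.getValue());
        }
    }
}
```

A word that appears in every document (like "wallet" in the toy corpus) gets an IDF of zero and drops to the bottom of the list, which is the behavior we want when hunting for new, more discriminating search terms.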