Monthly Archives: April 2016

Phil 4.29.16

7:00 – 5:00 VTX

  • Expense reports and timesheets! Done.
  • Continuing Informed Citizenship in a Media-Centric Way of Life
    • The pertinence interface may be an example of a UI affording the concept of monitorial citizenship.
      • Page 219: The monitorial citizen, in Schudson’s (1998) view, does environmental surveillance rather than gathering in-depth information. By implication, citizens have social awareness that spans vast territory without having in-depth understanding of specific topics. Related to the idea of monitorial instead of informed citizenship, Pew Center (2008) data identified an emerging group of young (18–34) mobile media users called news grazers. These grazers ind what they need by switching across media platforms rather than waiting for content to be served.
    • Page 222: Risk as Feelings. The abstract is below. There is an emotional hacking aspect here that traditional journalism has used (heuristically?) for most(?) of its history.
      • Virtually all current theories of choice under risk or uncertainty are cognitive and consequentialist. They assume that people assess the desirability and likelihood of possible outcomes of choice alternatives and integrate this information through some type of expectation-based calculus to arrive at a decision. The authors propose an alternative theoretical perspective, the risk-as-feelings hypothesis, that highlights the role of affect experienced at the moment of decision making. Drawing on research from clinical, physiological, and other subfields of psychology, they show that emotional reactions to risky situations often diverge from cognitive assessments of those risks. When such divergence occurs, emotional reactions often drive behavior. The risk-as-feelings hypothesis is shown to explain a wide range of phenomena that have resisted interpretation in cognitive–consequentialist terms.
    • At page 223 – Elections as the canon of participation

  • Working on getting tables to sort – Done

  • Loading excel file -done
  • Calculating – done
  • Using weights -done
  • Reset weights – done
  • Saving (don’t forget to add sheet with variables!) – done
  • Wrapped in executable – done
  • Uploading to dropbox. Wow – the files with JavaFX are *much* bigger than Swing.
Advertisements

Phil 4.28.16

7:00 – 5:00 VTX

  • Reading Informed Citizenship in a Media-Centric Way of Life
    • Jessica Gall Myrick
    • This is a bit out of the concentration of the thesis, but it addresses several themes that relate to system and social trust. And I’m thinking that behind these themes of social vs. system is the Designer’s Social Trust of the user. Think of it this way: If the designer has a high Social Trust intention with respect to the benevolence of the users, then a more ‘human’ interactive site may result with more opportunities for the user to see more deeply into the system and contribute more meaningfully. There is risks in this, such as hellish comment sections, but also rewards (see the YouTube comments section for The Idea Channel episodes). If the designer has a System Trust intention with respect to say, the reliability of the user watching ads, then different systems get designed that learns to generate click-bait using neural networks such as clickotron). Or, closer to home, Instagram might decide to curate a feed for you without affordances to support changing of feed options. The truism goes ‘If you’re not paying, then you’re the product’. And products aren’t people. Products are systems.
    • Page 218: Graber (2001) argues that researchers oten treat the information value of images as a subsidiary to verbal information, rather than having value themselves. Slowly, studies employing visual measures and examining how images facilitate knowledge gain are emerging (Grabe, Bas, & van Driel, 2015; Graber, 2001; Prior, 2014). In a burgeoning media age with citizens who overwhelmingly favor (audio)visually distributed information, research momentum on the role of visual modalities in shaping informed citizenship is needed. Paired with it, reconsideration of the written word as the preeminent conduit of information and rational thought are necessary.
      • The rise of infographics  makes me believe that it’s not image and video per se, but clear information with low cognitive load.
  • ————————–
  • Bob had a little trouble with inappropriate and unclear identity, as well as education, info and other
  • Got tables working for terms and docs.
  • Got callbacks working from table clicks
  • Couldn’t get the table to display. Had to use this ugly hack.
  • Realized that I need name, weight and eigenval. Sorting is by eigenval. Weight is the multiplier of the weights in a row or column associated with a term or document. Mostly done.

Phil 4.27.16

7:00 – 5:30 VTX

  • Finished A fistful of bitcoins: characterizing payments among men with no names
    • In reading the discussion about ‘peeling’, I wonder if in a similar way, if someone returns to a story repeatedly, would an adversary be able to find out anything useful?Or, if Bitcoin were used to pay for stories, would tracking transactions do anything as well? One of the nice things about using aliases for BC addresses is that other than the initial mapping, the address can be hidden in the system.
    • Page 93: ...even the most motivated Bitcoin users (i.e., criminals) are engaging in idioms of use that allow us to erode their anonymity.
      • This is an important point. As with biometrics at the small scale, we are identifiable through our behaviors. In this case, idioms or patterns of usage.
  • Rating app
    • Add people – done
    • Add John’s suggestions – done
    • Build and deploy – Done. Waiting on Andy.
  • Write up TF_IDF story
    • Basic capability – 11 points
      • The initial part of the effort is to scan over the collection of documents and produce a list of words ordered by TF-IDF. This means iterating over all the documents and producing a Set<String> of words that are then run over the the set of documents. The output should be an excel file that lists the documents in the corpus, and the list of words.
        • Documents should be listed in a file (xml?) as URIs. HTML docs can be read by jsoup, PDF by PDFBox.
        • The TF-IDF algorithm is discussed here: https://guendouz.wordpress.com/2015/02/17/implementation-of-tf-idf-in-java/
    • Pull pages from approved flags – 3 points
      • The second part of the effort is to use Jeremy’s REST interface to extract the URLs of ‘cleared’ flags to use as the input to the app, via the input file (or call from within the app, though there may be certs issues)
    • Report with new term recommendations – 3 points
      • Using the rating app, we should be able to try using these new terms and see if they improve results. One of the items that will need to be returned from the DB (that’s already stored in the QueryObject2) so we can see if we’re getting cleaner results.
  • LanguageModelNetworks
    • Read in a spreadsheet (xls and xlsx)
    • Write out spreadsheets (page containing the data information
      • File
      • User
      • Date run
      • Settings used
    • allow for manipulation of row and column values (in this case, papers and codes, but the possibilities are endless)
      • Select the value to manipulate (reset should be an option)
      • Spinner/entry field to set changes (original value in label)
      • ‘Calculate’ button
      • Sorted list(s) of rows and columns. (indicate +/- change in rank)
    • Reset all button
    • Normalize all button
    • Progress for today! Lots of wiring up to do though: LMT

Phil 4.26.16

7:00 – 4:00 VTX

  • Reading through (and coding) A Fistfull of Bitcoins. In the ‘duh’ department, I realize that it should be possible to pay anonymous sources using BC since they both rely on the same mechanism. So when you submit a story, you can also use a bitcoin address. It would help in tracking users, that’s for sure. If you want to associate a bitcoin address at a later time, then a more detailed biometric analysis would have to take place. Maybe a game. Also, users should be able to create a BC address alias. These would have to be unique across the system(? Is this really true?), but that’s kind of like user name, so there are issues….
  • Worked on JavaFX layout issues need to figure out how to get the grid to scale? Or maybe use anchor points. More tomorrow.
  • Sprint retrospective
  • Presented the tool. Need to add users.

Phil 4.25.16

5:30 – 4:00 VTX

  • Saw this on Twitter about visualizing networks with D3
  • Working my way through the JavaFX tutorial. It is a lot like a blend of Flex and a rethought Swing. Nice, actually…
  • Here is the list of stock components
  • Starting with the ope file dialog – done.
  • Yep, there’s a spinner. And here’s dials and knobs
  • And here’s how to do a word cloud.
  • Here’s a TF-IDF implementation in JAVA. Need to build some code that reads in from our ‘negative match’ ‘positive match’ results and start to get some data driven terms
  • Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for “tree regular expressions”). Tregex comes with Tsurgeon, a tree transformation language. Also included from version 2.0 on is a similar package which operates on dependency graphs (class SemanticGraph, calledsemgrex).
  • Semgrex
  • Sprint review
    • Google CSEs
      • Switched over from my personal CSEs to Vistronix CSEs
      • Added VCS rep for CSEs
      • Figured out how to save out and load CSE from XML
      • Added a few more CSEs ONLY_NET, MOBY_DICK
      • Wrote up care and feeding document for Confluence
      • Added blacklists
    • Rating App
      • Re-rigged the JPA classes to be Ontology-agnostic Version 2 of nearly everything)
      • Upped my JQL game to handle SELECT IN WHERE precompiled queries
      • Reading in VA and PA data now
      • Added the creation of a text JSON object that formalizes the rating of a flag
      • Got hooked up to the Talend DB!!!
      • Deployed initial version(s)
      • Added backlink logging using SemRush
    • Future work
      • Developed Excel ingest
      • Still working on PDF and Word ingest

Phil 4.22.16

7:00 – 4:30 VTX

  • Had a thought going to sleep last night that it would be interesting to see the difference between a ‘naive’ ranking based on the number of quotes vs. PageRank. Pretty much as soon as I got up, I pulled down the spreadsheet and got the lists. It’s in the previous post, but I’ll pot them here too:
    • Sorted from most to least quotes
      P61: A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.pdf
      P13: The Egyptian Blogosphere.pdf
      P10: Sensing_And_Shaping_Emerging_Conflicts.pdf
      P85: Technology Humanness and Trust-Rethinking Trust in Technology.pdf
      P 5: Saracevic_relevance_75.pdf
      P 1: Social Media and Trust during the Gezi Protests in Turkey.pdf
      P77: The Law of Group Polarization.pdf
      P43: On the Accuracy of Media-based Conflict Event Data.pdf
      System Trust
      P37: Security-control methods for statistical databases – a comparative study.pdf
    • Sorted on Page Rank eigenvector
      P85: Technology Humanness and Trust-Rethinking Trust in Technology.pdf
      System Trust
      Social Trust
      P61: A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.pdf
      P84: What is Trust_ A Conceptual Analysis–AMCIS-2000.pdf
      P 1: Social Media and Trust during the Gezi Protests in Turkey.pdf
      Credibility Cues
      P13: The Egyptian Blogosphere.pdf
      P10: Sensing_And_Shaping_Emerging_Conflicts.pdf
      P82: The ‘like me’ framework for recognizing and becoming an intentional agent.pdf
  • To me it’s really interesting how much better the codes are mixed in to the results. I actually thought it could be the other way, since the codes are common across many papers. Also, the concepts of System Trust, Social Trust and Credibility  Cues very much became a central point in my mind as I worked through the papers.
  • A second thought, which is the next step in the research, is to see ho weighting affects relationships. Right now, the the papers and codes are weighted by the number of quotes. What happens when all the weights are normalized (set to 1.0)?. And then there is the setup of the interactivity. With zero optimizations, this took 4.2 seconds to calculate on a modern laptop. Not sliderbar rates, but change a (some?) values and click a ‘run’ button.
  • So, moving forward, the next steps are to create the Swing App that will:
    • read in a spreadsheet (xls and xlsx)
    • Write out spreadsheets (page containing the data information
      • File
      • User
      • Date run
      • Settings used
    • allow for manipulation of row and column values (in this case, papers and codes, but the possibilities are endless)
      • Select the value to manipulate (reset should be an option)
      • Spinner/entry field to set changes (original value in label)
      • ‘Calculate’ button
      • Sorted list(s) of rows and columns. (indicate +/- change in rank)
    • Reset all button
    • Normalize all button
  • I’d like to do something with the connectivity graph. Not sure what yet.
  • And I think I’ll do this in JavaFX rather than Swing this time.
  • Huh. JavaFX Scene Builder is no longer supported by Oracle. Now it’s a Gluon project.
  • Documentation still seems to be at Oracle though
  • Spent most of the day seeing what’s going on with the Crawl. Turns out it was bad formatting on the terms?

Phil 4.21.16

7:00 – VTX

  • A little more bitcoin
  • Installed *another* new Java 1.8.0_92
  • Discovered the arXiv API page. This might be very helpful. I need to dig into it a bit.
  • Testing ranking code. I hate to say this, but if it works I think I’m going to write *another* Swing app to check interactivity rates. Which means I need to instrument the matrix calculations for timing.
  • Ok, the rank table is consistent across all columns. In my test code, the eigenvector stabilizes after 5 iterations:
    initial
     , col1, col2, col3, col4,
    row1, 11, 21, 31, 41,
    row2, 12, 22, 32, 42,
    row3, 13, 23, 33, 43,
    
    derived
    , row1, row2, row3, col1, col2, col3, col4,
    row1, 1, 0, 0, 0.26, 0.49, 0.72, 0.95,
    row2, 0, 1, 0, 0.28, 0.51, 0.74, 0.98,
    row3, 0, 0, 1, 0.3, 0.53, 0.77, 1,
    col1, 0.26, 0.28, 0.3, 1, 0, 0, 0,
    col2, 0.49, 0.51, 0.53, 0, 1, 0, 0,
    col3, 0.72, 0.74, 0.77, 0, 0, 1, 0,
    col4, 0.95, 0.98, 1, 0, 0, 0, 1,
    
    rank
    , row1, row2, row3, col1, col2, col3, col4,
    row1, 0.61, 0.62, 0.64, 0.22, 0.41, 0.59, 0.78,
    row2, 0.62, 0.65, 0.67, 0.23, 0.42, 0.61, 0.8,
    row3, 0.64, 0.67, 0.69, 0.24, 0.43, 0.63, 0.83,
    col1, 0.22, 0.23, 0.24, 0.08, 0.15, 0.22, 0.29,
    col2, 0.41, 0.42, 0.43, 0.15, 0.27, 0.4, 0.52,
    col3, 0.59, 0.61, 0.63, 0.22, 0.4, 0.58, 0.76,
    col4, 0.78, 0.8, 0.83, 0.29, 0.52, 0.76, 1,
    
    EigenVec
    row1, 1, 0.71, 0.62, 0.61, 0.61, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    row2, 0, 0.46, 0.61, 0.62, 0.62, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    row3, 0, 0.48, 0.63, 0.64, 0.64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    col1, 0.26, 0.13, 0.21, 0.22, 0.22, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    col2, 0.49, 0.25, 0.38, 0.41, 0.41, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    col3, 0.72, 0.37, 0.55, 0.59, 0.59, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    col4, 0.95, 0.49, 0.73, 0.78, 0.78, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  • And after a lot of banging away, here’s my lit review in PageRank: pageRank
  • And here’s the difference between PageRank and sorting based on number of quotes:
  • Page Rank
    P85: Technology Humanness and Trust-Rethinking Trust in Technology.pdf
    System Trust
    Social Trust
    P61: A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.pdf
    P84: What is Trust_ A Conceptual Analysis–AMCIS-2000.pdf
    P 1: Social Media and Trust during the Gezi Protests in Turkey.pdf
    Credibility Cues
    P13: The Egyptian Blogosphere.pdf
    P10: Sensing_And_Shaping_Emerging_Conflicts.pdf
    P82: The ‘like me’ framework for recognizing and becoming an intentional agent.pdf
  • Sorted from most to least quotes
    P61: A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.pdf
    P13: The Egyptian Blogosphere.pdf
    P10: Sensing_And_Shaping_Emerging_Conflicts.pdf
    P85: Technology Humanness and Trust-Rethinking Trust in Technology.pdf
    P 5: Saracevic_relevance_75.pdf
    P 1: Social Media and Trust during the Gezi Protests in Turkey.pdf
    P77: The Law of Group Polarization.pdf
    P43: On the Accuracy of Media-based Conflict Event Data.pdf
    System Trust
    P37: Security-control methods for statistical databases – a comparative study.pdf

Phil 4.20.16

7:00 – 4:00 VTX

  • Read a little more of the BitCoin article. Nice description of the blockchain.
  • Generated a fresh Excel matrix of codes and papers. I excluded the ‘meta’ codes (Definitions, Methods, etc) and all the papers that have zero quotes
  • Duke Ellington & His Orchestra make great coding music.
  • Installed new Java
  • Well drat. I can’t increase the size of an existing matrix. Will have to return a new one.
  • Yay!
    FileUtils.getInputFileName() opening: mat1.xls
    initial
     , col1, col2, col3, col4, 
    row1, 11.0, 21.0, 31.0, 41.0, 
    row2, 12.0, 22.0, 32.0, 42.0, 
    row3, 13.0, 23.0, 33.0, 43.0, 
    done calculating
    done creating
    derived
     , row1, row2, row3, col1, col2, col3, col4, 
    row1, 0.0, 0.0, 0.0, 11.0, 21.0, 31.0, 41.0, 
    row2, 0.0, 0.0, 0.0, 12.0, 22.0, 32.0, 42.0, 
    row3, 0.0, 0.0, 0.0, 13.0, 23.0, 33.0, 43.0, 
    col1, 11.0, 12.0, 13.0, 0.0, 0.0, 0.0, 0.0, 
    col2, 21.0, 22.0, 23.0, 0.0, 0.0, 0.0, 0.0, 
    col3, 31.0, 32.0, 33.0, 0.0, 0.0, 0.0, 0.0, 
    col4, 41.0, 42.0, 43.0, 0.0, 0.0, 0.0, 0.0,
  • Need to set Identity and other housekeeping.
  • Added ‘normalizeByMatrix that sets the entire matrix on a unit scale
  • Need to have a calcRank function that squares and normalizes until the difference between output eigenvectors are below a certain threshold or a limit of iterations. Done?

Phil 4.19.16

7:00 – 4:00 VTX

  • No reading today! Actually, scanning the Bitcoin acm article, but the goal is to get the PageRank code working with the excel outputs from Atlas and then be able to re-weight the eigenvectors. Then we’ll see if that can be used for collapsing the codes.
  • Reading in spreadsheets!
  • Working on buildSymmetricMatrix that will create a matrix that has all the row and columns represented on each axis of the matrix. it looks(?) like setSubMatrix will allow this…
  • Got the symmetric double[][] built and validated in the debugger. tomorrow we put this all together.
  • Walked through my ML concept for document filtering with Mike G and Aaron in two unrelated conversations.
  • Checked in John Robert’s ONLY_NET CSE

Phil 4.18.16

7:00 – 11:00 VTX

  • Timesheets
  • Sent Aaron a status since I’ll be in class for scrum
  • Continuing The ‘like me’ framework for recognizing and becoming an intentional agent
    • Page 12, at the end of General Discussion: Based on the ‘like me’ perception of others’ actions, social encounters are more interpretable than supposed by classical ‘solipsism’ theories such as Piaget’s. Infants can use themselves as a framework for understanding the subjectivity of others and reciprocally learn about the possibilities inherent in their own actions by observing the actions of others. Through social interaction with other intentional agents who are viewed as ‘like me,’ infants develop a richer social cognition
      • So we’re wired to trust that other people’s(!) actions will be like our own. We learn early (only?) behavior through empathy. That sets the pattern we use going forwards. And it is also wired into us. This suggests two things. First, it’s hackable. Second, there are affordances for social and system trust that should be used in the designs of systems that one, may have a ‘social back end’ (IR/Search/GPS), and two, systems that appear to be social need to be carefully constructed so that human trusting intention isn’t short-circuited.
  • Continuing to work on Adjacency and PageRank java code. Working from home today, so that means updating, checking out and fixing what’s broke.
    • Built test matrices from codes and docs
    • IntelliJ housekeeping. New version! Restarts…
    • And the code compiles!
    • Need to read from excel
    • Need a ‘build symmetric matrix’ method