Monthly Archives: November 2016

Phil 11.30.16

7:00 – 3:30 ASRC

  • Wrote up my notes from my chat with Shimei. I think the first step is to look through the UTOPIAN paper again and see how (if?) summary and coclustering are being handled.
    • Downloaded her suggested papers
    • It looks like the row and column matrices might be useful and manipulable. Digging into the NMF Java class for some more manipulation
    • Added raw, weight and scaled matrices
    • Need to add ranked row, column and cell output for L2DMat – done. Here’s some data and thoughts:
      rMat
       , D1, D2, D3, D4, 
      U1, 5, 3, 0, 1, 
      U2, 4, 0, 0, 1, 
      U3, 1, 1, 0, 5, 
      U4, 1, 0, 0, 4, 
      U5, 0, 1, 5, 4, 
      
      newMat
       , D1, D2, D3, D4, 
      U1, 5.05, 2.87, 5.26, 1, 
      U2, 3.96, 2.25, 4.27, 1, 
      U3, 1.11, 0.71, 4.4, 4.99, 
      U4, 0.94, 0.6, 3.57, 3.99, 
      U5, 2.35, 1.39, 4.87, 4.05, 
      average difference = 0.09750770110043207
      sorted columns {D3=22.36862672329615, D4=15.038484762558607, D1=13.410342394629499, D2=7.815842574518472}
      sorted rows {U1=14.17790755369198, U5=12.657839100920228, U2=11.485548694067901, U3=11.209516468182759, U4=9.102484638139858}
      
      Manipulating row weights by column
      
      newMat weight col 0 set to 1.0
       , D1, D2, D3, D4, 
      U1, 4.9, 2.76, 4.44, 0, 
      U2, 3.81, 2.15, 3.45, 0, 
      U3, 0.35, 0.2, 0.32, 0, 
      U4, 0.34, 0.19, 0.31, 0, 
      U5, 1.73, 0.98, 1.57, 0, 
      sorted columns {D1=11.121458227331996, D3=10.081893895718448, D2=6.276360972184673, D4=0.0}
      sorted rows {U1=12.101008726739368, U2=9.406070587569038, U5=4.271932099958697, U3=0.869188591004756, U4=0.8315130899632566}
      
      newMat weight col 1 set to 1.0
       , D1, D2, D3, D4, 
      U1, 0.15, 0.1, 0.82, 1, 
      U2, 0.15, 0.1, 0.82, 1, 
      U3, 0.76, 0.51, 4.08, 4.99, 
      U4, 0.61, 0.41, 3.26, 3.99, 
      U5, 0.62, 0.41, 3.31, 4.05, 
      sorted columns {D4=15.038484762558607, D3=12.286732827577703, D1=2.2888841672975038, D2=1.539481602333799}
      sorted rows {U3=10.340327877178003, U5=8.38590700096153, U4=8.2709715481766, U2=2.079478106498862, U1=2.076898826952612}
    • According to Choo, the columns in the factor mats are the latent topics. That means, for example, when all the document columns are zeroed out but one, the high-ranked terms are the topics for that document (And LSI will extract those terms???). And when all the term columns are zeroed out but one, the documents are sorted by relevance to that term. Big gaps mean clusters, or maybe the cluster just runs up to the first gap???
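The "weight col N set to 1.0" tables above look consistent with keeping one latent factor and zeroing the other: the two weighted newMats sum cell by cell to the unweighted one. A minimal sketch of that ranking, assuming this reading is right. The factor values are the rounded rowMat and colMat from the 11.22.16 run (the 11.30 factors are not listed in these notes), and the function names `reconstruct`, `ranked_rows` and `ranked_cols` are made up for illustration, not the L2DMat API.

```python
# Factor matrices copied from the 11.22.16 run (rounded); K = 2 latent factors.
# Names below are illustrative assumptions, not the actual Java class methods.
ROWS = ["U1", "U2", "U3", "U4", "U5"]
COLS = ["D1", "D2", "D3", "D4"]
P = [[0.17, 2.43], [0.22, 1.84], [1.88, 0.39], [1.50, 0.33], [1.40, 1.54]]
Q = [[0.16, 2.10], [0.08, 1.04], [1.15, 2.09], [2.62, 0.23]]


def reconstruct(weights):
    """newMat[i][j] = sum over factors k of weights[k] * P[i][k] * Q[j][k]."""
    return [[sum(w * p[k] * q[k] for k, w in enumerate(weights)) for q in Q]
            for p in P]


def ranked_rows(mat):
    """Row labels ordered by descending row sum."""
    totals = {label: sum(row) for label, row in zip(ROWS, mat)}
    return sorted(totals, key=totals.get, reverse=True)


def ranked_cols(mat):
    """Column labels ordered by descending column sum."""
    totals = {label: sum(col) for label, col in zip(COLS, zip(*mat))}
    return sorted(totals, key=totals.get, reverse=True)


if __name__ == "__main__":
    # All factors, then each latent factor on its own.
    for weights in ([1.0, 1.0], [1.0, 0.0], [0.0, 1.0]):
        mat = reconstruct(weights)
        print(weights, ranked_cols(mat), ranked_rows(mat))
```

With a single factor kept, the row and column orderings are just the orderings of that factor's entries in P and Q, which is one way to read "the columns in the factor mats are the latent topics".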
  • Add this one to the list? Characteristics to look for? Hate Spin: The Twin Political Strategies of Religious Incitement and Offense-Taking
  • Deep Learning MIT book (pdf)
  • Back to Sociophysics.
    • To build a scale-free network, A.-L. Barabási and R. Albert (in Emergence of scaling in random networks) start with a small random network and incrementally add nodes, where the probability of connecting a new node to an existing node is proportional to how many connections that existing node already has.
      network.createInitialNodes(SOME_SMALL_VALUE)
      for(i = 0 to desired)
      	n = createNewNode()
      	totalLinks = countAllLinks()
      	for(j = 0 to network.numNodes)
      		curNode = getNode(j)
      		links = curNode.getLinks
      		probability = links/totalLinks
      		curNode.addNeighbor(n, probability)
      	network.addNode(n)
    • Does node aging matter in this model?
    • Null Models For Social Networks (for comparison and testing)
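The preferential-attachment pseudocode above can be sketched in runnable form. This is a rough version under stated assumptions, not the paper's exact procedure: the seed network is fully connected rather than random, each new node adds a fixed number of links, and the names `preferential_attachment`, `initial_nodes` and `links_per_node` are mine. Picking uniformly from a list that holds each node once per link end gives a probability proportional to degree.

```python
import random


def preferential_attachment(total_nodes, initial_nodes=3, links_per_node=2, seed=0):
    """Grow a network where new nodes prefer well-connected targets.

    Assumptions (not from the notes): fully connected seed network,
    fixed links_per_node <= initial_nodes, no duplicate links.
    Returns a dict mapping node id -> set of neighbor ids.
    """
    rng = random.Random(seed)
    neighbors = {i: set() for i in range(initial_nodes)}
    endpoints = []  # each node appears once per link it takes part in
    for i in range(initial_nodes):
        for j in range(i + 1, initial_nodes):
            neighbors[i].add(j)
            neighbors[j].add(i)
            endpoints += [i, j]

    for new in range(initial_nodes, total_nodes):
        # A uniform draw from endpoints is a draw proportional to degree.
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(endpoints))
        neighbors[new] = set()
        for target in targets:
            neighbors[new].add(target)
            neighbors[target].add(new)
            endpoints += [new, target]
    return neighbors


if __name__ == "__main__":
    net = preferential_attachment(200)
    degrees = sorted((len(n) for n in net.values()), reverse=True)
    print("largest degrees:", degrees[:5], "smallest:", degrees[-1])
```

Node aging could be tried in this sketch by dropping old entries from `endpoints`, which would be one way to probe the question in the list above.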
  • Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources <- One of the most popular articles from 2016 via Altmetric
  • Skype messenger meeting with Aaron and Katy going over the data we have

Phil 11.29.16

7:00 – 5:30 ASRC

  • How Le Monde is taking on fake news
  • Thinking about Jonathan Albright’s work. How is it crawled? Is it really just inbound links? Can I get the data? I need to ask.
  • Back to Sociophysics.
    • Clustering coefficient (video)
      CC = 0
      numNodes = 0
      for(i = 0 to max)
      	for(j = 0 to max)
      		n = node(i,j)
      		k = n.numNeighbors()
      		a = n.numLinksBetweenNeighbors()
      		// a node with fewer than two neighbors gets CC = 0 (avoids dividing by zero)
      		n.setNodeCC(k < 2 ? 0 : (2*a)/(k*(k-1)))
      		CC += n.getNodeCC()
      		numNodes++
      CC = CC/numNodes
    • Clustering coefficient ordering: random -> small world -> regular
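The clustering coefficient loop above, as a small runnable sketch. It assumes an undirected graph stored as a dict of neighbor sets (my choice, not the lattice-style `node(i,j)` lookup in the pseudocode) and treats nodes with fewer than two neighbors as having a coefficient of 0, which is a common convention rather than something stated in the notes.

```python
from itertools import combinations


def local_cc(graph, node):
    """2a / (k(k-1)): a = links among the node's neighbors, k = neighbor count."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: no neighbor pairs, so no clustering
    a = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return 2.0 * a / (k * (k - 1))


def average_cc(graph):
    """Mean of the local coefficients over all nodes."""
    return sum(local_cc(graph, n) for n in graph) / len(graph)


if __name__ == "__main__":
    # A triangle (0, 1, 2) with one pendant node (3) hanging off node 0.
    graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
    print({n: local_cc(graph, n) for n in graph}, average_cc(graph))
```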
  • Got the NMF built into CorpusManager. Here are the first four chapters of Moby Dick as:
    • BOW: there think harpooneer about little landlord sleep could would
    • TF-IDF: nantucket harpooneer queequeg landlord euroclydon bedford lazarus passenger circumstance
    • NMF: nantucket harpooneer queequeg landlord euroclydon bedford lazarus passenger circumstance
    • BOW/centrality: there think would queequeg could about whale little first
    • TF-IDF/centrality: about harpooneer night landlord stand light nantucket where other
    • NMF/centrality: harpooneer queequeg landlord water nantucket circumstance sailor passenger about
    • (centrality with equalized docs)
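For comparison with the BOW and TF-IDF term lists above, here is a toy sketch of why TF-IDF pushes out common words like "there" and "would". It assumes one standard weighting (term frequency times log(N/df)); the notes do not say which variant CorpusManager uses, and the three tiny documents are invented stand-ins, not the Moby Dick chapters.

```python
import math
from collections import Counter


def tfidf(docs):
    """Score each term per document as (count / doc length) * log(N / df)."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    n_docs = len(tokenized)
    scores = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        scores[name] = {term: (count / len(tokens)) * math.log(n_docs / df[term])
                        for term, count in counts.items()}
    return scores


def top_terms(scores, limit=3):
    """Highest-scoring terms per document; ties broken alphabetically."""
    return {name: sorted(s, key=lambda term: (-s[term], term))[:limit]
            for name, s in scores.items()}


# Invented toy documents, for illustration only.
DOCS = {
    "doc1": "the whale and the sea and the whale",
    "doc2": "the landlord and the harpooneer and the landlord",
    "doc3": "the sea and the landlord",
}

if __name__ == "__main__":
    print(top_terms(tfidf(DOCS), limit=2))
```

Terms that occur in every document score zero, and in doc2 the single "harpooneer" outranks the twice-used "landlord" because "landlord" also shows up in doc3.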
  • Meeting with Shimei

Phil 11.28.16

7:00 – 5:00 ASRC

  • Stumbled upon the ACM Transactions on Interactive Intelligent Systems (TIIS). They have two interesting upcoming issues:
  • Jonathan Albright came up on my Twitter feed. He’s doing interesting data journalism. Here are his thoughts on fake news. It’s really odd that he hasn’t published peer-reviewed work. Is this because he’s at a teaching university?
  • Looking through Sociophysics, and finding some interesting references.
    • Minority Opinion Spreading in Random Geometry
      • Abstract: The dynamics of spreading of the minority opinion in public debates (a reform proposal, a behavior change, a military retaliation) is studied using a diffusion reaction model. People move by discrete step on a landscape of random geometry shaped by social life (offices, houses, bars, and restaurants). A perfect world is considered with no advantage to the minority. A one person-one argument principle is applied to determine locally individual mind changes. In case of equality, a collective doubt is evoked which in turn favors the Status Quo. Starting from a large in favor of the proposal initial majority, repeated random size local discussions are found to drive the majority reversal along the minority hostile view. Total opinion refusal is completed within few days. Recent national collective issues are revisited. The model may apply to rumor and fear propagation.
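A rough sketch of the local-discussion rule the abstract describes: people are shuffled into random-size groups, each group adopts its local majority, and a tie goes to the status quo (against the proposal). This is a simplification under my own assumptions, with no spatial landscape or movement, and `discuss`, `simulate` and the default group sizes are illustrative names and values, not taken from the paper.

```python
import random


def discuss(opinions, group_sizes, rng):
    """One round of local discussions.

    opinions: list of bools, True = in favor of the proposal.
    Each group adopts its strict majority; a tie resolves to False (status quo).
    """
    people = opinions[:]
    rng.shuffle(people)
    updated = []
    start = 0
    while start < len(people):
        size = rng.choice(group_sizes)
        group = people[start:start + size]  # the last group may be smaller
        in_favor = sum(group)
        outcome = in_favor * 2 > len(group)
        updated.extend([outcome] * len(group))
        start += size
    return updated


def simulate(population, fraction_in_favor, rounds,
             group_sizes=(1, 2, 3, 4, 5, 6), seed=0):
    """Return the number of people in favor before and after each round."""
    rng = random.Random(seed)
    n_favor = round(population * fraction_in_favor)
    opinions = [True] * n_favor + [False] * (population - n_favor)
    history = [n_favor]
    for _ in range(rounds):
        opinions = discuss(opinions, group_sizes, rng)
        history.append(sum(opinions))
    return history


if __name__ == "__main__":
    # Groups of four make ties possible, which is what favors the status quo.
    print(simulate(1000, 0.6, 10, group_sizes=(4,)))
```

With groups of four and a 60% initial majority in favor, the tie rule is enough to reverse the majority within a few rounds in this sketch.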
  • Updating IntelliJ and waiting for 497MB to download
  • Continue to generalize NMF. Get k tested and implicit in the matrix passing. Start NMF class as part of JavaUtils. Done
  • Start to integrate NMF into CorpusManager. Initially, I’m just going to use it to produce the matrix, like TF-IDF.
    • Computing, now I need to sort and trim
  • Fika with Aaron on writing. Need to ask for his slide deck.
  • Meeting with Wayne, mostly catching up. What book should I give him? The most tabbed are Sciences of the Artificial, Last Place on Earth, and Social Science.

Phil 11.24.16

8:00 – 10:00 ASRC

Phil 11.23.16

7:30 – 10:30 ASRC

  • Wrote up notes from yesterday’s meetings with Don and Shimei.
  • Really just getting ready for T-day, but I ran my list of recipes through the TF-IDF and LMN tools and now I have a nice, sparse matrix that I can try the NMF on.
  • Finish Matrix dot-product code and promote to Labeled2DMatrix – done!!

Phil 11.22.16

7:00 – 5:00 ASRC

  • Worked on getting the spreadsheet of conferences, journals and grants started
  • Continuing Opinion Dynamics With Decaying Confidence: Application to Community Detection in Graphs. Details here.
    • When δ increases, the communities become smaller but more densely connected.
    • It should be very interesting to look at belief velocity at different scales.
  • A Plethora of Data Set Repositories
  • More NMF. Getting closer
  • Installing Python on the laptop for discussion with Don
  • Got everything working in Java! Need to move the dot product code into Labeled2DMatrix and flesh out the other cases.
    rMat
     , D1, D2, D3, D4, 
    U1, 5, 3, 0, 1, 
    U2, 4, 0, 0, 1, 
    U3, 1, 1, 0, 5, 
    U4, 1, 0, 0, 4, 
    U5, 0, 1, 5, 4, 
    
    rowMat
    
    U1, 0.67, 0.89, 
    U2, 0.36, 0.47, 
    U3, 0.51, 0.27, 
    U4, 0.11, 0.84, 
    U5, 0.23, 0.88, 
    
    colMat
    
    D1, 0.36, 0.68, 
    D2, 0.84, 0.06, 
    D3, 0.07, 0.06, 
    D4, 0.65, 0.16, 
    
    steps = 5000
    
    P
    Array2DRowRealMatrix{{0.1714659334,2.4334642215},{0.2222526463,1.8424266034},{1.8809519431,0.3877676639},{1.5002592207,0.3319796716},{1.398228183,1.5413729554}}
    
    Q
    Array2DRowRealMatrix{{0.1642944844,0.083284122,1.152720993,2.6155442597},{2.0998133805,1.0434120295,2.0884233062,0.228777745}}
    
    rowMat
    
    U1, 0.17, 2.43, 
    U2, 0.22, 1.84, 
    U3, 1.88, 0.39, 
    U4, 1.5, 0.33, 
    U5, 1.4, 1.54, 
    
    colMat
    
    D1, 0.16, 2.1, 
    D2, 0.08, 1.04, 
    D3, 1.15, 2.09, 
    D4, 2.62, 0.23, 
    
    newMat
     , D1, D2, D3, D4, 
    U1, 5.14, 2.55, 5.28, 1.01, 
    U2, 3.91, 1.94, 4.1, 1, 
    U3, 1.12, 0.56, 2.98, 5.01, 
    U4, 0.94, 0.47, 2.42, 4, 
    U5, 3.47, 1.72, 4.83, 4.01,
  • Meeting with Don.
    • Looked through the modelling and UTOPIAN papers, and walked through some of the math. We’ll meet next Friday to try to convert some of the equations into Java code
  • Meeting with Shimei
    • There are ways of getting better stability with LDA. Still ok to do NMF, though there may be issues with scaling. That’s where a stable version of LDA might make sense.
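The rMat to P, Q to newMat run in this entry can be approximated with a short gradient-descent factorization. This is a sketch under assumptions, not the code behind the output above: it treats zero cells as missing (which would explain why newMat fills them in), uses a learning rate `alpha` and regularization `beta` that I picked, and does not enforce non-negativity, so it is closer to plain matrix factorization than to strict NMF.

```python
import random

# rMat from this entry; 0 is treated as "no rating" (an assumption).
R = [[5, 3, 0, 1],
     [4, 0, 0, 1],
     [1, 1, 0, 5],
     [1, 0, 0, 4],
     [0, 1, 5, 4]]


def factorize(r, k=2, steps=5000, alpha=0.002, beta=0.02, seed=0):
    """Fit r ~ P * Q^T on the non-zero cells by regularized gradient descent.

    alpha (step size) and beta (regularization) are assumed values.
    Returns P (rows x k) and Q (cols x k).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(r), len(r[0])
    p = [[rng.random() for _ in range(k)] for _ in range(n_rows)]
    q = [[rng.random() for _ in range(k)] for _ in range(n_cols)]
    for _ in range(steps):
        for i in range(n_rows):
            for j in range(n_cols):
                if r[i][j] == 0:
                    continue  # skip missing cells
                err = r[i][j] - sum(p[i][f] * q[j][f] for f in range(k))
                for f in range(k):
                    p_if, q_jf = p[i][f], q[j][f]
                    p[i][f] += alpha * (2 * err * q_jf - beta * p_if)
                    q[j][f] += alpha * (2 * err * p_if - beta * q_jf)
    return p, q


def predict(p, q):
    """Dense reconstruction, including the cells that were missing in r."""
    return [[sum(a * b for a, b in zip(p_row, q_row)) for q_row in q]
            for p_row in p]


def observed_error(r, approx):
    """Mean absolute difference over the non-zero cells of r."""
    diffs = [abs(r[i][j] - approx[i][j])
             for i in range(len(r)) for j in range(len(r[0])) if r[i][j] > 0]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    P, Q = factorize(R)
    new_mat = predict(P, Q)
    for row in new_mat:
        print([round(v, 2) for v in row])
    print("mean error on observed cells:", observed_error(R, new_mat))
```

Fixing the seed plays the same role as hard-coding the random start values: two implementations can then be compared step by step.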

Phil 11.21.16

6:45 – 4:45 ASRC

  • Continuing Opinion Dynamics With Decaying Confidence: Application to Community Detection in Graphs. Details here.
  • More NMF
    P = [[ 0.67503659  0.89795272]
     [ 0.36939303  0.47816356]
     [ 0.51019257  0.27772317]
     [ 0.1130504   0.84860109]
     [ 0.23238542  0.88222005]]
    
    Q = [[ 0.36692407  0.6844149 ]
     [ 0.84469693  0.06331073]
     [ 0.07366106  0.06603799]
     [ 0.65677669  0.16947152]]
    
    nP = [[ 0.16286496  2.42456084]
     [ 0.21647521  1.83981127]
     [ 1.9047257   0.39049035]
     [ 1.52103295  0.33509559]
     [ 1.41350212  1.51711067]]
    
    nQ = [[ 0.15875994  2.09665688]
     [ 0.08334172  1.04818927]
     [ 1.16320811  2.09280482]
     [ 2.56431807  0.24424636]]
    
    nQt = [[ 0.15875994  0.08334172  1.16320811  2.56431807]
     [ 2.09665688  1.04818927  2.09280482  0.24424636]]
    
    R = [[5 3 0 1]
     [4 0 0 1]
     [1 1 0 5]
     [1 0 0 4]
     [0 1 5 4]]
    
    nR = [[ 5.10932861  2.55497211  5.26357846  1.00982771]
     [ 3.89182055  1.94651185  4.10217161  1.00447849]
     [ 1.12111842  0.56805092  3.03281247  4.97969837]
     [ 0.94405957  0.4780091   2.47056752  3.98225815]
     [ 3.40526805  1.70802283  4.81921366  3.99521777]]
    • Hard coded the random values for gradient descent to compare Python and Java
    • Stepping h
  • Sprint stuff?
    • Scrum
    • Sent Jeremy the svn file names for my Vistronix code
  • Fika
  • Meeting with Wayne? Basic catching up. Started the spreadsheet of conferences and grants

Phil 11.18.16

7:00 – 4:00 ASRC

Phil 11.17.16

7:00 – 10:00, 10:30 – 5:30 ASRC

Phil 11.16.16

7:00 – 4:00 ASRC