
Phil 6.15.16

7:00 – 10:00, 12:00 – 4:00 VTX

  • Got the official word that I should be charging the project for research. Saved the email this time.
  • Continuing to work on the papers list
  • And in the process of looking at Daniele Quercia’s work, I found Auralist: introducing serendipity into music recommendation, which is cited by
    An investigation on the serendipity problem in recommender systems, which has the following introduction:

    • In the book ‘‘The Filter Bubble: What the Internet Is Hiding from You’’, Eli Pariser argues that Internet is limiting our horizons (Parisier, 2011). He worries that personalized filters, such as Google search or Facebook delivery of news from our friends, create individual universes of information for each of us, in which we are fed only with information we are familiar with and that confirms our beliefs. These filters are opaque, that is to say, we do not know what is being hidden from us, and may be dangerous because they threaten to deprive us from serendipitous encounters that spark creativity, innovation, and the democratic exchange of ideas. Similar observations have been previously made by Gori and Witten (2005) and extensively developed in their book ‘‘Web Dragons, Inside the Myths of Search Engine Technology’’ (Witten, Gori, & Numerico, 2006), where the metaphor of search engines as modern dragons or gatekeepers of a treasure is justified by the fact that ‘‘the immense treasure they guard is society’s repository of knowledge’’ and all of us accept dragons as mediators when having access to that treasure. But most of us do not know how those dragons work, and all of us (probably the search engines’ creators, either) are not able to explain the reason why a specific web page ranked first when we issued a query. This gives rise to the so called bubble of Web visibility, where people who want to promote visibility of a Web site fight against heuristics adopted by most popular search engines, whose details and biases are closely guarded trade secrets.
    • Added both papers to the corpus. Need to read and code. What I’m doing is different in that I want to add a level of interactivity to the serendipity display that looks for user patterns in how they react to the presented serendipity and incorporates that pattern into a trustworthiness evaluation of the web content. I’m also doing it in Journalism, which is a bit different in its constraints. And I’m trying to tie it back to Group Polarization and opinion drift.
  • Also, Raz Schwartz at Facebook: Editorial Algorithms: Using Social Media to Discover and Report Local News
  • Working on getting all html and pdf files in one matrix
  • Spent the day chasing down a bug: if the string being annotated is too long (I’ve set the word limit to 60), we skip it, which leads to a divide-by-zero. Fixed now.
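For reference, the shape of the fix is a guard before the division. This is a minimal sketch with hypothetical names (the real annotation code is more involved):

```java
// Hypothetical sketch of the divide-by-zero guard; MAX_WORDS and the
// method name are illustrative, not the real annotation code.
public class AnnotationStats {
    static final int MAX_WORDS = 60;

    /** Average score per word, or 0.0 when the string was skipped or empty. */
    public static double averageScore(String[] words, double totalScore) {
        if (words.length == 0 || words.length > MAX_WORDS) {
            return 0.0; // skipped: nothing to divide by
        }
        return totalScore / words.length;
    }
}
```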

Phil 5.5.16

7:00 – 5:30 VTX

  • Continuing An Introduction to the Bootstrap.
  • This helped a lot. I hope it’s right…
  • Had a thought about how to build the Bootstrap class. Build it using RealVector and then use the RealVectorPreservingVisitor interface to do whatever calculation is desired. Default methods for Mean, Median, Variance and StdDev. It will probably need arguments for max iteration and epsilon.
  • Didn’t do that at all. Wound up using ArrayRealVector for the population and Percentile to hold the mean and variance values. I can add something else later.
  • I want to capture how centrality affects the makeup of the data in a matrix. I think it makes sense to use the normalized eigenvector to multiply the counts in the initial matrix and submit that population (the whole matrix) to the Bootstrap.
  • Meeting with Wayne? Need to finish tool updates though.
  • Got bogged down in understanding the Percentile class and how binomial distributions work.
  • Built and then fixed a copy ctor for Labled2DMatrix.
  • Testing. It looks ok, but I want to try multiplying the counts by the eigenVec. Tomorrow.
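The core of the resampling loop, sketched in plain Java (the actual class wraps ArrayRealVector and Percentile from Commons Math; the names here are illustrative):

```java
import java.util.Random;

// Minimal bootstrap sketch: resample the population with replacement and
// collect the statistic of interest (here, the mean) from each resample.
public class BootstrapSketch {
    public static double[] bootstrapMeans(double[] population, int iterations, long seed) {
        Random rnd = new Random(seed);
        double[] means = new double[iterations];
        for (int i = 0; i < iterations; i++) {
            double sum = 0.0;
            for (int j = 0; j < population.length; j++) {
                sum += population[rnd.nextInt(population.length)];
            }
            means[i] = sum / population.length;
        }
        return means;
    }
}
```

The distribution of the returned means is what the Percentile object would then summarize.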

Phil 5.2.16

7:00 – 3:00 VTX

  • How to get funding using Web of Science
  • Finished  Supporting Reflective Public Thought with ConsiderIt
    • Watched the ConsiderIt demo. I love the histogram that shows how the issue polarization is characterized.
  • Back to  Informed Citizenship in a Media-Centric Way of Life
    • Page 225 – Conclusions: As prescriptive as it may sound, it is time to suspend the normative traditions that envelop journalism and democracy, take stock of how knowledge is explicated and operationalized, and calibrate research practice to accommodate an explication of informed citizenship and democratic participation fitted to contemporary life. Doing so strays from the dominant research paradigm, grounded in convictions about the supremacy of rational thought, verbal information, news as cold hard facts, and electoral activities as the gold standard of participatory practices. We advanced arguments for a departure from tradition and elaborated on how the very notions of informed citizenship and political participation are mutating in (and because of) the current media environment.
    • And this is kind of scary: Freedom is on the longest global downward trajectory in 40 years (Freedom House, 2011), democratic failure is at the highest rate since the mid-1980s (Diamond, 1999), and there are indicators of qualitative erosion in democratic practice worldwide (Bertelsmann Foundation, 2012). The people’s view on democratic life appears tepid; in several parts of the world, there are reports of a so-called authoritarian nostalgia among citizens who live in Asian countries that are transforming to democratic systems of governance (Chang, Chu, & Park, 2007), while a mere half (or fewer) of Russians, Poles, Ukrainians, and Indonesians expressed strong support for democratic rule (World Public, 2015).
      • Make America Great Again.
    • Done. Reading this makes me feel more like a connectivist/AI revolution is coming that will either tend towards isolating us more or finding ways to bring us together. The thing is that we’re wired to do both. So this really is a design problem.
  • ————————————
  • Well drat, was going to do some light work on developing the ranking app, but it looks like I forgot to check in the latest version of Java Utils
  • Installed Launch4j
  • TODO:
    • Add a ‘session name’ text field – done
    • Add an ‘interactive’ checkbox. If it’s selected, then changes in the weight slider will fire calculate(). Done
    • Fixed the ‘Reset Weights’
    • Got the ‘Use Unit Weights’ option. I just set all the non-zero values in the derived symmetric matrix to 1.0. I have a suspicion that this will come back to bite me, but for now I can’t think of a reason. The only thing that I really don’t like is that there is no obvious change in the data. The ‘Weights’ column actually means ‘scalar’. The issue is that the whole matrix would have to be shown, since the weight exists at the intersection of two items. So a row or column is sort of a sum of weights.
    • Start TF-IDF app. It should do the following:
      • Take a list of URIs (local or remote, pdf, html, text). These are the documents
      • Read each of the documents into a data structure that has
        • Document title
        • Keywords (if called out)
        • Word list (lemmatized)
          • Word
          • Document count
          • Parts Of Speech(?)
      • Run TF-IDF to produce an ordered list of terms
      • Build a co-occurrence matrix of terms and documents
      • Output matrix to Excel.
  • The end of a good day:

LMT With Data
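The ‘run TF-IDF’ step of the app outlined above can be sketched over in-memory token lists (the real app would pull URIs through jsoup/PDFBox and lemmatize first; this sketch assumes the term appears in at least one document):

```java
import java.util.Collections;
import java.util.List;

// Sketch of the TF-IDF core: term frequency in one document times the
// (log) inverse document frequency across the corpus.
public class TfIdfSketch {
    public static double tfIdf(String term, List<String> doc, List<List<String>> corpus) {
        double tf = Collections.frequency(doc, term) / (double) doc.size();
        long docsWithTerm = corpus.stream().filter(d -> d.contains(term)).count();
        // Assumes docsWithTerm > 0; a real implementation would guard this.
        double idf = Math.log(corpus.size() / (double) docsWithTerm);
        return tf * idf;
    }
}
```

A term that appears in every document gets an idf of log(1) = 0, which is why TF-IDF pushes corpus-wide common words to the bottom of the ordered list.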

Phil 4.29.16

7:00 – 5:00 VTX

  • Expense reports and timesheets! Done.
  • Continuing Informed Citizenship in a Media-Centric Way of Life
    • The pertinence interface may be an example of a UI affording the concept of monitorial citizenship.
      • Page 219: The monitorial citizen, in Schudson’s (1998) view, does environmental surveillance rather than gathering in-depth information. By implication, citizens have social awareness that spans vast territory without having in-depth understanding of specific topics. Related to the idea of monitorial instead of informed citizenship, Pew Center (2008) data identified an emerging group of young (18–34) mobile media users called news grazers. These grazers find what they need by switching across media platforms rather than waiting for content to be served.
    • Page 222: Risk as Feelings. The abstract is below. There is an emotional hacking aspect here that traditional journalism has used (heuristically?) for most(?) of its history.
      • Virtually all current theories of choice under risk or uncertainty are cognitive and consequentialist. They assume that people assess the desirability and likelihood of possible outcomes of choice alternatives and integrate this information through some type of expectation-based calculus to arrive at a decision. The authors propose an alternative theoretical perspective, the risk-as-feelings hypothesis, that highlights the role of affect experienced at the moment of decision making. Drawing on research from clinical, physiological, and other subfields of psychology, they show that emotional reactions to risky situations often diverge from cognitive assessments of those risks. When such divergence occurs, emotional reactions often drive behavior. The risk-as-feelings hypothesis is shown to explain a wide range of phenomena that have resisted interpretation in cognitive–consequentialist terms.
    • At page 223 – Elections as the canon of participation

  • Working on getting tables to sort – Done

  • Loading excel file -done
  • Calculating – done
  • Using weights -done
  • Reset weights – done
  • Saving (don’t forget to add sheet with variables!) – done
  • Wrapped in executable – done
  • Uploading to dropbox. Wow – the files with JavaFX are *much* bigger than Swing.

Phil 4.27.16

7:00 – 5:30 VTX

  • Finished A fistful of bitcoins: characterizing payments among men with no names
    • In reading the discussion about ‘peeling’, I wonder whether, in a similar way, if someone returns to a story repeatedly, an adversary would be able to find out anything useful. Or, if Bitcoin were used to pay for stories, would tracking transactions reveal anything as well? One of the nice things about using aliases for BC addresses is that other than the initial mapping, the address can be hidden in the system.
    • Page 93: ...even the most motivated Bitcoin users (i.e., criminals) are engaging in idioms of use that allow us to erode their anonymity.
      • This is an important point. As with biometrics at the small scale, we are identifiable through our behaviors. In this case, idioms or patterns of usage.
  • Rating app
    • Add people – done
    • Add John’s suggestions – done
    • Build and deploy – Done. Waiting on Andy.
  • Write up TF_IDF story
    • Basic capability – 11 points
      • The initial part of the effort is to scan over the collection of documents and produce a list of words ordered by TF-IDF. This means iterating over all the documents and producing a Set<String> of words that are then run over the set of documents. The output should be an excel file that lists the documents in the corpus, and the list of words.
        • Documents should be listed in a file (xml?) as URIs. HTML docs can be read by jsoup, PDF by PDFBox.
        • The TF-IDF algorithm is discussed here:
    • Pull pages from approved flags – 3 points
      • The second part of the effort is to use Jeremy’s REST interface to extract the URLs of ‘cleared’ flags to use as the input to the app, via the input file (or call from within the app, though there may be certs issues)
    • Report with new term recommendations – 3 points
      • Using the rating app, we should be able to try using these new terms and see if they improve results. One of the items will need to be returned from the DB (it’s already stored in the QueryObject2) so we can see if we’re getting cleaner results.
  • LanguageModelNetworks
    • Read in a spreadsheet (xls and xlsx)
    • Write out spreadsheets (with a page containing the run information):
      • File
      • User
      • Date run
      • Settings used
    • allow for manipulation of row and column values (in this case, papers and codes, but the possibilities are endless)
      • Select the value to manipulate (reset should be an option)
      • Spinner/entry field to set changes (original value in label)
      • ‘Calculate’ button
      • Sorted list(s) of rows and columns. (indicate +/- change in rank)
    • Reset all button
    • Normalize all button
    • Progress for today! Lots of wiring up to do though: LMT

Phil 4.26.16

7:00 – 4:00 VTX

  • Reading through (and coding) A Fistful of Bitcoins. In the ‘duh’ department, I realize that it should be possible to pay anonymous sources using BC, since they both rely on the same mechanism. So when you submit a story, you can also use a bitcoin address. It would help in tracking users, that’s for sure. If you want to associate a bitcoin address at a later time, then a more detailed biometric analysis would have to take place. Maybe a game. Also, users should be able to create a BC address alias. These would have to be unique across the system (? Is this really true?), but that’s kind of like a user name, so there are issues…
  • Worked on JavaFX layout issues; need to figure out how to get the grid to scale, or maybe use anchor points. More tomorrow.
  • Sprint retrospective
  • Presented the tool. Need to add users.

Phil 4.15.16

7:00 – 4:30 VTX

  • Good meeting with Wayne yesterday evening
  • Tensorflow playground
  • Continuing The ‘like me’ framework for recognizing and becoming an intentional agent
    • Page 4: Based on the ‘like me’ framework, I hypothesized that it would be possible to demonstrate such tool-use learning at younger ages by transforming the situation. Instead of having the infant sit across the table from the adult, I had them sit side-by-side. In that way the adult’s actions could more easily serve as a blueprint for the child’s own action plans. Recent brain imaging studies with adults show the facilitative effects of seeing a to-be-imitated action from one’s own point of view (Jackson, Meltzoff, & Decety, 2006).
    • Page 5:This study was the first to show infants how to use complex tools ‘from their own perspective.’ Sitting shoulder-to-shoulder with the child closes the gap between the perceived and executed actions. The model becomes more ‘like me.’ 
      • Eyewitness value, photos and images all come from a ‘like me’ framework. As much as possible, we are looking out of the eyes of the witness. This high level of credibility traces all the way back to infancy. Wow. On a related note, this has implications for news reporting using VR.
    • Page 6: Evidently, young toddlers can understand our goals even if we fail to fulfill them. In another study (Meltzoff, 1995; Experiment 2), it was shown that infants did not reenact the target act if  they saw a mechanical device rather than a person performing the ‘slipping’ movements. The device did not look human and had poles as arms and pincers instead of fingers, but it traced the same spatiotemporal pattern as did the person’s yanking. Infants did not pull apart the dumbbell at any higher than baseline levels in this case. They did, however, correctly perform the target act in another condition in which the mechanical device succeeded in pulling apart the dumbbell. This makes sense, because in the case of success the object transformation is visible (it is pulled apart), but in the case of the unsuccessful attempt, there is no object transformation, only a ‘slipping’ motion that has to be interpreted at a different level.
      • Does this mean that we have a ‘wired-in’ model of the intention of others?
    • Page 7: Persistence and emotions as markers of infants’ intention—In further work, I showed 18-month-olds (N = 33) the standard unsuccessful-attempt display, but handed them a trick toy. The toy had been surreptitiously glued shut before the study began. When infants picked it up and attempted to pull it apart, their hands slipped off the ends. This, of course, matched the surface behavior of the adult. The question was whether this imitation of the adults’ behavior satisfied the infants. It did not. When infants matched the surface behavior of the adult, they did not terminate their behavior. They repeatedly grabbed the toy, yanked on it in different ways, and appealed to the adult for help by looking and vocalizing. About 90% (20/23) of those who tried to pull apart the object immediately stared at the adult after they failed to do so (mean latency = 1.74 s). Why were they appealing for help? They had matched the adult’s surface behavior. Evidently, they were striving toward something else: the adult’s goals, not his literal behavior
      • Definitely a model of something… And a goal.
    • Page 7: We also conducted related neuroscience work in adults. The results reveal that neural structures known to be involved in adult theory-of-mind tasks (medial prefrontal cortex) are activated in tasks requiring adults to infer unconsummated goals in basic action tasks (Chaminade, Meltzoff, & Decety, 2002; see also Reid, Csibra, Belsky, & Johnson, 2007, for related work). This suggests a tie between the processing of action sequences in terms of goals and more sophisticated aspects of social cognition.
    • Page 7: Our adult commonsense psychology includes a distinction between the types of entities that are accorded goals and intentions and those that are not. We ascribe a goal to the archer not to the arrow that reaches (or misses) the target
      • That’s a fundamental ‘humanness’ definition that Social Trust depends on. If the inferred goals are trustworthy, then slips in behavior are discounted.
    • Page 7: I am currently exploring whether mechanical devices such as social robots can be treated as ‘like me’ based on bodily structure and/ or the type of behavior they exhibit, prompting action imitation by the infant. Preliminary results suggest so.
  • —————–
  • Updated the deployable RatingApp.exe. Asked Andy to set up a Skype meeting so I can demo.
  • Presented and deployed.
  • Made a new CSE that only points to the online Moby Dick, that can be used for query testing.

Phil 4.14.16

7:00 – 3:30 VTX

  • Continuing The ‘like me’ framework for recognizing and becoming an intentional agent
  • Page 2: Perception influences production, and production influences perception, with substantial implications for social cognition.
    • This must be a foundational element of Social Trust. I see you do a thing. I imitate the thing. I feel (not think!) that it is the same thing. I do a thing. You imitate the thing. Think peekaboo. We establish a rapport. This is different from System Trust, where I put something somewhere and it’s still there. System trust may be derived fundamentally from Object Permanence, while Social Trust comes from imitation?
    • This is(?) tied to motor neurons. From Mirror neurons: Enigma of the metaphysical modular brain: Essentially, mirror neurons respond to actions that we observe in others. The interesting part is that mirror neurons fire in the same way when we actually recreate that action ourselves.
      • Implications for design? Journalism is definitely built around the ‘like me’ concept that it is built around stories. IR is much less so, and is more data focused.
    • At section 3 – Experiment 1: learning tool-use by observing others
      • We have Social Trust first. Then we learn to use tools. Tools are different from, though related to the environment. They are not ‘like me’, but they extend me (Heidegger again). More later.
  • Page 3: For example, there is an intimate relation between striving to achieve a goal and a concomitant facial expression and effortful bodily acts.
    • This is like the boot loader or initial dictionary entry. Hard-wired common vocabulary.
  • Page 3: Humans, including preverbal infants, imbue the acts of others with felt meaning not solely (or at first) through a formal process of step-by-step reasoning, but because the other is processed as ‘like me.’ This is underwritten by the way humans represent action—the supramodal action code—and self experience
    • So is there a ‘more like me’ and ‘less like me’?
  • Meeting with Wayne this evening
    • Go over notes
    • Coding session
  • ——————
  • Check to see that reports are being made correctly
    • Fix “Get all rated” – numerous issues, including strings with commas
    • Fix “Get Match Counts” – all zeros
    • Fix “Get No Match Counts” – redundant
    • Change “Get Blacklist (CSV)” to “Black/White list (CSV)”
    • Add “Get Whitelist (Google CSE)”
    • Change the Sets in getBlack/Whitelist to use maps rather than sets so blacklist culling can be used with more informative rows.
  • Update remote DB and test a few pages. Ran into a problem with LONGTEXT and Postgres. Went back to TEXT.
  • Went over Aaron’s ASB slides a couple of times. Introduced him to Partial Least Squares Structural Equation Modeling (PLS-SEM).
  • Present new system to Andy, Margarita and John. Tomorrow…
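The ‘Sets to maps’ change from the TODO list above can be sketched like this: key by URL but keep the whole row, so blacklist culling can still report the informative columns. The field layout here is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the blacklist refactor: a Map keyed by URL, with the full
// CSV row as the value. Field layout is hypothetical, not the real schema.
public class BlacklistSketch {
    private final Map<String, String[]> rows = new LinkedHashMap<>();

    public void add(String url, String... rowFields) {
        rows.put(url, rowFields);
    }

    public boolean contains(String url) {
        return rows.containsKey(url);
    }

    /** Unlike the old Set<String>, the informative row is still recoverable. */
    public String[] rowFor(String url) {
        return rows.get(url);
    }
}
```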

Phil 4.13.16

7:00 – 4:30 VTX

  • One last thing from Deindividuation Effects on Group Polarization in Computer-Mediated Communication: The Role of Group Identification, Public-Self-Awareness, and Perceived Argument Quality. This is from the opening paragraph:
    • Group polarization refers to the well-established finding that following group discussion, individuals tend to endorse a more extreme position in the direction already favored by the group (Hogg, Turner, & Davidson, 1990; Isenberg, 1986; Moscovici & Zavalloni, 1969).
      • So this may require some expounding on, but it isn’t something that’s in dispute. Merging GPT with information network analytics in a way to simultaneously determine group membership while nudging for a view of a larger information horizon will require more scaffolding. But this plank is pretty solid.
      • And I like that my bibliography spans over 40 years and multiple disciplines.
  • I think I was able to put in a slot for the development of the slider functionality as it relates to a particular corpus. On a related note vector space classification is becoming a thing in NLP. In Deep or Shallow, NLP is Breaking Out in CACM, both Word2vec and GloVe are discussed. This ties back to The Hybrid Representation Model for Web Document Classification that I read back in January, where documents can be represented as clusters of vectors in an n-dimensional space. But now it looks like there are libraries. Woohoo! I do wonder if there’s a vector space analogy to chunking in this that could be useful. Maybe? Probably?
  • Starting The ‘like me’ framework for recognizing and becoming an intentional agent
    • Page 1: Autism has been described as a kind of ‘mind-blindness’ (Baron-Cohen, 1995) because children with autism do not conceptualize other people as psychological agents with a rich palette of mental states.
      • What’s the influence on System Trust and Social Trust? Do groups of Autistic people polarize? Differently?
  • ————————-
  • Ok, after much flailing, I got the creation of queries correct for arbitrary PersonOfInterest. The problem is that beyond name and State, I really don’t know what’s going to be used for a source file. So PoiObject has been generalized to handle name and state by default but everything else is optional. What was causing me all the trouble was having license_no as a ‘native’ value with a default of zero. When I made rules based on the presence or absence of this field, they couldn’t work. So much cleanup ensued…
  • Need to test the new data
    • Internal Exception: com.mysql.jdbc.MysqlDataTruncation: Data truncation: Data too long for column ‘source_url’ at row 1 – Fixed
      • Drop ‘queries’ out of provider menu dialog – Fixed
  • Need to change the queries in PoiObject that depended on license_no. Commented out for now. Fixed.
  • Discussed geolocation with Aaron and sent out a note to the curation team for comments.

Phil 4.12.16

7:00 – 6:00 VTX

  • At the poster session yesterday, I had a nice chat with Yuanyuan about her poster on Supporting Common Ground Development in the Operation Room through Information Display Systems. It turns out that she is looking at information exchange patterns in groups independent of content, which is similar to what I’m looking at. We had a good discussion on group polarization and what might happen if misinformation was introduced into the OR. It turns out that this does happen – if the Attending Physician becomes convinced that, for example, all the instruments have been removed from the patient, the rest of the team can become convinced of this as well and self-reinforce the opinion.
  • Scanned through Deindividuation Effects on Group Polarization in Computer-Mediated Communication: The Role of Group Identification, Public-Self-Awareness, and Perceived Argument Quality. The upshot appears that individuation of participants acts as a drag on group polarization. So the more the information is personalized (and the more that the reader retains self awareness) the less the overall group polarization will move.
  • I’ve often said that humans innately communicate using stories and maps (Maps are comprehended at 3-4.5 years, Stories from when?). The above would support that stories are more effective ways of promoting ‘star’ information patterns. This is all starting to feel very fractal and self similar at differing scales…
  • Looking for children’s development of story comprehension led to this MIT PhD Thesis: TOWARD A MODEL OF CHILDREN’S STORY COMPREHENSION. Good lord – What a Committee: Marvin Minsky (thesis supervisor), Professors Joel Moses and Seymour Papert (thesis committee), Jeff Hill, Gerry Sussman, and Terry Winograd.
  • ———————
  • While reading Deep or Shallow, NLP is Breaking Out, I learned about word2vec. Googling led to Deeplearning4j, which has its own word2vec page, among a *lot* of other things. From their home page:
    • Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Spark, DL4J is designed to be used in business environments, rather than as a research tool. Skymind is its commercial support arm.
    •  Deeplearning4j aims to be cutting-edge plug and play, more convention than configuration, which allows for fast prototyping for non-researchers. DL4J is customizable at scale. Released under the Apache 2.0 license, all derivatives of DL4J belong to their authors.
    •  By following the instructions on our Quick Start page, you can run your first examples of trained neural nets in minutes.
  • The word vector alternative is from the Stanford NLP folks: GloVe: Global Vectors for Word Representation. The link also has trained (extracted?) word vectors.
  • Testing the behavior of query construction and search results. Fixing stupid bugs. Testing more. Lathering, rinsing and repeating.
  • Some good discussions with Aaron on inferencing and toxicity profiles. Basically taking the outputs and determining correlations with the inputs. Which led to a very long day.
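The word vectors from word2vec and GloVe mentioned above are usually compared with cosine similarity. A minimal sketch of that measure:

```java
// Cosine similarity between two word vectors: the dot product divided by
// the product of the vector norms. Assumes equal-length, non-zero vectors.
public class CosineSketch {
    public static double similarity(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Identical directions score 1.0, orthogonal vectors score 0.0, which is what makes it a natural ranking function over a vector-space document representation.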