Monthly Archives: March 2016

Phil 3.31.16

7:00 – 4:00 VTX

  • Starting on What is Trust? A Conceptual Analysis and An Interdisciplinary Model.
  • Starting to set up the key and sitelist repo
  • It turns out that you can export the XML configuration of the CSE and the annotations for that CSE. From
  • We can only have a total of 5k annotations. That’s not a problem – yet.
  • All the files are set up and transferred. The new search engines are:
    ONLY_COM = "cx=006834724223295726872:k0pebqyqa8m"
    ONLY_EDU = "cx=006834724223295726872:gded1dvdt94"
    ONLY_GOV = "cx=006834724223295726872:ydjrxqpedqq"
    ONLY_ORG = "cx=006834724223295726872:lsgxnigrfme"
    ONLY_US = "cx=006834724223295726872:dw0n0_hai6s"
  • Found a more credible source than (possibly just for New York state? But it has VA records.) Anyway, not only does it have a nice listing, it also has a PDF of the relevant board order, which means we can build a good legal language model. Very nice:
  • Need to rethink the PoiObject class to be more general.
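For reference, here is a minimal Java sketch of how the per-TLD engine IDs above plug into a query. It assumes the Google Custom Search JSON API endpoint (`https://www.googleapis.com/customsearch/v1` with `key`, `cx`, `q` and `num` parameters) and a placeholder API key; `CseQueryBuilder` and `buildUrl` are hypothetical names, not part of the crawler.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch only: builds Custom Search JSON API request URLs from the per-TLD
// engine IDs listed above. The API key passed in is a placeholder.
final class CseQueryBuilder {
    // Copied from the notes; each constant already includes the "cx=" prefix.
    static final String ONLY_COM = "cx=006834724223295726872:k0pebqyqa8m";
    static final String ONLY_EDU = "cx=006834724223295726872:gded1dvdt94";
    static final String ONLY_GOV = "cx=006834724223295726872:ydjrxqpedqq";
    static final String ONLY_ORG = "cx=006834724223295726872:lsgxnigrfme";
    static final String ONLY_US = "cx=006834724223295726872:dw0n0_hai6s";

    private static final String ENDPOINT = "https://www.googleapis.com/customsearch/v1";

    private CseQueryBuilder() {}

    /**
     * Returns a GET URL for one query against one engine.
     * cxParam is one of the ONLY_* constants; num is the page size (1-10).
     */
    static String buildUrl(String apiKey, String cxParam, String query, int num) {
        if (num < 1 || num > 10) {
            throw new IllegalArgumentException("num must be between 1 and 10");
        }
        return ENDPOINT
                + "?key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8)
                + "&" + cxParam
                + "&q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&num=" + num;
    }

    public static void main(String[] args) {
        // Fan the same query out across the TLD-restricted engines.
        for (String cx : new String[] {ONLY_COM, ONLY_EDU, ONLY_GOV, ONLY_ORG, ONLY_US}) {
            System.out.println(buildUrl("YOUR_API_KEY", cx, "medical board order", 10));
        }
    }
}
```

Keeping the `cx=` prefix inside the constants (as in the notes) means the builder just concatenates them. Stripping the prefix and encoding the ID like the other parameters would be the tidier alternative.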

Phil 3.30.16

7:00 – 3:30 VTX

  • So I was starting The spreading of misinformation online, but it was discussing more of the same. This feels a lot like saturation. My thoughts are coalescing around the idea of the difference between trusted and trustworthy interactions in computer-mediated systems. The anonymous citizen journalism concept becomes a unifying thought experiment that can be used to show the potential strengths and weaknesses of particular concepts.
  • The last piece I think I need is what trust is from a developmental perspective. The initial Google Scholar search for “trust development” didn’t bring up exactly what I want (object permanence maybe?), but it did provide this: Effects of four computer-mediated communications channels on trust development. The citations provided this: The mechanics of trust: A framework for research and design, in International Journal of Human-Computer Studies 2005 62(3):381-422. This one seems different enough to look through carefully.
  • Ok, I think I found what I’m looking for: The ‘like me’ framework for recognizing and becoming an intentional agent. I think I’ll read The Mechanics of Trust first, then ‘like me’ second.
  • Starting The mechanics of trust: A framework for research and design.
    • It does seem to be focused on how effectively a system transmits(?) cues that support well-placed trust. I think that we tend to confuse the trust we place in the channel vs the trust we place in the entity at the other end of the channel. And these lines are not clearly drawn:
      • In IR, we trust that the search engine is providing us with the relevant documents we seek. People trust Google more than Bing because the results are more pertinent. Does this trust carry over into the documents retrieved? Probably, though I can’t find a study that does this. (It would be pretty easy to do with the Google Custom Search Engine API + noise)
      • In GPS the trust in the system is very high, even though it is synthesizing information from retrieved and processed sources (maps, DTED, etc) that could in turn be wrong. Here though, the entity we are interacting with is clearly the GPS, not the mapmakers.
      • Skype, on the other hand, is essentially transparent when it’s working right. And that ‘working right’ is a kind of conditional trust in the system that has no effect on our evaluation of the trustworthiness of the person we are interacting with at the other end of the channel.
      • So what does that mean in the context of our imaginary citizen journalists?
        • They are anonymized. We have no names. We probably don’t even have the exact words as written. These are the same issues that newspapers face when dealing with anonymous sources. And in this case, it’s reasonable to assume that the newspaper is the entity that is attempting to get us to place our trust in it.
          • Reporters as proxies
          • Additional perspectives – images, videos etc.
          • Stories that match readers’ experiences, so that trust can be evaluated.
          • What else?
    • One of the cited papers is What is Trust? A Conceptual Analysis and An Interdisciplinary Model. Quickly scanning through it, I found this on pages 830-831: Garfinkel found in natural experiments that people don’t trust others when things “go weird,” that is, when they face inexplicable, abnormal situations. For example, one subject told the experimenter he had a flat tire on the way to work. The experimenter responded, “What do you mean, you had a flat tire?” The subject replied, in a hostile way, “What do you mean? What do you mean? A flat tire is a flat tire. That is what I meant. Nothing special. What a crazy question!” At this point, trust between them broke down because the illogical question produced an abnormal situation.
      • I think that this is core. Trust is tied to normalcy, and probably builds out from there.
  • Prepping for the sprint planning session.
  • As far as the OMG work, I think the following
    • Set up a version-controlled system for Google CSE keys and URL exclude lists, including a way to submit a URL for inclusion in an exclusion list.
    • Add PDF parsing and storing to Crawl Service
    • Add MSWord parsing and storing to Crawl Service
    • Add MSExcel parsing and storing to CrawlService
    • Add backlink calculation and storing to CrawlService – this is looking like a good way to increase pertinence within a return, particularly with respect to the matched-name wrong-person condition.
    For the machine learning work
    • Get DB up, accessible and on a backup schedule
    • Set up deployment infrastructure for Rating App.
    • Small scale test of Rating App, with refinement and development of manual
    • Accumulate corpus
    • Test corpus in WEKA
      • Translator from DB to WEKA format
      • Construction of training data sets
      • Tests and evaluations
      • Report
    As far as my research, it’s more vague, so I’m just going to free-associate a bit here.
    First, I just need to write up the proposal, and since that’s where my head is at right now, it’s hard to come up with specifics. One of the overall goals is to build a search result interface that ‘nudges’ users from bubble patterns into star patterns.
    Secondly, my current belief is that this interface could be along the lines of the word cloud plus slider display interface I’ve discussed with you before. On the back end, there’s a topic extraction/document classification system that builds a graph database that is used for:
    • In my case, placing the search results in a context of discussion vs information (DvI) along the axes defined by the topics in the search results. The user can select a topic (which then shows the DvI graphs and where the current search falls on those spectrums). Once a topic has been selected, the user can adjust the weights on subsequent topics, causing the result list to reorder and the position on the DvI graph to move.
    • In EIT’s case (1) predictions and alerts and (2) for the user interface [and I think this can be pitched as the gamified display]. For example, I think there are many cases where conditions for making a judgment (medical best practices or behavior related) may be ambiguous. Using such an interface could allow a user to explore and resolve such ambiguity. The nice thing is that in the EIT case, the data is (potentially) more structured and granular, allowing a more fluid analysis (e.g. a bad manager indirectly affecting performance or combined conditions such as opiate addiction + newborn).
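To make the "adjust the topic weights, reorder the list" step concrete, here is a minimal Java sketch. It assumes each result already carries per-topic scores from the (not yet built) topic extraction back end, and it uses a plain weighted sum as the ranking function, which is only one simple choice. `TopicReranker` and `Result` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: reorder search results when the user changes topic weights.
// Assumes each result already carries per-topic scores from a topic-extraction step.
final class TopicReranker {
    static final class Result {
        final String url;
        final Map<String, Double> topicScores; // topic -> score, e.g. in [0, 1]

        Result(String url, Map<String, Double> topicScores) {
            this.url = url;
            this.topicScores = topicScores;
        }
    }

    private TopicReranker() {}

    /** Weighted sum over the user's topic weights; a topic the result lacks contributes 0. */
    static double score(Result r, Map<String, Double> weights) {
        double s = 0.0;
        for (Map.Entry<String, Double> w : weights.entrySet()) {
            s += w.getValue() * r.topicScores.getOrDefault(w.getKey(), 0.0);
        }
        return s;
    }

    /** Returns a new list sorted by descending score; List.sort is stable, so ties keep their order. */
    static List<Result> rerank(List<Result> results, Map<String, Double> weights) {
        List<Result> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingDouble((Result r) -> score(r, weights)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Result> results = List.of(
                new Result("a.example", Map.of("economics", 0.9, "politics", 0.1)),
                new Result("b.example", Map.of("economics", 0.2, "politics", 0.8)));
        // Sliding "politics" up moves b.example to the top.
        for (Result r : rerank(results, Map.of("economics", 0.2, "politics", 1.0))) {
            System.out.println(r.url);
        }
    }
}
```

Each slider move just produces a new weight map and a re-sort. The same weighted scores could, in principle, also drive where the current search sits on the DvI graph.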

Phil 3.29.16

7:00 – 4:00 VTX

  • Continuing The Law of Group Polarization – done!
    • Group polarization: A critical review and meta-analysis. Looks like a more rigorous version of TLoGP. It’s available in the library as a PDF if needed.
    • Page 194: In short, the external materials and expert panels shift the argument pool available to the deliberators and are also likely to have effects on social influence. 
      • The way I read this, external trusted sources can shift the poles if they are incorporated into the discussion. Think about how a GPS affects wayfinding arguments. If search interfaces are modified such that they show the range of opinion and the position of the ‘Ten Blue Links’ within that range then, given its high system trust, we might expect individuals to adjust their belief trajectories based on their understanding of the pole’s position given the larger information landscape.
    • Page 195: There are large lessons here about appropriate institutional design for deliberating bodies. Group polarization can be heightened, diminished, and possibly even eliminated with seemingly small alterations in institutional
      • Now substitute system for institutional. Although I would contend that search is an institution, given its reach. Also, presenting a better mechanism for placing the returned information in a context allows for ‘nudging’ cues, which seem to work better than more ‘authoritarian’ systems.
  • Starting The spreading of misinformation online
  • Before continuing on backlinks, I spent some time looking at the Microsoft Oxford system. LUIS is interesting, though I’m not sure exactly how to take advantage of it yet. I think this can be a chatbot construction kit? The WebLM system looks more immediately useful, kinda like AlchemyNLP. Maybe cheaper? You need a key, which you get here. And this is different from the Academic Knowledge API, which is also an Oxford project, but not listed on the Oxford site.
  • Got the SrBacklinkObject persisting
  • Adding backlinks to the ResultItemObject2 class. Whoops! Forgot that you have to set both relationships in a many-to-one:


  • needed to split off the protocol from the and add it back to the curResult.displayLink to get backlinks
  • Done and working. Kinda like the fallback strategy.
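The two fixes above (setting both sides of the relationship, and re-attaching the protocol to `displayLink`) can be sketched roughly as follows. The classes below are simplified, hypothetical stand-ins that share names with `ResultItemObject2` and `SrBacklinkObject` but omit all persistence annotations. The `http` default for a missing scheme is also an assumption. If the real mapping is JPA/Hibernate-style, the many-to-one side is normally the owning side, which would explain why forgetting `b.resultItem = this` leaves the link unsaved.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

// Simplified stand-ins for the persisted classes; only the fields needed to show the wiring.
final class BacklinkWiring {
    static final class SrBacklinkObject {
        final String sourceUrl;
        ResultItemObject2 resultItem; // "many-to-one" side: each backlink points at one result

        SrBacklinkObject(String sourceUrl) {
            this.sourceUrl = sourceUrl;
        }
    }

    static final class ResultItemObject2 {
        final String link;        // full result URL, scheme included
        final String displayLink; // host only, no scheme
        final Set<SrBacklinkObject> backlinks = new HashSet<>(); // "one-to-many" side

        ResultItemObject2(String link, String displayLink) {
            this.link = link;
            this.displayLink = displayLink;
        }

        /** Sets both sides so the two ends of the relationship stay consistent. */
        void addBacklink(SrBacklinkObject b) {
            backlinks.add(b);
            b.resultItem = this;
        }
    }

    private BacklinkWiring() {}

    /** Prefixes displayLink with the scheme taken from the full link (throws if link is not a valid URI). */
    static String displayLinkWithProtocol(ResultItemObject2 r) {
        String scheme = URI.create(r.link).getScheme();
        return (scheme != null ? scheme : "http") + "://" + r.displayLink;
    }
}
```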

Phil 3.28.16

7:00 – 2:30 VTX

  • Took some notes on the MS Tay fiasco yesterday. Need to ping Peter Lee and see if I can get anywhere talking about Group Polarization Theory. Done
  • Microsoft Research Open source for academics
  • Microsoft Language Understanding Intelligent Service (beta) LUIS
  • Veracity Roadmap: Is Big Data Objective, Truthful and Credible?
  • Continuing The Law of Group Polarization
    • Page 193: The constraints of time and attention call for limits to heterogeneity; and – a separate point – for good deliberation to take place, some views are properly placed off the table, simply because time is limited and they are so invidious, implausible, or both. This point might seem to create a final conundrum: To know what points of view should be represented in any group deliberation, it is important to have a good sense of the substantive issues involved, indeed a sufficiently good sense as to generate judgments about what points of view must be included and excluded. But if we already know that, why should we not proceed directly to the merits? If we already know that, before deliberation occurs, does deliberation have any point at all?
    • The answer is that we often do know enough to know which views count as reasonable, without knowing which view counts as right, and this point is sufficient to allow people to construct deliberative processes that should correct for the most serious problems potentially created by group deliberation. What is necessary is not to allow every view to be heard, but to ensure that no single view is so widely heard, and reinforced, that people are unable to engage in critical evaluation of the reasonable competitors.
  • Now that I’ve gotten the queries behaving, working on the SemRushIO and BacklinkObject
    • Added configuration file
    • Nice to know: if SemRush finds nothing, it returns ERROR 50 :: NOTHING FOUND, so we can do two passes. If the specific result returns nothing, we can go to the root.
    • Built up the SemRush base class based on the JsonLoadable
    • Built the SrBacklinkObject
    • Loading the object successfully.
  • Fika
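Here is a minimal Java sketch of the two-pass idea. It assumes the SEMrush response body is available as a string and that "the root" means scheme plus host. The `fetch` function is a placeholder for the real HTTP call in `SemRushIO`, whose interface isn't shown here, and `SemRushFallback` is a hypothetical name.

```java
import java.net.URI;
import java.util.function.Function;

// Sketch of the two-pass idea: ask for backlinks to the specific page first,
// and fall back to the site root if SEMrush reports nothing.
final class SemRushFallback {
    static final String NOTHING_FOUND = "ERROR 50 :: NOTHING FOUND";

    private SemRushFallback() {}

    /** Scheme + host + "/" (drops path, query and any port). */
    static String rootOf(String url) {
        URI u = URI.create(url);
        return u.getScheme() + "://" + u.getHost() + "/";
    }

    /** fetch stands in for whatever performs the SEMrush request and returns the raw body. */
    static String backlinksWithFallback(String url, Function<String, String> fetch) {
        String first = fetch.apply(url);
        boolean empty = first == null || first.trim().startsWith(NOTHING_FOUND);
        String root = rootOf(url);
        if (!empty || root.equals(url)) {
            return first; // got results, or there is no broader URL to try
        }
        return fetch.apply(root);
    }
}
```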

Phil 3.26.16

Peter Lee, Corporate Vice President, Microsoft Research

Learning from Tay’s introduction

Phil 3.25.16

7:30 – 3:30 VTX

  • Saw The Who last night and got into bed after 1:00am. Sleeeeeeeeeepy.
  • Still browsing the team sensemaking paper over breakfast. There are some very similar goals. In group polarization, awareness of where the boundaries of the discussion lie helps to determine how the average viewpoint moves. Current search returns no context on where the results lie on those axes. Translucency in search could allow users to see ‘meta information’ about the search results they have and where those results lie in that information space, while also providing an unobtrusive means to adjust their position in that space. Or something like that.
  • Group polarization works on chatbots. There is something really interesting here about measuring polarization. Not quite sure what exactly yet.
  •  Continuing The Law of Group Polarization
    • Phrase of the day: ‘Skewed Argument Pools’
    • Page 187: And shifts toward more in the way of  enclave deliberation will increase society’s aggregate “argument pool,” and hence enrich the marketplace of ideas, while also increasing extremism, fragmentation, hostility, and even violence.
    • First, it’s a neat thought to think of an interwoven pattern of Bubbles and Stars. Second, I think the continuum of most interest is the one from most bubble-ish to most star-ish for a given topic. Now that, in and of itself, is a big document classification/topic extraction problem, but I would submit that being able to visualize what that search result could look like could help produce useful work in that direction. And there are proxies that can be used intact, such as papers: bubbles are papers and topics that point at each other a lot, for example.
    • Page 187: It is important to ensure social spaces for deliberation by like minded persons, but it is equally important to ensure that members of the relevant groups are not isolated from conversation with people having quite different views.
    • ^^^Translucency^^^
    • Page 187: The most important point here is that those who emphasize the ideals associated with deliberative democracy tend to emphasize its preconditions, which include political equality, an absence of strategic behavior, full information, and the goal of “reaching understanding (pp. 52-94).”
  • Scrum – some big changes coming?
  • 11:00 all hands
  • Working on backlink object
    • Started the query generator. After some hiccups in getting the format of the query results right, the generation part is working. The reader should be pretty straightforward, though a little more complex/brittle than reading JSON. Here’s an example return:
      1;Visit AZ – Vacation Information for Arizona, the Grand Canyon State | Arizona Office of Tourism;;;Coordinates;29;116;1452309348;1452309348
      1;"""New Concertina Wire"" Fencing Around Closed Nevada Prison And Guard In Tower - Are Closed Prisons Going To Be Used As ""Fema Camps""? - Veteran Who Took Photos Followed By White Van";;;Coordinates;14;23;1443887704;1452368321
      1;Evening Meeting. - Kirkham & Rural Fylde;;;Coordinates;170;59;1444332718;1457694435
      1;Visit AZ – Vacation Information for Arizona, the Grand Canyon State | Arizona Office of Tourism;;;Coordinates;29;115;1447807529;1457622986
      1;Visit AZ – Vacation Information for Arizona, the Grand Canyon State | Arizona Office of Tourism;;;Coordinates;29;113;1454861531;1457744933
      1;UNIST -;;;Coordinates;60;48;1448012851;1454851002
      1;1 عدد تمبر جان مون نت - دیپلمات - جمهوری فدرال آلمان 1977;;;Coordinates;27;1308;1452387237;1452387237
      1;About Puslinch Lake - Calmwaters Cottage & Fly Fishing;;;Coordinates;42;12;1454519315;1457440491
      1;PEABODY 100 -;;;Coordinates;32;16;1454518621;1454518621
      1;PEABODY SPORTS -;;;Coordinates;21;16;1454518773;1454518773
    • SemRushIO will create the backlink object
      • Calls the service, using a default or read-in key
      • Is fired the same time the page source is loaded (in GuiVars.loadNextPage)
      • Creates a BackLinkObject from the SEMRush data, which includes:
        • page_score
        • source_title
        • source_url
        • target_url
        • anchor (the text in the source)
        • external_num
        • internal_num
        • first_seen
        • last_seen
    • ResultItemObject2 changes
      • Set of BackLinkObjects
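The reader half could look something like the following minimal Java sketch. It assumes the columns arrive in the order listed above. The sample rows appear to follow that order: score, title, two URL fields that are empty here, anchor, two counts, and two values that look like Unix timestamps in seconds. It also assumes that titles containing quotes are wrapped in double quotes with inner quotes doubled, as in the "New Concertina Wire" row. `BackLink` is a simplified stand-in for `BackLinkObject`, and `SemRushRowParser` is a hypothetical name.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Sketch of the "reader" half: turn one SEMrush export row into a backlink record.
// Column order is assumed to match the field list above.
final class SemRushRowParser {
    /** Simplified stand-in for BackLinkObject. */
    static final class BackLink {
        int pageScore;
        String sourceTitle, sourceUrl, targetUrl, anchor;
        int externalNum, internalNum;
        Instant firstSeen, lastSeen;
    }

    private SemRushRowParser() {}

    /** Splits on ';' except inside "..." quotes; "" inside quotes is an escaped quote. */
    static List<String> splitFields(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"'); // doubled quote -> literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ';') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    static BackLink parse(String line) {
        List<String> f = splitFields(line.trim()); // trim drops a trailing \r, if any
        if (f.size() != 9) {
            throw new IllegalArgumentException("expected 9 fields, got " + f.size() + ": " + line);
        }
        BackLink b = new BackLink();
        b.pageScore = Integer.parseInt(f.get(0));
        b.sourceTitle = f.get(1);
        b.sourceUrl = f.get(2);
        b.targetUrl = f.get(3);
        b.anchor = f.get(4);
        b.externalNum = Integer.parseInt(f.get(5));
        b.internalNum = Integer.parseInt(f.get(6));
        b.firstSeen = Instant.ofEpochSecond(Long.parseLong(f.get(7)));
        b.lastSeen = Instant.ofEpochSecond(Long.parseLong(f.get(8)));
        return b;
    }
}
```

Failing loudly on a wrong field count is deliberate. That is where this format is more brittle than JSON, so a silent misparse would be the worse outcome.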

Phil 3.24.16

7:00 – 10:00, 11:00 – 3:00 VTX

  • Was going to continue The Law of Group Polarization, but got sucked into the following. On a related note, I peeked at the group sensemaking paper from CSCW and realized that they are dealing with group polarization issues.
  • Soooooooooo, I went back to check the links that the google search “link:” brings up. In looking at the pages (mostly other blog-like sites), the link to dotearth is almost always in the blogroll list that’s off to the side on many of these sites. For example look at the lower right on, and you’ll see the link.
  • I think this makes sense. These are the generic pages that point to other generic pages. So I went back to Google and searched for ‘Paul Krugman blog‘ and then looked for the oldest post that I could find in the result, which was this one from January 16. Top ratings means that it has to be linked to a lot, so I tried ““. Alas, that doesn’t return anything, though “” does.
  • So I went to the Wikipedia most referenced pages page. Top ranked was Geographic coordinate system, which has over 600k inbound links. But –
  • Apparently, this is Google being coy. Searching for backlinks can be expensive. Moz has plans that start at $500/month. Bing also seems to have something with an API. Starting to check that out.
    • Added to my Bing Webmaster profile. Had to add BingSiteAuth.xml to the site.
    • Nope, looks like it’s just the verified pages
  • Looking at SEMrush. Pretty straightforward and $15 buys you 7,500 lines of results.
    • Here’s the REST-ish API
    • Here’s the first format I’ve tried:
    • The first thing I tried out was on my angular blog entry, and this is what comes back:
      1;Philip Feldman;;;blog;7;2;1435698192;1452178691
      1;Phil Feldman Resume (WebGL);;;My Primary Blog;15;4;1424207638;1452178080
      1;Phil Feldman Resume (WebGL);;;My Primary Blog;15;4;1435689880;1452178091
    • Pretty good! Very clean. Then I tried
      0;Plastic Surgery - Avoiding The Nightmare Case - Social Gaming Wiki FR;;;Georgia Medical Board Actions;4;32;1454582397;1454582397
      0;Plastic Surgeon - Advice To Allow You Choose – TFC;;;Doctors to avoid;2;28;1452634501;1452634501
      0;Finding A Plastic Surgeon In Your Area – TheorieWiki;;;Ohio Medical Board Actions;4;40;1451297137;1451297137
      0;How To Prepare For Your Breast Augmentation – TheorieWiki;;;Doctor Complaints;4;33;1444916428;1453210146
      0;Finding A Plastic Surgeon In Your Area: Unterschied zwischen den Versionen – TheorieWiki;;;Florida Medical Board Sanctions;4;39;1457400844;1457400844
      0;Benutzer:FelicaAngelo06 – TheorieWiki;;;NC Medical Board Actions;5;35;1448297485;1458043290
      0;Benutzer:FelicaAngelo06 – TheorieWiki;;;;5;35;1448297485;1458043290
      0;Benutzer:FelicaAngelo06 – TheorieWiki;;;NC Medical Board Actions;5;30;1456257160;1457931212
      0;Benutzer:FelicaAngelo06 – TheorieWiki;;;;5;30;1456257160;1457931212
      0;Finding A Plastic Surgeon In Your Area – TheorieWiki;;;Florida Medical Board Sanctions;4;33;1443858328;1457622408
    • Note that it’s a good thing I’m limiting the results to 10! The second thing to notice is every one of these links is SEO garbage. This one is my favorite. Now, this is ordered according to rank (however that’s calculated) and maybe there are better ways to order the results, but this does make me nervous about using backlinks without some checking. Maybe cosine similarity?
    • So the last option, if we want to spend some money, is to use the Common Crawl for backlinks. Not sure if it would make any difference, but there would be more insight. As an example, there’s wikireverse, which did exactly that.
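On the "Maybe cosine similarity?" idea, here is a minimal Java sketch of bag-of-words cosine similarity between two texts. The assumption is that we would compare the linking page's text against the target page's text. In the export above, only the source title is available, so that is the natural first thing to try. Comparing the anchor text is probably less useful, since the SEO rows use on-topic anchors. Whether any threshold actually separates SEO spam from real citations is untested. `TitleSimilarity` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch: cosine similarity of raw term-count vectors, for screening backlinks
// whose source text has little to do with the target page.
final class TitleSimilarity {
    private TitleSimilarity() {}

    /** Lowercased term counts; tokens are runs of letters or digits. */
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!tok.isEmpty()) {
                counts.merge(tok, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Cosine of the two count vectors; 0.0 if either text has no terms. */
    static double cosine(String a, String b) {
        Map<String, Integer> va = termCounts(a);
        Map<String, Integer> vb = termCounts(b);
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : va.entrySet()) {
            int x = e.getValue();
            na += (double) x * x;
            Integer y = vb.get(e.getKey());
            if (y != null) {
                dot += (double) x * y;
            }
        }
        for (int y : vb.values()) {
            nb += (double) y * y;
        }
        if (na == 0 || nb == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```

Raw counts keep it simple. With real page text, TF-IDF weighting would keep common words from dominating, but that needs a corpus to compute document frequencies.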

Phil 3.23.16

7:00 – 4:00 VTX

  • Continuing The Law of Group Polarization. Slow going. Mostly because there is so much good stuff.
    • Overall, I’m arguing that by viewing Group Polarization through the lens of Connectivism, we can see how networked communities are often driven into bubbles, and that this property can be used to evaluate the trustworthiness of an information source. This has implications for design at different levels of abstraction:
      • At the UI level, it implies that giving users more interactive control over the makeup of their news feeds can inform them about the range of diversity in views about a particular topic and where their feed falls on that spectrum. Because this implies the presence of a larger group, it is possible to give users the means (through direct manipulation) to interactively adjust the makeup of their news feeds and expose them to more trustworthy sources.
      • At the document level, it implies that a mix of lexical and link analysis should be sufficient to index a document on a trustworthiness scale.
      • At the network level, it implies that the relationships of documents within a network should be sufficient to place documents on a trustworthiness scale.
    • Page 182 – And when one or more people in a group know the right answer to a factual question, the group is likely to shift in the direction of accuracy.
      • This is the effect of the Star Pattern. So how does someone find the right answer?
    • Looking around for automated ways of doing Delphi Method
    • Page 184: Group polarization has particular implications for insulated “outgroups” and (in the extreme case) for the treatment of conspiracies. Recall that polarization increases when group members identify themselves along some salient dimension, and especially when the group is able to define itself by contrast to another group. Outgroups are in this position-of self-contrast to others-by definition. Excluded by choice or coercion from discussion with others, such groups may become polarized in quite extreme directions, often in part because of group polarization. It is for this reason that outgroup members can sometimes be led, or lead themselves, to violent acts
    • Stopped at pg 186 – III. DELIBERATIVE TROUBLE.
  • Looking at IBM Bluemix briefly in case we have to go down that route
    • Registered.
    • Chrome (or at least the way I set up Chrome) and Bluemix do not get along. Trying Firefox. Still not great, but better.
    • Since it looks like we’re not going to do wacky mash-ups, back to work on the rating app.
  • Hit the MySQL max_allowed_packet limit. Changed it to 4M. Other follow-on changes:
## of RAM but beware of setting memory usage too high
innodb_buffer_pool_size = 64M
## Note: deprecated in MySQL 5.6.3 and removed in 5.7.4; delete this line on 5.7+
innodb_additional_mem_pool_size = 8M
## Set .._log_file_size to 25 % of buffer pool size
innodb_log_file_size = 20M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
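Raising the limit only moves the ceiling: the server still rejects any single packet larger than max_allowed_packet. Here is a minimal Java pre-insert guard. It assumes the oversized payload is fetched page text stored as UTF-8, and that the 4M setting is the relevant limit (MySQL reads the `M` suffix as 1024 × 1024 bytes). `PacketGuard` and the headroom value are illustrative, not taken from the app.

```java
import java.nio.charset.StandardCharsets;

// Sketch: check a payload against max_allowed_packet before attempting the INSERT.
final class PacketGuard {
    /** 4M as MySQL interprets it: 4 * 1024 * 1024 bytes. */
    static final long MAX_ALLOWED_PACKET = 4L * 1024 * 1024;

    private PacketGuard() {}

    /**
     * True if the UTF-8 encoding of payload plus some headroom (for the rest of the
     * statement and protocol overhead) fits under maxPacket. The headroom is a guess.
     */
    static boolean fits(String payload, long maxPacket, long headroom) {
        long bytes = payload.getBytes(StandardCharsets.UTF_8).length;
        return bytes + headroom <= maxPacket;
    }
}
```

A call like `PacketGuard.fits(pageSource, PacketGuard.MAX_ALLOWED_PACKET, 64 * 1024)` before the insert lets the app log and skip (or truncate) an oversized page instead of failing mid-batch.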

Phil 3.22.16

7:00 – 7:30

  • I think I want to install this???
  • Still thinking about social trust and system trust. Today, Brussels was attacked by ISIS or ISIS sympathisers. An official when interviewed said that Belgium had been ‘prepared’ and was ready. No one was surprised that one group of people would try to kill another group of people. In other news, the iPhone from another set of killers was unflaggingly resisting attempts to unlock it. In many ways, every day (ironically because of the news) we are informed how horrible and untrustworthy people can be. And at the same time, every day, our machines generally do what they are supposed to do, and when looked at over time, get better at it. Is it any wonder that we have high system trust and low social trust (or high cynicism?).
  • This isn’t really new. Music can be pure. Musicians can be awful.
  • Continuing The Law of Group Polarization.
    • Page 181: Thus when the context emphasizes each person’s membership in the social group engaging in deliberation, polarization increases. This finding is in line with more general evidence that social ties among deliberating group members tend to suppress dissent and in that way to lead to inferior decisions.
      • So a website with a strong point of view (Breitbart or MoveOn or PETA, for example) should have less variance among commenters, while a more balanced one should have more variance? Data may be here: I would think that these could be compared against edit histories on Wikipedia for a more Star-like pattern?
    • Persuasive Arguments Theory (PAT)?
    • Interaction with others increases decision confidence but not decision quality: evidence against information collection views of interactive decision making.
      • So in this case, the paper was scanned and protected, so I couldn’t do OCR on it. The workaround was to export as jpg, then open the first jpg in Acrobat DC, select Tools->Organize Pages then Insert->From File, shift-click all the pages, select ‘insert after’ and read them in. Once that’s done, go to ‘Enhance scans’ and run OCR on the file.
      • Anyway, the paper looks interesting, with quantitative support. I wonder why all this research seems to be focussed in the 1990s through early 2000s? The Wikipedia page on Group Polarization has a wider date range.
  • Working on the rating app. Worried that jsoup doesn’t seem to be pulling down pages that well
    • Got a 403 on using URL.openStream, but it works on Google.
    • Going to try a more web-scrapey pattern. Checking out Jaunt.
  • Changing the selection lists
  • Adding a check to see what ratings have changed as a user check – Done
  • Need to start on the backlinks.
  • Meeting with Aaron about next steps based on the
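One common cause of a 403 on a plain `URL.openStream()` is Java's default User-Agent (of the form `Java/1.8.0_xx`), which some sites block. That is an assumption about this particular failure, not something verified here. Below is a minimal standard-library Java sketch that sends a browser-like User-Agent instead; `BrowserLikeFetch` is a hypothetical helper, and the User-Agent string is just an example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Sketch: open an http/https connection that identifies itself like a browser
// instead of using Java's default User-Agent. Nothing is sent until the caller reads.
final class BrowserLikeFetch {
    // Example browser-style string; any current browser's User-Agent would serve.
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
            + "Chrome/49.0.2623.87 Safari/537.36";

    private BrowserLikeFetch() {}

    /** http/https URLs only; other schemes fail the cast. */
    static HttpURLConnection open(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestProperty("User-Agent", USER_AGENT);
            conn.setConnectTimeout(10_000); // ms
            conn.setReadTimeout(10_000);    // ms
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The actual request happens on `conn.getResponseCode()` or `conn.getInputStream()`. If the rating app stays on jsoup, the equivalent is `Jsoup.connect(url).userAgent(...).get()`.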

Phil 3.21.16

7:30 – 4:30 VTX

  • Class today
    • Two things – First, I wonder if we as researchers need to use the GSA standards for storing PII:
      • Encryption. Encrypt, using only NIST certified cryptographic modules, all data on mobile computers/devices carrying agency data unless the data is determined not to be sensitive, in writing, by your Deputy Secretary or a senior-level individual he/she may designate in writing;
      • Control Remote Access. Allow remote access only with two-factor authentication where one of the factors is provided by a device separate from the computer gaining access;
      • Time-Out Function. Use a “time-out” function for remote access and mobile devices requiring user re-authentication after thirty minutes of inactivity;
      • Log and Verify. Log all computer-readable data extracts from databases holding sensitive information and verify each extract, including whether sensitive data has been erased within 90 days or its use is still required; and
      • Ensure Understanding of Responsibilities. Ensure all individuals with authorized access to personally identifiable information and their supervisors sign at least annually a document clearly describing their responsibilities.
    • Second, basically every security measure we take in a closed network provides a value judgement to the owner of the network. But our high system trust prevents us from seeing that when we untag a picture of us doing something embarrassing, we’re essentially saying to Facebook ‘this is a guilty pleasure’.
  • Taxes this evening
  • In Emergencies, Should You Trust a Robot?
  • Starting The Law of Group Polarization. And in a semi-related thought, I wonder if flocking behavior can be used to describe this kind of behavior along dimensions of belief???
    • Cass R. Sunstein
    • Wacky. The text was unrecognizable so the quotation manager wouldn’t work. Wound up exporting the PDF to jpg, then using the ‘combine files’ tool to import all the pages, combining them into one document again then running OCR on that. And this was the official file from the Journal of Political Philosophy, so go figure.
  • Did some shepherding of the Crawl configuration. Gregg was sending 4 CSEs.
  • Finished up the CSEkiller. Wrote up documentation and added it to the CommonComponents.
  • Back to getting the rating app working.
  • Changing Provider to PersonOfInterest
  •  Need to add ‘Personal’, ‘Educational’ and ‘Other’ to sources
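If we did adopt the GSA items listed above, the two time-based ones are easy to pin down. Here is a minimal Java sketch under two assumptions: "after thirty minutes of inactivity" means an idle time of 30 minutes or more, and the 90-day rule means flagging any logged extract that is 90 or more days old for verification. `PiiPolicyChecks` is a hypothetical helper, not part of any GSA tooling.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch: the two time-based rules from the GSA list, as pure functions.
final class PiiPolicyChecks {
    static final Duration IDLE_TIMEOUT = Duration.ofMinutes(30);
    static final Duration EXTRACT_REVIEW_AGE = Duration.ofDays(90);

    private PiiPolicyChecks() {}

    /** True once a remote/mobile session has been idle for 30 minutes or more. */
    static boolean needsReauthentication(Instant lastActivity, Instant now) {
        return Duration.between(lastActivity, now).compareTo(IDLE_TIMEOUT) >= 0;
    }

    /** True once a logged extract is 90 or more days old and must be verified as erased or still needed. */
    static boolean extractDueForVerification(Instant extractedAt, Instant now) {
        return Duration.between(extractedAt, now).compareTo(EXTRACT_REVIEW_AGE) >= 0;
    }
}
```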