Category Archives: research


7:00 – 4:00 VTX

  • Rollers
  • Reworking the lit review. Meeting set up with Wayne for tomorrow at 4:00.
  • Still thinking about modelling. I could use sets of strings that would define a CA’s worldview and then compare individuals by edit distance.
    • Not sure how to handle weights: as a number, or as repetitions of the character?
    • Comparing a set of CAs using centrality could show what the most important items are in that (overall and sub) population. How closely an individual CA conforms to that distribution could be a measure of ‘belonging’?
    • CAs could adjust their internal model. Big changes should be hard, little changes should be easy. Would the dropping of a low-ranked individual item result in a big change in edit distance with a group that doesn’t have the item?
    • Working on infrastructure that builds, collects and maintains Factoids
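
To make the edit-distance idea concrete, here is a minimal sketch. It assumes the "repetitions of the character" option for weights, and the agent strings and the `expand` helper are made up for illustration; none of this is the actual model code.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def expand(weighted_items):
    """Assumed encoding: a weight is the number of times a symbol repeats."""
    return "".join(symbol * weight for symbol, weight in weighted_items)


# Hypothetical worldviews: 'a' is a high-ranked item, 'c' a low-ranked one.
agent_1 = expand([("a", 3), ("b", 1), ("c", 1)])  # "aaabc"
agent_2 = expand([("a", 3), ("b", 1)])            # "aaab"
agent_3 = expand([("b", 1), ("c", 1)])            # "bc"

low_rank_change = edit_distance(agent_1, agent_2)   # agent_2 lacks 'c'
high_rank_change = edit_distance(agent_1, agent_3)  # agent_3 lacks 'a'
```

With this encoding, dropping a low-ranked item moves an agent by a small distance and dropping a high-ranked item by a larger one, which is one way to get "little changes easy, big changes hard".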

Phil 7.22.16

7:00 – 1:00 VTX

  • More bubble modelling. Found a nice paper from a financial perspective that looks like a good source for similar models.
  • Split out the calculation and spreadsheet functions to support snapshots and debugging.
    • Set up the base class to be the control. Explorers only look outside their SD, while confirmers and avoiders stay within. Not sure how to tease out the difference between those. I think it will have something to do with the way they look for information, which is beyond the scope of this model for now. Also switched to a random distribution. Here’s an initial result. Much more work to follow
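
A rough sketch of the rule described above, under several assumptions: beliefs and items are single numbers, "SD" is a fixed band around the agent's belief, and the random distribution is uniform. The function name and the ranges are hypothetical, not taken from the spreadsheet model.

```python
import random


def accepts(kind: str, belief: float, sd: float, item: float) -> bool:
    """Return True if an agent of this kind would look at the item."""
    inside = abs(item - belief) <= sd
    if kind == "explorer":
        return not inside          # explorers only look outside their SD
    if kind in ("confirmer", "avoider"):
        return inside              # both stay within their SD (not yet distinguished)
    return True                    # base class acts as the control


rng = random.Random(0)  # seeded so runs are repeatable
items = [rng.uniform(-3.0, 3.0) for _ in range(1000)]

explorer_hits = sum(accepts("explorer", 0.0, 1.0, x) for x in items)
confirmer_hits = sum(accepts("confirmer", 0.0, 1.0, x) for x in items)
control_hits = sum(accepts("control", 0.0, 1.0, x) for x in items)
```

As written, confirmers and avoiders behave identically, which matches the open question above about how to tell them apart.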


  • I was riding and thinking about something I read on fivethirtyeight.com: “This isn’t the most artful way to say it, but it’s like, where do you go when the only people who seem to agree with you on taxes hate black people?” It’s by Ben Howe, a RedState commentator. And it makes me think that rather than basing the sim on only one value, there should be a cluster. Confirmers could look for a match in the cluster while avoiders would avoid clusters if they hit something that doesn’t match. And the distance from the value should matter. Adopting a very different concept should take more energy than a similar one. And this makes me think that the CAs have to have a bit more alife in them. They need to budget their energy with reference to their internal and external states.
  • And then mom died. Here’s the OPM web page that matters:

Phil 6.21.16

7:00 – 5:00 VTX

  • Finished MostRecent.
  • Checked Data directory into SVN
  • Testing rating algorithms. Seems to be working pretty well 🙂
  • Rated all day. Should finish tomorrow.
  • Worked through paragon and fallen angel patterns with Aaron. Pulled out my Bayesian spreadsheets and realized I no longer understood them…

Phil 6.20.16

7:00 – 7:00 VTX

  • Building chair corpus = Current and Cited
  • Filled MostCited.
  • Rating a few more pages. Still not getting any name hits.
  • Going to advanced search and entering items into each field, I get a different looking query:
    • These seem to be the important differences
    • as_q=New+York — This is a ‘normal’ query
    • as_epq=Nader+Golian — This must be in the results
    • as_oq=+license+board+practice+patient+physician+order+health+practitioner+medicine+medical — at least one of these must be in the result
  • Going to add a test to look for the name in the query (and the state?) and at least check the NA box and throw up a dialog. Could also list the number of occurrences by default in the notes
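
Here is a small sketch of how the three fields above could be assembled into a query string, plus the name check. The base URL and both function names are assumptions for illustration; only the `as_q`, `as_epq` and `as_oq` parameters come from the query I saw.

```python
from urllib.parse import urlencode

# Assumed base URL, used here only as a placeholder.
BASE_URL = "https://www.google.com/search?"


def build_advanced_query(all_words, exact_phrase, any_words):
    """Build a query using the three advanced-search fields noted above."""
    params = {
        "as_q": all_words,              # the 'normal' query
        "as_epq": exact_phrase,         # must be in the results
        "as_oq": " ".join(any_words),   # at least one must be in the result
    }
    # urlencode encodes spaces as '+', matching the observed format.
    return BASE_URL + urlencode(params)


def count_name_hits(name, text):
    """Case-insensitive count of how often the name appears in the text."""
    return text.lower().count(name.lower())


url = build_advanced_query("New York", "Nader Golian",
                           ["license", "board", "practice"])
hits = count_name_hits("Nader Golian", "Dr. nader golian, MD. Nader Golian")
```

A count of zero from `count_name_hits` would be the trigger for checking the NA box and showing the dialog.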

1:00 – Patrick’s proposal

  • Framing of problem and researcher
  • Overview of the problem space
    • Ready to Hand
    • Extension of self
  • Assistive technology abandonment
    • Ease of Acquisition
    • Device Performance
    • Cost and Maintenance
    • Stigma
    • Alignment with lifestyles
  • Prior Work
    • Technology Use
    • Methods Overview
      • Formative User Needs
      • Design Focus Groups
      • Design Evaluation and Configuration Interviews
    • Summary of Findings
    • Priorities
      • Maintain form factor
      • Different controls for different regions
      • Familiarity
      • Robustness to environmental changes
    • Potential of the wheelchair
      • Nice diagram. Shows the mapping from a chair to a smartphone
    • Inputs to wheelchair-mounted devices
    • Force sensitive device, new gestures and insights
    • Summary (This looks like research through design. Why no mention?)
      • Prototypes
      • Gestures
      • Demonstration
  • Proposed Work
    • Passive Haptic Rehabilitation
      • Can it be done
      • How effective
      • User perception
      • Study design!!!
    • Physical Activity and Athletic Performance
      • Completed: Accessibility of fitness trackers. (None of this actually tracks to papers in the presentation)
      • Body location and sensing
      • Misperception
        • Semi-structured interviews
        • Low experience / High interest (Lack of system trust!)
    • Chairable Computing for Basketball
      • Research Methods
        • Observations
        • Semi-structured interviews
        • Prototyping
        • Data presentation – how does one decide what they want from what is available?
  • What is the problem – Helena
    • Assistive technologies are not being designed right. We need to improve the design process.
    • That’s too general – give me a citation that says that assistive technology WRT wheelchair use has high abandonment
    • Patrick responds with a bad design
    • Helena – isn’t the principle user-centered design? How has the HCI community done this before WRT areas other than wheelchairs for interacting with computing systems?
    • Helena – Embodied interaction is not a new thing, this is just a new area. Why didn’t you group your work? Is the prior analysis not embodied? Is your prior work not aligned with this perspective?
  • How were the design principles used to develop and refine the pressure sensors?

More Reading

  • Creating Friction: Infrastructuring Civic Engagement in Everyday Life
    • This is the confirming information bubble of the ‘ten blue links’: Because infrastructures reflect the standardization of practices, the social work they do is also political: “a number of significant political, ethical and social choices have without doubt been folded into its development” ([67]: 233). The further one is removed from the institutions of standardization, the more drastically one experiences the values embedded into infrastructure—a concept Bowker and Star term ‘torque’ [9]. More powerful actors are not as likely to experience torque as their values more often align with those embodied in the infrastructure. Infrastructures of civic engagement that are designed and maintained by those in power, then, tend to reflect the values and biases held by those in power.
  • Meeting with Wayne. My hypothesis and research questions are backwards but otherwise good.

Phil 6.15.16

7:00 – 10:00, 12:00 – 4:00 VTX

  • Got the official word that I should be charging the project for research. Saved the email this time.
  • Continuing to work on the papers list
  • And in the process of looking at Daniele Quercia’s work, I found Auralist: introducing serendipity into music recommendation, which was cited by An investigation on the serendipity problem in recommender systems, which has the following introduction:

    • In the book ‘‘The Filter Bubble: What the Internet Is Hiding from You’’, Eli Pariser argues that Internet is limiting our horizons (Parisier, 2011). He worries that personalized filters, such as Google search or Facebook delivery of news from our friends, create individual universes of information for each of us, in which we are fed only with information we are familiar with and that confirms our beliefs. These filters are opaque, that is to say, we do not know what is being hidden from us, and may be dangerous because they threaten to deprive us from serendipitous encounters that spark creativity, innovation, and the democratic exchange of ideas. Similar observations have been previously made by Gori and Witten (2005) and extensively developed in their book ‘‘Web Dragons, Inside the Myths of Search Engine Technology’’ (Witten, Gori, & Numerico, 2006), where the metaphor of search engines as modern dragons or gatekeepers of a treasure is justified by the fact that ‘‘the immense treasure they guard is society’s repository of knowledge’’ and all of us accept dragons as mediators when having access to that treasure. But most of us do not know how those dragons work, and all of us (probably the search engines’ creators, either) are not able to explain the reason why a specific web page ranked first when we issued a query. This gives rise to the so called bubble of Web visibility, where people who want to promote visibility of a Web site fight against heuristics adopted by most popular search engines, whose details and biases are closely guarded trade secrets.
    • Added both papers to the corpus. Need to read and code. What I’m doing is different in that I want to add a level of interactivity to the serendipity display that looks for user patterns in how they react to the presented serendipity and incorporate that pattern into a trustworthiness evaluation of the web content. I’m also doing it in Journalism, which is a bit different in its constraints. And I’m trying to tie it back to Group Polarization and opinion drift.
  • Also, Raz Schwartz at Facebook: Editorial Algorithms: Using Social Media to Discover and Report Local News
  • Working on getting all html and pdf files in one matrix
  • Spent the day chasing down a bug where if the string being annotated is too long (I’ve set the number of words to 60), then we skip. This leads to a divide by zero issue. Fixed now.

Phil 6.13.16

6:30 – 2:30 VTX

Phil 6.2.16

7:00 – 5:00 VTX

  • Writing
  • Write up sprint story – done
    • Develop a ‘training’ corpus of known bad actors (KBA) for each domain.

      • KBAs will be pulled from, which provides a large list.
      • List of KBAs will be added to the content rating DB for human curation
      • HTML and PDF data will be used to populate a list of documents that will then be scanned and analyzed to prepare TF-IDF and LSI term-document tables.
      • The resulting table will in turn be analyzed using term centrality, with the output being an ordered list of terms to be evaluated for each domain.
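
As a rough illustration of the last two steps, the sketch below builds a term-document table and orders terms by a simple centrality score. It swaps in raw counts for the TF-IDF/LSI weighting and uses degree centrality over term co-occurrence as a stand-in for whatever centrality measure is finally chosen; the documents and function names are made up.

```python
from collections import Counter
from itertools import combinations


def term_document_table(docs):
    """Map each document name to a Counter of lowercase term counts."""
    return {name: Counter(text.lower().split()) for name, text in docs.items()}


def term_centrality(table):
    """Order terms by degree centrality in the co-occurrence graph.

    A term scores one point for every other term it shares a document
    with, counted per document. Ties are broken alphabetically.
    """
    scores = Counter()
    for counts in table.values():
        for a, b in combinations(sorted(counts), 2):
            scores[a] += 1
            scores[b] += 1
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked]


# Hypothetical scanned documents for one domain.
docs = {
    "d1": "license board fraud",
    "d2": "license fraud",
    "d3": "license patient",
}
ordered_terms = term_centrality(term_document_table(docs))
```

The ordered list is what would be handed over for human evaluation in each domain.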

  • Building view to get person, rating and link from the db – done, or at least V1
    CREATE VIEW view_ratings AS
      select, qo.search_type, po.first_name, po.last_name, po.pp_state, ro.person_characterization from item_object io
        INNER JOIN query_object qo ON io.query_id =
        INNER JOIN rating_object ro on = ro.result_id
        INNER JOIN poi_object po on qo.provider_id =;
  • Took results from and ran them through the whole system. The full results are in the Corpus file under and The results seem to make incredibly specific searches. Here are the two first examples. Note that there are very few .com sites.:

Phil 5.31.16

7:00 – 4:30 VTX

  • Writing. Working on describing how maintaining many codes in a network contains more (and more subtle) information than grouping similar codes.
  • Working on the UrlChecker
    • In the process, I discovered that the annotation.xml file is unique only for the account and not for the CSE. All CSEs for one account are contained in one annotation file
    • Created a new annotation called ALL_annotations.xml
    • Fixed a few things in Andy’s file
    • Reading in everything. Now to produce the new sets of lists.
    • I think it’s just easier to delete all the lists and start over.
    • Done and verified. You run UrlChecker from the command line, with the input file being a list of domains (one per line) and the ALL_annotations.xml file.
  • Need to add a Delete or Hide button to reduce down a large corpus to a more effective size.
  • Added. Tomorrow I’ll wire up the deletion of a row or column and the recreation of the initialMatrix

Phil 5.30.16

7:00 – 10:00 Thesis/VTX

  • Built a new matrix for the coded lit review. I had coded a couple more papers
  • Working on copying over the read papers into a new folder that I can run text analytics over
  • After carefully reading through the doc manager list and copying over each paper, I just discovered I could have exported selected.
  • Ooops: Exception in thread “JavaFX Application Thread” java.lang.IllegalArgumentException: Invalid column index (16384).  Allowable column range for EXCEL2007 is (0..16383) or (‘A’..’XFD’)
    • Going to add a limit of

      columns for now. Clearly that can be cut down.

    • Figuring out where to cut the terms. I’m summing the columns of the LSI calculation, starting at the highest value and then dividing that by the sum of all values. The top 20% of rank weights gives 280 columns. Going to try that first
    • Success! Some initial thoughts
      • The coded version is much more ‘crisp’
      • There are interesting hints in the LSI version
      • Clicking on a term or paper to see the associated items is really nice.
      • I think that document subgroups might be good/better, and it might be possible to use the tool to help build those subgroups. This goes back to the ‘hiding’ concept. (hide item / hide item and associated)
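
The column cut described above could look roughly like this. It is a sketch of one reading of the rule (rank columns by their summed weight, keep adding columns until the running total reaches a fraction of the overall sum), with a hard cap at the Excel 2007 limit from the exception. The function name and sample weights are hypothetical.

```python
# Excel 2007 allows column indices 0..16383, per the exception message.
EXCEL_2007_MAX_COLUMNS = 16384


def select_columns(column_sums, fraction=0.2,
                   max_columns=EXCEL_2007_MAX_COLUMNS):
    """Keep the highest-weighted terms until `fraction` of total weight is covered."""
    total = sum(column_sums.values())
    ranked = sorted(column_sums.items(), key=lambda kv: kv[1], reverse=True)
    kept, running = [], 0.0
    for term, weight in ranked:
        # Stop at the spreadsheet limit or once enough weight is covered.
        if len(kept) >= max_columns:
            break
        if total > 0 and running / total >= fraction:
            break
        kept.append(term)
        running += weight
    return kept


# Made-up column sums for four terms.
sums = {"a": 5.0, "b": 3.0, "c": 1.0, "d": 1.0}
half = select_columns(sums, fraction=0.5)
capped = select_columns(sums, fraction=1.0, max_columns=1)
```

Lowering `fraction` is the knob for cutting the corpus down further once the hard cap is no longer the problem.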

Phil 5.9.16

7:00 – 4:00 VTX

  • Started the paper describing the slider interface
  • TF-IDF today!
    • Read docs from web and PDF
    • Calculate the rank
    • Create matrix of terms and documents, weighted by occurrence.
  • Hmm. What I’m actually looking for is the lowest-occurring terms within a document that occur over the largest number of documents. I’ve used this page as a starting point. After flailing for many hours in Java, I wound up walking through the algorithm in Excel and I think I’ve got it. This is the spreadsheet that embodies my delusional thinking ATM.
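
For reference, here is a minimal sketch of the textbook TF-IDF weighting (term frequency times log inverse document frequency). This is the standard baseline, not the variant in the spreadsheet, and the whitespace tokenizer and sample documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return {doc: {term: weight}} using tf * log(N / document frequency)."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)

    # Number of documents each term appears in.
    doc_freq = Counter()
    for words in tokenized.values():
        doc_freq.update(set(words))

    matrix = {}
    for name, words in tokenized.items():
        counts = Counter(words)
        matrix[name] = {
            term: (count / len(words)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return matrix


# Made-up two-document corpus.
weights = tf_idf({"a": "cat cat dog", "b": "dog bird"})
```

Note that a term appearing in every document ("dog" here) gets a weight of zero under this scheme, which is the opposite of what I said I was looking for, so the final ranking will need a different weighting.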