
Phil 4.28.17

7:00 – 8:00 Research

8:30 – 4:30 BRC

  • Working on finding an exit condition for the subdivision surface
  • I’m currently calculating the corners of a constricting rectangle that contracts towards the best point. Each iteration is saved, and I’m working on visualizing that surface, but my brain has shut down, and I can’t do simple math anymore.
  • Had a thought for Aaron about how to visualize his dimension reduction. It turned out to work well.

Aaron 4.27.17

  • Cycling
    • Got a late start in the office today, so as soon as I got in I put my gear on for a brain-cleaning ride. Pushed really hard today, and combined with some nice weather and low traffic, hit my first 16+ MPH average door-to-door. Landed a 16.4 mph average, and felt really proud of it.
  • Focus today was on learning some more about Manifold learning and its applications for reduction of high dimensional data for unsupervised learning.
    • SciKit includes some great documentation and resources including a working sample comparing various Manifold learning techniques against test data sets.
    • My goal now is to take the sorted data_generator.py code from yesterday and compare the manifold learning examples against the clustered output of the unreduced data. Once I have a benchmark set up I can do the same for the sample live data.
    • The output of the SciKit examples in MatPlotLib is really attractive as well. (image: manifold_learning_sample)
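For reference, a minimal sketch (assuming scikit-learn is installed) of comparing two manifold learning techniques on a synthetic S-curve, in the spirit of the SciKit comparison sample mentioned above:

```python
from sklearn import datasets, manifold

# 3D S-curve test data; color encodes position along the curve
X, color = datasets.make_s_curve(n_samples=500, random_state=0)

# Reduce to 2D with two different manifold learning techniques
isomap_2d = manifold.Isomap(n_neighbors=10, n_components=2).fit_transform(X)
tsne_2d = manifold.TSNE(n_components=2, random_state=0).fit_transform(X)

print(isomap_2d.shape)  # (500, 2)
print(tsne_2d.shape)    # (500, 2)
```

Either 2D embedding can then be scattered in MatPlotLib (colored by `color`) and compared against the clusters found on the unreduced data.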

Phil 4.27.17

7:00 – 9:00 Research

  • Some more echo chamber flocking: Iran Deal Is More Popular Than Ever, Poll Shows (image: 170426_iran-1). “Republicans registered the biggest uptick in support for the deal, which has been heavily criticized by GOP lawmakers since its inception in July 2015: 53 percent of Republican voters said they supported it, compared with 37 percent who backed it last summer and just 10 percent who supported it shortly after it was announced. Democratic support for the deal has been largely unchanged since August, and a larger share of independents are getting on board, from 41 percent in August to 48 percent now.”
  • Finishing corrections to paper
  • This really is my phase I research question: If ‘laws of motion’ can indeed be ascribed to behavior, we should be able to model the effects of those laws. The question then becomes: what form do these models take? Also, how do we detect these behaviors with domain independence and at scale?
  • Submitted!
  • The Relevance of Hannah Arendt’s Reflections on Evil: Globalization and Rightlessness

BRC 9:30 5:00

  • Continuing Subdivision surfacing
  • Didn’t like the documentation on sortedcollections; going to try a pandas Series
  • Allowable options in an arg:
    parser.add_argument("--algorithm", type=str, choices=['naive', 'subdivision', 'genetic'], default='naive', help="hill climbing algorithm")

    Note that range() (a list in Python 2, a lazy range object in Python 3) should also work as the choices argument, since choices only needs to support membership tests

  • And here’s how you get the key/values from a pandas Series:
    print("calc_subdivision_fitness_landsape(): key = {0}, val = {1}".format(fitness.index[0], fitness.values[0]))
  • Looks like it’s working. I think I should be using the average of the 4 fitnesses to decide if I’m done
    calc_subdivision_fitness_landsape(): fitness = 
    1    10.0
    0     7.0
    3     6.0
    2     6.0
    dtype: float64
    calc_subdivision_fitness_landsape(): fitness = 
    1    10.0
    0     7.0
    3     6.0
    2     6.0
    dtype: float64
    calc_subdivision_fitness_landsape(): fitness = 
    1    10.0
    0     7.0
    3     6.0
    2     6.0
    dtype: float64
    done calc_subdivision_fitness_landsape

Phil 4.26.17

7:00 – 8:30 Research

  • Proofreading and tweaking the CSCW paper.
  • Finished the paper edit. Started to roll in the changes
  • Made a 10D chart of the explorer probability distribution. I think it tells the story better:
  •  ExplorerPDF
  • Had to install a dictionary in TexStudio. This helped a lot.
  • Started rolling in the changes to the tex file

BRC 9:00 – 4:30

  • Looks like the sort changes to the data_generator.py code haven’t been pushed yet
  • Starting on subdivision surfacing
    def calc_subdivision_fitness_landsape(self, eps_step: float, min_cluster: int) -> pandas.DataFrame:
        # create the four extreme corners. These will work their way in
        # calculate halfway points
        # keep the square with the greatest (single? average?) value
        # repeat until an epsilon, max value, or max iterations are reached
        # construct a sparse matrix with spacing equal to the smallest spacing
        # fill in the values that have been calculated
        # build a dataframe and return it for visualization
  • I need to sort a dict, so I’m trying SortedContainers.
  • Then things went off the rails a bit, and I wrote a haiku program as a haiku that prints itself:
    def haiku(sequence):
        this_is_not_needed = ""
        return "".join(sequence)
    
    if __name__ == "__main__":
        f = open('haiku.py')
        print(haiku(f.readlines()[:3]))
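The commented subdivision steps a few bullets up could be sketched roughly like this. A toy version, not the production code: the quadrant-center scoring is one plausible reading of “keep the square with the greatest value”, and the function and parameter names are hypothetical.

```python
def subdivision_search(fitness, lo, hi, eps=1e-3, max_iter=100):
    """Greedy quadrant search; lo/hi are (x, y) corners of the rectangle."""
    (x0, y0), (x1, y1) = lo, hi
    for _ in range(max_iter):
        if (x1 - x0) < eps and (y1 - y0) < eps:
            break  # the rectangle has contracted far enough
        # calculate halfway points
        xm, ym = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        # the four sub-rectangles, each scored at its center point
        quads = [((x0, y0), (xm, ym)), ((xm, y0), (x1, ym)),
                 ((x0, ym), (xm, y1)), ((xm, ym), (x1, y1))]
        # keep the quadrant with the greatest value and repeat
        best = max(quads, key=lambda q: fitness((q[0][0] + q[1][0]) / 2.0,
                                                (q[0][1] + q[1][1]) / 2.0))
        (x0, y0), (x1, y1) = best
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

# toy fitness surface with a single peak at (0.3, 0.7)
peak = subdivision_search(lambda x, y: -((x - 0.3) ** 2 + (y - 0.7) ** 2),
                          (0.0, 0.0), (1.0, 1.0))
print(peak)
```

Saving the corners at each pass (instead of discarding them) would give the sparse matrix of calculated values to hand back as a DataFrame for visualization.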

Aaron 4.25.17

  • Wasted a ton of time today tracking down progress of integration of additional teams into our program.
  • Spent a couple of hours tackling a poster presentation to be delivered at a technical leadership summit next week. I’ll be presenting the “Advanced Analytics” poster and discussing all of our tools and capabilities. Phil helped a lot, and I ended up quite pleased with the results. One of the nice things is that we were able to include screenshots of actual tools and graphs of the data we’re using. I think this will be a nice differentiator from the rest of the presenters.
  • Did some good pair programming with Phil on the Pandas DataFrame.sort issue, moved to the non-deprecated version of DataFrame.sort_values and got it working correctly at all matrix sizes.

Phil 4.25.17

7:00 – 8:30 Research

  • Wikipedia founder Jimmy Wales launches Wikitribune, a large-scale attempt to combat fake news
  • Listening to the BBC Business Daily on Machine Learning. They had an interview with Joanna J Bryson (Scholar). She has an approach for explaining the behavior of AI that seems to involve simulation? Here are some papers that look interesting:
    • Behavior Oriented Design (MIT Dissertation: Intelligence by Design: Principles of Modularity and Coordination for Engineering Complex Adaptive Agents)
    • Learning from Play: Facilitating character design through genetic programming and human mimicry
      • Mimicry and play are fundamental learning processes by which individuals can acquire behaviours, skills and norms. In this paper we utilise these two processes to create new game characters by mimicking and learning from actual human players. We present our approach towards aiding the design process of game characters through the use of genetic programming. The current state of the art in game character design relies heavily on human designers to manually create and edit scripts and rules for game characters. Computational creativity approaches this issue with fully autonomous character generators, replacing most of the design process using black box solutions such as neural networks. Our GP approach to this problem not only mimics actual human play but creates character controllers which can be further authored and developed by a designer. This keeps the designer in the loop while reducing repetitive labour. Our system also provides insights into how players express themselves in games and into deriving appropriate models for representing those insights. We present our framework and preliminary results supporting our claim.
    • Replicators, Lineages and Interactors: One page note on cultural evolution
      • If we adopt the other option and refer to culture itself as the lineage, then the culture itself can evolve since the replicators are the ideas and practices that exist within that culture. However, if it is the culture that is the lineage, we cannot say that it evolves when it takes more territory, in the same way that a species does not evolve with more individuals. Adaptation is presently understood to be about changes in the frequency of replicators, not about absolute numbers of interactors. In sum, cultural evolution (changes of practices within a group) is necessarily a separate process from cultural group selection (changes of the frequency of group-types at a specific location).
    • The behavior-oriented design of modular agent intelligence
    • Should probably cite some of these and a reference to Behavior-Oriented Design in the conclusions section of the paper
  • Continuing Examining the Alternative Media Ecosystem through the Production of Alternative Narratives of Mass Shooting Events on Twitter
    • We collected data using the Twitter Streaming API, tracking on the following terms (shooter, shooting, gunman, gunmen, gunshot, gunshots, shooters, gun shot, gun shots, shootings) for a ten-month period between January 1 and October 5, 2016. This collection resulted in 58M total tweets. We then scoped that data to include only tweets related to alternative narratives of the event—false flag, falseflag, crisis actor, crisisactor, staged, hoax and “1488”.
      • These keywords specify a ‘primary information space’. Bag-of-words of text correlated with each term could make this a linear axis
    • Of 15,150 users who sent at least one tweet with a link, only 1372 sent (over the course of the collection period) tweets citing more than one domain.
      • This is the difference between implicit behaviors (clicking, reading, navigating) and explicit actions. Twitter monitors what people are willing to write
    • Interestingly, the two most highly tweeted domains were both associated with significant automated account or “bot” activity. The Real Strategy, an alternative news site with a conspiracy theory orientation, is the most tweeted domain in our dataset (by far). The temporal signature of tweets citing this domain reveals a consistent pattern of coordinated bursts of activity at regular intervals generated by 200 accounts that appear to be connected to each other (via following relationships) and coordinated through an external tool.
      • There is clearly a desire to have a greater effect through the use of bots. Two questions: 1) How does this work? 2) How did this emerge?
    • The InfoWars domain, an alternative news website that focuses on Alt-Right and conspiracy theory themes, was the second-most tweeted domain, but as (Figure 1) shows it was only tenuously connected to one other node.
      • Why? Is InfoWars more polarized? Is it using something other than Twitter?
      • Infowars Inbound links
        Domain score  Domain trust score  Domain                      Backlinks  IP Address      Country  First seen  Last seen
        0             0                   breakingnewsfeed.com        1857029    174.129.22.101  us       2015-09-28  2017-03-26
        4             4                   e-graviton.com              1335835    67.210.126.35   us       2014-01-19  2017-03-21
        33            39                  prisonplanet.com            648958     69.16.175.42    us       2013-06-07  2017-03-25
        1             0                   nwostop.com                 346153     104.28.28.16    us       2014-01-19  2017-03-21
        13            31                  nwoo.org                    182060     81.0.208.215    cz       2013-06-07  2017-03-26
        12            30                  conservative-headlines.com  151778     104.18.50.72    us       2016-06-27  2017-03-22
        1             0                   america2fear.com            92766      69.64.46.138    us       2014-11-14  2017-03-23
        4             29                  subbmitt.com                49288      64.251.23.173   us       2015-02-04  2017-03-26
        14            30                  anotherdotcom.com           47195      174.129.236.72  us       2014-10-02  2017-03-20
        1             0                   exzacktamountas.com         43748      208.100.60.13   us       2016-06-08  2017-03-24

9:00 – 5:30 BRC

  • John is having trouble getting Linux running on the laptop
    • No luck. Re-submitting for an Alienware deskside
  • Back to getting the temporal coherence working. Last try to finish up, then switching to fitness landscape optimization, which I dreamed about last night
  • Finished coherence! Had to include a state check for a timeline to see if a DIRTY state had been touched with an update. If not, then the timeline is set to CLOSED. If a new cluster appears that would have had some overlap, a new timeline is created anyway. This could be an optional behavior.
    • Still need to test rigorously across multiple data sets
  • Long scrum, then ML meeting.
    • Hard tasks
      • TF server set up to work in our environment
      • Pre-calculated models to speed up training from research browser
      • T-SNE or other mapping of returned CSE text to support exploration
      • Fast, on-the-fly classification and entity extraction within the research browser framework. Plus interactive training
      • NMF (or other) topic extraction tied to human labeling and curation, plus cross-user validation of topics
  • Poster with Aaron later? Yep. Couple of hours. Done?
  • Oh, just why? Spent an hour on this before going brute force:
    def get_last_cluster(self) -> ClusterSample:
        # self._cluster_dict[self._cluster_dict.keys()[-1]] fails: in Python 3,
        # keys() returns a view, which can't be indexed. Brute force for now:
        to_return = None
        for key in self._cluster_dict:
            to_return = self._cluster_dict[key]
        return to_return
  • Walked through some gradient descent regression code with Bob. More tomorrow?
  • Got the new sort working with Aaron. Much faster progress as a pair
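On the get_last_cluster() puzzle above: in Python 3, dict.keys() returns a view that can’t be indexed, which is likely why the commented one-liner failed. Two working alternatives (`cluster_dict` is a hypothetical stand-in for self._cluster_dict, and reversed() on a plain dict needs Python 3.8+):

```python
# cluster_dict stands in for an insertion-ordered self._cluster_dict
cluster_dict = {"a": 1, "b": 2, "c": 3}

# O(n): materialize the keys, then index the last one
last_value = cluster_dict[list(cluster_dict)[-1]]

# O(1) on Python 3.8+: dicts support reversed() over insertion order
last_value_fast = cluster_dict[next(reversed(cluster_dict))]

print(last_value, last_value_fast)  # 3 3
```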

Phil 4.24.17

7:00 – 8:00, 3:00 – 4:00  Research

  • Continuing to tweak paper
  • Starting Examining the Alternative Media Ecosystem through the Production of Alternative Narratives of Mass Shooting Events on Twitter
    • From the introduction. Do I need something like this? “Our contributions include an increased understanding of the underlying nature of this subsection of alternative media — which hosts conspiratorial content and conducts various anti-globalist political agendas. Noting thematic convergence across domains, we theorize about how alternative media may contribute to conspiratorial thinking by creating a false perception of information diversity.”
  • Conspiracy Theories
  • A cool thing on explorers and veloviewer: maxsquare Here’s an overview of the project
  • Brownbag
    • Teaching abstract concepts to children and tweens (STEM)
    • Cohesive understanding of science over time
    • Wearable technology as the gateway for elementary-school-aged kids? Research shows that they find them valuable
    • How are these attributes measured? <——-!!!!!!!
    • Live sensing and visualization
    • Zephyr bioharness
    • Gender/age differences? Augmented reality? Through-phone?
    • leylanorooz.com/research/

8:30 – 2:30 BRC

  • Expense report! Done? Had to get a charge number and re-enter. Took forever.
  • Found out that I’m getting a laptop rather than what I asked for
    • Having John install Ubuntu and verify that multiple monitors work in Linux
  • Helped Bob set up Git repo
  • Still working on temporal coherence. Think I’ve figured out the logic. Now I need to set clusters in ClusterTimelines
  • Learned how to do Enums in Python
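The timeline states from the 4.25 entry (DIRTY, CLOSED) could be expressed as a Python Enum along these lines. The class name and the NEW state are hypothetical; only DIRTY and CLOSED come from this log:

```python
from enum import Enum

class TimelineState(Enum):
    NEW = 0
    DIRTY = 1
    CLOSED = 2

state = TimelineState.DIRTY
# a timeline left DIRTY (never touched by an update) gets closed
if state == TimelineState.DIRTY:
    state = TimelineState.CLOSED
print(state.name)  # CLOSED
```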

Phil 4.21.17

6:00 – 7:00 Research

8:30 – 5:00 BRC

  • Need to think about handling time, so we can see if people are getting better
  • All hands meeting
    • Transforming healthcare WRT identifying risks and anomalies for the purpose of reducing variance in care. From what to what?
  • 4 things:
    • Technology: get to and stay at the leading edge of what we’re marketing. Investment commitment (CCRi These guys? Alias, Commonwealth university)
    • Sales development
    • Partnering
    • Capital (direct raise from investors) Alignment at a capital level?
  • V2 & V3 timelines and capabilities
  • Sales and capital story
  • Discussion (2 hours)

Phil 4.20.17

7:00 – 8:00 Research

8:30 – 6:00, 7:00 – 10:00 BRC

  • Drove up to NJ
  • Still working on temporal coherence of clusters. Talked through it with Aaron, and we both believe it’s close
  • Another good discussion with Bob
  • BRC dinner meet-n-greet

Phil 4.19.17

7:00 – 8:00 Research

8:30 – 5:00 BRC

  • Have Aaron read abstract
  • Finishing up temporal coherence in clustering. Getting differences, now I have to figure out how to sort, and when to make a new cluster.
    timestamp = 10.07
    	t=10.07, id=0, members = ['ExploitSh_54', 'ExploitSh_65', 'ExploitSh_94', 'ExploreSh_0', 'ExploreSh_1', 'ExploreSh_17', 'ExploreSh_2', 'ExploreSh_21', 'ExploreSh_24', 'ExploreSh_29', 'ExploreSh_3', 'ExploreSh_35', 'ExploreSh_38', 'ExploreSh_4', 'ExploreSh_40', 'ExploreSh_43', 'ExploreSh_48', 'ExploreSh_49', 'ExploreSh_8']
    	t=10.07, id=1, members = ['ExploitSh_50', 'ExploitSh_51', 'ExploitSh_52', 'ExploitSh_53', 'ExploitSh_55', 'ExploitSh_56', 'ExploitSh_57', 'ExploitSh_58', 'ExploitSh_59', 'ExploitSh_60', 'ExploitSh_61', 'ExploitSh_62', 'ExploitSh_64', 'ExploitSh_66', 'ExploitSh_67', 'ExploitSh_69', 'ExploitSh_70', 'ExploitSh_71', 'ExploitSh_72', 'ExploitSh_73', 'ExploitSh_74', 'ExploitSh_75', 'ExploitSh_76', 'ExploitSh_77', 'ExploitSh_78', 'ExploitSh_79', 'ExploitSh_80', 'ExploitSh_81', 'ExploitSh_82', 'ExploitSh_83', 'ExploitSh_84', 'ExploitSh_85', 'ExploitSh_87', 'ExploitSh_88', 'ExploitSh_89', 'ExploitSh_90', 'ExploitSh_91', 'ExploitSh_92', 'ExploitSh_93', 'ExploitSh_95', 'ExploitSh_96', 'ExploitSh_97', 'ExploitSh_99', 'ExploreSh_10', 'ExploreSh_11', 'ExploreSh_13', 'ExploreSh_14', 'ExploreSh_15', 'ExploreSh_16', 'ExploreSh_18', 'ExploreSh_19', 'ExploreSh_20', 'ExploreSh_23', 'ExploreSh_25', 'ExploreSh_26', 'ExploreSh_27', 'ExploreSh_28', 'ExploreSh_30', 'ExploreSh_31', 'ExploreSh_32', 'ExploreSh_33', 'ExploreSh_34', 'ExploreSh_36', 'ExploreSh_37', 'ExploreSh_41', 'ExploreSh_42', 'ExploreSh_45', 'ExploreSh_46', 'ExploreSh_47', 'ExploreSh_5', 'ExploreSh_7', 'ExploreSh_9']
    	t=10.07, id=-1, members = ['ExploitSh_63', 'ExploitSh_68', 'ExploitSh_86', 'ExploitSh_98', 'ExploreSh_12', 'ExploreSh_22', 'ExploreSh_39', 'ExploreSh_44', 'ExploreSh_6']
    
    timestamp = 10.18
    	t=10.18, id=0, members = ['ExploitSh_50', 'ExploitSh_51', 'ExploitSh_52', 'ExploitSh_53', 'ExploitSh_55', 'ExploitSh_56', 'ExploitSh_57', 'ExploitSh_58', 'ExploitSh_59', 'ExploitSh_60', 'ExploitSh_61', 'ExploitSh_62', 'ExploitSh_63', 'ExploitSh_64', 'ExploitSh_65', 'ExploitSh_66', 'ExploitSh_67', 'ExploitSh_69', 'ExploitSh_70', 'ExploitSh_71', 'ExploitSh_72', 'ExploitSh_73', 'ExploitSh_74', 'ExploitSh_75', 'ExploitSh_76', 'ExploitSh_77', 'ExploitSh_78', 'ExploitSh_79', 'ExploitSh_80', 'ExploitSh_81', 'ExploitSh_82', 'ExploitSh_83', 'ExploitSh_84', 'ExploitSh_85', 'ExploitSh_86', 'ExploitSh_87', 'ExploitSh_88', 'ExploitSh_89', 'ExploitSh_90', 'ExploitSh_91', 'ExploitSh_92', 'ExploitSh_93', 'ExploitSh_94', 'ExploitSh_95', 'ExploitSh_96', 'ExploitSh_97', 'ExploitSh_99', 'ExploreSh_0', 'ExploreSh_1', 'ExploreSh_10', 'ExploreSh_11', 'ExploreSh_13', 'ExploreSh_14', 'ExploreSh_15', 'ExploreSh_16', 'ExploreSh_17', 'ExploreSh_18', 'ExploreSh_19', 'ExploreSh_2', 'ExploreSh_20', 'ExploreSh_21', 'ExploreSh_23', 'ExploreSh_24', 'ExploreSh_25', 'ExploreSh_26', 'ExploreSh_27', 'ExploreSh_28', 'ExploreSh_29', 'ExploreSh_3', 'ExploreSh_30', 'ExploreSh_31', 'ExploreSh_32', 'ExploreSh_33', 'ExploreSh_34', 'ExploreSh_35', 'ExploreSh_36', 'ExploreSh_37', 'ExploreSh_38', 'ExploreSh_4', 'ExploreSh_40', 'ExploreSh_41', 'ExploreSh_42', 'ExploreSh_43', 'ExploreSh_45', 'ExploreSh_46', 'ExploreSh_47', 'ExploreSh_48', 'ExploreSh_49', 'ExploreSh_5', 'ExploreSh_7', 'ExploreSh_8', 'ExploreSh_9']
    	t=10.18, id=-1, members = ['ExploitSh_54', 'ExploitSh_68', 'ExploitSh_98', 'ExploreSh_12', 'ExploreSh_22', 'ExploreSh_39', 'ExploreSh_44', 'ExploreSh_6']
    current[0] 32.43% similar to previous[0]
    current[0] 87.80% similar to previous[1]
    current[0] 3.96% similar to previous[-1]
    current[-1] 7.41% similar to previous[0]
    current[-1] 82.35% similar to previous[-1]

    In the above example, we originally have 3 clusters and then 2. The two that map are pretty straightforward: current[0] 87.80% similar to previous[1], and current[-1] 82.35% similar to previous[-1]. Not sure what to do about the group that fell away. I think there should be an increasing ID number for clusters, with the exception of [-1], which is unclustered. Once a cluster goes away, it can’t come back.

  • Long discussion with Bob and Aaron, basically coordinating and giving Bob a sense of where we are. That wound up being most of the day.
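The percent-similarity lines in the 4.19 log above could be produced by a set-overlap metric on cluster membership. Jaccard similarity (intersection over union) is one plausible choice, though the actual metric used isn’t shown; the member lists here are toy stand-ins:

```python
def similarity(current, previous):
    """Percent overlap between two clusters' member lists (Jaccard)."""
    cur, prev = set(current), set(previous)
    if not cur | prev:
        return 0.0
    return 100.0 * len(cur & prev) / len(cur | prev)

# toy member lists: 3 shared members out of 5 distinct
cur0 = ['a', 'b', 'c', 'd']
prev1 = ['b', 'c', 'd', 'e']
print("current[0] {0:.2f}% similar to previous[1]".format(similarity(cur0, prev1)))
# current[0] 60.00% similar to previous[1]
```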