Phil 6.18.18

ASRC MKT 7:00 – 8:00

  • Nice ride on Saturday on Skyline Drive
  • Using Social Network Information in Bayesian Truth Discovery
    • We investigate the problem of truth discovery based on opinions from multiple agents who may be unreliable or biased. We consider the case where agents’ reliabilities or biases are correlated if they belong to the same community, which defines a group of agents with similar opinions regarding a particular event. An agent can belong to different communities for different events, and these communities are unknown a priori. We incorporate knowledge of the agents’ social network in our truth discovery framework and develop Laplace variational inference methods to estimate agents’ reliabilities, communities, and the event states. We also develop a stochastic variational inference method to scale our model to large social networks. Simulations and experiments on real data suggest that when observations are sparse, our proposed methods perform better than several other inference methods, including majority voting, the popular Bayesian Classifier Combination (BCC) method, and the Community BCC method.
  • Scale-free correlations in starling flocks
    • From bird flocks to fish schools, animal groups often seem to react to environmental perturbations as if of one mind. Most studies in collective animal behavior have aimed to understand how a globally ordered state may emerge from simple behavioral rules. Less effort has been devoted to understanding the origin of collective response, namely the way the group as a whole reacts to its environment. Yet, in the presence of strong predatory pressure on the group, collective response may yield a significant adaptive advantage. Here we suggest that collective response in animal groups may be achieved through scale-free behavioral correlations. By reconstructing the 3D position and velocity of individual birds in large flocks of starlings, we measured to what extent the velocity fluctuations of different birds are correlated to each other. We found that the range of such spatial correlation does not have a constant value, but it scales with the linear size of the flock. This result indicates that behavioral correlations are scale free: The change in the behavioral state of one animal affects and is affected by that of all other animals in the group, no matter how large the group is. Scale-free correlations provide each animal with an effective perception range much larger than the direct inter-individual interaction range, thus enhancing global response to perturbations. Our results suggest that flocks behave as critical systems, poised to respond maximally to environmental perturbations.
  • Interaction ruling animal collective behavior depends on topological rather than metric distance: Evidence from a field study
    • By reconstructing the three-dimensional positions of individual birds in airborne flocks of a few thousand members, we show that the interaction does not depend on the metric distance, as most current models and theories assume, but rather on the topological distance. In fact, we discovered that each bird interacts on average with a fixed number of neighbors (six to seven), rather than with all neighbors within a fixed metric distance. We argue that a topological interaction is indispensable to maintain a flock’s cohesion against the large density changes caused by external perturbations, typically predation. …
  • Thread on the failure to replicate the Stanford Prison Experiment by Alex Haslam (scholar) (home page). Paper coming soon
    • The Stanford Prison Experience—as it is presented in textbooks—presents human nature as naturally conforming to oppressive systems. This is a lesson that extends well beyond prison systems and the field criminology—but it’s wrong. Alex and his colleagues (especially Steve Reicher) have been arguing for years that conformity often emerges when leaders cultivate a sense of shared identity. This is an active, engaged process—very different from automatic and mindless conformity.
  • Started Irrational Exuberance, by Robert Shiller
  • Send note to Don, Aaron and Shimei
  • Read Ego-motion in Self-Aware Deep Learning on Medium. It’s about reflective learning of navigation in physical spaces, though I wonder if there is an equivalent process in belief spaces. Looked through scholar and
  • Slide prep and Fika walkthrough
    • Went well. Ravi suggested adding another slide that discusses the methods in detail, while Sy pretty much demanded that I get rid of “Questions” and put the title of the paper in its place
    • When adding the detail for Ravi, I discovered that the simulator and map reconstruction did not handle single, high dimensional agents well, so I spent a few hours fixing bugs to get the screen captures to build the slides.
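
A rough sketch of the metric-versus-topological distinction from the starling field study noted above. This is not the paper's method: the flock is random points, and the radius (3.0), the neighbor count (7), and the function names are assumptions chosen for illustration. It shows that when the same flock expands, a fixed-radius rule loses neighbors while a fixed-count rule does not.

```python
import math
import random


def metric_neighbors(positions, i, radius):
    """Indices of all agents within `radius` of agent i (metric rule)."""
    return [j for j, p in enumerate(positions)
            if j != i and math.dist(positions[i], p) <= radius]


def topological_neighbors(positions, i, k):
    """Indices of the k nearest agents to agent i, at any distance (topological rule)."""
    others = [j for j in range(len(positions)) if j != i]
    others.sort(key=lambda j: math.dist(positions[i], positions[j]))
    return others[:k]


def random_flock(n, spread, seed=0):
    """Hypothetical flock: n points placed uniformly in a cube of side `spread`."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(0, spread) for _ in range(3)) for _ in range(n)]


dense = random_flock(200, spread=10.0)
# Same flock after a density change: every coordinate stretched by 3x
sparse = [tuple(c * 3.0 for c in p) for p in dense]

for name, flock in (("dense", dense), ("sparse", sparse)):
    m = len(metric_neighbors(flock, 0, radius=3.0))
    t = len(topological_neighbors(flock, 0, k=7))
    print(f"{name}: metric neighbors={m}, topological neighbors={t}")
```

The topological count stays at seven regardless of density, which is the cohesion argument the abstract makes.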

Phil 6.11.18

7:00 – 6:00 ASRC MKT

  • More Bit by Bit. Reading the section on ethics. It strikes me that simulation could be a way to cut the PII Gordian Knot in some conditions. If a simulation can be developed that generates statistically similar data to the desired population, then the simulated data and the simulation code can be released to the research community. The dataset becomes infinite and adjustable, while the PII data can be held back. Machine learning systems trained on the simulated data can then be evaluated on the confidential data. The differences in the classification by the ML systems between real data and simulated data can also provide insight into the gaps in fidelity of the simulated data, which would provide an ongoing improvement to the simulation, which could in turn be released to the community.
  • Continuing with the cleanup of the SASO paper. Mostly done, but there is still some trimming of redundant bits and the “One Simple Trick” paragraph.
  • SASO travel link
    • Monday prices: SASO
  • Fika
    • Come up with 3-5 options for a finished state for the dissertation. It probably ranges from “pure theory” through “instance based on theory” to “a map generated by the system that matches the theory”
    • Once the SASO paper is in, set up a “wine and cheese” get together for the committee to go over the current work and discuss changes to the next phase
    • Start on a new IRB. Emphasize how everyone will have the same system to interact with, though their interactions will be different. Emphasize that the system has to allow open interaction to provide the best chance to realize theoretical results.
    • Will and I are on the hook for a Fika about LaTeX
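
A minimal sketch of the simulate-and-release idea from the Bit by Bit note above, under heavy assumptions: the "confidential" data is itself generated here as a stand-in, the simulator is just a per-class Gaussian fit, and the classifier is a single threshold. All names and parameters are hypothetical. It only illustrates the loop of training on synthetic data, evaluating on the held-back data, and reading the accuracy gap as a fidelity signal. Whether releasing the fitted parameters is itself safe is a separate question this sketch does not address.

```python
import random
import statistics

rng = random.Random(42)


def draw(n, mean0, mean1, sd, rng):
    """Generate (feature, label) pairs; a stand-in for the confidential records."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(mean1 if label else mean0, sd), label))
    return data


def fit_simulator(data):
    """Fit a per-class mean and standard deviation to the held-back data."""
    params = {}
    for label in (0, 1):
        xs = [x for x, y in data if y == label]
        params[label] = (statistics.mean(xs), statistics.stdev(xs))
    return params


def simulate(params, n, rng):
    """Generate as much synthetic data as wanted from the fitted parameters."""
    out = []
    for _ in range(n):
        label = rng.randint(0, 1)
        mu, sd = params[label]
        out.append((rng.gauss(mu, sd), label))
    return out


def train_threshold(data):
    """Toy classifier: midpoint between the two class means (assumes class 1 is higher)."""
    m0 = statistics.mean(x for x, y in data if y == 0)
    m1 = statistics.mean(x for x, y in data if y == 1)
    return (m0 + m1) / 2.0


def accuracy(threshold, data):
    return sum((x > threshold) == bool(y) for x, y in data) / len(data)


confidential = draw(2000, mean0=0.0, mean1=2.0, sd=1.0, rng=rng)
params = fit_simulator(confidential)
synthetic = simulate(params, 5000, rng)

threshold = train_threshold(synthetic)        # trained only on released data
acc_syn = accuracy(threshold, synthetic)
acc_real = accuracy(threshold, confidential)  # evaluated by the data holder
print(f"synthetic={acc_syn:.3f} confidential={acc_real:.3f} gap={acc_syn - acc_real:+.3f}")
```

A large gap between the two accuracies would point at something the simulation fails to capture.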

Phil 6.7.18

7:00 – 4:30 ASRC MKT

  • Che Dorval
  • Done with the whitepaper! Submitted! Yay! Add to ADP
  • The SLT meeting went well, apparently. Need to determine next steps
  • Back to Bit by Bit. Reading about mass collaboration. eBird looks very interesting. All kinds of social systems involved here.
    • Research
      • Deep Multi-Species Embedding
        • Understanding how species are distributed across landscapes over time is a fundamental question in biodiversity research. Unfortunately, most species distribution models only target a single species at a time, despite strong ecological evidence that species are not independently distributed. We propose Deep Multi-Species Embedding (DMSE), which jointly embeds vectors corresponding to multiple species as well as vectors representing environmental covariates into a common high-dimensional feature space via a deep neural network. Applied to bird observational data from the citizen science project eBird, we demonstrate how the DMSE model discovers inter-species relationships to outperform single-species distribution models (random forests and SVMs) as well as competing multi-label models. Additionally, we demonstrate the benefit of using a deep neural network to extract features within the embedding and show how they improve the predictive performance of species distribution modelling. An important domain contribution of the DMSE model is the ability to discover and describe species interactions while simultaneously learning the shared habitat preferences among species. As an additional contribution, we provide a graphical embedding of hundreds of bird species in the Northeast US.
  • Start fixing This one Simple Trick
    • Highlighted all the specified changes. There are a lot of them!
    • Started working on figure 2, and realized (after about an hour of Illustrator work) that the figure is correct. I need to verify each comment before fixing it!
  • Researched NN anomaly detection. That work seems to have had its heyday in the ’90s, with more conventional (but computationally intensive) methods being preferred these days.
  • I also thought that Dr. Li’s model had a time-orthogonal component for prediction, but I don’t think that’s true. The NN is finding the frequency and bounds on its own.
  • Wrote up a paragraph expressing my concerns and sent to Aaron.

Phil 6.5.18

7:00 – 6:00 ASRC

  • Read the SASO comments. Most are pretty good, even the angry ones from #3 (my reviewer #2 was #3 this time), which are mostly “where is particle swarm optimization???” There is some rework that’s needed.
  • Got an example quad chart from Helena that I’m going to base mine on
  • Neat thing from Brian F: grayson-map-2
  • Lots. Of. White. Paper.

Phil 6.1.18

7:00 – 6:00 ASRC MKT

  • Bot stampede reaction to “evolution” in a thread about UNIX. In this case the bots are posting sentiment against the wrong thing. There are layers here though. It can also be advertising. Sort of the dark side of diversity injection.
  • Seems like an explore/exploit morning
  • Autism on “The Leap”: Neurotypical and Neurodivergent (Neurodiversity)
  • From a BBC Business Daily show on Elon Musk
    • Thomas Astebro (Decision Science): The return to independent invention: evidence of unrealistic optimism, risk seeking or skewness loving? 
      • Examining a sample of 1,091 inventions I investigate the magnitude and distribution of the pre‐tax internal rate of return (IRR) to inventive activity. The average IRR on a portfolio investment in these inventions is 11.4%. This is higher than the risk‐free rate but lower than the long‐run return on high‐risk securities and the long‐run return on early‐stage venture capital funds. The portfolio IRR is significantly higher, for some ex ante identifiable classes of inventions. The distribution of return is skew: only between 7‐9% reach the market. Of the 75 inventions that did, six realised returns above 1400%, 60% obtained negative returns and the median was negative.
  • Myth of first mover advantage
    • Conventional wisdom would have us believe that it is always beneficial to be first – first in, first to market, first in class. The popular business literature is full of support for being first and legions of would-be business leaders, steeped in the Jack Welch school of business strategy, will argue this to be the case. The advantages accorded to those who are first to market defines the concept of First Mover Advantage (FMA). We outline why this is not the case, and in fact, that there are conditions of applicability in order for FMA to hold (and these conditions often do not hold). We also show that while there can be advantages to being first, from an economic perspective, the costs generally exceed the benefits, and the full economics of FMA are usually a losing proposition. Finally, we show that increasingly, we live in a world where FMA is eclipsed by innovation and format change, rendering the FMA concept obsolete (i.e. strategic obsolescence).
  • More Bit by Bit
  • Investigating the Effects of Google’s Search Engine Result Page in Evaluating the Credibility of Online News Sources
    • Recent research has suggested that young users are not particularly skilled in assessing the credibility of online content. A follow up study comparing students to fact checkers noticed that students spend too much time on the page itself, while fact checkers performed “lateral reading”, searching other sources. We have taken this line of research one step further and designed a study in which participants were instructed to do lateral reading for credibility assessment by inspecting Google’s search engine result page (SERP) of unfamiliar news sources. In this paper, we summarize findings from interviews with 30 participants. A component of the SERP noticed regularly by the participants is the so-called Knowledge Panel, which provides contextual information about the news source being searched. While this is expected, there are other parts of the SERP that participants use to assess the credibility of the source, for example, the freshness of top stories, the panel of recent tweets, or a verified Twitter account. Given the importance attached to the presence of the Knowledge Panel, we discuss how variability in its content affected participants’ opinions. Additionally, we perform data collection of the SERP page for a large number of online news sources and compare them. Our results indicate that there are widespread inconsistencies in the coverage and quality of information included in Knowledge Panels.
  • White paper
    • Add something about geospatial mapping of belief.
    • Note that belief maps are cultural artifacts, so comparing someone from one belief space to others in a shared physical belief environment can be roughly equivalent to taking the dot product of the belief space vectors that you need to compare. This could produce a global “alignment map” that can suggest how aligned, opposed, or indifferent a population might be with respect to an intervention, ranging from medical (Ebola teams) to military (special forces operations).
      • Similar maps related to wealth in Rwanda based on phone metadata: Blumenstock, Joshua E., Gabriel Cadamuro, and Robert On. 2015. “Predicting Poverty and Wealth from Mobile Phone Metadata.” Science 350 (6264):1073–6.
    • Added a section about how mapping belief maps would afford prediction about local belief, since overall state, orientation and velocity could be found for some individuals who are geolocated to that area and then extrapolated over the region.
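
A small sketch of the dot-product idea from the white paper notes above. The regions, the two-dimensional belief vectors, and the intervention heading are all made up for illustration; real belief-space vectors would come from the mapping process, which is not shown. Vectors are normalized first, so the score is the cosine of the angle between them: near +1 reads as aligned, near 0 as indifferent, near -1 as opposed.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def alignment(a, b):
    """Dot product of the normalized vectors (cosine similarity)."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


# Hypothetical belief-space heading of a planned intervention
intervention = (1.0, 0.0)

# Hypothetical regions, each with belief headings for a few geolocated individuals
regions = {
    "region_a": [(0.9, 0.1), (0.8, 0.3)],
    "region_b": [(-0.7, 0.2), (-1.0, -0.1)],
    "region_c": [(0.05, 1.0), (-0.05, -1.0)],
}

# Average the individual scores and extrapolate them to the whole region
alignment_map = {
    name: sum(alignment(v, intervention) for v in vectors) / len(vectors)
    for name, vectors in regions.items()
}

for name, score in alignment_map.items():
    print(f"{name}: {score:+.2f}")
```

Averaging over a handful of individuals is the crudest possible extrapolation; it stands in for whatever regional estimate the real system would use.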

Phil 5.31.18

7:00 – ASRC MKT

  • Via BBC Business Daily, found this interesting post on diversity injection through lunch table size:
  • KQED is playing America Abroad – today on Russian disinfo ops:
    • Sowing Chaos: Russia’s Disinformation Wars 
      • Revelations of Russian meddling in the 2016 US presidential election were a shock to Americans. But it wasn’t quite as surprising to people in former Soviet states and the EU. For years they’ve been exposed to Russian disinformation and slanted state media; before that Soviet propaganda filtered into the mainstream. We don’t know how effective Russian information warfare was in swaying the US election. But we do know these tactics have roots going back decades and will most likely be used for years to come. This hour, we’ll hear stories of Russian disinformation and attempts to sow chaos in Europe and the United States. We’ll learn how Russia uses its state-run media to give a platform to conspiracy theorists and how it invites viewers to doubt the accuracy of other news outlets. And we’ll look at the evolution of internet trolling from individuals to large troll farms. And — finally — what can be done to counter all this?
  • Some interesting papers on the “Naming Game“, a form of coordination where individuals have to agree on a name for something. This means that there is some kind of dimension reduction involved from all the naming possibilities to the agreed-on name.
    • The Grounded Colour Naming Game
      • Colour naming games are idealised communicative interactions within a population of artificial agents in which a speaker uses a single colour term to draw the attention of a hearer to a particular object in a shared context. Through a series of such games, a colour lexicon can be developed that is sufficiently shared to allow for successful communication, even when the agents start out without any predefined categories. In previous models of colour naming games, the shared context was typically artificially generated from a set of colour stimuli and both agents in the interaction perceive this environment in an identical way. In this paper, we investigate the dynamics of the colour naming game in a robotic setup in which humanoid robots perceive a set of colourful objects from their own perspective. We compare the resulting colour ontologies to those found in human languages and show how these ontologies reflect the environment in which they were developed.
    • Group-size Regulation in Self-Organised Aggregation through the Naming Game
      • In this paper, we study the interaction effect between the naming game and one of the simplest, yet most important collective behaviour studied in swarm robotics: self-organised aggregation. This collective behaviour can be seen as the building blocks for many others, as it is required in order to gather robots, unable to sense their global position, at a single location. Achieving this collective behaviour is particularly challenging, especially in environments without landmarks. Here, we augment a classical aggregation algorithm with a naming game model. Experiments reveal that this combination extends the capabilities of the naming game as well as of aggregation: It allows the emergence of more than one word, and allows aggregation to form a controllable number of groups. These results are very promising in the context of collective exploration, as it allows robots to divide the environment in different portions and at the same time give a name to each portion, which can be used for more advanced subsequent collective behaviours.
  • More Bit by Bit. Could use some worked examples. Also a login so I’m not nagged to buy a book I own.
    • Descriptive and injunctive norms – The transsituational influence of social norms.
      • Three studies examined the behavioral implications of a conceptual distinction between 2 types of social norms: descriptive norms, which specify what is typically done in a given setting, and injunctive norms, which specify what is typically approved in society. Using the social norm against littering, injunctive norm salience procedures were more robust in their behavioral impact across situations than were descriptive norm salience procedures. Focusing Ss on the injunctive norm suppressed littering regardless of whether the environment was clean or littered (Study 1) and regardless of whether the environment in which Ss could litter was the same as or different from that in which the norm was evoked (Studies 2 and 3). The impact of focusing Ss on the descriptive norm was much less general. Conceptual implications for a focus theory of normative conduct are discussed along with practical implications for increasing socially desirable behavior. 
    • Construct validity centers around the match between the data and the theoretical constructs. As discussed in chapter 2, constructs are abstract concepts that social scientists reason about. Unfortunately, these abstract concepts don’t always have clear definitions and measurements.
      • Simulation is a way of implementing theoretical constructs that are measurable and testable.
  • Hyperparameter Optimization with Keras
  • Recognizing images from parts Kaggle winner
  • White paper
  • Storyboard meeting
  • The advanced analytics division(?) needs a modeling and simulation department that builds models that feed ML systems.
  • Meeting with Steve Specht – adding geospatial to white paper
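
A sketch of a minimal naming game, to make the dimension reduction mentioned in the Naming Game note above concrete. This is a common textbook variant and not the model from either paper listed there; the population size, seed, and round limit are arbitrary assumptions. Agents invent names freely, and successful exchanges collapse both vocabularies to the shared word, so many candidate names shrink to one.

```python
import random


def naming_game(n_agents=50, max_rounds=100_000, seed=1):
    """Run pairwise naming interactions until every agent holds the same single word.

    Returns (rounds_used, words_invented, final_vocabulary), or
    (None, words_invented, None) if the round limit is reached first.
    """
    rng = random.Random(seed)
    vocab = [set() for _ in range(n_agents)]
    next_word = 0
    for rounds in range(1, max_rounds + 1):
        speaker, hearer = rng.sample(range(n_agents), 2)
        if not vocab[speaker]:
            vocab[speaker].add(next_word)  # speaker invents a new name
            next_word += 1
        word = rng.choice(sorted(vocab[speaker]))
        if word in vocab[hearer]:
            # Success: both agents drop every other candidate name
            vocab[speaker] = {word}
            vocab[hearer] = {word}
        else:
            # Failure: the hearer remembers the new candidate
            vocab[hearer].add(word)
        if all(len(v) == 1 and v == vocab[0] for v in vocab):
            return rounds, next_word, vocab[0]
    return None, next_word, None


rounds, invented, final = naming_game()
print(f"rounds={rounds} names invented={invented} agreed name={final}")
```

The gap between the number of names invented and the single agreed name is the reduction in question.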

Phil 5.30.18

7:15 – 6:00 ASRC MKT

  • More Bit by Bit
  • An interesting tweet about the dichotomy between individual and herd behaviors.
  • More white paper. Add something about awareness horizon, and how maps change that from a personal to a shared reality (cite understanding ignorance?)
  • Great discussion with Aaron about incorporating adversarial herding. I think that there will be three areas
    • Thunderdome – affords adversarial herding. Users have to state their intent before joining a discussion group. Bots and sock puppets allowed
    • Clubhouse – affords discussion with chosen individuals. This is what I thought JuryRoom was
    • JuryRoom – fully randomized members and topics, based on activity in the Clubhouse and Thunderdome

Phil 5.25.18

7:00 – 6:00 ASRC MKT

  • Starting Bit by Bit
  • I realized the hook for the white paper is the military importance of maps. I found A Revolution in Military Cartography?: Europe 1650-1815
    • Military cartography is studied in order to approach the role of information in war. This serves as an opportunity to reconsider the Military Revolution and in particular changes in the eighteenth century. Mapping is approached not only in tactical, operational and strategic terms, but also with reference to the mapping of war for public interest. Shifts in the latter reflect changes in the geography of European conflict.
  • Reconnoitering sketch from Instructions in the duties of cavalry reconnoitring an enemy; marches; outposts; and reconnaissance of a country; for the use of military cavalry. 1876 (pg 83) reconnoitering_sketch
  • A rutter is a mariner’s handbook of written sailing directions. Before the advent of nautical charts, rutters were the primary store of geographic information for maritime navigation.
    • It was known as a periplus (“sailing-around” book) in classical antiquity and a portolano (“port book”) to medieval Italian sailors in the Mediterranean Sea. Portuguese navigators of the 16th century called it a roteiro, the French a routier, from which the English word “rutter” is derived. In Dutch, it was called a leeskarte (“reading chart”), in German a Seebuch (“sea book”), and in Spanish a derrotero.
    • Example from ancient Greece:
      • From the mouth of the Ister called Psilon to the second mouth is sixty stadia.
      • Thence to the mouth called Calon forty stadia.
      • From Calon to Naracum, which last is the name of the fourth mouth of the Ister, sixty stadia.
      • Hence to the fifth mouth a hundred and twenty stadia.
      • Hence to the city of Istria five hundred stadia.
      • From Istria to the city of Tomea three hundred stadia.
      • From Tomea to the city of Callantra, where there is a port, three hundred stadia
  • Battlespace
  • Cyber-Human Systems (CHS)
    • In a world in which computers and networks are increasingly ubiquitous, computing, information, and computation play a central role in how humans work, learn, live, discover, and communicate. Technology is increasingly embedded throughout society, and is becoming commonplace in almost everything we do. The boundaries between humans and technology are shrinking to the point where socio-technical systems are becoming natural extensions to our human experience – second nature, helping us, caring for us, and enhancing us. As a result, computing technologies and human lives, organizations, and societies are co-evolving, transforming each other in the process. Cyber-Human Systems (CHS) research explores potentially transformative and disruptive ideas, novel theories, and technological innovations in computer and information science that accelerate both the creation and understanding of the complex and increasingly coupled relationships between humans and technology with the broad goal of advancing human capabilities: perceptual and cognitive, physical and virtual, social and societal.
  • Reworked Section 1 to incorporate all this in a single paragraph
  • Long discussion about all of the above with Aaron
  • Worked on getting the CoE together by CoB
  • Do Diffusion Protocols Govern Cascade Growth?
    • Large cascades can develop in online social networks as people share information with one another. Though simple reshare cascades have been studied extensively, the full range of cascading behaviors on social media is much more diverse. Here we study how diffusion protocols, or the social exchanges that enable information transmission, affect cascade growth, analogous to the way communication protocols define how information is transmitted from one point to another. Studying 98 of the largest information cascades on Facebook, we find a wide range of diffusion protocols – from cascading reshares of images, which use a simple protocol of tapping a single button for propagation, to the ALS Ice Bucket Challenge, whose diffusion protocol involved individuals creating and posting a video, and then nominating specific others to do the same. We find recurring classes of diffusion protocols, and identify two key counterbalancing factors in the construction of these protocols, with implications for a cascade’s growth: the effort required to participate in the cascade, and the social cost of staying on the sidelines. Protocols requiring greater individual effort slow down a cascade’s propagation, while those imposing a greater social cost of not participating increase the cascade’s adoption likelihood. The predictability of transmission also varies with protocol. But regardless of mechanism, the cascades in our analysis all have a similar reproduction number ( 1.8), meaning that lower rates of exposure can be offset with higher per-exposure rates of adoption. Last, we show how a cascade’s structure can not only differentiate these protocols, but also be modeled through branching processes. Together, these findings provide a framework for understanding how a wide variety of information cascades can achieve substantial adoption across a network.
  • Continuing with creating the Simplest LSTM ever
    • All work and no play makes jack a dull boy indexes alphabetically as : AllWork
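
A sketch of the trade-off described in the cascade abstract above, where a lower exposure rate can be offset by a higher per-exposure adoption rate. It uses a plain branching process, which the abstract mentions only in general terms; the specific exposure counts, probabilities, generation cap, and trial count are assumptions. Both hypothetical protocols are tuned to the same reproduction number of 1.8 (exposures times adoption probability).

```python
import random


def cascade_size(exposures, p_adopt, generations, rng):
    """Total adopters in a simple branching process seeded by one poster."""
    current, total = 1, 1
    for _ in range(generations):
        # Every current adopter exposes `exposures` people; each adopts with p_adopt
        new = sum(1 for _ in range(current * exposures) if rng.random() < p_adopt)
        total += new
        current = new
        if current == 0:
            break  # the cascade died out
    return total


def mean_size(exposures, p_adopt, generations=6, trials=1000, seed=0):
    """Average cascade size over many independent runs."""
    rng = random.Random(seed)
    sizes = [cascade_size(exposures, p_adopt, generations, rng) for _ in range(trials)]
    return sum(sizes) / trials


# Low-effort protocol: many exposures, low adoption per exposure
low_effort = mean_size(exposures=18, p_adopt=0.1)
# High-effort protocol: few targeted exposures, high adoption per exposure
high_effort = mean_size(exposures=3, p_adopt=0.6)

print(f"R = {18 * 0.1:.1f}, mean size = {low_effort:.1f}")
print(f"R = {3 * 0.6:.1f}, mean size = {high_effort:.1f}")
```

With equal reproduction numbers the two protocols should grow to similar average sizes, even though individual runs vary a lot.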

Phil 5.18.18

7:00 – 4:00 ASRC MKT

Phil 5.16.18

7:00 – 3:30 ASRC MKT

  • My home box has become very slow. 41 seconds to do a full recompile of GPM, while it takes 3 sec on a nearly identical machine at work. This may help?
  • Working on terms
  • Working on slides
  • Attending talk on Big Data, Security and Privacy – 11 am to 12 pm at ITE 459
    • Bhavani Thuraisingham
    • Big data management and analytics emphasizing GANs and deep learning <- the new hotness
      • How do you detect attacks?
      • UMBC has real time analytics in cyber? IOCRC
    • Example systems
      • Cloud centric assured information sharing
    • Research challenges:
      • dynamically adapting and evolving policies to maintain privacy under a changing environment
      • Deep learning to detect attacks that were previously not detectable
      • GANs or attacker and defender?
      • Scalability is a big problem, e.g. policies within Hadoop operations
      • How much information is being lost by not sharing data?
      • Fine grained access control with Hive RDF?
      • Distributed Search over Encrypted Big Data
    • Data Security & Privacy
      • Honeypatching – Kevin xxx on software deception
      • Novel Class detection – novel class embodied in novel malware. There are malware repositories?
    • Lifecycle for IoT
    • Trustworthy analytics
      • Intel SGX
      • Adversarial SVM
      • This resembles hyperparameter tuning. What is the gradient that’s being descended?
      • Binary retrofitting. Some kind of binary man-in-the-middle?
      • Two body problem cybersecurity
    • Question –
      • discuss how a system might recognize an individual from session to session while being unable to identify the individual
      • What about multiple combinatorial attacks?
      • What about generating credible false information to attackers, that also has steganographic components for identifying the attacker?
  • I had managed to not commit the embedding xml and the programs that made them, so first I had to install gensim and lxml at home. After that it’s pretty straightforward to recompute with what I currently have.
  • Moving ARFF and XLSX output to the menu choices. – done
  • Get started on rendering
    • Got the data read in and rendering, but it’s very brute force:
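          // Scale factor applied to embedding coordinates when positioning shapes
          double posScalar = ResizableCanvas.DEFAULT_SCALAR/2.0;
          List<WordEmbedding> weList = currentEmbeddings.getEmbeddings();
          for (WordEmbedding we : weList){
              // Shape size grows with the entry's count
              double size = 10.0 * we.getCount();
              SmartShape ss = new SmartShape(we.getEntry(), Color.WHITE, Color.BLACK);
              // Only the first two embedding coordinates are used, as x and y
              ss.setPos(we.getCoordinate(0)*posScalar, we.getCoordinate(1)*posScalar);
              ss.setSize(size, size);
              // (the rest of the loop body is not recorded in these notes)
          }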

      It took a while to remember how shapes and agents work together. Next steps:

      • Extend SmartShape to SourceShape. It should be a stripped down version of FlockingShape
      • Extend BaseCA to SourceCA, again, it should be a stripped down version of FlockingBeliefCA
      • Add a sourceShapeList for FlockingAgentManager that then passes that to the FlockingShapes

Phil 5.15.18

7:00 – 4:00 ASRC MKT

Phil 5.8.18

7:00 – 5:00 ASRC MKT

5:00 – 8:00 ASRC Tech Conference

Phil 5.7.18

7:00 – 5:00 ASRC MKT

  • Content Sharing within the Alternative Media Echo-System: The Case of the White Helmets
    • Kate Starbird
    • In June 2017 our lab began a research project looking at online conversations about the Syria Civil Defence (aka the “White Helmets”). Over the last 8–9 months, we have spent hundreds of hours conducting analysis on the tweets, accounts, articles, and websites involved in that discourse. Our first peer-reviewed paper was recently accepted to an upcoming conference (ICWSM-18). That paper focuses on a small piece of the structure and dynamics of this conversation, specifically looking at content sharing across websites. Here, I describe that research and highlight a few of the findings.
  • Matt Salganik on Open Review
  • Spent a lot of time getting each work to draw differently in the scatterplot. That took some digging into the gensim API to get vectors from the corpora. I then tried to plot the list of arrays, but matplotlib only likes ndarrays (apparently?). I’m now working on placing the words from each text into their own ndarray.
  • Also added a filter for short stop words and switched to a hash map for words to avoid redundant points in the plot.
  • Fika
    • Bryce Peake
    • ICA has a computational methods study area. How media flows through different spaces, etc. Python and [R]
    • Anne Balsamo – designing culture
    • what about language as an anti-colonial interaction
    • Human social scraping of data. There can be emergent themes that become important.
    • The ability of the user to delete all primary, secondary and tertiary data.
    • The third eye project (chyron crawls)

Phil 5.4.18

7:00 – 4:30 ASRC MKT

  • Listening to the Invisibilia episode on the stories we tell ourselves. (I, I, I. Him)
  • Listening to BBC Business Daily, on Economists in the doghouse. One of the people being interviewed is Mariana Mazzucato, who wrote The Entrepreneurial State: debunking public vs. private sector myths. She paraphrases Plato: “stories rule the world”. Oddly, this does not show up when you search through Plato’s work. It may be part of the Parable of the Cave, where the stories that the prisoners tell each other build a representation of the world?
  • Moby Dick, page 633 – a runaway condition:
    • They were one man, not thirty. For as the one ship that held them all; though it was put together of all contrasting things-oak, and maple, and pine wood; iron, and pitch, and hemp-yet all these ran into each other in the one concrete hull, which shot on its way, both balanced and directed by the long central keel; even so, all the individualities of the crew, this man’s valor, that man’s fear; guilt and guiltiness, all varieties were welded into oneness, and were all directed to that fatal goal which Ahab their one lord and keel did point to.
  • John Goodall, one of Wayne’s former students is deep into intrusion detection and visualization
  • Added comments to Aaron’s Reddit notes / CHI paper
  • Chris McCormick has a bunch of nice tutorials on his blog, including this one on Word2Vec:
    • This tutorial covers the skip gram neural network architecture for Word2Vec. My intention with this tutorial was to skip over the usual introductory and abstract insights about Word2Vec, and get into more of the details. Specifically here I’m diving into the skip gram neural network model.
    • He also did this:
    • wiki-sim-search: Similarity search on Wikipedia using gensim in Python. The goals of this project are the following two features:
      1. Create LSI vector representations of all the articles in English Wikipedia using a modified version of the script in gensim.
      2. Perform concept searches and other fun text analysis on Wikipedia, also using gensim functionality.
  • Slicing out columns in numpy:
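    import numpy as np

    dimension = 3
    size = 10
    # np.empty allocates without initializing; every cell is filled below
    dataset = np.empty(shape=(size, dimension))
    for x in range(size):
        for y in range(dimension):
            val = (y+1) * 10 + x +1
            dataset[x,y] = val

    print(dataset)
    # dataset[:, y] slices out column y as a 1D array
    for y in range(dimension):
        print(dataset[:, y])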

    Results in:

    [[11. 21. 31.]
     [12. 22. 32.]
     [13. 23. 33.]
     [14. 24. 34.]
     [15. 25. 35.]
     [16. 26. 36.]
     [17. 27. 37.]
     [18. 28. 38.]
     [19. 29. 39.]
     [20. 30. 40.]]
    [11. 12. 13. 14. 15. 16. 17. 18. 19. 20.]
    [21. 22. 23. 24. 25. 26. 27. 28. 29. 30.]
    [31. 32. 33. 34. 35. 36. 37. 38. 39. 40.]
  • And that makes everything work. Here’s a screenshot of a 3D embedding space for the entire(?) Jack London corpora: 3D_corpora
  • A few things come to mind
    • I’ll need to get the agents to stay in the space that the points are in. I think each point is an “attractor” with a radius (an agent without a heading). In the presence of an attractor an agent’s speed is reduced by x%. If there are a lot of attractors (n), then the speed is reduced by xn%. Which should make for slower agents in areas of high density. Agents in the presence of attractors also expand their influence horizon, becoming more “attractive”
    • I should be able to draw the area covered by each book in the corpora by looking for the W2V coordinates and plotting them as I read through the (parsed) book. Each book gets a color.
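
A quick sketch of the attractor rule from the notes above: each attractor within range cuts speed by x%, so n attractors cut it by xn%. The function names, the clamp at zero (needed once xn exceeds 100), and the horizon growth rule are assumptions; the notes only say that the horizon expands.

```python
import math


def attractors_in_range(agent_pos, attractors):
    """Count attractors, given as (position, radius), whose radius covers the agent."""
    return sum(1 for pos, radius in attractors
               if math.dist(agent_pos, pos) <= radius)


def adjusted_speed(base_speed, n, reduction_pct):
    """Reduce speed by reduction_pct per attractor (x*n percent), clamped at zero."""
    return base_speed * max(0.0, 1.0 - (reduction_pct * n) / 100.0)


def adjusted_horizon(base_horizon, n, growth_pct):
    """Assumed rule: influence horizon grows by growth_pct per nearby attractor."""
    return base_horizon * (1.0 + (growth_pct * n) / 100.0)


# Hypothetical embedding points acting as attractors: (position, radius)
attractors = [
    ((0.0, 0.0, 0.0), 1.0),
    ((0.5, 0.0, 0.0), 1.0),
    ((5.0, 5.0, 5.0), 1.0),
]

agent_pos = (0.2, 0.0, 0.0)
n = attractors_in_range(agent_pos, attractors)
speed = adjusted_speed(1.0, n, reduction_pct=10.0)
horizon = adjusted_horizon(2.0, n, growth_pct=25.0)
print(f"attractors in range={n} speed={speed:.2f} horizon={horizon:.2f}")
```

Without the clamp, a dense enough region would give a negative speed, so some floor is needed whatever the final rule looks like.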

Phil 5.3.18

7:30 – 5:00 ASRC MKT