Phil 1.15.18

7:00 – 3:30 ASRC MKT

  • Individual mobility and social behaviour: Two sides of the same coin
    • According to personality psychology, personality traits determine many aspects of human behaviour. However, validating this insight in large groups has been challenging so far, due to the scarcity of multi-channel data. Here, we focus on the relationship between mobility and social behaviour by analysing two high-resolution longitudinal datasets collecting trajectories and mobile phone interactions of ∼ 1000 individuals. We show that there is a connection between the way in which individuals explore new resources and exploit known assets in the social and spatial spheres. We point out that different individuals balance the exploration-exploitation trade-off in different ways and we explain part of the variability in the data by the big five personality traits. We find that, in both realms, extraversion correlates with an individual’s attitude towards exploration and routine diversity, while neuroticism and openness account for the tendency to evolve routine over long time-scales. We find no evidence for the existence of classes of individuals across the spatio-social domains. Our results bridge the fields of human geography, sociology and personality psychology and can help improve current models of mobility and tie formation.
    • This work has ways of identifying explorers and exploiters programmatically.
  • Reading the Google Brain team’s year in review in two parts
    • From part two: We have also teamed up with researchers at leading healthcare organizations and medical centers including Stanford, UCSF, and University of Chicago to demonstrate the effectiveness of using machine learning to predict medical outcomes from de-identified medical records (i.e. given the current state of a patient, we believe we can predict the future for a patient by learning from millions of other patients’ journeys, as a way of helping healthcare professionals make better decisions). We’re very excited about this avenue of work and we look forward to telling you more about it in 2018.
    • Facets: Facets contains two robust visualizations to aid in understanding and analyzing machine learning datasets. Get a sense of the shape of each feature of your dataset using Facets Overview, or explore individual observations using Facets Dive.
  • Found this article on LSTM-based prediction for robots and sent it to Aaron: Deep Episodic Memory: Encoding, Recalling, and Predicting Episodic Experiences for Robot Action Execution
  • Working through Beyond Individual Choice – actually, wound up going through the Complexity Labs Game Theory course
    • Social traps are stampedes? Sliding reinforcers (lethal barrier)
    • The transition from tit-for-tat (TFT) to generous TFT to always-cooperate to always-defect also resembles the excessive social trust stampede.
    • Unstable cycling vs. evolutionarily stable strategies
    • Replicator dynamic model: Explore/Exploit
      • In mathematics, the replicator equation is a deterministic monotone non-linear and non-innovative game dynamic used in evolutionary game theory. The replicator equation differs from other equations used to model replication, such as the quasispecies equation, in that it allows the fitness function to incorporate the distribution of the population types rather than setting the fitness of a particular type constant. This important property allows the replicator equation to capture the essence of selection. Unlike the quasispecies equation, the replicator equation does not incorporate mutation and so is not able to innovate new types or pure strategies.
    • Fisher’s Fundamental Theorem: “The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time.”
    • Explorers are a form of weak ties, which is one of the reasons they add diversity. Exploiters are strong ties.
  • I also had a thought about the GPM simulator. I could add an evolutionary component that would let agents breed, age and die to see if Social Influence Horizon and Turn Rate are selected towards any attractor. My guess is that there is a tension between explorers and stampeders that can be shown to occur over time.
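A minimal Python sketch of the replicator equation quoted above may help make "frequency-dependent fitness" and "non-innovative" concrete. It integrates dx_i/dt = x_i (f_i(x) - f̄(x)) with plain Euler steps. The Hawk-Dove payoff matrix and its values (resource V=2, fight cost C=4) are my illustrative assumptions, not from the course, and the function names are hypothetical.

```python
"""Euler integration of the replicator equation dx_i/dt = x_i * (f_i(x) - mean fitness)."""


def replicator_step(x, payoff, dt):
    """Advance population shares x by one Euler step under the given payoff matrix."""
    n = len(x)
    # Each type's fitness depends on the current population mix (frequency dependence).
    fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
    mean_fitness = sum(x[i] * fitness[i] for i in range(n))
    new_x = [x[i] + dt * x[i] * (fitness[i] - mean_fitness) for i in range(n)]
    total = sum(new_x)  # analytically 1; renormalize only to damp floating-point drift
    return [v / total for v in new_x]


def simulate(x0, payoff, dt=0.01, steps=5000):
    x = list(x0)
    for _ in range(steps):
        x = replicator_step(x, payoff, dt)
    return x


# Hawk-Dove with illustrative values: resource V = 2, cost of a fight C = 4.
V, C = 2.0, 4.0
HAWK_DOVE = [[(V - C) / 2, V],
             [0.0, V / 2]]

if __name__ == "__main__":
    # The hawk share should approach the mixed equilibrium V / C = 0.5.
    print(simulate([0.1, 0.9], HAWK_DOVE))
```

Note that a type whose share starts at zero stays at zero forever, which is exactly the "no mutation, no innovation" property in the quote. An evolutionary version of the GPM simulator meant to discover new Social Influence Horizon or Turn Rate values would presumably need an explicit mutation term on top of this kind of selection.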

Phil 1.14.18

Pondering what a good HI-LO game would be for a presentation:

  • Ask the audience to choose A or B, based on what they think the most likely answer is. Show of hands, B, then A.
  • Describe the H/L chart and cooperative game theory, and how traditional game theory can’t account for why LL makes less sense to us than HH.
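The H/L chart itself isn't reproduced here, so as a hedged stand-in, here is a small Python sketch of a two-player Hi-Lo game with assumed payoffs: 2 each for HH, 1 each for LL, 0 for a mismatch. Checking every unilateral deviation shows that both HH and LL are pure-strategy Nash equilibria. That is the point the presentation would make: equilibrium analysis alone doesn't say why HH feels like the obvious choice.

```python
"""Find the pure-strategy Nash equilibria of a two-player Hi-Lo coordination game."""
from itertools import product

STRATEGIES = ("H", "L")
# Assumed payoffs (row player, column player): HH -> 2 each, LL -> 1 each, mismatch -> 0.
PAYOFFS = {
    ("H", "H"): (2, 2),
    ("H", "L"): (0, 0),
    ("L", "H"): (0, 0),
    ("L", "L"): (1, 1),
}


def pure_nash_equilibria(payoffs, strategies):
    equilibria = []
    for row, col in product(strategies, strategies):
        row_pay, col_pay = payoffs[(row, col)]
        # An equilibrium is a profile where neither player gains by deviating alone.
        row_best = all(payoffs[(r, col)][0] <= row_pay for r in strategies)
        col_best = all(payoffs[(row, c)][1] <= col_pay for c in strategies)
        if row_best and col_best:
            equilibria.append((row, col))
    return equilibria


print(pure_nash_equilibria(PAYOFFS, STRATEGIES))  # [('H', 'H'), ('L', 'L')]
```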

Phil 1.13.18

I think that burst-and-coast may be another one of those general patterns in collective intelligence.

  • Disentangling and modeling interactions in fish with burst-and-coast swimming reveal distinct alignment and attraction behaviors
    • The development of tracking methods for automatically quantifying individual behavior and social interactions in animal groups has opened up new perspectives for building quantitative and predictive models of collective behavior. In this work, we combine extensive data analyses with a modeling approach to measure, disentangle, and reconstruct the actual functional form of interactions involved in the coordination of swimming in Rummy-nose tetra (Hemigrammus rhodostomus). This species of fish performs burst-and-coast swimming behavior that consists of sudden heading changes combined with brief accelerations followed by quasi-passive, straight decelerations. We quantify the spontaneous stochastic behavior of a fish and the interactions that govern wall avoidance and the reaction to a neighboring fish, the latter by exploiting general symmetry constraints for the interactions. In contrast with previous experimental works, we find that both attraction and alignment behaviors control the reaction of fish to a neighbor. We then exploit these results to build a model of spontaneous burst-and-coast swimming and interactions of fish, with all parameters being estimated or directly measured from experiments. This model quantitatively reproduces the key features of the motion and spatial distributions observed in experiments with a single fish and with two fish. This demonstrates the power of our method that exploits large amounts of data for disentangling and fully characterizing the interactions that govern collective behaviors in animal groups.
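To make the burst-and-coast pattern concrete, here is a toy Python sketch of a single swimmer: each burst applies a random heading change and resets the speed to v0, and the coast phase then decelerates along a straight line. The exponential speed decay v(t) = v0·exp(-t/τ), the parameter values, and the function names are my assumptions for illustration. This is not the paper's fitted model: there is no wall avoidance and no neighbor interaction, and the burst is treated as instantaneous.

```python
"""Toy burst-and-coast swimmer: random heading kick plus speed reset, then straight exponential coast."""
import math
import random


def coast_distance(v0, tau, duration):
    """Distance covered while speed decays as v(t) = v0 * exp(-t / tau) for `duration`."""
    return v0 * tau * (1.0 - math.exp(-duration / tau))


def burst_and_coast(n_bursts, v0=1.0, tau=0.8, duration=0.5, heading_sd=0.3, seed=None):
    rng = random.Random(seed)
    x = y = heading = 0.0
    path = [(x, y)]
    for _ in range(n_bursts):
        # Burst: sudden heading change, with speed brought back up to v0 (assumed instantaneous).
        heading += rng.gauss(0.0, heading_sd)
        # Coast: quasi-passive, straight-line deceleration along the new heading.
        d = coast_distance(v0, tau, duration)
        x += d * math.cos(heading)
        y += d * math.sin(heading)
        path.append((x, y))
    return path


if __name__ == "__main__":
    for point in burst_and_coast(5, seed=42):
        print(point)
```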

Phil 1.12.18

7:00 – 3:30 ASRC MKT

  • Continuing to write up thoughts here. Done! Posted to Phlog
  • Would expect this, based on M&D’s work: The Wisdom of Polarized Crowds
    • As political polarization in the United States continues to rise, the question of whether polarized individuals can fruitfully cooperate becomes pressing. Although diversity of individual perspectives typically leads to superior team performance on complex tasks, strong political perspectives have been associated with conflict, misinformation and a reluctance to engage with people and perspectives beyond one’s echo chamber. It is unclear whether self-selected teams of politically diverse individuals will create higher or lower quality outcomes. In this paper, we explore the effect of team political composition on performance through analysis of millions of edits to Wikipedia’s Political, Social Issues, and Science articles. We measure editors’ political alignments by their contributions to conservative versus liberal articles. A survey of editors validates that those who primarily edit liberal articles identify more strongly with the Democratic party and those who edit conservative ones with the Republican party. Our analysis then reveals that polarized teams—those consisting of a balanced set of politically diverse editors—create articles of higher quality than politically homogeneous teams. The effect appears most strongly in Wikipedia’s Political articles, but is also observed in Social Issues and even Science articles. Analysis of article “talk pages” reveals that politically polarized teams engage in longer, more constructive, competitive, and substantively focused but linguistically diverse debates than political moderates. More intense use of Wikipedia policies by politically diverse teams suggests institutional design principles to help unleash the power of politically polarized teams.
    • C&C is not in the citations, but overall this looks good. Add this to the initial game paper.
  • Nice article on how establishment of norms can be a tipping point on which gradient to climb in a complex landscape: Tipping into the future
    • A history of tipping points from an ecological perspective and how they inform resilience thinking in global development.
  • The NOAA demo went well, it seems.

Phil 1.11.18

7:00 – 4:00 ASRC MKT

  • Sprint review – done! Need to lay out the detailed design steps for the next sprint.

The Great Socio-cultural User Interfaces: Maps, Stories, and Lists

Maps, stories, and lists are ways humans have invented to portray and interact with information. They exist on a continuum from order through complexity to exploration.

Why these three forms? In some thoughts on alignment in belief space, I discussed how populations exhibiting collective intelligence are driven to a normal distribution with complex, flocking behavior in the middle, bounded on one side by excessive social conformity and on the other by a nomadic diaspora of explorers. I think lists, stories, and maps align with these populations. Further, I believe that these forms emerged to meet the needs of these populations, as constrained by human sensing and processing capabilities.


Lists are instruments of order. They exist in many forms, including inventories, search engine results, network graphs, and games of chance and crossword puzzles. Directions, like a business plan or a set of blueprints, are a form of list. So are most computer programs. Arithmetic, the mathematics of counting, also belongs to this class.

For a population that emphasizes conformity and simplified answers, lists are a powerful simplifying mechanism. Though we recognize list items easily, recalling them is more difficult. Psychologically, we do not seem to be naturally suited to creating and memorizing lists. It’s not surprising, then, that there is considerable evidence that writing was developed initially as a way of listing inventories, transactions, and celestial events.

In the case of an inventory, all we have to do is verify that the items on the list are present. If it’s not on the list, it doesn’t matter. Puzzles like crosswords are list-like in that they contain all the information needed to solve them. The fact that they cannot be solved without a pre-existing cultural framework is an indicator of their relationship to the well-ordered, socially aligned side of the spectrum.


Lists transition into stories when games of chance have an opponent. Poker tells a story. Roulette can be a story where the opponent is The House.

Stories convey complexity, framed in a narrative arc that contains a heading and a velocity. Stories can resemble lists. An Agatha Christie murder mystery is a storified list, where all the information needed to solve the crime (the inventory list) is contained in the story. At the other end of the spectrum is the scientific paper, which uses citations as markers into other works. Music, images, movies, diagrams, and other forms can also serve as storytelling media. Mathematics is not a natural fit here, but iterative computation can be, where the computer becomes the storyteller.

Emergent collective behavior requires more complex signals that support understanding the alignment and velocity of others, so that individuals can make internal adjustments to stay with the local group rather than be cast out or lost to the collective. Stories can indicate the level of dynamism supported by the group (wily Odysseus vs. the Parable of the Workers in the Vineyard). They rally people to the cause or serve as warnings. Before writing, stories were told within familiar social frames. Even though the storyteller might be a traveling entertainer, the audience would inevitably come from an existing community. The storyteller, then, like improvisational storytellers today, would adjust elements of the story for the audience.

This implies a few things. First, audiences only heard stories like this if they really wanted to. Storytellers would avoid bad venues, so closed-off communities would stay decoupled from other communities until something strong enough came along to overwhelm their resistance. Second, high-bandwidth communication would have to be hyperlocal, meaning dynamic collective action could only happen on small scales. Collective action between communities would have to be much slower. Technology, beginning with writing, would have profound effects. Evolution would have had at most 200 generations to adapt collective behavior. For such a complicated set of interactions, that doesn’t seem like enough time. More likely, we are responding to modern communications with the same mental equipment as our Sumerian ancestors.


Maps are diagrams that support autonomous trajectories. Though the map itself influences the view through constraints like boundaries and projections, an individual can nonetheless find a starting point, choose a destination, and figure out their own path to that destination. Mathematics that supports position and velocity is often deeply intertwined with maps.

Nomadic, exploratory behavior is not generally complex or emergent. Things need to work, and simple things work best. To survive alone, an individual has to be acutely aware of the surrounding environment, and to be able to react effectively to unforeseen events.

Maps are uniquely suited to help in these situations because they show relationships that support navigation between elements on the map. These paths can be straight or they may meander. The goal may be too far to reach directly, in which case a set of paths that lead to it incrementally can be constructed. The way may be blocked, requiring the map to be updated and a new route to be found.

In other words, maps support autonomous reasoning about a space. There is no story demanding alignment. There is no list of routes that must be exclusively selected from. Maps, in short, afford informed, individual responses to the environment. These affordances can be seen in the earliest maps. They are small enough to be carried. They show the relationships between topographic and ecological features. They tend to be practical, utilitarian objects, independent of social considerations.

Sensing and processing constraints

Though I think that the basic group behavior patterns of nomadic, flocking, and stampeding will inevitably emerge within any collective intelligence framework, I do think that the tools that support those behaviors are deeply affected by the capabilities of the individuals in the population.

Pre-literate humans had the five senses and memory, expressed in movement and language. Research into pre-literate cultures shows that song, story, and dance were used to encode historical events and the locations of food sources, and to convey mythology and skills between groups and across generations.

As the ability to encode information into objects developed, first with pictures, then with notation, and most recently with general-purpose alphabets, the need to memorize was off-loaded. Over time, the most efficient technology for each form of behavior developed: maps to aid navigation, stories to maintain identity and cohesion, and lists for directions and inventories.

Information technology has continued to extend sensing and processing capabilities. The printing press led to mass communication and public libraries. I would submit that the increased ability to communicate and coordinate with distant, unknown, but familiar-feeling leaders led to a new type of human behavior: the runaway social influence condition known as totalitarianism. Totalitarianism depends on the individual’s belief in the narrative that the only thing that matters is to support The Leader. This extreme form of alignment allows that one story to dominate, rendering any other story inaccessible.

In the 20th century, the primary instrument of totalitarianism was terror. But as our machines have improved and become more responsive and aligned with our desires, I am beginning to believe that a “soft totalitarianism”, based on constant distracting stimulation and the psychology of dopamine, could emerge. Rather than being isolated by fear, we are isolated through endless interactions with our devices, aligning to whatever sells the most clicks. This form of overwhelming social influence may not be as bloody as the regimes of Hitler, Stalin, and Mao, but it can have devastating effects of its own.

Intelligent Machines

As with my previous post, I’d like to end with what could be the next collective intelligence on the planet.  Machines are not even near the level of preliterate cultures. Loosely, they are probably closer to the level of insect collectives, but with vastly greater sensing and processing capabilities. And they are getting smarter – whatever that really means – all the time.

Assuming that machines do indeed become intelligent and do not become a single entity, they will encounter the internal and external pressures that are inherent in collective intelligence. They will have to balance the blind efficiency of total social influence against the wasteful resilience of nomadic explorers. It seems reasonable that, like our ancestors, they may create tools that help with these different needs. It also seems reasonable that these tools will extend their capabilities in ways that the machines weren’t designed for and create information imbalances that may in turn lead to AI stampedes.

We may want to leave them a warning.


Phil 1.10.18

7:00 – 10:00 ASRC MKT

  • Send Marie paper and link to venues – done
  • Write up alignment thoughts. Done and in Phlog
  • I also need to write up something on the spectrum that narratives cover between maps and lists. Why a scientific paper is more “mapish” than a murder mystery.
  • And this, from Beyond Individual Choice, page 24: SchellingAlign
  • And this, because it’s cool and I fit in here somewhere:




Phil 1.9.18

7:00 – 4:00 ASRC MKT

  • Submit DC paper – done
  • Add primary goal and secondary goals
  • Add group decision making tool to secondary goals
  • Add site search to “standard” websearch – done
  • Visual Analytics to Support Evidence-Based Decision Making (dissertation)
  • Can Public Diplomacy Survive the internet? Bots, Echo chambers, and Disinformation
    • Shawn Powers serves as the Executive Director of the United States Advisory Commission on Public Diplomacy
    • Markos Kounalakis, Ph.D. is a visiting fellow at the Hoover Institution at Stanford University and is a presidentially appointed member of the J. William Fulbright Foreign Scholarship Board.  Kounalakis is a senior fellow at the Center for Media, Data and Society at Central European University in Budapest, Hungary and president and publisher emeritus of the Washington Monthly. He is currently researching a book on the geopolitics of global news networks.
  • Partisanship, Propaganda, and Disinformation: Online Media and the 2016 U.S. Presidential Election (Harvard)
    • Rob Faris
    • Hal Roberts
    • Bruce Etling
    • Nikki Bourassa 
    • Ethan Zuckerman
    • Yochai Benkler
    • We find that the structure and composition of media on the right and left are quite different. The leading media on the right and left are rooted in different traditions and journalistic practices. On the conservative side, more attention was paid to pro-Trump, highly partisan media outlets. On the liberal side, by contrast, the center of gravity was made up largely of long-standing media organizations steeped in the traditions and practices of objective journalism.

      Our data supports lines of research on polarization in American politics that focus on the asymmetric patterns between the left and the right, rather than studies that see polarization as a general historical phenomenon, driven by technology or other mechanisms that apply across the partisan divide.

      The analysis includes the evaluation and mapping of the media landscape from several perspectives and is based on large-scale data collection of media stories published on the web and shared on Twitter.

Phil 1.8.18

7:00 – 5:00 ASRC MKT

  • Complexity Explorables
    • This page is part of the Research on Complex Systems Group at the Institute for Theoretical Biology at Humboldt University of Berlin. The site is designed for people interested in complex dynamical processes. The Explorables are carefully chosen in such a way that the key elements of their behavior can be explored and explained without too much math (there are a few exceptions) and with as few words as possible.
    • Orli’s Flock’n Roll (adjustable variables, but just having the alignment radius doesn’t have the same effect. Maybe it’s a function of the slew rate?)
      • This explorable illustrates an intuitive dynamic model for collective motion (swarming) in animal groups. The model can be used to describe collective behavior observed in schooling fish or flocking birds, for example. The details of the model are described in a 2002 paper by Iain Couzin and colleagues.
  • Saving Human Lives: What Complexity Science and Information Systems can Contribute
    • We discuss models and data of crowd disasters, crime, terrorism, war and disease spreading to show that conventional recipes, such as deterrence strategies, are often not effective and sufficient to contain them. Many common approaches do not provide a good picture of the actual system behavior, because they neglect feedback loops, instabilities and cascade effects. The complex and often counter-intuitive behavior of social systems and their macro-level collective dynamics can be better understood by means of complexity science. We highlight that a suitable system design and management can help to stop undesirable cascade effects and to enable favorable kinds of self-organization in the system. In such a way, complexity science can help to save human lives.
  • Fooled around with the model definition section in the paper to bring the rate-limited heading forward a bit.
  • Had to fix several bugs in the DC paper
  • Worked with Aaron a lot on tweaking the introduction. T is reading it now. Assuming it’s done, the only thing remaining is the conclusion.
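For reference, here is a simplified Python sketch of the zone-based rules in the spirit of the Couzin et al. (2002) model mentioned above: repulsion overrides everything, otherwise an agent averages an orientation direction and an attraction direction. A `max_turn` cap per step plays the role of the rate-limited heading / turn rate. This is a hedged sketch, not the paper's exact model: it is 2D, has no noise or blind angle, includes the agent's own heading in the orientation term as a modeling choice, and all parameter names and values are hypothetical.

```python
"""Simplified zone-based flocking step (repulsion / orientation / attraction) with a turn-rate limit."""
import math


def _unit(dx, dy):
    n = math.hypot(dx, dy)
    return (dx / n, dy / n) if n > 0 else (0.0, 0.0)


def step(positions, headings, r_repulse=1.0, r_orient=5.0, r_attract=10.0,
         max_turn=0.2, speed=0.1):
    """Return (new_positions, new_headings); headings are angles in radians."""
    desired = []
    for i, (xi, yi) in enumerate(positions):
        rep = [0.0, 0.0]
        ori = [math.cos(headings[i]), math.sin(headings[i])]  # own heading counts toward alignment
        att = [0.0, 0.0]
        too_close = False
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue
            d = math.hypot(xj - xi, yj - yi)
            ux, uy = _unit(xj - xi, yj - yi)
            if d < r_repulse:            # zone of repulsion: move away
                too_close = True
                rep[0] -= ux
                rep[1] -= uy
            elif d < r_orient:           # zone of orientation: align with neighbor's heading
                ori[0] += math.cos(headings[j])
                ori[1] += math.sin(headings[j])
            elif d < r_attract:          # zone of attraction: move toward neighbor
                att[0] += ux
                att[1] += uy
        if too_close:
            target = rep                 # repulsion takes priority over the other zones
        else:
            o, a = _unit(*ori), _unit(*att)
            target = [o[0] + a[0], o[1] + a[1]]
        if target[0] == 0.0 and target[1] == 0.0:
            desired.append(headings[i])  # no usable signal: keep current heading
        else:
            desired.append(math.atan2(target[1], target[0]))

    new_positions, new_headings = [], []
    for (x, y), h, want in zip(positions, headings, desired):
        # Wrap the heading error into [-pi, pi), then clamp it to the maximum turn per step.
        err = (want - h + math.pi) % (2 * math.pi) - math.pi
        h += max(-max_turn, min(max_turn, err))
        new_headings.append(h)
        new_positions.append((x + speed * math.cos(h), y + speed * math.sin(h)))
    return new_positions, new_headings
```

With a small `max_turn`, agents need several steps to align even when the desired direction is obvious, which is one way alignment radius alone can behave differently from a rate-limited model.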

Phil 1.7.18

8:30 – 11:30 ASRC MKT

  • It is still waayyyyyy too cold to do much, so I’ll work on the whitepaper
  • Sent a note to Dr. desJardins about looking at the rewrite and suggesting venues
  • Finished the introduction

Phil 1.6.18

2:00 – 4:00 ASRC MKT

  • I have a new keyboard! Feels clickety 🙂
  • Listened to a Radiolab episode on emergence.
  • Tweaking the paper – Done!
  • Finished the annotated DB section of the whitepaper.

Phil 1.5.18

7:00 – 3:30 ASRC MKT

  • Saw the new Star Wars film. That must be the most painful franchise to direct: “Here’s an unlimited amount of money. You have unlimited freedom in these areas over here, and this giant pile is canon that you must adhere to…”
  • Wikipedia page view tool
  • My keyboard has died. Waiting on the new one and using the laptop in the interim. It’s not quite worth setting up the dual screen display. Might go for the mouse though. On a side note, the keyboard on my Lenovo Twist is quite nice.
  • More tweaking of the paper. Finished methods, on to results
  •  Here’s some evidence that we have mapping structures in our brain: Hippocampal Remapping and Its Entorhinal Origin
      • The activity of hippocampal cell ensembles is an accurate predictor of the position of an animal in its surrounding space. One key property of hippocampal cell ensembles is their ability to change in response to alterations in the surrounding environment, a phenomenon called remapping. In this review article, we present evidence for the distinct types of hippocampal remapping. The progressive divergence over time of cell ensembles active in different environments and the transition dynamics between pre-established maps are discussed. Finally, we review recent work demonstrating that hippocampal remapping can be triggered by neurons located in the entorhinal cortex.


  • Added a little to the database section, but spent most of the afternoon updating TF and trying it out on examples

Lessons in ML Optimization

One of the “fun” parts of working in ML for someone with a background in software development rather than academic research is that lots of hard problems remain unsolved. There are rarely defined ways things “must” be done, and in some cases not even rules of thumb for something like implementing a production-capable machine learning system for a specific real-world problem.

For most areas of software engineering, by the time a technology is mature enough for enterprise deployment, it has long since gone through the fire and the flame of academic support, Fortune 50 R&D, and broad ground-level acceptance in the development community. It didn’t take long for distributed computing with Hadoop to be standardized, for example. Web security, index systems for search, relational abstraction tiers, even the most volatile of production-tier technologies, the JavaScript GUI framework: each goes through periods of acceptance and conformity before most large organizations try to roll it out. It all makes sense if you consider the cost of migrating your company from a legacy Struts/EJB3.0 app running on Oracle to the latest HTML5 framework with a Hadoop backend. You don’t want to spend months (or years) investing in a major rewrite only to find that it’s entirely out of date by your release. Organizations looking at these kinds of updates want an expectation of longevity for their dollar, so they invest in mature technologies with clear design rules.

There are certainly companies that do not fall into this category: small, agile companies that can adopt a technology in the short term to retain relevance (or buzzword compliance), companies funded with external research dollars, and companies that invest money to stay on the bleeding edge. However, I think it’s fair to say that the majority of industry and federal customers are looking for stability and cost efficiency from solved technical problems.

Machine learning is in the odd position of being so tremendously useful compared with prior techniques that companies that would normally wait for the dust to settle, and for these capabilities to become fully commoditized, are dipping their toes in. I wrote in a previous post about how a lot of the problems with implementing existing ML algorithms boil down to lifecycle, versioning, deployment, security, etc., but there is another major factor: model optimization.

Any engineer on the planet can download a copy of Keras/TensorFlow and a CSV of their organization’s data and smoosh them together until a number comes out. The problem comes when the number takes an eternity to output and is wrong. In addition to understanding the math that lets things like SGD work with backpropagation, or why certain activation functions are more effective in certain situations, one of the jobs of data scientists tuning DNN models is to figure out how to set the various buttons and knobs in the model to make it as accurate and performant as possible. Because a lot of this work *isn’t* a commodity yet, it’s a painful learning process of tweaking the data sets, adjusting model design or parameters, rerunning, and comparing the results to try to find optimal answers without overfitting. Ironically, the task data scientists are doing is one perfectly suited to machine learning. It’s no surprise to me that Google developed AutoML to optimize their own NN development.


A number of months ago, Phil and I worked on an unsupervised learning task related to organizing high-dimensional agents in a medical space. These entities were complex “polychronic” patients with a wide variety of diagnoses and illnesses. Combining these with patient demographic fields and full medical claim histories, we came up with a method to group medically similar patients and look for statistical outliers as indicators of fraud, waste, and abuse. The results were extremely successful and resulted in a lot of recovered money for the customer, but the interesting thing technically was how the solution evolved. Our first prototype used a wide variety of clustering algorithms, value decompositions, non-negative matrix factorization, etc., looking for optimal results. All of the selections and subsequent hyperparameters had to be modified by hand, the results evaluated, and further adjustments made.

When it became clear that the results were very sensitive to tiny adjustments, it was obvious that our manual tinkering would miss important gradient changes, so we implemented an optimizer framework that could evaluate manifold learning techniques for stability and reconstruction error, and cluster the results of the reduction using either a complete fitness landscape walk, a genetic algorithm, or a sub-surface division.
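As a rough illustration of two of those search strategies (not our actual framework), here is a Python sketch comparing an exhaustive walk of a discrete hyperparameter grid with a small genetic algorithm (truncation selection, uniform crossover, random mutation). The search space, its parameter names, and the `fitness` function are hypothetical placeholders; in a real setup the score would come from something like reduction stability and reconstruction error.

```python
"""Exhaustive landscape walk vs. a small genetic algorithm over a discrete hyperparameter grid."""
import itertools
import random

# Hypothetical search space; a real one would hold reduction and clustering settings.
SPACE = {
    "n_components": [2, 4, 8, 16],
    "n_clusters": [3, 5, 8, 12, 20],
    "neighbors": [5, 10, 15, 30],
}


def fitness(params):
    """Placeholder score (higher is better), standing in for e.g. stability minus reconstruction error."""
    return -((params["n_components"] - 8) ** 2
             + (params["n_clusters"] - 8) ** 2
             + 0.1 * (params["neighbors"] - 15) ** 2)


def exhaustive_walk(space, score):
    """Score every point on the grid and return the best one."""
    keys = list(space)
    candidates = (dict(zip(keys, values)) for values in itertools.product(*space.values()))
    return max(candidates, key=score)


def genetic_search(space, score, pop_size=12, generations=30, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    keys = list(space)
    pop = [{k: rng.choice(space[k]) for k in keys} for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = {k: rng.choice((a[k], b[k])) for k in keys}  # uniform crossover
            for k in keys:
                if rng.random() < mutation_rate:
                    child[k] = rng.choice(space[k])  # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=score)
```

The exhaustive walk is guaranteed to find the best grid point but scales with the product of all option counts; the GA evaluates far fewer points per generation at the cost of no optimality guarantee.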

While tuning my latest test LSTM for time series prediction, I realized we’re dealing with the same issue here. There is no hard-and-fast rule for questions like “How many LSTM layers should my RNN have?”, “How many LSTM units should each layer have?”, “What loss function and optimizer work best for this type of data?”, “How much dropout should I apply?”, or “Should I use peepholes?”

I kept finding articles during my work saying things like, “There are diminishing returns for more than 4 stacked LSTM layers.” That’s an interesting rule of thumb… what is it based on? Presumably the author’s intuition, based on the data sets for the particular problems they were working on. Other rules of thumb attempt to derive a mathematical relationship between the input data’s size and complexity and the optimal layout of layers and units. This StackOverflow question has some great responses:

A method recommended by Geoff Hinton is to add layers until you start to overfit your training set. Then you add dropout or another regularization method.
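That heuristic can be sketched as a simple loop. In this Python sketch, `train_and_evaluate` is a hypothetical callback that trains a model with the given number of layers and returns `(train_loss, val_loss)`. Treating "validation loss exceeds training loss by more than a tolerance" as the overfitting signal is one simple choice among several (another is stopping when validation loss starts rising).

```python
"""Heuristic: add layers until the model starts to overfit, then switch to regularization."""


def grow_until_overfit(train_and_evaluate, max_layers=8, gap_tolerance=0.05):
    """Return (n_layers, history) at the first depth where val_loss - train_loss > gap_tolerance.

    `train_and_evaluate(n_layers)` is a hypothetical callback returning (train_loss, val_loss).
    """
    history = []
    for n_layers in range(1, max_layers + 1):
        train_loss, val_loss = train_and_evaluate(n_layers)
        history.append((n_layers, train_loss, val_loss))
        if val_loss - train_loss > gap_tolerance:
            # Overfitting detected: this is the point to add dropout or another regularizer.
            return n_layers, history
    return max_layers, history


if __name__ == "__main__":
    # Fake training run whose generalization gap grows with depth.
    def fake(n):
        return 1.0 / n, 1.0 / n + 0.02 * n

    print(grow_until_overfit(fake))
```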

Because so much of what Phil and I do tends towards the generic repeatable solution for real world problems, I suspect we’ll start with some “common wisdom heuristics” and rapidly move towards writing a similar optimizer for supervised problems.

Intro to LSTMs with Keras/TensorFlow

As I mentioned in my previous post, one of our big focuses recently has been on time series data for either predictive analysis or classification. The intent is to use this in concert with a lot of other tooling in our framework to solve some real-world applications.

One example is a pretty classic time series prediction problem: a customer managing large volumes of finances in a portfolio where the equivalent of purchase orders is made (for extremely high values) and planned cost often drifts from actual outcomes. The deltas between the two are an area of concern for the customer, who is looking for ways to better manage their spending. We have a proof-of-concept dashboard tool that rolls up their hierarchical portfolio and does some basic threshold-based calculations for things like these deltas.
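The post doesn't describe the dashboard's actual calculations, so here is only a hedged Python sketch of the general idea: roll planned and actual cost up a portfolio tree and flag any node whose relative delta exceeds a threshold. The `Node` structure and the 10% relative threshold are hypothetical. The recursive roll-up is recomputed per node for clarity; a real implementation would cache subtotals.

```python
"""Roll planned vs. actual cost up a portfolio hierarchy and flag large deltas."""
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    planned: float = 0.0
    actual: float = 0.0
    children: list = field(default_factory=list)


def roll_up(node):
    """Return (planned, actual) totals for this node and all of its descendants."""
    planned, actual = node.planned, node.actual
    for child in node.children:
        p, a = roll_up(child)
        planned += p
        actual += a
    return planned, actual


def flag_deltas(node, threshold=0.10, path=()):
    """Yield (path, planned, actual, relative_delta) where |actual - planned| / planned > threshold."""
    path = path + (node.name,)
    planned, actual = roll_up(node)
    if planned and abs(actual - planned) / planned > threshold:
        yield path, planned, actual, (actual - planned) / planned
    for child in node.children:
        yield from flag_deltas(child, threshold, path)


if __name__ == "__main__":
    portfolio = Node("portfolio", children=[Node("A", 100, 105), Node("B", 200, 260)])
    for flag in flag_deltas(portfolio):
        print(flag)
```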

A much more complex example, which we are working on in relation to our trajectories in belief space, is the ability to identify patterns of human cultural and social behaviors (HCSB) in computer-mediated communication in order to look for trustworthy information based on agent interaction. One small piece of this work is teaching a machine to identify these agent patterns over time. We’ve done various unsupervised learning work which, in combination with techniques such as dynamic time warping (DTW), has been successful at discriminating agents in simulation but has some major limitations.
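For readers unfamiliar with DTW, here is the standard textbook dynamic-programming formulation in Python, using absolute difference as the local cost. It is a sketch of the general technique, not necessarily the exact variant we used. DTW lets two sequences that trace the same pattern at different speeds, or with different lengths, compare as close.

```python
"""Classic dynamic time warping (DTW) distance between two 1-D sequences."""
import math


def dtw_distance(a, b):
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a, repeat b's element
                                 cost[i][j - 1],      # advance in b, repeat a's element
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


if __name__ == "__main__":
    # Same shape, different pacing: the repeated 1 is absorbed by the warping path.
    print(dtw_distance([0, 1, 2, 3], [0, 1, 1, 2, 3]))  # 0.0
```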

For many time series problems, a very effective way to apply deep learning is with Recurrent Neural Networks (RNNs), which let the history of the series inform the output. This is particularly important for language tasks such as machine translation or autocompletion, where the context of a sentence may be formed by words that appeared earlier in the text. Convolutional networks (CNNs) are most effective when the tensor elements have a distinct positional relationship to each other. The most common example is a matrix of pixel values, where each pixel is directly relevant to its neighbors. This allows for nice parallelization and other optimizations, because you can assume that pixels within a small window are relevant to each other and don’t necessarily depend on “meaning” from pixels elsewhere in the picture. This is obviously a very simplified explanation, and CNNs are being extended to broader applications in many ways, including language.
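Here is a toy example of “history informs the output.” It is a one-unit vanilla RNN with the recurrence h_t = tanh(w_x * x_t + w_h * h_{t-1} + b). The weights are arbitrary illustrative values, not trained ones. Real RNN layers use weight matrices and vectors, but the recurrence works the same way.

```python
import math
from typing import Sequence


def rnn_final_state(xs: Sequence[float], w_x: float = 0.5,
                    w_h: float = 0.9, b: float = 0.0) -> float:
    """Run a one-unit vanilla RNN over a sequence and return the last hidden state.

    h_t = tanh(w_x * x_t + w_h * h_{t-1} + b), starting from h_0 = 0.
    The default weights are illustrative, not learned values.
    """
    h = 0.0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
    return h
```

Two sequences that end in the same value, such as `[1.0, 0.0]` and `[-1.0, 0.0]`, produce different final states because `w_h` carries the earlier input forward. Set `w_h=0.0` and the history is forgotten, so both sequences give the same output.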

In any case, despite recent arguments that CNNs are relevant to all ML problems, RNNs are particularly good at sequential problems that rely on the context of the entire data series. This is of course useful for time series data as well as language problems.

The most common and popular RNN implementation for this is the Long Short-Term Memory (LSTM) network. I won’t dive into all of the details of how LSTMs work under the covers, but I think they’re best understood by comparison. In a traditional artificial neural network, each neuron has a single activation function that passes a single value onward. LSTMs instead have more complex units (called cells in some literature), most commonly consisting of a memory cell, an input gate, an output gate, and a forget gate. Each LSTM layer has a configured number of fully connected LSTM units, each containing these pieces. This gives each unit some “memory” of previous information, which helps the model factor in things such as language context or patterns in the data occurring over time. Here is a link for a more complete explanation:
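The gates described above can be written out for a single scalar LSTM unit. This sketch uses the standard LSTM formulation: sigmoid gates plus a tanh candidate. The `weights` dict layout is a hypothetical one I chose for readability and does not reflect how Keras stores its weights internally.

```python
import math
from typing import Dict, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lstm_step(x: float, h_prev: float, c_prev: float,
              weights: Dict[str, Tuple[float, float, float]]) -> Tuple[float, float]:
    """One time step of a single-unit LSTM.

    `weights` maps each gate name ("input", "forget", "output", "candidate")
    to a hypothetical (w_x, w_h, bias) triple. Returns (h, c).
    """
    def pre(gate: str) -> float:
        w_x, w_h, bias = weights[gate]
        return w_x * x + w_h * h_prev + bias

    i = sigmoid(pre("input"))        # how much new information to write
    f = sigmoid(pre("forget"))       # how much of the old memory to keep
    o = sigmoid(pre("output"))       # how much of the memory to expose
    g = math.tanh(pre("candidate"))  # candidate values to write
    c = f * c_prev + i * g           # updated memory cell
    h = o * math.tanh(c)             # output passed to the next step/layer
    return h, c
```

A large positive forget bias combined with a large negative input bias makes `c` carry `c_prev` through almost unchanged. That is the “memory” behavior in its simplest form, and the gates learn when to open or close it.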

Training LSTMs isn’t much different from training any other NN. It uses backpropagation (through time, for recurrent networks) against training and validation sets, and both the hyperparameters and the layout of the layers have a large effect on performance and accuracy. For most of my work I’ve been using Keras and TensorFlow to implement time series prediction. I have some saved code for time series classification, but it uses a slightly different method. I found a wide variety of helpful examples early on, but they included some non-obvious pitfalls.
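A common first step for LSTM time series prediction is reshaping the series into supervised windows. Keras recurrent layers expect 3-D input of shape `(samples, timesteps, features)`. Here is a plain-Python sketch for a univariate series and one-step-ahead prediction. The `window` length is an assumption you would tune.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], window: int
                 ) -> Tuple[List[List[List[float]]], List[float]]:
    """Split a univariate series into (X, y) pairs for one-step-ahead prediction.

    X has shape (samples, window, 1), i.e. (samples, timesteps, features),
    and y[k] is the value immediately following window X[k].
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    X, y = [], []
    for start in range(len(series) - window):
        X.append([[v] for v in series[start:start + window]])
        y.append(series[start + window])
    return X, y
```

For example, `make_windows([1, 2, 3, 4, 5], 2)` yields three samples: `[[1], [2]] -> 3`, `[[2], [3]] -> 4`, and `[[3], [4]] -> 5`. Wrap `X` and `y` with `numpy.asarray` before passing them to `model.fit`.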

Dr. Jason Brownlee has a bunch of helpful introductions to various ML concepts, including LSTMs, with example data sets and code. I appreciated his discussion of the things his tutorial example doesn’t explicitly cover, such as using non-stationary data without preprocessing, model tuning, and model updates. You can check it out here:
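Differencing is one standard way to make a non-stationary series closer to stationary before training. Here is a sketch that defaults to a one-step lag (an assumption). Inverting the transform is what turns predicted differences back into predicted values.

```python
from typing import List, Sequence


def difference(series: Sequence[float], lag: int = 1) -> List[float]:
    """Return series[t] - series[t - lag] for t >= lag.

    With lag=1 this removes a linear trend.
    """
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def invert_difference(history: Sequence[float], diffs: Sequence[float],
                      lag: int = 1) -> List[float]:
    """Rebuild values from differences, seeded with the last `lag` known values.

    `history` must end with the values immediately preceding the first diff.
    """
    values = list(history[-lag:])
    for d in diffs:
        values.append(values[-lag] + d)
    return values[lag:]
```

For example, `difference([1, 3, 6, 10])` gives `[2, 3, 4]`, and `invert_difference([1], [2, 3, 4])` recovers `[3, 6, 10]`.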

Note: The configuration used in this example suffices to explain how LSTMs work, but its accuracy and performance aren’t good. A single layer with a small number of LSTM cells, trained for a large number of epochs, produces pretty wide swings in predicted values. You can demonstrate this by repeating the run several times and comparing the RMSE scores, which can differ wildly from run to run.
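Here is a sketch of that run-to-run comparison. It assumes a hypothetical `train_and_predict(seed)` function that trains a fresh model and returns its predictions on a fixed test set. Summarizing RMSE as a mean and standard deviation over repeats makes the instability measurable rather than anecdotal.

```python
import math
import statistics
from typing import Callable, Sequence, Tuple


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and the same length")
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def rmse_over_runs(actual: Sequence[float],
                   train_and_predict: Callable[[int], Sequence[float]],
                   repeats: int = 10) -> Tuple[float, float]:
    """Return (mean, sample stdev) of test RMSE across `repeats` runs.

    `train_and_predict` is a hypothetical callable: given a seed, it trains
    a new model from scratch and returns predictions aligned with `actual`.
    """
    if repeats < 2:
        raise ValueError("need at least 2 runs to estimate spread")
    scores = [rmse(actual, train_and_predict(seed)) for seed in range(repeats)]
    return statistics.mean(scores), statistics.stdev(scores)
```

A configuration whose standard deviation is comparable to its mean RMSE is telling you more about the random initialization than about the model.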

Dr. Brownlee also has articles on some of the ways this can be improved, such as his article on stacked LSTMs:
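When stacking LSTM layers in Keras, every LSTM layer except the last needs `return_sequences=True`. That way the next layer receives the full sequence rather than only the final hidden state. Here is a sketch that assumes a current `tf.keras` install; the 2017-era standalone Keras used in this post declared the input shape on the first layer rather than with `keras.Input`. The layer sizes are placeholders, not tuned values. TensorFlow is imported inside the builder so that the flag helper can be used without it.

```python
from typing import List, Sequence


def return_sequences_flags(n_layers: int) -> List[bool]:
    """Every stacked LSTM layer except the last must return full sequences."""
    return [i < n_layers - 1 for i in range(n_layers)]


def build_stacked_lstm(timesteps: int, features: int,
                       units: Sequence[int] = (64, 64, 32)):
    """Build and compile a stacked-LSTM regressor for one-step-ahead prediction.

    TensorFlow is imported lazily so the helper above works without it.
    """
    from tensorflow import keras

    stack = [keras.Input(shape=(timesteps, features))]
    for n_units, return_seq in zip(units, return_sequences_flags(len(units))):
        stack.append(keras.layers.LSTM(n_units, return_sequences=return_seq))
    stack.append(keras.layers.Dense(1))  # one predicted value per window
    model = keras.Sequential(stack)
    model.compile(optimizer="adam", loss="mse")
    return model
```

Usage would look something like `model = build_stacked_lstm(timesteps=50, features=1)` followed by `model.fit(X, y, epochs=..., validation_split=0.1)`, with `X` shaped `(samples, 50, 1)`. Forgetting `return_sequences=True` on an intermediate layer is a common source of shape errors.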

Jakob Aungiers has the best introduction to LSTMs that I have seen so far. His full article on LSTM time series prediction can be found here: while the source code (and a link to a video presentation) can be found here:

His examples are far more robust, with stacked LSTM layers, far more LSTM units per layer, well-characterized sample data, and more “realistic” stock data. He also uses windowing and non-stationary data. He has replied to a number of comments with detailed explanations, too. This guy knows his stuff.
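For windowed, non-stationary price series, one common normalization expresses every value in a window relative to the window’s first value, n_i = p_i / p_0 - 1. After prediction, you invert it with p_i = p_0 (n_i + 1). Here is a sketch of that scheme, which assumes each window’s first value is nonzero. I’m not claiming it matches his code line for line.

```python
from typing import List, Sequence


def normalize_window(window: Sequence[float]) -> List[float]:
    """Scale a window relative to its first value: n_i = p_i / p_0 - 1."""
    p0 = window[0]
    if p0 == 0:
        raise ValueError("first value of the window must be nonzero")
    return [p / p0 - 1.0 for p in window]


def denormalize(p0: float, normalized: Sequence[float]) -> List[float]:
    """Invert normalize_window given the window's original first value."""
    return [p0 * (n + 1.0) for n in normalized]
```

Every window starts at 0, so the model learns relative moves instead of absolute price levels. This sidesteps much of the drift in the raw series.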



Latest DNN work

It’s been a while since I’ve posted my status, and I’ve been far too busy to write up all of my work with various AI/ML conferences and implementations. But since I’ve been doing a lot of work specifically on LSTM implementations, I wanted to include some notes for both my future self and my partner, for when he starts spinning up some of the same code.

Having identified a few primary use cases for our work (high-dimensional trajectories through belief space, word embedding search and classification, and time series analysis), we’ve been focusing a little more intently on specific implementations for each capability. Phil has been leading the charge on trajectories in belief space, and we both did a lot of work in the previous sprint preparing to integrate our word embedding project into the production platform. Meanwhile, I have started focusing more heavily on time series analysis.

There are a variety of reasons this particular niche is worth focusing on. For one, we have a number of real-world, real-data examples where we need to perform either time series classification or time series prediction. These range from financial data (such as projected planned/actual deltas) to telemetry anomaly detection for satellites or aircraft, among others. In the past, some of our work with ML classifiers has used simple feed-forward systems (classic multilayer perceptrons), naive Bayes, or logistic regression.

I’ve been coming up to speed on deep learning, becoming familiar with both the background and the mathematical underpinnings. By the way, for those looking for an excellent start in ML, I highly recommend Patrick Winston’s (MIT) videos:

Over the course of several months I did fairly constant research, all the way through the latest arXiv papers. I was particularly interested in Hinton’s papers on capsule networks, as they have some direct applicability to our work. Here is an article summing up capsule networks:

I also researched the progress of current deep learning frameworks, looking specifically at those suited to production deployment at scale rather than those best suited to single researchers solving pet problems. Our focus is much more on the “applied ML” side of things than on the purely academic side. The last time we did a comprehensive deep learning framework “bake off,” we came to a strong conclusion that Google TensorFlow was the best choice for our environment, and my recent research confirmed that this is still the case. In addition to TensorFlow Serving for serving your own models in production stacks, most cloud hosting environments (Google, AWS, etc.) offer options for running TF models directly, either serverless (AWS Lambda functions) or through a deployment/hosting solution (AWS SageMaker).

The reality is that much of what makes ML difficult boils down to things like training lifecycle, versioning, deployment, security, and model optimization. Some of these are increasingly available as commodity services from hosting providers, which frees data scientists to work on their data sets and improve their models. Speaking of models, on our last pass at implementing TensorFlow models we used raw TensorFlow, I think right after 1.0 was released. The documentation was pretty shabby, and even simple things weren’t straightforward. When I installed and set up a new box this time with TensorFlow 1.4, I went ahead and used Keras as well. Keras is a higher-level API on top of computational graph software (TensorFlow by default, or Theano). Installation is easy, with a couple of minor notes.

Note #1: You MUST install the specific versions listed. I cannot stress this enough. In particular, cuDNN and the CUDA Toolkit are updated frequently, and if you blindly click through their download links you will get a newer version that is not compatible with the current versions of TensorFlow and Keras. The software is all moving very rapidly, so it’s important to use compatible versions.
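The Python-side packages can drift too, so it may help to check them after installing. Here is a sketch using the standard library’s `importlib.metadata` (Python 3.8+). The package names and `X.Y.Z` versions in `PINS` are hypothetical placeholders, not the versions from the install guide. CUDA and cuDNN are not Python packages, so this check cannot cover them.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Hypothetical pins: replace with the exact versions your install guide lists.
PINS: Dict[str, str] = {
    "tensorflow": "X.Y.Z",
    "keras": "X.Y.Z",
}


def check_pins(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str], bool]]:
    """Compare installed package versions with pinned ones.

    Returns {name: (wanted, installed_or_None, matches)}.
    """
    report = {}
    for name, wanted in pins.items():
        try:
            installed: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_pins(PINS).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: wanted {wanted}, installed {installed} -> {status}")
```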

Note #2: Some examples may require the MKL dependency for NumPy, which is not installed by default. See: which will send you here for the necessary WHL file:

Note #3: You will need to run the TensorFlow install as sudo/administrator, or you will get permission errors.

Once these are installed there is a full directory of Keras examples here:

This includes examples of most of the basic DNN types supported by Keras, as well as some datasets to use with them, such as MNIST for CNNs. When it comes to just figuring out “does everything I just installed run?”, these work just fine.


Phil 1.4.17

7:00 – 3:00 ASRC MKT

  • Confidence modulates exploration and exploitation in value-based learning
    • Uncertainty is ubiquitous in cognitive processing, which is why agents require a precise handle on how to deal with the noise inherent in their mental operations. Previous research suggests that people possess a remarkable ability to track and report uncertainty, often in the form of confidence judgments. Here, we argue that humans use uncertainty inherent in their representations of value beliefs to arbitrate between exploration and exploitation. Such uncertainty is reflected in explicit confidence judgments. Using a novel variant of a multi-armed bandit paradigm, we studied how beliefs were formed and how uncertainty in the encoding of these value beliefs (belief confidence) evolved over time. We found that people used uncertainty to arbitrate between exploration and exploitation, reflected in a higher tendency towards exploration when their confidence in their value representations was low. We furthermore found that value uncertainty can be linked to frameworks of metacognition in decision making in two ways. First, belief confidence drives decision confidence — that is people’s evaluation of their own choices. Second, individuals with higher metacognitive insight into their choices were also better at tracing the uncertainty in their environment. Together, these findings argue that such uncertainty representations play a key role in the context of cognitive control.

  • Artificial Intelligence, AI in 2018 and beyond
    • Eugenio Culurciello
    • These are my opinions on where deep neural network and machine learning is headed in the larger field of artificial intelligence, and how we can get more and more sophisticated machines that can help us in our daily routines. Please note that these are not predictions of forecasts, but more a detailed analysis of the trajectory of the fields, the trends and the technical needs we have to achieve useful artificial intelligence. Not all machine learning is targeting artificial intelligences, and there are low-hanging fruits, which we will examine here also.
  • Synthetic Experiences: How Popular Culture Matters for Images of International Relations
    • Many researchers assert that popular culture warrants greater attention from international relations scholars. Yet work regarding the effects of popular culture on international relations has so far had a marginal impact. We believe that this gap leads mainstream scholars both to exaggerate the influence of canonical academic sources and to ignore the potentially great influence of popular culture on mass and elite audiences. Drawing on work from other disciplines, including cognitive science and psychology, we propose a theory of how fictional narratives can influence real actors’ behavior. As people read, watch, or otherwise consume fictional narratives, they process those stories as if they were actually witnessing the phenomena those narratives describe, even if those events may be unlikely or impossible. These “synthetic experiences” can change beliefs, reinforce preexisting views, or even displace knowledge gained from other sources for elites as well as mass audiences. Because ideas condition how agents act, we argue that international relations theorists should take seriously how popular culture propagates and shapes ideas about world politics. We demonstrate the plausibility of our theory by examining the influence of the US novelist Tom Clancy on issues such as US relations with the Soviet Union and 9/11.
  • Continuing with paper tweaking. Added T’s comments, and finished Methods.