Phil 11.3.17

7:00 – ASRC MKT

  • Good comments from Cindy on yesterday’s work
  • Facebook’s 2016 Election Team Gave Advertisers A Blueprint To A Divided US
  • Some flocking activity? AntifaNov4
  • I realized that I had not added the herding variables to the Excel output. Fixed.
  • DINH Q. LÊ: South China Sea Pishkun
    • In his new work, South China Sea Pishkun, Dinh Q. Lê references the horrifying events of April 30th, 1975 (the day Saigon fell), as hundreds of thousands of people tried to flee Saigon ahead of the encroaching North Vietnamese Army and Viet Cong. The mass exodus was a “Pishkun,” a term used to describe the way the Blackfoot American Indians would drive roaming buffalo off cliffs in what is known as a buffalo jump.
  • Back to writing – got some done, mostly editing.
  • Stochastic gradient descent with momentum
  • Referred to in this: There’s No Fire Alarm for Artificial General Intelligence
    •  AlphaGo did look like a product of relatively general insights and techniques being turned on the special case of Go, in a way that Deep Blue wasn’t. I also updated significantly on “The general learning capabilities of the human cortical algorithm are less impressive, less difficult to capture with a ton of gradient descent and a zillion GPUs, than I thought,” because if there were anywhere we expected an impressive hard-to-match highly-natural-selected but-still-general cortical algorithm to come into play, it would be in humans playing Go.
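  • The momentum idea mentioned above is easy to sketch. This is a toy illustration, not anyone’s production code: the objective f(w) = (w − 3)², the learning rate, and the momentum constant are all my own illustrative choices.

```java
// Gradient descent with momentum on the toy objective f(w) = (w - 3)^2.
// All constants here are illustrative assumptions.
public class MomentumSketch {
    static double minimize(double w0, double lr, double mu, int steps) {
        double w = w0, v = 0.0;
        for (int i = 0; i < steps; i++) {
            double grad = 2.0 * (w - 3.0); // gradient of (w - 3)^2
            v = mu * v - lr * grad;        // velocity accumulates past gradients
            w += v;                        // step by velocity, not the raw gradient
        }
        return w;
    }
}
```

The velocity term smooths successive gradients, which is why momentum damps oscillation across steep directions while accelerating along shallow ones.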
  • In another article: The AI Alignment Problem: Why It’s Hard, and Where to Start
    • This is where we are on most of the AI alignment problems, like if I ask you, “How do you build a friendly AI?” What stops you is not that you don’t have enough computing power. What stops you is that even if I handed you a hypercomputer, you still couldn’t write the Python program that if we just gave it enough memory would be a nice AI.
    • I think this is where models of flocking and “healthy group behaviors” matter. Exploration in small numbers is healthy – it defines the bounds of the problem space. Flocking is a good way to balance bounded trust and balanced awareness. Runaway echo chambers are very bad. These patterns are recognizable, regardless of whether they come from human, machine, or bison.
  • Added contacts and invites. I think the DB is ready: polarizationgameone
  • While out riding, I realized what I can do to show results in the herding paper. There are at least four ways to handle herding:
    1. No herding
    2. Take the average of the herd
    3. Weight a random agent
    4. Weight random agents (randomly select an agent and leave it that way for a few cycles, then switch)
  • Look at the time it takes for each of these to converge and see which one is best. Also look at the DTW to see if they would be different populations.
  • Then re-do the above for the two populations inverted case (max polarization)
  • Started to put in the code changes for the above. There is now a combobox for herding with the above options.
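  • The combobox options might dispatch something like this minimal sketch, where agent state is reduced to a single heading value and all names (Herding, influenceHeading) are illustrative, not from the actual codebase:

```java
import java.util.List;
import java.util.Random;

public class HerdingSketch {
    enum Herding { NONE, HERD_AVERAGE, WEIGHT_RANDOM_AGENT }

    static final Random rand = new Random();

    // Heading an agent is pulled toward under each herding option. The fourth
    // option (sticky random agent) would just re-roll the chosen index every
    // few cycles instead of on every call.
    static double influenceHeading(Herding mode, double ownHeading, List<Double> herd) {
        switch (mode) {
            case HERD_AVERAGE:        // take the average of the herd
                return herd.stream().mapToDouble(Double::doubleValue).average().orElse(ownHeading);
            case WEIGHT_RANDOM_AGENT: // follow one randomly chosen agent
                return herd.get(rand.nextInt(herd.size()));
            case NONE:
            default:                  // no herding: keep own heading
                return ownHeading;
        }
    }
}
```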

Phil 11.1.17

Phil 7:00 – ASRC MKT

    • The identity of the machine is just as important as the identity of the human, argues Jeff Hudson.
    • Agent-based simulation for economics: The Tool Central Bankers Need Most Now
    • Introducing Vega-Lite 2.0 (from the UW Interactive Data Lab)
      • Vega-Lite enables concise descriptions of visualizations as a set of encodings that map data fields to the properties of graphical marks. Vega-Lite uses a portable JSON format that compiles to full specifications in the larger Vega language. Vega-Lite includes support for data transformations such as aggregation, binning, filtering, and sorting, as well as visual transformations such as stacking and faceting into small multiples.
    • Wayne says ‘awareness’ is too overloaded, at least in CSCW where it means ‘a shared awareness’. What about alertness, cognition, or perception?
    • Started Simulating Flocking and Herding in Belief Space. Shared with Wayne, Aaron and Cindy
    • Yay, finally got the array problems solved. The problem is that a PHP array is actually an ordered map, not a true zero-indexed array. But you can convert the values of any such map into a zero-indexed array using array_values(). So now all my arrays begin at zero, as God intended.
    • Meeting with the lads. Some really good stuff.
      • Add tmanage
        • dungeon_master
        • game
        • scenario
        • min_players
        • max_players
        • time_to_live
        • state (waiting, running, timeout, terminated, success)
        • open (true/false)
        • visible
      • Add trating
        • target_message
        • relevance
        • quality
        • vote
        • rating_player
      • Add ttopics
        • title
        • description
        • parent
      • Add tplayerstate
        • player
        • game
        • state (waiting, playing, finished, terminated)
      • Add tcontact
        • player
        • name
        • email
        • facebook (oAuth)
        • google (oAuth)
      • Add tinvite
        • contact
        • game
        • player


  • Humans + Machines (CNAS livestream)
    12:30 – 1:35 PM
    Dr. Jeff Clune, Assistant Professor of Computer Science, University of Wyoming
    Kimberly Jackson Ryan, Senior Human Systems Engineer, Draper Laboratory
    Dr. John Hawley, Engineering Psychologist, Army Research Laboratory
    Dr. Caitlin Surakitbanharn, Research Scientist, Purdue University
    Dan Lamothe, National Security Writer, The Washington Post (moderator)

Phil 8.16.17

7:00 – 8:00 Research

  • Added takeaway thoughts to my C&C writeup.
  • Working out how to add capability to the sim for P&RCH paper. My thoughts from vacation:
    • The agent’s contribution is the heading and speed
    • The UI is what the agents can ‘see’
    • The IR is what is available to be seen
    • An additional part might be to add the ability to store data in the space. Then the behavior of the IR (e.g. empty areas) would be more apparent, as would the effects of the UI (only certain data is visible, or maybe only nearby data is visible). Data could be a vector field in Hilbert space, and visualized as color.
  • Updated IntelliJ
  • Working out how to have a voxel space for the agents to move through that can also be drawn. It’s any number of dimensions, but it has to project to 2D. In the case of the agents, I just choose the first two axes. Each agent has an array of statements that are assembled into a belief vector. The space can be an array of beliefs. Are these just constructed so that they fill a space according to a set of rules? Then the xDimensionName and yDimensionName axes would go from (0, 1), which would scale to stage size? IR would still be a matter of comparing the space to the agent’s vector. Hmm.
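  • To make the projection concrete, here is a hedged sketch (class and method names are mine, not from the sim): the belief vector’s first two components, each in (0, 1), scale to stage coordinates, and IR can be scored by comparing the space’s vector against the agent’s.

```java
// Sketch: project an N-dimensional belief vector to 2D stage coordinates and
// compare an agent's beliefs with what the space holds at a voxel.
// All names here are illustrative assumptions, not the simulator's API.
public class BeliefProjection {
    // First two axes run (0, 1) and scale to stage size.
    static double[] projectTo2D(double[] belief, double stageW, double stageH) {
        return new double[]{belief[0] * stageW, belief[1] * stageH};
    }

    // Cosine similarity as one way to compare space and agent vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```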
  • This looks really good from an information horizon perspective: The Role of the Information Environment in Partisan Voting
    • Voters are often highly dependent on partisanship to structure their preferences toward political candidates and policy proposals. What conditions enable partisan cues to “dominate” public opinion? Here I theorize that variation in voters’ reliance on partisanship results, in part, from the opportunities their environment provides to learn about politics. A conjoint experiment and an observational study of voting in congressional elections both support the expectation that more detailed information environments reduce the role of partisanship in candidate choice.

9:00 – 5:00 BRI

  • Good lord, the BoA corporate card comes with SIX separate documents to read.
  • Onward to Chapter Three and Spring database interaction
  • Well that’s pretty clean. I do like the JdbcTemplate behaviors. Not sure I like the way you specify the values passed to the query, but I can’t think of anything better if you have more than one argument:
    public class EmployeeDaoImpl implements EmployeeDao {
        private DataSource dataSource;
        private JdbcTemplate jdbcTemplate;
        private RowMapper<Employee> employeeRowMapper = new RowMapper<Employee>() {
            public Employee mapRow(ResultSet rs, int i) throws SQLException {
                Employee employee = new EmployeeImpl();
                employee.setEmployeeName(rs.getString("FirstName") + " " + rs.getString("LastName"));
                return employee;
            }
        };

        public Employee getEmployeeById(int id) {
            Employee employee = jdbcTemplate.queryForObject(
                    "select * from Employee where id = ?",
                    new Object[]{id},
                    employeeRowMapper);
            return employee;
        }

        public List<Employee> getAllEmployees() {
            List<Employee> eList = jdbcTemplate.query(
                    "select * from Employee",
                    employeeRowMapper);
            return eList;
        }
    }
  • Here’s the xml to wire the thing up:
    <context:component-scan base-package="org.springframework.chapter3.dao"/>
    <bean id="employeeDao" class="org.springframework.chapter3.dao.EmployeeDaoImpl"/>
    <bean id="dataSource">
        <property name="driverClassName" value="${jdbc.driverClassName}" />
        <property name="url" value="${jdbc.url}" />
        <property name="username" value="xxx"/>
        <property name="password" value="yyy"/>
    </bean>
    <bean id="jdbcTemplate" class="org.springframework.jdbc.core.JdbcTemplate">
        <property name="dataSource" ref="dataSource" />
    </bean>
    <context:property-placeholder location="" />
  • And here’s the properties. Note that I had to disable SSL:

Phil 4.5.16

7:00 – 4:30 VTX

  • Had a good discussion with Patrick yesterday. He’s approaching his wheelchair work from a Heideggerian framework, where the controls may be present-at-hand or ready-to-hand. I think those might be frameworks that apply to non-social systems (Hammers, Excel, Search), while social systems more align with being-with. The evaluation of trustworthiness is different. True in a non-social sense is a property of exactness; a straightedge may be true or out-of-true. In a social sense, true is associated with a statement that is in accordance with reality.
  • While reading Search Engine Agendas in Communications of the ACM, I came upon a mention of Frank Pasquale, who wrote an article on the regulation of Search, given its impact (Federal Search Commission? Access, Fairness, and Accountability in the Law of Search). The point of Search Engine Agendas is that the ranking of political candidates affects people’s perception of them (higher is better). This ties into my thoughts from March 29th: that there are situations where the idea of ordering among pertinent documents may be problematic, and further, that how users might interact with the ordering process might be instructive.
  • Continuing Technology, Humanness, and Trust: Rethinking Trust in Technology.
  • ————————
  • Added the sites Andy and Margarita found to the blacklist and updated the repo
  • Theresa has some sites too – in process.
  • Finished my refactoring party – more debugging than I was expecting
  • Converted the Excel spreadsheet to JSON and read the whole thing in. Need to do that just for a subsample now.
  • Added a request from Andy about creating a JSON object for the comments in the flag dismissal field.
  • Worked with Gregg about setting up the postgres db.

Phil 2.8.16

7:00 – 5:00 VTX

  • My 401k still isn’t being done right. Sheesh.
  • More Publius: A robust, tamper-evident, censorship-resistant web publishing system
    • Very good introduction, then it dives into the weeds of how the system was implemented and the cryptographic challenges. Good stuff, and should be addressed. It does imply that the information stored in my system could be encrypted and sharded as an additional layer of protection against malicious editing. In this case, text can have annotations pointing to it, but the source should be archival.
    • I think I also need to set up a new doc db of news items that I can use to make the story more readable.
      • Stories of people fooled by misinformation
      • Stories of people damaged by lack of anonymity
      • Stories about citizen journalism
      • Stories about computational journalism
      • Something about CSCW, Wikipedia maybe?
    • Anderson’s Eternity Service?
  • Need to make the ProviderObject persistent. Done
  • Need a rating object – date, who, the rating, anything else? Done-ish
  • Need to make a quick & dirty swing app for people to use – started. Once that’s working, then build the rating object that it will create
  • Need to connect to a remote DB
    • Will also need summary statistics and charts to see how queries do.
    • Will also need to store the good (“match” and “flaggable”) pages for later training.
  • Should make the app stand-alone-ish. JSmooth?
  • Discussion with Mike G., Heath, Bob H., and Theresa on how to integrate current NLP/NER

Phil 1.12.16

7:00 – 4:00 VTX

  • So I ask myself, is there some kind of public repository of crawled data? Why, of course there is! Common Crawl. So there is a way of getting the deep link structure for a given site without crawling it. That could give me the ability to determine how ‘bubbly’ a site is. I’m thinking there may be a ratio of bidirectional to unidirectional links (per site?) that could help here.
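  • The ratio could be sketched like this toy, with edges encoded as "from->to" strings; nothing here comes from the Common Crawl API, it just shows the bidirectional-vs-unidirectional arithmetic:

```java
import java.util.Set;

// Toy 'bubbliness' measure: fraction of link pairs that are reciprocated.
// The edge encoding and names are illustrative assumptions.
public class LinkRatio {
    static double bidirectionalRatio(Set<String> edges) {
        int bi = 0, uni = 0;
        for (String e : edges) {
            String[] p = e.split("->");
            if (edges.contains(p[1] + "->" + p[0])) bi++; else uni++;
        }
        double biPairs = bi / 2.0; // each reciprocated pair is counted twice
        return (biPairs + uni) == 0 ? 0.0 : biPairs / (biPairs + uni);
    }
}
```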
  • More lit review and integration.
  • Making diagrams for the Sprint review today
    • Overview
      • The purpose of this effort is to provide a capability for the system to do more sophisticated queries that do several things
        • Allow the user to emphasize/de-emphasize words or phrases that relate to the particular search and to do this interactively based on linguistic analysis of the returned text.
        • Get user value judgments on the information provided based on the link results reordering
        • Use this to feed back to the selection criteria for provider Flags.
      • This work leans on the paper PageRank without Hyperlinks if you want more background/depth.
    • Eiphcone 129 – Design database table schema.
      • Took my existing MySql db schema and migrated it to Java Persistent Entities. Basically this meant taking a db that was designed for precompiled query access and retrieval (direct data access for adding data, views for retrieval) and restructuring it. So we go from [beforeTables diagram] to [afterTables diagram].
      • The classes are annotated POJOs in a simple hierarchy. The classes that have ‘Base’ in their names I expect to be extended, though there may be enough capability here. GuidBase has some additional capability to make sure that adding data to one class that has a data relation to another class gets filled out properly in both. [JavaClassHierarchy diagram] Since multiple dictionary entries can be present in multiple corpora, BaseDictionaryEntry and Corpus both have a <Set> of BaseEntryContext that connects the corpora and entries with additional information that might be useful, such as counts.
      • This manifests itself in the database as the following: [ER Diagram] It’s not the prettiest drawing, but I can’t get IntelliJ to draw any better. You can see that the tables match directly to the classes. I used the InheritanceType.JOINED strategy since Jeremy was concerned about wasted space in the tables.
      • The next steps will be to start to create test cases that allow for tuning and testing of this setup at different data scales.
    • Eiphcone 132 – Document current progress on relationship/taxonomy design & existing threat model
      • Currently, a threat is extracted by comparing a set of known entities to surrounding text for keywords. In the model shown above, practitioners would exist in a network that includes items like the practice, attending hospitals, legal representation, etc. Because of this relationship, flags could be extended to the other members of the network. If a near neighbor in this network has a Flag attached, it will weight the surrounding edges and influence the practitioner. So if one doctor in a practice is convicted of malpractice, then other doctors in the practice will get lower scores.
      • The dictionary and corpus can interact as their own network to determine the amount of weight that is given to a particular score. For example, words in a dictionary that are used to extract data from a legal corpus may have more weight than those from a social media corpus.
    • Eiphcone 134 – Design/document NER processing in relation to future taxonomy
      • I compiled and ran the NER codebase and also walked through the Stanford NLP documentation. The current NER system looks to be somewhat basic, but solid and usable. Using it to populate the dictionaries and annotate the corpus appears to be a straightforward addition of the capabilities already present in the Stanford API.
    • Demo – I don’t really have a demo, unless people want to see some tests compile and run. To save time, I have this exciting printout that shows the return of dynamically created data:
[EL Info]: 2016-01-12 14:09:40.481--ServerSession(1842102517)--EclipseLink, version: Eclipse Persistence Services - 2.6.1.v20150916-55dc7c3
[EL Info]: connection: 2016-01-12 14:09:40.825--ServerSession(1842102517)--/file:/C:/Development/Sandboxes/JPA_2_1/out/production/JPA_2_1/_NetworkService login successful

firstName(firstname_0), lastName(lastname_0), login(login_0), networks( network_0)
firstName(firstname_1), lastName(lastname_1), login(login_1), networks( network_4)
firstName(firstname_2), lastName(lastname_2), login(login_2), networks( network_3)
firstName(firstname_3), lastName(lastname_3), login(login_3), networks( network_1 network_2)
firstName(firstname_4), lastName(lastname_4), login(login_4), networks()

name(network_0), owner(login_0), type(WAMPETER), archived(false), public(false), editable(true)
	[92]: name(DataNode_6_to_BaseNode_8), guid(network_0_DataNode_6_to_BaseNode_8), weight(0.5708945393562317), type(IDENTITY), network(network_0)
		Source: [86]: name('DataNode_6'), type(ENTITIES), annotation('annotation_6'), guid('50836752-221a-4095-b059-2055230d59db'), double(18.84955592153876), int(6), text('text_6')
		Target: [88]: name('BaseNode_8'), type(COMPUTED), annotation('annotation_8'), guid('77250282-3b5e-416e-a469-bbade10c5e88')
	[91]: name(BaseNode_5_to_UrlNode_4), guid(network_0_BaseNode_5_to_UrlNode_4), weight(0.3703539967536926), type(COMPUTED), network(network_0)
		Source: [85]: name('BaseNode_5'), type(RATING), annotation('annotation_5'), guid('bf28f478-626d-4e8f-9809-b4a37f2ad504')
		Target: [84]: name('UrlNode_4'), type(IDENTITY), annotation('annotation_4'), guid('bffe13ae-bb70-46a6-b1b4-9f58cadad04e'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link(''), image('')
	[98]: name(BaseNode_5_to_UrlNode_1), guid(network_0_BaseNode_5_to_UrlNode_1), weight(0.4556456208229065), type(ENTITIES), network(network_0)
		Source: [85]: name('BaseNode_5'), type(RATING), annotation('annotation_5'), guid('bf28f478-626d-4e8f-9809-b4a37f2ad504')
		Target: [81]: name('UrlNode_1'), type(UNKNOWN), annotation('annotation_1'), guid('f9693110-6b5b-4888-9585-99b97062a4e4'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link(''), image('')

name(network_1), owner(login_3), type(WAMPETER), archived(false), public(false), editable(true)
	[96]: name(BaseNode_2_to_UrlNode_1), guid(network_1_BaseNode_2_to_UrlNode_1), weight(0.5733484625816345), type(URL), network(network_1)
		Source: [82]: name('BaseNode_2'), type(ITEM), annotation('annotation_2'), guid('c5867557-2ac3-4337-be34-da9da0c7e25d')
		Target: [81]: name('UrlNode_1'), type(UNKNOWN), annotation('annotation_1'), guid('f9693110-6b5b-4888-9585-99b97062a4e4'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link(''), image('')
	[95]: name(DataNode_0_to_UrlNode_7), guid(network_1_DataNode_0_to_UrlNode_7), weight(0.85154128074646), type(MERGE), network(network_1)
		Source: [80]: name('DataNode_0'), type(USER), annotation('annotation_0'), guid('e9b7fa0a-37f1-41bd-a2c1-599841d1507a'), double(0.0), int(0), text('text_0')
		Target: [87]: name('UrlNode_7'), type(QUERY), annotation('annotation_7'), guid('b9351194-d10e-4f6a-b997-b84c61344fcf'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link(''), image('')
	[94]: name(DataNode_9_to_BaseNode_5), guid(network_1_DataNode_9_to_BaseNode_5), weight(0.72845458984375), type(KEYWORDS), network(network_1)
		Source: [89]: name('DataNode_9'), type(USER), annotation('annotation_9'), guid('5bdb67de-5319-42db-916e-c4050dc682dd'), double(28.274333882308138), int(9), text('text_9')
		Target: [85]: name('BaseNode_5'), type(RATING), annotation('annotation_5'), guid('bf28f478-626d-4e8f-9809-b4a37f2ad504')

name(network_2), owner(login_3), type(EXPLICIT), archived(false), public(false), editable(true)
	[90]: name(BaseNode_8_to_UrlNode_7), guid(network_2_BaseNode_8_to_UrlNode_7), weight(0.2619180679321289), type(WAMPETER), network(network_2)
		Source: [88]: name('BaseNode_8'), type(COMPUTED), annotation('annotation_8'), guid('77250282-3b5e-416e-a469-bbade10c5e88')
		Target: [87]: name('UrlNode_7'), type(QUERY), annotation('annotation_7'), guid('b9351194-d10e-4f6a-b997-b84c61344fcf'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link(''), image('')

name(network_3), owner(login_2), type(EXPLICIT), archived(false), public(false), editable(true)
	[93]: name(UrlNode_4_to_DataNode_3), guid(network_3_UrlNode_4_to_DataNode_3), weight(0.7689594030380249), type(ITEM), network(network_3)
		Source: [84]: name('UrlNode_4'), type(IDENTITY), annotation('annotation_4'), guid('bffe13ae-bb70-46a6-b1b4-9f58cadad04e'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link(''), image('')
		Target: [83]: name('DataNode_3'), type(UNKNOWN), annotation('annotation_3'), guid('e7565935-6429-451f-b7f4-cc2d612ca3fd'), double(9.42477796076938), int(3), text('text_3')
	[97]: name(DataNode_3_to_DataNode_0), guid(network_3_DataNode_3_to_DataNode_0), weight(0.5808262825012207), type(URL), network(network_3)
		Source: [83]: name('DataNode_3'), type(UNKNOWN), annotation('annotation_3'), guid('e7565935-6429-451f-b7f4-cc2d612ca3fd'), double(9.42477796076938), int(3), text('text_3')
		Target: [80]: name('DataNode_0'), type(USER), annotation('annotation_0'), guid('e9b7fa0a-37f1-41bd-a2c1-599841d1507a'), double(0.0), int(0), text('text_0')

name(network_4), owner(login_1), type(ITEM), archived(false), public(false), editable(true)
	[99]: name(UrlNode_4_to_UrlNode_7), guid(network_4_UrlNode_4_to_UrlNode_7), weight(0.48601675033569336), type(WAMPETER), network(network_4)
		Source: [84]: name('UrlNode_4'), type(IDENTITY), annotation('annotation_4'), guid('bffe13ae-bb70-46a6-b1b4-9f58cadad04e'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link(''), image('')
		Target: [87]: name('UrlNode_7'), type(QUERY), annotation('annotation_7'), guid('b9351194-d10e-4f6a-b997-b84c61344fcf'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link(''), image('')

[30]: name(dictionary_0), guid(943ea8b6-6def-48ea-8b0f-a4e52e53954f), Owner(login_0), archived(false), public(false), editable(true)
	Entry = word_11
	Parent = word_10
	word_11 has 790 occurances in corpora0_chapter_1

	Entry = word_14
	word_14 has 4459 occurances in corpora1_chapter_2

	Entry = word_1
	Parent = word_0
	word_1 has 3490 occurances in corpora1_chapter_2

	Entry = word_10
	word_10 has 3009 occurances in corpora3_chapter_4

	Entry = word_4
	word_4 has 2681 occurances in corpora3_chapter_4

	Entry = word_5
	Parent = word_4
	word_5 has 5877 occurances in corpora1_chapter_2

[31]: name(dictionary_1), guid(c7b62a4b-b21a-4ebe-a939-0a71a891a3f9), Owner(login_0), archived(false), public(false), editable(true)
	Entry = word_3
	Parent = word_2
	word_3 has 4220 occurances in corpora0_chapter_1

	Entry = word_6
	word_6 has 4852 occurances in corpora2_chapter_3

	Entry = word_17
	Parent = word_16
	word_17 has 8394 occurances in corpora2_chapter_3

	Entry = word_2
	word_2 has 1218 occurances in corpora3_chapter_4

	Entry = word_19
	Parent = word_18
	word_19 has 8921 occurances in corpora2_chapter_3

	Entry = word_8
	word_8 has 4399 occurances in corpora3_chapter_4

[27]: name(corpora1_chapter_2), guid(08803d93-deeb-4699-bdb2-ffa9f635c373), totalWords(1801), importer(login_1), url(
	word_15 has 5338 occurances in corpora1_chapter_2
	word_13 has 2181 occurances in corpora1_chapter_2
	word_14 has 4459 occurances in corpora1_chapter_2
	word_1 has 3490 occurances in corpora1_chapter_2
	word_5 has 5877 occurances in corpora1_chapter_2
	word_16 has 2625 occurances in corpora1_chapter_2

[EL Info]: connection: 2016-01-12 14:09:41.116--ServerSession(1842102517)--/file:/C:/Development/Sandboxes/JPA_2_1/out/production/JPA_2_1/_NetworkService logout successful
  • Sprint review delayed. Tomorrow
  • Filling in some knowledge holes in JPA. Finished Chapter 4.
  • Tried getting enumerated types to work. No luck…?

Phil 1.8.16

8:00 – 5:00

  • Today is Roy Batty’s Birthday
  • Had a thought this morning. Rather than just having anonymous people post what they think is newsworthy, have a Journalist chatbot (something as simple as Eliza could work) tease out more information. The pattern of response, possibly augmented by server pulls for additional information might get to some really interesting responses, and a lot more input from the user.
  • Ok, now that I’ve got the path information figured out, migrating to vanilla JPA.
  • Viewing the SQL requires a library-specific property, but everything else is vanilla. This gets the tables built:
    <persistence xmlns="" version="2.1">
        <persistence-unit name="NetworkService" transaction-type="RESOURCE_LOCAL">
            <properties>
                <property name="javax.persistence.jdbc.driver" value="com.mysql.jdbc.Driver"/>
                <property name="javax.persistence.jdbc.url" value="jdbc:mysql://localhost:3306/projpa"/>
                <property name="javax.persistence.jdbc.user" value="root"/>
                <property name="javax.persistence.jdbc.password" value="edge"/>
                <property name="javax.persistence.schema-generation.database.action" value="drop-and-create"/>
                <!-- enable this property to see SQL and other logging -->
                <property name="eclipselink.logging.level" value="FINE"/>
            </properties>
        </persistence-unit>
    </persistence>
  • Here’s a simple JPA commit:
    public void addUsers(int num){
        for(int i = 0; i < num; ++i) {
            BaseUser bu = new BaseUser("firstname_" + i, "lastname_" + i, "login_" + i, "password_" + i);
            em.persist(bu); // queue for insert; the commit happens in the surrounding transaction
        }
    }
  • Here’s a simple Criteria pull:
    public void getAllUsers(){
        CriteriaBuilder cb = em.getCriteriaBuilder();
        CriteriaQuery<BaseUser> cq = cb.createQuery(BaseUser.class);
        cq.select(cq.from(BaseUser.class)); // the root select was missing
        TypedQuery<BaseUser> tq = em.createQuery(cq);
        users = new ArrayList<>(tq.getResultList());
    }
  • Here’s a more sophisticated query. This can be made much better easily, but that’s for next week.
    String query = "SELECT bd FROM dictionaries bd WHERE bd.owner.login LIKE '%_4%'";
    TypedQuery<BaseDictionary> dictQuery = em.createQuery(query, BaseDictionary.class);
    List<BaseDictionary> bds = dictQuery.getResultList();
    for(BaseDictionary bd : bds){
        System.out.println(bd); // print each matching dictionary
    }

Phil 12.11.15

8:00 – 5:00 VTX

  • No AI course this morning, had to drop off the car.
  • Some preliminary discussions about sprint planning with Aaron yesterday. Aside from getting the two ‘Derived’ database structures reconciled, I need to think about a few things:
    • who the network ‘users’ are. I think it could be VTX, or the system customers, like Aetna.
    • What kinds of networks exist?
      • Each individual doctor is the center of a network of doctors, keywords, entities, sources, threats and ratings. That can certainly run in the browser
      • Then there is the larger network of ‘relevant’ doctors. That’s a larger network, certainly in the 10s – 100s range. On the lower end of the scale that could be done directly in the browser. For larger networks, we might have to use the GPU, which seems very doable, via Steve Sanderson.
      • Then there is the master ranking, which should be something like most threatening to least threatening, probably. Queries with additional parameters pull a subset of the ordered data (SELECT foo, bar FROM ?? ORDER BY eigenvalue). Interestingly, according to this IEEE article from 2010, GPU processing was handling 10 million nodes in about 30 seconds using optimized sparse matrix (SpMV) calculations. So it’s conceivable that calculations could be done in real time.
  • More documentation
  • More discussions with Aaron about where data lives and how it’s structured.
  • Sprint planning

Phil 12.4.15

8:00 – VTX

  • Scrum
  • Found an interesting tidbit on the WaPo this morning. It implies that a pattern of statement, followed by a search for confirming information, followed by a public citation of that confirming information, could be the basic unit of an information bubble. For this to be a bubble, I think the pertinent information extracted from the relevant search results would have to be somehow identifiable as a minority view. This could be done by comparing the Jaccard index of the adjusted results with the raw returns of a search. In other words, if the world (relevant search) has an overall vector in one direction and the individual preferences produce a pertinent result pointing in the opposite direction (a large negative dot product), then the likelihood that those results stem from echo-chamber processes is higher.
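  • The Jaccard comparison itself is simple; a minimal sketch (set contents are placeholders for result URLs):

```java
import java.util.HashSet;
import java.util.Set;

// Jaccard index |A ∩ B| / |A ∪ B| between two result sets,
// e.g. preference-adjusted results vs. raw search returns.
public class JaccardSketch {
    static <T> double jaccard(Set<T> a, Set<T> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0; // convention for two empty sets
        Set<T> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<T> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }
}
```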
  • If the Derived DB depends on analyst examination of the data, this could be a way of flagging analyst bias.
  • Researching WebScaleSQL, I stumbled on another db from Facebook. This one, RocksDB, is more focused on speed. From the splash page:
    • RocksDB can be used by applications that need low latency database accesses. A user-facing application that stores the viewing history and state of users of a website can potentially store this content on RocksDB. A spam detection application that needs fast access to big data sets can use RocksDB. A graph-search query that needs to scan a data set in realtime can use RocksDB. RocksDB can be used to cache data from Hadoop, thereby allowing applications to query Hadoop data in realtime. A message-queue that supports a high number of inserts and deletes can use RocksDB.
  • Interestingly, RocksDB appears to have integration with MongoDB and is working on MySQL integration. Cassandra appears to be implementing similar optimizations.
  • Just discovered, which is a social-media-sourced, reporter-curated news stream. Could be a good source of data to compare against things like news feeds from Google or major news venues.
  • Control System Meeting
    • Send RCS and Search Competition to Bob
    • Seems like this whole system is a lot like what Databricks is doing?

Phil 12.2.15

7:00 –

  • Learning: Neural Nets, Back Propagation
    • Synaptic weights are higher for some synapses than others
    • Cumulative stimulus
    • All-or-none threshold for propagation.
    • Once we have a model, we can ask what we can do with it.
    • Now I’m curious about the MIT approach to calculus. It’s online too: MIT 18.01 Single Variable Calculus
    • Back-propagation algorithm. Starts from the end and works forward so that each new calculation depends only on its local information plus values that have already been calculated.
    • Overfitting and under/over damping issues are also considerations.
  • Scrum meeting
  • Remember to bring a keyboard tomorrow!!!!
  • Checking that my home dev code is the same as what I pulled down from the repository
    • No change in definitelytyped
    • No change in the other files either, so those were real bugs. Don’t know why they didn’t get caught. But that means the repo is good and the bugs are fixed.
  • Validate that PHP runs and debugs in the new dev env. Done
  • Add a new test that inputs large numbers (thousands to millions) of unique ENTITY entries with small-ish star networks of partially shared URL entries. Time view retrieval times for SELECT COUNT(*) from tn_view_network_items WHERE network_id = 8;
    • Computer: 2008 Dell Precision M6300
    • System: Processor Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz, 2201 Mhz, 2 Core(s), 2 Logical Processor(s), Available Physical Memory 611 MB
    • 100 is 0.09 sec
    • 1000 is 0.14 sec
    • 10,000 is 0.84 sec
    • Using OpenOffice’s linear regression function, I get the equation t = 0.00007657x + 0.0733 with an R-squared of 0.99948.
    • That means counting 1,000,000 view entries should take roughly 77 seconds, as long as things don’t get IO-bound
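The same fit can be reproduced without OpenOffice; a quick ordinary-least-squares check in plain Python over the three timings above:

```python
# Ordinary least squares on the three COUNT(*) timings measured above.
xs = [100, 1000, 10000]   # rows in the view
ys = [0.09, 0.14, 0.84]   # seconds per query
n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
print(round(slope, 8), round(intercept, 4))      # ~7.658e-05 and ~0.0733
print(round(slope * 1_000_000 + intercept, 1))   # ~76.6 s for a million entries
```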
  • Got the PHP interpreter and debugger working. In this case, it was just refreshing in settings->languages->php

Phil 12.1.15

7:30 – 5:00

  • Learning: Identification Trees, Disorder
    • Trees of tests
    • Identification Tree (Not a decision tree!)
    • Measuring Disorder – lowest disorder is best test
      • Disorder(set of binaries) = -(pos/total)*log2(pos/total) – (neg/total)*log2(neg/total)
        • Is the log base (2) related to the set being binary?
        • Add up the disorder of each branch of the test, weighted by the number of samples in that branch, to get the disorder of the test. The lowest-disorder test wins
  • Bringing in my machine learning, pattern recognition and stats books.
  • Bringing in my big laptop
  • Setting up dev environment.
    • Using the new IDEA 15.x, which seems to be OK for the typescript, will check PHP tomorrow.
    • Installed grunt (grunt-global, then grunt-local from the makefiles)
    • installed typescript (npm i -g typescript)
    • Installed GnuWin32, which has make and touch support, along with all the important DLLs. It turns out that there is also a GnuWin64; will use that next time
    • Fixed bugs that didn’t get caught before. Older compiler?
      • commented out the waa.d.ts from the three.d.ts definitelytyped file
      • deleted the { antialias: boolean; alpha: boolean; } args from the CanvasRenderer call in classes/WebGlCanvasClasses
      • added title?:string and assoc_name?:string to IPostObject in RssController
      • had to add the experiments/wglcharts2 folder to the xampp Apache htdocs
      • added word?:string to IPostObj in RssAppDirectives
      • added word_type_name?:string to IPostObj in RssAppDirectives
      • fixed the font calls in WebGl3dCharts IComponentConfig.
    • Since these issues really shouldn’t have happened, I’m going to verify that they are not in my home dev environment before checking in.
  • And the new computer arrived, so I get to do some of the install tomorrow.

Phil 11.26.15

7:00 – Leave

  • Constraints: Visual Object Recognition
    • To see if two signals match, a maximizing function that integrates the overlap of the signals with respect to offsets (translation and rotation) works very well, even with noise.
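That matching trick, sliding one signal against the other and keeping the offset that maximizes the integrated overlap, is essentially cross-correlation. A sketch over translation only (rotation omitted); the signals and the deterministic “noise” are made up:

```python
def best_offset(a, b, max_shift):
    """Return the shift maximizing sum(a[i] * b[i + shift]), i.e. how far
    b's content is translated relative to a's."""
    best, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        score = sum(a[i] * b[i + shift]
                    for i in range(len(a))
                    if 0 <= i + shift < len(b))
        if score > best_score:
            best, best_score = shift, score
    return best

# A bump centered at index 10, and the same bump translated to index 14
# on top of small deterministic "noise".
a = [0.0] * 30
for i, v in zip(range(9, 12), (0.5, 1.0, 0.5)):
    a[i] = v
b = [0.02 * (i % 3 - 1) for i in range(30)]
for i, v in zip(range(13, 16), (0.5, 1.0, 0.5)):
    b[i] += v
print(best_offset(a, b, max_shift=8))  # recovers the translation of 4
```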
  • Dictionary
    • Add ‘Help Choose Doctor’, ‘Help Choose Investments’, ‘Help Choose Healthcare Plan’, ‘Navigate News’ and ‘Help Find CHI Paper’ dictionaries. At this point they can be empty. We’ll talk about them in the paper.
    • Added ‘archive’ to dictionary, because we’ll need temporary dicts associated with users like networks.
    • Deploy new system. Done!
      • Reloaded the DB
      • Copied over the server code
      • Ran the simpleTests() for AlchemyDictText. That adds network[5] with tests against the words that are in my manual resume dictionary. Then network[2] is added with no dictionary.
      • Commented out simpleTests for AlchemyDictText
      • copied over all the new client code
      • Ran the client and verified that all the networks and dictionaries were there as they were supposed to be.
      • Loaded network[2] ‘Using extracted dict’
      • Selected the empty dictionary[2] ‘Phil’s extracted resume dict’
      • Ran Extract from Network, which is faster on Dreamhost! That populated the dictionary.
      • Deleted the entry for ‘3’
      • Ran Attach to Network. Also fast 🙂
  • And now time for Thanksgiving. On a really good note!


Phil 11.25.15

7:00 – 1:00 Leave

  • Constraints: Search, Domain Reduction
    • Order from most constrained to least.
    • For a constrained problem, check over- and under-allocations to see where the gap between fast failure and fast completion lies.
    • Only recurse through neighbors whose domain (set of choices) has been reduced to 1.
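The recursion rule above, propagate only from domains that have just shrunk to a single choice, sketched on a made-up map-coloring problem (the map, colors, and function names are all illustrative):

```python
def propagate(domains, neighbors, start):
    """Remove a committed value from neighbor domains; keep propagating only
    from domains that just shrank to exactly one choice.
    Returns False if any domain is wiped out."""
    queue = [start]
    while queue:
        node = queue.pop()
        (value,) = domains[node]            # node is a singleton by construction
        for other in neighbors[node]:
            if value in domains[other]:
                domains[other] = domains[other] - {value}
                if not domains[other]:
                    return False            # dead end: a domain emptied
                if len(domains[other]) == 1:
                    queue.append(other)     # only singletons are worth recursing on
    return True

# Toy map: a triangle A-B-C plus a leaf D hanging off C.
neighbors = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
domains = {n: {"red", "green", "blue"} for n in neighbors}
domains["A"] = {"red"}                      # commit A = red, then propagate
ok = propagate(domains, neighbors, "A")
print(ok, sorted(domains["B"]))             # B loses red; D is untouched
```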
  • Dictionary
    • Add an optional ‘source_text’ field to the tn_dictionaries table so that user added words can be compared to the text. Done. There is the issue that the dictionary could be used against a different corpus, at which point this would be little more than a creation artifact
    • Add a ‘source_count’ to the tn_dictionary_entries table that is shown in the directive. Defaults to zero? Done. Same issue as above, when compared to a new corpus, do we recompute the counts?
    • Wire up Attach Dictionary to Network
      • Working on AlchemyDictReflect that will place keywords in the tn_items table and connect them in the tn_associations table.
      • Had to add a few helper methods in networkDbIo.php to handle the modifying of the network tables, since alchemyNLPbase doesn’t extend baseBdIo. Not the cleanest thing I’ve ever done, but not *horrible*.
      • Done and working! Need to deploy.

Phil 11.24.15

7:00 – Leave

  • Constraints: Interpreting Line Drawings
    • Successful research:
      • Finds a problem
      • Finds a method that solves the problem
      • Using some principal (That can be generalized)
  • Gave Aaron M. A subversion account and sent him a description of the structure of the project
  • Back to dictionary creation
    • Wire up Extract into Dictionary
      • I think I’m going to do most of this on the server. If I do a select text from tn_view_network_items where network = X, then I can run that text that is already in the DB through the term extractor, which should be the fastest thing I can do.
      • The next fastest thing would be to pull the text from the url (if it exists) and add that to the text pull.
      • Added a getTextFromNetwork() method to NetworkDbObject.
      • The html was getting extracted badly, so I had to add a call to alchemy to return the cleaned text. TODO: in the future add a ‘clean_text’ column to tn_items so this is done on ingestion. I also added
      • Added all the pieces to the rssPull.php file and tested. And integrated with the client. Looks like it takes about 8 seconds to go through my resume, so some offline processing will probably be needed for ACM papers, for example.
    • Wire up Attach Dictionary to Network
      • The current setup is set so that a new item that is read in will associate with the current network dictionary. Need to add a way to have the items that are already in the network to check themselves against the new dictionary.
      • Added class AlchemyDictReflect that will place keywords in the DB. Still need to debug. And don’t forget that the controller will have to reload the network after all the changes are made.


Phil 10.30.15

8:00 – 4:00 SR

  • Working from home today, waiting for people to show up.
  • Here’s the fix for the Reqonciler issue:
    • Open Reqonciler in your browser.
    • Click the Post-Processing button to see all queries
    • Double-click the one that you disabled this morning to edit it (Order 2100); update month 1, year 2 to 100% from month 12, year 1
    • Add " AND NOT ISNULL(bc.uid)" at the end of the query, without the double quotes. Make sure there is a space before it.
    • Save, run, and check the data
  • In the process of getting my home dev environment working again. I swear I should just do this once a week so it’s less stressful.
    • Fixed the Imagick load so that there is a test for the extension and whether the extension is installed correctly.
    • Disabled the world wide web service so that apache could run on port 80
    • Updated all the files in the Apache htdocs directory. Forgot that I had updated the server access methods to take an object.
    • It occurs to me that I can load up the DB directly on the server if I don’t get everything done with the dictionary by Wednesday.
  • Examine AlchemyNLP and see if there is a hierarchy that can be used. Not without a lot of work.
  • Buy and download the fivefilters term extractor and see how to integrate.
    • Ordered. Waiting for confirmation to show up.
    • Installed. Time to see if it’ll work. It looks good, though possibly slow? Starting to put together a dictionary class to examine more deeply.
  • Add dictionary Flyout directive
    • Name the dictionary
    • Choose the networks (add/remove from list)
    • Input html, text or url
    • Get the clean text and show the machine extracted terms. We could look up potential definitions too – from wordnik. Set up an account and applied for a developer key.
    • Show a list of selected terms with checkboxes
      • Checked items can be deleted or grouped
      • Items can be added by typing into a field
    • Show a list of ‘group items’. This displays a list of the items whose index appears in the ‘parent’ field
      • Selecting an item in this list reorders the item list to show the appropriate group first
  • There should also be a select dictionary option on the network flyout