Phil 4.20.18

7:00 – ASRC MKT

  • Executing gradient descent on the earth
    • But the important question is: how well does gradient descent perform on the actual earth?
    • This is nice, because it suggests that we can compare GD algorithms on recognizable and visualizable terrains. Terrain locations can have multiple visualizable factors; height and luminance could serve as additional dimensions.
  • Minds is the anti-facebook that pays you for your time
    • In a refreshing change from Facebook, Twitter, Instagram, and the rest of the major platforms, Minds has also retained a strictly reverse-chronological timeline. The core of the Minds experience, though, is that users receive “tokens” when others interact with their posts, or simply by spending time on the platform.
  • Continuing along with the Angular/PHP tutorial here. Nicely, there is also a Git repo
    • Had to add some styling to get the upload button to show
    • The HttpModule is deprecated, but sticking with it for now
    • Will need to connect/verify PHP server within IntelliJ, described here.
    • How to connect Apache to IntelliJ
  • Installing and Configuring XAMPP with PhpStorm IDE. Don’t forget about deployment path: deploy
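The “gradient descent on the earth” idea can be tried on any height function that can be sampled at arbitrary points. The sketch below is plain gradient descent with a central-difference gradient. It assumes a hypothetical smooth terrain passed in as a `DoubleBinaryOperator`; real elevation data would need interpolation between grid cells, and the step size and stopping threshold here are arbitrary choices.

```java
import java.util.function.DoubleBinaryOperator;

/** Plain gradient descent over a 2D "terrain" height function (illustrative sketch). */
final class TerrainDescent {
    private TerrainDescent() {}

    /** Central-difference estimate of the gradient of h at (x, y). */
    static double[] gradient(DoubleBinaryOperator h, double x, double y, double eps) {
        double dx = (h.applyAsDouble(x + eps, y) - h.applyAsDouble(x - eps, y)) / (2 * eps);
        double dy = (h.applyAsDouble(x, y + eps) - h.applyAsDouble(x, y - eps)) / (2 * eps);
        return new double[] {dx, dy};
    }

    /** Walks downhill from (x0, y0) for at most maxSteps, stopping early on flat ground. */
    static double[] descend(DoubleBinaryOperator h, double x0, double y0,
                            double learningRate, int maxSteps) {
        double x = x0;
        double y = y0;
        for (int i = 0; i < maxSteps; i++) {
            double[] g = gradient(h, x, y, 1e-6);
            if (Math.hypot(g[0], g[1]) < 1e-8) {
                break; // a pit, saddle, or plateau: no downhill direction left
            }
            x -= learningRate * g[0];
            y -= learningRate * g[1];
        }
        return new double[] {x, y};
    }
}
```

On a bowl such as `(x, y) -> (x - 3) * (x - 3) + (y + 1) * (y + 1) + 10`, starting at (0, 0) with a rate of 0.1, it ends up very close to (3, -1). On real terrain it would settle in whichever basin it starts above, which is the kind of behavior a terrain-based comparison of GD variants would make visible.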

Phil 4.12.18

7:00 – 5:00 ASRC MKT/BD

  • Downloaded my FB DB today. Honestly, the only thing that seems excessive is the contact information
  • Interactive Semantic Alignment Model: Social Influence and Local Transmission Bottleneck
    • Dariusz Kalociński
    • Marcin Mostowski
    • Nina Gierasimczuk
    • We provide a computational model of semantic alignment among communicating agents constrained by social and cognitive pressures. We use our model to analyze the effects of social stratification and a local transmission bottleneck on the coordination of meaning in isolated dyads. The analysis suggests that the traditional approach to learning—understood as inferring prescribed meaning from observations—can be viewed as a special case of semantic alignment, manifesting itself in the behaviour of socially imbalanced dyads put under mild pressure of a local transmission bottleneck. Other parametrizations of the model yield different long-term effects, including lack of convergence or convergence on simple meanings only.
  • Starting to get back to the JuryRoom app. I need a better way to get the data parts up and running. This tutorial seems to have a minimal piece that works with PHP. That may be for the best since this looks like a solo effort for the foreseeable future
  • Proposal
    • Cut implementation down to proof-of-concept?
    • We are keeping the ASRC format
    • Got Dr. Lee’s contribution
    • And a lot of writing and figuring out of things

Phil 11.3.17

7:00 – ASRC MKT

  • Good comments from Cindy on yesterday’s work
  • Facebook’s 2016 Election Team Gave Advertisers A Blueprint To A Divided US
  • Some flocking activity? AntifaNov4
  • I realized that I had not added the herding variables to the Excel output. Fixed.
  • DINH Q. LÊ: South China Sea Pishkun
    • In his new work, South China Sea Pishkun, Dinh Q. Lê references the horrifying events that occurred on April 30th 1975 (the day Saigon fell) as hundreds of thousands of people tried to flee Saigon from the encroaching North Vietnamese Army and Viet Cong. The mass exodus was a “Pishkun” a term used to describe the way in which the Blackfoot American Indians would drive roaming buffalo off cliffs in what is known as a buffalo jump.
  • Back to writing – got some done, mostly editing.
  • Stochastic gradient descent with momentum
  • Referred to in this: There’s No Fire Alarm for Artificial General Intelligence
    •  AlphaGo did look like a product of relatively general insights and techniques being turned on the special case of Go, in a way that Deep Blue wasn’t. I also updated significantly on “The general learning capabilities of the human cortical algorithm are less impressive, less difficult to capture with a ton of gradient descent and a zillion GPUs, than I thought,” because if there were anywhere we expected an impressive hard-to-match highly-natural-selected but-still-general cortical algorithm to come into play, it would be in humans playing Go.
  • In another article: The AI Alignment Problem: Why It’s Hard, and Where to Start
    • This is where we are on most of the AI alignment problems, like if I ask you, “How do you build a friendly AI?” What stops you is not that you don’t have enough computing power. What stops you is that even if I handed you a hypercomputer, you still couldn’t write the Python program that if we just gave it enough memory would be a nice AI.
    • I think this is where models of flocking and “healthy group behaviors” matter. Exploring in small numbers is healthy – it defines the bounds of the problem space. Flocking is a good way to balance bounded trust and balanced awareness. Runaway echo chambers are very bad. These patterns are recognizable, regardless of whether they come from human, machine, or bison.
  • Added contacts and invites. I think the DB is ready: polarizationgameone
  • While out riding, I realized what I can do to show results in the herding paper. There are at least three ways to herd, plus a no-herding baseline:
    1. No herding
    2. Take the average of the herd
    3. Weight a random agent
    4. Weight random agents (randomly select an agent and leave it that way for a few cycles, then switch)
  • Look at how long each of these takes to converge and see which one is best. Also look at the DTW distances to see whether they would produce different populations.
  • Then redo the above for the inverted two-population case (max polarization)
  • Started to put in the code changes for the above. There is now a combobox for herding with the above options.
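Comparing the herding options by their convergence curves is where DTW comes in. Below is a minimal sketch of the textbook dynamic time warping distance, assuming each run has been reduced to a one-dimensional time series (for example, a hypothetical per-step mean distance to the herd centroid). It is not the simulation's actual code.

```java
import java.util.Arrays;

/** Classic O(n*m) dynamic time warping distance between two time series (illustrative sketch). */
final class Dtw {
    private Dtw() {}

    static double distance(double[] a, double[] b) {
        int n = a.length;
        int m = b.length;
        // cost[i][j] = cheapest alignment of the first i samples of a with the first j of b
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double d = Math.abs(a[i - 1] - b[j - 1]); // local distance between samples
                // Extend the cheapest of: diagonal match, stretch a, or stretch b
                cost[i][j] = d + Math.min(cost[i - 1][j - 1],
                                 Math.min(cost[i - 1][j], cost[i][j - 1]));
            }
        }
        return cost[n][m];
    }
}
```

Because DTW lets one series stretch against the other, two strategies that converge to the same shape at different speeds score as close. A large distance relative to the spread between runs of the same strategy might hint at genuinely different population dynamics.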

Phil 11.1.17

7:00 – ASRC MKT

    • The identity of the machine is just as important as the identity of the human, argues Jeff Hudson.
    • Agent-based simulation for economics: The Tool Central Bankers Need Most Now
    • Introducing Vega-Lite 2.0 (from MIT Interactive Data Lab)
      • Vega-Lite enables concise descriptions of visualizations as a set of encodings that map data fields to the properties of graphical marks. Vega-Lite uses a portable JSON format that compiles to full specifications in the larger Vega language. Vega-Lite includes support for data transformations such as aggregation, binning, filtering, and sorting, as well as visual transformations such as stacking and faceting into small multiples.
    • Wayne says ‘awareness’ is too overloaded, at least in CSCW where it means ‘a shared awareness’. What about alertness, cognition, or perception?
    • Started Simulating Flocking and Herding in Belief Space. Shared with Wayne, Aaron and Cindy
    • Yay, finally got the array problems solved. The problem is that a PHP array is actually an ordered map, so its keys aren’t guaranteed to run from 0 to n-1. But you can reindex any array into a zero-indexed list using array_values(). So now all my arrays begin at zero, as God intended.
    • Meeting with the lads. Some really good stuff.
      • Add tmanage
        • dungeon_master
        • game
        • scenario
        • min_players
        • max_players
        • time_to_live
        • state (waiting, running, timeout, terminated, success)
        • open (true/false)
        • visible
      • Add trating
        • target_message
        • relevance
        • quality
        • vote
        • rating_player
      • Add ttopics
        • title
        • description
        • parent
      • Add tplayerstate
        • player
        • game
        • state (waiting, playing, finished, terminated)
      • Add tcontact
        • player
        • name
        • email
        • facebook (oAuth)
        • google (oAuth)
      • Add tinvite
        • contact
        • game
        • player
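The tmanage and tplayerstate fields above imply a small state machine. Here is a minimal in-memory sketch, assuming hypothetical rules that the notes don't spell out: only open, waiting games accept players, a game starts once it reaches min_players, and an unfinished game moves to TIMEOUT after time_to_live.

```java
import java.time.Duration;
import java.time.Instant;

/** Values of tmanage.state. */
enum GameState { WAITING, RUNNING, TIMEOUT, TERMINATED, SUCCESS }

/** Values of tplayerstate.state (listed for completeness). */
enum PlayerState { WAITING, PLAYING, FINISHED, TERMINATED }

/** Hypothetical in-memory mirror of one tmanage row; the rules below are assumptions. */
final class ManagedGame {
    private final int minPlayers;
    private final int maxPlayers;
    private final Instant expiresAt; // creation time + time_to_live
    private final boolean open;
    private GameState state = GameState.WAITING;
    private int players;

    ManagedGame(int minPlayers, int maxPlayers, Instant created, Duration timeToLive, boolean open) {
        this.minPlayers = minPlayers;
        this.maxPlayers = maxPlayers;
        this.expiresAt = created.plus(timeToLive);
        this.open = open;
    }

    GameState state() {
        return state;
    }

    /** Adds a player if the game is open, still waiting, and has a free seat. */
    boolean join() {
        if (!open || state != GameState.WAITING || players >= maxPlayers) {
            return false;
        }
        players++;
        return true;
    }

    /** Starts the game once at least min_players have joined. */
    boolean start() {
        if (state != GameState.WAITING || players < minPlayers) {
            return false;
        }
        state = GameState.RUNNING;
        return true;
    }

    /** Moves an unfinished game past its time_to_live into TIMEOUT. */
    void checkExpiry(Instant now) {
        boolean unfinished = state == GameState.WAITING || state == GameState.RUNNING;
        if (unfinished && now.isAfter(expiresAt)) {
            state = GameState.TIMEOUT;
        }
    }
}
```

Keeping the allowed transitions in one place like this would make it easier to mirror the same checks in whatever server code writes the tmanage row.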

 

  • Humans + Machines (CNAS livestream)
    12:30 – 1:35 PM
    Dr. Jeff Clune, Assistant Professor of Computer Science, University of Wyoming
    Kimberly Jackson Ryan, Senior Human Systems Engineer, Draper Laboratory
    Dr. John Hawley, Engineering Psychologist, Army Research Laboratory
    Dr. Caitlin Surakitbanharn, Research Scientist, Purdue University
    Dan Lamothe, National Security Writer, The Washington Post (moderator)

Phil 8.16.17

7:00 – 8:00 Research

  • Added takeaway thoughts to my C&C writeup.
  • Working out how to add capability to the sim for P&RCH paper. My thoughts from vacation:
    • The agent’s contribution is its heading and speed
    • The UI is what the agents can ‘see’
    • The IR is what is available to be seen
    • An additional part might be to add the ability to store data in the space. Then the behavior of the IR (e.g. empty areas) would be more apparent, as would the effects of the UI (only certain data is visible, or maybe only nearby data is visible). Data could be a vector field in Hilbert space, visualized as color.
  • Updated IntelliJ
  • Working out how to have a voxel space for the agents to move through that can also be drawn. It can have any number of dimensions, but it has to project to 2D. In the case of the agents, I just choose the first two axes. Each agent has an array of statements that are assembled into a belief vector. The space can be an array of beliefs. Are these just constructed so that they fill a space according to a set of rules? Then the xDimensionName and yDimensionName axes would go from (0, 1), which would scale to the stage size? IR would still be a matter of comparing the space to the agent’s vector. Hmm.
  • This looks really good from an information horizon perspective: The Role of the Information Environment in Partisan Voting
    • Voters are often highly dependent on partisanship to structure their preferences toward political candidates and policy proposals. What conditions enable partisan cues to “dominate” public opinion? Here I theorize that variation in voters’ reliance on partisanship results, in part, from the opportunities their environment provides to learn about politics. A conjoint experiment and an observational study of voting in congressional elections both support the expectation that more detailed information environments reduce the role of partisanship in candidate choice
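The projection and “only nearby data is visible” ideas from the sim notes can be sketched together. This assumes belief vectors with components in [0, 1], picks two axes for the stage (as the notes do with the first two), and models a hypothetical UI as a visibility radius measured in the full belief space rather than in the projected plane. The class and method names are illustrative, not from the sim.

```java
import java.util.ArrayList;
import java.util.List;

/** Projects N-dimensional belief vectors onto a 2D stage (illustrative sketch). */
final class BeliefProjection {
    private BeliefProjection() {}

    /** Takes axes xAxis and yAxis of a belief vector in [0, 1] and scales them to stage units. */
    static double[] toStage(double[] belief, int xAxis, int yAxis,
                            double stageWidth, double stageHeight) {
        return new double[] {belief[xAxis] * stageWidth, belief[yAxis] * stageHeight};
    }

    /** Euclidean distance over all dimensions, not just the two that are drawn. */
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Hypothetical UI: an agent only "sees" data cells within radius of its own belief. */
    static List<double[]> visible(double[] agent, List<double[]> cells, double radius) {
        List<double[]> seen = new ArrayList<>();
        for (double[] cell : cells) {
            if (distance(agent, cell) <= radius) {
                seen.add(cell);
            }
        }
        return seen;
    }
}
```

Measuring visibility in the full space means two agents that overlap on screen can still be invisible to each other, which is one way the difference between the drawn projection and the underlying belief space could show up.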

9:00 – 5:00 BRI

  • Good lord, the BoA corporate card comes with SIX separate documents to read.
  • Onward to Chapter Three and Spring database interaction
  • Well that’s pretty clean. I do like the JdbcTemplate behaviors. Not sure I like the way you specify the values passed to the query, but I can’t think of anything better if you have more than one argument:
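    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.List;
    import javax.sql.DataSource;
    
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.jdbc.core.RowMapper;
    import org.springframework.stereotype.Repository;
    
    @Repository
    public class EmployeeDaoImpl implements EmployeeDao {
        @Autowired
        private DataSource dataSource;
    
        @Autowired
        private JdbcTemplate jdbcTemplate;
    
        // Maps one row of the Employee table onto an Employee object
        private RowMapper<Employee> employeeRowMapper = new RowMapper<Employee>() {
            @Override
            public Employee mapRow(ResultSet rs, int rowNum) throws SQLException {
                Employee employee = new EmployeeImpl();
                employee.setEmployeeAge(rs.getInt("Age"));
                employee.setEmployeeId(rs.getInt("ID"));
                employee.setEmployeeName(rs.getString("FirstName") + " " + rs.getString("LastName"));
                return employee;
            }
        };
    
        @Override
        public Employee getEmployeeById(int id) {
            // Each ? placeholder is bound, in order, to the values in the Object[].
            // queryForObject throws EmptyResultDataAccessException if no row matches.
            // Spring 5.3 deprecates this Object[] overload in favor of
            // queryForObject(sql, rowMapper, args...).
            return jdbcTemplate.queryForObject(
                    "select * from Employee where id = ?",
                    new Object[]{id},
                    employeeRowMapper
            );
        }
    
        public List<Employee> getAllEmployees() {
            return jdbcTemplate.query(
                    "select * from Employee",
                    employeeRowMapper
            );
        }
    }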
  • Here’s the xml to wire the thing up:
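    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:context="http://www.springframework.org/schema/context"
           xsi:schemaLocation="http://www.springframework.org/schema/beans
               http://www.springframework.org/schema/beans/spring-beans.xsd
               http://www.springframework.org/schema/context
               http://www.springframework.org/schema/context/spring-context.xsd">
        <!-- Registers @Repository classes in this package, e.g. EmployeeDaoImpl as "employeeDaoImpl" -->
        <context:component-scan base-package="org.springframework.chapter3.dao"/>
        <!-- Defines a second EmployeeDaoImpl bean, named "employeeDao", alongside the scanned one -->
        <bean id="employeeDao" class="org.springframework.chapter3.dao.EmployeeDaoImpl"/>
    
        <bean id="dataSource"
              class="org.springframework.jdbc.datasource.DriverManagerDataSource">
            <property name="driverClassName" value="${jdbc.driverClassName}" />
            <property name="url" value="${jdbc.url}" />
            <property name="username" value="xxx"/>
            <property name="password" value="yyy"/>
        </bean>
    
        <bean id="jdbcTemplate" class="org.springframework.jdbc.core.JdbcTemplate">
            <property name="dataSource" ref="dataSource" />
        </bean>
    
        <!-- Resolves the ${jdbc.*} placeholders above from jdbc.properties -->
        <context:property-placeholder location="jdbc.properties" />
    </beans>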
  • And here’s the properties. Note that I had to disable SSL:
    jdbc.driverClassName=com.mysql.jdbc.Driver
    jdbc.url=jdbc:mysql://localhost:3306/sandbox?autoReconnect=true&useSSL=false

Phil 4.5.16

7:00 – 4:30 VTX

  • Had a good discussion with Patrick yesterday. He’s approaching his wheelchair work from a Heideggerian framework, where the controls may be present-at-hand or ready-to-hand. I think those might be frameworks that apply to non-social systems (Hammers, Excel, Search), while social systems align more with being-with. The evaluation of trustworthiness is different. True in a non-social sense is a property of exactness; a straightedge may be true or out-of-true. In a social sense, true is associated with a statement that is in accordance with reality.
  • While reading Search Engine Agendas in Communications of the ACM, I came upon a mention of Frank Pasquale, who wrote an article on the regulation of Search, given its impact (Federal Search Commission? Access, Fairness, and Accountability in the Law of Search). The point of Search Engine Agendas is that the ranking of political candidates affects people’s perception of them (higher is better). This ties into my thoughts from March 29th: there are situations where the idea of ordering among pertinent documents may be problematic, and how users might interact with the ordering process might be instructive.
  • Continuing Technology, Humanness, and Trust: Rethinking Trust in Technology.
  • ————————
  • Added the sites Andy and Margarita found to the blacklist and updated the repo
  • Theresa has some sites too – in process.
  • Finished my refactoring party – more debugging than I was expecting
  • Converted the Excel spreadsheet to JSON and read the whole thing in. Need to do that just for a subsample now.
  • Added a request from Andy about creating a JSON object for the comments in the flag dismissal field.
  • Worked with Gregg on setting up the postgres db.

Phil 2.8.16

7:00 – 5:00 VTX

  • My 401k still isn’t being done right. Sheesh.
  • More Publius: A robust, tamper-evident, censorship-resistant web publishing system
    • Very good introduction, then it dives into the weeds of how the system was implemented and the cryptologic challenges. Good stuff, and it should be addressed. It does imply that the information stored in my system could be encrypted and sharded as an additional layer of protection against malicious editing. This matters because text can have annotations pointing to it, but the source itself should be archival.
    • I think I also need to set up a new doc db of news items that I can use to make the story more readable.
      • Stories of people fooled by misinformation
      • Stories of people damaged by lack of anonymity
      • Stories about citizen journalism
      • Stories about computational journalism
      • Something about CSCW, Wikipedia maybe?
    • Anderson’s Eternity Service?
  • Need to make the ProviderObject persistent. Done
  • Need a rating object – date, who, the rating, anything else? Done-ish
  • Need to make a quick & dirty Swing app for people to use – started. Once that’s working, then build the rating object that it will create
  • Need to connect to a remote DB
    • Will also need summary statistics and charts to see how queries do.
    • Will also need to store the good (“match” and “flaggable”) pages for later training.
  • Should make the app stand-alone-ish. JSmooth?
  • Discussion with Mike G., Heath, Bob H., and Theresa on how to integrate current NLP/NER
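The rating object and the “summary statistics to see how queries do” items fit together naturally. This is a minimal sketch, assuming the fields named in the notes (date, who, the rating) plus a hypothetical `query` field linking each rating to the query that produced the rated page, and a hypothetical integer score scale.

```java
import java.time.LocalDate;
import java.util.DoubleSummaryStatistics;
import java.util.List;

/** One user's rating of a returned page (fields beyond date/who/score are assumptions). */
final class Rating {
    final LocalDate date;
    final String who;
    final String query; // hypothetical: the query whose result was rated
    final int score;    // hypothetical scale, e.g. 1..5

    Rating(LocalDate date, String who, String query, int score) {
        this.date = date;
        this.who = who;
        this.query = query;
        this.score = score;
    }

    /** Count, min, max, and mean of the scores for a single query. */
    static DoubleSummaryStatistics summarize(List<Rating> ratings, String query) {
        return ratings.stream()
                .filter(r -> r.query.equals(query))
                .mapToDouble(r -> r.score)
                .summaryStatistics();
    }
}
```

`DoubleSummaryStatistics` gives count, min, max, and average in one pass, which is likely enough for a first chart of per-query quality before anything fancier.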

Phil 1.12.16

7:00 – 4:00 VTX

  • So I ask myself, is there some kind of public repository of crawled data? Why, of course there is! Common Crawl. So there is a way of getting the deep link structure for a given site without crawling it. That could give me the ability to determine how ‘bubbly’ a site is. I’m thinking there may be a ratio of bidirectional to unidirectional links (per site?) that could help here.
  • More lit review and integration.
  • Making diagrams for the Sprint review today
    • Overview
      • The purpose of this effort is to provide a capability for the system to do more sophisticated queries that do several things
        • Allow the user to emphasize/de-emphasize words or phrases that relate to the particular search and to do this interactively based on linguistic analysis of the returned text.
        • Get user value judgments on the information provided based on the link results reordering
        • Use this to feed back to the selection criteria for provider Flags.
      • This work leans on the paper PageRank without Hyperlinks if you want more background/depth.
    • Eiphcone 129 – Design database table schema.
      • Took my existing MySQL db schema and migrated it to Java Persistence API (JPA) entities. Basically this meant taking a db that was designed for precompiled query access and retrieval (direct data access for adding data, views for retrieval) and restructuring it. So we go from the original tables (figure: beforeTables) to the restructured tables (figure: afterTables).
      • The classes are annotated POJOs in a simple hierarchy. I expect the classes that have ‘Base’ in their names to be extended, though there may be enough capability here already. GuidBase has some additional capability so that when data is added to one class that has a relation to another class, both sides get filled out properly (figure: JavaClassHierarchy). Since multiple dictionary entries can be present in multiple corpora, BaseDictionaryEntry and Corpus both have a Set<BaseEntryContext> that connects the corpora and entries with additional information that might be useful, such as counts.
      • This manifests itself in the database as follows (figure: ER Diagram). It’s not the prettiest drawing, but I can’t get IntelliJ to draw any better. You can see that the tables match directly to the classes. I used the InheritanceType.JOINED strategy since Jeremy was concerned about wasted space in the tables.
      • The next steps will be to start to create test cases that allow for tuning and testing of this setup at different data scales.
    • Eiphcone 132 – Document current progress on relationship/taxonomy design & existing threat model
      • Currently, a threat is extracted by comparing a set of known entities to surrounding text for keywords. In the model shown above, practitioners would exist in a network that includes items like the practice, attending hospitals, legal representation, etc. Because of this relationship, flags could be extended to the other members of the network. If a near neighbor in this network has a Flag attached, it will weight the surrounding edges and influence the practitioner. So if one doctor in a practice is convicted of malpractice, then other doctors in the practice will get lower scores.
      • The dictionary and corpus can interact as their own network to determine the amount of weight that is given to a particular score. For example, words in a dictionary that are used to extract data from a legal corpus may carry more weight than the same words used on a social media corpus.
    • Eiphcone 134 – Design/document NER processing in relation to future taxonomy
      • I compiled and ran the NER codebase and also walked through the Stanford NLP documentation. The current NER system looks to be somewhat basic, but solid and usable. Using it to populate the dictionaries and annotate the corpus appears to be a straightforward addition to the capabilities already present in the Stanford API.
    • Demo – I don’t really have a demo, unless people want to see some tests compile and run. To save time, I have this exciting printout that shows the return of dynamically created data:
[EL Info]: 2016-01-12 14:09:40.481--ServerSession(1842102517)--EclipseLink, version: Eclipse Persistence Services - 2.6.1.v20150916-55dc7c3
[EL Info]: connection: 2016-01-12 14:09:40.825--ServerSession(1842102517)--/file:/C:/Development/Sandboxes/JPA_2_1/out/production/JPA_2_1/_NetworkService login successful

Users
firstName(firstname_0), lastName(lastname_0), login(login_0), networks( network_0)
firstName(firstname_1), lastName(lastname_1), login(login_1), networks( network_4)
firstName(firstname_2), lastName(lastname_2), login(login_2), networks( network_3)
firstName(firstname_3), lastName(lastname_3), login(login_3), networks( network_1 network_2)
firstName(firstname_4), lastName(lastname_4), login(login_4), networks()

Networks
name(network_0), owner(login_0), type(WAMPETER), archived(false), public(false), editable(true)
	[92]: name(DataNode_6_to_BaseNode_8), guid(network_0_DataNode_6_to_BaseNode_8), weight(0.5708945393562317), type(IDENTITY), network(network_0)
		Source: [86]: name('DataNode_6'), type(ENTITIES), annotation('annotation_6'), guid('50836752-221a-4095-b059-2055230d59db'), double(18.84955592153876), int(6), text('text_6')
		Target: [88]: name('BaseNode_8'), type(COMPUTED), annotation('annotation_8'), guid('77250282-3b5e-416e-a469-bbade10c5e88')
	[91]: name(BaseNode_5_to_UrlNode_4), guid(network_0_BaseNode_5_to_UrlNode_4), weight(0.3703539967536926), type(COMPUTED), network(network_0)
		Source: [85]: name('BaseNode_5'), type(RATING), annotation('annotation_5'), guid('bf28f478-626d-4e8f-9809-b4a37f2ad504')
		Target: [84]: name('UrlNode_4'), type(IDENTITY), annotation('annotation_4'), guid('bffe13ae-bb70-46a6-b1b4-9f58cadad04e'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')
	[98]: name(BaseNode_5_to_UrlNode_1), guid(network_0_BaseNode_5_to_UrlNode_1), weight(0.4556456208229065), type(ENTITIES), network(network_0)
		Source: [85]: name('BaseNode_5'), type(RATING), annotation('annotation_5'), guid('bf28f478-626d-4e8f-9809-b4a37f2ad504')
		Target: [81]: name('UrlNode_1'), type(UNKNOWN), annotation('annotation_1'), guid('f9693110-6b5b-4888-9585-99b97062a4e4'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')

name(network_1), owner(login_3), type(WAMPETER), archived(false), public(false), editable(true)
	[96]: name(BaseNode_2_to_UrlNode_1), guid(network_1_BaseNode_2_to_UrlNode_1), weight(0.5733484625816345), type(URL), network(network_1)
		Source: [82]: name('BaseNode_2'), type(ITEM), annotation('annotation_2'), guid('c5867557-2ac3-4337-be34-da9da0c7e25d')
		Target: [81]: name('UrlNode_1'), type(UNKNOWN), annotation('annotation_1'), guid('f9693110-6b5b-4888-9585-99b97062a4e4'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')
	[95]: name(DataNode_0_to_UrlNode_7), guid(network_1_DataNode_0_to_UrlNode_7), weight(0.85154128074646), type(MERGE), network(network_1)
		Source: [80]: name('DataNode_0'), type(USER), annotation('annotation_0'), guid('e9b7fa0a-37f1-41bd-a2c1-599841d1507a'), double(0.0), int(0), text('text_0')
		Target: [87]: name('UrlNode_7'), type(QUERY), annotation('annotation_7'), guid('b9351194-d10e-4f6a-b997-b84c61344fcf'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')
	[94]: name(DataNode_9_to_BaseNode_5), guid(network_1_DataNode_9_to_BaseNode_5), weight(0.72845458984375), type(KEYWORDS), network(network_1)
		Source: [89]: name('DataNode_9'), type(USER), annotation('annotation_9'), guid('5bdb67de-5319-42db-916e-c4050dc682dd'), double(28.274333882308138), int(9), text('text_9')
		Target: [85]: name('BaseNode_5'), type(RATING), annotation('annotation_5'), guid('bf28f478-626d-4e8f-9809-b4a37f2ad504')

name(network_2), owner(login_3), type(EXPLICIT), archived(false), public(false), editable(true)
	[90]: name(BaseNode_8_to_UrlNode_7), guid(network_2_BaseNode_8_to_UrlNode_7), weight(0.2619180679321289), type(WAMPETER), network(network_2)
		Source: [88]: name('BaseNode_8'), type(COMPUTED), annotation('annotation_8'), guid('77250282-3b5e-416e-a469-bbade10c5e88')
		Target: [87]: name('UrlNode_7'), type(QUERY), annotation('annotation_7'), guid('b9351194-d10e-4f6a-b997-b84c61344fcf'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')

name(network_3), owner(login_2), type(EXPLICIT), archived(false), public(false), editable(true)
	[93]: name(UrlNode_4_to_DataNode_3), guid(network_3_UrlNode_4_to_DataNode_3), weight(0.7689594030380249), type(ITEM), network(network_3)
		Source: [84]: name('UrlNode_4'), type(IDENTITY), annotation('annotation_4'), guid('bffe13ae-bb70-46a6-b1b4-9f58cadad04e'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')
		Target: [83]: name('DataNode_3'), type(UNKNOWN), annotation('annotation_3'), guid('e7565935-6429-451f-b7f4-cc2d612ca3fd'), double(9.42477796076938), int(3), text('text_3')
	[97]: name(DataNode_3_to_DataNode_0), guid(network_3_DataNode_3_to_DataNode_0), weight(0.5808262825012207), type(URL), network(network_3)
		Source: [83]: name('DataNode_3'), type(UNKNOWN), annotation('annotation_3'), guid('e7565935-6429-451f-b7f4-cc2d612ca3fd'), double(9.42477796076938), int(3), text('text_3')
		Target: [80]: name('DataNode_0'), type(USER), annotation('annotation_0'), guid('e9b7fa0a-37f1-41bd-a2c1-599841d1507a'), double(0.0), int(0), text('text_0')

name(network_4), owner(login_1), type(ITEM), archived(false), public(false), editable(true)
	[99]: name(UrlNode_4_to_UrlNode_7), guid(network_4_UrlNode_4_to_UrlNode_7), weight(0.48601675033569336), type(WAMPETER), network(network_4)
		Source: [84]: name('UrlNode_4'), type(IDENTITY), annotation('annotation_4'), guid('bffe13ae-bb70-46a6-b1b4-9f58cadad04e'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')
		Target: [87]: name('UrlNode_7'), type(QUERY), annotation('annotation_7'), guid('b9351194-d10e-4f6a-b997-b84c61344fcf'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')


Dictionaries
[30]: name(dictionary_0), guid(943ea8b6-6def-48ea-8b0f-a4e52e53954f), Owner(login_0), archived(false), public(false), editable(true)
	Entry = word_11
	Parent = word_10
	word_11 has 790 occurances in corpora0_chapter_1

	Entry = word_14
	word_14 has 4459 occurances in corpora1_chapter_2

	Entry = word_1
	Parent = word_0
	word_1 has 3490 occurances in corpora1_chapter_2

	Entry = word_10
	word_10 has 3009 occurances in corpora3_chapter_4

	Entry = word_4
	word_4 has 2681 occurances in corpora3_chapter_4

	Entry = word_5
	Parent = word_4
	word_5 has 5877 occurances in corpora1_chapter_2


[31]: name(dictionary_1), guid(c7b62a4b-b21a-4ebe-a939-0a71a891a3f9), Owner(login_0), archived(false), public(false), editable(true)
	Entry = word_3
	Parent = word_2
	word_3 has 4220 occurances in corpora0_chapter_1

	Entry = word_6
	word_6 has 4852 occurances in corpora2_chapter_3

	Entry = word_17
	Parent = word_16
	word_17 has 8394 occurances in corpora2_chapter_3

	Entry = word_2
	word_2 has 1218 occurances in corpora3_chapter_4

	Entry = word_19
	Parent = word_18
	word_19 has 8921 occurances in corpora2_chapter_3

	Entry = word_8
	word_8 has 4399 occurances in corpora3_chapter_4



Corpora
[27]: name(corpora1_chapter_2), guid(08803d93-deeb-4699-bdb2-ffa9f635c373), totalWords(1801), importer(login_1), url(http://americanliterature.com/author/herman-melville/book/moby-dick-or-the-whale/chapter-2-the-carpet-bag)
	word_15 has 5338 occurances in corpora1_chapter_2
	word_13 has 2181 occurances in corpora1_chapter_2
	word_14 has 4459 occurances in corpora1_chapter_2
	word_1 has 3490 occurances in corpora1_chapter_2
	word_5 has 5877 occurances in corpora1_chapter_2
	word_16 has 2625 occurances in corpora1_chapter_2

[EL Info]: connection: 2016-01-12 14:09:41.116--ServerSession(1842102517)--/file:/C:/Development/Sandboxes/JPA_2_1/out/production/JPA_2_1/_NetworkService logout successful
  • Sprint review delayed. Tomorrow
  • Filling in some knowledge holes in JPA. Finished Chapter 4.
  • Tried getting enumerated types to work. No luck…?
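Back to the Common Crawl idea at the top of this entry: the ratio of bidirectional to unidirectional links is easy to compute once a site's links are in hand. This sketch assumes the links have already been extracted (by whatever means) as from/to page identifiers; it does not use any Common Crawl API, and ignoring duplicates and self-links is an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Ratio of reciprocated to one-way links in a directed link graph (illustrative sketch). */
final class LinkReciprocity {
    private LinkReciprocity() {}

    /** Each link is {from, to}. Returns (bidirectional pairs) / (one-way links). */
    static double bidirectionalToUnidirectional(List<String[]> links) {
        Map<String, Set<String>> out = new HashMap<>();
        for (String[] link : links) {
            if (!link[0].equals(link[1])) { // skip self-links; the Set drops duplicates
                out.computeIfAbsent(link[0], k -> new HashSet<>()).add(link[1]);
            }
        }
        int reciprocatedEnds = 0; // each bidirectional pair is seen once from each end
        int unidirectional = 0;
        for (Map.Entry<String, Set<String>> e : out.entrySet()) {
            for (String to : e.getValue()) {
                Set<String> back = out.get(to);
                if (back != null && back.contains(e.getKey())) {
                    reciprocatedEnds++;
                } else {
                    unidirectional++;
                }
            }
        }
        int bidirectionalPairs = reciprocatedEnds / 2;
        if (unidirectional == 0) {
            return bidirectionalPairs == 0 ? 0.0 : Double.POSITIVE_INFINITY;
        }
        return (double) bidirectionalPairs / unidirectional;
    }
}
```

For A↔B plus A→C and C→D the result is 1 / 2 = 0.5. Whether a high ratio actually signals a “bubbly” site is the open question from the note; this only makes the measurement concrete.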

Phil 1.8.16

8:00 – 5:00

  • Today is Roy Batty’s Birthday
  • Had a thought this morning. Rather than just having anonymous people post what they think is newsworthy, have a Journalist chatbot (something as simple as Eliza could work) tease out more information. The pattern of response, possibly augmented by server pulls for additional information might get to some really interesting responses, and a lot more input from the user.
  • Ok, now that I’ve got the path information figured out, migrating to vanilla JPA.
  • Viewing the SQL requires a library-specific property, but everything else is vanilla. This gets the tables built:
    <persistence xmlns="http://xmlns.jcp.org/xml/ns/persistence" version="2.1">
        <persistence-unit name="NetworkService" transaction-type="RESOURCE_LOCAL">
            <class>com.philfeldman.mappings.GuidBase</class>
            <class>com.philfeldman.mappings.BaseAssociation</class>
            <class>com.philfeldman.mappings.BaseDictionary</class>
            <class>com.philfeldman.mappings.BaseDictionaryEntry</class>
            <class>com.philfeldman.mappings.BaseNetwork</class>
            <class>com.philfeldman.mappings.BaseNode</class>
            <class>com.philfeldman.mappings.BaseUser</class>
            <class>com.philfeldman.mappings.Corpus</class>
            <class>com.philfeldman.mappings.DataNode</class>
            <class>com.philfeldman.mappings.NetworkType</class>
            <class>com.philfeldman.mappings.UrlNode</class>
            <validation-mode>NONE</validation-mode>
            <properties>
                <property name="javax.persistence.jdbc.driver" value="com.mysql.jdbc.Driver"/>
                <property name="javax.persistence.jdbc.url" value="jdbc:mysql://localhost:3306/projpa"/>
                <property name="javax.persistence.jdbc.user" value="root"/>
                <property name="javax.persistence.jdbc.password" value="edge"/>
                <property name="javax.persistence.schema-generation.database.action" value="drop-and-create"/>
                <!-- enable this property to see SQL and other logging -->
                <property name="eclipselink.logging.level" value="FINE"/>
            </properties>
        </persistence-unit>
    </persistence>
  • Here’s a simple JPA commit:
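    public void addUsers(int num){
        em.getTransaction().begin();
        try {
            for(int i = 0; i < num; ++i) {
                BaseUser bu = new BaseUser("firstname_" + i, "lastname_" + i, "login_" + i, "password_" + i);
                em.persist(bu); // becomes managed; the INSERT is issued at flush/commit time
            }
            em.getTransaction().commit();
        } catch (RuntimeException e) {
            // Roll back the whole batch if any persist or the commit fails
            if (em.getTransaction().isActive()) {
                em.getTransaction().rollback();
            }
            throw e;
        }
    }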
  • Here’s a simple Criteria pull:
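    public void getAllUsers(){
        CriteriaBuilder cb = em.getCriteriaBuilder();
        CriteriaQuery<BaseUser> cq = cb.createQuery(BaseUser.class);
        // Declaring the root explicitly ("SELECT u FROM BaseUser u") keeps this portable
        // across JPA providers; Root is javax.persistence.criteria.Root
        Root<BaseUser> root = cq.from(BaseUser.class);
        cq.select(root);
        TypedQuery<BaseUser> tq = em.createQuery(cq);
        users = new ArrayList<>(tq.getResultList());
    }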
  • Here’s a more sophisticated query. This can be made much better easily, but that’s for next week.
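    System.out.println("\nDictionaries");
    // In LIKE, '_' matches any single character, so escape it to match a literal "_4".
    // '!' is used as the escape character because MySQL treats '\' specially in string literals.
    String jpql = "SELECT bd FROM dictionaries bd WHERE bd.owner.login LIKE :pattern ESCAPE '!'";
    TypedQuery<BaseDictionary> dictQuery = em.createQuery(jpql, BaseDictionary.class);
    dictQuery.setParameter("pattern", "%!_4%");
    List<BaseDictionary> bds = dictQuery.getResultList();
    for(BaseDictionary bd : bds){
        System.out.println(bd.toString());
    }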

Phil 12.11.15

8:00 – 5:00 VTX

  • No AI course this morning; had to drop off the car.
  • Some preliminary discussions about sprint planning with Aaron yesterday. Aside from getting the two ‘Derived’ database structures reconciled, I need to think about a few things:
    • Who the network ‘users’ are. I think it could be VTX, or system customers like Aetna.
    • What kinds of networks exist?
      • Each individual doctor is a network of doctors, keywords, entities, sources, threats and ratings. That can certainly run in the browser.
      • Then there is the larger network of ‘relevant’ doctors, certainly in the tens-to-hundreds range. At the lower end of that scale, it could be handled directly in the browser. For larger networks, we might have to use the GPU, which seems very doable via Steve Sanderson.
      • Then there is the master ranking, probably ordered from most threatening to least threatening. Queries with additional parameters would pull a subset of the ordered data (SELECT foo, bar from ?? ORDER BY eigenvalue). Interestingly, according to this IEEE article from 2010, GPU processing was handling 10 million nodes in about 30 seconds using optimized sparse matrix-vector multiplication (SpMV). So it’s conceivable that the calculations could be done in real time.
  • More documentation
  • More discussions with Aaron about where data lives and how it’s structured.
  • Sprint planning
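A master ranking "ORDER BY eigenvalue" reads like a ranking by eigenvector centrality. That is usually computed by power iteration, and the cost of each iteration is dominated by one SpMV. Below is a minimal single-threaded sketch. It assumes an undirected network with non-negative edge weights, stored as a compressed sparse row (CSR) adjacency matrix. `EigenRank` and its methods are hypothetical names. A GPU version would mainly replace the `spmv` loop.

```java
import java.util.Arrays;

/** Hypothetical sketch: eigenvector centrality by power iteration over a CSR matrix. */
final class EigenRank {
    // CSR storage: the non-zeros of row i live at positions rowPtr[i] .. rowPtr[i+1]-1.
    final int[] rowPtr;
    final int[] colIdx;
    final double[] vals;

    EigenRank(int[] rowPtr, int[] colIdx, double[] vals) {
        this.rowPtr = rowPtr;
        this.colIdx = colIdx;
        this.vals = vals;
    }

    /** Sparse matrix-vector multiply y = A * x, touching only stored entries. */
    double[] spmv(double[] x) {
        int n = rowPtr.length - 1;
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int k = rowPtr[i]; k < rowPtr[i + 1]; k++) {
                sum += vals[k] * x[colIdx[k]];
            }
            y[i] = sum;
        }
        return y;
    }

    /** Power iteration: x <- (A + I)x / ||(A + I)x|| until the vector stops changing. */
    double[] centrality(int maxIter, double tol) {
        int n = rowPtr.length - 1;
        double[] x = new double[n];
        Arrays.fill(x, 1.0 / Math.sqrt(n));
        for (int iter = 0; iter < maxIter; iter++) {
            double[] y = spmv(x);
            // Adding x (iterating with A + I) keeps the same eigenvectors but stops the
            // oscillation plain power iteration shows on bipartite graphs such as stars.
            for (int i = 0; i < n; i++) y[i] += x[i];
            double norm = 0.0;
            for (double v : y) norm += v * v;
            norm = Math.sqrt(norm);
            double diff = 0.0;
            for (int i = 0; i < n; i++) {
                y[i] /= norm;
                diff = Math.max(diff, Math.abs(y[i] - x[i]));
            }
            x = y;
            if (diff < tol) break;
        }
        return x;
    }

    public static void main(String[] args) {
        // Star network: node 0 (hub) linked to nodes 1, 2 and 3, stored both ways.
        EigenRank g = new EigenRank(
            new int[] {0, 3, 4, 5, 6},
            new int[] {1, 2, 3, 0, 0, 0},
            new double[] {1, 1, 1, 1, 1, 1});
        System.out.println(Arrays.toString(g.centrality(1000, 1e-12)));
    }
}
```

In the star example, the hub scores highest and the three leaves tie. Sorting on this score gives the most-to-least ordering the ranking needs.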

Phil 12.4.15

8:00 – VTX

  • Scrum
  • Found an interesting tidbit in the WaPo this morning. It implies that a pattern of statement, followed by a search for confirming information, followed by a public citation of that information, could be the basic unit of an information bubble. For this to be a bubble, I think the pertinent information extracted from the relevant search results would have to be somehow identifiable as a minority view. Could this be done by computing the Jaccard index between the adjusted results and the raw results of a search? In other words, if the world (the relevant search) has an overall vector in one direction, and the individual preferences produce a pertinent result that points in the opposite direction (a strongly negative dot product), is it more likely that those results came from echo-chamber processes?
  • If the Derived DB depends on analyst examination of the data, this could be a way of flagging analyst bias.
  • Researching WebScaleSQL, I stumbled on another DB from Facebook. This one, RocksDB, is more focused on speed. From the splash page:
    • RocksDB can be used by applications that need low latency database accesses. A user-facing application that stores the viewing history and state of users of a website can potentially store this content on RocksDB. A spam detection application that needs fast access to big data sets can use RocksDB. A graph-search query that needs to scan a data set in realtime can use RocksDB. RocksDB can be used to cache data from Hadoop, thereby allowing applications to query Hadoop data in realtime. A message-queue that supports a high number of inserts and deletes can use RocksDB.
  • Interestingly, RocksDB appears to have integration with MongoDB and is working on MySQL integration. Cassandra appears to be implementing similar optimizations.
  • Just discovered reported.ly, a reporter-curated news stream sourced from social media. Could be a good source of data to compare against news feeds from Google or major news outlets.
  • Control System Meeting
    • Send RCS and Search Competition to Bob
    • Seems like this whole system is a lot like what Databricks is doing?
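The two comparisons in the information-bubble note above can be sketched as below. This assumes search results are represented as sets of URLs for the Jaccard index, and as term-weight vectors for the direction comparison. `BubbleMetrics` and both method names are hypothetical. Cosine similarity normalizes the dot product, so "pointing in the opposite direction" shows up as a value near −1 regardless of vector length.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helpers for comparing adjusted search results with raw ones. */
final class BubbleMetrics {
    /** Jaccard index |A ∩ B| / |A ∪ B|: 1.0 for identical sets, 0.0 for disjoint ones. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    /** Cosine similarity: near +1 for the same direction, near -1 for opposite ones. */
    static double cosine(double[] u, double[] v) {
        if (u.length != v.length) throw new IllegalArgumentException("length mismatch");
        double dot = 0.0, nu = 0.0, nv = 0.0;
        for (int i = 0; i < u.length; i++) {
            dot += u[i] * v[i];
            nu += u[i] * u[i];
            nv += v[i] * v[i];
        }
        if (nu == 0.0 || nv == 0.0) return 0.0; // a zero vector has no direction
        return dot / (Math.sqrt(nu) * Math.sqrt(nv));
    }

    public static void main(String[] args) {
        Set<String> raw = Set.of("a.com", "b.com", "c.com");
        Set<String> adjusted = Set.of("b.com", "c.com", "d.com");
        System.out.println(jaccard(raw, adjusted));
        System.out.println(cosine(new double[] {1, 0.5}, new double[] {-1, -0.4}));
    }
}
```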

Phil 12.2.15

7:00 –

  • Learning: Neural Nets, Back Propagation
    • Synaptic weights are higher for some synapses than others
    • Cumulative stimulus
    • All-or-none threshold for propagation.
    • Once we have a model, we can ask what we can do with it.
    • Now I’m curious about the MIT approach to calculus. It’s online too: MIT 18.01 Single Variable Calculus
    • Back-propagation algorithm. Starts from the output and works backward, so that each new calculation depends only on its local information plus values that have already been calculated.
    • Overfitting and under/over damping issues are also considerations.
  • Scrum meeting
  • Remember to bring a keyboard tomorrow!!!!
  • Checking that my home dev code is the same as what I pulled down from the repository
    • No change in definitelytyped
    • No change in the other files either, so those were real bugs. Don’t know why they didn’t get caught. But that means the repo is good and the bugs are fixed.
  • Validate that PHP runs and debugs in the new dev env. Done
  • Add a new test that inputs large numbers (thousands to millions) of unique ENTITY entries with small-ish star networks of partially shared URL entries. Time the view retrieval for SELECT COUNT(*) from tn_view_network_items WHERE network_id = 8;
    • Computer: 2008 Dell Precision M6300
    • System: Processor Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz, 2201 Mhz, 2 Core(s), 2 Logical Processor(s), Available Physical Memory 611 MB
    • 100 is 0.09 sec
    • 1000 is 0.14 sec
    • 10,000 is 0.84 sec
    • Using Open Office’s linear regression function, I get the equation t = 0.00007657x + 0.0733 with an R squared of 0.99948.
    • That means 1,000,000 view entries can be processed in about 77 seconds, as long as things don’t get I/O bound.
  • Got the PHP interpreter and debugger working. In this case, it was just a matter of refreshing in Settings -> Languages -> PHP.
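To make the "start from the output and work backward" idea concrete, here is a minimal back-propagation sketch. It is not the lecture's code. It assumes a hypothetical network with one input, two sigmoid hidden units and one sigmoid output, trained on squared error; the parameter layout and names are assumptions. The backward pass computes the output unit's error term first and reuses it for the hidden layer. `main` then checks every gradient against a finite-difference estimate.

```java
/**
 * Hypothetical example network: 1 input, 2 sigmoid hidden units (a, b),
 * 1 sigmoid output, squared-error loss. Parameter layout (an assumption):
 * p = {w1a, w1b, b1a, b1b, w2a, w2b, b2}.
 */
final class TinyBackprop {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Forward pass only: E = 0.5 * (out - target)^2. */
    static double loss(double[] p, double x, double target) {
        double ha = sigmoid(p[0] * x + p[2]);
        double hb = sigmoid(p[1] * x + p[3]);
        double out = sigmoid(p[4] * ha + p[5] * hb + p[6]);
        return 0.5 * (out - target) * (out - target);
    }

    /** Back-propagation: dE/dp for every parameter. */
    static double[] gradient(double[] p, double x, double target) {
        // Forward pass, keeping the activations the backward pass reuses.
        double ha = sigmoid(p[0] * x + p[2]);
        double hb = sigmoid(p[1] * x + p[3]);
        double out = sigmoid(p[4] * ha + p[5] * hb + p[6]);

        // Start at the output: error term for the output unit's weighted input.
        double dOut = (out - target) * out * (1 - out);
        // Step back one layer, reusing dOut rather than recomputing it.
        double dHa = dOut * p[4] * ha * (1 - ha);
        double dHb = dOut * p[5] * hb * (1 - hb);

        return new double[] {
            dHa * x, dHb * x,       // hidden weights
            dHa, dHb,               // hidden biases
            dOut * ha, dOut * hb,   // output weights
            dOut                    // output bias
        };
    }

    public static void main(String[] args) {
        double[] p = {0.5, -0.3, 0.1, 0.2, 0.7, -0.4, 0.05};
        double[] g = gradient(p, 1.5, 1.0);
        double h = 1e-6;
        for (int i = 0; i < p.length; i++) {
            // Central finite difference as an independent check on backprop.
            double[] plus = p.clone();
            double[] minus = p.clone();
            plus[i] += h;
            minus[i] -= h;
            double numeric = (loss(plus, 1.5, 1.0) - loss(minus, 1.5, 1.0)) / (2 * h);
            System.out.printf("p[%d]: backprop=%.8f numeric=%.8f%n", i, g[i], numeric);
        }
    }
}
```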

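The view-timing fit from 12.2.15 can be reproduced with ordinary least squares. This is a minimal sketch standing in for the spreadsheet's regression function, not its implementation. `TimingFit` is a hypothetical name, and the data are the three measurements from that day's notes.

```java
/** Hypothetical sketch: ordinary least-squares fit of t = slope * x + intercept. */
final class TimingFit {
    /** Returns {slope, intercept, rSquared}. */
    static double[] fit(double[] x, double[] t) {
        int n = x.length;
        double mx = 0.0, mt = 0.0;
        for (int i = 0; i < n; i++) {
            mx += x[i];
            mt += t[i];
        }
        mx /= n;
        mt /= n;
        // Centered sums of squares and cross-products.
        double sxx = 0.0, sxt = 0.0, stt = 0.0;
        for (int i = 0; i < n; i++) {
            sxx += (x[i] - mx) * (x[i] - mx);
            sxt += (x[i] - mx) * (t[i] - mt);
            stt += (t[i] - mt) * (t[i] - mt);
        }
        double slope = sxt / sxx;
        double intercept = mt - slope * mx;
        double r2 = (sxt * sxt) / (sxx * stt);
        return new double[] {slope, intercept, r2};
    }

    public static void main(String[] args) {
        // Entry counts and measured view-retrieval times (seconds) from the notes.
        double[] entries = {100, 1000, 10000};
        double[] seconds = {0.09, 0.14, 0.84};
        double[] f = fit(entries, seconds);
        System.out.printf("t = %.8f x + %.4f, R^2 = %.5f%n", f[0], f[1], f[2]);
        System.out.printf("predicted t(1,000,000) = %.1f s%n", f[0] * 1_000_000 + f[1]);
    }
}
```

On these three points, the fit gives:
- a slope of about 0.0000766 s per entry;
- an intercept of about 0.073 s;
- an R² of about 0.9995.

It predicts roughly 77 seconds for 1,000,000 entries.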
Phil 12.1.15

7:30 – 5:00

  • Learning: Identification Trees, Disorder
    • Trees of tests
    • Identification Tree (Not a decision tree!)
    • Measuring Disorder – lowest disorder is best test
      • Disorder(set of binaries) = −(P/T)·log2(P/T) − (N/T)·log2(N/T), where P, N and T are the positive, negative and total sample counts
        • Is the log base (2) related to the set being binary?
        • To score a test, add up the disorder of each outcome set, weighted by the fraction of the samples that end up in it. The test with the lowest disorder wins.
  • Bringing in my machine learning, pattern recognition and stats books.
  • Bringing in my big laptop
  • Setting up dev environment.
    • Using the new IDEA 15.x, which seems to be OK for TypeScript. Will check PHP tomorrow.
    • Installed grunt (grunt-global, then grunt-local from the makefiles)
    • Installed TypeScript (npm i -g typescript)
    • Installed gnuWin32, which has makefile and touch support, along with all the important DLLs. It turns out that there is also a gnuWin64. Will use that next time.
    • Fixed bugs that didn’t get caught before. Older compiler?
      • commented out the waa.d.ts from the three.d.ts definitelytyped file
      • deleted the { antialias: boolean; alpha: boolean; } args from the CanvasRenderer call in classes/WebGlCanvasClasses
      • added title?:string and assoc_name?:string to IPostObject in RssController
      • had to add the experiments/wglcharts2 folder to the xampp Apache htdocs
      • added word?:string to IPostObj in RssAppDirectives
      • added word_type_name?:string to IPostObj in RssAppDirectives
      • fixed the font calls in WebGl3dCharts IComponentConfig.
    • Since these issues really shouldn’t have happened, I’m going to verify that they are not in my home dev environment before checking in.
  • And the new computer arrived, so I get to do some of the install tomorrow.
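A minimal sketch of the disorder measure and test score from the identification-tree notes, assuming each sample is simply positive or negative. `Disorder` and its methods are hypothetical names. With log base 2, a 50/50 binary set has a disorder of exactly 1 and a pure set has 0.

```java
/** Hypothetical sketch of identification-tree disorder for binary samples. */
final class Disorder {
    /** Disorder of a set with pos positive and neg negative samples, in bits. */
    static double disorder(int pos, int neg) {
        int total = pos + neg;
        double d = 0.0;
        for (int count : new int[] {pos, neg}) {
            if (count == 0) continue; // treat 0 * log2(0) as 0
            double p = (double) count / total;
            d -= p * Math.log(p) / Math.log(2);
        }
        return d;
    }

    /**
     * Disorder of a test: each outcome branch's disorder, weighted by the
     * fraction of all samples that land in that branch. branches[b] = {pos, neg}.
     */
    static double testDisorder(int[][] branches) {
        int total = 0;
        for (int[] b : branches) total += b[0] + b[1];
        double sum = 0.0;
        for (int[] b : branches) {
            int size = b[0] + b[1];
            if (size == 0) continue;
            sum += (double) size / total * disorder(b[0], b[1]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two candidate tests over the same six samples; the lower score wins.
        System.out.println(testDisorder(new int[][] {{3, 0}, {0, 3}}));
        System.out.println(testDisorder(new int[][] {{2, 2}, {1, 1}}));
    }
}
```

For example, take a test that splits six samples into three positives and three negatives. It scores 0. A test whose branches are both half positive and half negative scores 1, so the first test wins.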

Phil 11.26.15

7:00 – Leave

  • Constraints: Visual Object Recognition
    • To see if two signals match, find the offset (translation and rotation) that maximizes the integral of their product. This works very well, even with noise.
  • Dictionary
    • Add ‘Help Choose Doctor’, ‘Help Choose Investments’, ‘Help Choose Healthcare Plan’, ‘Navigate News’ and ‘Help Find CHI Paper’ dictionaries. At this point they can be empty. We’ll talk about them in the paper.
    • Added ‘archive’ to dictionary because, as with networks, we’ll need temporary dicts associated with users.
    • Deploy new system. Done!
      • Reloaded the DB
      • Copied over the server code
      • Ran the simpleTests() for AlchemyDictText. That adds network[5] with tests against the words that are in my manual resume dictionary. Then network[2] is added with no dictionary.
      • Commented out simpleTests for AlchemyDictText
      • copied over all the new client code
      • Ran the client and verified that all the networks and dictionaries were there as they were supposed to be.
      • Loaded network[2] ‘Using extracted dict’
      • Selected the empty dictionary[2] ‘Phil’s extracted resume dict’
      • Ran Extract from Network, which is faster on Dreamhost! That populated the dictionary.
      • Deleted the entry for ‘3’
      • Ran Attach to Network. Also fast 🙂
  • And now time for Thanksgiving. On a really good note!
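Here is a one-dimensional sketch of the signal-matching idea from the visual object recognition notes. It slides one signal across the other and scores each translation by the sum of the products of overlapping samples (a discrete correlation standing in for the integral). It keeps the offset with the highest score. Rotation is not handled, and `SignalMatch` is a hypothetical name.

```java
/** Hypothetical sketch: find the translation that best aligns a pattern with a signal. */
final class SignalMatch {
    /** Correlation score at offset k: sum over i of signal[i + k] * pattern[i]. */
    static double score(double[] signal, double[] pattern, int offset) {
        double s = 0.0;
        for (int i = 0; i < pattern.length; i++) {
            int j = i + offset;
            if (j >= 0 && j < signal.length) s += signal[j] * pattern[i];
        }
        return s;
    }

    /** Tries every offset with some overlap and returns the highest-scoring one. */
    static int bestOffset(double[] signal, double[] pattern) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int k = -pattern.length + 1; k < signal.length; k++) {
            double sc = score(signal, pattern, k);
            if (sc > bestScore) {
                bestScore = sc;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // A noisy copy of {1, 2, 1} starting at index 3.
        double[] signal = {0.1, -0.05, 0, 1.05, 1.9, 1.1, 0.02, -0.1};
        System.out.println(bestOffset(signal, new double[] {1, 2, 1})); // prints 3
    }
}
```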


Phil 11.25.15

7:00 – 1:00 Leave

  • Constraints: Search, Domain Reduction
    • Order from most constrained to least.
    • For a constrained problem, check over- and under-allocations to see where the gap between fast failure and fast completion lies.
    • Only recurse through neighbors whose domains (choices) have been reduced to 1.
  • Dictionary
    • Add an optional ‘source_text’ field to the tn_dictionaries table so that user-added words can be compared to the text. Done. There is the issue that the dictionary could be used against a different corpus, at which point this would be little more than a creation artifact.
    • Add a ‘source_count’ to the tn_dictionary_entries table that is shown in the directive. Defaults to zero? Done. Same issue as above, when compared to a new corpus, do we recompute the counts?
    • Wire up Attach Dictionary to Network
      • Working on AlchemyDictReflect that will place keywords in the tn_items table and connect them in the tn_associations table.
      • Had to add a few helper methods in networkDbIo.php to handle modifying the network tables, since alchemyNLPbase doesn’t extend baseBdIo. Not the cleanest thing I’ve ever done, but not *horrible*.
      • Done and working! Need to deploy.
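The search ideas from the constraints notes can be sketched with a small map-coloring solver:
- most-constrained-first ordering;
- fast failure when a domain empties;
- propagating only through neighbors whose domains drop to a single value.

This is an illustrative example, not code from the lecture or this project. `MapColoring` and its methods are hypothetical names, and the Australia map is just a convenient test case.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical example: map coloring by backtracking search with domain reduction. */
final class MapColoring {
    /** Builds an undirected adjacency list from an edge list. */
    static List<List<Integer>> graph(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        return adj;
    }

    /** Returns one color (0..numColors-1) per node, or null if no coloring exists. */
    static int[] solve(List<List<Integer>> adj, int numColors) {
        List<Set<Integer>> domains = new ArrayList<>();
        for (int i = 0; i < adj.size(); i++) {
            Set<Integer> d = new TreeSet<>();
            for (int c = 0; c < numColors; c++) d.add(c);
            domains.add(d);
        }
        return search(adj, domains);
    }

    private static int[] search(List<List<Integer>> adj, List<Set<Integer>> domains) {
        // Most constrained first: the undecided node with the fewest remaining choices.
        int pick = -1;
        for (int i = 0; i < domains.size(); i++) {
            int size = domains.get(i).size();
            if (size > 1 && (pick < 0 || size < domains.get(pick).size())) pick = i;
        }
        if (pick < 0) return finish(adj, domains); // every domain is down to one value
        for (int value : domains.get(pick)) {
            // Work on a copy so a failed choice leaves the caller's domains intact.
            List<Set<Integer>> copy = new ArrayList<>();
            for (Set<Integer> d : domains) copy.add(new TreeSet<>(d));
            copy.get(pick).clear();
            copy.get(pick).add(value);
            if (propagate(adj, copy, pick)) {
                int[] result = search(adj, copy);
                if (result != null) return result;
            }
        }
        return null;
    }

    /** Domain reduction that only continues through neighbors reduced to one value. */
    private static boolean propagate(List<List<Integer>> adj, List<Set<Integer>> domains, int start) {
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            int u = queue.poll();
            int value = domains.get(u).iterator().next();
            for (int w : adj.get(u)) {
                Set<Integer> dw = domains.get(w);
                if (dw.remove(value)) {
                    if (dw.isEmpty()) return false;   // fast failure: no choices left
                    if (dw.size() == 1) queue.add(w); // forced choice: propagate it too
                }
            }
        }
        return true;
    }

    /** Reads off the single remaining value per node and double-checks every edge. */
    private static int[] finish(List<List<Integer>> adj, List<Set<Integer>> domains) {
        int[] colors = new int[domains.size()];
        for (int i = 0; i < colors.length; i++) {
            if (domains.get(i).isEmpty()) return null;
            colors[i] = domains.get(i).iterator().next();
        }
        for (int u = 0; u < adj.size(); u++) {
            for (int w : adj.get(u)) if (colors[u] == colors[w]) return null;
        }
        return colors;
    }

    public static void main(String[] args) {
        // Australia: 0 WA, 1 NT, 2 SA, 3 Q, 4 NSW, 5 V, 6 T (no land neighbors).
        int[][] edges = {{0, 1}, {0, 2}, {1, 2}, {1, 3}, {2, 3}, {2, 4}, {2, 5}, {3, 4}, {4, 5}};
        System.out.println(Arrays.toString(solve(graph(7, edges), 3)));
        System.out.println(Arrays.toString(solve(graph(7, edges), 2))); // no 2-coloring
    }
}
```

Three colors succeed on this map. Two fail quickly, because WA, NT and SA form a triangle and propagation empties a domain right after the first choice.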