Category Archives: Projects

Phil 8.18.17

7:00 – 8:00 Research

  • Got indexFromLocation() working. It took some fooling around with Excel. Here’s the method:
    public int[] indexFromLocation(double[] loc){
        int[] index = new int[loc.length];
        for(int i = 0; i < loc.length; ++i){
            double findex = loc[i]/mappingStep;
            double roundDown = Math.floor(findex);
            double roundUp = Math.ceil(findex);
            double lowdiff = findex - roundDown;
            double highdiff = roundUp - findex;
            if(lowdiff < highdiff){
                index[i] = (int)roundDown;
            }else{
                index[i] = (int)roundUp;
            }
        }
        return index;
    }
  • And here are the much cleaner results:
    • [0.00, 0.00] = [0, 0]
      [0.00, 0.10] = [0, 0]
      [0.00, 0.20] = [0, 1]
      [0.00, 0.30] = [0, 1]
      [0.00, 0.40] = [0, 2]
      [0.00, 0.50] = [0, 2]
      [0.00, 0.60] = [0, 2]
      [0.00, 0.70] = [0, 3]
      [0.00, 0.80] = [0, 3]
      [0.00, 0.90] = [0, 4]
      [0.00, 1.00] = [0, 4]

      [1.00, 0.00] = [4, 0]
      [1.00, 0.10] = [4, 0]
      [1.00, 0.20] = [4, 1]
      [1.00, 0.30] = [4, 1]
      [1.00, 0.40] = [4, 2]
      [1.00, 0.50] = [4, 2]
      [1.00, 0.60] = [4, 2]
      [1.00, 0.70] = [4, 3]
      [1.00, 0.80] = [4, 3]
      [1.00, 0.90] = [4, 4]
      [1.00, 1.00] = [4, 4]
  • Another thought that struck me as far as the (int) constraint is that I can have a number of ArrayLists that are embedded in a an object that has the first and last index in it. These would be linked together to provide unconstrained (MAX_VALUE or 2,147,483,647 lists) storage

8:30 – 4:30 BRI

  • I realized yesterday that the Ingest and Query microservices need to access the same GeoMesa Spring service. That keeps all the general store/query GeoMesa access code in one place, simplifies testing and allows for DI to provide the correct (hbase, accumulo, etc) implementation through a facade interface.
  • Got tangled up with getting classpaths right and importing the proper libraries
  • Got the maven files behaving, or at least not complaining on mvn clean and mvn compile!
  • Well that’s a new error: Error: Could not create the Java Virtual Machine. I get that running the new installation with the geomesa-quickstart-hbase
    • Ah, that’s what will happen when you paste your command-line arguments into the VM arguments space just above where it should go…
    • Wednesday’s goal will to verify that HBaseQuickStart is running correctly in its new home and start to turn it into a service.
Advertisements

Phil 8.17.17

BRI – one hour chasing down research hours from Jan – May

7:00 – 6:00 Research

  • Found this on negative flocking influences: The rise of negative partisanship and the nationalization of US elections in the 21st century.  Paper saved to Lit Review
    • One of the most important developments affecting electoral competition in the United States has been the increasingly partisan behavior of the American electorate. Yet more voters than ever claim to be independents. We argue that the explanation for these seemingly contradictory trends is the rise of negative partisanship. Using data from the American National Election Studies, we show that as partisan identities have become more closely aligned with social, cultural and ideological divisions in American society, party supporters including leaning independents have developed increasingly negative feelings about the opposing party and its candidates. This has led to dramatic increases in party loyalty and straight-ticket voting, a steep decline in the advantage of incumbency and growing consistency between the results of presidential elections and the results of House, Senate and even state legislative elections. The rise of negative partisanship has had profound consequences for electoral competition, democratic representation and governance.
  • Working on putting together an indexable high-dimension matrix that can contain objects. Generally, I’d expect it to be doubles, but I can see Strings and Objects as well.
  • Starting off by seeing what’s in the newest Apache Commons Math (v 3.6.1)
  • Found SimpleTensor, which uses the Efficient Java Matrix Library (EJML) and creates a 3D block of rows, columns and slices. THought it was what I wanted, but nope
  • Looks like there isn’t a class that would do what I need to do, or that I can even modify. I’m thinking that the best option is to use org.apache.commons.math3.linear.AbstractRealMatrix as a template.
  • Nope, coudn’t figure out how to do things as nested lists. So I’m doing it C-Style, where you really only have one array that you index into. Here’s a 4x4x4x4 Tensor filled with zeroes:
    Total elements = 256
    0.0:[0, 0, 0, 0], 0.0:[1, 0, 0, 0], 0.0:[2, 0, 0, 0], 0.0:[3, 0, 0, 0],
    0.0:[0, 1, 0, 0], 0.0:[1, 1, 0, 0], 0.0:[2, 1, 0, 0], 0.0:[3, 1, 0, 0],
    0.0:[0, 2, 0, 0], 0.0:[1, 2, 0, 0], 0.0:[2, 2, 0, 0], 0.0:[3, 2, 0, 0],
    0.0:[0, 3, 0, 0], 0.0:[1, 3, 0, 0], 0.0:[2, 3, 0, 0], 0.0:[3, 3, 0, 0],
    0.0:[0, 0, 1, 0], 0.0:[1, 0, 1, 0], 0.0:[2, 0, 1, 0], 0.0:[3, 0, 1, 0],
    ….
    0.0:[0, 2, 3, 3], 0.0:[1, 2, 3, 3], 0.0:[2, 2, 3, 3], 0.0:[3, 2, 3, 3],
    0.0:[0, 3, 3, 3], 0.0:[1, 3, 3, 3], 0.0:[2, 3, 3, 3], 0.0:[3, 3, 3, 3]
  • The only issue that I currently have is that ArrayLists are indexed by int, so the total size is 32k elements. That should be good enough for now, but it will need to be fixed.
  • set() and get() work nicely:
    lt.set(new int[]{0, 1, 0, 0}, 9.9);
    lt.set(new int[]{3, 3, 3, 3}, 3.3);
    
    System.out.println("[0, 1, 0, 0] = " + lt.get(new int[]{0, 1, 0, 0}));
    System.out.println("[3, 3, 3, 3] = " + lt.get(new int[]{3, 3, 3, 3}));
    
    [0, 1, 0, 0] = 9.9
    [3, 3, 3, 3] = 3.3
  • Started the indexFromLocation method, but this is too sloppy:
    index[i] = (int)Math.floor(Math.round(loc[i]/mappingStep));

Phil 8.16.17

7:00 – 8:00 Research

  • Added takeaway thoughts to my C&C writeup.
  • Working out how to add capability to the sim for P&RCH paper. My thoughts from vacation:
    • The agents contribution is the heading and speed
    • The UI is what the agent’s can ‘see’
    • The IR is what is available to be seen
    • An additional part might be to add the ability to store data in the space. Then the behavior of the IR (e.g. empty areas) would b more apparent, as would the effects of UI (only certain data is visible, or maybe only nearby data is visible) Data could be a vector field in Hilbert space, and visualized as color.
  • Updated IntelliJ
  • Working out how to to have a voxel space for the agents to move through that can also be drawn. It’s any number of dimensions, but it has to project to 2D. In the case of the agents, I just choose the first two axis. Each agent has an array of statements that are assembled into a belief vector. The space can be an array of beliefs. Are these just constructed so that they fill a space according to a set of rules? Then the xDimensionName and yDimensionName axis would go from (0, 1), which would scale to stage size? IR would still be a matter of comparing the space to the agent’s vector. Hmm.
  • This looks really good from an information horizon perspective: The Role of the Information Environment in Partisan Voting
    • Voters are often highly dependent on partisanship to structure their preferences toward political candidates and policy proposals. What conditions enable partisan cues to “dominate” public opinion? Here I theorize that variation in voters’ reliance on partisanship results, in part, from the opportunities their environment provides to learn about politics. A conjoint experiment and an observational study of voting in congressional elections both support the expectation that more detailed information environments reduce the role of partisanship in candidate choice

9:00 – 5:00 BRI

  • Good lord, the BoA corporate card comes with SIX seperate documents to read.
  • Onward to Chapter Three and Spring database interaction
  • Well that’s pretty clean. I do like the JdbcTemplate behaviors. Not sure I like the way you specify the values passed to the query, but I can’t think of anything better if you have more than one argument:
    @Repository
    public class EmployeeDaoImpl implements EmployeeDao {
        @Autowired
        private DataSource dataSource;
    
        @Autowired
        private JdbcTemplate jdbcTemplate;
    
        private RowMapper<Employee> employeeRowMapper = new RowMapper<Employee>() {
            @Override
            public Employee mapRow(ResultSet rs, int i) throws SQLException {
                Employee employee = new EmployeeImpl();
                employee.setEmployeeAge(rs.getInt("Age"));
                employee.setEmployeeId(rs.getInt("ID"));
                employee.setEmployeeName(rs.getString("FirstName") + " " + rs.getString("LastName"));
                return employee;
            }
        };
    
        @Override
        public Employee getEmployeeById(int id) {
            Employee employee = null;
    
            employee = jdbcTemplate.queryForObject(
                    "select * from Employee where id = ?",
                    new Object[]{id},
                    employeeRowMapper
            );
            return employee;
        }
    
        public List<Employee> getAllEmployees() {
            List<Employee> eList = jdbcTemplate.query(
                    "select * from Employee",
                    employeeRowMapper
            );
            return eList;
        }
    }
  • Here’s the xml to wire the thing up:
    <context:component-scan base-package="org.springframework.chapter3.dao"/>
    <bean id="employeeDao" class="org.springframework.chapter3.dao.EmployeeDaoImpl"/>
    
    <bean id="dataSource"
          class="org.springframework.jdbc.datasource.DriverManagerDataSource">
        <property name="driverClassName" value="${jdbc.driverClassName}" />
        <property name="url" value="${jdbc.url}" />
        <property name="username" value="xxx"/>
        <property name="password" value="yyy"/>
    </bean>
    
    <bean id="jdbcTemplate" class="org.springframework.jdbc.core.JdbcTemplate">
        <property name="dataSource" ref="dataSource" />
    </bean>
    
    <context:property-placeholder location="jdbc.properties" />
  • And here’s the properties. Note that I had to disable SSL:
    jdbc.driverClassName=com.mysql.jdbc.Driver
    jdbc.url=jdbc:mysql://localhost:3306/sandbox?autoReconnect=true&useSSL=false

Phil 4.25.16

5:30 – 4:00 VTX

  • Saw this on Twitter about visualizing networks with D3
  • Working my way through the JavaFX tutorial. It is a lot like a blend of Flex and a rethought Swing. Nice, actually…
  • Here is the list of stock components
  • Starting with the ope file dialog – done.
  • Yep, there’s a spinner. And here’s dials and knobs
  • And here’s how to do a word cloud.
  • Here’s a TF-IDF implementation in JAVA. Need to build some code that reads in from our ‘negative match’ ‘positive match’ results and start to get some data driven terms
  • Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for “tree regular expressions”). Tregex comes with Tsurgeon, a tree transformation language. Also included from version 2.0 on is a similar package which operates on dependency graphs (class SemanticGraph, calledsemgrex).
  • Semgrex
  • Sprint review
    • Google CSEs
      • Switched over from my personal CSEs to Vistronix CSEs
      • Added VCS rep for CSEs
      • Figured out how to save out and load CSE from XML
      • Added a few more CSEs ONLY_NET, MOBY_DICK
      • Wrote up care and feeding document for Confluence
      • Added blacklists
    • Rating App
      • Re-rigged the JPA classes to be Ontology-agnostic Version 2 of nearly everything)
      • Upped my JQL game to handle SELECT IN WHERE precompiled queries
      • Reading in VA and PA data now
      • Added the creation of a text JSON object that formalizes the rating of a flag
      • Got hooked up to the Talend DB!!!
      • Deployed initial version(s)
      • Added backlink logging using SemRush
    • Future work
      • Developed Excel ingest
      • Still working on PDF and Word ingest

Phil 2.11.16

6:00 – 4:00 VTX

  • Continuing Participatory journalism – the (r)evolution that wasn’t. Content and user behavior in Sweden 2007–2013
  • Need to see if I can get this on Monday: Rethinking Journalism: trust and participation in a transformed news landscape. Got the kindle book.
  • Need to add a menubar to the Gui app that has a ‘data’ and ‘queries’ tab. Data runs the data generation code. Queries has a list of questions that clears the output and then sends the results to the text area.
  • Still need to move the db to a server. Just realized that it could be a MySql db on Dreamhost too. Having trouble with that. It might be the eclipse jar? Here’s the hibernate jar location in maven:
    <groupId>org.hibernate.javax.persistence</groupId>
    <artifactId>hibernate-jpa-2.0-api</artifactId>
    <version>1.0.1.Final</version>
  • Gave up on connecting to Dreamhost. I think it’s a permissions thing. Asked Heath to look into creating a stable DB somewhere. He needs to talk to Damien.
  • Webhose.io – direct access to live & structured data from millions of sources.
  • Search by date: https://support.google.com/news/answer/3334?hl=en
    • Google news search that produces Json for the last 24 hours:
      ?q=malpractice&safe=off&hl=en&gl=us&authuser=0&tbm=nws&source=lnt&tbs=qdr:d
  • Played around with a bunch of queries, but in the end, I figured that it was better to write the whole works out in a .csv file and do pivot tables in Excel.
  • Adding the ability to read a config file to set the search engines, lables, etc for generation.

Data Architecture Meeting 2.11.15

Testing what we have

  • Relevance score
  • Pertinence score
  • Charts for management

Vinny

  • Terminology
  • gov
  • Bias towards trustworthy unstructured sources.
  • What about getting structured data.

Aaron

  • Isolate V1 capability
  • Metrics!
  • We need the structured data!!

Matt

  • Dsds

Scott

  • Questions about unstructured query

Phil 12.2.15

7:00 –

  • Learning: Neural Nets, Back Propagation
    • Synaptic weights are higher for some synapses than others
    • Cumulative stimulus
    • All-or-none threshold for propagation.
    • Once we have a model, we can ask what we can do with it.
    • Now I’m curious about the MIT approach to calculus. It’s online too: MIT 18.01 Single Variable Calculus
    • Back-propagation algorithm. Starts from the end and works forward so that each new calculation depends only on its local information plus values that have already been calculated.
    • Overfitting and under/over damping issues are also considerations.
  • Scrum meeting
  • Remember to bring a keyboard tomorrow!!!!
  • Checking that my home dev code is the same as what I pulled down from the repository
    • No change in definitelytyped
    • No change in the other files either, so those were real bugs. Don’t know why they didn’t get caught. But that means the repo is good and the bugs are fixed.
  • Validate that PHP runs and debugs in the new dev env. Done
  • Add a new test that inputs large (thousands -> millions) of unique ENTITY entries with small-ish star networks of partially shared URL entries. Time view retrieval times for SELECT COUNT(*) from tn_view_network_items WHERE network_id = 8;
    • Computer: 2008 Dell Precision M6300
    • System: Processor Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz, 2201 Mhz, 2 Core(s), 2 Logical Processor(s), Available Physical Memory 611 MB
    • 100 is 0.09 sec
    • 1000 is 0.14 sec
    • 10,000 is 0.84 sec
    • Using Open Office’s linear regression function, I get the equation t = 0.00007657x + 0.733 with an R squared of 0.99948.
    • That means 1,000,000 view entries can be processed in 75 seconds or so as long as things don’t get IO bound
  • Got the PHP interpreter and debugger working. In this case, it was just refreshing in settings->languages->php

Phil 11.26.15

7:00 – Leave

  • Constraints: Visual Object Recognition
    • to see if to signals match, a maximising function that integrates the area under the signal with respect to offsets (translation and rotation) is very good, even with noise.
  • Dictionary
    • Add ‘Help Choose Doctor’, ‘Help Choose Investments’, ‘Help Choose Healthcare Plan’, ‘Navigate News’ and ‘Help Find CHI Paper’ dictionaries. At this point they can be empty. We’ll talk about them in the paper.
    • Added ‘archive’ to dictionary, because we’ll need temporary dicts associated with users like networks.
    • Deploy new system. Done!
      • Reloaded the DB
      • Copied over the server code
      • Ran the simpleTests() for AlchemyDictText. That adds network[5] with tests against the words that are in my manual resume dictionary. Then network[2] is added with no dictionary.
      • Commented out simpleTests for AlchemyDictText
      • copied over all the new client code
      • Ran the client and verified that all the networks and dictionaries were there as they were supposed to be.
      • Loaded network[2] ‘Using extracted dict’
      • Selected the empty dictionary[2] ‘Phil’s extracted resume dict’
      • Ran Extract from Network, which is faster on Dreamhost! That populated the dictionary.
      • Deleted the entry for ‘3’
      • Ran Attach to Network. Also fast 🙂
  • And now time for ThanksGiving. On a really good note!

AllWorking

Phil 11.25.15

7:00 – 1:00 Leave

  • Constraints: Search, Domain Reduction
    • Order from most constrained to least.
    • For a constrained problem, check over and under allocations to see where the gap between fast failure and fast completion lie.
    • Only recurse through neighbors where domain (choices) have been reduced to 1.
  • Dictionary
    • Add an optional ‘source_text’ field to the tn_dictionaries table so that user added words can be compared to the text. Done. There is the issue that the dictionary could be used against a different corpus, at which point this would be little more than a creation artifact
    • Add a ‘source_count’ to the tn_dictionary_entries table that is shown in the directive. Defaults to zero? Done. Same issue as above, when compared to a new corpus, do we recompute the counts?
    • Wire up Attach Dictionary to Network
      • Working on AlchemyDictReflect that will place keywords in the tn_items table and connect them in the tn_associations table.
      • Had to add a few helper methods in networkDbIo.php to handle the modifying of the network tables, since alchemyNLPbase doesn’t extend baseBdIo. Not the cleanest thing I’ve ever done, but not *horrible*.
      • Done and working! Need to deploy.

Phil 11.24.15

7:00 – Leave

  • Constraints: Interpreting Line Drawings
    • Successful research:
      • Finds a problem
      • Finds a method that solves the problem
      • Using some principal (That can be generalized)
  • Gave Aaron M. A subversion account and sent him a description of the structure of the project
  • Back to dictionary creation
    • Wire up Extract into Dictionary
      • I think I’m going to do most of this on the server. If I do a select text from tn_view_network_items where network = X, then I can run that text that is already in the DB through the term extractor, which should be the fastest thing I can do.
      • The next fastest thing would be to pull the text from the url (if it exists) and add that to the text pull.
      • Added a getTextFromNetwork() method to NetworkDbObject.
      • The html was getting extracted badly, so I had to add a call to alchemy to return the cleaned text. TODO: in the future add a ‘clean_text’ column to tn_items so this is done on ingestion. I also added
      • Added all the pieces to the rssPull.php file and tested. And integrated with the client. Looks like it takes about 8 seconds to go through my resume, so some offline processing will probably be needed for ACM papers, for example.
    • Wire up Attach Dictionary to Network
      • The current setup is set so that a new item that is read in will associate with the current network dictionary. Need to add a way to have the items that are already in the network to check themselves against the new dictionary.
      • Added class AlchemyDictReflect that will place keywords in the DB. Still need to debug. And don’t forget that the controller will have to reload the network after all thechanges are made.

 

Phil 11.23.15

7:00 – Leave

  • Search: Games, Minimax, and Alpha-Beta
    • Branching factor (B)
    • Search depth (D)
    • Combining the two gives the number of leaf nodes or B^D
    • Branching factor of chess is approximately 14?
  • Dictionaries
    • Wire up Create New Dictionary – done
    • Wire up Extract into Dictionary
      • I think I’m going to do most of this on the server. If I do a select text from tn_view_network_items where network = X, then I can run that text that is already in the DB through the term extractor, which should be the fastest thing I can do.
      • The next fastest thing would be to pull the text from the url (if it exists) and add that to the text pull.
    • Wire up Attach Dictionary to Network
      • The current setup is set so that a new item that is read in will associate with the current network dictionary. Need to add a way to have the items that are already in the network to check themselves against the new dictionary.