Category Archives: Projects

Phil 8.18.17

7:00 – 8:00 Research

  • Got indexFromLocation() working. It took some fooling around with Excel. Here’s the method:
    public int[] indexFromLocation(double[] loc){
        int[] index = new int[loc.length];
        for(int i = 0; i < loc.length; ++i){
            // Fractional index of this coordinate, in units of mappingStep
            double findex = loc[i]/mappingStep;
            double roundDown = Math.floor(findex);
            double roundUp = Math.ceil(findex);
            // Distance to the cell below and to the cell above
            double lowdiff = findex - roundDown;
            double highdiff = roundUp - findex;
            // Snap to the nearer cell; ties go to the upper cell
            if(lowdiff < highdiff){
                index[i] = (int)roundDown;
            }else{
                index[i] = (int)roundUp;
            }
        }
        return index;
    }
  • And here are the much cleaner results:
    • [0.00, 0.00] = [0, 0]
      [0.00, 0.10] = [0, 0]
      [0.00, 0.20] = [0, 1]
      [0.00, 0.30] = [0, 1]
      [0.00, 0.40] = [0, 2]
      [0.00, 0.50] = [0, 2]
      [0.00, 0.60] = [0, 2]
      [0.00, 0.70] = [0, 3]
      [0.00, 0.80] = [0, 3]
      [0.00, 0.90] = [0, 4]
      [0.00, 1.00] = [0, 4]

      [1.00, 0.00] = [4, 0]
      [1.00, 0.10] = [4, 0]
      [1.00, 0.20] = [4, 1]
      [1.00, 0.30] = [4, 1]
      [1.00, 0.40] = [4, 2]
      [1.00, 0.50] = [4, 2]
      [1.00, 0.60] = [4, 2]
      [1.00, 0.70] = [4, 3]
      [1.00, 0.80] = [4, 3]
      [1.00, 0.90] = [4, 4]
      [1.00, 1.00] = [4, 4]
  • Another thought that struck me as far as the (int) constraint goes is that I can have a number of ArrayLists that are embedded in an object that has the first and last index in it. These would be linked together to provide unconstrained (MAX_VALUE or 2,147,483,647 lists) storage.
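The chained-list idea above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the class name ChunkedDoubleList, the fixed chunkSize, and the long-valued index are all assumptions made for the example. Each chunk wraps one ArrayList and records the first and last global index it covers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: long-indexed storage built from int-indexed ArrayLists. */
final class ChunkedDoubleList {
    /** One chunk: an ArrayList plus the range of global indices it covers. */
    private static final class Chunk {
        final long firstIndex;
        final long lastIndex;
        final ArrayList<Double> values;

        Chunk(long firstIndex, int size) {
            this.firstIndex = firstIndex;
            this.lastIndex = firstIndex + size - 1;
            this.values = new ArrayList<>(size);
            for (int i = 0; i < size; ++i) {
                values.add(0.0); // pre-fill so set() works on any valid index
            }
        }
    }

    private final List<Chunk> chunks = new ArrayList<>();
    private final int chunkSize; // assumed fixed, so lookup is a single division
    private final long size;

    ChunkedDoubleList(long size, int chunkSize) {
        if (size < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("size must be >= 0 and chunkSize > 0");
        }
        this.size = size;
        this.chunkSize = chunkSize;
        for (long first = 0; first < size; first += chunkSize) {
            int n = (int) Math.min(chunkSize, size - first);
            chunks.add(new Chunk(first, n));
        }
    }

    /** Finds the chunk that holds the given global index. */
    private Chunk chunkFor(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return chunks.get((int) (index / chunkSize));
    }

    double get(long index) {
        Chunk c = chunkFor(index);
        return c.values.get((int) (index - c.firstIndex));
    }

    void set(long index, double value) {
        Chunk c = chunkFor(index);
        c.values.set((int) (index - c.firstIndex), value);
    }

    long size() {
        return size;
    }

    public static void main(String[] args) {
        ChunkedDoubleList list = new ChunkedDoubleList(10, 4); // chunks of 4, 4 and 2
        list.set(9, 3.3);
        System.out.println("[9] = " + list.get(9));
    }
}
```

With a fixed chunk size the lookup stays O(1); variable-sized chunks would need a search over the first/last index pairs instead.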

8:30 – 4:30 BRI

  • I realized yesterday that the Ingest and Query microservices need to access the same GeoMesa Spring service. That keeps all the general store/query GeoMesa access code in one place, simplifies testing and allows for DI to provide the correct (hbase, accumulo, etc) implementation through a facade interface.
  • Got tangled up with getting classpaths right and importing the proper libraries
  • Got the maven files behaving, or at least not complaining on mvn clean and mvn compile!
  • Well that’s a new error: Error: Could not create the Java Virtual Machine. I get that running the new installation with the geomesa-quickstart-hbase
    • Ah, that’s what will happen when you paste your command-line arguments into the VM arguments space just above where it should go…
    • Wednesday’s goal will be to verify that HBaseQuickStart is running correctly in its new home and start to turn it into a service.
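As a rough illustration of the shared-service idea from the first bullet above, the sketch below uses plain constructor injection rather than Spring wiring. Every name in it (GeoStoreFacade, InMemoryGeoStore, IngestService, QueryService) is hypothetical, and no GeoMesa API is used; the in-memory class only stands in for whatever HBase- or Accumulo-backed implementation would sit behind the facade.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical facade that both the Ingest and Query services depend on. */
interface GeoStoreFacade {
    void store(String featureId, double lat, double lon);

    List<String> queryBoundingBox(double minLat, double minLon, double maxLat, double maxLon);
}

/** Stand-in backend for tests; a real backend would implement the same interface. */
final class InMemoryGeoStore implements GeoStoreFacade {
    private static final class Feature {
        final String id;
        final double lat;
        final double lon;

        Feature(String id, double lat, double lon) {
            this.id = id;
            this.lat = lat;
            this.lon = lon;
        }
    }

    private final List<Feature> features = new ArrayList<>();

    @Override
    public void store(String featureId, double lat, double lon) {
        features.add(new Feature(featureId, lat, lon));
    }

    @Override
    public List<String> queryBoundingBox(double minLat, double minLon, double maxLat, double maxLon) {
        List<String> ids = new ArrayList<>();
        for (Feature f : features) {
            if (f.lat >= minLat && f.lat <= maxLat && f.lon >= minLon && f.lon <= maxLon) {
                ids.add(f.id);
            }
        }
        return ids;
    }
}

/** Ingest side: only knows the facade, not the backend. */
final class IngestService {
    private final GeoStoreFacade store;

    IngestService(GeoStoreFacade store) {
        this.store = store;
    }

    void ingest(String featureId, double lat, double lon) {
        store.store(featureId, lat, lon);
    }
}

/** Query side: shares the same facade instance as the ingest side. */
final class QueryService {
    private final GeoStoreFacade store;

    QueryService(GeoStoreFacade store) {
        this.store = store;
    }

    List<String> within(double minLat, double minLon, double maxLat, double maxLon) {
        return store.queryBoundingBox(minLat, minLon, maxLat, maxLon);
    }
}
```

Because both services take the facade in their constructors, a test can hand them the in-memory store while a deployment hands them the real one.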

Phil 8.17.17

BRI – one hour chasing down research hours from Jan – May

7:00 – 6:00 Research

  • Found this on negative flocking influences: The rise of negative partisanship and the nationalization of US elections in the 21st century.  Paper saved to Lit Review
    • One of the most important developments affecting electoral competition in the United States has been the increasingly partisan behavior of the American electorate. Yet more voters than ever claim to be independents. We argue that the explanation for these seemingly contradictory trends is the rise of negative partisanship. Using data from the American National Election Studies, we show that as partisan identities have become more closely aligned with social, cultural and ideological divisions in American society, party supporters including leaning independents have developed increasingly negative feelings about the opposing party and its candidates. This has led to dramatic increases in party loyalty and straight-ticket voting, a steep decline in the advantage of incumbency and growing consistency between the results of presidential elections and the results of House, Senate and even state legislative elections. The rise of negative partisanship has had profound consequences for electoral competition, democratic representation and governance.
  • Working on putting together an indexable high-dimension matrix that can contain objects. Generally, I’d expect it to be doubles, but I can see Strings and Objects as well.
  • Starting off by seeing what’s in the newest Apache Commons Math (v 3.6.1)
  • Found SimpleTensor, which uses the Efficient Java Matrix Library (EJML) and creates a 3D block of rows, columns and slices. Thought it was what I wanted, but nope.
  • Looks like there isn’t a class that would do what I need to do, or that I can even modify. I’m thinking that the best option is to use org.apache.commons.math3.linear.AbstractRealMatrix as a template.
  • Nope, couldn’t figure out how to do things as nested lists. So I’m doing it C-style, where you really only have one array that you index into. Here’s a 4x4x4x4 Tensor filled with zeroes:
    Total elements = 256
    0.0:[0, 0, 0, 0], 0.0:[1, 0, 0, 0], 0.0:[2, 0, 0, 0], 0.0:[3, 0, 0, 0],
    0.0:[0, 1, 0, 0], 0.0:[1, 1, 0, 0], 0.0:[2, 1, 0, 0], 0.0:[3, 1, 0, 0],
    0.0:[0, 2, 0, 0], 0.0:[1, 2, 0, 0], 0.0:[2, 2, 0, 0], 0.0:[3, 2, 0, 0],
    0.0:[0, 3, 0, 0], 0.0:[1, 3, 0, 0], 0.0:[2, 3, 0, 0], 0.0:[3, 3, 0, 0],
    0.0:[0, 0, 1, 0], 0.0:[1, 0, 1, 0], 0.0:[2, 0, 1, 0], 0.0:[3, 0, 1, 0],
    ….
    0.0:[0, 2, 3, 3], 0.0:[1, 2, 3, 3], 0.0:[2, 2, 3, 3], 0.0:[3, 2, 3, 3],
    0.0:[0, 3, 3, 3], 0.0:[1, 3, 3, 3], 0.0:[2, 3, 3, 3], 0.0:[3, 3, 3, 3]
  • The only issue that I currently have is that ArrayLists are indexed by int, so the total size is capped at Integer.MAX_VALUE (2,147,483,647) elements. That should be good enough for now, but it will need to be fixed.
  • set() and get() work nicely:
    lt.set(new int[]{0, 1, 0, 0}, 9.9);
    lt.set(new int[]{3, 3, 3, 3}, 3.3);
    
    System.out.println("[0, 1, 0, 0] = " + lt.get(new int[]{0, 1, 0, 0}));
    System.out.println("[3, 3, 3, 3] = " + lt.get(new int[]{3, 3, 3, 3}));
    
    [0, 1, 0, 0] = 9.9
    [3, 3, 3, 3] = 3.3
  • Started the indexFromLocation method, but this is too sloppy:
    index[i] = (int)Math.floor(Math.round(loc[i]/mappingStep));
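The C-style, single-array indexing described above can be sketched as below. This is an assumption-laden illustration rather than the class from the post: the name FlatTensor is made up, it stores values in a double[] instead of an ArrayList, and it lets the first axis vary fastest to match the order of the printed 4x4x4x4 listing.

```java
/** Hypothetical sketch of a dense N-dimensional tensor stored in one flat array. */
final class FlatTensor {
    private final int[] dims;
    private final double[] data;

    FlatTensor(int... dims) {
        this.dims = dims.clone();
        long total = 1;
        for (int d : dims) {
            if (d <= 0) {
                throw new IllegalArgumentException("dimension sizes must be positive");
            }
            total *= d;
            if (total > Integer.MAX_VALUE) {
                throw new IllegalArgumentException("too many elements for an int-indexed array");
            }
        }
        this.data = new double[(int) total]; // Java zero-fills new arrays
    }

    /** Flat offset of a multi-dimensional index, with the first axis varying fastest. */
    int flatIndex(int[] index) {
        if (index.length != dims.length) {
            throw new IllegalArgumentException("expected " + dims.length + " indices");
        }
        int flat = 0;
        int stride = 1;
        for (int axis = 0; axis < dims.length; ++axis) {
            if (index[axis] < 0 || index[axis] >= dims[axis]) {
                throw new IndexOutOfBoundsException("axis " + axis + ": " + index[axis]);
            }
            flat += index[axis] * stride;
            stride *= dims[axis];
        }
        return flat;
    }

    double get(int[] index) {
        return data[flatIndex(index)];
    }

    void set(int[] index, double value) {
        data[flatIndex(index)] = value;
    }

    int size() {
        return data.length;
    }

    public static void main(String[] args) {
        FlatTensor lt = new FlatTensor(4, 4, 4, 4);
        lt.set(new int[]{0, 1, 0, 0}, 9.9);
        System.out.println("Total elements = " + lt.size());
        System.out.println("[0, 1, 0, 0] = " + lt.get(new int[]{0, 1, 0, 0}));
    }
}
```

The stride for each axis is the product of the sizes of all earlier axes, so a 4x4x4x4 tensor uses strides 1, 4, 16 and 64.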

Phil 8.16.17

7:00 – 8:00 Research

  • Added takeaway thoughts to my C&C writeup.
  • Working out how to add capability to the sim for P&RCH paper. My thoughts from vacation:
    • The agent’s contribution is the heading and speed
    • The UI is what the agents can ‘see’
    • The IR is what is available to be seen
    • An additional part might be to add the ability to store data in the space. Then the behavior of the IR (e.g. empty areas) would be more apparent, as would the effects of UI (only certain data is visible, or maybe only nearby data is visible). Data could be a vector field in Hilbert space, and visualized as color.
  • Updated IntelliJ
  • Working out how to have a voxel space for the agents to move through that can also be drawn. It’s any number of dimensions, but it has to project to 2D. In the case of the agents, I just choose the first two axes. Each agent has an array of statements that are assembled into a belief vector. The space can be an array of beliefs. Are these just constructed so that they fill a space according to a set of rules? Then the xDimensionName and yDimensionName axes would go from (0, 1), which would scale to stage size? IR would still be a matter of comparing the space to the agent’s vector. Hmm.
  • This looks really good from an information horizon perspective: The Role of the Information Environment in Partisan Voting
    • Voters are often highly dependent on partisanship to structure their preferences toward political candidates and policy proposals. What conditions enable partisan cues to “dominate” public opinion? Here I theorize that variation in voters’ reliance on partisanship results, in part, from the opportunities their environment provides to learn about politics. A conjoint experiment and an observational study of voting in congressional elections both support the expectation that more detailed information environments reduce the role of partisanship in candidate choice

9:00 – 5:00 BRI

  • Good lord, the BoA corporate card comes with SIX separate documents to read.
  • Onward to Chapter Three and Spring database interaction
  • Well that’s pretty clean. I do like the JdbcTemplate behaviors. Not sure I like the way you specify the values passed to the query, but I can’t think of anything better if you have more than one argument:
    @Repository
    public class EmployeeDaoImpl implements EmployeeDao {
        @Autowired
        private DataSource dataSource;
    
        @Autowired
        private JdbcTemplate jdbcTemplate;
    
        private RowMapper<Employee> employeeRowMapper = new RowMapper<Employee>() {
            @Override
            public Employee mapRow(ResultSet rs, int i) throws SQLException {
                Employee employee = new EmployeeImpl();
                employee.setEmployeeAge(rs.getInt("Age"));
                employee.setEmployeeId(rs.getInt("ID"));
                employee.setEmployeeName(rs.getString("FirstName") + " " + rs.getString("LastName"));
                return employee;
            }
        };
    
        @Override
        public Employee getEmployeeById(int id) {
            Employee employee = null;
    
            employee = jdbcTemplate.queryForObject(
                    "select * from Employee where id = ?",
                    new Object[]{id},
                    employeeRowMapper
            );
            return employee;
        }
    
        public List<Employee> getAllEmployees() {
            List<Employee> eList = jdbcTemplate.query(
                    "select * from Employee",
                    employeeRowMapper
            );
            return eList;
        }
    }
  • Here’s the xml to wire the thing up:
    <context:component-scan base-package="org.springframework.chapter3.dao"/>
    <bean id="employeeDao" class="org.springframework.chapter3.dao.EmployeeDaoImpl"/>
    
    <bean id="dataSource"
          class="org.springframework.jdbc.datasource.DriverManagerDataSource">
        <property name="driverClassName" value="${jdbc.driverClassName}" />
        <property name="url" value="${jdbc.url}" />
        <property name="username" value="xxx"/>
        <property name="password" value="yyy"/>
    </bean>
    
    <bean id="jdbcTemplate" class="org.springframework.jdbc.core.JdbcTemplate">
        <property name="dataSource" ref="dataSource" />
    </bean>
    
    <context:property-placeholder location="jdbc.properties" />
  • And here’s the properties. Note that I had to disable SSL:
    jdbc.driverClassName=com.mysql.jdbc.Driver
    jdbc.url=jdbc:mysql://localhost:3306/sandbox?autoReconnect=true&useSSL=false

Phil 4.25.16

5:30 – 4:00 VTX

  • Saw this on Twitter about visualizing networks with D3
  • Working my way through the JavaFX tutorial. It is a lot like a blend of Flex and a rethought Swing. Nice, actually…
  • Here is the list of stock components
  • Starting with the open file dialog – done.
  • Yep, there’s a spinner. And here’s dials and knobs
  • And here’s how to do a word cloud.
  • Here’s a TF-IDF implementation in Java. Need to build some code that reads in from our ‘negative match’ ‘positive match’ results and start to get some data-driven terms
  • Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for “tree regular expressions”). Tregex comes with Tsurgeon, a tree transformation language. Also included from version 2.0 on is a similar package which operates on dependency graphs (class SemanticGraph, called semgrex).
  • Semgrex
  • Sprint review
    • Google CSEs
      • Switched over from my personal CSEs to Vistronix CSEs
      • Added VCS rep for CSEs
      • Figured out how to save out and load CSE from XML
      • Added a few more CSEs ONLY_NET, MOBY_DICK
      • Wrote up care and feeding document for Confluence
      • Added blacklists
    • Rating App
      • Re-rigged the JPA classes to be Ontology-agnostic (Version 2 of nearly everything)
      • Upped my JQL game to handle SELECT IN WHERE precompiled queries
      • Reading in VA and PA data now
      • Added the creation of a text JSON object that formalizes the rating of a flag
      • Got hooked up to the Talend DB!!!
      • Deployed initial version(s)
      • Added backlink logging using SemRush
    • Future work
      • Developed Excel ingest
      • Still working on PDF and Word ingest
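For the TF-IDF item noted earlier in this entry, here is a minimal sketch of the scoring itself. It is not the linked implementation: it uses the textbook form (term frequency times the natural log of total documents over documents containing the term), and the class name TfIdfSketch and the tokenized-list inputs are assumptions for the example. The ‘positive match’ and ‘negative match’ results would each become one or more token lists.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of textbook TF-IDF over already-tokenized documents. */
final class TfIdfSketch {
    /** Term frequency: occurrences of the term divided by the document length. */
    static double tf(List<String> doc, String term) {
        if (doc.isEmpty()) {
            return 0.0;
        }
        int count = 0;
        for (String word : doc) {
            if (word.equalsIgnoreCase(term)) {
                count++;
            }
        }
        return (double) count / doc.size();
    }

    /** Inverse document frequency: ln(N / docs containing the term), or 0 if none do. */
    static double idf(List<List<String>> docs, String term) {
        int containing = 0;
        for (List<String> doc : docs) {
            for (String word : doc) {
                if (word.equalsIgnoreCase(term)) {
                    containing++;
                    break; // count each document at most once
                }
            }
        }
        return containing == 0 ? 0.0 : Math.log((double) docs.size() / containing);
    }

    static double tfIdf(List<String> doc, List<List<String>> docs, String term) {
        return tf(doc, term) * idf(docs, term);
    }

    public static void main(String[] args) {
        List<String> positive = Arrays.asList("malpractice", "suit", "doctor");
        List<String> negative = Arrays.asList("doctor", "award");
        List<List<String>> docs = Arrays.asList(positive, negative);
        System.out.println("malpractice: " + tfIdf(positive, docs, "malpractice"));
        System.out.println("doctor: " + tfIdf(positive, docs, "doctor"));
    }
}
```

Terms that appear in every document score zero, which is the behavior wanted when looking for words that separate positive matches from negative ones.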

Phil 2.11.16

6:00 – 4:00 VTX

  • Continuing Participatory journalism – the (r)evolution that wasn’t. Content and user behavior in Sweden 2007–2013
  • Need to see if I can get this on Monday: Rethinking Journalism: trust and participation in a transformed news landscape. Got the kindle book.
  • Need to add a menubar to the Gui app that has a ‘data’ and ‘queries’ tab. Data runs the data generation code. Queries has a list of questions that clears the output and then sends the results to the text area.
  • Still need to move the db to a server. Just realized that it could be a MySql db on Dreamhost too. Having trouble with that. It might be the eclipse jar? Here’s the hibernate jar location in maven:
    <groupId>org.hibernate.javax.persistence</groupId>
    <artifactId>hibernate-jpa-2.0-api</artifactId>
    <version>1.0.1.Final</version>
  • Gave up on connecting to Dreamhost. I think it’s a permissions thing. Asked Heath to look into creating a stable DB somewhere. He needs to talk to Damien.
  • Webhose.io – direct access to live & structured data from millions of sources.
  • Search by date: https://support.google.com/news/answer/3334?hl=en
    • Google news search that produces Json for the last 24 hours:
      ?q=malpractice&safe=off&hl=en&gl=us&authuser=0&tbm=nws&source=lnt&tbs=qdr:d
  • Played around with a bunch of queries, but in the end, I figured that it was better to write the whole works out in a .csv file and do pivot tables in Excel.
  • Adding the ability to read a config file to set the search engines, labels, etc. for generation.

Data Architecture Meeting 2.11.15

Testing what we have

  • Relevance score
  • Pertinence score
  • Charts for management

Vinny

  • Terminology
  • gov
  • Bias towards trustworthy unstructured sources.
  • What about getting structured data.

Aaron

  • Isolate V1 capability
  • Metrics!
  • We need the structured data!!

Matt

  • Dsds

Scott

  • Questions about unstructured query

Phil 12.2.15

7:00 –

  • Learning: Neural Nets, Back Propagation
    • Synaptic weights are higher for some synapses than others
    • Cumulative stimulus
    • All-or-none threshold for propagation.
    • Once we have a model, we can ask what we can do with it.
    • Now I’m curious about the MIT approach to calculus. It’s online too: MIT 18.01 Single Variable Calculus
    • Back-propagation algorithm. Starts from the output end and works backward toward the input, so that each new calculation depends only on its local information plus values that have already been calculated.
    • Overfitting and under/over damping issues are also considerations.
  • Scrum meeting
  • Remember to bring a keyboard tomorrow!!!!
  • Checking that my home dev code is the same as what I pulled down from the repository
    • No change in definitelytyped
    • No change in the other files either, so those were real bugs. Don’t know why they didn’t get caught. But that means the repo is good and the bugs are fixed.
  • Validate that PHP runs and debugs in the new dev env. Done
  • Add a new test that inputs large numbers (thousands -> millions) of unique ENTITY entries with small-ish star networks of partially shared URL entries. Time view retrieval for SELECT COUNT(*) from tn_view_network_items WHERE network_id = 8;
    • Computer: 2008 Dell Precision M6300
    • System: Processor Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz, 2201 Mhz, 2 Core(s), 2 Logical Processor(s), Available Physical Memory 611 MB
    • 100 is 0.09 sec
    • 1000 is 0.14 sec
    • 10,000 is 0.84 sec
    • Using Open Office’s linear regression function, I get the equation t = 0.00007657x + 0.0733 with an R squared of 0.99948.
    • That means 1,000,000 view entries can be processed in 75 seconds or so as long as things don’t get IO bound
  • Got the PHP interpreter and debugger working. In this case, it was just refreshing in settings->languages->php
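The extrapolation from the three view-retrieval timings above can be reproduced with an ordinary least-squares line fit. The sketch below is only an illustration of that arithmetic; the class and method names are made up, and it assumes the three (entries, seconds) pairs listed in the timing bullets are the only inputs.

```java
/** Hypothetical sketch: least-squares line through the measured retrieval times. */
final class RetrievalTimeFit {
    /** Fits t = slope * x + intercept and returns {slope, intercept}. */
    static double[] fitLine(double[] x, double[] t) {
        if (x.length != t.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two matching samples");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanT = 0.0;
        for (int i = 0; i < n; ++i) {
            meanX += x[i];
            meanT += t[i];
        }
        meanX /= n;
        meanT /= n;

        double sxx = 0.0; // sum of squared deviations in x
        double sxt = 0.0; // sum of cross-deviations between x and t
        for (int i = 0; i < n; ++i) {
            double dx = x[i] - meanX;
            sxx += dx * dx;
            sxt += dx * (t[i] - meanT);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        double slope = sxt / sxx;
        double intercept = meanT - slope * meanX;
        return new double[]{slope, intercept};
    }

    /** Evaluates the fitted line at x. */
    static double predict(double[] fit, double x) {
        return fit[0] * x + fit[1];
    }

    public static void main(String[] args) {
        double[] entries = {100, 1000, 10000};
        double[] seconds = {0.09, 0.14, 0.84};
        double[] fit = fitLine(entries, seconds);
        System.out.println("slope = " + fit[0] + ", intercept = " + fit[1]);
        System.out.println("t(1,000,000) = " + predict(fit, 1_000_000) + " sec");
    }
}
```

With only three samples spanning two orders of magnitude, the fit is a rough guide at best; the prediction for 1,000,000 entries comes out a little under 77 seconds, and it assumes the linear trend holds and nothing becomes IO bound.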

Phil 11.26.15

7:00 – Leave

  • Constraints: Visual Object Recognition
    • To see if two signals match, a maximising function that integrates the area under the signal with respect to offsets (translation and rotation) is very good, even with noise.
  • Dictionary
    • Add ‘Help Choose Doctor’, ‘Help Choose Investments’, ‘Help Choose Healthcare Plan’, ‘Navigate News’ and ‘Help Find CHI Paper’ dictionaries. At this point they can be empty. We’ll talk about them in the paper.
    • Added ‘archive’ to dictionary, because we’ll need temporary dicts associated with users like networks.
    • Deploy new system. Done!
      • Reloaded the DB
      • Copied over the server code
      • Ran the simpleTests() for AlchemyDictText. That adds network[5] with tests against the words that are in my manual resume dictionary. Then network[2] is added with no dictionary.
      • Commented out simpleTests for AlchemyDictText
      • copied over all the new client code
      • Ran the client and verified that all the networks and dictionaries were there as they were supposed to be.
      • Loaded network[2] ‘Using extracted dict’
      • Selected the empty dictionary[2] ‘Phil’s extracted resume dict’
      • Ran Extract from Network, which is faster on Dreamhost! That populated the dictionary.
      • Deleted the entry for ‘3’
      • Ran Attach to Network. Also fast 🙂
  • And now time for Thanksgiving. On a really good note!

AllWorking

Phil 11.25.15

7:00 – 1:00 Leave

  • Constraints: Search, Domain Reduction
    • Order from most constrained to least.
    • For a constrained problem, check over and under allocations to see where the gap between fast failure and fast completion lies.
    • Only recurse through neighbors where the domain (choices) has been reduced to 1.
  • Dictionary
    • Add an optional ‘source_text’ field to the tn_dictionaries table so that user added words can be compared to the text. Done. There is the issue that the dictionary could be used against a different corpus, at which point this would be little more than a creation artifact
    • Add a ‘source_count’ to the tn_dictionary_entries table that is shown in the directive. Defaults to zero? Done. Same issue as above, when compared to a new corpus, do we recompute the counts?
    • Wire up Attach Dictionary to Network
      • Working on AlchemyDictReflect that will place keywords in the tn_items table and connect them in the tn_associations table.
      • Had to add a few helper methods in networkDbIo.php to handle the modifying of the network tables, since alchemyNLPbase doesn’t extend baseBdIo. Not the cleanest thing I’ve ever done, but not *horrible*.
      • Done and working! Need to deploy.

Phil 11.10.15

7:00 – 3:00 SR

  • Brought in java code for VizTool and gave to Al
  • More training
  • Working on building the flex libraries and projects.

Phil 11.9.15

7:00 – 3:00 SR

  • Training
  • Got all the Java files built and burned to disk. The main problem that I had was getting a Tomcat runtime instance to show up. Here was the fix: http://stackoverflow.com/questions/2000078/apache-tomcat-not-showing-in-eclipse-server-runtime-environments

Phil 11.4.15

7:00 – 3:30 SR

  • And we have more confusion on what’s happening. Still going through the process of bringing everything inside.
  • Helped Al a bit on how money moves around in the system
  • Set up Al in the integration scripting system.
  • Long discussions about requirements

Phil 11.3.15

7:00 – 5:30 SR

  • I’ve decided I don’t like heading home in the dark, so I’m going to stay on daylight savings time. Or at least give it a shot. 4:45 am looks very early on my clock….
  • Getting the SW documentation over to Al.
    • For some reason the System diagrams weren’t in the SVN repo. Fixed that and sent a zip file over to Bill.
  • Status report!
  • Meeting with Al and Lenny about future work. I literally have no idea if we should just set everything up for maintenance or build a new NLP-based search engine for financial questions. Hopefully Lenny can get some answers.
  • Meeting at Infotek. Al is now the lead. I am to package everything up for deployment. Future work will be on some other vehicle.
  • Trying to get FB 4.7 running, but it’s hanging on the launch screen while thrashing the CPU. Fortunately it was just a test. Pulling down everything from the new repository to build on FB 4.6. Verifying that FB 4.6 should work
  • Setting up a subversion server

11.2.15

8:00 – 5:00 SR

  • Al came on board today. Showed him around the system, discovering that the scripting system wasn’t working on the production server. Fixed that, and downloaded a copy of the documentation for him to look at. Also gave him accounts on the integration server for him to poke around.
  • Fixed the Reqonciler bug. Had to insert the modified query directly into the reqonciler table to get around odd quote-escaping issues.
  • Updated Friday’s work from the repo. Updated the database and ran the term extraction and dictionary tests.
  • Working on dictionary access methods.
    • Got AddEntry and Remove Entry working. Also removed the tn_dictionary table and stuck the dictionary_id in the tn_dictionary_entries table.
    • Added cascade entry/modification of parent if it doesn’t exist. Otherwise the indices won’t work.

Phil 10.29.15

8:00 – 4:30 SR

  • Sent Dong screenshots of the issue. He’s checking queries and code now.
  • Added simpleTests($dbObj) to each class in AlchemyNLP
  • Added ‘skill’ ‘capability’ and  ‘task’ as parents in the dictionary
  • Add flyout directive to create and assign dictionaries and entries.
  • Set the dictionary to zero in the networkDbIo.addNetwork()  PHP code and add the dict_id to the typescript interface. Done
  • Make sure that an association between a keyword and another item is always from the keyword. Otherwise PageRank won’t calculate correctly. Done.
  • Chain up the dictionary and add parent keywords to the network (parents point to children). That way, for example, all ‘skills’ can be elevated, while all ‘tasks’ can be suppressed. Done
  • Changed keywords to be ‘editable’ so they have adjustable link weights. It does make the keywords in the network editable as well. May need to just add a slider to ITEMS of certain types. Still need to think about this…
  • Next step is to buy and download the fivefilters term extractor and see how to integrate?

Phil 10.28.15

8:00 – 5:00 SR

  • Walked through the FA bug with Dong on the phone. Took some screenshots that I will send over tonight.
  • Add a DictionaryText class that uses a passed-in tag list to determine what items to create associations to. Low edit-distance matches get added to the item. Possibly the keyword list can be hierarchical?
  • Add a tn_dictionary table with fields for word, type (optional), description (optional), server_code (optional), parent (optional), and user_id. Multiple users can have different versions of the same word. When a new word is entered, the content of the network is rescanned and items that contain the keyword link to it. We will need to know which definition is being used in the network, since it will point to the master item. – Done, except for the last part
    • The server_code field would include scripts/regexes or something similar that could do special text scanning. This would require the use of eval, for example. In the db, but not used.
  • So now, when an external query is made, only items from the result that contain words in the dictionary will be added to the network. Done and working in the DB and PHP!
  • There should also be a ‘resubmit’ button that looks for new material while running the stored queries. TODO
  • It’s possible to use NLP, particularly the fivefilters term extractor, to create a strawman dictionary as a starting point. TODO
  • Meeting with Dr. Pan
    • There are different contexts that a keyword dictionary needs to be aware of. Resumes have skills, tasks and achievements. Scientific papers have contributions and methods, financial data has budget centers, companies, clients, invoices, etc.
    • Phrases add specificity, single words can be very noisy.
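The low edit-distance matching proposed above for the DictionaryText class could look something like the following. This is a sketch under assumptions: it uses plain Levenshtein distance (the post does not say which edit distance is meant), it is written in Java rather than the project's PHP, and the names EditDistanceSketch, matchesTag and maxDistance are hypothetical.

```java
/** Hypothetical sketch: Levenshtein distance for matching item words to dictionary tags. */
final class EditDistanceSketch {
    /** Minimum number of single-character insertions, deletions or substitutions. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); ++j) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); ++i) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); ++j) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** True when the word is within maxDistance edits of the tag, ignoring case. */
    static boolean matchesTag(String word, String tag, int maxDistance) {
        return levenshtein(word.toLowerCase(), tag.toLowerCase()) <= maxDistance;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("skill", "skills"));
        System.out.println(matchesTag("Skills", "skill", 1));
    }
}
```

A threshold of 1 or 2 catches plurals and small typos; as the meeting notes point out, single words are noisy, so a larger threshold would probably add more false matches than it is worth.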