Category Archives: VISIBILITY

Phil 4.25.16

5:30 – 4:00 VTX

  • Saw this on Twitter about visualizing networks with D3
  • Working my way through the JavaFX tutorial. It is a lot like a blend of Flex and a rethought Swing. Nice, actually…
  • Here is the list of stock components
  • Starting with the open file dialog – done.
  • Yep, there’s a spinner. And here’s dials and knobs
  • And here’s how to do a word cloud.
  • Here’s a TF-IDF implementation in Java. Need to build some code that reads in from our ‘negative match’ and ‘positive match’ results and starts to get some data-driven terms
  • Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for “tree regular expressions”). Tregex comes with Tsurgeon, a tree transformation language. Also included from version 2.0 on is a similar package which operates on dependency graphs (class SemanticGraph), called semgrex.
  • Semgrex
  • Sprint review
    • Google CSEs
      • Switched over from my personal CSEs to Vistronix CSEs
      • Added VCS rep for CSEs
      • Figured out how to save out and load CSE from XML
      • Added a few more CSEs ONLY_NET, MOBY_DICK
      • Wrote up care and feeding document for Confluence
      • Added blacklists
    • Rating App
      • Re-rigged the JPA classes to be Ontology-agnostic (Version 2 of nearly everything)
      • Upped my JQL game to handle SELECT IN WHERE precompiled queries
      • Reading in VA and PA data now
      • Added the creation of a text JSON object that formalizes the rating of a flag
      • Got hooked up to the Talend DB!!!
      • Deployed initial version(s)
      • Added backlink logging using SemRush
    • Future work
      • Developed Excel ingest
      • Still working on PDF and Word ingest
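The TF-IDF note above (pulling data-driven terms out of the ‘positive match’ and ‘negative match’ results) could be prototyped along these lines. This is only a minimal sketch, not the Java implementation linked in the post: the two token lists are made-up stand-ins for the real match results, and the scoring is the plain textbook form (term frequency times log of inverse document frequency).

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document.

    documents: list of token lists. Term frequency is the count divided by
    the document length; idf is log(N / document_frequency).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical stand-ins for the 'positive match' and 'negative match' text.
positive = "doctor malpractice license suspended board".split()
negative = "doctor appointment office hours board".split()
pos_scores, neg_scores = tf_idf([positive, negative])

# Terms that appear in both sets score zero; terms unique to one set float up.
top_positive = sorted(pos_scores, key=pos_scores.get, reverse=True)[:3]
```

With only two documents the scores are coarse, but the terms that separate positive from negative matches are the ones that end up at the top of each list.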

Phil 2.11.16

6:00 – 4:00 VTX

  • Continuing Participatory journalism – the (r)evolution that wasn’t. Content and user behavior in Sweden 2007–2013
  • Need to see if I can get this on Monday: Rethinking Journalism: trust and participation in a transformed news landscape. Got the kindle book.
  • Need to add a menubar to the Gui app that has a ‘data’ and ‘queries’ tab. Data runs the data generation code. Queries has a list of questions that clears the output and then sends the results to the text area.
  • Still need to move the db to a server. Just realized that it could be a MySql db on Dreamhost too. Having trouble with that. It might be the eclipse jar? Here’s the hibernate jar location in maven:
    <groupId>org.hibernate.javax.persistence</groupId>
    <artifactId>hibernate-jpa-2.0-api</artifactId>
    <version>1.0.1.Final</version>
  • Gave up on connecting to Dreamhost. I think it’s a permissions thing. Asked Heath to look into creating a stable DB somewhere. He needs to talk to Damien.
  • Webhose.io – direct access to live & structured data from millions of sources.
  • Search by date: https://support.google.com/news/answer/3334?hl=en
    • Google news search that produces Json for the last 24 hours:
      ?q=malpractice&safe=off&hl=en&gl=us&authuser=0&tbm=nws&source=lnt&tbs=qdr:d
  • Played around with a bunch of queries, but in the end, I figured that it was better to write the whole works out in a .csv file and do pivot tables in Excel.
  • Adding the ability to read a config file to set the search engines, labels, etc. for generation.

Data Architecture Meeting 2.11.15

Testing what we have

  • Relevance score
  • Pertinence score
  • Charts for management

Vinny

  • Terminology
  • gov
  • Bias towards trustworthy unstructured sources.
  • What about getting structured data.

Aaron

  • Isolate V1 capability
  • Metrics!
  • We need the structured data!!

Matt

  • Dsds

Scott

  • Questions about unstructured query

Phil 12.2.15

7:00 –

  • Learning: Neural Nets, Back Propagation
    • Synaptic weights are higher for some synapses than others
    • Cumulative stimulus
    • All-or-none threshold for propagation.
    • Once we have a model, we can ask what we can do with it.
    • Now I’m curious about the MIT approach to calculus. It’s online too: MIT 18.01 Single Variable Calculus
    • Back-propagation algorithm. Starts from the output end and works backward so that each new calculation depends only on its local information plus values that have already been calculated.
    • Overfitting and under/over damping issues are also considerations.
  • Scrum meeting
  • Remember to bring a keyboard tomorrow!!!!
  • Checking that my home dev code is the same as what I pulled down from the repository
    • No change in definitelytyped
    • No change in the other files either, so those were real bugs. Don’t know why they didn’t get caught. But that means the repo is good and the bugs are fixed.
  • Validate that PHP runs and debugs in the new dev env. Done
  • Add a new test that inputs large numbers (thousands -> millions) of unique ENTITY entries with small-ish star networks of partially shared URL entries. Time the view retrieval for SELECT COUNT(*) from tn_view_network_items WHERE network_id = 8;
    • Computer: 2008 Dell Precision M6300
    • System: Processor Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz, 2201 Mhz, 2 Core(s), 2 Logical Processor(s), Available Physical Memory 611 MB
    • 100 is 0.09 sec
    • 1000 is 0.14 sec
    • 10,000 is 0.84 sec
    • Using Open Office’s linear regression function, I get the equation t = 0.00007657x + 0.0733 with an R squared of 0.99948.
    • That means 1,000,000 view entries can be processed in about 77 seconds, as long as things don’t get IO bound
  • Got the PHP interpreter and debugger working. In this case, it was just refreshing in settings->languages->php
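The fit from the view-retrieval timing test above can be rechecked with an ordinary least-squares line through the three measured points. This is a sketch using only the timings listed in the post; the function and variable names are mine. For these points it gives a slope of about 0.0000766 seconds per entry and an intercept of about 0.073 seconds, which extrapolates to roughly 77 seconds for a million entries.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Timings from the test: number of entries -> seconds for the COUNT(*) query.
entries = [100, 1000, 10000]
seconds = [0.09, 0.14, 0.84]

slope, intercept, r_squared = linear_fit(entries, seconds)
predicted_million = slope * 1_000_000 + intercept  # extrapolated, in seconds
```

Three points spanning two orders of magnitude is a thin basis for extrapolating another two, so the million-entry figure is best read as an order-of-magnitude estimate.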

Phil 11.26.15

7:00 – Leave

  • Constraints: Visual Object Recognition
    • To see if two signals match, a maximizing function that integrates the area under the product of the signals with respect to offsets (translation and rotation) is very good, even with noise.
  • Dictionary
    • Add ‘Help Choose Doctor’, ‘Help Choose Investments’, ‘Help Choose Healthcare Plan’, ‘Navigate News’ and ‘Help Find CHI Paper’ dictionaries. At this point they can be empty. We’ll talk about them in the paper.
    • Added ‘archive’ to dictionary, because we’ll need temporary dicts associated with users like networks.
    • Deploy new system. Done!
      • Reloaded the DB
      • Copied over the server code
      • Ran the simpleTests() for AlchemyDictText. That adds network[5] with tests against the words that are in my manual resume dictionary. Then network[2] is added with no dictionary.
      • Commented out simpleTests for AlchemyDictText
      • copied over all the new client code
      • Ran the client and verified that all the networks and dictionaries were there as they were supposed to be.
      • Loaded network[2] ‘Using extracted dict’
      • Selected the empty dictionary[2] ‘Phil’s extracted resume dict’
      • Ran Extract from Network, which is faster on Dreamhost! That populated the dictionary.
      • Deleted the entry for ‘3’
      • Ran Attach to Network. Also fast 🙂
  • And now time for Thanksgiving. On a really good note!
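The signal-matching idea from the Visual Object Recognition lecture notes above can be sketched in one dimension. This is a simplified illustration under my own assumptions: it handles translation only (no rotation), it uses a discrete sum of products in place of the integral, and the sample signal is invented.

```python
def best_offset(signal, template):
    """Slide template along signal and return (offset, score) of the best match.

    The score at each offset is the sum of products of the overlapping
    samples, a discrete stand-in for integrating the product of the signals.
    """
    best = (0, float("-inf"))
    for offset in range(len(signal) - len(template) + 1):
        score = sum(s * t for s, t in zip(signal[offset:], template))
        if score > best[1]:
            best = (offset, score)
    return best


template = [1.0, 3.0, 1.0]
# The same bump shifted three samples to the right, with a little noise added.
signal = [0.1, -0.2, 0.0, 1.1, 2.9, 0.8, 0.1, -0.1]

offset, score = best_offset(signal, template)
```

The raw sum favors high-amplitude regions, so a more careful version would normalize each window before comparing scores.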

AllWorking

Phil 11.25.15

7:00 – 1:00 Leave

  • Constraints: Search, Domain Reduction
    • Order from most constrained to least.
    • For a constrained problem, check over and under allocations to see where the gap between fast failure and fast completion lies.
    • Only recurse through neighbors whose domain (choices) has been reduced to 1.
  • Dictionary
    • Add an optional ‘source_text’ field to the tn_dictionaries table so that user added words can be compared to the text. Done. There is the issue that the dictionary could be used against a different corpus, at which point this would be little more than a creation artifact
    • Add a ‘source_count’ to the tn_dictionary_entries table that is shown in the directive. Defaults to zero? Done. Same issue as above, when compared to a new corpus, do we recompute the counts?
    • Wire up Attach Dictionary to Network
      • Working on AlchemyDictReflect that will place keywords in the tn_items table and connect them in the tn_associations table.
      • Had to add a few helper methods in networkDbIo.php to handle the modifying of the network tables, since alchemyNLPbase doesn’t extend baseBdIo. Not the cleanest thing I’ve ever done, but not *horrible*.
      • Done and working! Need to deploy.

Phil 11.10.15

7:00 – 3:00 SR

  • Brought in java code for VizTool and gave to Al
  • More training
  • Working on building the flex libraries and projects.

Phil 11.9.15

7:00 – 3:00 SR

  • Training
  • Got all the Java files built and burned to disk. The main problem that I had was getting a Tomcat runtime instance to show up. Here was the fix: http://stackoverflow.com/questions/2000078/apache-tomcat-not-showing-in-eclipse-server-runtime-environments

Phil 11.4.15

7:00 – 3:30 SR

  • And we have more confusion on what’s happening. Still going through the process of bringing everything inside.
  • Helped Al a bit on how money moves around in the system
  • Set up Al in the integration scripting system.
  • Long discussions about requirements

Phil 11.3.15

7:00 – 5:30 SR

  • I’ve decided I don’t like heading home in the dark, so I’m going to stay on daylight savings time. Or at least give it a shot. 4:45 am looks very early on my clock….
  • Getting the SW documentation over to Al.
    • For some reason the System diagrams weren’t in the SVN repo. Fixed that and sent a zip file over to Bill.
  • Status report!
  • Meeting with Al and Lenny about future work. I literally have no idea if we should just set everything up for maintenance or build a new NLP-based search engine for financial questions. Hopefully Lenny can get some answers.
  • Meeting at Infotek. Al is now the lead. I am to package everything up for deployment. Future work will be on some other vehicle.
  • Trying to get FB 4.7 running, but it’s hanging on the launch screen while thrashing the CPU. Fortunately it was just a test. Pulling down everything from the new repository to build on FB 4.6. Verifying that FB 4.6 should work
  • Setting up a subversion server

Phil 11.2.15

8:00 – 5:00 SR

  • Al came on board today. Showed him around the system, discovering that the scripting system wasn’t working on the production server. Fixed that, and downloaded a copy of the documentation for him to look at. Also gave him accounts on the integration server for him to poke around.
  • Fixed the Reqonciler bug. Had to insert the modified query directly into the reqonciler table to get around odd quote-escaping issues.
  • Updated Friday’s work from the repo. Updated the database and ran the term extraction and dictionary tests.
  • Working on dictionary access methods.
    • Got AddEntry and Remove Entry working. Also removed the tn_dictionary table and stuck the dictionary_id in the tn_dictionary_entries table.
    • Added cascade entry/modification of parent if it doesn’t exist. Otherwise the indices won’t work.

Phil 10.29.15

8:00 – 4:30 SR

  • Sent Dong screenshots of the issue. He’s checking queries and code now.
  • Added simpleTests($dbObj) to each class in AlchemyNLP
  • Added ‘skill’, ‘capability’ and ‘task’ as parents in the dictionary
  • Add flyout directive to create and assign dictionaries and entries.
  • Set the dictionary to zero in the networkDbIo.addNetwork()  PHP code and add the dict_id to the typescript interface. Done
  • Make sure that an association between a keyword and another item is always from the keyword. Otherwise PageRank won’t calculate correctly. Done.
  • Chain up the dictionary and add parent keywords to the network (parents point to children). That way, for example, all ‘skills’ can be elevated, while all ‘tasks’ can be suppressed. Done
  • Changed keywords to be ‘editable’ so they have adjustable link weights. It does make the keywords in the network editable as well. May need to just add a slider to ITEMS of certain types. Still need to think about this…
  • Next step is to buy and download the fivefilters term extractor and see how to integrate?
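The point above about link direction (associations always run from the keyword, and parents point to children) matters because PageRank passes rank along outgoing links. Below is a minimal, unweighted power-iteration sketch of that behavior. It is not the project's PHP code: the node names are hypothetical, the damping factor of 0.85 is just the conventional default, and the adjustable link weights mentioned above are left out.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Unweighted PageRank by power iteration.

    links: {node: [nodes it points to]}. Returns {node: rank}, summing to 1.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / count for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node (no outgoing links): spread its rank evenly.
                share = damping * rank[node] / count
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical network: parent keyword -> child keywords -> items.
links = {
    "skill": ["visualization", "analysis"],
    "visualization": ["resume_item_1", "resume_item_2"],
    "analysis": ["resume_item_1"],
}
ranks = pagerank(links)
```

Here resume_item_1 ends up ranked above resume_item_2 because two keywords point at it. Reversing any of those links would send rank back toward the keyword, which is the miscalculation the bullet warns about.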

Phil 10.28.15

8:00 – 5:00 SR

  • Walked through the FA bug with Dong on the phone. Took some screenshots that I will send over tonight.
  • Add a DictionaryText class that uses a passed-in tag list to determine what items to create associations to. Low edit-distance matches get added to the item. Possibly the keyword list can be hierarchical?
  • Add a tn_dictionary table with fields for word, type (optional), description (optional), server_code (optional), parent (optional), and user_id. Multiple users can have different versions of the same word. When a new word is entered, the content of the network is rescanned and items that contain the keyword link to it. We will need to know which definition is being used in the network, since it will point to the master item. – Done, except for the last part
    • The server_code field would include scripts/regexes or something similar that could do special text scanning. This would require the use of eval, for example. In the db, but not used.
  • So now, when an external query is made, only items from the result that contain words in the dictionary will be added to the network. Done and working in the DB and PHP!
  • There should also be a ‘resubmit’ button that looks for new material while running the stored queries. TODO
  • It’s possible to use NLP, particularly FiveFilters’ term extractor, to create a strawman dictionary as a starting point. TODO
  • Meeting with Dr. Pan
    • There are different contexts that a keyword dictionary needs to be aware of. Resumes have skills, tasks and achievements. Scientific papers have contributions and methods, financial data has budget centers, companies, clients, invoices, etc.
    • Phrases add specificity, single words can be very noisy.
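The dictionary gate described above (only result items that contain dictionary words get added to the network) might look something like the following. This is a sketch, not the PHP that is running against the DB: the function name and the sample items are made up, and the matching is a case-insensitive whole-word or whole-phrase search.

```python
import re


def filter_by_dictionary(items, dictionary):
    """Keep only the items whose text contains at least one dictionary entry.

    Entries may be single words or phrases; matching is case-insensitive
    and anchored on word boundaries.
    """
    patterns = [
        re.compile(r"\b" + re.escape(entry) + r"\b", re.IGNORECASE)
        for entry in dictionary
    ]
    return [item for item in items
            if any(pattern.search(item) for pattern in patterns)]


# Hypothetical query results and dictionary entries.
items = [
    "Senior visualization developer",
    "Weather report for Tuesday",
    "Status of web-based analysis tools",
]
dictionary = ["visualization", "web-based"]

kept = filter_by_dictionary(items, dictionary)
```

Because entries are escaped and matched whole, a phrase entry only matches when the full phrase appears, which fits Dr. Pan's point that phrases add specificity.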

Phil 10.27.15

8:00 – 5:00 SR

  • Still chasing down the Reqonciler issues.
  • Need to put together a list of software that I have installed on my dev box and send to Lenny.
  • Still working on the EXPLICIT entry. Added two manual items directly in the PHP and it’s working.
    • Also see if I can get HTML parsed and displayed – done
    • Based on that paper, I’m going to try keywords again. OK, I have keywords, but they’re really key phrases and as such, too unique? Here’s a screenshot: dump
    • There are things that I think should correlate, like web-based, status, senior and visualization. There’s also ‘analysis’ and ‘analysys’ that should be matched by edit distance. So there are some options:
      • Try a different NLP. Open Calais is free enough for what I’m doing, and does provide different parsing. FiveFilters has a PHP implementation that is 20 Euros, and actually seems to work best for what I’m looking for (items linked by keywords)?
      • Or a naive kind of tagging based on simple keyword extraction. These could be presented as potential tags to be checked by the users?
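The ‘analysis’ / ‘analysys’ case above is the kind of near-miss that edit distance catches. Here is a small Levenshtein sketch with a helper that returns vocabulary words within a given distance. The helper name, the threshold of 1, and the vocabulary list are my own choices for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def close_matches(word, vocabulary, max_distance=1):
    """Return the vocabulary words within max_distance edits of word."""
    return [v for v in vocabulary if edit_distance(word, v) <= max_distance]


vocabulary = ["analysis", "status", "senior", "visualization"]
matches = close_matches("analysys", vocabulary)
```

A fixed threshold of 1 is strict; for longer key phrases a threshold scaled to the phrase length may work better.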

Phil 10.16.15

8:00 – 4:00 SR

  • I have my access back!
  • More justification for dev machine
  • Updated truancy reports.
  • Add ‘read only’ to network and use it to disable buttons on the GUI. It should have a checkbox on the flyout next to private.
    • Added. Had to add a read_only field to tn_networks.
    • Next, make the various buttons that cause writes to the DB conditional. Done
    • Updated the public server.
    • Changed findNetworks() so that $queryString .= "select * from tn_networks where user_id = :user_id or is_private = 0"; Now I need to verify that I’ve reconciled in the Flyout Directive. Need to make a new network for that. Monday.
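The findNetworks() query above can be tried out against a throwaway in-memory database. The sketch below uses Python's sqlite3 rather than the project's PHP and database. The table and column names tn_networks, user_id, is_private and read_only come from the notes; the id and name columns and the sample rows are assumptions for illustration.

```python
import sqlite3


def find_networks(conn, user_id):
    """Return networks the user owns plus every network that is not private."""
    query = ("select * from tn_networks "
             "where user_id = :user_id or is_private = 0")
    return conn.execute(query, {"user_id": user_id}).fetchall()


conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; the real tn_networks table has more columns.
conn.execute(
    "create table tn_networks ("
    "id integer primary key, name text, user_id integer, "
    "is_private integer, read_only integer)"
)
conn.executemany(
    "insert into tn_networks (name, user_id, is_private, read_only) "
    "values (?, ?, ?, ?)",
    [
        ("mine, private", 1, 1, 0),
        ("theirs, public read-only", 2, 0, 1),
        ("theirs, private", 2, 1, 0),
    ],
)

rows = find_networks(conn, user_id=1)
names = sorted(row[1] for row in rows)
```

User 1 sees their own private network plus the other user's public one. The read_only flag comes back with each row, so the GUI can use it to decide which write buttons to disable.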

Phil 10.15.15

8:00 – 5:00 SR

  • Still don’t have my access back
  • But, was able to update the needed table for FR using the scripting system. So that’s pretty cool.
  • Looking to migrate over to the new servers next Friday.
  • Adding behaviors to support the changes I made to the GUI.
    • Link to wampeter
      • GUI – Done
      • DB – Done. Changed the way that linkSelected worked so that if a sourceID was passed along a wampeter was not created.
    • Update wampeter value change DB (means updating association weights) – Done
    • Update Rating value change DB – Done
  • Taadaa! ItAllWorks