Category Archives: Java

Phil 7.22.16

7:00 – 1:00 VTX

  • More bubble modelling. Found a nice paper from a financial perspective that looks like a good source for similar models.
  • Split out the calculation and spreadsheet functions to support snapshots and debugging.
    • Set up the base class to be the control. Explorers only look outside their SD, while confirmers and avoiders stay within. Not sure how to tease out the difference between those. I think it will have something to do with the way they look for information, which is beyond the scope of this model for now. Also switched to a random distribution. Here’s an initial result. Much more work to follow.
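
One way to read the explorer/confirmer/avoider split above, as a minimal sketch. Everything here is an assumption for illustration: the class and enum names, the idea that "their SD" means one standard deviation around the agent's own value, and the Gaussian sampling. Confirmers and avoiders behave identically in this sketch, which matches the open question about how to tell them apart.

```java
import java.util.Random;

/** Hypothetical sketch; none of these names come from the actual model. */
class BubbleAgentSketch {
    enum AgentType { CONTROL, EXPLORER, CONFIRMER, AVOIDER }

    /** True if an agent holding value `own` would look at `candidate`. */
    static boolean considers(AgentType type, double own, double sd, double candidate) {
        double distance = Math.abs(candidate - own);
        switch (type) {
            case EXPLORER:
                return distance > sd;   // assumption: only looks outside one SD
            case CONFIRMER:
            case AVOIDER:
                return distance <= sd;  // assumption: stays within one SD
            default:
                return true;            // control: no filtering at all
        }
    }

    public static void main(String[] args) {
        Random rand = new Random(1);
        double own = 0.0;
        double sd = 1.0;
        for (AgentType type : AgentType.values()) {
            int accepted = 0;
            for (int i = 0; i < 1000; i++) {
                // Gaussian samples are an assumption; the log only says "random distribution"
                if (considers(type, own, sd, rand.nextGaussian())) {
                    accepted++;
                }
            }
            System.out.println(type + " considered " + accepted + " of 1000 samples");
        }
    }
}
```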

GP

  • I was riding and thinking about something I read on fivethirtyeight.com: “This isn’t the most artful way to say it, but it’s like, where do you go when the only people who seem to agree with you on taxes hate black people?” It’s by Ben Howe, a RedState commentator. And it makes me think that rather than basing the sim on only one value, there should be a cluster. Confirmers could look for a match in the cluster, while avoiders would avoid clusters if they hit something that doesn’t match. And the distance from the value should matter. Adopting a very different concept should take more energy than a similar one. And this makes me think that the CAs have to have a bit more alife in them. They need to budget their energy with reference to their internal and external states.
  • And then mom died. Here’s the OPM web page that matters: https://www.opm.gov/retirement-services/my-annuity-and-benefits/life-events/death/report-of-death/

Phil 7.18.16

7:00 – 3:30 VTX

  • Writing and reworking Lit Review 2. After that, I need to rework the research plan so that RQs and Hs are interchanged.
  • Meeting with Ned Thursday evening?
  • Meeting with Thom second week of August.
  • If there is time today, try to add color change to the table cells to reflect rank. Failing that, add a column that shows relative motion? Both?
    • Added a Rank and Delta field. That seems to be working fine.
  • Finished lockout task
  • Starting Gateway exposes old APIs task

Phil 6.9.16

6:00 – 12:00 Writing

  • Going to go through the RQs and describe how to address them
  • Start with the back end and my local cohort, which I can assume to be diversity-seeking because of where they are.
  • Iteratively develop tool so that it gets used for diversity-related activities
  • Logs and questionnaires.
  • Scraping for Google Scholar and CaseLaw? Java code is here.
  • Looks like Google Scholar has also started to add the concept of pertinence in?
  • Finished the Research Plan. Do need a timeline.
  • Finished discussion/conclusion. Done(ish)!

Phil 6.2.16

7:00 – 5:00 VTX

  • Writing
  • Write up sprint story – done
    • Develop a ‘training’ corpus of known bad actors (KBA) for each domain.

      • KBAs will be pulled from http://w3.nyhealth.gov/opmc/factions.nsf, which provides a large list.
      • List of KBAs will be added to the content rating DB for human curation
      • HTML and PDF data will be used to populate a list of documents that will then be scanned and analyzed to prepare TF-IDF and LSI term-document tables.
      • The resulting table will in turn be analyzed using term centrality, with the output being an ordered list of terms to be evaluated for each domain.

  • Building view to get person, rating and link from the db – done, or at least V1
    CREATE VIEW view_ratings AS
      SELECT io.link, qo.search_type, po.first_name, po.last_name, po.pp_state, ro.person_characterization
      FROM item_object io
        INNER JOIN query_object qo ON io.query_id = qo.id
        INNER JOIN rating_object ro ON io.id = ro.result_id
        INNER JOIN poi_object po ON qo.provider_id = po.id;
  • Took results from w3.nyhealth.gov and ran them through the whole system. The full results are in the Corpus file under w3.nyhealth.gov-PDF-centrality_06_02_16-13_12_09.xlsx and w3.nyhealth.gov-WEB-centrality_06_02_16-13_12_09.xlsx. The results seem to produce incredibly specific searches. Here are the first two examples. Note that there are very few .com sites.

Phil 6.1.16

7:00 – 2:00 VTX

Phil 5.31.16

7:00 – 4:30 VTX

  • Writing. Working on describing how maintaining many codes in a network contains more (and more subtle) information than grouping similar codes.
  • Working on the UrlChecker
    • In the process, I discovered that the annotation.xml file is unique only for the account and not for the CSE. All CSEs for one account are contained in one annotation file
    • Created a new annotation called ALL_annotations.xml
    • fixed a few things in Andy’s file
    • Reading in everything. Now to produce the new sets of lists.
    • I think it’s just easier to delete all the lists and start over.
    • Done and verified. You run UrlChecker from the command line, with the input file being a list of domains (one per line) and the ALL_annotations.xml file.
  • https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2
  • Need to add a Delete or Hide button to reduce down a large corpus to a more effective size.
  • Added. Tomorrow I’ll wire up the deletion of a row or column and the recreation of the initialMatrix.

Phil 5.30.16

7:00 – 10:00 Thesis/VTX

  • Built a new matrix for the coded lit review. I had coded a couple more papers.
  • Working on copying over the read papers into a new folder that I can run text analytics over
  • After carefully reading through the doc manager list and copying over each paper, I just discovered I could have exported selected.
  • Ooops: Exception in thread “JavaFX Application Thread” java.lang.IllegalArgumentException: Invalid column index (16384).  Allowable column range for EXCEL2007 is (0..16383) or (‘A’..’XFD’)
    • Going to add a limit of
      SpreadsheetVersion.EXCEL2007.getMaxColumns()-8

      columns for now. Clearly that can be cut down.

    • Figuring out where to cut the terms. I’m summing the columns of the LSI calculation, starting at the highest value and then dividing that by the sum of all values. The top 20% of rank weights gives 280 columns. Going to try that first
    • Success! Some initial thoughts
      • The coded version is much more ‘crisp’
      • There are interesting hints in the LSI version
      • Clicking on a term or paper to see the associated items is really nice.
      • I think that document subgroups might be good/better, and it might be possible to use the tool to help build those subgroups. This goes back to the ‘hiding’ concept. (hide item / hide item and associated)
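
The column-trimming step described above (sum each term column, rank from the highest weight down, keep columns until the running total reaches a chosen share of the overall sum) could look roughly like the sketch below. The class and method names are hypothetical, the weights are assumed to be non-negative, and the Excel limit is written as a plain constant rather than read from Apache POI's SpreadsheetVersion.EXCEL2007.getMaxColumns() so that the example has no dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of trimming term columns by cumulative rank weight. */
class ColumnTrimSketch {
    // Excel 2007 column limit, the same value that appears in the exception message
    static final int EXCEL2007_MAX_COLUMNS = 16384;

    /** Returns the indices of the heaviest columns, up to `fraction` of the total weight. */
    static List<Integer> topColumns(double[][] matrix, double fraction, int maxColumns) {
        int cols = matrix.length == 0 ? 0 : matrix[0].length;
        double[] sums = new double[cols];
        double total = 0;
        for (double[] row : matrix) {
            for (int c = 0; c < cols; c++) {
                sums[c] += row[c];
                total += row[c];
            }
        }
        // Sort column indices by descending column sum
        Integer[] order = new Integer[cols];
        for (int c = 0; c < cols; c++) {
            order[c] = c;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer c) -> sums[c]).reversed());

        List<Integer> kept = new ArrayList<>();
        double running = 0;
        for (int c : order) {
            boolean enoughWeight = total > 0 && running / total >= fraction;
            if (kept.size() >= maxColumns || enoughWeight) {
                break;
            }
            kept.add(c);
            running += sums[c];
        }
        return kept;
    }

    public static void main(String[] args) {
        double[][] weights = { {5, 1, 3, 1}, {5, 1, 3, 1} };
        // Leave 8 columns spare, as in the limit mentioned above
        int limit = EXCEL2007_MAX_COLUMNS - 8;
        System.out.println(topColumns(weights, 0.6, limit)); // prints [0, 2]
    }
}
```

With a cutoff of 0.2 this would correspond to the "top 20% of rank weights" pass, though whether the original code stops before or after crossing the threshold is not stated in the notes.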

Phil 5.27.16

7:00 – 2:00 VTX

  • Wound up writing the introduction and saving the old intro to a new document – Themesurfing
  • Renamed the current document
  • Got the parser working. Old artifact settings.
  • Added some tweaks to show progress better. I’m kinda stuck with the single thread in JavaFX having to execute before text can get shown.
  • Need an XML parser to find out what sites have already been added. Added an IntelliJ project to the GoogleCseConfigFiles SVN file. Should be able to finish it on Tuesday.
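
A minimal sketch of the kind of XML parser mentioned above, using the JDK's built-in DOM parser. It assumes the annotation file uses Annotation elements with an about attribute holding the site pattern, which is how Google CSE annotation files are usually laid out; the class name, method names, and sample patterns are hypothetical, and the substring match is deliberately crude.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: list the site patterns already present in an annotations file. */
class AnnotationSitesSketch {

    /** Collects the `about` attribute of every Annotation element (assumed layout). */
    static Set<String> sitesIn(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("Annotation");
            Set<String> sites = new TreeSet<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String about = ((Element) nodes.item(i)).getAttribute("about");
                if (!about.isEmpty()) {
                    sites.add(about);
                }
            }
            return sites;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse annotation XML", e);
        }
    }

    /** Crude check: a domain counts as listed if any pattern contains it. */
    static boolean isListed(Set<String> patterns, String domain) {
        for (String pattern : patterns) {
            if (pattern.contains(domain)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String xml = "<Annotations>"
                + "<Annotation about=\"*.example.org/*\"><Label name=\"_cse_sample\"/></Annotation>"
                + "<Annotation about=\"www.example.com/*\"><Label name=\"_cse_sample\"/></Annotation>"
                + "</Annotations>";
        Set<String> known = sitesIn(xml);
        for (String domain : new String[] {"example.org", "example.net"}) {
            System.out.println(domain + (isListed(known, domain) ? " already added" : " is new"));
        }
    }
}
```

A real version would read the file from disk and match domains more strictly, since a substring test would also treat "notexample.org" as a hit for "example.org".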

Phil 5.17.16

7:00 – 7:00

  • Great discussion with Greg yesterday. Very encouraging.
  • Some thoughts that came up during Fahad’s (Successful!) defense
    • It should be possible to determine the ‘deletable’ codes at the bottom of the ranking by setting the allowable difference between the initial ranking and the trimmed rank.
    • The ‘filter’ box should also be set by clicking on one of the items in the list of associations for the selected items. This way, selection is a two-step process in this context.
    • Suggesting grouping of terms based on connectivity? Maybe second degree? Allows for domain independence?
    • Using a 3D display to show the shared second, third and nth degree as different layers
    • NLP tagged words for TF-IDF to produce a more characterized matrix?
    • 50 samples per iteration, 2,000 iterations? Check! And add info to spreadsheet! Done, and it’s 1,000 iterations
  • Writing
  • Parsing Jeremy’s JSON file
    • Moving the OptionalContent and JsonLoadable over to JavaUtils2
    • Adding javax.persistence-2.1.0
    • Adding json-simple-1.1.1
    • It worked, but it’s junk. It looks like these are un-curated pages
  • Long discussion with Aaron about calculating flag rollups.

Phil 5.11.16

7:00 – 4:30 VTX

  • Continuing paper – working on the ‘motivations’ section
  • Need to set the mode to interactive after a successful load
  • Need to find out where the JSON ratings are in the medicalpractitioner db? Or just rely on Jeremy’s interface? I guess it depends on what gets blown away. But it doesn’t seem like the JSON is in the db.
  • Added a stanfordNLP package to JavaUtils
    • NLPtoken stores all the extracted information about a token (word, lemma, index, POS, etc)
    • DocumentStatistics holds token data across one or more documents
    • StringAnnotator parses strings into NLPtokens.
  • Fixed a bunch of math issues (in Excel, too), but here are the two versions:
    am = 1.969
    be = 2.523
    da = 0.984
    do = 1.892
    i = 1.761
    is = 1.130
    it = 1.130
    let = 1.130
    not = 1.380
    or = 3.523
    thfor = 1.380
    think = 1.380
    to = 1.469
    what = 1.380

    And Excel:

    da     is     it     let    not    thfor  think  what   to     i      do     am     be     or
    0.984  1.130  1.130  1.130  1.380  1.380  1.380  1.380  1.469  1.761  1.892  1.969  2.523  3.523
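
For reference, a minimal sketch of how per-term weights of this kind can be computed. It uses a textbook variant (total term count times log10 of document count over document frequency), which is an assumption and not necessarily the formula behind the numbers above, so it is not expected to reproduce them. The class name and the three-line corpus are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of a per-term TF-IDF style weight over a small corpus. */
class TfIdfSketch {

    /** Returns term -> (total count across corpus) * log10(N / document frequency). */
    static Map<String, Double> score(List<String> docs) {
        Map<String, Integer> termCount = new TreeMap<>();
        Map<String, Integer> docFreq = new TreeMap<>();
        for (String doc : docs) {
            Set<String> seen = new HashSet<>();
            for (String token : doc.toLowerCase().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                termCount.merge(token, 1, Integer::sum);
                // Count each document only once per term
                if (seen.add(token)) {
                    docFreq.merge(token, 1, Integer::sum);
                }
            }
        }
        Map<String, Double> scores = new TreeMap<>();
        int n = docs.size();
        for (Map.Entry<String, Integer> entry : termCount.entrySet()) {
            double idf = Math.log10((double) n / docFreq.get(entry.getKey()));
            scores.put(entry.getKey(), entry.getValue() * idf);
        }
        return scores;
    }

    public static void main(String[] args) {
        // Hypothetical corpus, not the one used for the values above
        List<String> docs = Arrays.asList(
                "to be or not to be",
                "let it be",
                "i think therefore i am");
        for (Map.Entry<String, Double> entry : score(docs).entrySet()) {
            System.out.printf("%s = %.3f%n", entry.getKey(), entry.getValue());
        }
    }
}
```

Running the same corpus through both the Java code and a spreadsheet, as done above, is a reasonable way to check that the two implementations agree term by term.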