Monthly Archives: October 2015

Phil 10.30.15

8:00 – 4:00 SR

  • Working from home today, waiting for people to show up.
  • Here’s the fix for the Reqonciler issue:
    • Open Reqonciler in your browser.
    • Click the Post-Processing button to see all queries.
    • Double-click the one you disabled this morning to edit it (Order 2100, update month 1 year 2 to 100% from month 12 year 1).
    • Add “ AND NOT ISNULL(bc.uid)” at the end of the query, without the double quotes. Make sure there is a space before AND.
    • Save, run, and check the data.
  • In the process of getting my home dev environment working again. I swear I should just do this once a week so it’s less stressful.
    • Fixed the Imagick load so that it tests whether the extension is present and whether it is installed correctly.
    • Disabled the World Wide Web service so that Apache could run on port 80.
    • Updated all the files in the Apache htdocs directory. Forgot that I had updated the server access methods to take an object.
    • It occurs to me that I can load up the DB directly on the server if I don’t get everything done with the dictionary by Wednesday.
  • Examine AlchemyNLP and see if there is a hierarchy that can be used. Not without a lot of work.
  • Buy and download the FiveFilters term extractor and see how to integrate it.
    • Ordered. Waiting for confirmation to show up.
    • Installed. Time to see if it’ll work. It looks good, though possibly slow? Starting to put together a dictionary class to examine it more deeply.
  • Add dictionary Flyout directive
    • Name the dictionary
    • Choose the networks (add/remove from list)
    • Input HTML, text, or URL
    • Get the clean text and show the machine-extracted terms. We could look up potential definitions too, from Wordnik. Set up an account and applied for a developer key.
    • Show a list of selected terms with checkboxes
      • Checked items can be deleted or grouped
      • Items can be added by typing into a field
    • Show a list of ‘group items’. This displays a list of the items whose index appears in the ‘parent’ field
      • Selecting an item in this list reorders the item list to show the appropriate group first
  • There should also be a ‘select dictionary’ option on the network flyout
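A rough sketch of the group-items logic from the flyout list above, in TypeScript since that is what the Angular side uses. The DictEntry shape, its field names, and the sample words are hypothetical; only the idea (a group is any entry whose id shows up in some entry’s parent field, and picking a group floats its children to the top) comes from the notes.

```typescript
// Hypothetical shape of a dictionary entry shown in the flyout.
interface DictEntry {
  id: number;
  word: string;
  parent: number | null; // id of the group ("parent") entry, if any
}

// Group items: entries whose id appears in at least one parent field.
function groupItems(entries: DictEntry[]): DictEntry[] {
  const parentIds = new Set<number>();
  for (const e of entries) {
    if (e.parent !== null) parentIds.add(e.parent);
  }
  return entries.filter((e) => parentIds.has(e.id));
}

// Selecting a group moves its children to the front of the item list,
// keeping the original order inside each part.
function reorderForGroup(entries: DictEntry[], groupId: number): DictEntry[] {
  const inGroup = entries.filter((e) => e.parent === groupId);
  const rest = entries.filter((e) => e.parent !== groupId);
  return [...inGroup, ...rest];
}

// Sample data (made up).
const entries: DictEntry[] = [
  { id: 1, word: "skill", parent: null },
  { id: 2, word: "task", parent: null },
  { id: 3, word: "data analysis", parent: 2 },
  { id: 4, word: "visualization", parent: 1 },
  { id: 5, word: "reporting", parent: 2 },
];

console.log(groupItems(entries).map((e) => e.word).join(", ")); // skill, task
console.log(reorderForGroup(entries, 2).map((e) => e.word).join(", "));
// data analysis, reporting, skill, task, visualization
```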

Phil 10.29.15

8:00 – 4:30 SR

  • Sent Dong screenshots of the issue. He’s checking queries and code now.
  • Added simpleTests($dbObj) to each class in AlchemyNLP
  • Added ‘skill’, ‘capability’, and ‘task’ as parents in the dictionary
  • Add flyout directive to create and assign dictionaries and entries.
  • Set the dictionary to zero in the networkDbIo.addNetwork() PHP code and add the dict_id to the TypeScript interface. Done
  • Make sure that an association between a keyword and another item is always from the keyword. Otherwise PageRank won’t calculate correctly. Done.
  • Chain up the dictionary and add parent keywords to the network (parents point to children). That way, for example, all ‘skills’ can be elevated, while all ‘tasks’ can be suppressed. Done
  • Changed keywords to be ‘editable’ so they have adjustable link weights. It does make the keywords in the network editable as well. May need to just add a slider to ITEMS of certain types. Still need to think about this…
  • Next step is to buy and download the FiveFilters term extractor and see how to integrate it.
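The keyword-direction rule and the adjustable link weights both come down to how PageRank moves rank along directed, weighted edges. The notes don’t say which PageRank implementation the network uses, so this is a generic power-iteration version with weighted out-links; the node names and weights are made up.

```typescript
// Directed, weighted association. Rank flows from `from` to `to` in
// proportion to this edge's share of `from`'s total outgoing weight.
// Weights are assumed to be non-negative.
interface Edge {
  from: string;
  to: string;
  weight: number;
}

function pageRank(
  nodes: string[],
  edges: Edge[],
  damping = 0.85,
  iterations = 50,
): Map<string, number> {
  const n = nodes.length;
  const outWeight = new Map<string, number>();
  for (const id of nodes) outWeight.set(id, 0);
  for (const e of edges) outWeight.set(e.from, (outWeight.get(e.from) ?? 0) + e.weight);

  let rank = new Map<string, number>();
  for (const id of nodes) rank.set(id, 1 / n);

  for (let it = 0; it < iterations; it++) {
    // Rank held by nodes with no outgoing weight is spread evenly over all nodes.
    let dangling = 0;
    for (const id of nodes) {
      if ((outWeight.get(id) ?? 0) === 0) dangling += rank.get(id) ?? 0;
    }
    const next = new Map<string, number>();
    for (const id of nodes) next.set(id, (1 - damping) / n + (damping * dangling) / n);

    for (const e of edges) {
      const total = outWeight.get(e.from) ?? 0;
      if (total === 0) continue; // only zero-weight edges: already counted as dangling
      const share = ((rank.get(e.from) ?? 0) * e.weight) / total;
      next.set(e.to, (next.get(e.to) ?? 0) + damping * share);
    }
    rank = next;
  }
  return rank;
}

// Parent keyword -> child keyword -> items, as described in the notes above.
const nodes = ["skill", "visualization", "itemA", "itemB"];
const edges: Edge[] = [
  { from: "skill", to: "visualization", weight: 1 },
  { from: "visualization", to: "itemA", weight: 3 }, // link weight turned up
  { from: "visualization", to: "itemB", weight: 1 },
];
const ranks = pageRank(nodes, edges);
for (const [id, r] of Array.from(ranks.entries())) console.log(id, r.toFixed(3));
```

Because rank only flows along an edge’s direction, an item-to-keyword link would push the item’s rank into the keyword instead of the other way round, which is the reason for always linking from the keyword. Turning up one out-link only reallocates that node’s own rank among its children, so raising itemA’s weight pulls rank away from itemB.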

Phil 10.28.15

8:00 – 5:00 SR

  • Walked through the FA bug with Dong on the phone. Took some screenshots that I will send over tonight.
  • Add a DictionaryText class that uses a passed-in tag list to determine what items to create associations to. Low edit-distance matches get added to the item. Possibly the keyword list can be hierarchical?
  • Add a tn_dictionary table with fields for word, type (optional), description (optional), server_code (optional), parent (optional), and user_id. Multiple users can have different versions of the same word. When a new word is entered, the content of the network is rescanned and items that contain the keyword link to it. We will need to know which definition is being used in the network, since it will point to the master item. Done, except for the last part.
    • The server_code field would hold scripts, regexes, or something similar to do special text scanning. This would require the use of eval, for example. The field is in the DB but not used yet.
  • So now, when an external query is made, only items from the result that contain words in the dictionary will be added to the network. Done and working in the DB and PHP!
  • There should also be a ‘resubmit’ button that looks for new material while running the stored queries. TODO
  • It’s possible to use NLP, particularly FiveFilters’, to create a strawman dictionary as a starting point. TODO
  • Meeting with Dr. Pan
    • There are different contexts that a keyword dictionary needs to be aware of. Resumes have skills, tasks and achievements. Scientific papers have contributions and methods, financial data has budget centers, companies, clients, invoices, etc.
    • Phrases add specificity, single words can be very noisy.
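A minimal sketch of the “only keep results that contain dictionary words” step, assuming simple case-insensitive whole-word matching and per-user dictionaries. The real filtering lives in the PHP/DB layer; DictionaryRow, ResultItem, and the helper names here are hypothetical.

```typescript
// Hypothetical subset of a tn_dictionary row: just what filtering needs.
interface DictionaryRow {
  word: string; // a single word or a phrase
  user_id: number;
  parent?: number;
}

interface ResultItem {
  title: string;
  text: string;
}

// Escape regex metacharacters so dictionary entries are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Case-insensitive match on word boundaries, so "task" does not match inside
// "multitasking". (\b assumes entries start and end with letters or digits.)
function matchingWords(item: ResultItem, dict: DictionaryRow[]): string[] {
  const haystack = `${item.title} ${item.text}`;
  return dict
    .filter((d) => new RegExp(`\\b${escapeRegExp(d.word)}\\b`, "i").test(haystack))
    .map((d) => d.word);
}

// Keep only the external-query results that mention at least one of the user's words.
function filterResults(results: ResultItem[], dict: DictionaryRow[], userId: number): ResultItem[] {
  const userDict = dict.filter((d) => d.user_id === userId);
  return results.filter((r) => matchingWords(r, userDict).length > 0);
}

const dict: DictionaryRow[] = [
  { word: "visualization", user_id: 3 },
  { word: "budget", user_id: 4 },
];
const results: ResultItem[] = [
  { title: "Senior analyst", text: "Web-based data visualization role" },
  { title: "Budget officer", text: "Tracks invoices and clients" },
];
console.log(filterResults(results, dict, 3).map((r) => r.title).join()); // Senior analyst
```

Filtering by user_id first is one way to honor “multiple users can have different versions of the same word”; whether phrases should also match across punctuation is left open.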

Phil 10.27.15

8:00 – 5:00 SR

  • Still chasing down the Reqonciler issues.
  • Need to put together a list of software that I have installed on my dev box and send to Lenny.
  • Still working on the EXPLICIT entry. Added two manual items directly in the PHP and it’s working.
    • Also see if I can get HTML parsed and displayed – done
    • Based on that paper, I’m going to try keywords again. OK, I have keywords, but they’re really key phrases and as such are too unique? Here’s a screenshot: [screenshot: dump]
    • There are things that I think should correlate, like web-based, status, senior, and visualization. There are also ‘analysis’ and ‘analysys’, which should be matched by edit distance. So there are some options:
      • Try a different NLP. Open Calais is free enough for what I’m doing, and does provide different parsing. FiveFilters has a PHP implementation that costs 20 euros and actually seems to work best for what I’m looking for (items linked by keywords)?
      • Or do naive tagging based on simple keyword extraction. The results could be presented as potential tags for users to check?
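For the ‘analysis’ vs. ‘analysys’ case, plain Levenshtein distance is enough to catch the typo. A sketch, assuming a cutoff of one edit after lower-casing; the sameTerm() helper and its threshold are guesses, not something tested against the real keyword lists.

```typescript
// Classic Levenshtein distance (insert, delete, substitute each cost 1),
// computed with two rolling rows of the dynamic-programming table.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Treat two terms as the same tag if they are within maxEdits of each other.
function sameTerm(a: string, b: string, maxEdits = 1): boolean {
  return levenshtein(a.toLowerCase(), b.toLowerCase()) <= maxEdits;
}

console.log(levenshtein("analysis", "analysys")); // 1
console.log(sameTerm("Analysis", "analysys")); // true
console.log(sameTerm("status", "senior")); // false
```

A fixed cutoff of one is probably too tight for long phrases and too loose for very short words; scaling the cutoff with term length is a common alternative.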

Phil 10.26.15

8:00 – 10:00, 12:00 – 3:00 SR

  • Query Builder is not capturing all of the COGNOS data correctly. Specifically, for FY15 the second year of data is not being added to the queries, and for FY14 the third year of data is not being added. FY16, which is in year 1, is working correctly.
  • I’ve checked the tables in the database and it looks like everything is going in correctly (e.g. the cognos_obligations table has timestamps from today). So I don’t know if there is a problem with ingest or with the FA queries.
  • Working on getting all the pieces working in the EXPLICIT item case. I have an issue where the ‘query’ field is used for a lot of things – I’m definitely going to have to clean that up. Probably the best thing to do is to align the database and the GUI better.
    • When getting the text for the item to show as HTML, I got an Angular ‘sanitizer unable to parse’ error that I couldn’t figure out. After trying
      sceProvider.trustAsResourceUrl
    • in the directive without success, I went back (slowly) through the HTML. It turned out to be the difference between this:
      $htmlStr .= '<img src="assets/'.$this->rssImage.'" height="80"></a></td>';

      and this:

      $htmlStr .= '<img src="assets/'.$this->rssImage.'"height="80"></a></td>';

      Didja see it? It’s the space before “height=80”. Chrome handled it when I was working it out, but Angular chokes.
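One way to avoid that kind of missing space is to build tags from a list of attributes rather than by hand-concatenating strings. The original is PHP; this TypeScript version with a hypothetical startTag() helper (and a made-up image name) just shows the idea: every name="value" pair is joined with a single space, and values are escaped.

```typescript
// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a start tag from an attribute map. Because the pairs are joined with
// a single space, the separator between attributes can't be forgotten.
function startTag(name: string, attrs: Record<string, string | number>): string {
  const parts = Object.entries(attrs).map(([k, v]) => `${k}="${escapeAttr(String(v))}"`);
  return parts.length ? `<${name} ${parts.join(" ")}>` : `<${name}>`;
}

const rssImage = "logo.png"; // hypothetical image file name
console.log(startTag("img", { src: `assets/${rssImage}`, height: 80 }));
// <img src="assets/logo.png" height="80">
```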

Phil 10.23.15

8:00 – 4:00 SR

  • Things look good for the switchover
  • Getting the PHP right for items from text
    • Fixed the author parsing
    • Fixed image save
    • Fixed guid generation for manual add. Alchemy likes my resume. I wonder what makes it more parseable?
    • Will need to generate some default values for the items used to generate the guid.
  • Add some new tn_types for tn_items so we know what the source is? Either that or add a ‘source’ field…
    • ANLP_
    • USER_
    • GNEWS_
    • etc
  • 3:00 switching production servers.
    • Looks like everything works.
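A possible shape for a guid that carries the source prefix and still works when fields are missing. Everything here is an assumption: the field list, the placeholder defaults, and FNV-1a as the hash (chosen only because it fits in a few lines).

```typescript
// Source prefixes from the list above, as a TypeScript union (an assumption).
type SourcePrefix = "ANLP_" | "USER_" | "GNEWS_";

// Fields that feed the guid of a manually added item; any may be missing.
interface ManualItemFields {
  title?: string;
  authors?: string;
  pubDate?: string;
  textContent?: string;
}

// 32-bit FNV-1a over UTF-16 code units. Non-cryptographic and collision-prone
// at scale; it stands in for whatever hash the PHP side actually uses.
function fnv1a(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

// Fill placeholder defaults first so that missing fields still yield a stable guid.
function makeGuid(prefix: SourcePrefix, f: ManualItemFields): string {
  const title = f.title?.trim() || "untitled";
  const authors = f.authors?.trim() || "unknown";
  const pubDate = f.pubDate ?? "1970-01-01";
  const text = f.textContent ?? "";
  // The unit separator keeps ("ab", "c") and ("a", "bc") from hashing alike.
  return prefix + fnv1a([title, authors, pubDate, text].join("\u001f"));
}

console.log(makeGuid("USER_", { title: "Resume", authors: "Phil" }));
```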

Phil 10.22.15

Taking the day off to enjoy the weather.

Phil 10.21.15

8:00 – 6:00 SR

  • Well, we have hoverboards, after a fashion.
  • Discussed the new approach in some detail with Lenny. It turns out that the current system has around 140 tables and about 15 views spread across 5 databases. So getting it down to one DB with 5 tables makes some sense.
  • Adding direct text interaction to the AlchemyNLP code. Looking for a (good?) abstract.
    • So here’s kind of an interesting thing. Google has my citation history, and uses that to make recommendations. But since my previous work is surgical simulator heavy, I’m not getting results that are useful to me. I want to be able to adjust weights! I think I’ll call this ‘the self driving car problem’ 🙂
    • Refactoring the AlchemyNLP pull into a base class, with a URL version and a text version
  • Meeting on campus with Wayne, Shimei and Victor.
    • Since something should be working by next week, Wayne would like to use the system to try to find mentors. Shimei suggests using DBLP as a source. It’s structured and can be downloaded.
    • I realized that items should(?) have their own(?) networks of items that are attached at a system level. For example, when AlchemyNLP parses a web page, it produces concepts, entities, author etc. Rather than re-running the analysis each time, the system should look to see if there is already an <itemGuid_Alchemy> network produced the first time the item was ingested. Then those items can be attached to the user network as needed.
    • Talked with Victor about collaborating somehow. We also talked about how to present a list of search results in a ‘curated’ context. It could be a sharded display where each item listed has a slider/add/discard control, and hovering over the item shows its relationship to the curated collection on the right-hand side of the page. Choosing the context could be in a dropdown at the top that is guessed at by context, but can be changed by the user.
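A sketch combining the base-class refactor with the <itemGuid_Alchemy> idea: subclasses only supply the text, and the base class checks a cache before running the analysis. It is kept synchronous to mirror the PHP classes; the class names, the cache, and the word-list “analysis” are hypothetical stand-ins for the real AlchemyNLP call.

```typescript
// Hypothetical stand-in for the concepts/entities/author network that
// AlchemyNLP produces for an item; the real response is richer.
interface AnalysisNetwork {
  itemGuid: string;
  terms: string[];
}

// System-level cache of per-item networks, keyed like <itemGuid>_Alchemy.
const networkCache = new Map<string, AnalysisNetwork>();

abstract class AlchemyPull {
  readonly itemGuid: string;

  constructor(itemGuid: string) {
    this.itemGuid = itemGuid;
  }

  // The URL and text versions differ only in where the text comes from.
  protected abstract getText(): string;

  // Placeholder analysis (distinct lower-case words of 5+ letters). The real
  // class would send the text to the AlchemyNLP service instead.
  protected analyze(text: string): AnalysisNetwork {
    const words: string[] = text.toLowerCase().match(/[a-z]{5,}/g) ?? [];
    return { itemGuid: this.itemGuid, terms: Array.from(new Set(words)) };
  }

  // Reuse the network built when the item was first ingested, if there is one.
  getNetwork(): AnalysisNetwork {
    const key = `${this.itemGuid}_Alchemy`;
    const cached = networkCache.get(key);
    if (cached) return cached;
    const network = this.analyze(this.getText());
    networkCache.set(key, network);
    return network;
  }
}

class TextPull extends AlchemyPull {
  private readonly text: string;
  constructor(itemGuid: string, text: string) {
    super(itemGuid);
    this.text = text;
  }
  protected getText(): string {
    return this.text;
  }
}

// The fetcher is injected (hypothetical) so the sketch runs offline; on the
// PHP side this would be something like a file_get_contents() call.
class UrlPull extends AlchemyPull {
  private readonly url: string;
  private readonly fetcher: (url: string) => string;
  constructor(itemGuid: string, url: string, fetcher: (url: string) => string) {
    super(itemGuid);
    this.url = url;
    this.fetcher = fetcher;
  }
  protected getText(): string {
    return this.fetcher(this.url);
  }
}

let fetchCount = 0;
const fakeFetch = (url: string): string => {
  fetchCount++;
  return `Page text for ${url}: visualization and analysis`;
};
const first = new UrlPull("guid-123", "https://example.com/post", fakeFetch).getNetwork();
const again = new UrlPull("guid-123", "https://example.com/post", fakeFetch).getNetwork();
console.log(fetchCount, first === again); // 1 true
```

The second pull for the same guid never fetches or analyzes; it gets the network stored the first time, which can then be attached to a user network as needed.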

Phil 10.20.15

8:00 – 4:00 SR

  • Need to change the QueryHistory list to Visible Ranks. Done
  • Look into Google Custom Search and its pricing
  • Started on ‘manual’ item. Need to add an IDENTITY association so that items that are connected will still show up in the dataObject query. Added to tn_types. Will need to work on the queries.
  • Added most of the Javascript guts to send the manual item to the PHP. Need to do that side next. And don’t forget to add an IDENTITY link on the PHP or the item won’t come back.
    • Any time a link is added, delete IDENTITY links if the IDENTITY item is the source or target..?
  • Need to make it so that all associations have adjustable weights. It probably should be based on the item that is the source, but maybe independent ‘advanced’ manipulation should be possible?
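A small sketch of the IDENTITY idea, assuming associations are plain source/target/type records and that the dataObject query returns any item that appears in at least one association. The function names are hypothetical, and addLink() takes just one reading of the open “..?” question above.

```typescript
// Hypothetical association record. IDENTITY is the tn_types entry from these
// notes; "AUTHOR" below is a placeholder type.
interface Association {
  source: string;
  target: string;
  type: string;
}

// A new, unconnected item gets an IDENTITY self-link so that a query which
// only walks associations still returns it.
function addItem(links: Association[], itemId: string): Association[] {
  return [...links, { source: itemId, target: itemId, type: "IDENTITY" }];
}

// Once a real link touches an item, its IDENTITY self-link is redundant,
// so drop it (self-links have source === target, so checking source is enough).
function addLink(links: Association[], link: Association): Association[] {
  const touched = new Set([link.source, link.target]);
  const kept = links.filter((l) => !(l.type === "IDENTITY" && touched.has(l.source)));
  return [...kept, link];
}

// Items an association-driven query (like the dataObject query) would return.
function visibleItems(links: Association[]): string[] {
  const ids = new Set<string>();
  for (const l of links) {
    ids.add(l.source);
    ids.add(l.target);
  }
  return Array.from(ids);
}

let links: Association[] = [];
links = addItem(links, "manual-1");
links = addItem(links, "author-1");
links = addLink(links, { source: "manual-1", target: "author-1", type: "AUTHOR" });
console.log(links.length, visibleItems(links).join()); // 1 manual-1,author-1
```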

Phil 10.19.15

8:00 – 12:00 SR

  • Add the check in the flyout directive that looks at the user_id and read_only coming back. If the user_id is not the current user_id from the session object, force the network to be read_only regardless of its listing in the DB.
    • Populated network 3 for user_id = 3. Done, and verified on the server.
  • Need to change the QueryHistory list to Visible Ranks.
  • Try running a PDF through Alchemy. It returns a ‘non-html’ error. However, I can parse the PDF in PHP and then send the text off to Alchemy for analysis.
    • I think though, that the way to do this is to add a ‘manual’ item. This would add the following fields(?):
      • text_content: cut and paste of the text that matters
      • Authors (comma separated authors – add validation and parsing)
      • The rest of the fields could follow the RSS 2.0 spec, which makes sense anyway.
      • This would require a new button and a new directive. On ‘save’, the text_content gets sent to Alchemy for the creation of the keyword(?) network. The other fields (title, author, etc.) get added to the network explicitly.
      • This does mean that when an item is added to the network, other ‘items’ like author should be attached automatically.
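Two of the pieces above are small enough to sketch in TypeScript: the owner check that forces read_only, and parsing of the comma-separated Authors field. NetworkRow, the function names, and the validation rule are assumptions.

```typescript
// Hypothetical shape of the network row the flyout directive gets back.
interface NetworkRow {
  id: number;
  user_id: number;
  read_only: boolean;
}

// Owner check: anyone but the owner gets a read-only view, whatever the DB says.
function effectiveReadOnly(net: NetworkRow, sessionUserId: number): boolean {
  return net.user_id !== sessionUserId || net.read_only;
}

// Parse the comma-separated Authors field of a manual item. Whitespace is
// collapsed and empty entries dropped. The validation rule (no digits or
// angle brackets) is an assumption, meant to catch pasted citation debris.
function parseAuthors(field: string): { authors: string[]; invalid: string[] } {
  const authors: string[] = [];
  const invalid: string[] = [];
  for (const raw of field.split(",")) {
    const name = raw.trim().replace(/\s+/g, " ");
    if (name === "") continue;
    if (/[0-9<>]/.test(name)) invalid.push(name);
    else authors.push(name);
  }
  return { authors, invalid };
}

console.log(effectiveReadOnly({ id: 3, user_id: 3, read_only: false }, 3)); // false
console.log(effectiveReadOnly({ id: 3, user_id: 3, read_only: false }, 1)); // true
console.log(JSON.stringify(parseAuthors("Jane Smith,  Bob  Jones, , Smith 2015")));
// {"authors":["Jane Smith","Bob Jones"],"invalid":["Smith 2015"]}
```

Since the field is comma-separated, “Last, First” names would be split in two; supporting them would need a different delimiter.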