Phil 6.2.16

7:00 – 5:00 VTX

  • Writing
  • Write up sprint story – done
    • Develop a ‘training’ corpus known bad actors (KBA) for each domain.

      • KBAs will be pulled from, which provides a large list.
      • List of KBAs will be added to the content rating DB for human curation
      • HTML and PDF data will be used to populate a list of documents that will then be scanned and analyzed to prepare TF-IDF and LSI term-document tables.
      • The resulting table will in turn be analyzed using term centrality, with the output being an ordered list of terms to be evaluated for each domain.

  • Building view to get person, rating and link from the db – done, or at least V1
    CREATE VIEW view_ratings AS
      select, qo.search_type, po.first_name, po.last_name, po.pp_state, ro.person_characterization from item_object io
        INNER JOIN query_object qo ON io.query_id =
        INNER JOIN rating_object ro on = ro.result_id
        INNER JOIN poi_object po on qo.provider_id =;
  • Took results from and ran them through the whole system. The full results are in the Corpus file under and The results seem to make incredibly specific searches. Here are the two first examples. Note that there are very few .com sites.:

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s