7:00 – 4:00 VTX
- Laplace Transforms 2 – Laplace Transforms 6
- While cleaning up my post from yesterday, I discovered GloVe, another item from the stanfordnlp group. “GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space”. Could be good, but it’s written in C (and I mean straight struct-and-function C), so it would have to be translated to be used. Still, it could be useful for a more sophisticated dictionary. Each entry would simply have to store its coordinates, or a pointer to the trained data.
- The Stanford NLP JavaDoc index page
- Ok! Parsing is working (using Moby Dick again). Lemma works and so does edit distance. Now I need to think about building the entries, dictionaries, and using them to parse text.
- Wondering about using lemmas to build hierarchies in the dictionary. It could be redundant (it’s already in the NLP data). But if we want to make specialty dictionaries (Java vs. Java vs. Java), it might be needed.
- First, I really need to get familiar with the POS annotations. Then I can start to see what the putative candidates are for creating a dictionary from scratch. That essentially creates the annotated (overloaded term!) bag-of-words that is the dictionary. The dictionary will need to be edited, so it might as well be readable and writable as a JSON or XML file. Then something about synonyms leading to concepts maybe?
- Results for today:
Sentence  is: If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings towards the ocean with me.
Sentence  tokens are: almost (POS:RB, Lemma:almost) men (POS:NNS, Lemma:man) degree (POS:NN, Lemma:degree) time (POS:NN, Lemma:time) other (POS:JJ, Lemma:other) cherish (POS:JJ, Lemma:cherish) very (POS:RB, Lemma:very) nearly (POS:RB, Lemma:nearly) same (POS:JJ, Lemma:same) feelings (POS:NNS, Lemma:feeling) ocean (POS:NN, Lemma:ocean)
close match between 'osean' and 'ocean'
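
For reference, here is a minimal sketch of the kind of check that could produce the "close match" line above. It assumes plain Levenshtein edit distance and a threshold of 1. The class name `CloseMatch`, the method names, and the threshold are all hypothetical, chosen for illustration. The code that generated today's results is not shown in this post and may well work differently.

```java
import java.util.Locale;

// Hypothetical helper: compares a candidate word against a lemma by edit distance.
class CloseMatch {

    // Levenshtein distance via dynamic programming, keeping only two rows in memory.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b is the prefix length.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting i characters from a to reach the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap rows so that prev always holds the most recently completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed rule: a case-insensitive match within maxDistance edits counts as "close".
    static boolean isCloseMatch(String candidate, String lemma, int maxDistance) {
        String c = candidate.toLowerCase(Locale.ROOT);
        String l = lemma.toLowerCase(Locale.ROOT);
        return editDistance(c, l) <= maxDistance;
    }

    public static void main(String[] args) {
        String typed = "osean";
        String lemma = "ocean";
        if (isCloseMatch(typed, lemma, 1)) {
            System.out.println("close match between '" + typed + "' and '" + lemma + "'");
        }
    }
}
```

Comparing against the lemma rather than the surface token means a misspelling only has to be close to one entry per word family, which is part of why storing lemmas in the dictionary entries looks attractive.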