7:00 – 4:30 ASRC MKT
- I am going to start calling runaway echo chambers Baudrillardian Stampedes: https://en.wikipedia.org/wiki/Simulacra_and_Simulation
- GECCO 2018 paper list is full of swarming optimizers
- CORNELL NEWSROOM is a large dataset for training and evaluating summarization systems. It contains 1.3 million articles and summaries written by authors and editors in the newsrooms of 38 major publications. The summaries are obtained from search and social metadata between 1998 and 2017 and use a variety of summarization strategies combining extraction and abstraction.
- More Ultimate Angular
- Template Fundamentals (interpolation – #ref)
- Now that I have my corpora, time to figure out how to build an embedding
- Installing gensim
- By now, gensim is—to my knowledge—the most robust, efficient and hassle-free piece of software to realize unsupervised semantic modelling from plain text. It stands in contrast to brittle homework-assignment-implementations that do not scale on one hand, and robust java-esque projects that take forever just to run “hello world”.
- Big install. Didn’t break TF, which is nice
- How to Develop Word Embeddings in Python with Gensim
- I need to redo the parser so that each file is one sentence.
- sentences are strings that begin with a [CR] or [SPACE] + [WORD] and end with [WORD] + [.] or [“]
- a [CR] preceded by anything other than a [.] or [“] is the middle of a sentence
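The mid-sentence [CR] rule above can be sketched with a negative lookbehind (the function name and sample text are mine):

```python
import re

def unwrap_lines(text):
    # A newline not preceded by '.' or '"' is mid-sentence (per the rule above),
    # so replace it with a space; sentence-ending newlines are kept.
    return re.sub(r'(?<![."])\n', ' ', text)

joined = unwrap_lines('The quick\nbrown fox.\nNext')
# → 'The quick brown fox.\nNext'
```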
- A fantastic regex tool! https://regex101.com/
regex = re.compile(r"([-!?\.]\"|[!?\.])")
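A sketch of how that end-of-sentence regex behaves (the sample string is mine): splitting with the captured group keeps each matched delimiter as its own item, so punctuation+closing-quote and bare punctuation both mark boundaries.

```python
import re

# End-of-sentence pattern from above: punctuation followed by a closing
# double quote, or bare sentence-ending punctuation.
regex = re.compile(r"([-!?\.]\"|[!?\.])")

# With a capture group, re.split interleaves text pieces and delimiters.
parts = re.split(regex, 'Hello world. "Stop!" she said.')
# → ['Hello world', '.', ' "Stop', '!"', ' she said', '.', '']
```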
- After running into odd edge cases, I decided to load each book as a single string, parse it, then write out the individual lines. Works great except the last step, where I can’t seem to iterate over an array of strings. Calling it a day
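The last step described above (writing the parsed sentences out one per line) can be sketched as plain iteration over a list of strings; the filename and sample data are placeholders:

```python
sentences = ["First sentence.", "Second sentence."]  # parsed lines (sample data)

# Write one sentence per line; a plain Python list of strings iterates directly.
with open("book_sentences.txt", "w", encoding="utf-8") as f:
    for line in sentences:
        f.write(line + "\n")
```

If the sentences are held in a NumPy array rather than a list, converting with `list(arr)` before iterating avoids the usual array-of-strings surprises.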