An Experiment Using Machine Learning, Part 2
My initial research into studies involving Shakespeare and machine learning didn’t turn up a ton of results, but luckily the results I did find all provided useful frameworks for thinking about my problem. In my last blog post I said it would be great if there were existing studies looking into the Bacon hypothesis. I did not find any, which is not surprising. The Bacon hypothesis was quite popular in the 19th century, but it has since lost steam, and almost no Shakespearean scholars subscribe to it. There is a similar, more recent hypothesis that the author of Shakespeare’s plays was Edward de Vere. This hypothesis has not had the popular success the Bacon hypothesis once had, which is also not surprising, since I can’t imagine most people have even heard of Edward de Vere. I certainly hadn’t until I came across the theory. At any rate, I was unable to find any prior research into this inquiry. That is bad news for making this experiment easier to conduct, but good news for anyone who would prefer that people not be persuaded by theories that don’t stand up to the scrutiny of Occam’s Razor.
The first piece of research I came across was not machine learning related, but extremely relevant to the project: a blog post about text mining the complete works of Shakespeare. If we want to avoid the problem of garbage in, garbage out, we need to clean up the text to make sure it only contains characters that are relevant data points. This post gives a good idea of how to do that.
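As a rough illustration of what that cleanup could look like, here is a minimal Python sketch. It is not taken from the blog post mentioned above; the function name `clean_text` and the choice to keep only lowercase letters and internal apostrophes are my own assumptions about what counts as a relevant character.

```python
import re


def clean_text(raw):
    """Return a list of lowercase word tokens from raw text.

    Hypothetical helper: the rules below (drop digits and punctuation,
    keep apostrophes inside words) are assumptions, not a standard.
    """
    text = raw.lower()
    # Normalize curly apostrophes so contractions tokenize consistently.
    text = text.replace("\u2019", "'")
    # Keep runs of letters, allowing apostrophes between letters.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text)


if __name__ == "__main__":
    print(clean_text("To be, or not to be: that is the question."))
```

A real corpus would likely need more than this, such as stripping speaker names and stage directions, but those rules depend on how the source text is formatted.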
The next piece of research I came across was extremely interesting. An article summarizing the research described an analysis of function words like “and”, “or”, “the”, “to”, etc. to form a word adjacency network that serves as an author’s fingerprint. The theory here is that everyone has to use these words to construct sentences, so the differences in how different authors deploy them give a more “objective measure of ‘style’”. The findings suggest that Christopher Marlowe may have written certain parts of Henry VI. Unfortunately, I’ve only been able to find an article summarizing the research and haven’t yet found the research itself, but this is definitely something I’ll be looking into.
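Since I only have the summary to go on, I can’t reproduce the researchers’ actual method, but the general idea of a word adjacency network can be sketched in a few lines of Python. Everything here is an assumption for illustration: the small `FUNCTION_WORDS` set, the `window` size, and the choice to count how often one function word is followed by another within that window.

```python
from collections import Counter

# Illustrative subset only; a real study would use a much longer list.
FUNCTION_WORDS = {"and", "or", "the", "to", "of", "a", "in", "that"}


def adjacency_counts(tokens, window=5):
    """Count directed (word, follower) pairs of function words.

    A pair is counted each time `follower` appears within `window`
    tokens after `word`. The resulting counts can be read as the
    weighted edges of a directed graph over function words.
    """
    counts = Counter()
    for i, word in enumerate(tokens):
        if word not in FUNCTION_WORDS:
            continue
        for follower in tokens[i + 1 : i + 1 + window]:
            if follower in FUNCTION_WORDS:
                counts[(word, follower)] += 1
    return counts


if __name__ == "__main__":
    tokens = "to be or not to be".split()
    for pair, n in adjacency_counts(tokens).items():
        print(pair, n)
```

Comparing two authors would then mean comparing their edge weights in some normalized form. How to do that comparison is exactly the part I would need the original research for.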
The other piece of research I came across is a university paper that used machine learning to guess the gender of Shakespeare’s characters with “reasonably good classification accuracy”. There are some assumptions baked into this study that I find questionable, but I’m not here to conduct a gender analysis of speech patterns, so this is less important to me. Of greater value is the author’s clarity in laying out how the experiment was conducted, and the analysis of the two algorithms used to tackle the problem: a naive Bayes model and a support vector machine model. Different algorithms do a better job at solving different types of problems. My initial experiment will use a k-nearest neighbors algorithm to analyze different sets of textual data. I decided to start with this algorithm simply because it is relatively straightforward to implement. As I get a better sense of the data I am working with, I will start to consider what other algorithms I can use to better analyze the problem.
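To show why k-nearest neighbors is straightforward to implement, here is a minimal sketch in plain Python. It assumes each text has already been turned into a numeric feature vector (for example, function-word frequencies) with a known author label. The function name `knn_predict`, the Euclidean distance, and the toy vectors are all placeholders, not real data or results.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train: list of (feature_vector, label) pairs
    query: feature vector with the same length as the training vectors
    """
    # Sort training points by Euclidean distance to the query and keep k.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of those k neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up two-dimensional vectors, purely for illustration.
    train = [
        ([0.0, 0.0], "author_a"),
        ([0.1, 0.1], "author_a"),
        ([0.2, 0.0], "author_a"),
        ([1.0, 1.0], "author_b"),
        ([0.9, 1.0], "author_b"),
    ]
    print(knn_predict(train, [0.05, 0.05]))
```

The algorithm has no training step: it just stores labeled examples and compares new ones against them, which makes it easy to try out before moving to something like naive Bayes or a support vector machine.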