Overview
Today I’m going to skim over a recent passion project of mine that involves yoyo tricks. I have been throwing around yoyos since I was a teenager. Proof: Me, junior year, in the high school talent show (I’m no better today)
The purpose of this post is to provide a high-level overview of an end-to-end analysis in R, starting with unstructured data from YoYoTricks.com and ending with an interactive graph.
The motivation here was to obtain/create a catalog all yoyo tricks and how they relate to one another. This could be useful in determining the most foundational yoyo tricks, the most complex yoyo tricks, or to even in creating a recommendation engine to give users a selection of new tricks to learn based on what they know.
Scrape
So, the first step in the process was scraping this index to obtain a list of all yoyo trick names and the link to each trick tutorial. For this, I used R (rvest
for HTML parsing, dplyr
for transformations) and SelectorGadget to help define what to CSS classes to collect.
The code below will scrape the site map to obtain a data frame containing the trick name (title
) and it’s corresponding link.
read_html('https://yoyotricks.com/sitemap/') %>%
html_nodes('.cat-item-1 a') %>%
html_attrs() %>%
lapply(function(row) {
data.frame(title=row['title'], link=row['href'], stringsAsFactors = FALSE)
}) %>%
bind_rows() %>%
filter(grepl(x=link, pattern='https://yoyotricks.com/yoyo-tricks/.+')) %>%
head(5) %>% kable()
title | link |
---|---|
Looping Introduction | https://yoyotricks.com/yoyo-tricks/looping-introduction/139/ |
String Tension Introduction - Fixing Twisted String | https://yoyotricks.com/yoyo-tricks/string-tension-introduction-fixing-twisted-string/334244/ |
Choose and Setup the Perfect 2A Yoyo | https://yoyotricks.com/yoyo-tricks/choose-and-setup-the-perfect-2a-yoyo/333161/ |
Swipe Double Green Triangle | https://yoyotricks.com/yoyo-tricks/swipe-double-green-triangle/333034/ |
SOH-CAH-TOA Chopsticks | https://yoyotricks.com/yoyo-tricks/soh-cah-toa-chopsticks/333231/ |
Enrich
From here, I iterated over each trick page, scraping for enrichment data, such as trick category, tags, and (most importantly) references to other tricks.
id | title | author | category | tags | link |
---|---|---|---|---|---|
1 | Vanish Grind | Adam B. | Yoyo Tricks;Yoyo String Tricks (1A) | Green-Triangle;grind | https://yoyotricks.com/yoyo-tricks/vanish-grind/233896/ |
2 | Daniel Day-Lewis | Adam B. | Yoyo Tricks;Long String Tricks | https://yoyotricks.com/yoyo-tricks/daniel-day-lewis/232973/ | |
3 | Bouncy Castle | Jake E. | Yoyo Tricks;Yoyo String Tricks (1A) | Hop;Green-Triangle | https://yoyotricks.com/yoyo-tricks/bouncy-castle/228525/ |
4 | Monochrome | Cory H. | Yoyo Tricks;Yoyo String Tricks (1A) | Slack-String | https://yoyotricks.com/yoyo-tricks/monochrome/224191/ |
5 | Wax On, Wax Off | Jake E. | Yoyo Tricks;Yoyo String Tricks (1A) | Repeater;chopsticks | https://yoyotricks.com/yoyo-tricks/wax-on-wax-off/224190/ |
With an enriched trick catalog in-hand, I did some R-fu to produce a data frame of references between tricks.
In the table below, each number corresponds to a trick in the catalog table. The column weight
indicates how many times the from
trick references the to
trick.
from | to | weight |
---|---|---|
1 | 294 | 1 |
2 | 90 | 2 |
2 | 235 | 2 |
2 | 281 | 2 |
2 | 335 | 3 |
3 | 4 | 2 |
Visualize & Explore
We now have everything needed to create a network graph visualization:
- nodes (trick catalog)
- edges (trick references)
Whoa, that’s messy! Regardless, underlying graph structure is useful for a variety of analyses.
Let’s see which tricks are referenced the most:
Next, let’s a few tricks that are the most similar, based soley on referenced tricks in common.
Note that the similarity measure is ignoring many important factors, such as trick style, category, etc, but this is sufficient for this exercise. Also, to simplify things, only complex tricks are considered (tricks which reference more than 5 other tricks).
At a glance, the tricks “Gravity Whip” and “Independent Tangler” have high similarity, as indicated by the width of the edge connecting them.
A glance at the sub-graph of these two similar shows they do indeed share many references to other fundamental tricks:
Just how similar are they? One measure, the jaccard similarity coefficient, is obtained by dividing the number of tricks in the intersection by the number of tricks in the union of the two trick reference sets.
## [1] "Similarity score: 0.555555555555556"
## [1] "55.56% of referenced tricks in common"
Conclusion
If you’re interested in further data exploration, I encourage you to spin up a kernel on Kaggle with the YoYoTricks dataset:
https://www.kaggle.com/dm3ll3n/yoyotricks
If you discover something interesting, let me know!
Until next time,
Donald