Over the past week, I decided to put the time I spend on reddit to good use. As a data enthusiast, I thought defining a few projects for myself would be a good way to stop feeling guilty about constantly F5ing reddit.
I decided to scrape /r/CryptoCurrency plus the subreddits listed in its wiki, and look at their activity patterns, the keywords they use, and how subreddit activity relates to market cap. The data covers 20th to 26th February. Out of nearly 70 subs, only around 10 had more than 1000 comments over that week, and because the data was noisy I had to drop most of the subs.
You can see the results here:
1. First I decided to create a wordcloud of the data I had from /r/cc. You can see it [**here**](https://imgur.com/KkRVtMr).
2. What really intrigued me was that there wasn’t much going on in the subs of some of the highest mcap coins, while some smaller coins had tons of activity. You can see their distribution [**here**](https://imgur.com/XjTffxs).
3. The keywords used in the different subs (excluding /r/cc) looked interesting. I created a treemap visualizing them [**here**](https://i.imgur.com/AqKUTXW.png).
4. The overall keyword graph, showing how dominant each keyword is, is visualized **[here](https://imgur.com/3Ct7szy)**. (Also excluding /r/cc.)
How did I get the data? I used `psaw` to scrape reddit comments in Python. I preprocessed them by removing markdown, links, and tags, then fed them to my algorithms. For the word cloud, I used a very good Python package named `wordcloud`. For keyword extraction, I used `spacy` to detect named entities in each comment, then used a bag-of-words model to build a co-occurrence matrix.
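For anyone who wants to try something similar, here is a minimal sketch of the cleaning and co-occurrence steps. It is not the code behind the charts above: it assumes a small fixed keyword set in place of spaCy's named entities so that it runs with the standard library only, and the regexes, `KEYWORDS`, `clean_comment`, and `cooccurrence` are illustrative names chosen for this example.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical keyword set; an assumption standing in for named entities
# that an NER model would detect in each comment.
KEYWORDS = {"bitcoin", "ethereum", "nano"}


def clean_comment(text):
    """Strip links, user/subreddit tags, and basic markdown from a comment."""
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)  # [label](url) -> label
    text = re.sub(r"https?://\S+", " ", text)              # bare URLs
    text = re.sub(r"(?<!\w)/?[ur]/\w+", " ", text)         # /u/name and /r/sub tags
    text = re.sub(r"[*_`>#~]", " ", text)                  # common markdown symbols
    return re.sub(r"\s+", " ", text).strip().lower()


def cooccurrence(comments, keywords=KEYWORDS):
    """Count how often each pair of keywords appears in the same comment."""
    counts = Counter()
    for comment in comments:
        # Bag of words: only presence in the comment matters, not order.
        tokens = set(re.findall(r"[a-z0-9]+", clean_comment(comment)))
        present = sorted(tokens & keywords)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts
```

Calling `cooccurrence(list_of_comment_strings)` returns a `Counter` keyed by alphabetically sorted keyword pairs, which can be turned into a matrix or fed into a graph library. Sorting the pair keeps `("bitcoin", "nano")` and `("nano", "bitcoin")` from being counted separately.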
If you have any questions, I would be glad to answer them. Thanks!