
[OC] My personal text-mining project on the Reddit cryptosphere + results

Hi.

Over the past week, I decided to put the time I spend on Reddit to use. As a data enthusiast, I thought defining a few projects for myself would be a good way to stop feeling guilty about constantly F5ing Reddit.

I decided to scrape /r/CryptoCurrency plus the subreddits listed in its wiki and look at their activity patterns, the keywords they use, and the relationship between subreddit activity and market cap. The data covers 20 to 26 February. Out of nearly 70 subs, only around 10 had more than 1000 comments during that week; because the data from the rest was noisy, I had to drop most of the subs.
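The weekly activity filter could be sketched like this. It is a minimal sketch, not the code used for this post, and it assumes each scraped comment has already been reduced to the name of its subreddit. The 1000-comment cutoff is the one mentioned above; the function name `active_subreddits` is hypothetical.

```python
from collections import Counter


def active_subreddits(comment_subs, min_comments=1000):
    """Return subreddits with more than `min_comments` comments, busiest first.

    `comment_subs` is an iterable with one subreddit name per scraped comment
    (a hypothetical input format chosen for this sketch).
    """
    counts = Counter(comment_subs)
    # most_common() yields (subreddit, count) pairs sorted by count, descending
    return [sub for sub, n in counts.most_common() if n > min_comments]


# Usage with made-up subreddit names and a tiny threshold
print(active_subreddits(["a", "a", "a", "b", "c", "c"], min_comments=1))  # → ['a', 'c']
```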

You can see the results here:

1. First, I created a wordcloud from the /r/cc data. You can see it [**here**](https://imgur.com/KkRVtMr).

2. What really intrigued me was that there wasn’t much activity in the subs of some of the highest-mcap coins, while some smaller coins had tons of it. You can see the distribution [**here**](https://imgur.com/XjTffxs).

3. The keywords used in the different subs (excluding /r/cc) looked interesting, so I created a treemap to visualize them [**here**](https://i.imgur.com/AqKUTXW.png).

4. The overall keyword graph and each keyword's dominance are visualized **[here](https://imgur.com/3Ct7szy)** (also excluding /r/cc).
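The activity vs. market cap comparison above is made by eye. One way to put a number on it, assuming you have a weekly comment count and a market cap for each coin from a source of your choosing, is Spearman's rank correlation. This is a swapped-in technique, not one the post used. Ranks are a reasonable choice here because market caps span several orders of magnitude, so a few very large coins would otherwise dominate a plain Pearson correlation. The helper names are hypothetical.

```python
def average_ranks(values):
    """Rank values (1 = smallest), giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # extend j over the run of values tied with values[order[i]]
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

A value near 1 would mean busier subs tend to belong to bigger coins; a value near 0 or below would fit the observation that some top coins have quiet subs.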

---

How did I get the data? I used `psaw` to scrape Reddit comments in Python. I preprocessed them to remove markdown, links and tags, then fed them to my algorithms. For the wordcloud I used `wordcloud`, a very good Python package. For keyword extraction, I used `spacy` to detect named entities in each comment, and a bag-of-words model to build a co-occurrence matrix.
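The cleanup and co-occurrence steps can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the regular expressions are guesses at what "markdown, links and tags" covers, and `clean_comment` and `cooccurrence` are hypothetical names. The keyword sets are taken as given; in the pipeline described above they came from spaCy's named-entity recognition (for example the texts of `doc.ents`), which is left out here. Co-occurrence is counted per comment, treating each comment as a binary bag of keywords.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed cleanup patterns; adjust to whatever markup actually shows up
MD_LINK_RE = re.compile(r"\[([^\]]*)\]\([^)]*\)")  # [text](url) -> text
URL_RE = re.compile(r"https?://\S+")               # bare links
TAG_RE = re.compile(r"<[^>]+>")                     # HTML-style tags such as <br>
MD_CHARS_RE = re.compile(r"[*~`>#]+")               # emphasis, code, quote, heading marks


def clean_comment(text):
    """Strip markdown links, bare URLs, tags and markdown symbols; collapse whitespace."""
    text = MD_LINK_RE.sub(r"\1", text)  # keep the link text, drop the target
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)        # before MD_CHARS_RE, which would eat the '>'
    text = MD_CHARS_RE.sub(" ", text)
    return " ".join(text.split())


def cooccurrence(keyword_sets):
    """Count, for each unordered keyword pair, how many comments mention both."""
    counts = Counter()
    for keywords in keyword_sets:
        # sorting gives each pair one canonical (a, b) order
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts


print(clean_comment("**Check** [this](https://example.com) out: https://imgur.com/abc <br>"))
# → Check this out:
pairs = cooccurrence([{"btc", "eth"}, {"btc", "eth", "ada"}, {"ada"}])
print(pairs[("btc", "eth")])  # → 2
```

Each pair count can serve as one cell of a symmetric co-occurrence matrix, or as an edge weight in a keyword graph.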

If you have any questions, I’d be glad to answer them. Thanks!



Posted on Reddit by quit_daedalus

Comments


  1. Just my 2 cents: you should always put everything in lowercase before making a wordcloud or doing any kind of frequency analysis; otherwise you end up with (e.g.) CoinGecko, Coingecko and coingecko counted as separate words.

  2. That’s so interesting. Have you thought about doing any further analysis to look for a correlation between comments and market cap, or at how comments relate to price rises and falls?

    My hunch is that yolo’ing into a coin that’s trending on /r/cryptocurrency will, more often than not, be a bad decision at that point in time. I wonder whether buying the least-mentioned coin of the top 20 in any given week would be more profitable than buying the most-mentioned one.

    Anyway, just wanted to say thanks for doing some awesome work that has got me thinking 🙂
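The lowercasing suggestion from the first comment can be sketched as follows. `str.casefold()` is used instead of `lower()` because it also folds some non-ASCII case differences; the `\w+` tokenizer and the function name `word_frequencies` are assumptions for this sketch.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"\w+")  # assumed tokenizer: runs of word characters


def word_frequencies(texts):
    """Count words across texts after case-folding, so case variants merge."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.casefold()))
    return counts


freqs = word_frequencies(["CoinGecko is up", "coingecko", "Coingecko!"])
print(freqs["coingecko"])  # → 3
```

A frequency mapping like this could then be passed to the `wordcloud` package's `WordCloud.generate_from_frequencies` rather than having the package tokenize raw text.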
