Actuaries & Reddit: An Analysis of How Actuaries Think

This article analyses how actuaries use Reddit, one of the more popular social media sites on the internet. Reddit gives users, such as actuaries, the ability to create topical posts and hold discussions anonymously. It is split into subreddits, which are small communities based around a subject, such as actuarial science. There are two major communities for actuaries: r/Actuary, which is mostly US-based, and r/ActuaryUK.

So what are actuaries talking about? I have scraped data from these two actuarial subreddits and performed a text analysis to find out.

The data used for this analysis was extracted using the RedditExtractoR package. I will mostly focus on UK actuaries, so I collected the top 1,000 posts within the ActuaryUK subreddit, along with the 14,412 comments attached to them. That is plenty of opinions left by anonymous actuaries.

Actuaries & Reddit: Word Frequency

First, we look at overall word frequency. For this I have reduced words to their stem using the SnowballC package, which allows similar words to be grouped together. To make the plot more intuitive, I have then converted each stem back to the most common word representing it. For example, the words “actuary”, “actuaries” and “actuarial” on Reddit are all grouped under the stem “actuar” and aggregated. This stem is then represented by the most frequent of those words, which turns out to be “actuarial”.

Some words were removed. These include common connecting words listed in the tidytext::stop_words data frame. I have also removed all exam names from this piece of analysis, although I will come back to these later on.
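The grouping and filtering steps described above can be sketched roughly as follows. This is a standard-library Python illustration rather than the R code behind the article: `toy_stem` is a crude, hypothetical stand-in for the Snowball stemmer, and `STOP_WORDS` is a tiny made-up list standing in for tidytext::stop_words.

```python
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (assumption); the article uses tidytext::stop_words.
STOP_WORDS = {"the", "a", "an", "and", "is", "are", "to", "of", "for", "in"}


def toy_stem(word):
    """Crude suffix stripper standing in for a real Snowball stemmer."""
    for suffix in ("ies", "ial", "ing", "y", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_frequencies(tokens):
    """Count tokens per stem, then label each stem with its most common word."""
    words_by_stem = defaultdict(Counter)
    for token in tokens:
        token = token.lower()
        if token in STOP_WORDS:
            continue  # drop common connecting words
        words_by_stem[toy_stem(token)][token] += 1

    frequencies = {}
    for counts in words_by_stem.values():
        label = counts.most_common(1)[0][0]       # most frequent surface form
        frequencies[label] = sum(counts.values())  # total across the whole stem
    return frequencies


tokens = (
    "the actuarial exams are hard and the actuary is studying for "
    "actuarial exams in an actuarial team of actuaries"
).split()
freq = stem_frequencies(tokens)
print(freq)
```

Here “actuarial”, “actuary” and “actuaries” share one stem, so their counts are pooled and reported under “actuarial”, the most frequent of the three.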

The top 50 words by frequency are shown in the word cloud below. It is fair to say that the topic of actuarial exams dominates the conversation, with the words “actuarial exams” appearing more often than “actuary”. Other exam-related words are also heavily represented, such as “pass”, “paper”, “questions”, “study”, “marks” and “time”.

Figure 1: Word cloud of most popular words used in the ActuaryUK subreddit

So far so expected. Although Reddit has a broad base of users, they generally skew younger, so students are likely well represented there. And it stands to reason that students would be discussing actuarial exams a lot.

Actuaries & Reddit: Actuarial Exam Sentiment

Given that actuarial exams are a major topic of conversation, it is worth asking how that conversation is going. We can get an idea of this by performing sentiment analysis.

In this analysis I have assigned a sentiment of positive, negative or neutral to each word using the “bing” sentiment lexicon provided via the tidytext package. For every comment that contains the name of an exam, the positive, negative and neutral words are counted. The neutral words are discarded, and the ratio of positive to negative words is analysed. For this piece I also weighted comments by their Reddit score, a metric which tracks popularity through the relative upvotes and downvotes. Weighting comments by their score means the general voting consensus is measured, rather than simply the views of the most verbose redditors.
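A minimal sketch of this kind of score-weighted calculation is given below, in standard-library Python rather than R. The `LEXICON` is a small hypothetical word list, not the real “bing” lexicon, and the weighting rule (each sentiment word counts once per point of comment score) is an assumption, since the exact scheme is not spelled out here.

```python
# Hypothetical mini-lexicon (assumption); the article uses the "bing" lexicon.
LEXICON = {
    "good": "positive", "fair": "positive", "enjoyed": "positive",
    "hard": "negative", "awful": "negative", "fail": "negative",
}


def weighted_positive_share(comments, exam):
    """Score-weighted share of sentiment words that are positive.

    comments: iterable of (text, score) pairs.
    Only comments mentioning `exam` are used. Words missing from the
    lexicon are treated as neutral and ignored.
    """
    positive = negative = 0.0
    for text, score in comments:
        words = text.lower().split()
        if exam.lower() not in words:
            continue
        for word in words:
            label = LEXICON.get(word)
            if label == "positive":
                positive += score   # assumed weighting: one count per point of score
            elif label == "negative":
                negative += score
    total = positive + negative
    return positive / total if total else None


comments = [
    ("cs1 was hard but fair", 10),
    ("cs1 is awful", 30),
    ("cp1 was good", 5),
    ("lovely weather today", 50),   # no exam name, so it is skipped
]
cs1_share = weighted_positive_share(comments, "CS1")
cp1_share = weighted_positive_share(comments, "CP1")
print(cs1_share, cp1_share)
```

In this toy data the heavily upvoted negative comment pulls the CS1 share down further than an unweighted word count would.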

From this analysis we can get an idea of the sentiment around each of the core actuarial exams. Those who have read my previous analysis on The Decline in Actuarial Exam Pass Rates might not be surprised by the pattern observed, with CS1/CS2/CM1/CM2 having the most negative sentiment.

Figure 2: Proportion of sentimental words that are positive in comments attached to each exam

It is important to note that sentiment analysis is a blunt tool and can often be inaccurate. It fails to account for double meanings: for example, “Excel” is treated as positive and “risk” is treated as negative, whereas these words are likely neutral to an actuary. It can also fail to understand context, e.g. “that wasn’t good” is rated as positive.
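The context problem follows directly from scoring one word at a time, as this small Python sketch shows. The two-word `LEXICON` is hypothetical and is only there to illustrate the lookup.

```python
# Hypothetical two-word lexicon (assumption), used only to show word-level lookup.
LEXICON = {"good": "positive", "bad": "negative"}


def word_level_labels(text):
    """Label each word independently, ignoring any negation around it."""
    words = text.lower().replace("’", "'").split()
    return [(word, LEXICON[word]) for word in words if word in LEXICON]


labels = word_level_labels("that wasn’t good")
print(labels)
```

Because “wasn’t” is not in the lexicon and neighbouring words are never inspected, the only sentiment word found is “good”, so the comment counts as positive.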

Negative sentiment is not necessarily a criticism of an actuarial exam. Words like “fail” and “hard” are often used to describe actuarial subjects such as CS1, and these would be classed as negative. But these words could be considered as accurate descriptors of the exam, rather than any negative feeling towards how it is currently run. That being said, the most upvoted comment within the entire r/ActuaryUK subreddit is the following critique, posted on 04/07/2023 by im-not-really_here:

“How are the IFOA not embarrassed that their servers can’t handle students logging in to check their results?”

Actuaries & Reddit: US vs UK

Finally we consider the differences between the UK subreddit (r/ActuaryUK) and the mostly US subreddit (r/Actuary). To do this I have also extracted the comments from the top 1,000 posts on r/Actuary. This provided an extra 25,819 comments.

I performed Term Frequency Inverse Document Frequency (TF-IDF) analysis on the datasets. The details of how this works will not be described here, but essentially the method produces a score which rewards words that appear more frequently in one group, but punishes words that also appear frequently in other groups. This allows us to discern which words are particular to a certain group.
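For readers who want a concrete picture, here is a minimal standard-library Python sketch of a TF-IDF calculation. It assumes the common definition, tf = a word's share of all words in its group and idf = ln(number of groups / number of groups containing the word); the exact formula used for the article is not stated, and the two toy token lists are made up.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: dict mapping group name -> list of tokens.

    Returns a dict mapping (group, word) -> tf-idf score.
    """
    n_docs = len(docs)
    counts = {name: Counter(tokens) for name, tokens in docs.items()}

    # Number of groups in which each word appears at least once.
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())

    scores = {}
    for name, word_counts in counts.items():
        total = sum(word_counts.values())
        for word, n in word_counts.items():
            tf = n / total                              # frequent within this group
            idf = math.log(n_docs / doc_freq[word])     # rare across groups
            scores[(name, word)] = tf * idf
    return scores


# Made-up toy data (assumption), not the scraped comments.
docs = {
    "r/ActuaryUK": "exam gilt scr exam mcqs gilt".split(),
    "r/Actuary": "exam hcol chicago exam hcol".split(),
}
scores = tf_idf(docs)
for key, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(key, round(value, 3))
```

With only two groups, a word that appears in both gets an idf of ln(2/2) = 0 under this definition, so its score is zero however often it is used.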

In this analysis, there are only two groups: r/ActuaryUK and r/Actuary. I have removed some words that scored high but were not very informative, including the names of exams, US spelling variants (e.g. “favorite”) and web addresses for soa.org and actuaries.org.uk.

The 15 highest TF-IDF scores are shown below. They essentially tell us which important topics are US-specific and which are UK-specific.

Figure 3: Highest Term Frequency Inverse Document Frequency (TF-IDF) scores across two subreddits

Some words are perhaps unsurprising. Terms like “Gilt” and “SCR” are important to the work of a typical UK actuary but may be less so across the pond. It also makes sense that locations such as “Leeds” and “Chicago” would be biased towards one regional subreddit over the other. Other words are quite baffling, not least “aquarium”. It seems that this result stems from a question previously set in the UK actuarial exams.

Acronyms and abbreviations often appear in the data. Terms like “hcol” and “mcol” refer to high and medium cost of living in the US but are not really used in the UK. One of the most used acronyms in the UK is “mcqs”, which is short for multiple choice questions.

It should be noted that the TF-IDF scores for these data are quite punitive for words that appear in both subreddits. Most words in the plot above are exclusive to only one subreddit. It should also be noted that only the top 1,000 posts were analysed for each actuarial subreddit. A larger dataset could change the results above.

I also performed sentiment analysis of r/Actuary and r/ActuaryUK. Sentiment generally leaned negative across both; however, there was no major difference in scores between the two actuarial subreddits.

Paul Beard, FIA is an Actuary at Phoenix Group. Prior to that he was an Actuary at Royal London. Paul graduated from the University of Cambridge in 2011 with an MS degree in Natural Sciences. You can connect with him on LinkedIn.