
Answering the Question: Is the Show Deteriorating? J. Poniver. 2016: 1(5)



Taialin. Answering the Question: Is the Show Deteriorating? A Descriptive and Analytical Statistical Summary of MLPForum's Polls Concerning Six Seasons of My Little Pony: Friendship Is Magic. Journal of Poniverse 2016: 1(5)

 

ANSWERING THE QUESTION: IS THE SHOW DETERIORATING?

A Descriptive and Analytical Statistical Summary of MLPForum's Polls Concerning Six Seasons of Episodes of My Little Pony: Friendship Is Magic

 

Taialin

 

DISCLAIMER

This post models scientific literature in structure but is not, in any way, a scientific piece of literature (for obvious reasons). It was written informally and partly for comedic effect. I am a scientist, but I was not trying to be very rigorous in this study. Pastel ponies brimming with magic generally don't respond to scientific rigor very well; just ask Sunset Shimmer. Reader discretion is advised.

 

P.S. If there actually is a Journal of Poniverse in the future, though, I want to be in it! Oh, and if you're wondering why this isn't the first issue of the journal, well . . . this isn't my first "study" on pony and fandom matters, either. If you're curious, have a look around.

 

ABSTRACT

 

This being a piece of "scientific" literature, I'm obligated to put an abstract at the top. It's basically a summary of the research I've done, what I've found from it, and why it's significant. Basically, it's a "tl;dr." That being said, it also omits a good amount of other information and research I did over the past six months or so (yes, six months). If I do say so myself, it's also less fun to read. Available below the break.

 

 

In order to determine whether MLP is actually deteriorating, vote data were gathered for episodes of each season of My Little Pony: Friendship Is Magic from polls posted on MLPForums. Rating and Polarization Factor metrics were derived from these data. It was determined that while season and voting behavior may not be independent, there is no reason to believe that consensus opinion on episodes differs between seasons, though the data gathered may be subject to bias. The data gathered in this study provide impetus for future semi-objective analyses of episode consensus.

 

 

 

 

INTRODUCTION

I don't know about you, but around June 2016, I was getting pretty tired of all those "The show is jumping the shark" posts and "The show is going downhill" threads concerning Season 6 [1, 2, 3, 4, 5, 6, 14]. Even our own Shadow the Hedgehog didn't appear to be a big fan [6]. Of course, there are also a handful of posts and threads that defend Season 6 [6, 7]. I've heard from some friends as well that MLP should have ended at Season 6 because it's already "jumped the shark." At this point, I haven't heard quite so much about it, but even so, with all the vitriol and vitriol-defense being thrown around, it's hard to come to a definitive conclusion about whether Season 6 has actually gotten worse compared to previous seasons. I think it's an interesting question to answer.

 

There is some evidence suggesting that Season 6 is an inferior season. M.A. Larson, a writer since Season 1 and the writer of the venerable "Amending Fences," was not involved in writing any episodes of Season 6 [10], and Amy Keating Rogers, another veteran writer and one whose episodes I enjoyed very much, left in mid-2015 [11]. Whether that amounts to an "inferior" season is up to you, but the change in writer lineup [12] certainly amounts to Season 6 being "different" insofar as different writers were making the scripts. There are also numerous plot-related elements that may point to Season 6 being inferior [13], but those are almost entirely matters of subjectivity and personal taste and will not be discussed here.

 

The fact is, everyone's personal tastes and preferences for episodes and seasons will differ (and if you need a citation for this, you need to get out more). Whether you think the show is deteriorating is not what this paper seeks to answer, as that's an almost entirely subjective measure. The question I seek to answer is this: among bronies, what is the general consensus regarding Season 6, and how does this consensus compare to past seasons and episodes?

 

This may appear to be an impossible question to answer . . . and honestly, you'd be completely right. Naturally, taking a simple random sample of all bronies is not feasible. But MLPForums, being home to some sizeable number of bronies, lends some resources that make answering this question easier: namely, the discussion thread for each episode also plays home to a poll of general satisfaction with that episode. To be frank, the question I have set out to answer (the one in the title) will not be the question I'm actually answering. Rather, I can answer this one: considering data gathered from MLPF polls concerning each episode, collectively, is Season 6 significantly more poorly rated compared to any past season, and on a grander scale, what are the differences between seasons and episodes? . . . Yeah, it's a more unwieldy question, but it's the more accurate one. Let's get started.

 

METHODS

Within the My Little Pony: Friendship Is Magic section of MLPF, the Show Discussion section includes one thread concerning each episode, typically posted near the time the episode is released. Each thread also hosts a poll for the purpose of gauging public response, the results of which are publicly accessible. Data from the polls were gathered over a few days in each of two periods: June and November 2016. Poll responses for each episode were recorded and adapted to a five-point Likert scale [15], with labels "Emphatic Like," "Like," "Ambivalent," "Dislike," and "Emphatic Dislike." Data were imputed where missing.

 

Data were organized and recorded in Microsoft Excel. Two descriptive metrics were then derived from these data. "Rating" was computed as a measure of general satisfaction with the relevant episode, with values ranging from -100% to 100%. "Polarization Factor" was computed as a measure of the amount of disagreement of opinion on the relevant episode, with values ranging from 0% to 100%. Descriptive metrics were averaged by season. "Rating" and "Polarization Factor" were computed according to the below formulae:

[Image: formulae for Rating and Polarization Factor]
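As an illustration of the kind of computation involved (not the author's actual formulae, which are given in the image above), here is a minimal Python sketch. It assumes Rating is a vote-weighted Likert mean rescaled to [-100%, 100%] and Polarization Factor is the vote standard deviation rescaled to [0%, 100%]; the real definitions may differ.

```python
# Illustration only -- NOT the author's actual formulae (those are in the image above).
# Assumes Rating is a weighted Likert mean rescaled to [-100%, 100%] and Polarization Factor
# is the vote standard deviation rescaled to [0%, 100%]; the real definitions may differ.
import numpy as np

# Likert weights for the five poll options, from "Emphatic Dislike" to "Emphatic Like".
WEIGHTS = np.array([-2, -1, 0, 1, 2])

def rating(votes):
    """votes: counts for [Emphatic Dislike, Dislike, Ambivalent, Like, Emphatic Like]."""
    votes = np.asarray(votes, dtype=float)
    mean = np.sum(WEIGHTS * votes) / votes.sum()   # mean Likert score in [-2, 2]
    return 100.0 * mean / 2.0                      # rescale to [-100%, 100%]

def polarization(votes):
    """One possible disagreement measure: vote standard deviation, rescaled to [0%, 100%]."""
    votes = np.asarray(votes, dtype=float)
    mean = np.sum(WEIGHTS * votes) / votes.sum()
    var = np.sum(votes * (WEIGHTS - mean) ** 2) / votes.sum()
    return 100.0 * np.sqrt(var) / 2.0              # max possible std is 2 (half -2, half +2)

# Example: a well-liked, mildly divisive episode.
print(rating([2, 5, 10, 40, 60]), polarization([2, 5, 10, 40, 60]))
```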

Here's where I drop a load of stats on you; don't worry if you don't understand everything in it. For analytical statistics, I used SAS 9.04, a statistical programming language and software package. I repurposed the gathered data into comma-separated values that SAS could understand. From there, an Analysis of Variance (ANOVA) was conducted with independent variable "Season" and dependent variable "Rating," along with a test for homogeneity of variance. Based on those results, a Welch's ANOVA was conducted. A 5% alpha level was used for all tests. To take individual votes at each level into account, a chi-square test was also conducted.
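For readers who would rather not fire up SAS, here is a rough Python analogue of the same battery of tests (a sketch only; the original analysis used SAS 9.04). It assumes a hypothetical per-episode table with "Season" and "Rating" columns and a hypothetical Season-by-vote contingency table; the file and column names are placeholders.

```python
# Rough Python analogue of the SAS analyses described above (illustration only).
# Assumes a hypothetical CSV with one row per episode and columns "Season" and "Rating",
# plus a hypothetical Season x vote-option table of raw counts; file names are placeholders.
import pandas as pd
from scipy import stats

df = pd.read_csv("episode_ratings.csv")
groups = [g["Rating"].values for _, g in df.groupby("Season")]

f_stat, p_anova = stats.f_oneway(*groups)    # one-way ANOVA of Rating by Season (cf. Figure 3)
lev_stat, p_levene = stats.levene(*groups)   # Levene's homogeneity-of-variance test (cf. Figure 4)

# Welch's ANOVA (cf. Figure 5) isn't in scipy; one option is the pingouin package:
#   import pingouin as pg
#   pg.welch_anova(data=df, dv="Rating", between="Season")

# Chi-square test of independence on the Season x vote-option counts (cf. Figure 6).
votes = pd.read_csv("vote_counts.csv", index_col="Season")
chi2, p_chi2, dof, expected = stats.chi2_contingency(votes.values)

print(f"ANOVA p={p_anova:.3f}, Levene p={p_levene:.3f}, chi-square p={p_chi2:.3g}")
```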

 

RESULTS

A depiction of the Rating and Polarization Factor of each episode is shown below, grouped by season, along with a graph of Rating. The average Rating and Polarization Factor for each season are also noted.


Figure 1. Rating and Polarization Factor per episode by season. Episode numbers appear across the top row under their corresponding seasons. Rating and Polarization Factor in each cell are color-coded such that green items represent relatively high ratings or low polarization, red items represent relatively low ratings or high polarization, and white items represent intermediate ratings and polarization.


Figure 2. Boxplots of the Rating by season.

 

The results of the ANOVA run on the episode ratings are shown below as well. The results showed a marginally significant difference between group means (F(5, 137)=2.48; p=0.035). The homogeneity of variance test, however, also indicated that there was sufficient evidence to conclude that the equal-variance assumption was violated (F(5, 137)=2.78; p=0.020). This indicates that the ANOVA results may not be trusted and may be spurious. In response, a Welch's ANOVA was run, which does not require variances to be equal between categories [16]. The results of this test were not significant, indicating that there is not sufficient evidence to conclude that rating differs between seasons (F(5, 55.86)=1.96; p=0.099).


Figure 3. One-way ANOVA of rating by season. Rating was the dependent variable and Season was the independent variable.


Figure 4. Levene's test for homogeneity of variance of rating by season. Rating was the dependent variable and Season was the independent variable.


Figure 5. Welch's ANOVA of rating by season. Rating was the dependent variable and Season was the independent variable.

 

A chi-square test of independence was conducted on the individual vote counts. Results were significant, indicating that there is reason to believe that season and vote response may not be independent variables (χ²(20, N=22148)=357.606; p<0.001).


Figure 6. Chi-square test of independence. Table of opinion votes crossed against season is shown first, followed by summary statistics on calculated chi-square values.

 

DISCUSSION

Caveats

So, before I actually begin discussing the results proper, I need to address the elephant in the room: the integrity of the data. By the nature of these data and the way they were gathered, there are quite a few problems that may hamper my ability to draw valid conclusions. Unfortunately, by the time I'm done with this, you may wonder why I bothered to do a study in the first place.

 

First, there is a critical link between consensus opinion of an episode and the data I gathered: the poll, and whether it was an accurate metric of consensus opinion. In one way, it is, given that the possible responses to the polls were generally kept constant, which allowed for consistent data (with one exception, which I'll get into). What I cannot guarantee is that popular opinion towards the poll itself and voting behavior did not change through all six seasons. Unfortunately, I have reason to believe this may be the case. The total number of votes for episodes in Seasons 1–3 was typically under 100, but vote counts in Seasons 4–6 were much greater. This indicates to me that either the response rate increased between Seasons 3 and 4 (generating a response bias) or the population sampled changed. As a result, cross-comparing between Seasons 1–3 and Seasons 4–6 may not be valid. Analyses were done on all seasons for completeness, but I advise you to interpret pairwise differences between those two macrogroups with a grain of salt.

 

The remaining potential sources of error pose a lesser risk to validity, but I'll mention them for completeness. Given that I collected all the data in this study at the same time, polls for older episodes had a longer period of time to accrue votes, which may skew results. While I cannot discount the notion, I don't consider this effect to be that significant. The polls were designed to glimpse the general reaction to an episode immediately after it aired, and once a poll has been up for a short period of time, the response rate drops off precipitously. I conducted a sensitivity analysis on this (in layman's terms, a let's-see-how-the-data-screwed-up analysis), observing how votes changed between June 2016 and November 2016. I observed no significant difference, so I have no reason to believe that votes changed significantly between the time a poll was posted and the time I gathered data.
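For what it's worth, a check of this kind can be as simple as recomputing Rating from both snapshots and inspecting the per-episode differences. A minimal sketch follows, assuming two hypothetical CSV snapshots of raw vote counts and the weighted-mean Rating definition sketched in the Methods section; it is not necessarily the author's exact procedure.

```python
# Minimal sketch of the June-vs-November sensitivity check (assumed approach, not
# necessarily the author's exact one): recompute Rating per episode from each snapshot.
import pandas as pd

# Hypothetical column names for the five poll options.
WEIGHTS = {"EmphDislike": -2, "Dislike": -1, "Ambivalent": 0, "Like": 1, "EmphLike": 2}

def rating_from_counts(row):
    total = sum(row[c] for c in WEIGHTS)
    return 100.0 * sum(w * row[c] for c, w in WEIGHTS.items()) / (2.0 * total)

june = pd.read_csv("votes_2016-06.csv", index_col="Episode")        # hypothetical snapshot files
november = pd.read_csv("votes_2016-11.csv", index_col="Episode")

diff = november.apply(rating_from_counts, axis=1) - june.apply(rating_from_counts, axis=1)
print(diff.describe())    # differences hugging zero suggest the polls were stable over time
```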

 

There's a small risk of recall bias, as prior to Season 2, Episode 3, all threads were posted retroactively, after the episode aired. That is, the current policy of posting a reaction thread immediately after an episode airs was not in place at the time. That's to be expected, given the fandom wasn't that big back then. But it means that everyone who voted in those polls did so some time after the episode aired. Even so, the data for Seasons 1 and 2 are similar enough, and the sample sizes are similar enough, that I don't believe such a bias occurred.

 

Finally, Season 4, Episode 1 through Season 4, Episode 9 saw a small change in the poll structure: the "Ambivalent" option was removed. Given that people will generally respond differently when they're not given an "Ambivalent" option than when they are, there is reason to believe these data were collected differently. But I conducted a sensitivity analysis on these data as well and found that only "Polarization Factor" appeared to be affected. I then imputed data for those nine episodes only, so that their data were comparable to the rest of the episodes.

 

Huff. Excuses over. It's unfortunate that I have to mention all this and make all these excuses about the data, but it underscores the nature of data collection: it's problematic, however much we may wish otherwise. And this is, unfortunately, the only data set I believe I have access to. Anyhow, on to the numbers!

 

Analysis of Variance

On cursory inspection, one may be led to believe that Season 3 is far more poorly rated than any other season, and that Season 6 is marginally more poorly rated (Figure 1). But that would be cheating. In science, we don't "guess" at whether something "looks" different; we use statistics to evaluate whether it honestly is different or whether what we've found is better attributed to dumb luck [16]. Specifically, if there's less than a 5% chance that we'd see data like ours when luck alone is at work, there's reason to believe that luck isn't the culprit behind the difference. Much of statistics falls into answering that question, and indeed, all the fancy tests I listed above attempt to answer it. We'll address them in order.

 

The ANOVA attempted to determine whether the variation between season means is so great, relative to the variation within seasons, that at least one season's rating must differ significantly from at least one other's. Straight ANOVA suggested this might be so (p<0.05), but that result is misleading (Figure 3). ANOVA is a parametric test that depends on a few assumptions, specifically that (1) the data are normal and (2) the variances between groups are equal [16]; otherwise, its results can't be trusted. While I didn't evaluate all of the assumptions, the second was determined to be violated by Levene's test of homogeneity of variance (Figure 4).

 

When that assumption is violated, it's best to conduct a test that doesn't depend on it. Welch's ANOVA is one such test: it does not assume equal variances between groups. And what do you know, it found roughly a 10% chance that differences this large in rating between seasons could be due to chance alone (Figure 5). That's small, but not small enough. In pretentious language, this result means that there is not sufficient evidence to conclude that any group differs significantly from any other, and thus we fail to reject the hypothesis that they are equal. In layman's terms, it means that, based on this test, we're best off saying that the ratings of all seasons are the same.

 

Chi-square Test of Independence

Something about the above test bothered me, though. While it did use most of the data I gathered, it didn't really take into account the distribution of individual votes within each episode; that is, whether an episode drew more Likes than Emphatic Likes, and so on. To alleviate that, I ran another test: the chi-square test of independence. It's a non-parametric test that evaluates whether two categorical variables (in this case, season and opinion vote, ranging from Emphatic Dislike to Emphatic Like) are independent and do not influence each other. And by the results of this test, it was found that they do influence each other (Figure 6). Conclusively.

 

The problem I have with this test is that it may be too granular for this application. Consider that two episodes may have the same Rating and the same general consensus, yet small differences in individual votes (for instance, more Ambivalent votes in one and fewer in the other). While the overall Rating consensus for both episodes would be identical, the chi-square test would nevertheless indicate that the two weren't independent, based on those vote-distribution differences alone. It seems to me that that's not the question I'm trying to answer.
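To make that concern concrete, here is a small worked example (using the weighted-mean Rating definition assumed earlier, which may differ from the actual formula): two hypothetical vote distributions with identical Ratings that a chi-square test nevertheless flags as very different.

```python
# Two hypothetical episodes with the SAME Rating (0% under the assumed weighted-mean
# definition) but very different vote shapes; the chi-square test still reports a highly
# significant difference, illustrating how granular the test is.
import numpy as np
from scipy.stats import chi2_contingency

WEIGHTS = np.array([-2, -1, 0, 1, 2])        # Emphatic Dislike ... Emphatic Like

episode_a = np.array([40, 10, 0, 10, 40])    # polarized: votes piled at the extremes
episode_b = np.array([10, 20, 40, 20, 10])   # ambivalent: votes piled in the middle

for votes in (episode_a, episode_b):
    print(100.0 * np.sum(WEIGHTS * votes) / (2.0 * votes.sum()))   # both print 0.0

chi2, p, dof, _ = chi2_contingency(np.vstack([episode_a, episode_b]))
print(f"chi2={chi2:.1f}, dof={dof}, p={p:.2e}")   # p << 0.05 despite identical Ratings
```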

 

Conclusion

So while it's possible to say that season and voting opinions are not independent, there's no reason to believe that season has any influence on rating, at least at this point. And here's the thing. Even if I did find a significant difference, let me ask you this question: who cares? I mean, this conclusion is on past data, and you already watched the seasons and episodes, didn't you? Did you enjoy it? If you did, keep watching. If you didn't, don’t. It's as simple as that. As I said before, it's impossible for me to answer the question of whether you would enjoy the episode or season or if you should continue giving MLP a chance. That's up to you to decide.

 

On my part, I found that watching other people's analyses of each episode didn't really enhance my viewing experience. On the contrary, it diminished it. If I hadn't watched an episode beforehand and people said it was bad, that would influence my own disposition. And if I had watched the episode and enjoyed it, but people still said it was bad, that would still influence my own disposition: because other people didn't enjoy the episode, I'd feel compelled to dig into it more heavily and identify those flawed parts. And that is exactly what happened every single time.

 

That's one of the reasons why I stopped watching analysis videos. That's also one of the reasons you probably won't see me on this forum very much; I'm averse to letting others' opinions color my own. That being said, this study is a sort of 500-foot view of the whole matter. While some episodes are certainly more poorly rated than others, there are always those who enjoy them, even enjoy them emphatically. All I can say is to keep the 500-foot view in mind, but enjoy each episode on its own terms.

 

Future Studies

While I found all this data on rating and season and whatnot, there's a lot more in this set that remains to be explored. Is there any difference between episodes that have songs and episodes that don't? Is there any difference between Amy Keating Rogers' episodes and other writers'? Is there any difference between episodes where Fluttershy is given a central role and those where she isn't?

 

I've only just scratched the surface of what could be analyzed and given the science treatment. That's only a small selection of the questions you could ask with this data, and this is only a small amount of the data that could be collected on the brony fandom as a whole. I didn't do any qualitative research, see. And I didn't collect data on anything else besides votes. If you want to use this data for some other purpose, get in contact with me. Or if you have ideas for other studies, get in contact with me. Let this not be the last paper in the Journal of Poniverse!

 

WORKS CITED

[1] meduni. Does MLP deserve to be cancelled right now/abruptly/etc. for being imperfect? (MLPForums, 2016).  https://mlpforums.com/topic/156148-does-mlp-deserve-to-be-cancelled-right-nowabruptlyetc-for-being-imperfect/

[2] Gamer_KM. Did Hasbro really throw plot out the window? (MLPForums, 2016). https://mlpforums.com/topic/156553-did-hasbro-really-throw-plot-out-of-the-window/

[3] VG_Addict. Has the show jumped the shark? (MLPForums, 2016). https://mlpforums.com/topic/155424-has-the-show-jumped-the-shark/

[4] onlyone. Has My Little Pony been nothing but a joke all this time? (MLPForums, 2016). https://mlpforums.com/topic/153789-has-my-little-pony-been-nothing-but-a-joke-all-this-time/

[5] Calpain. Synopsis for Episode 16 Revealed - 28 Pranks Later. (Equestria Daily, 2016). http://www.equestriadaily.com/2016/07/synopsis-for-episode-16-revealed.html

[6] TheAnimatorOfficial. How would you rate the show in general? (MLPForums, 2016). https://mlpforums.com/topic/151067-how-would-you-rate-the-show-in-general/

[7] OmegaBeamOfficial. What's With All The Season 6 Hate? (MLPForums, 2016). https://mlpforums.com/topic/155866-whats-with-all-the-season-6-hate/

[8] Loganberry. The My Little Pony Movie is coming! (In two years' time…). (Louder Yay, 2015). http://louderyay.blogspot.co.uk/2015/11/the-my-little-pony-movie-is-coming-in.html

[9] Loganberry. Thoughts on the mid-season hiatus. (Louder Yay, 2016). http://louderyay.blogspot.co.uk/2016/05/thoughts-on-mid-season-hiatus.html

[10] Loganberry. No more M.A. Larson to blame! (Louder Yay, 2016). http://louderyay.blogspot.co.uk/2016/03/no-more-m-larson-to-blame.html

[11] Sethisto. Amy Keating Rogers Becomes Full Time Disney Writer, Leaves My Little Pony For the Near Future. (Equestria Daily, 2015). http://www.equestriadaily.com/2015/04/amy-keating-rogers-becomes-full-time.html

[12] Loganberry. Episode review: S6E07: "Newbie Dash". (Louder Yay, 2016). http://louderyay.blogspot.co.uk/2016/05/episode-review-s6e07-newbie-dash.html

[13] Loganberry. S6 coming this spring! Oh, and about that royal foal.... (Louder Yay, 2016). http://louderyay.blogspot.co.uk/2016/01/s6-coming-this-spring-oh-and-about-that.html

[14] Rainbow Dash. Why do people call a season bad before it's over? (MLPForums, 2016). https://mlpforums.com/topic/153231-why-do-people-call-a-season-bad-before-its-over/

[15] Likert, R. A Technique for the Measurement of Attitudes. (Archives of Psychology 140, 1–55, 1932).

[16] McDonald, JH. Handbook of Biological Statistics. (Sparky House Publishing, Baltimore, Maryland, 157-64, 2014.) http://www.biostathandbook.com/kruskalwallis.html


I haven't read this yet, but you deserve a lot of credit for this type of approach. Too bad IPB doesn't permit easy MLA/APA/Turabian/Journal formatting. I'll delve into the details in a bit and ruminate. 


Very interesting article. I commend you for the hard work and dedication. I work in the field of clinical research, so I was able to understand most of the statistical tests and jargon  :P

 

The figures and results you were able to collect were very fascinating. I am curious to see average ratings for all the writers based on their episodes, though. You could create a new figure comparing writer averages from each season and see what the overall rating is for that particular season. I would love to see the age-old question answered: "Who is the best writer on the show?"

 

Otherwise, keep up the great work. I look forward to more articles from you in the future  :proud:.


As I said on Discord, Twilight would LOVE you!

 

And I do my best to not have others' opinions of the episodes / seasons influence my own; that's why I, too, avoid analysis shows, as the hosts inevitably burn themselves out. Enjoy the show for yourself!

 

But awesome job on the charts and graphs!


 

 


 

 

 

Sir, I am pretty sure your IQ is several multiples of 10 above mine. It's too complicated for me to understand, but so much work went into this! I'm impressed by your passion as well as your scientific capabilities. :D


I have a Sociology BA, but this takes quantitative studies to a whole new level beyond what I learned. To be honest, though, I always preferred qualitative methods, and ANOVA was one of the things I struggled with. Thanks for proving what can be accomplished with a longitudinal quantitative study. I hope your future as a social researcher is bright.


@: That would be what the abstract is for, darling.  ;) Below the spoiler tags.
 
@@AlbaTross: I find myself much more at home pushing numbers around and spouting words like Mantel-Haenszel stratification than running a focus group, as a matter of fact. Could never get my head around qualitative research, honestly. :3

 

@@Chuckles4lyfe: Those analyses you mention are exactly the ones that would be trivial to do with just a touch more data on the writers, plus an ANOVA or multiple regression on them. I'm burned out from too much number-ing to do that at the moment, but that is definitely a question I could answer. Whether I would want to, though, is a different story. I'll be honest, I can't think of much good that could come of objectively proving that one writer is better than another. It's an easy way to start arguments, that's for sure.


You definitely did your homework when it came to answering the question. If this were an assignment at a college or university, no doubt they would give you an A+ for coming up with such intricate detail about where MLP stands.




@@AlbaTross: I find myself much more at home pushing numbers around and spouting words like Mantel-Haenszel stratification than running a focus group, as a matter of fact. Could never get my head around qualitative research, honestly. :3

Well, that's exactly why mixed methods studies often incorporate personnel from both sides of the social science spectrum. That way individuals can stick to their area of expertise and help provide a more well-rounded study. Heck, even qualitative and quantitative are too broad to adequately describe where one truly shines. Some qualitative researchers love conducting interviews while others such as myself prefer surreptitious methods, or at least I did as a university student. I don't actually think my cutie mark is social research related, but my degree sure has opened a lot of doors.
