In this post I describe a simple, unbiased method for ranking news sources according to how strongly they are favored by conservative versus liberal groups. As a proof of concept, I provide a set of real results based on over a million behavioral data points scraped from the social media website Reddit.
The impetus for this project was that, in the months leading up to our most recent U.S. presidential election, I found myself repeating the same four-step pattern: (1) I’d see some headline posted on social media that made my jaw drop; (2) I’d ask what the sources of this unbelievable claim were; (3) I’d realize it came from a news source I had little to no prior knowledge about; and finally (4) I’d engage in a laborious attempt to divine the political leaning of said news outlet.
Basically I was engaged in a behavioral pattern that bottlenecked at: where does news source X fall on the political spectrum?
How could I synthesize a dataset on where a news outlet resides on the political spectrum?
I had an idea: perhaps I could use the very source of these variegated and suspect headlines, pitted against itself. I would scrape headlines from two subreddit communities on polar opposite ends of the spectrum.
And so it was. Using the Reddit API (via PRAW), I wrote a Python script to identify the favorite news sources of two subreddit communities on opposite ends of the political aisle.
This bot scraped the URLs from the top daily submissions to the main pro-Trump (r/The_Donald) and anti-Trump (r/EnoughTrumpSpam) subreddit communities, essentially determining these subreddit partisans’ favorite news outlets.
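To make the scraping step concrete, here is a minimal sketch of how such a tally could work. It is not the original script: the function names (`domain_of`, `tally_domains`, `top_link_urls`) are hypothetical, and the choice of `top(time_filter="day", limit=25)` is my assumption about how "top daily submissions" maps onto PRAW.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Return the host of a URL, lowercased, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def tally_domains(urls):
    """Count how often each domain appears in a list of link URLs."""
    domains = (domain_of(u) for u in urls)
    return Counter(d for d in domains if d)  # skip URLs with no host


def top_link_urls(reddit, subreddit_name, limit=25):
    """Collect link URLs from a subreddit's top submissions of the past day.

    `reddit` is expected to be an authenticated praw.Reddit instance.
    Self (text-only) posts are skipped because they do not point at an
    outside news source. The limit of 25 is an assumption.
    """
    posts = reddit.subreddit(subreddit_name).top(time_filter="day", limit=limit)
    return [post.url for post in posts if not post.is_self]


# Example usage (needs PRAW installed and your own API credentials):
#   import praw
#   reddit = praw.Reddit(client_id="...", client_secret="...",
#                        user_agent="news-source-tally (hypothetical)")
#   counts = tally_domains(top_link_urls(reddit, "EnoughTrumpSpam"))
#   print(counts.most_common(10))
```

Running something like this once a day for each subreddit and summing the counters would give the kind of per-source totals shown in the figure below.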
Nota bene: the validity of these data as a litmus test for liberal-leaning and conservative-leaning news rests on the assumption that people generally prefer to post and upvote stories that align with and support their personal worldview.
Figure 1: The most liberal and conservative news sources in the US, according to Reddit.
Figure 1: Histogram displaying the distribution and source of the most linked and discussed news articles on the front page of two large, politically active, and highly partisan subreddit communities. Note that “to be on the front page” means a story was among the top 25 most upvoted and discussed items in each subreddit community on a given day.
Upper Panel: conservative subreddit The_Donald
Lower Panel: liberal subreddit EnoughTrumpSpam
It should also be noted that The_Donald subreddit community is so against the “liberal news establishment” that it has a policy requiring any article from a known left-wing news organization to be submitted via archive.is, which is why “archive” is the second-highest source found on The_Donald (it represents all the liberal sources rolled into a single site, which actually makes the results even cleaner). Apparently The_Donald has this policy to prevent liberal news sources from benefiting (in terms of ad revenue) from The_Donald’s massive user base clicking a link that sends them directly to the liberal news organization’s website.
I cross-posted this project on Reddit’s Data is Beautiful, where it blew up. And that is where Google data scientist Felipe Hoffa saw my post and took it to the next level!
Using Data Studio, he expanded my original idea to all of Reddit and made it interactive. It’s something really worth playing around with for a few minutes.
Seriously, check this out!
You can also read Felipe’s writeup about the project on Medium.
Summary Discussion (and naturally, the musings)
Social media was exposing me to an endless array of news outlets, some of which I’d never heard of before.
Political news is almost always spun in the same direction by a news organization, in order to cater to the prevailing party demographics of their readership (whether mostly-liberal or mostly-conservative).
This is fine, I think, but only insofar as these political leanings are common public knowledge. For example, it’s widely known that Fox News leans conservative while MSNBC leans liberal, so a headline that states “Breaking: Hillary Clinton Broke Major Laws & Should be in Prison” can range from essentially insignificant to extremely significant depending on whether it’s found on the front page of Fox News or MSNBC.
What if, however, you saw the headline was breaking from the Chicago Herald? Without an unbiased and convenient way to determine where a news organization falls on the political spectrum, the public is put in a situation where propaganda and fake news can run rampant: the stories that align with the reader’s biases will be taken as truth, while stories that don’t align will send readers scurrying back to their political safe spaces.
You might be saying: well, they do this anyway, whether the news is from a source with known bias or from a completely unknown source. But I’d respectfully disagree. My hypothesis is that as the public becomes less and less aware of the political leanings of the players involved in reporting the news, a climate of distrust is engendered. When the biases are laid bare, it engenders a climate of debate.
Think about how it plays out in a courtroom: a jury is presented information from two parties, both of which are heavily biased in a particular direction. However, since the jury is expressly aware of these biases, the debate plays out in a civil manner and allows the jury to formulate informed, logical conclusions. Could you imagine the zoo it would be if trials were conducted such that all the attorneys were backstage with a bunch of expert witnesses, and each person was sent out, one by one, to make a 10-minute appeal to the jury, but the jury wasn’t allowed to know whether the person speaking was an expert witness, an attorney for the plaintiff, or an attorney for the defendant? They would just have to take everything said at face value. It would be madness.
And that’s about all I have to say about that 🙃 😉
Anyway, you can grab my code for scraping the Reddit data from this GitHub gist (with the caveat that it was hacked together in a few minutes of spare time, so take it as a draft with lots of room for improvement ;)