2019-nCoV Data Visualizations

Here I will be adding data visualizations (both found and original) depicting facets of the 2019/2020 coronavirus pandemic.

Let’s start with one found on https://coronavirus.1point3acres.com showing how the cumulative number of confirmed COVID-19 cases in each state has evolved over time.

This shows pretty clearly the situation in New York is dire, but I wouldn’t be surprised if California starts to close the gap.

Here is a similar graph, this time for each country:

Here is a data viz I made using ERIS ArcGIS…

The best COVID-19 dashboard I’ve found belongs to the group at Johns Hopkins University…

One way we might be able to predict which U.S. counties will be disproportionately affected by the coronavirus pandemic is to ask whether there are enough hospitals (specifically, hospital beds) in those regions to accommodate the number of people expected to become infected with COVID-19.

Using data compiled from the CDC, Johns Hopkins University, and the NYT databases…

… I’ve generated a map of the location of U.S. hospitals. Each dot is a hospital. The size of the dot is proportional to the number of beds each hospital contains. Each dot is colored according to how many beds there are per 1000 people in the county. Such a map might help reveal areas with a low number of hospital beds per service population.
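The "beds per 1000 people" value behind the dot color is simple arithmetic. Here is a minimal Python sketch of it, assuming you already have one (county, beds) pair per hospital and a population count per county. The function name and the numbers are made up for illustration; they are not taken from the datasets above.

```python
from collections import defaultdict


def beds_per_thousand(hospitals, county_population):
    """Return {county: hospital beds per 1,000 residents}.

    hospitals: iterable of (county, beds) pairs, one pair per hospital.
    county_population: {county: number of residents}.
    """
    # Add up the beds of every hospital located in the same county.
    beds_by_county = defaultdict(int)
    for county, beds in hospitals:
        beds_by_county[county] += beds

    # Scale to beds per 1,000 residents; skip counties with no population data.
    return {
        county: 1000 * beds_by_county[county] / population
        for county, population in county_population.items()
        if population > 0
    }


# Made-up numbers, for illustration only.
hospitals = [("County A", 300), ("County A", 200), ("County B", 50)]
population = {"County A": 250_000, "County B": 100_000}
rates = beds_per_thousand(hospitals, population)
```

Counties with a low value in `rates` are the ones that would show up as under-served on the map.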

(full MATLAB code tutorial here)

Here is another plot similar to the one above. In this one…

  • Dot size == total number of hospital beds in a given county
  • Dot color == log(Cases) in that county
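Why a log scale for color? Case counts span several orders of magnitude, so on a linear scale a handful of hard-hit counties would wash out everything else. Below is a rough Python sketch of one way to turn a case count into a 0-to-1 color value. Note that it uses log(1 + cases) rather than plain log(cases) so that counties with zero cases are still defined; that tweak, and the function name, are my assumptions for this sketch.

```python
import math


def log_case_color(cases, max_cases):
    """Map a case count to a color value in [0, 1] on a log scale.

    Uses log(1 + x) instead of log(x) so that zero cases maps to 0
    (an assumption of this sketch).
    """
    if cases <= 0 or max_cases <= 0:
        return 0.0
    value = math.log1p(cases) / math.log1p(max_cases)
    # Clamp in case a count exceeds the stated maximum.
    return min(1.0, value)
```

The returned value can then be fed into whatever colormap you like.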

Plotting the location of every U.S. hospital using MATLAB Mapping Toolbox

Given the current COVID-19 pandemic, one major question is whether we will have enough medical resources to handle those who need treatment. In order to make predictions about the nCoV sequelae, one key piece of information is the number of U.S. hospitals and where they are located.

I’ve put a copy of this dataset here: https://bit.ly/UShospitals

After importing the data into MATLAB, we simply need to pass the latitude and longitude data into the Mapping Toolbox. We will use a method that draws circles of a given size, centered at each lat/lon provided.

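% Reference center point (lat/lon); not used further in this snippet
clat = 39.08320;
clon = -94.57713;

% Open a web map with the World Street Map basemap
wm = webmap('World Street Map');

% Preallocate vectors twice as long as the hospital table (HOSP)
LATS = nan(height(HOSP)*2,1);
LONS = nan(height(HOSP)*2,1);

% Fill every other entry with a hospital coordinate; the rest stay NaN
LATS(1:2:end) = HOSP.LAT;
LONS(1:2:end) = HOSP.LON;

% Same radius for every entry (scircle1 uses degrees of arc by default)
RADIUS = repmat(.01,size(LATS,1),1);

% Compute the vertices of a small circle around each point
[LA,LO] = scircle1(LATS,LONS,RADIUS);

% Draw the circles as semi-transparent red polygons, without re-fitting the view
wmpolygon(wm,LA,LO,'EdgeColor','none','FaceColor','r','FaceAlpha',.3, 'Autofit', false)

Hospitals in the U.S.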
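If you are curious what the circle-drawing step boils down to, here is a rough Python sketch of the geometry: the vertices of a small circle around a lat/lon point on a sphere. This is not the Mapping Toolbox implementation, just the standard spherical destination-point formula. The function name, the default of 100 vertices, and the assumption that the radius is an angular distance in degrees are all mine.

```python
import math


def small_circle(lat_deg, lon_deg, radius_deg, n_points=100):
    """Return (lat, lon) vertices, in degrees, of a circle on a sphere.

    radius_deg is treated as an angular distance in degrees of arc
    (an assumption of this sketch).
    """
    lat0 = math.radians(lat_deg)
    lon0 = math.radians(lon_deg)
    r = math.radians(radius_deg)

    vertices = []
    for k in range(n_points):
        az = 2 * math.pi * k / n_points  # azimuth, measured clockwise from north
        lat = math.asin(
            math.sin(lat0) * math.cos(r)
            + math.cos(lat0) * math.sin(r) * math.cos(az)
        )
        lon = lon0 + math.atan2(
            math.sin(az) * math.sin(r) * math.cos(lat0),
            math.cos(r) - math.sin(lat0) * math.sin(lat),
        )
        vertices.append((math.degrees(lat), math.degrees(lon)))
    return vertices


# Four vertices around the reference point used in the MATLAB snippet.
circle = small_circle(39.08320, -94.57713, 0.01, n_points=4)
```

The first vertex sits due north of the center, offset by exactly the radius, and the rest follow clockwise.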

This looks pretty good at this level of zoom, but upon zooming in you will find that map overlays drawn as polygons (for good reason) don’t resize when you zoom in and out…

You can, however, plot locations using pins, which will resize based on the current zoom level. Here is an example of that code:

clat = 39.08320; 
clon = -94.57713;

wm = webmap('World Street Map');

wmmarker(wm, HOSP.LAT, HOSP.LON, 'Autofit', false);

Pins indicate the location of U.S. hospitals

What are the most liberal-leaning and conservative-leaning news sources?


In this post I describe a simple, unbiased method for ranking news sources according to their level of favoritism among conservative vs. liberal groups. As a proof of concept, I provide a set of real results based on over a million behavioral data points scraped from the social media website Reddit.

The impetus for this project was that, in the months leading up to our most recent U.S. presidential election, I found myself repeating the same four-step pattern: (1) I’d see some headline posted on social media that made my jaw drop, (2) I’d ask what the sources of this unbelievable claim were, (3) I’d realize it’s from a news source I had little-to-no prior knowledge about, and finally (4) I’d engage in a laborious attempt to divine the political leaning of said news outlet.

Basically, I was engaged in a behavioral pattern that bottlenecked at: where does news source X fall on the political spectrum?

How could I synthesize a dataset on where a news outlet resides on the political spectrum?

I had an idea: perhaps I could use the very source of these variegated and suspect headlines, pitted against itself. I’d scrape headlines from two subreddit communities on polar opposite ends of the spectrum.

And so it was. Using the Reddit API (via PRAW), I wrote a Python script to identify the favorite news sources of two subreddit communities on opposite ends of the political aisle.

This bot scraped the URLs from the top daily submissions to the main pro-Trump (r/The_Donald) and anti-Trump (r/EnoughTrumpSpam) subreddit communities, essentially determining these subreddit partisans’ favorite news outlets.

Nota bene: the validity of these data as a litmus test for liberal-leaning and conservative-leaning news rests on the assumption that people generally prefer to post and upvote stories that align with and support their personal worldview.
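To give a feel for the counting step, here is a minimal Python sketch that reduces a list of submission URLs to news-source domains and tallies them. It is not the original script. In practice the URLs would come from PRAW (each submission object has a `url` attribute), but that part is left out so the sketch runs without API credentials. The function names and the example.com URLs are placeholders I made up.

```python
from collections import Counter
from urllib.parse import urlparse


def source_of(url):
    """Return the host of a URL, lowercased and without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host


def count_sources(urls):
    """Count how often each source domain appears in a list of URLs."""
    sources = (source_of(url) for url in urls)
    return Counter(source for source in sources if source)


# Placeholder URLs, for illustration only.
urls = [
    "https://www.example.com/politics/story-1",
    "http://example.com/story-2",
    "https://news.example.org/a/b",
    "not a url",
]
counts = count_sources(urls)
```

Calling `counts.most_common(25)` would then give the kind of ranking plotted in the histogram. Note that this sketch treats subdomains (other than `www.`) as separate sources.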

Figure 1: The most liberal and conservative news sources in the US, according to reddit.

Figure 1: histogram displaying the distribution and source of the most linked and discussed news articles on the front page of two large, politically active, and highly partisan subreddit communities. Note here that “to be on the front page” means a story was among the top 25 most upvoted and discussed items by each subreddit community on a given day.

Upper Panel: conservative subreddit The_Donald
Lower Panel: liberal subreddit EnoughTrumpSpam
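The two panels can also be collapsed into a single number per source. The Python sketch below is one hypothetical way to do that, not something computed for the figure: take each source's share of links in each subreddit and compute (conservative share - liberal share) / (conservative share + liberal share), which gives +1 for sources seen only on the conservative side and -1 for sources seen only on the liberal side. The function name and the counts are invented for illustration.

```python
def lean_scores(conservative_counts, liberal_counts):
    """Score each source from -1 (liberal-only) to +1 (conservative-only).

    Both arguments map a source name to its number of front-page links.
    Shares are used instead of raw counts so that a more active
    subreddit does not dominate the score.
    """
    c_total = sum(conservative_counts.values())
    l_total = sum(liberal_counts.values())

    scores = {}
    for source in set(conservative_counts) | set(liberal_counts):
        c_share = conservative_counts.get(source, 0) / c_total if c_total else 0.0
        l_share = liberal_counts.get(source, 0) / l_total if l_total else 0.0
        if c_share + l_share > 0:
            scores[source] = (c_share - l_share) / (c_share + l_share)
    return scores


# Invented counts, for illustration only.
conservative = {"a.example": 30, "shared.example": 10}
liberal = {"b.example": 20, "shared.example": 20}
scores = lean_scores(conservative, liberal)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here `ranked` lists sources from most conservative-favored to most liberal-favored. The same nota bene applies: the score only reflects what each community links to and upvotes.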

It should also be noted that The_Donald subreddit community is so against the “liberal news establishment” it has a policy that any articles from a known left-wing news org must be submitted via archive.is , which is why “archive” is the 2nd highest source found on The_Donald (it represents all the liberal sources rolled into a single site; which actually makes the results even cleaner). Apparently The_Donald has this policy to prevent liberal news sources from benefitting (in terms of ad revenue) from The_Donald’s massive user-base clicking a link that sends them directly to the liberal news organization’s website.

I cross-posted this project on Reddit’s Data is Beautiful, where it blew up. And that is where Google data scientist Felipe Hoffa saw my post and took it to the next level!

Using Data Studio, he expanded my original idea to all of Reddit and made it interactive. It’s something really worth playing around with for a few minutes.

Seriously, check this out!

Also, you can read Felipe’s writeup about the project on Medium.

Summary Discussion (and naturally, the musings)

Social media was exposing me to an endless array of news outlets, some of which I’d never heard of before.

Political news is almost always spun in the same direction by a news organization, in order to cater to the prevailing party demographics of their readership (whether mostly-liberal or mostly-conservative).

This is fine, I think, but only insofar as these political leanings are common public knowledge. For example it’s widely known that Fox News leans conservative while MSNBC leans liberal; so a headline that states:  “Breaking: Hillary Clinton Broke Major Laws & Should be in Prison” can range in significance from essentially no-significance to extreme-significance depending on whether it’s found on the front page of MSNBC or Fox News.

What if, however, you saw the headline was breaking from the Chicago Herald? Without an unbiased and convenient way to determine where a news organization falls on the political spectrum, the public is put in a situation where propaganda and fake news can run rampant: the stories that align with the reader’s biases will be taken as truth, while stories that don’t align will send readers scurrying back to their political safe-spaces.

You might be saying: well, they do this anyway, whether the news is from a source with known bias or from a completely unknown source. But I’d respectfully disagree; my hypothesis is that as the public becomes less and less aware of the political leanings of the players involved in reporting the news, a climate of distrust is engendered. When the biases are laid bare, it engenders a climate of debate.

Think about how it plays out in a courtroom… a jury is presented information from two parties, both of which are heavily biased in a particular direction. However, since the jury is expressly aware of these biases, the debate plays out in a civil manner and allows the jury to formulate informed, logical conclusions. Could you imagine the zoo it would be if trials were conducted such that all the attorneys were backstage with a bunch of expert witnesses, and each person was sent out, one by one, to make a 10-minute appeal to the jury, but the jury wasn’t allowed to know whether the person speaking was an expert witness, an attorney for the plaintiff, or an attorney for the defendant? They would just have to take everything said at face value. It would be madness.

And that’s about all I have to say about that 🙃 😉

Anyway, you can grab my code for scraping the reddit data from this github gist (with the caveat that it was hacked-together in a few minutes of spare time; so take it as a draft with lots of room for improvement ;)