Every time I re-read my post “Caring about racism is not that hard“, I’ve been torn about whether or not I’m being too hard on the faculty I’ve worked with for the past few years. On the one hand, I know that they do care deeply about these issues. On the other hand, I’ve been mostly disappointed at the relative lack of self-starting, inertia-overcoming, proactive-game-changing energy and responses I’ve received. In the post, when I say “if only faculty cared,” what I really mean is “if only faculty cared enough to make these issues as high a priority as research.” I really don’t want to imply that they don’t care; I just want to point out that they don’t seem to care enough.
One of the keynote speakers discussing disparities in big data at this year’s Pacific Symposium on Biocomputing pointed to a Bloomberg article about Amazon’s same-day delivery areas: in Boston (at the time of writing), all neighborhoods surrounding Roxbury were eligible for same-day delivery, but Roxbury itself was not.
I tweeted that when I saw this, my jaw literally dropped (which it did) – not because I’m surprised that this was the case, but because I was surprised to see such a clear, irrefutable example of bias in algorithms. Like, there is no Occam’s razor explanation you can give for that map, not if you’re being serious with yourself.
The other thing I didn’t focus on as much in the Nature Microbiology blog post was all of the lessons I learned about data and doing reproducible science.
It turns out that doing reproducible science is really hard. Making data usefully accessible is not trivial, figuring out how to use someone else’s data can be super confusing, and keeping track of what you’ve done gets unmanageable super quickly.
In all the excitement, I forgot to post about my microbiome meta-analysis getting published! You can read the paper at Nature Communications (for free because open science is the way it should be!) I also wrote a blog post for Nature Microbiology’s Community Forum “Behind the paper” series.
Writing the “Behind the paper” post was really fun and helpful – it helped me think about how to communicate my motivation and findings to a broader audience, and was also really cathartic. For example, I got to mention some things that I believe to be true but which aren’t supported by the data we currently have and so couldn’t go in the paper. I also got to talk a little bit about how hard it was to collect all the data – which feels really good, because it can sometimes seem like collecting the data shouldn’t be the hard part, you know? (Obviously any data scientist and computational biologist will tell you how hard it is… It just doesn’t feel like a glamorous problem to have, unlike some hard scientific or statistical question or something.) And I got to step up on a little soapbox and give a bit of editorialized perspective on doing reproducible science.
That said, there were still some things that were too personal or off-topic to include in the “official” blog post, which is what this personal blog is for! This post is mostly me being giddy and sharing my personal reactions to seeing this project come to fruition. I’ll write another post with some more thoughts on what I learned about data and doing reproducible science.
JAMA just published a series of responses to the “Unintended Consequences of Machine Learning in Medicine” editorial that had me pretty disappointed a few weeks ago.
My rapid-fire thoughts on each of the (very short!) articles:
Last week, I learned about the 1999 Hopkins Study on the Status of Women Faculty in Science at MIT. This was apparently a report that made huge waves at MIT, partially because the President himself acknowledged the magnitude and severity of the problem:
First, I have always believed that contemporary gender discrimination within universities is part reality and part perception. True, but I now understand that reality is by far the greater part of the balance.
Man, clinicians are really on a roll with their curmudgeon, take-me-back-to-before-computers editorials…
The JAMA viewpoint “Unintended Consequences of Machine Learning in Medicine” and its associated response on the Cross Invalidation blog are really important reads. Both leave me fairly unsatisfied, however, in how little effort they make to acknowledge the validity of the other’s viewpoints.