Introduction
For the final unit of my Lambda School journey, I worked on a large team of 20+ developers and data scientists for the nonprofit Human Rights First (HRF) in a month-long experience called Labs. HRF helps connect immigration attorneys with asylum seekers and provides valuable resources for those attorneys. Accordingly, HRF would like a web-based application that gives attorneys access to information about an immigration judge’s past decisions and any factors that appear to bias those decisions. Lambda School and HRF have partnered to allow students to build and improve this web application. Consequently, our Labs team inherited a codebase to build upon the progress made by the previous five cohort, which includes a database of case decisions provided by HRF.
As the product rapidly approaches a minimally viable product (MVP), HRF would like to start demoing this application to a select group of immigration attorneys, who would be given access via invite-only. After meeting with my teammates, we decided that the best way to get feedback from this group is to set up a framework for rapidly building and demoing visualizations right in the web application.
The catch: the data scientists work mainly in Python while the web application uses Javascript. As a data science student who has spent the last five months primarily exercising my research skills, my first taste of working on a cross-functional team was during this experience. As a result, it was easy for me to feel sequestered away from the development side. So how can data scientists work together with web developers to ensure that they can seamlessly contribute to development?
Planning
After a whole week of planning tasks and translating product-roadmap features to user stories, I worked closely with a fellow data scientist named Filipe and the back-end web developers to outline the data flow. We soon recognized that a significant hurdle was in front of us. Our data science team could not access the database that stores the case information necessary to create visualizations directly. This left us with a few options:
- assist the web developers with creating the visualizations in the back-end,
- set up access to the database for data scientists, or
- set up a data flow between the back-end API and the data science API.
However, there are drawbacks to these approaches. The first option would likely suffer from a lack of consistency since there is essentially a new batch of data science students onboarded every month. So, the project may lose progress as the new students gain the necessary knowledge about the web team’s tech stack. The second option is tempting, but our team couldn’t see a reasonable solution we could implement quickly enough to start exploring visualizations.
Thus, we chose option three and set up a data flow where the back-end sends case information to an endpoint in the data science API as a POST request. Then, the data scientists manipulate that data to create charts using Plotly. Finally, information to recreate the chart in Plotly is sent back to the back-end as JSON formatted data. The web developers then send this data to the front-end to display the charts on the website. This framework was made possible by web developer Daniel Gamboa’s excellent work as architect of this flow along with the web team’s implementation in the back- and front-ends.
Daniel’s diagram of the data visualization data flow |
Implementation
With the plan in place, we began working on the details. We met with and received feedback from the associate data science leader. Following that meeting, I guided Filipe through building the endpoint in FastAPI to deliver a stacked bar chart showing the distribution of case outcomes, a basic visualization to get us started. We coordinated with the back-end team and found a problem with the data format we had for the endpoint. In our rush to set up a viable local test, we forgot to set our endpoint to receive POST requests. I hopped into a video call with the web team to remedy this, eager to start coding myself.
I began by examining the documentation for FastAPI. We needed to use a class called Request
from FastAPI to receive the case data properly. After writing a few lines of code, I was ready to test using Postman. Daniel shared a chunk of JSON with some case data to send a POST request to our endpoint. After our API responded with JSON data capable of recreating the chart in Plotly, we all did a little cheer. This team coding session allowed us to quickly debug the issue and deliver a fix within 30 minutes.
Stakeholder feedback
With our implementation set up, it was time to present our work to the stakeholders at HRF for feedback on the visualizations. The web team led with a demo of the app and showed the page with the visualizations we created. We explained how the visualizations would adapt as more cases get uploaded to the database. Our presentation received a favorable response, and we were happy to move on to create more visualizations.
Wrapping up
For the final week, we held a brainstorming session with the web team to conceive of more complex visualizations that would be helpful for users to view on the landing page. As a result of that session, I began work on a map showing the approval rate of asylum cases for each US Court of Appeals circuit. I created an IPython Notebook to organize my research into this type of visualization. After my search for files describing the circuit boundaries came up empty, I created a custom GeoJSON file from a GeoJSON file with district court boundaries. Since each circuit is composed of several district courts, I merged most of them into their respective circuits. I used this to create a choropleth in Plotly and shared my work with my teammates by making a pull request.
Unfortunately, as time ran out on our Labs experience, our team found a bug in the application unrelated to the visualizations. We reverted part of our application to a previous state to be safe, but this means that our visualization work is not viewable on the deployed website, at least until the bug gets fixed. Despite these minor setbacks, I am very proud of our team’s work to establish a framework that will make it easier to get user feedback on data visualizations.
Reflection
I grew a lot as a data scientist, a teammate, and a communicator throughout my Labs experience. This month, I deepened my skills and understanding of Git, Github, APIs, and AWS. In addition, I worked with some excellent web developers, learning something new from them every step of the way. Though I was initially uncomfortable with voicing my opinions, I became much more active in discussions in stakeholder and product review meetings week after week. Thus, I learned how to work with product owners to contribute effectively to the product vision.
Acknowledgments
I want to thank everybody at Lambda Labs and Human Rights First for their contributions to my professional development. Special thanks, in no particular order, to Filipe Collares, Daniel Gamboa, Paul St. Germain (Paul’s Blog), Jacob Bohlen, Nick Allen, Jonathan Mary, Sebastian Espeset, Katie Olson (Katie’s Blog), Nick Major, River Bellamy, Alex Krieger, Daniel Fernandez, Tori Armstrong, and Jennifer Faith for their diligent work on this project and for providing great feedback. I would also like to thank Joseph Panetta, Rebecca Duke Wiesenberg, and Robert Sharp for their excellent leadership and support throughout my Labs experience. I learned a lot from all three of them.