Looping through the Urban Institute's Data Explorer API

This was inspired by seeing a startling comparison between electrical engineering and computer science enrollments. And, anecdotally I've interviewed more than a handful of electrical engineers for software engineering roles. Eventually I'd like to overlay outcomes data to show the change of salaries between different majors.

Source: https://semiwiki.com/events/314964-a-crisis-in-engineering-education-where-are-the-microelectronics-engineers/

This is a first pass to try and familiarize myself with the data. I started wrangling IPEDS data using the Urban Institute's Data Explorer API (which is fantastic, btw). This code is neither tidy nor performant but it gets the job done. I think if I were to keep fiddling with this, I would set it up to export the data as json blobs and do all of the data visualization on this site using ChartJS, which I already use for some calculator widgets. It's really hard to get mobile responsive charts using Python libs. The 11ty pagination feature + Netlify CMS is perfectly suited for batching out this kind of content. I might try to do it over the long, dark winter months.

Anyways, here's the chart I created. My chart starts about where the inspiration ended. I was surprised to see a drop in the rolled up CIP code (110000—Computer and information sciences and support services). I looked into it, and there's actually more people enrolling in those degrees than ever, but it has insanely high drop outs. This analysis is showing the degrees conferred.

2009201020112012201320142015201620172018201920k30k40k50k60k70k80k90k
award_levelAssociate degreeBachelors degreeMasters degreeindexvalue

Here's the code. It's pretty nasty and looping through that many pagination API pages takes way too long - like 40 seconds per CIP code. If I wanted to take this further I would need to re-write it.

import pandas as pd
import requests

total_results = []
#computer science
cip_code = '110000'

#create the df. I think it would be better to start with an empty df and handle all of the data in the loop
url ='https://educationdata.urban.org/api/v1/college-university/ipeds/completions-cip-2/2009/?cipcode=' + str(cip_code)
response = requests.get(url)
data = response.json()

total_results = total_results + data['results']

# While data['next'] isn't empty, download the next page
while data['next'] is not None:
response = requests.get(data['next'])
data = response.json()
# Store the current page of results
total_results = total_results + data['results']


df = pd.DataFrame (total_results, columns = ['unitid','year','fips','cipcode','award_level','majornum','race','sex','awards'])
new_df = df.groupby(['cipcode','award_level']).agg({'majornum': "sum"})
new_df.rename(columns = {'majornum':'2009'}, inplace = True)


def add_years():
for i in range(10,20):
global new_df
global cip_code
total_results = []
url ='https://educationdata.urban.org/api/v1/college-university/ipeds/completions-cip-2/' + '20'+ str(i) + '/?cipcode=' + str(cip_code)
response = requests.get(url)
data = response.json()

total_results = total_results + data['results']

# While data['next'] isn't empty, download the next page
while data['next'] is not None:
response = requests.get(data['next'])
data = response.json()
# Store the current page of results
total_results = total_results + data['results']

df = pd.DataFrame (total_results, columns = ['unitid','year','fips','cipcode','award_level','majornum','race','sex','awards'])
temp_df = df.groupby(['cipcode','award_level']).agg({'majornum': "sum"})
temp_df.rename(columns = {'majornum':'20'+str(i)}, inplace = True)
new_df = new_df.merge(temp_df, on='award_level', how='left')
return new_df

new_df = add_years()

#in this case, it is faster to just transpose the df to create a visual. usually it'd probably be better to unpivot / melt it
df_transposed = new_df.T
df_transposed.rename(columns={1: 'Award of less than one academic year', 2: 'Award of less than two academic years',3: 'Award of at least one but less than two academic years',4: 'Associate degree',5: 'Award of at least one but less than four academic years',6: 'Award of at least two but less than four academic years',7: 'Bachelors degree',8: 'Postbaccalaureate or post-masters certificate',9: 'Masters degree',20: 'Doctors degree',21: 'First-professional degree',22: 'Doctors degree, research',23: 'Doctors degree, practice',24: 'Doctors degree, other'},inplace=True)

And then, dropping some columns that make the chart messy plotting it out.

df_transposed.drop(columns=['Award of less than one academic year','Award of at least one but less than two academic years','Award of at least two but less than four academic years','Postbaccalaureate or post-masters certificate', 'Doctors degree, research','Doctors degree, practice','Doctors degree, other'],inplace=True)


#import plotly.express as px

fig = px.line(df_transposed)

fig.update_layout(legend=dict(
orientation="h",
itemwidth=70,
yanchor="bottom",
y=1.02,
xanchor="right",
x=1
))

fig.show()