My First Date with Quilt Data

July 21, 2020

I’ve known the good folks at Quilt Data for a long time. A company hackathon gave me a good excuse to use their tools “in anger” for a real demo. These are my notes on how I configured quilt3 and created my first package (and pandas DataFrame) from a CSV.

  1. Create a Quilt account. Actually, they created one for me, since I don’t have access to my own S3 bucket
    1. Login and create a password. Make sure I save it.
  2. Install stuff
    1. $ pip3 install quilt3 (on my Mac)
    2. Install (mini)conda from a package, because that seems to be what the cool kids use
    3. Use conda to install JupyterLab.
    4. Wait, no, I need the older (“Classic”) Jupyter Notebook to run the Quilt examples.
    5. Voila too, though I don’t know why.
  3. $ jupyter notebook
    1. Weird browser. Uploaded notebook file. Opened it. Works.
    2. $ jupyter notebook CORD19.ipynb # Ah, much better
    3. Click “Run” to evaluate each cell
    4. quilt3.login() – how the heck do I do that in Jupyter?
    5. ModuleNotFoundError: No module named 'quilt3'
  4. $ python # try from repl
    1. same error
    2. $ which python
    3. /Users/nauto/opt/miniconda3/bin/python
  5. Ah. Maybe I need to install quilt also from conda
    1. conda install -c conda-forge quilt3
    2. Works!
  6. quilt3.login()
    1. Launching a web browser…
    2. Did not see that coming. Works!
  7. Work with packages
    1. b = quilt3.Bucket("s3://quiltnauto")
    2. q = quilt3.Package()
    3. # fix .quiltignore
    4. q.set_dir(".", ".")
    5. q.push("nauto/trips", registry="s3://quiltnauto")
    6. quilt3.config()
    7. quilt3.config(default_remote_registry="s3://quiltnauto")
    8. qn = quilt3.Package.browse("nauto/trips", registry="s3://quiltnauto")
    9. trip_data = qn["trip_report_data.csv"].deserialize()
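
The ModuleNotFoundError in steps 3 to 5 looks like the usual "pip installed into one Python, Jupyter runs another" problem. Here is a minimal sketch of how I could have checked that from inside the notebook or the REPL. The helper name `describe_module` is my own invention, and it only uses the standard library:

```python
import importlib.util
import sys


def describe_module(name):
    """Report which interpreter is running and whether it can import `name`."""
    spec = importlib.util.find_spec(name)  # None when the module is not installed
    return {
        "interpreter": sys.executable,
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


# Run this in the same kernel or REPL that raised the error.
print(describe_module("quilt3"))
```

If `found` is False and `interpreter` points at the miniconda Python, the package has to be installed into that environment, which is what the conda install in step 5 did. Running `%pip install quilt3` in a notebook cell should also target the running kernel, though I have not tried it here.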

COVID-19 Data Lake

  1. https://open.quiltdata.com/b/covid19-lake/tree/tableau-jhu/csv/COVID-19-Cases.csv?version=OXNN19GctMD4EW4BOk8TBP4aAtx6lc8t
  2. c = quilt3.Bucket("s3://covid19-lake")
  3. c.fetch("tableau-jhu/csv/COVID-19-Cases.csv", "./COVID-19-Cases.csv")
  4. import pandas as pd
  5. covid_data = pd.read_csv("./COVID-19-Cases.csv")
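
The fetch and read steps above can be wrapped so the file is only downloaded once. This is a rough sketch, not something from the Quilt docs: `load_cases` is a name I made up, and parsing the `Date` column as dates is my own assumption about what is useful.

```python
from pathlib import Path

import pandas as pd

BUCKET = "s3://covid19-lake"
KEY = "tableau-jhu/csv/COVID-19-Cases.csv"


def load_cases(local_path="./COVID-19-Cases.csv"):
    """Return the cases CSV as a DataFrame, downloading it only if it is missing."""
    path = Path(local_path)
    if not path.exists():
        # Imported lazily so an already-downloaded file can be read without quilt3.
        import quilt3

        quilt3.Bucket(BUCKET).fetch(KEY, str(path))
    # Assumption: the CSV has a 'Date' column worth parsing into datetimes.
    return pd.read_csv(path, parse_dates=["Date"])
```

Calling `covid_data = load_cases()` then behaves like steps 2 to 5, except that a second run skips the download.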

Pandas

  1. trip_data.dtypes
  2. covid_data.dtypes
  3. len(covid_data.Province_State.unique())                                                                                 
  4. trip_data.head()
  5. trip_data.index
  6. trip_data.columns
  7. trip_data.describe()
  8. trip_loc = trip_data.loc[:, ['account', 'fleet_id', 'trip_bucket', 'trip_start_location', 'trip_end_location']]
  9. covid_loc = covid_data.loc[:, ['Case_Type', 'Cases', 'Date', 'Lat', 'Long']]
  10. trip_loc[['start_lat','start_long']] = trip_loc['trip_start_location'].str.replace('(','').str.replace(')','').str.split(", ", expand=True)
  11. trip_loc[['end_lat','end_long']] = trip_loc['trip_end_location'].str.replace('(','').str.replace(')','').str.split(", ", expand=True)
  12. trip_loc.loc[0]
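
The replace-and-split trick in steps 10 and 11 leaves the coordinates as strings. Here is an alternative sketch that pulls them out with one regular expression and converts them to floats. The sample rows are made up (the real trip_report_data.csv is not shown here), and `split_location` is a hypothetical helper name:

```python
import pandas as pd

# Made-up sample rows standing in for the real trip data.
trips = pd.DataFrame({
    "trip_start_location": ["(37.7749, -122.4194)", "(34.0522, -118.2437)"],
})


def split_location(series, prefix):
    """Split '(lat, long)' strings into float columns <prefix>_lat and <prefix>_long."""
    pattern = r"\(\s*(?P<lat>[-+0-9.eE]+)\s*,\s*(?P<long>[-+0-9.eE]+)\s*\)"
    # Named groups become column names; rows that do not match become NaN.
    coords = series.str.extract(pattern).astype(float)
    return coords.rename(columns={"lat": f"{prefix}_lat", "long": f"{prefix}_long"})


trips = trips.join(split_location(trips["trip_start_location"], "start"))
print(trips.dtypes)
```

Having real floats makes `describe()` useful on the new columns, and a malformed location shows up as NaN instead of an odd string.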

GeoPandas

  1. conda install -c conda-forge geopandas
  2. conda install -c conda-forge matplotlib
  3. conda install -c conda-forge shapely
  4. import geopandas as gp
  5. import matplotlib.pyplot as plt
  6. from shapely.geometry import Polygon,Point
  7. # project lat/long points into meters; points_from_xy takes x (longitude) first, then y (latitude)
  8. trip_start = gp.GeoDataFrame(trip_loc, geometry=gp.points_from_xy(trip_loc.start_long, trip_loc.start_lat), crs='epsg:4326').to_crs('epsg:3310')
  9. trip_end = gp.GeoDataFrame(trip_loc, geometry=gp.points_from_xy(trip_loc.end_long, trip_loc.end_lat), crs='epsg:4326').to_crs('epsg:3310')
  10. covid_point = gp.GeoDataFrame(covid_loc, geometry=gp.points_from_xy(covid_loc['Long'], covid_loc['Lat']), crs='epsg:4326').to_crs('epsg:3310')
  11. covid_point = covid_point[covid_point.geometry.is_valid] # get rid of Inf points from the conversion to meters
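
A small self-contained sketch of the projection step, mostly to show the argument order for `points_from_xy`. The two sample points are made up (rough coordinates for San Francisco and Los Angeles), and computing a distance at the end is just my illustration of why meters are convenient:

```python
import geopandas as gp
import pandas as pd

# Made-up sample points, not rows from the real trip or COVID data.
sample = pd.DataFrame({
    "name": ["San Francisco", "Los Angeles"],
    "lat": [37.7749, 34.0522],
    "long": [-122.4194, -118.2437],
})

# x is longitude and y is latitude, so 'long' goes first.
points = gp.GeoDataFrame(
    sample,
    geometry=gp.points_from_xy(sample["long"], sample["lat"]),
    crs="EPSG:4326",
).to_crs("EPSG:3310")  # California Albers, coordinates in meters

# Straight-line distance between the two projected points, in kilometers.
distance_km = points.geometry.iloc[0].distance(points.geometry.iloc[1]) / 1000
print(points[["name", "geometry"]])
print(round(distance_km), "km")
```

EPSG:3310 is a California projection, so points far from California should be treated with suspicion after the conversion.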

You are currently reading My First Date with Quilt Data at iHack, therefore iBlog.
