Highlights
- Reduced Spark compute costs by 60% by migrating from managed EC2 to EMR for data and predictive analytics.
- Integrated 3 new data sources from SQL Server, Splunk, and internal APIs into the existing Spark pipeline.
- Migrated version control from Perforce to Git-based Bitbucket.
Stack
Python, Spark, AWS EMR, SQL Server, Splunk