Analytics Engineer
We are both an electricity retailer and a tech platform, and we believe there is no better way to address our greatest challenge, climate change, than with the combination of the two.
Through our proprietary tech platform, Kraken, we are changing the way people interact with their energy company - by making it approachable, low cost, easy to understand, and most importantly, 100% renewable. We’ve distinguished ourselves by being named 2020’s Energy Provider of the Year, which highlights our commitment to exceptional customer service. In many markets we are a leading employer on Glassdoor for best places to work.
At Octopus we’re focused on making energy fair, clean, and simple for all using technology. We’re looking for an Analytics Engineer that can help us with this challenge. Our data team is developing a data platform and providing data services to inform Octopus US business strategy and operations. This data platform enables self-service of data analytics for business stakeholders as well as automation of all our data workflows from ETL jobs to ML training and prediction. The data platform team works across the whole customer domain on anything from energy load forecasting to financial and customer data modeling.
Octopus Energy is growing fast, and that means lots of data that needs to be ingested, organized, analyzed, and shared with the team. You’ll work across all different parts of the business to understand what our teams need and deliver data pipelines and tools to meet those needs. Because it’s still early days, you’ll need to be versatile and equally comfortable building robust, production-ready pipelines or hacking together a quick script to run on your machine. You’ll spend most of your time engineering, but you should also enjoy analyzing data and building data interfaces like dashboards or data applications.
You’ll be part of our global data platform team who will provide dev ops and infrastructure support as well as technical guidance. We’re building a consistent data platform across all Octopus retail businesses around the world so you’ll be part of and contribute to a global data community. This position is based in Houston, Texas.
What you'll do
- Work with the data scientists, data analysts, and business stakeholders to scope out and plan new data sources and pipelines
- Build, automate, deploy and maintain data models and workflows
- Develop Streamlit data apps and lend a hand building and maintaining Tableau dashboards
- Spearhead efforts on data monitoring and integrity to ensure accuracy of reporting
- Work with the global data platform team to deploy new tools and services into the US data environment
- Participate in and contribute to our global data community
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources
- Use an analytical, data-driven approach to build a deep understanding of a fast-changing business
- Build large-scale batch and real-time data pipelines with data processing frameworks
What you'll need
- 2+ years of experience in data engineering
- First and foremost, we want our data engineers to be great software engineers with a passion for writing high quality code
- It would be helpful to have experience/expertise in the following (in rough priority order):
- Python (for data pipelines and analytics)
- Advanced SQL knowledge and query-authoring experience with relational databases, plus working familiarity with a variety of database engines
- Experience modeling data for analytics - ideally with dbt as a modeling tool
- Experience building data pipelines in a cloud environment (ideally AWS)
- Spark
- The projects will be varied, so we’re looking for someone who can work autonomously and proactively to scope problems and deliver pragmatic solutions
- We want someone who is passionate about building great data tools for our business teams
- Experience in the energy industry, or enthusiasm for innovating toward a more intelligent and clean grid, is a big plus!
- The ability to work alongside the team in our downtown Houston office
Our Data Platform Stack
- Python as our main programming/scripting language
- Kubernetes for data services and task orchestration
- Airflow purely for job scheduling and tracking
- Circle CI for continuous deployment
- Parquet and Databricks Delta file formats on S3 for data lake storage
- Spark and pandas for data processing
- dbt for data modeling
- Presto and SparkSQL for querying
- Jupyter for data notebooks and ad-hoc analytics
- Streamlit for data applications
- Tableau for BI