By Jon Thompson

The real reason data scientists are struggling to work on energy-system challenges, and what we can do about it.

When future generations look back on the 2020s, I like to think they’ll say it was the time when humans achieved two major breakthroughs:

  1. the mass-adoption of artificial intelligence

  2. the transition to sustainable energy


With artificial intelligence set to revolutionise every industry, there is a huge opportunity to apply AI to accelerate the transition towards sustainable energy.


The harsh reality is that today, we couldn’t be further from seizing that opportunity. There are surprisingly few real-world applications of artificial intelligence addressing energy-system challenges. But why?


Energy-systems of the future


Ditching fossil fuels and moving to renewable energy sources sounds like a no-brainer. But the real challenge lies in the infrastructure overhaul and systemic change required to facilitate a low-carbon energy system.


A low-carbon energy-system is also a decentralised energy-system, relying on millions of distributed energy resources, instead of the handful of centrally managed power stations that we have today. Tens of millions of small-scale solar PV installations, batteries and electric vehicles will provide the flexibility the energy system needs to perfectly balance supply with demand.


How exactly this happens is still up for debate, but there’s no doubt that an operation of this complexity will need to be a digital-first solution, utilising a cutting-edge technology stack that may not even exist today. One thing we can be certain of is that it won’t be human-dependent.


The age of artificial intelligence


AI-based solutions have been optimising our news feeds and recommendation engines for some time, but it’s only with recent advances in compute power that AGI (Artificial General Intelligence) has started to look like a real possibility. OpenAI’s ChatGPT stole all the headlines, but it’s the development of neural networks and the GPUs they're trained on that has enabled the breakthrough.


The foundation of machine learning and AI models is a large, clean and diverse training dataset. When I say large, I mean absolutely enormous. GPT-3, the model family behind the first version of ChatGPT, had 175 billion parameters and was trained on hundreds of billions of tokens of text. Meta has just released Llama 3.1, whose largest model has a staggering 405 billion parameters and was trained on roughly 15 trillion tokens. These training runs take weeks and cost tens of millions of dollars in compute resources, which is why we only get a new model a few times a year.


The main reason we’ve seen large language models take off is the availability of training data. If we want to create a similar level of intelligence within the energy sector, we need to drastically improve data availability.


As of right now, it’s pretty much impossible for data scientists to use energy-system data in even the most basic forms of analysis. We’re a million miles away from being able to assemble an effective training dataset for AI-based solutions. That needs to change.


Data scientists working in the energy sector spend 80% of their time collecting, parsing and cleansing data. Collecting the data needed for a given project takes weeks, if not months. Data scientists who are new to the energy sector are immediately overwhelmed by industry jargon and by having to navigate the numerous industry bodies. We’re actively disincentivising AI talent from working on energy-related challenges.


Energy-system data challenges


Energy-system data exists, it’s just not being used because it can’t be accessed easily. Yes, there are probably more datasets that should be created through additional monitoring. But for now at least, we should focus on extracting the enormous potential value from existing datasets.


The simplest way to do this is to put it in the hands of the people who can extract value, i.e. data scientists. Ofgem’s drive for open data is an important first step, but it’s not enough in isolation.


I’ve spent the last four weeks speaking with data scientists in the energy sector to learn about their specific challenges and pain points. Everyone I spoke to described issues with every aspect of the system. The frustration was palpable. Some described our call as being like a therapy session.


After consolidating pages and pages of notes, I summarised the issues into four areas:


  1. Discoverability

    1. Poor knowledge as to what data exists and where to find it.

    2. Datasets are not published with a clear use-case in mind.

  2. Accessibility

    1. Lots of red tape and bureaucracy.

    2. Energy data is treated as overly sensitive.

  3. Reliability

    1. Datasets aren’t maintained or kept up to date.

    2. Solutions proposing to solve the problem are defunded and disappear. 

    3. Estimation techniques aren’t clearly explained.

  4. Usability

    1. Data isn’t standardised or consistent in any way and has poor documentation.

    2. Multiple disparate sources are needed to compile complete datasets.

    3. Where APIs do exist, they are complex and varied, so each one requires its own custom processing.


In summary: data scientists can’t find the data they need; when they can find it, it’s hidden behind paperwork and complex processes; and in the rare case that they manage to get hold of it, it’s in an unusable condition and unlikely to be maintained long term.
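To make the usability problem concrete, here is a minimal, hypothetical sketch of the glue work this forces on data scientists: two providers publishing the "same" half-hourly generation data with different column names, units and timestamp conventions. All filenames, column names and values below are illustrative assumptions, not real feeds.

```python
import io
import pandas as pd

# Hypothetical extracts from two providers publishing equivalent
# half-hourly generation data in incompatible shapes.
provider_a = io.StringIO(
    "SettlementDate,Period,Output_MW\n"
    "2024-01-01,1,120.5\n"
    "2024-01-01,2,118.0\n"
)
provider_b = io.StringIO(
    "timestamp,generation_kw\n"
    "2024-01-01T00:00:00Z,119500\n"
    "2024-01-01T00:30:00Z,117800\n"
)

# Provider A uses settlement periods: period 1 starts at 00:00, period 2
# at 00:30, and so on. Convert periods to timestamps by hand.
df_a = pd.read_csv(provider_a)
df_a["timestamp"] = pd.to_datetime(df_a["SettlementDate"]) + pd.to_timedelta(
    (df_a["Period"] - 1) * 30, unit="min"
)
df_a = df_a.rename(columns={"Output_MW": "generation_mw"})
df_a["timestamp"] = df_a["timestamp"].dt.tz_localize("UTC")
df_a = df_a[["timestamp", "generation_mw"]]

# Provider B uses ISO timestamps but reports kilowatts, not megawatts.
df_b = pd.read_csv(provider_b, parse_dates=["timestamp"])
df_b["generation_mw"] = df_b["generation_kw"] / 1000  # kW -> MW
df_b = df_b[["timestamp", "generation_mw"]]

# Only after all of this can the two sources actually be compared.
merged = df_a.merge(df_b, on="timestamp", suffixes=("_a", "_b"))
print(merged)
```

Every dataset pairing needs its own version of this boilerplate before any real analysis can begin, which is exactly the 80% overhead described above.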


3 ways we can improve access to energy-system data


At its core, improving access to energy data is all about removing friction. Data science is a creative endeavour. When a data scientist is in flow, asking them to wait three weeks for approval, or presenting them with hundreds of files broken down by day and geographic region, is going to stop them in their tracks.


To incentivise our brightest minds to work on energy-system related problems we need to lower the barrier to entry and encourage experimentation.


We believe there are three key areas for improvement:


  1. Simplification - Datasets should be available from a single point of access, with documentation that’s easy to understand. Data should be queryable in place via cloud object storage, rather than having to be downloaded and stored locally.

  2. Standardisation - Data structures should remain consistent over time and should be regularly maintained and kept up to date. Language should be accessible and consistent across data providers.

  3. Consolidation - Similar datasets from different sources should use the same methodology and file format, removing the need for merging, parsing or unnecessary processing.
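As a rough sketch of what the "query in place" idea could look like in practice: engines such as DuckDB or pyarrow can push a filter like the one below down to a Parquet file sitting in cloud object storage (e.g. an `s3://` path), so only the matching rows ever travel over the network. To keep the example self-contained it uses pandas on an in-memory stand-in for the remote dataset; the column names and region codes are illustrative assumptions.

```python
import pandas as pd

# In-memory stand-in for a consolidated, standardised energy dataset.
# With DuckDB or pyarrow, the same filter could run directly against
# Parquet files in cloud object storage instead of local downloads.
# Column names and region codes are illustrative assumptions.
dataset = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=4, freq="30min"),
    "region": ["LON", "LON", "SCO", "SCO"],
    "demand_mw": [310.0, 295.5, 180.2, 175.9],
})

# One declarative query replaces hunting through hundreds of files
# broken down by day and geographic region.
london = dataset.query("region == 'LON'")[["timestamp", "demand_mw"]]
print(london)
```

The point is the shape of the workflow: the data scientist states what they need once, against one well-documented schema, and the infrastructure does the rest.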


At the Centre for AI & Climate, we’re working on a solution with these core characteristics to address the frustrations of data scientists, and help them work on energy system related problems.


If you’re a data scientist working in the energy sector, we’d love to hear from you. You can either email me at jon@c-ai-c.org, or book a slot in my diary.


Stay tuned for the launch of our prototype coming very soon!


