DataHub Ruhr is a business building program that connects start-ups with established corporations in the Ruhr region. The program tasks start-ups with developing innovative, data-driven ideas to tackle use cases provided by our corporate partners. As part of a three-month collaboration, start-ups will draft a proof of concept, with the opportunity to receive up to EUR 20,000 in funding.
Data-Driven Sustainability
Biodiesel has become a sustainable alternative to conventional fuels. Because it is produced from renewable raw materials, it can significantly reduce the still-dominant dependence on fossil fuels. Evonik has been providing catalysts for biodiesel production for many years and has long-standing experience in this domain. However, the biodiesel market is highly regulated, and regulatory constraints are constantly changing in different parts of the world. This makes predicting future biodiesel demand especially challenging. Can you help Evonik develop a computational model that predicts future biodiesel demand per geographical region over a time horizon of 10 to 15 years?

Use Case

Predicting future biodiesel demand in different regions of the world involves an extensive data collection process. A prediction model based on this data must consider not only the available information on historical fuel consumption and related quantities, but also the current and future constraints imposed by local biofuel mandates. Evonik is interested in a computational model that can predict future biodiesel demand per geographical region (e.g. at the country level) for a time horizon of 10 to 15 years. This will help Evonik assess future demand and opportunities for its biofuel catalyst business, and how that business contributes to the corresponding sustainability criteria.

Historical data on biodiesel consumption and information on regulatory constraints are currently collected and aggregated manually from various data sources. These sources include publicly available data from governmental websites and national statistical services, as well as additional data from commercial service providers.

This use case aims to develop an automated data collection process that aggregates publicly available information about biodiesel consumption and regulatory constraints (biofuel mandates) in a centralized data repository. The process should support regular, automatic data collection and updating (e.g. on a daily or weekly basis). A coherent data structure must be set up to integrate the collected data with additional internal data sources. In conjunction with the expert domain knowledge available at Evonik, this structure will subsequently serve as the basis for training and evaluating the required prediction models. Within the scope of this challenge, we would also like to see a prototypical prediction model that uses the aggregated data sources to predict the expected biodiesel consumption for an exemplary geographical region.
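To make the idea of a coherent data structure more concrete, here is a minimal sketch in Python of what a centralized repository could look like. Everything in it is an assumption for illustration: the `Observation` fields, the metric names, the use of SQLite, and the helper names `open_repository`, `upsert` and `series` are hypothetical and not prescribed by Evonik. The point it shows is that keying each record by region, year, metric and source lets a daily or weekly collection run overwrite earlier values instead of duplicating them, and lets public and internal data sit side by side.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Observation:
    """One data point in the repository (hypothetical schema)."""
    region: str   # e.g. a country code (assumption)
    year: int
    metric: str   # hypothetical names, e.g. "biodiesel_consumption_kt", "blend_mandate_pct"
    value: float
    source: str   # e.g. "public" or "internal" (assumption)


SCHEMA = """
CREATE TABLE IF NOT EXISTS observations (
    region TEXT    NOT NULL,
    year   INTEGER NOT NULL,
    metric TEXT    NOT NULL,
    value  REAL    NOT NULL,
    source TEXT    NOT NULL,
    PRIMARY KEY (region, year, metric, source)
)
"""


def open_repository(path=":memory:"):
    """Open (or create) the repository and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def upsert(conn, observations):
    """Store observations; a repeated run replaces rows with the same key."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO observations VALUES (?, ?, ?, ?, ?)",
            [astuple(o) for o in observations],
        )


def series(conn, region, metric, source):
    """Return [(year, value), ...] ordered by year for one region, metric and source."""
    rows = conn.execute(
        "SELECT year, value FROM observations "
        "WHERE region = ? AND metric = ? AND source = ? ORDER BY year",
        (region, metric, source),
    )
    return rows.fetchall()
```

A scheduled collection job would call `upsert` with whatever it retrieved in that run, and a forecasting model would read its training data through `series`. A production setup would likely also record retrieval timestamps and units, which this sketch leaves out.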

If the results of this use case are satisfactory, a possible follow-up project will extend the solution to worldwide market coverage and to additional data sources in order to improve prediction accuracy further. More specifically, the data will enable the development of global greenhouse gas savings scenarios and help identify favorable business conditions with the goal of maximizing these savings.

What you will need

  • Experience and capabilities in information retrieval, web mining and scraping
  • Expertise in automated data aggregation and storage
  • Knowledge about time-series forecasting methods and practical experience in developing and deploying related models
  • Ability to present insights in a simple, visual way (e.g. dashboards)

Expected result

  • Framework/software for automated data collection from publicly available sources on biodiesel consumption and biofuel mandates
  • The framework can easily be extended to additional markets and data sources
  • Aggregation of collected information with available Evonik-internal data sources in a joint data repository
  • Prototypical implementation of a biodiesel consumption forecasting model based on the aggregated data


  1. The first milestone is reached when the required information on biodiesel consumption and corresponding biofuel mandates can be automatically collected from the respective websites
  2. The second milestone is reached when the data obtained from the web is stored in a suitable repository and aggregated with Evonik-internal data sources
  3. The third milestone is reached when a prototypical forecasting model (biodiesel consumption forecast for a selected region) is implemented and evaluated
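
As an illustration of what "implemented and evaluated" could mean for the third milestone, the following Python sketch fits a simple linear trend to yearly consumption figures, extrapolates it, and measures the error on held-out years. This is only a baseline under stated assumptions: the function names are hypothetical, the input years are assumed to be sorted in ascending order, and the effect of biofuel mandates, which the use case explicitly asks for, is not modeled here. A real prototype would be expected to go well beyond it.

```python
from statistics import mean


def fit_linear_trend(years, values):
    """Ordinary least squares fit of value = intercept + slope * year."""
    if len(set(years)) < 2:
        raise ValueError("need at least two distinct years to fit a trend")
    y_bar, v_bar = mean(years), mean(values)
    sxx = sum((y - y_bar) ** 2 for y in years)
    sxy = sum((y - y_bar) * (v - v_bar) for y, v in zip(years, values))
    slope = sxy / sxx
    return v_bar - slope * y_bar, slope


def forecast(years, values, horizon):
    """Extrapolate the fitted trend `horizon` years past the last observed year."""
    intercept, slope = fit_linear_trend(years, values)
    last = max(years)
    return [(last + h, intercept + slope * (last + h)) for h in range(1, horizon + 1)]


def backtest_mae(years, values, holdout):
    """Fit on all but the last `holdout` points; return the mean absolute error on them."""
    intercept, slope = fit_linear_trend(years[:-holdout], values[:-holdout])
    held_out = zip(years[-holdout:], values[-holdout:])
    return mean(abs(v - (intercept + slope * y)) for y, v in held_out)
```

Calling `forecast(years, values, 15)` would give a 15-year projection for one region, and `backtest_mae` gives a first, rough indication of how far such a projection can be trusted. Comparing more capable models against this kind of baseline is one simple way to show that they add value.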

