# Using Your Own Datasets
APEM can work with your own data, but not by pointing `config.json` at an arbitrary file path.
The usual workflow is:

1. Put your dataset into the expected folder structure.
2. Register it in the relevant dataset enum.
3. Select that dataset name in `config.json`.
## Two Ways To Add Data
The setup depends on which market workflow you want to run.
- Use a unit-based dataset if your data describes generators, buyers, and a network.
- Use an order-book dataset if your data is already in Euphemia-style order tables.
## Option 1: Add A Unit-Based Dataset
For the unit-based workflow, APEM loads data through parser classes that return a `Scenario`.
### Step 1: Add your raw data
Place your files under:

```
apem/unit_based_model/data/raw_data/<your_dataset_name>/
```
### Step 2: Create a parser
Add a parser class in `apem/unit_based_model/data/parsing/` that subclasses `ParseData` and returns a `Scenario`.
Your parser should build and return one `Scenario` object containing `df_sellers`, `df_buyers`, `network`, `nodes_agents`, `periods`, `blocks_buyers`, `blocks_sellers`, and `r_star`.

- Keep periods indexed as `1..T`.
- Start with `DCOPF` when adding a new unit-based dataset.
- Use one row per `(agent, period)` in the buyers/sellers tables.
### Parser Output Contract
**`df_sellers`** (required)

One row per `(seller, period)`.

| column | type | required | meaning | example |
|---|---|---|---|---|
| `seller` | str/int | yes | Seller (unit) id. | `101` |
| `period` | int | yes | Time index. | `1` |
| `node` | str/int | yes | Network node where the seller is located. | `N1` |
| `max_prod` | float | yes | Available production in the period. | `120` |
| `min_prod` | float | yes | Minimum stable production. | `20` |
| `min_uptime` | int | yes | Minimum up-time. | `2` |
| `no_load_cost` | float | yes | Fixed no-load cost. | `30` |
| `size1..sizeK` | float | yes | Seller bid-block quantities. | `40` |
| `cost1..costK` | float | yes | Seller bid-block prices/costs. | `25` |

```csv
seller,period,node,max_prod,min_prod,min_uptime,no_load_cost,size1,cost1,size2,cost2
101,1,N1,120,20,2,30,40,25,80,40
```
**`df_buyers`** (required)

One row per `(buyer, period)`.

| column | type | required | meaning | example |
|---|---|---|---|---|
| `buyer` | str/int | yes | Buyer id. | `B1` |
| `period` | int | yes | Time index. | `1` |
| `node` | str/int | yes | Network node where the buyer is located. | `N2` |
| `inelastic_dem` | float | yes | Must-serve demand. | `50` |
| `size1..sizeK` | float | yes | Buyer bid-block quantities. | `20` |
| `val1..valK` | float | yes | Buyer bid-block valuations. | `200` |
| `max_dem` | float | yes | Total demand cap. | `70` |

```csv
buyer,period,node,inelastic_dem,size1,val1,max_dem
B1,1,N2,50,20,200,70
```
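If `max_dem` is defined as the must-serve demand plus the total size of the bid blocks (an assumption; confirm against your market rules), it can be cross-checked with a hypothetical helper like this:

```python
import pandas as pd

# Hypothetical helper (not an APEM function): assumes max_dem equals the
# must-serve demand plus the sum of all bid-block sizes.
def compute_max_dem(df_buyers: pd.DataFrame) -> pd.Series:
    size_cols = [c for c in df_buyers.columns if c.startswith("size")]
    return df_buyers["inelastic_dem"] + df_buyers[size_cols].sum(axis=1)

# The example row from the table above: 50 + 20 = 70.
row = pd.DataFrame([{"buyer": "B1", "period": 1, "node": "N2",
                     "inelastic_dem": 50, "size1": 20, "val1": 200, "max_dem": 70}])
assert (compute_max_dem(row) == row["max_dem"]).all()
```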
**`network`** (required)

Type: `networkx.Graph`.

| element | required | meaning |
|---|---|---|
| node ids | yes | Must match the nodes used in the buyers/sellers tables. |
| edge attribute `B` | yes for `DCOPF` | Line susceptance. |
| edge attribute `F_max` | yes for `DCOPF` | Line capacity limit. |

For single-node market data, a graph with one node is valid.
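The contract above can be sketched directly with `networkx`; the numeric `B`/`F_max` values here are illustrative, not taken from any real dataset:

```python
import networkx as nx

# Two-node network with the edge attributes the DCOPF workflow expects.
network = nx.Graph()
network.add_edge("N1", "N2", B=10.0, F_max=100.0)  # illustrative values

# Single-node market data is also valid: one node, no edges.
single = nx.Graph()
single.add_node("N1")

assert set(network.nodes) == {"N1", "N2"}
assert network.edges["N1", "N2"]["F_max"] == 100.0
```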
**`nodes_agents`, `periods`, `blocks_buyers`, `blocks_sellers`, `r_star`** (required)

| field | type | required | meaning |
|---|---|---|---|
| `nodes_agents` | dict | yes | Mapping from node to the sellers and buyers located at that node. |
| `periods` | list of int | yes | Period list used by the model. |
| `blocks_buyers` | iterable of int | yes | Buyer block index set. |
| `blocks_sellers` | iterable of int | yes | Seller block index set. |
| `r_star` | str/int | yes | Reference/slack node. |
Use this parser skeleton:

```python
from collections import defaultdict

import networkx as nx
import pandas as pd

from apem.unit_based_model.data.parsing.parse_data import ParseData
from apem.unit_based_model.data.parsing.scenario import Scenario
from apem.unit_based_model.utils.paths import RAW_DATA_DIR


class ParseMyDataset(ParseData):
    def parse_data(self, day=None) -> Scenario:
        path = RAW_DATA_DIR / "my_dataset"

        # 1) Read and normalize sellers/buyers.
        df_sellers = pd.read_csv(path / "sellers.csv")
        df_buyers = pd.read_csv(path / "buyers.csv")

        # 2) Build network with DCOPF-required edge attributes.
        network = nx.Graph()
        # network.add_edge(u, v, B=<susceptance>, F_max=<capacity>)

        # 3) Build node -> agents mapping.
        nodes_agents = defaultdict(lambda: {"sellers": [], "buyers": []})
        for node, group in df_sellers.groupby("node"):
            nodes_agents[node]["sellers"] = sorted(group["seller"].unique().tolist())
        for node, group in df_buyers.groupby("node"):
            nodes_agents[node]["buyers"] = sorted(group["buyer"].unique().tolist())

        periods = sorted(df_buyers["period"].unique().tolist())
        blocks_buyers = range(1, 3 + 1)   # replace 3 with your buyer block count
        blocks_sellers = range(1, 4 + 1)  # replace 4 with your seller block count
        r_star = df_sellers.iloc[0]["node"]

        return Scenario(
            "MY_DATASET",
            df_buyers,
            df_sellers,
            network,
            nodes_agents,
            periods,
            blocks_buyers,
            blocks_sellers,
            r_star,
        )
```
Before registering the dataset, verify:

- period indices are consistent across buyers and sellers (prefer `1..T`)
- each `(seller, period)` and `(buyer, period)` combination expected by your model is present
- each `node` in buyers/sellers exists in `network`
- `blocks_buyers` and `blocks_sellers` match the number of `size*`/`val*`/`cost*` columns
- `max_dem` is correctly computed
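These checks can be automated with a few pandas assertions; `check_unit_based` below is an illustrative helper, not part of APEM:

```python
import pandas as pd

def check_unit_based(df_sellers, df_buyers, network_nodes, periods):
    """Illustrative pre-registration checks (not an APEM utility)."""
    # Period indices must be consistent across both tables.
    assert set(df_sellers["period"]) == set(periods)
    assert set(df_buyers["period"]) == set(periods)
    # One row per (agent, period), with every combination present.
    assert not df_sellers.duplicated(["seller", "period"]).any()
    assert len(df_sellers) == df_sellers["seller"].nunique() * len(periods)
    assert not df_buyers.duplicated(["buyer", "period"]).any()
    assert len(df_buyers) == df_buyers["buyer"].nunique() * len(periods)
    # Every node referenced by an agent must exist in the network.
    assert set(df_sellers["node"]) <= set(network_nodes)
    assert set(df_buyers["node"]) <= set(network_nodes)

sellers = pd.DataFrame({"seller": [101, 101], "period": [1, 2], "node": ["N1", "N1"]})
buyers = pd.DataFrame({"buyer": ["B1", "B1"], "period": [1, 2], "node": ["N2", "N2"]})
check_unit_based(sellers, buyers, {"N1", "N2"}, [1, 2])
```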
### Step 3: Register the dataset
Add your parser to `apem/unit_based_model/enums/datasets.py`, following the existing pattern:
```python
class UnitBased_Datasets(Enum):
    ...
    MY_DATASET = ParseMyDataset()
```
### Step 4: Select it in `config.json`
```json
{
  "run": {
    "market_model": "unit_based_model"
  },
  "unit_based_model": {
    "dataset": "MY_DATASET",
    "power_flow_model": { "type": "DCOPF" },
    "pricing_algorithm": "ELMP"
  }
}
```
### Limitation
For a new custom unit-based dataset, the safe supported choice is `DCOPF`.
More precisely:

- your custom dataset can be used in the unit-based workflow with `DCOPF`
- your custom dataset cannot currently be used with the zonal power-flow models `Zonal_NTC_aggregated`, `Zonal_NTC_multiedge`, or `Zonal_FBMC`
- redispatch is part of those zonal workflows, so it is also not available for a new custom unit-based dataset
> **Note:** At the moment, the zonal unit-based workflows are only supported for the two built-in PyPSA datasets: `PyPSAEurSmall` and `PyPSAEurLarge`.
### What would be needed for zonal support on a new dataset?
To make a new unit-based dataset work with zonal models, it is not enough to only add a parser.
You would also need to:
- provide node coordinates so APEM can map nodes to zones
- define or extend the zonal mapping logic for your geography
- ensure the dataset contains the network information needed to aggregate a nodal network into a zonal one
- remove the current PyPSA-only restrictions in the zonal execution path
- test the full zonal workflow, including allocation, pricing, and redispatch
In other words, adding a new dataset for DCOPF is mostly a data-integration task, while adding a new dataset for zonal models also requires extending the current zonal-model implementation.
## Option 2: Add An Order-Book Dataset
For the order-book workflow, APEM expects a Euphemia-style dataset folder made of CSV files.
In most cases, you do not write a new parser class: you use `ParseOrderBook` and provide the expected files.
### Step 1: Add the dataset folder
Place your dataset under:

```
apem/order_book_based_model/euphemia/data/datasets/<your_dataset_name>/
```
`ParseOrderBook` reads fixed filenames and builds a `ZonalScenario` from them.
Use the contracts below.
You must provide:
`periods.csv`, `step_orders.csv`, `block_orders.csv`, `complex_orders.csv`, `complex_step_orders.csv`, `scalable_complex_orders.csv`, `scalable_step_orders.csv`, `piecewise_linear_orders.csv`.

You can add:
`zones.csv`, `atc.csv`, `fb_constraints.csv`, `fb_ptdf.csv`.
If omitted or empty, network constraints may be disabled and the model can fall back to unconstrained single-zone clearing.
> **Note:** If you do not use one order family (for example complex orders), keep that CSV as a header-only file with the expected columns.
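A header-only placeholder can be generated with the standard `csv` module; `write_header_only` is a hypothetical helper, and the column list follows the `complex_orders.csv` specification:

```python
import csv
from pathlib import Path

def write_header_only(path: Path, columns: list[str]) -> None:
    """Create a CSV containing only the expected header row."""
    with path.open("w", newline="") as f:
        csv.writer(f).writerow(columns)

# Example: an empty complex_orders.csv for a dataset with no complex orders.
write_header_only(
    Path("complex_orders.csv"),
    ["id", "step_orders", "fixed_term", "variable_term", "condition", "load_gradient"],
)
```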
### File Specifications
**`periods.csv`** (required)

| column | type | required | meaning | example |
|---|---|---|---|---|
| `period` | int | yes | Market period index. | `1` |

```csv
period
1
2
```
**`step_orders.csv`** (required)

| column | type | required | meaning | example |
|---|---|---|---|---|
| `id` | str/int | yes | Unique step-order id. | `S1` |
| `t` | int | yes | Period index (must exist in `periods.csv`). | `1` |
| `p` | float | yes | Limit price. | `75` |
| `q` | float | yes | Signed quantity. | `-20` |
| `zone` | str | yes | Bidding zone. Column-name aliases are accepted. | `Z1` |

```csv
id,t,p,q,zone
S1,1,75,-20,Z1
S2,1,25,30,Z2
```
**`block_orders.csv`** (required)

| column | type | required | meaning | example |
|---|---|---|---|---|
| `id` | str/int | yes | Unique block id. | `B1` |
| `block_type` | str | yes | Block type, e.g. `normal` or `linked`. | `normal` |
| `code_prm` | str/int | conditional | Group/parent code (required for `linked` blocks). | |
| `p` | float | yes | Block price. | `500` |
| `q1..qT` | float | yes | Per-period quantities, matching the periods in `periods.csv`. | `-10` |
| `MAR` | float | yes | Minimum acceptance ratio (0..1). | `1` |
| `zone` | str | yes | Bidding zone. Same aliases as step orders are accepted. | `Z3` |

```csv
id,block_type,code_prm,p,q1,q2,MAR,zone
B1,normal,,500,-10,-10,1,Z3
```
**`complex_orders.csv`** (required)

| column | type | required | meaning | example |
|---|---|---|---|---|
| `id` | str/int | yes | Complex parent id. | `C1` |
| `step_orders` | str | yes | Comma-separated child step ids. | `"CS1,CS2"` |
| `fixed_term` | float | yes | Fixed MIC/MP term. | `100` |
| `variable_term` | float | yes | Variable MIC/MP term (complex orders). | `5` |
| `condition` | str | yes | Typical values: `MIC`. | `MIC` |
| `load_gradient` | float/empty | yes | Ramp-like limit if used. | |

```csv
id,step_orders,fixed_term,variable_term,condition,load_gradient
C1,"CS1,CS2",100,5,MIC,
```
**`complex_step_orders.csv`** (required)

| column | type | required | meaning | example |
|---|---|---|---|---|
| `id` | str/int | yes | Complex-step id. | `CS1` |
| `complex_order_id` | str/int | yes | Parent id in `complex_orders.csv`. | `C1` |
| `t` | int | yes | Period index. | `1` |
| `p` | float | yes | Step price. | `60` |
| `q` | float | yes | Step quantity. | `30` |
| `zone` | str | yes | Zone (aliases accepted). | `Z1` |

```csv
id,complex_order_id,t,p,q,zone
CS1,C1,1,60,30,Z1
CS2,C1,2,65,25,Z1
```
**`scalable_complex_orders.csv`** (required)

| column | type | required | meaning | example |
|---|---|---|---|---|
| `id` | str/int | yes | Scalable parent id. | `SC1` |
| `step_orders` | str | yes | Comma-separated scalable step ids. | `"SS1,SS2"` |
| `fixed_term` | float | yes | Fixed term. | `0` |
| `condition` | str | yes | Typical values: `MIC`. | `MIC` |
| `load_gradient` | float/empty | yes | Gradient limit if used. | |
| `MAP1..MAPT` | float | yes | Period-wise minimum acceptance profile. | `10` |

```csv
id,step_orders,fixed_term,condition,load_gradient,MAP1,MAP2
SC1,"SS1,SS2",0,MIC,,10,10
```
**`scalable_step_orders.csv`** (required)

| column | type | required | meaning | example |
|---|---|---|---|---|
| `id` | str/int | yes | Scalable-step id. | `SS1` |
| `scalable_order_id` | str/int | yes | Parent id in `scalable_complex_orders.csv`. | `SC1` |
| `t` | int | yes | Period index. | `1` |
| `p` | float | yes | Step price. | `55` |
| `q` | float | yes | Step quantity. | `20` |
| `zone` | str | yes | Zone (aliases accepted). | `Z2` |

```csv
id,scalable_order_id,t,p,q,zone
SS1,SC1,1,55,20,Z2
SS2,SC1,2,58,20,Z2
```
**`piecewise_linear_orders.csv`** (required)

| column | type | required | meaning | example |
|---|---|---|---|---|
| `id` | str/int | yes | PLO id. | `P1` |
| `t` | int | yes | Period index. | `1` |
| `p0` | float | yes | Start price. | `20` |
| `p1` | float | yes | End price. | `80` |
| `q` | float | yes | Signed quantity. | `15` |
| `zone` | str | yes | Zone (aliases accepted). | `Z1` |

```csv
id,t,p0,p1,q,zone
P1,1,20,80,15,Z1
```
**`zones.csv`** (optional)

| column | type | required | meaning | example |
|---|---|---|---|---|
| `zone` | str | no | Explicit zone list. If missing, zones are inferred. | `Z1` |

Aliases accepted: `z`, or first-column fallback.

```csv
zone
Z1
Z2
```
**`atc.csv`** (optional, ATC only)

| column | type | required | meaning | example |
|---|---|---|---|---|
| `from_zone` | str | yes | Source zone. | `Z1` |
| `to_zone` | str | yes | Sink zone. | `Z2` |
| `t` | int | yes | Period index. | `1` |
| `cap` | float | yes | Directed transfer capacity. | `400` |
| | float | no | Inter-temporal upward ramp bound. | |
| | float | no | Inter-temporal downward ramp bound. | |

Aliases accepted: `from`/`to`, `source_zone`/`sink_zone`, `period`/`time`, `capacity`/`atc`.

```csv
from_zone,to_zone,t,cap
Z1,Z2,1,400
Z2,Z1,1,400
```
**`fb_constraints.csv`** (optional, FBMC only)

| column | type | required | meaning | example |
|---|---|---|---|---|
| `cnec_id` | str | yes | CNEC identifier. | `NP_Z1` |
| `t` | int | yes | Period index. | `1` |
| `ram` | float | yes | Remaining available margin (upper bound). | `800` |
| `lb` | float | no | Optional lower bound. | `-800` |

Aliases accepted: `cnec`, `constraint_id`, `period`/`time`, `capacity`; lower-bound aliases `ram_lb`, `min_ram`.

```csv
cnec_id,t,ram,lb
NP_Z1,1,800,-800
```
**`fb_ptdf.csv`** (optional, FBMC only)

| column | type | required | meaning | example |
|---|---|---|---|---|
| `cnec_id` | str | yes | CNEC identifier. | `NP_Z1` |
| `t` | int | yes | Period index. | `1` |
| `zone` | str | yes | Zone label. | `Z1` |
| `ptdf` | float | yes | PTDF coefficient for the zone on the CNEC. | `1.0` |

Aliases accepted: `cnec`, `constraint_id`, `period`/`time`, `z`/`bidding_zone`, `value`/`factor`.

```csv
cnec_id,t,zone,ptdf
NP_Z1,1,Z1,1.0
NP_Z1,1,Z2,0.0
```
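Since `fb_ptdf.csv` is long-format (one row per CNEC, period, and zone), pivoting it into one PTDF row per constraint makes it easy to sanity-check against `fb_constraints.csv`. A small pandas sketch using the example rows above:

```python
import pandas as pd

# Long-format PTDF rows, as in the fb_ptdf.csv example.
ptdf = pd.DataFrame(
    [
        {"cnec_id": "NP_Z1", "t": 1, "zone": "Z1", "ptdf": 1.0},
        {"cnec_id": "NP_Z1", "t": 1, "zone": "Z2", "ptdf": 0.0},
    ]
)

# One row per (cnec_id, t), one column per zone.
matrix = ptdf.pivot_table(index=["cnec_id", "t"], columns="zone", values="ptdf")
assert matrix.loc[("NP_Z1", 1), "Z1"] == 1.0
assert matrix.loc[("NP_Z1", 1), "Z2"] == 0.0
```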
### Network Model Mapping
- `network_model = "ATC"`: uses `atc.csv` for multi-zone transfer constraints.
- `network_model = "FBMC"`: uses `fb_constraints.csv` + `fb_ptdf.csv`.
- If required network files are missing or empty, APEM can run with unconstrained single-zone clearing.
### Validation Checklist
- `periods.csv` contains integers and every `t` in every orders file belongs to `periods`.
- `block_orders.csv` contains all `q1..qT` columns for defined periods.
- `scalable_complex_orders.csv` contains all `MAP1..MAPT` columns for defined periods.
- Linked blocks are valid: each `linked` block has `code_prm` pointing to an existing parent `id`.
- `complex_step_orders.complex_order_id` references existing `complex_orders.id`.
- `scalable_step_orders.scalable_order_id` references existing `scalable_complex_orders.id`.
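The referential-integrity items in the checklist can be verified generically; `check_references` is an illustrative helper, not a repo utility:

```python
import pandas as pd

def check_references(child: pd.DataFrame, parent_col: str, parents: pd.DataFrame) -> None:
    """Illustrative check: every child row points to an existing parent id."""
    missing = set(child[parent_col]) - set(parents["id"])
    assert not missing, f"dangling references: {missing}"

# Example using the complex-order files from the specifications above.
complex_orders = pd.DataFrame({"id": ["C1"]})
complex_steps = pd.DataFrame({"id": ["CS1", "CS2"], "complex_order_id": ["C1", "C1"]})
check_references(complex_steps, "complex_order_id", complex_orders)
```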
### Quick Start
1. Copy `apem/order_book_based_model/euphemia/data/datasets/test_3node/`.
2. Replace CSV contents while keeping filenames and headers.
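The copy step can be scripted; the paths below assume you run from the repository root, and `my_dataset` is a placeholder name:

```python
import shutil
from pathlib import Path

# Copy the built-in 3-node example as a starting template, then edit the
# CSVs in place while keeping filenames and headers.
src = Path("apem/order_book_based_model/euphemia/data/datasets/test_3node")
dst = src.parent / "my_dataset"
if src.exists():  # guard so the sketch is a no-op outside the repo
    shutil.copytree(src, dst, dirs_exist_ok=True)
```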
### Step 2: Register the dataset
Add your dataset to `apem/order_book_based_model/euphemia/enums/datasets.py`:
```python
from enum import Enum

from apem.order_book_based_model.euphemia.data.parsing.parse_order_book import ParseOrderBook
from apem.order_book_based_model.euphemia.utils.paths import DATA_DIR


class OrderBookBased_Datasets(Enum):
    ...
    MY_DATASET = ParseOrderBook(DATA_DIR / "my_dataset", "My Dataset")
```
### Step 3: Select it in `config.json`
```json
{
  "run": {
    "market_model": "order_book_based_model"
  },
  "order_book_based_model": {
    "dataset": "MY_DATASET",
    "cut_type": "price based",
    "euphemia_configuration": {
      "network_model": "FBMC"
    }
  }
}
```
## Summary
- `config.json` only selects datasets that APEM already knows about. To use your own data, you first register it in code.
- For unit-based data, you add a parser returning a `Scenario`.
- For order-book data, you add a dataset folder and register it with `ParseOrderBook`.