Check and adjust annual Mileage #316

johannah-pik · 2025-01-22T16:38:13Z

I double-checked the annual mileage trajectories at ISO country level, as well as regionally aggregated over 21 regions, using the following script, with electric buses as an example:
/p/projects/edget/testsFormodelImpovement/annualMileage

Learnings:

ISO country data looks good from 2010 onwards -> I implemented an interpolation step from 1990 to 2010 to remove the spikes before 2010 ("smooth annual mileage < 2010", mrtransport#32). Note: this is more of a cosmetic improvement for our plots.
There are spikes due to the regional aggregation, which uses GDP on an MER basis as weights.
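A minimal sketch of what a pre-2010 interpolation step could look like, assuming the 1990 and 2010 values are kept as anchors and every year in between is replaced by a straight line between them. The helper name `interpolate_pre2010` and all numbers are made up for illustration; the actual change in mrtransport#32 may work differently.

```python
def interpolate_pre2010(series, start=1990, end=2010):
    """Hypothetical helper: keep the values at `start` and `end` and replace
    every year strictly in between by linear interpolation between them.
    Years outside [start, end] are left unchanged."""
    if start not in series or end not in series:
        raise ValueError("both anchor years must be present in the series")
    v_start, v_end = series[start], series[end]
    out = dict(series)  # do not modify the input
    for year in series:
        if start < year < end:
            frac = (year - start) / (end - start)
            out[year] = v_start + frac * (v_end - v_start)
    return out


# Made-up annual mileage (km/yr) for one country with spiky pre-2010 values
raw = {1990: 40000.0, 1995: 52000.0, 2000: 31000.0, 2005: 60000.0,
       2010: 45000.0, 2015: 46000.0}
smoothed = interpolate_pre2010(raw)
# 1995 -> 41250.0, 2000 -> 42500.0, 2005 -> 43750.0; 1990, 2010, 2015 unchanged
```

Because the intermediate values are discarded, any real information in them is lost too, which is acceptable here only because the pre-2010 years are mainly relevant for plots.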

Presumed reasons for the spikes in the input data compared to old EDGE-T:

old EDGE-T produced output only from 2010 onwards
In old EDGE-T, cost per pkm was calculated from annual mileage and load factor already at ISO country level. Missing data was interpolated at some later step.

Now we follow the same steps for each input data parameter separately:

read in sources
select and merge data
apply fixes to obtain a consistent and complete set of parameters at ISO country level
-> when edgeTransport calls mrtransport, the ISO-level parameters are first aggregated to 21 regions
-> only then are calculations applied to get e.g. cost per pkm
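To illustrate why the order of operations matters, here is a toy example with two made-up countries. It assumes cost per pkm = annualized vehicle cost / (annual mileage × load factor), GDP weights for the aggregation, and that the old ISO-level results were aggregated with the same weights. This is not necessarily the exact EDGE-T formula; it only shows that "compute per country, then aggregate" and "aggregate each parameter, then compute" generally give different regional values.

```python
def weighted_mean(values, weights):
    """Weighted average of a {country: value} dict with {country: weight}."""
    total = sum(weights[c] for c in values)
    return sum(values[c] * weights[c] for c in values) / total


def cost_per_pkm(annual_cost, mileage, load_factor):
    """Assumed formula: annualized cost per vehicle / (vkm per year * pkm per vkm)."""
    return annual_cost / (mileage * load_factor)


# Two made-up countries in one region, GDP (MER) used as aggregation weight
gdp = {"A": 3.0, "B": 1.0}
annual_cost = {"A": 30000.0, "B": 30000.0}  # per vehicle and year
mileage = {"A": 60000.0, "B": 20000.0}      # vkm per vehicle and year
load_factor = {"A": 20.0, "B": 20.0}        # passengers per vehicle

# Old order (as assumed here): cost per pkm per country, then aggregate
per_country = {c: cost_per_pkm(annual_cost[c], mileage[c], load_factor[c]) for c in gdp}
old_order = weighted_mean(per_country, gdp)

# Current order: aggregate each parameter to the region, then compute
new_order = cost_per_pkm(weighted_mean(annual_cost, gdp),
                         weighted_mean(mileage, gdp),
                         weighted_mean(load_factor, gdp))
```

With these numbers the first order gives 0.0375 and the second 0.03 per pkm, because averaging mileage before dividing is not the same as averaging the ratios.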

This separation of tasks and the systematic approach are very helpful for handling our input data, so to my mind this is a huge improvement! But of course it differs from old EDGE-T, so it is essential to remove the spikes at 21-region level.

Next steps:

Look at current results. Implement reasonable smoothing of the parameter input data at the end of toolLoadmrtransportData.R and compare the input parameters at 21-region aggregation and the model output to the previous version.
Load factor should also be checked at ISO country level, where it looks weird (you can do this very quickly by adjusting the script on the cluster) -> once that looks good, the smoothing at 21-region level should be applied as well
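One possible smoothing for the regional parameters is a centered moving average over model time steps, sketched below. The function name, the window of 3 and the shrinking window at the series edges are assumptions, not a decision from this thread.

```python
def smooth_series(series, window=3):
    """Hypothetical helper: centered moving average over consecutive time steps.
    `series` maps year -> value; at the edges the window shrinks to the
    neighbours that exist. Time steps are treated as equally spaced,
    regardless of the gap in years between them."""
    years = sorted(series)
    half = window // 2
    out = {}
    for i, year in enumerate(years):
        lo = max(0, i - half)
        hi = min(len(years), i + half + 1)
        neighbours = [series[y] for y in years[lo:hi]]
        out[year] = sum(neighbours) / len(neighbours)
    return out


# Made-up regional values with a single spike in 2015
example = smooth_series({2010: 50.0, 2015: 80.0, 2020: 50.0, 2025: 50.0})
# 2010 -> 65.0, 2015 -> 60.0, 2020 -> 60.0, 2025 -> 50.0
```

Note the trade-off: the spike is damped, but values next to it (here 2010 and 2020) are shifted as well, so the smoothed input should still be compared against the unsmoothed ISO data.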

robertpietzcker · 2025-01-22T18:08:20Z

Thanks for the summary!
Could you add one or several plots to show the magnitude of the problem?

johannah-pik · 2025-01-23T09:58:26Z

There are plots in the folder, together with the script that produces them.
I guess a current run with the extended reporting would be best to see the remaining problem in the results :)
@jmuessel could you take over?
My procedure was more about finding out why and when the spikes come in, to identify the root cause of the problem.

When you find a good way of smoothing the input data, we should also apply it to the 12-region aggregation for REMIND, I guess.

jmuessel · 2025-01-23T10:17:27Z

Happy to provide plots, but at the moment I cannot access the cluster (I'm in contact with RSE). I need the cluster because my current setup does not allow me to plot the input variables. I'll post the plots here asap.

johannah-pik · 2025-01-23T10:40:49Z

A lot of data is affected (that's why I used the script and ggsave), and I think this is the most pressing issue we currently have with the input data (it also affects other parameters, e.g. costs per pkm).
So from my side the focus can be on solving the issue rather than on documenting the current version here (if that helps you).

johannah-pik · 2025-01-23T10:49:34Z

I also cannot access the cluster right now, but /p/projects/edget/PRchangeLog contains a recent compScen that covers the annual mileage @robertpietzcker. Aggregating the EDGE-T regions to NEU (which is also done for the data we send to REMIND) makes the problem even worse.

robertpietzcker · 2025-01-23T11:47:46Z

My point is that it might be helpful to have a common view of what the problem is :-)
(this is separate from removing any unintentional changes that happened in the refactoring)

So if I look at the compScen from 2025-01-08, I see e.g. this:

and I would say the main issues here are:

the 70% change from 2005 to 2010 for some regions
for some regions (most visible for UKI), a persistent mileage difference of >30% between technologies, with gases at ~2x the liquids values.

the small wiggles of <5% may look a bit weird, but I wouldn't see them as a major problem that absolutely needs to be fixed

robertpietzcker · 2025-01-23T11:53:38Z

and similarly for H12:

what is NEU doing from 2005-2020 :-)

and EUR looks funny, but if I understand correctly, that is just the vehicle-weighted average across the EU countries, so it is reasonable: it is not really an "input" but rather a results-weighted aggregation of inputs, so it is fine if it looks weird

(if that is also the problem for NEU, it would be good to add plots of NEN and NES explicitly to the compScen)

robertpietzcker · 2025-01-23T12:00:19Z

For cars, I see similar issues:

some regions show twice the mileage for gases as for liquids; FCEVs and hybrids are also > liquids.
again the drop from 2005 to 2010 in some regions

more generally: what is the basis for the size- and technology-differentiated mileage differences between EU regions? To me this feels like brutally difficult data to collect: for each region and technology, a detailed (and still cross-regionally consistent) record of how far a certain car size is driven. So I am wondering whether there is actual data behind this, or whether it is just noise?

johannah-pik · 2025-01-23T12:09:18Z

> and similarly for H12:
> 3. what is NEU doing from 2005-2020 :-)
> and EUR looks funny, but if I understand correctly that is just the vehicle-weighted average across the EU countries, so it is reasonable - it is not really an "input", but rather a results-weighted aggregation of inputs, so it is fine if that looks weird
>
> (if that is also the problem for NEU, it would be good to add the plotting of NEN and NES explicitly in the compScen

I was referring to these spikes!
In the coupled EDGE-T/REMIND system running on 12 regions, NEN and NES are aggregated to NEU. So I think the spikes are not irrelevant! Same for EUR.

johannah-pik · 2025-01-23T12:13:38Z

The drop between 2005 and 2010 comes from the data sources.
We can argue whether it makes sense or not, but in the end this data is not very relevant for our runs.
The differences between technologies and sizes are a consequence of how missing data is approximated.
@jmuessel implemented that fix, and of course it can be re-evaluated.

We could also just use a single value for all sizes and technologies, as was done for trucks.

robertpietzcker · 2025-01-23T12:45:11Z

Thanks!
Which source do we use for 2005 and which for 2010 for the mileages?
Or is it actually the same source that shows such crazy changes within 5 years?

johannah-pik · 2025-01-23T12:57:19Z

It's TRACCS for the EUR regions and UCD for all other regions. Of course we use the same source for 2005 as for 2010.
We do aggregate over vehicle groups, which could cause some deviation.
We could also just take the 2010 data and extrapolate it back to 2005.
We could also let the plots start in 2010, as is done in many projects anyway :P

robertpietzcker · 2025-01-23T13:08:48Z

> In the coupled EDGE-T/REMIND system running on 12 regions, NEN and NES are aggregated to NEU. So I think the spikes are not irrelevant! Same for EUR.

I don't understand the second part - if we ONLY see the crazy up/downs for EUR but NOT for the individual regions (on which EDGE-T runs, if I understood correctly), then the mileage-value for EUR is of no relevance to the model, right?

ahhhh - or does EDGE-T run in H12 resolution when REMIND runs in H12? then I see why this is a problem

robertpietzcker · 2025-01-23T13:11:16Z

And how do we fill in mileage for other techs? I would be surprised if TRACCS had data for FCEVs in 2005.
And somehow the 2005-2010 step seems to be much larger for FCEVs/gases than for liquids (which I guess is the only actual data point in TRACCS?)

robertpietzcker · 2025-01-23T14:24:14Z

I just had a quick look for Germany: in the TRACCS data, the mileage for liquids seems to increase by about 10-15% from 2005 to 2010.

From "Verkehr in Zahlen" I do indeed see an increase:

however, the footnotes say "2) Relative to the vehicle stock: up to 2006 including temporarily deregistered vehicles (18-month decommissioning period). From 2007 onwards excluding temporarily deregistered vehicles." So the strong jump from 2006 to 2007 seems to be a calculation artifact.

And by 2023 it goes down again, although there was another change in the calculation method in 2017 :-)

So I would say "the data aren't perfect either" :-)

To eliminate this "what is the model doing strangely from 2005 to 2010" effect, I could imagine simply using the 2010 values for 2005 as well? Or an average of the two?

Since the 2005->2010 jumps for "liquids" seem to be rather limited, I would also be fine with keeping the jumps there, as long as this specific effect of the jump being larger for other technologies is fixed :-)
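The suggestion above (use the 2010 value for 2005, or the mean of both, and optionally leave liquids untouched) could be sketched as follows. The nested dict layout, the technology names and the function name `fix_2005_by_tech` are hypothetical.

```python
def fix_2005_by_tech(data, mode="copy", keep=("Liquids",)):
    """Hypothetical helper. `data` maps technology -> {year: value}.
    For every technology not listed in `keep`, the 2005 value is replaced by
    the 2010 value ("copy") or by the mean of the 2005 and 2010 values ("mean")."""
    if mode not in ("copy", "mean"):
        raise ValueError("mode must be 'copy' or 'mean'")
    out = {}
    for tech, series in data.items():
        series = dict(series)  # copy so the input is not modified
        if tech not in keep and 2005 in series and 2010 in series:
            if mode == "copy":
                series[2005] = series[2010]
            else:
                series[2005] = (series[2005] + series[2010]) / 2
        out[tech] = series
    return out


# Made-up values (km/yr): small 2005->2010 step for liquids, large one for gases
data = {"Liquids": {2005: 14000.0, 2010: 15000.0},
        "Gases": {2005: 30000.0, 2010: 18000.0}}
copied = fix_2005_by_tech(data, mode="copy")    # Gases 2005 -> 18000.0
averaged = fix_2005_by_tech(data, mode="mean")  # Gases 2005 -> 24000.0
```

Passing `keep=()` applies the same fix to all technologies, which corresponds to the simpler option of using the 2010 values everywhere.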

johannah-pik · 2025-01-23T14:30:36Z

That's exactly why I checked the ISO regions (as discussed above). The jumps are a calculation artifact of the aggregation. The data aren't perfect, and neither are our calculation methods. I think we simply need to figure out the best "systematic" solution achievable with reasonable effort. Tuning every dataset individually will end up being too much work, and it always creates a lock-in.
So my suggestion:
1. Identify systematic problems (e.g. the aggregation problem) + find a systematic (and preferably pragmatic) solution (e.g. smoothing, or using the 2010 data for 2005, especially when the data isn't very relevant for the model anyway)
2. Adjust the remaining inconsistencies/implausibilities individually

jmuessel · 2025-01-23T14:33:35Z

I would also be pragmatic here. The jumps come from missing data, and how the data is aggregated and filled in is open to discussion. I created some of the spikes myself with my mrtransport PR for filling in missing data, and I would say that for now it is best to smooth the data and, in the long run, to incorporate new data sources.

I would now:

Fix the values before 2010, since they look odd (in mrtransport)
Revert my BEV PR and fill in the values with means over regions and technologies (in mrtransport)
Look at the data in EDGE-T and, if spikes were introduced there by the aggregation (at ISO level they look fine), smooth them out there (in EDGE-T)

I will handle the LoadFactor the same way.

jmuessel · 2025-01-24T10:34:45Z

Regarding whether I substitute missing data with the mean over regions or over technologies: I found that the differences across regions are larger than those across technologies within one region (for cars). I uploaded a very limited number of figures for this validation to our transport folder. This is mainly to show what I plotted against what; I am happy to add all regions, but that would be a lot of figures since they are at ISO country level.

https://cloud.pik-potsdam.de/index.php/f/23191664
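A rough sketch of the comparison described above and of the two fill options. The coefficient of variation is used here as the spread measure, which is an assumption (the uploaded plots may compare something else). `fill_missing` fills each gap from observed values only, either from the other technologies in the same region (`by="tech"`) or from the same technology in other regions (`by="region"`). All names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def cv(values):
    """Coefficient of variation: population standard deviation / mean."""
    return pstdev(values) / mean(values)


def spread_across_regions(data, tech):
    """Spread of one technology's mileage across regions (missing values skipped)."""
    return cv([v for (r, t), v in data.items() if t == tech and v is not None])


def spread_across_techs(data, region):
    """Spread of mileage across technologies within one region."""
    return cv([v for (r, t), v in data.items() if r == region and v is not None])


def fill_missing(data, by="tech"):
    """Fill None entries of a {(region, tech): value} dict with a mean of
    observed values; previously filled values are never reused."""
    if by not in ("tech", "region"):
        raise ValueError("by must be 'tech' or 'region'")
    out = dict(data)
    for (region, tech), value in data.items():
        if value is not None:
            continue
        if by == "tech":  # other technologies in the same region
            pool = [v for (r, t), v in data.items() if r == region and v is not None]
        else:             # same technology in other regions
            pool = [v for (r, t), v in data.items() if t == tech and v is not None]
        out[(region, tech)] = mean(pool) if pool else None
    return out


# Made-up mileage (km/yr) for two regions; gases are missing in region X
mileage_data = {("X", "Liquids"): 10000.0, ("X", "BEV"): 11000.0, ("X", "Gases"): None,
                ("Y", "Liquids"): 20000.0, ("Y", "BEV"): 21000.0, ("Y", "Gases"): 19000.0}
filled_tech = fill_missing(mileage_data, by="tech")      # X/Gases -> 10500.0
filled_region = fill_missing(mileage_data, by="region")  # X/Gases -> 19000.0
```

In this toy data the spread across regions is much larger than across technologies, so the two options give very different values for the same gap, which is exactly why the choice matters.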

johannah-pik assigned jmuessel Jan 22, 2025
