I've been working with the methods to_stacked_array() and to_unstacked_dataset(), and I'm puzzled about how they interact with each other. According to the docstring of to_unstacked_dataset(), these methods are meant to be inverse operations:
This is the inverse operation of Dataset.to_stacked_array.
However, my experiments with these methods are showing something different, and I'm wondering if I'm missing something or if there's a subtle issue at hand.
In particular, several unexpected observations have caught my attention:
Unstacking Variables, Not Dimensions: to_unstacked_dataset() only unstacks variables, leaving other dimensions that have been stacked by to_stacked_array() unchanged. This behavior might be intentional, but it's certainly something that has surprised me.
Broadcasting Dimensions: In my example, to_unstacked_dataset() broadcasts dimensions to align all variables on the same dimensionality. This is perplexing, as it seems to conflict with what I perceive to be the core purpose of the method. Shouldn't the method retain the original dimensions?
Altering Data Values: Perhaps most concerning, to_unstacked_dataset() appears to change the actual data of my variables, replacing some values with NaN. This alteration seems entirely unintended, as one would naturally expect that any stacking or unstacking operation would preserve the integrity of the underlying data.
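A minimal sketch of how NaN values can appear without any stacking at all. It assumes (my guess, not something the docstring confirms) that the NaNs come from index alignment when variables with different labels are combined into one Dataset. The arrays `a` and `b` are made up for illustration and are not part of my data.

```python
import numpy as np
import xarray as xr

# Two integer arrays that share the dimension name "x" but not its labels.
a = xr.DataArray([1, 2], coords=[("x", [0, 1])])
b = xr.DataArray([3, 4], coords=[("x", [1, 2])])

# Building a Dataset aligns both variables on the union of the "x" labels.
ds = xr.Dataset({"a": a, "b": b})

# Labels missing from a variable are filled with NaN, which also forces
# the integer data to be cast to float.
n_missing_a = int(ds["a"].isnull().sum())
n_missing_b = int(ds["b"].isnull().sum())
print(ds["a"].dtype, n_missing_a, n_missing_b)
```

If the same alignment happens when the unstacked variables are reassembled, it would account for both the float cast and the NaN entries, but I have not verified that this is what happens internally.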
MWE
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(8).reshape(2, 2, 2),
    coords=[("time", [2000, 2001]), ("lon", [3., 4.]), ("lat", [5., 6.])],
)
data = xr.Dataset({"da1": arr, "da2": arr.isel(lon=0)})
stacked = data.to_stacked_array("feature", ["time"])  # stacked data looks perfect
should_be_unstacked = stacked.to_unstacked_dataset(dim="feature", level="variable")  # while variables get unstacked into `da1` and `da2`, the `feature` dimension remains a stacked MultiIndex (is that intended?)
really_unstacked = should_be_unstacked.unstack()
# now we've *almost* reconstructed the original data but...
data.identical(should_be_unstacked)  # False // because not really unstacked
data.identical(really_unstacked)  # False // because dimensions have been broadcast
unique_entries_in_original_da1 = set(np.unique(data["da1"].data).tolist())
unique_entries_in_unstacked_da1 = set(np.unique(really_unstacked["da1"].data.astype(float)).tolist())
unique_entries_in_original_da1 == unique_entries_in_unstacked_da1  # False // some entries have been unintentionally replaced by NaNs!
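To make the comparison more explicit than a single `identical()` call, here is a small helper sketch that reports, per variable, whether the dimensions changed and how many NaNs were introduced. The name `roundtrip_report` is hypothetical (it is not an xarray function), and the `candidate` dataset below only simulates a broadcast round trip with `expand_dims` rather than going through the stacking methods.

```python
import numpy as np
import xarray as xr


def roundtrip_report(original, candidate):
    """Compare two Datasets variable by variable (hypothetical helper)."""
    report = {}
    for name in original.data_vars:
        before = original[name]
        after = candidate[name]
        report[name] = {
            # Dimension names are compared as sets, so order is ignored.
            "same_dims": set(before.dims) == set(after.dims),
            # Number of NaNs in the candidate that were not in the original.
            "new_nans": int(after.isnull().sum()) - int(before.isnull().sum()),
        }
    return report


arr = xr.DataArray(
    np.arange(4).reshape(2, 2),
    coords=[("time", [2000, 2001]), ("lat", [5.0, 6.0])],
)
original = xr.Dataset({"da1": arr, "da2": arr.isel(lat=0, drop=True)})

# Simulate a broadcast: da2 gains the "lat" dimension it never had.
candidate = original.assign(da2=original["da2"].expand_dims(lat=[5.0, 6.0]))

report = roundtrip_report(original, candidate)
print(report)
```

Calling `roundtrip_report(data, really_unstacked)` on the MWE above should show which variable picked up extra dimensions and NaNs.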
Questions:
Intended Behavior? Is the behavior I'm observing intended, or am I perhaps misusing the to_stacked_array() and to_unstacked_dataset() methods in some way?
Correct Restoration? If I want to restore the data in the example above after calling to_stacked_array(), what would be the correct approach?
I'd greatly appreciate any insights or guidance on this matter. Thanks in advance for your help! 🙏
EDIT 11/08/23: The more I look into this, the more it seems like a bug. I noticed that things are happening that shouldn't, like changing the data when I'm only using stacking operations. The broadcasting step also seems out of place. To help explain what's going on, I've made my example above simpler. If this is a mistake, I'd really like to know how to fix it. Thanks! 🐛