You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the bug
When given a KeyedTraversable with duplicate keys, Dict\chunk() will return chunks smaller than the chunk size. Some (but not all) duplicate keys will be lost.
* Returns a vec containing the original dict split into chunks of the given
* size.
*
* If the original dict doesn't divide evenly, the final chunk will be
* smaller.
The vec will not contain all the elements of the KeyedTraversable ("original dict") and the chunks will not be of the "given size".
Standalone code, or other way to reproduce the problem
functionkeyed_traversable_with_duplicate_keys(
): KeyedTraversable<string, int> {
// a => 0, b => 1, c => 2, a => 3, b => 4, c => 5, ...for ($i=0; $i<15; $i+=3) {
yield'a'=>$i;
yield'b'=>$i+1;
yield'c'=>$i+2;
}
}
<<__EntryPoint>>
functionmain(): void {
echo \json_encode(\HH\Lib\Dict\chunk(keyed_traversable_with_duplicate_keys(), 5));
// [{"a":3,"b":4,"c":2},{"c":8,"a":9,"b":7},{"b":13,"c":14,"a":12}]
}
Steps to reproduce the behavior:
Run the repro listed above
Expected behavior
There is no "correct" behavior for a KeyedTraversable with duplicate keys that makes much sense.
Consuming items until the accumulating dict is 5 elements long would result in [{"a":12,"b":13,"c":14}].
The current behavior silently drops elements from the provided KeyedTraversable. The elements that are dropped depend on the requested size. The order of the sub dicts differs from the order of the supplied KeyedTraversable.
My preferred solution would be to restrict the input type to KeyedContainer<Tk, Tv>. KeyedContainers can not have duplicate keys. This forces callers with a KeyedTraversable to call the function like this:
The loss of data will be more reasonable this way. Later values of duplicate keys overwrite previous occurrences. The order of the elements is maintained. The sub dicts will all be of the correct size, except for the trailing dict.
In order to support users who depended on the current behavior, I'd move the current implementation to the Legacy_FIXME namespace.
/** * Returns a vec containing the original dict split into chunks of the given * size. If the original dict doesn't divide evenly, the final chunk will be * smaller.*/functionchunk<Tk, Tv>(
KeyedTraversable<Tk, Tv> $traversable,
int$size,
): vec<dict<Tk, Tv>> {
invariant($size>0, 'Chunk size must be positive.');
$result=vec[];
$ii=0;
foreach ($traversableas$key=>$value) {
if ($ii%$size===0) {
$result[] =dict[];
}
$result[(int)($ii/$size)][$key] =$value;
$ii++;
}
return$result;
}
/** * Returns a vec containing the original dict split into chunks of the given * size. * * If the original dict doesn't divide evenly, the final chunk will be * smaller. * * Time complexity: O(n) * Space complexity: O(n)*/functionchunk<Tk as arraykey, Tv>(
KeyedTraversable<Tk, Tv> $traversable,
int$size,
)[]: vec<dict<Tk, Tv>> {
invariant($size>0, 'Expected positive chunk size, got %d.', $size);
$result=vec[];
$ii=0;
$chunk_number=-1;
foreach ($traversableas$key=>$value) {
if ($ii%$size===0) {
$result[] =dict[];
$chunk_number++;
}
$result[$chunk_number][$key] =$value;
$ii++;
}
return$result;
}
The text was updated successfully, but these errors were encountered:
Describe the bug
When given a KeyedTraversable with duplicate keys, Dict\chunk() will return chunks smaller than the chunk size. Some (but not all) duplicate keys will be lost.
The vec will not contain all the elements of the KeyedTraversable ("original dict") and the chunks will not be of the "given size".
Standalone code, or other way to reproduce the problem
Steps to reproduce the behavior:
Expected behavior
There is no "correct" behavior for a KeyedTraversable with duplicate keys that makes much sense.
Consuming items until the accumulating dict is 5 elements long would result in
[{"a":12,"b":13,"c":14}]
.The current behavior silently drops elements from the provided KeyedTraversable. The elements that are dropped depend on the requested size. The order of the sub dicts differs from the order of the supplied KeyedTraversable.
My preferred solution would be to restrict the input type to
KeyedContainer<Tk, Tv>
. KeyedContainers can not have duplicate keys. This forces callers with a KeyedTraversable to call the function like this:The loss of data will be more reasonable this way. Later values of duplicate keys overwrite previous occurrences. The order of the elements is maintained. The sub dicts will all be of the correct size, except for the trailing dict.
In order to support users who depended on the current behavior, I'd move the current implementation to the
Legacy_FIXME
namespace.Actual behavior
[{"a":3,"b":4,"c":2},{"c":8,"a":9,"b":7},{"b":13,"c":14,"a":12}]
Environment
Additional context
This behavior has existed since the first public release of the hsl.
https://github.com/hhvm/hsl/blob/e3bb5e944f0639819f4d6416205449418cbc96c5/src/dict/transform.php#L20-L35
https://github.com/hhvm/hsl/blob/2615e1a30ced5518339a46f971c0434249ab5617/src/dict/transform.php#L15-L42
The text was updated successfully, but these errors were encountered: