
fix: use keys from across all rows to construct csv object #2478

Merged
6 commits merged into main from use-all-keys-csv on Oct 24, 2023

Conversation

arjunlalb
Contributor

Description

The CSV download logic should look at all keys across the row list, not just the first row.
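A minimal sketch of the intended key collection (illustrative only; Record stands in for the lodash Dictionary type used in the service, and the import path is assumed):

import { uniq } from 'lodash-es';

// Collect the union of keys across every row, not just the first one,
// so a key that only appears in a later row still gets its own column.
const getAllDataKeys = (data: Record<string, unknown>[]): string[] =>
  uniq(data.flatMap(row => Object.keys(row)));

// Keys from only the first row would be ['a']; across all rows we get ['a', 'b'].
getAllDataKeys([{ a: 1 }, { a: 2, b: 3 }]); // ['a', 'b']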

@arjunlalb requested a review from a team as a code owner on October 21, 2023, 08:11
@codecov

codecov bot commented Oct 21, 2023

Codecov Report

Merging #2478 (7e397df) into main (dd44c68) will decrease coverage by 30.29%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##             main    #2478       +/-   ##
===========================================
- Coverage   83.05%   52.77%   -30.29%     
===========================================
  Files         925      925               
  Lines       20612    20610        -2     
  Branches     3256     3254        -2     
===========================================
- Hits        17120    10876     -6244     
- Misses       3373     9319     +5946     
- Partials      119      415      +296     
Files | Coverage Δ
...src/download-file/service/file-download.service.ts | 91.66% <100.00%> (+1.66%) ⬆️

... and 299 files with indirect coverage changes



const allKeys = getAllDataKeys(data);
const allKeysDict = allKeys.reduce((acc, key) => ({ ...acc, [key]: undefined }), {});

return data.map(row => ({ ...allKeysDict, ...row }));
Contributor

Because we're using Object.values to iterate this below, we're still relying here on insertion order being consistent, which it won't be if we're spreading like this (also, IIRC there are some nuances to the iteration order of object keys/values, so best not to rely on it). I suggest we iterate over the keys to access the values rather than using them as-is to overwrite empty values like this.
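For illustration, a hedged sketch of the keyed access being suggested (allKeys and row as in the snippet above; the helper name is hypothetical):

// Reading each cell through allKeys pins the column order to the key list,
// instead of depending on spread insertion order and Object.values iteration.
const rowValues = (row: Record<string, unknown>, allKeys: string[]): unknown[] =>
  allKeys.map(key => row[key]);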

Contributor

Are you suggesting we do this instead?

data.map((row) => {
  const newRow = Object.fromEntries(allKeys.map((key) => [key, row?.[key] ?? undefined]));
  return newRow;
})

Contributor

There are multiple approaches here, but I'd probably avoid mapping it into an object and back out again. Not only is it extra allocations, but more importantly, you're relying on semantics (both of Object.fromEntries and of Object.values). Instead, map allKeys to the stringify calls.
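A sketch of that suggestion (illustrative; replacer is the same null/undefined replacer that appears later in this PR, and stringifying each cell with it is an assumption, not the PR's confirmed final code):

const replacer = (_: string, value: string) => value ?? '-';

// Map the key list straight to stringified cell values, skipping the
// intermediate object and any reliance on Object.fromEntries/Object.values.
const toCsvRow = (row: Record<string, unknown>, allKeys: string[]): string[] =>
  allKeys.map(key => JSON.stringify(row[key], replacer));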

Contributor

Updated.

// If given header is empty, then create header from the data keys
const header$ = isEmpty(config.header)
  ? config.dataSource.pipe(
      map(data => Object.keys(data[0] ?? [])),
      map(keys => keys.map(startCase))
      map(
Contributor

Given there's a decent amount of logic here, can we also write a real test?

Contributor

bump here.

Contributor

+1 on the test.


const values$: Observable<string[][]> = config.dataSource.pipe(
  // All rows are not guaranteed to have all keys, so add all keys to all rows
  map(data => {
    const allKeys = getAllDataKeys(data);
Contributor

Nit: now that I read this a little closer, we're being pretty inefficient here. We're iterating over all the values to calculate the keys twice, iterating to calculate the values once, and then iterating again to merge them. If you think that makes it more readable, by all means go for it; personally, I think it does the opposite. I'd try to pipe off the data source a single time.

More importantly, though, I see the header can be specified via config. If that's the case, we still have an issue: if I give a config of "header1, header2" and data containing extra keys (or keys in a different order), both the prior code and the new code would produce mismatched rows. However we calculate the header, we need to use that same result to extract the data.
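A hedged sketch of piping the data source once and driving both the header and the row extraction from the same computed key list (the composition and names like keys/headerRow are illustrative, not the PR's final code):

const csvData$: Observable<string[][]> = config.dataSource.pipe(
  map(data => {
    // The same key list drives both the header row and the value extraction.
    const keys = isEmpty(config.header) ? getAllDataKeys(data) : config.header!;
    const headerRow = isEmpty(config.header) ? keys.map(startCase) : keys;
    const valueRows = data.map(row => keys.map(key => JSON.stringify(row[key], replacer)));

    return [headerRow, ...valueRows];
  })
);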

Contributor

Updated: 1a62ad4 (#2478)

// If given header is empty, then create header from the data keys
const header$ = isEmpty(config.header)
  ? config.dataSource.pipe(
      map(data => Object.keys(data[0] ?? [])),
      map(keys => keys.map(startCase))
      map(
Contributor

bump here.


@adisreyaj enabled auto-merge (squash) on October 24, 2023, 09:12
@adisreyaj merged commit 3170394 into main on Oct 24, 2023
@adisreyaj deleted the use-all-keys-csv branch on October 24, 2023, 09:22

@github-actions

Unit Test Results

4 files, 316 suites, 47m 6s ⏱️
1,131 tests: 1,131 ✔️, 0 💤, 0 ❌
1,141 runs: 1,141 ✔️, 0 💤, 0 ❌

Results for commit 3170394.

      )
    : of(config.header!);

const getAllDataKeys = (data: Dictionary<unknown>[]): string[] =>
  uniq(data.map(row => Object.keys(row)).flat());
Contributor

Nit: now that we only do this once, we don't need to declare it as a separate function.
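For reference, a sketch of the inlined form (illustrative only):

const allKeys = isEmpty(config.header)
  ? uniq(data.flatMap(row => Object.keys(row)))
  : config.header!;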

// Value replacer for null and undefined values
const replacer = (_: string, value: string) => value ?? '-';
const csvData$ = config.dataSource.pipe(
  map(data => {
    const allKeys = isEmpty(config.header) ? getAllDataKeys(data) : config.header!;
Contributor

One major concern here is the big assumption that the header provided consists of keys, not already cased/renamed titles. Do we know that to be true for all use cases? If it is, I wouldn't even call it the header any more, since a header sounds like something that should not affect extraction. I'd call it something like csvColumn, with a key and an optional title.
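A hedged sketch of that shape (CsvColumn and buildHeaderRow are hypothetical names, not part of this PR):

// Hypothetical column config: key drives extraction, title only affects the header text.
interface CsvColumn {
  key: string;
  title?: string;
}

const buildHeaderRow = (columns: CsvColumn[]): string[] =>
  columns.map(column => column.title ?? startCase(column.key));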
