Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parquet endian issue for s390x #40

Conversation

HarryLeeIBM
Copy link

Fix parquet endian issue for s390x which fails the functional test 02242_case_insensitive_nested

@github-actions
Copy link

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@HarryLeeIBM
Copy link
Author

@rschu1ze , this is another fix in submodule, could you or someone from ClickHouse team give it a review? Thanks!

@rschu1ze
Copy link
Member

rschu1ze commented Aug 23, 2023

Looks generally good to me but before merging allow me two questions:

A) The fix could generally be contributed to upstream. If that isn't too much work, it would be great if you could make a PR.

B) The branch this PR is merging into is origin/release-6.0.1. It is the correct branch, i.e. the one which the top-level ClickHouse repository currently references (good). I wonder how you found out that this is the correct branch? Asking because

  1. the top-level repository references a Git commit, not a branch name, so that's not helpful in determining the branch name. Also, '.gitmodules' in the top-level Git repository shows says the branch name is branch = blessed/release-6.0.1 which apparently is != origin/release-6.0.1.
  2. the branch view of this repository (go here, click "release-6.0.1", then "View all branches") shows as "default branch" release-6.01 which is different from origin/release-6.0.1. Also, the overview that pops up after clicking "View more active branches" doesn't show origin/release-6.0.1 somewhere. Again, not helpful to determine the branch name.

Chances are that the mess has historical reasons, we should just be careful to not miss patches.

@HarryLeeIBM
Copy link
Author

Looks generally good to me but before merging allow me two questions:

A) The fix could generally be contributed to upstream. If that isn't too much work, it would be great if you could make a PR.

B) The branch this PR is merging into is origin/release-6.0.1. It is the correct branch, i.e. the one which the top-level ClickHouse repository currently references (good). I wonder how you found out that this is the correct branch? Asking because

  1. the top-level repository references a Git commit, not a branch name, so that's not helpful in determining the branch name. Also, '.gitmodules' in the top-level Git repository shows says the branch name is branch = blessed/release-6.0.1 which apparently is != origin/release-6.0.1.
  2. the branch view of this repository (go here, click "release-6.0.1", then "View all branches") shows as "default branch" release-6.01 which is different from origin/release-6.0.1. Also, the overview that pops up after clicking "View more active branches" doesn't show origin/release-6.0.1 somewhere. Again, not helpful to determine the branch name.

Chances are that the mess has historical reasons, we should just be careful to not miss patches.

A. I will raise PR for upstream, thanks for reminding me for this.
B. The branch to be merged is actually origin/release-11.0.0. The .gitmodules file has incorrect branch name(which should be origin/release-11.0.0). Anyhow it picks the actual branch according to the hash of commit(which is in origin/release-11.0.0, and it is quite different from default origin/release-6.0.1) which seems like a glitch of github. I will update the .gitmodules file with the correct branch name(origin/release-11.0.0) to avoid this confusion after this PR is merged.
Thanks!

rschu1ze added a commit to rschu1ze/ClickHouse that referenced this pull request Aug 23, 2023
Branch references becomes outdated too easily and they mostly spread
confusion (see the discussion in ClickHouse/arrow#40).
@rschu1ze
Copy link
Member

The branch to be merged is actually origin/release-11.0.0.

Ah, yes, thanks. I am unfortunately still confused. If I cd contrib/arrow/ from the top-level repository and run git log, I get

1f1b3d35f 2023-03-31 14:28:40 +0000 avogar (HEAD) Fix conflicts issues
c8cfbe28c 2023-03-29 16:51:05 +0800 taiyang-li allow map key to be optional
9ef654ded 2023-01-20 15:39:41 +0000 avogar Don't abort in ~CerrLog
4e680d292 2022-08-02 08:24:54 +0000 avogar Fix 'undefined symbol: pthread_atfork' on PowerPC64
a3f96587c 2022-08-09 10:08:45 +0000 avogar Fix deadlock with msan
5cc79b775 2022-03-08 13:07:43 +0800 taiyang-li fix building issue introduced by https://github.com/ClickHouse-Extras/arrow/pull/9
cbb5211d2 2022-02-16 12:35:40 +0800 taiyang-li add interface to get raw orc reader from adapters
f10f5cfd1 2023-01-18 09:38:47 +0100 Raúl Cumplido MINOR: [Release] Update versions for 11.0.0
f87c2b2ae 2023-01-18 09:38:43 +0100 Raúl Cumplido MINOR: [Release] Update .deb/.rpm changelogs for 11.0.0
6d0f608d1 2023-01-18 09:38:43 +0100 Raúl Cumplido MINOR: [Release] Update CHANGELOG.md for 11.0.0
ccd169cae 2023-01-18 09:16:31 +0100 Raúl Cumplido GH-14997: [Release] Ensure archery release tasks works with both new style GitHub issues and old style JIRA issues (#33615)
66332515d 2023-01-17 18:55:00 +0000 Nic Crane GH-33526: [R] Implement new function open_dataset_csv with signature more closely matching read_csv_arrow (#33614)
c2199dc49 2023-01-17 10:15:08 -0800 Will Jones GH-20512: [Python] Numpy conversion doesn't account for ListArray offset (#15210)
8d1e357b7 2023-01-17 18:53:02 +0100 Antoine Pitrou GH-14875: [C++] C Data Interface: check imported buffer for non-null (#14814)
[...]

Going through the rest of the history, it is the case that the last commits were the first two ones done in March 2023.

If I open the branch view for origin/release-11.0.0, the last commit was done in August 2023 ("Allow to get number of rows in record batch").

This makes me think that origin/release-11.0.0 is not exactly the right branch to merge into.

I could imagine that the top-level repository actually references a detached Git head, i.e. a commit that's not part of a branch. I have no idea how it is possible in GitHub to achieve such a state. Anyways, in order to not mess things further up, it would be nice if you could checkout the actual commit referenced by the top-level repository (1f1b3d3) and pick the commit in this PR on top. Then make a new branch and push it.

I will update the .gitmodules file with the correct branch name(origin/release-11.0.0) to avoid this confusion after this PR is merged.

Quickly did that for all branch references already (ClickHouse/ClickHouse#53763).

@HarryLeeIBM
Copy link
Author

The branch to be merged is actually origin/release-11.0.0.

Ah, yes, thanks. I am unfortunately still confused. If I cd contrib/arrow/ from the top-level repository and run git log, I get

1f1b3d35f 2023-03-31 14:28:40 +0000 avogar (HEAD) Fix conflicts issues
c8cfbe28c 2023-03-29 16:51:05 +0800 taiyang-li allow map key to be optional
9ef654ded 2023-01-20 15:39:41 +0000 avogar Don't abort in ~CerrLog
4e680d292 2022-08-02 08:24:54 +0000 avogar Fix 'undefined symbol: pthread_atfork' on PowerPC64
a3f96587c 2022-08-09 10:08:45 +0000 avogar Fix deadlock with msan
5cc79b775 2022-03-08 13:07:43 +0800 taiyang-li fix building issue introduced by https://github.com/ClickHouse-Extras/arrow/pull/9
cbb5211d2 2022-02-16 12:35:40 +0800 taiyang-li add interface to get raw orc reader from adapters
f10f5cfd1 2023-01-18 09:38:47 +0100 Raúl Cumplido MINOR: [Release] Update versions for 11.0.0
f87c2b2ae 2023-01-18 09:38:43 +0100 Raúl Cumplido MINOR: [Release] Update .deb/.rpm changelogs for 11.0.0
6d0f608d1 2023-01-18 09:38:43 +0100 Raúl Cumplido MINOR: [Release] Update CHANGELOG.md for 11.0.0
ccd169cae 2023-01-18 09:16:31 +0100 Raúl Cumplido GH-14997: [Release] Ensure archery release tasks works with both new style GitHub issues and old style JIRA issues (#33615)
66332515d 2023-01-17 18:55:00 +0000 Nic Crane GH-33526: [R] Implement new function open_dataset_csv with signature more closely matching read_csv_arrow (#33614)
c2199dc49 2023-01-17 10:15:08 -0800 Will Jones GH-20512: [Python] Numpy conversion doesn't account for ListArray offset (#15210)
8d1e357b7 2023-01-17 18:53:02 +0100 Antoine Pitrou GH-14875: [C++] C Data Interface: check imported buffer for non-null (#14814)
[...]

Going through the rest of the history, it is the case that the last commits were the first two ones done in March 2023.

If I open the branch view for origin/release-11.0.0, the last commit was done in August 2023 ("Allow to get number of rows in record batch").

This makes me think that origin/release-11.0.0 is not exactly the right branch to merge into.

I could imagine that the top-level repository actually references a detached Git head, i.e. a commit that's not part of a branch. I have no idea how it is possible in GitHub to achieve such a state. Anyways, in order to not mess things further up, it would be nice if you could checkout the actual commit referenced by the top-level repository (1f1b3d3) and pick the commit in this PR on top. Then make a new branch and push it.

I will update the .gitmodules file with the correct branch name(origin/release-11.0.0) to avoid this confusion after this PR is merged.

Quickly did that for all branch references already (ClickHouse/ClickHouse#53763).

I had exact confusion as yours before. If I directly checkout 1f1b3d3 under default origin/release-6.0.1, github complains the commit 1f1b3d3 cannot be found. Only after I checkout branch origin/release-11.0.0, then the commit is found and I can check it out and add commit based on it. Let me know if it can clarify the situation.

@rschu1ze
Copy link
Member

I just made a new branch called clickhouse/release-11.0.0 (here) which corresponds exactly to what the top-level repo references. Could you abandon this PR and make a new one against the new branch?

@HarryLeeIBM
Copy link
Author

I just made a new branch called clickhouse/release-11.0.0 (here) which corresponds exactly to what the top-level repo references. Could you abandon this PR and make a new one against the new branch?

Sure, I am closing this PR

@rschu1ze
Copy link
Member

Let me know if there is a new PR ... happy to merge it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants