Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable automatic git CRLF conversion at checkout #900

Merged
merged 1 commit into from
Jan 21, 2025
Merged

Disable automatic git CRLF conversion at checkout #900

merged 1 commit into from
Jan 21, 2025

Conversation

reyammer
Copy link
Collaborator

@reyammer reyammer commented Jan 21, 2025

Fixes #899.

FYI @ia0 @invernizzi

Fixes #899.
@reyammer reyammer requested a review from invernizzi as a code owner January 21, 2025 22:44
@reyammer reyammer merged commit 0cc5ed5 into main Jan 21, 2025
53 of 54 checks passed
@reyammer reyammer deleted the fix899 branch January 21, 2025 22:53
@ia0
Copy link
Member

ia0 commented Jan 22, 2025

It's probably better to use .gitattributes (at the repository root) for that, otherwise developers on Windows still have the issue.

# Make sure test files are checked-in verbatim.
/tests_data/** -text

@ia0
Copy link
Member

ia0 commented Jan 22, 2025

But this issue is still interesting, because it raises those questions:

  • Is it a valid Python source to use CRLF instead of LF? If yes, do we have such training data (to avoid misclassification on Windows)?
  • What about other text formats? The same questions apply.

reyammer added a commit that referenced this pull request Jan 22, 2025
This fixes #899 in a cleaner way, and somehow reverts #900.
reyammer added a commit that referenced this pull request Jan 22, 2025
This fixes #899 in a cleaner way, and somehow reverts #900.
@reyammer
Copy link
Collaborator Author

To your questions:

Is it a valid Python source to use CRLF instead of LF? If yes, do we have such training data (to avoid misclassification on Windows)?

we don't have a specific dataset that makes sure we have CRLF, but given the size of the dataset, I would expect some samples to have CRLF. And the fact that this bug has been there all along without being detected makes me think we should be OK. This feels just an unlucky FP with the new model (it never affected any other file for any of the 5 models we tested so far). But definitively something to keep in mind. If we get some feedback that this is causing actual issues, we should be able to add some augmentation and train for that as well.

What about other text formats? The same questions apply.

Same as above; we didn't do anything specific about it. But so far it doesn't seem a big deal.

reyammer added a commit that referenced this pull request Jan 22, 2025
This fixes #899 in a cleaner way, and somehow reverts #900.
@ia0
Copy link
Member

ia0 commented Jan 22, 2025

Sounds good. And reading through the lines I guess it is valid to use CRLF in Python.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix github workflows so that git checkout does not mess up CRLF on windows
2 participants