Character encoding inconsistency / reporting #5681

solardiz · 2025-03-04T08:00:06Z

Testing against the test vectors from openwall/john-samples#31, I am only able to crack the simple password `12345678` directly. To crack the complex password, I first have to process the wordlist through `iconv -f utf8 -t iso-8859-1`. I guess it was inadvertently converted the other way somewhere on its way to the git commit? Should we replace it with the output of this `iconv` in a subsequent commit?

Oh, alternatively, I am able to get it cracked by adding `-target-enc=iso-8859-1`.
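A minimal Python sketch of why the wordlist encoding matters, assuming the format hashes the candidate's raw bytes: the same visible password is a different byte string in UTF-8 and in ISO-8859-1, and the hypothetical helper `utf8_wordlist_to_latin1()` mimics the `iconv -f utf8 -t iso-8859-1` step. The word `päss` is a made-up stand-in (not the actual sample password), and SHA-256 is only a placeholder for the real Oubliette hash.

```python
import hashlib


def utf8_wordlist_to_latin1(data: bytes) -> bytes:
    """Rough equivalent of `iconv -f utf8 -t iso-8859-1` for a whole wordlist.

    Raises UnicodeEncodeError if a word contains a character that has no
    ISO-8859-1 representation.
    """
    return data.decode("utf-8").encode("iso-8859-1")


# Hypothetical candidate with one non-ASCII character (not the real sample).
word = "päss"
utf8_bytes = word.encode("utf-8")                    # b'p\xc3\xa4ss' (5 bytes)
latin1_bytes = utf8_wordlist_to_latin1(utf8_bytes)   # b'p\xe4ss' (4 bytes)

# SHA-256 is only a stand-in for whatever the target format hashes;
# any byte-oriented hash sees two different inputs here.
same = hashlib.sha256(utf8_bytes).digest() == hashlib.sha256(latin1_bytes).digest()
print(same)  # False
```

ASCII-only entries such as `12345678` are byte-identical in both encodings, which is consistent with the simple password cracking either way. Passing `-target-enc=iso-8859-1` instead leaves the wordlist in UTF-8 and presumably has john perform an equivalent conversion internally before hashing.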

@magnumripper @davidedg please suggest how best to fix this encoding issue, to minimize user confusion and the time users waste running with the wrong encoding settings. Right now, by default we print `Using default input encoding: UTF-8`, but with the input wordlist actually in UTF-8 we fail to crack this password. So it feels like a bug.


magnumripper · 2025-03-04T10:14:17Z

I generally recommend always using UTF-8 for wordlists, and `-target-enc` where needed. For samples, however, it may be better to have the password hint file already in the expected encoding. If we do that, I suggest we use both encodings in the password hint file: keep the UTF-8 one and add one in ISO-8859-1. Then also explain this with `#!comment:` lines!

If we do not change oubliette-passwords.txt (and perhaps even if we do), we should add some kind of README that explains the situation and the -target-enc option.

magnumripper · 2025-03-04T10:29:43Z

> Right now, by default we print `Using default input encoding: UTF-8`, but with the input wordlist actually in UTF-8 we fail to crack this password. So it feels like a bug.

We could amend the output when --target-encoding is not used, such as:

Using default input encoding: UTF-8 and expecting target encoding to be the same

or

Using default input encoding: UTF-8
Expected target encoding: UTF-8

magnumripper · 2025-03-04T10:38:30Z

> Right now, by default we print `Using default input encoding: UTF-8`, but with the input wordlist actually in UTF-8 we fail to crack this password. So it feels like a bug.
>
> We could amend the output when --target-encoding is not used, such as:
>
> Using default input encoding: UTF-8 and expecting target encoding to be the same
>
> or
>
> Using default input encoding: UTF-8
> Expected target encoding: UTF-8

Hmm, no, that ends up even more confusing for the case where no encoding option is used but the wordlist is already (as in this case) in ISO-8859-1. So maybe we should change `Using default input encoding: UTF-8` to `Expecting input encoding to match target encoding` (for that case, but not if the format is FMT_UNICODE).

solardiz · 2025-03-04T17:31:18Z

> Expecting input encoding to match target encoding

I like this one. Maybe even `Expecting input character encoding to match the target encoding`, to make it clearer what kind of encoding we refer to.

I recall that there are cases where passing `-enc=raw` makes a difference, so perhaps the above isn't always the default?

magnumripper · 2025-03-05T08:43:53Z

> be clearer what kind of encoding we refer to.

What could it be other than character encoding?

> I recall that there are cases where passing -enc=raw makes a difference, so perhaps the above isn't always the default?

For Unicode formats like NT, `-enc=raw` affects the conversion to UTF-16: it behaves like old john, which in turn behaves exactly like `-enc=iso-8859-1` (perhaps we should say so clearly).
For any format, including Unicode ones, rules processing with `-enc=raw` will also behave like old john: it can only lowercase/uppercase ASCII, all character classes are ASCII-only, and so on.
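The two points above can be modeled in a few lines of Python. This is an illustration under stated assumptions, not john's code: the hypothetical `raw_to_utf16le()` assumes RAW simply zero-extends each input byte to one UTF-16LE code unit (which is exactly what decoding as ISO-8859-1 does), and Python's `bytes.lower()` stands in for an ASCII-only case rule. The sample word `ÄRGER` is made up.

```python
def raw_to_utf16le(candidate: bytes) -> bytes:
    """Assumed model of -enc=raw for a Unicode format: no decoding at all,
    every input byte becomes one UTF-16LE code unit with a zero high byte."""
    out = bytearray()
    for b in candidate:
        out += bytes((b, 0))
    return bytes(out)


def decoded_to_utf16le(candidate: bytes, encoding: str) -> bytes:
    """Model of an explicit input encoding: decode first, then encode as UTF-16LE."""
    return candidate.decode(encoding).encode("utf-16-le")


word = "ÄRGER".encode("utf-8")  # hypothetical UTF-8 wordlist entry: b'\xc3\x84RGER'

# RAW gives the same UTF-16 as treating the input as ISO-8859-1 ...
assert raw_to_utf16le(word) == decoded_to_utf16le(word, "iso-8859-1")
# ... which is not what a UTF-8-aware conversion produces.
print(raw_to_utf16le(word) == decoded_to_utf16le(word, "utf-8"))  # False

# A "lowercase" rule: bytes.lower() only touches ASCII A-Z, so the
# UTF-8 bytes of the non-ASCII letter pass through unchanged.
print(word.lower())                                  # b'\xc3\x84rger'
# Decoding first lets the non-ASCII letter be lowercased too.
print(word.decode("utf-8").lower().encode("utf-8"))  # b'\xc3\xa4rger'
```

The sketch only contrasts ASCII-only case folding with full Unicode case folding; it does not model how john implements rules for other encodings.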

We could add a line when rules are in use with RAW:

Rules will not fully support non-ASCII characters

or s/support/handle/

solardiz · 2025-03-05T15:06:12Z

> What could it be other than character encoding?

e.g. base64 ;-)

> Rules will not fully support non-ASCII characters
>
> or s/support/handle/

Yes, we could. Maybe prefix it with "Note: " like we do for some other things that are almost but not quite warnings.

solardiz added the bug label Mar 4, 2025

solardiz added this to the Potentially 2.0.0 milestone Mar 4, 2025

solardiz mentioned this issue Mar 4, 2025

Oubliette Password Manager support #5680 (merged)

magnumripper self-assigned this Mar 5, 2025

magnumripper changed the title ~~Oubliette character encoding inconsistency~~ Character encoding inconsistency / reporting Mar 5, 2025
