
invalid UTF-8 bytes after “=" "==" #23

Open
Lemonononon opened this issue Nov 20, 2023 · 3 comments

Comments

@Lemonononon

Hi @powturbo
Thanks for your great work!
Recently I found that when I use this library to encode image files, some strange characters appear at the end of the output. Printing them shows 'NULL' or seemingly random bytes, for example "7mlbMjdKxLobZAOx6jFekoqMbHg==�#��+Z��8Z�s)��k_H���pd�?���Ծ" or "Px/wA7sn4uWWf/AAj/AA3/ALQooor0Yg==NULLNULLNULL".

My code:

std::ifstream ifs(file_path, std::ios::binary);
if (!ifs.is_open()) {
    std::cerr << "Unable to open file: " << file_path << std::endl;
}

ifs.seekg(0, std::ios::end);
auto size = ifs.tellg();
ifs.seekg(0, std::ios::beg);

// Read the file content into a char buffer
auto buf = new unsigned char[size];
ifs.read((char *) buf, size);

//use turbobase64
auto outsize = tb64enclen(size);
auto out = new uint8_t[outsize];

size_t num_enc = tb64enc(buf, size, out); //error handle

out[num_enc] = 0;

std::string str_encode(out, out + num_enc);

std::cout << str_encode << std::endl;

I'm confused. Shouldn't the size of a string converted to Base64 be fixed? Why are unknown characters appearing after the padding?

@powturbo
Owner

The output size is fixed at ((input_size + 2)/3 * 4).
If you write a terminating 0 at the end of the buffer with out[num_enc] = 0, you must allocate one extra byte: auto out = new uint8_t[outsize + 1];
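
To make the size rule and the extra terminator byte concrete, here is a minimal sketch. It deliberately does not call Turbo-Base64: `base64_encoded_len`, `base64_encode_ref`, and `encode_to_string` are hypothetical names for a plain, unoptimized reference encoder, used only so the buffer arithmetic can be checked without linking the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Padded Base64 length for n input bytes: ((n + 2) / 3) * 4.
std::size_t base64_encoded_len(std::size_t n) {
    return (n + 2) / 3 * 4;
}

// Plain reference encoder (an illustration, NOT Turbo-Base64's tb64enc).
// Writes into `out` and returns the number of characters written.
std::size_t base64_encode_ref(const unsigned char* in, std::size_t n,
                              unsigned char* out) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::size_t o = 0;
    std::size_t i = 0;
    // Full 3-byte groups become 4 characters each.
    for (; i + 2 < n; i += 3) {
        const std::uint32_t v = (static_cast<std::uint32_t>(in[i]) << 16) |
                                (static_cast<std::uint32_t>(in[i + 1]) << 8) |
                                static_cast<std::uint32_t>(in[i + 2]);
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
        out[o++] = tbl[v & 63];
    }
    // A trailing group of 1 or 2 bytes is padded with '='.
    if (i < n) {
        const bool two = (i + 1 < n);
        std::uint32_t v = static_cast<std::uint32_t>(in[i]) << 16;
        if (two) v |= static_cast<std::uint32_t>(in[i + 1]) << 8;
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = two ? tbl[(v >> 6) & 63] : '=';
        out[o++] = '=';
    }
    return o;
}

// Encodes into a buffer that reserves one extra byte for the terminator.
std::string encode_to_string(const std::vector<unsigned char>& data) {
    const std::size_t len = base64_encoded_len(data.size());
    std::vector<unsigned char> out(len + 1);  // +1 keeps out[written] = 0 in bounds
    const std::size_t written =
        base64_encode_ref(data.data(), data.size(), out.data());
    assert(written == len);  // the encoded size depends only on the input size
    out[written] = 0;
    return std::string(out.begin(), out.begin() + written);
}
```

Building the `std::string` from the range `[out, out + written)` means the terminator is not strictly needed for that step; the extra byte only matters because the code writes to `out[written]`.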

@Lemonononon
Author

@powturbo
Thank you! I had already noticed this and tried allocating [output_size+1], but the output was still wrong. What I meant is that the length of the entire string (including the non-UTF-8 characters after the ==) equals ((input_size + 2)/3 * 4).

Afterwards, I added an identical cpp file directly to the library source tree and compiled it there, and the result was correct. I then found that when using the locally installed static lib (cmake .. && make install), only the results from tb64senc are correct, as shown in the following image. After I added 'set(BUILD_SHARED_LIBS ON)' to the CMakeLists.txt file to build a shared lib, all the results were correct. (This was reproduced on two computers running Ubuntu.) My problem is resolved now, but I'm still confused. I'll do my best to provide you with the information I have.

[Image: a45b371f8f59447da86f4c2e97168b6]

@powturbo
Owner

There are no separate functions for static and dynamic linking.
I wonder why you're getting different results depending on the linking mode.
In any case, the correct size is ((input_size + 2)/3 * 4). Base64 characters are all ASCII, so each one is a single byte in UTF-8.
You can decode the Base64 output and check it against the original data.
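
The round-trip check suggested above could look roughly like the following. This is a sketch under stated assumptions: `b64_encode`, `b64_decode`, and `roundtrip_ok` are hypothetical helpers written for illustration, not functions from Turbo-Base64, and the decoder only accepts the standard padded alphabet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

static const char kB64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Reference encoder (illustrative only, not the library's implementation).
std::string b64_encode(const std::string& in) {
    std::string out;
    out.reserve((in.size() + 2) / 3 * 4);
    for (std::size_t i = 0; i < in.size(); i += 3) {
        const std::size_t rem = in.size() - i;  // bytes left, at least 1
        std::uint32_t v = static_cast<std::uint32_t>(
                              static_cast<unsigned char>(in[i])) << 16;
        if (rem > 1) v |= static_cast<std::uint32_t>(
                              static_cast<unsigned char>(in[i + 1])) << 8;
        if (rem > 2) v |= static_cast<unsigned char>(in[i + 2]);
        out += kB64[(v >> 18) & 63];
        out += kB64[(v >> 12) & 63];
        out += rem > 1 ? kB64[(v >> 6) & 63] : '=';
        out += rem > 2 ? kB64[v & 63] : '=';
    }
    assert(out.size() == (in.size() + 2) / 3 * 4);
    return out;
}

// Maps one Base64 character to its 6-bit value, or -1 if it is not valid.
int b64_value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Decodes padded Base64; returns false on any byte outside the alphabet.
bool b64_decode(const std::string& in, std::string& out) {
    out.clear();
    if (in.size() % 4 != 0) return false;
    for (std::size_t i = 0; i < in.size(); i += 4) {
        const bool last = (i + 4 == in.size());
        int pad = 0;
        if (last && in[i + 3] == '=') pad = (in[i + 2] == '=') ? 2 : 1;
        std::uint32_t v = 0;
        for (int k = 0; k < 4; ++k) {
            const int d = (k >= 4 - pad) ? 0 : b64_value(in[i + k]);
            if (d < 0) return false;  // stray or non-ASCII byte
            v = (v << 6) | static_cast<std::uint32_t>(d);
        }
        out += static_cast<char>((v >> 16) & 0xFF);
        if (pad < 2) out += static_cast<char>((v >> 8) & 0xFF);
        if (pad < 1) out += static_cast<char>(v & 0xFF);
    }
    return true;
}

// True when `encoded` has the expected length and decodes back to `original`.
bool roundtrip_ok(const std::string& original, const std::string& encoded) {
    if (encoded.size() != (original.size() + 2) / 3 * 4) return false;
    std::string decoded;
    return b64_decode(encoded, decoded) && decoded == original;
}
```

A check like `roundtrip_ok(original, encoded)` fails both when extra bytes follow the `==` padding (the length no longer matches) and when bytes inside the expected length are not valid Base64 characters, which helps narrow down which of the two is happening.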
