Capitalize package name #5

Closed
tkelman opened this issue Jan 11, 2016 · 19 comments

Comments

@tkelman
Contributor

tkelman commented Jan 11, 2016

If you plan on registering this, please consider capitalizing the package name. Otherwise it will sort at the very end. The only other non-capitalized package name is kNN, and that one is deprecated and doesn't have any tags.

@nalimilan
Member

Yeah, I wasn't sure about that. Sorting isn't a real issue, but consistency is. The name of the interface really is iconv, so... What would you suggest instead, Iconv or IConv?

@nalimilan
Member

We could also imagine giving it a more evocative name, like StringEncodings, StringEncoders...

@tkelman
Contributor Author

tkelman commented Jan 11, 2016

Case-sensitive scripts, potentially PkgEval or others, might also miss the package.

If the I stands for something, I guess IConv would be better than Iconv, but a more descriptive name that avoids the library-name jargon would work too.

@ScottPJones
Contributor

Either StringEncodings or StringEncoders sounds fine to me. Both are much more general, and they would be more accurate if you end up using something besides the iconv libraries to implement the encoding/decoding in the future, or on different platforms.

@nalimilan
Member

The question is: if at some point we want to write a pure-Julia implementation, will we switch this package to use it, or create another package exporting the same API?

BTW, @ScottPJones, when you feel like writing something in Julia, I've discovered this MIT-licensed Node.js package, which looks like a good base to start from: https://github.com/ashtuchkin/iconv-lite/

@ScottPJones
Contributor

I already have some structures/code in Julia that I'll start benchmarking against iconv & ICU, right now just for all the 8-bit mappings, as well as some ideas on efficiently implementing different multi-byte <-> Unicode codecs to handle the rest. I'd love for Julia to get to the point where its character set conversions are consistent across all platforms, with all of the nice features that Python 3.5 supports.

@ScottPJones
Contributor

@nalimilan So as not to continue off-topic here, I've created another issue to discuss performance and using pure Julia: #8.
The link you sent for the Node.js package has some useful bits, like the method of representing the tables as JSON; those at least could be used as input to produce compact binary tables to load for handling multi-byte character sets.

@ScottPJones
Contributor

@nalimilan Would you be up for a PR to change the name from iconv.jl to StringEncoders.jl?
Besides Tony's concern about an all-lowercase name, I think it really deserves to be a nice generic package.

@nalimilan
Member

So, StringEncodings or StringEncoders? I would think "encoding" is the name most people will be looking for.

@ScottPJones
Copy link
Contributor

I was leaning towards StringEncoders for this, because I've seen that if you have a module or package Foobars, it contains a type (or function) Foobar (and this one has StringEncoder & StringDecoder).
Also, I think StringEncodings might be a more suitable name for a package/module with a parameterized StringEncoding type.
(I'd like to make one with traits to handle different encodings: little-endian, big-endian, and/or native-endian vs. opposite-endian; 8-bit, 16-bit or 32-bit code units; linear indexed vs. not; Unicode or not; etc.)

@nalimilan
Member

> I was leaning towards StringEncoders for this, because I've seen that if you have a module or package Foobars, it contains a type (or function) Foobar (and this one has StringEncoder & StringDecoder).

Yeah, but then we could name the package either StringEncoders or StringDecoders. The former isn't great because in many cases people will simply be looking for a way to read a text file, and won't think about encoding anything.

> Also, I think StringEncodings might be a more suitable name for a package/module with a parameterized StringEncoding type.
> (I'd like to make one with traits to handle different encodings: little-endian, big-endian, and/or native-endian vs. opposite-endian; 8-bit, 16-bit or 32-bit code units; linear indexed vs. not; Unicode or not; etc.)

If such a type proves useful, why shouldn't it live in this package instead of elsewhere? Couldn't it be used to speed up conversions?

@ScottPJones
Contributor

StringDecoders would also be fine, or maybe even StringConvert, which works both ways?
StringEncodings would be useful as the basis for a new, more efficient parameterized String type, such as in https://github.com/quinnj/Strings.jl, so I think it would be best to keep it separate, at least for now.

@ScottPJones
Contributor

Or maybe make that StringConverters instead.

@nalimilan
Member

Sorry, but try typing "string converter" into your preferred search engine, and compare with "string encoding". The latter clearly better reflects the goal of the package.

If you create a package for encoded strings, why not call it EncodedStrings.jl? It would logically complement StringEncodings.jl.

@ScottPJones
Contributor

OK, fine, I'll run up a PR to change this to StringEncodings.jl if that's your favorite.
I'm thinking it deserves to not be hidden under the "iconv" name.

@nalimilan
Member

No worries, I've just done the rename. Now we need to decide what's needed before we tag a release.

@ScottPJones
Copy link
Contributor

Great!
The one thing I feel might be needed (though it could be added as an enhancement later) is an API for specifying different strategies for handling invalid input, as we'd discussed before.
I think it is critical that this be done in a way that preserves optimal performance, so I'd recommend against using keywords or symbols to select the strategy.
I think an extra positional argument, which could take a character, a string, or a type, might work best.
Default behaviour would be as now, to raise an exception.
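A minimal sketch of that idea, written in Python rather than Julia. The strategy is passed as an extra positional argument that is either a replacement string or a marker type, and the default raises an exception. The names `decode`, `Strict` and `Skip` are hypothetical and are not part of StringEncodings.jl. Python has no multiple dispatch, so the sketch checks the argument at run time. In Julia, giving each strategy its own type would let method dispatch select the code path at compile time, which appears to be the performance motivation for avoiding keywords or symbols.

```python
class Strict:
    """Hypothetical marker type: raise UnicodeDecodeError on invalid input (the default)."""


class Skip:
    """Hypothetical marker type: silently drop invalid byte sequences."""


def decode(data: bytes, encoding: str, invalid=Strict) -> str:
    """Decode `data` from `encoding`, handling invalid input according to `invalid`.

    `invalid` is either a marker type (Strict, Skip) or a replacement string.
    A single character is just a one-character string in Python.
    """
    if invalid is Strict:
        return data.decode(encoding)  # propagates UnicodeDecodeError
    if invalid is Skip:
        return data.decode(encoding, errors="ignore")
    if isinstance(invalid, str):
        return _decode_replacing(data, encoding, invalid)
    raise TypeError(f"unsupported invalid-input strategy: {invalid!r}")


def _decode_replacing(data: bytes, encoding: str, replacement: str) -> str:
    # Decode up to the first bad sequence, emit the replacement for it, then
    # resume just after it. Assumes a stateless encoding such as UTF-8, where
    # restarting the decoder mid-stream is safe.
    out = []
    pos = 0
    while True:
        try:
            out.append(data[pos:].decode(encoding))
            return "".join(out)
        except UnicodeDecodeError as err:
            # err.start / err.end are offsets into the slice data[pos:]
            out.append(data[pos:pos + err.start].decode(encoding))
            out.append(replacement)
            pos += err.end
```

For example, `decode(b"a\xffb", "utf-8", "?")` gives `"a?b"`, `decode(b"a\xffb", "utf-8", Skip)` gives `"ab"`, and the two-argument call raises `UnicodeDecodeError`, which matches the "raise an exception by default" behaviour.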

@nalimilan
Member

This is definitely an improvement that can be added later without breaking anything. I only wonder whether some things need to be done immediately, for example more efficient versions of encode/decode that do not create StringEncoder/StringDecoder objects only to destroy them a second later.

@ScottPJones
Contributor

I had been wondering about speeding things up by using a Dict to cache those. What do you think?


3 participants