It’s an optimisation. Categorical string data usually parses as a StringColumn. If a column contains mostly unique values, it will parse as a TextColumn instead (assuming the table is above some minimum size). That is why files with the same layout can come back with different column types: the result depends on the values, not the layout. You can force columns to parse one way or the other using other read options. One way is to reduce the set of column types considered.
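To make the explanation above concrete, here is a minimal sketch of that kind of rule in plain Java. It is not Tablesaw's code: the class name, the method, and both thresholds (MIN_ROWS and UNIQUE_FRACTION_CUTOFF) are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

class TextVsStringSketch {
  // Hypothetical thresholds for illustration only; not taken from Tablesaw.
  static final int MIN_ROWS = 1_000;
  static final double UNIQUE_FRACTION_CUTOFF = 0.5;

  /** Returns "TEXT" for large, mostly-unique columns and "STRING" otherwise. */
  static String guessType(List<String> values) {
    if (values.size() < MIN_ROWS) {
      return "STRING"; // small tables are treated as categorical regardless of content
    }
    // Fraction of distinct values among all values in the column.
    double uniqueFraction = new HashSet<>(values).size() / (double) values.size();
    return uniqueFraction > UNIQUE_FRACTION_CUTOFF ? "TEXT" : "STRING";
  }

  public static void main(String[] args) {
    List<String> repeated = new ArrayList<>();
    List<String> unique = new ArrayList<>();
    for (int i = 0; i < 2000; i++) {
      repeated.add("gene" + (i % 10)); // only 10 distinct values
      unique.add("gene" + i);          // every value distinct
    }
    System.out.println(guessType(repeated));
    System.out.println(guessType(unique));
  }
}
```

Two columns with the same name and layout end up with different types here purely because of their contents, which matches the symptom in the question.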
If you’re going to create tables repeatedly with different values, it may be worthwhile to specify the type for each column.
It can be inconvenient, I know.
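For reference, the two approaches mentioned above might look roughly like the sketch below. It assumes the read-options builder exposes columnTypesToDetect(List<ColumnType>) and columnTypes(ColumnType[]), which should be checked against the Tablesaw version in use. The class and method names are hypothetical, and so is the three-column layout in the second method.

```java
import java.util.Arrays;
import java.util.List;
import tech.tablesaw.api.ColumnType;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;
import tech.tablesaw.io.csv.CsvReadOptions;

class ForceStringColumns {

  // Option 1: narrow the candidate types so that TEXT is never chosen.
  // Assumption: more general types belong at the end of the list, STRING last.
  static Table readWithoutText(String fn) throws Exception {
    List<ColumnType> candidates = Arrays.asList(
        ColumnType.LOCAL_DATE_TIME, ColumnType.LOCAL_TIME, ColumnType.LOCAL_DATE,
        ColumnType.BOOLEAN, ColumnType.INTEGER, ColumnType.LONG,
        ColumnType.DOUBLE, ColumnType.STRING);
    CsvReadOptions options = CsvReadOptions.builder(fn)
        .separator('\t')
        .header(true)
        .columnTypesToDetect(candidates)
        .build();
    return Table.read().usingOptions(options);
  }

  // Option 2: state the type of every column, in file order.
  // Assumption: a hypothetical file with exactly three columns.
  static Table readWithExplicitTypes(String fn) throws Exception {
    ColumnType[] types = {ColumnType.STRING, ColumnType.STRING, ColumnType.INTEGER};
    CsvReadOptions options = CsvReadOptions.builder(fn)
        .separator('\t')
        .header(true)
        .columnTypes(types)
        .build();
    return Table.read().usingOptions(options);
  }

  public static void main(String[] args) throws Exception {
    Table tbl = readWithoutText(args[0]);
    // With TEXT excluded from detection, this lookup should not hit the cast error.
    StringColumn vGeneCol = tbl.stringColumn("myColname");
    System.out.println(vGeneCol.size());
  }
}
```

Option 1 keeps automatic detection for the other columns. Option 2 is stricter, but the array has to be updated whenever the file layout changes.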
On Mon, Mar 14, 2022 at 5:30 PM rsneumann wrote:
Hi - I am using the CsvReadOptions.Builder to load CSV files as follows:
CsvReadOptions.Builder builder = CsvReadOptions.builder(fn).separator('\t').header(true);
CsvReadOptions options = builder.build();
tbl = Table.read().usingOptions(options);
Then I retrieve string columns as:
StringColumn vGeneCol = tbl.stringColumn("myColname");
Typically, this goes without any problems. Sometimes, however, I get the following exception:
class tech.tablesaw.api.TextColumn cannot be cast to class tech.tablesaw.api.StringColumn
My tables are auto-generated and do not differ from each other (apart from the string values in the columns...). Why are they sometimes parsed as StringColumn and sometimes as TextColumn? Is it possible to control this behavior?
Thanks!
Answer selected by lwhite1
-
FWIW, I've created an issue for an enhancement that would eliminate this problem: see #1074. The Text vs. String difference should have been hidden inside the StringColumn class, even though that class is already quite complicated.