Added splitfile support to the Create table command #1
base: master
def string2Key(values: Seq[String],
               lineBuffer: Array[BytesUtils],
               keyColumns: Seq[AbstractColumn],
               keyBytes: Array[(Array[Byte], DataType)]) = {
This method is only used in createSplitKeys, right? If so, I don't think we need to create it in HBaseKVHelper.
Just a minor comment. @yzhou2001, can you take a look at this?
Basically my questions are:
Thanks.
@yzhou2001
2. I think the split file should be very small and fit into memory, so maybe we can pass the splits along with the command, just like:
3. Yes, we need to test it.
PS: One more question: do you think it is necessary to control the number of reducers for bulk load of a non-split table now?
For table creation, I think the focal point is how much we should build on top of the semantics of "creating an RDB table on a nonexistent HBase table". The problem is that the more functionality we build into these semantics, the harder it becomes to reconcile them with an HBase table that may already exist. In summary, this is good to have, but it has to be designed carefully so that the semantics are clear.

On the reducers: yes, a configurable number of reducers would be great. But again, there is some complexity to it, mainly because we would probably need a "splitter" class like the one in HBase. It's feasible but probably not a priority right now. At the moment we are anxious to get the basic functionality working and to obtain some favorable performance data, in order to put some weight behind the push for our technology. All the value-adding features and optimizations can be put off until a future release.
A user can create splits for the table with a command such as the following:
CREATE TABLE testrav4(bytecol BYTE, shortcol SHORT, intcol INTEGER, longcol LONG, floatcol FLOAT, PRIMARY KEY(intcol,shortcol)) MAPPED BY (testhbaseravi4, COLS=[bytecol=cf1.hbytecol, longcol=cf2.hlongcol, floatcol=cf2.hfloatcol]) SPLITSFILE = 'D:/1.txt'
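
To make the discussion concrete, here is a rough sketch of how a small splits file could be loaded into memory and turned into sorted split keys for the composite primary key (intcol, shortcol) in the example above. This is not the code in this PR. The file format (one split point per line, written as `intcol,shortcol`), the object name `SplitKeySketch`, and the plain big-endian encoding are all assumptions made for illustration; the PR itself encodes keys through `BytesUtils`, which may behave differently.

```scala
import java.nio.ByteBuffer

// Hypothetical helper, not part of the PR.
object SplitKeySketch {

  // Assumed encoding: big-endian INTEGER followed by big-endian SHORT.
  // Caveat: with plain two's complement, negative values sort after
  // positive ones in unsigned byte order, so a real encoder has to
  // handle the sign explicitly.
  def encodeKey(intCol: Int, shortCol: Short): Array[Byte] =
    ByteBuffer.allocate(6).putInt(intCol).putShort(shortCol).array()

  // Compare two keys byte by byte as unsigned values, which is how
  // HBase orders row keys.
  def compareUnsigned(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val d = (a(i) & 0xff) - (b(i) & 0xff)
      if (d != 0) return d
      i += 1
    }
    a.length - b.length
  }

  // Assumed file format: one split point per line, "intcol,shortcol".
  // Blank lines are skipped; malformed lines are rejected.
  def parseSplitKeys(lines: Seq[String]): Array[Array[Byte]] =
    lines.map(_.trim).filter(_.nonEmpty).map { line =>
      line.split(",").map(_.trim) match {
        case Array(i, s) => encodeKey(i.toInt, s.toShort)
        case _ =>
          throw new IllegalArgumentException(s"Bad split line: $line")
      }
    }.sortWith((a, b) => compareUnsigned(a, b) < 0).toArray
}
```

The lines could come from something like `scala.io.Source.fromFile(path).getLines().toList`, which is reasonable only because the split file is expected to be small. The resulting `Array[Array[Byte]]` has the shape that HBase's `createTable(descriptor, splitKeys)` overload takes, so the keys could be handed to table creation directly.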