Performance issue with API #1

Open
bikash opened this issue Feb 14, 2013 · 7 comments

@bikash

bikash commented Feb 14, 2013

Hi Holstius,

I just looked at the opentsdbr code and tried some of the examples. It is awesome work. I am working on big data analysis in R, and I want to run map-reduce from R using Rhipe/rbase. At the moment I get data from opentsdb through the API, but I am looking for a way to read data directly from the HBase table 'tsdb'. Can you help me with that part? I went through the opentsdb documentation and I understand how the row key is built, but since everything is stored in serialized form, I have no idea how to read the data in the HBase table directly from the shell. I want to do this so that we can compare the performance of using the API against reading data directly from HBase. Once it is done, I hope it will be helpful to other people.

Thanks

@dholstius
Collaborator

Hi bikash, thanks!

I'm sorry, but I almost certainly won't have time during the next month or two to do any low-level work on R/opentsdb. I'm too busy with analyses and other things.

But, I encourage you to dive into it! You're more than welcome to fork opentsdbr and use it as a template or starting point.

Maybe there is a need---or maybe a project already exists---for a C-level read API to OpenTSDB? Something as basic as fetching values for one metric whilst filtering on tags and start/end times? It's fairly straightforward to bind C code into an R package. I could possibly help with that.

Cheers,
David

@bikash
Author

bikash commented Feb 14, 2013

Hi Holstius,
thanks,
I will do it. I will fork this project and start working on the R code for this. I may need some ideas and help from you.
Also, one more thing: when I run the opentsdbr example you have given, I get an error.

result <- tsd_get(metric, start, end, tags, downsample="10m-avg")
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
In addition: Warning message:
In Timestamp(end) : No timezone given. Defaulting to

Do you have any idea what went wrong? I set the end date too, but I still get this error.

Also, do you have any idea which part of the code in opentsdb (not opentsdbr) is used to read data from the HBase table, so that I can get an idea of how opentsdb reads data from it?

Thanks
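
For anyone hitting the same message: "character string is not in a standard unambiguous format" generally means that R could not match a timestamp string against the formats it tries by default. The thread does not show which string triggered it here, so the following is only a minimal sketch of one way to avoid the guesswork: parse with an explicit format and time zone before calling tsd_get. The helper name parse_timestamp and the choice of UTC are assumptions for illustration and are not part of opentsdbr; the format string mirrors the one visible in the query URL later in this thread.

```r
# Hypothetical helper (not part of opentsdbr): parse timestamp strings with
# an explicit format and time zone instead of letting R guess the format.
parse_timestamp <- function(x, format = "%Y/%m/%d-%H:%M:%S", tz = "UTC") {
  parsed <- as.POSIXct(x, format = format, tz = tz)
  # as.POSIXct() returns NA when the string does not match the format,
  # so fail loudly here rather than deep inside a later call.
  if (any(is.na(parsed))) {
    stop("could not parse timestamp(s) with format ", format, ": ",
         paste(x[is.na(parsed)], collapse = ", "))
  }
  parsed
}

# Illustrative values only.
start <- parse_timestamp("2013/01/25-01:00:00")
end   <- parse_timestamp("2013/01/26-01:00:00")
print(difftime(end, start, units = "hours"))
```

Passing already-parsed POSIXct values also sidesteps the "No timezone given" warning, assuming the function accepts them.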

@dholstius
Collaborator

Hi bikash,

Thanks for the bug report. I haven't looked at any OpenTSDB internals, so I've no idea which part of the code is used to read data from HBase. The suggestion to look at http://opentsdb.net/schema.html is probably a good one, even before you dive into the code, so you know what you'll be looking for.

Good luck!

David
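
As a starting point for reading the 'tsdb' table directly, here is a rough sketch of decoding a row key and a column qualifier in R. It assumes the layout as I understand it from the schema page: a row key made of a 3-byte metric UID, a 4-byte base timestamp in seconds (aligned to the hour), and then pairs of 3-byte tag-key and tag-value UIDs; and a 2-byte qualifier whose upper 12 bits are the offset in seconds from the base time and whose lower 4 bits are flags. The function names are hypothetical, the example bytes are made up, and none of this has been checked against a live table, so please verify it against the schema documentation for your OpenTSDB version.

```r
# Combine big-endian bytes (a raw vector) into a single number.
be_to_num <- function(bytes) {
  sum(as.integer(bytes) * 256^((length(bytes) - 1):0))
}

# Hypothetical decoder for a row key laid out as:
#   [metric UID][4-byte base time][tagk UID][tagv UID]...
# uid_width = 3 is an assumption (the commonly documented default).
decode_row_key <- function(key, uid_width = 3L) {
  stopifnot(is.raw(key))
  tag_bytes <- length(key) - uid_width - 4L
  stopifnot(tag_bytes >= 0, tag_bytes %% (2L * uid_width) == 0)
  n_tags <- tag_bytes %/% (2L * uid_width)
  tags <- lapply(seq_len(n_tags), function(i) {
    offset <- uid_width + 4L + (i - 1L) * 2L * uid_width
    list(tagk_uid = key[offset + seq_len(uid_width)],
         tagv_uid = key[offset + uid_width + seq_len(uid_width)])
  })
  base_time <- be_to_num(key[uid_width + 1:4])
  list(metric_uid = key[seq_len(uid_width)],
       base_time  = as.POSIXct(base_time, origin = "1970-01-01", tz = "UTC"),
       tags       = tags)
}

# Hypothetical decoder for a 2-byte qualifier:
# upper 12 bits = seconds since base time, lower 4 bits = flags.
decode_qualifier <- function(qualifier) {
  stopifnot(is.raw(qualifier), length(qualifier) == 2L)
  q <- be_to_num(qualifier)
  flags <- q %% 16
  list(delta_seconds = q %/% 16,
       is_float      = flags >= 8,        # highest flag bit marks a float
       value_bytes   = (flags %% 8) + 1)  # low 3 bits store length - 1
}

# Made-up example: metric UID 000001, base time 1359075600, one tag pair.
key <- as.raw(c(0x00, 0x00, 0x01,
                0x51, 0x01, 0xd9, 0x10,
                0x00, 0x00, 0x01,
                0x00, 0x00, 0x02))
row  <- decode_row_key(key)
qual <- decode_qualifier(as.raw(c(0x12, 0xc7)))
print(row$base_time)
print(qual$delta_seconds)
```

The UIDs decoded this way are still only identifiers; mapping them back to metric and tag names would need a separate lookup, which this sketch does not attempt.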

@bikash
Author

bikash commented Feb 15, 2013

Hi David,
Thanks for your help.
When I run your code, I get an error. Have you faced this type of issue? Do you have any idea how to get rid of it?

0.186s to fetch http://localhost:4242/q?start=2013/01/25-01:00:00&m=sum:cipsi.haisen.proc.loadavg.1m&ascii=
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 302653, 0
Timing stopped at: 35.627 0.502 36.159

The error is about data.frame.

@dholstius
Collaborator

Hi Haisen, I need a little more information:

  1. the exact R function call you executed, e.g. "tsd_get(...)" where you fill in the "..."

  2. the first few lines of what you get when you do the same query on the command line using 'tsdb'.

David

@dholstius
Collaborator

I meant Bikash, not Haisen. It's been a long night :-)

David

@bikash
Author

bikash commented Feb 16, 2013

Hi holstius,
Thank you so much.
I am able to run opentsdbr on my local machine. I found some issues in the code. In the file deserialized.r at line 27 there is an issue with
return(cbind(metric_data, tag_data))
The tag_data array is empty if we send tag="*", and then cbind fails, because the two arrays must have the same number of rows.

Also, in /tsd_get.r at line 53, we don't need to trim the data set: the query above already requests data from the start date to the end date, so the data we get is within that interval. That line adds extra load, and trimming takes some time. It is not an error, but we can remove the line to improve performance.

Also, in the tsd_get function, the end date needs to default to blank in the function definition.

I am working on getting data directly from the HBase table. Once I am done, you can take a look at it.

Thanks
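
To illustrate the cbind issue, here is a minimal sketch of a guard that skips the tag columns when tag_data is empty. The helper name combine_metric_and_tags and the sample values are made up for illustration, and this is not the actual opentsdbr code; returning metric_data alone is just one possible choice (another would be to fill the tag columns with NA).

```r
# Hypothetical helper (not the opentsdbr implementation): bind tag columns
# only when there is something to bind, so an empty tag_data cannot
# trigger a row-count mismatch.
combine_metric_and_tags <- function(metric_data, tag_data) {
  if (is.null(tag_data) || nrow(tag_data) == 0L || ncol(tag_data) == 0L) {
    return(metric_data)
  }
  stopifnot(nrow(tag_data) == nrow(metric_data))
  cbind(metric_data, tag_data)
}

# Illustrative data only.
metric_data <- data.frame(timestamp = c(1359075600, 1359075660),
                          value     = c(0.42, 0.38))

no_tags   <- combine_metric_and_tags(metric_data, data.frame())
with_tags <- combine_metric_and_tags(metric_data,
                                     data.frame(host = c("a", "b")))
print(names(no_tags))
print(names(with_tags))
```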
