From c6c4d06789fd89bf69afe56432647880aa216006 Mon Sep 17 00:00:00 2001
From: Github Action
Date: Sun, 15 Sep 2024 00:21:54 +0000
Subject: [PATCH] New data collected at 2024-09-15_00-21-54

---
 r-devel/2024-September.txt | 1274 +++++++
 r-help/2024-September.txt | 6655 ++++++++++++++++++++++++++++++++++
 r-package-devel/2024q3.txt | 3882 ++++++++++++++++++++
 r-sig-mac/2024-August.txt | 238 ++
 r-sig-mac/2024-September.txt | 556 +++
 5 files changed, 12605 insertions(+)
 create mode 100644 r-devel/2024-September.txt
 create mode 100644 r-help/2024-September.txt
 create mode 100644 r-sig-mac/2024-August.txt
 create mode 100644 r-sig-mac/2024-September.txt

diff --git a/r-devel/2024-September.txt b/r-devel/2024-September.txt
new file mode 100644
index 0000000..a269681
--- /dev/null
+++ b/r-devel/2024-September.txt
@@ -0,0 +1,1274 @@
+From m@rk@c|ement@ @end|ng |rom k|@@e Sun Sep 1 14:42:35 2024
+From: m@rk@c|ement@ @end|ng |rom k|@@e (Mark Clements)
+Date: Sun, 1 Sep 2024 12:42:35 +0000
+Subject: [Rd] R compilation (revision 87083) failed after upgrade to Ubuntu
+ 24.04 (libtirpc missing)
+Message-ID:
+
+Following an upgrade from Ubuntu 22.04 LTS to 24.04.1, revision 87083 failed to compile with an error:
+
+"No rule to make target '/usr/include/tirpc/rpc/types.h', needed by 'saveload.o'"
+
+where the file /usr/include/tirpc/rpc/types.h did not exist. After installing libtirpc-dev:
+
+> sudo apt install libtirpc-dev
+
+compilation proceeded smoothly. I understand that this may be local to my system.
+
+Sincerely, Mark.
+
+
+
+När du skickar e-post till Karolinska Institutet (KI) innebär detta att KI kommer att behandla dina personuppgifter. Här finns information om hur KI behandlar personuppgifter.
+
+
+Sending email to Karolinska Institutet (KI) will result in KI processing your personal data. You can read more about KI's processing of personal data here.
+ + [[alternative HTML version deleted]] + + +From tom@@@k@||ber@ @end|ng |rom gm@||@com Mon Sep 2 16:04:43 2024 +From: tom@@@k@||ber@ @end|ng |rom gm@||@com (Tomas Kalibera) +Date: Mon, 2 Sep 2024 16:04:43 +0200 +Subject: [Rd] Big speedup in install.packages() by re-using connections +In-Reply-To: <20240425180109.418ca5f4@arachnoid> +References: + + <20240425180109.418ca5f4@arachnoid> +Message-ID: <248ad6c4-ec50-4a3c-b9e4-05346840dbf0@gmail.com> + + +On 4/25/24 17:01, Ivan Krylov via R-devel wrote: +> On Thu, 25 Apr 2024 14:45:04 +0200 +> Jeroen Ooms wrote: +> +>> Thoughts? +> How verboten would it be to create an empty external pointer object, +> add it to the preserved list, and set an on-exit finalizer to clean up +> the curl multi-handle? As far as I can tell, the internet module is not +> supposed to be unloaded, so this would not introduce an opportunity to +> jump to an unmapped address. This makes it possible to avoid adding a +> CurlCleanup() function to the internet module: + +Cleaning up this way in principle would probably be fine, but R already +has support for re-using connections. Even more, R can download files in +parallel (in a single thread), which particularly helps with bigger +latencies (e.g. typically users connecting from home, etc). See +?download.file(), look for "simultaneous". + +I've improved the existing support in R-devel and made it the default in +install.packages() and download.packages(). I am seeing speedups +somewhere between 2x and 9x on several systems (with quiet=TRUE) when +downloading many packages. + +Sequential downloads by default (quiet=FALSE) show a progress bar, which +can be rather slow, particularly on Windows. Turning that off can help +with slow downloads in already released versions of R. Consequently, via +disabling the progress bar, the speedups with R-devel could seem even +bigger than the range. 
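[Editorial note: the simultaneous-download interface referenced above can be tried directly. A minimal sketch, not from the thread; the URLs are placeholders, and the behaviour shown is the documented vectorised form of download.file() with method = "libcurl" (see ?download.file, "simultaneous") — the change discussed here only makes install.packages() use it by default in R-devel.]

```r
## Passing vectors of URLs and destination files to download.file()
## with method = "libcurl" performs the transfers simultaneously
## (single-threaded) over shared connections.
urls  <- c("https://cloud.r-project.org/src/contrib/PACKAGES.gz",
           "https://cloud.r-project.org/src/contrib/PACKAGES.rds")
dests <- file.path(tempdir(), basename(urls))
download.file(urls, dests, method = "libcurl", quiet = TRUE)
```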
+ +A common practice when installing/checking say all packages from CRAN or +Bioconductor is to set up a local mirror of the repositories and +download/update it with "rsync". When R then "downloads" packages from +the local mirror using the "file://" protocol, it completely avoids +these problems. The increased performance of download in R-devel cannot +beat this setup. + +However, the improvement should help a little bit users installing +binary packages (on Windows, macOS), where the download time can play a +visible role, when they install packages with many (uninstalled) +dependencies. Users installing from source, particularly with packages +needing compilation, would probably not notice. + +Tomas + + +> +> Index: src/modules/internet/libcurl.c +> =================================================================== +> --- src/modules/internet/libcurl.c (revision 86484) +> +++ src/modules/internet/libcurl.c (working copy) +> @@ -55,6 +55,47 @@ +> +> static int current_timeout = 0; +> +> +// The multi-handle is shared between downloads for reusing connections +> +static CURLM *shared_mhnd = NULL; +> +static SEXP mhnd_sentinel = NULL; +> + +> +static void cleanup_mhnd(SEXP ignored) +> +{ +> + if(shared_mhnd){ +> + curl_multi_cleanup(shared_mhnd); +> + shared_mhnd = NULL; +> + } +> + curl_global_cleanup(); +> +} +> +static void rollback_mhnd_sentinel(void* sentinel) { +> + // Failed to allocate memory while registering a finalizer, +> + // therefore must release the object +> + R_ReleaseObject((SEXP)sentinel); +> +} +> +static CURLM *get_mhnd(void) +> +{ +> + if (!mhnd_sentinel) { +> + SEXP sentinel = PROTECT(R_MakeExternalPtr(NULL, R_NilValue, R_NilValue)); +> + R_PreserveObject(sentinel); +> + UNPROTECT(1); +> + // Avoid leaking the sentinel before setting the finalizer +> + RCNTXT cntxt; +> + begincontext(&cntxt, CTXT_CCODE, R_NilValue, R_BaseEnv, R_BaseEnv, +> + R_NilValue, R_NilValue); +> + cntxt.cend = &rollback_mhnd_sentinel; +> + cntxt.cenddata = sentinel; 
+> + R_RegisterCFinalizerEx(sentinel, cleanup_mhnd, TRUE); +> + // Succeeded, no need to clean up if endcontext() fails allocation +> + mhnd_sentinel = sentinel; +> + cntxt.cend = NULL; +> + endcontext(&cntxt); +> + } +> + if(!shared_mhnd) { +> + shared_mhnd = curl_multi_init(); +> + } +> + return shared_mhnd; +> +} +> + +> # if LIBCURL_VERSION_MAJOR < 7 || (LIBCURL_VERSION_MAJOR == 7 && LIBCURL_VERSION_MINOR < 28) +> +> // curl/curl.h includes and headers it requires. +> @@ -565,8 +606,6 @@ +> if (c->hnd && c->hnd[i]) +> curl_easy_cleanup(c->hnd[i]); +> } +> - if (c->mhnd) +> - curl_multi_cleanup(c->mhnd); +> if (c->headers) +> curl_slist_free_all(c->headers); +> +> @@ -668,7 +707,7 @@ +> c.headers = headers = tmp; +> } +> +> - CURLM *mhnd = curl_multi_init(); +> + CURLM *mhnd = get_mhnd(); +> if (!mhnd) +> error(_("could not create curl handle")); +> c.mhnd = mhnd; +> +> + + +From ggrothend|eck @end|ng |rom gm@||@com Sun Sep 8 13:30:36 2024 +From: ggrothend|eck @end|ng |rom gm@||@com (Gabor Grothendieck) +Date: Sun, 8 Sep 2024 07:30:36 -0400 +Subject: [Rd] Inconsistency between row and nrow +Message-ID: + +In the following nrow provides the expected result but row gives an +error. I would have thought that they would both work or both fail. + + aa <- array(dim = 5:3) + + nrow(aa) + ## [1] 5 + + row(aa) + ## Error in row(aa) : a matrix-like object is required as argument to 'row' + + # this does work: + slice.index(aa, 1) + +-- +Statistics & Software Consulting +GKX Group, GKX Associates Inc. 
+tel: 1-877-GKX-GROUP +email: ggrothendieck at gmail.com + + +From m@rc_@chw@rtz @end|ng |rom me@com Sun Sep 8 14:26:53 2024 +From: m@rc_@chw@rtz @end|ng |rom me@com (Marc Schwartz) +Date: Sun, 08 Sep 2024 08:26:53 -0400 +Subject: [Rd] Inconsistency between row and nrow +In-Reply-To: +References: +Message-ID: <0453329E-F005-4EB6-86D1-EA9BDEC349CC@me.com> + +Hi Gabor, + +In strictly reading the help files for both nrow() and row(), the 'x' argument in the former case is "a vector, array, data frame, or NULL.", whereas in the latter case it is "a matrix-like object, that is one with a two-dimensional dim.". + +Thus, I would expect row() to fail on a >= 3-dimensional array, as your example shows. + +In reading the help file for slice.index(), there is the following in the See Also section: + +"row and col for determining row and column indexes; in fact, these are special cases of slice.index corresponding to MARGIN equal to 1 and 2, respectively when x is a matrix." + +further differentiating the behavior of row() and col() as more specific implementations in the 2-dimensional case. + +To my read then, the difference in behavior appears to be intentional and expected. + +Regards, + +Marc Schwartz + + +?-----Original Message----- +From: R-devel > on behalf of Gabor Grothendieck > +Date: Sunday, September 8, 2024 at 7:31 AM +To: "r-devel at r-project.org " > +Subject: [Rd] Inconsistency between row and nrow + + +In the following nrow provides the expected result but row gives an +error. I would have thought that they would both work or both fail. + + +aa <- array(dim = 5:3) + + +nrow(aa) +## [1] 5 + + +row(aa) +## Error in row(aa) : a matrix-like object is required as argument to 'row' + + +# this does work: +slice.index(aa, 1) + + +-- +Statistics & Software Consulting +GKX Group, GKX Associates Inc. 
+tel: 1-877-GKX-GROUP +email: ggrothendieck at gmail.com + + +______________________________________________ +R-devel at r-project.org mailing list +https://stat.ethz.ch/mailman/listinfo/r-devel + + +From ggrothend|eck @end|ng |rom gm@||@com Sun Sep 8 14:36:41 2024 +From: ggrothend|eck @end|ng |rom gm@||@com (Gabor Grothendieck) +Date: Sun, 8 Sep 2024 08:36:41 -0400 +Subject: [Rd] Inconsistency between row and nrow +In-Reply-To: <0453329E-F005-4EB6-86D1-EA9BDEC349CC@me.com> +References: + <0453329E-F005-4EB6-86D1-EA9BDEC349CC@me.com> +Message-ID: + +The fact that it is consistent with the documentation is not the +point. The point is that the design itself is inconsistent. + +On Sun, Sep 8, 2024 at 8:27?AM Marc Schwartz wrote: +> +> Hi Gabor, +> +> In strictly reading the help files for both nrow() and row(), the 'x' argument in the former case is "a vector, array, data frame, or NULL.", whereas in the latter case it is "a matrix-like object, that is one with a two-dimensional dim.". +> +> Thus, I would expect row() to fail on a >= 3-dimensional array, as your example shows. +> +> In reading the help file for slice.index(), there is the following in the See Also section: +> +> "row and col for determining row and column indexes; in fact, these are special cases of slice.index corresponding to MARGIN equal to 1 and 2, respectively when x is a matrix." +> +> further differentiating the behavior of row() and col() as more specific implementations in the 2-dimensional case. +> +> To my read then, the difference in behavior appears to be intentional and expected. +> +> Regards, +> +> Marc Schwartz +> +> +> ?-----Original Message----- +> From: R-devel > on behalf of Gabor Grothendieck > +> Date: Sunday, September 8, 2024 at 7:31 AM +> To: "r-devel at r-project.org " > +> Subject: [Rd] Inconsistency between row and nrow +> +> +> In the following nrow provides the expected result but row gives an +> error. I would have thought that they would both work or both fail. 
+> +> +> aa <- array(dim = 5:3) +> +> +> nrow(aa) +> ## [1] 5 +> +> +> row(aa) +> ## Error in row(aa) : a matrix-like object is required as argument to 'row' +> +> +> # this does work: +> slice.index(aa, 1) +> +> +> -- +> Statistics & Software Consulting +> GKX Group, GKX Associates Inc. +> tel: 1-877-GKX-GROUP +> email: ggrothendieck at gmail.com +> +> +> ______________________________________________ +> R-devel at r-project.org mailing list +> https://stat.ethz.ch/mailman/listinfo/r-devel +> +> +> +> + + +-- +Statistics & Software Consulting +GKX Group, GKX Associates Inc. +tel: 1-877-GKX-GROUP +email: ggrothendieck at gmail.com + + +From ggrothend|eck @end|ng |rom gm@||@com Sun Sep 8 14:38:58 2024 +From: ggrothend|eck @end|ng |rom gm@||@com (Gabor Grothendieck) +Date: Sun, 8 Sep 2024 08:38:58 -0400 +Subject: [Rd] transform +In-Reply-To: <3b755353-abee-41aa-8621-bbfd49876200@fau.de> +References: + <9629D1E6-A493-41E4-80EB-1A2289F42F60@gmail.com> + <3b755353-abee-41aa-8621-bbfd49876200@fau.de> +Message-ID: + +Suggest you look at dplyr::mutate as this functionality is widely used +there and has shown itself to be useful. + +On Tue, Aug 27, 2024 at 9:16?AM Sebastian Meyer wrote: +> +> Am 27.08.24 um 11:55 schrieb peter dalgaard: +> > Yes. A quirk, rather than a bug I'd say. One issue is that the internal logic of transform() relies on +> > +> > e <- eval(substitute(list(...)), `_data`, parent.frame()) +> > tags <- names(e) +> > +> > so untagged entries in ... will not be included. +> +> ... unless at least one is tagged: +> +> R> transform(BOD, 0:5, 1:6) +> Time demand +> 1 1 8.3 +> 2 2 10.3 +> 3 3 19.0 +> 4 4 16.0 +> 5 5 15.6 +> 6 7 19.8 +> +> R> transform(BOD, 0:5, 1:6, foo = 1) +> Time demand 0:5 1:6 foo +> 1 1 8.3 0 1 1 +> 2 2 10.3 1 2 1 +> 3 3 19.0 2 3 1 +> 4 4 16.0 3 4 1 +> 5 5 15.6 4 5 1 +> 6 7 19.8 5 6 1 +> +> But as transform.data.frame is only documented for tagged vector +> expressions, all examples provided in this thread were formal misuses. 
+> (It might make sense to warn about untagged entries.) +> +> Personally, I'd be quite confused about what to expect from syntax like +> +> transform(BOD, data.frame(y = 1:6)) +> +> as really no transformation is specified. Looks like cbind() or +> data.frame() was meant. +> +> Sebastian +> +> +> > The other part is a direct consequence of a quirk in data.frame: +> > +> >> data.frame(head(airquality), y=data.frame(x=rnorm(6))) +> > Ozone Solar.R Wind Temp Month Day x +> > 1 41 190 7.4 67 5 1 0.3075402 +> > 2 36 118 8.0 72 5 2 0.7765265 +> > 3 12 149 12.6 74 5 3 0.3909341 +> > 4 18 313 11.5 62 5 4 0.4733170 +> > 5 NA NA 14.3 56 5 5 -0.6947709 +> > 6 28 NA 14.9 66 5 6 0.1126040 +> > +> > whereas (the wisdom of this escapes me) +> > +> >> data.frame(head(airquality), y=data.frame(x=rnorm(6),z=rnorm(6))) +> > Ozone Solar.R Wind Temp Month Day y.x y.z +> > 1 41 190 7.4 67 5 1 -0.9250228 0.46483406 +> > 2 36 118 8.0 72 5 2 -0.5035793 0.28822668 +> > ... +> > +> > On the whole, I think that transform was never designed (nor documented) to take data frame arguments, so caveat emptor. +> > +> > - Peter +> > +> > +> >> On 24 Aug 2024, at 16:41 , Gabor Grothendieck wrote: +> >> +> >> One oddity in transform that I recently noticed. It seems that to include +> >> a one-column data frame in the arguments one must name it even though the +> >> name is ignored. If the data frame has more than one column then it must +> >> also be named but in that case it is not ignored and the names are made up of +> >> a combination of that name and the data frame's names. I would have thought +> >> that if we did not want a combination of names we would just not name the +> >> argument. 
+> >> +> >> # ignores second argument returning BOD unchanged +> >> transform(BOD, data.frame(y = 1:6)) |> names() +> >> ## [1] "Time" "demand" +> >> +> >> # ignores second argument returning BOD unchanged +> >> transform(BOD, data.frame(y = 1:6, z = 6:1)) |> names() +> >> ## [1] "Time" "demand" +> >> +> >> # with one column in data frame it adds the column and names it y ignoring x +> >> transform(BOD, x = data.frame(y = 1:6)) |> names() +> >> ## [1] "Time" "demand" "y" +> >> +> >> # with multiple columns in data frame it uses x.y and x.z as names +> >> transform(BOD, data.frame(y = 1:6, z = 6:1)) |> names() +> >> ## [1] "Time" "demand" "x.y" "x.z" +> >> +> >> +> >> -- +> >> Statistics & Software Consulting +> >> GKX Group, GKX Associates Inc. +> >> tel: 1-877-GKX-GROUP +> >> email: ggrothendieck at gmail.com +> >> +> >> ______________________________________________ +> >> R-devel at r-project.org mailing list +> >> https://stat.ethz.ch/mailman/listinfo/r-devel +> > + + + +-- +Statistics & Software Consulting +GKX Group, GKX Associates Inc. +tel: 1-877-GKX-GROUP +email: ggrothendieck at gmail.com + + +From @vi@e@gross m@iii@g oii gm@ii@com Sun Sep 8 21:10:07 2024 +From: @vi@e@gross m@iii@g oii gm@ii@com (@vi@e@gross m@iii@g oii gm@ii@com) +Date: Sun, 8 Sep 2024 15:10:07 -0400 +Subject: [Rd] Inconsistency between row and nrow +In-Reply-To: +References: + <0453329E-F005-4EB6-86D1-EA9BDEC349CC@me.com> + +Message-ID: <003701db0222$b84b0730$28e11590$@gmail.com> + + +Why would a design made by perhaps different people at different times have to be consistent? + +Why complicate a simple design meant to be used in 2-D objects to also handle other things? + +It is a bit like asking why for a vector you cannot use the same verb to measure length as in one sense a vector is about the same as a 1-D matrix. 
Why use length(vec) and not nrow(vec) or something + +-----Original Message----- +From: R-devel On Behalf Of Gabor Grothendieck +Sent: Sunday, September 8, 2024 8:37 AM +To: Marc Schwartz +Cc: r-devel at r-project.org +Subject: Re: [Rd] Inconsistency between row and nrow + +The fact that it is consistent with the documentation is not the +point. The point is that the design itself is inconsistent. + +On Sun, Sep 8, 2024 at 8:27?AM Marc Schwartz wrote: +> +> Hi Gabor, +> +> In strictly reading the help files for both nrow() and row(), the 'x' argument in the former case is "a vector, array, data frame, or NULL.", whereas in the latter case it is "a matrix-like object, that is one with a two-dimensional dim.". +> +> Thus, I would expect row() to fail on a >= 3-dimensional array, as your example shows. +> +> In reading the help file for slice.index(), there is the following in the See Also section: +> +> "row and col for determining row and column indexes; in fact, these are special cases of slice.index corresponding to MARGIN equal to 1 and 2, respectively when x is a matrix." +> +> further differentiating the behavior of row() and col() as more specific implementations in the 2-dimensional case. +> +> To my read then, the difference in behavior appears to be intentional and expected. +> +> Regards, +> +> Marc Schwartz +> +> +> ?-----Original Message----- +> From: R-devel > on behalf of Gabor Grothendieck > +> Date: Sunday, September 8, 2024 at 7:31 AM +> To: "r-devel at r-project.org " > +> Subject: [Rd] Inconsistency between row and nrow +> +> +> In the following nrow provides the expected result but row gives an +> error. I would have thought that they would both work or both fail. 
+> +> +> aa <- array(dim = 5:3) +> +> +> nrow(aa) +> ## [1] 5 +> +> +> row(aa) +> ## Error in row(aa) : a matrix-like object is required as argument to 'row' +> +> +> # this does work: +> slice.index(aa, 1) +> +> +> -- +> Statistics & Software Consulting +> GKX Group, GKX Associates Inc. +> tel: 1-877-GKX-GROUP +> email: ggrothendieck at gmail.com +> +> +> ______________________________________________ +> R-devel at r-project.org mailing list +> https://stat.ethz.ch/mailman/listinfo/r-devel +> +> +> +> + + +-- +Statistics & Software Consulting +GKX Group, GKX Associates Inc. +tel: 1-877-GKX-GROUP +email: ggrothendieck at gmail.com + +______________________________________________ +R-devel at r-project.org mailing list +https://stat.ethz.ch/mailman/listinfo/r-devel + + +From jeroenoom@ @end|ng |rom gm@||@com Sun Sep 8 23:14:22 2024 +From: jeroenoom@ @end|ng |rom gm@||@com (Jeroen Ooms) +Date: Sun, 8 Sep 2024 17:14:22 -0400 +Subject: [Rd] Big speedup in install.packages() by re-using connections +In-Reply-To: <248ad6c4-ec50-4a3c-b9e4-05346840dbf0@gmail.com> +References: + + <20240425180109.418ca5f4@arachnoid> + <248ad6c4-ec50-4a3c-b9e4-05346840dbf0@gmail.com> +Message-ID: + +On Mon, Sep 2, 2024 at 10:05?AM Tomas Kalibera wrote: +> +> +> On 4/25/24 17:01, Ivan Krylov via R-devel wrote: +> > On Thu, 25 Apr 2024 14:45:04 +0200 +> > Jeroen Ooms wrote: +> > +> >> Thoughts? +> > How verboten would it be to create an empty external pointer object, +> > add it to the preserved list, and set an on-exit finalizer to clean up +> > the curl multi-handle? As far as I can tell, the internet module is not +> > supposed to be unloaded, so this would not introduce an opportunity to +> > jump to an unmapped address. This makes it possible to avoid adding a +> > CurlCleanup() function to the internet module: +> +> Cleaning up this way in principle would probably be fine, but R already +> has support for re-using connections. 
Even more, R can download files in +> parallel (in a single thread), which particularly helps with bigger +> latencies (e.g. typically users connecting from home, etc). See +> ?download.file(), look for "simultaneous". + +Thank you for looking at this. A few ideas wrt parallel downloading: + +Additional improvement on Windows can be achieved by enabling the +nghttp2 driver in libcurl in rtools, such that it takes advantage of +http2 multiplexing for parallel downloads +(https://bugs.r-project.org/show_bug.cgi?id=18664). + +Moreover, one concern is that install.packages() may fail more +frequently on low bandwidth connections due to reaching the "download +timeout" when downloading files in parallel: + +R has an unusual definition of the http timeout, which by default +aborts in-progress downloads after 60 seconds for no obvious reason. +(by contrast, browsers enforce a timeout on unresponsive/stalled +downloads only, which can be achieved in libcurl by setting +CURLOPT_CONNECTTIMEOUT or CURLOPT_LOW_SPEED_TIME). + +The above is already a problem on slow networks, where large packages +can fail to install with a timeout error in the download stage. Users +may assume there must be a problem with the network, as it is not +obvious that machines on slower internet connection need to work +around R's defaults and modify options(timeout) before +install.packages(). This problem could become more prevalent when +using parallel downloads while still enforcing the same total timeout. + +For example: the MacOS binary for package "sf" is close to 90mb, hence +currently, under the default R settings of options(timeout=60), +install.packages will error with a download timeout on clients with +less than 1.5MB/s bandwidth. But with the parallel implementation, +install.packages() will share the bandwidth on 6 parallel downloads, +so if "sf" is downloaded with all its dependencies, we need at least +9MB/s (i.e. a 100mbit connection) for the default settings to not +cause a timeout. 
+ +Hopefully this can be revised to enforce the timeout on stalled +downloads only, as is common practice. + + +From @vi@e@gross m@iii@g oii gm@ii@com Mon Sep 9 00:36:34 2024 +From: @vi@e@gross m@iii@g oii gm@ii@com (@vi@e@gross m@iii@g oii gm@ii@com) +Date: Sun, 8 Sep 2024 18:36:34 -0400 +Subject: [Rd] Inconsistency between row and nrow +In-Reply-To: <0453329E-F005-4EB6-86D1-EA9BDEC349CC@me.com> +References: + <0453329E-F005-4EB6-86D1-EA9BDEC349CC@me.com> +Message-ID: <004001db023f$8f984d80$aec8e880$@gmail.com> + +It can be informative to look at what the actual functions being discussed do. + +Dim is an internal, meaning written in some variant of C, perhaps: + +> dim +function (x) .Primitive("dim") + +The function nrow, in my distribution, actually just calls dim() and throws away one dimension: + +> nrow +function (x) +dim(x)[1L] + + + +The function row is a bit related in calling dim in one of several ways: + +> row +function (x, as.factor = FALSE) +{ + if (as.factor) { + labs <- rownames(x, do.NULL = FALSE, prefix = "") + res <- factor(.Internal(row(dim(x))), labels = labs) + dim(res) <- dim(x) + res + } + else .Internal(row(dim(x))) +} + + + +Does this shed any light on why the result may be inconsistent? + +-----Original Message----- +From: R-devel On Behalf Of Marc Schwartz via R-devel +Sent: Sunday, September 8, 2024 8:27 AM +To: Gabor Grothendieck ; r-devel at r-project.org +Subject: Re: [Rd] Inconsistency between row and nrow + +Hi Gabor, + +In strictly reading the help files for both nrow() and row(), the 'x' argument in the former case is "a vector, array, data frame, or NULL.", whereas in the latter case it is "a matrix-like object, that is one with a two-dimensional dim.". + +Thus, I would expect row() to fail on a >= 3-dimensional array, as your example shows. 
+ +In reading the help file for slice.index(), there is the following in the See Also section: + +"row and col for determining row and column indexes; in fact, these are special cases of slice.index corresponding to MARGIN equal to 1 and 2, respectively when x is a matrix." + +further differentiating the behavior of row() and col() as more specific implementations in the 2-dimensional case. + +To my read then, the difference in behavior appears to be intentional and expected. + +Regards, + +Marc Schwartz + + +?-----Original Message----- +From: R-devel > on behalf of Gabor Grothendieck > +Date: Sunday, September 8, 2024 at 7:31 AM +To: "r-devel at r-project.org " > +Subject: [Rd] Inconsistency between row and nrow + + +In the following nrow provides the expected result but row gives an +error. I would have thought that they would both work or both fail. + + +aa <- array(dim = 5:3) + + +nrow(aa) +## [1] 5 + + +row(aa) +## Error in row(aa) : a matrix-like object is required as argument to 'row' + + +# this does work: +slice.index(aa, 1) + + +-- +Statistics & Software Consulting +GKX Group, GKX Associates Inc. +tel: 1-877-GKX-GROUP +email: ggrothendieck at gmail.com + + +______________________________________________ +R-devel at r-project.org mailing list +https://stat.ethz.ch/mailman/listinfo/r-devel + +______________________________________________ +R-devel at r-project.org mailing list +https://stat.ethz.ch/mailman/listinfo/r-devel + + +From ||u|@@rev|||@ @end|ng |rom gm@||@com Mon Sep 9 08:00:00 2024 +From: ||u|@@rev|||@ @end|ng |rom gm@||@com (=?UTF-8?Q?Llu=C3=ADs_Revilla?=) +Date: Mon, 9 Sep 2024 08:00:00 +0200 +Subject: [Rd] Documentation cross references +Message-ID: + +Hi all, + +I am checking the cross references at CRAN and base R, and I am having +trouble understanding how to reconcile the documentation and how R +help links work. 
+ +Checking the documentation [1] it seems that in a cross reference such +as \link[pkg:x]{text} x should be a topic (created with \alias{topic} +in R documentation). +But I found some cases exploring the output of +tools::base_rdxrefs_db() where it is a file name (without the .Rd +extension). +For instance, ?print has a link with text (Target as named by the +output) ".print.via.format" and the anchor "tools:print.via.format". +The topic would be print.via.format, but if one uses: +help(topic = "print.via.format", package = "tools"): + No documentation for 'print.via.format' in specified packages and +libraries: + you could try '??print.via.format' + +However, if one accesses the html help page and clicks +.print.via.format, is redirected to the right help page (found in the +REPL too with help(topic = ".print.via.format", package = "tools") ). + +I see a paragraph in R 4.1 NEWS about a change in behaviour [2] (also +hinted in WRE), where it is described as: +... "and fall back to a file link only if the topic is not found in +the target package. The earlier rule which prioritized file names over +topics can be restored by setting the environment variable +_R_HELP_LINKS_TO_TOPICS_ to a false value." + +The internal variable _R_HELP_LINKS_TO_TOPICS_ isn't mentioned on +R-internals and this behaviour of html pages is not mentioned on WRE. + - Perhaps documentation at R-internals and WRE could be updated to +show how html page links are created? + - And/or the behaviour could continue the path started on R 4.1 and +start complaining about anchors pointing to files? + +In the second case, ~10 links in base R would be affected but on CRAN +this could affect ~1700 more packages than those currently with "Rd +cross-references" notes. 
+
+Regards,
+
+Lluís
+
+[1]: https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Cross_002dreferences
+[2]: https://developer.r-project.org/blosxom.cgi/R-4-1-branch/NEWS/2021/04/20#n2021-04-21
+
+
+From tom@@@k@||ber@ @end|ng |rom gm@||@com Mon Sep 9 11:11:02 2024
+From: tom@@@k@||ber@ @end|ng |rom gm@||@com (Tomas Kalibera)
+Date: Mon, 9 Sep 2024 11:11:02 +0200
+Subject: [Rd] Big speedup in install.packages() by re-using connections
+In-Reply-To:
+References:
+
+ <20240425180109.418ca5f4@arachnoid>
+ <248ad6c4-ec50-4a3c-b9e4-05346840dbf0@gmail.com>
+
+Message-ID: <905cd9c7-5c1f-43eb-b104-d540a5598d96@gmail.com>
+
+
+On 9/8/24 23:14, Jeroen Ooms wrote:
+> On Mon, Sep 2, 2024 at 10:05 AM Tomas Kalibera wrote:
+>>
+>> On 4/25/24 17:01, Ivan Krylov via R-devel wrote:
+>>> On Thu, 25 Apr 2024 14:45:04 +0200
+>>> Jeroen Ooms wrote:
+>>>
+>>>> Thoughts?
+>>> How verboten would it be to create an empty external pointer object,
+>>> add it to the preserved list, and set an on-exit finalizer to clean up
+>>> the curl multi-handle? As far as I can tell, the internet module is not
+>>> supposed to be unloaded, so this would not introduce an opportunity to
+>>> jump to an unmapped address. This makes it possible to avoid adding a
+>>> CurlCleanup() function to the internet module:
+>> Cleaning up this way in principle would probably be fine, but R already
+>> has support for re-using connections. Even more, R can download files in
+>> parallel (in a single thread), which particularly helps with bigger
+>> latencies (e.g. typically users connecting from home, etc). See
+>> ?download.file(), look for "simultaneous".
+> Thank you for looking at this. A few ideas wrt parallel downloading:
+>
+> Additional improvement on Windows can be achieved by enabling the
+> nghttp2 driver in libcurl in rtools, such that it takes advantage of
+> http2 multiplexing for parallel downloads
+> (https://bugs.r-project.org/show_bug.cgi?id=18664).
+ +Anyone who wants to cooperate and help is more than welcome to +contribute patches to upstream MXE. + +In case of nghttp2, thanks to Andrew Johnson, who contributed nghttp2 +support to upstream MXE. It will be part of the next Rtools (probably +Rtools45). + +> Moreover, one concern is that install.packages() may fail more +> frequently on low bandwidth connections due to reaching the "download +> timeout" when downloading files in parallel: +> +> R has an unusual definition of the http timeout, which by default +> aborts in-progress downloads after 60 seconds for no obvious reason. +> (by contrast, browsers enforce a timeout on unresponsive/stalled +> downloads only, which can be achieved in libcurl by setting +> CURLOPT_CONNECTTIMEOUT or CURLOPT_LOW_SPEED_TIME). +> +> The above is already a problem on slow networks, where large packages +> can fail to install with a timeout error in the download stage. Users +> may assume there must be a problem with the network, as it is not +> obvious that machines on slower internet connection need to work +> around R's defaults and modify options(timeout) before +> install.packages(). This problem could become more prevalent when +> using parallel downloads while still enforcing the same total timeout. +> +> For example: the MacOS binary for package "sf" is close to 90mb, hence +> currently, under the default R settings of options(timeout=60), +> install.packages will error with a download timeout on clients with +> less than 1.5MB/s bandwidth. But with the parallel implementation, +> install.packages() will share the bandwidth on 6 parallel downloads, +> so if "sf" is downloaded with all its dependencies, we need at least +> 9MB/s (i.e. a 100mbit connection) for the default settings to not +> cause a timeout. +> +> Hopefully this can be revised to enforce the timeout on stalled +> downloads only, as is common practice. 
+ +Yes, this is work in progress, I am aware that the timeout could use +some thought re simultaneous downloads. + +If anyone wants to help with testing the current implementation of +simultaneous download and report any bugs found, that would be nice. + +Best +Tomas + + +From jeroenoom@ @end|ng |rom gm@||@com Mon Sep 9 18:19:12 2024 +From: jeroenoom@ @end|ng |rom gm@||@com (Jeroen Ooms) +Date: Mon, 9 Sep 2024 18:19:12 +0200 +Subject: [Rd] Big speedup in install.packages() by re-using connections +In-Reply-To: <905cd9c7-5c1f-43eb-b104-d540a5598d96@gmail.com> +References: + + <20240425180109.418ca5f4@arachnoid> + <248ad6c4-ec50-4a3c-b9e4-05346840dbf0@gmail.com> + + <905cd9c7-5c1f-43eb-b104-d540a5598d96@gmail.com> +Message-ID: + +On Mon, Sep 9, 2024 at 11:11?AM Tomas Kalibera wrote: +> +> +> On 9/8/24 23:14, Jeroen Ooms wrote: +> > On Mon, Sep 2, 2024 at 10:05?AM Tomas Kalibera wrote: +> >> +> >> On 4/25/24 17:01, Ivan Krylov via R-devel wrote: +> >>> On Thu, 25 Apr 2024 14:45:04 +0200 +> >>> Jeroen Ooms wrote: +> >>> +> >>>> Thoughts? +> >>> How verboten would it be to create an empty external pointer object, +> >>> add it to the preserved list, and set an on-exit finalizer to clean up +> >>> the curl multi-handle? As far as I can tell, the internet module is not +> >>> supposed to be unloaded, so this would not introduce an opportunity to +> >>> jump to an unmapped address. This makes it possible to avoid adding a +> >>> CurlCleanup() function to the internet module: +> >> Cleaning up this way in principle would probably be fine, but R already +> >> has support for re-using connections. Even more, R can download files in +> >> parallel (in a single thread), which particularly helps with bigger +> >> latencies (e.g. typically users connecting from home, etc). See +> >> ?download.file(), look for "simultaneous". +> > Thank you for looking at this. 
A few ideas wrt parallel downloading: +> > +> > Additional improvement on Windows can be achieved by enabling the +> > nghttp2 driver in libcurl in rtools, such that it takes advantage of +> > http2 multiplexing for parallel downloads +> > (https://bugs.r-project.org/show_bug.cgi?id=18664). +> +> Anyone who wants to cooperate and help is more than welcome to +> contribute patches to upstream MXE. +> +> In case of nghttp2, thanks to Andrew Johnson, who contributed nghttp2 +> support to upstream MXE. It will be part of the next Rtools (probably +> Rtools45). +> +> > Moreover, one concern is that install.packages() may fail more +> > frequently on low bandwidth connections due to reaching the "download +> > timeout" when downloading files in parallel: +> > +> > R has an unusual definition of the http timeout, which by default +> > aborts in-progress downloads after 60 seconds for no obvious reason. +> > (by contrast, browsers enforce a timeout on unresponsive/stalled +> > downloads only, which can be achieved in libcurl by setting +> > CURLOPT_CONNECTTIMEOUT or CURLOPT_LOW_SPEED_TIME). +> > +> > The above is already a problem on slow networks, where large packages +> > can fail to install with a timeout error in the download stage. Users +> > may assume there must be a problem with the network, as it is not +> > obvious that machines on slower internet connection need to work +> > around R's defaults and modify options(timeout) before +> > install.packages(). This problem could become more prevalent when +> > using parallel downloads while still enforcing the same total timeout. +> > +> > For example: the MacOS binary for package "sf" is close to 90mb, hence +> > currently, under the default R settings of options(timeout=60), +> > install.packages will error with a download timeout on clients with +> > less than 1.5MB/s bandwidth. 
But with the parallel implementation,
+> > install.packages() will share the bandwidth on 6 parallel downloads,
+> > so if "sf" is downloaded with all its dependencies, we need at least
+> > 9MB/s (i.e. a 100mbit connection) for the default settings to not
+> > cause a timeout.
+> >
+> > Hopefully this can be revised to enforce the timeout on stalled
+> > downloads only, as is common practice.
+>
+> Yes, this is work in progress, I am aware that the timeout could use
+> some thought re simultaneous downloads.
+
+OK, that is good to hear.
+
+
+> If anyone wants to help with testing the current implementation of
+> simultaneous download and report any bugs found, that would be nice.
+
+R-universe has run this a few thousand times to recheck packages on
+r-devel on both Linux and Windows, and it works well. It shortens the
+CI process by a few seconds, and there are fewer random connection
+failures. If you want to inspect some recent logs for yourself, click
+the rightmost column on https://r-universe.dev/builds and then, on the
+GitHub Actions page, look under the "Build R-devel for Windows /
+Linux" runs to see the log files.
+
+I was also able to confirm the edge case that install.packages() does
+not abort if any of the dependencies fails to download with an HTTP 404,
+which I think is the desired behavior. If there is anything else
+specifically that you would like to see tested, I can look at that.
+
+
+From @|mon@@ndrew@ @end|ng |rom b@br@h@m@@c@uk Thu Sep 12 14:01:54 2024
+From: @|mon@@ndrew@ @end|ng |rom b@br@h@m@@c@uk (Simon Andrews)
+Date: Thu, 12 Sep 2024 12:01:54 +0000
+Subject: [Rd] Can gzfile be given the same method option as file
+Message-ID: 
+
+Recently my employer has introduced a security system which generates SSL certificates on the fly to be able to see the content of https connections. To make this work they add a new root certificate to the Windows certificate store.
+ 
+
+In R this causes problems because the default library used to download data from URLs doesn't look at this store; however, the "wininet" download method works, so where it is used things work (albeit with a warning about future deprecation).
+
+For functions like download.file this works great, but it fails when running readRDS:
+
+readRDS('https://seurat.nygenome.org/azimuth/references/homologs.rds')
+Error in gzfile(file, "rb") : cannot open the connection
+In addition: Warning message:
+In gzfile(file, "rb") :
+  cannot open compressed file 'https://seurat.nygenome.org/azimuth/references/homologs.rds', probable reason 'Invalid argument'
+
+After some debugging I see that the root cause is the gzfile function.
+
+> gzfile('https://seurat.nygenome.org/azimuth/references/homologs.rds') -> g
+> open(g, open="r")
+Error in open.connection(g, open = "r") : cannot open the connection
+In addition: Warning message:
+In open.connection(g, open = "r") :
+  cannot open compressed file 'https://seurat.nygenome.org/azimuth/references/homologs.rds', probable reason 'Invalid argument'
+
+If this were not a compressed file, then using file rather than gzfile we could make this work by setting the url.method option:
+
+> options("url.method"="wininet")
+> file('https://seurat.nygenome.org/azimuth/references/homologs.rds') -> g
+> open(g, open="r")
+Warning message:
+In open.connection(g, open = "r") :
+  the 'wininet' method of url() is deprecated for http:// and https:// URLs
+
+So I get a warning, but it works.
+
+I guess this boils down to two questions:
+
+
+  1.  Is it possible to add the same "method" argument to gzfile that file uses, so that people in my situation have a workaround?
+  2.  Given the warnings we're getting when using wininet, are there plans to make Windows certificates be supported in another way?
+
+Thanks
+
+Simon.
+ 
+
+ [[alternative HTML version deleted]]
+
+
+From |kry|ov @end|ng |rom d|@root@org Thu Sep 12 16:27:37 2024
+From: |kry|ov @end|ng |rom d|@root@org (Ivan Krylov)
+Date: Thu, 12 Sep 2024 17:27:37 +0300
+Subject: [Rd] Can gzfile be given the same method option as file
+In-Reply-To: 
+References: 
+Message-ID: <20240912172737.1202b2bd@arachnoid>
+
+On Thu, 12 Sep 2024 12:01:54 +0000
+Simon Andrews via R-devel wrote:
+
+> readRDS('https://seurat.nygenome.org/azimuth/references/homologs.rds')
+> Error in gzfile(file, "rb") : cannot open the connection
+
+I don't think that gzfile works with URLs. gzcon(), on the other hand,
+does work with url() connections, which accepts the 'method' argument
+and the getOption('url.method') default.
+
+h <- readRDS(url(
+  'https://seurat.nygenome.org/azimuth/references/homologs.rds'
+))
+
+But that only works with gzip-compressed files. For example, CRAN's
+PACKAGES.rds is xz-compressed, and I don't see a way to read it the
+same way:
+
+readBin(
+  index <- file.path(
+    contrib.url(getOption('repos')['CRAN']),
+    'PACKAGES.rds'
+  ), raw(), 5
+) |> rawToChar()
+# [1] "\xfd7zXZ" <-- note the "7zXZ" header
+readRDS(url(index))
+# Error in readRDS(url(index)) : unknown input format
+
+> 2. Given the warnings we're getting when using wininet, are their
+> plans to make windows certficates be supported in another way?
+
+What does libcurlVersion() return for you? In theory, it should be
+possible to make libcurl use schannel and therefore the system
+certificate store for TLS verification purposes.
+
+-- 
+Best regards,
+Ivan
+
+
+From @|mon@@ndrew@ @end|ng |rom b@br@h@m@@c@uk Thu Sep 12 17:06:50 2024
+From: @|mon@@ndrew@ @end|ng |rom b@br@h@m@@c@uk (Simon Andrews)
+Date: Thu, 12 Sep 2024 15:06:50 +0000
+Subject: [Rd] Can gzfile be given the same method option as file
+In-Reply-To: <20240912172737.1202b2bd@arachnoid>
+References: 
+ <20240912172737.1202b2bd@arachnoid>
+Message-ID: 
+
+Thank you! This helped a lot.
I had misunderstood some of the chain of functions which led to the eventual failure. I can confirm that it does indeed work if you create a url() first, and it picks the appropriate back end as long as the url.method option is set.
+
+For the schannel back end I have:
+
+> libcurlVersion()
+[1] "8.6.0"
+attr(,"ssl_version")
+[1] "(OpenSSL/3.2.1) Schannel"
+attr(,"libssh_version")
+[1] "libssh2/1.11.0"
+
+However I can't get either of the curl-related methods to work.
+
+> download.file('https://seurat.nygenome.org/azimuth/references/homologs.rds', destfile = "c:/Users/andrewss/homologs.rds", method="libcurl")
+trying URL 'https://seurat.nygenome.org/azimuth/references/homologs.rds'
+Error in download.file("https://seurat.nygenome.org/azimuth/references/homologs.rds",  :
+  cannot open URL 'https://seurat.nygenome.org/azimuth/references/homologs.rds'
+In addition: Warning message:
+In download.file("https://seurat.nygenome.org/azimuth/references/homologs.rds",  :
+  URL 'https://seurat.nygenome.org/azimuth/references/homologs.rds': status was 'SSL connect error'
+
+> download.file('https://seurat.nygenome.org/azimuth/references/homologs.rds', destfile = "c:/Users/andrewss/homologs.rds", method="curl")
+  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
+                                 Dload  Upload   Total   Spent    Left  Speed
+  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
+curl: (35) schannel: next InitializeSecurityContext failed: CRYPT_E_NO_REVOCATION_CHECK (0x80092012) - The revocation function was unable to check revocation for the certificate.
+Error in download.file("https://seurat.nygenome.org/azimuth/references/homologs.rds",  :
+  'curl' call had nonzero exit status
+
+I realise that this may not be as simple as the certificate not being seen, and that the system here may not fake the revocation infrastructure as well, but I don't see that this is going to change, and it's only the wininet method which actually allows anything to connect.
+
+Simon.
+ 
+
+From |kry|ov @end|ng |rom d|@root@org Thu Sep 12 17:23:02 2024
+From: |kry|ov @end|ng |rom d|@root@org (Ivan Krylov)
+Date: Thu, 12 Sep 2024 18:23:02 +0300
+Subject: [Rd] Can gzfile be given the same method option as file
+In-Reply-To: 
+References: 
+ <20240912172737.1202b2bd@arachnoid>
+ 
+Message-ID: <20240912182302.5ddbe5a8@arachnoid>
+
+On Thu, 12 Sep 2024 15:06:50 +0000
+Simon Andrews wrote:
+
+> > download.file('https://seurat.nygenome.org/azimuth/references/homologs.rds',
+> > destfile = "c:/Users/andrewss/homologs.rds", method="libcurl")
+<...>
+> status was 'SSL connect error'
+>
+> > download.file('https://seurat.nygenome.org/azimuth/references/homologs.rds',
+> > destfile = "c:/Users/andrewss/homologs.rds", method="curl")
+<...>
+> curl: (35) schannel: next InitializeSecurityContext
+> failed: CRYPT_E_NO_REVOCATION_CHECK (0x80092012) - The revocation
+> function was unable to check revocation for the certificate.
+
+This extra error code is useful, thank you for trying the "curl"
+method. https://github.com/curl/curl/issues/14315 suggests a libcurl
+option and a curl command line option.
+
+Does download.file(method = 'curl', extra = '--ssl-no-revoke') work for
+you?
+
+Since R-4.2.2, R understands the R_LIBCURL_SSL_REVOKE_BEST_EFFORT
+environment variable. Does it help to set it to "TRUE" (e.g. in the
+.Renviron file) before invoking download.file(method = "libcurl")?
+
+Some extra context can be found in
+news(grepl('R_LIBCURL_SSL_REVOKE_BEST_EFFORT', Text)) and
+.
+
+-- 
+Best regards,
+Ivan
+
+
+From @|mon@@ndrew@ @end|ng |rom b@br@h@m@@c@uk Thu Sep 12 17:36:32 2024
+From: @|mon@@ndrew@ @end|ng |rom b@br@h@m@@c@uk (Simon Andrews)
+Date: Thu, 12 Sep 2024 15:36:32 +0000
+Subject: [Rd] Can gzfile be given the same method option as file
+In-Reply-To: <20240912182302.5ddbe5a8@arachnoid>
+References: 
+ <20240912172737.1202b2bd@arachnoid>
+ 
+ <20240912182302.5ddbe5a8@arachnoid>
+Message-ID: 
+
+On Thu, 12 Sep 2024 15:06:50 +0000
+Simon Andrews wrote:
+
+> > > download.file('https://seurat.nygenome.org/azimuth/references/homolo
+> > > gs.rds', destfile = "c:/Users/andrewss/homologs.rds", method="curl")
+<...>
+> > curl: (35) schannel: next InitializeSecurityContext
+> > failed: CRYPT_E_NO_REVOCATION_CHECK (0x80092012) - The revocation
+> > function was unable to check revocation for the certificate.
+
+> This extra error code is useful, thank you for trying the "curl"
+> method. https://github.com/curl/curl/issues/14315 suggests a libcurl option and a curl command line option.
+>
+> Does download.file(method = 'curl', extra = '--ssl-no-revoke') work for you?
+
+Yes! Adding that option does indeed work and generates no warnings.
+
+> Since R-4.2.2, R understands the R_LIBCURL_SSL_REVOKE_BEST_EFFORT environment variable. Does it help to set it
+> to "TRUE" (e.g. in the .Renviron file) before invoking download.file(method = "libcurl")?
+
+Yes, this also works and will provide a workable solution for our environment.
+
+> Sys.getenv("R_LIBCURL_SSL_REVOKE_BEST_EFFORT")
+[1] "TRUE"
+> download.file('https://seurat.nygenome.org/azimuth/references/homologs.rds', destfile = "c:/Users/andrewss/homologs.rds", method="libcurl")
+trying URL 'https://seurat.nygenome.org/azimuth/references/homologs.rds'
+Content type 'application/octet-stream' length 3458249 bytes (3.3 MB)
+downloaded 3.3 MB
+
+Thank you so much for your help with this. I shall implement this for the rest of our organisation.
+
+Simon.
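
The two workarounds that resolved the thread above can be condensed into a short sketch. The URL is the one from the thread and is used purely for illustration; no download is attempted here, and putting the variable in .Renviron (as suggested above) is the persistent alternative to Sys.setenv().

```r
# Workarounds for schannel CRYPT_E_NO_REVOCATION_CHECK errors on Windows.

# 1. Make libcurl treat certificate-revocation checks as best effort
#    (understood since R 4.2.2); setting it in .Renviron makes it permanent.
Sys.setenv(R_LIBCURL_SSL_REVOKE_BEST_EFFORT = "TRUE")

# 2. Or pass the equivalent flag to the command-line curl backend
#    (commented out here so nothing is downloaded when sourcing this sketch):
# download.file("https://seurat.nygenome.org/azimuth/references/homologs.rds",
#               destfile = "homologs.rds", method = "curl",
#               extra = "--ssl-no-revoke")

Sys.getenv("R_LIBCURL_SSL_REVOKE_BEST_EFFORT")
```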
+ 
+
diff --git a/r-help/2024-September.txt b/r-help/2024-September.txt
new file mode 100644
index 0000000..5039a30
--- /dev/null
+++ b/r-help/2024-September.txt
@@ -0,0 +1,6655 @@
+From @vi@e@gross m@iii@g oii gm@ii@com Sun Sep 1 00:17:27 2024
+From: @vi@e@gross m@iii@g oii gm@ii@com (@vi@e@gross m@iii@g oii gm@ii@com)
+Date: Sat, 31 Aug 2024 18:17:27 -0400
+Subject: [R] aggregating data with quality control
+In-Reply-To: <27c3b1964d64474aac11678760647e61@regione.marche.it>
+References: <27c3b1964d64474aac11678760647e61@regione.marche.it>
+Message-ID: <017a01dafbf3$902a31c0$b07e9540$@gmail.com>
+
+Stefano,
+
+I see you already have an answer that works for you.
+
+Sometimes you want to step back and see if some modification makes a problem easier to solve.
+
+I often simply switch to using tools in the tidyverse such as dplyr for parts of the job, although much of the same can be done using functions built into R.
+
+In your case, there are many possible solutions besides taking the max in some way, as in a factor column.
+
+You seem to expect exactly 48 measurements. Currently you encode them as one of two character strings, but if this is really a binary choice, you could have used a 0/1 or TRUE/FALSE column instead, or make one. This lets you do things like take the sum and compare it to 48 to see if all are true, or to zero to check if all are false. You could take the product to check if at least one is false, or use a negation for another perspective. If the number of rows may not be 48, you can compare to a calculation of the actual number of rows in that subset.
+
+If your data was placed into wide format, say based on your hs field being unique for each test site, there are similar ideas by taking a subset of the columns and applying things by using functions like rowSums.
+ 
+Again, some things I commonly use in dplyr, such as group_by() and the way it impacts other operations including reports, make this a little different, but most things can be done with careful use of base R, except in areas where dplyr supports more abstract ways to specify what you want, which your example does not need.
+
+Just FYI, you did not share what your function my.mean() is.
+
+I won't share the code unless you are interested, but it looks like part of what you are doing is to group by a version of the date/time truncated to just the day. I am not sure your method is optimal. You make a list of three different things containing parts of a date. That can work, but since dates already look like 2024-01-02, which sorts and compares well alphabetically, I wonder whether you could instead group by that.
+
+
+-----Original Message-----
+From: R-help On Behalf Of Stefano Sofia
+Sent: Saturday, August 31, 2024 7:15 AM
+To: r-help at R-project.org
+Subject: [R] aggregating data with quality control
+
+Dear R-list users,
+
+I deal with semi-hourly data from automatic meteorological stations.
+
+They have to pass a manual validation; suppose that status = "C" stands for correct and status = "D" for discarded.
+
+Here is a simple example with "Snow height" (HS):
+
+
+mydf <- data.frame(data_POSIX=seq(as.POSIXct("2024-01-01 00:00:00", format = "%Y-%m-%d %H:%M:%S", tz="Etc/GMT-1"), as.POSIXct("2024-01-02 23:30:00", format = "%Y-%m-%d %H:%M:%S", tz="Etc/GMT-1"), by="30 min"))
+
+mydf$hs <- round(runif(96, 0, 100))
+
+mydf$status <- c(rep("C", 50), "D", rep("C", 45))
+
+
+Evaluating the daily mean independently of the status is very easy:
+
+aggregate(mydf$hs, by=list(format(mydf$data_POSIX, "%Y"), format(mydf$data_POSIX, "%m"), format(mydf$data_POSIX, "%d")), my.mean)
+
+
+Things become more complicated when I also need to export the status: this should be "C" when all 48 values have status equal to "C", and "D" when at least one value has status = "D".
+ 
+
+I have no clue how to do that in an efficient way.
+
+Could some of you give me some hints on how to do that?
+
+
+Thank you for your usual support.
+
+Stefano Sofia
+
+
+
+            (oo)
+--oOO--( )--OOo--------------------------------------
+Stefano Sofia PhD
+Civil Protection - Marche Region - Italy
+Meteo Section
+Snow Section
+Via del Colle Ameno 5
+60126 Torrette di Ancona, Ancona (AN)
+Uff: +39 071 806 7743
+E-mail: stefano.sofia at regione.marche.it
+---Oo---------oO----------------------------------------
+
+________________________________
+
+IMPORTANT NOTICE: This e-mail message is intended to be received only by persons entitled to receive the confidential information it may contain. E-mail messages to clients of Regione Marche may contain information that is confidential and legally privileged. Please do not read, copy, forward, or store this message unless you are an intended recipient of it. If you have received this message in error, please forward it to the sender and delete it completely from your computer system. Pursuant to art. 6 of DGR no. 1394/2008, please note that in cases of necessity and urgency the reply to this e-mail message may be viewed by persons other than the addressee.
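
A minimal base-R sketch of the rule described above: the daily mean of hs plus a status that is "C" only when every half-hourly value that day is "C", otherwise "D". Grouping by as.Date() and combining the two aggregates with merge() are illustrative assumptions, not the original my.mean() approach.

```r
# Rebuild the example data (status fixed so the daily flags are deterministic:
# day 1 is all "C", day 2 contains one "D").
set.seed(1)
mydf <- data.frame(
  data_POSIX = seq(as.POSIXct("2024-01-01 00:00:00", tz = "Etc/GMT-1"),
                   as.POSIXct("2024-01-02 23:30:00", tz = "Etc/GMT-1"),
                   by = "30 min"),
  hs = round(runif(96, 0, 100)),
  status = c(rep("C", 50), "D", rep("C", 45))
)

# Group by calendar day instead of a list of year/month/day strings.
mydf$day <- as.Date(mydf$data_POSIX, tz = "Etc/GMT-1")

# Daily mean of hs, and a status that is "C" only when all values are "C".
daily <- merge(
  aggregate(hs ~ day, data = mydf, FUN = mean),
  aggregate(status ~ day, data = mydf,
            FUN = function(s) if (all(s == "C")) "C" else "D")
)
daily
```

The same FUN would also handle the "compare the count to the actual number of rows" idea from the reply above, since all() makes no assumption that each day has exactly 48 records.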
+ + [[alternative HTML version deleted]] + + +From bog@@o@chr|@to|er @end|ng |rom gm@||@com Tue Sep 3 01:26:17 2024 +From: bog@@o@chr|@to|er @end|ng |rom gm@||@com (Christofer Bogaso) +Date: Tue, 3 Sep 2024 04:56:17 +0530 +Subject: [R] Adding parameters for Benchmark normal distribution in + shapiro.test +Message-ID: + +Hi, + +In ?shapiro.test, there seems to be no option to pass mean and sd +information of the Normal distribution which I want to compare my +sample data to. + +For example in the code below, I want to test my sample to N(0, 10). + +shapiro.test(rnorm(100, mean = 5, sd = 3)) + +Is there any way to pass the information of the benchmark normal distribution? + + +From jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@ Tue Sep 3 01:53:50 2024 +From: jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@ (Jeff Newmiller) +Date: Mon, 02 Sep 2024 16:53:50 -0700 +Subject: [R] Adding parameters for Benchmark normal distribution in + shapiro.test +In-Reply-To: +References: +Message-ID: <637456A8-A64B-44DB-AADE-E660B413238B@dcn.davis.ca.us> + +Wouldn't that be because the sample is not being compared to a specific distribution but rather to many possible distributions by MC? [1] + +If you think that need not be the case, perhaps you can write your own test... but then it will probably be answering a different question? + +[1] https://en.m.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test + +On September 2, 2024 4:26:17 PM PDT, Christofer Bogaso wrote: +>Hi, +> +>In ?shapiro.test, there seems to be no option to pass mean and sd +>information of the Normal distribution which I want to compare my +>sample data to. +> +>For example in the code below, I want to test my sample to N(0, 10). +> +>shapiro.test(rnorm(100, mean = 5, sd = 3)) +> +>Is there any way to pass the information of the benchmark normal distribution? 
+> +>______________________________________________ +>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +>https://stat.ethz.ch/mailman/listinfo/r-help +>PLEASE do read the posting guide https://www.R-project.org/posting-guide.html +>and provide commented, minimal, self-contained, reproducible code. + +-- +Sent from my phone. Please excuse my brevity. + + +From bbo|ker @end|ng |rom gm@||@com Tue Sep 3 02:22:08 2024 +From: bbo|ker @end|ng |rom gm@||@com (Ben Bolker) +Date: Mon, 2 Sep 2024 20:22:08 -0400 +Subject: [R] Adding parameters for Benchmark normal distribution in + shapiro.test +In-Reply-To: +References: +Message-ID: + + From Shapiro and Wilk's paper: + + > The objective is to derive a test for the hypothesis that this is a +sample from a normal distribution with unknown mean mu and unknown +variance sigma^2 + + That is, the estimates of the mean and SD are folded into the +derivation of the test statistic. + + If you want to test against a specified alternative you could try +e.g. a Kolmogorov-Smirnov test + +set.seed(101) +x <- rnorm(100, mean = 5, sd = 3) + +ks.test(x, "pnorm", 0, 10) + + + + +On 2024-09-02 7:26 p.m., Christofer Bogaso wrote: +> Hi, +> +> In ?shapiro.test, there seems to be no option to pass mean and sd +> information of the Normal distribution which I want to compare my +> sample data to. +> +> For example in the code below, I want to test my sample to N(0, 10). +> +> shapiro.test(rnorm(100, mean = 5, sd = 3)) +> +> Is there any way to pass the information of the benchmark normal distribution? +> +> ______________________________________________ +> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> https://stat.ethz.ch/mailman/listinfo/r-help +> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html +> and provide commented, minimal, self-contained, reproducible code. + +-- +Dr. 
Benjamin Bolker +Professor, Mathematics & Statistics and Biology, McMaster University +Director, School of Computational Science and Engineering + > E-mail is sent at my convenience; I don't expect replies outside of +working hours. + + +From p@ych@o||u @end|ng |rom gm@||@com Tue Sep 3 18:30:54 2024 +From: p@ych@o||u @end|ng |rom gm@||@com (Chao Liu) +Date: Tue, 3 Sep 2024 12:30:54 -0400 +Subject: [R] Goodreader: Scrape and Analyze 'Goodreads' Book Data +Message-ID: + +Dear R Users, + +I am pleased to announce that Goodreader 0.1.1 is now available on CRAN. + +Goodreader offers a toolkit for scraping and analyzing book data from +Goodreads. Users can search for books, scrape detailed information and +reviews, perform sentiment analysis on reviews, and conduct topic modeling. + +Here?s a quick overview of how to use Goodreader: +# Search for books +AI_df <- search_goodreads(search_term = "artificial intelligence", +search_in = "title", num_books = 10, sort_by = "ratings") + +# Retrieve Book IDs and save them to a text file +get_book_ids(input_data = AI_df, file_name = "AI_books.txt") + +# Get book-related information +scrape_books(book_ids_path = "AI_books.txt") + +# Scrape book reviews +scrape_reviews(book_ids_path = "AI_books.txt", num_reviews = 10) + +For more details, please visit: https://liu-chao.site/Goodreader/ + +Best regards, + +Chao Liu + + [[alternative HTML version deleted]] + + +From bog@@o@chr|@to|er @end|ng |rom gm@||@com Wed Sep 4 01:15:11 2024 +From: bog@@o@chr|@to|er @end|ng |rom gm@||@com (Christofer Bogaso) +Date: Wed, 4 Sep 2024 04:45:11 +0530 +Subject: [R] How R calculates SE of prediction for Logistic regression? 
+Message-ID: 
+
+Hi,
+
+I have the logistic regression below:
+
+Dat =
+read.csv('https://raw.githubusercontent.com/sam16tyagi/Machine-Learning-techniques-in-python/master/logistic%20regression%20dataset-Social_Network_Ads.csv')
+head(Dat)
+Model = glm(Purchased ~ Gender, data = Dat, family = binomial())
+
+How can I get the standard error of the forecasts, as in
+
+head(predict(Model, type="response", se.fit = T)$se.fit)
+
+My question: given that in logistic regression the logit link is used,
+how does R calculate the SE for the predicted probability from the VCV
+matrix of the estimated coefficients?
+
+Does R use some approximation like the delta method?
+
+
+From |@go@g|ne @end|ng |rom @jd@e@ Wed Sep 4 07:23:48 2024
+From: |@go@g|ne @end|ng |rom @jd@e@ (=?iso-8859-1?Q?Iago_Gin=E9_V=E1zquez?=)
+Date: Wed, 4 Sep 2024 05:23:48 +0000
+Subject: [R] fixed set.seed + kmeans output disagree on distinct platforms
+Message-ID: 
+
+Hi all,
+
+I build a dataset by processing the same data in the same way on Windows and on Linux.
+
+The output of the Windows processing is: https://gitlab.com/iagogv/repdata/-/raw/main/exdata.csv?ref_type=heads
+The output of the Linux processing is: https://gitlab.com/iagogv/repdata/-/raw/main/exdata2.csv?ref_type=heads
+
+exdata=as.matrix(read.csv("https://gitlab.com/iagogv/repdata/-/raw/main/exdata.csv?ref_type=heads", header=FALSE))
+exdata2=as.matrix(read.csv("https://gitlab.com/iagogv/repdata/-/raw/main/exdata2.csv?ref_type=heads", header=FALSE))
+
+They are not identical (`identical(exdata,exdata2)` is FALSE), but they are essentially equal (`all.equal(exdata,exdata2)` is TRUE).
If I run + +set.seed(20232260) +exkmns <- kmeans(exdata, centers = 7, iter.max = 2000, nstart = 750) + +I get + +exkmns$centers + V1 V2 V3 V4 V5 V6 +1 -0.4910731 -0.2662055 0.57928758 0.14267293 -0.03013791 0.106472717 +2 0.5301237 0.2815620 -0.23898532 1.00979412 -0.26123328 0.068099931 +3 0.2255298 -0.5165964 -0.02498471 -0.20438275 -0.41224195 -0.107538855 +4 -0.2616257 0.5680582 0.55387437 -0.09562789 -0.01706577 -0.028248679 +5 -0.4820078 -0.1667370 -0.46533618 -0.05271446 0.05477352 0.005236259 +6 0.6455994 -0.1396674 0.05988547 -0.15557399 0.62766365 0.031051986 +7 0.1072127 0.5538876 -0.33117098 -0.43209203 -0.18646403 -0.081273130 + +both in Windows (1) and in Linux (2, 3) up to rows order. If I run in Linux in my computer (2) + +set.seed(20232260) +exkmns2 <- kmeans(exdata2, centers = 7, iter.max = 2000, nstart = 750) + +then, I get + +exkmns2$centers + V1 V2 V3 V4 V5 V6 +1 0.64559941 -0.1396674 0.05988547 -0.15557399 0.62766365 0.03105199 +2 -0.26162573 0.5680582 0.55387437 -0.09562789 -0.01706577 -0.02824868 +3 0.53012369 0.2815620 -0.23898532 1.00979412 -0.26123328 0.06809993 +4 0.03409765 0.3492520 -0.36910409 -0.40721418 -0.21482793 0.03073180 +5 -0.58527394 -0.1790337 -0.46778956 0.03573883 0.15473589 -0.07980379 +6 -0.49107314 -0.2662055 0.57928758 0.14267293 -0.03013791 0.10647272 +7 0.22552984 -0.5165964 -0.02498471 -0.20438275 -0.41224195 -0.10753886 + +therefore, all rows essentially equal except for rows 5 and 7 of first dataset (5 and 4 of second dataset). 
With a bit more detail:
+
+ * Row 0.2255298 -0.5165964 -0.02498471 -0.20438275 -0.41224195 -0.107538855 belongs to exdata (and exdata2) and is a center in both outputs.
+ * Row 0.1072127 0.5538876 -0.33117098 -0.43209203 -0.18646403 -0.081273130 belongs to the dataset and is a center only in the exdata output.
+ * Row -0.4820078 -0.1667370 -0.46533618 -0.05271446 0.05477352 0.005236259 does not belong to the dataset and is a center only in the exdata output.
+ * Row -0.58527394 -0.1790337 -0.46778956 0.03573883 0.15473589 -0.07980379 belongs to the dataset and is a center only for exdata2 on Linux on my computer.
+ * Row 0.03409765 0.3492520 -0.36910409 -0.40721418 -0.21482793 0.03073180 does not belong to the dataset and is a center only for exdata2 on Linux on my computer.
+ * All other 4 rows (1, 2, 4 and 6 of the first output) do not belong to the dataset and are common centers.
+
+Even further, if I run
+
+set.seed(20232260)
+exkmns <- kmeans(exdata, centers = 7, iter.max = 2000, nstart = 750)
+
+in posit.cloud (3), I get the same result as above. However, if I run (either in posit.cloud or in Windows)
+
+set.seed(20232260)
+exkmns2 <- kmeans(exdata2, centers = 7, iter.max = 2000, nstart = 750)
+
+then I get
+
+
+exkmns2$centers
+          V1         V2          V3         V4          V5          V6
+1  0.6426035 -0.1449498  0.05843435 -0.1527968  0.62943077  0.02984948
+2 -0.4092382 -0.3740695  0.69597037  0.1956896 -0.05026200 -0.01453132
+3  0.1072127  0.5538876 -0.33117098 -0.4320920 -0.18646403 -0.08127313
+4  0.2255298 -0.5165964 -0.02498471 -0.2043827 -0.41224195 -0.10753886
+5  0.5301237  0.2815620 -0.23898532  1.0097941 -0.26123328  0.06809993
+6 -0.5223387 -0.1484517 -0.38982567 -0.0341488  0.06446446  0.03622056
+7 -0.2701703  0.5263218  0.52942311 -0.1112202 -0.03460591  0.03577287
+
+So only its rows 4 and 5 are centers common to both of the previous outputs, and row 3 is common with the exdata centers.
+
+Does all this make any sense?
+
+Thanks!
+ +Iago + +(1) +R version 4.4.1 (2024-06-14 ucrt) +Platform: x86_64-w64-mingw32/x64 +Running under: Windows 10 x64 (build 19045) + +Matrix products: default + +(2) +R version 4.4.1 (2024-06-14) +Platform: x86_64-pc-linux-gnu +Running under: Debian GNU/Linux 12 (bookworm) + +Matrix products: default +BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 +LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.21.so; LAPACK version 3.11.0 + +(3) +R version 4.4.1 (2024-06-14) +Platform: x86_64-pc-linux-gnu +Running under: Ubuntu 20.04.6 LTS + + Matrix products: default +BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so; LAPACK version 3.9.0 + + + + + [[alternative HTML version deleted]] + + +From bgunter@4567 @end|ng |rom gm@||@com Wed Sep 4 08:32:25 2024 +From: bgunter@4567 @end|ng |rom gm@||@com (Bert Gunter) +Date: Tue, 3 Sep 2024 23:32:25 -0700 +Subject: [R] + fixed set.seed + kmeans output disagree on distinct platforms +In-Reply-To: +References: +Message-ID: + +I have no clue, but I did note that you are using different versions of +BLAS/LAPACK on the different platforms. Could that be (part) of the issue? + +Cheers, +Bert + +On Tue, Sep 3, 2024 at 10:24?PM Iago Gin? V?zquez wrote: + +> Hi all, +> +> I build a dataset processing in the same way the same data in Windows than +> in Linux. +> +> The output of Windows processing is: +> https://gitlab.com/iagogv/repdata/-/raw/main/exdata.csv?ref_type=heads +> The output of Linux processing is: +> https://gitlab.com/iagogv/repdata/-/raw/main/exdata2.csv?ref_type=heads +> +> exdata=as.matrix(read.csv(" +> https://gitlab.com/iagogv/repdata/-/raw/main/exdata.csv?ref_type=heads", +> header=FALSE)) +> exdata2=as.matrix(read.csv(" +> https://gitlab.com/iagogv/repdata/-/raw/main/exdata2.csv?ref_type=heads", +> header=FALSE)) +> +> They are not identical (`identical(exdata,exdata2)` is FALSE), but they +> are essentially equal (`all.equal(exdata,exdata2)` is TRUE). 
If I run +> +> set.seed(20232260) +> exkmns <- kmeans(exdata, centers = 7, iter.max = 2000, nstart = 750) +> +> I get +> +> exkmns$centers +> V1 V2 V3 V4 V5 V6 +> 1 -0.4910731 -0.2662055 0.57928758 0.14267293 -0.03013791 0.106472717 +> 2 0.5301237 0.2815620 -0.23898532 1.00979412 -0.26123328 0.068099931 +> 3 0.2255298 -0.5165964 -0.02498471 -0.20438275 -0.41224195 -0.107538855 +> 4 -0.2616257 0.5680582 0.55387437 -0.09562789 -0.01706577 -0.028248679 +> 5 -0.4820078 -0.1667370 -0.46533618 -0.05271446 0.05477352 0.005236259 +> 6 0.6455994 -0.1396674 0.05988547 -0.15557399 0.62766365 0.031051986 +> 7 0.1072127 0.5538876 -0.33117098 -0.43209203 -0.18646403 -0.081273130 +> +> both in Windows (1) and in Linux (2, 3) up to rows order. If I run in +> Linux in my computer (2) +> +> set.seed(20232260) +> exkmns2 <- kmeans(exdata2, centers = 7, iter.max = 2000, nstart = 750) +> +> then, I get +> +> exkmns2$centers +> V1 V2 V3 V4 V5 V6 +> 1 0.64559941 -0.1396674 0.05988547 -0.15557399 0.62766365 0.03105199 +> 2 -0.26162573 0.5680582 0.55387437 -0.09562789 -0.01706577 -0.02824868 +> 3 0.53012369 0.2815620 -0.23898532 1.00979412 -0.26123328 0.06809993 +> 4 0.03409765 0.3492520 -0.36910409 -0.40721418 -0.21482793 0.03073180 +> 5 -0.58527394 -0.1790337 -0.46778956 0.03573883 0.15473589 -0.07980379 +> 6 -0.49107314 -0.2662055 0.57928758 0.14267293 -0.03013791 0.10647272 +> 7 0.22552984 -0.5165964 -0.02498471 -0.20438275 -0.41224195 -0.10753886 +> +> therefore, all rows essentially equal except for rows 5 and 7 of first +> dataset (5 and 4 of second dataset). 
With a bit more detail: +> +> * +> Row 0.2255298 -0.5165964 -0.02498471 -0.20438275 -0.41224195 -0.107538855 +> belongs to exdata (and exdata2) and is center of both outputs +> * +> Row 0.1072127 0.5538876 -0.33117098 -0.43209203 -0.18646403 -0.081273130 +> belongs to the dataset and it is only center of exdata output +> * +> Row -0.4820078 -0.1667370 -0.46533618 -0.05271446 0.05477352 0.005236259 +> does not belong to the dataset and it is only center of exdata output +> * +> Row -0.58527394 -0.1790337 -0.46778956 0.03573883 0.15473589 -0.07980379 +> belongs to the dataset and it is only center for exdata2 on Linux in my +> computer +> * +> Row 0.03409765 0.3492520 -0.36910409 -0.40721418 -0.21482793 0.03073180 +> does not belong to the dataset and it is only center for exdata2 on Linux +> in my computer +> * +> All other 4 rows (1,2,4 and 6 of first output) do not belong to the +> dataset and are common centers. +> +> Even, further, if I run +> +> set.seed(20232260) +> exkmns <- kmeans(exdata, centers = 7, iter.max = 2000, nstart = 750) +> +> in posit.cloud (3), I get the same result than above. However, if I run +> (both in posit.cloud or in Windows) +> +> set.seed(20232260) +> exkmns2 <- kmeans(exdata2, centers = 7, iter.max = 2000, nstart = 750) +> +> then I get +> +> +> exkmns2$centers +> V1 V2 V3 V4 V5 V6 +> 1 0.6426035 -0.1449498 0.05843435 -0.1527968 0.62943077 0.02984948 +> 2 -0.4092382 -0.3740695 0.69597037 0.1956896 -0.05026200 -0.01453132 +> 3 0.1072127 0.5538876 -0.33117098 -0.4320920 -0.18646403 -0.08127313 +> 4 0.2255298 -0.5165964 -0.02498471 -0.2043827 -0.41224195 -0.10753886 +> 5 0.5301237 0.2815620 -0.23898532 1.0097941 -0.26123328 0.06809993 +> 6 -0.5223387 -0.1484517 -0.38982567 -0.0341488 0.06446446 0.03622056 +> 7 -0.2701703 0.5263218 0.52942311 -0.1112202 -0.03460591 0.03577287 +> +> So only its rows 4 and 5 are common centers to both of previous outputs +> and row 3 is common width exdata centers. +> +> Does all this have any sense? 
+> +> Thanks! +> +> Iago +> +> (1) +> R version 4.4.1 (2024-06-14 ucrt) +> Platform: x86_64-w64-mingw32/x64 +> Running under: Windows 10 x64 (build 19045) +> +> Matrix products: default +> +> (2) +> R version 4.4.1 (2024-06-14) +> Platform: x86_64-pc-linux-gnu +> Running under: Debian GNU/Linux 12 (bookworm) +> +> Matrix products: default +> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 +> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.21.so; +> LAPACK version 3.11.0 +> +> (3) +> R version 4.4.1 (2024-06-14) +> Platform: x86_64-pc-linux-gnu +> Running under: Ubuntu 20.04.6 LTS +> +> Matrix products: default +> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/ +> libopenblasp-r0.3.8.so; LAPACK version 3.9.0 +> +> +> +> +> [[alternative HTML version deleted]] +> +> ______________________________________________ +> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> https://stat.ethz.ch/mailman/listinfo/r-help +> PLEASE do read the posting guide +> https://www.R-project.org/posting-guide.html +> and provide commented, minimal, self-contained, reproducible code. +> + + [[alternative HTML version deleted]] + + +From m@ech|er @end|ng |rom @t@t@m@th@ethz@ch Wed Sep 4 10:41:34 2024 +From: m@ech|er @end|ng |rom @t@t@m@th@ethz@ch (Martin Maechler) +Date: Wed, 4 Sep 2024 10:41:34 +0200 +Subject: [R] + fixed set.seed + kmeans output disagree on distinct platforms +In-Reply-To: +References: + +Message-ID: <26328.7486.699492.779127@stat.math.ethz.ch> + +>>>>> Bert Gunter +>>>>> on Tue, 3 Sep 2024 23:32:25 -0700 writes: + + > I have no clue, but I did note that you are using different versions of + > BLAS/LAPACK on the different platforms. Could that be (part) of the issue? + +Good catch! My gut feeling would say "yes!" that is almost surely part of +the issue. + + > Cheers, + > Bert + +Additionally, careful reading of the help page (*before* any post ..) 
+would have shown + + Note: + + The clusters are numbered in the returned object, but they are a + _set_ and no ordering is implied. (Their apparent ordering may + differ by platform.) + + +Martin + + + + > On Tue, Sep 3, 2024 at 10:24?PM Iago Gin? V?zquez wrote: + + >> Hi all, + >> + >> I build a dataset processing in the same way the same data in Windows than + >> in Linux. + >> + >> The output of Windows processing is: + >> https://gitlab.com/iagogv/repdata/-/raw/main/exdata.csv?ref_type=heads + >> The output of Linux processing is: + >> https://gitlab.com/iagogv/repdata/-/raw/main/exdata2.csv?ref_type=heads + >> + >> exdata=as.matrix(read.csv(" + >> https://gitlab.com/iagogv/repdata/-/raw/main/exdata.csv?ref_type=heads", + >> header=FALSE)) + >> exdata2=as.matrix(read.csv(" + >> https://gitlab.com/iagogv/repdata/-/raw/main/exdata2.csv?ref_type=heads", + >> header=FALSE)) + >> + >> They are not identical (`identical(exdata,exdata2)` is FALSE), but they + >> are essentially equal (`all.equal(exdata,exdata2)` is TRUE). If I run + >> + >> set.seed(20232260) + >> exkmns <- kmeans(exdata, centers = 7, iter.max = 2000, nstart = 750) + >> + >> I get + >> + >> exkmns$centers + >> V1 V2 V3 V4 V5 V6 + >> 1 -0.4910731 -0.2662055 0.57928758 0.14267293 -0.03013791 0.106472717 + >> 2 0.5301237 0.2815620 -0.23898532 1.00979412 -0.26123328 0.068099931 + >> 3 0.2255298 -0.5165964 -0.02498471 -0.20438275 -0.41224195 -0.107538855 + >> 4 -0.2616257 0.5680582 0.55387437 -0.09562789 -0.01706577 -0.028248679 + >> 5 -0.4820078 -0.1667370 -0.46533618 -0.05271446 0.05477352 0.005236259 + >> 6 0.6455994 -0.1396674 0.05988547 -0.15557399 0.62766365 0.031051986 + >> 7 0.1072127 0.5538876 -0.33117098 -0.43209203 -0.18646403 -0.081273130 + >> + >> both in Windows (1) and in Linux (2, 3) up to rows order. 
If I run in + >> Linux in my computer (2) + >> + >> set.seed(20232260) + >> exkmns2 <- kmeans(exdata2, centers = 7, iter.max = 2000, nstart = 750) + >> + >> then, I get + >> + >> exkmns2$centers + >> V1 V2 V3 V4 V5 V6 + >> 1 0.64559941 -0.1396674 0.05988547 -0.15557399 0.62766365 0.03105199 + >> 2 -0.26162573 0.5680582 0.55387437 -0.09562789 -0.01706577 -0.02824868 + >> 3 0.53012369 0.2815620 -0.23898532 1.00979412 -0.26123328 0.06809993 + >> 4 0.03409765 0.3492520 -0.36910409 -0.40721418 -0.21482793 0.03073180 + >> 5 -0.58527394 -0.1790337 -0.46778956 0.03573883 0.15473589 -0.07980379 + >> 6 -0.49107314 -0.2662055 0.57928758 0.14267293 -0.03013791 0.10647272 + >> 7 0.22552984 -0.5165964 -0.02498471 -0.20438275 -0.41224195 -0.10753886 + >> + >> therefore, all rows essentially equal except for rows 5 and 7 of first + >> dataset (5 and 4 of second dataset). With a bit more detail: + >> + >> * + >> Row 0.2255298 -0.5165964 -0.02498471 -0.20438275 -0.41224195 -0.107538855 + >> belongs to exdata (and exdata2) and is center of both outputs + >> * + >> Row 0.1072127 0.5538876 -0.33117098 -0.43209203 -0.18646403 -0.081273130 + >> belongs to the dataset and it is only center of exdata output + >> * + >> Row -0.4820078 -0.1667370 -0.46533618 -0.05271446 0.05477352 0.005236259 + >> does not belong to the dataset and it is only center of exdata output + >> * + >> Row -0.58527394 -0.1790337 -0.46778956 0.03573883 0.15473589 -0.07980379 + >> belongs to the dataset and it is only center for exdata2 on Linux in my + >> computer + >> * + >> Row 0.03409765 0.3492520 -0.36910409 -0.40721418 -0.21482793 0.03073180 + >> does not belong to the dataset and it is only center for exdata2 on Linux + >> in my computer + >> * + >> All other 4 rows (1,2,4 and 6 of first output) do not belong to the + >> dataset and are common centers. 
+ >> + >> Even, further, if I run + >> + >> set.seed(20232260) + >> exkmns <- kmeans(exdata, centers = 7, iter.max = 2000, nstart = 750) + >> + >> in posit.cloud (3), I get the same result than above. However, if I run + >> (both in posit.cloud or in Windows) + >> + >> set.seed(20232260) + >> exkmns2 <- kmeans(exdata2, centers = 7, iter.max = 2000, nstart = 750) + >> + >> then I get + >> + >> + >> exkmns2$centers + >> V1 V2 V3 V4 V5 V6 + >> 1 0.6426035 -0.1449498 0.05843435 -0.1527968 0.62943077 0.02984948 + >> 2 -0.4092382 -0.3740695 0.69597037 0.1956896 -0.05026200 -0.01453132 + >> 3 0.1072127 0.5538876 -0.33117098 -0.4320920 -0.18646403 -0.08127313 + >> 4 0.2255298 -0.5165964 -0.02498471 -0.2043827 -0.41224195 -0.10753886 + >> 5 0.5301237 0.2815620 -0.23898532 1.0097941 -0.26123328 0.06809993 + >> 6 -0.5223387 -0.1484517 -0.38982567 -0.0341488 0.06446446 0.03622056 + >> 7 -0.2701703 0.5263218 0.52942311 -0.1112202 -0.03460591 0.03577287 + >> + >> So only its rows 4 and 5 are common centers to both of previous outputs + >> and row 3 is common width exdata centers. + >> + >> Does all this have any sense? + >> + >> Thanks! 
+ >> + >> Iago + >> + >> (1) + >> R version 4.4.1 (2024-06-14 ucrt) + >> Platform: x86_64-w64-mingw32/x64 + >> Running under: Windows 10 x64 (build 19045) + >> + >> Matrix products: default + >> + >> (2) + >> R version 4.4.1 (2024-06-14) + >> Platform: x86_64-pc-linux-gnu + >> Running under: Debian GNU/Linux 12 (bookworm) + >> + >> Matrix products: default + >> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 + >> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.21.so; + >> LAPACK version 3.11.0 + >> + >> (3) + >> R version 4.4.1 (2024-06-14) + >> Platform: x86_64-pc-linux-gnu + >> Running under: Ubuntu 20.04.6 LTS + >> + >> Matrix products: default + >> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/ + >> libopenblasp-r0.3.8.so; LAPACK version 3.9.0 + >> + + +From |@go@g|ne @end|ng |rom @jd@e@ Wed Sep 4 11:18:55 2024 +From: |@go@g|ne @end|ng |rom @jd@e@ (=?utf-8?B?SWFnbyBHaW7DqSBWw6F6cXVleg==?=) +Date: Wed, 4 Sep 2024 09:18:55 +0000 +Subject: [R] + fixed set.seed + kmeans output disagree on distinct platforms +In-Reply-To: <26328.7486.699492.779127@stat.math.ethz.ch> +References: + + <26328.7486.699492.779127@stat.math.ethz.ch> +Message-ID: + +Thanks both Bert and Martin, + +However exkmns2$centers is common in posit.cloud - LAPACK version 3.9.0- and in Windows -LAPACK 3.12.0-, while distinct with my Linux settings -LAPACK version 3.11.0- (I don't know the BLAS version used by R in windows). It is a bit strange... + +Iago + +________________________________ +De: Martin Maechler +Enviat el: dimecres, 4 de setembre de 2024 10:41 +Per a: Bert Gunter +A/c: Iago Gin? V?zquez ; r-help at r-project.org +Tema: Re: [R] fixed set.seed + kmeans output disagree on distinct platforms + +>>>>> Bert Gunter +>>>>> on Tue, 3 Sep 2024 23:32:25 -0700 writes: + + > I have no clue, but I did note that you are using different versions of + > BLAS/LAPACK on the different platforms. Could that be (part) of the issue? + +Good catch! 
My gut feeling would say "yes!" that is almost surely part of +the issue. + + > Cheers, + > Bert + +Additionally, careful reading of the help page (*before* any post ..) +would have shown + + Note: + + The clusters are numbered in the returned object, but they are a + _set_ and no ordering is implied. (Their apparent ordering may + differ by platform.) + + +Martin + + + + > On Tue, Sep 3, 2024 at 10:24?PM Iago Gin? V?zquez wrote: + + >> Hi all, + >> + >> I build a dataset processing in the same way the same data in Windows than + >> in Linux. + >> + >> The output of Windows processing is: + >> https://gitlab.com/iagogv/repdata/-/raw/main/exdata.csv?ref_type=heads + >> The output of Linux processing is: + >> https://gitlab.com/iagogv/repdata/-/raw/main/exdata2.csv?ref_type=heads + >> + >> exdata=as.matrix(read.csv(" + >> https://gitlab.com/iagogv/repdata/-/raw/main/exdata.csv?ref_type=heads", + >> header=FALSE)) + >> exdata2=as.matrix(read.csv(" + >> https://gitlab.com/iagogv/repdata/-/raw/main/exdata2.csv?ref_type=heads", + >> header=FALSE)) + >> + >> They are not identical (`identical(exdata,exdata2)` is FALSE), but they + >> are essentially equal (`all.equal(exdata,exdata2)` is TRUE). If I run + >> + >> set.seed(20232260) + >> exkmns <- kmeans(exdata, centers = 7, iter.max = 2000, nstart = 750) + >> + >> I get + >> + >> exkmns$centers + >> V1 V2 V3 V4 V5 V6 + >> 1 -0.4910731 -0.2662055 0.57928758 0.14267293 -0.03013791 0.106472717 + >> 2 0.5301237 0.2815620 -0.23898532 1.00979412 -0.26123328 0.068099931 + >> 3 0.2255298 -0.5165964 -0.02498471 -0.20438275 -0.41224195 -0.107538855 + >> 4 -0.2616257 0.5680582 0.55387437 -0.09562789 -0.01706577 -0.028248679 + >> 5 -0.4820078 -0.1667370 -0.46533618 -0.05271446 0.05477352 0.005236259 + >> 6 0.6455994 -0.1396674 0.05988547 -0.15557399 0.62766365 0.031051986 + >> 7 0.1072127 0.5538876 -0.33117098 -0.43209203 -0.18646403 -0.081273130 + >> + >> both in Windows (1) and in Linux (2, 3) up to rows order. 
If I run in + >> Linux in my computer (2) + >> + >> set.seed(20232260) + >> exkmns2 <- kmeans(exdata2, centers = 7, iter.max = 2000, nstart = 750) + >> + >> then, I get + >> + >> exkmns2$centers + >> V1 V2 V3 V4 V5 V6 + >> 1 0.64559941 -0.1396674 0.05988547 -0.15557399 0.62766365 0.03105199 + >> 2 -0.26162573 0.5680582 0.55387437 -0.09562789 -0.01706577 -0.02824868 + >> 3 0.53012369 0.2815620 -0.23898532 1.00979412 -0.26123328 0.06809993 + >> 4 0.03409765 0.3492520 -0.36910409 -0.40721418 -0.21482793 0.03073180 + >> 5 -0.58527394 -0.1790337 -0.46778956 0.03573883 0.15473589 -0.07980379 + >> 6 -0.49107314 -0.2662055 0.57928758 0.14267293 -0.03013791 0.10647272 + >> 7 0.22552984 -0.5165964 -0.02498471 -0.20438275 -0.41224195 -0.10753886 + >> + >> therefore, all rows essentially equal except for rows 5 and 7 of first + >> dataset (5 and 4 of second dataset). With a bit more detail: + >> + >> * + >> Row 0.2255298 -0.5165964 -0.02498471 -0.20438275 -0.41224195 -0.107538855 + >> belongs to exdata (and exdata2) and is center of both outputs + >> * + >> Row 0.1072127 0.5538876 -0.33117098 -0.43209203 -0.18646403 -0.081273130 + >> belongs to the dataset and it is only center of exdata output + >> * + >> Row -0.4820078 -0.1667370 -0.46533618 -0.05271446 0.05477352 0.005236259 + >> does not belong to the dataset and it is only center of exdata output + >> * + >> Row -0.58527394 -0.1790337 -0.46778956 0.03573883 0.15473589 -0.07980379 + >> belongs to the dataset and it is only center for exdata2 on Linux in my + >> computer + >> * + >> Row 0.03409765 0.3492520 -0.36910409 -0.40721418 -0.21482793 0.03073180 + >> does not belong to the dataset and it is only center for exdata2 on Linux + >> in my computer + >> * + >> All other 4 rows (1,2,4 and 6 of first output) do not belong to the + >> dataset and are common centers. 
+ >> + >> Even, further, if I run + >> + >> set.seed(20232260) + >> exkmns <- kmeans(exdata, centers = 7, iter.max = 2000, nstart = 750) + >> + >> in posit.cloud (3), I get the same result than above. However, if I run + >> (both in posit.cloud or in Windows) + >> + >> set.seed(20232260) + >> exkmns2 <- kmeans(exdata2, centers = 7, iter.max = 2000, nstart = 750) + >> + >> then I get + >> + >> + >> exkmns2$centers + >> V1 V2 V3 V4 V5 V6 + >> 1 0.6426035 -0.1449498 0.05843435 -0.1527968 0.62943077 0.02984948 + >> 2 -0.4092382 -0.3740695 0.69597037 0.1956896 -0.05026200 -0.01453132 + >> 3 0.1072127 0.5538876 -0.33117098 -0.4320920 -0.18646403 -0.08127313 + >> 4 0.2255298 -0.5165964 -0.02498471 -0.2043827 -0.41224195 -0.10753886 + >> 5 0.5301237 0.2815620 -0.23898532 1.0097941 -0.26123328 0.06809993 + >> 6 -0.5223387 -0.1484517 -0.38982567 -0.0341488 0.06446446 0.03622056 + >> 7 -0.2701703 0.5263218 0.52942311 -0.1112202 -0.03460591 0.03577287 + >> + >> So only its rows 4 and 5 are common centers to both of previous outputs + >> and row 3 is common width exdata centers. + >> + >> Does all this have any sense? + >> + >> Thanks! 
+ >>
+ >> Iago
+ >>
+ >> (1)
+ >> R version 4.4.1 (2024-06-14 ucrt)
+ >> Platform: x86_64-w64-mingw32/x64
+ >> Running under: Windows 10 x64 (build 19045)
+ >>
+ >> Matrix products: default
+ >>
+ >> (2)
+ >> R version 4.4.1 (2024-06-14)
+ >> Platform: x86_64-pc-linux-gnu
+ >> Running under: Debian GNU/Linux 12 (bookworm)
+ >>
+ >> Matrix products: default
+ >> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
+ >> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.21.so;
+ >> LAPACK version 3.11.0
+ >>
+ >> (3)
+ >> R version 4.4.1 (2024-06-14)
+ >> Platform: x86_64-pc-linux-gnu
+ >> Running under: Ubuntu 20.04.6 LTS
+ >>
+ >> Matrix products: default
+ >> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/
+ >> libopenblasp-r0.3.8.so; LAPACK version 3.9.0
+ >>
+
+ [[alternative HTML version deleted]]
+
+
+From @nupty@g| @end|ng |rom gm@||@com Wed Sep 4 15:16:18 2024
+From: @nupty@g| @end|ng |rom gm@||@com (Anupam Tyagi)
+Date: Wed, 4 Sep 2024 18:46:18 +0530
+Subject: [R] dotchart and dotplot(lattice) plot with two/three conditioning
+ variables
+Message-ID: 
+
+Hello, I am trying to make a Cleveland dotplot with two, if possible
+three, variables on the vertical axis. I was able to do it in Stata
+with two variables, Year and Population (see graph at the link:
+https://drive.google.com/file/d/1SiIfmmqk6IFa_OI5i26Ux1ZxkN2oek-o/view?usp=sharing
+). I hope the link to the graph works. I have never tried this before.
+
+I want to make a similar (possibly better) graph in R. I tried several
+ways to make it in R with dotchart() and dotplot(lattice). I have been
+only partially successful thus far. I would like Year, Population and
+popGroup on the vertical axis. If popGroup occupies too much space,
+then I would like a gap between the groups of Cities and Villages, so
+they can be seen as distinct "Populations".
My code and made-up data
+are below (in the actual data I have 18 categories in "Population",
+instead of only six in the made-up data). How can I make this type of
+graph?
+
+# Only for 2004-05. How to plot 2011-12 on the same plot?
+dotchart(test$"X0_50"[test$"Year"=="2004-05"], labels=test$Population,
+xlab = "Income Share ",
+         main = "Income shares of percentiles of population", xlim = c(12, 50))
+points(test$"X50_90"[test$"Year"=="2004-05"], 1:6, pch = 2)
+points(test$"X90_100"[test$"Year"=="2004-05"], 1:6, pch = 16)
+legend(x = "topleft",
+       legend = c("0-50%", "50-90%", "90-100%"),
+       pch = c(1, 2, 16)
+)
+
+# reorder so Year 2004-05 is plotted before Year 2011-12. This is not
+# plotting correctly for the second and third variables. The gap between
+# different Cities and Villages is quite large.
+test2 <- test[order(test$seqCode, test$Year, decreasing = T),]
+
+dotchart(test2$"X0_50", labels=test2$Year, xlab = "Income Share ",
+         main = "Income shares of percentiles of population", groups =
+as.factor(test2$Population), xlim = c(12, 50))
+points(test2$"X50_90", 1:12, pch = 2)
+points(test2$"X90_100", 1:12, pch = 16)
+
+
+# use lattice library
+library(lattice)
+dotplot(reorder(Population, -seqCode) ~ test$"X0_50" + test$"X50_90" +
+test$"X90_100", data = test, auto.key = TRUE)
+
+testLong <- reshape(test, idvar = c("Population", "Year"), varying = list(5:7),
+                    v.names = "ptile", direction = "long")
+
+dotplot(reorder(Population, -seqCode) ~ ptile | Year, data = testLong,
+groups = time, auto.key = T)
+
+Dataframe is below using dput(). Dataframe is named "test" in my code.
+ +structure(list(seqCode = c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, +4L, 5L, 6L), popGroup = c("City", "City", "City", "Village", +"Village", "Village", "City", "City", "City", "Village", "Village", +"Village"), Population = c("Dallas", "Boston", "Chicago", "Kip", +"Von", "Dan", "Dallas", "Boston", "Chicago", "Kip", "Von", "Dan" +), Year = c("2004-05", "2004-05", "2004-05", "2004-05", "2004-05", +"2004-05", "2011-12", "2011-12", "2011-12", "2011-12", "2011-12", +"2011-12"), X0_50 = c(15.47, 21.29, 18.04, 15.62, 18.89, 24.37, +17.43, 17.99, 18.04, 14.95, 16.33, 28.98), X50_90 = c(44.12, +43.25, 45.72, 46.15, 43.84, 46.24, 44.39, 44.08, 43.62, 42.89, +44.57, 47.14), X90_100 = c(40.42, 35.47, 36.24, 38.24, 37.27, +29.39, 38.18, 37.93, 38.34, 42.16, 39.11, 23.88)), class = +"data.frame", row.names = c(NA, +-12L)) + +-- +Anupam. + + +From deep@y@n@@@rk@r @end|ng |rom gm@||@com Wed Sep 4 16:31:40 2024 +From: deep@y@n@@@rk@r @end|ng |rom gm@||@com (Deepayan Sarkar) +Date: Wed, 4 Sep 2024 20:01:40 +0530 +Subject: [R] + dotchart and dotplot(lattice) plot with two/three conditioning + variables +In-Reply-To: +References: +Message-ID: + +For lattice::dotplot(), you are close; this is more like the layout you +want: + +dotplot(Year ~ ptile | reorder(Population, ptile, mean), testLong, + groups = c("0-50", "50-90", "90-100")[time], + layout = c(1, NA), + par.settings = simpleTheme(pch = 16), auto.key = TRUE) + +dotchart() works better with tables, but unfortunately it doesn't seem to +handle more than two dimensions, so you can only get one group at a time: + +xtabs(ptile ~ Year + Population, testLong, subset = time == 1) |> +dotchart(pch = 16) + +This seems like something that should not be too difficult to improve. + +Best, +-Deepayan + + +On Wed, 4 Sept 2024 at 18:46, Anupam Tyagi wrote: + +> Hello, I am trying to make a Cleaveland Dotplot with two, if possible +> three, variables on the vertical axis. 
I was able to do it in Stata +> with two variables, Year and Population (see graph at the link: +> +> https://drive.google.com/file/d/1SiIfmmqk6IFa_OI5i26Ux1ZxkN2oek-o/view?usp=sharing +> ). I hope the link to the graph works. I have never tried this before. +> +> I want to make a similar (possibly better) graph in R. I tried several +> ways to make it in R with dotchart() and dotplot(lattice). I have been +> only partially successful thus far. I would like Year, Population and +> popGroup on the vertical axis. If popGroup occupies too much space, +> then I would like a gap between the groups of Cities and Villages, so +> they can be seen as distinct "Populations". My code and a made-up data +> are below (in actual data I have 18 categories in "Population", +> instead of only six in the made-up data). How can I make this type of +> graph? +> +> # Only for 2004-05. How to plot 2011-12 on the same plot? +> dotchart(test$"X0_50"[test$"Year"=="2004-05"], labels=test$Population, +> xlab = "Income Share ", +> main = "Income shares of percentiles of population", xlim = c(12, +> 50)) +> points(test$"X50_90"[test$"Year"=="2004-05"], 1:6, pch = 2) +> points(test$"X90_100"[test$"Year"=="2004-05"], 1:6, pch = 16) +> legend(x = "topleft", +> legend = c("0-50%", "50-90%", "90-100%"), +> pch = c(1,2, 16) +> ) +> +> # reorder so Year 2004-05 is plotted before Year 2011-12. This is not +> plotting correctly for +> # second and third variables. Gap between different Cities and +> Villages is quite a bit. 
+> test2 <- test[order(test$seqCode, test$Year, decreasing = T),] +> +> dotchart(test2$"X0_50", labels=test2$Year, xlab = "Income Share ", +> main = "Income shares of percentiles of population", groups = +> as.factor(test2$Population), xlim = c(12, 50)) +> points(test2$"X50_90", 1:12, pch = 2) +> points(test2$"X90_100", 1: 12, pch = 16) +> +> +> # use lattice library +> library(lattice) +> dotplot(reorder(Population, -seqCode) ~ test$"X0_50" + test$"X50_90" + +> test$"X90_100", data = test, auto.key = TRUE) +> +> testLong <- reshape(test, idvar = c("Population", "Year"), varying = +> list(5:7), +> v.names = "ptile", direction = "long") +> +> dotplot(reorder(Population, -seqCode) ~ ptile | Year, data = testLong, +> groups = time, auto.key = T) +> +> Dataframe is below using dput(). Dataframe is named "test" in my code. +> +> structure(list(seqCode = c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, +> 4L, 5L, 6L), popGroup = c("City", "City", "City", "Village", +> "Village", "Village", "City", "City", "City", "Village", "Village", +> "Village"), Population = c("Dallas", "Boston", "Chicago", "Kip", +> "Von", "Dan", "Dallas", "Boston", "Chicago", "Kip", "Von", "Dan" +> ), Year = c("2004-05", "2004-05", "2004-05", "2004-05", "2004-05", +> "2004-05", "2011-12", "2011-12", "2011-12", "2011-12", "2011-12", +> "2011-12"), X0_50 = c(15.47, 21.29, 18.04, 15.62, 18.89, 24.37, +> 17.43, 17.99, 18.04, 14.95, 16.33, 28.98), X50_90 = c(44.12, +> 43.25, 45.72, 46.15, 43.84, 46.24, 44.39, 44.08, 43.62, 42.89, +> 44.57, 47.14), X90_100 = c(40.42, 35.47, 36.24, 38.24, 37.27, +> 29.39, 38.18, 37.93, 38.34, 42.16, 39.11, 23.88)), class = +> "data.frame", row.names = c(NA, +> -12L)) +> +> -- +> Anupam. 
+>
+> ______________________________________________
+> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
+> https://stat.ethz.ch/mailman/listinfo/r-help
+> PLEASE do read the posting guide
+> https://www.R-project.org/posting-guide.html
+> and provide commented, minimal, self-contained, reproducible code.
+>
+
+ [[alternative HTML version deleted]]
+
+
+From d@n|e|obo9976 @end|ng |rom gm@||@com Wed Sep 4 20:54:15 2024
+From: d@n|e|obo9976 @end|ng |rom gm@||@com (Daniel Lobo)
+Date: Thu, 5 Sep 2024 00:24:15 +0530
+Subject: [R] Calculation of VCV matrix of estimated coefficient
+In-Reply-To: 
+References: 
+Message-ID: 
+
+Hi,
+
+I am trying to replicate R's result for the VCV matrix of estimated
+coefficients from a linear model, as below:
+
+data(mtcars)
+model <- lm(mpg~disp+hp, data=mtcars)
+model_summ <- summary(model)
+MSE = mean(model_summ$residuals^2)
+vcov(model)
+
+Now I want to calculate the same thing manually:
+
+library(dplyr)
+X = as.matrix(mtcars[, c('disp', 'hp')] %>% mutate(Intercept = 1));
+solve(t(X) %*% X) * MSE
+
+Unfortunately they do not match.
+
+Could you please help me see where I made a mistake, if any?
+ +Thanks + + +From bbo|ker @end|ng |rom gm@||@com Wed Sep 4 22:14:09 2024 +From: bbo|ker @end|ng |rom gm@||@com (Ben Bolker) +Date: Wed, 4 Sep 2024 16:14:09 -0400 +Subject: [R] Calculation of VCV matrix of estimated coefficient +In-Reply-To: +References: + +Message-ID: + +The number you need for MSE is + +sum(residuals(model)^2)/df.residual(model) + +On Wed, Sep 4, 2024 at 3:34?PM Daniel Lobo wrote: +> +> Hi, +> +> I am trying to replicate the R's result for VCV matrix of estimated +> coefficients from linear model as below +> +> data(mtcars) +> model <- lm(mpg~disp+hp, data=mtcars) +> model_summ <-summary(model) +> MSE = mean(model_summ$residuals^2) +> vcov(model) +> +> Now I want to calculate the same thing manually, +> +> library(dplyr) +> X = as.matrix(mtcars[, c('disp', 'hp')] %>% mutate(Intercept = 1)); +> solve(t(X) %*% X) * MSE +> +> Unfortunately they do not match. +> +> Could you please help where I made mistake, if any. +> +> Thanks +> +> ______________________________________________ +> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> https://stat.ethz.ch/mailman/listinfo/r-help +> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html +> and provide commented, minimal, self-contained, reproducible code. + + +From p@ych@o||u @end|ng |rom gm@||@com Tue Sep 3 18:30:54 2024 +From: p@ych@o||u @end|ng |rom gm@||@com (Chao Liu) +Date: Tue, 3 Sep 2024 12:30:54 -0400 +Subject: [R] [R-pkgs] Goodreader: Scrape and Analyze 'Goodreads' Book Data +Message-ID: + +Dear R Users, + +I am pleased to announce that Goodreader 0.1.1 is now available on CRAN. + +Goodreader offers a toolkit for scraping and analyzing book data from +Goodreads. Users can search for books, scrape detailed information and +reviews, perform sentiment analysis on reviews, and conduct topic modeling. 
+
+Here's a quick overview of how to use Goodreader:
+# Search for books
+AI_df <- search_goodreads(search_term = "artificial intelligence",
+search_in = "title", num_books = 10, sort_by = "ratings")
+
+# Retrieve Book IDs and save them to a text file
+get_book_ids(input_data = AI_df, file_name = "AI_books.txt")
+
+# Get book-related information
+scrape_books(book_ids_path = "AI_books.txt")
+
+# Scrape book reviews
+scrape_reviews(book_ids_path = "AI_books.txt", num_reviews = 10)
+
+For more details, please visit: https://liu-chao.site/Goodreader/
+
+Best regards,
+
+Chao Liu
+
+ [[alternative HTML version deleted]]
+
+_______________________________________________
+R-packages mailing list
+R-packages at r-project.org
+https://stat.ethz.ch/mailman/listinfo/r-packages
+
+
+From gdr@|@m@ @end|ng |rom x@4@||@n| Thu Sep 5 13:05:35 2024
+From: gdr@|@m@ @end|ng |rom x@4@||@n| (Gerrit Draisma)
+Date: Thu, 5 Sep 2024 13:05:35 +0200
+Subject: [R] lattice log scale labels.
+Message-ID: 
+
+Dear R-helpers,
+
+In the plot below I would like to have labels at positions 2^(3*(0:10)),
+and keep the labels in the exponential format.
+I tried using yscale.components.default.
+
+*This* gives the right format of the labels:
+--------
+ > yscale.components.default(lim= c(0,30),log=2)
+....
+$num.limit
+[1]  0 30
+...
+[1]  0  5 10 15 20 25 30
+...
+$left$labels$labels
+[1] "2^0"  "2^5"  "2^10" "2^15" "2^20" "2^25" "2^30"
+--------
+
+and *this* gives the right locations
+--------
+ > yscale.components.default(lim= c(0,30),log=2,at=2^(3*(0:10)))
+$num.limit
+[1]  0 30
+...
+ [1]  0  3  6  9 12 15 18 21 24 27 30
+...
+$left$labels$labels
+ [1] "1"          "8"          "64"         "512"        "4096"
+ [6] "32768"      "262144"     "2097152"    "16777216"   "134217728"
+[11] "1073741824"
+--------
+
+How can I get the format in the first example at the locations of the
+second?
+
+Thanks,
+Gerrit
+
+-------------
+x <- read.csv(text="
+n,c,t,u
+4,1,2,1
+8,28,8,1
+12,495,42,3
+16,8008,256,7
+20,125970,1680,31
+24,1961256,11640,138
+28,30421755,83776,808
+32,471435600,620576,4956
+36,7307872110,4700880,33719")
+library(lattice)
+yscale.components.log2 <- function(...){
+    ans <- yscale.components.default(...)
+    ans$left$labels$labels <-
+        parse(text = ans$left$labels$labels)
+    ans
+}
+
+
+# pdf("tangle_mathFig06.pdf")
+xyplot(c+t+u~n,data=x,type="b", xlab="Size", ylab="Number of tangles",
+    scales=list(x=list(at=4*(1:9)),y=list(log=2)),
+    yscale.components=yscale.components.log2,
+    auto.key=list(columns=3,text=c("(choose m n)","tangles","unique tangles"),
+    points=FALSE,lines=TRUE))
+# dev.off()
+
+
+From ggrothend|eck @end|ng |rom gm@||@com Thu Sep 5 16:36:54 2024
+From: ggrothend|eck @end|ng |rom gm@||@com (Gabor Grothendieck)
+Date: Thu, 5 Sep 2024 10:36:54 -0400
+Subject: [R] Calculation of VCV matrix of estimated coefficient
+In-Reply-To: 
+References: 
+	
+Message-ID: 
+
+sigma(model)^2 will give the correct MSE. Also note that your model
+matrix has the intercept at the end, whereas vcov() has it at the
+beginning, so you will need to permute the rows and columns to get
+them to be the same.
+
+On Wed, Sep 4, 2024 at 3:34 PM Daniel Lobo wrote:
+>
+> Hi,
+>
+> I am trying to replicate the R's result for VCV matrix of estimated
+> coefficients from linear model as below
+>
+> data(mtcars)
+> model <- lm(mpg~disp+hp, data=mtcars)
+> model_summ <-summary(model)
+> MSE = mean(model_summ$residuals^2)
+> vcov(model)
+>
+> Now I want to calculate the same thing manually,
+>
+> library(dplyr)
+> X = as.matrix(mtcars[, c('disp', 'hp')] %>% mutate(Intercept = 1));
+> solve(t(X) %*% X) * MSE
+>
+> Unfortunately they do not match.
+>
+> Could you please help where I made mistake, if any.
+> +> Thanks +> +> ______________________________________________ +> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> https://stat.ethz.ch/mailman/listinfo/r-help +> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html +> and provide commented, minimal, self-contained, reproducible code. + + + +-- +Statistics & Software Consulting +GKX Group, GKX Associates Inc. +tel: 1-877-GKX-GROUP +email: ggrothendieck at gmail.com + + +From bgunter@4567 @end|ng |rom gm@||@com Thu Sep 5 16:54:00 2024 +From: bgunter@4567 @end|ng |rom gm@||@com (Bert Gunter) +Date: Thu, 5 Sep 2024 07:54:00 -0700 +Subject: [R] lattice log scale labels. +In-Reply-To: +References: +Message-ID: + +Do the "at" and "labels" components of the "scales" list argument to xyplot +not do what you want? + +Cheers, +Bert + +On Thu, Sep 5, 2024 at 4:05?AM Gerrit Draisma wrote: + +> Dear R-helpers, +> +> In the plot below I would like to have labels at positions 2^(3*(0:10)), +> and keep the labels in the exponential format. +> I tried using yscale.components.default. +> +> *This* gives the right format of the labels: +> -------- +> > yscale.components.default(lim= c(0,30),log=2) +> .... +> $num.limit +> [1] 0 30 +> ... +> [1] 0 5 10 15 20 25 30 +> ... +> $left$labels$labels +> [1] "2^0" "2^5" "2^10" "2^15" "2^20" "2^25" "2^30" +> -------- +> +> and *this* gives the right locations +> -------- +> > yscale.components.default(lim= c(0,30),log=2,at=2^(3*(0:10))) +> $num.limit +> [1] 0 30 +> ... +> [1] 0 3 6 9 12 15 18 21 24 27 30 +> ... +> $left$labels$labels +> [1] "1" "8" "64" "512" "4096" +> [6] "32768" "262144" "2097152" "16777216" "134217728" +> [11] "1073741824" +> -------- +> +> How can I get the format in the first example at the locations of the +> second? 
+>
+> Thanks,
+> Gerrit
+>
+> -------------
+> x <- read.csv(text="
+> n,c,t,u
+> 4,1,2,1
+> 8,28,8,1
+> 12,495,42,3
+> 16,8008,256,7
+> 20,125970,1680,31
+> 24,1961256,11640,138
+> 28,30421755,83776,808
+> 32,471435600,620576,4956
+> 36,7307872110,4700880,33719")
+> library(lattice)
+> yscale.components.log2 <- function(...){
+>      ans <- yscale.components.default(...)
+>      ans$left$labels$labels <-
+>          parse(text = ans$left$labels$labels)
+>      ans
+> }
+>
+>
+> # pdf("tangle_mathFig06.pdf")
+> xyplot(c+t+u~n,data=x,type="b", xlab="Size", ylab="Number of tangles",
+>     scales=list(x=list(at=4*(1:9)),y=list(log=2)),
+>     yscale.components=yscale.components.log2,
+>     auto.key=list(columns=3,text=c("(choose m n)","tangles","unique
+> tangles"),
+>     points=FALSE,lines=TRUE))
+> # dev.off()
+>
+> ______________________________________________
+> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
+> https://stat.ethz.ch/mailman/listinfo/r-help
+> PLEASE do read the posting guide
+> https://www.R-project.org/posting-guide.html
+> and provide commented, minimal, self-contained, reproducible code.
+>
+
+ [[alternative HTML version deleted]]
+
+
+From gdr@|@m@ @end|ng |rom x@4@||@n| Thu Sep 5 19:01:24 2024
+From: gdr@|@m@ @end|ng |rom x@4@||@n| (Gerrit Draisma)
+Date: Thu, 5 Sep 2024 19:01:24 +0200
+Subject: [R] lattice log scale labels.
+In-Reply-To: 
+References: 
+ 
+Message-ID: <34aa1fe5-0160-4c34-8844-0145dc681b08@xs4all.nl>
+
+Thanks Greg and Bert for solving my problem.
+This gives what I want:
+----------------
+
+myticks <- 2^(3*(0:11))
+mylabels <- parse(text=paste0("2^",log2(myticks)))
+
+xyplot(c+t+u~n,data=x,type="b", xlab="Size", ylab="Number of tangles",
+  scales=list(x=list(at=4*(1:9)),y=list(log=2,at=myticks,labels=mylabels)),
+  auto.key=list(columns=3,text=c("(choose m n)","tangles","unique
+  tangles"), points=FALSE,lines=TRUE))
+----------------
+Solved!
+Gerrit + +Op 05-09-2024 om 16:54 schreef Bert Gunter: +> Do the "at" and "labels" components of the "scales" list argument to +> xyplot not do what you want? +> +> Cheers, +> Bert +> +> On Thu, Sep 5, 2024 at 4:05?AM Gerrit Draisma > wrote: +> +> Dear R-helpers, +> +> In the plot below I would like to have labels at positions 2^(3*(0:10)), +> and keep the labels in the exponential format. +> I tried using yscale.components.default. +> +> *This* gives the right format of the labels: +> -------- +> ?> yscale.components.default(lim= c(0,30),log=2) +> .... +> $num.limit +> [1]? 0 30 +> ... +> [1]? 0? 5 10 15 20 25 30 +> ... +> $left$labels$labels +> [1] "2^0"? "2^5"? "2^10" "2^15" "2^20" "2^25" "2^30" +> -------- +> +> and *this* gives the right locations +> -------- +> ?> yscale.components.default(lim= c(0,30),log=2,at=2^(3*(0:10))) +> $num.limit +> [1]? 0 30 +> ... +> ? [1]? 0? 3? 6? 9 12 15 18 21 24 27 30 +> ... +> $left$labels$labels +> ? [1] "1"? ? ? ? ? "8"? ? ? ? ? "64"? ? ? ? ?"512"? ? ? ? "4096" +> ? [6] "32768"? ? ? "262144"? ? ?"2097152"? ? "16777216"? ?"134217728" +> [11] "1073741824" +> -------- +> +> How can I get the format in the first example at the locations of the +> second? +> +> Thanks, +> Gerrit +> +> ------------- +> x <- read.csv(text=" +> n,c,t,u +> 4,1,2,1 +> 8,28,8,1 +> 12,495,42,3 +> 16,8008,256,7 +> 20,125970,1680,31 +> 24,1961256,11640,138 +> 28,30421755,83776,808 +> 32,471435600,620576,4956 +> 36,7307872110,4700880,33719") +> library(lattice) +> yscale.components.log2 <- function(...){ +> ? ? ?ans <- yscale.components.default(...) +> ? ? ?ans$left$labels$labels <- +> ? ? ? ? ?parse(text = ans$left$labels$labels) +> ? ? ?ans +> } +> +> +> # pdf("tangle_mathFig06.pdf") +> xyplot(c+t+u~n,data=x,type="b", xlab="Size", ylab="Number of tangles", +> ? ? scales=list(x=list(at=4*(1:9)),y=list(log=2)), +> ? ? yscale.components=yscale.components.log2, +> ? ? auto.key=list(columns=3,text=c("(choose m n)","tangles","unique +> tangles"), +> ? ? 
points=FALSE,lines=TRUE)) +> # dev.off() +> +> ______________________________________________ +> R-help at r-project.org mailing list -- +> To UNSUBSCRIBE and more, see +> https://stat.ethz.ch/mailman/listinfo/r-help mailman/listinfo/r-help> +> PLEASE do read the posting guide https://www.R-project.org/posting- +> guide.html +> and provide commented, minimal, self-contained, reproducible code. +> + + +From |eo@m@d@ @end|ng |rom @yon|c@eu Thu Sep 5 22:23:25 2024 +From: |eo@m@d@ @end|ng |rom @yon|c@eu (Leo Mada) +Date: Thu, 5 Sep 2024 20:23:25 +0000 +Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ? +Message-ID: + +Dear R Users, + +Is this desired behaviour? +I presume it's a bug. + +atan(1i) +# 0+Infi + +tan(atan(1i)) +# 0+1i + +atan(1i) / 5 +# NaN+Infi + +There were some changes in handling of complex numbers. But it looks like a bug. + +Sincerely, + +Leonard + + + [[alternative HTML version deleted]] + + +From bgunter@4567 @end|ng |rom gm@||@com Thu Sep 5 23:38:15 2024 +From: bgunter@4567 @end|ng |rom gm@||@com (Bert Gunter) +Date: Thu, 5 Sep 2024 14:38:15 -0700 +Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ? +In-Reply-To: +References: +Message-ID: + +What version of R are you using and on what platform? + +I get: +> atan(1i) +[1] 0.7853982+Infi +> atan(1i)/5 +[1] NaN+Infi + +on: +R version 4.4.1 (2024-06-14) +Platform: aarch64-apple-darwin20 +Running under: macOS Sonoma 14.6.1 + +-- Bert + +On Thu, Sep 5, 2024 at 1:23?PM Leo Mada via R-help +wrote: + +> Dear R Users, +> +> Is this desired behaviour? +> I presume it's a bug. +> +> atan(1i) +> # 0+Infi +> +> tan(atan(1i)) +> # 0+1i +> +> atan(1i) / 5 +> # NaN+Infi +> +> There were some changes in handling of complex numbers. But it looks like +> a bug. 
+> +> Sincerely, +> +> Leonard +> +> +> [[alternative HTML version deleted]] +> +> ______________________________________________ +> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> https://stat.ethz.ch/mailman/listinfo/r-help +> PLEASE do read the posting guide +> https://www.R-project.org/posting-guide.html +> and provide commented, minimal, self-contained, reproducible code. +> + + [[alternative HTML version deleted]] + + +From murdoch@dunc@n @end|ng |rom gm@||@com Thu Sep 5 23:40:53 2024 +From: murdoch@dunc@n @end|ng |rom gm@||@com (Duncan Murdoch) +Date: Thu, 5 Sep 2024 17:40:53 -0400 +Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ? +In-Reply-To: +References: +Message-ID: + +On 2024-09-05 4:23 p.m., Leo Mada via R-help wrote: +> Dear R Users, +> +> Is this desired behaviour? +> I presume it's a bug. +> +> atan(1i) +> # 0+Infi +> +> tan(atan(1i)) +> # 0+1i +> +> atan(1i) / 5 +> # NaN+Infi + +There's no need to involve atan() and tan() in this: + + > (0+Inf*1i)/5 +[1] NaN+Infi + +Why do you think this is a bug? + +Duncan Murdoch + + +From bgunter@4567 @end|ng |rom gm@||@com Fri Sep 6 00:12:06 2024 +From: bgunter@4567 @end|ng |rom gm@||@com (Bert Gunter) +Date: Thu, 5 Sep 2024 15:12:06 -0700 +Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ? +In-Reply-To: +References: + +Message-ID: + +Perhaps + +> Inf*1i +[1] NaN+Infi + +clarifies why it is *not* a bug. +(Boy, did that jog some long dusty math memories :-) ) + +-- Bert + +On Thu, Sep 5, 2024 at 2:48?PM Duncan Murdoch +wrote: + +> On 2024-09-05 4:23 p.m., Leo Mada via R-help wrote: +> > Dear R Users, +> > +> > Is this desired behaviour? +> > I presume it's a bug. +> > +> > atan(1i) +> > # 0+Infi +> > +> > tan(atan(1i)) +> > # 0+1i +> > +> > atan(1i) / 5 +> > # NaN+Infi +> +> There's no need to involve atan() and tan() in this: +> +> > (0+Inf*1i)/5 +> [1] NaN+Infi +> +> Why do you think this is a bug? 
+> +> Duncan Murdoch +> +> ______________________________________________ +> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> https://stat.ethz.ch/mailman/listinfo/r-help +> PLEASE do read the posting guide +> https://www.R-project.org/posting-guide.html +> and provide commented, minimal, self-contained, reproducible code. +> + + [[alternative HTML version deleted]] + + +From |eo@m@d@ @end|ng |rom @yon|c@eu Fri Sep 6 00:20:20 2024 +From: |eo@m@d@ @end|ng |rom @yon|c@eu (Leo Mada) +Date: Thu, 5 Sep 2024 22:20:20 +0000 +Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ? +In-Reply-To: +References: + + +Message-ID: + +Dear Bert, + +These behave like real divisions/multiplications: +complex(re=Inf, im = Inf) * 5 +# Inf+Infi +complex(re=-Inf, im = Inf) * 5 +# -Inf+Infi + +The real division / multiplication should be faster and also is well behaved. I was expecting R to do the real division/multiplication on a complex number. Which R actually does for these very particular cases; but not when only Im(x) is Inf. + +Sincerely, + +Leonard + +________________________________ +From: Bert Gunter +Sent: Friday, September 6, 2024 1:12 AM +To: Duncan Murdoch +Cc: Leo Mada ; r-help at r-project.org +Subject: Re: [R] BUG: atan(1i) / 5 = NaN+Infi ? + +Perhaps + +> Inf*1i +[1] NaN+Infi + +clarifies why it is *not* a bug. +(Boy, did that jog some long dusty math memories :-) ) + +-- Bert + +On Thu, Sep 5, 2024 at 2:48?PM Duncan Murdoch > wrote: +On 2024-09-05 4:23 p.m., Leo Mada via R-help wrote: +> Dear R Users, +> +> Is this desired behaviour? +> I presume it's a bug. +> +> atan(1i) +> # 0+Infi +> +> tan(atan(1i)) +> # 0+1i +> +> atan(1i) / 5 +> # NaN+Infi + +There's no need to involve atan() and tan() in this: + + > (0+Inf*1i)/5 +[1] NaN+Infi + +Why do you think this is a bug? 
+ +Duncan Murdoch + +______________________________________________ +R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +https://stat.ethz.ch/mailman/listinfo/r-help +PLEASE do read the posting guide https://www.R-project.org/posting-guide.html +and provide commented, minimal, self-contained, reproducible code. + + [[alternative HTML version deleted]] + + +From bgunter@4567 @end|ng |rom gm@||@com Fri Sep 6 00:38:33 2024 +From: bgunter@4567 @end|ng |rom gm@||@com (Bert Gunter) +Date: Thu, 5 Sep 2024 15:38:33 -0700 +Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ? +In-Reply-To: +References: + + + +Message-ID: + +> complex(real = 0, imaginary = Inf) +[1] 0+Infi + +> Inf*1i +[1] NaN+Infi + +>> complex(real = 0, imaginary = Inf)/5 +[1] NaN+Infi + +See the Note in ?complex for the explanation, I think. Duncan can correct +if I'm wrong. + +-- Bert + +On Thu, Sep 5, 2024 at 3:20?PM Leo Mada wrote: + +> Dear Bert, +> +> These behave like real divisions/multiplications: +> complex(re=Inf, im = Inf) * 5 +> # Inf+Infi +> complex(re=-Inf, im = Inf) * 5 +> # -Inf+Infi +> +> The real division / multiplication should be faster and also is well +> behaved. I was expecting R to do the real division/multiplication on a +> complex number. Which R actually does for these very particular cases; but +> not when only Im(x) is Inf. +> +> Sincerely, +> +> Leonard +> +> ------------------------------ +> *From:* Bert Gunter +> *Sent:* Friday, September 6, 2024 1:12 AM +> *To:* Duncan Murdoch +> *Cc:* Leo Mada ; r-help at r-project.org < +> r-help at r-project.org> +> *Subject:* Re: [R] BUG: atan(1i) / 5 = NaN+Infi ? +> +> Perhaps +> +> > Inf*1i +> [1] NaN+Infi +> +> clarifies why it is *not* a bug. +> (Boy, did that jog some long dusty math memories :-) ) +> +> -- Bert +> +> On Thu, Sep 5, 2024 at 2:48?PM Duncan Murdoch +> wrote: +> +> On 2024-09-05 4:23 p.m., Leo Mada via R-help wrote: +> > Dear R Users, +> > +> > Is this desired behaviour? +> > I presume it's a bug. 
+> > +> > atan(1i) +> > # 0+Infi +> > +> > tan(atan(1i)) +> > # 0+1i +> > +> > atan(1i) / 5 +> > # NaN+Infi +> +> There's no need to involve atan() and tan() in this: +> +> > (0+Inf*1i)/5 +> [1] NaN+Infi +> +> Why do you think this is a bug? +> +> Duncan Murdoch +> +> ______________________________________________ +> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> https://stat.ethz.ch/mailman/listinfo/r-help +> PLEASE do read the posting guide +> https://www.R-project.org/posting-guide.html +> and provide commented, minimal, self-contained, reproducible code. +> +> + + [[alternative HTML version deleted]] + + +From jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@ Fri Sep 6 01:06:57 2024 +From: jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@ (Jeff Newmiller) +Date: Thu, 05 Sep 2024 16:06:57 -0700 +Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ? +In-Reply-To: +References: + + + + +Message-ID: + +atan(1i) -> 0 + Inf i +complex(1/5) -> 0.2 + 0i +atan(1i) -> (0 + Inf i) * (0.2 + 0i) +-> 0*0.2 + 0*0i + Inf i * 0.2 + Inf i * 0i +infinity times zero is undefined +-> 0 + 0i + Inf i + NaN * i^2 +-> 0 + 0i + Inf i - NaN +-> NaN + Inf i + +I am not sure how complex arithmetic could arrive at another answer. + +I advise against messing with infinities... use atan2() if you don't actually need complex arithmetic. + +On September 5, 2024 3:38:33 PM PDT, Bert Gunter wrote: +>> complex(real = 0, imaginary = Inf) +>[1] 0+Infi +> +>> Inf*1i +>[1] NaN+Infi +> +>>> complex(real = 0, imaginary = Inf)/5 +>[1] NaN+Infi +> +>See the Note in ?complex for the explanation, I think. Duncan can correct +>if I'm wrong. +> +>-- Bert +> +>On Thu, Sep 5, 2024 at 3:20?PM Leo Mada wrote: +> +>> Dear Bert, +>> +>> These behave like real divisions/multiplications: +>> complex(re=Inf, im = Inf) * 5 +>> # Inf+Infi +>> complex(re=-Inf, im = Inf) * 5 +>> # -Inf+Infi +>> +>> The real division / multiplication should be faster and also is well +>> behaved. 
I was expecting R to do the real division/multiplication on a +>> complex number. Which R actually does for these very particular cases; but +>> not when only Im(x) is Inf. +>> +>> Sincerely, +>> +>> Leonard +>> +>> ------------------------------ +>> *From:* Bert Gunter +>> *Sent:* Friday, September 6, 2024 1:12 AM +>> *To:* Duncan Murdoch +>> *Cc:* Leo Mada ; r-help at r-project.org < +>> r-help at r-project.org> +>> *Subject:* Re: [R] BUG: atan(1i) / 5 = NaN+Infi ? +>> +>> Perhaps +>> +>> > Inf*1i +>> [1] NaN+Infi +>> +>> clarifies why it is *not* a bug. +>> (Boy, did that jog some long dusty math memories :-) ) +>> +>> -- Bert +>> +>> On Thu, Sep 5, 2024 at 2:48?PM Duncan Murdoch +>> wrote: +>> +>> On 2024-09-05 4:23 p.m., Leo Mada via R-help wrote: +>> > Dear R Users, +>> > +>> > Is this desired behaviour? +>> > I presume it's a bug. +>> > +>> > atan(1i) +>> > # 0+Infi +>> > +>> > tan(atan(1i)) +>> > # 0+1i +>> > +>> > atan(1i) / 5 +>> > # NaN+Infi +>> +>> There's no need to involve atan() and tan() in this: +>> +>> > (0+Inf*1i)/5 +>> [1] NaN+Infi +>> +>> Why do you think this is a bug? +>> +>> Duncan Murdoch +>> +>> ______________________________________________ +>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +>> https://stat.ethz.ch/mailman/listinfo/r-help +>> PLEASE do read the posting guide +>> https://www.R-project.org/posting-guide.html +>> and provide commented, minimal, self-contained, reproducible code. +>> +>> +> +> [[alternative HTML version deleted]] +> +>______________________________________________ +>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +>https://stat.ethz.ch/mailman/listinfo/r-help +>PLEASE do read the posting guide https://www.R-project.org/posting-guide.html +>and provide commented, minimal, self-contained, reproducible code. + +-- +Sent from my phone. Please excuse my brevity. 
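Jeff's term-by-term expansion can be checked with ordinary real arithmetic: writing z = a + b*i and the promoted scalar as w = c + d*i, the real part of z*w is a*c - b*d, and that is where the Inf * 0 product introduces the NaN. A small sketch:

```r
# Term-by-term check of (0 + Inf*i) * (0.2 + 0i):
# real part = a*c - b*d, imaginary part = a*d + b*c
a <- 0;   b <- Inf   # z = a + b*i, i.e. atan(1i)
c <- 0.2; d <- 0     # w = c + d*i, i.e. 1/5 promoted to complex
a * c - b * d        # NaN: the real part contains Inf * 0
a * d + b * c        # Inf: the imaginary part is unaffected
```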
+ + +From murdoch@dunc@n @end|ng |rom gm@||@com Fri Sep 6 01:40:35 2024 +From: murdoch@dunc@n @end|ng |rom gm@||@com (Duncan Murdoch) +Date: Thu, 5 Sep 2024 19:40:35 -0400 +Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ? +In-Reply-To: +References: + + +Message-ID: <73d502d2-d6f3-4dde-bcbf-e520666b9036@gmail.com> + +On 2024-09-05 6:12 p.m., Leo Mada wrote: +> Dear Duncan, +> +> Here is also the missing information: +> R version 4.4.1 (2024-06-14 ucrt) +> Platform: x86_64-w64-mingw32/x64 +> Running under: Windows 10 x64 (build 19045) +> +> Regarding the results: +> atan(1i) +> #?0+Infi +> Re(atan(1i)) +> # 0 +> Im(atan(1i)) +> #? Inf +> +> 0 + Inf i is a valid complex number: +> tan(atan(1i)) +> # 0+1i +> +> Inf / 5 +> # Inf +> +> Note: atan(1i) / 5 should have generated 0 + Inf * 1i; even the explicit +> complex number fails: +> complex(re=0, im = Inf) / 5 +> # NaN+Infi +> complex(re=Inf, im = Inf) / 5 +> # Inf+Infi +> +> I presume that R tries to do the complex division, although the real +> division is well defined. + +I imagine that what happens is that when one operand is complex, both +are coerced to complex and the operation is carried out. I had assumed +this was documented in ?complex, but I don't see it there. Maybe it +should be. + +If you want z/5 to be carried out using the correct mathematical +approach, you'll probably have to define it yourself. For example, + + CxByReal <- function(num, denom) { + if (is.complex(denom)) stop("this is for a real denominator!") + complex(real = Re(num)/denom, imaginary = Im(num)/denom) + } + + CxByReal(complex(real=0, imaginary=Inf), 5) + # [1] 0+Infi + +Duncan Murdoch + +> +> Sincerely, +> +> Leonard +> +> ------------------------------------------------------------------------ +> *From:* Duncan Murdoch +> *Sent:* Friday, September 6, 2024 12:40 AM +> *To:* Leo Mada ; r-help at r-project.org +> +> *Subject:* Re: [R] BUG: atan(1i) / 5 = NaN+Infi ? 
+> On 2024-09-05 4:23 p.m., Leo Mada via R-help wrote: +>> Dear R Users, +>> +>> Is this desired behaviour? +>> I presume it's a bug. +>> +>> atan(1i) +>> # 0+Infi +>> +>> tan(atan(1i)) +>> # 0+1i +>> +>> atan(1i) / 5 +>> # NaN+Infi +> +> There's no need to involve atan() and tan() in this: +> +> ?> (0+Inf*1i)/5 +> [1] NaN+Infi +> +> Why do you think this is a bug? +> +> Duncan Murdoch +> + + +From bgunter@4567 @end|ng |rom gm@||@com Fri Sep 6 01:55:09 2024 +From: bgunter@4567 @end|ng |rom gm@||@com (Bert Gunter) +Date: Thu, 5 Sep 2024 16:55:09 -0700 +Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ? +In-Reply-To: +References: + + + + + +Message-ID: + +> x <- complex(imag = Inf) +> x +[1] 0+Infi +> x*1 +[1] NaN+Infi +> x+0 +[1] 0+Infi + +R does the addition and subtraction "coordinatewise"; the C library handles +everything else. This results in 2 different ways the point at infinity is +printed. + +(Correction requested if this is wrong) + +Bert + +On Thu, Sep 5, 2024 at 4:07?PM Jeff Newmiller via R-help < +r-help at r-project.org> wrote: + +> atan(1i) -> 0 + Inf i +> complex(1/5) -> 0.2 + 0i +> atan(1i) -> (0 + Inf i) * (0.2 + 0i) +> -> 0*0.2 + 0*0i + Inf i * 0.2 + Inf i * 0i +> infinity times zero is undefined +> -> 0 + 0i + Inf i + NaN * i^2 +> -> 0 + 0i + Inf i - NaN +> -> NaN + Inf i +> +> I am not sure how complex arithmetic could arrive at another answer. +> +> I advise against messing with infinities... use atan2() if you don't +> actually need complex arithmetic. +> +> On September 5, 2024 3:38:33 PM PDT, Bert Gunter +> wrote: +> >> complex(real = 0, imaginary = Inf) +> >[1] 0+Infi +> > +> >> Inf*1i +> >[1] NaN+Infi +> > +> >>> complex(real = 0, imaginary = Inf)/5 +> >[1] NaN+Infi +> > +> >See the Note in ?complex for the explanation, I think. Duncan can correct +> >if I'm wrong. 
+> > +> >-- Bert +> > +> >On Thu, Sep 5, 2024 at 3:20?PM Leo Mada wrote: +> > +> >> Dear Bert, +> >> +> >> These behave like real divisions/multiplications: +> >> complex(re=Inf, im = Inf) * 5 +> >> # Inf+Infi +> >> complex(re=-Inf, im = Inf) * 5 +> >> # -Inf+Infi +> >> +> >> The real division / multiplication should be faster and also is well +> >> behaved. I was expecting R to do the real division/multiplication on a +> >> complex number. Which R actually does for these very particular cases; +> but +> >> not when only Im(x) is Inf. +> >> +> >> Sincerely, +> >> +> >> Leonard +> >> +> >> ------------------------------ +> >> *From:* Bert Gunter +> >> *Sent:* Friday, September 6, 2024 1:12 AM +> >> *To:* Duncan Murdoch +> >> *Cc:* Leo Mada ; r-help at r-project.org < +> >> r-help at r-project.org> +> >> *Subject:* Re: [R] BUG: atan(1i) / 5 = NaN+Infi ? +> >> +> >> Perhaps +> >> +> >> > Inf*1i +> >> [1] NaN+Infi +> >> +> >> clarifies why it is *not* a bug. +> >> (Boy, did that jog some long dusty math memories :-) ) +> >> +> >> -- Bert +> >> +> >> On Thu, Sep 5, 2024 at 2:48?PM Duncan Murdoch > +> >> wrote: +> >> +> >> On 2024-09-05 4:23 p.m., Leo Mada via R-help wrote: +> >> > Dear R Users, +> >> > +> >> > Is this desired behaviour? +> >> > I presume it's a bug. +> >> > +> >> > atan(1i) +> >> > # 0+Infi +> >> > +> >> > tan(atan(1i)) +> >> > # 0+1i +> >> > +> >> > atan(1i) / 5 +> >> > # NaN+Infi +> >> +> >> There's no need to involve atan() and tan() in this: +> >> +> >> > (0+Inf*1i)/5 +> >> [1] NaN+Infi +> >> +> >> Why do you think this is a bug? +> >> +> >> Duncan Murdoch +> >> +> >> ______________________________________________ +> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> >> https://stat.ethz.ch/mailman/listinfo/r-help +> >> PLEASE do read the posting guide +> >> https://www.R-project.org/posting-guide.html +> >> and provide commented, minimal, self-contained, reproducible code. 
+> >> +> >> +> > +> > [[alternative HTML version deleted]] +> > +> >______________________________________________ +> >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> >https://stat.ethz.ch/mailman/listinfo/r-help +> >PLEASE do read the posting guide +> https://www.R-project.org/posting-guide.html +> >and provide commented, minimal, self-contained, reproducible code. +> +> -- +> Sent from my phone. Please excuse my brevity. +> +> ______________________________________________ +> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> https://stat.ethz.ch/mailman/listinfo/r-help +> PLEASE do read the posting guide +> https://www.R-project.org/posting-guide.html +> and provide commented, minimal, self-contained, reproducible code. +> + + [[alternative HTML version deleted]] + + +From cry@n @end|ng |rom b|ngh@mton@edu Fri Sep 6 04:50:24 2024 +From: cry@n @end|ng |rom b|ngh@mton@edu (Christopher W. Ryan) +Date: Thu, 5 Sep 2024 22:50:24 -0400 +Subject: [R] effects() extractor for a quantile reqression object: error + message +Message-ID: + +I'm using quantreg package version 5.98 of 24 May 2024, in R 4.4.1 on +Linux Mint. + +The online documentation for quantreg says, in part, under the +description of the rq.object, "The coefficients, residuals, and effects +may be extracted by the generic functions of the same name, rather than +by the $ operator." + +I create an rq object for the 0.9 quantile, called qm.9 + +effects(qm.9) + +yields, the error message, " effects(qm.9) +Error in UseMethod("effects") : + no applicable method for 'effects' applied to an object of class "rq" + +I'm confused. Appreciate any suggestions. Thanks. + +--Chris Ryan + + +From r@oknz @end|ng |rom gm@||@com Fri Sep 6 06:44:01 2024 +From: r@oknz @end|ng |rom gm@||@com (Richard O'Keefe) +Date: Fri, 6 Sep 2024 16:44:01 +1200 +Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ? 
+In-Reply-To: +References: + + +Message-ID: + +I expect that atan(1i) = (0 + infinity i) and that atan(1i)/5 = (0 + +infinity i)/5 = (0 + infinity i). +Here's what I get in C: +(0,1) = (0, 1) +atan((0,1)) = (0, inf) +atan((0,1))/5 = (0, inf) + +Note the difference between I*infinity = (0,1)*infinity = +(0*infinity,1*infinity) = (NaN,infinity) +and (0,infinity)/5 = (0/5,infinity/5) = (0,infinity). +The former involves multiplying 0 by infinity, which yields NaN. +The latter does not. + +> complex(1,0,Inf)*2 +[1] NaN+Infi +There is no good reason for this. 0*2 is 0, not NaN. + +In IEEE arithmetic, multiplying or dividing a complex number by a real number is +NOT the same as multiplying or dividing by the complex version of that +real number. +(0,Inf) * 2 = (0*2, Inf*2) = (0, Inf). +(0,Inf) * (2,0) = (0*2 - Inf*0, 0*0 + Inf*2) = (NaN, Inf). + +There really truly is a bug here, and it is treating R*Z, Z*R, and Z/R +as if they were +the same as W*Z, Z*W, and Z/W where W = complex(1,R,0). + +On Fri, 6 Sept 2024 at 10:12, Bert Gunter wrote: +> +> Perhaps +> +> > Inf*1i +> [1] NaN+Infi +> +> clarifies why it is *not* a bug. +> (Boy, did that jog some long dusty math memories :-) ) +> +> -- Bert +> +> On Thu, Sep 5, 2024 at 2:48?PM Duncan Murdoch +> wrote: +> +> > On 2024-09-05 4:23 p.m., Leo Mada via R-help wrote: +> > > Dear R Users, +> > > +> > > Is this desired behaviour? +> > > I presume it's a bug. +> > > +> > > atan(1i) +> > > # 0+Infi +> > > +> > > tan(atan(1i)) +> > > # 0+1i +> > > +> > > atan(1i) / 5 +> > > # NaN+Infi +> > +> > There's no need to involve atan() and tan() in this: +> > +> > > (0+Inf*1i)/5 +> > [1] NaN+Infi +> > +> > Why do you think this is a bug? 
+> >
+> > Duncan Murdoch
+> >
+> > ______________________________________________
+> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
+> > https://stat.ethz.ch/mailman/listinfo/r-help
+> > PLEASE do read the posting guide
+> > https://www.R-project.org/posting-guide.html
+> > and provide commented, minimal, self-contained, reproducible code.
+> >
+>
+> [[alternative HTML version deleted]]
+>
+> ______________________________________________
+> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
+> https://stat.ethz.ch/mailman/listinfo/r-help
+> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
+> and provide commented, minimal, self-contained, reproducible code.
+
+
+From r@oknz @end|ng |rom gm@||@com Fri Sep 6 07:24:07 2024
+From: r@oknz @end|ng |rom gm@||@com (Richard O'Keefe)
+Date: Fri, 6 Sep 2024 17:24:07 +1200
+Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ?
+In-Reply-To: 
+References: 
+ 
+ 
+ 
+ 
+Message-ID: 
+
+The thing is that real*complex, complex*real, and complex/real are not
+"complex arithmetic"
+in the requisite sense. The complex numbers are a vector space over
+the reals, and
+complex*real and real*complex are vector*scalar and scalar*vector.
+For example, in the Ada programming language, we have
+function "*" (Left, Right : Complex) return Complex;
+function "*" (Left : Complex; Right : Real'Base) return Complex;
+function "*" (Left : Real'Base; Right : Complex) return Complex;
+showing that Z*R and Z*W involve *different* functions.
+
+It's worth noting that complex*real and real*complex just require two
+real multiplications,
+no other arithmetic operations, while complex*complex requires four
+real multiplications,
+an addition, and a subtraction. So implementing complex*real by
+converting the real
+to complex is inefficient (as well as getting the finer points of IEEE
+arithmetic wrong).
+As for complex division, getting that *right* in floating-point is
+fiendishly difficult (there are
+lots of algorithms out there and the majority of them have serious
+flaws) and woefully costly.
+It's not unfair to characterise implementing complex/real by
+conversion to complex and
+doing complex/complex as a beginner's bungle.
+
+There are good reasons why "double", "_Imaginary double", and "_Complex double"
+are distinct types in standard C (as they are in Ada), and the
+definition of multiplication
+in G.5.1 para 2 is *direct* (not via complex*complex).
+
+Now R has its own way of doing things, and if the judgement of the R
+maintainers is
+that keeping the "convert to a common type and then operate" model is
+more important
+than getting good answers, well, it's THEIR language, not mine. But
+let's not pretend
+that the answers are *right* in any other sense.
+
+On Fri, 6 Sept 2024 at 11:07, Jeff Newmiller via R-help
+ wrote:
+>
+> atan(1i) -> 0 + Inf i
+> complex(1/5) -> 0.2 + 0i
+> atan(1i) -> (0 + Inf i) * (0.2 + 0i)
+> -> 0*0.2 + 0*0i + Inf i * 0.2 + Inf i * 0i
+> infinity times zero is undefined
+> -> 0 + 0i + Inf i + NaN * i^2
+> -> 0 + 0i + Inf i - NaN
+> -> NaN + Inf i
+>
+> I am not sure how complex arithmetic could arrive at another answer.
+>
+> I advise against messing with infinities... use atan2() if you don't actually need complex arithmetic.
+>
+> On September 5, 2024 3:38:33 PM PDT, Bert Gunter wrote:
+> >> complex(real = 0, imaginary = Inf)
+> >[1] 0+Infi
+> >
+> >> Inf*1i
+> >[1] NaN+Infi
+> >
+> >>> complex(real = 0, imaginary = Inf)/5
+> >[1] NaN+Infi
+> >
+> >See the Note in ?complex for the explanation, I think. Duncan can correct
+> >if I'm wrong.
+> > +> >-- Bert +> > +> >On Thu, Sep 5, 2024 at 3:20?PM Leo Mada wrote: +> > +> >> Dear Bert, +> >> +> >> These behave like real divisions/multiplications: +> >> complex(re=Inf, im = Inf) * 5 +> >> # Inf+Infi +> >> complex(re=-Inf, im = Inf) * 5 +> >> # -Inf+Infi +> >> +> >> The real division / multiplication should be faster and also is well +> >> behaved. I was expecting R to do the real division/multiplication on a +> >> complex number. Which R actually does for these very particular cases; but +> >> not when only Im(x) is Inf. +> >> +> >> Sincerely, +> >> +> >> Leonard +> >> +> >> ------------------------------ +> >> *From:* Bert Gunter +> >> *Sent:* Friday, September 6, 2024 1:12 AM +> >> *To:* Duncan Murdoch +> >> *Cc:* Leo Mada ; r-help at r-project.org < +> >> r-help at r-project.org> +> >> *Subject:* Re: [R] BUG: atan(1i) / 5 = NaN+Infi ? +> >> +> >> Perhaps +> >> +> >> > Inf*1i +> >> [1] NaN+Infi +> >> +> >> clarifies why it is *not* a bug. +> >> (Boy, did that jog some long dusty math memories :-) ) +> >> +> >> -- Bert +> >> +> >> On Thu, Sep 5, 2024 at 2:48?PM Duncan Murdoch +> >> wrote: +> >> +> >> On 2024-09-05 4:23 p.m., Leo Mada via R-help wrote: +> >> > Dear R Users, +> >> > +> >> > Is this desired behaviour? +> >> > I presume it's a bug. +> >> > +> >> > atan(1i) +> >> > # 0+Infi +> >> > +> >> > tan(atan(1i)) +> >> > # 0+1i +> >> > +> >> > atan(1i) / 5 +> >> > # NaN+Infi +> >> +> >> There's no need to involve atan() and tan() in this: +> >> +> >> > (0+Inf*1i)/5 +> >> [1] NaN+Infi +> >> +> >> Why do you think this is a bug? +> >> +> >> Duncan Murdoch +> >> +> >> ______________________________________________ +> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> >> https://stat.ethz.ch/mailman/listinfo/r-help +> >> PLEASE do read the posting guide +> >> https://www.R-project.org/posting-guide.html +> >> and provide commented, minimal, self-contained, reproducible code. 
+> >> +> >> +> > +> > [[alternative HTML version deleted]] +> > +> >______________________________________________ +> >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> >https://stat.ethz.ch/mailman/listinfo/r-help +> >PLEASE do read the posting guide https://www.R-project.org/posting-guide.html +> >and provide commented, minimal, self-contained, reproducible code. +> +> -- +> Sent from my phone. Please excuse my brevity. +> +> ______________________________________________ +> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> https://stat.ethz.ch/mailman/listinfo/r-help +> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html +> and provide commented, minimal, self-contained, reproducible code. + + +From |eo@m@d@ @end|ng |rom @yon|c@eu Fri Sep 6 00:12:43 2024 +From: |eo@m@d@ @end|ng |rom @yon|c@eu (Leo Mada) +Date: Thu, 5 Sep 2024 22:12:43 +0000 +Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ? +In-Reply-To: +References: + +Message-ID: + +Dear Duncan, + +Here is also the missing information: +R version 4.4.1 (2024-06-14 ucrt) +Platform: x86_64-w64-mingw32/x64 +Running under: Windows 10 x64 (build 19045) + +Regarding the results: +atan(1i) +# 0+Infi +Re(atan(1i)) +# 0 +Im(atan(1i)) +# Inf + +0 + Inf i is a valid complex number: +tan(atan(1i)) +# 0+1i + +Inf / 5 +# Inf + +Note: atan(1i) / 5 should have generated 0 + Inf * 1i; even the explicit complex number fails: +complex(re=0, im = Inf) / 5 +# NaN+Infi +complex(re=Inf, im = Inf) / 5 +# Inf+Infi + +I presume that R tries to do the complex division, although the real division is well defined. + +Sincerely, + +Leonard + +________________________________ +From: Duncan Murdoch +Sent: Friday, September 6, 2024 12:40 AM +To: Leo Mada ; r-help at r-project.org +Subject: Re: [R] BUG: atan(1i) / 5 = NaN+Infi ? + +On 2024-09-05 4:23 p.m., Leo Mada via R-help wrote: +> Dear R Users, +> +> Is this desired behaviour? +> I presume it's a bug. 
+>
+> atan(1i)
+> # 0+Infi
+>
+> tan(atan(1i))
+> # 0+1i
+>
+> atan(1i) / 5
+> # NaN+Infi
+
+There's no need to involve atan() and tan() in this:
+
+ > (0+Inf*1i)/5
+[1] NaN+Infi
+
+Why do you think this is a bug?
+
+Duncan Murdoch
+
+
+From m@ech|er @end|ng |rom @t@t@m@th@ethz@ch Fri Sep 6 10:04:11 2024
+From: m@ech|er @end|ng |rom @t@t@m@th@ethz@ch (Martin Maechler)
+Date: Fri, 6 Sep 2024 10:04:11 +0200
+Subject: [R] effects() extractor for a quantile regression object: error
+ message
+In-Reply-To: 
+References: 
+Message-ID: <26330.46971.450132.890513@stat.math.ethz.ch>
+
+>>>>> Christopher W Ryan via R-help
+>>>>> on Thu, 5 Sep 2024 22:50:24 -0400 writes:
+
+ > I'm using quantreg package version 5.98 of 24 May 2024, in R 4.4.1 on
+ > Linux Mint.
+
+ > The online documentation for quantreg says, in part, under the
+ > description of the rq.object, "The coefficients, residuals, and effects
+ > may be extracted by the generic functions of the same name, rather than
+ > by the $ operator."
+
+ > I create an rq object for the 0.9 quantile, called qm.9
+
+ > effects(qm.9)
+
+ > yields, the error message, " effects(qm.9)
+ > Error in UseMethod("effects") :
+ > no applicable method for 'effects' applied to an object of class "rq"
+
+ > I'm confused. Appreciate any suggestions. Thanks.
+
+ > --Chris Ryan
+
+Unfortunately, the documentation is wrong here.
+
+You can always use
+
+ methods(class = class(qm.9))
+
+to get a list of the generic functions for which there is a method
+for your object's class ("rq" in this case) ...
+and indeed, "effects" is not among them.
+
+Possibly this was a thinko (on the 'rq.object' help page) and
+it was "predict" that was meant there,
+as indeed there is a predict method (actually there are even 3
+different predict() methods in package 'quantreg', and they are
+well documented on the ?predict.rq help page).
+
+{OTOH my guess is that there originally *was* an effects method
+ and it has been dropped in the meantime}
+
+
+Martin Maechler
+
+ETH Zurich and R Core team
+
+
+From m@ech|er @end|ng |rom @t@t@m@th@ethz@ch Fri Sep 6 10:37:36 2024
+From: m@ech|er @end|ng |rom @t@t@m@th@ethz@ch (Martin Maechler)
+Date: Fri, 6 Sep 2024 10:37:36 +0200
+Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ?
+In-Reply-To: 
+References: 
+ 
+ 
+ 
+ 
+ 
+Message-ID: <26330.48976.240526.972936@stat.math.ethz.ch>
+
+>>>>> Richard O'Keefe
+>>>>> on Fri, 6 Sep 2024 17:24:07 +1200 writes:
+
+ > The thing is that real*complex, complex*real, and complex/real are not
+ > "complex arithmetic" in the requisite sense.
+
+ > The complex numbers are a vector space over the reals,
+
+Yes, but they _also_ are field (and as others have argued mathematically only
+have one infinity point),
+and I think here we are fighting with which definition should
+take precedence here.
+The English Wikipedia page is even more extensive and precise,
+ https://en.wikipedia.org/wiki/Complex_number (line breaking by me):
+
+ " The complex numbers form a rich structure that is simultaneously
+ - an algebraically closed field,
+ - a commutative algebra over the reals, and
+ - a Euclidean vector space of dimension two."
+
+our problem "of course" is that we additionally add +/- Inf for
+the reals and for storage etc treating them as a 2D vector space
+over the reals is "obvious".
+
+ > and complex*real and real*complex are vector*scalar and scalar*vector.
+ > For example, in the Ada programming language, we have
+ > function "*" (Left, Right : Complex) return Complex;
+ > function "*" (Left : Complex; Right : Real'Base) return Complex;
+ > function "*" (Left : Real'Base; Right : Complex) return Complex;
+ > showing that Z*R and Z*W involve *different* functions.
+
+ > It's worth noting that complex*real and real*complex just require two
+ > real multiplications,
+ > no other arithmetic operations, while complex*complex requires four
+ > real multiplications,
+ > an addition, and a subtraction. So implementing complex*real by
+ > converting the real
+ > to complex is inefficient (as well as getting the finer points of IEEE
+ > arithmetic wrong).
+
+I see your point.
+
+ > As for complex division, getting that *right* in floating-point is
+ > fiendishly difficult (there are
+ > lots of algorithms out there and the majority of them have serious flaws)
+ > and woefully costly.
+
+ > It's not unfair to characterise implementing complex/real
+ > by conversion to complex and doing complex/complex as a
+ > beginner's bungle.
+
+ouch! ... but still I tend to acknowledge your point, incl the "not unfair" ..
+
+ > There are good reasons why "double", "_Imaginary double", and "_Complex double"
+ > are distinct types in standard C (as they are in Ada),
+
+interesting. OTOH, I think standard C did not have strict
+standards about complex number storage etc in the mid 1990s
+when R was created.
+
+ > and the definition of multiplication
+ > in G.5.1 para 2 is *direct* (not via complex*complex).
+
+I see (did not know about) -- where can we find 'G.5.1 para 2'?
+
+ > Now R has its own way of doing things, and if the judgement of the R
+ > maintainers is
+ > that keeping the "convert to a common type and then operate" model is
+ > more important
+ > than getting good answers, well, it's THEIR language, not mine.
+ +Well, it should also be the R community's language, +where we, the R core team, do most of the "base" work and also +emphasize guaranteeing long term stability. + +Personally, I think that + "convert to a common type and then operate" +is a good rule and principle in many, even most places and cases, +but I hate it if humans should not be allowed to break good +rules for even better reasons (but should rather behave like algorithms ..). + +This may well be a very good example of re-considering. +As mentioned above, e.g., I was not aware of the C language standard +being so specific here and different than what we've been doing +in R. + + + > But let's not pretend + > that the answers are *right* in any other sense. + +I think that's too strong -- Jeff's computation (just here below) +is showing one well defined sense of "right" I'd say. +(Still I know and agree the Inf * 0 |--> NaN + rule *is* sometimes undesirable) + + > On Fri, 6 Sept 2024 at 11:07, Jeff Newmiller via R-help + > wrote: + >> + >> atan(1i) -> 0 + Inf i + >> complex(1/5) -> 0.2 + 0i + >> atan(1i) -> (0 + Inf i) * (0.2 + 0i) + -> 0*0.2 + 0*0i + Inf i * 0.2 + Inf i * 0i + >> infinity times zero is undefined + -> 0 + 0i + Inf i + NaN * i^2 + -> 0 + 0i + Inf i - NaN + -> NaN + Inf i + >> + >> I am not sure how complex arithmetic could arrive at another answer. + >> + >> I advise against messing with infinities... use atan2() if you don't actually need complex arithmetic. + >> + >> On September 5, 2024 3:38:33 PM PDT, Bert Gunter wrote: + >> >> complex(real = 0, imaginary = Inf) + >> >[1] 0+Infi + >> > + >> >> Inf*1i + >> >[1] NaN+Infi + >> > + >> >>> complex(real = 0, imaginary = Inf)/5 + >> >[1] NaN+Infi + >> > + >> >See the Note in ?complex for the explanation, I think. Duncan can correct + >> >if I'm wrong. + >> > + >> >-- Bert + + [...................] 
+
+Martin
+
+--
+Martin Maechler
+ETH Zurich and R Core team
+
+
+From rkoenker @end|ng |rom ||||no|@@edu Fri Sep 6 10:37:54 2024
+From: rkoenker @end|ng |rom ||||no|@@edu (Koenker, Roger W)
+Date: Fri, 6 Sep 2024 08:37:54 +0000
+Subject: [R] Fwd: effects() extractor for a quantile regression object:
+ error message
+References: 
+Message-ID: 
+
+Apologies, forgot to copy R-help on this response.
+
+Begin forwarded message:
+
+
+From: Roger Koenker
+Subject: Re: [R] effects() extractor for a quantile regression object: error message
+Date: September 6, 2024 at 8:38:47 AM GMT+1
+To: "Christopher W. Ryan"
+
+Chris,
+
+This was intended to emulate the effects component of lm() fitting, but was never implemented. Frankly, I don't quite see on first glance how this works for lm(): it seems to be (mostly) about situations where X is not full rank (see lm.fit), and I also never bothered to implement rq for X that were not full rank.
+
+Roger
+
+
+On Sep 6, 2024, at 3:50 AM, Christopher W. Ryan via R-help wrote:
+
+I'm using quantreg package version 5.98 of 24 May 2024, in R 4.4.1 on
+Linux Mint.
+
+The online documentation for quantreg says, in part, under the
+description of the rq.object, "The coefficients, residuals, and effects
+may be extracted by the generic functions of the same name, rather than
+by the $ operator."
+
+I create an rq object for the 0.9 quantile, called qm.9
+
+effects(qm.9)
+
+yields, the error message, " effects(qm.9)
+Error in UseMethod("effects") :
+no applicable method for 'effects' applied to an object of class "rq"
+
+I'm confused. Appreciate any suggestions. Thanks.
+ +--Chris Ryan + +______________________________________________ +R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-help__;!!DZ3fjg!8EOq_-vshoZYLg-FZREULmFkpvaXrZ6aw5ABLzjX4aq3XNvoDxGipcY73SPgiBQasfWdncPj7J2odYZKU3BD$ +PLEASE do read the posting guide https://urldefense.com/v3/__https://www.R-project.org/posting-guide.html__;!!DZ3fjg!8EOq_-vshoZYLg-FZREULmFkpvaXrZ6aw5ABLzjX4aq3XNvoDxGipcY73SPgiBQasfWdncPj7J2odY7Gv6_Z$ +and provide commented, minimal, self-contained, reproducible code. + + + + [[alternative HTML version deleted]] + + +From pd@|gd @end|ng |rom gm@||@com Fri Sep 6 10:46:36 2024 +From: pd@|gd @end|ng |rom gm@||@com (peter dalgaard) +Date: Fri, 6 Sep 2024 10:46:36 +0200 +Subject: [R] Calculation of VCV matrix of estimated coefficient +In-Reply-To: +References: + + +Message-ID: + + + +> On 5 Sep 2024, at 16:36 , Gabor Grothendieck wrote: +> +> sigma(model)^2 will give the correct MSE. Also note that your model +> matrix has intercept at +> the end whereas vcov will have it at the beginning so you will need to +> permute the rows +> and columns to get them to be the same/ + + +Also, + +X <- cbind(1, as.matrix(mtcars[, c('disp', 'hp')])) + +or even + +X <- model.matrix(~ disp + hp, mtcars) + +-pd + + +> +> On Wed, Sep 4, 2024 at 3:34?PM Daniel Lobo wrote: +>> +>> Hi, +>> +>> I am trying to replicate the R's result for VCV matrix of estimated +>> coefficients from linear model as below +>> +>> data(mtcars) +>> model <- lm(mpg~disp+hp, data=mtcars) +>> model_summ <-summary(model) +>> MSE = mean(model_summ$residuals^2) +>> vcov(model) +>> +>> Now I want to calculate the same thing manually, +>> +>> library(dplyr) +>> X = as.matrix(mtcars[, c('disp', 'hp')] %>% mutate(Intercept = 1)); +>> solve(t(X) %*% X) * MSE +>> +>> Unfortunately they do not match. +>> +>> Could you please help where I made mistake, if any. 
+>> +>> Thanks +>> +>> ______________________________________________ +>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +>> https://stat.ethz.ch/mailman/listinfo/r-help +>> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html +>> and provide commented, minimal, self-contained, reproducible code. +> +> +> +> -- +> Statistics & Software Consulting +> GKX Group, GKX Associates Inc. +> tel: 1-877-GKX-GROUP +> email: ggrothendieck at gmail.com +> +> ______________________________________________ +> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> https://stat.ethz.ch/mailman/listinfo/r-help +> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html +> and provide commented, minimal, self-contained, reproducible code. + +-- +Peter Dalgaard, Professor, +Center for Statistics, Copenhagen Business School +Solbjerg Plads 3, 2000 Frederiksberg, Denmark +Phone: (+45)38153501 +Office: A 4.23 +Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com + + +From murdoch@dunc@n @end|ng |rom gm@||@com Fri Sep 6 11:54:23 2024 +From: murdoch@dunc@n @end|ng |rom gm@||@com (Duncan Murdoch) +Date: Fri, 6 Sep 2024 05:54:23 -0400 +Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ? +In-Reply-To: +References: + + + +Message-ID: <8975c0de-df0a-4f39-b298-662f7debbac3@gmail.com> + +On 2024-09-06 12:44 a.m., Richard O'Keefe wrote: +> I expect that atan(1i) = (0 + infinity i) and that atan(1i)/5 = (0 + +> infinity i)/5 = (0 + infinity i). +> Here's what I get in C: +> (0,1) = (0, 1) +> atan((0,1)) = (0, inf) +> atan((0,1))/5 = (0, inf) +> +> Note the difference between I*infinity = (0,1)*infinity = +> (0*infinity,1*infinity) = (NaN,infinity) +> and (0,infinity)/5 = (0/5,infinity/5) = (0,infinity). +> The former involves multiplying 0 by infinity, which yields NaN. +> The latter does not. +> +>> complex(1,0,Inf)*2 +> [1] NaN+Infi +> There is no good reason for this. 0*2 is 0, not NaN. 
+> +> In IEEE arithmetic, multiplying or dividing a complex number by a real number is +> NOT the same as multiplying or dividing by the complex version of that +> real number. +> (0,Inf) * 2 = (0*2, Inf*2) = (0, Inf). +> (0,Inf) * (2,0) = (0*2 - Inf*0, 0*0 + Inf*2) = (NaN, Inf). +> +> There really truly is a bug here, and it is treating R*Z, Z*R, and Z/R +> as if they were +> the same as W*Z, Z*W, and Z/W where W = complex(1,R,0). + +I would only disagree with the statement above by distinguishing between +a "bug" (where R is not behaving as documented) and a "design flaw" +(where it is behaving as documented, but the behaviour is undesirable). + +I think this is a design flaw rather than a bug. + +The distinction is important: if it is a design flaw, then a change is +harder, because users who rely on the behaviour deserve more help in +adapting than those who rely on a bug. Bugs should be fixed. Design +flaws need thinking about, and sometimes shouldn't be fixed. + +On the other hand, I was unable to find documentation saying that the +current behaviour is intended, so I could be wrong. + +Duncan Murdoch + +> +> On Fri, 6 Sept 2024 at 10:12, Bert Gunter wrote: +>> +>> Perhaps +>> +>>> Inf*1i +>> [1] NaN+Infi +>> +>> clarifies why it is *not* a bug. +>> (Boy, did that jog some long dusty math memories :-) ) +>> +>> -- Bert +>> +>> On Thu, Sep 5, 2024 at 2:48?PM Duncan Murdoch +>> wrote: +>> +>>> On 2024-09-05 4:23 p.m., Leo Mada via R-help wrote: +>>>> Dear R Users, +>>>> +>>>> Is this desired behaviour? +>>>> I presume it's a bug. +>>>> +>>>> atan(1i) +>>>> # 0+Infi +>>>> +>>>> tan(atan(1i)) +>>>> # 0+1i +>>>> +>>>> atan(1i) / 5 +>>>> # NaN+Infi +>>> +>>> There's no need to involve atan() and tan() in this: +>>> +>>> > (0+Inf*1i)/5 +>>> [1] NaN+Infi +>>> +>>> Why do you think this is a bug? 
+>>> +>>> Duncan Murdoch +>>> +>>> ______________________________________________ +>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +>>> https://stat.ethz.ch/mailman/listinfo/r-help +>>> PLEASE do read the posting guide +>>> https://www.R-project.org/posting-guide.html +>>> and provide commented, minimal, self-contained, reproducible code. +>>> +>> +>> [[alternative HTML version deleted]] +>> +>> ______________________________________________ +>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +>> https://stat.ethz.ch/mailman/listinfo/r-help +>> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html +>> and provide commented, minimal, self-contained, reproducible code. + + +From m@ech|er @end|ng |rom @t@t@m@th@ethz@ch Fri Sep 6 13:10:13 2024 +From: m@ech|er @end|ng |rom @t@t@m@th@ethz@ch (Martin Maechler) +Date: Fri, 6 Sep 2024 13:10:13 +0200 +Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ? +In-Reply-To: <8975c0de-df0a-4f39-b298-662f7debbac3@gmail.com> +References: + + + + <8975c0de-df0a-4f39-b298-662f7debbac3@gmail.com> +Message-ID: <26330.58133.15925.293052@stat.math.ethz.ch> + +>>>>> Duncan Murdoch +>>>>> on Fri, 6 Sep 2024 05:54:23 -0400 writes: + + > On 2024-09-06 12:44 a.m., Richard O'Keefe wrote: + >> I expect that atan(1i) = (0 + infinity i) and that atan(1i)/5 = (0 + + >> infinity i)/5 = (0 + infinity i). + >> Here's what I get in C: + >> (0,1) = (0, 1) + >> atan((0,1)) = (0, inf) + >> atan((0,1))/5 = (0, inf) + >> + >> Note the difference between I*infinity = (0,1)*infinity = + >> (0*infinity,1*infinity) = (NaN,infinity) + >> and (0,infinity)/5 = (0/5,infinity/5) = (0,infinity). + >> The former involves multiplying 0 by infinity, which yields NaN. + >> The latter does not. + >> + >>> complex(1,0,Inf)*2 + >> [1] NaN+Infi + >> There is no good reason for this. 0*2 is 0, not NaN. 
+ >>
+ >> In IEEE arithmetic, multiplying or dividing a complex number by a real number is
+ >> NOT the same as multiplying or dividing by the complex version of that
+ >> real number.
+ >> (0,Inf) * 2 = (0*2, Inf*2) = (0, Inf).
+ >> (0,Inf) * (2,0) = (0*2 - Inf*0, 0*0 + Inf*2) = (NaN, Inf).
+ >>
+ >> There really truly is a bug here, and it is treating R*Z, Z*R, and Z/R
+ >> as if they were
+ >> the same as W*Z, Z*W, and Z/W where W = complex(1,R,0).
+
+ > I would only disagree with the statement above by distinguishing between
+ > a "bug" (where R is not behaving as documented) and a "design flaw"
+ > (where it is behaving as documented, but the behaviour is undesirable).
+
+ > I think this is a design flaw rather than a bug.
+
+ > The distinction is important: if it is a design flaw, then a change is
+ > harder, because users who rely on the behaviour deserve more help in
+ > adapting than those who rely on a bug. Bugs should be fixed. Design
+ > flaws need thinking about, and sometimes shouldn't be fixed.
+
+ > On the other hand, I was unable to find documentation saying that the
+ > current behaviour is intended, so I could be wrong.
+
+ > Duncan Murdoch
+
+I agree 100% (with Duncan).
+
+I've now also carefully read ?Arithmetic and ?complex
+and could not find documentation explicitly describing the current
+behavior; in the context of complex arithmetic they do not even
+mention the "overall principle" used very often in R to
+
+ 1. coerce to common type
+ 2. then compute with that "pure type"
+
+In ?complex, we have the (last sentence of the first) 'Note:'
+
+ ---> For '+' and '-', R's own handling works strictly 'coordinate wise'.
+
+now, implicitly that implies that {+, -} are the only
+ops/functions where coordinate-wise arithmetic is used/applied,
+and hence it could be _stretched_ to say that for
+ * and /
+coordinate-wise arithmetic does not always apply,
+in our case not even when it would make sense, i.e., when one of
+the two operands is real (in the '/' case, it must be the 2nd operand)
+and hence 2D-vector space arithmetic
+should be applied naturally.
+
+It is still true that this must be considered a design flaw
+rather than a bug, even if only because the above "overall
+principle" is predominant and taught very often when teaching R.
+
+In this case, I do think we should look into the consequences of
+indeed distinguishing
+ real * complex, complex * real, and complex / real
+from their respective current {1. coerce to complex, 2. use complex arith}
+arithmetic.
+
+Hence, thanks a lot for bringing this up, to Leo Mada and
+notably to Richard O'Keefe for providing context, notably w.r.t.
+C standards {which I'd want to follow mostly, but not
+always ... but I'm not opening that now}...
+
+Martin
+
+
+From g@@@powe|| @end|ng |rom protonm@||@com Fri Sep 6 16:12:53 2024
+From: g@@@powe|| @end|ng |rom protonm@||@com (Gregg Powell)
+Date: Fri, 06 Sep 2024 14:12:53 +0000
+Subject: [R] Fwd: effects() extractor for a quantile regression object:
+ error message
+In-Reply-To: 
+References: 
+ 
+Message-ID: 
+
+Need to kill some time, so thought I'd opine.
+
+Given the intent, as I understood it... to extract components from a quantile regression (rq) object similar to how one might extract effects from an lm object.
+
+Since it seems effects() is not implemented for rq, here are some alternative approaches to achieve similar functionality or extract useful information from the quantile regression model:
+
+1. Extracting Coefficients. Use the coef() function to extract the coefficients of the quantile regression model. This is similar to extracting the effects in a linear model.
+
+> coef(qm.9)
+
+This will return the estimated coefficients for the 0.9 quantile.
+
+2. Extracting Fitted Values. To get the fitted values from the quantile regression model, use:
+
+> fitted(qm.9)
+
+This gives the fitted values for the quantile regression model, similar to what you'd see in an lm() model.
+
+3. Extracting Residuals. Residuals from the quantile regression can be extracted using the resid() function:
+
+> resid(qm.9)
+
+This gives the residuals for the 0.9 quantile regression model, which can be a useful diagnostic for checking model fit.
+
+4. Manually Calculating "Effects". While effects() is not available, you can manually calculate effects by examining the design matrix and the coefficients. If the term "effects" refers to the influence of different covariates in the model, these can be assessed by looking at the coefficients themselves and their impact on the fitted values.
+
+5. Using the summary() Function. The summary() function for rq objects provides detailed information about the quantile regression fit, including coefficient estimates, standard errors, and statistical significance:
+
+> summary(qm.9)
+
+This can give a more comprehensive understanding of how covariates are contributing to the model.
+
+6. Working with Design Matrices. If you need the design matrix for further custom calculations, you can extract it using:
+
+> model.matrix(qm.9)
+
+This returns the matrix of predictor variables, which you can then use to manually compute the "effects" of changes in the predictors.
+
+7. Explore Partial Effects with predict(). The predict() function can help you assess how changes in predictor values affect the outcome. For instance, to predict values at specific points in the design space, you can use:
+
+> predict(qm.9, newdata = data.frame(your_new_data))
+
+8. Bootstrapping to Examine Variability. If you want to assess variability in the effects of predictors, you could use bootstrapping. The boot.rq() function in the quantreg package allows you to bootstrap the quantile regression coefficients, giving insight into the variability of the estimated "effects."
+
+Example:
+
+> boot_results <- boot.rq(y ~ X, tau = 0.9, data = your_data, R = 1000)
+
+> summary(boot_results)
+
+9. Interaction or Additive Effects (If Applicable). If you're trying to capture interaction or additive effects, you might need to specify interaction terms directly in your formula and then inspect the coefficients for these terms. Quantile regression will estimate these coefficients in the same manner as linear regression but specific to the quantile of interest.
+
+In conclusion, while effects() is not available for quantile regression, the combination of coef(), fitted(), resid(), model.matrix(), and summary() covers the main components of a quantile regression model and should give insights similar to what effects() provides for linear models.
+
+Forgive any typos please... I'm on a mobile device.
+
+Kind regards,
+Gregg Powell
+
+
+Sent from Proton Mail Android
+
+
+-------- Original Message --------
+On 06/09/2024 01:37, Koenker, Roger W wrote:
+
+> Apologies, forgot to copy R-help on this response.
+>
+> Begin forwarded message:
+>
+>
+> From: Roger Koenker
+> Subject: Re: [R] effects() extractor for a quantile regression object: error message
+> Date: September 6, 2024 at 8:38:47 AM GMT+1
+> To: "Christopher W. Ryan"
+>
+> Chris,
+>
+> This was intended to emulate the effects component of lm() fitting, but was never implemented. Frankly, I don't quite see on first glance how this works for lm(): it seems to be (mostly) about situations where X is not full rank (see lm.fit), and I also never bothered to implement rq for X that were not full rank.
+>
+> Roger
+>
+>
+> On Sep 6, 2024, at 3:50 AM, Christopher W. Ryan via R-help wrote:
+>
+> I'm using quantreg package version 5.98 of 24 May 2024, in R 4.4.1 on
+> Linux Mint.
+>
+> The online documentation for quantreg says, in part, under the
+> description of the rq.object, "The coefficients, residuals, and effects
+> may be extracted by the generic functions of the same name, rather than
+> by the $ operator."
+>
+> I create an rq object for the 0.9 quantile, called qm.9
+>
+> effects(qm.9)
+>
+> yields, the error message, " effects(qm.9)
+> Error in UseMethod("effects") :
+> no applicable method for 'effects' applied to an object of class "rq"
+>
+> I'm confused. Appreciate any suggestions. Thanks.
+>
+> --Chris Ryan
+>
+> ______________________________________________
+> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
+> https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-help__;!!DZ3fjg!8EOq_-vshoZYLg-FZREULmFkpvaXrZ6aw5ABLzjX4aq3XNvoDxGipcY73SPgiBQasfWdncPj7J2odYZKU3BD$
+> PLEASE do read the posting guide https://urldefense.com/v3/__https://www.R-project.org/posting-guide.html__;!!DZ3fjg!8EOq_-vshoZYLg-FZREULmFkpvaXrZ6aw5ABLzjX4aq3XNvoDxGipcY73SPgiBQasfWdncPj7J2odY7Gv6_Z$
+> and provide commented, minimal, self-contained, reproducible code.
+>
+>
+>
+> [[alternative HTML version deleted]]
+>
+> ______________________________________________
+> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
+> https://stat.ethz.ch/mailman/listinfo/r-help
+> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
+> and provide commented, minimal, self-contained, reproducible code.
+>
+
+From r@oknz @end|ng |rom gm@||@com Fri Sep 6 16:40:29 2024
+From: r@oknz @end|ng |rom gm@||@com (Richard O'Keefe)
+Date: Sat, 7 Sep 2024 02:40:29 +1200
+Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ?
+In-Reply-To: <26330.48976.240526.972936@stat.math.ethz.ch>
+References: 
+ 
+ 
+ 
+ 
+ 
+ 
+ <26330.48976.240526.972936@stat.math.ethz.ch>
+Message-ID: 
+
+G.5.1 para 2 can be found in the C17 standard -- I actually have the
+final draft not the published standard. It's in earlier standards,
+I just didn't check them. Complex arithmetic was not in
+the first C standard (C89) but was in C99.
+
+The complex numbers do indeed form a field, and Z*W invokes an
+operation in that field when Z and W are both complex numbers. Z*R
+and R*Z, where R is real-but-not-complex, is not that field operation;
+it's the scalar multiplication from the vector space view.
+
+One way to characterise the C and Ada view is that real numbers x can
+be viewed as (x,ZERO!!!) and imaginary numbers y*i can be viewed as
+(ZERO!!!, y) where ZERO!!! is a real serious HARD zero, not an IEEE
++epsilon or -epsilon. In fact this is important for getting "sign of
+zero" right.
+x + (u,v) = (x+u, v) *NOT* (x+u, v+0) and this does matter in IEEE arithmetic.
+
+R is of course based on S, and S was not only designed before C got
+complex numbers, but before there was an IEEE standard with things
+like NaN, Inf, and signed zeros. But there *is* an IEEE standard now,
+and even IBM mainframes offer IEEE-compliant arithmetic, so worrying
+about the sign of zero etc is not something we can really overlook
+these days.
+
+You are of course correct that the one-point compactification of the
+complex numbers involves adjoining just one infinity and that whacking
+IEEE infinities into complex numbers does not give you anything
+mathematically interesting (unless you count grief and vexation as
+"interesting" (:-)). Since R distinguishes between 0+Infi and
+NaN+Infi, it's not clear that the one-point compactification has any
+relevance to what R does. And it's not just an unexpected NaN; with
+all numbers finite you can get zeros with the wrong sign.
(S having +been designed before the sign of zero was something you needed to be +aware of.) + +For what it's worth, the ISO "Language Independent Arithmetic" +standard, part 3, defines separate real, imaginary, and complex types, +and defines x*(z,w) to be (x*z, x*w) directly, just like C and Ada. +So it is quite clear that R does not currently conform to LIA-3. +LIA-3 (ISO/IEC 10967-3:2006) is the nearest we have to a definition of +what "right" answers are for floating-point complex arithmetic, and +what R does cannot be called "right" by that definition. But of +course R doesn't claim conformance to any part of the LIA standard. + +Whether the R community *want* R and C to give the same answers is not +for me to say. I can only say that *I* found it reassuring that C +gave the expected answers when R did not, or, to put it another way, +disconcerting that R did not agree with C (or LIA-3). + + +What really annoys me is that I wrote an entire technical report on +(some of the) problems with complex arithmetic, and this whole "just +treat x as (x, +0.0)" thing completely failed to occur to me as +something anyone might do. + +On Fri, 6 Sept 2024 at 20:37, Martin Maechler + wrote: +> +> >>>>> Richard O'Keefe +> >>>>> on Fri, 6 Sep 2024 17:24:07 +1200 writes: +> +> > The thing is that real*complex, complex*real, and complex/real are not +> > "complex arithmetic" in the requisite sense. +> +> > The complex numbers are a vector space over the reals, +> +> Yes, but they _also_ are field (and as others have argued mathematically only +> have one infinity point), +> and I think here we are fighting with which definition should +> take precedence here. 
+> The English Wikipedia page is even more extensive and precise, +> https://en.wikipedia.org/wiki/Complex_number (line breaking by me): +> +> " The complex numbers form a rich structure that is simultaneously +> - an algebraically closed field, +> - a commutative algebra over the reals, and +> - a Euclidean vector space of dimension two." +> +> our problem "of course" is that we additionally add +/- Inf for +> the reals and for storage etc treating them as a 2D vector space +> over the reals is "obvious". +> +> > and complex*real and real*complex are vector*scalar and scalar*vector. +> > For example, in the Ada programming language, we have +> > function "*" (Left, Right : Complex) return Complex; +> > function "*" (Left : Complex; Right : Real'Base) return Complex; +> > function "*" (Left : Real'Base; Right : Complex) return Complex; +> > showing that Z*R and Z*W involve *different* functions. +> +> > It's worth noting that complex*real and real*complex just require two +> > real multiplications, +> > no other arithmetic operations, while complex*complex requires four +> > real multiplications, +> > an addition, and a subtraction. So implementing complex*real by +> > conventing the real +> > to complex is inefficient (as well as getting the finer points of IEEE +> > arithmetic wrong). +> +> I see your point. +> +> > As for complex division, getting that *right* in floating-point is +> > fiendishly difficult (there are +> > lots of algorithms out there and the majority of them have serious flaws) +> > and woefully costly. +> +> > It's not unfair to characterise implementing complex/real +> > by conversion to complex and doing complex/complex as a +> > beginner's bungle. +> +> ouch! ... but still I tend to acknowledge your point, incl the "not unfair" .. +> +> > There are good reasons why "double", "_Imaginary double", and "_Complex double" +> > are distinct types in standard C (as they are in Ada), +> +> interesting. 
OTOH, I think standard C did not have strict +> standards about complex number storage etc in the mid 1990s +> when R was created. +> +> > and the definition of multiplication +> > in G.5.1 para 2 is *direct* (not via complex*complex). +> +> I see (did not know about) -- where can we find 'G.5.1 para 2' +> +> > Now R has its own way of doing things, and if the judgement of the R +> > maintainers is +> > that keeping the "convert to a common type and then operate" model is +> > more important +> > than getting good answers, well, it's THEIR language, not mine. +> +> Well, it should also be the R community's language, +> where we, the R core team, do most of the "base" work and also +> emphasize guaranteeing long term stability. +> +> Personally, I think that +> "convert to a common type and then operate" +> is a good rule and principle in many, even most places and cases, +> but I hate it if humans should not be allowed to break good +> rules for even better reasons (but should rather behave like algorithms ..). +> +> This may well be a very good example of re-considering. +> As mentioned above, e.g., I was not aware of the C language standard +> being so specific here and different than what we've been doing +> in R. +> +> +> > But let's not pretend +> > that the answers are *right* in any other sense. +> +> I think that's too strong -- Jeff's computation (just here below) +> is showing one well defined sense of "right" I'd say. +> (Still I know and agree the Inf * 0 |--> NaN +> rule *is* sometimes undesirable) +> +> > On Fri, 6 Sept 2024 at 11:07, Jeff Newmiller via R-help +> > wrote: +> >> +> >> atan(1i) -> 0 + Inf i +> >> complex(1/5) -> 0.2 + 0i +> >> atan(1i) -> (0 + Inf i) * (0.2 + 0i) +> -> 0*0.2 + 0*0i + Inf i * 0.2 + Inf i * 0i +> >> infinity times zero is undefined +> -> 0 + 0i + Inf i + NaN * i^2 +> -> 0 + 0i + Inf i - NaN +> -> NaN + Inf i +> >> +> >> I am not sure how complex arithmetic could arrive at another answer. 
+> >> +> >> I advise against messing with infinities... use atan2() if you don't actually need complex arithmetic. +> >> +> >> On September 5, 2024 3:38:33 PM PDT, Bert Gunter wrote: +> >> >> complex(real = 0, imaginary = Inf) +> >> >[1] 0+Infi +> >> > +> >> >> Inf*1i +> >> >[1] NaN+Infi +> >> > +> >> >>> complex(real = 0, imaginary = Inf)/5 +> >> >[1] NaN+Infi +> >> > +> >> >See the Note in ?complex for the explanation, I think. Duncan can correct +> >> >if I'm wrong. +> >> > +> >> >-- Bert +> +> [...................] +> +> Martin +> +> -- +> Martin Maechler +> ETH Zurich and R Core team + + +From JH@rm@e @end|ng |rom roku@com Fri Sep 6 18:57:35 2024 +From: JH@rm@e @end|ng |rom roku@com (Jorgen Harmse) +Date: Fri, 6 Sep 2024 16:57:35 +0000 +Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ? +In-Reply-To: +References: +Message-ID: + +It seems to me that the documentation of R's complex class & R's atan function do not tell us what to expect, so (as others have suggested), some additional notes are needed. I think that mathematically atan(1i) should be NA_complex_, but R seems not to use any mathematically standard compactification of the complex plane (and I'm not sure that IEEE does either). + +Incidentally, the signature of the complex constructor is confusing. complex(1L) returns zero, but complex(1L, argument=theta) is an element of the unit circle. The defaults suggest ambiguous results in case only length.out is specified, and you have to read a parenthesis in the details to figure out what will happen. Even then, the behaviour in my example is not spelled out (although it is suggested by negative inference). Moreover, the real & imaginary parts are ignored if either modulus or argument is provided, and I don't see that this is explained at all. + +R's numeric (& IEEE's floating-point types) seem to approximate a multi-point compactification of the real line. 
+Inf & -Inf fill out the approximation to the extended real line, and NaN, NA_real_ & maybe some others handle some cases in which the answer does not live in the extended real line. (I'm not digging into bit patterns here. I suspect that there are several versions of NaN, but I hope that they all behave the same way.) The documentation suggests that a complex scalar in R is just a pair of numeric scalars, so we are not dealing with the Riemann sphere or any other usually-studied extension of the complex plane. Since R distinguishes various complex infinities (and seems to allow any combination of numeric values in real & imaginary parts), the usual mathematical answer for atan(1i) may no longer be relevant. + +The tangent function has an essential singularity at complex infinity (the compactification point in the Riemann sphere, which I consider the natural extension for the study of meromorphic functions, for example making the tangent function well defined on the whole plane), so the usual extension of the plane does not give us an answer for atan(1i). However, another possible extension is the Cartesian square of the extended real line, and in that extension continuity suggests that tan(x + Inf*1i) = 1i and tan(x - Inf*1i) = -1i (for x real & finite). That is the result from R's tan function, and it explains why atan(1i) in R is not NA or NaN. The specific choice of pi/4 + Inf*1i puzzled me at first, but I think it's related to the branch-cut rules given in the documentation. The real part of atan((1+.Machine$double.eps)*1i) is pi/2, and the real part of atan((1-.Machine$double.eps)*1i) is zero, and someone apparently decided to average those for atan(1i). + +TL;DR: The documentation needs more details, and I don't really like the extended complex plane that R implemented, but within that framework the answers for atan(1i) & atan(-1i) make sense. + +Regards, +Jorgen Harmse. 
+ + + + + + [[alternative HTML version deleted]] + + +From m@ech|er @end|ng |rom @t@t@m@th@ethz@ch Sat Sep 7 17:01:12 2024 +From: m@ech|er @end|ng |rom @t@t@m@th@ethz@ch (Martin Maechler) +Date: Sat, 7 Sep 2024 17:01:12 +0200 +Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ? +In-Reply-To: +References: + + + + + + + <26330.48976.240526.972936@stat.math.ethz.ch> + +Message-ID: <26332.27320.79283.293497@stat.math.ethz.ch> + +>>>>> Richard O'Keefe +>>>>> on Sat, 7 Sep 2024 02:40:29 +1200 writes: + + > G.5.1 para 2 can be found in the C17 standard -- I + > actually have the final draft not the published standard. + +Ok. Thank you. + +A direct hopefully stable link to that final draft's Appendix G +seems to be + https://www.open-std.org/JTC1/SC22/WG14/www/docs/n2310.pdf#chapter.14 + +which is good to have available. + + > It's in earlier standards, I just didn't check earlier + > standards. Complex arithmetic was not in the first C + > standard (C89) but was in C99. + +indeed, currently we only require C11 for R. + +A longer term solution that we (the R core team) will probably +look into is to start making use of current C standards for +complex arithmetic. + +As mentioned, all this was not yet available when R started and +already came with complex numbers as base type .... +It may (or may not, I'm not the expert) be a bit challenging +trying to remain back compatible (e.g. with save complex number +R objects) and still use C standard complex headers ... + +But mid to long term I guess that would be the way to go. + +Martin + + > The complex numbers do indeed form a field, and Z*W + > invokes an operation in that field when Z and W are both + > complex numbers. Z*R and R*Z, where R is + > real-but-not-complex, is not that field operation; it's + > the scalar multiplication from the vector spec view. + + > One way to characterise the C and Ada view is that real + > numbers x can be viewed as (x,ZERO!!!) 
and imaginary + > numbers y*i can be viewed as (ZERO!!!, y) where ZERO!!! is + > a real serious HARD zero, not an IEEE +epsilon or + > -epsilon, In fact this is important for getting "sign of + > zero" right. x + (u,v) = (x+u, v) *NOT* (x+u, v+0) and + > this does matter in IEEE arithmetic. + + > R is of course based on S, and S was not only designed + > before C got complex numbers, but before there was an IEEE + > standard with things like NaN, Inf, and signed zeros But + > there *is* an IEEE standard now, and even IBM mainframes + > offer IEEE-compliant arithmetic, so worrying about the + > sign of zero etc is not something we can really overlook + > these days. + + > You are of course correct that the one-point + > compactification of the complex numbers involves adjoining + > just one infinity and that whacking IEEE infinities into + > complex numbers does not give you anything mathematically + > interesting (unless you count grief and vexation as + > "interesting" (:-)). Since R distinguishes between 0+Infi + > and NaN+Infi, it's not clear that the one-point + > compactification has any relevance to what R does. And + > it's not just an unexpected NaN; with all numbers finite + > you can get zeros with the wrong sign. (S having been + > designed before the sign of zero was something you needed + > to be aware of.) + + > For what it's worth, the ISO "Language Independent + > Arithmetic" standard, part 3, defines separate real, + > imaginary, and complex types, and defines x*(z,w) to be + > (x*z, x*w) directly, just like C and Ada. So it is quite + > clear that R does not currently conform to LIA-3. LIA-3 + > (ISO/IEC 10967-3:2006) is the nearest we have to a + > definition of what "right" answers are for floating-point + > complex arithmetic, and what R does cannot be called + > "right" by that definition. But of course R doesn't claim + > conformance to any part of the LIA standard. 
+ + > Whether the R community *want* R and C to give the same + > answers is not for me to say. I can only say that *I* + > found it reassuring that C gave the expected answers when + > R did not, or, to put it another way, disconcerting that R + > did not agree with C (or LIA-3). + + +Thank you. +Indeed, I'd like the idea to consider LIA standards as much +as (sensibly) possible. + +Martin + + + > What really annoys me is that I wrote an entire technical + > report on (some of the) problems with complex arithmetic, + > and this whole "just treat x as (x, +0.0)" thing + > completely failed to occur to me as something anyone might + > do. + + > On Fri, 6 Sept 2024 at 20:37, Martin Maechler + > wrote: + >> + >> >>>>> Richard O'Keefe >>>>> on Fri, 6 Sep 2024 17:24:07 + >> +1200 writes: + >> + >> > The thing is that real*complex, complex*real, and + >> complex/real are not > "complex arithmetic" in the + >> requisite sense. + >> + >> > The complex numbers are a vector space over the reals, + >> + >> Yes, but they _also_ are field (and as others have argued + >> mathematically only have one infinity point), and I think + >> here we are fighting with which definition should take + >> precedence here. The English Wikipedia page is even more + >> extensive and precise, + >> https://en.wikipedia.org/wiki/Complex_number (line + >> breaking by me): + >> + >> " The complex numbers form a rich structure that is + >> simultaneously - an algebraically closed field, - a + >> commutative algebra over the reals, and - a Euclidean + >> vector space of dimension two." + >> + >> our problem "of course" is that we additionally add +/- + >> Inf for the reals and for storage etc treating them as a + >> 2D vector space over the reals is "obvious". + >> + >> > and complex*real and real*complex are vector*scalar and + >> scalar*vector. 
> For example, in the Ada programming + >> language, we have > function "*" (Left, Right : Complex) + >> return Complex; > function "*" (Left : Complex; Right : + >> Real'Base) return Complex; > function "*" (Left : + >> Real'Base; Right : Complex) return Complex; > showing + >> that Z*R and Z*W involve *different* functions. + >> + >> > It's worth noting that complex*real and real*complex + >> just require two > real multiplications, > no other + >> arithmetic operations, while complex*complex requires + >> four > real multiplications, > an addition, and a + >> subtraction. So implementing complex*real by > + >> conventing the real > to complex is inefficient (as well + >> as getting the finer points of IEEE > arithmetic wrong). + >> + >> I see your point. + >> + >> > As for complex division, getting that *right* in + >> floating-point is > fiendishly difficult (there are > + >> lots of algorithms out there and the majority of them + >> have serious flaws) > and woefully costly. + >> + >> > It's not unfair to characterise implementing + >> complex/real > by conversion to complex and doing + >> complex/complex as a > beginner's bungle. + >> + >> ouch! ... but still I tend to acknowledge your point, + >> incl the "not unfair" .. + >> + >> > There are good reasons why "double", "_Imaginary + >> double", and "_Complex double" > are distinct types in + >> standard C (as they are in Ada), + >> + >> interesting. OTOH, I think standard C did not have + >> strict standards about complex number storage etc in the + >> mid 1990s when R was created. + >> + >> > and the definition of multiplication > in G.5.1 para 2 + >> is *direct* (not via complex*complex). 
+
+ >>
+ >> I see (did not know about) -- where can we find 'G.5.1
+ >> para 2'
+ >>
+ >> > Now R has its own way of doing things, and if the
+ >> judgement of the R > maintainers is > that keeping the
+ >> "convert to a common type and then operate" model is >
+ >> more important > than getting good answers, well, it's
+ >> THEIR language, not mine.
+ >>
+ >> Well, it should also be the R community's language, where
+ >> we, the R core team, do most of the "base" work and also
+ >> emphasize guaranteeing long term stability.
+ >>
+ >> Personally, I think that "convert to a common type and
+ >> then operate" is a good rule and principle in many, even
+ >> most places and cases, but I hate it if humans should not
+ >> be allowed to break good rules for even better reasons
+ >> (but should rather behave like algorithms ..).
+ >>
+ >> This may well be a very good example of re-considering.
+ >> As mentioned above, e.g., I was not aware of the C
+ >> language standard being so specific here and different
+ >> than what we've been doing in R.
+ >>
+ >>
+ >> > But let's not pretend > that the answers are *right* in
+ >> any other sense.
+ >>
+ >> I think that's too strong -- Jeff's computation (just
+ >> here below) is showing one well defined sense of "right"
+ >> I'd say. (Still I know and agree the Inf * 0 |--> NaN
+ >> rule *is* sometimes undesirable)
+ >>
+ >> > On Fri, 6 Sept 2024 at 11:07, Jeff Newmiller via R-help
+ >> > wrote:
+ >> >>
+ >> >> atan(1i) -> 0 + Inf i
+ >> >> complex(1/5) -> 0.2 + 0i
+ >> >> atan(1i) -> (0 + Inf i) * (0.2 + 0i)
+ >> -> 0*0.2 + 0*0i + Inf i * 0.2 + Inf i * 0i
+ >> >> infinity times zero is undefined
+ >> -> 0 + 0i + Inf i + NaN * i^2
+ >> -> 0 + 0i + Inf i - NaN
+ >> -> NaN + Inf i
+ >> >>
+ >> >> I am not sure how complex arithmetic could arrive at
+ >> another answer.
+ >> >>
+ >> >> I advise against messing with infinities... use
+ >> atan2() if you don't actually need complex arithmetic. 
+ >> >> + >> >> On September 5, 2024 3:38:33 PM PDT, Bert Gunter + >> wrote: >> >> complex(real = 0, + >> imaginary = Inf) >> >[1] 0+Infi + >> >> > + >> >> >> Inf*1i >> >[1] NaN+Infi + >> >> > + >> >> >>> complex(real = 0, imaginary = Inf)/5 >> >[1] + >> NaN+Infi + >> >> > + >> >> >See the Note in ?complex for the explanation, I + >> think. Duncan can correct >> >if I'm wrong. + >> >> > + >> >> >-- Bert + >> + >> [...................] + >> + >> Martin + >> + >> -- + >> Martin Maechler ETH Zurich and R Core team + + +From bog@@o@chr|@to|er @end|ng |rom gm@||@com Sat Sep 7 21:56:36 2024 +From: bog@@o@chr|@to|er @end|ng |rom gm@||@com (Christofer Bogaso) +Date: Sun, 8 Sep 2024 01:26:36 +0530 +Subject: [R] Reading a txt file from internet +Message-ID: + +Hi, + +I am trying to the data from +https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt +without any success. Below is the error I am getting: + +> read.delim('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt') + +Error in make.names(col.names, unique = TRUE) : + + invalid multibyte string at 't' + +In addition: Warning messages: + +1: In read.table(file = file, header = header, sep = sep, quote = quote, : + + line 1 appears to contain embedded nulls + +2: In read.table(file = file, header = header, sep = sep, quote = quote, : + + line 2 appears to contain embedded nulls + +3: In read.table(file = file, header = header, sep = sep, quote = quote, : + + line 3 appears to contain embedded nulls + +4: In read.table(file = file, header = header, sep = sep, quote = quote, : + + line 4 appears to contain embedded nulls + +5: In read.table(file = file, header = header, sep = sep, quote = quote, : + + line 5 appears to contain embedded nulls + +Is there any way to read this data directly onto R? 
+ +Thanks for your time + + +From bgunter@4567 @end|ng |rom gm@||@com Sat Sep 7 22:20:21 2024 +From: bgunter@4567 @end|ng |rom gm@||@com (Bert Gunter) +Date: Sat, 7 Sep 2024 13:20:21 -0700 +Subject: [R] Reading a txt file from internet +In-Reply-To: +References: +Message-ID: + +Well, this is frankly an unsatisfactory answer, as it does not try to deal +properly with the issues that you experienced, which I also did. However, +it's simple and works. As this is a small text file, +simply copy it in your browser to the clipboard, and then use: +thefile <- read.table(text = +"", header = TRUE) +either in an editor or directly at the prompt in the console. + +In fact here's the whole thing that you can just copy and paste from this +email: + +thefile <-read.table(text = +"time vendor metal +1 322 44.2 +2 317 44.3 +3 319 44.4 +4 323 43.4 +5 327 42.8 +6 328 44.3 +7 325 44.4 +8 326 44.8 +9 330 44.4 +10 334 43.1 +11 337 42.6 +12 341 42.4 +13 322 42.2 +14 318 41.8 +15 320 40.1 +16 326 42 +17 332 42.4 +18 334 43.1 +19 335 42.4 +20 336 43.1 +21 335 43.2 +22 338 42.8 +23 342 43 +24 348 42.8 +25 330 42.5 +26 326 42.6 +27 329 42.3 +28 337 42.9 +29 345 43.6 +30 350 44.7 +31 351 44.5 +32 354 45 +33 355 44.8 +34 357 44.9 +35 362 45.2 +36 368 45.2 +37 348 45 +38 345 45.5 +39 349 46.2 +40 355 46.8 +41 362 47.5 +42 367 48.3 +43 366 48.3 +44 370 49.1 +45 371 48.9 +46 375 49.4 +47 380 50 +48 385 50 +49 361 49.6 +50 354 49.9 +51 357 49.6 +52 367 50.7 +53 376 50.7 +54 381 50.9 +55 381 50.5 +56 383 51.2 +57 384 50.7 +58 387 50.3 +59 392 49.2 +60 396 48.1", header = TRUE) + +Cheers, +Bert + +On Sat, Sep 7, 2024 at 12:57?PM Christofer Bogaso < +bogaso.christofer at gmail.com> wrote: + +> Hi, +> +> I am trying to the data from +> +> https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt +> without any success. 
Below is the error I am getting: +> +> > read.delim(' +> https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt +> ') +> +> Error in make.names(col.names, unique = TRUE) : +> +> invalid multibyte string at 't' +> +> In addition: Warning messages: +> +> 1: In read.table(file = file, header = header, sep = sep, quote = quote, : +> +> line 1 appears to contain embedded nulls +> +> 2: In read.table(file = file, header = header, sep = sep, quote = quote, : +> +> line 2 appears to contain embedded nulls +> +> 3: In read.table(file = file, header = header, sep = sep, quote = quote, : +> +> line 3 appears to contain embedded nulls +> +> 4: In read.table(file = file, header = header, sep = sep, quote = quote, : +> +> line 4 appears to contain embedded nulls +> +> 5: In read.table(file = file, header = header, sep = sep, quote = quote, : +> +> line 5 appears to contain embedded nulls +> +> Is there any way to read this data directly onto R? +> +> Thanks for your time +> +> ______________________________________________ +> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> https://stat.ethz.ch/mailman/listinfo/r-help +> PLEASE do read the posting guide +> https://www.R-project.org/posting-guide.html +> and provide commented, minimal, self-contained, reproducible code. +> + + [[alternative HTML version deleted]] + + +From |kw@|mmo @end|ng |rom gm@||@com Sat Sep 7 22:20:59 2024 +From: |kw@|mmo @end|ng |rom gm@||@com (Iris Simmons) +Date: Sat, 7 Sep 2024 16:20:59 -0400 +Subject: [R] Reading a txt file from internet +In-Reply-To: +References: +Message-ID: + +That looks like a UTF-16LE byte order mark. 
Simply open the connection +with the proper encoding: + +read.delim( + 'https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', + fileEncoding = "UTF-16LE" +) + +On Sat, Sep 7, 2024 at 3:57?PM Christofer Bogaso + wrote: +> +> Hi, +> +> I am trying to the data from +> https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt +> without any success. Below is the error I am getting: +> +> > read.delim('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt') +> +> Error in make.names(col.names, unique = TRUE) : +> +> invalid multibyte string at 't' +> +> In addition: Warning messages: +> +> 1: In read.table(file = file, header = header, sep = sep, quote = quote, : +> +> line 1 appears to contain embedded nulls +> +> 2: In read.table(file = file, header = header, sep = sep, quote = quote, : +> +> line 2 appears to contain embedded nulls +> +> 3: In read.table(file = file, header = header, sep = sep, quote = quote, : +> +> line 3 appears to contain embedded nulls +> +> 4: In read.table(file = file, header = header, sep = sep, quote = quote, : +> +> line 4 appears to contain embedded nulls +> +> 5: In read.table(file = file, header = header, sep = sep, quote = quote, : +> +> line 5 appears to contain embedded nulls +> +> Is there any way to read this data directly onto R? +> +> Thanks for your time +> +> ______________________________________________ +> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> https://stat.ethz.ch/mailman/listinfo/r-help +> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html +> and provide commented, minimal, self-contained, reproducible code. 
+ + +From e@ @end|ng |rom enr|co@chum@nn@net Sat Sep 7 22:22:23 2024 +From: e@ @end|ng |rom enr|co@chum@nn@net (Enrico Schumann) +Date: Sat, 07 Sep 2024 22:22:23 +0200 +Subject: [R] Reading a txt file from internet +In-Reply-To: + (Christofer Bogaso's message of "Sun, 8 Sep 2024 01:26:36 +0530") +References: +Message-ID: <8734mb1af4.fsf@enricoschumann.net> + +On Sun, 08 Sep 2024, Christofer Bogaso writes: + +> Hi, +> +> I am trying to the data from +> https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt +> without any success. Below is the error I am getting: +> +>> read.delim('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt') +> +> Error in make.names(col.names, unique = TRUE) : +> +> invalid multibyte string at 't' +> +> In addition: Warning messages: +> +> 1: In read.table(file = file, header = header, sep = sep, quote = quote, : +> +> line 1 appears to contain embedded nulls +> +> 2: In read.table(file = file, header = header, sep = sep, quote = quote, : +> +> line 2 appears to contain embedded nulls +> +> 3: In read.table(file = file, header = header, sep = sep, quote = quote, : +> +> line 3 appears to contain embedded nulls +> +> 4: In read.table(file = file, header = header, sep = sep, quote = quote, : +> +> line 4 appears to contain embedded nulls +> +> 5: In read.table(file = file, header = header, sep = sep, quote = quote, : +> +> line 5 appears to contain embedded nulls +> +> Is there any way to read this data directly onto R? +> +> Thanks for your time +> + +The looks like a byte-order mark +(https://en.wikipedia.org/wiki/Byte_order_mark). 
+Try this: + + fn <- file('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', + encoding = "UTF-16LE") + read.delim(fn) + +-- +Enrico Schumann +Lucerne, Switzerland +https://enricoschumann.net + + +From jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@ Sat Sep 7 22:40:34 2024 +From: jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@ (Jeff Newmiller) +Date: Sat, 07 Sep 2024 13:40:34 -0700 +Subject: [R] Reading a txt file from internet +In-Reply-To: +References: +Message-ID: <9983BE81-B175-42CC-8479-66004BA83A64@dcn.davis.ca.us> + +Add the + + fileEncoding = "UTF-16" + +argument to the read call. + +For a human explanation of why this is going on I recommend [1]. For a more R-related take, try [2]. + +For reference, I downloaded your file and used the "file" command line program typically available on Linux (and possibly MacOSX) which will tell you about what encoding is used in a particular file. + +[1] https://www.youtube.com/watch?v=4mRxIgu9R70 +[2] https://kevinushey.github.io/blog/2018/02/21/string-encoding-and-r/ + +On September 7, 2024 12:56:36 PM PDT, Christofer Bogaso wrote: +>Hi, +> +>I am trying to the data from +>https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt +>without any success. 
Below is the error I am getting: +> +>> read.delim('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt') +> +>Error in make.names(col.names, unique = TRUE) : +> +> invalid multibyte string at 't' +> +>In addition: Warning messages: +> +>1: In read.table(file = file, header = header, sep = sep, quote = quote, : +> +> line 1 appears to contain embedded nulls +> +>2: In read.table(file = file, header = header, sep = sep, quote = quote, : +> +> line 2 appears to contain embedded nulls +> +>3: In read.table(file = file, header = header, sep = sep, quote = quote, : +> +> line 3 appears to contain embedded nulls +> +>4: In read.table(file = file, header = header, sep = sep, quote = quote, : +> +> line 4 appears to contain embedded nulls +> +>5: In read.table(file = file, header = header, sep = sep, quote = quote, : +> +> line 5 appears to contain embedded nulls +> +>Is there any way to read this data directly onto R? +> +>Thanks for your time +> +>______________________________________________ +>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +>https://stat.ethz.ch/mailman/listinfo/r-help +>PLEASE do read the posting guide https://www.R-project.org/posting-guide.html +>and provide commented, minimal, self-contained, reproducible code. + +-- +Sent from my phone. Please excuse my brevity. + + +From bgunter@4567 @end|ng |rom gm@||@com Sat Sep 7 22:44:47 2024 +From: bgunter@4567 @end|ng |rom gm@||@com (Bert Gunter) +Date: Sat, 7 Sep 2024 13:44:47 -0700 +Subject: [R] Reading a txt file from internet +In-Reply-To: +References: + +Message-ID: + +Ha, the proper answer! +Thanks for this, Iris. I followed up by consulting the Wikipedia "byte +order mark" entry and learned something I knew nothing about. + +FWIW, if I had simply searched on t it would have immediately led +me to BOMs. + +Best, +Bert + +On Sat, Sep 7, 2024 at 1:30?PM Iris Simmons wrote: + +> That looks like a UTF-16LE byte order mark. 
Simply open the connection +> with the proper encoding: +> +> read.delim( +> ' +> https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt +> ', +> fileEncoding = "UTF-16LE" +> ) +> +> On Sat, Sep 7, 2024 at 3:57?PM Christofer Bogaso +> wrote: +> > +> > Hi, +> > +> > I am trying to the data from +> > +> https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt +> > without any success. Below is the error I am getting: +> > +> > > read.delim(' +> https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt +> ') +> > +> > Error in make.names(col.names, unique = TRUE) : +> > +> > invalid multibyte string at 't' +> > +> > In addition: Warning messages: +> > +> > 1: In read.table(file = file, header = header, sep = sep, quote = +> quote, : +> > +> > line 1 appears to contain embedded nulls +> > +> > 2: In read.table(file = file, header = header, sep = sep, quote = +> quote, : +> > +> > line 2 appears to contain embedded nulls +> > +> > 3: In read.table(file = file, header = header, sep = sep, quote = +> quote, : +> > +> > line 3 appears to contain embedded nulls +> > +> > 4: In read.table(file = file, header = header, sep = sep, quote = +> quote, : +> > +> > line 4 appears to contain embedded nulls +> > +> > 5: In read.table(file = file, header = header, sep = sep, quote = +> quote, : +> > +> > line 5 appears to contain embedded nulls +> > +> > Is there any way to read this data directly onto R? +> > +> > Thanks for your time +> > +> > ______________________________________________ +> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> > https://stat.ethz.ch/mailman/listinfo/r-help +> > PLEASE do read the posting guide +> https://www.R-project.org/posting-guide.html +> > and provide commented, minimal, self-contained, reproducible code. 
+> +> ______________________________________________ +> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> https://stat.ethz.ch/mailman/listinfo/r-help +> PLEASE do read the posting guide +> https://www.R-project.org/posting-guide.html +> and provide commented, minimal, self-contained, reproducible code. +> + + [[alternative HTML version deleted]] + + +From jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@ Sat Sep 7 22:52:31 2024 +From: jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@ (Jeff Newmiller) +Date: Sat, 07 Sep 2024 13:52:31 -0700 +Subject: [R] Reading a txt file from internet +In-Reply-To: <8734mb1af4.fsf@enricoschumann.net> +References: + <8734mb1af4.fsf@enricoschumann.net> +Message-ID: <66E77559-A57E-4273-B8E6-E9B433ECDE43@dcn.davis.ca.us> + +When you specify LE in the encoding type, you are logically telling the decoder that you know the two-byte pairs are in little-endian order... which could override whatever the byte-order-mark was indicating. If the BOM indicated big-endian then the file decoding would break. If there is a BOM, don't override it unless you have to (e.g. for a wrong BOM)... leave off the LE unless you really need it. + +On September 7, 2024 1:22:23 PM PDT, Enrico Schumann wrote: +>On Sun, 08 Sep 2024, Christofer Bogaso writes: +> +>> Hi, +>> +>> I am trying to the data from +>> https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt +>> without any success. 
Below is the error I am getting: +>> +>>> read.delim('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt') +>> +>> Error in make.names(col.names, unique = TRUE) : +>> +>> invalid multibyte string at 't' +>> +>> In addition: Warning messages: +>> +>> 1: In read.table(file = file, header = header, sep = sep, quote = quote, : +>> +>> line 1 appears to contain embedded nulls +>> +>> 2: In read.table(file = file, header = header, sep = sep, quote = quote, : +>> +>> line 2 appears to contain embedded nulls +>> +>> 3: In read.table(file = file, header = header, sep = sep, quote = quote, : +>> +>> line 3 appears to contain embedded nulls +>> +>> 4: In read.table(file = file, header = header, sep = sep, quote = quote, : +>> +>> line 4 appears to contain embedded nulls +>> +>> 5: In read.table(file = file, header = header, sep = sep, quote = quote, : +>> +>> line 5 appears to contain embedded nulls +>> +>> Is there any way to read this data directly onto R? +>> +>> Thanks for your time +>> +> +>The looks like a byte-order mark +>(https://en.wikipedia.org/wiki/Byte_order_mark). +>Try this: +> +> fn <- file('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', +> encoding = "UTF-16LE") +> read.delim(fn) +> + +-- +Sent from my phone. Please excuse my brevity. + + +From murdoch@dunc@n @end|ng |rom gm@||@com Sat Sep 7 23:43:24 2024 +From: murdoch@dunc@n @end|ng |rom gm@||@com (Duncan Murdoch) +Date: Sat, 7 Sep 2024 17:43:24 -0400 +Subject: [R] Reading a txt file from internet +In-Reply-To: <66E77559-A57E-4273-B8E6-E9B433ECDE43@dcn.davis.ca.us> +References: + <8734mb1af4.fsf@enricoschumann.net> + <66E77559-A57E-4273-B8E6-E9B433ECDE43@dcn.davis.ca.us> +Message-ID: <54c12472-4017-4e09-ab1a-2df1c6b12ecb@gmail.com> + +On 2024-09-07 4:52 p.m., Jeff Newmiller via R-help wrote: +> When you specify LE in the encoding type, you are logically telling the decoder that you know the two-byte pairs are in little-endian order... 
which could override whatever the byte-order-mark was indicating. If the BOM indicated big-endian then the file decoding would break. If there is a BOM, don't override it unless you have to (e.g. for a wrong BOM)... leave off the LE unless you really need it. + +That sounds like good advice, but it doesn't work: + + > read.delim( + + 'https://online.stat.psu.edu/onlinecourses/sites/stat501/files +/ch15/employee.txt', + + fileEncoding = "UTF-16" + + ) + [1] time + + + + + + + + + + + + + + + [2] +vendor.?????........??........?.??........?.??.?..?.....?..?..?...?.?..?..?...?.??....?...?.??. + +and so on. +> +> On September 7, 2024 1:22:23 PM PDT, Enrico Schumann wrote: +>> On Sun, 08 Sep 2024, Christofer Bogaso writes: +>> +>>> Hi, +>>> +>>> I am trying to the data from +>>> https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt +>>> without any success. Below is the error I am getting: +>>> +>>>> read.delim('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt') +>>> +>>> Error in make.names(col.names, unique = TRUE) : +>>> +>>> invalid multibyte string at 't' +>>> +>>> In addition: Warning messages: +>>> +>>> 1: In read.table(file = file, header = header, sep = sep, quote = quote, : +>>> +>>> line 1 appears to contain embedded nulls +>>> +>>> 2: In read.table(file = file, header = header, sep = sep, quote = quote, : +>>> +>>> line 2 appears to contain embedded nulls +>>> +>>> 3: In read.table(file = file, header = header, sep = sep, quote = quote, : +>>> +>>> line 3 appears to contain embedded nulls +>>> +>>> 4: In read.table(file = file, header = header, sep = sep, quote = quote, : +>>> +>>> line 4 appears to contain embedded nulls +>>> +>>> 5: In read.table(file = file, header = header, sep = sep, quote = quote, : +>>> +>>> line 5 appears to contain embedded nulls +>>> +>>> Is there any way to read this data directly onto R? 
+>>> +>>> Thanks for your time +>>> +>> +>> The looks like a byte-order mark +>> (https://en.wikipedia.org/wiki/Byte_order_mark). +>> Try this: +>> +>> fn <- file('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', +>> encoding = "UTF-16LE") +>> read.delim(fn) +>> +> + + +From jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@ Sun Sep 8 01:37:50 2024 +From: jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@ (Jeff Newmiller) +Date: Sat, 07 Sep 2024 16:37:50 -0700 +Subject: [R] Reading a txt file from internet +In-Reply-To: <54c12472-4017-4e09-ab1a-2df1c6b12ecb@gmail.com> +References: + <8734mb1af4.fsf@enricoschumann.net> + <66E77559-A57E-4273-B8E6-E9B433ECDE43@dcn.davis.ca.us> + <54c12472-4017-4e09-ab1a-2df1c6b12ecb@gmail.com> +Message-ID: <663B47CF-365D-4C12-8B9E-4BDDF4F10E05@dcn.davis.ca.us> + +I tried it on R 4.4.1 on Linux Mint 21.3 just before I posted it, and I just tried it on R 3.4.2 on Ubuntu 16.04 and R 4.3.2 on Windows 11 just now and it works on all of them. + +I don't have a big-endian machine to test on, but the Unicode spec says to honor the BOM and if there isn't one to assume that it is big-endian data. But in this case there is a BOM so your machine has a buggy decoder? + +On September 7, 2024 2:43:24 PM PDT, Duncan Murdoch wrote: +>On 2024-09-07 4:52 p.m., Jeff Newmiller via R-help wrote: +>> When you specify LE in the encoding type, you are logically telling the decoder that you know the two-byte pairs are in little-endian order... which could override whatever the byte-order-mark was indicating. If the BOM indicated big-endian then the file decoding would break. If there is a BOM, don't override it unless you have to (e.g. for a wrong BOM)... leave off the LE unless you really need it. 
+> +>That sounds like good advice, but it doesn't work: +> +> > read.delim( +> + 'https://online.stat.psu.edu/onlinecourses/sites/stat501/files /ch15/employee.txt', +> + fileEncoding = "UTF-16" +> + ) +> [1] time +> +> +> +> +> +> +> +> +> +> +> +> +> +> [2] vendor.?????........??........?.??........?.??.?..?.....?..?..?...?.?..?..?...?.??....?...?.??. +> +>and so on. +>> +>> On September 7, 2024 1:22:23 PM PDT, Enrico Schumann wrote: +>>> On Sun, 08 Sep 2024, Christofer Bogaso writes: +>>> +>>>> Hi, +>>>> +>>>> I am trying to the data from +>>>> https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt +>>>> without any success. Below is the error I am getting: +>>>> +>>>>> read.delim('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt') +>>>> +>>>> Error in make.names(col.names, unique = TRUE) : +>>>> +>>>> invalid multibyte string at 't' +>>>> +>>>> In addition: Warning messages: +>>>> +>>>> 1: In read.table(file = file, header = header, sep = sep, quote = quote, : +>>>> +>>>> line 1 appears to contain embedded nulls +>>>> +>>>> 2: In read.table(file = file, header = header, sep = sep, quote = quote, : +>>>> +>>>> line 2 appears to contain embedded nulls +>>>> +>>>> 3: In read.table(file = file, header = header, sep = sep, quote = quote, : +>>>> +>>>> line 3 appears to contain embedded nulls +>>>> +>>>> 4: In read.table(file = file, header = header, sep = sep, quote = quote, : +>>>> +>>>> line 4 appears to contain embedded nulls +>>>> +>>>> 5: In read.table(file = file, header = header, sep = sep, quote = quote, : +>>>> +>>>> line 5 appears to contain embedded nulls +>>>> +>>>> Is there any way to read this data directly onto R? +>>>> +>>>> Thanks for your time +>>>> +>>> +>>> The looks like a byte-order mark +>>> (https://en.wikipedia.org/wiki/Byte_order_mark). 
+>>> Try this: +>>> +>>> fn <- file('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', +>>> encoding = "UTF-16LE") +>>> read.delim(fn) +>>> +>> +> + +-- +Sent from my phone. Please excuse my brevity. + + +From murdoch@dunc@n @end|ng |rom gm@||@com Sun Sep 8 11:05:41 2024 +From: murdoch@dunc@n @end|ng |rom gm@||@com (Duncan Murdoch) +Date: Sun, 8 Sep 2024 05:05:41 -0400 +Subject: [R] Reading a txt file from internet +In-Reply-To: <663B47CF-365D-4C12-8B9E-4BDDF4F10E05@dcn.davis.ca.us> +References: + <8734mb1af4.fsf@enricoschumann.net> + <66E77559-A57E-4273-B8E6-E9B433ECDE43@dcn.davis.ca.us> + <54c12472-4017-4e09-ab1a-2df1c6b12ecb@gmail.com> + <663B47CF-365D-4C12-8B9E-4BDDF4F10E05@dcn.davis.ca.us> +Message-ID: <0a263f2d-51c5-4900-b4ab-ec11d3edd821@gmail.com> + +On 2024-09-07 7:37 p.m., Jeff Newmiller wrote: +> I tried it on R 4.4.1 on Linux Mint 21.3 just before I posted it, and I just tried it on R 3.4.2 on Ubuntu 16.04 and R 4.3.2 on Windows 11 just now and it works on all of them. +> +> I don't have a big-endian machine to test on, but the Unicode spec says to honor the BOM and if there isn't one to assume that it is big-endian data. But in this case there is a BOM so your machine has a buggy decoder? + +Sounds like it! I did it on a Mac running R 4.4.1. + +Duncan Murdoch + +> +> On September 7, 2024 2:43:24 PM PDT, Duncan Murdoch wrote: +>> On 2024-09-07 4:52 p.m., Jeff Newmiller via R-help wrote: +>>> When you specify LE in the encoding type, you are logically telling the decoder that you know the two-byte pairs are in little-endian order... which could override whatever the byte-order-mark was indicating. If the BOM indicated big-endian then the file decoding would break. If there is a BOM, don't override it unless you have to (e.g. for a wrong BOM)... leave off the LE unless you really need it. 
+>> +>> That sounds like good advice, but it doesn't work: +>> +>>> read.delim( +>> + 'https://online.stat.psu.edu/onlinecourses/sites/stat501/files /ch15/employee.txt', +>> + fileEncoding = "UTF-16" +>> + ) +>> [1] time +>> +>> +>> +>> +>> +>> +>> +>> +>> +>> +>> +>> +>> +>> [2] vendor.?????........??........?.??........?.??.?..?.....?..?..?...?.?..?..?...?.??....?...?.??. +>> +>> and so on. +>>> +>>> On September 7, 2024 1:22:23 PM PDT, Enrico Schumann wrote: +>>>> On Sun, 08 Sep 2024, Christofer Bogaso writes: +>>>> +>>>>> Hi, +>>>>> +>>>>> I am trying to the data from +>>>>> https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt +>>>>> without any success. Below is the error I am getting: +>>>>> +>>>>>> read.delim('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt') +>>>>> +>>>>> Error in make.names(col.names, unique = TRUE) : +>>>>> +>>>>> invalid multibyte string at 't' +>>>>> +>>>>> In addition: Warning messages: +>>>>> +>>>>> 1: In read.table(file = file, header = header, sep = sep, quote = quote, : +>>>>> +>>>>> line 1 appears to contain embedded nulls +>>>>> +>>>>> 2: In read.table(file = file, header = header, sep = sep, quote = quote, : +>>>>> +>>>>> line 2 appears to contain embedded nulls +>>>>> +>>>>> 3: In read.table(file = file, header = header, sep = sep, quote = quote, : +>>>>> +>>>>> line 3 appears to contain embedded nulls +>>>>> +>>>>> 4: In read.table(file = file, header = header, sep = sep, quote = quote, : +>>>>> +>>>>> line 4 appears to contain embedded nulls +>>>>> +>>>>> 5: In read.table(file = file, header = header, sep = sep, quote = quote, : +>>>>> +>>>>> line 5 appears to contain embedded nulls +>>>>> +>>>>> Is there any way to read this data directly onto R? +>>>>> +>>>>> Thanks for your time +>>>>> +>>>> +>>>> The looks like a byte-order mark +>>>> (https://en.wikipedia.org/wiki/Byte_order_mark). 
+>>>> Try this: +>>>> +>>>> fn <- file('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', +>>>> encoding = "UTF-16LE") +>>>> read.delim(fn) +>>>> +>>> +>> +> + + +From tebert @end|ng |rom u||@edu Sun Sep 8 13:47:57 2024 +From: tebert @end|ng |rom u||@edu (Ebert,Timothy Aaron) +Date: Sun, 8 Sep 2024 11:47:57 +0000 +Subject: [R] Reading a txt file from internet +In-Reply-To: <0a263f2d-51c5-4900-b4ab-ec11d3edd821@gmail.com> +References: + <8734mb1af4.fsf@enricoschumann.net> + <66E77559-A57E-4273-B8E6-E9B433ECDE43@dcn.davis.ca.us> + <54c12472-4017-4e09-ab1a-2df1c6b12ecb@gmail.com> + <663B47CF-365D-4C12-8B9E-4BDDF4F10E05@dcn.davis.ca.us> + <0a263f2d-51c5-4900-b4ab-ec11d3edd821@gmail.com> +Message-ID: + +Say that you have several files from different places or times and you wanted to run your program on all of them without reprogramming. You could start with the readr package and use guess_encoding. +j <- 1 +for (i in file_paths){ + file_encoding[j] <- as.character(readr::guess_encoding(i)$encoding) + j=j+1 +} + +With the encoding of each file, you can combine file_paths and file_encoding and then break this into multiple data frames based on encoding. Read all the data, reformat for consistency, and then combine them. + +More simply, you could just guess_encoding() on one file just to see what it might be like. It gives you a name like UTF-16LE that you can then use in the encoding statement as others have already shown. + + +Tim + + + +-----Original Message----- +From: R-help On Behalf Of Duncan Murdoch +Sent: Sunday, September 8, 2024 5:06 AM +To: Jeff Newmiller ; r-help at r-project.org; Enrico Schumann ; Christofer Bogaso +Subject: Re: [R] Reading a txt file from internet + +[External Email] + +On 2024-09-07 7:37 p.m., Jeff Newmiller wrote: +> I tried it on R 4.4.1 on Linux Mint 21.3 just before I posted it, and I just tried it on R 3.4.2 on Ubuntu 16.04 and R 4.3.2 on Windows 11 just now and it works on all of them. 
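As an aside on the UTF-16 discussion above, the disagreement can be reproduced without the network: write a small little-endian file with a byte-order mark and read it back both ways. This is only a sketch; the file contents are invented to mimic the thread's "time"/"vendor" columns, and whether plain "UTF-16" honours the BOM depends on the platform's iconv implementation, which is exactly the point in dispute.

```r
# Build a tiny tab-separated UTF-16LE file that starts with a BOM.
tf <- tempfile(fileext = ".txt")
con <- file(tf, open = "wb")
writeBin(as.raw(c(0xff, 0xfe)), con)  # little-endian byte-order mark
writeBin(iconv("time\tvendor\n23\t1\n", to = "UTF-16LE", toRaw = TRUE)[[1]], con)
close(con)

# Let the decoder honour the BOM (worked for Jeff, failed on Duncan's Mac):
read.delim(file(tf, encoding = "UTF-16"))

# Force little-endian decoding, as Enrico suggested for this file:
read.delim(file(tf, encoding = "UTF-16LE"))
```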
+> +> I don't have a big-endian machine to test on, but the Unicode spec says to honor the BOM and if there isn't one to assume that it is big-endian data. But in this case there is a BOM so your machine has a buggy decoder? + +Sounds like it! I did it on a Mac running R 4.4.1. + +Duncan Murdoch + +> +> On September 7, 2024 2:43:24 PM PDT, Duncan Murdoch wrote: +>> On 2024-09-07 4:52 p.m., Jeff Newmiller via R-help wrote: +>>> When you specify LE in the encoding type, you are logically telling the decoder that you know the two-byte pairs are in little-endian order... which could override whatever the byte-order-mark was indicating. If the BOM indicated big-endian then the file decoding would break. If there is a BOM, don't override it unless you have to (e.g. for a wrong BOM)... leave off the LE unless you really need it. +>> +>> That sounds like good advice, but it doesn't work: +>> +>>> read.delim( +>> + 'https://online.stat.psu.edu/onlinecourses/sites/stat501/files /ch15/employee.txt', +>> + fileEncoding = "UTF-16" +>> + ) +>> [1] time +>> +>> +>> +>> +>> +>> +>> +>> +>> +>> +>> +>> +>> +>> [2] vendor.?????........??........?.??........?.??.?..?.....?..?..?...?.?..?..?...?.??....?...?.??. +>> +>> and so on. +>>> +>>> On September 7, 2024 1:22:23 PM PDT, Enrico Schumann wrote: +>>>> On Sun, 08 Sep 2024, Christofer Bogaso writes: +>>>> +>>>>> Hi, +>>>>> +>>>>> I am trying to the data from +>>>>> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2F +>>>>> online.stat.psu.edu%2Fonlinecourses%2Fsites%2Fstat501%2Ffiles%2Fch +>>>>> 15%2Femployee.txt&data=05%7C02%7Ctebert%40ufl.edu%7C07d806c97fa945 +>>>>> f64baf08dccfe57631%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7C63 +>>>>> 8613831690785878%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQI +>>>>> joiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=icg%2 +>>>>> BW984cnNyT1XEjXU8HA%2B%2Bm0euoDQjblE4gsdFl4c%3D&reserved=0 +>>>>> without any success. 
Below is the error I am getting:
+>>>>>
+>>>>>> read.delim('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt')
+>>>>>
+>>>>> Error in make.names(col.names, unique = TRUE) :
+>>>>>
+>>>>> invalid multibyte string at 't'
+>>>>>
+>>>>> In addition: Warning messages:
+>>>>>
+>>>>> 1: In read.table(file = file, header = header, sep = sep, quote = quote, :
+>>>>>
+>>>>> line 1 appears to contain embedded nulls
+>>>>>
+>>>>> 2: In read.table(file = file, header = header, sep = sep, quote = quote, :
+>>>>>
+>>>>> line 2 appears to contain embedded nulls
+>>>>>
+>>>>> 3: In read.table(file = file, header = header, sep = sep, quote = quote, :
+>>>>>
+>>>>> line 3 appears to contain embedded nulls
+>>>>>
+>>>>> 4: In read.table(file = file, header = header, sep = sep, quote = quote, :
+>>>>>
+>>>>> line 4 appears to contain embedded nulls
+>>>>>
+>>>>> 5: In read.table(file = file, header = header, sep = sep, quote = quote, :
+>>>>>
+>>>>> line 5 appears to contain embedded nulls
+>>>>>
+>>>>> Is there any way to read this data directly onto R?
+>>>>>
+>>>>> Thanks for your time
+>>>>>
+>>>>
+>>>> The looks like a byte-order mark
+>>>> (https://en.wikipedia.org/wiki/Byte_order_mark).
+>>>>
+>>>> Try this:
+>>>>
+>>>>   fn <- file('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
+>>>> encoding = "UTF-16LE")
+>>>>   read.delim(fn)
+>>>>
+>>>
+>
+
+______________________________________________
+R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
+https://stat.ethz.ch/mailman/listinfo/r-help
+PLEASE do read the posting guide https://www.r-project.org/posting-guide.html
+and provide commented, minimal, self-contained, reproducible code.
+
+From @@e||ck @end|ng |rom gm@||@com Wed Sep 11 15:44:05 2024
+From: @@e||ck @end|ng |rom gm@||@com (stephen sefick)
+Date: Wed, 11 Sep 2024 09:44:05 -0400
+Subject: [R] Stop or limit to console printing large list and so on
+Message-ID:
+	
+
+Hello All:
+
+Background/setup:
+Editor Emacs using "isend" to send code to shell in another window. This is
+on Linux. I can share more of the setup if others would find it useful, but
+I suspect this is an option I am unaware of.
+
+Problem:
+I am having a problem with accidentally typing an object name at the
+console that is a very large list and then having to wait for it to be
+printed until I can resume my work.
+
+Is there a way that I have missed to stem this default behavior?
+Kindest regards,
+
+Stephen Sefick
+
+	[[alternative HTML version deleted]]
+
+
+From |kry|ov @end|ng |rom d|@root@org Wed Sep 11 17:47:47 2024
+From: |kry|ov @end|ng |rom d|@root@org (Ivan Krylov)
+Date: Wed, 11 Sep 2024 18:47:47 +0300
+Subject: [R] Stop or limit to console printing large list and so on
+In-Reply-To: 
+References: 
+Message-ID: <20240911184747.4210bf09@arachnoid>
+
+On Wed, 11 Sep 2024 09:44:05 -0400
+stephen sefick wrote:
+
+> I am having a problem with accidentally typing an object name at the
+> console that is a very large list and then having to wait for it to be
+> printed until I can resume my work.
+
+Does it help to interrupt the process?
+ +https://www.gnu.org/software/emacs/manual/html_mono/emacs.html#index-C_002dc-C_002dc-_0028Shell-mode_0029 +https://ess.r-project.org/Manual/ess.html#index-interrupting-R-commands + +I'm afraid that the behaviour of the print() method is very +class-dependent and limiting options(max.print=...) may not help in +your case. + +-- +Best regards, +Ivan + + +From th|erry@onke||nx @end|ng |rom |nbo@be Wed Sep 11 17:55:53 2024 +From: th|erry@onke||nx @end|ng |rom |nbo@be (Thierry Onkelinx) +Date: Wed, 11 Sep 2024 17:55:53 +0200 +Subject: [R] non-conformable arrays +Message-ID: + +Dear all, + +I'm puzzled by this error. When running tcrossprod() within the function it +returns the error message. The code also outputs the object a and b. +Running the tcrossprod() outside of the function works as expected. + + cat("a <-") + dput(a) + cat("b <-") + dput(b) + cat("tcrossprod(a, b)") + tcrossprod(a, b) + +a <-structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, +0, 0, 1), dim = c(10L, 2L), dimnames = list(c("9", "13", "21", +"29", "30", "37", "52", "53", "56", "70"), +c("tmp2028c70ae152b4c63bb7ab902158b408366217581", +"tmp2028c70ae152b4c63bb7ab902158b408366217582")), assign = c(1L, +1L), contrasts = list(tmp2028c70ae152b4c63bb7ab902158b40836621758 = +"contr.treatment")) +b <-structure(c(-0.916362039446752, -0.849801808879291, -0.744535398206787, +0.875896407785924, 0.822587420283086, 0.894210774042389), dim = 3:2) +tcrossprod(a, b) + +For those how like a fully reproducible example: +the offending line in the code: +https://github.com/inbo/multimput/blob/e1cd0cdff7d2868e4101c411f7508301c7be7482/R/impute_glmermod.R#L65 +a (failing) unit test for the code: +https://github.com/inbo/multimput/blob/e1cd0cdff7d2868e4101c411f7508301c7be7482/tests/testthat/test_ccc_hurdle_impute.R#L10 + +Best regards, + +ir. 
Thierry Onkelinx +Statisticus / Statistician + +Vlaamse Overheid / Government of Flanders +INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND +FOREST +Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance +thierry.onkelinx at inbo.be +Havenlaan 88 bus 73, 1000 Brussel +*Postadres:* Koning Albert II-laan 15 bus 186, 1210 Brussel +*Poststukken die naar dit adres worden gestuurd, worden ingescand en +digitaal aan de geadresseerde bezorgd. Zo kan de Vlaamse overheid haar +dossiers volledig digitaal behandelen. Poststukken met de vermelding +?vertrouwelijk? worden niet ingescand, maar ongeopend aan de geadresseerde +bezorgd.* +www.inbo.be + +/////////////////////////////////////////////////////////////////////////////////////////// +To call in the statistician after the experiment is done may be no more +than asking him to perform a post-mortem examination: he may be able to say +what the experiment died of. ~ Sir Ronald Aylmer Fisher +The plural of anecdote is not data. ~ Roger Brinner +The combination of some data and an aching desire for an answer does not +ensure that a reasonable answer can be extracted from a given body of data. +~ John Tukey +/////////////////////////////////////////////////////////////////////////////////////////// + + + + [[alternative HTML version deleted]] + + +From @@e||ck @end|ng |rom gm@||@com Wed Sep 11 18:21:53 2024 +From: @@e||ck @end|ng |rom gm@||@com (stephen sefick) +Date: Wed, 11 Sep 2024 12:21:53 -0400 +Subject: [R] Stop or limit to console printing large list and so on +In-Reply-To: <20240911184747.4210bf09@arachnoid> +References: + <20240911184747.4210bf09@arachnoid> +Message-ID: + +No an interrupt does not help, unfortunately. + +I'll just try be more careful. + +Stephen Sefick + +On Wed, Sep 11, 2024, 11:47 Ivan Krylov wrote: + +> ? 
Wed, 11 Sep 2024 09:44:05 -0400 +> stephen sefick ?????: +> +> > I am having a problem with accidentally typing an object name at the +> > console that is a very large list and then having to wait for it to be +> > printed until I can resume my work. +> +> Does it help to interrupt the process? +> +> +> https://www.gnu.org/software/emacs/manual/html_mono/emacs.html#index-C_002dc-C_002dc-_0028Shell-mode_0029 +> https://ess.r-project.org/Manual/ess.html#index-interrupting-R-commands +> +> I'm afraid that the behaviour of the print() method is very +> class-dependent and limiting options(max.print=...) may not help in +> your case. +> +> -- +> Best regards, +> Ivan +> + + [[alternative HTML version deleted]] + + +From pd@|gd @end|ng |rom gm@||@com Wed Sep 11 18:41:02 2024 +From: pd@|gd @end|ng |rom gm@||@com (peter dalgaard) +Date: Wed, 11 Sep 2024 18:41:02 +0200 +Subject: [R] non-conformable arrays +In-Reply-To: +References: +Message-ID: <02692EEA-454B-4515-9FDB-81BF44C12AF7@gmail.com> + +Hum... Two points: You are using |> a lot and tcrossprod() is a primitive, so ignores argnames. This makes me suspicious that things like + +... ) |> tcrossprod(x = mm) + +might not do what you think it does. + +E.g., + +> a <- matrix(1,10,2) +> b <- matrix(1,3,2) +> tcrossprod(y=b, x=a) + [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] +[1,] 2 2 2 2 2 2 2 2 2 2 +[2,] 2 2 2 2 2 2 2 2 2 2 +[3,] 2 2 2 2 2 2 2 2 2 2 +> tcrossprod(x=a, y=b) + [,1] [,2] [,3] + [1,] 2 2 2 + [2,] 2 2 2 + [3,] 2 2 2 + [4,] 2 2 2 + [5,] 2 2 2 + [6,] 2 2 2 + [7,] 2 2 2 + [8,] 2 2 2 + [9,] 2 2 2 +[10,] 2 2 2 + +-pd + +> On 11 Sep 2024, at 17:55 , Thierry Onkelinx wrote: +> +> Dear all, +> +> I'm puzzled by this error. When running tcrossprod() within the function it +> returns the error message. The code also outputs the object a and b. +> Running the tcrossprod() outside of the function works as expected. 
+> +> cat("a <-") +> dput(a) +> cat("b <-") +> dput(b) +> cat("tcrossprod(a, b)") +> tcrossprod(a, b) +> +> a <-structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, +> 0, 0, 1), dim = c(10L, 2L), dimnames = list(c("9", "13", "21", +> "29", "30", "37", "52", "53", "56", "70"), +> c("tmp2028c70ae152b4c63bb7ab902158b408366217581", +> "tmp2028c70ae152b4c63bb7ab902158b408366217582")), assign = c(1L, +> 1L), contrasts = list(tmp2028c70ae152b4c63bb7ab902158b40836621758 = +> "contr.treatment")) +> b <-structure(c(-0.916362039446752, -0.849801808879291, -0.744535398206787, +> 0.875896407785924, 0.822587420283086, 0.894210774042389), dim = 3:2) +> tcrossprod(a, b) +> +> For those how like a fully reproducible example: +> the offending line in the code: +> https://github.com/inbo/multimput/blob/e1cd0cdff7d2868e4101c411f7508301c7be7482/R/impute_glmermod.R#L65 +> a (failing) unit test for the code: +> https://github.com/inbo/multimput/blob/e1cd0cdff7d2868e4101c411f7508301c7be7482/tests/testthat/test_ccc_hurdle_impute.R#L10 +> +> Best regards, +> +> ir. Thierry Onkelinx +> Statisticus / Statistician +> +> Vlaamse Overheid / Government of Flanders +> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND +> FOREST +> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance +> thierry.onkelinx at inbo.be +> Havenlaan 88 bus 73, 1000 Brussel +> *Postadres:* Koning Albert II-laan 15 bus 186, 1210 Brussel +> *Poststukken die naar dit adres worden gestuurd, worden ingescand en +> digitaal aan de geadresseerde bezorgd. Zo kan de Vlaamse overheid haar +> dossiers volledig digitaal behandelen. Poststukken met de vermelding +> ?vertrouwelijk? 
worden niet ingescand, maar ongeopend aan de geadresseerde +> bezorgd.* +> www.inbo.be +> +> /////////////////////////////////////////////////////////////////////////////////////////// +> To call in the statistician after the experiment is done may be no more +> than asking him to perform a post-mortem examination: he may be able to say +> what the experiment died of. ~ Sir Ronald Aylmer Fisher +> The plural of anecdote is not data. ~ Roger Brinner +> The combination of some data and an aching desire for an answer does not +> ensure that a reasonable answer can be extracted from a given body of data. +> ~ John Tukey +> /////////////////////////////////////////////////////////////////////////////////////////// +> +> +> +> [[alternative HTML version deleted]] +> +> ______________________________________________ +> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> https://stat.ethz.ch/mailman/listinfo/r-help +> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html +> and provide commented, minimal, self-contained, reproducible code. + +-- +Peter Dalgaard, Professor, +Center for Statistics, Copenhagen Business School +Solbjerg Plads 3, 2000 Frederiksberg, Denmark +Phone: (+45)38153501 +Office: A 4.23 +Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com + + +From |kry|ov @end|ng |rom d|@root@org Wed Sep 11 22:27:55 2024 +From: |kry|ov @end|ng |rom d|@root@org (Ivan Krylov) +Date: Wed, 11 Sep 2024 23:27:55 +0300 +Subject: [R] non-conformable arrays +In-Reply-To: +References: +Message-ID: <20240911232755.16850b0b@Tarkus> + +? 
Wed, 11 Sep 2024 17:55:53 +0200 +Thierry Onkelinx ?????: + +> For those how like a fully reproducible example: +> the offending line in the code: +> https://github.com/inbo/multimput/blob/e1cd0cdff7d2868e4101c411f7508301c7be7482/R/impute_glmermod.R#L65 +> a (failing) unit test for the code: +> https://github.com/inbo/multimput/blob/e1cd0cdff7d2868e4101c411f7508301c7be7482/tests/testthat/test_ccc_hurdle_impute.R#L10 + +Setting options(error = recover) and evaluating your test expression +directly using +eval(parse('testthat/test_ccc_hurdle_impute.R')[[1]][[3]]), I see: + +Browse[1]> names(random) +[1] "Year" +Browse[1]> paste("~0 + ", 'Year') |> ++ as.formula() |> ++ model.matrix(data = data[missing_obs,]) |> ++ str() + num [1:68, 1] 7 6 9 4 3 5 10 1 6 8 ... + - attr(*, "dimnames")=List of 2 + ..$ : chr [1:68] "37" "56" "59" "114" ... + ..$ : chr "Year" + - attr(*, "assign")= int 1 +Browse[1]> t(random[['Year']]) |> str() + num [1:19, 1:10] 0.175 0.181 0.102 0.119 0.158 ... + +...and they are indeed non-conformable. Why is random$Year a matrix? + +-- +Best regards, +Ivan + + +From |ern@nd_@rce @end|ng |rom y@hoo@e@ Wed Sep 11 16:25:03 2024 +From: |ern@nd_@rce @end|ng |rom y@hoo@e@ (Fer Arce) +Date: Wed, 11 Sep 2024 16:25:03 +0200 +Subject: [R] Stop or limit to console printing large list and so on +In-Reply-To: +References: +Message-ID: + +Hello Stephen. +I am not sure of the exact details of your problem, but following the +second part of your e-mail, if you accidentally print a large object in +the console and do not want to wait (i.e. you want to stop printing), +just press C-c C-c and it will stop it (it will stop any process +happening in the console, the same if you send a loong loop and want to +abort...) +cheers +F. 
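On the question of limiting accidental printing above: for atomic vectors and matrices, options(max.print = ...) caps what auto-printing shows, although, as Ivan Krylov cautions, class-specific print methods need not respect it. A minimal sketch (the limit of 50 is an arbitrary choice):

```r
options(max.print = 50)  # cap how many elements auto-printing will show
x <- seq_len(1e6)
x  # shows 50 values, then notes how many entries were omitted
```

For objects whose print methods ignore the option, interrupting the console (C-c C-c in an Emacs shell buffer, as suggested above) remains the fallback.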
+ +On 9/11/24 15:44, stephen sefick wrote: +> Stephen Sefick + + +From |r@nce@c@@p@ncotto @end|ng |rom gm@||@com Thu Sep 12 09:42:57 2024 +From: |r@nce@c@@p@ncotto @end|ng |rom gm@||@com (Francesca) +Date: Thu, 12 Sep 2024 09:42:57 +0200 +Subject: [R] "And" condition spanning over multiple columns in data frame +Message-ID: + +Dear contributors, +I need to create a set of columns, based on conditions of a dataframe as +follows. +I have managed to do the trick for one column, but I do not seem to find +any good example where the condition is extended to all the dataframe. + +I have these dataframe called c10Dt: + + + +id cp1 cp2 cp3 cp4 cp5 cp6 cp7 cp8 cp9 cp10 cp11 cp12 +1 1 NA NA NA NA NA NA NA NA NA NA NA NA +2 4 8 18 15 10 12 11 9 18 8 16 15 NA +3 3 8 5 5 4 NA 5 NA 6 NA 10 10 10 +4 3 5 5 4 4 3 2 1 3 2 1 1 2 +5 1 NA NA NA NA NA NA NA NA NA NA NA NA +6 2 5 5 10 10 9 10 10 10 NA 10 9 10 +-- + +Columns are id, cp1, cp2.. and so on. + +What I need to do is the following, made on just one column: + +c10Dt <- mutate(c10Dt, exit1= ifelse(is.na(cp1) & id!=1, 1, 0)) + +So, I create a new variable, called exit1, in which the program selects +cp1, checks if it is NA, and if it is NA but also the value of the column +"id" is not 1, then it gives back a 1, otherwise 0. +So, what I want is that it selects all the cases in which the id=2,3, or 4 +is not NA in the corresponding values of the matrix. +I managed to do it manually column by column, but I feel there should be +something smarter here. + +The problem is that I need to replicate this over all the columns from cp2, +to cp12, but keeping fixed the id column instead. + +I have tried with + +c10Dt %>% + mutate(x=across(starts_with("cp"), ~ifelse(. == NA)) & id!=1,1,0 ) + +but the problem with across is that it will implement the condition only on +cp_ columns. How do I tell R to use the column id with all the other +columns? + + +Thanks for any help provided. 
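Two details explain why the across() attempt above fails: `. == NA` is never TRUE (missing values must be tested with is.na()), and a misplaced parenthesis leaves `id != 1` outside the lambda. Other columns such as id are visible inside an across() lambda, so a sketch along these lines should express the intent (untested against the real data; the `.names` pattern, producing exit_cp1, exit_cp2, ..., is only illustrative):

```r
library(dplyr)

c10Dt <- c10Dt %>%
  mutate(across(
    starts_with("cp"),
    ~ ifelse(is.na(.x) & id != 1, 1, 0),  # id is the fixed column, .x each cp column
    .names = "exit_{.col}"
  ))
```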
+
+
+Francesca
+
+
+----------------------------------
+
+	[[alternative HTML version deleted]]
+
+
+From |kry|ov @end|ng |rom d|@root@org Thu Sep 12 10:42:43 2024
+From: |kry|ov @end|ng |rom d|@root@org (Ivan Krylov)
+Date: Thu, 12 Sep 2024 11:42:43 +0300
+Subject: [R]
+ "And" condition spanning over multiple columns in data frame
+In-Reply-To: 
+References: 
+Message-ID: <20240912114243.62d9a757@arachnoid>
+
+On Thu, 12 Sep 2024 09:42:57 +0200
+Francesca wrote:
+
+> c10Dt <- mutate(c10Dt, exit1= ifelse(is.na(cp1) & id!=1, 1, 0))
+
+> So, I create a new variable, called exit1, in which the program
+> selects cp1, checks if it is NA, and if it is NA but also the value
+> of the column "id" is not 1, then it gives back a 1, otherwise 0.
+> So, what I want is that it selects all the cases in which the id=2,3,
+> or 4 is not NA in the corresponding values of the matrix.
+
+Since all your columns except the first one are the desired "cp*"
+columns, you can obtain your "exit" columns in bulk:
+
+(
+ c10Dt$id != 1 & # will be recycled column-wise, as we need
+ is.na(c10Dt[-1])
+) |>
+# ...and then convert back into a data.frame,
+as.data.frame() |>
+# rename the columns...
+(\(x) setNames(x, sub('cp', 'exit', names(x))))() |>
+# ...and finally attach to the original data.frame
+cbind(c10Dt)
+
+--
+Best regards,
+Ivan
+
+
+From er|cjberger @end|ng |rom gm@||@com Thu Sep 12 10:44:54 2024
+From: er|cjberger @end|ng |rom gm@||@com (Eric Berger)
+Date: Thu, 12 Sep 2024 11:44:54 +0300
+Subject: [R]
+ "And" condition spanning over multiple columns in data frame
+In-Reply-To: 
+References: 
+Message-ID: 
+
+Hi,
+To rephrase what you are trying to do, you want a copy of all the cp
+columns, in which all the NAs become 1s and any other value becomes a
+zero. There is an exception for the rows where id is 1, where the NAs
+should become 0s.
+
+a <- c10Dt
+b <- matrix(as.numeric(is.na(a[,-1])), nrow=nrow(a))
+b[a$id == 1, ] <- 0  # rows where id is 1 get special treatment
+colnames(b) <- paste0("exit",1:ncol(b))
+d <- cbind(a,b)
+d
+
+
+On Thu, Sep 12, 2024 at 10:43 AM Francesca wrote:
+>
+> Dear contributors,
+> I need to create a set of columns, based on conditions of a dataframe as
+> follows.
+> I have managed to do the trick for one column, but I do not seem to find
+> any good example where the condition is extended to all the dataframe.
+>
+> I have these dataframe called c10Dt:
+>
+>
+>
+> id cp1 cp2 cp3 cp4 cp5 cp6 cp7 cp8 cp9 cp10 cp11 cp12
+> 1 1 NA NA NA NA NA NA NA NA NA NA NA NA
+> 2 4 8 18 15 10 12 11 9 18 8 16 15 NA
+> 3 3 8 5 5 4 NA 5 NA 6 NA 10 10 10
+> 4 3 5 5 4 4 3 2 1 3 2 1 1 2
+> 5 1 NA NA NA NA NA NA NA NA NA NA NA NA
+> 6 2 5 5 10 10 9 10 10 10 NA 10 9 10
+> --
+>
+> Columns are id, cp1, cp2.. and so on.
+>
+> What I need to do is the following, made on just one column:
+>
+> c10Dt <- mutate(c10Dt, exit1= ifelse(is.na(cp1) & id!=1, 1, 0))
+>
+> So, I create a new variable, called exit1, in which the program selects
+> cp1, checks if it is NA, and if it is NA but also the value of the column
+> "id" is not 1, then it gives back a 1, otherwise 0.
+> So, what I want is that it selects all the cases in which the id=2,3, or 4
+> is not NA in the corresponding values of the matrix.
+> I managed to do it manually column by column, but I feel there should be
+> something smarter here.
+>
+> The problem is that I need to replicate this over all the columns from cp2,
+> to cp12, but keeping fixed the id column instead.
+>
+> I have tried with
+>
+> c10Dt %>%
+>  mutate(x=across(starts_with("cp"), ~ifelse(. == NA)) & id!=1,1,0 )
+>
+> but the problem with across is that it will implement the condition only on
+> cp_ columns. How do I tell R to use the column id with all the other
+> columns?
+>
+>
+> Thanks for any help provided.
+> +> +> Francesca +> +> +> ---------------------------------- +> +> [[alternative HTML version deleted]] +> +> ______________________________________________ +> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> https://stat.ethz.ch/mailman/listinfo/r-help +> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html +> and provide commented, minimal, self-contained, reproducible code. + + +From th|erry@onke||nx @end|ng |rom |nbo@be Thu Sep 12 14:48:31 2024 +From: th|erry@onke||nx @end|ng |rom |nbo@be (Thierry Onkelinx) +Date: Thu, 12 Sep 2024 14:48:31 +0200 +Subject: [R] non-conformable arrays +In-Reply-To: <02692EEA-454B-4515-9FDB-81BF44C12AF7@gmail.com> +References: + <02692EEA-454B-4515-9FDB-81BF44C12AF7@gmail.com> +Message-ID: + +Dear Peter, + +That was indeed this issue. Thanks for the feedback. + +Best regards, + +ir. Thierry Onkelinx +Statisticus / Statistician + +Vlaamse Overheid / Government of Flanders +INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND +FOREST +Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance +thierry.onkelinx at inbo.be +Havenlaan 88 bus 73, 1000 Brussel +*Postadres:* Koning Albert II-laan 15 bus 186, 1210 Brussel +*Poststukken die naar dit adres worden gestuurd, worden ingescand en +digitaal aan de geadresseerde bezorgd. Zo kan de Vlaamse overheid haar +dossiers volledig digitaal behandelen. Poststukken met de vermelding +?vertrouwelijk? worden niet ingescand, maar ongeopend aan de geadresseerde +bezorgd.* +www.inbo.be + +/////////////////////////////////////////////////////////////////////////////////////////// +To call in the statistician after the experiment is done may be no more +than asking him to perform a post-mortem examination: he may be able to say +what the experiment died of. ~ Sir Ronald Aylmer Fisher +The plural of anecdote is not data. 
~ Roger Brinner +The combination of some data and an aching desire for an answer does not +ensure that a reasonable answer can be extracted from a given body of data. +~ John Tukey +/////////////////////////////////////////////////////////////////////////////////////////// + + + + +Op wo 11 sep 2024 om 18:41 schreef peter dalgaard : + +> Hum... Two points: You are using |> a lot and tcrossprod() is a primitive, +> so ignores argnames. This makes me suspicious that things like +> +> ... ) |> tcrossprod(x = mm) +> +> might not do what you think it does. +> +> E.g., +> +> > a <- matrix(1,10,2) +> > b <- matrix(1,3,2) +> > tcrossprod(y=b, x=a) +> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] +> [1,] 2 2 2 2 2 2 2 2 2 2 +> [2,] 2 2 2 2 2 2 2 2 2 2 +> [3,] 2 2 2 2 2 2 2 2 2 2 +> > tcrossprod(x=a, y=b) +> [,1] [,2] [,3] +> [1,] 2 2 2 +> [2,] 2 2 2 +> [3,] 2 2 2 +> [4,] 2 2 2 +> [5,] 2 2 2 +> [6,] 2 2 2 +> [7,] 2 2 2 +> [8,] 2 2 2 +> [9,] 2 2 2 +> [10,] 2 2 2 +> +> -pd +> +> > On 11 Sep 2024, at 17:55 , Thierry Onkelinx +> wrote: +> > +> > Dear all, +> > +> > I'm puzzled by this error. When running tcrossprod() within the function +> it +> > returns the error message. The code also outputs the object a and b. +> > Running the tcrossprod() outside of the function works as expected. 
+> > +> > cat("a <-") +> > dput(a) +> > cat("b <-") +> > dput(b) +> > cat("tcrossprod(a, b)") +> > tcrossprod(a, b) +> > +> > a <-structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, +> > 0, 0, 1), dim = c(10L, 2L), dimnames = list(c("9", "13", "21", +> > "29", "30", "37", "52", "53", "56", "70"), +> > c("tmp2028c70ae152b4c63bb7ab902158b408366217581", +> > "tmp2028c70ae152b4c63bb7ab902158b408366217582")), assign = c(1L, +> > 1L), contrasts = list(tmp2028c70ae152b4c63bb7ab902158b40836621758 = +> > "contr.treatment")) +> > b <-structure(c(-0.916362039446752, -0.849801808879291, +> -0.744535398206787, +> > 0.875896407785924, 0.822587420283086, 0.894210774042389), dim = 3:2) +> > tcrossprod(a, b) +> > +> > For those how like a fully reproducible example: +> > the offending line in the code: +> > +> https://github.com/inbo/multimput/blob/e1cd0cdff7d2868e4101c411f7508301c7be7482/R/impute_glmermod.R#L65 +> > a (failing) unit test for the code: +> > +> https://github.com/inbo/multimput/blob/e1cd0cdff7d2868e4101c411f7508301c7be7482/tests/testthat/test_ccc_hurdle_impute.R#L10 +> > +> > Best regards, +> > +> > ir. Thierry Onkelinx +> > Statisticus / Statistician +> > +> > Vlaamse Overheid / Government of Flanders +> > INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE +> AND +> > FOREST +> > Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance +> > thierry.onkelinx at inbo.be +> > Havenlaan 88 bus 73, 1000 Brussel +> > *Postadres:* Koning Albert II-laan 15 bus 186, 1210 Brussel +> > *Poststukken die naar dit adres worden gestuurd, worden ingescand en +> > digitaal aan de geadresseerde bezorgd. Zo kan de Vlaamse overheid haar +> > dossiers volledig digitaal behandelen. Poststukken met de vermelding +> > ?vertrouwelijk? 
worden niet ingescand, maar ongeopend aan de +> geadresseerde +> > bezorgd.* +> > www.inbo.be +> > +> > +> /////////////////////////////////////////////////////////////////////////////////////////// +> > To call in the statistician after the experiment is done may be no more +> > than asking him to perform a post-mortem examination: he may be able to +> say +> > what the experiment died of. ~ Sir Ronald Aylmer Fisher +> > The plural of anecdote is not data. ~ Roger Brinner +> > The combination of some data and an aching desire for an answer does not +> > ensure that a reasonable answer can be extracted from a given body of +> data. +> > ~ John Tukey +> > +> /////////////////////////////////////////////////////////////////////////////////////////// +> > +> > +> > +> > [[alternative HTML version deleted]] +> > +> > ______________________________________________ +> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> > https://stat.ethz.ch/mailman/listinfo/r-help +> > PLEASE do read the posting guide +> https://www.R-project.org/posting-guide.html +> > and provide commented, minimal, self-contained, reproducible code. +> +> -- +> Peter Dalgaard, Professor, +> Center for Statistics, Copenhagen Business School +> Solbjerg Plads 3, 2000 Frederiksberg, Denmark +> Phone: (+45)38153501 +> Office: A 4.23 +> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com +> +> + + [[alternative HTML version deleted]] + + +From ru|pb@rr@d@@ @end|ng |rom @@po@pt Thu Sep 12 16:36:32 2024 +From: ru|pb@rr@d@@ @end|ng |rom @@po@pt (Rui Barradas) +Date: Thu, 12 Sep 2024 15:36:32 +0100 +Subject: [R] + "And" condition spanning over multiple columns in data frame +In-Reply-To: +References: +Message-ID: <748281fa-6b29-42d4-a59a-d1f027aa5480@sapo.pt> + +?s 08:42 de 12/09/2024, Francesca escreveu: +> Dear contributors, +> I need to create a set of columns, based on conditions of a dataframe as +> follows. 
+> I have managed to do the trick for one column, but I do not seem to find +> any good example where the condition is extended to all the dataframe. +> +> I have these dataframe called c10Dt: +> +> +> +> id cp1 cp2 cp3 cp4 cp5 cp6 cp7 cp8 cp9 cp10 cp11 cp12 +> 1 1 NA NA NA NA NA NA NA NA NA NA NA NA +> 2 4 8 18 15 10 12 11 9 18 8 16 15 NA +> 3 3 8 5 5 4 NA 5 NA 6 NA 10 10 10 +> 4 3 5 5 4 4 3 2 1 3 2 1 1 2 +> 5 1 NA NA NA NA NA NA NA NA NA NA NA NA +> 6 2 5 5 10 10 9 10 10 10 NA 10 9 10 +> -- Columns are id, cp1, cp2.. and so on. What I need to do is the +> following, made on just one column: c10Dt <- mutate(c10Dt, exit1= +> ifelse(is.na(cp1) & id!=1, 1, 0)) So, I create a new variable, called +> exit1, in which the program selects cp1, checks if it is NA, and if it +> is NA but also the value of the column "id" is not 1, then it gives back +> a 1, otherwise 0. So, what I want is that it selects all the cases in +> which the id=2,3, or 4 is not NA in the corresponding values of the +> matrix. I managed to do it manually column by column, but I feel there +> should be something smarter here. The problem is that I need to +> replicate this over all the columns from cp2, to cp12, but keeping fixed +> the id column instead. I have tried with c10Dt %>% +> mutate(x=across(starts_with("cp"), ~ifelse(. == NA)) & id!=1,1,0 ) but +> the problem with across is that it will implement the condition only on +> cp_ columns. How do I tell R to use the column id with all the other +> columns? Thanks for any help provided. Francesca +> ---------------------------------- + +Hello, + +Something like this? + +1. If an ifelse instruction is meant to create a binary result, coerce +the logical condition to integer instead. You can make it more clear by +substituting as.integer for the plus sign below; +2. the .names argument is used to create new columns and keeping the +original ones. 
+
+
+
+df1 <- read.table(text = "id cp1 cp2 cp3 cp4 cp5 cp6 cp7 cp8 cp9 cp10
+cp11 cp12
+1 1 NA NA NA NA NA NA NA NA NA NA NA NA
+2 4 8 18 15 10 12 11 9 18 8 16 15 NA
+3 3 8 5 5 4 NA 5 NA 6 NA 10 10 10
+4 3 5 5 4 4 3 2 1 3 2 1 1 2
+5 1 NA NA NA NA NA NA NA NA NA NA NA NA
+6 2 5 5 10 10 9 10 10 10 NA 10 9 10", header = TRUE)
+df1
+
+library(dplyr)
+
+df1 %>%
+ mutate(across(starts_with("cp"), ~ +(is.na(.) & id != 1), .names =
+"{col}_new"))
+
+
+
+Hope this helps,
+
+Rui Barradas
+
+
+From du@ho|| @end|ng |rom mcm@@ter@c@ Thu Sep 12 17:08:54 2024
+From: du@ho|| @end|ng |rom mcm@@ter@c@ (Jonathan Dushoff)
+Date: Thu, 12 Sep 2024 11:08:54 -0400
+Subject: [R] Subject: Re: BUG: atan(1i) / 5 = NaN+Infi ?
+In-Reply-To: <63654bcd06e94303a406e1e06fad5663@YTBPR01MB2797.CANPRD01.PROD.OUTLOOK.COM>
+References: <63654bcd06e94303a406e1e06fad5663@YTBPR01MB2797.CANPRD01.PROD.OUTLOOK.COM>
+Message-ID: 
+
+> In this case, I do think we should look into the consequences of
+> indeed distinguishing
+> *
+> * and
+> /
+> from their respective current {1. coerce to complex, 2. use complex arith}
+> arithmetic.
+
+I'm wondering whether -- if this indeed gets opened up -- it might also
+make sense to calculate x / y using real arithmetic
+(as x*y / |y|²)
+
+Jonathan
+
+
+From murdoch@dunc@n @end|ng |rom gm@||@com Thu Sep 12 17:21:02 2024
+From: murdoch@dunc@n @end|ng |rom gm@||@com (Duncan Murdoch)
+Date: Thu, 12 Sep 2024 11:21:02 -0400
+Subject: [R] Subject: Re: BUG: atan(1i) / 5 = NaN+Infi ?
+In-Reply-To: 
+References: <63654bcd06e94303a406e1e06fad5663@YTBPR01MB2797.CANPRD01.PROD.OUTLOOK.COM>
+ 
+Message-ID: <35d23b38-9943-4f16-a847-a1f659bbbbb1@gmail.com>
+
+On 2024-09-12 11:08 a.m., Jonathan Dushoff wrote:
+>> In this case, I do think we should look into the consequences of
+>> indeed distinguishing
+>> *
+>> * and
+>> /
+>> from their respective current {1. coerce to complex, 2. use complex arith}
+>> arithmetic.
+>
+> I'm wondering whether -- if this indeed gets opened up -- it might also
+> make sense to calculate x / y using real arithmetic
+> (as x*y / |y|²)
+
+That's not the correct formula, is it? I think the result should be x *
+Conj(y) / Mod(y)^2 . So that would involve * and
+ / , not just real arithmetic.
+
+Duncan Murdoch
+
+
+From cwr @end|ng |rom @gency@t@t|@t|c@|@com Fri Sep 13 03:10:53 2024
+From: cwr @end|ng |rom @gency@t@t|@t|c@|@com (Christopher W. Ryan)
+Date: Thu, 12 Sep 2024 21:10:53 -0400
+Subject: [R] how to specify point symbols in the key on a lattice dotplot
+Message-ID: <20240912211053.17549527@rcw-lm203c>
+
+I am making a dotplot with lattice, as follows:
+
+dd %>% dotplot( segment ~ transit_time, groups = impact, data = .,
+ as.table = TRUE,
+ pch = 16:17,
+ cex = 1.8,
+ scales = list(cex = 1.4),
+ auto.key = TRUE)
+
+impact is a factor with two levels.
+
+The key shows 2 open circles, one of each color of my two
+plotting symbols, one for each group. I would like the
+symbols in the key to match the plotting characters in the graph: 16
+(filled circle) for one group and 17 (filled triangle) for the second
+group. How would I do that? I have not had any success with supplying
+arguments to auto.key, simpleKey, or key. Guess I'm not understanding
+the syntax.
+
+Thanks.
+
+--Chris Ryan
+
+--
+Agency Statistical Consulting, LLC
+Helping those in public service get the most from their data.
+www.agencystatistical.com + +Public GnuPG email encryption key at +https://keys.openpgp.org +9E53101D261BEC070CFF1A0DC8BC50E715A672A0 + + +From po|c1410 @end|ng |rom gm@||@com Fri Sep 13 09:29:20 2024 +From: po|c1410 @end|ng |rom gm@||@com (CALUM POLWART) +Date: Fri, 13 Sep 2024 08:29:20 +0100 +Subject: [R] + how to specify point symbols in the key on a lattice dotplot +In-Reply-To: <20240912211053.17549527@rcw-lm203c> +References: <20240912211053.17549527@rcw-lm203c> +Message-ID: + +Add: + +key = list(points=16:17) + +Into the dotplot section possibly without the autokey + +On Fri, 13 Sep 2024, 08:19 Christopher W. Ryan, +wrote: + +> I am making a dotplot with lattice, as follows: +> +> dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., +> as.table = TRUE, +> pch = 16:17, +> cex = 1.8, +> scales = list(cex = 1.4), +> auto.key = TRUE) +> +> impact is a factor with two levels. +> +> They key shows 2 open circles, one of each color of my two +> plotting symbols, one for each group. I would like the +> symbols in the key to match the plotting characters in the graph: 16 +> (filled circle) for one group and 17 (filled triangle) for the second +> group. How would I do that? I have not had any success with supplying +> arguments to auto.key, simpleKey, or key. Guess I'm not understanding +> the syntax. +> +> Thanks. +> +> --Chris Ryan +> +> -- +> Agency Statistical Consulting, LLC +> Helping those in public service get the most from their data. +> www.agencystatistical.com +> +> Public GnuPG email encryption key at +> https://keys.openpgp.org +> 9E53101D261BEC070CFF1A0DC8BC50E715A672A0 +> +> ______________________________________________ +> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> https://stat.ethz.ch/mailman/listinfo/r-help +> PLEASE do read the posting guide +> https://www.R-project.org/posting-guide.html +> and provide commented, minimal, self-contained, reproducible code. 
+> + + [[alternative HTML version deleted]] + + +From deep@y@n@@@rk@r @end|ng |rom gm@||@com Fri Sep 13 12:19:48 2024 +From: deep@y@n@@@rk@r @end|ng |rom gm@||@com (Deepayan Sarkar) +Date: Fri, 13 Sep 2024 15:49:48 +0530 +Subject: [R] + how to specify point symbols in the key on a lattice dotplot +In-Reply-To: <20240912211053.17549527@rcw-lm203c> +References: <20240912211053.17549527@rcw-lm203c> +Message-ID: + +On Fri, 13 Sept 2024 at 12:49, Christopher W. Ryan + wrote: +> +> I am making a dotplot with lattice, as follows: +> +> dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., +> as.table = TRUE, +> pch = 16:17, +> cex = 1.8, +> scales = list(cex = 1.4), +> auto.key = TRUE) +> +> impact is a factor with two levels. +> +> They key shows 2 open circles, one of each color of my two +> plotting symbols, one for each group. I would like the +> symbols in the key to match the plotting characters in the graph: 16 +> (filled circle) for one group and 17 (filled triangle) for the second +> group. How would I do that? I have not had any success with supplying +> arguments to auto.key, simpleKey, or key. Guess I'm not understanding +> the syntax. + +Specifying key = list(...) will work, but the shortcut is to add + +par.settings = simpleTheme(pch = 16:17, cex = 1.8) + +That way, you don't need to specify the parameters anywhere else. + +-Deepayan + +> Thanks. +> +> --Chris Ryan +> +> -- +> Agency Statistical Consulting, LLC +> Helping those in public service get the most from their data. +> www.agencystatistical.com +> +> Public GnuPG email encryption key at +> https://keys.openpgp.org +> 9E53101D261BEC070CFF1A0DC8BC50E715A672A0 +> +> ______________________________________________ +> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> https://stat.ethz.ch/mailman/listinfo/r-help +> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html +> and provide commented, minimal, self-contained, reproducible code. 
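Putting Deepayan's `par.settings = simpleTheme(...)` suggestion together with the minimal example data posted in this thread gives one self-contained script; the assembled call below is an illustrative sketch, not a quote from the original mails:

```r
library(lattice)

## Minimal example data as posted in the thread (dput output)
dd <- structure(list(
  impact = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
                     levels = c("impaction", "no impaction"),
                     class = "factor"),
  segment = structure(c(4L, 2L, 1L, 3L, 4L, 2L, 1L, 3L),
                      levels = c("left", "right", "rectosigmoid", "total"),
                      class = c("ordered", "factor")),
  transit_time = c(70, 10, 20, 32, 42, 10, 12, 18)),
  class = "data.frame", row.names = c(NA, -8L))

## simpleTheme() sets the plot symbols in the theme that auto.key also
## reads, so the key entries match the panel symbols automatically.
dotplot(segment ~ transit_time, data = dd, groups = impact,
        as.table = TRUE,
        scales = list(cex = 1.4),
        par.settings = simpleTheme(pch = 16:17, cex = 1.8),
        auto.key = TRUE)
```

The design point made in the reply above is that graphical parameters supplied once through the theme need not be repeated in both the panel arguments (`pch =`) and a hand-built `key = list(...)`.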
+ + +From du@ho|| @end|ng |rom mcm@@ter@c@ Fri Sep 13 14:53:55 2024 +From: du@ho|| @end|ng |rom mcm@@ter@c@ (Jonathan Dushoff) +Date: Fri, 13 Sep 2024 08:53:55 -0400 +Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ? +In-Reply-To: +References: +Message-ID: + +> Message: 4 +> Date: Thu, 12 Sep 2024 11:21:02 -0400 +> From: Duncan Murdoch +> That's not the correct formula, is it? I think the result should be x * +> Conj(y) / Mod(y)^2 . + +Correct, sorry. And thanks. + +> So that would involve * and +> / , not just real arithmetic. + +Not an expert, but I don't see it. Conj and Mod seem to be numerically +straightforward real-like operations. We do those, and then multiply +one complex number by one real quotient. + + +From cwr @end|ng |rom @gency@t@t|@t|c@|@com Fri Sep 13 17:58:50 2024 +From: cwr @end|ng |rom @gency@t@t|@t|c@|@com (Christopher W. Ryan) +Date: Fri, 13 Sep 2024 11:58:50 -0400 +Subject: [R] + how to specify point symbols in the key on a lattice dotplot +In-Reply-To: +References: <20240912211053.17549527@rcw-lm203c> + +Message-ID: <20240913115850.03bbed1d@rcw-lm203c> + + +dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., + pch = 16:17, + cex = 1.8, + scales = list(cex = 1.4), + key = list(points = 16:17) ) + +produces a graph with no discernible key, but with an asterisk at the +top, above the plotting region. + +Same result from + +dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., + pch = 16:17, + cex = 1.8, + scales = list(cex = 1.4), + key = list(points = 16:17), + auto.key = TRUE ) + + + + +dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., + scales = list(cex = 1.4), + par.settings = simpleTheme(pch = 16:17, cex = 1.8), + auto.key = TRUE) + +produces the desired result. + +Why does key = list(points = 16:17) not work? 
Below is a MWE: + +================================ + +library(lattice) +library(dplyr) +dd <- structure(list(impact = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, +2L), levels = c("impaction", "no impaction"), class = "factor"), + segment = structure(c(4L, 2L, 1L, 3L, 4L, 2L, 1L, 3L), levels = +c("left", "right", "rectosigmoid", "total"), class = c("ordered", +"factor" )), transit_time = c(70, 10, 20, 32, 42, 10, 12, 18)), class = +"data.frame", row.names = c(NA, -8L)) + +dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., + pch = 16:17, + cex = 1.8, + scales = list(cex = 1.4), + key = list(points = 16:17) ) + +================================= + + +Thanks. + +--Chris Ryan +-- +Agency Statistical Consulting, LLC +Helping those in public service get the most from their data. +www.agencystatistical.com + +Public GnuPG email encryption key at +https://keys.openpgp.org +9E53101D261BEC070CFF1A0DC8BC50E715A672A0 + + +On Fri, 13 Sep 2024 08:29:20 +0100, CALUM POLWART wrote: + +>Add: +> +>key = list(points=16:17) +> +>Into the dotplot section possibly without the autokey +> +>On Fri, 13 Sep 2024, 08:19 Christopher W. Ryan, +> wrote: +> +>> I am making a dotplot with lattice, as follows: +>> +>> dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., +>> as.table = TRUE, +>> pch = 16:17, +>> cex = 1.8, +>> scales = list(cex = 1.4), +>> auto.key = TRUE) +>> +>> impact is a factor with two levels. +>> +>> They key shows 2 open circles, one of each color of my two +>> plotting symbols, one for each group. I would like the +>> symbols in the key to match the plotting characters in the graph: 16 +>> (filled circle) for one group and 17 (filled triangle) for the second +>> group. How would I do that? I have not had any success with +>> supplying arguments to auto.key, simpleKey, or key. Guess I'm not +>> understanding the syntax. +>> +>> Thanks. 
+>> +>> --Chris Ryan +>> +>> -- +>> Agency Statistical Consulting, LLC +>> Helping those in public service get the most from their data. +>> www.agencystatistical.com +>> +>> Public GnuPG email encryption key at +>> https://keys.openpgp.org +>> 9E53101D261BEC070CFF1A0DC8BC50E715A672A0 +>> +>> ______________________________________________ +>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +>> https://stat.ethz.ch/mailman/listinfo/r-help +>> PLEASE do read the posting guide +>> https://www.R-project.org/posting-guide.html +>> and provide commented, minimal, self-contained, reproducible code. +>> + + +From murdoch@dunc@n @end|ng |rom gm@||@com Fri Sep 13 19:10:16 2024 +From: murdoch@dunc@n @end|ng |rom gm@||@com (Duncan Murdoch) +Date: Fri, 13 Sep 2024 13:10:16 -0400 +Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ? +In-Reply-To: +References: + +Message-ID: + +On 2024-09-13 8:53 a.m., Jonathan Dushoff wrote: +>> Message: 4 +>> Date: Thu, 12 Sep 2024 11:21:02 -0400 +>> From: Duncan Murdoch +>> That's not the correct formula, is it? I think the result should be x * +>> Conj(y) / Mod(y)^2 . +> +> Correct, sorry. And thanks. +> +>> So that would involve * and +>> / , not just real arithmetic. +> +> Not an expert, but I don't see it. Conj and Mod seem to be numerically +> straightforward real-like operations. We do those, and then multiply +> one complex number by one real quotient. +> + +Are you sure? We aren't dealing with real numbers and complex numbers +here, we're dealing with those sets extended with infinities and other +weird things. + +I think the formula I gave assumes no infinities. + +So for example if y is some kind of infinite complex number, then 1/y +should come out to zero, and if x is finite, the final result of x/y +should be zero. + +But if we evaluate x/y as (x / Mod(y)^2) * Conj(y), won't we get a NaN +from zero times infinity? + +I imagine someone has thought about all these edge cases. 
Maybe they're +discussed in one of the standards that Richard referenced. + +Duncan Murdoch + + +From bgunter@4567 @end|ng |rom gm@||@com Fri Sep 13 19:45:08 2024 +From: bgunter@4567 @end|ng |rom gm@||@com (Bert Gunter) +Date: Fri, 13 Sep 2024 10:45:08 -0700 +Subject: [R] + how to specify point symbols in the key on a lattice dotplot +In-Reply-To: <20240913115850.03bbed1d@rcw-lm203c> +References: <20240912211053.17549527@rcw-lm203c> + + <20240913115850.03bbed1d@rcw-lm203c> +Message-ID: + +"Why does key = list(points = 16:17) not work? " + +Because, from the "key" section of ?xyplot +" The contents of the key are determined by (possibly repeated) +components named "rectangles", "lines", "points" or "text". Each of +these must be **lists** with relevant graphical parameters (see later) +controlling their appearance." + +Ergo, try: + +dd |> dotplot( segment ~ transit_time, groups = impact, data = ., + pch = 16:17, + cex = 1.8, + scales = list(cex = 1.4), + key = list(points = list(pch =16:17) )) + +Cheers, +Bert + + +On Fri, Sep 13, 2024 at 9:53?AM Christopher W. Ryan + wrote: +> +> +> dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., +> pch = 16:17, +> cex = 1.8, +> scales = list(cex = 1.4), +> key = list(points = 16:17) ) +> +> produces a graph with no discernible key, but with an asterisk at the +> top, above the plotting region. +> +> Same result from +> +> dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., +> pch = 16:17, +> cex = 1.8, +> scales = list(cex = 1.4), +> key = list(points = 16:17), +> auto.key = TRUE ) +> +> +> +> +> dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., +> scales = list(cex = 1.4), +> par.settings = simpleTheme(pch = 16:17, cex = 1.8), +> auto.key = TRUE) +> +> produces the desired result. +> +> Why does key = list(points = 16:17) not work? 
Below is a MWE: +> +> ================================ +> +> library(lattice) +> library(dplyr) +> dd <- structure(list(impact = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, +> 2L), levels = c("impaction", "no impaction"), class = "factor"), +> segment = structure(c(4L, 2L, 1L, 3L, 4L, 2L, 1L, 3L), levels = +> c("left", "right", "rectosigmoid", "total"), class = c("ordered", +> "factor" )), transit_time = c(70, 10, 20, 32, 42, 10, 12, 18)), class = +> "data.frame", row.names = c(NA, -8L)) +> +> dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., +> pch = 16:17, +> cex = 1.8, +> scales = list(cex = 1.4), +> key = list(points = 16:17) ) +> +> ================================= +> +> +> Thanks. +> +> --Chris Ryan +> -- +> Agency Statistical Consulting, LLC +> Helping those in public service get the most from their data. +> www.agencystatistical.com +> +> Public GnuPG email encryption key at +> https://keys.openpgp.org +> 9E53101D261BEC070CFF1A0DC8BC50E715A672A0 +> +> +> On Fri, 13 Sep 2024 08:29:20 +0100, CALUM POLWART wrote: +> +> >Add: +> > +> >key = list(points=16:17) +> > +> >Into the dotplot section possibly without the autokey +> > +> >On Fri, 13 Sep 2024, 08:19 Christopher W. Ryan, +> > wrote: +> > +> >> I am making a dotplot with lattice, as follows: +> >> +> >> dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., +> >> as.table = TRUE, +> >> pch = 16:17, +> >> cex = 1.8, +> >> scales = list(cex = 1.4), +> >> auto.key = TRUE) +> >> +> >> impact is a factor with two levels. +> >> +> >> They key shows 2 open circles, one of each color of my two +> >> plotting symbols, one for each group. I would like the +> >> symbols in the key to match the plotting characters in the graph: 16 +> >> (filled circle) for one group and 17 (filled triangle) for the second +> >> group. How would I do that? I have not had any success with +> >> supplying arguments to auto.key, simpleKey, or key. Guess I'm not +> >> understanding the syntax. +> >> +> >> Thanks. 
+> >> +> >> --Chris Ryan +> >> +> >> -- +> >> Agency Statistical Consulting, LLC +> >> Helping those in public service get the most from their data. +> >> www.agencystatistical.com +> >> +> >> Public GnuPG email encryption key at +> >> https://keys.openpgp.org +> >> 9E53101D261BEC070CFF1A0DC8BC50E715A672A0 +> >> +> >> ______________________________________________ +> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> >> https://stat.ethz.ch/mailman/listinfo/r-help +> >> PLEASE do read the posting guide +> >> https://www.R-project.org/posting-guide.html +> >> and provide commented, minimal, self-contained, reproducible code. +> >> +> +> ______________________________________________ +> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> https://stat.ethz.ch/mailman/listinfo/r-help +> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html +> and provide commented, minimal, self-contained, reproducible code. + + +From bgunter@4567 @end|ng |rom gm@||@com Fri Sep 13 19:55:03 2024 +From: bgunter@4567 @end|ng |rom gm@||@com (Bert Gunter) +Date: Fri, 13 Sep 2024 10:55:03 -0700 +Subject: [R] + how to specify point symbols in the key on a lattice dotplot +In-Reply-To: +References: <20240912211053.17549527@rcw-lm203c> + + <20240913115850.03bbed1d@rcw-lm203c> + +Message-ID: + +Oh, and for correct syntax with R's |> operator, "data = ." , should +be "data = _" ; although the former seemed to work. + +-- Bert + +On Fri, Sep 13, 2024 at 10:45?AM Bert Gunter wrote: +> +> "Why does key = list(points = 16:17) not work? " +> +> Because, from the "key" section of ?xyplot +> " The contents of the key are determined by (possibly repeated) +> components named "rectangles", "lines", "points" or "text". Each of +> these must be **lists** with relevant graphical parameters (see later) +> controlling their appearance." 
+> +> Ergo, try: +> +> dd |> dotplot( segment ~ transit_time, groups = impact, data = ., +> pch = 16:17, +> cex = 1.8, +> scales = list(cex = 1.4), +> key = list(points = list(pch =16:17) )) +> +> Cheers, +> Bert +> +> +> On Fri, Sep 13, 2024 at 9:53?AM Christopher W. Ryan +> wrote: +> > +> > +> > dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., +> > pch = 16:17, +> > cex = 1.8, +> > scales = list(cex = 1.4), +> > key = list(points = 16:17) ) +> > +> > produces a graph with no discernible key, but with an asterisk at the +> > top, above the plotting region. +> > +> > Same result from +> > +> > dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., +> > pch = 16:17, +> > cex = 1.8, +> > scales = list(cex = 1.4), +> > key = list(points = 16:17), +> > auto.key = TRUE ) +> > +> > +> > +> > +> > dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., +> > scales = list(cex = 1.4), +> > par.settings = simpleTheme(pch = 16:17, cex = 1.8), +> > auto.key = TRUE) +> > +> > produces the desired result. +> > +> > Why does key = list(points = 16:17) not work? Below is a MWE: +> > +> > ================================ +> > +> > library(lattice) +> > library(dplyr) +> > dd <- structure(list(impact = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, +> > 2L), levels = c("impaction", "no impaction"), class = "factor"), +> > segment = structure(c(4L, 2L, 1L, 3L, 4L, 2L, 1L, 3L), levels = +> > c("left", "right", "rectosigmoid", "total"), class = c("ordered", +> > "factor" )), transit_time = c(70, 10, 20, 32, 42, 10, 12, 18)), class = +> > "data.frame", row.names = c(NA, -8L)) +> > +> > dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., +> > pch = 16:17, +> > cex = 1.8, +> > scales = list(cex = 1.4), +> > key = list(points = 16:17) ) +> > +> > ================================= +> > +> > +> > Thanks. +> > +> > --Chris Ryan +> > -- +> > Agency Statistical Consulting, LLC +> > Helping those in public service get the most from their data. 
+> > www.agencystatistical.com +> > +> > Public GnuPG email encryption key at +> > https://keys.openpgp.org +> > 9E53101D261BEC070CFF1A0DC8BC50E715A672A0 +> > +> > +> > On Fri, 13 Sep 2024 08:29:20 +0100, CALUM POLWART wrote: +> > +> > >Add: +> > > +> > >key = list(points=16:17) +> > > +> > >Into the dotplot section possibly without the autokey +> > > +> > >On Fri, 13 Sep 2024, 08:19 Christopher W. Ryan, +> > > wrote: +> > > +> > >> I am making a dotplot with lattice, as follows: +> > >> +> > >> dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., +> > >> as.table = TRUE, +> > >> pch = 16:17, +> > >> cex = 1.8, +> > >> scales = list(cex = 1.4), +> > >> auto.key = TRUE) +> > >> +> > >> impact is a factor with two levels. +> > >> +> > >> They key shows 2 open circles, one of each color of my two +> > >> plotting symbols, one for each group. I would like the +> > >> symbols in the key to match the plotting characters in the graph: 16 +> > >> (filled circle) for one group and 17 (filled triangle) for the second +> > >> group. How would I do that? I have not had any success with +> > >> supplying arguments to auto.key, simpleKey, or key. Guess I'm not +> > >> understanding the syntax. +> > >> +> > >> Thanks. +> > >> +> > >> --Chris Ryan +> > >> +> > >> -- +> > >> Agency Statistical Consulting, LLC +> > >> Helping those in public service get the most from their data. +> > >> www.agencystatistical.com +> > >> +> > >> Public GnuPG email encryption key at +> > >> https://keys.openpgp.org +> > >> 9E53101D261BEC070CFF1A0DC8BC50E715A672A0 +> > >> +> > >> ______________________________________________ +> > >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> > >> https://stat.ethz.ch/mailman/listinfo/r-help +> > >> PLEASE do read the posting guide +> > >> https://www.R-project.org/posting-guide.html +> > >> and provide commented, minimal, self-contained, reproducible code. 
+> > >> +> > +> > ______________________________________________ +> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +> > https://stat.ethz.ch/mailman/listinfo/r-help +> > PLEASE do read the posting guide https://www.R-project.org/posting-guide.html +> > and provide commented, minimal, self-contained, reproducible code. + + +From cwr @end|ng |rom @gency@t@t|@t|c@|@com Fri Sep 13 20:06:05 2024 +From: cwr @end|ng |rom @gency@t@t|@t|c@|@com (Christopher W. Ryan) +Date: Fri, 13 Sep 2024 14:06:05 -0400 +Subject: [R] + how to specify point symbols in the key on a lattice dotplot +In-Reply-To: +References: <20240912211053.17549527@rcw-lm203c> + + <20240913115850.03bbed1d@rcw-lm203c> + +Message-ID: <20240913140605.51a11cb6@rcw-lm203c> + +For me, Bert's suggestion produces a plot with two black symbols above +the plotting region, a circle and a triangle, both filled, and no text. + +This, in which I specify several features of the symbols in the key, + +dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., + pch = 16:17, + col = 1:2, + cex = 1.8, + scales = list(cex = 1.4), + key = list(text = list(c("impaction", "no impaction")), +points = list(pch =16:17, col = 1:2) )) + +gets me what I want. + +When using key = (), is it necessary to specify all features of the +plotting symbols and the text? I was under the impression that auto.key +and/or simpleKey (which I'd also tried) had certain defaults, but one or +more of those defaults could be changed by providing arguments, with all +unspecified features remaining at their respective defaults. + +Thanks. + +--Chris Ryan + + + + +On Fri, 13 Sep 2024 10:45:08 -0700, Bert Gunter wrote: + +>"Why does key = list(points = 16:17) not work? " +> +>Because, from the "key" section of ?xyplot +>" The contents of the key are determined by (possibly repeated) +>components named "rectangles", "lines", "points" or "text". 
Each of +>these must be **lists** with relevant graphical parameters (see later) +>controlling their appearance." +> +>Ergo, try: +> +>dd |> dotplot( segment ~ transit_time, groups = impact, data = ., +> pch = 16:17, +> cex = 1.8, +> scales = list(cex = 1.4), +> key = list(points = list(pch =16:17) )) +> +>Cheers, +>Bert +> +> +>On Fri, Sep 13, 2024 at 9:53?AM Christopher W. Ryan +> wrote: +>> +>> +>> dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., +>> pch = 16:17, +>> cex = 1.8, +>> scales = list(cex = 1.4), +>> key = list(points = 16:17) ) +>> +>> produces a graph with no discernible key, but with an asterisk at the +>> top, above the plotting region. +>> +>> Same result from +>> +>> dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., +>> pch = 16:17, +>> cex = 1.8, +>> scales = list(cex = 1.4), +>> key = list(points = 16:17), +>> auto.key = TRUE ) +>> +>> +>> +>> +>> dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., +>> scales = list(cex = 1.4), +>> par.settings = simpleTheme(pch = 16:17, cex = 1.8), +>> auto.key = TRUE) +>> +>> produces the desired result. +>> +>> Why does key = list(points = 16:17) not work? Below is a MWE: +>> +>> ================================ +>> +>> library(lattice) +>> library(dplyr) +>> dd <- structure(list(impact = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, +>> 2L), levels = c("impaction", "no impaction"), class = "factor"), +>> segment = structure(c(4L, 2L, 1L, 3L, 4L, 2L, 1L, 3L), levels = +>> c("left", "right", "rectosigmoid", "total"), class = c("ordered", +>> "factor" )), transit_time = c(70, 10, 20, 32, 42, 10, 12, 18)), +>> class = "data.frame", row.names = c(NA, -8L)) +>> +>> dd %>% dotplot( segment ~ transit_time, groups = impact, data = ., +>> pch = 16:17, +>> cex = 1.8, +>> scales = list(cex = 1.4), +>> key = list(points = 16:17) ) +>> +>> ================================= +>> +>> +>> Thanks. 
+>> +>> --Chris Ryan +>> -- +>> Agency Statistical Consulting, LLC +>> Helping those in public service get the most from their data. +>> www.agencystatistical.com +>> +>> Public GnuPG email encryption key at +>> https://keys.openpgp.org +>> 9E53101D261BEC070CFF1A0DC8BC50E715A672A0 +>> +>> +>> On Fri, 13 Sep 2024 08:29:20 +0100, CALUM POLWART wrote: +>> +>> >Add: +>> > +>> >key = list(points=16:17) +>> > +>> >Into the dotplot section possibly without the autokey +>> > +>> >On Fri, 13 Sep 2024, 08:19 Christopher W. Ryan, +>> > wrote: +>> > +>> >> I am making a dotplot with lattice, as follows: +>> >> +>> >> dd %>% dotplot( segment ~ transit_time, groups = impact, data = +>> >> ., as.table = TRUE, +>> >> pch = 16:17, +>> >> cex = 1.8, +>> >> scales = list(cex = 1.4), +>> >> auto.key = TRUE) +>> >> +>> >> impact is a factor with two levels. +>> >> +>> >> They key shows 2 open circles, one of each color of my two +>> >> plotting symbols, one for each group. I would like the +>> >> symbols in the key to match the plotting characters in the graph: +>> >> 16 (filled circle) for one group and 17 (filled triangle) for the +>> >> second group. How would I do that? I have not had any success +>> >> with supplying arguments to auto.key, simpleKey, or key. Guess +>> >> I'm not understanding the syntax. +>> >> +>> >> Thanks. +>> >> +>> >> --Chris Ryan +>> >> +>> >> -- +>> >> Agency Statistical Consulting, LLC +>> >> Helping those in public service get the most from their data. 
+>> >> www.agencystatistical.com +>> >> +>> >> Public GnuPG email encryption key at +>> >> https://keys.openpgp.org +>> >> 9E53101D261BEC070CFF1A0DC8BC50E715A672A0 +>> >> +>> >> ______________________________________________ +>> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +>> >> https://stat.ethz.ch/mailman/listinfo/r-help +>> >> PLEASE do read the posting guide +>> >> https://www.R-project.org/posting-guide.html +>> >> and provide commented, minimal, self-contained, reproducible code. +>> >> +>> +>> ______________________________________________ +>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see +>> https://stat.ethz.ch/mailman/listinfo/r-help +>> PLEASE do read the posting guide +>> https://www.R-project.org/posting-guide.html and provide commented, +>> minimal, self-contained, reproducible code. + + +From du@ho|| @end|ng |rom mcm@@ter@c@ Fri Sep 13 20:23:14 2024 +From: du@ho|| @end|ng |rom mcm@@ter@c@ (Jonathan Dushoff) +Date: Fri, 13 Sep 2024 14:23:14 -0400 +Subject: [R] BUG: atan(1i) / 5 = NaN+Infi ? +In-Reply-To: <5fdbeea24c5b4da78a0d5257a8f5089f@YTBPR01MB2797.CANPRD01.PROD.OUTLOOK.COM> +References: + + <5fdbeea24c5b4da78a0d5257a8f5089f@YTBPR01MB2797.CANPRD01.PROD.OUTLOOK.COM> +Message-ID: + +On Fri, Sep 13, 2024 at 1:10?PM Duncan Murdoch wrote: + +> [You don't often get email from murdoch.duncan at gmail.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] + +> Caution: External email. + + +> On 2024-09-13 8:53 a.m., Jonathan Dushoff wrote: +> >> Message: 4 +> >> Date: Thu, 12 Sep 2024 11:21:02 -0400 +> >> From: Duncan Murdoch +> >> That's not the correct formula, is it? I think the result should be x * +> >> Conj(y) / Mod(y)^2 . + +> > Correct, sorry. And thanks. + +> >> So that would involve * and +> >> / , not just real arithmetic. + +> > Not an expert, but I don't see it. Conj and Mod seem to be numerically +> > straightforward real-like operations. 
We do those, and then multiply +> > one complex number by one real quotient. + +> Are you sure? We aren't dealing with real numbers and complex numbers +> here, we're dealing with those sets extended with infinities and other +> weird things. + +Definitely not sure, just thought I would suggest it as a possibility. + +> So for example if y is some kind of infinite complex number, then 1/y +> should come out to zero, and if x is finite, the final result of x/y +> should be zero. + +> But if we evaluate x/y as (x / Mod(y)^2) * Conj(y), won't we get a NaN +> from zero times infinity? + +Yes, and it's not trivial to work around, so probably not worth it. + +Thanks, + + diff --git a/r-package-devel/2024q3.txt b/r-package-devel/2024q3.txt index 1958e8f..2355d65 100644 --- a/r-package-devel/2024q3.txt +++ b/r-package-devel/2024q3.txt @@ -7122,3 +7122,3885 @@ On Tue, 13/08/2024 10:08, Ivan Krylov via R-package-devel wrote: > +From @@j5xsj9 m@iii@g oii @iiiy@@ddy@io Thu Aug 15 20:58:39 2024 +From: @@j5xsj9 m@iii@g oii @iiiy@@ddy@io (@@j5xsj9 m@iii@g oii @iiiy@@ddy@io) +Date: Thu, 15 Aug 2024 18:58:39 +0000 +Subject: [R-pkg-devel] Build process generated non-portable files +In-Reply-To: <20240813110814.21a43010@Tarkus> +References: <3f8d2aec95fb552733dc236aa611130e@nilly.addy.io> + <20240813110814.21a43010@Tarkus> +Message-ID: <5301b8e64edc0ce9e85e272a60cd5535@nilly.addy.io> + +This seems like it should work. Unfortunately my rhub github actions is failing to get past the setup deps step which has been occuring inconsistently in the past but right now it's consistently failing to build deps so I can't confirm it work. I was also unable to successfully build R using intel compilers, even when using Rhubs container as template. + +> Include the first 'all' target in your Makevars + +I was really trying to avoid explicitly creating a target in makevars but this is simple enough. 
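[For reference, Ivan's `all:` workaround written out as a complete `src/Makevars` sketch. The `arpack/` subdirectory comes from his example and will differ per package; the leading `-` tells make to ignore a failing `rm`. Untested here.]

```make
# Link the package shared library first, then delete the ifx-generated
# *genmod* interface files so the portability check never sees them.
all: $(SHLIB)
	-rm -f arpack/*genmod.f90
```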
+ +In case anyone else comes across this, the genmod files end up in `src` even if the original source files are under a subdirectory so the recipe ends up being: + +> -rm -f *genmod.f90 + +> If you manage to find an option for the ifx compiler that disables creation of these files + +I installed intel compilers and checked the `ifx` man page. Could not find an option for turning off generation of the genmod files. + +> a brief Web search says they are for only the user to read, but most results are from early 2010's + +Yeah I checked one of the files again and it does say that it's generated only for reference. + +?- David R. Connell + +On Tuesday, August 13th, 2024 at 3:08 AM, Ivan Krylov 'ikrylov at disroot.org' wrote: + +> ? Mon, 12 Aug 2024 18:24:30 +0000 +> David via R-package-devel r-package-devel at r-project.org ?????: +> +> > in the intel environment (provided by rhub), the intel fortran +> > compiler generates intermediary files from *.f -> *__genmod.f90. The +> > R check then complains that the genmod files are not portable. I +> > include removal of the files in my cleanup file so the files do not +> > exist in the original package source or in the final source tarball +> > but it seems the portable files check is done after compilation but +> > before cleanup. +> > +> > - Is there a way to get around this complaint? +> +> +> Include the first 'all' target in your Makevars, make it depend on the +> package shared library, and make it remove genmod files in the recipe: +> +> all: $(SHLIB) +> -rm -f arpack/*genmod.f90 +> +> A similar trick is mentioned in +> https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Using-Makevars. +> +> > - Should this complaint be here in the first place? +> +> +> Perhaps not. If you manage to find an option for the ifx compiler that +> disables creation of these files (a brief Web search says they are for +> only the user to read, but most results are from early 2010's), post it +> here. 
This may be a good argument to make this option recommended for R. +> +> > Shouldn't the portable files check only be performed on the shipped +> > source code? +> +> +> False negatives are possible too, in case the installation stage +> (configure and/or make) performs a lot of preprocessing, or unpacks +> extra sources. You could be right; I don't have statistics to back +> either option as less wasteful of effort. +> +> -- +> Best regards, +> Ivan + + +From |kry|ov @end|ng |rom d|@root@org Fri Aug 16 14:58:41 2024 +From: |kry|ov @end|ng |rom d|@root@org (Ivan Krylov) +Date: Fri, 16 Aug 2024 15:58:41 +0300 +Subject: [R-pkg-devel] Build process generated non-portable files +In-Reply-To: <94db9a3e5dcb79a6629143e1752a2816@nilly.addy.io> +References: <3f8d2aec95fb552733dc236aa611130e@nilly.addy.io> + <20240813110814.21a43010@Tarkus> + <94db9a3e5dcb79a6629143e1752a2816@nilly.addy.io> +Message-ID: <20240816155841.456cfd2f@arachnoid> + +? Thu, 15 Aug 2024 18:58:41 +0000 +anj5xsj9 at nilly.addy.io ?????: + +> This seems like it should work. Unfortunately my rhub github actions +> is failing to get past the setup deps step which has been occuring +> inconsistently in the past but right now it's consistently failing to +> build deps so I can't confirm it work. + +This may be worth reporting to the rhub developers. The error is really +strange. It looks like the package at +https://github.com/cran/igraph/releases/download/2.0.3/igraph_2.0.3_b1_R4.5_x86_64-pc-linux-gnu-fedora-38.tar.gz +(referenced from https://github.com/r-hub/repos) has a binary +dependency on OpenBLAS: + +$ readelf -d igraph/libs/igraph.so | grep openblas +0x0000000000000001 (NEEDED) Shared library: [libopenblasp.so.0] + +...but that's either not noted or not installed correctly. + +> I was also unable to successfully build R using intel compilers, even +> when using Rhubs container as template. + +If you'd like to dig deeper, feel free to ask here with details. 
+ +> In case anyone else comes across this, the genmod files end up in +> `src` even if the original source files are under a subdirectory so +> the recipe ends up being: +> +> > -rm -f *genmod.f90 + +Thank you for letting us know! + +> I installed intel compilers and checked the `ifx` man page. Could not +> find an option for turning off generation of the genmod files. + +I experimented with the "ghcr.io/r-hub/containers/intel:latest" +container and was able to find out that the option -[no]gen-interfaces +controls the generation of *__genmod* files: + +/opt/intel/oneapi/compiler/latest/bin/ifx -O3 -fp-model precise \ + -warn all,noexternals -c -o arpack/dgetv0.o arpack/dgetv0.f; \ + ls *genmod* +# ... +# dgetv0__genmod.f90 dgetv0__genmod.mod +rm -vf *genmod* +# removed 'dgetv0__genmod.f90' +# removed 'dgetv0__genmod.mod' +/opt/intel/oneapi/compiler/latest/bin/ifx -nogen-interfaces -O3 \ + -fp-model precise -warn all,noexternals -c -o arpack/dgetv0.o \ + arpack/dgetv0.f; \ + ls *genmod* +# ... +# ls: cannot access '*genmod*': No such file or directory + +This option is already used as part of the "Intel" additional checks +performed by Prof. Brian D. Ripley, so the *__genmod.* files should not +be a problem on CRAN: +https://svn.r-project.org/R-dev-web/trunk/CRAN/QA/BDR/gannet/Intel/config.site + +-- +Best regards, +Ivan + + +From kev|n@r@coombe@ @end|ng |rom gm@||@com Fri Aug 16 19:25:36 2024 +From: kev|n@r@coombe@ @end|ng |rom gm@||@com (Kevin R. Coombes) +Date: Fri, 16 Aug 2024 13:25:36 -0400 +Subject: [R-pkg-devel] Inherited methods +Message-ID: + +Hi, + +I seem to have gotten myself confused about inheritance of S4 classes +and the dispatch of inherited methods. + +The core situation is that + + * The graphics package already defines the function "hist" + * My package A creates an S4 generic "hist", using the existing + function as the default method. + * Package A also defines a class B with its own implementation of the + method "hist". 
It exports both the generic method and the class B + specific method. + * Now I create a new package C. In that package, I define a new class + D that inherits from class B. The implementation of "hist" for class + B should do everything I want for class D, so I don't want to have + to re-implement it. + +My question: what do I have to do to enable users of package D to call +"hist(B)" and have it work without complaint? After many go-rounds of +changing the code, changing the NAMESPACE, and running "R CMD check" +approximately an infinite number of times, I now have + +NAMESPACE: +importMethodFrom("A", "hist") +exportMethod("hist") + +R SCRIPT: +setClass("D", slots = c(some_extra_stuff), contains = "B") + +setMethod("hist", "D", function(x, ...) { + ? callNextMethod(x, ...) + ? invisible(x) +}) + +Do I really need all that? Why do I even need *any* of it? Shouldn't I +be able to leave out the implementation that just says "callNextMethod" +and have the standard dispatch figure out what to call? Why does it want +to call the default function if it knows there is a B-method? (And it +must know it, since callNextMethod knows it.) + +Confused, + ?? Kevin + + [[alternative HTML version deleted]] + + +From kev|n@r@coombe@ @end|ng |rom gm@||@com Fri Aug 16 19:32:57 2024 +From: kev|n@r@coombe@ @end|ng |rom gm@||@com (Kevin R. Coombes) +Date: Fri, 16 Aug 2024 13:32:57 -0400 +Subject: [R-pkg-devel] R CMD check options +Message-ID: <52ea54f5-2a6a-4a6b-bb5d-2309f8c86f1f@gmail.com> + +Hi, + +I maintain a bunch of packages on CRAN. When updating packages, I often +find two sort of NOTES on some of the platforms that CRAN checks that +don't show up on my stupid Windows (I know that's redundant; you don't +have to tell me) machine at home. The two most common issues are + + * spelling (especially in the DESCRIPTION file), and + * links in the documentation with missing package anchors. 
+ +How do I enable these checks so that when I run "R CMD check --as-cran" +it actually does behave like those CRAN machines do, so I can find and +fix the issues before submitting the package? + +Thanks, + ? Kevin + + + [[alternative HTML version deleted]] + + +From @@j5xsj9 m@iii@g oii @iiiy@@ddy@io Fri Aug 16 20:53:49 2024 +From: @@j5xsj9 m@iii@g oii @iiiy@@ddy@io (@@j5xsj9 m@iii@g oii @iiiy@@ddy@io) +Date: Fri, 16 Aug 2024 18:53:49 +0000 +Subject: [R-pkg-devel] Build process generated non-portable files +In-Reply-To: <20240816155841.456cfd2f@arachnoid> +References: <3f8d2aec95fb552733dc236aa611130e@nilly.addy.io> + <20240813110814.21a43010@Tarkus> + <94db9a3e5dcb79a6629143e1752a2816@nilly.addy.io> + <20240816155841.456cfd2f@arachnoid> +Message-ID: <17411dc6e7509077db21c3a0dfcb1020@nilly.addy.io> + +I locally ran the rhub intel docker and that was much easier to set up. So I can now confirm the change to Makevars does work. + +> I experimented with the "ghcr.io/r-hub/containers/intel:latest" container and was able to find out that the option -[no]gen-interfaces controls the generation of __genmod files: + +You are right. I temporarily removed the changes to Makevar and added the `-nogen-interfaces` flag to FFLAGS and that also prevented the warning. I can open an issue at rhub and suggest adding that as a default for the intel compilers. + +> This may be worth reporting to the rhub developers. The error is really strange. It looks like the package at https://github.com/cran/igraph/releases/download/2.0.3/igraph_2.0.3_b1_R4.5_x86_64-pc-linux-gnu-fedora-38.tar.gz + +The strange thing is the intel container on github actions has succeeded in the past but now is consistently failing to build `targets`: + +> ? Failed to build targets 1.7.1 (2.1s) +> Error: +> ! error in pak subprocess +> Caused by error in `stop_task_build(state, worker)`: +> ! Failed to build source package targets. +> Full installation output: +> * installing *source* package ?targets? ... 
+> ** package ?targets? successfully unpacked and MD5 sums checked +> staged installation is only possible with locking +> ** using non-staged installation +> ** R +> ** inst +> ** byte-compile and prepare package for lazy loading +> Error in dyn.load(file, DLLpath = DLLpath, ...) : +> unable to load shared object '/github/home/R/x86_64-pc-linux-gnu-library/4.5/igraph/libs/igraph.so': +> libopenblasp.so.0: cannot open shared object file: No such file or directory +> Calls: ... asNamespace -> loadNamespace -> library.dynam -> dyn.load +> Execution halted +> ERROR: lazy loading failed for package ?targets? +> * removing ?/tmp/RtmpLpWRX0/pkg-lib3c3299c227c/targets? + +Which is related to the igraph issue you mentioned. Checking the packages installed in a previous successful intel action, targets was not listed. I don't know why it's being installed now but not previously, I haven't changed dependencies. + +In the past other packages have failed to build and not only on the intel container see "https://github.com/SpeakEasy-2/speakeasyR/actions/runs/10202337528/job/28226219457" where several containers failed at the setup-deps step. There is overlap in which package fails (i.e. protGenerics and sparseArray fail in multiple containers but succeed in others while in one container ExperimentHub fails). It seems the only packages failing are from Bioconductor. Assume this is a bioconductor or pak issue. + +The issue with igraph is interesting though since I do use the igraph package for some examples and inside the intel container, R CMD build has no problem running igraph. Inspecting the resulting tarball shows the html version of my vignette contains results that depends on running igraph code and my test using igraph succeeds with R CMD check. Yet when I run R inside the container and try to load the igraph library or run code via `igraph::` I get an error + +> > igraph::sample_pref(10) +> Error in dyn.load(file, DLLpath = DLLpath, ...) 
: +> unable to load shared object '/root/R/x86_64-pc-linux-gnu-library/4.5/igraph/libs/igraph.so': +> libopenblasp.so.0: cannot open shared object file: No such file or directory + +I.e. the same error with building targets. I can raise an issue on rigraph as well. + +?- David R. Connell + +On Friday, August 16th, 2024 at 7:58 AM, Ivan Krylov 'ikrylov at disroot.org' wrote: + +> ? Thu, 15 Aug 2024 18:58:41 +0000 +> anj5xsj9 at nilly.addy.io ?????: +> +> > This seems like it should work. Unfortunately my rhub github actions +> > is failing to get past the setup deps step which has been occuring +> > inconsistently in the past but right now it's consistently failing to +> > build deps so I can't confirm it work. +> +> +> This may be worth reporting to the rhub developers. The error is really +> strange. It looks like the package at +> https://github.com/cran/igraph/releases/download/2.0.3/igraph_2.0.3_b1_R4.5_x86_64-pc-linux-gnu-fedora-38.tar.gz +> (referenced from https://github.com/r-hub/repos) has a binary +> dependency on OpenBLAS: +> +> $ readelf -d igraph/libs/igraph.so | grep openblas +> 0x0000000000000001 (NEEDED) Shared library: [libopenblasp.so.0] +> +> ...but that's either not noted or not installed correctly. +> +> > I was also unable to successfully build R using intel compilers, even +> > when using Rhubs container as template. +> +> +> If you'd like to dig deeper, feel free to ask here with details. +> +> > In case anyone else comes across this, the genmod files end up in +> > `src` even if the original source files are under a subdirectory so +> > the recipe ends up being: +> > +> > > -rm -f *genmod.f90 +> +> +> Thank you for letting us know! +> +> > I installed intel compilers and checked the `ifx` man page. Could not +> > find an option for turning off generation of the genmod files. 
+> +> +> I experimented with the "ghcr.io/r-hub/containers/intel:latest" +> container and was able to find out that the option -[no]gen-interfaces +> controls the generation of __genmod files: +> +> /opt/intel/oneapi/compiler/latest/bin/ifx -O3 -fp-model precise \ +> -warn all,noexternals -c -o arpack/dgetv0.o arpack/dgetv0.f; \ +> ls genmod +> # ... +> # dgetv0__genmod.f90 dgetv0__genmod.mod +> rm -vf genmod +> # removed 'dgetv0__genmod.f90' +> # removed 'dgetv0__genmod.mod' +> /opt/intel/oneapi/compiler/latest/bin/ifx -nogen-interfaces -O3 \ +> -fp-model precise -warn all,noexternals -c -o arpack/dgetv0.o \ +> arpack/dgetv0.f; \ +> ls genmod +> # ... +> # ls: cannot access 'genmod': No such file or directory +> +> This option is already used as part of the "Intel" additional checks +> performed by Prof. Brian D. Ripley, so the __genmod. files should not +> be a problem on CRAN: +> https://svn.r-project.org/R-dev-web/trunk/CRAN/QA/BDR/gannet/Intel/config.site +> +> -- +> Best regards, +> Ivan + + +From georg|@bo@hn@kov @end|ng |rom m@nche@ter@@c@uk Sat Aug 17 11:05:52 2024 +From: georg|@bo@hn@kov @end|ng |rom m@nche@ter@@c@uk (Georgi Boshnakov) +Date: Sat, 17 Aug 2024 09:05:52 +0000 +Subject: [R-pkg-devel] R CMD check options +In-Reply-To: <52ea54f5-2a6a-4a6b-bb5d-2309f8c86f1f@gmail.com> +References: <52ea54f5-2a6a-4a6b-bb5d-2309f8c86f1f@gmail.com> +Message-ID: + +In most cases, installing a recent version of R-devel will do (here, the check with it will show the notes about missing package anchors). + +Georgi Boshnakov + +________________________________________ +From: R-package-devel on behalf of Kevin R. Coombes +Sent: 16 August 2024 18:32 +To: r-package-devel at r-project.org +Subject: [R-pkg-devel] R CMD check options + +Hi, I maintain a bunch of packages on CRAN. 
When updating packages, I often find two sort of NOTES on some of the platforms that CRAN checks that don't show up on my stupid Windows (I know that's redundant; you don't have to tell me) machine +ZjQcmQRYFpfptBannerStart +This Message Is From a New External Sender +You have not previously corresponded with this sender. Please exercise caution when opening links or attachments included in this message. + +ZjQcmQRYFpfptBannerEnd + +Hi, + +I maintain a bunch of packages on CRAN. When updating packages, I often +find two sort of NOTES on some of the platforms that CRAN checks that +don't show up on my stupid Windows (I know that's redundant; you don't +have to tell me) machine at home. The two most common issues are + + * spelling (especially in the DESCRIPTION file), and + * links in the documentation with missing package anchors. + +How do I enable these checks so that when I run "R CMD check --as-cran" +it actually does behave like those CRAN machines do, so I can find and +fix the issues before submitting the package? + +Thanks, + Kevin + + + [[alternative HTML version deleted]] + +______________________________________________ +R-package-devel at r-project.org mailing list +https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-package-devel__;!!PDiH4ENfjr2_Jw!E2Ma-V8bhj8k-YfGsRWAQHNJSG51USXTaeHl8IHyVh3CnziYF9t3isoCoR3Lzs13rfjPktwG-mm5LBdm2sWKZVxnV7PDLaqoBMEpMJI$[stat[.]ethz[.]ch] + + + +From ||@t@ @end|ng |rom dewey@myzen@co@uk Sat Aug 17 14:45:15 2024 +From: ||@t@ @end|ng |rom dewey@myzen@co@uk (Michael Dewey) +Date: Sat, 17 Aug 2024 13:45:15 +0100 +Subject: [R-pkg-devel] R CMD check options +In-Reply-To: +References: <52ea54f5-2a6a-4a6b-bb5d-2309f8c86f1f@gmail.com> + +Message-ID: <6ac7a7fe-a549-7551-a02f-af63258a19ae@dewey.myzen.co.uk> + +And if all else fails submit to Winbuilder using the R-devel branch. 
+ +Michael + +On 17/08/2024 10:05, Georgi Boshnakov wrote: +> In most cases, installing a recent version of R-devel will do (here, the check with it will show the notes about missing package anchors). +> +> Georgi Boshnakov +> +> ________________________________________ +> From: R-package-devel on behalf of Kevin R. Coombes +> Sent: 16 August 2024 18:32 +> To: r-package-devel at r-project.org +> Subject: [R-pkg-devel] R CMD check options +> +> Hi, I maintain a bunch of packages on CRAN. When updating packages, I often find two sort of NOTES on some of the platforms that CRAN checks that don't show up on my stupid Windows (I know that's redundant; you don't have to tell me) machine +> ZjQcmQRYFpfptBannerStart +> This Message Is From a New External Sender +> You have not previously corresponded with this sender. Please exercise caution when opening links or attachments included in this message. +> +> ZjQcmQRYFpfptBannerEnd +> +> Hi, +> +> I maintain a bunch of packages on CRAN. When updating packages, I often +> find two sort of NOTES on some of the platforms that CRAN checks that +> don't show up on my stupid Windows (I know that's redundant; you don't +> have to tell me) machine at home. The two most common issues are +> +> * spelling (especially in the DESCRIPTION file), and +> * links in the documentation with missing package anchors. +> +> How do I enable these checks so that when I run "R CMD check --as-cran" +> it actually does behave like those CRAN machines do, so I can find and +> fix the issues before submitting the package? 
+> +> Thanks, +> Kevin +> +> +> [[alternative HTML version deleted]] +> +> ______________________________________________ +> R-package-devel at r-project.org mailing list +> https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-package-devel__;!!PDiH4ENfjr2_Jw!E2Ma-V8bhj8k-YfGsRWAQHNJSG51USXTaeHl8IHyVh3CnziYF9t3isoCoR3Lzs13rfjPktwG-mm5LBdm2sWKZVxnV7PDLaqoBMEpMJI$[stat[.]ethz[.]ch] +> +> +> ______________________________________________ +> R-package-devel at r-project.org mailing list +> https://stat.ethz.ch/mailman/listinfo/r-package-devel +> + +-- +Michael + + +From bbo|ker @end|ng |rom gm@||@com Sat Aug 17 15:17:40 2024 +From: bbo|ker @end|ng |rom gm@||@com (Ben Bolker) +Date: Sat, 17 Aug 2024 09:17:40 -0400 +Subject: [R-pkg-devel] R CMD check options +In-Reply-To: <6ac7a7fe-a549-7551-a02f-af63258a19ae@dewey.myzen.co.uk> +References: <52ea54f5-2a6a-4a6b-bb5d-2309f8c86f1f@gmail.com> + + <6ac7a7fe-a549-7551-a02f-af63258a19ae@dewey.myzen.co.uk> +Message-ID: + +Slight thread hijacking: does there exist/has someone compiled a list of +the environment variables that determine R CMD check's behaviour? + +On Sat, Aug 17, 2024, 9:03 AM Michael Dewey wrote: + +> And if all else fails submit to Winbuilder using the R-devel branch. +> +> Michael +> +> On 17/08/2024 10:05, Georgi Boshnakov wrote: +> > In most cases, installing a recent version of R-devel will do (here, the +> check with it will show the notes about missing package anchors). +> > +> > Georgi Boshnakov +> > +> > ________________________________________ +> > From: R-package-devel on behalf +> of Kevin R. Coombes +> > Sent: 16 August 2024 18:32 +> > To: r-package-devel at r-project.org +> > Subject: [R-pkg-devel] R CMD check options +> > +> > Hi, I maintain a bunch of packages on CRAN. 
When updating packages, I +> often find two sort of NOTES on some of the platforms that CRAN checks that +> don't show up on my stupid Windows (I know that's redundant; you don't have +> to tell me) machine +> > ZjQcmQRYFpfptBannerStart +> > This Message Is From a New External Sender +> > You have not previously corresponded with this sender. Please exercise +> caution when opening links or attachments included in this message. +> > +> > ZjQcmQRYFpfptBannerEnd +> > +> > Hi, +> > +> > I maintain a bunch of packages on CRAN. When updating packages, I often +> > find two sort of NOTES on some of the platforms that CRAN checks that +> > don't show up on my stupid Windows (I know that's redundant; you don't +> > have to tell me) machine at home. The two most common issues are +> > +> > * spelling (especially in the DESCRIPTION file), and +> > * links in the documentation with missing package anchors. +> > +> > How do I enable these checks so that when I run "R CMD check --as-cran" +> > it actually does behave like those CRAN machines do, so I can find and +> > fix the issues before submitting the package? 
+> > +> > Thanks, +> > Kevin +> > +> > +> > [[alternative HTML version deleted]] +> > +> > ______________________________________________ +> > R-package-devel at r-project.org mailing list +> > +> https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-package-devel__;!!PDiH4ENfjr2_Jw!E2Ma-V8bhj8k-YfGsRWAQHNJSG51USXTaeHl8IHyVh3CnziYF9t3isoCoR3Lzs13rfjPktwG-mm5LBdm2sWKZVxnV7PDLaqoBMEpMJI$[stat[.]ethz[.]ch] +> > +> > +> > ______________________________________________ +> > R-package-devel at r-project.org mailing list +> > https://stat.ethz.ch/mailman/listinfo/r-package-devel +> > +> +> -- +> Michael +> +> ______________________________________________ +> R-package-devel at r-project.org mailing list +> https://stat.ethz.ch/mailman/listinfo/r-package-devel +> + + [[alternative HTML version deleted]] + + +From |kry|ov @end|ng |rom d|@root@org Sat Aug 17 15:24:31 2024 +From: |kry|ov @end|ng |rom d|@root@org (Ivan Krylov) +Date: Sat, 17 Aug 2024 16:24:31 +0300 +Subject: [R-pkg-devel] R CMD check options +In-Reply-To: +References: <52ea54f5-2a6a-4a6b-bb5d-2309f8c86f1f@gmail.com> + + <6ac7a7fe-a549-7551-a02f-af63258a19ae@dewey.myzen.co.uk> + +Message-ID: <20240817162431.3a7727da@trisector> + +? Sat, 17 Aug 2024 09:17:40 -0400 +Ben Bolker ?????: + +> does there exist/has someone compiled a list of +> the environment variables that determine R CMD check's behaviour? + +https://cran.r-project.org/doc/manuals/R-ints.html#Tools + +But the _R_CHECK_CRAN_INCOMING_USE_ASPELL_ variable that Kevin would +need to set to "TRUE" in order to enable the spell checks is not +documented there. 
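[A sketch of how one might set that variable before checking, assuming a POSIX shell; `mypkg_1.0.tar.gz` is a placeholder tarball name, and the variable name is the one discussed above.]

```shell
# Enable the undocumented spell check used by CRAN's incoming checks,
# confirm it is set, then run the CRAN-style checks.
export _R_CHECK_CRAN_INCOMING_USE_ASPELL_=TRUE
printf 'aspell check enabled: %s\n' "$_R_CHECK_CRAN_INCOMING_USE_ASPELL_"
# R CMD check --as-cran mypkg_1.0.tar.gz
```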
+ +-- +Best regards, +Ivan + + +From |kry|ov @end|ng |rom d|@root@org Sat Aug 17 16:22:32 2024 +From: |kry|ov @end|ng |rom d|@root@org (Ivan Krylov) +Date: Sat, 17 Aug 2024 17:22:32 +0300 +Subject: [R-pkg-devel] Build process generated non-portable files +In-Reply-To: <32c21f19cb644375bc4790dce67725fa@nilly.addy.io> +References: <3f8d2aec95fb552733dc236aa611130e@nilly.addy.io> + <20240813110814.21a43010@Tarkus> + <94db9a3e5dcb79a6629143e1752a2816@nilly.addy.io> + <20240816155841.456cfd2f@arachnoid> + <32c21f19cb644375bc4790dce67725fa@nilly.addy.io> +Message-ID: <20240817172232.03f94a10@trisector> + +? Fri, 16 Aug 2024 18:53:55 +0000 +anj5xsj9 at nilly.addy.io ?????: + +> In the past other packages have failed to build and not only on the +> intel container see +> "https://github.com/SpeakEasy-2/speakeasyR/actions/runs/10202337528/job/28226219457" +> where several containers failed at the setup-deps step. There is +> overlap in which package fails (i.e. protGenerics and sparseArray +> fail in multiple containers but succeed in others while in one +> container ExperimentHub fails). It seems the only packages failing +> are from Bioconductor. Assume this is a bioconductor or pak issue. + +Could also be an rhub issue, although unlike the igraph problem below, +I have no idea where to start diagnosing it. + +> > > igraph::sample_pref(10) +> > Error in dyn.load(file, DLLpath = DLLpath, ...) : +> > unable to load shared object +> > '/root/R/x86_64-pc-linux-gnu-library/4.5/igraph/libs/igraph.so': +> > libopenblasp.so.0: cannot open shared object file: No such file or +> > directory +> +> I.e. the same error with building targets. I can raise an issue on +> rigraph as well. + +This is a problem with the binary package used by rhub. 
If you +reinstall the source package from CRAN instead of +https://github.com/r-hub/repos and +https://github.com/cran/igraph/releases/, it will work, but take much +more time compiling the package: + +options(repos = getOption('repos')['CRAN']) +install.packages('igraph') + +-- +Best regards, +Ivan + + +From edd @end|ng |rom deb|@n@org Sat Aug 17 16:33:24 2024 +From: edd @end|ng |rom deb|@n@org (Dirk Eddelbuettel) +Date: Sat, 17 Aug 2024 09:33:24 -0500 +Subject: [R-pkg-devel] Build process generated non-portable files +In-Reply-To: <20240817172232.03f94a10@trisector> +References: <3f8d2aec95fb552733dc236aa611130e@nilly.addy.io> + <20240813110814.21a43010@Tarkus> + <94db9a3e5dcb79a6629143e1752a2816@nilly.addy.io> + <20240816155841.456cfd2f@arachnoid> + <32c21f19cb644375bc4790dce67725fa@nilly.addy.io> + <20240817172232.03f94a10@trisector> +Message-ID: <26304.46260.827467.518754@rob.eddelbuettel.com> + + +On 17 August 2024 at 17:22, Ivan Krylov via R-package-devel wrote: +| ? Fri, 16 Aug 2024 18:53:55 +0000 +| anj5xsj9 at nilly.addy.io ?????: +| +| > In the past other packages have failed to build and not only on the +| > intel container see +| > "https://github.com/SpeakEasy-2/speakeasyR/actions/runs/10202337528/job/28226219457" +| > where several containers failed at the setup-deps step. There is +| > overlap in which package fails (i.e. protGenerics and sparseArray +| > fail in multiple containers but succeed in others while in one +| > container ExperimentHub fails). It seems the only packages failing +| > are from Bioconductor. Assume this is a bioconductor or pak issue. +| +| Could also be an rhub issue, although unlike the igraph problem below, +| I have no idea where to start diagnosing it. +| +| > > > igraph::sample_pref(10) +| > > Error in dyn.load(file, DLLpath = DLLpath, ...) 
: 
+| > > unable to load shared object
+| > > '/root/R/x86_64-pc-linux-gnu-library/4.5/igraph/libs/igraph.so':
+| > > libopenblasp.so.0: cannot open shared object file: No such file or
+| > > directory
+| >
+| > I.e. the same error with building targets. I can raise an issue on
+| > rigraph as well.
+|
+| This is a problem with the binary package used by rhub. If you
+| reinstall the source package from CRAN instead of
+| https://github.com/r-hub/repos and
+| https://github.com/cran/igraph/releases/, it will work, but take much
+| more time compiling the package:
+|
+| options(repos = getOption('repos')['CRAN'])
+| install.packages('igraph')
+
+The r2u binaries can offer help here for running on Ubuntu. They are a
+'superset' of the same p3m binaries but aim to (and generally manage to)
+provide working binaries. I just validated via a Docker container running it:
+
+edd at rob:~$ docker run --rm -ti rocker/r2u:jammy bash
+root at 64a8b23a9bc7:/# install.r igraph # install.packages() works too
+[ ... log of installation of 14 binaries omitted here ... ]
+root at 64a8b23a9bc7:/# R
+
+R version 4.4.1 (2024-06-14) -- "Race for Your Life"
+Copyright (C) 2024 The R Foundation for Statistical Computing
+Platform: x86_64-pc-linux-gnu
+
+R is free software and comes with ABSOLUTELY NO WARRANTY.
+You are welcome to redistribute it under certain conditions.
+Type 'license()' or 'licence()' for distribution details.
+
+ Natural language support but running in an English locale
+
+R is a collaborative project with many contributors.
+Type 'contributors()' for more information and
+'citation()' on how to cite R or R packages in publications.
+
+Type 'demo()' for some demos, 'help()' for on-line help, or
+'help.start()' for an HTML browser interface to help.
+Type 'q()' to quit R.
+
+> library(igraph)
+
+Attaching package: ‘igraph’ 
+
+The following objects are masked from ‘package:stats’:
+
+    decompose, spectrum
+
+The following object is masked from ‘package:base’:
+
+    union
+
+>
+
+
+Best, Dirk
+
+--
+dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
+
+
+From kev|n@r@coombe@ @end|ng |rom gm@||@com Sat Aug 17 16:58:16 2024
+From: kev|n@r@coombe@ @end|ng |rom gm@||@com (Kevin R. Coombes)
+Date: Sat, 17 Aug 2024 10:58:16 -0400
+Subject: [R-pkg-devel] R CMD check options
+In-Reply-To: <20240817162431.3a7727da@trisector>
+References: <52ea54f5-2a6a-4a6b-bb5d-2309f8c86f1f@gmail.com>
+ 
+ <6ac7a7fe-a549-7551-a02f-af63258a19ae@dewey.myzen.co.uk>
+ 
+ <20240817162431.3a7727da@trisector>
+Message-ID: <9d390252-f6b9-4fa8-9f45-740544dd4db6@gmail.com>
+
+Thank you!
+
+On 8/17/2024 9:24 AM, Ivan Krylov wrote:
+> On Sat, 17 Aug 2024 09:17:40 -0400
+> Ben Bolker wrote:
+>
+>> does there exist/has someone compiled a list of
+>> the environment variables that determine R CMD check's behaviour?
+> https://cran.r-project.org/doc/manuals/R-ints.html#Tools
+>
+> But the _R_CHECK_CRAN_INCOMING_USE_ASPELL_ variable that Kevin would
+> need to set to "TRUE" in order to enable the spell checks is not
+> documented there.
+>
+
+
+From edd @end|ng |rom deb|@n@org Mon Aug 19 14:54:22 2024
+From: edd @end|ng |rom deb|@n@org (Dirk Eddelbuettel)
+Date: Mon, 19 Aug 2024 07:54:22 -0500
+Subject: [R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ?
+Message-ID: <26307.16510.300799.974636@rob.eddelbuettel.com>
+
+
+Has anybody written a quick helper function that extracts the Authors at R field
+from tools::CRAN_package_db() and 'stems' it into 'Name, Firstname, ORCID'
+which one could use to look up ORCID IDs at CRAN? The lookup at orcid.org
+sometimes gives us 'private entries' that make it harder / impossible to
+confirm a match. Having a normalised matrix or data.frame (or ...) would also
+make it easier to generate Authors at R. 
+ +Cheers, Dirk + +-- +dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org + + +From th|erry@onke||nx @end|ng |rom |nbo@be Mon Aug 19 15:15:22 2024 +From: th|erry@onke||nx @end|ng |rom |nbo@be (Thierry Onkelinx) +Date: Mon, 19 Aug 2024 15:15:22 +0200 +Subject: [R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ? +In-Reply-To: <26307.16510.300799.974636@rob.eddelbuettel.com> +References: <26307.16510.300799.974636@rob.eddelbuettel.com> +Message-ID: + +Dear Dirk, + +Maybe checklist:::author2df() might be useful. It is an unexported function +from my checklist package. It converts a person() object to a dataframe. +https://github.com/inbo/checklist/blob/5649985b58693acb88337873ae14a7d5bc018d96/R/store_authors.R#L38 + +df <- tools::CRAN_package_db() +lapply( + df$`Authors at R`[df$Package %in% c("git2rdata", "qrcode")], + function(x) { + parse(text = x) |> + eval() |> + vapply(checklist:::author2df, vector(mode = "list", 1)) |> + do.call(what = rbind) + } +) + +[[1]] + given family email orcid +affiliation usage +1 Thierry Onkelinx thierry.onkelinx at inbo.be 0000-0001-8804-4216 + 1 +2 Floris Vanderhaeghe floris.vanderhaeghe at inbo.be 0000-0002-6378-6229 + 1 +3 Peter Desmet peter.desmet at inbo.be 0000-0002-8442-8025 + 1 +4 Els Lommelen els.lommelen at inbo.be 0000-0002-3481-5684 + 1 + +[[2]] + given family email orcid affiliation usage +1 Thierry Onkelinx qrcode at muscardinus.be 0000-0001-8804-4216 1 +2 Victor Teh victorteh at gmail.com + + +ir. Thierry Onkelinx +Statisticus / Statistician + +Vlaamse Overheid / Government of Flanders +INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND +FOREST +Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance +thierry.onkelinx at inbo.be +Havenlaan 88 bus 73, 1000 Brussel +*Postadres:* Koning Albert II-laan 15 bus 186, 1210 Brussel +*Poststukken die naar dit adres worden gestuurd, worden ingescand en +digitaal aan de geadresseerde bezorgd. 
Zo kan de Vlaamse overheid haar +dossiers volledig digitaal behandelen. Poststukken met de vermelding +?vertrouwelijk? worden niet ingescand, maar ongeopend aan de geadresseerde +bezorgd.* +www.inbo.be + +/////////////////////////////////////////////////////////////////////////////////////////// +To call in the statistician after the experiment is done may be no more +than asking him to perform a post-mortem examination: he may be able to say +what the experiment died of. ~ Sir Ronald Aylmer Fisher +The plural of anecdote is not data. ~ Roger Brinner +The combination of some data and an aching desire for an answer does not +ensure that a reasonable answer can be extracted from a given body of data. +~ John Tukey +/////////////////////////////////////////////////////////////////////////////////////////// + + + + +Op ma 19 aug 2024 om 14:54 schreef Dirk Eddelbuettel : + +> +> Has anybody written a quick helper function that extracts the Authors at R +> field +> from tools::CRAN_package_db() and 'stems' it into 'Name, Firstname, ORCID' +> which one could use to look up ORCID IDs at CRAN? The lookup at orcid.org +> sometimes gives us 'private entries' that make it harder / impossible to +> confirm a match. Having a normalised matrix or data.frame (or ...) would +> also +> make it easier to generate Authors at R. +> +> Cheers, Dirk +> +> -- +> dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org +> +> ______________________________________________ +> R-package-devel at r-project.org mailing list +> https://stat.ethz.ch/mailman/listinfo/r-package-devel +> + + [[alternative HTML version deleted]] + + +From edd @end|ng |rom deb|@n@org Mon Aug 19 15:39:04 2024 +From: edd @end|ng |rom deb|@n@org (Dirk Eddelbuettel) +Date: Mon, 19 Aug 2024 08:39:04 -0500 +Subject: [R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ? 
+In-Reply-To: +References: <26307.16510.300799.974636@rob.eddelbuettel.com> + +Message-ID: <26307.19192.825552.550821@rob.eddelbuettel.com> + + +On 19 August 2024 at 15:15, Thierry Onkelinx wrote: +| Maybe checklist:::author2df() might be useful. It is an unexported function +| from my checklist package. It converts a person() object to a dataframe. +| https://github.com/inbo/checklist/blob/5649985b58693acb88337873ae14a7d5bc018d96 +| /R/store_authors.R#L38 +| +| df <- tools::CRAN_package_db() +| lapply( +| ? df$`Authors at R`[df$Package ?%in% c("git2rdata", "qrcode")], +| ? function(x) { +| ? ? parse(text = x) |> +| ? ? ? eval() |> +| ? ? ? vapply(checklist:::author2df, vector(mode = "list", 1)) |> +| ? ? ? do.call(what = rbind) +| ? } +| ) +| +| +| [[1]] +| given family email orcid affiliation usage +| 1 Thierry Onkelinx thierry.onkelinx at inbo.be 0000-0001-8804-4216 1 +| 2 Floris Vanderhaeghe floris.vanderhaeghe at inbo.be 0000-0002-6378-6229 1 +| 3 Peter Desmet peter.desmet at inbo.be 0000-0002-8442-8025 1 +| 4 Els Lommelen els.lommelen at inbo.be 0000-0002-3481-5684 1 +| +| [[2]] +| given family email orcid affiliation usage +| 1 Thierry Onkelinx qrcode at muscardinus.be 0000-0001-8804-4216 1 +| 2 Victor Teh victorteh at gmail.com + +That's a very nice start, thank you. (Will also look more closely at +checklist.) It needs an `na.omit()` or alike, and even with that `rbind` +barked a few entries in (i = 19 if you select the full vector right now). + +But definitely something to play with and possibly build upon. Thanks! (And +the IDs of Floris and you were two of the ones I 'manually' added to a +DESCRIPTION file ;-) + +Best, Dirk + +-- +dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org + + +From th|erry@onke||nx @end|ng |rom |nbo@be Tue Aug 20 13:43:20 2024 +From: th|erry@onke||nx @end|ng |rom |nbo@be (Thierry Onkelinx) +Date: Tue, 20 Aug 2024 13:43:20 +0200 +Subject: [R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ? 
+In-Reply-To: <26307.19192.825552.550821@rob.eddelbuettel.com> +References: <26307.16510.300799.974636@rob.eddelbuettel.com> + + <26307.19192.825552.550821@rob.eddelbuettel.com> +Message-ID: + +Hi Dirk, + +Happy to help. I'm working on a new version of the checklist package. I +could export the function if that makes it easier for you. + +Best regards, + +Thierry + +ir. Thierry Onkelinx +Statisticus / Statistician + +Vlaamse Overheid / Government of Flanders +INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND +FOREST +Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance +thierry.onkelinx at inbo.be +Havenlaan 88 bus 73, 1000 Brussel +*Postadres:* Koning Albert II-laan 15 bus 186, 1210 Brussel +*Poststukken die naar dit adres worden gestuurd, worden ingescand en +digitaal aan de geadresseerde bezorgd. Zo kan de Vlaamse overheid haar +dossiers volledig digitaal behandelen. Poststukken met de vermelding +?vertrouwelijk? worden niet ingescand, maar ongeopend aan de geadresseerde +bezorgd.* +www.inbo.be + +/////////////////////////////////////////////////////////////////////////////////////////// +To call in the statistician after the experiment is done may be no more +than asking him to perform a post-mortem examination: he may be able to say +what the experiment died of. ~ Sir Ronald Aylmer Fisher +The plural of anecdote is not data. ~ Roger Brinner +The combination of some data and an aching desire for an answer does not +ensure that a reasonable answer can be extracted from a given body of data. +~ John Tukey +/////////////////////////////////////////////////////////////////////////////////////////// + + + + +Op ma 19 aug 2024 om 15:39 schreef Dirk Eddelbuettel : + +> +> On 19 August 2024 at 15:15, Thierry Onkelinx wrote: +> | Maybe checklist:::author2df() might be useful. It is an unexported +> function +> | from my checklist package. It converts a person() object to a dataframe. 
+> | +> https://github.com/inbo/checklist/blob/5649985b58693acb88337873ae14a7d5bc018d96 +> | /R/store_authors.R#L38 +> | +> | df <- tools::CRAN_package_db() +> | lapply( +> | df$`Authors at R`[df$Package %in% c("git2rdata", "qrcode")], +> | function(x) { +> | parse(text = x) |> +> | eval() |> +> | vapply(checklist:::author2df, vector(mode = "list", 1)) |> +> | do.call(what = rbind) +> | } +> | ) +> | +> | +> | [[1]] +> | given family email orcid +> affiliation usage +> | 1 Thierry Onkelinx thierry.onkelinx at inbo.be 0000-0001-8804-4216 +> 1 +> | 2 Floris Vanderhaeghe floris.vanderhaeghe at inbo.be 0000-0002-6378-6229 +> 1 +> | 3 Peter Desmet peter.desmet at inbo.be 0000-0002-8442-8025 +> 1 +> | 4 Els Lommelen els.lommelen at inbo.be 0000-0002-3481-5684 +> 1 +> | +> | [[2]] +> | given family email orcid affiliation +> usage +> | 1 Thierry Onkelinx qrcode at muscardinus.be 0000-0001-8804-4216 +> 1 +> | 2 Victor Teh victorteh at gmail.com +> +> That's a very nice start, thank you. (Will also look more closely at +> checklist.) It needs an `na.omit()` or alike, and even with that `rbind` +> barked a few entries in (i = 19 if you select the full vector right now). +> +> But definitely something to play with and possibly build upon. Thanks! +> (And +> the IDs of Floris and you were two of the ones I 'manually' added to a +> DESCRIPTION file ;-) +> +> Best, Dirk +> +> -- +> dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org +> + + [[alternative HTML version deleted]] + + +From edd @end|ng |rom deb|@n@org Tue Aug 20 13:50:20 2024 +From: edd @end|ng |rom deb|@n@org (Dirk Eddelbuettel) +Date: Tue, 20 Aug 2024 06:50:20 -0500 +Subject: [R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ? 
+In-Reply-To: 
+References: <26307.16510.300799.974636@rob.eddelbuettel.com>
+ 
+ <26307.19192.825552.550821@rob.eddelbuettel.com>
+ 
+Message-ID: <26308.33532.40021.698529@rob.eddelbuettel.com>
+
+
+Salut Thierry,
+
+On 20 August 2024 at 13:43, Thierry Onkelinx wrote:
+| Happy to help. I'm working on a new version of the checklist package. I could
+| export the function if that makes it easier for you.
+
+Would be happy to help / iterate. Can you take a stab at making the
+per-column split more robust so that we can bulk-process all non-NA entries
+of the returned db?
+
+Best, Dirk
+
+--
+dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
+
+
+From Kurt@Horn|k @end|ng |rom wu@@c@@t Tue Aug 20 14:29:50 2024
+From: Kurt@Horn|k @end|ng |rom wu@@c@@t (Kurt Hornik)
+Date: Tue, 20 Aug 2024 14:29:50 +0200
+Subject: [R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ?
+In-Reply-To: <26308.33532.40021.698529@rob.eddelbuettel.com>
+References: <26307.16510.300799.974636@rob.eddelbuettel.com>
+ 
+ <26307.19192.825552.550821@rob.eddelbuettel.com>
+ 
+ <26308.33532.40021.698529@rob.eddelbuettel.com>
+Message-ID: <26308.35902.672739.512614@hornik.net>
+
+>>>>> Dirk Eddelbuettel writes:
+
+Dirk et al,
+
+Sorry for not replying any sooner :-)
+
+I think for now you could use something like what I attach below.
+
+Not ideal: I had not too long ago started adding orcidtools.R to tools,
+which e.g. has .persons_from_metadata(), but that works on the unpacked
+sources and not the CRAN package db. Need to think about that ... 
+ +Best +-k + +******************************************************************** +x <- tools::CRAN_package_db() +a <- lapply(x[["Authors at R"]], + function(a) { + if(!is.na(a)) { + a <- tryCatch(utils:::.read_authors_at_R_field(a), + error = identity) + if (inherits(a, "person")) + return(a) + } + NULL + }) +a <- do.call(c, a) +a <- lapply(a, + function(e) { + if(is.null(o <- e$comment["ORCID"]) || is.na(o)) + return(NULL) + cbind(given = paste(e$given, collapse = " "), + family = paste(e$family, collapse = " "), + oid = unname(o)) + }) +a <- as.data.frame(do.call(rbind, a)) +******************************************************************** + +> Salut Thierry, + +> On 20 August 2024 at 13:43, Thierry Onkelinx wrote: +> | Happy to help. I'm working on a new version of the checklist package. I could +> | export the function if that makes it easier for you. + +> Would be happy to help / iterate. Can you take a stab at making the +> per-column split more robust so that we can bulk-process all non-NA entries +> of the returned db? + +> Best, Dirk + +> -- +> dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org + + +From edd @end|ng |rom deb|@n@org Tue Aug 20 14:57:14 2024 +From: edd @end|ng |rom deb|@n@org (Dirk Eddelbuettel) +Date: Tue, 20 Aug 2024 07:57:14 -0500 +Subject: [R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ? +In-Reply-To: <26308.35902.672739.512614@hornik.net> +References: <26307.16510.300799.974636@rob.eddelbuettel.com> + + <26307.19192.825552.550821@rob.eddelbuettel.com> + + <26308.33532.40021.698529@rob.eddelbuettel.com> + <26308.35902.672739.512614@hornik.net> +Message-ID: <26308.37546.287996.39155@rob.eddelbuettel.com> + + +Hi Kurt, + +On 20 August 2024 at 14:29, Kurt Hornik wrote: +| I think for now you could use something like what I attach below. +| +| Not ideal: I had not too long ago starting adding orcidtools.R to tools, +| which e.g. 
has .persons_from_metadata(), but that works on the unpacked +| sources and not the CRAN package db. Need to think about that ... + +We need something like that too as I fat-fingered the string 'ORCID'. See +fortune::fortunes("Dirk can type"). + +Will the function below later. Many thanks for sending it along. + +Dirk + +| +| Best +| -k +| +| ******************************************************************** +| x <- tools::CRAN_package_db() +| a <- lapply(x[["Authors at R"]], +| function(a) { +| if(!is.na(a)) { +| a <- tryCatch(utils:::.read_authors_at_R_field(a), +| error = identity) +| if (inherits(a, "person")) +| return(a) +| } +| NULL +| }) +| a <- do.call(c, a) +| a <- lapply(a, +| function(e) { +| if(is.null(o <- e$comment["ORCID"]) || is.na(o)) +| return(NULL) +| cbind(given = paste(e$given, collapse = " "), +| family = paste(e$family, collapse = " "), +| oid = unname(o)) +| }) +| a <- as.data.frame(do.call(rbind, a)) +| ******************************************************************** +| +| > Salut Thierry, +| +| > On 20 August 2024 at 13:43, Thierry Onkelinx wrote: +| > | Happy to help. I'm working on a new version of the checklist package. I could +| > | export the function if that makes it easier for you. +| +| > Would be happy to help / iterate. Can you take a stab at making the +| > per-column split more robust so that we can bulk-process all non-NA entries +| > of the returned db? +| +| > Best, Dirk +| +| > -- +| > dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org + +-- +dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org + + +From chr|@ho|d @end|ng |rom p@yctc@org Tue Aug 20 15:13:24 2024 +From: chr|@ho|d @end|ng |rom p@yctc@org (Chris Evans) +Date: Tue, 20 Aug 2024 15:13:24 +0200 +Subject: [R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ? 
+In-Reply-To: <26308.37546.287996.39155@rob.eddelbuettel.com> +References: <26307.16510.300799.974636@rob.eddelbuettel.com> + + <26307.19192.825552.550821@rob.eddelbuettel.com> + + <26308.33532.40021.698529@rob.eddelbuettel.com> + <26308.35902.672739.512614@hornik.net> + <26308.37546.287996.39155@rob.eddelbuettel.com> +Message-ID: + +As I think that should be + +fortunes::fortune("Dirk can type") + +rather than + +fortune::fortunes("Dirk can type") + +I think that has become both recursive and demonstrating excellent +test-retest stability!? Oh boy do I know that issue! + +Chris + +On 20/08/2024 14:57, Dirk Eddelbuettel wrote: +> Hi Kurt, +> +> On 20 August 2024 at 14:29, Kurt Hornik wrote: +> | I think for now you could use something like what I attach below. +> | +> | Not ideal: I had not too long ago starting adding orcidtools.R to tools, +> | which e.g. has .persons_from_metadata(), but that works on the unpacked +> | sources and not the CRAN package db. Need to think about that ... +> +> We need something like that too as I fat-fingered the string 'ORCID'. See +> fortune::fortunes("Dirk can type"). +> +> Will the function below later. Many thanks for sending it along. 
+> +> Dirk +> +> | +> | Best +> | -k +> | +> | ******************************************************************** +> | x <- tools::CRAN_package_db() +> | a <- lapply(x[["Authors at R"]], +> | function(a) { +> | if(!is.na(a)) { +> | a <- tryCatch(utils:::.read_authors_at_R_field(a), +> | error = identity) +> | if (inherits(a, "person")) +> | return(a) +> | } +> | NULL +> | }) +> | a <- do.call(c, a) +> | a <- lapply(a, +> | function(e) { +> | if(is.null(o <- e$comment["ORCID"]) || is.na(o)) +> | return(NULL) +> | cbind(given = paste(e$given, collapse = " "), +> | family = paste(e$family, collapse = " "), +> | oid = unname(o)) +> | }) +> | a <- as.data.frame(do.call(rbind, a)) +> | ******************************************************************** +> | +> | > Salut Thierry, +> | +> | > On 20 August 2024 at 13:43, Thierry Onkelinx wrote: +> | > | Happy to help. I'm working on a new version of the checklist package. I could +> | > | export the function if that makes it easier for you. +> | +> | > Would be happy to help / iterate. Can you take a stab at making the +> | > per-column split more robust so that we can bulk-process all non-NA entries +> | > of the returned db? +> | +> | > Best, Dirk +> | +> | > -- +> | > dirk.eddelbuettel.com | @eddelbuettel |edd at debian.org +> +-- +Chris Evans (he/him) +Visiting Professor, UDLA, Quito, Ecuador & Honorary Professor, +University of Roehampton, London, UK. +Work web site: https://www.psyctc.org/psyctc/ +CORE site: http://www.coresystemtrust.org.uk/ +Personal site: https://www.psyctc.org/pelerinage2016/ +Emeetings (Thursdays): +https://www.psyctc.org/psyctc/booking-meetings-with-me/ +(Beware: French time, generally an hour ahead of UK) + + [[alternative HTML version deleted]] + + +From edd @end|ng |rom deb|@n@org Tue Aug 20 15:29:19 2024 +From: edd @end|ng |rom deb|@n@org (Dirk Eddelbuettel) +Date: Tue, 20 Aug 2024 08:29:19 -0500 +Subject: [R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ? 
+In-Reply-To: +References: <26307.16510.300799.974636@rob.eddelbuettel.com> + + <26307.19192.825552.550821@rob.eddelbuettel.com> + + <26308.33532.40021.698529@rob.eddelbuettel.com> + <26308.35902.672739.512614@hornik.net> + <26308.37546.287996.39155@rob.eddelbuettel.com> + +Message-ID: <26308.39471.752585.593306@rob.eddelbuettel.com> + + +On 20 August 2024 at 15:13, Chris Evans wrote: +| As I think that should be +| +| fortunes::fortune("Dirk can type") +| +| rather than +| +| fortune::fortunes("Dirk can type") + +Yes, thank you. I also failed to run that post through CI and testing before +sending. Doing too many things at once... + +Dirk + +| I think that has become both recursive and demonstrating excellent +| test-retest stability!? Oh boy do I know that issue! +| +| Chris +| +| On 20/08/2024 14:57, Dirk Eddelbuettel wrote: +| > Hi Kurt, +| > +| > On 20 August 2024 at 14:29, Kurt Hornik wrote: +| > | I think for now you could use something like what I attach below. +| > | +| > | Not ideal: I had not too long ago starting adding orcidtools.R to tools, +| > | which e.g. has .persons_from_metadata(), but that works on the unpacked +| > | sources and not the CRAN package db. Need to think about that ... +| > +| > We need something like that too as I fat-fingered the string 'ORCID'. See +| > fortune::fortunes("Dirk can type"). +| > +| > Will the function below later. Many thanks for sending it along. 
+| > +| > Dirk +| > +| > | +| > | Best +| > | -k +| > | +| > | ******************************************************************** +| > | x <- tools::CRAN_package_db() +| > | a <- lapply(x[["Authors at R"]], +| > | function(a) { +| > | if(!is.na(a)) { +| > | a <- tryCatch(utils:::.read_authors_at_R_field(a), +| > | error = identity) +| > | if (inherits(a, "person")) +| > | return(a) +| > | } +| > | NULL +| > | }) +| > | a <- do.call(c, a) +| > | a <- lapply(a, +| > | function(e) { +| > | if(is.null(o <- e$comment["ORCID"]) || is.na(o)) +| > | return(NULL) +| > | cbind(given = paste(e$given, collapse = " "), +| > | family = paste(e$family, collapse = " "), +| > | oid = unname(o)) +| > | }) +| > | a <- as.data.frame(do.call(rbind, a)) +| > | ******************************************************************** +| > | +| > | > Salut Thierry, +| > | +| > | > On 20 August 2024 at 13:43, Thierry Onkelinx wrote: +| > | > | Happy to help. I'm working on a new version of the checklist package. I could +| > | > | export the function if that makes it easier for you. +| > | +| > | > Would be happy to help / iterate. Can you take a stab at making the +| > | > per-column split more robust so that we can bulk-process all non-NA entries +| > | > of the returned db? +| > | +| > | > Best, Dirk +| > | +| > | > -- +| > | > dirk.eddelbuettel.com | @eddelbuettel |edd at debian.org +| > +| -- +| Chris Evans (he/him) +| Visiting Professor, UDLA, Quito, Ecuador & Honorary Professor, +| University of Roehampton, London, UK. 
+| Work web site: https://www.psyctc.org/psyctc/ +| CORE site: http://www.coresystemtrust.org.uk/ +| Personal site: https://www.psyctc.org/pelerinage2016/ +| Emeetings (Thursdays): +| https://www.psyctc.org/psyctc/booking-meetings-with-me/ +| (Beware: French time, generally an hour ahead of UK) +| +| [[alternative HTML version deleted]] +| +| ______________________________________________ +| R-package-devel at r-project.org mailing list +| https://stat.ethz.ch/mailman/listinfo/r-package-devel + +-- +dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org + + +From edd @end|ng |rom deb|@n@org Tue Aug 20 15:31:45 2024 +From: edd @end|ng |rom deb|@n@org (Dirk Eddelbuettel) +Date: Tue, 20 Aug 2024 08:31:45 -0500 +Subject: [R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ? +In-Reply-To: <26308.37546.287996.39155@rob.eddelbuettel.com> +References: <26307.16510.300799.974636@rob.eddelbuettel.com> + + <26307.19192.825552.550821@rob.eddelbuettel.com> + + <26308.33532.40021.698529@rob.eddelbuettel.com> + <26308.35902.672739.512614@hornik.net> + <26308.37546.287996.39155@rob.eddelbuettel.com> +Message-ID: <26308.39617.980467.97910@rob.eddelbuettel.com> + + +On 20 August 2024 at 07:57, Dirk Eddelbuettel wrote: +| +| Hi Kurt, +| +| On 20 August 2024 at 14:29, Kurt Hornik wrote: +| | I think for now you could use something like what I attach below. +| | +| | Not ideal: I had not too long ago starting adding orcidtools.R to tools, +| | which e.g. has .persons_from_metadata(), but that works on the unpacked +| | sources and not the CRAN package db. Need to think about that ... +| +| We need something like that too as I fat-fingered the string 'ORCID'. See +| fortune::fortunes("Dirk can type"). +| +| Will the function below later. Many thanks for sending it along. + +Very nice. Resisted my common impulse to make it a data.table for easy +sorting via keys etc. 
After running your code the line + + head(with(a, sort_by(a, ~ family + given)), 100) + +shows that we need a bit more QA as person entries are not properly split +between 'family' and 'given', use the URL and that we have repeats. +Excluding those is next. + +Dirk + +| Dirk +| +| | +| | Best +| | -k +| | +| | ******************************************************************** +| | x <- tools::CRAN_package_db() +| | a <- lapply(x[["Authors at R"]], +| | function(a) { +| | if(!is.na(a)) { +| | a <- tryCatch(utils:::.read_authors_at_R_field(a), +| | error = identity) +| | if (inherits(a, "person")) +| | return(a) +| | } +| | NULL +| | }) +| | a <- do.call(c, a) +| | a <- lapply(a, +| | function(e) { +| | if(is.null(o <- e$comment["ORCID"]) || is.na(o)) +| | return(NULL) +| | cbind(given = paste(e$given, collapse = " "), +| | family = paste(e$family, collapse = " "), +| | oid = unname(o)) +| | }) +| | a <- as.data.frame(do.call(rbind, a)) +| | ******************************************************************** +| | +| | > Salut Thierry, +| | +| | > On 20 August 2024 at 13:43, Thierry Onkelinx wrote: +| | > | Happy to help. I'm working on a new version of the checklist package. I could +| | > | export the function if that makes it easier for you. +| | +| | > Would be happy to help / iterate. Can you take a stab at making the +| | > per-column split more robust so that we can bulk-process all non-NA entries +| | > of the returned db? +| | +| | > Best, Dirk +| | +| | > -- +| | > dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org +| +| -- +| dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org + +-- +dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org + + +From Kurt@Horn|k @end|ng |rom wu@@c@@t Tue Aug 20 15:43:04 2024 +From: Kurt@Horn|k @end|ng |rom wu@@c@@t (Kurt Hornik) +Date: Tue, 20 Aug 2024 15:43:04 +0200 +Subject: [R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ? 
+In-Reply-To: <26308.39617.980467.97910@rob.eddelbuettel.com> +References: <26307.16510.300799.974636@rob.eddelbuettel.com> + + <26307.19192.825552.550821@rob.eddelbuettel.com> + + <26308.33532.40021.698529@rob.eddelbuettel.com> + <26308.35902.672739.512614@hornik.net> + <26308.37546.287996.39155@rob.eddelbuettel.com> + <26308.39617.980467.97910@rob.eddelbuettel.com> +Message-ID: <26308.40296.318704.764186@hornik.net> + +>>>>> Dirk Eddelbuettel writes: + +> On 20 August 2024 at 07:57, Dirk Eddelbuettel wrote: +> | +> | Hi Kurt, +> | +> | On 20 August 2024 at 14:29, Kurt Hornik wrote: +> | | I think for now you could use something like what I attach below. +> | | +> | | Not ideal: I had not too long ago starting adding orcidtools.R to tools, +> | | which e.g. has .persons_from_metadata(), but that works on the unpacked +> | | sources and not the CRAN package db. Need to think about that ... +> | +> | We need something like that too as I fat-fingered the string 'ORCID'. See +> | fortune::fortunes("Dirk can type"). +> | +> | Will the function below later. Many thanks for sending it along. + +> Very nice. Resisted my common impulse to make it a data.table for easy +> sorting via keys etc. After running your code the line + +> head(with(a, sort_by(a, ~ family + given)), 100) + +> shows that we need a bit more QA as person entries are not properly split +> between 'family' and 'given', use the URL and that we have repeats. +> Excluding those is next. + +Right. One should canonicalize the ORCID (having the URLs is from being +nice) and then do unique() ... 
+ +Best +-k + + + +> Dirk + +> | Dirk +> | +> | | +> | | Best +> | | -k +> | | +> | | ******************************************************************** +> | | x <- tools::CRAN_package_db() +> | | a <- lapply(x[["Authors at R"]], +> | | function(a) { +> | | if(!is.na(a)) { +> | | a <- tryCatch(utils:::.read_authors_at_R_field(a), +> | | error = identity) +> | | if (inherits(a, "person")) +> | | return(a) +> | | } +> | | NULL +> | | }) +> | | a <- do.call(c, a) +> | | a <- lapply(a, +> | | function(e) { +> | | if(is.null(o <- e$comment["ORCID"]) || is.na(o)) +> | | return(NULL) +> | | cbind(given = paste(e$given, collapse = " "), +> | | family = paste(e$family, collapse = " "), +> | | oid = unname(o)) +> | | }) +> | | a <- as.data.frame(do.call(rbind, a)) +> | | ******************************************************************** +> | | +> | | > Salut Thierry, +> | | +> | | > On 20 August 2024 at 13:43, Thierry Onkelinx wrote: +> | | > | Happy to help. I'm working on a new version of the checklist package. I could +> | | > | export the function if that makes it easier for you. +> | | +> | | > Would be happy to help / iterate. Can you take a stab at making the +> | | > per-column split more robust so that we can bulk-process all non-NA entries +> | | > of the returned db? +> | | +> | | > Best, Dirk +> | | +> | | > -- +> | | > dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org +> | +> | -- +> | dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org + +> -- +> dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org + + +From Kurt@Horn|k @end|ng |rom wu@@c@@t Tue Aug 20 15:47:22 2024 +From: Kurt@Horn|k @end|ng |rom wu@@c@@t (Kurt Hornik) +Date: Tue, 20 Aug 2024 15:47:22 +0200 +Subject: [R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ? 
+In-Reply-To: <26308.40296.318704.764186@hornik.net>
+References: <26307.16510.300799.974636@rob.eddelbuettel.com>
+ 
+ <26307.19192.825552.550821@rob.eddelbuettel.com>
+ 
+ <26308.33532.40021.698529@rob.eddelbuettel.com>
+ <26308.35902.672739.512614@hornik.net>
+ <26308.37546.287996.39155@rob.eddelbuettel.com>
+ <26308.39617.980467.97910@rob.eddelbuettel.com>
+ <26308.40296.318704.764186@hornik.net>
+Message-ID: <26308.40554.894652.252504@hornik.net>
+
+>>>>> Kurt Hornik writes:
+
+The variant attached drops the URL and does unique.
+
+Hmm, the ones in
+
+ head(with(a, sort_by(a, ~ family + given)), 100)
+
+without a family look suspicious ...
+
+Best
+-k
+
+
+-------------- next part --------------
+An embedded and charset-unspecified text was scrubbed...
+Name: orcid.R
+URL: 
+
+-------------- next part --------------
+
+
+>>>>> Dirk Eddelbuettel writes:
+>> On 20 August 2024 at 07:57, Dirk Eddelbuettel wrote:
+>> |
+>> | Hi Kurt,
+>> |
+>> | On 20 August 2024 at 14:29, Kurt Hornik wrote:
+>> | | I think for now you could use something like what I attach below.
+>> | |
+>> | | Not ideal: I had not too long ago starting adding orcidtools.R to tools,
+>> | | which e.g. has .persons_from_metadata(), but that works on the unpacked
+>> | | sources and not the CRAN package db. Need to think about that ...
+>> |
+>> | We need something like that too as I fat-fingered the string 'ORCID'. See
+>> | fortune::fortunes("Dirk can type").
+>> |
+>> | Will the function below later. Many thanks for sending it along.
+
+>> Very nice. Resisted my common impulse to make it a data.table for easy
+>> sorting via keys etc. After running your code the line
+
+>> head(with(a, sort_by(a, ~ family + given)), 100)
+
+>> shows that we need a bit more QA as person entries are not properly split
+>> between 'family' and 'given', use the URL and that we have repeats.
+>> Excluding those is next.
+
+> Right. 
One should canonicalize the ORCID (having the URLs is from being +> nice) and then do unique() ... + +> Best +> -k + + + +>> Dirk + +>> | Dirk +>> | +>> | | +>> | | Best +>> | | -k +>> | | +>> | | ******************************************************************** +>> | | x <- tools::CRAN_package_db() +>> | | a <- lapply(x[["Authors at R"]], +>> | | function(a) { +>> | | if(!is.na(a)) { +>> | | a <- tryCatch(utils:::.read_authors_at_R_field(a), +>> | | error = identity) +>> | | if (inherits(a, "person")) +>> | | return(a) +>> | | } +>> | | NULL +>> | | }) +>> | | a <- do.call(c, a) +>> | | a <- lapply(a, +>> | | function(e) { +>> | | if(is.null(o <- e$comment["ORCID"]) || is.na(o)) +>> | | return(NULL) +>> | | cbind(given = paste(e$given, collapse = " "), +>> | | family = paste(e$family, collapse = " "), +>> | | oid = unname(o)) +>> | | }) +>> | | a <- as.data.frame(do.call(rbind, a)) +>> | | ******************************************************************** +>> | | +>> | | > Salut Thierry, +>> | | +>> | | > On 20 August 2024 at 13:43, Thierry Onkelinx wrote: +>> | | > | Happy to help. I'm working on a new version of the checklist package. I could +>> | | > | export the function if that makes it easier for you. +>> | | +>> | | > Would be happy to help / iterate. Can you take a stab at making the +>> | | > per-column split more robust so that we can bulk-process all non-NA entries +>> | | > of the returned db? +>> | | +>> | | > Best, Dirk +>> | | +>> | | > -- +>> | | > dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org +>> | +>> | -- +>> | dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org + +>> -- +>> dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org + +From bbo|ker @end|ng |rom gm@||@com Tue Aug 20 15:58:49 2024 +From: bbo|ker @end|ng |rom gm@||@com (Ben Bolker) +Date: Tue, 20 Aug 2024 09:58:49 -0400 +Subject: [R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ? 
+In-Reply-To: <26308.40554.894652.252504@hornik.net> +References: <26307.16510.300799.974636@rob.eddelbuettel.com> + + <26307.19192.825552.550821@rob.eddelbuettel.com> + + <26308.33532.40021.698529@rob.eddelbuettel.com> + <26308.35902.672739.512614@hornik.net> + <26308.37546.287996.39155@rob.eddelbuettel.com> + <26308.39617.980467.97910@rob.eddelbuettel.com> + <26308.40296.318704.764186@hornik.net> <26308.40554.894652.252504@hornik.net> +Message-ID: <65590043-337b-4d89-b540-36fb5ba27101@gmail.com> + + Looking into one particular example, + +https://github.com/seabbs/idmodelr/blob/master/DESCRIPTION + +this appears to be the authors' fault: + +Authors at R: c( + person(given = "Sam Abbott", + role = c("aut", "cre"), + email = "contact at samabbott.co.uk", + comment = c(ORCID = "0000-0001-8057-8037")), + person(given = "Akira Endo", + role = c("aut"), + email = "akira.endo at lshtm.ac.uk", + comment = c(ORCID = "0000-0001-6377-7296"))) + + Maybe CRAN should start checking for missing 'family' fields in +Authors at R ... ??? + + cheers + Ben Bolker + +On 2024-08-20 9:47 a.m., Kurt Hornik wrote: +>>>>>> Kurt Hornik writes: +> +> The variant attaches drops the URL and does unique. +> +> Hmm, the ones in +> +> head(with(a, sort_by(a, ~ family + given)), 100) +> +> without a family look suspicious ... +> +> Best +> -k +> +> +> +> +>>>>>> Dirk Eddelbuettel writes: +>>> On 20 August 2024 at 07:57, Dirk Eddelbuettel wrote: +>>> | +>>> | Hi Kurt, +>>> | +>>> | On 20 August 2024 at 14:29, Kurt Hornik wrote: +>>> | | I think for now you could use something like what I attach below. +>>> | | +>>> | | Not ideal: I had not too long ago starting adding orcidtools.R to tools, +>>> | | which e.g. has .persons_from_metadata(), but that works on the unpacked +>>> | | sources and not the CRAN package db. Need to think about that ... +>>> | +>>> | We need something like that too as I fat-fingered the string 'ORCID'. See +>>> | fortune::fortunes("Dirk can type"). 
+>>> | +>>> | Will the function below later. Many thanks for sending it along. +> +>>> Very nice. Resisted my common impulse to make it a data.table for easy +>>> sorting via keys etc. After running your code the line +> +>>> head(with(a, sort_by(a, ~ family + given)), 100) +> +>>> shows that we need a bit more QA as person entries are not properly split +>>> between 'family' and 'given', use the URL and that we have repeats. +>>> Excluding those is next. +> +>> Right. One should canonicalize the ORCID (having the URLs is from being +>> nice) and then do unique() ... +> +>> Best +>> -k +> +> +> +>>> Dirk +> +>>> | Dirk +>>> | +>>> | | +>>> | | Best +>>> | | -k +>>> | | +>>> | | ******************************************************************** +>>> | | x <- tools::CRAN_package_db() +>>> | | a <- lapply(x[["Authors at R"]], +>>> | | function(a) { +>>> | | if(!is.na(a)) { +>>> | | a <- tryCatch(utils:::.read_authors_at_R_field(a), +>>> | | error = identity) +>>> | | if (inherits(a, "person")) +>>> | | return(a) +>>> | | } +>>> | | NULL +>>> | | }) +>>> | | a <- do.call(c, a) +>>> | | a <- lapply(a, +>>> | | function(e) { +>>> | | if(is.null(o <- e$comment["ORCID"]) || is.na(o)) +>>> | | return(NULL) +>>> | | cbind(given = paste(e$given, collapse = " "), +>>> | | family = paste(e$family, collapse = " "), +>>> | | oid = unname(o)) +>>> | | }) +>>> | | a <- as.data.frame(do.call(rbind, a)) +>>> | | ******************************************************************** +>>> | | +>>> | | > Salut Thierry, +>>> | | +>>> | | > On 20 August 2024 at 13:43, Thierry Onkelinx wrote: +>>> | | > | Happy to help. I'm working on a new version of the checklist package. I could +>>> | | > | export the function if that makes it easier for you. +>>> | | +>>> | | > Would be happy to help / iterate. Can you take a stab at making the +>>> | | > per-column split more robust so that we can bulk-process all non-NA entries +>>> | | > of the returned db? 
+>>> | |
+>>> | | > Best, Dirk
+>>> | |
+>>> | | > --
+>>> | | > dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
+>>> |
+>>> | --
+>>> | dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
+>
+>>> --
+>>> dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
+>>>
+>>> ______________________________________________
+>>> R-package-devel at r-project.org mailing list
+>>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
+
+--
+Dr. Benjamin Bolker
+Professor, Mathematics & Statistics and Biology, McMaster University
+Director, School of Computational Science and Engineering
+ > E-mail is sent at my convenience; I don't expect replies outside of
+working hours.
+
+
+From th|erry@onke||nx @end|ng |rom |nbo@be Tue Aug 20 16:25:25 2024
+From: th|erry@onke||nx @end|ng |rom |nbo@be (Thierry Onkelinx)
+Date: Tue, 20 Aug 2024 16:25:25 +0200
+Subject: [R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ?
+In-Reply-To: <65590043-337b-4d89-b540-36fb5ba27101@gmail.com>
+References: <26307.16510.300799.974636@rob.eddelbuettel.com>
+	
+	<26307.19192.825552.550821@rob.eddelbuettel.com>
+	
+	<26308.33532.40021.698529@rob.eddelbuettel.com>
+	<26308.35902.672739.512614@hornik.net>
+	<26308.37546.287996.39155@rob.eddelbuettel.com>
+	<26308.39617.980467.97910@rob.eddelbuettel.com>
+	<26308.40296.318704.764186@hornik.net> <26308.40554.894652.252504@hornik.net>
+	<65590043-337b-4d89-b540-36fb5ba27101@gmail.com>
+Message-ID: 
+
+Dear Ben,
+
+This is as simple as setting mandatory given and family fields.
+checklist::check_description() ensures that given and family are set unless
+the role is "cph" or "fnd", allowing organisations to be listed with only
+the given field.
+
+The 0.4.1 branch of checklist
+ now
+exports the author2df() function, which can handle objects of class
+person, list, logical (NA) and NULL. Feedback is welcome. 
+ +library(checklist) +df <- tools::CRAN_package_db() +vapply( + df$`Authors at R`[df$Package %in% c("git2rdata", "A3", "digest", "abe")], + function(x) { + parse(text = x) |> + eval() |> + list() + }, + vector(mode = "list", 1) +) |> + unname() |> + author2df() + +Best regards, + +ir. Thierry Onkelinx +Statisticus / Statistician + +Vlaamse Overheid / Government of Flanders +INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND +FOREST +Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance +thierry.onkelinx at inbo.be +Havenlaan 88 bus 73, 1000 Brussel +*Postadres:* Koning Albert II-laan 15 bus 186, 1210 Brussel +*Poststukken die naar dit adres worden gestuurd, worden ingescand en +digitaal aan de geadresseerde bezorgd. Zo kan de Vlaamse overheid haar +dossiers volledig digitaal behandelen. Poststukken met de vermelding +?vertrouwelijk? worden niet ingescand, maar ongeopend aan de geadresseerde +bezorgd.* +www.inbo.be + +/////////////////////////////////////////////////////////////////////////////////////////// +To call in the statistician after the experiment is done may be no more +than asking him to perform a post-mortem examination: he may be able to say +what the experiment died of. ~ Sir Ronald Aylmer Fisher +The plural of anecdote is not data. ~ Roger Brinner +The combination of some data and an aching desire for an answer does not +ensure that a reasonable answer can be extracted from a given body of data. 
+~ John Tukey +/////////////////////////////////////////////////////////////////////////////////////////// + + + + +Op di 20 aug 2024 om 15:59 schreef Ben Bolker : + +> Looking into one particular example, +> +> https://github.com/seabbs/idmodelr/blob/master/DESCRIPTION +> +> this appears to be the authors' fault: +> +> Authors at R: c( +> person(given = "Sam Abbott", +> role = c("aut", "cre"), +> email = "contact at samabbott.co.uk", +> comment = c(ORCID = "0000-0001-8057-8037")), +> person(given = "Akira Endo", +> role = c("aut"), +> email = "akira.endo at lshtm.ac.uk", +> comment = c(ORCID = "0000-0001-6377-7296"))) +> +> Maybe CRAN should start checking for missing 'family' fields in +> Authors at R ... ??? +> +> cheers +> Ben Bolker +> +> On 2024-08-20 9:47 a.m., Kurt Hornik wrote: +> >>>>>> Kurt Hornik writes: +> > +> > The variant attaches drops the URL and does unique. +> > +> > Hmm, the ones in +> > +> > head(with(a, sort_by(a, ~ family + given)), 100) +> > +> > without a family look suspicious ... +> > +> > Best +> > -k +> > +> > +> > +> > +> >>>>>> Dirk Eddelbuettel writes: +> >>> On 20 August 2024 at 07:57, Dirk Eddelbuettel wrote: +> >>> | +> >>> | Hi Kurt, +> >>> | +> >>> | On 20 August 2024 at 14:29, Kurt Hornik wrote: +> >>> | | I think for now you could use something like what I attach below. +> >>> | | +> >>> | | Not ideal: I had not too long ago starting adding orcidtools.R to +> tools, +> >>> | | which e.g. has .persons_from_metadata(), but that works on the +> unpacked +> >>> | | sources and not the CRAN package db. Need to think about that ... +> >>> | +> >>> | We need something like that too as I fat-fingered the string +> 'ORCID'. See +> >>> | fortune::fortunes("Dirk can type"). +> >>> | +> >>> | Will the function below later. Many thanks for sending it along. +> > +> >>> Very nice. Resisted my common impulse to make it a data.table for easy +> >>> sorting via keys etc. 
After running your code the line +> > +> >>> head(with(a, sort_by(a, ~ family + given)), 100) +> > +> >>> shows that we need a bit more QA as person entries are not properly +> split +> >>> between 'family' and 'given', use the URL and that we have repeats. +> >>> Excluding those is next. +> > +> >> Right. One should canonicalize the ORCID (having the URLs is from being +> >> nice) and then do unique() ... +> > +> >> Best +> >> -k +> > +> > +> > +> >>> Dirk +> > +> >>> | Dirk +> >>> | +> >>> | | +> >>> | | Best +> >>> | | -k +> >>> | | +> >>> | | +> ******************************************************************** +> >>> | | x <- tools::CRAN_package_db() +> >>> | | a <- lapply(x[["Authors at R"]], +> >>> | | function(a) { +> >>> | | if(!is.na(a)) { +> >>> | | a <- +> tryCatch(utils:::.read_authors_at_R_field(a), +> >>> | | error = identity) +> >>> | | if (inherits(a, "person")) +> >>> | | return(a) +> >>> | | } +> >>> | | NULL +> >>> | | }) +> >>> | | a <- do.call(c, a) +> >>> | | a <- lapply(a, +> >>> | | function(e) { +> >>> | | if(is.null(o <- e$comment["ORCID"]) || is.na(o)) +> >>> | | return(NULL) +> >>> | | cbind(given = paste(e$given, collapse = " "), +> >>> | | family = paste(e$family, collapse = " "), +> >>> | | oid = unname(o)) +> >>> | | }) +> >>> | | a <- as.data.frame(do.call(rbind, a)) +> >>> | | +> ******************************************************************** +> >>> | | +> >>> | | > Salut Thierry, +> >>> | | +> >>> | | > On 20 August 2024 at 13:43, Thierry Onkelinx wrote: +> >>> | | > | Happy to help. I'm working on a new version of the checklist +> package. I could +> >>> | | > | export the function if that makes it easier for you. +> >>> | | +> >>> | | > Would be happy to help / iterate. Can you take a stab at making +> the +> >>> | | > per-column split more robust so that we can bulk-process all +> non-NA entries +> >>> | | > of the returned db? 
+> >>> | |
+> >>> | | > Best, Dirk
+> >>> | |
+> >>> | | > --
+> >>> | | > dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
+> >>> |
+> >>> | --
+> >>> | dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
+> >
+> >>> --
+> >>> dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
+> >>>
+> >>> ______________________________________________
+> >>> R-package-devel at r-project.org mailing list
+> >>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
+>
+> --
+> Dr. Benjamin Bolker
+> Professor, Mathematics & Statistics and Biology, McMaster University
+> Director, School of Computational Science and Engineering
+> > E-mail is sent at my convenience; I don't expect replies outside of
+> working hours.
+>
+> ______________________________________________
+> R-package-devel at r-project.org mailing list
+> https://stat.ethz.ch/mailman/listinfo/r-package-devel
+>
+
+ [[alternative HTML version deleted]]
+
+
+From kev|n@r@coombe@ @end|ng |rom gm@||@com Tue Aug 20 17:51:03 2024
+From: kev|n@r@coombe@ @end|ng |rom gm@||@com (Kevin R. Coombes)
+Date: Tue, 20 Aug 2024 11:51:03 -0400
+Subject: [R-pkg-devel] Spell Check with Hunspell on Windows
+Message-ID: <68c9f4e1-fe16-4f60-b317-f82a3c3e339e@gmail.com>
+
+Hi,
+
+This is a follow-up to an earlier question where I asked about R CMD
+check on Windows to be able to check R packages in a manner closer to
+the checks on CRAN machines before I submit (new or updated) packages.
+ From the answers to the previous question, I learned the (magic)
+environment variable to set in order to trigger spell checks.
+
+So, I then managed to install "hunspell" on my Windows machine. I have
+confirmed that hunspell works
+
+  * from the command line
+  * inside emacs
+  * inside R when running the "aspell" (or "hunspell") command to check
+    files. (Note that I do not need to set the "program" argument to
+    aspell in order for this to work.)
+
+However, when I run R CMD check from the command line, I get the message:
+    
"No suitable spell-checker program found."
+
+Is there some other environment variable or command line option that I
+have to set in order to get R CMD check to use "hunspell" instead of
+"aspell" or "ispell"?
+
+Thanks,
+  Kevin
+
+ [[alternative HTML version deleted]]
+
+
+From murdoch@dunc@n @end|ng |rom gm@||@com Tue Aug 20 18:02:06 2024
+From: murdoch@dunc@n @end|ng |rom gm@||@com (Duncan Murdoch)
+Date: Tue, 20 Aug 2024 12:02:06 -0400
+Subject: [R-pkg-devel] Spell Check with Hunspell on Windows
+In-Reply-To: <68c9f4e1-fe16-4f60-b317-f82a3c3e339e@gmail.com>
+References: <68c9f4e1-fe16-4f60-b317-f82a3c3e339e@gmail.com>
+Message-ID: <5d6cc0d9-3ba4-4610-b1c3-3451b2e5f768@gmail.com>
+
+When running R CMD check I think the test is using
+
+   Sys.which(c("aspell", "hunspell", "ispell"))
+
+to find the spell check program, so the fact that the aspell() command
+worked in R suggests that the PATH that R CMD check saw is different
+than the one that R saw. Were you running R from the command line when
+aspell() worked?
+
+Duncan Murdoch
+
+On 2024-08-20 11:51 a.m., Kevin R. Coombes wrote:
+> Hi,
+>
+> This is a follow-up to an earlier question where I asked about R CMD
+> check on Windows to be able to check R packages in a manner closer to
+> the checks on CRAN machines before I submit (new or updated) packages.
+>   From the answers to the previous question, I learned the (magic)
+> environment variable to set in order to trigger spell checks.
+>
+> So, I then managed to install "hunspell" on my Windows machine. I have
+> confirmed that hunspell works
+>
+>    * from the command line
+>    * inside emacs
+>    * inside R when running the "aspell" (or "hunspell") command to check
+>      files. (Note that I do not need to set the "program" argument to
+>      aspell in order for this to work.)
+>
+> However, when I run R CMD check from the command line, I get the message:
+>     "No suitable spell-checker program found." 
+> +> Is there some other environment variable or command line option that I +> have to set in order to get R CMD check to use "hunspell" instead of +> "aspell" or "ispell"? +> +> Thanks, +> ? Kevin +> +> [[alternative HTML version deleted]] +> +> ______________________________________________ +> R-package-devel at r-project.org mailing list +> https://stat.ethz.ch/mailman/listinfo/r-package-devel + + +From edd @end|ng |rom deb|@n@org Wed Aug 21 14:43:05 2024 +From: edd @end|ng |rom deb|@n@org (Dirk Eddelbuettel) +Date: Wed, 21 Aug 2024 07:43:05 -0500 +Subject: [R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ? +In-Reply-To: <26308.40554.894652.252504@hornik.net> +References: <26307.16510.300799.974636@rob.eddelbuettel.com> + + <26307.19192.825552.550821@rob.eddelbuettel.com> + + <26308.33532.40021.698529@rob.eddelbuettel.com> + <26308.35902.672739.512614@hornik.net> + <26308.37546.287996.39155@rob.eddelbuettel.com> + <26308.39617.980467.97910@rob.eddelbuettel.com> + <26308.40296.318704.764186@hornik.net> + <26308.40554.894652.252504@hornik.net> +Message-ID: <26309.57561.42726.681094@rob.eddelbuettel.com> + + +On 20 August 2024 at 15:47, Kurt Hornik wrote: +| >>>>> Kurt Hornik writes: +| +| The variant attaches drops the URL and does unique. + +Nice. Alas, some of us default to r-release as the daily driver and then + + Error in unname(tools:::.ORCID_iD_canonicalize(o)) : + object '.ORCID_iD_canonicalize' not found + > + +Will play with my 'RD' which I keep approximately 'weekly-current'. Quick +rebuild first. + +Dirk + +-- +dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org + + +From Kurt@Horn|k @end|ng |rom wu@@c@@t Wed Aug 21 14:50:20 2024 +From: Kurt@Horn|k @end|ng |rom wu@@c@@t (Kurt Hornik) +Date: Wed, 21 Aug 2024 14:50:20 +0200 +Subject: [R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ? 
+In-Reply-To: <26309.57561.42726.681094@rob.eddelbuettel.com> +References: <26307.16510.300799.974636@rob.eddelbuettel.com> + + <26307.19192.825552.550821@rob.eddelbuettel.com> + + <26308.33532.40021.698529@rob.eddelbuettel.com> + <26308.35902.672739.512614@hornik.net> + <26308.37546.287996.39155@rob.eddelbuettel.com> + <26308.39617.980467.97910@rob.eddelbuettel.com> + <26308.40296.318704.764186@hornik.net> + <26308.40554.894652.252504@hornik.net> + <26309.57561.42726.681094@rob.eddelbuettel.com> +Message-ID: <26309.57996.150219.355206@hornik.net> + +>>>>> Dirk Eddelbuettel writes: + +Meanwhile, I am close to committing a change to R-devel which adds +tools::CRAN_authors_db() with docs + + \code{CRAN_authors_db()} returns information on the authors of the + current CRAN packages extracted from the \samp{Authors at R} fields in + the package \file{DESCRIPTION} files, as a data frame with character + columns giving the given and family names, email addresses, + \abbr{ORCID} identifier, roles, and comments of the person entries, + and the corresponding package. + +Once make check-all is done ... + +Best +-k + +PS. Sorry about tools:::.ORCID_iD_canonicalize(), had run into the same +issue when building the authors db on the CRAN master (which uses +current R release ...) + +> On 20 August 2024 at 15:47, Kurt Hornik wrote: +> | >>>>> Kurt Hornik writes: +> | +> | The variant attaches drops the URL and does unique. + +> Nice. Alas, some of us default to r-release as the daily driver and then + +> Error in unname(tools:::.ORCID_iD_canonicalize(o)) : +> object '.ORCID_iD_canonicalize' not found +>> + +> Will play with my 'RD' which I keep approximately 'weekly-current'. Quick +> rebuild first. 
+
+> Dirk
+
+> --
+> dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
+
+
+From edd @end|ng |rom deb|@n@org Wed Aug 21 14:54:00 2024
+From: edd @end|ng |rom deb|@n@org (Dirk Eddelbuettel)
+Date: Wed, 21 Aug 2024 07:54:00 -0500
+Subject: [R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ?
+In-Reply-To: <26309.57561.42726.681094@rob.eddelbuettel.com>
+References: <26307.16510.300799.974636@rob.eddelbuettel.com>
+	
+	<26307.19192.825552.550821@rob.eddelbuettel.com>
+	
+	<26308.33532.40021.698529@rob.eddelbuettel.com>
+	<26308.35902.672739.512614@hornik.net>
+	<26308.37546.287996.39155@rob.eddelbuettel.com>
+	<26308.39617.980467.97910@rob.eddelbuettel.com>
+	<26308.40296.318704.764186@hornik.net>
+	<26308.40554.894652.252504@hornik.net>
+	<26309.57561.42726.681094@rob.eddelbuettel.com>
+Message-ID: <26309.58216.837682.944710@rob.eddelbuettel.com>
+
+
+On 21 August 2024 at 07:43, Dirk Eddelbuettel wrote:
+|
+| On 20 August 2024 at 15:47, Kurt Hornik wrote:
+| | >>>>> Kurt Hornik writes:
+| |
+| | The variant attaches drops the URL and does unique.
+|
+| Nice. Alas, some of us default to r-release as the daily driver and then
+|
+|    Error in unname(tools:::.ORCID_iD_canonicalize(o)) :
+|      object '.ORCID_iD_canonicalize' not found
+| >
+|
+| Will play with my 'RD' which I keep approximately 'weekly-current'. Quick
+| rebuild first.
+
+As simple as adding
+
+  .ORCID_iD_canonicalize <- function (x) sub(tools:::.ORCID_iD_variants_regexp, "\\3", x)
+
+and making the call (or maybe making it a lambda anyway ...)
+
+  oid = unname(.ORCID_iD_canonicalize(o)))
+
+After adding
+
+  a <- sort_by(a, ~ a$family + a$given)
+
+the first 48 out of a (currently) total of 6465 are empty for family.
+
+  > sum(a$family == "")
+  [1] 48
+  >
+
+Rest is great!
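
A fully self-contained stand-in works on r-release too. The regexp below is an approximation written for this sketch, not the internal tools:::.ORCID_iD_variants_regexp, so treat it as an assumption; it handles the two variants seen in the db (bare identifier and orcid.org URL):

```r
## Hypothetical stand-in for tools:::.ORCID_iD_canonicalize(): strip an
## optional orcid.org URL prefix and keep the bare identifier, so that
## unique() can then collapse the URL and non-URL variants of the same ID.
canonicalize_orcid <- function(x)
  sub("^(https?://(www\\.)?orcid\\.org/)?([0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9Xx])$",
      "\\3", x)

canonicalize_orcid(c("https://orcid.org/0000-0001-8057-8037",
                     "0000-0001-6377-7296"))
## both come back in the bare 0000-0001-... form
```
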
+ +Dirk + +-- +dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org + + +From Kurt@Horn|k @end|ng |rom wu@@c@@t Wed Aug 21 15:47:53 2024 +From: Kurt@Horn|k @end|ng |rom wu@@c@@t (Kurt Hornik) +Date: Wed, 21 Aug 2024 15:47:53 +0200 +Subject: [R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ? +In-Reply-To: <26309.57996.150219.355206@hornik.net> +References: <26307.16510.300799.974636@rob.eddelbuettel.com> + + <26307.19192.825552.550821@rob.eddelbuettel.com> + + <26308.33532.40021.698529@rob.eddelbuettel.com> + <26308.35902.672739.512614@hornik.net> + <26308.37546.287996.39155@rob.eddelbuettel.com> + <26308.39617.980467.97910@rob.eddelbuettel.com> + <26308.40296.318704.764186@hornik.net> + <26308.40554.894652.252504@hornik.net> + <26309.57561.42726.681094@rob.eddelbuettel.com> + <26309.57996.150219.355206@hornik.net> +Message-ID: <26309.61449.590848.266355@hornik.net> + +>>>>> Kurt Hornik writes: + +Committed now. + +Best +-k + +>>>>> Dirk Eddelbuettel writes: +> Meanwhile, I am close to committing a change to R-devel which adds +> tools::CRAN_authors_db() with docs + +> \code{CRAN_authors_db()} returns information on the authors of the +> current CRAN packages extracted from the \samp{Authors at R} fields in +> the package \file{DESCRIPTION} files, as a data frame with character +> columns giving the given and family names, email addresses, +> \abbr{ORCID} identifier, roles, and comments of the person entries, +> and the corresponding package. + +> Once make check-all is done ... + +> Best +> -k + +> PS. Sorry about tools:::.ORCID_iD_canonicalize(), had run into the same +> issue when building the authors db on the CRAN master (which uses +> current R release ...) + +>> On 20 August 2024 at 15:47, Kurt Hornik wrote: +>> | >>>>> Kurt Hornik writes: +>> | +>> | The variant attaches drops the URL and does unique. + +>> Nice. 
Alas, some of us default to r-release as the daily driver and then + +>> Error in unname(tools:::.ORCID_iD_canonicalize(o)) : +>> object '.ORCID_iD_canonicalize' not found +>>> + +>> Will play with my 'RD' which I keep approximately 'weekly-current'. Quick +>> rebuild first. + +>> Dirk + +>> -- +>> dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org + + +From edd @end|ng |rom deb|@n@org Wed Aug 21 15:56:57 2024 +From: edd @end|ng |rom deb|@n@org (Dirk Eddelbuettel) +Date: Wed, 21 Aug 2024 08:56:57 -0500 +Subject: [R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ? +In-Reply-To: <26309.61449.590848.266355@hornik.net> +References: <26307.16510.300799.974636@rob.eddelbuettel.com> + + <26307.19192.825552.550821@rob.eddelbuettel.com> + + <26308.33532.40021.698529@rob.eddelbuettel.com> + <26308.35902.672739.512614@hornik.net> + <26308.37546.287996.39155@rob.eddelbuettel.com> + <26308.39617.980467.97910@rob.eddelbuettel.com> + <26308.40296.318704.764186@hornik.net> + <26308.40554.894652.252504@hornik.net> + <26309.57561.42726.681094@rob.eddelbuettel.com> + <26309.57996.150219.355206@hornik.net> + <26309.61449.590848.266355@hornik.net> +Message-ID: <26309.61993.756836.505160@rob.eddelbuettel.com> + + +On 21 August 2024 at 15:47, Kurt Hornik wrote: +| >>>>> Kurt Hornik writes: +| +| Committed now. + +That is just *lovely*: + + > aut <- tools::CRAN_authors_db() + > dim(aut) + [1] 47433 7 + > head(aut) + given family email orcid role comment package + 1 Martin Bladt martinbladt at math.ku.dk aut, cre AalenJohansen + 2 Christian Furrer furrer at math.ku.dk aut AalenJohansen + 3 Sercan Kahveci sercan.kahveci at plus.ac.at aut, cre AATtools + 4 Andrew Pilny andy.pilny at uky.edu 0000-0001-6603-5490 aut, cre abasequence + 5 Sigbert Klinke sigbert at hu-berlin.de aut, cre abbreviate + 6 Csillery Katalin kati.csillery at gmail.com aut abc + > + +Can we possibly get this into r-patched and the next r-release? 
+ +Dirk + +-- +dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org + + +From Kurt@Horn|k @end|ng |rom wu@@c@@t Wed Aug 21 18:01:49 2024 +From: Kurt@Horn|k @end|ng |rom wu@@c@@t (Kurt Hornik) +Date: Wed, 21 Aug 2024 18:01:49 +0200 +Subject: [R-pkg-devel] ORCID ID finder via tools::CRAN_package_db() ? +In-Reply-To: <26309.61993.756836.505160@rob.eddelbuettel.com> +References: <26307.16510.300799.974636@rob.eddelbuettel.com> + + <26307.19192.825552.550821@rob.eddelbuettel.com> + + <26308.33532.40021.698529@rob.eddelbuettel.com> + <26308.35902.672739.512614@hornik.net> + <26308.37546.287996.39155@rob.eddelbuettel.com> + <26308.39617.980467.97910@rob.eddelbuettel.com> + <26308.40296.318704.764186@hornik.net> + <26308.40554.894652.252504@hornik.net> + <26309.57561.42726.681094@rob.eddelbuettel.com> + <26309.57996.150219.355206@hornik.net> + <26309.61449.590848.266355@hornik.net> + <26309.61993.756836.505160@rob.eddelbuettel.com> +Message-ID: <26310.3949.654647.24648@hornik.net> + +>>>>> Dirk Eddelbuettel writes: + +Possibly yes, if there is enough "need" :-) + +Best +-k + +> On 21 August 2024 at 15:47, Kurt Hornik wrote: +> | >>>>> Kurt Hornik writes: +> | +> | Committed now. + +> That is just *lovely*: + +>> aut <- tools::CRAN_authors_db() +>> dim(aut) +> [1] 47433 7 +>> head(aut) +> given family email orcid role comment package +> 1 Martin Bladt martinbladt at math.ku.dk aut, cre AalenJohansen +> 2 Christian Furrer furrer at math.ku.dk aut AalenJohansen +> 3 Sercan Kahveci sercan.kahveci at plus.ac.at aut, cre AATtools +> 4 Andrew Pilny andy.pilny at uky.edu 0000-0001-6603-5490 aut, cre abasequence +> 5 Sigbert Klinke sigbert at hu-berlin.de aut, cre abbreviate +> 6 Csillery Katalin kati.csillery at gmail.com aut abc +>> + +> Can we possibly get this into r-patched and the next r-release? 
+ +> Dirk + +> -- +> dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org + + +From j|ox @end|ng |rom mcm@@ter@c@ Mon Sep 2 17:34:09 2024 +From: j|ox @end|ng |rom mcm@@ter@c@ (John Fox) +Date: Mon, 2 Sep 2024 11:34:09 -0400 +Subject: [R-pkg-devel] unregistered S3 methods in a package +Message-ID: + +Dear R-package-devel list members, + +I want to introduce several unregistered S3 methods into the cv package +(code at ). These methods have the form + + coef.merMod <- function(object, ...) lme4::fixef(object) + +The object is to mask, e.g., lme4:::coef.merMod(), which returns BLUPs +rather than fixed effects, internally in the cv package but *not* to +mask the lme4 version of the method for users of the cv package -- that +could wreak havoc with their work. Doing this substantially simplifies +some of the code in the cv package. + +My question: Is it legitimate to define a method in a package for +internal use without registering it? + +This approach appears to work fine, and R CMD check doesn't complain, +although Roxygen does complain that the method isn't "exported" +(actually, isn't registered). + +Any advice or relevant information would be appreciated. + +Thank you, + John +-- +John Fox, Professor Emeritus +McMaster University +Hamilton, Ontario, Canada +web: https://www.john-fox.ca/ +-- + + +From j|ox @end|ng |rom mcm@@ter@c@ Mon Sep 2 20:57:58 2024 +From: j|ox @end|ng |rom mcm@@ter@c@ (John Fox) +Date: Mon, 2 Sep 2024 14:57:58 -0400 +Subject: [R-pkg-devel] unregistered S3 methods in a package +In-Reply-To: +References: +Message-ID: <4aa634d6-01e1-4cb3-8119-35ccdec32ef9@mcmaster.ca> + +As it turned out, I was able to avoid redefining coef.merMod(), etc., by +making a simple modification to the cv package. + +I'm still curious about whether it's OK to have unregistered S3 methods +for internal use in a package even though that's no longer necessary for +the cv package. + +On 2024-09-02 11:34 a.m., John Fox wrote: +> Caution: External email. 
+>
+>
+> Dear R-package-devel list members,
+>
+> I want to introduce several unregistered S3 methods into the cv package
+> (code at ). These methods have the form
+>
+>        coef.merMod <- function(object, ...) lme4::fixef(object)
+>
+> The object is to mask, e.g., lme4:::coef.merMod(), which returns BLUPs
+> rather than fixed effects, internally in the cv package but *not* to
+> mask the lme4 version of the method for users of the cv package -- that
+> could wreak havoc with their work. Doing this substantially simplifies
+> some of the code in the cv package.
+>
+> My question: Is it legitimate to define a method in a package for
+> internal use without registering it?
+>
+> This approach appears to work fine, and R CMD check doesn't complain,
+> although Roxygen does complain that the method isn't "exported"
+> (actually, isn't registered).
+>
+> Any advice or relevant information would be appreciated.
+>
+> Thank you,
+>  John
+> --
+> John Fox, Professor Emeritus
+> McMaster University
+> Hamilton, Ontario, Canada
+> web: https://www.john-fox.ca/
+> --
+>
+> ______________________________________________
+> R-package-devel at r-project.org mailing list
+> https://stat.ethz.ch/mailman/listinfo/r-package-devel
+
+
+From tdhock5 @end|ng |rom gm@||@com Wed Sep 4 20:21:22 2024
+From: tdhock5 @end|ng |rom gm@||@com (Toby Hocking)
+Date: Wed, 4 Sep 2024 14:21:22 -0400
+Subject: [R-pkg-devel] unregistered S3 methods in a package
+In-Reply-To: <4aa634d6-01e1-4cb3-8119-35ccdec32ef9@mcmaster.ca>
+References: 
+	<4aa634d6-01e1-4cb3-8119-35ccdec32ef9@mcmaster.ca>
+Message-ID: 
+
+I got this warning too, so I filed an issue to ask
+https://github.com/r-lib/roxygen2/issues/1654
+
+On Mon, Sep 2, 2024 at 2:58 PM John Fox wrote:
+>
+> As it turned out, I was able to avoid redefining coef.merMod(), etc., by
+> making a simple modification to the cv package. 
+> +> I'm still curious about whether it's OK to have unregistered S3 methods +> for internal use in a package even though that's no longer necessary for +> the cv package. +> +> On 2024-09-02 11:34 a.m., John Fox wrote: +> > Caution: External email. +> > +> > +> > Dear R-package-devel list members, +> > +> > I want to introduce several unregistered S3 methods into the cv package +> > (code at ). These methods have the form +> > +> > coef.merMod <- function(object, ...) lme4::fixef(object) +> > +> > The object is to mask, e.g., lme4:::coef.merMod(), which returns BLUPs +> > rather than fixed effects, internally in the cv package but *not* to +> > mask the lme4 version of the method for users of the cv package -- that +> > could wreak havoc with their work. Doing this substantially simplifies +> > some of the code in the cv package. +> > +> > My question: Is it legitimate to define a method in a package for +> > internal use without registering it? +> > +> > This approach appears to work fine, and R CMD check doesn't complain, +> > although Roxygen does complain that the method isn't "exported" +> > (actually, isn't registered). +> > +> > Any advice or relevant information would be appreciated. 
+> > +> > Thank you, +> > John +> > -- +> > John Fox, Professor Emeritus +> > McMaster University +> > Hamilton, Ontario, Canada +> > web: https://www.john-fox.ca/ +> > -- +> > +> > ______________________________________________ +> > R-package-devel at r-project.org mailing list +> > https://stat.ethz.ch/mailman/listinfo/r-package-devel +> +> ______________________________________________ +> R-package-devel at r-project.org mailing list +> https://stat.ethz.ch/mailman/listinfo/r-package-devel + + +From jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@ Wed Sep 4 23:12:09 2024 +From: jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@ (Jeff Newmiller) +Date: Wed, 04 Sep 2024 14:12:09 -0700 +Subject: [R-pkg-devel] unregistered S3 methods in a package +In-Reply-To: +References: + <4aa634d6-01e1-4cb3-8119-35ccdec32ef9@mcmaster.ca> + +Message-ID: + +I have been reluctant to pipe up on this because I am no expert on the dark corners of the S3 dispatch mechanism, but I think unregistered S3 methods in packages are verboten. Perhaps [1] will shed some light? + +[1] https://blog.r-project.org/2019/08/19/s3-method-lookup/ + +On September 4, 2024 11:21:22 AM PDT, Toby Hocking wrote: +>I got this warning too, so I filed an issue to ask +>https://github.com/r-lib/roxygen2/issues/1654 +> +>On Mon, Sep 2, 2024 at 2:58?PM John Fox wrote: +>> +>> As it turned out, I was able to avoid redefining coef.merMod(), etc., by +>> making a simple modification to the cv package. +>> +>> I'm still curious about whether it's OK to have unregistered S3 methods +>> for internal use in a package even though that's no longer necessary for +>> the cv package. +>> +>> On 2024-09-02 11:34 a.m., John Fox wrote: +>> > Caution: External email. +>> > +>> > +>> > Dear R-package-devel list members, +>> > +>> > I want to introduce several unregistered S3 methods into the cv package +>> > (code at ). These methods have the form +>> > +>> > coef.merMod <- function(object, ...) 
lme4::fixef(object) +>> > +>> > The object is to mask, e.g., lme4:::coef.merMod(), which returns BLUPs +>> > rather than fixed effects, internally in the cv package but *not* to +>> > mask the lme4 version of the method for users of the cv package -- that +>> > could wreak havoc with their work. Doing this substantially simplifies +>> > some of the code in the cv package. +>> > +>> > My question: Is it legitimate to define a method in a package for +>> > internal use without registering it? +>> > +>> > This approach appears to work fine, and R CMD check doesn't complain, +>> > although Roxygen does complain that the method isn't "exported" +>> > (actually, isn't registered). +>> > +>> > Any advice or relevant information would be appreciated. +>> > +>> > Thank you, +>> > John +>> > -- +>> > John Fox, Professor Emeritus +>> > McMaster University +>> > Hamilton, Ontario, Canada +>> > web: https://www.john-fox.ca/ +>> > -- +>> > +>> > ______________________________________________ +>> > R-package-devel at r-project.org mailing list +>> > https://stat.ethz.ch/mailman/listinfo/r-package-devel +>> +>> ______________________________________________ +>> R-package-devel at r-project.org mailing list +>> https://stat.ethz.ch/mailman/listinfo/r-package-devel +> +>______________________________________________ +>R-package-devel at r-project.org mailing list +>https://stat.ethz.ch/mailman/listinfo/r-package-devel + +-- +Sent from my phone. Please excuse my brevity. + + +From j|ox @end|ng |rom mcm@@ter@c@ Thu Sep 5 00:10:22 2024 +From: j|ox @end|ng |rom mcm@@ter@c@ (John Fox) +Date: Wed, 4 Sep 2024 18:10:22 -0400 +Subject: [R-pkg-devel] unregistered S3 methods in a package +In-Reply-To: +References: + <4aa634d6-01e1-4cb3-8119-35ccdec32ef9@mcmaster.ca> + + +Message-ID: <1e4f4cfc-85ef-4220-947e-f080e56c70f5@mcmaster.ca> + +Thanks Toby and Jeff for chiming in on this. + +Jeff: I already read Kurt Hornik's post on "S3 Method Lookup" and quite +a few other sources. 
+
+The main point is that failing to register the methods works in that the
+methods are nevertheless invoked internally by functions in the package
+but don't shadow versions of the methods registered by other packages
+externally (e.g., at the command prompt), which was the effect that I
+wanted. Moreover, as I said, R CMD check (unlike roxygen) didn't complain.
+
+As I mentioned, this is now moot for the cv package, but I'm still
+interested in the answer, as, apparently, is Toby.
+
+Best,
+ John
+
+On 2024-09-04 5:12 p.m., Jeff Newmiller wrote:
+> Caution: External email.
+>
+>
+> I have been reluctant to pipe up on this because I am no expert on the dark corners of the S3 dispatch mechanism, but I think unregistered S3 methods in packages are verboten. Perhaps [1] will shed some light?
+>
+> [1] https://blog.r-project.org/2019/08/19/s3-method-lookup/
+>
+> On September 4, 2024 11:21:22 AM PDT, Toby Hocking wrote:
+>> I got this warning too, so I filed an issue to ask
+>> https://github.com/r-lib/roxygen2/issues/1654
+>>
+>> On Mon, Sep 2, 2024 at 2:58 PM John Fox wrote:
+>>>
+>>> As it turned out, I was able to avoid redefining coef.merMod(), etc., by
+>>> making a simple modification to the cv package.
+>>>
+>>> I'm still curious about whether it's OK to have unregistered S3 methods
+>>> for internal use in a package even though that's no longer necessary for
+>>> the cv package.
+>>>
+>>> On 2024-09-02 11:34 a.m., John Fox wrote:
+>>>> Caution: External email.
+>>>>
+>>>>
+>>>> Dear R-package-devel list members,
+>>>>
+>>>> I want to introduce several unregistered S3 methods into the cv package
+>>>> (code at ). These methods have the form
+>>>>
+>>>> coef.merMod <- function(object, ...)
lme4::fixef(object) +>>>> +>>>> The object is to mask, e.g., lme4:::coef.merMod(), which returns BLUPs +>>>> rather than fixed effects, internally in the cv package but *not* to +>>>> mask the lme4 version of the method for users of the cv package -- that +>>>> could wreak havoc with their work. Doing this substantially simplifies +>>>> some of the code in the cv package. +>>>> +>>>> My question: Is it legitimate to define a method in a package for +>>>> internal use without registering it? +>>>> +>>>> This approach appears to work fine, and R CMD check doesn't complain, +>>>> although Roxygen does complain that the method isn't "exported" +>>>> (actually, isn't registered). +>>>> +>>>> Any advice or relevant information would be appreciated. +>>>> +>>>> Thank you, +>>>> John +>>>> -- +>>>> John Fox, Professor Emeritus +>>>> McMaster University +>>>> Hamilton, Ontario, Canada +>>>> web: https://www.john-fox.ca/ +>>>> -- +>>>> +>>>> ______________________________________________ +>>>> R-package-devel at r-project.org mailing list +>>>> https://stat.ethz.ch/mailman/listinfo/r-package-devel +>>> +>>> ______________________________________________ +>>> R-package-devel at r-project.org mailing list +>>> https://stat.ethz.ch/mailman/listinfo/r-package-devel +>> +>> ______________________________________________ +>> R-package-devel at r-project.org mailing list +>> https://stat.ethz.ch/mailman/listinfo/r-package-devel +> +> -- +> Sent from my phone. Please excuse my brevity. 
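The behaviour under discussion in this thread comes down to whether the method is registered in NAMESPACE. A minimal sketch (hypothetical package layout; only the S3method() directive differs between the two variants):

```r
# R/coef-methods.R of a hypothetical package that imports lme4:
coef.merMod <- function(object, ...) lme4::fixef(object)

# NAMESPACE, registered variant: the method takes part in S3 dispatch for
# every caller, masking lme4's registered coef.merMod() -- the effect the
# thread wants to avoid:
#   S3method(coef, merMod)

# NAMESPACE, unregistered variant: omit the S3method() directive. The
# function is then found only when code inside the package itself calls
# coef() (lookup from the call environment up to the package namespace),
# so user-level calls still dispatch to lme4:::coef.merMod().
```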
+ + +From ||gge@ @end|ng |rom @t@t|@t|k@tu-dortmund@de Thu Sep 5 08:47:06 2024 +From: ||gge@ @end|ng |rom @t@t|@t|k@tu-dortmund@de (Uwe Ligges) +Date: Thu, 5 Sep 2024 08:47:06 +0200 +Subject: [R-pkg-devel] unregistered S3 methods in a package +In-Reply-To: <1e4f4cfc-85ef-4220-947e-f080e56c70f5@mcmaster.ca> +References: + <4aa634d6-01e1-4cb3-8119-35ccdec32ef9@mcmaster.ca> + + + <1e4f4cfc-85ef-4220-947e-f080e56c70f5@mcmaster.ca> +Message-ID: + +Dear John, + +the question is not really easy to answer, but there is a nice summary +Kurt pointed me to: The code of checkS3methods() includes the following +comments with the last paragraph containing the short answer to your +question: + + ## Check S3 generics and methods consistency. + + ## Unfortunately, what is an S3 method is not clear. + ## These days, S3 methods for a generic GEN are found + ## A. via GEN.CLS lookup from the callenv to its topenv; + ## B. the S3 registry; + ## C. GEN.CLS lookup from the parent of the topenv to baseenv, + ## skipping everything on the search path between globalenv and + ## baseenv. + ## Thus if "package code" calls GEN, we first look in the package + ## namespace itself, then in the registry, and then in the package + ## imports and .BaseNamespaceEnv (and globalenv and baseenv again). + ## + ## Clearly, everything registered via S3method() should be an S3 + ## method. Interestingly, we seem to have some registrations for + ## non-generics, such as grDevices::axis(). These are "harmless" + ## but likely not "as intended", and hence inconsistencies are not + ## ignored. + ## + ## If the package namespace has a function named GEN.CLS, it is used + ## as an S3 method for an S3 generic named GEN (and hence "is an S3 + ## method") only if the package code actually calls GEN (see A + ## above). 
So one could argue that we should not be looking at all + ## GEN.CLS matches with GEN a generic in the package itself, its + ## imports or base, but restrict to only the ones where the package + ## code calls GEN. Doable, but not straightforward (calls could be + ## PKG::GEN) and possibly quite time consuming. For generics from + ## the package itself or its imports, not restricting should not + ## make a difference (why define or import when not calling?), but + ## for generics from base it may: hence we filter out the mismatches + ## for base GEN not called in the package. + ## + ## If a package provides an S3 generic GEN, there is no need to + ## register GEN.CLS functions for "internal use" (see above). + ## However, if GEN is exported then likely all GEN.CLS functions + ## should be registered as S3 methods. + + +Best wishes, +Uwe + + +On 05.09.2024 00:10, John Fox wrote: +> Thanks Toby and Jeff for chiming in on this. +> +> Jeff: I already read Kurt Hornik's post on "S3 Method Lookup" and quite +> a few other sources. +> +> The main point is that failing to register the methods works in that the +> methods are nevertheless invoked internally by functions in the package +> but don't shadow versions of the methods registered by other package +> externally (e.g., at the command prompt), which was the effect that I +> wanted. Moreover, as I said, R CMD check (unlike roxygen) didn't complain. +> +> As I mentioned, this is now moot for the cv package, but I'm still +> interested in the answer, as, apparently, is Toby. +> +> Best, +> ?John +> +> On 2024-09-04 5:12 p.m., Jeff Newmiller wrote: +>> Caution: External email. +>> +>> +>> I have been reluctant to pipe up on this because I am no expert on the +>> dark corners of the S3 dispatch mechanism, but I think unregistered S3 +>> methods in packages are verboten. Perhaps [1] will shed some light? 
+>> +>> [1] https://blog.r-project.org/2019/08/19/s3-method-lookup/ +>> +>> On September 4, 2024 11:21:22 AM PDT, Toby Hocking +>> wrote: +>>> I got this warning too, so I filed an issue to ask +>>> https://github.com/r-lib/roxygen2/issues/1654 +>>> +>>> On Mon, Sep 2, 2024 at 2:58?PM John Fox wrote: +>>>> +>>>> As it turned out, I was able to avoid redefining coef.merMod(), +>>>> etc., by +>>>> making a simple modification to the cv package. +>>>> +>>>> I'm still curious about whether it's OK to have unregistered S3 methods +>>>> for internal use in a package even though that's no longer necessary +>>>> for +>>>> the cv package. +>>>> +>>>> On 2024-09-02 11:34 a.m., John Fox wrote: +>>>>> Caution: External email. +>>>>> +>>>>> +>>>>> Dear R-package-devel list members, +>>>>> +>>>>> I want to introduce several unregistered S3 methods into the cv +>>>>> package +>>>>> (code at ). These methods have the +>>>>> form +>>>>> +>>>>> ???????? coef.merMod <- function(object, ...) lme4::fixef(object) +>>>>> +>>>>> The object is to mask, e.g., lme4:::coef.merMod(), which returns BLUPs +>>>>> rather than fixed effects, internally in the cv package but *not* to +>>>>> mask the lme4 version of the method for users of the cv package -- +>>>>> that +>>>>> could wreak havoc with their work. Doing this substantially simplifies +>>>>> some of the code in the cv package. +>>>>> +>>>>> My question: Is it legitimate to define a method in a package for +>>>>> internal use without registering it? +>>>>> +>>>>> This approach appears to work fine, and R CMD check doesn't complain, +>>>>> although Roxygen does complain that the method isn't "exported" +>>>>> (actually, isn't registered). +>>>>> +>>>>> Any advice or relevant information would be appreciated. +>>>>> +>>>>> Thank you, +>>>>> ?? 
John +>>>>> -- +>>>>> John Fox, Professor Emeritus +>>>>> McMaster University +>>>>> Hamilton, Ontario, Canada +>>>>> web: https://www.john-fox.ca/ +>>>>> -- +>>>>> +>>>>> ______________________________________________ +>>>>> R-package-devel at r-project.org mailing list +>>>>> https://stat.ethz.ch/mailman/listinfo/r-package-devel +>>>> +>>>> ______________________________________________ +>>>> R-package-devel at r-project.org mailing list +>>>> https://stat.ethz.ch/mailman/listinfo/r-package-devel +>>> +>>> ______________________________________________ +>>> R-package-devel at r-project.org mailing list +>>> https://stat.ethz.ch/mailman/listinfo/r-package-devel +>> +>> -- +>> Sent from my phone. Please excuse my brevity. +> +> ______________________________________________ +> R-package-devel at r-project.org mailing list +> https://stat.ethz.ch/mailman/listinfo/r-package-devel + +From m@ech|er @end|ng |rom @t@t@m@th@ethz@ch Thu Sep 5 09:44:52 2024 +From: m@ech|er @end|ng |rom @t@t@m@th@ethz@ch (Martin Maechler) +Date: Thu, 5 Sep 2024 09:44:52 +0200 +Subject: [R-pkg-devel] unregistered S3 methods in a package +In-Reply-To: +References: + <4aa634d6-01e1-4cb3-8119-35ccdec32ef9@mcmaster.ca> + + + <1e4f4cfc-85ef-4220-947e-f080e56c70f5@mcmaster.ca> + +Message-ID: <26329.24948.135486.590205@stat.math.ethz.ch> + +>>>>> Uwe Ligges +>>>>> on Thu, 5 Sep 2024 08:47:06 +0200 writes: + + > Dear John, + > the question is not really easy to answer, but there is a nice summary + > Kurt pointed me to: The code of checkS3methods() includes the following + > comments with the last paragraph containing the short answer to your + > question: + + > ## Check S3 generics and methods consistency. + + > ## Unfortunately, what is an S3 method is not clear. + > ## These days, S3 methods for a generic GEN are found + > ## A. via GEN.CLS lookup from the callenv to its topenv; + > ## B. the S3 registry; + > ## C. 
GEN.CLS lookup from the parent of the topenv to baseenv, + > ## skipping everything on the search path between globalenv and + > ## baseenv. + > ## Thus if "package code" calls GEN, we first look in the package + > ## namespace itself, then in the registry, and then in the package + > ## imports and .BaseNamespaceEnv (and globalenv and baseenv again). + > ## + > ## Clearly, everything registered via S3method() should be an S3 + > ## method. Interestingly, we seem to have some registrations for + > ## non-generics, such as grDevices::axis(). These are "harmless" + > ## but likely not "as intended", and hence inconsistencies are not + > ## ignored. + > ## + > ## If the package namespace has a function named GEN.CLS, it is used + > ## as an S3 method for an S3 generic named GEN (and hence "is an S3 + > ## method") only if the package code actually calls GEN (see A + > ## above). So one could argue that we should not be looking at all + > ## GEN.CLS matches with GEN a generic in the package itself, its + > ## imports or base, but restrict to only the ones where the package + > ## code calls GEN. Doable, but not straightforward (calls could be + > ## PKG::GEN) and possibly quite time consuming. For generics from + > ## the package itself or its imports, not restricting should not + > ## make a difference (why define or import when not calling?), but + > ## for generics from base it may: hence we filter out the mismatches + > ## for base GEN not called in the package. + > ## + > ## If a package provides an S3 generic GEN, there is no need to + > ## register GEN.CLS functions for "internal use" (see above). + > ## However, if GEN is exported then likely all GEN.CLS functions + > ## should be registered as S3 methods. + + > Best wishes, + > Uwe + +Excellent! Thank you, Uwe, for digging this out. 
+ +I'd also had answered to John Fox (even more strongly) +that it *must* remain to be allowed to define "private" +S3 methods in a package in all cases: +- for a generic from the package (where the generic would also + not be exported) +- for a generic from another package (which is imported) + or base (which need not and cannot be imported). + +Note that the same is possible with S4 and (AFAIK, untested) +with S7. {which people should really try using more ... } + +Martin + + + + > On 05.09.2024 00:10, John Fox wrote: + >> Thanks Toby and Jeff for chiming in on this. + >> + >> Jeff: I already read Kurt Hornik's post on "S3 Method Lookup" and quite + >> a few other sources. + >> + >> The main point is that failing to register the methods works in that the + >> methods are nevertheless invoked internally by functions in the package + >> but don't shadow versions of the methods registered by other package + >> externally (e.g., at the command prompt), which was the effect that I + >> wanted. Moreover, as I said, R CMD check (unlike roxygen) didn't complain. + >> + >> As I mentioned, this is now moot for the cv package, but I'm still + >> interested in the answer, as, apparently, is Toby. + >> + >> Best, + >> ?John + >> + >> On 2024-09-04 5:12 p.m., Jeff Newmiller wrote: + >>> Caution: External email. + >>> + >>> + >>> I have been reluctant to pipe up on this because I am no expert on the + >>> dark corners of the S3 dispatch mechanism, but I think unregistered S3 + >>> methods in packages are verboten. Perhaps [1] will shed some light? 
+ >>> + >>> [1] https://blog.r-project.org/2019/08/19/s3-method-lookup/ + >>> + >>> On September 4, 2024 11:21:22 AM PDT, Toby Hocking + >>> wrote: + >>>> I got this warning too, so I filed an issue to ask + >>>> https://github.com/r-lib/roxygen2/issues/1654 + >>>> + >>>> On Mon, Sep 2, 2024 at 2:58?PM John Fox wrote: + >>>>> + >>>>> As it turned out, I was able to avoid redefining coef.merMod(), + >>>>> etc., by + >>>>> making a simple modification to the cv package. + >>>>> + >>>>> I'm still curious about whether it's OK to have unregistered S3 methods + >>>>> for internal use in a package even though that's no longer necessary + >>>>> for + >>>>> the cv package. + >>>>> + >>>>> On 2024-09-02 11:34 a.m., John Fox wrote: +>>>>> Caution: External email. + >>>>>> + >>>>>> +>>>>> Dear R-package-devel list members, + >>>>>> +>>>>> I want to introduce several unregistered S3 methods into the cv +>>>>> package +>>>>> (code at ). These methods have the +>>>>> form + >>>>>> +>>>>> ???????? coef.merMod <- function(object, ...) lme4::fixef(object) + >>>>>> +>>>>> The object is to mask, e.g., lme4:::coef.merMod(), which returns BLUPs +>>>>> rather than fixed effects, internally in the cv package but *not* to +>>>>> mask the lme4 version of the method for users of the cv package -- +>>>>> that +>>>>> could wreak havoc with their work. Doing this substantially simplifies +>>>>> some of the code in the cv package. + >>>>>> +>>>>> My question: Is it legitimate to define a method in a package for +>>>>> internal use without registering it? + >>>>>> +>>>>> This approach appears to work fine, and R CMD check doesn't complain, +>>>>> although Roxygen does complain that the method isn't "exported" +>>>>> (actually, isn't registered). + >>>>>> +>>>>> Any advice or relevant information would be appreciated. + >>>>>> +>>>>> Thank you, +>>>>> ?? 
John +>>>>> -- +>>>>> John Fox, Professor Emeritus +>>>>> McMaster University +>>>>> Hamilton, Ontario, Canada +>>>>> web: https://www.john-fox.ca/ + + +From t@eyongp @end|ng |rom @ndrew@cmu@edu Thu Sep 5 22:40:00 2024 +From: t@eyongp @end|ng |rom @ndrew@cmu@edu (Taeyong Park) +Date: Thu, 5 Sep 2024 23:40:00 +0300 +Subject: [R-pkg-devel] Error in creating virtual environment on Debian + machines +Message-ID: + +Hello, + +I am trying to create a virtual environment in the zzz.r file of my +package, and my package is currently passing on Windows and failing on +Debian with 1 ERROR. + + +This is the ERROR: + +* installing *source* package ?PytrendsLongitudinalR? ... +** using staged installation +** R +** inst +** byte-compile and prepare package for lazy loading +** help +*** installing help indices +** building package indices +** installing vignettes +** testing if installed package can be loaded from temporary location +Using Python: /usr/bin/python3.12 +Creating virtual environment '/home/hornik/.virtualenvs/pytrends-in-r-new' ... ++ /usr/bin/python3.12 -m venv /home/hornik/.virtualenvs/pytrends-in-r-new +The virtual environment was not created successfully because ensurepip is not +available. On Debian/Ubuntu systems, you need to install the python3-venv +package using the following command. + + apt install python3.12-venv + +You may need to use sudo with that command. After installing the python3-venv +package, recreate your virtual environment. + +Failing command: /home/hornik/.virtualenvs/pytrends-in-r-new/bin/python3.12 + +FAILED +Error: package or namespace load failed for ?PytrendsLongitudinalR?: + .onLoad failed in loadNamespace() for 'PytrendsLongitudinalR', details: + call: NULL + error: Error creating virtual environment +'/home/hornik/.virtualenvs/pytrends-in-r-new' [error code 1] +Error: loading failed +Execution halted +ERROR: loading failed +* removing ?/srv/hornik/tmp/CRAN_pretest/PytrendsLongitudinalR.Rcheck/PytrendsLongitudinalR? 
+
+and this is the onLoad function of my zzz.r file:
+
+
+.onLoad <- function(libname, pkgname) {
+
+  # Define paths
+  venv_path <- file.path(Sys.getenv("HOME"), ".virtualenvs", "pytrends-in-r-new")
+  python_path <- file.path(venv_path, "bin", "python")
+
+  # Ensure 'virtualenv' is installed
+  if (!reticulate::py_module_available("virtualenv")) {
+    reticulate::py_install("virtualenv")
+  }
+
+  # Check if the virtual environment exists
+  if (!reticulate::virtualenv_exists(venv_path)) {
+    # If it doesn't exist, attempt to create it
+    tryCatch({
+      reticulate::virtualenv_create(envname = venv_path, python = python_path,
+        force = TRUE, module = getOption("reticulate.virtualenv.module"))
+    }, error = function(e) {
+      # Fallback: install Python and create the virtual environment
+      reticulate::virtualenv_create(envname = venv_path, force = TRUE,
+        module = getOption("reticulate.virtualenv.module"))
+    })
+  }
+
+  # Use the virtual environment for reticulate operations
+  reticulate::use_virtualenv(venv_path, required = TRUE)
+
+  # Install packages if not already installed
+  packages_to_install <- c("pandas", "requests", "pytrends", "rich")
+  for (package in packages_to_install) {
+    if (!reticulate::py_module_available(package)) {
+      reticulate::py_install(package, envname = "pytrends-in-r-new")
+    }
+  }
+
+  TrendReq <<- reticulate::import("pytrends.request", delay_load = TRUE)$TrendReq
+  ResponseError <<- reticulate::import("pytrends.exceptions", delay_load = TRUE)$ResponseError
+  pd <<- reticulate::import("pandas", delay_load = TRUE)
+  os <<- reticulate::import("os", delay_load = TRUE)
+  glob <<- reticulate::import("glob", delay_load = TRUE)
+  json <<- reticulate::import("json", delay_load = TRUE)
+  requests <<- reticulate::import("requests", delay_load = TRUE)
+  dt <<- reticulate::import("datetime", delay_load = TRUE)
+  relativedelta <<- reticulate::import("dateutil.relativedelta", delay_load = TRUE)
+  time <<- reticulate::import("time", delay_load = TRUE)
+  logging <<- reticulate::import("logging", delay_load = TRUE)
+  console <<- reticulate::import("rich.console", delay_load = TRUE)$Console
+  RichHandler <<- reticulate::import("rich.logging", delay_load = TRUE)$RichHandler
+  math <<- reticulate::import("math", delay_load = TRUE)
+  platform <<- reticulate::import("platform", delay_load = TRUE)
+
+  # Configure logging
+  configure_logging()
+}
+
+How do I ensure that the virtual environment is created properly on
+Debian systems? I tried the instructions in the error message but
+still got the error.
+
+Thank you in advance for your help!
+
+
+Best,
+
+Taeyong
+
+
+
+
+*Taeyong Park, Ph.D.*
+*Assistant Teaching Professor of Statistics*
+*Director, Statistical Consulting Center*
+*Carnegie Mellon University Qatar*
+
+	[[alternative HTML version deleted]]
+
+
+From |kry|ov @end|ng |rom d|@root@org Fri Sep 6 12:45:03 2024
+From: |kry|ov @end|ng |rom d|@root@org (Ivan Krylov)
+Date: Fri, 6 Sep 2024 13:45:03 +0300
+Subject: [R-pkg-devel] Error in creating virtual environment on Debian
+	machines
+In-Reply-To:
+References:
+Message-ID: <20240906134503.2ff1f2da@arachnoid>
+
+Hello Taeyong and welcome to R-package-devel!
+
+On Thu, 5 Sep 2024 23:40:00 +0300,
+Taeyong Park wrote:
+
+> # Define paths
+> venv_path <- file.path(Sys.getenv("HOME"), ".virtualenvs",
+> "pytrends-in-r-new")
+> python_path <- file.path(venv_path, "bin", "python")
+
+Please don't require placing the virtual environments under
+~/.virtualenvs. 'reticulate' will use this path as a default if you only
+provide the name of the virtual environment, but will also let the user
+configure virtualenv_root() using environment variables.
+
+> TrendReq <<- reticulate::import("pytrends.request", delay_load =
+> TRUE)$TrendReq
+> ResponseError <<- reticulate::import("pytrends.exceptions",
+> delay_load = TRUE)$ResponseError
+
+I'm afraid this defeats the purpose of 'delay_load' by immediately
+accessing the module object and forcing 'reticulate' to load it.
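The difference Ivan describes can be sketched in two lines (module name as in the thread; not meant to be run without pytrends installed):

```r
# Forces the import at load time: `$TrendReq` resolves an attribute on the
# module proxy, so reticulate must import pytrends.request immediately,
# despite delay_load = TRUE.
TrendReq <- reticulate::import("pytrends.request", delay_load = TRUE)$TrendReq

# Keeps the import deferred: store the module proxy itself and resolve
# attributes only at call time, e.g. pytrends.request$TrendReq(...).
pytrends.request <- reticulate::import("pytrends.request", delay_load = TRUE)
```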
+ +I understand the desire to get everything working automatically right +when the package is loaded, but Python dependency management is a +complex topic and not all of it is safe to perform from .onLoad. In +particular, if .onLoad fails, you don't get to let the user call +PytrendsLongitudinalR::install_pytrends() because the namespace will +not be available. Try following the guidance in the 'reticulate' +vignettes [1]: + +1. In .onLoad, only express a soft preference for a named virtual +environment and create but do not access lazy-load Python module +objects: + +.onLoad <- function(libname, pkgname) { + use_virtualenv("pytrends-in-r-new", required = FALSE) + pytrends.request <<- reticulate::import("pytrends.request", delay_load += TRUE) + pd <<- reticulate::import("pandas", delay_load = TRUE) + # and so on +} + +2. Move all the installation work into a separate function: + +install_pytrends <- function(envname = "pytrends-in-r-new", ...) + # the vignette suggests wiping the "pytrends-in-r-new" venv here, + # just in case + py_install( + c("pandas", "requests", "pytrends", "rich"), + envname = envname, ... + ) + +3. In tests and examples, wrap all uses of Python in checks for +py_module_available(...). In regular code, you can suggest running +install_pytrends(), but don't run it for the user. _Automatically_ +installing additional software, whether Python modules or Python +itself, is prohibited by the CRAN policy [2]: + +>> Packages should not write in the user?s home filespace <...> nor +>> anywhere else on the file system apart from the R session?s +>> temporary directory (or during installation in the location pointed +>> to by TMPDIR: and such usage should be cleaned up). <...> +>> Limited exceptions may be allowed in interactive sessions if the +>> package obtains confirmation from the user. + +Admittedly, this complicates the tests and the examples for your +package with boilerplate. 
I see that 'tensorflow', for example, wraps +all its examples in \dontrun{}, but it's an exceptional package. A few +other packages that depend on 'reticulate' that I've just taken a look +at do wrap their examples in checks for the Python packages being +available. + +-- +Best regards, +Ivan + +[1] +https://cran.r-project.org/package=reticulate/vignettes/package.html +https://cran.r-project.org/package=reticulate/vignettes/python_dependencies.html + +[2] +https://cran.r-project.org/web/packages/policies.html + + +From @|gbert @end|ng |rom w|w|@hu-ber||n@de Fri Sep 6 14:43:31 2024 +From: @|gbert @end|ng |rom w|w|@hu-ber||n@de (Sigbert Klinke) +Date: Fri, 6 Sep 2024 14:43:31 +0200 +Subject: [R-pkg-devel] devtools::build() and Authors@R +Message-ID: <70311e44-7477-4e05-9956-cb7e9e5be549@wiwi.hu-berlin.de> + +Hi, + +upon resubmitting my package to CRAN, I received the following note: + + + +Author field differs from that derived from Authors at R + + + +Author: 'Sigbert Klinke [aut, cre] +(https://orcid.org/0000-0003-3337-1863), Jekaterina Zukovska [ctb] +(https://orcid.org/0000-0002-7753-9210)' + +Authors at R: 'Sigbert Klinke [aut, cre] (ORCID: +https://orcid.org/0000-0003-3337-1863), Jekaterina Zukovska [ctb] +(ORCID: https://orcid.org/0000-0002-7753-9210)' + +Since my DESCRIPTION file only includes the Authors at R field, I was +initially confused. + + + +However, after inspecting the *.tgz file generated with +devtools::build(), I found that both fields (Author and Authors at R) were +indeed present in the DESCRIPTION file inside the *.tgz. This was also +the case with older *.tgz files, but CRAN hadn?t flagged it before. + + + +As a workaround, I replaced Authors at R with separate Author and +Maintainer fields. + + + +Do you have any other suggestions on how to resolve this? 
+ +Thanks Sigbert + +-- +https://hu.berlin/sk + + +From ||gge@ @end|ng |rom @t@t|@t|k@tu-dortmund@de Fri Sep 6 14:47:37 2024 +From: ||gge@ @end|ng |rom @t@t|@t|k@tu-dortmund@de (Uwe Ligges) +Date: Fri, 6 Sep 2024 14:47:37 +0200 +Subject: [R-pkg-devel] devtools::build() and Authors@R +In-Reply-To: <70311e44-7477-4e05-9956-cb7e9e5be549@wiwi.hu-berlin.de> +References: <70311e44-7477-4e05-9956-cb7e9e5be549@wiwi.hu-berlin.de> +Message-ID: <99e84273-d74d-4bb5-8fc3-49a12fd928d9@statistik.tu-dortmund.de> + + + +On 06.09.2024 14:43, Sigbert Klinke wrote: +> Hi, +> +> upon resubmitting my package to CRAN, I received the following note: +> +> +> +> Author field differs from that derived from Authors at R +> +> +> +> Author: 'Sigbert Klinke [aut, cre] +> (https://orcid.org/0000-0003-3337-1863), Jekaterina Zukovska [ctb] +> (https://orcid.org/0000-0002-7753-9210)' +> +> Authors at R: 'Sigbert Klinke [aut, cre] (ORCID: +> https://orcid.org/0000-0003-3337-1863), Jekaterina Zukovska [ctb] +> (ORCID: https://orcid.org/0000-0002-7753-9210)' +> +> Since my DESCRIPTION file only includes the Authors at R field, I was +> initially confused. +> +> +> +> However, after inspecting the *.tgz file generated with +> devtools::build(), I found that both fields (Author and Authors at R) were +> indeed present in the DESCRIPTION file inside the *.tgz. This was also +> the case with older *.tgz files, but CRAN hadn?t flagged it before. +> +> +> +> As a workaround, I replaced Authors at R with separate Author and +> Maintainer fields. +> +> +> +> Do you have any other suggestions on how to resolve this? + +Ignore for now: +This is from a very recent change in R-devel and should not have been +flagged by the checks, now fixed. +The idea is to allow for more than only ORCID and hence give labels to +these ids. 
This is now done automatically when the Author field is +generated from Authors at R by R CMD build, and unfortunately got flagged +by our checks that use a very recent version of R-devel from last night. + +In short: Simply submit again, the note will be gone. + +Best, +Uwe Ligges + + + + + + +From m@ech|er @end|ng |rom @t@t@m@th@ethz@ch Fri Sep 6 14:49:21 2024 +From: m@ech|er @end|ng |rom @t@t@m@th@ethz@ch (Martin Maechler) +Date: Fri, 6 Sep 2024 14:49:21 +0200 +Subject: [R-pkg-devel] devtools::build() and Authors@R +In-Reply-To: <70311e44-7477-4e05-9956-cb7e9e5be549@wiwi.hu-berlin.de> +References: <70311e44-7477-4e05-9956-cb7e9e5be549@wiwi.hu-berlin.de> +Message-ID: <26330.64081.194155.832151@stat.math.ethz.ch> + +>>>>> Sigbert Klinke +>>>>> on Fri, 6 Sep 2024 14:43:31 +0200 writes: + + > Hi, + > upon resubmitting my package to CRAN, I received the following note: + + + + > Author field differs from that derived from Authors at R + + + + > Author: 'Sigbert Klinke [aut, cre] + > (https://orcid.org/0000-0003-3337-1863), Jekaterina Zukovska [ctb] + > (https://orcid.org/0000-0002-7753-9210)' + + > Authors at R: 'Sigbert Klinke [aut, cre] (ORCID: + > https://orcid.org/0000-0003-3337-1863), Jekaterina Zukovska [ctb] + > (ORCID: https://orcid.org/0000-0002-7753-9210)' + + > Since my DESCRIPTION file only includes the Authors at R field, I was + > initially confused. + + + + > However, after inspecting the *.tgz file generated with + > devtools::build(), I found that both fields (Author and Authors at R) were + > indeed present in the DESCRIPTION file inside the *.tgz. This was also + > the case with older *.tgz files, but CRAN hadn?t flagged it before. + + + + > As a workaround, I replaced Authors at R with separate Author and + > Maintainer fields. + + + + > Do you have any other suggestions on how to resolve this? + +Well, that's definitely easy to answer: + +Follow *the* reference, the 'R Extensions' manual, +and use R CMD build ... 
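A sketch of the Authors@R form that both replies recommend keeping, built from the entries quoted in the thread (maintainer email elided here; R CMD build derives the Author and Maintainer fields from this):

```r
# DESCRIPTION excerpt (roles and ORCIDs as quoted in the thread):
Authors@R: c(
    person("Sigbert", "Klinke", role = c("aut", "cre"),
           comment = c(ORCID = "0000-0003-3337-1863")),
    person("Jekaterina", "Zukovska", role = "ctb",
           comment = c(ORCID = "0000-0002-7753-9210")))
```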
+ + + > Thanks Sigbert + + > -- + > https://hu.berlin/sk + + > ______________________________________________ + > R-package-devel at r-project.org mailing list + > https://stat.ethz.ch/mailman/listinfo/r-package-devel + + +From j|ox @end|ng |rom mcm@@ter@c@ Fri Sep 6 19:17:06 2024 +From: j|ox @end|ng |rom mcm@@ter@c@ (John Fox) +Date: Fri, 6 Sep 2024 13:17:06 -0400 +Subject: [R-pkg-devel] unregistered S3 methods in a package +In-Reply-To: <26329.24948.135486.590205@stat.math.ethz.ch> +References: + <4aa634d6-01e1-4cb3-8119-35ccdec32ef9@mcmaster.ca> + + + <1e4f4cfc-85ef-4220-947e-f080e56c70f5@mcmaster.ca> + + <26329.24948.135486.590205@stat.math.ethz.ch> +Message-ID: + +Thank you Uwe and Martin for clarifying this issue. + +The last paragraph of the comments that Kurt pointed out refers to a +generic function defined in a package; in my case, the generic is +coef(), which is in the stats package not in the cv package, and the +question concerned methods for coef() (previously) defined in the cv +package and not registered. But the comments as a whole do cover that +case as well. + +Best, + John + + +On 2024-09-05 3:44 a.m., Martin Maechler wrote: +> Caution: External email. +> +> +>>>>>> Uwe Ligges +>>>>>> on Thu, 5 Sep 2024 08:47:06 +0200 writes: +> +> > Dear John, +> > the question is not really easy to answer, but there is a nice summary +> > Kurt pointed me to: The code of checkS3methods() includes the following +> > comments with the last paragraph containing the short answer to your +> > question: +> +> > ## Check S3 generics and methods consistency. +> +> > ## Unfortunately, what is an S3 method is not clear. +> > ## These days, S3 methods for a generic GEN are found +> > ## A. via GEN.CLS lookup from the callenv to its topenv; +> > ## B. the S3 registry; +> > ## C. GEN.CLS lookup from the parent of the topenv to baseenv, +> > ## skipping everything on the search path between globalenv and +> > ## baseenv. 
+> > ## Thus if "package code" calls GEN, we first look in the package +> > ## namespace itself, then in the registry, and then in the package +> > ## imports and .BaseNamespaceEnv (and globalenv and baseenv again). +> > ## +> > ## Clearly, everything registered via S3method() should be an S3 +> > ## method. Interestingly, we seem to have some registrations for +> > ## non-generics, such as grDevices::axis(). These are "harmless" +> > ## but likely not "as intended", and hence inconsistencies are not +> > ## ignored. +> > ## +> > ## If the package namespace has a function named GEN.CLS, it is used +> > ## as an S3 method for an S3 generic named GEN (and hence "is an S3 +> > ## method") only if the package code actually calls GEN (see A +> > ## above). So one could argue that we should not be looking at all +> > ## GEN.CLS matches with GEN a generic in the package itself, its +> > ## imports or base, but restrict to only the ones where the package +> > ## code calls GEN. Doable, but not straightforward (calls could be +> > ## PKG::GEN) and possibly quite time consuming. For generics from +> > ## the package itself or its imports, not restricting should not +> > ## make a difference (why define or import when not calling?), but +> > ## for generics from base it may: hence we filter out the mismatches +> > ## for base GEN not called in the package. +> > ## +> > ## If a package provides an S3 generic GEN, there is no need to +> > ## register GEN.CLS functions for "internal use" (see above). +> > ## However, if GEN is exported then likely all GEN.CLS functions +> > ## should be registered as S3 methods. +> +> > Best wishes, +> > Uwe +> +> Excellent! Thank you, Uwe, for digging this out. 
+>
+> I'd also answered John Fox (even more strongly)
+> that it *must* remain allowed to define "private"
+> S3 methods in a package in all cases:
+> - for a generic from the package (where the generic would also
+> not be exported)
+> - for a generic from another package (which is imported)
+> or base (which need not and cannot be imported).
+>
+> Note that the same is possible with S4 and (AFAIK, untested)
+> with S7. {which people should really try using more ... }
+>
+> Martin
+>
+>
+>
+> > On 05.09.2024 00:10, John Fox wrote:
+> >> Thanks Toby and Jeff for chiming in on this.
+> >>
+> >> Jeff: I already read Kurt Hornik's post on "S3 Method Lookup" and quite
+> >> a few other sources.
+> >>
+> >> The main point is that failing to register the methods works in that the
+> >> methods are nevertheless invoked internally by functions in the package
+> >> but don't shadow versions of the methods registered by other packages
+> >> externally (e.g., at the command prompt), which was the effect that I
+> >> wanted. Moreover, as I said, R CMD check (unlike roxygen) didn't complain.
+> >>
+> >> As I mentioned, this is now moot for the cv package, but I'm still
+> >> interested in the answer, as, apparently, is Toby.
+> >>
+> >> Best,
+> >> John
+> >>
+> >> On 2024-09-04 5:12 p.m., Jeff Newmiller wrote:
+> >>> Caution: External email.
+> >>>
+> >>>
+> >>> I have been reluctant to pipe up on this because I am no expert on the
+> >>> dark corners of the S3 dispatch mechanism, but I think unregistered S3
+> >>> methods in packages are verboten. Perhaps [1] will shed some light? 
+> >>> +> >>> [1] https://blog.r-project.org/2019/08/19/s3-method-lookup/ +> >>> +> >>> On September 4, 2024 11:21:22 AM PDT, Toby Hocking +> >>> wrote: +> >>>> I got this warning too, so I filed an issue to ask +> >>>> https://github.com/r-lib/roxygen2/issues/1654 +> >>>> +> >>>> On Mon, Sep 2, 2024 at 2:58?PM John Fox wrote: +> >>>>> +> >>>>> As it turned out, I was able to avoid redefining coef.merMod(), +> >>>>> etc., by +> >>>>> making a simple modification to the cv package. +> >>>>> +> >>>>> I'm still curious about whether it's OK to have unregistered S3 methods +> >>>>> for internal use in a package even though that's no longer necessary +> >>>>> for +> >>>>> the cv package. +> >>>>> +> >>>>> On 2024-09-02 11:34 a.m., John Fox wrote: +>>>>>> Caution: External email. +> >>>>>> +> >>>>>> +>>>>>> Dear R-package-devel list members, +> >>>>>> +>>>>>> I want to introduce several unregistered S3 methods into the cv +>>>>>> package +>>>>>> (code at ). These methods have the +>>>>>> form +> >>>>>> +>>>>>> coef.merMod <- function(object, ...) lme4::fixef(object) +> >>>>>> +>>>>>> The object is to mask, e.g., lme4:::coef.merMod(), which returns BLUPs +>>>>>> rather than fixed effects, internally in the cv package but *not* to +>>>>>> mask the lme4 version of the method for users of the cv package -- +>>>>>> that +>>>>>> could wreak havoc with their work. Doing this substantially simplifies +>>>>>> some of the code in the cv package. +> >>>>>> +>>>>>> My question: Is it legitimate to define a method in a package for +>>>>>> internal use without registering it? +> >>>>>> +>>>>>> This approach appears to work fine, and R CMD check doesn't complain, +>>>>>> although Roxygen does complain that the method isn't "exported" +>>>>>> (actually, isn't registered). +> >>>>>> +>>>>>> Any advice or relevant information would be appreciated. 
+> >>>>>>
+>>>>>> Thank you,
+>>>>>> John
+>>>>>> --
+>>>>>> John Fox, Professor Emeritus
+>>>>>> McMaster University
+>>>>>> Hamilton, Ontario, Canada
+>>>>>> web: https://www.john-fox.ca/
+
+
+From |@go@bonn|c| @end|ng |rom umontpe|||er@|r Fri Sep 13 15:00:06 2024
+From: |@go@bonn|c| @end|ng |rom umontpe|||er@|r (Iago Bonnici)
+Date: Fri, 13 Sep 2024 15:00:06 +0200
+Subject: [R-pkg-devel] Justification and status of the 100-bytes path length
+ limit.
+Message-ID: 
+
+Hello @r-package-devel,
+
+    Reading from the following guide, I understand that file paths in
+my R package should remain under 100 bytes long:
+
+https://cran.r-project.org/doc/manuals/R-exts.html#Package-structure-1
+
+    The only explanation I could find stands within the same line of
+this document:
+
+>    packages are normally distributed as tarballs, and these have
+a limit on path lengths: for maximal portability
+
+    So, if I understand correctly, the limit set by CRAN is deferred to
+some limit of the `tar` archive format.
+
+    Now, reading from the following page:
+
+https://en.wikipedia.org/wiki/Tar_(computing)#File_format
+
+    I learn that `tar` was indeed limited to 100-byte paths prior to
+POSIX.1-1988.
+    If I understand correctly, this limit was bumped to 256
+bytes in 1988.
+    Then, it was completely lifted in 2001, meaning that
+POSIX.1-2001-compliant `tar` files have no limit on file path lengths.
+
+    Although I do understand the importance of ensuring
+backward compatibility, I find it surprising that the 100-byte
+limitation ensures compatibility with a file format *older than R
+itself* (1993, right?).
+    Considering that the limitation is surprisingly short by modern
+standards, I have two questions about it:
+
+    - Is the 100-byte path limitation only supposed to ensure
+compatibility with pre-POSIX.1-1988 `tar` files?
+
+        - If not, then I believe it should be stated in the official
+guide what the other reasons are for setting such a low limit.
+
+        - If yes, then I believe the limitation should either:
+
+            - be reconsidered. For example, it could be lifted for
+packages targeting R versions released after 2021 (20 years after
+POSIX.1-2001) or anything similar.
+
+            - be justified more fully in the official guide. The guide
+should explain why CRAN considers it meaningful or important to support
+the pre-POSIX.1-1988 format.
+
+    - Has this discussion already happened like 1000 times
+already? If so, then I
+believe the answers should be clarified in the official guide anyway.
+
+    Thank you for your kind responses, I am happy to help push this
+forward if required.
+
+    Best regards,
+
+
+--
+Iago-lito
+ [[alternative HTML version deleted]]
+
+
diff --git a/r-sig-mac/2024-August.txt b/r-sig-mac/2024-August.txt
new file mode 100644
index 0000000..6dd088d
--- /dev/null
+++ b/r-sig-mac/2024-August.txt
@@ -0,0 +1,238 @@
+From peterv@n@ummeren @end|ng |rom gm@||@com Fri Aug 23 11:15:44 2024
+From: peterv@n@ummeren @end|ng |rom gm@||@com (Peter van Summeren)
+Date: Fri, 23 Aug 2024 11:15:44 +0200
+Subject: [R-SIG-Mac] fcaR plot not working
+Message-ID: 
+
+Hello,
+I have a MacBook Air M1
+
+I newly installed R and R studio. Also fcaR (formal concept analysis for R).
+I followed fcaR via the help files.
+Finally I did fc_planets$standardize() : worked.
+Then: fc_planets$concepts$plot()
+Warning message:
+You have not installed the 'hasseDiagram' package, which is needed to plot the lattice.
+So, I installed:
+install.packages("hasseDiagram")
+Then: library("hasseDiagram")
+Finally:
+ fc_planets$concepts$plot()
+It gave:
+Warning in install.packages :
+ dependencies 'Rgraphviz', 'graph'
 are not available
+trying URL 'https://cran.rstudio.com/bin/macosx/big-sur-arm64/contrib/4.4/hasseDiagram_0.2.0.tgz'
+Content type 'application/x-gzip' length 37789 bytes (36 KB)
+==================================================
+downloaded 36 KB
+
+No idea what to do to get a plot of the lattice.
+Any help would be appreciated.
+Peter
+
+
+From r@p@r@p@ @end|ng |rom mcw@edu Fri Aug 23 17:06:17 2024
+From: r@p@r@p@ @end|ng |rom mcw@edu (Sparapani, Rodney)
+Date: Fri, 23 Aug 2024 15:06:17 +0000
+Subject: [R-SIG-Mac] fcaR plot not working
+In-Reply-To: 
+References: 
+Message-ID: 
+
+Hi Peter:
+
+I don't know what the graph package is. But Rgraphviz is an archived package on CRAN:
+https://cran.r-project.org/src/contrib/Archive/Rgraphviz/
+
+--
+Rodney Sparapani, Associate Professor of Biostatistics, He/Him/His
+President, Wisconsin Chapter of the American Statistical Association
+Data Science Institute, Division of Biostatistics
+Medical College of Wisconsin, Milwaukee Campus
+
+If this is outside of working hours, then please respond when convenient.
+
+From: R-SIG-Mac on behalf of Peter van Summeren
+Date: Friday, August 23, 2024 at 4:44 AM
+To: R-SIG-Mac at r-project.org
+Subject: [R-SIG-Mac] fcaR plot not working
+ATTENTION: This email originated from a sender outside of MCW. Use caution when clicking on links or opening attachments.
+________________________________
+
+Hello,
+I have a MacBook Air M1
+
+I newly installed R and R studio. Also fcaR (formal concept analysis for R).
+I followed fcaR via the help files.
+Finally I did fc_planets$standardize() : worked.
+Then: fc_planets$concepts$plot()
+Warning message:
+You have not installed the 'hasseDiagram' package, which is needed to plot the lattice.
+So, I installed:
+install.packages("hasseDiagram")
+Then: library("hasseDiagram")
+Finally:
+ fc_planets$concepts$plot()
+It gave:
+Warning in install.packages :
+ dependencies 'Rgraphviz', 'graph'
are not available +trying URL 'https://urldefense.com/v3/__https://cran.rstudio.com/bin/macosx/big-sur-arm64/contrib/4.4/hasseDiagram_0.2.0.tgz__;!!H8mHWRdzp34!5oi0FHg-_i3CfEk8FOCqWEVuAOVF_cgiitwPE6t-nvlbIckN5N9UXPvstxUJYtbLyVGCJay83ujsmvcIDmXE3Oh3CA$ ' +Content type 'application/x-gzip' length 37789 bytes (36 KB) +================================================== +downloaded 36 KB + +No idea what to do to get a plot of the lattice. +Any help would be appreciated. +Peter + +_______________________________________________ +R-SIG-Mac mailing list +R-SIG-Mac at r-project.org +https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-sig-mac__;!!H8mHWRdzp34!5oi0FHg-_i3CfEk8FOCqWEVuAOVF_cgiitwPE6t-nvlbIckN5N9UXPvstxUJYtbLyVGCJay83ujsmvcIDmXIcMD7ng$ + + [[alternative HTML version deleted]] + + +From r|p|ey @end|ng |rom @t@t@@ox@@c@uk Fri Aug 23 17:32:22 2024 +From: r|p|ey @end|ng |rom @t@t@@ox@@c@uk (Prof Brian Ripley) +Date: Fri, 23 Aug 2024 16:32:22 +0100 +Subject: [R-SIG-Mac] fcaR plot not working +In-Reply-To: +References: + +Message-ID: <44e07634-8581-4b1d-bd17-80ca8ad87f05@stats.ox.ac.uk> + +On 23/08/2024 16:06, Sparapani, Rodney via R-SIG-Mac wrote: +> Hi Peter: +> +> I don?t know what the graph package is. But Rgraphviz is an archived package on CRAN? +> https://cran.r-project.org/src/contrib/Archive/Rgraphviz/ + +Both are current Bioconductor packages, so their software repository +needs to be selected. + +This is not really a macOS question. On any platform + +setRepositories(ind = c(1:4)) +install.packages(c('fcaR', 'hasseDiagram')) + +should work: I tested a vanilla R 4.4.1 on arm64 macOS. + +> +> -- +> Rodney Sparapani, Associate Professor of Biostatistics, He/Him/His +> President, Wisconsin Chapter of the American Statistical Association +> Data Science Institute, Division of Biostatistics +> Medical College of Wisconsin, Milwaukee Campus +> +> If this is outside of working hours, then please respond when convenient. 
+> +> From: R-SIG-Mac on behalf of Peter van Summeren +> Date: Friday, August 23, 2024 at 4:44?AM +> To: R-SIG-Mac at r-project.org +> Subject: [R-SIG-Mac] fcaR plot not working +> ATTENTION: This email originated from a sender outside of MCW. Use caution when clicking on links or opening attachments. +> ________________________________ +> +> Hello, +> I have a MacBook Air M1 +> +> I newly installed R and R studio. Also fcaR(formal concept analysis for R). +> I followed fcaR via the help files. +> Finally I did fc_planets$standardize() : worked. +> Then: fc_planets$concepts$plot() +> Warning message: +> You have not installed the 'hasseDiagram' package, which is needed to plot the lattice. +> So, I installed: +> install.packages("hasseDiagram?) +> Then: library("hasseDiagram?) +> Finally: +> fc_planets$concepts$plot() +> It gave: +> Warning in install.packages : +> dependencies ?Rgraphviz?, ?graph? are not available +> trying URL 'https://urldefense.com/v3/__https://cran.rstudio.com/bin/macosx/big-sur-arm64/contrib/4.4/hasseDiagram_0.2.0.tgz__;!!H8mHWRdzp34!5oi0FHg-_i3CfEk8FOCqWEVuAOVF_cgiitwPE6t-nvlbIckN5N9UXPvstxUJYtbLyVGCJay83ujsmvcIDmXE3Oh3CA$ ' +> Content type 'application/x-gzip' length 37789 bytes (36 KB) +> ================================================== +> downloaded 36 KB +> +> No idea what to do to get a plot of the lattice. +> Any help would be appreciated. +> Peter +> +> _______________________________________________ +> R-SIG-Mac mailing list +> R-SIG-Mac at r-project.org +> https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-sig-mac__;!!H8mHWRdzp34!5oi0FHg-_i3CfEk8FOCqWEVuAOVF_cgiitwPE6t-nvlbIckN5N9UXPvstxUJYtbLyVGCJay83ujsmvcIDmXIcMD7ng$ +> +> [[alternative HTML version deleted]] +> +> _______________________________________________ +> R-SIG-Mac mailing list +> R-SIG-Mac at r-project.org +> https://stat.ethz.ch/mailman/listinfo/r-sig-mac + + +-- +Brian D. 
Ripley, ripley at stats.ox.ac.uk
+Emeritus Professor of Applied Statistics, University of Oxford
+
+
+From @|mon@urb@nek @end|ng |rom R-project@org Thu Aug 29 01:26:52 2024
+From: @|mon@urb@nek @end|ng |rom R-project@org (Simon Urbanek)
+Date: Thu, 29 Aug 2024 11:26:52 +1200
+Subject: [R-SIG-Mac] fcaR plot not working
+In-Reply-To: 
+References: 
+Message-ID: 
+
+Peter,
+
+hasseDiagram unfortunately depends on packages outside of CRAN, namely in Bioconductor, so you have to add the corresponding repositories before you install it, e.g.:
+
+setRepositories(ind=1:4)
+install.packages("hasseDiagram")
+# also installing the dependencies 'BiocGenerics', 'Rgraphviz', 'graph'
+
+Ideally, hasseDiagram would warn the user and provide an informative error, but that you'd have to take up with the author.
+
+Cheers,
+Simon
+
+
+
+> On 23 Aug 2024, at 21:15, Peter van Summeren wrote:
+>
+> Hello,
+> I have a MacBook Air M1
+>
+> I newly installed R and R studio. Also fcaR (formal concept analysis for R).
+> I followed fcaR via the help files.
+> Finally I did fc_planets$standardize() : worked.
+> Then: fc_planets$concepts$plot()
+> Warning message:
+> You have not installed the 'hasseDiagram' package, which is needed to plot the lattice.
+> So, I installed:
+> install.packages("hasseDiagram")
+> Then: library("hasseDiagram")
+> Finally:
+> fc_planets$concepts$plot()
+> It gave:
+> Warning in install.packages :
+> dependencies 'Rgraphviz', 'graph' are not available
+> trying URL 'https://cran.rstudio.com/bin/macosx/big-sur-arm64/contrib/4.4/hasseDiagram_0.2.0.tgz'
+> Content type 'application/x-gzip' length 37789 bytes (36 KB)
+> ==================================================
+> downloaded 36 KB
+>
+> No idea what to do to get a plot of the lattice.
+> Any help would be appreciated. 
+> Peter +> +> _______________________________________________ +> R-SIG-Mac mailing list +> R-SIG-Mac at r-project.org +> https://stat.ethz.ch/mailman/listinfo/r-sig-mac +> + + diff --git a/r-sig-mac/2024-September.txt b/r-sig-mac/2024-September.txt new file mode 100644 index 0000000..ae1e484 --- /dev/null +++ b/r-sig-mac/2024-September.txt @@ -0,0 +1,556 @@ +From murdoch@dunc@n @end|ng |rom gm@||@com Sun Sep 8 11:23:36 2024 +From: murdoch@dunc@n @end|ng |rom gm@||@com (Duncan Murdoch) +Date: Sun, 8 Sep 2024 05:23:36 -0400 +Subject: [R-SIG-Mac] Bug in reading UTF-16LE file? +Message-ID: <7cbae008-78fe-48b1-9551-ece52beb495d@gmail.com> + +To R-SIG-Mac, with a copy to Jeff Newmiller: + +On R-help there's a thread about reading a remote file that is coded in +UTF-16LE with a byte-order mark. Jeff Newmiller pointed out +(https://stat.ethz.ch/pipermail/r-help/2024-September/479933.html) that +it would be better to declare the encoding as "UTF-16", because the BOM +will indicate little endian. + +I tried this on my Mac running R 4.4.1, and it didn't work. I get the +same incorrect result from all of these commands: + + # Automatically recognizing a URL and using fileEncoding: + read.delim( + +'https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', + fileEncoding = "UTF-16" + ) + + # Using explicit url() with encoding: + read.delim( + +url('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', + encoding = "UTF-16") + ) + + # Specifying the endianness incorrectly: + read.delim( + +url('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', + encoding = "UTF-16BE") + ) + +The only way I get the correct result is if I specify "UTF-16LE" +explicitly, whereas Jeff got correct results on several different +systems using "UTF-16". + +Is this a MacOS bug or an R for MacOS bug? 
+
+Duncan Murdoch
+
+
+From jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@ Mon Sep 9 00:41:00 2024
+From: jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@ (Jeff Newmiller)
+Date: Sun, 08 Sep 2024 15:41:00 -0700
+Subject: [R-SIG-Mac] Bug in reading UTF-16LE file?
+In-Reply-To: <7cbae008-78fe-48b1-9551-ece52beb495d@gmail.com>
+References: <7cbae008-78fe-48b1-9551-ece52beb495d@gmail.com>
+Message-ID: <268E971B-BFAC-43D3-9B98-44CF23F92A5E@dcn.davis.ca.us>
+
+I don't know whether MacOSX uses libiconv, but I was looking at libiconv-1.17/lib/utf16.h and utf16_mbtowc assumes the first argument has an istate element that is pre-initialized to the architecture endianness. I don't have time to keep digging into this right now (and no ARM mac to debug on), but if that was somehow always set to LE in this context (by R?) then I think that would explain this behavior.
+
+I know, most people will just bail on UTF16 and use the UTF16LE hack (hacky because when the BOM is there you aren't supposed to specify LE) to get on with life, but this seems to me like an unfortunate failure to follow the standard that ought to have been noticed by now. [1]
+
+[1] https://unicode.org/faq/utf_bom.html#bom10 item (4)... don't mix LE/BE specification with data that has a BOM.
+
+On September 8, 2024 2:23:36 AM PDT, Duncan Murdoch wrote:
+>To R-SIG-Mac, with a copy to Jeff Newmiller:
+>
+>On R-help there's a thread about reading a remote file that is coded in UTF-16LE with a byte-order mark. Jeff Newmiller pointed out (https://stat.ethz.ch/pipermail/r-help/2024-September/479933.html) that it would be better to declare the encoding as "UTF-16", because the BOM will indicate little endian.
+>
+>I tried this on my Mac running R 4.4.1, and it didn't work. 
I get the same incorrect result from all of these commands:
+>
+> # Automatically recognizing a URL and using fileEncoding:
+> read.delim(
+>
+>'https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
+> fileEncoding = "UTF-16"
+> )
+>
+> # Using explicit url() with encoding:
+> read.delim(
+>
+>url('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
+> encoding = "UTF-16")
+> )
+>
+> # Specifying the endianness incorrectly:
+> read.delim(
+>
+>url('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
+> encoding = "UTF-16BE")
+> )
+>
+>The only way I get the correct result is if I specify "UTF-16LE" explicitly, whereas Jeff got correct results on several different systems using "UTF-16".
+>
+>Is this a MacOS bug or an R for MacOS bug?
+>
+>Duncan Murdoch
+
+--
+Sent from my phone. Please excuse my brevity.
+
+
+From @|mon@urb@nek @end|ng |rom R-project@org Mon Sep 9 01:11:40 2024
+From: @|mon@urb@nek @end|ng |rom R-project@org (Simon Urbanek)
+Date: Mon, 9 Sep 2024 11:11:40 +1200
+Subject: [R-SIG-Mac] Bug in reading UTF-16LE file?
+In-Reply-To: <7cbae008-78fe-48b1-9551-ece52beb495d@gmail.com>
+References: <7cbae008-78fe-48b1-9551-ece52beb495d@gmail.com>
+Message-ID: 
+
+From the help page:
+
+ The encodings '"UCS-2LE"' and '"UTF-16LE"' are treated specially,
+ as they are appropriate values for Windows 'Unicode' text files.
+ If the first two bytes are the Byte Order Mark '0xFEFF' then these
+ are removed as some implementations of 'iconv' do not accept BOMs.
+
+so "UTF-16LE" is the documented way to reliably read such files.
+
+Cheers,
+Simon
+
+
+
+> On 8 Sep 2024, at 21:23, Duncan Murdoch wrote:
+>
+> To R-SIG-Mac, with a copy to Jeff Newmiller:
+>
+> On R-help there's a thread about reading a remote file that is coded in UTF-16LE with a byte-order mark. 
Jeff Newmiller pointed out (https://stat.ethz.ch/pipermail/r-help/2024-September/479933.html) that it would be better to declare the encoding as "UTF-16", because the BOM will indicate little endian. +> +> I tried this on my Mac running R 4.4.1, and it didn't work. I get the same incorrect result from all of these commands: +> +> # Automatically recognizing a URL and using fileEncoding: +> read.delim( +> 'https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', +> fileEncoding = "UTF-16" +> ) +> +> # Using explicit url() with encoding: +> read.delim( +> url('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', +> encoding = "UTF-16") +> ) +> +> # Specifying the endianness incorrectly: +> read.delim( +> url('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', +> encoding = "UTF-16BE") +> ) +> +> The only way I get the correct result is if I specify "UTF-16LE" explicitly, whereas Jeff got correct results on several different systems using "UTF-16". +> +> Is this a MacOS bug or an R for MacOS bug? +> +> Duncan Murdoch +> +> _______________________________________________ +> R-SIG-Mac mailing list +> R-SIG-Mac at r-project.org +> https://stat.ethz.ch/mailman/listinfo/r-sig-mac +> + + +From pd@|gd @end|ng |rom gm@||@com Mon Sep 9 10:53:45 2024 +From: pd@|gd @end|ng |rom gm@||@com (peter dalgaard) +Date: Mon, 9 Sep 2024 10:53:45 +0200 +Subject: [R-SIG-Mac] Bug in reading UTF-16LE file? +In-Reply-To: +References: <7cbae008-78fe-48b1-9551-ece52beb495d@gmail.com> + +Message-ID: + +I am confused, and maybe I should just butt out of this, but: + +(a) BOM are designed to, um, mark the byte order... 
+ +(b) in connections.c we have + + if(checkBOM && con->inavail >= 2 && + ((int)con->iconvbuff[0] & 0xff) == 255 && + ((int)con->iconvbuff[1] & 0xff) == 254) { + con->inavail -= (short) 2; + memmove(con->iconvbuff, con->iconvbuff+2, con->inavail); + } + +which checks for the two first bytes being FF, FE. However, a big-endian BOM would be FE, FF and I see no check for that. + +Duncan's file starts + +> readBin('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', what="raw", n=10) + [1] ff fe 74 00 69 00 6d 00 65 00 + +so the BOM does indeed indicate little-endian, but apparently we proceed to discard it and read the file with system (big-)endianness, which strikes me as just plain wrong... + +I see no Mac-specific code for this, only win_iconv.c, so presumably we have potential issues on everything non-Windows? + +-pd + +> On 9 Sep 2024, at 01:11 , Simon Urbanek wrote: +> +> From the help page: +> +> The encodings ?"UCS-2LE"? and ?"UTF-16LE"? are treated specially, +> as they are appropriate values for Windows ?Unicode? text files. +> If the first two bytes are the Byte Order Mark ?0xFEFF? then these +> are removed as some implementations of ?iconv? do not accept BOMs. +> +> so "UTF-16LE" is the documented way to reliably read such files. +> +> Cheers, +> Simon +> +> +> +>> On 8 Sep 2024, at 21:23, Duncan Murdoch wrote: +>> +>> To R-SIG-Mac, with a copy to Jeff Newmiller: +>> +>> On R-help there's a thread about reading a remote file that is coded in UTF-16LE with a byte-order mark. Jeff Newmiller pointed out (https://stat.ethz.ch/pipermail/r-help/2024-September/479933.html) that it would be better to declare the encoding as "UTF-16", because the BOM will indicate little endian. +>> +>> I tried this on my Mac running R 4.4.1, and it didn't work. 
I get the same incorrect result from all of these commands: +>> +>> # Automatically recognizing a URL and using fileEncoding: +>> read.delim( +>> 'https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', +>> fileEncoding = "UTF-16" +>> ) +>> +>> # Using explicit url() with encoding: +>> read.delim( +>> url('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', +>> encoding = "UTF-16") +>> ) +>> +>> # Specifying the endianness incorrectly: +>> read.delim( +>> url('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', +>> encoding = "UTF-16BE") +>> ) +>> +>> The only way I get the correct result is if I specify "UTF-16LE" explicitly, whereas Jeff got correct results on several different systems using "UTF-16". +>> +>> Is this a MacOS bug or an R for MacOS bug? +>> +>> Duncan Murdoch +>> +>> _______________________________________________ +>> R-SIG-Mac mailing list +>> R-SIG-Mac at r-project.org +>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac +>> +> +> _______________________________________________ +> R-SIG-Mac mailing list +> R-SIG-Mac at r-project.org +> https://stat.ethz.ch/mailman/listinfo/r-sig-mac + +-- +Peter Dalgaard, Professor, +Center for Statistics, Copenhagen Business School +Solbjerg Plads 3, 2000 Frederiksberg, Denmark +Phone: (+45)38153501 +Office: A 4.23 +Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com + + +From r|p|ey @end|ng |rom @t@t@@ox@@c@uk Mon Sep 9 11:30:12 2024 +From: r|p|ey @end|ng |rom @t@t@@ox@@c@uk (Prof Brian Ripley) +Date: Mon, 9 Sep 2024 10:30:12 +0100 +Subject: [R-SIG-Mac] Bug in reading UTF-16LE file? 
+In-Reply-To: <268E971B-BFAC-43D3-9B98-44CF23F92A5E@dcn.davis.ca.us>
+References: <7cbae008-78fe-48b1-9551-ece52beb495d@gmail.com>
+ <268E971B-BFAC-43D3-9B98-44CF23F92A5E@dcn.davis.ca.us>
+Message-ID: 
+
+On 08/09/2024 23:41, Jeff Newmiller via R-SIG-Mac wrote:
+> I don't know whether MacOSX uses libiconv,
+
+It no longer does, although it reports compatibility with GNU libiconv 1.13.
+It is not at all compatible, which has caused a lot of extra work, not
+least as the incompatibilities have been changed/increased at point
+releases of macOS 14. OTOH, the minimum requirement of R's binary macOS
+builds does use libiconv, probably 1.11 (which is old, 2006). So
+testing iconv on macOS is a lottery.
+
+Note that neither Linux nor Windows uses GNU libiconv, and AFAIR neither
+does recent FreeBSD. Last year when I worked on iconv I did not find a
+platform currently using GNU libiconv and had to use a temporary
+installation from the sources.
+
+ > but I was looking at libiconv-1.17/lib/utf16.h and utf16_mbtowc
+assumes the first argument has an istate element that is pre-initialized
+to the architecture endianness. I don't have time to keep digging into
+this right now (and no ARM mac to debug on), but if that was somehow
+always set to LE in this context (by R?) then I think that would explain
+this behavior.
+>
+> I know, most people will just bail on UTF16 and use the UTF16LE hack (hacky because when the BOM is there you aren't supposed to specify LE) to get on with life, but this seems to me like an unfortunate failure to follow the standard that ought to have been noticed by now. [1]
+>
+> [1] https://unicode.org/faq/utf_bom.html#bom10 item (4)... don't mix LE/BE specification with data that has a BOM.
+>
+> On September 8, 2024 2:23:36 AM PDT, Duncan Murdoch wrote:
+>> To R-SIG-Mac, with a copy to Jeff Newmiller:
+>>
+>> On R-help there's a thread about reading a remote file that is coded in UTF-16LE with a byte-order mark. 
Jeff Newmiller pointed out (https://stat.ethz.ch/pipermail/r-help/2024-September/479933.html) that it would be better to declare the encoding as "UTF-16", because the BOM will indicate little endian. +>> +>> I tried this on my Mac running R 4.4.1, and it didn't work. I get the same incorrect result from all of these commands: +>> +>> # Automatically recognizing a URL and using fileEncoding: +>> read.delim( +>> +>> 'https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', +>> fileEncoding = "UTF-16" +>> ) +>> +>> # Using explicit url() with encoding: +>> read.delim( +>> +>> url('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', +>> encoding = "UTF-16") +>> ) +>> +>> # Specifying the endianness incorrectly: +>> read.delim( +>> +>> url('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', +>> encoding = "UTF-16BE") +>> ) +>> +>> The only way I get the correct result is if I specify "UTF-16LE" explicitly, whereas Jeff got correct results on several different systems using "UTF-16". +>> +>> Is this a MacOS bug or an R for MacOS bug? +>> +>> Duncan Murdoch +> + + +-- +Brian D. Ripley, ripley at stats.ox.ac.uk +Emeritus Professor of Applied Statistics, University of Oxford + + +From tom@@@k@||ber@ @end|ng |rom gm@||@com Mon Sep 9 12:53:25 2024 +From: tom@@@k@||ber@ @end|ng |rom gm@||@com (Tomas Kalibera) +Date: Mon, 9 Sep 2024 12:53:25 +0200 +Subject: [R-SIG-Mac] Bug in reading UTF-16LE file? +In-Reply-To: +References: <7cbae008-78fe-48b1-9551-ece52beb495d@gmail.com> + + +Message-ID: <1a7efc08-8619-4ed2-9471-369bb6127a64@gmail.com> + + +On 9/9/24 10:53, peter dalgaard wrote: +> I am confused, and maybe I should just butt out of this, but: +> +> (a) BOM are designed to, um, mark the byte order... 
+
+>
+> (b) in connections.c we have
+>
+> if(checkBOM && con->inavail >= 2 &&
+> ((int)con->iconvbuff[0] & 0xff) == 255 &&
+> ((int)con->iconvbuff[1] & 0xff) == 254) {
+> con->inavail -= (short) 2;
+> memmove(con->iconvbuff, con->iconvbuff+2, con->inavail);
+> }
+>
+> which checks for the two first bytes being FF, FE. However, a big-endian BOM would be FE, FF and I see no check for that.
+I think this is correct: it is executed only for encodings declared
+little-endian (UTF-16LE, UCS-2LE), so iconv will still know the byte
+order from the name of the encoding; it will just not see the same
+information in the BOM.
+>
+> Duncan's file starts
+>
+>> readBin('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', what="raw", n=10)
+> [1] ff fe 74 00 69 00 6d 00 65 00
+>
+> so the BOM does indeed indicate little-endian, but apparently we proceed to discard it and read the file with system (big-)endianness, which strikes me as just plain wrong...
+I've tested that the code above does not discard it and that iconv
+gets to see the BOM bytes.
+>
+> I see no Mac-specific code for this, only win_iconv.c, so presumably we have potential issues on everything non-Windows?
+
+I can reproduce the problem and will have a closer look; it may still be
+that there is a bug in R. We have some work-arounds for recent iconv
+issues on macOS in sysutils.c.
+
+Tomas
+
+>
+> -pd
+>
+>> On 9 Sep 2024, at 01:11 , Simon Urbanek wrote:
+>>
+>> From the help page:
+>>
+>> The encodings 'UCS-2LE' and 'UTF-16LE' are treated specially,
+>> as they are appropriate values for Windows 'Unicode' text files.
+>> If the first two bytes are the Byte Order Mark '0xFEFF' then these
+>> are removed as some implementations of 'iconv' do not accept BOMs.
+>>
+>> so "UTF-16LE" is the documented way to reliably read such files. 
+>> +>> Cheers, +>> Simon +>> +>> +>> +>>> On 8 Sep 2024, at 21:23, Duncan Murdoch wrote: +>>> +>>> To R-SIG-Mac, with a copy to Jeff Newmiller: +>>> +>>> On R-help there's a thread about reading a remote file that is coded in UTF-16LE with a byte-order mark. Jeff Newmiller pointed out (https://stat.ethz.ch/pipermail/r-help/2024-September/479933.html) that it would be better to declare the encoding as "UTF-16", because the BOM will indicate little endian. +>>> +>>> I tried this on my Mac running R 4.4.1, and it didn't work. I get the same incorrect result from all of these commands: +>>> +>>> # Automatically recognizing a URL and using fileEncoding: +>>> read.delim( +>>> 'https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', +>>> fileEncoding = "UTF-16" +>>> ) +>>> +>>> # Using explicit url() with encoding: +>>> read.delim( +>>> url('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', +>>> encoding = "UTF-16") +>>> ) +>>> +>>> # Specifying the endianness incorrectly: +>>> read.delim( +>>> url('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', +>>> encoding = "UTF-16BE") +>>> ) +>>> +>>> The only way I get the correct result is if I specify "UTF-16LE" explicitly, whereas Jeff got correct results on several different systems using "UTF-16". +>>> +>>> Is this a MacOS bug or an R for MacOS bug? +>>> +>>> Duncan Murdoch +>>> +>>> _______________________________________________ +>>> R-SIG-Mac mailing list +>>> R-SIG-Mac at r-project.org +>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac +>>> +>> _______________________________________________ +>> R-SIG-Mac mailing list +>> R-SIG-Mac at r-project.org +>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac + + +From jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@ Mon Sep 9 16:54:13 2024 +From: jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@ (Jeff Newmiller) +Date: Mon, 09 Sep 2024 07:54:13 -0700 +Subject: [R-SIG-Mac] Bug in reading UTF-16LE file? 
+In-Reply-To:
+References: <7cbae008-78fe-48b1-9551-ece52beb495d@gmail.com>
+
+
+Message-ID: <276DB8AB-7AB6-48A6-88BF-AF17CB2B14BE@dcn.davis.ca.us>
+
+Definitely not about R... but to the question:
+
+All C compilers (well, really all computer languages) logically regard integers as big-endian, regardless of whether the underlying bytes are BE or LE. Converting a byte stream (bytes) to wide character data (ints or uints) only needs to swap bytes in the LE case using bit shifting.
+
+You cannot rely on "same as my architecture" pointer re-interpretation of multi-byte values because most of the time the word size won't match and even if it does the word-boundary alignment will usually be off and the pointer dereference will fail.
+
+On September 9, 2024 1:53:45 AM PDT, peter dalgaard wrote:
+>I am confused, and maybe I should just butt out of this, but:
+>
+>(a) BOM are designed to, um, mark the byte order...
+>
+>(b) in connections.c we have
+>
+> if(checkBOM && con->inavail >= 2 &&
+> ((int)con->iconvbuff[0] & 0xff) == 255 &&
+> ((int)con->iconvbuff[1] & 0xff) == 254) {
+> con->inavail -= (short) 2;
+> memmove(con->iconvbuff, con->iconvbuff+2, con->inavail);
+> }
+>
+>which checks for the two first bytes being FF, FE. However, a big-endian BOM would be FE, FF and I see no check for that.
+>
+>Duncan's file starts
+>
+>> readBin('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt', what="raw", n=10)
+> [1] ff fe 74 00 69 00 6d 00 65 00
+>
+>so the BOM does indeed indicate little-endian, but apparently we proceed to discard it and read the file with system (big-)endianness, which strikes me as just plain wrong...
+>
+>I see no Mac-specific code for this, only win_iconv.c, so presumably we have potential issues on everything non-Windows?
+>
+>-pd
+>
+>> On 9 Sep 2024, at 01:11 , Simon Urbanek wrote:
+>>
+>> From the help page:
+>>
+>> The encodings 'UCS-2LE' and 'UTF-16LE' 
are treated specially,
+>> as they are appropriate values for Windows 'Unicode' text files.
+>> If the first two bytes are the Byte Order Mark '0xFEFF' then these
+>> are removed as some implementations of 'iconv' do not accept BOMs.
+>>
+>> so "UTF-16LE" is the documented way to reliably read such files.
+>>
+>> Cheers,
+>> Simon
+>>
+>>
+>>
+>>> On 8 Sep 2024, at 21:23, Duncan Murdoch wrote:
+>>>
+>>> To R-SIG-Mac, with a copy to Jeff Newmiller:
+>>>
+>>> On R-help there's a thread about reading a remote file that is coded in UTF-16LE with a byte-order mark. Jeff Newmiller pointed out (https://stat.ethz.ch/pipermail/r-help/2024-September/479933.html) that it would be better to declare the encoding as "UTF-16", because the BOM will indicate little endian.
+>>>
+>>> I tried this on my Mac running R 4.4.1, and it didn't work. I get the same incorrect result from all of these commands:
+>>>
+>>> # Automatically recognizing a URL and using fileEncoding:
+>>> read.delim(
+>>> 'https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
+>>> fileEncoding = "UTF-16"
+>>> )
+>>>
+>>> # Using explicit url() with encoding:
+>>> read.delim(
+>>> url('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
+>>> encoding = "UTF-16")
+>>> )
+>>>
+>>> # Specifying the endianness incorrectly:
+>>> read.delim(
+>>> url('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
+>>> encoding = "UTF-16BE")
+>>> )
+>>>
+>>> The only way I get the correct result is if I specify "UTF-16LE" explicitly, whereas Jeff got correct results on several different systems using "UTF-16".
+>>>
+>>> Is this a MacOS bug or an R for MacOS bug? 
+>>> +>>> Duncan Murdoch +>>> +>>> _______________________________________________ +>>> R-SIG-Mac mailing list +>>> R-SIG-Mac at r-project.org +>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac +>>> +>> +>> _______________________________________________ +>> R-SIG-Mac mailing list +>> R-SIG-Mac at r-project.org +>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac +> + +-- +Sent from my phone. Please excuse my brevity. + +