
Bump version.
Ghislain Fourny committed Nov 1, 2021
1 parent a9efd99 commit 205bdba
Showing 11 changed files with 195 additions and 47 deletions.
113 changes: 113 additions & 0 deletions LICENSE-Apache-Commons-IO.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.

"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:

(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.

You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
6 changes: 6 additions & 0 deletions LICENSE.txt
@@ -39,6 +39,12 @@ For Apache Commons Lang 1.6

See license/LICENSE-Apache-Commons-Lang.txt

========================================================================
For Apache Commons IO 2.11.0
========================================================================

See license/LICENSE-Apache-Commons-IO.txt

========================================================================
For Apache HttpClient
========================================================================
2 changes: 1 addition & 1 deletion docs/Getting started.md
@@ -43,7 +43,7 @@ Create, in the same directory as RumbleDB to keep it simple, a file data.json and

In a shell, from the directory where the RumbleDB .jar lies, type, all on one line:

spark-submit rumbledb-1.15.0.jar --shell yes
spark-submit rumbledb-1.16.0.jar --shell yes
The RumbleDB shell appears:

8 changes: 4 additions & 4 deletions docs/HTTPServer.md
@@ -4,7 +4,7 @@

RumbleDB can be run as an HTTP server that listens for queries. In order to do so, you can use the --server and --port parameters:

spark-submit rumbledb-1.15.0.jar --server yes --port 8001
spark-submit rumbledb-1.16.0.jar --server yes --port 8001

This command will not return until you force it to (Ctrl+C on Linux and Mac). This is because the server has to run permanently to listen to incoming requests.
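As an illustration, a client for this endpoint can be sketched in a few lines of Python. The `/jsoniq` path and port come from this documentation; the response shape (a JSON object with a `values` array on success and an `error-message` field on failure) is an assumption to verify against your RumbleDB version:

```python
import json
import urllib.request

def parse_response(body):
    # Assumed response shape: {"values": [...]} on success,
    # {"error-message": "..."} on failure.
    payload = json.loads(body)
    if "error-message" in payload:
        raise RuntimeError(payload["error-message"])
    return payload.get("values", [])

def run_query(query, endpoint="http://localhost:8001/jsoniq"):
    # POST the JSONiq query text to a running RumbleDB HTTP server
    # and return the resulting sequence of items.
    req = urllib.request.Request(endpoint, data=query.encode("utf-8"))
    with urllib.request.urlopen(req) as resp:
        return parse_response(resp.read().decode("utf-8"))
```

With the server started as shown above, `run_query("1+1")` would send the query and return its result sequence, assuming the response shape holds.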

@@ -94,19 +94,19 @@ Then there are two options
- Connect to the master with SSH with an extra parameter for securely tunneling the HTTP connection (for example `-L 8001:localhost:8001` or any port of your choosing)
- Download the RumbleDB jar to the master node

wget https://github.com/RumbleDB/rumble/releases/download/v1.12.0/rumbledb-1.15.0.jar
wget https://github.com/RumbleDB/rumble/releases/download/v1.12.0/rumbledb-1.16.0.jar

- Launch the HTTP server on the master node (it will be accessible under `http://localhost:8001/jsoniq`).

spark-submit rumbledb-1.15.0.jar --server yes --port 8001
spark-submit rumbledb-1.16.0.jar --server yes --port 8001

- And then use Jupyter notebooks in the same way you would locally (this works transparently thanks to the tunneling)

### With the EC2 hostname

There is also another way that does not need any tunnelling: you can specify the hostname of your EC2 machine (copied over from the EC2 dashboard) with the --host parameter. For example, with the placeholder <ec2-hostname>:

spark-submit rumbledb-1.15.0.jar --server yes --port 8001 --host <ec2-hostname>
spark-submit rumbledb-1.16.0.jar --server yes --port 8001 --host <ec2-hostname>

You also need to make sure in your EMR security group that the chosen port (e.g., 8001) is accessible from the machine in which you run your Jupyter notebook. Then, you can point your Jupyter notebook on this machine to `http://<ec2-hostname>:8001/jsoniq`.

1 change: 1 addition & 0 deletions docs/Licenses.md
@@ -5,6 +5,7 @@ RumbleDB uses the following software:
- ANTLR v4 Framework - BSD License
- Apache Commons Text - Apache License
- Apache Commons Lang - Apache License
- Apache Commons IO - Apache License
- Apache HTTP client - Apache License
- gson - Apache License
- JLine terminal framework - BSD License
10 changes: 5 additions & 5 deletions docs/Run on a cluster.md
@@ -5,16 +5,16 @@ simply by modifying the command line parameters as documented [here for spark-su

If the Spark cluster is running on yarn, then the --master option can be changed from local[\*] to yarn compared to the getting started guide. Most of the time, though (e.g., on Amazon EMR), it need not be specified, as this is already set up in the environment.

spark-submit rumbledb-1.15.0.jar --shell yes
spark-submit rumbledb-1.16.0.jar --shell yes
or explicitly:

spark-submit --master yarn --deploy-mode client rumbledb-1.15.0.jar --shell yes
spark-submit --master yarn --deploy-mode client rumbledb-1.16.0.jar --shell yes

You can also adapt the number of executors, etc.

spark-submit --num-executors 30 --executor-cores 3 --executor-memory 10g
rumbledb-1.15.0.jar --shell yes
rumbledb-1.16.0.jar --shell yes

The size limit for materialization can also be raised with --materialization-cap (the default is 200). This affects the number of items displayed in the shell as the answer to a query, as well as any materializations happening within the query when push-down is not supported. Warnings are issued if the cap is reached.

@@ -59,15 +59,15 @@ Note that by default only the first 1000 items in the output will be displayed o
RumbleDB also supports executing a single query from the command line, reading from HDFS and outputting the results to HDFS, with the query file being either local or on HDFS. For this, use the --query-path, --output-path and --log-path parameters.

spark-submit --num-executors 30 --executor-cores 3 --executor-memory 10g
rumbledb-1.15.0.jar
rumbledb-1.16.0.jar
--query-path "hdfs:///user/me/query.jq"
--output-path "hdfs:///user/me/results/output"
--log-path "hdfs:///user/me/logging/mylog"

The query path, output path and log path can be any of the supported schemes (HDFS, file, S3, WASB...) and can be relative or absolute.

spark-submit --num-executors 30 --executor-cores 3 --executor-memory 10g
rumbledb-1.15.0.jar
rumbledb-1.16.0.jar
--query-path "/home/me/my-local-machine/query.jq"
--output-path "/user/me/results/output"
--log-path "hdfs:///user/me/logging/mylog"
2 changes: 1 addition & 1 deletion docs/install.md
@@ -64,7 +64,7 @@ After successful completion, you can check the `target` directory, which should

The most straightforward way to test whether the above steps were successful is to run the RumbleDB shell locally, like so:

$ spark-submit target/rumbledb-1.15.0.jar --shell yes
$ spark-submit target/rumbledb-1.16.0.jar --shell yes

The RumbleDB shell should start:

17 changes: 11 additions & 6 deletions pom.xml
@@ -26,7 +26,7 @@

<groupId>com.github.rumbledb</groupId>
<artifactId>rumbledb</artifactId>
<version>1.15.0</version>
<version>1.16.0</version>
<packaging>jar</packaging>
<name>RumbleDB</name>
<description>A JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.</description>
@@ -200,19 +200,19 @@
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.1.2</version>
<version>3.0.3</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>3.1.2</version>
<version>3.0.3</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.12</artifactId>
<version>3.1.2</version>
<version>3.0.3</version>
<scope>provided</scope>
</dependency>
<dependency>
@@ -224,12 +224,12 @@
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-avro_2.12</artifactId>
<version>3.1.2</version>
<version>3.0.3</version>
</dependency>
<dependency>
<groupId>org.antlr</groupId>
<artifactId>antlr4-runtime</artifactId>
<version>4.8</version>
<version>4.7</version>
</dependency>
<dependency>
<groupId>org.jline</groupId>
@@ -256,6 +256,11 @@
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.9</version>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.11.0</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
33 changes: 4 additions & 29 deletions src/main/java/org/rumbledb/cli/Main.java
@@ -22,6 +22,7 @@
import java.io.IOException;
import java.net.ConnectException;

import org.apache.commons.io.IOUtils;
import org.apache.spark.SparkException;
import org.rumbledb.config.RumbleRuntimeConfiguration;
import org.rumbledb.exceptions.OurBadException;
@@ -47,35 +47,9 @@ public static void main(String[] args) throws IOException {
} else if (sparksoniqConf.getQuery() != null || sparksoniqConf.getQueryPath() != null) {
runQueryExecutor(sparksoniqConf);
} else {
System.out.println(" ____ __ __ ");
System.out.println(" / __ \\__ ______ ___ / /_ / /__ ");
System.out.println(" / /_/ / / / / __ `__ \\/ __ \\/ / _ \\");
System.out.println(" / _, _/ /_/ / / / / / / /_/ / / __/");
System.out.println("/_/ |_|\\__,_/_/ /_/ /_/_.___/_/\\___/ ");
System.out.println("Usage:");
System.out.println("spark-submit <Spark arguments> <path to rumble jar> <Rumble arguments>");
System.out.println("");
System.out.println("Example usage:");
System.out.println("spark-submit rumbledb-1.15.0.jar --query '1+1'");
System.out.println("spark-submit rumbledb-1.15.0.jar --shell yes");
System.out.println("spark-submit --master local[*] rumbledb-1.15.0.jar --shell yes");
System.out.println("spark-submit --master local[2] rumbledb-1.15.0.jar --shell yes");
System.out.println(
"spark-submit --master local[*] --driver-memory 10G rumbledb-1.15.0.jar --shell yes"
);
System.out.println("");
System.out.println("spark-submit --master yarn rumbledb-1.15.0.jar --shell yes");
System.out.println(
"spark-submit --master yarn --executor-cores 3 --executor-memory 5G rumbledb-1.15.0.jar --shell yes"
);
System.out.println("spark-submit --master local[*] rumbledb-1.15.0.jar --query-path my-query.jq");
System.out.println("spark-submit --master local[*] rumbledb-1.15.0.jar --query-path my-query.jq");
System.out.println(
"spark-submit --master yarn --executor-cores 3 --executor-memory 5G rumbledb-1.15.0.jar --query-path hdfs://server:port/my-query.jq --output-path hdfs://server:port/my-output.json"
);
System.out.println(
"spark-submit --master local[*] rumbledb-1.15.0.jar --query-path my-query.jq --output-path my-output.json --log-path my-log.txt"
);
System.out.println(IOUtils.toString(Main.class.getResourceAsStream("/assets/banner.txt"), "UTF-8"));
System.out.println();
System.out.println(IOUtils.toString(Main.class.getResourceAsStream("/assets/defaultscreen.txt"), "UTF-8"));
}
System.exit(0);
} catch (Exception ex) {
2 changes: 1 addition & 1 deletion src/main/resources/assets/banner.txt
@@ -1,6 +1,6 @@
____ __ __ ____ ____
/ __ \__ ______ ___ / /_ / /__ / __ \/ __ )
/ /_/ / / / / __ `__ \/ __ \/ / _ \/ / / / __ | The distributed JSONiq engine
/ _, _/ /_/ / / / / / / /_/ / / __/ /_/ / /_/ / 1.15.0 "Ivory Palm" beta
/ _, _/ /_/ / / / / / / /_/ / / __/ /_/ / /_/ / 1.16.0 "Shagbark Hickory" beta
/_/ |_|\__,_/_/ /_/ /_/_.___/_/\___/_____/_____/

48 changes: 48 additions & 0 deletions src/main/resources/assets/defaultscreen.txt
@@ -0,0 +1,48 @@
RumbleDB is a JSONiq engine that can be used both on your laptop and on a
cluster (e.g. with Amazon EMR or Azure HDInsight).

It runs on top of Apache Spark and must be invoked with spark-submit, both for
local use and for cluster use. Spark must be installed either on your laptop,
or on the cluster.

Usage:
spark-submit <Spark arguments> <path to RumbleDB's jar> <Rumble arguments>

You can run RumbleDB in a shell with:
spark-submit rumbledb-1.16.0.jar --shell yes

You can directly inline a query on the command line with:
spark-submit rumbledb-1.16.0.jar --query '1+1'

You can specify an output path with:
spark-submit rumbledb-1.16.0.jar --query '1+1' --output-path my-output.txt

You can specify a query path with:
spark-submit rumbledb-1.16.0.jar --query-path my-query.jq

You can run it as an HTTP server (e.g., for use with a Jupyter notebook) with:
spark-submit rumbledb-1.16.0.jar --server yes --port 9090

RumbleDB also supports Apache Livy for use in Jupyter notebooks, which may be
even more convenient if you are using a cluster.

For local use, you can control the number of cores, as well as allocated
memory, with:
spark-submit --master local[*] rumbledb-1.16.0.jar --shell yes
spark-submit --master local[2] rumbledb-1.16.0.jar --shell yes
spark-submit --master local[*] --driver-memory 10G rumbledb-1.16.0.jar --shell yes

You can use RumbleDB remotely with:
spark-submit --master yarn rumbledb-1.16.0.jar --shell yes

(Although for clusters provided as a service, --master yarn is often implicit
and unnecessary).

For remote use (e.g., logged in on the Spark cluster with ssh), you can set the
number of executors, cores, and memory:
spark-submit --executor-cores 3 --executor-memory 5G rumbledb-1.16.0.jar --shell yes

For remote use, you can also use other file system paths such as S3, HDFS, etc:
spark-submit rumbledb-1.16.0.jar --query-path hdfs://server:port/my-query.jq --output-path hdfs://server:port/my-output.json

More documentation on https://www.rumbledb.org/
