Install and start Spark
-----------------------
1. Download and unzip Spark: http://www.apache.org/dyn/closer.cgi/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz
2. Go to Spark home folder
3. Create env script from template: cp conf/spark-env.sh.template conf/spark-env.sh
4. Edit conf/spark-env.sh and add this line: export SPARK_JAVA_OPTS="-Dspark.executor.memory=8g"
5. Start master process: ./sbin/start-master.sh
6. Open the Spark console (http://localhost:8080/) in a browser and find the master URL there.
7. Start a worker process, replacing MASTER_URL with the URL from the previous step: ./bin/spark-class org.apache.spark.deploy.worker.Worker MASTER_URL
8. Reload the console; the worker should appear in the list.
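
Optionally, the cluster can be smoke-tested with a tiny standalone job. This is a minimal sketch: spark://localhost:7077 is the usual default URL for a local standalone master, so substitute whatever the console actually shows.

import org.apache.spark.{SparkConf, SparkContext}

object ClusterSmokeTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cluster-smoke-test")
      .setMaster("spark://localhost:7077") // substitute your master URL
    val sc = new SparkContext(conf)
    // If the worker is registered, the tasks run on it and this prints 5050.0.
    println(sc.parallelize(1 to 100).sum())
    sc.stop()
  }
}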
Install and start Zeppelin
--------------------------
1. Download and unzip Zeppelin: http://www.us.apache.org/dist/incubator/zeppelin/0.5.6-incubating/zeppelin-0.5.6-incubating-bin-all.tgz
2. Go to Zeppelin home folder
3. Define a custom port so Zeppelin does not conflict with the Spark console: export ZEPPELIN_PORT=9090
4. Start daemon: ./bin/zeppelin-daemon.sh start
5. Open in browser: http://localhost:9090/
6. Click 'Interpreter' in the top navigation bar.
7. Locate the 'ignite' section and set the JDBC URL: jdbc:ignite://localhost:11211/
8. Click the 'Save' and then 'restart' buttons to restart the interpreter.
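
To confirm the Spark interpreter is working, run a trivial paragraph in a new note (sc is the SparkContext that Zeppelin provides):

%spark
sc.version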
Generate data
-------------
To generate the data, open the JsonGenerator class, customize the ORG_CNT and PERSON_PER_ORG_CNT
constants if needed, and run the class.
Two JSON files (organizations.json and persons.json) will be created in the root folder of this project.
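
The record layout can be inferred from the fields the sample queries below reference
(id, name, orgId, managerId, salary). The following is a hypothetical, self-contained
sketch of a generator producing that line-delimited JSON layout; the real JsonGenerator
may differ in field names and values.

import java.io.PrintWriter
import scala.util.Random

object JsonGeneratorSketch {
  val ORG_CNT = 10              // number of organizations
  val PERSON_PER_ORG_CNT = 100  // employees per organization

  def main(args: Array[String]): Unit = {
    val orgs = new PrintWriter("organizations.json")
    val persons = new PrintWriter("persons.json")
    val rnd = new Random()
    var personId = 0
    for (orgId <- 1 to ORG_CNT) {
      // One JSON object per line, as sqlContext.read.json expects.
      orgs.println(s"""{"id": $orgId, "name": "Org-$orgId"}""")
      val managerId = personId + 1 // first person in each org is the manager
      for (_ <- 1 to PERSON_PER_ORG_CNT) {
        personId += 1
        val mgr = if (personId == managerId) "null" else managerId.toString
        val salary = 30000 + rnd.nextInt(70000)
        persons.println(s"""{"id": $personId, "name": "Person-$personId", "orgId": $orgId, "managerId": $mgr, "salary": $salary}""")
      }
    }
    orgs.close()
    persons.close()
  }
}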
Execute Spark SQL in Zeppelin
-----------------------------
1. Create a new notebook.
2. Create data frames:
%spark
val orgDf = sqlContext.read.json("PATH_TO_THIS_PROJECT/organizations.json")
val personDf = sqlContext.read.json("PATH_TO_THIS_PROJECT/persons.json")
3. Register table names for data frames:
%spark
orgDf.registerTempTable("Organization")
personDf.registerTempTable("Person")
4. Execute queries using the '%sql' prefix (see the samples below).
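
Before running SQL, the registered tables can be sanity-checked with standard DataFrame calls:

%spark
orgDf.printSchema()
personDf.count()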
Execute Ignite SQL in Zeppelin
------------------------------
1. Start one or more data nodes using the Node class.
2. Run the LoadData class to load the data.
3. Create a new notebook and execute queries using the '%ignite.ignitesql' prefix (see the samples below).
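
Node and LoadData are classes in this project. For orientation, starting a server node
essentially amounts to the following (a hypothetical sketch; the real Node class most
likely supplies a project-specific Ignite configuration):

import org.apache.ignite.Ignition

object NodeSketch {
  def main(args: Array[String]): Unit = {
    // Starts an Ignite server node with the default configuration;
    // the JVM stays alive while the node is running.
    Ignition.start()
  }
}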
Spark sample queries
--------------------
%sql
SELECT p.name as Employee, m.name as Manager, o.name as Organization
FROM Person p, Person m, Organization o
WHERE p.managerId = m.id
AND p.orgId = o.id
AND o.id = 10
ORDER BY p.name
%sql
SELECT o.name as Organization, avg(p.salary) as Salary
FROM Person p, Organization o
WHERE p.orgId = o.id
AND p.managerId is null
GROUP BY o.name
ORDER BY Salary
LIMIT 100
%sql
SELECT m.name as managerName, o.name as orgName, count(p.name) personCnt
FROM Person p, Person m, Organization o
WHERE p.orgId = o.id
AND p.managerId = m.id
GROUP BY o.name, m.name
LIMIT 100
Ignite sample queries
---------------------
Note: Person and Organization live in separate caches, and in Ignite each cache
gets its own SQL schema named after the cache, which is why the Organization
table is referenced through the quoted schema name "Organizations".
%ignite.ignitesql
SELECT p.name as Employee, m.name as Manager, o.name as Organization
FROM Person p, Person m, "Organizations".Organization o
WHERE p.managerId = m.id
AND p.orgId = o.id
AND o.id = 10
ORDER BY p.name
%ignite.ignitesql
SELECT o.name as Organization, avg(p.salary) as Salary
FROM Person p, "Organizations".Organization o
WHERE p.orgId = o.id
AND p.managerId is null
GROUP BY o.name
ORDER BY Salary
LIMIT 100
%ignite.ignitesql
SELECT m.name as managerName, o.name as orgName, count(p.name) personCnt
FROM Person p, Person m, "Organizations".Organization o
WHERE p.orgId = o.id
AND p.managerId = m.id
GROUP BY o.name, m.name
LIMIT 100