顯示具有 spark 標籤的文章。 顯示所有文章
顯示具有 spark 標籤的文章。 顯示所有文章

2016年4月20日 星期三

spark 1.6.1 Standalone Mode install 最簡易部屬


spark 1.6.1  Standalone Mode install 

我們準備兩台 ubuntu 14.04的機器,並在上面建置 spark 1.6.1 Standalone Mode,最簡易部屬,假設第一台電腦 ip為192.168.1.3,第二台電腦 ip為192.168.1.4

1.從官網下載spark 1.6.1 ,package type 選擇: Pre-built for Hadoop 2.6 and later, 點 spark-1.6.1-bin-hadoop2.6.tgz  下載 並壓縮。

$ wget http://apache.stu.edu.tw/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz 
$ tar zxvf spark-1.6.1-bin-hadoop2.6.tgz

2. 在每台電腦安裝 java


$ sudo apt-get install openjdk-7-jdk



3. 在每台電腦家目錄 編輯 .bashrc,於最後檔案最後加上java路徑。
$vi  ~/.bashrc


....
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH

檢測是否成功:


$java -version

4.在每台電腦 編輯 


$ vi /etc/hosts  


192.168.1.3     ubuntu1
192.168.1.4     ubuntu2

5.建立加密連線不需要密碼
$ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$scp -r  ubuntu@192.168.1.3:~/.ssh ~


6.將ssh連線時詢問關掉。
$sudo vi /etc/ssh/ssh_config


 StrictHostKeyChecking no
為了關掉下面訊息。
ECDSA key fingerprint is 7e:21:58:85:c1:bb:4b:20:c8:60:7f:89:7f:3e:8f:15.
Are you sure you want to continue connecting (yes/no)?

7.設定 spark-env.sh
$ cd  /spark-1.6.1-bin-hadoop2.6/conf
$ mv spark-env.sh.template  spark-env.sh
$ vi spark-env.sh


.....
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
.....
7.設定 slaves
$mv slaves.template  slaves
$ vi slaves



#localhost
 ubuntu1
 ubuntu2


8.在spark目錄下 啟動Master 跟 slave
$sbin/start-all.sh


9.觀看是否已經啟動。


http://192.168.1.3:8080/





10.執行測試程式



$./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://ubuntu1:7077 \
  ./lib/spark-examples-1.6.1-hadoop2.6.0.jar \
  10


以下為用相同過程建置出的環境:


 ✘ paslab@fastnet  ~/zino/spark-1.5.1-bin-hadoop2.6  ./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://fastnet:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  ./lib/spark-examples-1.5.1-hadoop2.6.0.jar \
  10
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/04/22 15:57:06 INFO SparkContext: Running Spark version 1.5.1
16/04/22 15:57:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/22 15:57:06 INFO SecurityManager: Changing view acls to: paslab
16/04/22 15:57:06 INFO SecurityManager: Changing modify acls to: paslab
16/04/22 15:57:06 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(paslab); users with modify permissions: Set(paslab)
16/04/22 15:57:06 INFO Slf4jLogger: Slf4jLogger started
16/04/22 15:57:06 INFO Remoting: Starting remoting
16/04/22 15:57:06 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.2:45824]
16/04/22 15:57:06 INFO Utils: Successfully started service 'sparkDriver' on port 45824.
16/04/22 15:57:06 INFO SparkEnv: Registering MapOutputTracker
16/04/22 15:57:06 INFO SparkEnv: Registering BlockManagerMaster
16/04/22 15:57:06 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-654fc5a5-2c10-4cf9-90d0-aaa46dd6259d
16/04/22 15:57:07 INFO MemoryStore: MemoryStore started with capacity 530.3 MB
16/04/22 15:57:07 INFO HttpFileServer: HTTP File server directory is /tmp/spark-aea15ac1-2dfa-445a-aef4-1859becb1ee6/httpd-73c0986a-57f6-4b0b-9fc4-9f84366393c9
16/04/22 15:57:07 INFO HttpServer: Starting HTTP Server
16/04/22 15:57:07 INFO Utils: Successfully started service 'HTTP file server' on port 35012.
16/04/22 15:57:07 INFO SparkEnv: Registering OutputCommitCoordinator
16/04/22 15:57:07 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/04/22 15:57:07 INFO SparkUI: Started SparkUI at http://192.168.1.2:4040
16/04/22 15:57:07 INFO SparkContext: Added JAR file:/home/paslab/zino/spark-1.5.1-bin-hadoop2.6/./lib/spark-examples-1.5.1-hadoop2.6.0.jar at http://192.168.1.2:35012/jars/spark-examples-1.5.1-hadoop2.6.0.jar with timestamp 1461311827975
16/04/22 15:57:08 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
16/04/22 15:57:08 INFO AppClient$ClientEndpoint: Connecting to master spark://fastnet:7077...
16/04/22 15:57:08 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160422155708-0000
16/04/22 15:57:08 INFO AppClient$ClientEndpoint: Executor added: app-20160422155708-0000/0 on worker-20160422155620-192.168.1.2-43391 (192.168.1.2:43391) with 12 cores
16/04/22 15:57:08 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160422155708-0000/0 on hostPort 192.168.1.2:43391 with 12 cores, 20.0 GB RAM
16/04/22 15:57:08 INFO AppClient$ClientEndpoint: Executor added: app-20160422155708-0000/1 on worker-20160422155623-192.168.1.3-47961 (192.168.1.3:47961) with 12 cores
16/04/22 15:57:08 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160422155708-0000/1 on hostPort 192.168.1.3:47961 with 12 cores, 20.0 GB RAM
16/04/22 15:57:08 INFO AppClient$ClientEndpoint: Executor updated: app-20160422155708-0000/0 is now RUNNING
16/04/22 15:57:08 INFO AppClient$ClientEndpoint: Executor updated: app-20160422155708-0000/1 is now RUNNING
16/04/22 15:57:08 INFO AppClient$ClientEndpoint: Executor updated: app-20160422155708-0000/0 is now LOADING
16/04/22 15:57:08 INFO AppClient$ClientEndpoint: Executor updated: app-20160422155708-0000/1 is now LOADING
16/04/22 15:57:08 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46316.
16/04/22 15:57:08 INFO NettyBlockTransferService: Server created on 46316
16/04/22 15:57:08 INFO BlockManagerMaster: Trying to register BlockManager
16/04/22 15:57:08 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.2:46316 with 530.3 MB RAM, BlockManagerId(driver, 192.168.1.2, 46316)
16/04/22 15:57:08 INFO BlockManagerMaster: Registered BlockManager
16/04/22 15:57:08 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/04/22 15:57:08 INFO SparkContext: Starting job: reduce at SparkPi.scala:36
16/04/22 15:57:08 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:36) with 10 output partitions
16/04/22 15:57:08 INFO DAGScheduler: Final stage: ResultStage 0(reduce at SparkPi.scala:36)
16/04/22 15:57:08 INFO DAGScheduler: Parents of final stage: List()
16/04/22 15:57:08 INFO DAGScheduler: Missing parents: List()
16/04/22 15:57:08 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32), which has no missing parents
16/04/22 15:57:08 INFO MemoryStore: ensureFreeSpace(1888) called with curMem=0, maxMem=556038881
16/04/22 15:57:08 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1888.0 B, free 530.3 MB)
16/04/22 15:57:08 INFO MemoryStore: ensureFreeSpace(1202) called with curMem=1888, maxMem=556038881
16/04/22 15:57:08 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1202.0 B, free 530.3 MB)
16/04/22 15:57:08 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.2:46316 (size: 1202.0 B, free: 530.3 MB)
16/04/22 15:57:08 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:861
16/04/22 15:57:08 INFO DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32)
16/04/22 15:57:08 INFO TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
16/04/22 15:57:10 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@192.168.1.2:50829/user/Executor#1430081976]) with ID 0
16/04/22 15:57:10 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.1.2, PROCESS_LOCAL, 2161 bytes)
16/04/22 15:57:10 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 192.168.1.2, PROCESS_LOCAL, 2161 bytes)
16/04/22 15:57:10 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, 192.168.1.2, PROCESS_LOCAL, 2161 bytes)
16/04/22 15:57:10 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, 192.168.1.2, PROCESS_LOCAL, 2161 bytes)
16/04/22 15:57:10 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, 192.168.1.2, PROCESS_LOCAL, 2161 bytes)
16/04/22 15:57:10 INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, 192.168.1.2, PROCESS_LOCAL, 2161 bytes)
16/04/22 15:57:10 INFO TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, 192.168.1.2, PROCESS_LOCAL, 2161 bytes)
16/04/22 15:57:10 INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, 192.168.1.2, PROCESS_LOCAL, 2161 bytes)
16/04/22 15:57:10 INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, 192.168.1.2, PROCESS_LOCAL, 2161 bytes)
16/04/22 15:57:10 INFO TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, 192.168.1.2, PROCESS_LOCAL, 2161 bytes)
16/04/22 15:57:10 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@192.168.1.3:45907/user/Executor#-348604070]) with ID 1
16/04/22 15:57:10 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.2:35234 with 10.4 GB RAM, BlockManagerId(0, 192.168.1.2, 35234)
16/04/22 15:57:10 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.3:36667 with 10.4 GB RAM, BlockManagerId(1, 192.168.1.3, 36667)
16/04/22 15:57:11 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.2:35234 (size: 1202.0 B, free: 10.4 GB)
16/04/22 15:57:11 INFO TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 1057 ms on 192.168.1.2 (1/10)
16/04/22 15:57:11 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 1060 ms on 192.168.1.2 (2/10)
16/04/22 15:57:11 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 1120 ms on 192.168.1.2 (3/10)
16/04/22 15:57:11 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 1124 ms on 192.168.1.2 (4/10)
16/04/22 15:57:11 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1161 ms on 192.168.1.2 (5/10)
16/04/22 15:57:11 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 1146 ms on 192.168.1.2 (6/10)
16/04/22 15:57:11 INFO TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 1142 ms on 192.168.1.2 (7/10)
16/04/22 15:57:11 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 1149 ms on 192.168.1.2 (8/10)
16/04/22 15:57:11 INFO TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 1148 ms on 192.168.1.2 (9/10)
16/04/22 15:57:11 INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 1149 ms on 192.168.1.2 (10/10)
16/04/22 15:57:11 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/04/22 15:57:11 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:36) finished in 2.419 s
16/04/22 15:57:11 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:36, took 2.622159 s
Pi is roughly 3.144648
16/04/22 15:57:11 INFO SparkUI: Stopped Spark web UI at http://192.168.1.2:4040
16/04/22 15:57:11 INFO DAGScheduler: Stopping DAGScheduler
16/04/22 15:57:11 INFO SparkDeploySchedulerBackend: Shutting down all executors
16/04/22 15:57:11 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
16/04/22 15:57:11 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/04/22 15:57:11 INFO MemoryStore: MemoryStore cleared
16/04/22 15:57:11 INFO BlockManager: BlockManager stopped
16/04/22 15:57:11 INFO BlockManagerMaster: BlockManagerMaster stopped
16/04/22 15:57:11 INFO SparkContext: Successfully stopped SparkContext
16/04/22 15:57:11 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/04/22 15:57:11 INFO ShutdownHookManager: Shutdown hook called
16/04/22 15:57:11 INFO ShutdownHookManager: Deleting directory /tmp/spark-aea15ac1-2dfa-445a-aef4-1859becb1ee6