
Spark development in Windows

Zack, 2017-05-12

Summary: taking the first step in Spark development.

This document covers four main topics:

1.       How to simulate Hadoop on Windows.

2.       How to install Spark on Windows.

3.       How to install Scala IDE on Windows.

4.       How to package Scala code into a jar with SBT and run it on Spark.


Simulate Hadoop on Windows:

Download winutils.exe from the following address:

 https://sundog-spark.s3.amazonaws.com/winutils.exe
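
Spark's Hadoop libraries look for winutils.exe in a bin folder under the Hadoop home directory, which they read from the hadoop.home.dir Java system property or the HADOOP_HOME environment variable. As a minimal sketch, assuming the file was saved as C:\winutils\bin\winutils.exe (a hypothetical location, since the step above does not say where to put it), a Scala program can point Hadoop at it before creating a SparkContext. The object and method names are illustrative only.

```scala
import java.io.File

object HadoopHomeSetup {
  // Assumption: winutils.exe was saved as C:\winutils\bin\winutils.exe.
  // This folder is hypothetical; use wherever you placed the download.
  val assumedHadoopHome: String = "C:\\winutils"

  // Path where Hadoop expects to find winutils.exe under a given home folder.
  def winutilsPath(hadoopHome: String): String =
    new File(new File(hadoopHome, "bin"), "winutils.exe").getPath

  // Sets the hadoop.home.dir system property and returns the value now in effect.
  def configure(hadoopHome: String = assumedHadoopHome): String = {
    System.setProperty("hadoop.home.dir", hadoopHome)
    System.getProperty("hadoop.home.dir")
  }

  def main(args: Array[String]): Unit = {
    val home = configure()
    val exe = new File(winutilsPath(home))
    println(s"hadoop.home.dir = $home")
    println(s"winutils.exe found: ${exe.isFile}")
  }
}
```

Setting the HADOOP_HOME environment variable to the same folder should have the same effect without any code change.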



Install Spark on Windows:

1.       Download Spark from the official website:

http://spark.apache.org/downloads.html

2.       Unzip the download to the folder C:\spark.

3.       Set SPARK_HOME and PATH in the system environment variables. SPARK_HOME should point to the unzipped folder, C:\spark.

4.       Create the user PATH variable so that it includes C:\spark\bin.

5.       Verify that the installation succeeded. Open CMD in the folder C:\spark\bin and run:

spark-shell
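
Once the scala> prompt appears, a small job confirms that Spark can actually run work, not just start. The lines below are a sketch to type at the prompt. They rely on the `sc` SparkContext that spark-shell predefines, and the variable names are illustrative.

```scala
// Typed at the scala> prompt of spark-shell, which predefines `sc` (the SparkContext).
val numbers = sc.parallelize(1 to 100)          // distribute the numbers 1..100
val evens = numbers.filter(_ % 2 == 0).count()  // count the even ones
println(s"Even numbers: $evens")                // prints "Even numbers: 50"
```

Type :quit to leave the shell afterwards.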



Install Scala IDE for Eclipse on Windows:

1.       Download the zip from the official website:

http://scala-ide.org/

2.       Unzip it to C:\eclipse.

3.       Make sure a suitable JRE/JDK is installed, then open 'eclipse.exe'. For the workspace, choose the newly created folder 'C:\SparkScala'.

4.       Then create the following:

a.       A new Scala project.

b.       A new package under src.

c.       A new Scala code file in that package.
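
To check that the IDE compiles and runs Scala before adding any Spark code, the new code file can hold a tiny program. This is only a sketch: the object name and the message are made up for illustration, and a file created inside a package would normally start with a matching package line.

```scala
// Assumption: the object name and message are illustrative only.
// A file created inside a package would normally begin with a `package` line.
object HelloScala {
  // Builds the greeting separately so it is easy to check.
  def greeting(name: String): String = s"Hello, $name!"

  def main(args: Array[String]): Unit =
    println(greeting("Scala IDE"))
}
```

Running it with Run As > Scala Application should print the greeting in the Eclipse console.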



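Install SBT on your PC:

1.       Download SBT from the official website as a ZIP or TGZ package.

2.       Unzip the downloaded package to a directory of your choice, for example d:\sbt.

3.       Create a file named sbtconfig.txt in the sbt\bin directory.

4.       Set SBT_HOME and PATH in the system environment variables. SBT_HOME should point to the unzipped folder, for example d:\sbt.

5.       Create the user PATH variable so that it includes the sbt\bin directory.

6.       Run sbt once so that it downloads the jar packages it needs. This first run can take quite a long time.

Run the sbt command in CMD from the folder d:\sbt.

Press Ctrl + C to stop it if there is a problem.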

7.       Create a jar file with the command 'sbt package':

a.       Put the finished Scala code into the following file structure:

\test\src\main\scala\SimpleApp.scala

b.       Put the sbt configuration file in the project root folder:

\test\simple.sbt

c.       Find the jar location in the log output:

D:\sbt\test\target\scala-2.11\simple-project_2.11-1.0.jar



d.       Run the jar in Spark:

a)         Copy the jar into the Spark bin folder:

C:\spark\bin\simple-project_2.11-1.0.jar

b)         In CMD, from C:\spark\bin\, run:

spark-submit simple-project_2.11-1.0.jar

c)         The result is shown below. The program counts how many lines in the file contain "a" and how many contain "b":

Lines with a: 62. Lines with b: 30
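
The post does not list the contents of SimpleApp.scala or simple.sbt. The following is a sketch modeled on the self-contained application example in the Spark quick start guide, which fits the file names and the output above. The input file path, the Scala and Spark version numbers, and the helper name countLinesContaining are all assumptions.

```scala
// simple.sbt, kept in the project root (a sketch; the version numbers are assumptions
// and should match the installed Spark and its Scala version):
//   name := "Simple Project"
//   version := "1.0"
//   scalaVersion := "2.11.8"
//   libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"

// src\main\scala\SimpleApp.scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SimpleApp {
  // Counts the lines that contain the given text (hypothetical helper name).
  def countLinesContaining(lines: RDD[String], text: String): Long =
    lines.filter(line => line.contains(text)).count()

  def main(args: Array[String]): Unit = {
    // Assumption: any text file works; Spark's own README.md is used here.
    val inputFile = "C:/spark/README.md"
    // No master is set here, so it is left to spark-submit.
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val lines = sc.textFile(inputFile).cache() // cached because it is scanned twice
    val numAs = countLinesContaining(lines, "a")
    val numBs = countLinesContaining(lines, "b")
    println(s"Lines with a: $numAs. Lines with b: $numBs")
    sc.stop()
  }
}
```

With a simple.sbt like the one in the comment, sbt package names the jar simple-project_2.11-1.0.jar, which matches the path in the log. The counts printed depend on the input file.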