A series of issues after setting up a Hadoop development environment in IntelliJ IDEA
Leo · 2018-01-31
Summary: a series of issues encountered when running WordCount after setting up a Hadoop development environment in IntelliJ IDEA.
Preparation
1. Install JDK 1.8 and configure it (set the environment variables)
2. Download and install IntelliJ IDEA (no environment variables needed)
3. Download Hadoop and extract it
Walkthrough:
1. Create a Maven project:
Note: if no SDK is detected automatically, click 'New' and point it to your JDK installation path yourself.
2. Create a Java class named WordCount in any folder under src; here I use src.main.java.
3. Copy the following code into WordCount:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: splits each input line into tokens and emits (word, 1)
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts for each word and emits (word, total)
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.out.println(args[0]); // echo the input path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
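One design note on this code: the job registers IntSumReducer both as the combiner and as the reducer (job.setCombinerClass). That is safe for word count because summing counts is associative and commutative, so partial sums computed on the map side yield the same final totals while cutting down the amount of data shuffled to the reducers.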
4. There is no need to add the dependencies through pom.xml; getting them wrong there is the source of many errors. Instead, we can include the Hadoop-related jars directly:
Click File -> Project Structure.
As shown in the screenshot above, add new 'Jars or directories', then browse to the Hadoop extraction path and add the 5 folders shown there (the screenshot is not reproduced here; in a standard Hadoop 2.x layout these are typically share/hadoop/common, share/hadoop/common/lib, share/hadoop/hdfs, share/hadoop/mapreduce, and share/hadoop/yarn).
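The author attaches the jars by hand above; if you would rather let Maven manage the dependency, a minimal pom.xml sketch would look like the following (the 2.7.3 version is an assumption; match it to the Hadoop you extracted):

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <!-- example version; use the one matching your local Hadoop -->
        <version>2.7.3</version>
    </dependency>
</dependencies>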
5. Go to Run -> Edit Configurations, create a new Application, and set Main class: WordCount and Program arguments: input/ output/
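Before running, create an input/ directory in the project root and put a text file in it; the output/ directory must not exist yet, otherwise Hadoop aborts with a FileAlreadyExistsException. For example (the file name words.txt is just an illustration):

input/words.txt:
hello hadoop
hello world

After a successful run, output/part-r-00000 will contain:
hadoop	1
hello	2
world	1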
6. Try running it: you will hit the following series of errors.
Error 1: HADOOP_HOME and hadoop.home.dir are unset
Fix: HADOOP_HOME has not been set, so configure the environment variables for Hadoop:
HADOOP_HOME: the Hadoop extraction path
Path: append %HADOOP_HOME%\bin
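Alternatively, if you would rather not touch the system environment variables (note that IDEA must be restarted before it sees newly set variables), you can point Hadoop at the extraction path in code. A minimal sketch; the path below is an example, substitute your own:

// Call this at the very top of main(), before any Hadoop class is touched;
// Hadoop checks the hadoop.home.dir system property before HADOOP_HOME.
System.setProperty("hadoop.home.dir", "C:\\hadoop-2.7.3");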
Error 2: Exception in thread "main" java.lang.NullPointerException at java.lang.ProcessBuilder.start(Unknown Source)
Fix: since Hadoop 2.x the official distribution no longer ships winutils.exe, which Windows needs in order to simulate the Hadoop environment for local testing. Download a copy and place it in the bin directory under the Hadoop extraction path.
For more background, see https://wiki.apache.org/hadoop/WindowsProblems
You can download the version you need directly from GitHub: https://github.com/steveloughran/winutils
Error 3: Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
Fix: hadoop.dll is missing from C:\Windows\System32; copying that file into C:\Windows\System32 resolves the error. It can be downloaded from the same GitHub link given under Error 2.
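To confirm that the native library is now being picked up, here is a quick check using Hadoop's own loader class (a minimal sketch; run it with the same classpath as WordCount):

import org.apache.hadoop.util.NativeCodeLoader;

public class NativeCheck {
    public static void main(String[] args) {
        // Prints true once hadoop.dll has been found and loaded successfully
        System.out.println("Native hadoop library loaded: " + NativeCodeLoader.isNativeCodeLoaded());
    }
}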