Apache Hadoop Stack: MapReduce, Pig, Spark, Hive

Sunday, Jun 19, 2022

1. HDFS CLI: load input data

List all directories:

    hadoop fs -ls

Make a new directory to store movie data:

    hadoop fs -mkdir movieData

Copy the data file from local to HDFS:

    hadoop fs -copyFromLocal u.data movieData/u.data

2. MapReduce with Python: movies sorted by rating counts

Script:

    from mrjob.job import MRJob
    from mrjob.step import MRStep

    class RatingsBreakdown(MRJob):
        def steps(self):
            return [
                MRStep(mapper=self.mapper_get_movie,
                       combiner=self.combiner_count_ratings,
                       reducer=self.reducer_count_ratings),
                MRStep(reducer=self.
@ rushi
6 minutes read
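
The script in the excerpt is cut off, so the following is only a minimal sketch of the idea behind "movies sorted by rating counts", written in plain Python rather than mrjob. It assumes that each line of u.data is tab-separated as user ID, movie ID, rating, timestamp (the MovieLens layout), which the excerpt does not state. The function names map_movie, count_ratings and sort_by_count are hypothetical and are not taken from the original script.

```python
from collections import Counter


def map_movie(line):
    # Map phase: emit (movie_id, 1) for one rating record.
    # Assumed layout: user_id <TAB> movie_id <TAB> rating <TAB> timestamp
    user_id, movie_id, rating, timestamp = line.split("\t")
    return movie_id, 1


def count_ratings(lines):
    # Combine/reduce phase: sum the emitted 1s per movie_id.
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        movie_id, one = map_movie(line)
        counts[movie_id] += one
    return counts


def sort_by_count(counts):
    # Second step: order movies by rating count, ascending.
    # Ties are broken by movie_id (compared as strings).
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical sample records in the assumed u.data layout.
sample = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "22\t242\t1\t878887116",
    "244\t51\t2\t880606923",
    "166\t242\t5\t886397596",
]

if __name__ == "__main__":
    for movie_id, n in sort_by_count(count_ratings(sample)):
        print(movie_id, n)
```

In a real MapReduce job the per-movie summing would happen in a combiner and reducer across machines, and the sort would typically be a second step keyed on the count. Here both run in a single process to keep the sketch self-contained.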
