Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon


August 04, 21

Slide overview

Presentation slide in ApacheCon Asia 2021 (https://www.apachecon.com/acasia2021/index.html) held on August 8, 2021.


Text of each slide
1.

Big Data
Technical tips for secure Apache Hadoop cluster
Akira Ajisaka, Kei Kori
Yahoo Japan Corporation

2.

Akira Ajisaka (@ajis_ka)
• Software Engineer in Hadoop team @ Yahoo! JAPAN
  – Upgraded HDFS to 3.3.0 and enabled RBF
  – R&D for more secure Hadoop cluster than just enabling Kerberos auth
• Apache Hadoop committer/PMC
  – ~800 commits in various components in 6 years
  – Handled and announced several CVEs
  – Manages build and QA environment

3.

Kei KORI (@2k0ri)
• Data Platform Engineer in Hadoop team @ Yahoo! JAPAN
  – Built the upgrade to, and continuous delivery for, HDFS 3.3.0
  – Researches operations for a more secure Hadoop cluster
• Kubernetes admin for Hadoop client environment
  – Migrates users from VM/BM to the cloud native way
  – Integrates ML/DL workloads with the Hadoop ecosystem

4.

Session Overview

5.

Session Overview
Prerequisites:
• Hadoop is not secure by default
• Kerberos authentication is required
This talk introduces further details in practice:
• Wire encryption in Hadoop ecosystem
• HDFS transparent data encryption at rest
• Other considerations

6.

Wire encryption in Hadoop ecosystem

7.

Background
To make the Hadoop ecosystem more secure than perimeter security alone:
• Not only authenticate but also encrypt communications
• Protection from and mitigation of internal threats like packet sniffing
• Part of security compliance like NIST SP800-171

8.

Overview: wire encryption types between components
• HTTP encryption
  – HDFS, YARN, MapReduce, KMS, HttpFS, Spark, Hive, Oozie, Livy
• RPC encryption
  – HDFS, YARN, MapReduce, KMS, Spark, Hive, Oozie, ZooKeeper
• Block data transfer encryption
  – HDFS
• Shuffle encryption
  – MapReduce, Spark, Tez

9.

HTTP encryption for Hadoop
• dfs.http.policy: HTTPS_ONLY in hdfs-site, yarn.http.policy: HTTPS_ONLY in yarn-site, mapreduce.jobhistory.http.policy: HTTPS_ONLY in mapred-site etc.
  – Enable TLS on WebUI/REST API endpoints
  – HTTP_AND_HTTPS during a rolling update of endpoints
• yarn.timeline-service.webapp.https.address in yarn-site, mapreduce.jobhistory.webapp.https.address in mapred-site
  – Set History/Timeline Server endpoints with HTTPS
• Store certs and passphrases using Hadoop Credential Provider, referenced by hadoop.security.credential.provider.path
  – Separates permissions from configs
  – Prevents exposure outside of hadoop.security.sensitive-config-keys filtering
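
The HTTP settings above could be sketched roughly as follows. This is a minimal illustration, not the presenters' actual configuration; the keystore location is a hypothetical placeholder.

```xml
<!-- hdfs-site.xml: serve HDFS web endpoints over TLS only.
     HTTP_AND_HTTPS can be set temporarily during a rolling update. -->
<property>
  <name>dfs.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>

<!-- core-site.xml: resolve keystore passphrases from a credential store
     instead of plain-text config values.
     The path below is a hypothetical example. -->
<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>localjceks://file/etc/hadoop/conf/ssl-server.jceks</value>
</property>
```

Keeping the passphrases in a separate credential file lets its file permissions be tighter than those of the site configs.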

10.

RPC encryption for Hadoop
• hadoop.rpc.protection: privacy in core-site
  – Encrypts RPC incl. Kerberos authentication on SASL layer
  – Propagates to hadoop.security.saslproperties.resolver.class, dfs.data.transfer.saslproperties.resolver.class and dfs.data.transfer.protection
• hadoop.rpc.protection: privacy,authentication during a rolling update of all Hadoop servers/clients
  – Accepts falling back to non-encrypted RPC
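
A minimal core-site sketch of the two stages described above, shown only as an illustration:

```xml
<!-- core-site.xml, transitional value during the rolling update:
     peers may still fall back to non-encrypted RPC. -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy,authentication</value>
</property>

<!-- core-site.xml, final value once every server and client is updated.
     Replace the transitional property above with this one. -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>
```

Only one of the two properties should be present in a given core-site.xml at a time; they are listed together here to show the order of the rollout.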

11.

Block data transfer encryption for Hadoop
• dfs.encrypt.data.transfer: true, dfs.encrypt.data.transfer.cipher.suites: AES/CTR/NoPadding in hdfs-site
  – Only encrypts payload between HDFS client and DataNodes
• Rolling update is not supported by configuration alone
  – Needs managing a list of encrypted nodes, or extending/implementing your own dfs.trustedchannel.resolver.class
  – Nodes trusted by dfs.trustedchannel.resolver.class are forced to transfer without encryption regardless of their encryption status
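
As a minimal hdfs-site sketch of the two properties named above (the values are the ones given on the slide):

```xml
<!-- hdfs-site.xml: encrypt the block payload exchanged between
     HDFS clients and DataNodes, using AES in CTR mode. -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>
<property>
  <name>dfs.encrypt.data.transfer.cipher.suites</name>
  <value>AES/CTR/NoPadding</value>
</property>
```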

12.

Encryption for Spark
In spark-defaults:
• HTTP encryption
  – spark.ssl.sparkHistory.enabled true
    • Switches protocol on 1 port, does not support HTTP_AND_HTTPS
  – spark.yarn.historyServer.address https://...
• RPC encryption
  – spark.authenticate: true
    • Also in yarn-site
  – spark.authenticate.enableSaslEncryption true
  – spark.network.sasl.serverAlwaysEncrypt true
    • After all Spark components recognized enableSaslEncryption
• Shuffle encryption
  – spark.network.crypto.enabled true
  – spark.io.encryption.enabled true
    • Encrypts spilled caches and RDDs on local disks
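
The "also in yarn-site" note for spark.authenticate could look like the fragment below. This is a sketch under the assumption that the NodeManagers run the Spark external shuffle service, which reads its settings from yarn-site.xml; the slide does not spell this out.

```xml
<!-- yarn-site.xml on NodeManagers (assumes the Spark external
     shuffle service is deployed there). -->
<property>
  <name>spark.authenticate</name>
  <value>true</value>
</property>
```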

13.

Encryption for Hive
• hive.server2.thrift.sasl.qop: auth-conf in hive-site
  – Encrypts JDBC between client and HiveServer2 binary mode
  – And Thrift between clients and Hive Metastore
• hive.server2.use.SSL: true in hive-site
  – Only for HS2 http mode
  – HS2 binary mode cannot enable both TLS and SASL
• Encryption for JDBC between HS2/Hive Metastore and remote RDBMS
• Shuffle encryption
  – Tez: tez.runtime.shuffle.ssl.enable: true, tez.runtime.shuffle.keep-alive.enabled: true in tez-site
  – MapReduce: mapreduce.ssl.enabled: true, mapreduce.shuffle.ssl.enabled: true in mapred-site
  – Requires server certs for all NodeManagers
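
A minimal sketch of the Hive and Tez settings listed above, for a deployment that uses HS2 binary mode with Tez (an assumption made for the example; an http-mode deployment would use hive.server2.use.SSL instead):

```xml
<!-- hive-site.xml: SASL quality of protection "auth-conf" adds
     encryption on top of authentication for HS2 binary mode. -->
<property>
  <name>hive.server2.thrift.sasl.qop</name>
  <value>auth-conf</value>
</property>

<!-- tez-site.xml: encrypt Tez shuffle traffic. -->
<property>
  <name>tez.runtime.shuffle.ssl.enable</name>
  <value>true</value>
</property>
<property>
  <name>tez.runtime.shuffle.keep-alive.enabled</name>
  <value>true</value>
</property>
```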

14.

Challenges in HTTP encryption: for Application Master / Spark Driver
• Server certs for ApplicationMaster / SparkDriver need to be readable by the user who submitted the application
  – ApplicationMaster and SparkDriver run as the user
  – WebApplicationProxy between ResourceManager and ApplicationMaster relies on this encryption
• Applications support TLS and can bundle certs since
  – Spark 3.0.0: SPARK-24621
  – MapReduce 3.3.0: MAPREDUCE-4669
  – Tez: not supported yet

15.

Encryption for ZooKeeper server
• Authenticate with SASL, encrypt with TLS
  – ZooKeeper does not respect SASL QOP
• Requires ZooKeeper 3.5.6 or above for servers/quorums
  – serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
  – sslQuorum=true
  – ssl.clientAuth=NONE
  – ssl.quorum.clientAuth=NONE
• Needs ZOOKEEPER-4276 to follow "Upgrading existing non-TLS cluster with no downtime"
  – Allows ZK to serve only on secureClientPort

16.

Encryption for ZooKeeper client
• Also requires ZooKeeper 3.5.6 or above for clients
• -Dzookeeper.client.secure=true -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty in client JVM args
  – HADOOP_OPTS environment variable
  – mapreduce.admin.map.child.java.opts, mapreduce.admin.reduce.child.java.opts in mapred-site for Oozie Coordinator MapReduce jobs
• Needs to replace and update ZooKeeper jars in all components which communicate with ZooKeeper
  – ZKFC, ResourceManager, Hive clients incl. HS2, Oozie and Livy
  – Apache Curator also needs to be updated to 4.2.0, and Netty from 4.0 to 4.1
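
The mapred-site part of the client settings could be sketched as below. Only the two JVM flags come from the slide; a real deployment would likely also need truststore options and whatever admin opts the cluster already sets, which are not listed here.

```xml
<!-- mapred-site.xml: pass the ZooKeeper TLS client flags to map and
     reduce task JVMs. Setting these properties replaces any existing
     admin opts, so merge them with the cluster's current values. -->
<property>
  <name>mapreduce.admin.map.child.java.opts</name>
  <value>-Dzookeeper.client.secure=true -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty</value>
</property>
<property>
  <name>mapreduce.admin.reduce.child.java.opts</name>
  <value>-Dzookeeper.client.secure=true -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty</value>
</property>
```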

17.

Enforcing Kerberos AuthN/Z for ZooKeeper
• Requires ZooKeeper 3.6.0 or above for servers
  – 3.6.0+: zookeeper.sessionRequireClientSASLAuth=true
  – 3.7.0+: enforce.auth.enabled=true enforce.auth.schemes=sasl
• Oozie Hive action will not work when ZK SASL is enforced
  – Fails when acquiring the lock for Hive Metastore
  – Has no mechanism to delegate authentication or impersonation for ZooKeeper
  – Using HiveServer2 / Oozie Hive2 action solves it

18.

HDFS transparent data encryption (TDE) at rest

19.

Background
HDFS blocks are written to the local filesystem of the DataNodes
• the data is not encrypted by default
• encryption is required in several use cases
Encryption can be done at several layers:
• Application: most secure, but hardest to do
• Database: most databases have this, but may incur performance penalties
• Filesystem: high performance, transparent, but may not be flexible
• Disk: only really protects against physical theft
HDFS TDE fits between database and filesystem level

20.

Overview: encryption/decryption is transparent to the clients

21.

KeyProvider: Where KEK is saved
Implementations of KeyProvider API
• Hadoop KMS: JavaKeyStoreProvider
  – JCEKS files in Hadoop compatible filesystems (localFS, HDFS, cloud storage)
  – Not recommended
• Apache Ranger KMS: RangerKeyStoreProvider
  – RDBMS
  – master key can be stored in Luna HSM (optional)
  – HSM is required in some use cases
    • PCI-DSS, FIPS 140-2

22.

Extending KeyProvider API is not difficult
• Mandatory methods for HDFS TDE
  – getKeyVersion, getCurrentKey, getMetadata
• Optional methods (nice to have for operation)
  – getKeys, getKeysMetadata, getKeyVersions, createKey, deleteKey, rollNewVersion
  – If not implemented, you need to create/delete/list/roll keys in some way
• Use cases:
  – LinkedIn integrated with its own key management service, LiKMS https://engineering.linkedin.com/blog/2021/the-exabyte-club-linkedin-s-journey-of-scaling-the-hadoop-distr
  – Yahoo! JAPAN also integrated with our own credential store by only ~500 LOC (including test code)

23.

KeyProvider is actually stable, can be used safely
• KeyProvider is @Public and @Unstable
  – @Unstable in Hadoop means "incompatible changes are allowed at any time"
• Actually, the API is very stable
  – No incompatible changes
  – Ranger has used it since 2015: RANGER-247
• Provided a patch to mark it stable
  – HADOOP-17544

24.
Hadoop KMS: Where KEK is cached and authorization is performed
• KMS interacts with HDFS clients, NameNodes, and KeyProvider
• KMS has its own ACLs separated from HDFS ACLs
  – An attacker cannot decrypt data even if HDFS ACLs are compromised
  – If 'usera' reads/writes data in the encryption zone with 'keya', the configuration in kms-acls.xml will be:
    <property>
      <name>key.acl.keya.DECRYPT_EEK</name>
      <value>usera</value>
    </property>
  – The configuration is hot-reloaded
• For HA and scalability, multiple KMS instances are supported

25.

How to deploy multiple KMS instances
Two approaches:
1. Behind a load-balancer or VIP
2. Using LoadBalancingKMSClientProvider
  – Implicitly used when multiple URIs are specified in hadoop.security.key.provider.path
If you have a LB or VIP, use it
• No configuration change to scale-out/decommission
• LB saves clients' retry cost
  – LoadBalancingKMSClientProvider first tries to connect to one KMS and, if that fails, connects to another KMS
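
For the second approach, a minimal core-site sketch is shown below. The hostnames are hypothetical placeholders; the semicolon-separated host list is the form described in the Hadoop KMS documentation, and 9600 is the default KMS port in Hadoop 3.

```xml
<!-- core-site.xml: listing several KMS hosts makes clients use
     LoadBalancingKMSClientProvider implicitly.
     kms01/kms02.example.com are hypothetical hostnames. -->
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://https@kms01.example.com;kms02.example.com:9600/kms</value>
</property>
```

With a load balancer or VIP, the value would instead name a single host, so scaling the KMS pool needs no client-side change.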

26.

How to configure multiple KMS instances
• Delegation Token must be synchronized
  – Use ZKDelegationTokenSecretManager
  – Documented an example configuration: HADOOP-17794
• hadoop.security.token.service.use_ip
  – If true (default), fails to validate SSL certificates in multihomed environment
  – Documented: HADOOP-12665
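
A rough sketch of both settings follows. The property names reflect the Hadoop KMS documentation as best understood here and should be checked against the Hadoop version in use; the ZooKeeper hosts are hypothetical placeholders.

```xml
<!-- kms-site.xml: share delegation tokens between KMS instances
     through ZooKeeper. zk01/zk02.example.com are hypothetical. -->
<property>
  <name>hadoop.kms.authentication.zk-dt-secret-manager.enable</name>
  <value>true</value>
</property>
<property>
  <name>hadoop.kms.authentication.zk-dt-secret-manager.zkConnectionString</name>
  <value>zk01.example.com:2181,zk02.example.com:2181</value>
</property>
<property>
  <name>hadoop.kms.authentication.zk-dt-secret-manager.znodeWorkingPath</name>
  <value>/hadoop-kms/zkdtsm</value>
</property>
<property>
  <name>hadoop.kms.authentication.zk-dt-secret-manager.zkAuthType</name>
  <value>sasl</value>
</property>

<!-- core-site.xml: build token service names from hostnames rather
     than IP addresses, so TLS certificate validation can succeed
     in a multihomed environment. -->
<property>
  <name>hadoop.security.token.service.use_ip</name>
  <value>false</value>
</property>
```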

27.

Tuning Hadoop KMS
• Documented and discussed in HADOOP-15743
  – Reduce SSL session cache size and TTL
  – Tune HTTPS idle timeout
  – Increase max file descriptors
  – etc.
• This tuning is effective in HttpFS as well
  – Both KMS/HttpFS use Jetty via HttpServer2

28.

Recap: HDFS TDE
• Careful configuration required
  – How to save KEK
  – Running multiple KMS instances
  – KMS Tuning
  – Where to create encryption zones
  – ACLs (including key ACLs and impersonation)
• They are not straightforward despite the long time since the feature was developed

29.

Other considerations

30.

Updating SSL certificates
• Hadoop >= 3.3.1 allows updating SSL certificates without downtime: HADOOP-16524
  – Uses the hot-reload feature in Jetty
  – Except DataNode, since DataNode does not rely on Jetty
• Useful especially for NameNode because it takes > 30 minutes to restart in a large cluster

31.

Other considerations
• It is important to be ready to upgrade at any time
  – Sometimes CVEs are published and the vendors warn users to upgrade
• Security requirements may increase later, so be prepared for that early
• Operational considerations are also necessary
  – Not only the cluster configuration but also the operations will change

32.

Conclusion & Future work
We introduced many technical tips for secure Hadoop cluster
• However, they might change in the future
• Need to catch up with the OSS community
Future work
• How to enable SSL/TLS in ApplicationMaster & Spark Driver Web UIs
• Impersonation does not work correctly in KMSClientProvider: HDFS-13697

33.

THANK YOU QUESTIONS? @aajisaka @2k0ri