How Does Yahoo! JAPAN Use Cloud Foundry?


The Road to "JYU-BAI" - Adopting Cloud Foundry at Yahoo! JAPAN - June 20, 2017


About me
• Yasuhiko Kubono
• Software Engineer Manager
• Yahoo! JAPAN


• Introducing Cloud Foundry into our services - Yasuhiko Kubono
• How do we Actually Operate - Yusuke Kondo


Introducing Cloud Foundry into our services


Agenda
• About Yahoo! JAPAN
• Why we use Cloud Foundry?
• Introducing Cloud Foundry into our services
• Case study


About Yahoo! JAPAN


Outline
• Yahoo Japan Corporation (SoftBank Group)
• Businesses: Internet advertising, e-commerce, member services, others
• Headquarters: Tokyo, Japan
• Founded: January 31, 1996
• # of Employees: 5,826 (as of March 31, 2017)


# of Engineers & Designers 2,500


Web Services More than 100


Total requests (per month, average for January-March 2017)
• Active User IDs: 39.89M
• Page Views: 67.4B


Why we use Cloud Foundry?


Why we use Cloud Foundry?
• Shorten development time
• JYU-BAI: increase productivity by 10 times


Adoption Plan
• 2016: Initial introduction to a few services
• 2017: Full-scale implementation (we are here)
• 2018: Expand implementation


Introducing Cloud Foundry into our Services


Programming Languages C, Perl, C++, PHP, Node.js, Java...


Architecture Differs by Web Service
• Small-scale web services, e.g. Travel tips
• Large-scale web services, e.g. Yahoo! Auction: API gateway, search logic, list logic, cart logic


Obstacles
• The same architecture does not fit every web service


Solutions
• Enroll a CF Coach in each web service: around 20 staff across 15 services
• Coach's role: promote cloud design methods that suit each web service


Role map
• Core Team
• CF Coach for Shopping: Shopping engineers
• CF Coach for Auction: Auction engineers
• CF Coach for Media: Media engineers
• …


Case study


Where we started from


List Necessary Functions
(Table: Service A through Service G against the functions each one needs, marked with ●. Functions listed: MySQL, Oracle, KVS, Object Storage, C/C++, PHP, Node.js, Java, advertisement, beacon.)


Challenges we encountered
• Some functions could not be used in the cloud because of complicated dependencies
• Internal security policies were not suited to a cloud environment
• Most of our web services had a stateful design


How we started
• We selected one web service and started by preparing the functions that service needed
• We resolved issues each time they occurred


So, which web service did we start with?


Criteria for the web service
1. Simplicity
• A service with limited functions, whose external platforms (PFs) can be used
2. Actively developed
• A web service that is actively developed, so that the effectiveness of introducing CF can be measured


First target: CS tool
Characteristics
• Language: PHP
• Framework: CakePHP
• Uses REST API (our auction service)
• MySQL
Server Configuration
• Constructed with a few servers in an OpenStack environment
• Web server: Apache
• Reverse proxy: Apache Traffic Server (ATS)
(Diagram: HTTPS, ATS x2, CS tool (Apache) x2, HTTP API, MySQL x2)


Partial release using ATS (entry points)
Partially diverted entry points to CF apps using ATS:
• So that each entry point can be switched between CF and OpenStack
(Diagram: HTTPS, ATS x2, CS tool (Apache) x2, HTTP API; some entry points go to CS tool (CF))
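The per-entry-point switch can be sketched as a small routing table. This is a minimal Python illustration of the idea, not ATS configuration; the backend names and path prefixes are hypothetical, since the slides do not list the actual entry points.

```python
# Hypothetical backend names; the slides do not give real ones.
CF_BACKEND = "cs-tool-cf"
OPENSTACK_BACKEND = "cs-tool-openstack"

# Entry points (URL path prefixes) currently diverted to the CF app.
# These prefixes are made up for illustration.
cf_entry_points = {"/reports", "/search"}


def choose_backend(path: str) -> str:
    """Return the backend that should serve this request path."""
    for prefix in cf_entry_points:
        # Match the prefix itself or anything below it, but not "/reportsx".
        if path == prefix or path.startswith(prefix + "/"):
            return CF_BACKEND
    return OPENSTACK_BACKEND


def roll_back(prefix: str) -> None:
    """Send an entry point back to the OpenStack servers."""
    cf_entry_points.discard(prefix)
```

Keeping the switch per entry point means a problem in one diverted path can be rolled back without touching the others.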


Lessons learned from the first target
• How to implement in production
• Development method based on OSS
• How to make a service stateless on CF


Adopting & Expanding to other services
• Decide target
• Know-how accumulated
• Solve issues
• Investigate issues
• Adopting knowledge


Next presentation: How do we Actually Operate
(Photo by: Aflo)


Hello CF Summit 2017!
Yusuke Kondo or @konfoo
Responsible for:
• operating Cloud Foundry & Concourse on IaaS
• increasing engineers' productivity by providing tools and best practices around CI/CD


Overview of Yahoo! JAPAN proprietary Infrastructure
• More than four DCs in Japan
• More than 90,000 VMs running on OpenStack


Cluster Spec
                 dev         production
Load Balancer    Software    Hardware x2
IaaS             OpenStack   OpenStack
Hypervisor #     40          40


Current Status (As of Jun. 9, 2017)
                     dev             production
Cluster #            1               1
Cell # per Cluster   40              30
Org #                136             38
App Instance #       approx. 2,000   approx. 400
Rps at peak time     N/A             approx. 2,000


Future Plan (As of Jun. 9, 2017)
                     dev             production
Cluster #            1               1 => 6
Cell # per Cluster   40              30 => 100
Org #                136             38
App Instance #       approx. 2,000   approx. 400
Rps at peak time     N/A             approx. 2,000


Integration with Backend Services


Existing Platforms
(Diagram: the App surrounded by the platforms it uses)
• Cache Service
• MQ Service
• FaaS
• Role-Based ACL
• RDB
• Key Value Store
• Object Storage


Integration with Existing Platforms
• Cookie off-loading Route Service
• On-demand MySQL (OpenStack Trove API)
• Distributed pubsub service (Pulsar)


Marketplace Dashboard
Goal: provide all PFs in the CF Marketplace


Issues we faced
Platform ACLs are based on IP address or hostname
=> Short term: request an exceptional, time-limited permission for access via an IP range
=> Long term: migrate from host-based ACL to role-based ACL
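A time-limited IP-range exception can be sketched as follows. This is a minimal Python illustration under assumed names; `RangeException`, `is_allowed`, and the example range are hypothetical and do not describe the actual ACL system.

```python
from dataclasses import dataclass
from datetime import date
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class RangeException:
    """A temporary permission for a whole IP range (hypothetical model)."""
    network: str       # CIDR range, for example the range used by CF cells
    expires_on: date   # last day on which the exception is valid


def is_allowed(source_ip: str, today: date, exceptions: list[RangeException]) -> bool:
    """Allow the request if its source IP falls in an unexpired exception."""
    addr = ip_address(source_ip)
    return any(
        addr in ip_network(exc.network) and today <= exc.expires_on
        for exc in exceptions
    )


# Example data: 192.0.2.0/24 is a documentation-only range, not a real one.
exceptions = [RangeException("192.0.2.0/24", date(2017, 12, 31))]
```

The expiry date is what makes the exception a stopgap: once it passes, access has to come through the role-based ACL instead.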


Integration with Logs and Metrics


What we already have
• In-house Monitoring & Alerting PF based on Apache Kafka and HBase
• Splunk, an enterprise log analytics platform


User-side Logs and Metrics
• No action is needed from App developers
(Diagram: Apps on VMs in PCF Cluster-1, Loggregator, Splunk, Monitoring PF)


What we prepared
Firehose Nozzle and Relay Server
• Nozzle filters and formats the App logs streamed by Firehose
• Relay Server forwards the log stream to a specific index
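The nozzle and relay roles can be sketched as two small generator stages. This is a minimal Python illustration; the envelope fields (`event_type`, `app_id`, `org`, `message`) are a simplified, hypothetical shape and not the real Loggregator envelope schema, and the index names are made up.

```python
import json
from typing import Iterable, Iterator


def nozzle(envelopes: Iterable[dict]) -> Iterator[str]:
    """Keep only app log events and format each one as a JSON line."""
    for env in envelopes:
        if env.get("event_type") != "LogMessage":
            continue  # drop metrics and other non-log events
        record = {
            "app_id": env["app_id"],
            "org": env["org"],
            "message": env["message"].rstrip(),
        }
        yield json.dumps(record, sort_keys=True)


def relay(
    lines: Iterable[str],
    index_by_org: dict[str, str],
    default_index: str,
) -> Iterator[tuple[str, str]]:
    """Pair each formatted line with the index it should be forwarded to."""
    for line in lines:
        org = json.loads(line)["org"]
        yield index_by_org.get(org, default_index), line
```

For example, `relay(nozzle(stream), {"shopping": "idx_shopping"}, "idx_default")` yields `(index, line)` pairs that a forwarder could send on to the log platform.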


Issue we faced
High log traffic: 900 lines per sec! (as of Jun. 8, 2017)
=> Provided users with a CF-friendly logger


Operator-side Logs and Metrics
Splunk
• Platform logs such as CF component syslog
Prometheus
• BOSH metrics, VM metrics, Firehose metrics
• Emitting alerts to our smartphones


Copyright © 2017 Yahoo Japan Corporation. All Rights Reserved.


Integration with other Systems


Integration with package monitoring tool
Vulnerable Package Monitoring Tool
• Application source code
• Dependent packages
• Runtime: buildpack version
Track the buildpack version each App was staged with and report outdated apps.
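The outdated-app report can be sketched as a version comparison. This is a minimal Python illustration; the record fields (`name`, `buildpack`, `staged_version`) and the plain dotted-integer version format are assumptions, not the tool's actual data model.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as "4.3.10" into (4, 3, 10) for comparison."""
    return tuple(int(part) for part in version.split("."))


def outdated_apps(apps: list[dict], current: dict[str, str]) -> list[str]:
    """Return names of apps staged with an older buildpack than the current one."""
    report = []
    for app in apps:
        latest = current.get(app["buildpack"])
        if latest is None:
            continue  # unknown buildpack: nothing to compare against
        if parse_version(app["staged_version"]) < parse_version(latest):
            report.append(app["name"])
    return sorted(report)
```

Comparing integer tuples rather than strings matters here: as strings, "4.3.9" would sort after "4.3.10".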


Integration with package monitoring tool
• Application source code: scan whole source code
• Dependent packages: scan package version
• Runtime


Integration with Concourse
We use Concourse for
• deploying new Cloud Foundry releases
• updating buildpacks
• syncing employee accounts with UAA
• backing up databases to object storage
• ...


Lessons learned


We are still on the way to changing minds
Changing your organization's mindset is the most essential part.
• Educate not only users, but also the platform division you belong to.
• Work closely with your security-paranoid team. Involve them in updating the policy.

