{"id":1536,"date":"2021-01-19T14:50:37","date_gmt":"2021-01-19T07:50:37","guid":{"rendered":"https:\/\/vticloud.io\/?post_type=case_study&#038;p=1536"},"modified":"2021-03-05T15:31:57","modified_gmt":"2021-03-05T08:31:57","slug":"lyft-chi-phi-giam-toi-77-voi-amazon-ec2-spot-instances","status":"publish","type":"case_study","link":"https:\/\/vticloud.io\/en\/case_study\/lyft-chi-phi-giam-toi-77-voi-amazon-ec2-spot-instances\/","title":{"rendered":"Lyft Increases Simulation Capacity, Lowers Costs Using Amazon EC2 Spot Instances"},"content":{"rendered":"<p><\/p>\n<h2>About Lyft Level 5<\/h2>\n<p><a href=\"https:\/\/www.lyft.com\/\">Lyft<\/a>, one of the largest transportation networks in the United States and Canada, is on a mission: improve people\u2019s lives with the world\u2019s best transportation. Along with its focus on shared rides, bike-share systems, electric scooters, and public transit partnerships, Lyft launched its <a href=\"https:\/\/self-driving.lyft.com\/level5\/\">Level 5<\/a>\u00a0autonomous vehicle (AV) division in 2017 as part of its effort to achieve this mission. Using petabytes of data gathered from its AV fleet, Lyft\u2019s engineers run millions of simulations each year to improve the performance and safety of its self-driving system.<\/p>\n<p>But those simulations are compute-intensive, and Lyft knew it would need massive computing power that could scale up and down at an affordable price. 
The company, which has been using Amazon Web Services (AWS) for its rideshare platform since the day it launched in 2012, turned to AWS again to boost its compute capacity and lower costs, ultimately choosing a combination of <a href=\"https:\/\/aws.amazon.com\/ec2\/\">Amazon Elastic Compute Cloud<\/a>\u00a0(Amazon EC2)\u00a0<a href=\"https:\/\/aws.amazon.com\/ec2\/spot\/\">Spot Instances<\/a>\u00a0and\u00a0<a href=\"https:\/\/aws.amazon.com\/eks\/\">Amazon Elastic Kubernetes Service<\/a>\u00a0(Amazon EKS) for its AV simulation workload.<\/p>\n<h2 id=\"Running_Simulations_on_Amazon_EC2_Spot_Instances\" class=\"lb-txt-none lb-h3 lb-title\">Running Simulations on Amazon EC2 Spot Instances<\/h2>\n<div class=\"lb-rtxt\">\n<p>Running simulations on thousands of graphics processing units (GPUs) in parallel is critical to Level 5\u2019s success in testing and improving how AVs respond to various driving situations. \u201cSimulation is one of the key ways we improve the safety of our software before it goes anywhere\u2014even a test track,\u201d says Timothy Perrett, senior staff engineer at Lyft Level 5. Exploring the simulation space (such as varying the speed, position, or vehicle dynamics) requires repeated testing and thus a lot of computing flexibility.<\/p>\n<p>Early on, it was clear that Level 5 would have very different computing needs than Lyft\u2019s rideshare business. \u201cLevel 5 has different needs and constraints,\u201d says Perrett. \u201cMost of our computing needs are in servicing large, batch-style workloads that have a very spiky profile. We need the ability to burst up to high peak loads and then quickly turn everything down when we\u2019re not using it.\u201d<\/p>\n<p>Lyft could have invested in on-premises central processing units and GPUs, but the Lyft team\u2019s prior experience on AWS made the AWS Cloud its first choice. So the testing began. 
Level 5 engineers started by utilizing capacity from\u00a0<a href=\"https:\/\/docs.aws.amazon.com\/AWSEC2\/latest\/UserGuide\/ec2-on-demand-instances.html\">Amazon EC2 On-Demand Instances<\/a>, in conjunction with Amazon EKS, the fully managed Kubernetes service that AWS offers.<\/p>\n<p>After experimenting with running simulations using On-Demand Instances, Lyft\u2019s Level 5 team quickly realized it could improve efficiency and reduce costs by shifting to Amazon EC2 Spot Instances. Now more than 90 percent of the simulations run on Amazon EC2 Spot Instances, including Amazon EC2 P3 Instances powered by NVIDIA V100 Tensor Core GPUs, and that enables Lyft to take advantage of unused Amazon EC2 capacity in the AWS Cloud at up to a 70 percent discount compared to On-Demand pricing. \u201cWhen we experimented with running on Amazon EC2 Spot Instances, we realized that as our program was growing quickly, there was an opportunity to significantly reduce our operational costs,\u201d says Perrett.<\/p>\n<\/div>\n<h2 id=\"Enabling_Simulations_to_Run_Efficiently_\" class=\"lb-txt-none lb-h3 lb-title\">Enabling Simulations to Run Efficiently<\/h2>\n<div class=\"lb-rtxt\">\n<p>The Level 5 team distributes its simulation workload in what Perrett calls a \u201cclever dance\u201d to ensure that simulations still run even when Amazon EC2 Spot Instances aren\u2019t available because of high demand. Engineering staff observed which clusters\u2014and pools within those clusters\u2014operated efficiently and took into account regional zone usage. \u201cWe became smarter about how we allocate work and how we relocate jobs in a given resource pool on a given day,\u201d Perrett notes. The team used Amazon EKS to prioritize and scale resource pools so jobs were efficiently using instances.<\/p>\n<p>The engineering team was also careful to design systems so that simulations would function on a variety of hardware, depending on what was available\u2014something Lyft calls fleet diversity. 
Perrett explains, \u201cWe put a lot of effort into making our stack work on whichever type of instance is available\u2014Amazon EC2 P3 Instances versus the Amazon EC2 P2 Instances, for example.\u201d This flexibility helps Level 5 engineers avoid having to wait to schedule simulations, even when demand is high.<\/p>\n<p>Lyft also has to manage a massive amount of data gathered from simulations and from its AV fleet, and it takes advantage of\u00a0<a href=\"https:\/\/aws.amazon.com\/s3\/\">Amazon Simple Storage Service<\/a>\u00a0(Amazon S3) to store and access an ever-expanding dataset as Lyft increases the number of sensors on its test vehicles. Gathering and storing all that information from its AVs and simulations amount to petabytes of data, and transferring that amount of data directly to the cloud, as the Level 5 team did in the early days, was costly. To reduce that cost, Lyft uses\u00a0<a href=\"https:\/\/aws.amazon.com\/directconnect\/\">AWS Direct Connect<\/a>, a dedicated network connection between its Level 5 engineering center and its cloud systems. \u201cWe have a very high-capacity network that connects to the places where we operate our AV fleet,\u201d Perrett notes. \u201cAnd then we upload the data for a much lower cost per petabyte.\u201d<\/p>\n<p>By carefully partitioning and directing its simulation traffic on Amazon EC2 Spot Instances, Lyft\u2019s Level 5 engineering team reduced the cost of simulations to just pennies for each execution. \u201cAbout 77 percent of our computing fleet across all Level 5 workloads\u2014and over 90 percent of our AV simulation workload\u2014is now on Amazon EC2 Spot Instances, and the cost savings overall has been around two-thirds,\u201d says Perrett. 
\u201cWe were able to scale up our computing capacity significantly while reducing the overall cost of operation.\u201d<\/p>\n<\/div>\n<div class=\"lb-rtxt\">\n<h2 id=\"Benefits_of_AWS\" class=\"lb-txt-normal lb-txt-none lb-h4 lb-title\">Benefits of AWS<\/h2>\n<div class=\"lb-rtxt\">\n<p>\u25cf Reduced compute costs by two-thirds<br \/>\n\u25cf Scaled up computing capacity significantly<br \/>\n\u25cf Increased velocity of development for AVs<\/p>\n<h2 id=\"AWS_Services_Used\" class=\"lb-txt-none lb-h4 lb-title\">AWS Services Used<\/h2>\n<div class=\"lb-rtxt\">\n<p><strong>Amazon Elastic Compute Cloud (Amazon EC2)<\/strong> is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers.<\/p>\n<p><strong>Amazon EC2 Spot Instances<\/strong> let you take advantage of unused EC2 capacity in the AWS cloud. Spot Instances are available at up to a 90% discount compared to On-Demand prices. You can use Spot Instances for various stateless, fault-tolerant, or flexible applications.<\/p>\n<p><strong>Amazon EKS<\/strong> is a fully-managed Kubernetes service. EKS runs upstream Kubernetes and is certified Kubernetes conformant so you can leverage all benefits of open source tooling from the community.<\/p>\n<h2>Conclusion<\/h2>\n<p>Running millions of simulations at steep cost savings on AWS allows Lyft\u2019s engineering team to run its tests from inside its offices, enabling staff to gain confidence in software changes prior to taking physical vehicles out in the real world. \u201cSimulations are a more cost-effective means of validating software changes compared to taking a vehicle to the test track,\u201d Perrett says. 
\u201cThis improves iteration time for engineering staff and helps improve safety and software quality on a shorter time horizon.\u201d<\/p>\n<p>For fault-tolerant workloads, replacing On-Demand Instances with Spot Instances is one way for businesses to reduce operating costs without disrupting operations. If Amazon EC2 costs are a concern, consider moving suitable workloads to Spot Instances, which can lower your compute bill substantially.<\/p>\n<h2 id=\"V\u1ec1-VTI-Cloud\" data-renderer-start-pos=\"9193\">About VTI Cloud<\/h2>\n<p data-renderer-start-pos=\"2984\"><strong>VTI Cloud<\/strong>\u00a0is an\u00a0<a href=\"https:\/\/vticloud.io\/en\/news_events\/vti-cloud-chinh-thuc-tro-thanh-advanced-consulting-partner-cua-aws\/\"><strong>Advanced Consulting Partner<\/strong><\/a>\u00a0of AWS in Vietnam, with a team of more than 50 AWS-certified solution engineers. Committed to supporting customers in their digital transformation and migration to the AWS Cloud, VTI Cloud is proud to be a pioneer in solution consulting, software development, and deployment of AWS infrastructure for customers in\u00a0<strong>Vietnam<\/strong>\u00a0and\u00a0<strong>Japan<\/strong>.<\/p>\n<p data-renderer-start-pos=\"3281\">Building secure, high-performance, flexible, and cost-optimized architectures for customers is\u00a0<strong>VTI Cloud<\/strong>\u2019s primary mission in enterprise technology.<\/p>\n<p data-renderer-start-pos=\"3281\">Reference: <a class=\"sc-kkGfuU keHSaj\" title=\"https:\/\/aws.amazon.com\/vi\/solutions\/case-studies\/Lyft-level-5-spot\/?nc1=f_ls\" href=\"https:\/\/aws.amazon.com\/vi\/solutions\/case-studies\/Lyft-level-5-spot\/?nc1=f_ls\" data-renderer-mark=\"true\"><u data-renderer-mark=\"true\">https:\/\/aws.amazon.com\/vi\/solutions\/case-studies\/Lyft-level-5-spot\/?nc1=f_ls<\/u><\/a><\/p>\n<\/div>\n<\/div>\n<\/div>\n<p><\/p>","protected":false},"excerpt":{"rendered":"<p>About 77% of our computing fleet is now on Amazon EC2 Spot Instances. 
We were able to scale up our computing capacity significantly while reducing the overall cost of operation.<\/p>\n","protected":false},"featured_media":1538,"template":"","tags":[42,54,62,66],"_links":{"self":[{"href":"https:\/\/vticloud.io\/en\/wp-json\/wp\/v2\/case_study\/1536"}],"collection":[{"href":"https:\/\/vticloud.io\/en\/wp-json\/wp\/v2\/case_study"}],"about":[{"href":"https:\/\/vticloud.io\/en\/wp-json\/wp\/v2\/types\/case_study"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/vticloud.io\/en\/wp-json\/wp\/v2\/media\/1538"}],"wp:attachment":[{"href":"https:\/\/vticloud.io\/en\/wp-json\/wp\/v2\/media?parent=1536"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/vticloud.io\/en\/wp-json\/wp\/v2\/tags?post=1536"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}