Solving the Most Complex Business Challenges with Deep Reinforcement Learning

By Surya Prabha Vadlamani, Vice President, Enterprise AI Solutions and Cognitive Engineering

Recently, automakers such as Tesla and Toyota made headlines by announcing expansions of their driving technology. Tesla said it is expanding the roll-out of its Model Y and Model 3 vehicles without radar to customers in Europe and the Middle East. Toyota's Woven Planet Holdings announced that it is taking a new approach to advancing its self-driving technology without expensive vehicle sensors such as lidar. These developments have drawn more attention to Deep Reinforcement Learning, a subset of artificial intelligence (AI) that helps businesses such as automakers achieve breakthroughs with products and solutions.

What Is Deep Reinforcement Learning?

Deep Reinforcement Learning is an advanced type of machine learning in which an AI application solves very complex problems, usually to achieve a goal. Let's break down the definition:

Machine learning is a field of AI in which a machine teaches itself complex tasks with minimal programming needed. In doing so, the machine mimics a human brain because it can figure out how to solve problems independently without being explicitly programmed to do so. 

Deep Reinforcement Learning takes machine learning a step further by combining deep neural networks with reinforcement learning, in which a machine learns by trial and error and is rewarded for actions that bring it closer to a goal. The machine learns from its mistakes, corrects them, and achieves complex goals such as winning a game or figuring out the fastest route for a self-driving car to travel amid constantly changing variables such as traffic patterns and weather conditions.
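To make the "learns from its mistakes" idea concrete, here is a minimal sketch of the trial-and-error loop behind reinforcement learning. It is a toy example, not anything from a production system: the five-cell corridor, the reward of 1 at the far end, and the values of ALPHA, GAMMA, and EPSILON are all assumptions chosen for illustration. It also uses a simple lookup table, whereas Deep Reinforcement Learning would replace that table with a deep neural network.

```python
import random

N_STATES = 5            # toy corridor with cells 0..4; reaching cell 4 ends an episode
ACTIONS = (-1, +1)      # index 0 = step left, index 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # illustrative learning rate, discount, exploration rate


def train(episodes=300, seed=0):
    """Tabular Q-learning: try actions, observe rewards, correct the value estimates."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    goal = N_STATES - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Explore at random some of the time, and whenever the estimates are tied.
            if rng.random() < EPSILON or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = min(max(state + ACTIONS[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0
            # The goal cell is terminal, so it contributes no future value.
            future = 0.0 if next_state == goal else max(q[next_state])
            # Move the estimate toward what was actually observed.
            q[state][action] += ALPHA * (reward + GAMMA * future - q[state][action])
            state = next_state
    return q


q_table = train()
greedy_policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
```

After training, greedy_policy lists the preferred move in each non-goal cell. In this toy setting the machine ends up preferring to step right everywhere, which it was never told to do; it worked that out from the rewards alone.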

Deep Reinforcement Learning experienced a breakthrough in 2015 when AlphaGo became the first computer program to defeat a professional Go player. In 2016, it went on to defeat a world champion.

DeepMind developed AlphaGo, a computer program designed specifically to play Go. To defeat Lee Sedol, an 18-time world champion, AlphaGo used a tree search that evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning on human expert moves and by reinforcement learning from self-play.

Since then, the team responsible for AlphaGo has developed AlphaGo Zero, a more robust version created without using data from human games. AlphaGo Zero won 100–0 against the previously published, champion-defeating AlphaGo.

A Business Application

Today, businesses are deploying Deep Reinforcement Learning to achieve optimal results in scenarios that have a considerable number of possible outcomes or variables. At Pactera EDGE, we did precisely that! We partnered with an enterprise logistics company to solve one of their most significant challenges: inefficient, gut-led decision-making on route and line haul operations.

Their goal was to prioritize cost-minimizing operations and unlock route efficiencies for goods transported via their 38-hub network, located across several Indian cities, without impacting the time it takes to transport the goods or the overall customer experience. 

To help them achieve that, we focused on one (not-so-simple) task: assisting their team of network design engineers to optimize their line haul network, in turn reducing the cost per kilogram (CPK) of transported items. 

As the client's measure of network efficiency was CPK of goods transported, we needed to additionally take into account the hiring costs associated with the vehicles used. From there, we aligned on the target to transport at least 95% of any given load with the lowest possible CPK. We set out to build a solution that could maximize vehicle load utilization and reduce costs.
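As a rough illustration of the two measures described above, the sketch below computes CPK and checks the 95% target for a single plan. The formula (total vehicle hiring cost divided by total kilograms transported), the function names, and every number in it are assumptions made for this example, not the client's actual definitions or data.

```python
def cost_per_kg(vehicle_hire_costs, kg_transported):
    """CPK, assumed here to be total vehicle hiring cost / total kilograms moved."""
    total_kg = sum(kg_transported)
    if total_kg <= 0:
        raise ValueError("no load was transported, so CPK is undefined")
    return sum(vehicle_hire_costs) / total_kg


def meets_load_target(kg_transported, kg_offered, target=0.95):
    """True when at least `target` (95% by default) of the offered load was moved."""
    return sum(kg_transported) >= target * kg_offered


# Hypothetical plan: two hired vehicles, costs in INR, loads in kilograms.
hire_costs = [12000.0, 8000.0]
loads = [4000.0, 1000.0]

plan_cpk = cost_per_kg(hire_costs, loads)                  # 20000 / 5000
plan_ok = meets_load_target(loads, kg_offered=5200.0)      # 5000 kg against a 4940 kg threshold
```

A plan is only interesting if both hold at once: leaving freight behind lowers cost, but it breaks the 95% commitment.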

The line haul network team had several strategies available to them to maximize load utilization and minimize CPK, including:

  • Adding/removing vehicles 
  • Upsizing or downsizing said vehicles
  • Detouring the vehicles' paths by adding stop points
  • Consolidating loads at major hubs and transporting them to the destination in separate vehicles

However, their extensive network of hubs, paired with ever-changing freight sizes and the various vehicle types, left them with more than 75,000 unique optimization opportunities. Because no automated, out-of-the-box solution existed, analyzing all of these opportunities was impossible, which left the team with no actionable intelligence and no clear way to get started.

Recognizing the opportunity at hand, we proposed a network optimization solution that leveraged a Deep Reinforcement Learning model, similar to AlphaGo Zero, to solve our 'infinite opportunity' challenge. 

As the human-knowledge data required for supervised learning was practically non-existent, we needed to deploy a model that could self-learn. We began by providing the model with all relevant data points and variations. From there, it was able to vigorously self-explore opportunities for optimization, observing the impact that each would have on the global CPK, allowing for informed, best-possible network design creation. 
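One way to picture how a self-learning model "observes the impact" of a change is through a reward signal. The sketch below is an assumption about how such a signal could be shaped, not a description of the reward we actually used: it pays the model for any drop in global CPK and subtracts a penalty when less than 95% of the load is transported. The function name and the penalty weight are hypothetical.

```python
def network_reward(cpk_before, cpk_after, fraction_transported,
                   target=0.95, penalty_weight=10.0):
    """Reward a CPK reduction; penalize falling short of the load target.

    penalty_weight is an illustrative tuning knob, not a known value.
    """
    improvement = cpk_before - cpk_after
    shortfall = max(0.0, target - fraction_transported)
    return improvement - penalty_weight * shortfall


# Same CPK improvement, with and without meeting the 95% target.
good_change = network_reward(4.0, 3.5, fraction_transported=0.97)
risky_change = network_reward(4.0, 3.5, fraction_transported=0.90)
```

With a signal like this, a change that cuts cost by dropping freight scores lower than one that cuts cost while still moving the load, which nudges exploration toward designs that respect both goals.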

In technical terms, the challenge of 'infinite opportunities' equated to a very large discrete action space: a set of more than 75,000 specific, well-defined actions, each one a particular optimization measure. However, the reinforcement learning algorithms typically used for discrete action spaces can only handle a limited number of possible actions. Certainly not the 75,000+ we were up against.
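To show how quickly a discrete action space of this kind grows, the sketch below enumerates every combination of lane, vehicle type, and optimization measure. Only the 38 hubs come from the project; the three vehicle types and the way the measures are encoded are assumptions for illustration, and the resulting count is not meant to reproduce the 75,000+ figure.

```python
from itertools import permutations, product

HUBS = range(38)                                   # the 38-hub network
VEHICLE_TYPES = ["small", "medium", "large"]       # hypothetical fleet categories
MEASURES = [                                       # the strategies, as hypothetical labels
    "add_vehicle", "remove_vehicle",
    "upsize_vehicle", "downsize_vehicle",
    "add_stop_point", "consolidate_at_hub",
]

# A lane is an ordered (origin, destination) pair of distinct hubs.
lanes = list(permutations(HUBS, 2))

# One discrete action = apply one measure to one vehicle type on one lane.
actions = list(product(lanes, VEHICLE_TYPES, MEASURES))

lane_count = len(lanes)        # 38 * 37
action_count = len(actions)    # 38 * 37 * 3 * 6
```

Even this simplified encoding yields tens of thousands of actions, and each added vehicle type or measure multiplies the total. An algorithm that must score every action individually at every step does not scale to that.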

By leveraging the Deep Deterministic Policy Gradient (DDPG) algorithm, a policy-based method, we were able to address the large action space. However, as DDPG works only with continuous actions (not discrete actions), we also needed to deploy a custom policy architecture. The custom policy architecture transformed the model's continuous outputs into discrete actions, allowing the model to learn and act in our environment efficiently.
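The idea of turning a continuous output into a discrete action can be sketched as a nearest-neighbor lookup: each discrete action is given a position in a continuous space, the policy proposes a point in that space, and the closest action is the one executed. This is one known way to bridge the gap and is shown here only as an assumption; it is not a description of our custom policy architecture. The action names, the two-dimensional embeddings, and the proposed point are all hypothetical.

```python
import math


def nearest_discrete_action(proto_action, action_embeddings):
    """Return the index of the discrete action closest to the continuous output."""
    return min(
        range(len(action_embeddings)),
        key=lambda i: math.dist(proto_action, action_embeddings[i]),
    )


# Hypothetical embedding: (change in vehicle count, change in vehicle size).
ACTION_NAMES = ["remove_vehicle", "add_vehicle", "downsize_vehicle", "upsize_vehicle"]
ACTION_EMBEDDINGS = [(-1.0, 0.0), (1.0, 0.0), (0.0, -1.0), (0.0, 1.0)]

# Stand-in for what a DDPG-style policy network might output for one state.
proto_action = (0.8, 0.3)

chosen = ACTION_NAMES[nearest_discrete_action(proto_action, ACTION_EMBEDDINGS)]
```

Here the proposed point lies closest to the "add_vehicle" embedding, so that is the action taken. The appeal of this design is that the policy only has to output a handful of numbers, however many discrete actions exist.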

So far, our solution has resulted in average cost savings of 400K INR per day, with manual processes that typically took over a month now completed in less than 2 hours. By applying Deep Reinforcement Learning, Pactera EDGE can help solve the most complex optimization problems.

A few of the business cases for which Deep Reinforcement Learning is well suited include:

  • Optimizing the way goods of different dimensions are loaded into truck cubes or cargo ships to use the maximum available space
  • Assigning the best manufacturing plant to incoming orders to minimize manufacturing, operational, and transportation costs
  • Optimizing data center operations by allocating server resources efficiently across workloads

To apply Deep Reinforcement Learning for business value, contact Pactera EDGE.