Big Data Analytics for Master Production Scheduling

The core of America’s automotive manufacturing dominance (Thomas, 2023) extends beyond the Big 3 OEMs (Ford, GM, and Stellantis); it is also found in the chain of tier automotive suppliers. Of the top 100 auto parts suppliers in the United States, the first 20 collectively generate more than $108B in revenue (Tenneco, 2021). Across the country, over 2 million people are employed in auto manufacturing jobs, more than 80% of whom work in supply manufacturing alone (Alliance for Automotive Innovation, 2022), earning approximately $135B in payroll compensation. In short, automotive suppliers manufacture the parts that drive the industry. Those profits come at a cost, however, as major shifts in the industry combine to hinder growth. Supply chain challenges, production changes from the OEMs, labor shortages, and external geopolitical dynamics are driving up suppliers’ costs. Add the rise of the electric vehicle (EV), and adaptation becomes critical to suppliers’ future growth. Many suppliers are expanding their product portfolios, differentiating between internal combustion engine (ICE) and EV products, and increasing the number of products they sell to OEMs (Tominaga, et al., 2023). Others can reassess their current business operations, relying on better analytics to trim costs from already thin earnings. One such big data strategy is Advanced Master Production Scheduling.

This paper discusses how a tier automotive supplier might use existing operational, labor, and production-related manufacturing data to build machine learning models that predict more accurate Master Production Schedules. Master scheduling is difficult because of the variability of the end-to-end supply chain and the randomness of workloads (Dormer, Gunther, & Gujjula, 2013). The paper and its associated data models attempt to overcome these difficulties by learning from years of historical information and developing a method to plan labor, production, and equipment schedules that work together, giving suppliers clearer forecasts. In choosing this effort, I am exploring a component of my business with the intent of piloting a process to help my tier automotive manufacturing customers.

Project Structure

In the automotive industry, the Master Production Schedule (MPS) is one of the primary business drivers. Other schedules and calendars are also used, and most offer useful short-term forecasts. A complementary schedule, Production Planning, resembles the MPS in that both provide time-bound supply and demand requirements. The biggest difference is that the MPS specifies what needs to be produced, how much, and when, while the Production Planning schedule specifies how much material is needed to meet the MPS demand. Production planning also works on a shorter horizon, usually days and weeks. The MPS is broader, often looking months and years ahead. Building a useful long-term forecast requires multiple inputs and sufficient historical data.

The variability of inputs also complicates the process. An MPS is influenced by supply and demand, product complexity, variations in production processes, the accuracy of quality control plans, labor resource constraints, the global supply of raw materials and component parts, and how streamlined the operations team is.

Because long-range forecasts must absorb highly variable inputs, a strong MPS is difficult to build. This project focuses on a subset of the overall MPS, using a combination of machine sensor data, production results, and fault-based analytics processed with advanced ML techniques to produce predictive outcomes. The ML techniques in this paper are presented not to derive a robust model that delivers an MPS methodology, but to demonstrate some of the capabilities ML can provide. The techniques used include the following:

  1. Multi-class classification with TensorFlow and scikit-learn.
  2. Ensemble prediction with Random Forests.
  3. Deep learning models with Neural Networks.

The outcomes of these techniques represent only a basic example, and further experimentation is required. In the following sections, I explain the questions this study sets out to answer.

Technology Description

Tier automotive customers generally use Enterprise Resource Planning (ERP) systems to manage data across business units and departments. According to the Panorama Consulting Group (2020), 33.66% of global ERP system use is in manufacturing (p. 4). The advent of cloud-based ERP solutions makes it simpler for smaller automotive suppliers to implement and deploy them. Nearly 63% of ERP consumers selected cloud-based systems because of the low maintenance needs of Software as a Service (SaaS) (p. 17). Furthermore, given the cohesive nature of ERP systems, the data for this study is confined to the schema and format of the ERP’s own databases and tables.

The table below outlines the data sets used in the project: each source, its size, the data types it contains, and a short note to aid understanding.

| Data Sets | Size | Types | Notes |
|---|---|---|---|
| Production Capacity | Med (1600 Records) | Str / Int / Dec | Labor to Demand |
| Production History | XLrg (>4000 Records) | Str / Int / Dec / Long | Historical production |
| OEE | Med (1600 Records) | Int / Dec | Performance Efficiency |
| Equipment Failures | Med (1600 Records) | Str / Int / Dec | Machine Downtime |
| Raw Machine Data | XLrg (>4000 Records) | Str / Dec | Raw Sensor Data |

Data Set Descriptions

Production Capacity –

Each manufacturing facility has a finite production capacity: the maximum output that its equipment, labor, and resources can achieve together.

Sample:

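| Day Number | Production (hrs) | Capacity (hrs) | % Availability |
|---|---|---|---|
| 1 | 20.25 | 24 | 84.38% |
| 2 | 19.75 | 24 | 82.29% |
| 3 | 21 | 24 | 87.50% |
| 4 | 8.75 | 24 | 36.46% |
| 5 | 17.25 | 24 | 71.88% |
| 6 | 23.25 | 24 | 96.88% |
| 7 | 22.75 | 24 | 94.79% |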
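
The % Availability column in the sample appears to be Production (hrs) divided by Capacity (hrs). The short Python sketch below reproduces that arithmetic for the seven sample days; the function and variable names are illustrative assumptions, not part of the study's code.

```python
# Sample rows from the Production Capacity table: (day, production hrs, capacity hrs).
CAPACITY_SAMPLE = [
    (1, 20.25, 24), (2, 19.75, 24), (3, 21.0, 24), (4, 8.75, 24),
    (5, 17.25, 24), (6, 23.25, 24), (7, 22.75, 24),
]


def availability(production_hours, capacity_hours):
    """Return availability as a fraction of capacity (assumed formula)."""
    if capacity_hours <= 0:
        raise ValueError("capacity_hours must be positive")
    return production_hours / capacity_hours


for day, produced, capacity in CAPACITY_SAMPLE:
    print(f"Day {day}: {availability(produced, capacity):.2%}")
```

Keeping the value as a fraction and formatting it only for display makes it easier to reuse as the Availability factor in later calculations.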

Production History –

Manufacturing production history records the parts produced in the past. The data set is valuable for building a comprehensive time series that reveals a manufacturer’s capabilities.

Sample:

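| Day Number | Pieces Produced | Pieces Scrapped | % Quality Product |
|---|---|---|---|
| 1 | 19850 | 1830 | 91.56% |
| 2 | 19160 | 790 | 96.04% |
| 3 | 18900 | 870 | 95.60% |
| 4 | 8710 | 1150 | 88.34% |
| 5 | 18110 | 740 | 96.07% |
| 6 | 25110 | 900 | 96.54% |
| 7 | 20480 | 510 | 97.57% |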
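
The sample percentages are consistent with % Quality Product = Pieces Produced / (Pieces Produced + Pieces Scrapped), which suggests that Pieces Produced counts good pieces only. That reading is an inference from the numbers, not something the data set documents. A minimal Python sketch under that assumption, with illustrative names:

```python
# Sample rows from the Production History table: (day, pieces produced, pieces scrapped).
HISTORY_SAMPLE = [
    (1, 19850, 1830), (2, 19160, 790), (3, 18900, 870), (4, 8710, 1150),
    (5, 18110, 740), (6, 25110, 900), (7, 20480, 510),
]


def quality_rate(pieces_produced, pieces_scrapped):
    """Share of good pieces, assuming 'produced' excludes scrapped pieces."""
    total = pieces_produced + pieces_scrapped
    if total <= 0:
        raise ValueError("no pieces recorded")
    return pieces_produced / total


for day, produced, scrapped in HISTORY_SAMPLE:
    print(f"Day {day}: {quality_rate(produced, scrapped):.2%}")
```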

OEE (Overall Equipment Effectiveness) –

OEE aggregates capacity, historical information, and expected output to represent the efficiency of the equipment that runs the shop floor. It accounts for three factors, availability, performance, and quality, and can be summarized as OEE = A × P × Q, or Availability × Performance × Quality (Vorne Industries, 2023). The dataset used for the regression studies compares Availability to the overall recorded OEE.
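
As a minimal sketch of the A × P × Q relationship, the Python snippet below multiplies the three factors and also backs out Performance when the other values are known. The function names and the example values are illustrative assumptions, not figures from the study's data.

```python
def oee(availability, performance, quality):
    """Return OEE as the product of the three factors (each given as a ratio)."""
    factors = {"availability": availability, "performance": performance, "quality": quality}
    for name, value in factors.items():
        if value < 0:
            raise ValueError(f"{name} must be non-negative")
    return availability * performance * quality


def implied_performance(oee_value, availability, quality):
    """Back out the Performance factor when OEE, Availability, and Quality are known."""
    if availability <= 0 or quality <= 0:
        raise ValueError("availability and quality must be positive")
    return oee_value / (availability * quality)


# Hypothetical shift: 90% availability, 95% performance, 99% quality.
example = oee(0.90, 0.95, 0.99)
print(f"OEE: {example:.2%}")
```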

Sample:

| Availability | OEE |
|---|---|
| 0.84375 | 75.73% |
| 0.822916667 | 76.67% |
| 0.875 | 75.28% |
| 0.364583333 | 32.06% |
| 0.71875 | 72.50% |
| 0.96875 | 101.00% |
| 0.947916667 | 83.26% |
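
To illustrate what a regression of OEE on Availability might look like, the sketch below fits an ordinary least-squares line to the seven sample rows using only the Python standard library. The choice of simple linear regression and every name in the code are assumptions made for illustration; seven rows are far too few to draw conclusions from.

```python
# Sample rows from the OEE table: availability (ratio) and recorded OEE (percent).
AVAILABILITY = [0.84375, 0.822916667, 0.875, 0.364583333, 0.71875, 0.96875, 0.947916667]
OEE_PERCENT = [75.73, 76.67, 75.28, 32.06, 72.50, 101.00, 83.26]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(AVAILABILITY, OEE_PERCENT)
print(f"OEE% is roughly {slope:.1f} * availability + {intercept:.1f}")
```

A positive slope is expected here, since Availability is one of the three factors multiplied into OEE.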

Equipment Failures –

The Equipment Failures dataset records machine downtime. Downtime falls into categories, some planned and others unplanned. The dataset holds the records needed to identify when and how downtime occurred and, where available, what actions were taken to address it.

Sample:

| Machine_ID | Start_Date | Start_Time | End_Time | Total_Down | Reason_Code | Planned |
|---|---|---|---|---|---|---|
| M_0017 | 1/3/2019 | 07:11:22 | 01/03/2019 07:46:40 | 00:35:18 | Die Change | 1 |
| M_0016 | 1/4/2019 | 10:44:05 | 01/04/2019 11:17:08 | 00:33:03 | Off | 1 |
| M_0016 | 1/6/2019 | 01:09:47 | 01/06/2019 02:47:44 | 01:37:57 | Die Change | 1 |
| M_0019 | 1/8/2019 | 07:18:56 | 01/08/2019 07:51:01 | 00:32:05 | Planned Maintenance | 1 |
| M_0002 | 1/12/2019 | 21:52:15 | 01/12/2019 22:25:51 | 00:33:36 | Product Error | 0 |
| M_0017 | 1/13/2019 | 01:49:48 | 01/13/2019 02:34:35 | 00:44:47 | Line Stop | 0 |
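
The Total_Down column can be recomputed from the start and end stamps, which is a useful sanity check before separating planned from unplanned downtime. The Python sketch below does this for the six sample rows; the tuple layout and function names are illustrative assumptions about how the extract might be loaded.

```python
from collections import defaultdict
from datetime import datetime

STAMP_FORMAT = "%m/%d/%Y %H:%M:%S"

# Sample rows: (machine, start date, start time, end stamp, reason code, planned flag).
EVENTS = [
    ("M_0017", "1/3/2019", "07:11:22", "01/03/2019 07:46:40", "Die Change", 1),
    ("M_0016", "1/4/2019", "10:44:05", "01/04/2019 11:17:08", "Off", 1),
    ("M_0016", "1/6/2019", "01:09:47", "01/06/2019 02:47:44", "Die Change", 1),
    ("M_0019", "1/8/2019", "07:18:56", "01/08/2019 07:51:01", "Planned Maintenance", 1),
    ("M_0002", "1/12/2019", "21:52:15", "01/12/2019 22:25:51", "Product Error", 0),
    ("M_0017", "1/13/2019", "01:49:48", "01/13/2019 02:34:35", "Line Stop", 0),
]


def downtime_seconds(start_date, start_time, end_stamp):
    """Seconds between the start date/time pair and the end timestamp."""
    start = datetime.strptime(f"{start_date} {start_time}", STAMP_FORMAT)
    end = datetime.strptime(end_stamp, STAMP_FORMAT)
    if end < start:
        raise ValueError("end precedes start")
    return int((end - start).total_seconds())


def downtime_by_planned_flag(events):
    """Total downtime in seconds, keyed by the Planned flag (1 = planned, 0 = unplanned)."""
    totals = defaultdict(int)
    for _machine, start_date, start_time, end_stamp, _reason, planned in events:
        totals[planned] += downtime_seconds(start_date, start_time, end_stamp)
    return dict(totals)


totals = downtime_by_planned_flag(EVENTS)
print(f"Planned: {totals[1] / 60:.1f} min, unplanned: {totals[0] / 60:.1f} min")
```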

Raw Machine Data –

The raw machine data provides a population of sensor outputs for multiple pieces of equipment. The information is studied with ML to cluster and categorize potential correlations. It also illustrates how manufacturers struggle with large and puzzling data sets.

Sample:

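| Machine_ID | Failure | AVG. | Sensor_1 | Sensor_2 | Sensor_3 | Sensor_4 |
|---|---|---|---|---|---|---|
| M_0001 | 0 | 6.934753 | 13.49161 | 9.318098 | 10.89609 | 8.401777 |
| M_0002 | 0 | 5.50641 | 6.352284 | 12.18626 | 7.757645 | 5.330642 |
| M_0003 | 0 | 5.689135 | 2.982099 | 3.522445 | 9.947512 | 3.76125 |
| M_0004 | 0 | 5.467937 | 10.9881 | 2.669871 | 5.589539 | 6.975189 |
| M_0005 | 0 | 5.508549 | 13.18174 | 1.179302 | 5.485443 | 5.973635 |
| M_0006 | 0 | 5.63369 | 11.99635 | 6.031708 | 3.979602 | 14.74294 |
| M_0007 | 1 | 4.510002 | 6.984421 | 6.256924 | 6.688669 | 0.43979 |
| M_0008 | 1 | 5.064395 | 7.813207 | 10.88588 | 12.19277 | 9.667473 |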

Platform Technologies

The platform technologies for analysis include the following:

| Technologies | Size | Notes |
|---|---|---|
| Plex ERP | XLrg | ERP, MES, QMS |
| SQL | XLrg | Sql db, tbl, SQL scripts |
| SSMS | N/A | Managing SQL data |
| ODBC | N/A | Connector to ERP |
| API | N/A | Connector to ERP |
| Apache Spark | N/A | Platform for ML |
| Databricks | N/A | Platform for ML |

Plex ERP is a cloud-based enterprise resource planning system tailored for manufacturing operations, with integration across the manufacturing process. From order management to production scheduling, inventory control, and quality management, it streamlines operations with real-time insights and process automation. Its ability to manage intricate manufacturing data makes it a good fit for the automotive industry. For this study, data was extracted from Plex ERP in anonymized form.

SQL and SSMS are the project’s fundamental tools for database management and querying. SQL, the standard language for managing relational databases, handles data retrieval, manipulation, and schema modification, while SSMS (SQL Server Management Studio) provides an integrated environment for managing SQL Server databases. These tools were used to store the data extracted from Plex ERP.

For data connectivity, ODBC provides a standard interface for connecting to databases, while APIs let different software applications communicate and share data. In this project, ODBC established the connection between Plex ERP and the SQL Server database and tables managed through SSMS. Plex ERP’s API libraries, which expose various data sources through RESTful endpoints over HTTPS, further extended data connectivity.

Apache Spark, an open-source data processing engine, paired with the Scala programming language, offers a fast and versatile cluster-computing framework for big data processing. These tools enable the large-scale data processing, analysis, and transformation the project requires.

Databricks, a company founded by the creators of Apache Spark, offers a commercial web-based platform built on Spark. It manages processing clusters and supports ML pipeline development in an integrated environment. The platform was chosen for its cohesion and compatibility with Apache Spark.

ML Modeling Process

Each of the previously identified datasets is flush with possibilities. Assessing the potential needs of tier automotive suppliers based on the datasets is an enjoyable and important part of the modeling process. This paper focuses on the following questions:

  1. How can historical operational, financial, and production-related data be leveraged to improve the accuracy of Master Production Schedules?
  2. What machine learning algorithms are most effective in predicting and managing variability in tier automotive manufacturing facilities?
  3. What predictive models can assist in the identification of line stoppages, labor shortages, or equipment failures?

I follow Richard McElreath’s advice to “start with the causes of the data” (McElreath, 2023). The scope of these questions calls for Directed Acyclic Graphs (DAGs), so the study starts with a causal model.

Start with Production History and Operational Records. Each of these populations of information directly affects Manufacturing Capacity and Production Schedules. Production history dictates what is possible; operational records define what is negotiable. Add Equipment Availability and Labor Availability, and Manufacturing Capacity becomes heavily influenced by these additional sources. For example, if labor is short, manufacturing capacity is short. If equipment is down, capacity is down. These two, in combination with Manufacturing Capacity, impact Production Quality. Quality suffers or flourishes depending on whether the three primary drivers work in concert. Production Quality in turn determines the Production History, while Equipment Availability and Labor Availability influence the Operational Records, where financial details (labor and overhead) are considered. By understanding the DAG within the scope of my questions, I can pursue machine learning models that are useful.
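
The causal structure described above can be written down as a small graph and checked for cycles. The Python sketch below is one possible encoding, not the author's model. Read literally, the edge from Production Quality back to Production History would close a loop through Manufacturing Capacity, so the sketch assumes that quality feeds the next period's history. That node name, and the function names, are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Cause -> effects, as read from the description of the causal model.
EDGES = {
    "Production History": ["Manufacturing Capacity", "Production Schedules"],
    "Operational Records": ["Manufacturing Capacity", "Production Schedules"],
    "Equipment Availability": ["Manufacturing Capacity", "Production Quality", "Operational Records"],
    "Labor Availability": ["Manufacturing Capacity", "Production Quality", "Operational Records"],
    "Manufacturing Capacity": ["Production Quality"],
    # Assumption: quality shapes the NEXT period's history, which keeps the graph acyclic.
    "Production Quality": ["Production History (next period)"],
}


def causal_order(edges):
    """Return the nodes ordered so that every cause precedes its effects."""
    predecessors = {}
    for cause, effects in edges.items():
        predecessors.setdefault(cause, set())
        for effect in effects:
            predecessors.setdefault(effect, set()).add(cause)
    return list(TopologicalSorter(predecessors).static_order())


def is_acyclic(edges):
    """True when the edge list forms a valid DAG."""
    try:
        causal_order(edges)
        return True
    except CycleError:
        return False


print(" -> ".join(causal_order(EDGES)))
```

An ordering like this gives a natural sequence for feature construction: upstream variables such as labor and equipment availability are prepared before the capacity and quality targets they influence.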

To Continue… please read & download the report here:

Published by Benjamin Bird

