Insights into Recent Engineering Practices for Machine Learning
In our recent exploration of the latest trends in ML engineering best practices, we identified five key areas of focus. In this article, we zoom in on the first of these areas: engineering practices for ML and Responsible ML/AI (RAI).
This article is the second in a series exploring recent advancements in ML engineering best practices, highlighting emerging challenges, and discussing under-represented topics. The current analysis introduces new practices across five domains: requirements engineering, architecture design, AutoML, data engineering, and responsible ML/AI, and discusses them in relation to the current catalogue of practices.
We find that while progress has been made in some areas, particularly around RAI, other domains such as data engineering remain comparatively underdeveloped. Furthermore, despite increased attention to the unique challenges of ML systems, many practices are still adaptations of traditional software engineering methods rather than novel solutions.
Requirements Engineering
Requirements engineering emerged as one of the most discussed topics in the recent literature on engineering practices for ML. However, it is notably absent from the current catalogue of practices, which focuses more on the development and operation of ML-based systems.
We observe that incorporating ML components has not fundamentally changed the core processes of requirements elicitation and modeling. These processes still rely on established techniques such as interviews, card sorting, and focus groups for elicitation, as well as frameworks like STRIDE, data flow diagrams, and Goal-Oriented Requirements Engineering (GORE) for modeling [90, 93].
Nevertheless, ML introduces a more dynamic process, incorporating broader constraints related to RAI and emerging topics such as compliance with upcoming regulations. The primary challenge lies in specifying concrete requirements upfront, as it is difficult to predict algorithm performance from the available data, or to estimate and measure RAI requirements such as degrees of explainability or interpretability [90]. This uncertainty mandates iterative and adaptive approaches to requirements engineering.
Some emerging practices aim to address these challenges. For example, developing user personas tailored to RAI helps align ML systems with ethical expectations [34]. Additionally, writing ethical user stories, eliciting uncertainty levels, and defining clear data requirements — such as data volume, sources, and handling protocols — are considered critical for RAI [90]. A pyramidal model proposed in [78] emphasizes the need for requirements to drive ethical and technical considerations across the entire ML lifecycle, from data curation to deployment. Despite these advancements, the literature highlights a lack of actionable, novel methods, with ethical user stories and RAI-focused personas being notable but still limited contributions [90].
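To make these artifacts more tangible, the sketch below shows one possible way to capture an ethical user story and an associated data requirement as structured records. It is a minimal illustration; the field names and values are our own assumptions rather than a template prescribed by [90] or [78].

```python
from dataclasses import dataclass, field

# Illustrative structures for RAI-oriented requirements artifacts.
# All field names are assumptions made for this sketch.

@dataclass
class EthicalUserStory:
    role: str                       # "As a <role> ..."
    goal: str                       # "... I want <goal> ..."
    ethical_concern: str            # "... so that <RAI concern> is respected"
    acceptance_criteria: list = field(default_factory=list)

@dataclass
class DataRequirement:
    source: str                     # provenance of the data
    min_volume: int                 # smallest dataset size deemed sufficient
    handling_protocol: str          # retention, anonymisation, access rules

story = EthicalUserStory(
    role="loan applicant",
    goal="receive an explanation for an automated rejection",
    ethical_concern="the decision is interpretable and contestable",
    acceptance_criteria=["explanation names the top contributing features"],
)

requirement = DataRequirement(
    source="internal credit history records",
    min_volume=50_000,
    handling_protocol="pseudonymised at ingestion, 24-month retention",
)
```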
Architectural Design
Architectural design is another area where the current catalogue is lacking (despite our previous exploration of architectural tactics for ML-based systems), but one that has received more attention recently. The literature reveals the development of new architectural pattern catalogues, which include novel patterns such as encapsulating ML components within traditional rule-based safeguards, alongside more traditional architectural patterns adapted for ML [82]. These designs often employ a layered structure to separate concerns related to ML-based components, or integrate and deploy multi-model systems via vertical or horizontal services that facilitate sequential or parallel model processing [82].
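As a minimal illustration of the safeguard pattern, the sketch below wraps a model behind a deterministic rule layer that vets each prediction before it reaches downstream components. The model interface and plausibility bounds are assumptions made for this example, not details taken from [82].

```python
# A rule-based safeguard around an ML component: predictions outside a
# domain-defined plausibility envelope are replaced by a safe default.

class SafeguardedModel:
    def __init__(self, model, lower_bound, upper_bound, safe_default):
        self.model = model
        self.lower_bound = lower_bound
        self.upper_bound = upper_bound
        self.safe_default = safe_default

    def predict(self, features):
        prediction = self.model.predict(features)
        # Deterministic rule layer: only pass through plausible values.
        if self.lower_bound <= prediction <= self.upper_bound:
            return prediction
        return self.safe_default
```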
Additionally, novel architectures like hierarchical fallback mechanisms improve system resilience by allowing systems to switch to alternative models or strategies in the event of failures [21]. Nevertheless, there remains significant room for improvement in this domain, particularly as ML systems have evolved from just integrating ML-based components to increasingly relying on ML-driven processes or agents [29].
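A hierarchical fallback can be sketched as an ordered chain of models ending in a deterministic default. The error handling below is our own assumption, since [21] does not prescribe a specific implementation.

```python
# A fallback chain: try the most capable model first and degrade
# gracefully towards a deterministic last resort on failure.

class FallbackChain:
    def __init__(self, models, default_answer):
        self.models = models            # ordered from primary to simplest
        self.default_answer = default_answer

    def predict(self, features):
        for model in self.models:
            try:
                return model.predict(features)
            except Exception:
                continue                # switch to the next alternative
        return self.default_answer      # deterministic strategy of last resort
```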
AutoML
As discussed in the introductory article, AutoML practices remain under-studied despite their potential to automatically optimise various stages of the ML pipeline. Recent literature provides guidelines on scheduling and executing AutoML jobs, complementing general AutoML practices with structured processes for automation [63]. For example, this involves explicitly modeling pipelines and loops for scheduling and evaluating multiple configurations, as well as acquiring and analyzing metadata [63]. Furthermore, automatically adapting model parameters in training loops, such as dynamically adjusting the learning rate or other training parameters, adds structure to AutoML processes [63].
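A minimal sketch of such an explicitly modelled loop is shown below: it schedules configurations, evaluates each one, and records metadata for later analysis. The `train_and_evaluate` function is a hypothetical stand-in; a real job would train an actual model.

```python
import itertools
import time

def train_and_evaluate(learning_rate, batch_size):
    """Hypothetical stand-in for a real training job; returns a score."""
    return 1.0 / (1.0 + learning_rate * batch_size)  # placeholder metric

# Explicitly modelled search space and scheduling loop.
search_space = {"learning_rate": [1e-3, 1e-2], "batch_size": [32, 128]}

runs = []
for learning_rate, batch_size in itertools.product(*search_space.values()):
    start = time.time()
    score = train_and_evaluate(learning_rate, batch_size)
    runs.append({
        "learning_rate": learning_rate,
        "batch_size": batch_size,
        "score": score,
        "duration_s": time.time() - start,   # metadata acquired per run
    })

best_run = max(runs, key=lambda run: run["score"])
```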
Additionally, the literature presents practices that leverage AutoML to improve fairness without compromising performance, for example by prioritising equitable outcomes for population groups and sub-groups through automated hyper-parameter tuning or model selection [70].
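One way to realise this in a model-selection loop, sketched below under our own assumptions about the available metrics, is to filter candidates by their per-group accuracy gap before ranking them by overall accuracy.

```python
def select_fair_model(candidates, max_group_gap=0.05):
    """Pick the most accurate candidate whose per-group accuracy gap is small.

    Each candidate is a dict with an overall 'accuracy' and a
    'group_accuracy' mapping; both metrics are illustrative assumptions.
    """
    fair = [
        c for c in candidates
        if max(c["group_accuracy"].values())
        - min(c["group_accuracy"].values()) <= max_group_gap
    ]
    pool = fair or candidates   # fall back to all candidates if none qualify
    return max(pool, key=lambda c: c["accuracy"])

candidates = [
    {"accuracy": 0.91, "group_accuracy": {"group_a": 0.93, "group_b": 0.80}},
    {"accuracy": 0.89, "group_accuracy": {"group_a": 0.90, "group_b": 0.88}},
]
best = select_fair_model(candidates)   # selects the second, fairer model
```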
Nevertheless, there is still significant potential for improvement in AutoML practices, particularly in expanding capabilities beyond hyperparameter tuning. Future practices could focus on addressing automatic data quality evaluation and selection (especially for large datasets) or leveraging AutoML for MLOps, such as using AutoML for monitoring and model deployment or replacement. Furthermore, employing AutoML for compliance purposes could further improve its utility and integration into broader workflows.
Data Engineering
Data engineering has also been identified as an under-studied topic, despite its importance to developing robust ML systems. Recent literature investigates data collection objectives by introducing a data requirements goal model, which ensures data is collected systematically to meet overall system objectives [13].
Meanwhile, data quality anti-patterns, such as missing values, schema violations, imbalanced class distributions, or label overlaps, highlight common pitfalls that impact model performance [36]. Best practices in this area include checking the ratio of rows to columns in tabular data, verifying data distributions, and addressing label-related issues to ensure robust data pipelines [36]. These practices complement the more general practice on input data validation from the existing catalogue.
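The sketch below translates a few of these checks into code for a tabular dataset; the thresholds are illustrative assumptions rather than values recommended in [36].

```python
import pandas as pd

def check_data_quality(df: pd.DataFrame, label_col: str) -> list:
    """Flag a few common data quality anti-patterns; thresholds are illustrative."""
    warnings = []

    # Rows-to-columns ratio: too few rows per feature is a red flag.
    if len(df) < 10 * len(df.columns):
        warnings.append("fewer than 10 rows per column")

    # Missing values per column.
    missing = df.isna().mean()
    for column in missing[missing > 0.2].index:
        warnings.append(f"column '{column}' is more than 20% missing")

    # Imbalanced class distribution.
    class_shares = df[label_col].value_counts(normalize=True)
    if class_shares.iloc[0] > 0.9:
        warnings.append("dominant class covers more than 90% of rows")

    return warnings
```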
A notable gap persists in data engineering practices, particularly concerning the engineering of scalable data pipelines.
Responsible ML/AI (RAI)
Practices for RAI components were the most prevalent, which is encouraging given the recent emphasis on ethical considerations and compliance with upcoming legislation.
Comprehensive catalogues of fairness-aware practices that span the ML lifecycle have been proposed, ranging from fair data balancing and multitask learning to maximize minority group representation, to fairness regularization, hyper-parameter tuning, and post-training calibration [35]. These practices highlight that fairness spans multiple concerns across the Data and Training practices, and can be integrated with many practices from the existing catalogue.
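As one concrete instance of the data-stage practices, the sketch below computes inverse-frequency sample weights so that minority groups contribute proportionally more to the training loss. The group encoding is an assumption for this example, not a scheme taken from [35].

```python
import numpy as np

def group_balanced_weights(groups: np.ndarray) -> np.ndarray:
    """Return one weight per sample, inversely proportional to its group's size."""
    values, counts = np.unique(groups, return_counts=True)
    weight_per_group = {
        group: len(groups) / (len(values) * count)
        for group, count in zip(values, counts)
    }
    return np.array([weight_per_group[group] for group in groups])

groups = np.array(["a", "a", "a", "b"])
weights = group_balanced_weights(groups)   # [0.67, 0.67, 0.67, 2.0]
```

Most scikit-learn estimators accept such weights through the sample_weight argument of fit, which makes this practice straightforward to combine with existing training practices.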
Beyond fairness, a RAI pattern catalogue addresses broader concerns, including industrial-level governance, leadership commitment to RAI, and the establishment of ethics committees [95]. Standardized reporting and advanced risk management, such as failure mode and effects analysis, are particularly promising for developing responsible-by-design systems. These patterns extend to organizational and team-level practices, ensuring RAI principles are embedded throughout the development process.
Embedding explainability into ML artifacts is critical for transparency, with techniques such as summary plots and dependence plots providing insights into feature importance and model behavior [69].
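For illustration, the sketch below produces both plot types with the SHAP library and a tree-based model; [69] does not mandate this particular toolchain, and the dataset and model are our own choices.

```python
import shap
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor

# Fit a simple model on a public tabular dataset.
data = fetch_california_housing(as_frame=True)
X, y = data.data, data.target
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Explain a subsample to keep the computation quick.
sample = X.iloc[:500]
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(sample)

shap.summary_plot(shap_values, sample)               # global feature importance
shap.dependence_plot("MedInc", shap_values, sample)  # effect of a single feature
```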
Nevertheless, fairness practices prove to be context-dependent, as operational domains significantly impact their effectiveness, making it still challenging to abstract and recommend universal solutions [12].
Conclusions
Our review of recent practices in ML engineering highlights both encouraging progress and persistent gaps. While domains like RAI have seen the development of comprehensive practice catalogues, other critical areas, particularly requirements engineering, AutoML, and data engineering, remain less mature, with many practices still emerging or lacking actionable guidance.
The inherently dynamic and uncertain nature of ML continues to challenge traditional engineering approaches, especially in specifying requirements, designing resilient architectures, and ensuring data quality at scale. Moving forward, there is a clear need for more systematic, ML-specific engineering practices that go beyond adaptations of classical software development methods.
Broader empirical validation is also necessary to provide evidence on how well these practices generalize across industries, scales, and ML application domains.
Notes
The reference notation is based on the references available in this article.