Following the recent recognition by the Journal of Systems and Software (JSS) of our article on engineering best practices for ML as one of its two papers of the year for 2024, we conducted a follow-up study analyzing the latest trends in ML engineering best practices. Since our initial collection of practices was finalized around 2021, we identified and reviewed 108 new articles published from 2022 to 2025 in leading software engineering venues and on arXiv.

This article is the first in a series designed to reveal the latest developments in ML engineering best practices, pinpoint new challenges, and discuss topics that have received less attention. Our analysis identifies five major research directions in the recent literature and three underrepresented areas.

We discuss these findings, give details on article selection and analysis, and present a complete list of references categorized by the identified topics.

Data Collection

To gather relevant articles, we initially parsed the titles and abstracts of all published articles from 2022 to 2025 in prominent venues in the field, such as ICSE, ESEM, TSE, and others. These lists were obtained either directly from the publication sites or through dblp. For example, dblp conveniently compiles all TSE publications from a single year onto one page, which simplified our parsing process.

Next, we parsed all articles from arXiv’s computer science software engineering (cs.SE) category between 2022 and 2025 using the official API.
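For illustration, a minimal sketch of this step is shown below. It queries the public export endpoint and reads the returned Atom feed with feedparser; the page size, sleep interval, and client-side year cutoff are our own illustrative choices rather than part of the study protocol.

```python
import time
import urllib.parse
import urllib.request

import feedparser  # the arXiv API returns an Atom feed

BASE_URL = "http://export.arxiv.org/api/query"

def fetch_cs_se(start: int, page_size: int = 100) -> list:
    """Fetch one page of cs.SE submissions, newest first."""
    query = urllib.parse.urlencode({
        "search_query": "cat:cs.SE",
        "start": start,
        "max_results": page_size,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    })
    with urllib.request.urlopen(f"{BASE_URL}?{query}") as response:
        return feedparser.parse(response.read()).entries

papers, start, done = [], 0, False
while not done:
    entries = fetch_cs_se(start)
    if not entries:
        break
    for entry in entries:
        year = int(entry.published[:4])
        if year < 2022:   # results are newest first, so we can stop here
            done = True
            break
        if year <= 2025:
            papers.append({"title": entry.title, "abstract": entry.summary, "year": year})
    start += len(entries)
    time.sleep(3)         # stay well within the API's rate limits
```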

The titles and abstracts were first filtered automatically to check for the presence of keywords such as “ML”, “machine learning”, “artificial intelligence”, and others. The remaining titles were manually reviewed, resulting in the selection of 108 relevant articles, which are included in the references section.
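A minimal sketch of this keyword filter is shown below; the keyword list here is an illustrative subset of the one we actually used, and the matching is deliberately permissive since false positives are removed during the manual review that follows.

```python
import re

# Illustrative subset of the keyword list; the real filter used more terms.
KEYWORDS = [r"\bML\b", r"machine[- ]learning", r"artificial intelligence",
            r"\bAI\b", r"deep[- ]learning", r"\bLLMs?\b"]
PATTERN = re.compile("|".join(KEYWORDS), re.IGNORECASE)

def is_candidate(title: str, abstract: str) -> bool:
    """Keep a paper if any keyword appears in its title or abstract."""
    return bool(PATTERN.search(f"{title} {abstract}"))

# Example: filter a couple of records gathered in the previous step.
records = [
    {"title": "Code Smells for Machine Learning Applications", "abstract": "..."},
    {"title": "A Study of Build Systems", "abstract": "No learning involved."},
]
candidates = [r for r in records if is_candidate(r["title"], r["abstract"])]
print([r["title"] for r in candidates])  # -> only the ML-related title
```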

Additionally, we attempted to use the phi-4 model to filter relevant articles by providing the title and abstract in a prompt. The prompt asked whether the article pertains to (a) software engineering for machine learning (SE4ML), (b) machine learning for software engineering (ML4SE), or (c) solely software engineering without any machine learning components. Unfortunately, the results were unsatisfactory as the model often failed to distinguish between SE4ML and ML4SE. More advanced prompting techniques, such as providing explicit examples, might have yielded better results. However, manually reviewing the titles ensured the accuracy of our selection.
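For completeness, the sketch below shows the shape of this classification prompt, written against an OpenAI-compatible chat endpoint; the server URL, model name, and exact wording are illustrative assumptions, and as noted above, this zero-shot setup was not reliable enough for our purposes.

```python
from openai import OpenAI

# phi-4 served behind an OpenAI-compatible endpoint (e.g., a local vLLM or
# Ollama server); the URL and model name below are illustrative assumptions.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

PROMPT = """You are classifying research papers.
Given the title and abstract below, answer with exactly one letter:
(a) software engineering for machine learning (SE4ML)
(b) machine learning for software engineering (ML4SE)
(c) software engineering with no machine learning component

Title: {title}
Abstract: {abstract}
Answer:"""

def classify(title: str, abstract: str) -> str:
    """Return 'a', 'b', or 'c' according to the model's answer."""
    response = client.chat.completions.create(
        model="phi-4",
        messages=[{"role": "user",
                   "content": PROMPT.format(title=title, abstract=abstract)}],
        temperature=0.0,
    )
    return response.choices[0].message.content.strip().lower()[:1]
```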

To identify trends from the selected articles, we began by reviewing and categorizing them based on the primary engineering topics they addressed. Subsequently, for each topic, we identified subtopics to provide a more detailed analysis. This resulted in five core topics, discussed below.

1. Engineering Practices for ML and Responsible ML

The development of engineering practices for ML is steadily evolving, with increasing emphasis on Responsible ML. Among general ML practices, requirements engineering emerges as the most extensively represented area, with studies addressing elicitation techniques for general ML systems [90, 93], frameworks tailored to Responsible ML [1, 83], and broader mapping efforts that explore the landscape of ML-specific requirements engineering [34, 78]. Closely following are contributions in architecture design, where researchers propose novel architectures [21] and reusable architectural patterns for AI and ML systems [29, 82, 106].

Other areas, such as AutoML practices [63, 70] and data engineering [13, 36], remain significantly underrepresented despite their foundational role in successful ML development. This gap is addressed further in the challenges section, where the need for more robust and systematic data practices is discussed. Additionally, a notable portion of the literature focuses on practices within diverse development contexts [3, 14, 18, 25, 73, 80, 81, 86, 87], highlighting how real-world constraints shape ML engineering.

The most substantial body of work centers on Responsible ML – covering key concerns such as fairness, bias mitigation, transparency, security, and ethical governance. Within this domain, several comprehensive catalogues of best practices and design patterns have been proposed to guide practitioners [10, 12, 17, 35, 69, 95, 96]. Specific themes such as Green ML/AI [31, 43, 51] and AI governance [2, 23, 33, 38, 95] are also gaining traction, reflecting a broader shift toward aligning innovation with long-term social, environmental, and ethical responsibilities.

2. MLOps, Model Management, and Deployment

The operationalization of ML models emerged as another area of focus, with MLOps practices gaining considerable traction in both research and industry. A substantial body of work investigates the challenges, perceptions, and adoption of MLOps, highlighting gaps in automation, tooling, collaboration, and quality assurance across ML pipelines [32, 37, 40, 48, 50, 52, 76, 79, 89, 91, 94, 104, 108].

A second group of studies contributes to the development of MLOps practices and pipeline design, proposing architectural frameworks and best practices for automating training, deployment, and explainability within production settings [45, 60, 72, 103]. Complementing this, research on model management and maintenance addresses versioning, reuse, and governance – key to sustainable ML operations in dynamic environments [16, 24, 26].

Tooling support is another growing area, with emerging solutions aimed at improving collaborative model development and version control, such as MGit and Git-Theta [68, 71].

3. Off-the-Shelf Models and Agentic Architectures

Agentic architectures are emerging as a dynamic paradigm in ML system design, where autonomous agents interact, collaborate, and adapt within complex, often real-time environments. This approach is increasingly powered by foundation models and supported through multi-agent coordination frameworks. The studies identified investigate how to design [44, 54, 56], develop [20], and deploy [27] such systems responsibly [56].

Complementing this, a growing body of work addresses design patterns and software architectures for foundation model-based systems, offering taxonomies and reference frameworks that guide the development of such systems [74, 77].

4. ML Software Quality Assessment

A growing body of research explores ML-specific quality assessment, including testing techniques that highlight unique challenges such as non-determinism, generalization, and evaluation cost trade-offs [19, 22, 41, 61, 62, 65, 97]. Complementing this, more specialized techniques like test case prioritization have been explored to improve continuous integration workflows [98], while studies on testing in practice offer insights from real-world deployments [100].

Parallel efforts in Responsible ML testing aim to embed fairness, explainability, and ethics into ML testing pipelines – proposing tools, metrics, and frameworks for fairness auditing, risk assessment, and certification [6, 9, 42, 46, 47, 66, 67, 84, 85, 107].

Keeping ML systems healthy over time demands robust maintenance practices, with recent studies analyzing the evolution of maintenance practices in open-source ecosystems [55], the management of technical debt [58], and the identification of code smells and architectural tactics for maintenance [92, 101, 102, 105]. These efforts illustrate a maturing field, moving beyond performance-centric evaluation toward more integrated engineering strategies that account for long-term risk, responsible development, and maintenance.

5. Using ML to Improve Engineering Practices

Lastly, an emerging direction explores how ML and LLMs can be used to improve engineering practices. Recent work investigates the role of LLMs in software quality assurance [4], in developing trustworthy software [28], and in recommending engineering practices [64].

Open Challenges and Underrepresented Topics

Considering the trends discussed, we explore three topics that remain underdeveloped and warrant further attention. It is important to note that the perspectives presented here are subjective and open to debate.

1. Data Engineering

Regardless of the ML methodology employed, the literature consistently underscores the critical role of high-quality data in achieving optimal results and maintaining model performance over time. Therefore, it is essential to focus on developing advanced data pipelines that enable efficient model debugging, maintenance, and evolution.

These pipelines should prioritize automation, where monitoring systems can automatically flag outliers detected in deployed models and send them to annotation teams for review. Similarly, new data should undergo automatic quality and relevance checks before being integrated into annotation pipelines. Once new annotations are accepted, they should automatically trigger new training pipelines, ensuring that models are continually updated with the most accurate and relevant data.
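As a rough illustration of the kind of automation we have in mind, the sketch below flags low-confidence or drifted predictions at inference time and queues them for annotation; the thresholds and the queue interface are hypothetical placeholders rather than references to any particular tool.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AnnotationQueue:
    """Hypothetical stand-in for a labeling tool's intake API."""
    items: List[dict] = field(default_factory=list)

    def submit(self, sample: dict, reason: str) -> None:
        self.items.append({"sample": sample, "reason": reason})

def monitor_prediction(sample: dict, confidence: float, drift_score: float,
                       queue: AnnotationQueue,
                       conf_threshold: float = 0.5,
                       drift_threshold: float = 3.0) -> None:
    """Flag low-confidence or drifted inputs for human review.

    Thresholds are illustrative; in practice they would be calibrated
    against a validation set and monitored over time.
    """
    if confidence < conf_threshold:
        queue.submit(sample, reason=f"low confidence ({confidence:.2f})")
    elif drift_score > drift_threshold:
        queue.submit(sample, reason=f"feature drift ({drift_score:.1f} sigma)")

# Example: two predictions, one of which gets routed to annotators.
queue = AnnotationQueue()
monitor_prediction({"id": 1}, confidence=0.92, drift_score=0.4, queue=queue)
monitor_prediction({"id": 2}, confidence=0.31, drift_score=0.2, queue=queue)
print(len(queue.items))  # -> 1
```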

While some practices in the data category indicate progress in these directions, the more advanced capabilities needed to keep models relevant in dynamic environments are rarely implemented or studied at scale, pointing to a significant area for further research and development.

2. Greater Focus on Scaling and Model Efficiency Engineering

Recently, there has been a surge in publications exploring advanced scaling and optimization techniques for training and running inference on ML models (e.g., [109, 110, 111]). However, these techniques are seldom adopted or discussed within the software engineering community and are rarely translated into practical applications. Methods such as model compression, quantization, distillation, and more sophisticated optimization strategies like developing custom kernels require further investigation, particularly as larger models frequently operate inefficiently in real-world settings.
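As one concrete example of the techniques we mean, the sketch below applies PyTorch's post-training dynamic quantization to a toy model; whether this particular method is appropriate depends on the model and the deployment target, and it stands in here for the broader family of compression techniques.

```python
import torch
from torch import nn

# Toy placeholder model; any module with Linear layers works the same way.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10)).eval()

# Post-training dynamic quantization: weights of the listed layer types are
# stored in int8 and dequantized on the fly during inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.no_grad():
    print(model(x).shape, quantized(x).shape)  # same interface, smaller weights
```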

While leading companies are already leveraging these techniques – exemplified by organizations like DeepSeek, OpenAI, and Mistral (which has one of the fastest inference speeds) – there is an urgent need for the broader software engineering community to examine, abstract, and convert these methods into practical guidelines. This will help ensure that the advantages of these advanced optimization techniques are more widely accessible and can be effectively utilized across diverse applications and industries.

3. Incentivization and Collection of User Feedback from Deployed Models

Collecting user feedback on deployed models, and incentivizing users to provide it, is essential for understanding model performance, identifying biases and areas for improvement, and increasing user satisfaction. However, the most effective methods for collecting this feedback and the contexts in which they should be applied remain unclear. This presents a significant opportunity for the software engineering community to develop and refine best practices.

Consider, for example, suggesting improved translations to DeepL or providing more detailed feedback on a generated phrase in Mistral. These platforms have begun to integrate user feedback mechanisms, but the mechanisms remain clumsy or incomplete. Here, the software engineering community can play a larger role in creating standardized approaches for collecting and utilizing user feedback, ensuring that deployed models continuously evolve and improve based on real-world usage and user insights. Given the current state of ML, fostering collective participation and collaboration is essential to mitigate some of the biases and inefficiencies present in today's systems.
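To make the idea concrete, the sketch below shows a minimal feedback endpoint that ties a rating and an optional correction to a specific prediction; the framework choice (FastAPI) and the field names are our own assumptions, not a description of how DeepL or Mistral implement feedback.

```python
from typing import Optional

from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()

class Feedback(BaseModel):
    """User feedback tied to a specific prediction."""
    prediction_id: str
    rating: int = Field(ge=1, le=5)   # coarse signal, e.g., 1-5 stars
    correction: Optional[str] = None  # optional improved output
    comment: Optional[str] = None

FEEDBACK_LOG: list[Feedback] = []     # stand-in for a real store

@app.post("/feedback")
def submit_feedback(item: Feedback) -> dict:
    # In a real system this would be persisted and joined with the model's
    # inputs, outputs, and version metadata for later analysis and retraining.
    FEEDBACK_LOG.append(item)
    return {"status": "recorded", "total": len(FEEDBACK_LOG)}
```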

Conclusions

In this first article, we explore trends in the evolution of machine learning engineering practices, building upon the prior work that resulted in the catalog of practices available on this website. By analyzing 108 articles published between 2022 and 2025, we identified five key trends and three underrepresented areas in the field of ML engineering. Our initial findings highlight significant progress, especially in Responsible ML, as well as persistent challenges.

In the upcoming articles, we will explore each identified trend in greater detail, examining novel practices and research directions while potentially uncovering new challenges and underrepresented topics.

References, ordered by date (descending)

  1. How to Elicit Explainability Requirements? A Comparison of Interviews, Focus Groups, and Surveys
  2. Toward Effective AI Governance: A Review of Principles
  3. Software Engineering for Self-Adaptive Robotics: A Research Agenda
  4. Advancing Software Quality: A Standards-Focused Review of LLM-Based Assurance Techniques
  5. Future of Code with Generative AI: Transparency and Safety in the Era of AI Generated Software
  6. What Makes a Fairness Tool Project Sustainable in Open Source?
  7. Can We Recycle Our Old Models? An Empirical Evaluation of Model Selection Mechanisms for AIOps Solutions
  8. Understanding Large Language Model Supply Chain: Structure, Domain, and Vulnerabilities
  9. Whence Is A Model Fair? Fixing Fairness Bugs via Propensity Score Matching
  10. Explainability for Embedding AI: Aspirations and Actuality
  11. Who Speaks for Ethics? How Demographics Shape Ethical Advocacy in Software Development
  12. Extending Behavioral Software Engineering: Decision-Making and Collaboration in Human-AI Teams for Responsible Software Engineering
  13. Data Requirement Goal Modeling for Machine Learning Systems
  14. Towards practicable Machine Learning development using AI Engineering Blueprints
  15. Systematic Literature Review of Automation and Artificial Intelligence in Usability Issue Detection
  16. Model Lake: a New Alternative for Machine Learning Models Management and Governance
  17. Contextual Fairness-Aware Practices in ML: A Cost-Effective Empirical Evaluation
  18. A Systematic Survey on Debugging Techniques for Machine Learning Systems
  19. Challenges in Testing Large Language Model Based Software: A Faceted Taxonomy
  20. Agentic AI Software Engineers: Programming with Trust
  21. Hierarchical Fallback Architecture for High Risk Online Machine Learning Inference
  22. A quantitative framework for evaluating architectural patterns in ML systems
  23. Securing the AI Frontier: Urgent Ethical and Regulatory Imperatives for AI-Driven Cybersecurity
  24. Variability-Aware Machine Learning Model Selection: Feature Modeling, Instantiation, and Experimental Case Study
  25. The Current Challenges of Software Engineering in the Era of Large Language Models
  26. An Efficient Model Maintenance Approach for MLOps
  27. Microservice-based edge platform for AI services
  28. Engineering Trustworthy Software: A Mission for LLMs
  29. Architectural Patterns for Designing Quantum Artificial Intelligence Systems
  30. Overview of Current Challenges in Multi-Architecture Software Engineering and a Vision for the Future
  31. Do Developers Adopt Green Architectural Tactics for ML-Enabled Systems? A Mining Software Repository Study
  32. Machine Learning Operations: A Mapping Study
  33. Responsible AI in Open Ecosystems: Reconciling Innovation with Risk Assessment and Disclosure
  34. How Mature is Requirements Engineering for AI-based Systems? A Systematic Mapping Study on Practices, Challenges, and Future Research Directions
  35. A Catalog of Fairness-Aware Practices in Machine Learning Engineering
  36. Data Quality Antipatterns for Software Analytics
  37. A Large-Scale Study of Model Integration in ML-Enabled Software Systems
  38. Balancing Innovation and Ethics in AI-Driven Software Development
  39. Project Archetypes: A Blessing and a Curse for AI Development
  40. Initial Insights on MLOps: Perception and Adoption by Practitioners
  41. ModelVerification.jl: a Comprehensive Toolbox for Formally Verifying Deep Neural Networks
  42. Deriva-ML: A Continuous FAIRness Approach to Reproducible Machine Learning Models
  43. Innovating for Tomorrow: The Convergence of SE and Green AI
  44. Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents
  45. Automating the Training and Deployment of Models in MLOps by Integrating Systems with Machine Learning
  46. Software Fairness Debt
  47. How fair are we? From conceptualization to automated assessment of fairness definitions
  48. Professional Insights into Benefits and Limitations of Implementing MLOps Principles
  49. Empirical Analysis on CI/CD Pipeline Evolution in Machine Learning Projects
  50. An Empirical Study of Challenges in Machine Learning Asset Management
  51. Green AI: A Preliminary Empirical Study on Energy Consumption in DL Models Across Different Runtime Infrastructures
  52. Towards MLOps: A DevOps Tools Recommender System for Machine Learning System
  53. A Systematic Literature Review on Explainability for Machine/Deep Learning-based Software Engineering Research
  54. Design Patterns for Machine Learning Based Systems with Human-in-the-Loop
  55. Analyzing the Evolution and Maintenance of ML Models on Hugging Face
  56. Towards Responsible Generative AI: A Reference Architecture for Designing Foundation Model based Agents
  57. Continuous Management of Machine Learning-Based Application Behavior
  58. An Empirical Study of Self-Admitted Technical Debt in Machine Learning Software
  59. Secure Software Development: Issues and Challenges
  60. Towards an MLOps Architecture for XAI in Industrial Applications
  61. Identifying Concerns When Specifying Machine Learning-Enabled Systems: A Perspective-Based Approach
  62. Software Testing of Generative AI Systems: Challenges and Opportunities
  63. A General Recipe for Automated Machine Learning in Practice
  64. On Using Information Retrieval to Recommend Machine Learning Good Practices for Software Engineers
  65. The Costly Dilemma: Generalization, Evaluation and Cost-Optimal Deployment of Large Language Models
  66. A New Perspective on Evaluation Methods for Explainable Artificial Intelligence (XAI)
  67. Revisiting the Performance-Explainability Trade-Off in Explainable Artificial Intelligence (XAI)
  68. MGit: A Model Versioning and Management System
  69. Designing Explainable Predictive Machine Learning Artifacts: Methodology and Practical Demonstration
  70. Fix Fairness, Don’t Ruin Accuracy: Performance Aware Fairness Repair using AutoML
  71. Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models
  72. Responsible Design Patterns for Machine Learning Pipelines
  73. QB4AIRA: A Question Bank for AI Risk Assessment
  74. A Taxonomy of Foundation Model based Systems through the Lens of Software Architecture
  75. Towards machine learning guided by best practices
  76. Scaling ML Products At Startups: A Practitioner’s Guide
  77. A Reference Architecture for Designing Foundation Model based Systems
  78. Machine Learning with Requirements: a Manifesto
  79. Toward End-to-End MLOps Tools Map: A Preliminary Study based on a Multivocal Literature Review
  80. Analysis of Software Engineering Practices in General Software and Machine Learning Startups
  81. A Case Study on AI Engineering Practices: Developing an Autonomous Stock Trading System
  82. Design Patterns for AI-based Systems: A Multivocal Literature Review and Pattern Repository
  83. Requirements Engineering Framework for Human-centered Artificial Intelligence Software Systems
  84. Framework for Certification of AI-Based Systems
  85. Towards Concrete and Connected AI Risk Assessment (C$^2$AIRA): A Systematic Mapping Study
  86. What are the Machine Learning best practices reported by practitioners on Stack Exchange?
  87. Towards Modular Machine Learning Solution Development: Benefits and Trade-offs
  88. An architectural technical debt index based on machine learning and architectural smells
  89. Studying the Characteristics of AIOps Projects on GitHub
  90. Requirements Engineering for Artificial Intelligence Systems: A Systematic Mapping Study
  91. Quality Assurance in MLOps Setting: An Industrial Perspective
  92. Capabilities for Better ML Engineering
  93. Requirements Engineering for Machine Learning: A Review and Reflection
  94. Operationalizing Machine Learning: An Interview Study
  95. Responsible AI Pattern Catalogue: A Collection of Best Practices for AI Governance and Engineering
  96. A Methodology and Software Architecture to Support Explainability-by-Design
  97. Software Testing for Machine Learning
  98. Comparative Study of Machine Learning Test Case Prioritization for Continuous Integration Testing
  99. Aspirations and Practice of Model Documentation: Moving the Needle with Nudging and Traceability
  100. Exploring ML testing in practice – Lessons learned from an interactive rapid review with Axis Communications
  101. Achieving Guidance in Applied Machine Learning through Software Engineering Techniques
  102. Code Smells for Machine Learning Applications
  103. MLOps best practices, challenges and maturity models
  104. How Do Model Export Formats Impact the Development of ML-Enabled Systems? A Case Study on Model Integration
  105. Architectural tactics to achieve quality attributes of machine-learning-enabled systems: a systematic literature review
  106. Design Patterns for AI-based Systems: A Multivocal Literature Review and Pattern Repository
  107. Fairness-aware machine learning engineering: how far are we?
  108. We Have No Idea How Models will Behave in Production until Production: How Engineers Operationalize Machine Learning

Other references

  109. The Ultra-Scale Playbook: Training LLMs on GPU Clusters
  110. DeepSeek-V3 Technical Report
  111. Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

References grouped by topic

Development of practices for Responsible ML/AI

  1. How to Elicit Explainability Requirements? A Comparison of Interviews, Focus Groups, and Surveys
  2. Toward Effective AI Governance: A Review of Principles
  3. Software Engineering for Self-Adaptive Robotics: A Research Agenda
  4. Explainability for Embedding AI: Aspirations and Actuality
  5. Extending Behavioral Software Engineering: Decision-Making and Collaboration in Human-AI Teams for Responsible Software Engineering
  6. Data Requirement Goal Modeling for Machine Learning Systems
  7. Towards practicable Machine Learning development using AI Engineering Blueprints
  8. Contextual Fairness-Aware Practices in ML: A Cost-Effective Empirical Evaluation
  9. A Systematic Survey on Debugging Techniques for Machine Learning Systems
  10. Hierarchical Fallback Architecture for High Risk Online Machine Learning Inference
  11. Securing the AI Frontier: Urgent Ethical and Regulatory Imperatives for AI-Driven Cybersecurity
  12. The Current Challenges of Software Engineering in the Era of Large Language Models
  13. Architectural Patterns for Designing Quantum Artificial Intelligence Systems
  14. Do Developers Adopt Green Architectural Tactics for ML-Enabled Systems? A Mining Software Repository Study
  15. Responsible AI in Open Ecosystems: Reconciling Innovation with Risk Assessment and Disclosure
  16. How Mature is Requirements Engineering for AI-based Systems? A Systematic Mapping Study on Practices, Challenges, and Future Research Directions
  17. A Catalog of Fairness-Aware Practices in Machine Learning Engineering
  18. Data Quality Antipatterns for Software Analytics
  19. Balancing Innovation and Ethics in AI-Driven Software Development
  20. Project Archetypes: A Blessing and a Curse for AI Development
  21. Innovating for Tomorrow: The Convergence of SE and Green AI
  22. Green AI: A Preliminary Empirical Study on Energy Consumption in DL Models Across Different Runtime Infrastructures
  23. A General Recipe for Automated Machine Learning in Practice
  24. Designing Explainable Predictive Machine Learning Artifacts: Methodology and Practical Demonstration
  25. Fix Fairness, Don’t Ruin Accuracy: Performance Aware Fairness Repair using AutoML
  26. QB4AIRA: A Question Bank for AI Risk Assessment
  27. Towards machine learning guided by best practices
  28. Machine Learning with Requirements: a Manifesto
  29. Analysis of Software Engineering Practices in General Software and Machine Learning Startups
  30. A Case Study on AI Engineering Practices: Developing an Autonomous Stock Trading System
  31. Design Patterns for AI-based Systems: A Multivocal Literature Review and Pattern Repository
  32. Requirements Engineering Framework for Human-centered Artificial Intelligence Software Systems
  33. What are the Machine Learning best practices reported by practitioners on Stack Exchange?
  34. Towards Modular Machine Learning Solution Development: Benefits and Trade-offs
  35. Requirements Engineering for Artificial Intelligence Systems: A Systematic Mapping Study
  36. Requirements Engineering for Machine Learning: A Review and Reflection
  37. Responsible AI Pattern Catalogue: A Collection of Best Practices for AI Governance and Engineering
  38. A Methodology and Software Architecture to Support Explainability-by-Design
  39. Aspirations and Practice of Model Documentation: Moving the Needle with Nudging and Traceability
  40. Design Patterns for AI-based Systems: A Multivocal Literature Review and Pattern Repository

MLOps, model management, and deployment

  1. Can We Recycle Our Old Models? An Empirical Evaluation of Model Selection Mechanisms for AIOps Solutions
  2. Model Lake: a New Alternative for Machine Learning Models Management and Governance
  3. Variability-Aware Machine Learning Model Selection: Feature Modeling, Instantiation, and Experimental Case Study
  4. An Efficient Model Maintenance Approach for MLOps
  5. Machine Learning Operations: A Mapping Study
  6. A Large-Scale Study of Model Integration in ML-Enabled Software Systems
  7. Initial Insights on MLOps: Perception and Adoption by Practitioners
  8. Automating the Training and Deployment of Models in MLOps by Integrating Systems with Machine Learning
  9. Professional Insights into Benefits and Limitations of Implementing MLOps Principles
  10. An Empirical Study of Challenges in Machine Learning Asset Management
  11. Towards MLOps: A DevOps Tools Recommender System for Machine Learning System
  12. Towards an MLOps Architecture for XAI in Industrial Applications
  13. MGit: A Model Versioning and Management System
  14. Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models
  15. Responsible Design Patterns for Machine Learning Pipelines
  16. Scaling ML Products At Startups: A Practitioner’s Guide
  17. Toward End-to-End MLOps Tools Map: A Preliminary Study based on a Multivocal Literature Review
  18. Studying the Characteristics of AIOps Projects on GitHub
  19. Quality Assurance in MLOps Setting: An Industrial Perspective
  20. Operationalizing Machine Learning: An Interview Study
  21. MLOps best practices, challenges and maturity models
  22. How Do Model Export Formats Impact the Development of ML-Enabled Systems? A Case Study on Model Integration
  23. We Have No Idea How Models will Behave in Production until Production: How Engineers Operationalize Machine Learning

Off-the-Shelf Models and Agentic Architectures

  1. Agentic AI Software Engineers: Programming with Trust
  2. Microservice-based edge platform for AI services
  3. Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents
  4. Design Patterns for Machine Learning Based Systems with Human-in-the-Loop
  5. Towards Responsible Generative AI: A Reference Architecture for Designing Foundation Model based Agents
  6. A Taxonomy of Foundation Model based Systems through the Lens of Software Architecture
  7. A Reference Architecture for Designing Foundation Model based Systems

ML software quality assessment

  1. What Makes a Fairness Tool Project Sustainable in Open Source?
  2. Understanding Large Language Model Supply Chain: Structure, Domain, and Vulnerabilities
  3. Whence Is A Model Fair? Fixing Fairness Bugs via Propensity Score Matching
  4. Challenges in Testing Large Language Model Based Software: A Faceted Taxonomy
  5. A quantitative framework for evaluating architectural patterns in ML systems
  6. ModelVerification.jl: a Comprehensive Toolbox for Formally Verifying Deep Neural Networks
  7. Software Fairness Debt
  8. How fair are we? From conceptualization to automated assessment of fairness definitions
  9. Analyzing the Evolution and Maintenance of ML Models on Hugging Face
  10. An Empirical Study of Self-Admitted Technical Debt in Machine Learning Software
  11. Identifying Concerns When Specifying Machine Learning-Enabled Systems: A Perspective-Based Approach
  12. Software Testing of Generative AI Systems: Challenges and Opportunities
  13. The Costly Dilemma: Generalization, Evaluation and Cost-Optimal Deployment of Large Language Models
  14. A New Perspective on Evaluation Methods for Explainable Artificial Intelligence (XAI)
  15. Revisiting the Performance-Explainability Trade-Off in Explainable Artificial Intelligence (XAI)
  16. Framework for Certification of AI-Based Systems
  17. Towards Concrete and Connected AI Risk Assessment (C$^2$AIRA): A Systematic Mapping Study
  18. Capabilities for Better ML Engineering
  19. Software Testing for Machine Learning
  20. Comparative Study of Machine Learning Test Case Prioritization for Continuous Integration Testing
  21. Exploring ML testing in practice – Lessons learned from an interactive rapid review with Axis Communications
  22. Achieving Guidance in Applied Machine Learning through Software Engineering Techniques
  23. Code Smells for Machine Learning Applications
  24. Architectural tactics to achieve quality attributes of machine-learning-enabled systems: a systematic literature review
  25. Fairness-aware machine learning engineering: how far are we?
  26. Deriva-ML: A Continuous FAIRness Approach to Reproducible Machine Learning Models

Using ML to improve engineering practices

  1. Advancing Software Quality: A Standards-Focused Review of LLM-Based Assurance Techniques
  2. Future of Code with Generative AI: Transparency and Safety in the Era of AI Generated Software
  3. Engineering Trustworthy Software: A Mission for LLMs
  4. On Using Information Retrieval to Recommend Machine Learning Good Practices for Software Engineers