Training machine learning models is usually preceded by an exploratory phase in which unstructured code is written, or manual steps are performed, to get the data into the right format, merge several data sources, and so on. Notebooks in particular encourage ad-hoc data processing scripts that depend on variables left in memory by previously executed cells.
Before moving to the training phase, convert this code into reusable scripts and move it into functions that can be called and tested individually. This enables code reuse and eases integration into processing pipelines.
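As an illustration, here is a minimal sketch of what such a refactoring might look like, using only the Python standard library. The function names `clean_records` and `merge_on_key`, the field names, and the sample data are hypothetical and chosen for this example only. A real project would likely use a data frame library and its own cleaning rules.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]


def clean_records(records: Iterable[Record], required: List[str]) -> List[Record]:
    """Drop rows with a missing required field and strip whitespace from strings.

    Hypothetical cleaning rule: a field counts as missing if it is None or "".
    """
    cleaned = []
    for row in records:
        if any(row.get(field) in (None, "") for field in required):
            continue  # skip incomplete rows
        cleaned.append(
            {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        )
    return cleaned


def merge_on_key(left: List[Record], right: List[Record], key: str) -> List[Record]:
    """Inner-join two lists of records on `key`.

    Assumes `key` is unique in `right`; if it is not, the last matching row wins.
    """
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Example inputs that would otherwise live as notebook variables.
users = [{"user_id": 1, "name": " Ada "}, {"user_id": 2, "name": ""}]
orders = [{"user_id": 1, "total": 30.0}]

result = merge_on_key(clean_records(users, ["name"]), orders, "user_id")
```

Because each step takes its inputs as arguments instead of reading notebook state, it can be unit tested on small hand-written inputs and reused unchanged inside a processing pipeline.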
- Best Practices in Machine Learning Infrastructure
- Data management challenges in production machine learning
- ML Ops: Machine Learning as an Engineering Discipline