The time to invest in an AI/ML initiative is now. Recent surveys indicate the pandemic has pushed companies to invest in new AI initiatives. Yet studies also indicate that many enterprises still struggle to operationalize the initiatives they have started. Beyond data quality challenges, a major reason for this struggle is that most companies don’t have the right team composition, or the right objectives set for those teams, to take models to production.
Planning upfront for the right team composition, with complementary skills and clear objectives, is critical for a successful rollout of an AI initiative and a better ROI.
As a new AI initiative kicks off, the first resources to come on board will be your data scientists and ML engineers, along with the business teams who explain the data and its context. Most initiatives start small. The initial objective of this team is to prove a hypothesis with a limited data set: explore the data and build the initial models.
The first challenge the team will face after validating the hypothesis is how to train on a massive production data set. The data is no longer a CSV export to which feature engineering can be applied and models can be fit on a single machine.
Data pipeline architectures need to be defined for reliable data delivery and greater scalability, along with the infrastructure required for optimal extraction, transformation, and loading of data. Data pipelines ensure the right data is made available to the models. Parallel data processing and distributed training also need to be addressed. You need a Data Engineer on your team to do this heavy lifting.
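The extract-transform-load pattern above can be sketched as a streaming pipeline that processes records one at a time instead of materializing the full dataset in memory. This is a minimal illustration only; the record fields, the data-quality gate, and the toy feature are assumptions for the example, not part of any specific stack.

```python
# A minimal sketch of a chunked extract-transform-load pipeline, assuming
# records arrive as an iterable of dicts (field names are illustrative).

def extract(source):
    """Yield raw records one at a time instead of loading everything."""
    for record in source:
        yield record

def transform(records):
    """Apply per-record feature engineering; drop rows with missing values."""
    for r in records:
        if r.get("amount") is None:
            continue  # simple data-quality gate
        # Toy engineered feature: order of magnitude of the amount.
        r["amount_magnitude"] = len(str(int(r["amount"])))
        yield r

def load(records, sink):
    """Append transformed records to a sink and report how many landed."""
    count = 0
    for r in records:
        sink.append(r)
        count += 1
    return count

# Usage: stream records through the pipeline end to end.
raw = [{"amount": 120.0}, {"amount": None}, {"amount": 9.5}]
out = []
n = load(transform(extract(raw)), out)
```

Because each stage is a generator, the same structure scales from a local prototype to a distributed runner where each stage becomes an operator.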
Bring your Data Engineers onto the team right from the beginning. Their skills ensure the network, storage, and compute resources are used effectively.
A loosely coupled architecture lets teams work independently and deliver value. Solution Architects should ensure modularity in the ML systems. Because they work at the intersection of multiple disciplines, solution architects have to be great communicators with strong people skills.
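One common way to get the modularity described above is to make components depend on narrow interfaces rather than on each other's internals. The sketch below is an assumption-laden illustration (the class and method names are invented for the example, not from a specific framework): a scoring service that only sees a feature-source interface and a model interface, so either side can be replaced independently.

```python
# Sketch of loose coupling via narrow interfaces: the scoring service
# depends only on the Protocols, so the feature store implementation or
# the model version can be swapped without touching this code.
from typing import List, Protocol

class FeatureSource(Protocol):
    def get_features(self, entity_id: str) -> List[float]: ...

class Model(Protocol):
    def predict(self, features: List[float]) -> float: ...

class InMemoryFeatures:
    """Toy feature source backed by a dict; a real one might call a feature store."""
    def __init__(self, table):
        self.table = table
    def get_features(self, entity_id):
        return self.table[entity_id]

class MeanModel:
    """Toy model; a real one would wrap a trained estimator."""
    def predict(self, features):
        return sum(features) / len(features)

def score(entity_id: str, features: FeatureSource, model: Model) -> float:
    # Only the interfaces are referenced here, keeping teams decoupled.
    return model.predict(features.get_features(entity_id))

# Usage: wire concrete implementations together at the edge.
svc_score = score("cust-1", InMemoryFeatures({"cust-1": [1.0, 3.0]}), MeanModel())
```

The data team owns `InMemoryFeatures`, the modeling team owns `MeanModel`, and neither change forces a change in `score`.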
For ML engagements, your code, data, and models all need to be versioned, monitoring and reproducibility must be ensured, and the steps of the ML workflow should be automated.
Onboard an MLOps engineer from the get-go to set the operating principles.
Data versioning covers your data preparation pipelines, feature store, datasets, and metadata. Model versioning covers the ML model training pipeline, the model object itself, hyperparameters, and experiment tracking. The MLOps team ensures reproducibility, for example that the same tuned hyperparameters are applied to your production models. Plan for the new monitoring needs as well: data distribution changes (training vs. serving data), model drift, and the computational performance of the model.
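The training-vs-serving distribution check mentioned above can be made concrete with a simple statistic. The sketch below uses the Population Stability Index (PSI) over equal-width bins; the bin count, the zero-count smoothing, and the interpretation thresholds in the note are common conventions rather than a standard, and are stated here as assumptions.

```python
# Minimal drift check: Population Stability Index between a training
# (expected) sample and a serving (actual) sample of one feature.
import math

def psi(expected, actual, bins=5):
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def fractions(values):
        counts = [0] * bins
        for v in values:
            i = min(int((v - lo) / width), bins - 1)
            counts[i] += 1
        # Smooth empty bins so the log term stays defined.
        return [(c or 0.5) / len(values) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A common rule of thumb reads PSI below roughly 0.1 as stable and above roughly 0.25 as significant drift worth investigating; alert thresholds should be tuned per feature in practice.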
ML initiatives involve a lot of experimentation and development. Plan for tracking, automating, and monitoring your experiments well ahead, and invest time in governance and security early in the cycle.
As described on ML-Ops.org, make sure to set your guidelines around tests for features and data, tests for model development, and tests for ML infrastructure.
QA Engineers don’t need to be experts in algorithms, but they do need to understand feature importance and the correlation of features in the business context. Create test scenarios that measure the inference score while adding or dropping features. Model metrics like RMSE are important, but QA Engineers should also correlate them with business metrics on the impact of a prediction, such as the percentage reduction in false positives. Automated testing of the end-to-end ML pipeline should be planned as part of quality assurance.
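The tests-for-features-and-data guideline can be automated with checks that run before every training job. The sketch below is illustrative: the column names, valid ranges, and the duplicate-ID rule are assumptions chosen for the example, and a real suite would derive them from the data contract with the business team.

```python
# An illustrative pre-training data-quality check in the style of a test
# suite gate: schema, value ranges, and leakage-prone duplicates.

def check_training_frame(rows):
    """rows: list of dicts representing the training set. Returns issues found."""
    required = {"customer_id", "amount", "label"}
    issues = []
    seen_ids = set()
    for r in rows:
        if not required <= r.keys():
            issues.append("missing columns")
        if not 0 <= r.get("label", -1) <= 1:
            issues.append("label out of range")
        if r.get("amount", 0) < 0:
            issues.append("negative amount")
        cid = r.get("customer_id")
        if cid in seen_ids:
            issues.append("duplicate id")  # duplicates can inflate offline metrics
        seen_ids.add(cid)
    return issues
```

Wiring this into CI (e.g., as a pytest assertion that the returned list is empty) makes the ML pipeline fail fast on bad data instead of producing a silently degraded model.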
Chandrasekhar Somasekhar is the Chief Architect at Cleareye.ai, responsible for product development, strategic technology direction, and implementing and governing solution architecture methodologies. He defines and governs the enterprise solution architecture strategy and provides architectural direction. He is passionate about developing, mentoring, and motivating a high-performance team.