In this first blog post, I break down the steps I would probably consider when tackling a machine learning problem, after reading and trying to learn as much as I can about the field. Disclaimer: adopt this at your own risk. I have not yet solved a real-world machine learning problem; I have only attempted a few exercises during the course. This is my intuitive perspective as someone who has been a supply chain analyst for many years, written mainly to guide myself.
Machine learning is as easy to understand as it is to get confused by. Here is my best effort to explain, in layman’s terms, what is actually being done:
1. Decide on the task and the purpose you want to achieve with your data.
2. Decide on a suitable learning method, i.e. the type of algorithm to use:
   - Supervised learning: identify the contributing features (factors) that will form the dataset for the system to learn from. Give each factor a random starting weight so that the system can learn its optimised weight later.
   - Unsupervised learning: define the type of pattern to look for and the number of data clusters. Identifying contributing features is optional, since we mostly do not know how the features affect the pattern. Give each cluster a random starting point (weight).
3. Gather relevant data, then sort and prepare it for the system to train on.
4. Design and programme the system's learning logic that trains the algorithm, both for a new algorithm and for one being revised.
5. Train the algorithm:
   - Supervised learning: check how accurately the current weights score (predict) against the data you have gathered. Improve accuracy by adjusting the weights a little at a time using gradient descent or stochastic gradient descent.
   - Unsupervised learning: assign each data point to its closest cluster. Then, for each cluster, find the centre point (the average position) of all its assigned data, and move the cluster's weight to that centre point.
6. Create repetitive loops on Step 5:
   - Supervised learning: keep checking the accuracy of the adjusted weights and keep adjusting the weight values. Eventually the algorithm's accuracy score will improve and settle on (near-)optimal weights for all the contributing features (factors).
   - Unsupervised learning: keep looping until the data is sorted into clusters and the assignments stop changing. Then rerun the whole Step 5 training a few (or many) times, each from a different random set of cluster starting points. This produces several candidate models; use the one with the lowest error score.
7. Repeat Steps 1-6 when there is new data or when the algorithm is no longer suitable for the current task.
8. If the algorithm is not improving, troubleshoot where the problem lies and try to mitigate or fix it.
9. For most difficult tasks, it is common for the data to go through a pipeline of several different machine learning tasks before the main task is delivered. Steps 1-6 need to be planned for each module in the pipeline.
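To make the supervised steps (random starting weights, scoring against the data, nudging the weights, looping) a bit more concrete, here is a minimal sketch in plain Python. It assumes a simple linear model, where a prediction is the weighted sum of the factors plus a bias, and it assumes squared error as the accuracy score. The function names `predict`, `mean_squared_error` and `train_linear_sgd`, the learning rate and the epoch count are all hypothetical choices for this illustration, not a recipe from any particular course or library.

```python
import random


def predict(weights, bias, x):
    """Assumed linear model: weighted sum of the factors plus a bias term."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def mean_squared_error(rows, targets, weights, bias):
    """Assumed accuracy score: average squared gap between prediction and truth."""
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in zip(rows, targets)) / len(rows)


def train_linear_sgd(rows, targets, learning_rate=0.05, epochs=500, seed=0):
    """Fit the linear model with stochastic gradient descent (hypothetical helper)."""
    rng = random.Random(seed)
    # Start every factor with a small random weight.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(len(rows[0]))]
    bias = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):  # loop the training pass many times
        rng.shuffle(order)  # "stochastic": visit the samples in a random order
        for i in order:  # score one sample, then nudge the weights a little
            x, y = rows[i], targets[i]
            error = predict(weights, bias, x) - y
            # Gradient of 0.5 * error**2 with respect to weight j is error * x[j].
            weights = [w - learning_rate * error * xj for w, xj in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


if __name__ == "__main__":
    # Hypothetical noise-free data generated from y = 2*x1 - 3*x2 + 1.
    rows = [[a / 4, b / 4] for a in range(5) for b in range(5)]
    targets = [2 * x1 - 3 * x2 + 1 for x1, x2 in rows]
    weights, bias = train_linear_sgd(rows, targets)
    print("weights:", weights, "bias:", bias)
    print("MSE:", mean_squared_error(rows, targets, weights, bias))
```

On this made-up, noise-free data the learned weights should end up very close to 2 and -3 and the bias close to 1. Real data is noisy, so in practice the error would level off above zero rather than vanish.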
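The unsupervised steps describe what is essentially k-means clustering, so the sketch below assumes that algorithm. The cluster centres play the role of the "weights": they start at randomly chosen data points, each point is assigned to its closest centre, each centre moves to the mean of its points, and the loop stops once no centre moves. The whole run is then repeated from several random starting points, keeping the run with the lowest error. Euclidean distance (`math.dist`, Python 3.8+) and the total squared distance as the error score are assumptions, and `kmeans` and `best_of_restarts` are hypothetical names.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """One k-means run from one random start (hypothetical helper)."""
    rng = random.Random(seed)
    # Start the k cluster centres at k randomly chosen data points.
    centres = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assign every point to its closest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centres[c]))
            clusters[nearest].append(p)
        # Move each centre to the mean position of its assigned points.
        new_centres = []
        for c, members in enumerate(clusters):
            if members:
                new_centres.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centres.append(centres[c])  # keep an empty cluster where it was
        if new_centres == centres:  # no centre moved, so the run has converged
            break
        centres = new_centres
    # Error score: total squared distance from each point to its closest centre.
    error = sum(min(math.dist(p, c) ** 2 for c in centres) for p in points)
    return centres, error


def best_of_restarts(points, k, restarts=10):
    """Rerun k-means from different random starts and keep the lowest error."""
    runs = [kmeans(points, k, seed=s) for s in range(restarts)]
    return min(runs, key=lambda run: run[1])
```

For example, `best_of_restarts([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)], k=2)` should find one centre near (0.5, 0.5) and one near (10.5, 10.5). Restarts matter because a single run can get stuck in a poor local arrangement, depending on where its centres happened to start.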
Those are my views on how to approach machine learning problems; everyone will have an approach they are comfortable with. To do really meaningful, cool stuff, it’s all about wits, grit and lots of dedicated time.