Boby Aloysius Johnson | GSoC Blog: The evaluation week — week 5

The evaluation week — week 5
boaloysius
Thu, 07/06/2020 – 00:39

The first phase evaluation of GSoC’17 is over. I have been working on integrating Google Cloud ML Engine with WordPress maintenance support plans. So far, we have built a demo based on the Census example to show that we can make predictions for our data in WordPress maintenance support plans with ml-engine. Please check my first evaluation blog for details. After the evaluation, we worked on integrating standard pre-trained TensorFlow models with WordPress maintenance support plans.

Pre-trained models are machine learning models that have already been trained on some dataset. As data is the key to classification accuracy, most of the standard pre-trained models available are trained on huge datasets. Our primary source of research was tensorflow/models. This repo had around 30 models, but most of them needed training. During this research, we found that TensorFlow lacks a gallery of trained models. Here are links to some of the pre-trained models we found.

Name | Link | Study
Inception | gz file | Inception is an image recognition model trained on ImageNet (a hierarchically organized collection of millions of images). It classifies an image into one of about a thousand classes. Users have to do transfer learning to get their own custom classes.
VGG | tar.gz file | VGG (Visual Geometry Group) is a deep CNN developed at Oxford for image classification and trained on ImageNet. It serves a similar purpose to Inception. https://www.quora.com/What-is-the-VGG-neural-network
Syntaxnet | zip | Syntaxnet identifies the grammatical structure of a sentence. Please check these two blogs (blog1 and blog2) for details.
Facenet | 20200511-185253, 20200512-110547 | This model uses a CNN for face recognition. It can be used to recognize human faces (detect whether two faces are the same). It is trained on two datasets, MS-Celeb-1M and CASIA-WebFace. Please check this report for details.
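
To make the "detect if two faces are the same" idea in the Facenet entry concrete, here is a minimal sketch. It assumes the model has already turned each face image into an embedding vector, and it compares two embeddings by Euclidean distance. The function names, the toy vectors, and the threshold value are all hypothetical; a real threshold would have to be tuned on validation data.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_face(emb_a, emb_b, threshold=1.0):
    """Treat two faces as the same person when their embeddings are close.

    The threshold of 1.0 is an illustrative assumption, not a tuned value.
    """
    return l2_distance(emb_a, emb_b) < threshold

# Toy 2-D embeddings; real face embeddings have many more dimensions.
print(same_face([0.6, 0.8], [0.6, 0.8]))
print(same_face([1.0, 0.0], [-1.0, 0.0]))
```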

The lack of good pre-trained TensorFlow models is a serious drawback. Most of the natural language models we found need training. For example, Syntaxnet is a highly accurate language parser, but we would have to train on top of it to create a custom model such as a text summarizer, or other models useful to end users. We don’t have many ready-to-use natural language models suitable for WordPress maintenance support plans.

To use standard models in our project, we have to do transfer learning, that is, learning on top of what a model has already learned. In the TensorFlow for Poets example, we remove the final softmax classification layer and train a new layer of our own for custom classification. Please refer to this in-depth video on how to create a custom image classifier.
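
The idea of keeping the learned layers fixed and training only a new softmax layer can be sketched in plain Python. This is only an illustration under stated assumptions, not the TensorFlow for Poets code: `frozen_features` is a hypothetical stand-in for the pre-trained layers (a real model would output something like Inception's bottleneck values), and the tiny four-sample dataset is made up.

```python
import math

def frozen_features(x):
    """Hypothetical stand-in for the pre-trained layers; never updated."""
    return [x[0], x[1], x[0] * x[1], 1.0]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scores_for(weights, feats):
    """Linear scores of the new layer, one per class."""
    return [sum(w * v for w, v in zip(row, feats)) for row in weights]

def train_head(samples, labels, num_classes, epochs=2000, lr=0.5):
    """Train only the new softmax layer with per-sample gradient descent."""
    feats = [frozen_features(x) for x in samples]  # computed once, then reused
    dim = len(feats[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            probs = softmax(scores_for(weights, f))
            for c in range(num_classes):
                # Gradient of cross-entropy with respect to the class score.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(dim):
                    weights[c][j] -= lr * grad * f[j]
    return weights

def predict(weights, x):
    """Return the index of the highest-scoring custom class."""
    s = scores_for(weights, frozen_features(x))
    return max(range(len(s)), key=s.__getitem__)

# Made-up custom task: two classes that are not linearly separable in the raw inputs.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]
weights = train_head(samples, labels, num_classes=2)
print([predict(weights, x) for x in samples])
```

Only `weights` changes during training. The frozen features are computed once and reused, which is why training a new final layer is much cheaper than training a whole model.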

The major challenge we face is creating an abstraction to transfer data, train models, and make predictions for a variety of data types. We have to give the user the flexibility to use any type of data (image, nominal, ordinal, numeric, …) for training and prediction with ml-engine, so the abstraction has to handle all of these data types. In the coming week, we will explore how to create this abstraction.
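
One possible shape for such an abstraction, sketched here only as a starting point for discussion: each data type gets a small encoder that turns a raw value into a list of floats, and a schema maps field names to encoders. Every name below (`numeric`, `nominal`, `ordinal`, `image`, `encode_record`) is hypothetical, as are the example field names and category values, and none of it is part of ml-engine.

```python
from typing import Callable, Dict, List, Sequence

Encoder = Callable[[object], List[float]]

def numeric() -> Encoder:
    """Numeric values pass through as a single float."""
    return lambda value: [float(value)]

def nominal(categories: Sequence[str]) -> Encoder:
    """Unordered categories become a one-hot vector."""
    return lambda value: [1.0 if value == c else 0.0 for c in categories]

def ordinal(levels: Sequence[str]) -> Encoder:
    """Ordered categories become their rank, scaled to the range [0, 1]."""
    top = max(len(levels) - 1, 1)
    return lambda value: [levels.index(value) / top]

def image(max_pixel: int = 255) -> Encoder:
    """A flat list of pixel intensities is scaled to the range [0, 1]."""
    return lambda pixels: [p / max_pixel for p in pixels]

def encode_record(schema: Dict[str, Encoder], record: Dict[str, object]) -> List[float]:
    """Concatenate the encoded fields in schema order into one feature vector."""
    vector: List[float] = []
    for field, encoder in schema.items():
        vector.extend(encoder(record[field]))
    return vector

# Illustrative schema with one field of each tabular type.
schema = {
    "age": numeric(),
    "workclass": nominal(["private", "state", "self"]),
    "education": ordinal(["school", "bachelor", "master"]),
}
record = {"age": 39, "workclass": "state", "education": "bachelor"}
print(encode_record(schema, record))  # [39.0, 0.0, 1.0, 0.0, 0.5]
```

With this layout, supporting a new data type would mean writing one more encoder, while the training and prediction code keeps seeing plain numeric vectors.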

Thank you
