
Track changes to your machine learning project with Neptune


Introduction

As an ML engineer, it is common to find yourself in a situation where you have spent hours building a great model with the desired metrics, after multiple iterations and rounds of hyperparameter tuning, only to discover that you cannot reproduce the same results with the same model simply because you failed to record one small hyperparameter.

What can save you from such situations is keeping track of the experiments you run while solving an ML problem.

  • If you have ever worked on an ML project, you know that the hardest part is getting the model to perform well, which makes it necessary to run many experiments, tweak different parameters, and track each of them.
  • You don’t want to waste time hunting for that good model you trained in the past; recording all your past experiments makes retrieving it hassle free.
  • Sometimes just a small change in alpha pushes the model’s accuracy to the ceiling; capturing the small changes we make to the model, along with the associated metrics, saves a lot of time.
  • Experiment tracking puts all your experiments under one roof, which makes it easy to compare all the different runs you perform.

Should we just keep track of machine learning model parameters?

Well, no. When running any machine learning experiment, you should ideally track several things so that the experiment can be reproduced and the optimized model can be accessed later; a rough sketch of what one run’s record might contain follows the list below:

Picture 1
  • Code: The code used to run the experiments.
  • Data: Save copies of the data used in training and evaluation.
  • Environment: Save environment configuration files such as “Dockerfile”, “requirements.txt”, etc.
  • Parameters: Save the different hyperparameters used for the model.
  • Metrics: Log training and validation metrics for all experiment runs.
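As a rough, Neptune-agnostic illustration, the record for a single run could be captured as a plain Python dictionary; every key and value below is hypothetical and only mirrors the categories above:

# A hypothetical, minimal record of one experiment run, mirroring
# the categories listed above (this is not a Neptune API).
experiment_record = {
    "code": {"git_commit": "abc1234", "entry_point": "train.py"},
    "data": {"train": "data/train.csv", "eval": "data/eval.csv"},
    "environment": ["Dockerfile", "requirements.txt"],
    "parameters": {"n_estimators": 10, "max_depth": 3},
    "metrics": {"train_f1": None, "test_f1": None},  # filled in after training
}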

Why not use an Excel sheet?


Spreadsheets are something we all love because they are so easy to use! However, recording all the information about our experiments in a spreadsheet is only feasible when we perform a limited number of iterations.

Whether you are a beginner or an expert in data science, you know how messy the process of building an ML model can get, with many things happening simultaneously: multiple versions of the data, hyperparameters of different models, many versions of notebooks, and so on. This makes manual recording pointless.

Fortunately, there are many tools available to help you. Neptune is one such tool that can help us keep track of all our ML experiments within a project.

Let’s see it in action!

Install Neptune in Python

To install Neptune, we can run the following command:

pip install neptune-client

To import the Neptune client, we can use the following line:

import neptune.new as neptune

Do you need credentials?

We need to pass our credentials to the neptune.init() method to enable logging of metadata to Neptune.

run = neptune.init(project="", api_token="")

We can create a new project by logging in to https://app.neptune.ai/ and then fetching the project name and API token.
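One common pattern is to keep the API token out of the source code; the sketch below is one way to do this, assuming the token has been exported as an environment variable named NEPTUNE_API_TOKEN and that the project is called stateasy005/iris (the project name used later in this article):

import os
import neptune.new as neptune

# Read the API token from an environment variable so it never
# appears in the notebook or script (assumes `NEPTUNE_API_TOKEN`
# has been exported beforehand).
api_token = os.getenv("NEPTUNE_API_TOKEN")

# "stateasy005/iris" is the project used later in this article;
# replace it with your own workspace/project name.
run = neptune.init(project="stateasy005/iris", api_token=api_token)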

Recording parameters in Neptune

We use the iris dataset here and fit a random forest classifier to it. We then log the model’s parameters and metrics using Neptune.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from joblib import dump

# Load the iris dataset and create a train/test split
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data,
                                                    data.target,
                                                    test_size=0.4,
                                                    random_state=1234)

# Hyperparameters for the random forest classifier
params = {'n_estimators': 10,
          'max_depth': 3,
          'min_samples_leaf': 1,
          'min_samples_split': 2,
          'max_features': 3,
          }

# Train the model and compute macro F1 scores on train and test sets
clf = RandomForestClassifier(**params)
clf.fit(X_train, y_train)
y_train_pred = clf.predict_proba(X_train)
y_test_pred = clf.predict_proba(X_test)
train_f1 = f1_score(y_train, y_train_pred.argmax(axis=1), average="macro")
test_f1 = f1_score(y_test, y_test_pred.argmax(axis=1), average="macro")

To log the parameters of the above model, we can use the run object we created earlier as follows:

run['parameters'] = params
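Individual parameters can also be logged under nested fields instead of assigning the whole dictionary at once, which can make them easier to browse in the UI; a small sketch using the same params dictionary:

# Log each hyperparameter as its own field inside the "parameters"
# namespace (an alternative to assigning the whole dict in one go).
for name, value in params.items():
    run[f"parameters/{name}"] = value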

Neptune also allows you to track code and environment files during run creation as follows:

run = neptune.init(project="stateasy005/iris", api_token='', source_files=['*.py', 'requirements.txt'])

Can I record metrics as well?

The training and evaluation metrics can be logged using the run object we created:

run['train/f1'] = train_f1
run['test/f1'] = test_f1
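For metrics that change over time, such as a per-epoch loss, values can be appended as a series with the log() method (the same method used for figures later in this article); the loop below is a hypothetical illustration:

# Hypothetical training loop: appending one value per epoch makes
# Neptune store the metric as a series rather than a single number.
for epoch in range(10):
    epoch_loss = 1.0 / (epoch + 1)  # placeholder value for illustration
    run["train/loss"].log(epoch_loss)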

Shortcut to record everything at once?

We can create a summary of our classifier model, which by itself will capture the model’s parameters, diagnostic charts, and a test folder with the actual predictions, prediction probabilities, and different scores for each class, such as precision, recall, support, etc.

This summary can be obtained using the following code:

import neptune.new.integrations.sklearn as npt_utils
run["cls_summary "] = npt_utils.create_classifier_summary(clf, X_train, X_test, y_train, y_test)

This creates the following folders in the Neptune user interface:

What is inside the folders?

The diagnostic charts folder is useful because you can evaluate your experiments against multiple metrics with just the single line of code that produced the classifier summary.

The “all_params” folder contains the various hyperparameters of the model. These hyperparameters help you compare the model’s performance across a range of values and narrow down the tuning. Tracking hyperparameters also makes it possible to get back to exactly the same model (with the same hyperparameter values) when needed.

The trained model is also saved as a “.pkl” file, which can be fetched later for use. The “test” folder contains the predictions, prediction probabilities, and scores on the test dataset.

What about regression and clustering with Neptune?

We can get a similar summary for a regression model using the following lines:

import neptune.new.integrations.sklearn as npt_utils
run['rfr_summary'] = npt_utils.create_regressor_summary(rfr, X_train, X_test, y_train, y_test)
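Here, rfr is a fitted regression model. The article does not define it, so the sketch below is one assumed way to build it: a random forest regressor trained on scikit-learn’s diabetes dataset:

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical setup for the regressor summary shown above
# (note: this reuses the X_train/X_test/y_train/y_test names).
reg_data = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(reg_data.data,
                                                    reg_data.target,
                                                    test_size=0.4,
                                                    random_state=1234)
rfr = RandomForestRegressor(n_estimators=10, random_state=1234)
rfr.fit(X_train, y_train)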

Similarly, for clustering, we can create a summary with the help of the following lines of code:

import neptune.new.integrations.sklearn as npt_utils
run['kmeans_summary'] = npt_utils.create_kmeans_summary(km, X, n_clusters=5)

Here, km is the KMeans model and X is the data being clustered.
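The article does not show how km and X are created; as an assumption, the sketch below fits a KMeans model with five clusters on the iris features loaded earlier:

from sklearn.cluster import KMeans

# Hypothetical setup for the k-means summary shown above:
# cluster the iris features into five groups.
X = data.data
km = KMeans(n_clusters=5, random_state=1234)
km.fit(X)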

How do I upload my data to Neptune?

We can also log CSV files and view them in the Neptune user interface using the following lines of code:

run['test/preds'].upload('path/to/test_preds.csv')
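The path 'path/to/test_preds.csv' above is a placeholder. As an illustration, one way to produce such a file from the classifier trained earlier is to save its test predictions with pandas and then upload the resulting file (the file name and column names here are arbitrary):

import pandas as pd

# Save the test-set predictions of the classifier to a CSV file,
# then upload it to Neptune so it can be viewed in the UI.
preds_df = pd.DataFrame({
    "actual": y_test,
    "predicted": y_test_pred.argmax(axis=1),
})
preds_df.to_csv("test_preds.csv", index=False)
run["test/preds"].upload("test_preds.csv")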

Logging artifacts to Neptune

Any figure drawn using libraries such as matplotlib, plotly, etc. can also be logged to Neptune.

import matplotlib.pyplot as plt
# Plot the iris feature values and log the current figure to Neptune
plt.plot(data.data)
run["dataset/distribution"].log(plt.gcf())

To download the same files later programmatically, we can use the download method of the “run” object with the following line of code:

run['artifacts/images'].download()
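Once all metadata has been logged, the run can be closed explicitly with the run object's stop() method so that any queued data is synced before the script or notebook ends:

# Close the run once logging is complete so that any queued
# metadata is synced with the Neptune servers.
run.stop()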

Final thoughts

In this article, I have tried to cover why it is important to track experiments and how Neptune can help facilitate this, which in turn increases productivity while running various ML experiments for your projects. This article focused on ML experiment tracking, but Neptune can also handle code versioning, notebook versioning, data versioning, and environment versioning.

There are, of course, many similar libraries available for tracking runs, which I will try to cover in my next articles.

About the author

Nibedita Dutta

Nibedita holds an MSc in Chemical Engineering from IIT Kharagpur and currently works as a Senior Consultant at AbsolutData Analytics. In her current capacity, she builds AI/Machine Learning-based solutions for clients from a range of industries.

Image source

Picture 1: https://tinyurl.com/em429czk

The media described in this article is not owned by Analytics Vidhya and is used at the author’s discretion.


