
# Mental Health Prediction Using Machine Learning


Inference: The plot above shows the density of the Age column. We can see that density is highest between ages 10 and 20 in our dataset.

```
j = sns.FacetGrid(train_df, col="treatment", size=5)
j = j.map(sns.distplot, "Age")
```

Inference: Treatment 0 means treatment is not necessary, while 1 means it is. The first plot shows that treatment is generally not necessary between ages 0 and 10 and becomes necessary after age 15.

```
plt.figure(figsize=(12,8))
labels = labelDict['label_Gender']
j = sns.countplot(x="treatment", data=train_df)
j.set_xticklabels(labels)
plt.title('Total Distribution by treated or not')
```


Inference: Here we can see that more males are treated as compared to females in the dataset.

```
o = labelDict['label_age_range']
j = sns.factorplot(x="age_range", y="treatment", hue="Gender", data=train_df, kind="bar", ci=None, size=5, aspect=2, legend_out=True)
j.set_xticklabels(o)
plt.title('Probability of mental health condition')
plt.ylabel('Probability x 100')
plt.xlabel('Age')
new_labels = labelDict['label_Gender']
for t, l in zip(j._legend.texts, new_labels): t.set_text(l)
plt.show()
```

Inference: This barplot shows the probability of a mental health condition for females, males, and transgender respondents across age groups. From ages 66 to 100, the probability is much higher for females than for the other genders, while from ages 21 to 64 it is higher for transgender respondents than for males.

```
o = labelDict['label_family_history']
j = sns.factorplot(x="family_history", y="treatment", hue="Gender", data=train_df, kind="bar", ci=None, size=5, aspect=2, legend_out=True)
j.set_xticklabels(o)
plt.title('Probability of mental health condition')
plt.ylabel('Probability x 100')
plt.xlabel('Family History')
new_labels = labelDict['label_Gender']
for t, l in zip(j._legend.texts, new_labels): t.set_text(l)
plt.show()
```

```
o = labelDict['label_care_options']
j = sns.factorplot(x="care_options", y="treatment", hue="Gender", data=train_df, kind="bar", ci=None, size=5, aspect=2, legend_out=True)
j.set_xticklabels(o)
plt.title('Probability of mental health condition')
plt.ylabel('Probability x 100')
plt.xlabel('Care options')
new_labels = labelDict['label_Gender']
for t, l in zip(j._legend.texts, new_labels): t.set_text(l)
plt.show()
```

Inference: In the dataset, respondents with a family history of mental health problems have a higher probability of a mental health condition. For transgender respondents with such a family history, the probability is almost 90%.

Inference: This barplot shows mental health status with respect to care options. In the dataset, respondents without care options have a higher probability of a mental health condition. The probability is highest for transgender respondents who do not have care options and lower for those who do.

```
o = labelDict['label_benefits']
j = sns.factorplot(x="benefits", y="treatment", hue="Gender", data=train_df, kind="bar", ci=None, size=5, aspect=2, legend_out=True)
j.set_xticklabels(o)
plt.title('Probability of mental health condition')
plt.ylabel('Probability x 100')
plt.xlabel('Benefits')
new_labels = labelDict['label_Gender']
for t, l in zip(j._legend.texts, new_labels): t.set_text(l)
plt.show()
```

Inference: This barplot shows the probability of a mental health condition with respect to benefits. In the dataset, respondents without benefits have a higher probability of a mental health condition. The probability is highest for transgender respondents who receive no benefits and lower for those who do.

```
o = labelDict['label_work_interfere']
j = sns.factorplot(x="work_interfere", y="treatment", hue="Gender", data=train_df, kind="bar", ci=None, size=5, aspect=2, legend_out=True)
j.set_xticklabels(o)
plt.title('Probability of mental health condition')
plt.ylabel('Probability x 100')
plt.xlabel('Work interfere')
new_labels = labelDict['label_Gender']
for t, l in zip(j._legend.texts, new_labels): t.set_text(l)
plt.show()
```

Inference: This barplot shows the probability of a mental health condition with respect to work interference. For respondents with no work interference, the probability is very low; it is high for those whose work is interfered with, even rarely.

## Scaling and Fitting

```
# Scaling Age
scaler = MinMaxScaler()
train_df['Age'] = scaler.fit_transform(train_df[['Age']])

# define X and y
feature_cols1 = ['Age', 'Gender', 'family_history', 'benefits', 'care_options', 'anonymity', 'leave', 'work_interfere']
X = train_df[feature_cols1]
y = train_df.treatment
X_train1, X_test1, y_train1, y_test1 = train_test_split(X, y, test_size=0.30, random_state=0)

# Create dictionaries for final graph
# Use: methodDict['Stacking'] = accuracy_score
methodDict = {}
rmseDict = {}

forest = ExtraTreesClassifier(n_estimators=250, random_state=0)
forest.fit(X, y)
importances = forest.feature_importances_
std = np.std([tree.feature_importances_ for tree in forest.estimators_], axis=0)
indices = np.argsort(importances)[::-1]
labels = []
for f in range(X.shape[1]):
    labels.append(feature_cols1[indices[f]])  # label bars in order of importance

plt.figure(figsize=(12,8))
plt.title("Feature importances")
plt.bar(range(X.shape[1]), importances[indices], color="r", yerr=std[indices], align="center")
plt.xticks(range(X.shape[1]), labels, rotation='vertical')
plt.xlim([-1, X.shape[1]])
plt.show()
```
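MinMaxScaler rescales each value of Age into the [0, 1] range via (x - min) / (max - min). A tiny sketch of that formula with made-up ages (not from the dataset):

```python
# Made-up ages, for illustration only
ages = [20.0, 30.0, 40.0]

# MinMaxScaler applies (x - min) / (max - min) per column
lo, hi = min(ages), max(ages)
scaled = [(a - lo) / (hi - lo) for a in ages]
print(scaled)  # [0.0, 0.5, 1.0]
```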

## Tuning

```
def evalClassModel(model, y_test1, y_pred_class, plot=False):
    # Classification accuracy: percentage of correct predictions
    print('Accuracy:', metrics.accuracy_score(y_test1, y_pred_class))
    # Null accuracy: accuracy achievable by always predicting the most frequent class
    print('Null accuracy:\n', y_test1.value_counts())
    # calculate the percentage of ones
    print('Percentage of ones:', y_test1.mean())
    # calculate the percentage of zeros
    print('Percentage of zeros:', 1 - y_test1.mean())
    print('True:', y_test1.values[0:25])
    print('Pred:', y_pred_class[0:25])

    # Confusion matrix: [row = actual, column = predicted]
    confusion = metrics.confusion_matrix(y_test1, y_pred_class)
    TP = confusion[1, 1]
    TN = confusion[0, 0]
    FP = confusion[0, 1]
    FN = confusion[1, 0]

    # visualize the confusion matrix
    sns.heatmap(confusion, annot=True, fmt="d")
    plt.title('Confusion Matrix')
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()

    accuracy = metrics.accuracy_score(y_test1, y_pred_class)
    print('Classification Accuracy:', accuracy)
    print('Classification Error:', 1 - accuracy)
    fp_rate = FP / float(TN + FP)
    print('False Positive Rate:', fp_rate)
    print('Precision:', metrics.precision_score(y_test1, y_pred_class))
    print('AUC Score:', metrics.roc_auc_score(y_test1, y_pred_class))

    # calculate cross-validated AUC
    print('Cross-validated AUC:', cross_val_score(model, X, y, cv=10, scoring='roc_auc').mean())

    print('First 10 predicted responses:\n', model.predict(X_test1)[0:10])
    print('First 10 predicted probabilities of class members:\n', model.predict_proba(X_test1)[0:10])
    y_pred_prob = model.predict_proba(X_test1)[:, 1]

    if plot == True:
        # histogram of predicted probabilities
        plt.rcParams['font.size'] = 12
        plt.hist(y_pred_prob, bins=8)
        plt.xlim(0, 1)
        plt.title('Histogram of predicted probabilities')
        plt.xlabel('Predicted probability of treatment')
        plt.ylabel('Frequency')

    # lower the classification threshold from the default 0.5 to 0.3
    y_pred_prob = y_pred_prob.reshape(-1, 1)
    y_pred_class = binarize(y_pred_prob, threshold=0.3)[0]
    print('First 10 predicted probabilities:\n', y_pred_prob[0:10])

    roc_auc = metrics.roc_auc_score(y_test1, y_pred_prob)
    fpr, tpr, thresholds = metrics.roc_curve(y_test1, y_pred_prob)
    if plot == True:
        plt.figure()
        plt.plot(fpr, tpr, color="darkorange", label="ROC curve (area = %0.2f)" % roc_auc)
        plt.plot([0, 1], [0, 1], color="navy", linestyle="--")
        plt.xlim([0.0, 1.0])
        plt.ylim([0.0, 1.0])
        plt.rcParams['font.size'] = 12
        plt.title('ROC curve for treatment classifier')
        plt.xlabel('False Positive Rate (1 - Specificity)')
        plt.ylabel('True Positive Rate (Sensitivity)')
        plt.legend(loc="lower right")
        plt.show()

    def evaluate_threshold(threshold):
        print('Sensitivity for ' + str(threshold) + ' :', tpr[thresholds > threshold][-1])
        print('Specificity for ' + str(threshold) + ' :', 1 - fpr[thresholds > threshold][-1])

    predict_mine = np.where(y_pred_prob > 0.50, 1, 0)
    confusion = metrics.confusion_matrix(y_test1, predict_mine)
    print(confusion)
    return accuracy
```
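The threshold step above (binarizing at 0.3 instead of the default 0.5) trades specificity for sensitivity. A self-contained sketch with made-up labels and probabilities, not from the dataset:

```python
# Made-up true labels and predicted probabilities, for illustration only
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.20, 0.45, 0.35, 0.80, 0.60, 0.10]

def counts(threshold):
    """Return (true positives, false negatives) at a given threshold."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fn

# Lowering the threshold turns some false negatives into true positives
print(counts(0.5))  # (2, 1)
print(counts(0.3))  # (3, 0)
```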

Tuning with cross-validation score

```
def tuningCV(knn):
    k_range = list(range(1, 31))
    k_scores = []
    for k in k_range:
        knn = KNeighborsClassifier(n_neighbors=k)
        scores = cross_val_score(knn, X, y, cv=10, scoring='accuracy')
        k_scores.append(scores.mean())
    print(k_scores)
    plt.plot(k_range, k_scores)
    plt.xlabel('Value of K for KNN')
    plt.ylabel('Cross-Validated Accuracy')
    plt.show()
```

Tuning with GridSearchCV

```
def tuningGridSearch(knn):
    k_range = list(range(1, 31))
    print(k_range)

    param_grid = dict(n_neighbors=k_range)
    print(param_grid)

    grid = GridSearchCV(knn, param_grid, cv=10, scoring='accuracy')
    grid.fit(X, y)

    print(grid.grid_scores_[0].parameters)
    print(grid.grid_scores_[0].cv_validation_scores)
    print(grid.grid_scores_[0].mean_validation_score)
    grid_mean_scores1 = [result.mean_validation_score for result in grid.grid_scores_]
    print(grid_mean_scores1)

    # plot the results
    plt.plot(k_range, grid_mean_scores1)
    plt.xlabel('Value of K for KNN')
    plt.ylabel('Cross-Validated Accuracy')
    plt.show()

    # examine the best model
    print('GridSearch best score', grid.best_score_)
    print('GridSearch best params', grid.best_params_)
    print('GridSearch best estimator', grid.best_estimator_)
```

Tuning with RandomizedSearchCV

```
def tuningRandomizedSearchCV(model, param_dist):
    rand1 = RandomizedSearchCV(model, param_dist, cv=10, scoring='accuracy', n_iter=10, random_state=5)
    rand1.fit(X, y)

    print('Rand1. Best Score: ', rand1.best_score_)
    print('Rand1. Best Params: ', rand1.best_params_)

    best_scores = []
    for _ in range(20):
        rand1 = RandomizedSearchCV(model, param_dist, cv=10, scoring='accuracy', n_iter=10)
        rand1.fit(X, y)
        best_scores.append(round(rand1.best_score_, 3))
    print(best_scores)
```

Tuning by searching multiple parameters

```
def tuningMultParam(knn):
    k_range = list(range(1, 31))
    weight_options = ['uniform', 'distance']

    param_grid = dict(n_neighbors=k_range, weights=weight_options)
    print(param_grid)

    grid = GridSearchCV(knn, param_grid, cv=10, scoring='accuracy')
    grid.fit(X, y)

    print(grid.grid_scores_)
    print('Multiparam. Best Score: ', grid.best_score_)
    print('Multiparam. Best Params: ', grid.best_params_)
```

## Evaluating Models

Logistic Regression

```
def logisticRegression():
    logreg = LogisticRegression()
    logreg.fit(X_train1, y_train1)
    y_pred_class = logreg.predict(X_test1)
    accuracy_score = evalClassModel(logreg, y_test1, y_pred_class, True)
    # Data for final graph
    methodDict['Log. Regression'] = accuracy_score * 100
```
`logisticRegression()`

Accuracy: 0.7962962962962963
Null accuracy:
0 191
1 187
Name: treatment, dtype: int64
Percentage of ones: 0.4947089947089947
Percentage of zeros: 0.5052910052910053
True value: [0 0 0 0 0 0 0 0 1 1 0 1 1 0 1 1 0 1 0 0 0 1 1 0 0]
Predicted value: [1 0 0 0 1 1 0 1 0 1 0 1 1 0 1 1 1 1 0 0 0 0 1 0 0]

Classification Accuracy: 0.7962962962962963
Classification Error: 0.20370370370370372
False Positive Rate: 0.25654450261780104
Precision: 0.7644230769230769
AUC Score: 0.7968614385306716
Cross-validated AUC: 0.8753623882722146
First 10 predicted probabilities of class members:
[[0.09193053 0.90806947]
[0.95991564 0.04008436]
[0.96547467 0.03452533]
[0.78757121 0.21242879]
[0.38959922 0.61040078]
[0.05264207 0.94735793]
[0.75035574 0.24964426]
[0.19065116 0.80934884]
[0.61612081 0.38387919]
[0.47699963 0.52300037]]

First 10 predicted probabilities:
[[0.90806947]
[0.04008436]
[0.03452533]
[0.21242879]
[0.61040078]
[0.94735793]
[0.24964426]
[0.80934884]
[0.38387919]
[0.52300037]]

[[142 49]
[ 28 159]]
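The metrics printed above follow directly from the confusion matrix; recomputing them from the printed values:

```python
# Confusion matrix from the logistic regression run above:
# rows = actual (0, 1), columns = predicted (0, 1)
TN, FP = 142, 49
FN, TP = 28, 159

accuracy = (TP + TN) / (TP + TN + FP + FN)  # correct predictions / all predictions
fp_rate = FP / (TN + FP)                    # fraction of actual 0s predicted as 1
precision = TP / (TP + FP)                  # fraction of predicted 1s that are 1

print(round(accuracy, 4))   # 0.7963
print(round(fp_rate, 4))    # 0.2565
print(round(precision, 4))  # 0.7644
```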

KNeighbors Classifier

```
def Knn():
    # Calculating the best parameters
    knn = KNeighborsClassifier(n_neighbors=5)

    k_range = list(range(1, 31))
    weight_options = ['uniform', 'distance']

    param_dist = dict(n_neighbors=k_range, weights=weight_options)
    tuningRandomizedSearchCV(knn, param_dist)

    knn = KNeighborsClassifier(n_neighbors=27, weights="uniform")
    knn.fit(X_train1, y_train1)

    y_pred_class = knn.predict(X_test1)
    accuracy_score = evalClassModel(knn, y_test1, y_pred_class, True)
    # Data for final graph
    methodDict['K-Neighbors'] = accuracy_score * 100
```
`Knn()`

Rand1. Best Score: 0.8209714285714286
Rand1. Best Params: {'weights': 'uniform', 'n_neighbors': 27}
[0.816, 0.812, 0.821, 0.823, 0.823, 0.818, 0.821, 0.821, 0.815, 0.812, 0.819, 0.811, 0.819, 0.818, 0.82, 0.815, 0.803, 0.821, 0.823, 0.815]

Accuracy: 0.8042328042328042
Null accuracy:
0 191
1 187
Name: treatment, dtype: int64
Percentage of ones: 0.4947089947089947
Percentage of zeros: 0.5052910052910053
True val: [0 0 0 0 0 0 0 0 1 1 0 1 1 0 1 1 0 1 0 0 0 1 1 0 0]
Pred val: [1 0 0 0 1 1 0 1 1 1 0 1 1 0 1 1 1 1 0 0 0 0 1 0 0]

Classification Accuracy: 0.8042328042328042
Classification Error: 0.1957671957671958
False Positive Rate: 0.2931937172774869
Precision: 0.751111111111111
AUC Score: 0.8052747991152673
Cross-validated AUC: 0.8782819116296456
First 10 predicted probabilities of class members:
[[0.33333333 0.66666667]
[1. 0. ]
[1. 0. ]
[0.66666667 0.33333333]
[0.37037037 0.62962963]
[0.03703704 0.96296296]
[0.59259259 0.40740741]
[0.37037037 0.62962963]
[0.33333333 0.66666667]
[0.33333333 0.66666667]]

First 10 predicted probabilities:
[[0.66666667]
[0. ]
[0. ]
[0.33333333]
[0.62962963]
[0.96296296]
[0.40740741]
[0.62962963]
[0.66666667]
[0.66666667]]

[[135  56]
 [ 18 169]]

Decision Tree

```
def treeClassifier():
    # Calculating the best parameters
    tree1 = DecisionTreeClassifier()
    featuresSize = len(feature_cols1)
    param_dist = {"max_depth": [3, None],
                  "max_features": randint(1, featuresSize),
                  "min_samples_split": randint(2, 9),
                  "min_samples_leaf": randint(1, 9),
                  "criterion": ["gini", "entropy"]}
    tuningRandomizedSearchCV(tree1, param_dist)

    tree1 = DecisionTreeClassifier(max_depth=3, min_samples_split=8, max_features=6, criterion='entropy', min_samples_leaf=7)
    tree1.fit(X_train1, y_train1)
    y_pred_class = tree1.predict(X_test1)
    accuracy_score = evalClassModel(tree1, y_test1, y_pred_class, True)
    # Data for final graph
    methodDict['Decision Tree Classifier'] = accuracy_score * 100
```
`treeClassifier()`

Rand1. Best Score: 0.8305206349206349
Rand1. Best Params: {'criterion': 'entropy', 'max_depth': 3, 'max_features': 6, 'min_samples_leaf': 7, 'min_samples_split': 8}
[0.83, 0.827, 0.831, 0.829, 0.831, 0.83, 0.783, 0.831, 0.821, 0.831, 0.831, 0.831, 0.8, 0.79, 0.831, 0.831, 0.831, 0.829, 0.831, 0.831]

Accuracy: 0.8068783068783069
Null accuracy:
0 191
1 187
Name: treatment, dtype: int64
Percentage of ones: 0.4947089947089947
Percentage of zeros: 0.5052910052910053
True val: [0 0 0 0 0 0 0 0 1 1 0 1 1 0 1 1 0 1 0 0 0 1 1 0 0]
Pred val: [1 0 0 0 1 1 0 1 1 1 0 1 1 0 1 1 1 1 0 0 0 0 1 0 0]

Classification Accuracy: 0.8068783068783069
Classification Error: 0.19312169312169314
False Positive Rate: 0.3193717277486911
Precision: 0.7415254237288136
AUC Score: 0.8082285746283282
Cross-validated AUC: 0.8818789291403538
First 10 predicted probabilities of class members:
[[0.18 0.82 ]
[0.96534653 0.03465347]
[0.96534653 0.03465347]
[0.89473684 0.10526316]
[0.36097561 0.63902439]
[0.18 0.82 ]
[0.89473684 0.10526316]
[0.11320755 0.88679245]
[0.36097561 0.63902439]
[0.36097561 0.63902439]]

First 10 predicted probabilities:
[[0.82 ]
[0.03465347]
[0.03465347]
[0.10526316]
[0.63902439]
[0.82 ]
[0.10526316]
[0.88679245]
[0.63902439]
[0.63902439]]

[[130  61]
 [ 12 175]]

Random Forests

```
def randomForest():
    # Calculating the best parameters
    forest1 = RandomForestClassifier(n_estimators=20)
    featuresSize = len(feature_cols1)
    param_dist = {"max_depth": [3, None],
                  "max_features": randint(1, featuresSize),
                  "min_samples_split": randint(2, 9),
                  "min_samples_leaf": randint(1, 9),
                  "criterion": ["gini", "entropy"]}
    tuningRandomizedSearchCV(forest1, param_dist)

    forest1 = RandomForestClassifier(max_depth=None, min_samples_leaf=8, min_samples_split=2, n_estimators=20, random_state=1)
    my_forest = forest1.fit(X_train1, y_train1)
    y_pred_class = my_forest.predict(X_test1)
    accuracy_score = evalClassModel(my_forest, y_test1, y_pred_class, True)
    # Data for final graph
    methodDict['Random Forest'] = accuracy_score * 100
```
`randomForest()`

Rand. Best Score: 0.8305206349206349
Rand. Best Params: {'criterion': 'entropy', 'max_depth': 3, 'max_features': 6, 'min_samples_leaf': 7, 'min_samples_split': 8}
[0.831, 0.831, 0.831, 0.831, 0.831, 0.831, 0.831, 0.832, 0.831, 0.831, 0.831, 0.831, 0.837, 0.834, 0.831, 0.832, 0.831, 0.831, 0.831, 0.831]

Accuracy: 0.8121693121693122
Null accuracy:
0 191
1 187
Name: treatment, dtype: int64
Percentage of ones: 0.4947089947089947
Percentage of zeros: 0.5052910052910053
True val: [0 0 0 0 0 0 0 0 1 1 0 1 1 0 1 1 0 1 0 0 0 1 1 0 0]
Pred val: [1 0 0 0 1 1 0 1 1 1 0 1 1 0 1 1 1 1 0 0 0 0 1 0 0]

Classification Accuracy: 0.8121693121693122
Classification Error: 0.1878306878306878
False Positive Rate: 0.3036649214659686
Precision: 0.75
AUC Score: 0.8134081809782457
Cross-validated AUC: 0.8934280651104528
First 10 predicted probabilities of class members:
[[0.2555794 0.7444206 ]
[0.95069083 0.04930917]
[0.93851009 0.06148991]
[0.87096597 0.12903403]
[0.40653554 0.59346446]
[0.17282958 0.82717042]
[0.89450448 0.10549552]
[0.4065912 0.5934088 ]
[0.20540631 0.79459369]
[0.19337644 0.80662356]]

First 10 predicted probabilities:
[[0.7444206 ]
[0.04930917]
[0.06148991]
[0.12903403]
[0.59346446]
[0.82717042]
[0.10549552]
[0.5934088 ]
[0.79459369]
[0.80662356]]

Boosting

```
def boosting():
    # Building and fitting
    clf = DecisionTreeClassifier(criterion='entropy', max_depth=1)
    # The original snippet uses `boost` without defining it; an AdaBoost
    # ensemble over the depth-1 tree above is assumed here.
    boost = AdaBoostClassifier(base_estimator=clf)
    boost.fit(X_train1, y_train1)
    y_pred_class = boost.predict(X_test1)
    accuracy_score = evalClassModel(boost, y_test1, y_pred_class, True)
    # Data for final graph
    methodDict['Boosting'] = accuracy_score * 100
```
`boosting()`

Accuracy: 0.8174603174603174
Null accuracy:
0 191
1 187
Name: treatment, dtype: int64
Percentage of ones: 0.4947089947089947
Percentage of zeros: 0.5052910052910053
True val: [0 0 0 0 0 0 0 0 1 1 0 1 1 0 1 1 0 1 0 0 0 1 1 0 0]
Pred val: [1 0 0 0 0 1 0 1 1 1 0 1 1 0 1 1 1 1 0 0 0 0 1 0 0]

Classification Accuracy: 0.8174603174603174
Classification Error: 0.18253968253968256
False Positive Rate: 0.28272251308900526
Precision: 0.7610619469026548
AUC Score: 0.8185317915838397
Cross-validated AUC: 0.8746279095195426
First 10 predicted probabilities of class members:
[[0.49924555 0.50075445]
[0.50285507 0.49714493]
[0.50291786 0.49708214]
[0.50127788 0.49872212]
[0.50013552 0.49986448]
[0.49796157 0.50203843]
[0.50046371 0.49953629]
[0.49939483 0.50060517]
[0.49921757 0.50078243]
[0.49897133 0.50102867]]

First 10 predicted probabilities:
[[0.50075445]
[0.49714493]
[0.49708214]
[0.49872212]
[0.49986448]
[0.50203843]
[0.49953629]
[0.50060517]
[0.50078243]
[0.50102867]]

## Predicting with Neural Network

Create input function

```%tensorflow_version 1.x
import tensorflow as tf
import argparse```

TensorFlow 1.x selected.

```
batch_size = 100
train_steps = 1000

X_train1, X_test1, y_train1, y_test1 = train_test_split(X, y, test_size=0.30, random_state=0)

def train_input_fn(features, labels, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    return dataset.shuffle(1000).repeat().batch(batch_size)

def eval_input_fn(features, labels, batch_size):
    features = dict(features)
    if labels is None:
        # No labels, use only features.
        inputs = features
    else:
        inputs = (features, labels)
    dataset = tf.data.Dataset.from_tensor_slices(inputs)
    dataset = dataset.batch(batch_size)
    # Return the dataset.
    return dataset
```

Define the feature columns

```
# Define Tensorflow feature columns
age = tf.feature_column.numeric_column("Age")
gender = tf.feature_column.numeric_column("Gender")
family_history = tf.feature_column.numeric_column("family_history")
benefits = tf.feature_column.numeric_column("benefits")
care_options = tf.feature_column.numeric_column("care_options")
anonymity = tf.feature_column.numeric_column("anonymity")
leave = tf.feature_column.numeric_column("leave")
work_interfere = tf.feature_column.numeric_column("work_interfere")
feature_columns = [age, gender, family_history, benefits, care_options, anonymity, leave, work_interfere]
```

Instantiate an Estimator

```
# DNNClassifier does not take learning_rate directly; an optimizer wrapper
# carrying the learning rate and L1 strength from the original is assumed.
model = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 10],
    optimizer=tf.train.ProximalAdagradOptimizer(
        learning_rate=0.1,
        l1_regularization_strength=0.001
    ))
```
`model.train(input_fn=lambda:train_input_fn(X_train1, y_train1, batch_size), steps=train_steps)`

Evaluate the model

```
# Evaluate the model.
eval_result = model.evaluate(
    input_fn=lambda: eval_input_fn(X_test1, y_test1, batch_size))
print('\nTest set accuracy: {accuracy:0.2f}\n'.format(**eval_result))

# Data for final graph
accuracy = eval_result['accuracy'] * 100
methodDict['Neural Network'] = accuracy
```

The test set accuracy: 0.80

Making predictions (inferring) from the trained model

`predictions = list(model.predict(input_fn=lambda:eval_input_fn(X_train1, y_train1, batch_size=batch_size)))`
```
# Generate predictions from the model
template = ('\nIndex: "{}", Prediction is "{}" ({:.1f}%), expected "{}"')
# Columns for the predictions DataFrame
col1 = []
col2 = []
col3 = []
for idx, input, p in zip(X_train1.index, y_train1, predictions):
    v = p["class_ids"][0]
    class_id = p['class_ids'][0]
    probability = p['probabilities'][class_id]  # Probability
    col1.append(idx)    # Index
    col2.append(v)      # Prediction
    col3.append(input)  # Expected
    # print(template.format(idx, v, 100 * probability, input))
results = pd.DataFrame({'index': col1, 'prediction': col2, 'expected': col3})
```

## Creating Predictions on the Test Set

Generate predictions with the best methodology:

```
clf = AdaBoostClassifier()
clf.fit(X, y)
dfTestPredictions = clf.predict(X_test1)
# Write predictions to csv file
results = pd.DataFrame({'Index': X_test1.index, 'Treatment': dfTestPredictions})
# Save to file
results.to_csv('results.csv', index=False)
```

## Submission

```
results = pd.DataFrame({'Index': X_test1.index, 'Treatment': dfTestPredictions})
results
```

The final prediction consists of 0s and 1s: 0 means the person does not need mental health treatment and 1 means the person does.
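In code, mapping the 0/1 output to a readable label could look like this (the predictions and label text below are made up for illustration):

```python
# Hypothetical predictions from the classifier
predictions = [0, 1, 1, 0]

# 0 = no treatment needed, 1 = treatment needed
labels = {0: 'treatment not needed', 1: 'treatment needed'}
readable = [labels[p] for p in predictions]
print(readable)
# ['treatment not needed', 'treatment needed', 'treatment needed', 'treatment not needed']
```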

## Conclusion

Using these employee records, we were able to build various machine learning models. Among them, AdaBoost achieved 81.75% accuracy with an AUC of 0.8185. We were also able to draw several insights from the data through analysis and visualization.
