Week 6. Vowpal Wabbit. Tutorial + Programming Assignment

This week we will get acquainted with the popular Vowpal Wabbit library and try it on site visit data.

Week 6 plan:

  • Part 1. Article on Vowpal Wabbit
  • Part 2. Application of Vowpal Wabbit to Site Visit data
  • 2.1. Data Preparation
  • 2.2. Validation on the Deferred (Holdout) Sample
  • 2.3. Validation on the Test Set (Public Leaderboard)

In this part of the project, the video lectures of the course "Learning from Labeled Data" may be useful to us.

The [presentation](https://github.com/esokolov/ml-course-msu/blob/master/ML15/lecture-notes/Sem08_vw.pdf) by specialization lecturer Evgeny Sokolov will also be useful, as will, of course, the Vowpal Wabbit documentation.

Part 1. Article about Vowpal Wabbit

Let's read the article about Vowpal Wabbit on Habr from the OpenDataScience open machine learning course series. We can download the notebook attached to the article, look through the code, study it and modify it; that is the only way to really get to grips with Vowpal Wabbit.

Part 2. Applying Vowpal Wabbit to Site Visit Data

2.1. Data preparation

Next, let's look at Vowpal Wabbit in action. In our competition's task of binary classification of web sessions we would not notice a difference in either quality or speed (although you can check). So we will demonstrate all the agility of VW on a classification task with 400 classes. The initial data are the same, but 400 users have been selected, and the task is to identify them by their sessions. Download the data from here, and submit the results here; the files are train_sessions_400users.csv and test_sessions_400users.csv.

import os
import pandas as pd
import numpy as np
import scipy.sparse as sps
from scipy.sparse import csr_matrix
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn import preprocessing
from sklearn.metrics import accuracy_score
PATH_TO_DATA = '/content/drive/MyDrive/DATA/Stepik/Kaggle'

Let's load the training and test sets. Note that the test sessions are clearly separated in time from the sessions in the training set (we check this below).

train_df_400 = pd.read_csv(os.path.join(PATH_TO_DATA,'train_sessions_400users.csv'), 
                           index_col='session_id')
test_df_400 = pd.read_csv(os.path.join(PATH_TO_DATA,'test_sessions_400users.csv'), 
                           index_col='session_id')
test_df_400.shape
(46473, 20)
train_df_400.head()
site1 time1 site2 time2 site3 time3 site4 time4 site5 time5 site6 time6 site7 time7 site8 time8 site9 time9 site10 time10 user_id
session_id
1 23713 2014-03-24 15:22:40 23720.0 2014-03-24 15:22:48 23713.0 2014-03-24 15:22:48 23713.0 2014-03-24 15:22:54 23720.0 2014-03-24 15:22:54 23713.0 2014-03-24 15:22:55 23713.0 2014-03-24 15:23:01 23713.0 2014-03-24 15:23:03 23713.0 2014-03-24 15:23:04 23713.0 2014-03-24 15:23:05 653
2 8726 2014-04-17 14:25:58 8725.0 2014-04-17 14:25:59 665.0 2014-04-17 14:25:59 8727.0 2014-04-17 14:25:59 45.0 2014-04-17 14:25:59 8725.0 2014-04-17 14:26:01 45.0 2014-04-17 14:26:01 5320.0 2014-04-17 14:26:18 5320.0 2014-04-17 14:26:47 5320.0 2014-04-17 14:26:48 198
3 303 2014-03-21 10:12:24 19.0 2014-03-21 10:12:36 303.0 2014-03-21 10:12:54 303.0 2014-03-21 10:13:01 303.0 2014-03-21 10:13:24 303.0 2014-03-21 10:13:36 303.0 2014-03-21 10:13:54 309.0 2014-03-21 10:14:01 303.0 2014-03-21 10:14:06 303.0 2014-03-21 10:14:24 34
4 1359 2013-12-13 09:52:28 925.0 2013-12-13 09:54:34 1240.0 2013-12-13 09:54:34 1360.0 2013-12-13 09:54:34 1344.0 2013-12-13 09:54:34 1359.0 2013-12-13 09:54:34 1346.0 2013-12-13 09:54:34 1345.0 2013-12-13 09:54:34 1344.0 2013-12-13 09:58:19 1345.0 2013-12-13 09:58:19 601
5 11 2013-11-26 12:35:29 85.0 2013-11-26 12:35:31 52.0 2013-11-26 12:35:31 85.0 2013-11-26 12:35:32 11.0 2013-11-26 12:35:32 52.0 2013-11-26 12:35:32 11.0 2013-11-26 12:37:03 85.0 2013-11-26 12:37:03 10.0 2013-11-26 12:37:03 85.0 2013-11-26 12:37:04 273

We see that there are 182793 sessions in the training sample, 46473 in the test sample, and the sessions really belong to 400 different users.

train_df_400.shape, test_df_400.shape, train_df_400['user_id'].nunique()
((182793, 21), (46473, 20), 400)
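
We can also verify the claim that the test sessions come later in time. This is a minimal sketch; it assumes the time1 column holds the session start timestamps, as the head() output above suggests.

# Sketch: compare the time ranges of the training and test sessions.
train_start = pd.to_datetime(train_df_400['time1'])
test_start = pd.to_datetime(test_df_400['time1'])
print('training sessions start between', train_start.min(), 'and', train_start.max())
print('test sessions start between    ', test_start.min(), 'and', test_start.max())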

Vowpal Wabbit expects class labels to run from 1 to K, where K is the number of classes in the classification problem (400 in our case). Therefore we will use LabelEncoder and then add 1 (LabelEncoder maps labels into the range 0 to K-1). Later we will need to apply the inverse transformation.

y = train_df_400.user_id
class_encoder = preprocessing.LabelEncoder()
y_for_vw = class_encoder.fit_transform(y)+1
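
For reference, the inverse mapping that we will need later when preparing the submission simply reverses these two steps; a small sketch to confirm it round-trips:

# Sketch: subtracting 1 and applying inverse_transform recovers the original user_id values.
y_back = class_encoder.inverse_transform(y_for_vw - 1)
assert (y_back == y.values).all()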

Next, we will compare VW with SGDClassifier and with logistic regression. All these models need the input data preprocessed. Let's prepare sparse matrices for the sklearn models, as we did in part 5:

  • combine the training and test samples
  • select only the sites (features 'site1' through 'site10')
  • replace missing values with zeros (our sites are numbered starting from 1, so 0 is free to mark a missing site)
  • convert to the sparse csr_matrix format
  • split back into the training and test parts
train_test_df = pd.concat([train_df_400, test_df_400])
sites = ['site' + str(i) for i in range(1, 11)]
train_test_df_sites = train_test_df[sites]
train_test_df_sites.isnull().sum().sum()
train_test_df_sites = train_test_df_sites.fillna(0)
idx_split = train_df_400.shape[0]

# Build the sparse matrix directly in CSR form:
#   data    - all ones (one entry per site visit),
#   indices - the flattened site IDs (column index = site ID),
#   indptr  - row boundaries: each session contributes exactly 10 entries.
# Column 0 corresponds to the filled-in missing values and is dropped with [:, 1:].
train_test_sparse = csr_matrix((np.ones(train_test_df_sites.values.size, dtype=np.uint8),
                          train_test_df_sites.values.reshape(-1),
                          np.arange(train_test_df_sites.values.shape[0] + 1) * train_test_df_sites.values.shape[1]))[:, 1:]
X_train_sparse = train_test_sparse[:idx_split, :]
X_test_sparse = train_test_sparse[idx_split:, :]
y = train_df_400['user_id'].values
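
A quick sanity check of the construction above; this is a sketch that relies on duplicate column entries being summed when the row is densified, and on the site counts visible in the head() output for session 1 (site 23713 appears 8 times, site 23720 twice).

# Sketch: row 0 of X_train_sparse should hold session 1's per-site visit counts.
row0 = X_train_sparse[0]
row0.sum_duplicates()                      # merge duplicate column entries into counts
row0 = row0.toarray().ravel()
print(row0[23713 - 1], row0[23720 - 1])    # column s-1 corresponds to site s, since column 0 was dropped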

2.2. Validation on the deferred (holdout) sample

Let's split off the training (70%) and deferred (30%) parts of the original training sample. We do not shuffle the data, since the sessions are sorted by time.

train_share = int(.7 * train_df_400.shape[0])
train_df_part = train_df_400[sites].iloc[:train_share, :]
valid_df = train_df_400[sites].iloc[train_share:, :]
X_train_part_sparse = X_train_sparse[:train_share, :]
X_valid_sparse = X_train_sparse[train_share:, :]
y_train_part = y[:train_share]
y_valid = y[train_share:]
y_train_part_for_vw = y_for_vw[:train_share]
y_valid_for_vw = y_for_vw[train_share:]

We implement a function, arrays_to_vw, which converts a sample into the Vowpal Wabbit format.

Input:

  • X – NumPy matrix (training sample)
  • y (optional) – response vector (NumPy); optional because we will process the test matrix with the same function
  • train – flag, True for a training sample, False for a test sample
  • out_file – path to the .vw file that will be written

Details:

  • go through all the rows of the matrix X and write out all the values separated by spaces, prepending the corresponding class label from the vector y and the separator sign |
  • in the test sample, an arbitrary label (for example, 1) can be written in place of the target class
def arrays_to_vw(X, y=None, train=True, out_file='tmp.vw'):
    # Convert a NumPy matrix of sites (and, optionally, labels) to the VW text format:
    # "<label> | <site1> <site2> ... <site10>". For the test sample the label is fixed to 1.
    # The train flag is kept for the interface described above; whether real labels
    # are written is actually decided by whether y is passed.
    X = np.nan_to_num(X)
    X = X.astype(int)

    with open(out_file, 'w') as f:
        print(X.shape)  # log the shape of the sample being converted
        for i in range(X.shape[0]):
            string = ' '.join([str(x) for x in X[i]])
            if y is None:
                f.write(str(1) + " | " + string + "\n")
            else:
                f.write(str(y[i]) + " | " + string + "\n")

Let's apply this function to the part of the training sample (train_df_part, y_train_part_for_vw), to the deferred sample (valid_df, y_valid_for_vw), to the entire training sample and to the entire test sample. Note that the function accepts NumPy matrices and vectors as input.

%%time

arrays_to_vw(train_df_part.values, y_train_part_for_vw, True, os.path.join(PATH_TO_DATA,'train_part.vw'))
arrays_to_vw(valid_df.values, y_valid_for_vw, False, os.path.join(PATH_TO_DATA,'valid.vw'))
arrays_to_vw(train_df_400[sites].values, y_for_vw, True, os.path.join(PATH_TO_DATA,'train.vw'))
arrays_to_vw(test_df_400[sites].values, None, False, os.path.join(PATH_TO_DATA,'test.vw'))
(127955, 10)
(54838, 10)
(182793, 10)
(46473, 10)
CPU times: user 4.03 s, sys: 22.6 ms, total: 4.05 s
Wall time: 4.16 s

Let's check the result:

!head -3 $PATH_TO_DATA/train_part.vw
262 | 23713 23720 23713 23713 23720 23713 23713 23713 23713 23713
82 | 8726 8725 665 8727 45 8725 45 5320 5320 5320
16 | 303 19 303 303 303 303 303 309 303 303
!head -3  $PATH_TO_DATA/valid.vw
4 | 7 923 923 923 11 924 7 924 838 7
160 | 91 198 11 11 302 91 668 311 310 91
312 | 27085 848 118 118 118 118 11 118 118 118
!head -3 $PATH_TO_DATA/test.vw
1 | 9 304 308 307 91 308 312 300 305 309
1 | 838 504 68 11 838 11 838 886 27 305
1 | 190 192 8 189 191 189 190 2375 192 8
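
We can also check that each .vw file contains one line per session; a quick sketch whose counts should match the shapes printed by arrays_to_vw above.

!wc -l $PATH_TO_DATA/train_part.vw $PATH_TO_DATA/valid.vw $PATH_TO_DATA/train.vw $PATH_TO_DATA/test.vw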

Let's train the Vowpal Wabbit model on the train_part.vw sample. We indicate that we are solving a classification problem with 400 classes (--oaa 400) and make 3 passes over the sample (--passes 3). We set a cache file (--cache_file; you can simply pass the -c flag) so that VW performs all passes after the first one faster, and -k deletes an existing cache file before the run. We also set -b 26, the number of bits used for feature hashing; in this case we need more than the default 18. Finally, we set random_seed=17. We leave the other parameters unchanged for now.

train_part_vw = os.path.join(PATH_TO_DATA, 'train_part.vw')
valid_vw = os.path.join(PATH_TO_DATA, 'valid.vw')
train_vw = os.path.join(PATH_TO_DATA, 'train.vw')
test_vw = os.path.join(PATH_TO_DATA, 'test.vw')
model = os.path.join(PATH_TO_DATA, 'vw_model.vw')
pred = os.path.join(PATH_TO_DATA, 'vw_pred.csv')
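
Note that these path variables can be substituted into shell commands via IPython's $-expansion; for example, the training command in the next cell could equivalently be written as follows (a sketch with the same flags as described above):

!vw --oaa 400 -d $train_part_vw --passes 3 -c -k -b 26 --random_seed 17 -f $model
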
%%time
!vw --oaa 400 /content/drive/MyDrive/DATA/Stepik/Kaggle/train_part.vw --passes 3 -c -k -b 26 --random_seed 17 -f /content/drive/MyDrive/DATA/Stepik/Kaggle/vw_model.vw
final_regressor = /content/drive/MyDrive/DATA/Stepik/Kaggle/vw_model.vw
Num weight bits = 26
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
tcmalloc: large alloc 1073741824 bytes == 0x5596c0ee8000 @  0x7f0c99005001 0x7f0c98ba1b5f 0x7f0c98bafa21 0x7f0c98c52e00 0x7f0c98c40be3 0x7f0c98c48395 0x7f0c98c48c44 0x5596be56c237 0x5596be56ba8b 0x7f0c981c0bf7 0x5596be56c05a
creating cache_file = /content/drive/MyDrive/DATA/Stepik/Kaggle/train_part.vw.cache
Reading datafile = /content/drive/MyDrive/DATA/Stepik/Kaggle/train_part.vw
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
1.000000 1.000000            1            1.0      262        1       11
1.000000 1.000000            2            2.0       82      262       11
1.000000 1.000000            4            4.0      241      262       11
1.000000 1.000000            8            8.0      352      262       11
1.000000 1.000000           16           16.0      135       16       11
1.000000 1.000000           32           32.0       71      112       11
0.968750 0.937500           64           64.0      358      231       11
0.976562 0.984375          128          128.0      348      346       11
0.941406 0.906250          256          256.0      202      202       11
0.947266 0.953125          512          512.0       30        1       11
0.925781 0.904297         1024         1024.0       36      290       11
0.908203 0.890625         2048         2048.0       21      128       11
0.880127 0.852051         4096         4096.0       80      229       11
0.856323 0.832520         8192         8192.0      307      356       11
0.828003 0.799683        16384        16384.0       59      193       11
0.795441 0.762878        32768        32768.0      262       30       11
0.760468 0.725494        65536        65536.0      171      238       11
0.724008 0.724008       131072       131072.0        6        6       11 h
0.697339 0.670672       262144       262144.0       12       12       11 h

finished run
number of examples per pass = 115160
passes used = 3
weighted example sum = 345480.000000
weighted label sum = 0.000000
average loss = 0.661352 h
total feature number = 3800280
CPU times: user 249 ms, sys: 48.4 ms, total: 297 ms
Wall time: 34 s

Let's write the predictions on the valid.vw sample to vw_valid_pred.csv.

%%time
!vw -i /content/drive/MyDrive/DATA/Stepik/Kaggle/vw_model.vw -t -d /content/drive/MyDrive/DATA/Stepik/Kaggle/valid.vw -p /content/drive/MyDrive/DATA/Stepik/Kaggle/vw_valid_pred.csv
only testing
predictions = /content/drive/MyDrive/DATA/Stepik/Kaggle/vw_valid_pred.csv
Num weight bits = 26
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = /content/drive/MyDrive/DATA/Stepik/Kaggle/valid.vw
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
1.000000 1.000000            1            1.0        4      188       11
1.000000 1.000000            2            2.0      160      220       11
0.750000 0.500000            4            4.0      143      143       11
0.750000 0.750000            8            8.0      247      247       11
0.687500 0.625000           16           16.0      341       30       11
0.593750 0.500000           32           32.0      237      237       11
0.609375 0.625000           64           64.0      178      178       11
0.640625 0.671875          128          128.0      132      228       11
0.656250 0.671875          256          256.0       14       14       11
0.646484 0.636719          512          512.0      370      370       11
0.663086 0.679688         1024         1024.0      189      189       11
0.655762 0.648438         2048         2048.0      311      311       11
0.657227 0.658691         4096         4096.0      195      318       11
0.660156 0.663086         8192         8192.0      171      195       11
0.657654 0.655151        16384        16384.0      362       51       11
0.655121 0.652588        32768        32768.0      248      248       11

finished run
number of examples per pass = 54838
passes used = 1
weighted example sum = 54838.000000
weighted label sum = 0.000000
average loss = 0.654583
total feature number = 603218
CPU times: user 87 ms, sys: 31.6 ms, total: 119 ms
Wall time: 11.3 s

Let's read the predictions from the vw_valid_pred.csv file and look at the proportion of correct answers on the deferred part.

vw_valid = pd.read_csv( os.path.join(PATH_TO_DATA, 'vw_valid_pred.csv'), header=None)
print('Proportion of correct answers on the deferred sample for Vowpal Wabbit: %f' % accuracy_score(y_valid_for_vw, 
                                                                                             vw_valid))
Proportion of correct answers on the deferred sample for Vowpal Wabbit: 0.345417

Now we will train SGDClassifier (3 passes over the sample, logistic loss) and LogisticRegression on 70% of the sparse training sample (X_train_part_sparse, y_train_part), make predictions for the deferred sample (X_valid_sparse, y_valid) and compute the proportion of correct answers. Logistic regression will not train quickly; that is normal. Everywhere we specify random_state=17 and n_jobs=-1, and for SGDClassifier we also specify max_iter=3.

logit = LogisticRegression(random_state=17, n_jobs=-1)
sgd_logit =  SGDClassifier(loss='log', random_state=17, max_iter=3)
%%time
logit.fit(X_train_part_sparse, y_train_part)
CPU times: user 1.86 s, sys: 289 ms, total: 2.14 s
Wall time: 5min 57s
LogisticRegression(n_jobs=-1, random_state=17)
%%time
sgd_logit.fit(X_train_part_sparse, y_train_part)
CPU times: user 24.6 s, sys: 6.5 ms, total: 24.6 s
Wall time: 24.5 s
/usr/local/lib/python3.7/dist-packages/sklearn/linear_model/_stochastic_gradient.py:700: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.
  ConvergenceWarning,
SGDClassifier(loss='log', max_iter=3, random_state=17)

Question 1. Calculate the proportion of correct answers on the deferred sample for Vowpal Wabbit, round to 3 decimal places.

Question 2. Calculate the proportion of correct answers on the deferred sample for SGD, round to 3 decimal places.

Question 3. Calculate the proportion of correct answers on the deferred sample for logistic regression, round to 3 decimal places.

vw_valid_acc = accuracy_score(y_valid_for_vw, vw_valid)
sgd_valid_acc = accuracy_score(y_valid, sgd_logit.predict(X_valid_sparse))
logit_valid_acc = accuracy_score(y_valid, logit.predict(X_valid_sparse))
def write_answer_to_file(answer, file_address):
    with open(file_address, 'w') as out_f:
        out_f.write(str(answer))
write_answer_to_file(round(vw_valid_acc, 3), os.path.join(PATH_TO_DATA, 'answer6_1.txt'))
write_answer_to_file(round(sgd_valid_acc, 3), os.path.join(PATH_TO_DATA, 'answer6_2.txt'))
write_answer_to_file(round(logit_valid_acc, 3), os.path.join(PATH_TO_DATA, 'answer6_3.txt'))

2.3. Validation on the test set (Public Leaderboard)

Let's train a VW model with the same parameters on the entire training sample - train.vw.

%%time
!vw --oaa 400 /content/drive/MyDrive/DATA/Stepik/Kaggle/train.vw --passes 3 -c -k -b 26 --random_seed 17 -f /content/drive/MyDrive/DATA/Stepik/Kaggle/vw_model.vw
final_regressor = /content/drive/MyDrive/DATA/Stepik/Kaggle/vw_model.vw
Num weight bits = 26
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
tcmalloc: large alloc 1073741824 bytes == 0x56465f178000 @  0x7f5207ec5001 0x7f5207a61b5f 0x7f5207a6fa21 0x7f5207b12e00 0x7f5207b00be3 0x7f5207b08395 0x7f5207b08c44 0x56465e302237 0x56465e301a8b 0x7f5207080bf7 0x56465e30205a
creating cache_file = /content/drive/MyDrive/DATA/Stepik/Kaggle/train.vw.cache
Reading datafile = /content/drive/MyDrive/DATA/Stepik/Kaggle/train.vw
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
1.000000 1.000000            1            1.0      262        1       11
1.000000 1.000000            2            2.0       82      262       11
1.000000 1.000000            4            4.0      241      262       11
1.000000 1.000000            8            8.0      352      262       11
1.000000 1.000000           16           16.0      135       16       11
1.000000 1.000000           32           32.0       71      112       11
0.968750 0.937500           64           64.0      358      231       11
0.976562 0.984375          128          128.0      348      346       11
0.941406 0.906250          256          256.0      202      202       11
0.947266 0.953125          512          512.0       30        1       11
0.925781 0.904297         1024         1024.0       36      290       11
0.908203 0.890625         2048         2048.0       21      128       11
0.880127 0.852051         4096         4096.0       80      229       11
0.856323 0.832520         8192         8192.0      307      356       11
0.828003 0.799683        16384        16384.0       59      193       11
0.795441 0.762878        32768        32768.0      262       30       11
0.760468 0.725494        65536        65536.0      171      238       11
0.725319 0.690170       131072       131072.0      180      159       11
0.692989 0.692989       262144       262144.0       88      221       11 h

finished run
number of examples per pass = 164514
passes used = 3
weighted example sum = 493542.000000
weighted label sum = 0.000000
average loss = 0.642595 h
total feature number = 5428962
CPU times: user 368 ms, sys: 49.4 ms, total: 417 ms
Wall time: 44.5 s

Let's make a forecast for the test sample.

%%time
!vw -t -d /content/drive/MyDrive/DATA/Stepik/Kaggle/test.vw -i /content/drive/MyDrive/DATA/Stepik/Kaggle/vw_model.vw -p /content/drive/MyDrive/DATA/Stepik/Kaggle/vw_test_pred.csv
only testing
predictions = /content/drive/MyDrive/DATA/Stepik/Kaggle/vw_test_pred.csv
Num weight bits = 26
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = /content/drive/MyDrive/DATA/Stepik/Kaggle/test.vw
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
1.000000 1.000000            1            1.0        1       90       11
1.000000 1.000000            2            2.0        1       21       11
1.000000 1.000000            4            4.0        1      265       11
1.000000 1.000000            8            8.0        1      137       11
1.000000 1.000000           16           16.0        1      273       11
1.000000 1.000000           32           32.0        1      384       11
1.000000 1.000000           64           64.0        1      139       11
1.000000 1.000000          128          128.0        1       85       11
1.000000 1.000000          256          256.0        1       25       11
0.994141 0.988281          512          512.0        1      364       11
0.990234 0.986328         1024         1024.0        1      202       11
0.992188 0.994141         2048         2048.0        1      181       11
0.993652 0.995117         4096         4096.0        1       21       11
0.994629 0.995605         8192         8192.0        1      137       11
0.995300 0.995972        16384        16384.0        1      326       11
0.994568 0.993835        32768        32768.0        1       10       11

finished run
number of examples per pass = 46473
passes used = 1
weighted example sum = 46473.000000
weighted label sum = 0.000000
average loss = 0.994642
total feature number = 511203
CPU times: user 109 ms, sys: 24.4 ms, total: 134 ms
Wall time: 10.8 s

Let's write the predictions to a file, apply the inverse label transformation (we applied LabelEncoder and then added 1, so now we subtract 1 and call inverse_transform) and submit the solution to Kaggle.

def write_to_submission_file(predicted_labels, out_file,
                             target='user_id', index_label="session_id"):
    # turn predictions into data frame and save as csv file
    predicted_df = pd.DataFrame(predicted_labels,
                                index = np.arange(1, predicted_labels.shape[0] + 1),
                                columns=[target])
    predicted_df.to_csv(out_file, index_label=index_label)
vw_pred = pd.read_csv(os.path.join(PATH_TO_DATA, 'vw_test_pred.csv'), header=None)
vw_subm = class_encoder.inverse_transform(np.ravel(vw_pred) - 1)
write_to_submission_file(vw_subm, os.path.join(PATH_TO_DATA, 'vw_pred_kaggle.csv'))
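
As a quick check of the submission file we just wrote (a sketch; it assumes the file was produced by write_to_submission_file above and that y still holds the training user_id values):

subm = pd.read_csv(os.path.join(PATH_TO_DATA, 'vw_pred_kaggle.csv'), index_col='session_id')
print(subm.shape)                     # expect (46473, 1)
print(subm['user_id'].isin(y).all())  # every predicted user should be one of the 400 known users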

Let's do the same for SGD and logistic regression.

sgd_logit = SGDClassifier(loss='log', random_state=17, max_iter=3, n_jobs=-1)
sgd_logit.fit(X_train_part_sparse, y_train_part)
sgd_logit_test_pred = sgd_logit.predict(X_test_sparse)
/usr/local/lib/python3.7/dist-packages/sklearn/linear_model/_stochastic_gradient.py:700: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.
  ConvergenceWarning,
logit = LogisticRegression(random_state=17, n_jobs=-1, solver = 'lbfgs')
logit.fit(X_train_sparse, y)
logit_test_pred = logit.predict(X_test_sparse)
write_to_submission_file(sgd_logit_test_pred, 
                         os.path.join(PATH_TO_DATA, 'sgd_pred.csv'))
write_to_submission_file(logit_test_pred, 
                         os.path.join(PATH_TO_DATA, 'logit_pred.csv'))

Let's look at the proportion of correct answers on the public part of the test sample (public leaderboard) of this competition.

Question 4. What is the proportion of correct answers on the public part of the test sample (public leaderboard) for Vowpal Wabbit?

Question 5. What is the proportion of correct answers on the public part of the test sample (public leaderboard) for SGD?

Question 6. What is the proportion of correct answers on the public part of the test sample (public leaderboard) for logistic regression?

vw_lb_score, sgd_lb_score, logit_lb_score = 0.18164, 0.16994, 0.19060

write_answer_to_file(round(vw_lb_score, 3), os.path.join(PATH_TO_DATA,'answer6_4.txt'))
write_answer_to_file(round(sgd_lb_score, 3), os.path.join(PATH_TO_DATA,'answer6_5.txt'))
write_answer_to_file(round(logit_lb_score, 3), os.path.join(PATH_TO_DATA,'answer6_6.txt'))

Logistic regression showed the best result of the three models, but it takes the longest to train. SGD showed the worst result, though it trains quickly. Vowpal Wabbit showed higher quality than SGD.