Issue
I'm trying to execute the following code in an Azure ML Studio notebook:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.cross_validation import KFold, cross_val_score
for C in np.linspace(0.01, 0.2, 30):
    cv = KFold(n=X_train.shape[0], n_folds=7, shuffle=True, random_state=12345)
    clf = LogisticRegression(C=C, random_state=12345)
    print C, sum(cross_val_score(clf, X_train_scaled, y_train, scoring='roc_auc', cv=cv, n_jobs=2)) / 7.0
and I'm getting this error:
Failed to save <type 'numpy.ndarray'> to .npy file:
Traceback (most recent call last):
  File "/home/nbcommon/env/lib/python2.7/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 271, in save
    obj, filename = self._write_array(obj, filename)
  File "/home/nbcommon/env/lib/python2.7/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 231, in _write_array
    self.np.save(filename, array)
  File "/home/nbcommon/env/lib/python2.7/site-packages/numpy/lib/npyio.py", line 491, in save
    pickle_kwargs=pickle_kwargs)
  File "/home/nbcommon/env/lib/python2.7/site-packages/numpy/lib/format.py", line 585, in write_array
    array.tofile(fp)
IOError: 19834920 requested and 8384502 written
---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-29-9740e9942629> in <module>()
      6     cv = KFold(n=X_train.shape[0], n_folds=7, shuffle=True, random_state=12345)
      7     clf = LogisticRegression(C=C, random_state=12345)
----> 8     print C, sum(cross_val_score(clf, X_train_scaled, y_train, scoring='roc_auc', cv=cv, n_jobs=2)) / 7.0

/home/nbcommon/env/lib/python2.7/site-packages/sklearn/cross_validation.pyc in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
   1431                               train, test, verbose, None,
   1432                               fit_params)
-> 1433                       for train, test in cv)
   1434     return np.array(scores)[:, 0]
   1435

/home/nbcommon/env/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
    808             # consumption.
    809             self._iterating = False
--> 810             self.retrieve()
    811             # Make sure that we get a last message telling us we are done
    812             elapsed_time = time.time() - self._start_time

/home/nbcommon/env/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in retrieve(self)
    725             job = self._jobs.pop(0)
    726             try:
--> 727                 self._output.extend(job.get())
    728             except tuple(self.exceptions) as exception:
    729                 # Stop dispatching any new job in the async callback thread

/home/nbcommon/env/lib/python2.7/multiprocessing/pool.pyc in get(self, timeout)
    565             return self._value
    566         else:
--> 567             raise self._value
    568
    569     def _set(self, i, obj):

IOError: [Errno 28] No space left on device
With n_jobs=1 it works fine.
I think this is because the joblib library tries to save my data to /dev/shm. The problem is that it has only 64M capacity:
Filesystem         Size  Used Avail Use% Mounted on
none               786G  111G  636G  15% /
tmpfs               56G     0   56G   0% /dev
shm                 64M     0   64M   0% /dev/shm
tmpfs               56G     0   56G   0% /sys/fs/cgroup
/dev/mapper/crypt  786G  111G  636G  15% /etc/hosts
I can't change this folder by setting the JOBLIB_TEMP_FOLDER environment variable (export doesn't work).
In [35]: X_train_scaled.nbytes
Out[35]: 158679360
Thanks for any advice!
Solution
/dev/shm is a virtual filesystem for passing data between programs; it is the implementation of traditional shared memory on Linux.
So you cannot increase its size by setting options at the application level.
You can, however, remount /dev/shm with a larger size, for example 8G, from a Linux shell with administrator permissions (such as root), as follows.
mount -o remount,size=8G /dev/shm
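To check how much room /dev/shm actually has, before or after a remount, the free space can be compared with the array size from Python. The following is a minimal sketch; the helper names free_bytes and fits are hypothetical, and the array size is the X_train_scaled.nbytes value reported in the question.

```python
import os


def free_bytes(path):
    """Bytes available to unprivileged users on the filesystem holding `path`."""
    stats = os.statvfs(path)
    return stats.f_bavail * stats.f_frsize


def fits(path, nbytes):
    """True if `nbytes` would fit in the free space of the filesystem at `path`."""
    return nbytes <= free_bytes(path)


array_nbytes = 158679360  # X_train_scaled.nbytes from the question
shm_path = "/dev/shm"

# Only report when the mount point exists (it does on most Linux systems).
if os.path.isdir(shm_path):
    print("free on %s: %d bytes" % (shm_path, free_bytes(shm_path)))
    print("array fits: %s" % fits(shm_path, array_nbytes))
```

With the 64M limit shown in the question, the array of roughly 159 MB cannot fit, which is consistent with the "No space left on device" error.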
However, it seems that Azure ML Studio does not support remote access via SSH, so the feasible plan is to upgrade to the standard tier if you are currently on the free tier.
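A possible alternative, not verified on Azure ML Studio, is to point joblib at a directory on the larger root filesystem. An export issued from a notebook shell cell runs in a separate subprocess, which may be why it appeared to have no effect; setting the variable inside the kernel process itself is different. The sketch below assumes that joblib reads JOBLIB_TEMP_FOLDER when it creates its worker pool and that the default temporary directory is writable and not on /dev/shm; the variable name joblib_tmp is illustrative only.

```python
import os
import tempfile

# Assumption: the default temp location lives on the large root filesystem,
# not on the 64M /dev/shm mount.
joblib_tmp = tempfile.mkdtemp(prefix="joblib_")

# Set the variable in the kernel process itself, before any parallel call.
os.environ["JOBLIB_TEMP_FOLDER"] = joblib_tmp

print(os.environ["JOBLIB_TEMP_FOLDER"])
```

This would need to run in the notebook before the cross_val_score loop with n_jobs=2, so that worker processes created afterwards see the setting.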
Answered By - Peter Pan