This is a short post with a few tips for scaling your data, plus (maybe) some problems you'll run into along the way.
There are many mature libraries that can handle this for you. I'll introduce the one I'm most familiar with: scikit-learn in Python.
Here are the first 5 rows of our sample dataset, where open, high, low, volume, and amount are the features and close is the target we want the model to predict once it's trained.
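If you're following along with pandas, you can peek at those rows yourself. A minimal sketch, assuming the data lives in a CSV file (the file name stock.csv is a placeholder, not the original dataset):

import pandas as pd

# Load the sample dataset and show its first 5 rows.
# "stock.csv" is a placeholder file name, not the original data.
df = pd.read_csv("stock.csv")
print(df[["open", "high", "low", "volume", "amount", "close"]].head())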
But wait: before we throw the data into model training, did we forget something? We need to standardize the features by removing the mean and scaling to unit variance.
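Concretely, standardization replaces every value x in a feature column with

z = (x - mean(x)) / std(x)

so each column ends up with mean 0 and unit variance. scikit-learn's StandardScaler computes those statistics for you; the helper below wraps it, flattening each sample of our 3-D input (presumably windows of consecutive rows) into a single row before fitting: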
import sklearn.preprocessing as prep

def standard_scaler(X_train, X_test):
    # Flatten each 3-D sample into a single row so StandardScaler can fit it.
    train_samples, train_nx, train_ny = X_train.shape
    test_samples, test_nx, test_ny = X_test.shape
    X_train = X_train.reshape((train_samples, train_nx * train_ny))
    X_test = X_test.reshape((test_samples, test_nx * test_ny))

    # Fit on the training set only, then apply the same transform to both sets.
    preprocessor = prep.StandardScaler().fit(X_train)
    X_train = preprocessor.transform(X_train)
    X_test = preprocessor.transform(X_test)

    # Restore the original 3-D shapes.
    X_train = X_train.reshape((train_samples, train_nx, train_ny))
    X_test = X_test.reshape((test_samples, test_nx, test_ny))
    return X_train, X_test
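A quick usage sketch (the shapes and random values here are assumptions for illustration, not the real dataset):

import numpy as np

# Hypothetical windowed data: 100 training and 20 test samples,
# each a 10-step window over the 6 columns above.
X_train = np.random.rand(100, 10, 6)
X_test = np.random.rand(20, 10, 6)
X_train, X_test = standard_scaler(X_train, X_test)

Note that the scaler is fit on the training set only and then applied unchanged to the test set; fitting on the combined data would leak test-set statistics into training.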
TODO...