import pandas as pd
fish = pd.read_csv('https://bit.ly/fish_csv_data')
fish_input = fish[['Weight','Length','Diagonal','Height','Width']].to_numpy()
fish_target = fish['Species'].to_numpy()
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
train_input, test_input, train_target, test_target = train_test_split(
    fish_input, fish_target, random_state=42)
ss = StandardScaler()
ss.fit(train_input)
train_scaled = ss.transform(train_input)
test_scaled = ss.transform(test_input)
The train_test_split function splits the data into a training set and a test set. StandardScaler is fit on the training set only, then applied to both sets.
from sklearn.linear_model import SGDClassifier
sc = SGDClassifier(loss='log_loss', max_iter=10, random_state=42)  # logistic loss; this option was named 'log' before scikit-learn 1.1
sc.fit(train_scaled, train_target)
print(sc.score(train_scaled, train_target))
print(sc.score(test_scaled, test_target))
Import the SGDClassifier class provided by scikit-learn.
Two parameters are specified when creating the object:
loss: the type of loss function to use
max_iter: the number of epochs to run
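To make the roles of the two parameters concrete, here is a minimal NumPy sketch of stochastic gradient descent on a binary logistic loss, where max_iter counts full passes (epochs) over the training data. It is a simplified illustration, not scikit-learn's implementation: the function names (mean_log_loss, sgd_logistic_sketch), the fixed learning rate lr, and the synthetic two-feature data are all assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_log_loss(w, b, X, y):
    """Mean logistic (binary cross-entropy) loss for labels y in {0, 1}."""
    p = sigmoid(X @ w + b)
    eps = 1e-12  # keeps log() away from zero
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def sgd_logistic_sketch(X, y, max_iter=10, lr=0.1, seed=42):
    """Hypothetical helper: plain SGD on the logistic loss with a fixed learning rate."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    history = []
    for epoch in range(max_iter):            # one epoch = one pass over all samples
        for i in rng.permutation(len(X)):    # visit samples one at a time, in random order
            error = sigmoid(X[i] @ w + b) - y[i]  # gradient of the log loss w.r.t. the logit
            w -= lr * error * X[i]
            b -= lr * error
        history.append(mean_log_loss(w, b, X, y))
    return w, b, history

# Synthetic, linearly separable data (assumption, not the fish dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b, history = sgd_logistic_sketch(X, y, max_iter=10)
print(history)  # loss recorded after each epoch
```

Changing the loss would change only the per-sample gradient line, while a larger max_iter simply lets the outer loop run for more epochs.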
sc.partial_fit(train_scaled, train_target)
print(sc.score(train_scaled, train_target))
print(sc.score(test_scaled, test_target))
SGD supports incremental learning.
Each call to partial_fit continues training from the current model state for one more epoch.