1. Maximum Likelihood Estimation (MLE)

최대 우도 추정

2. Overfitting

주어진 데이터에 대해 과도하게 학습된 것.
주어진 데이터를 잘 설명하는 것은 좋으나, 쓸 때 없이 과도하게 설명하는 것은 좋지 않음!
Overfitting 피하는 법 : 훈련 셋 / Development set(Validation Set) / 테스트 셋을 나눠서 이용한다. 훈련 셋이 과적합 되었는지 확인하기 위해 테스트 셋을 이용하고, 테스트 셋 또한 과적합 될 수 있으므로 이를 방지/확인 하기 위해 Dev Set(Validation set)을 이용한다.
Epoch에 따른 Train set의 Loss와 Validation Set의 Loss를 확인 해 보면, 이런 경우가 있다. Train Set의 Loss는 훈련에 따라 계속 낮아지는데, Validation set의 Loss는 일정 값에서 다시 Loss가 증가하게 되는 경우가 있다. 이 때 그 증가하는 지점부터 Overfitting이 되고 있는 것일 알 수 있다. 우리는 Validation Set의 Loss가 가장 작은 지점의 Epoch을 선택하면 된다!

3. Preventing Overfitting

Data를 많이 모으기 : 데이터가 적을 수록 편향될 가능성이 높다.
feature를 적게 사용하기 : 사람의 얼굴을 설명할 때 특정 데이터를 통해 얼굴에 주근깨가 있고, 이빨이 뾰족하고, 눈썹이 짙고 ... 와 같은 feature들을 다 취하면 그 데이터에 맞게 오버피팅 가능성 높아짐.
Regularization

4. Regularization

Early Stopping : Validation Loss가 더이상 낮아지지 않는 지점을 취한다.
Neural Network의 Size 줄이기
Weight Decay : NN의 Weight Parameter 제한
Dropout : 딥러닝 이용시 사용방법 1 (중요)
Batch Normalization : 딥러닝 이용시 사용방법 2 (중요)

5. DNN 과정 훑어보기

(1) NN 구조 설계 : 입력 데이터, 피쳐 수 등 설계
(2) 훈련 시작 : 오버 피팅이 되기 전까지 깊고 넓게 사이즈를 늘려간다. 오버피팅이 되면 Regularization 등을 이용한다.
(3) 2번 반복!

6. 이론 실습

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.maunal_seed(1)

# 트레이닝 셋 (x_train : m * 3, y_train : m)
x_train = torch.FloatTensor([[1, 2, 1],
                             [1, 3, 2],
                             [1, 3, 4],
                             [1, 5, 5],
                             [1, 7, 5],
                             [1, 2, 5],
                             [1, 6, 6],
                             [1, 7, 7],
                            ])
y_train = torch.LongTensor([2, 2, 2, 1, 1, 1, 0, 0])

# 테스트 셋 (x_test : m * 3, y_test : m)
x_test = torch.FloatTensor([[2, 1, 1], [3, 1, 2], [3, 3, 4]])
y_test = torch.LongTensor([2, 2, 2])

# Model
class SoftmaxClassifierModel(nn.Module):
  def __init__(self):
    super().__init__()
    self.linear = nn.Linear(3, 3) # 3 차원을 받아 3개로 classification 함
  def forward(self, x):
    return self.linear(x) # linear를 통과시키는 모델임
  
model = SoftmaxClassifierModel()

optimizer = optim.SGD(model.parameters(), lr=0.1)

def train(model, optimizer, x_train, y_train):
  nb_epochs = 20
  for epoch in range(nb_epochs):
    prediction = model(x_train)
    
    cost = F.cross_entropy(prediction, y_train)
    
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()
    
    print('Epoch {:4d}/{} Cost: {:.6f}'.format(epoch, nb_epochs, cost.item()))

def test(model, optimizer, x_text, y_test):
  prediction = model(x_test) # test에 대한 예측값
  predicted_calsses = prediction.max(1)[1]
  correct_count = (predicted_classed == y_test).sum().item()
  cost = F.cross_entropy(prediction, y_test) # test에 대한 정답값과 예측값을 cross_entropy 실행 
  
  print('Accuracy: {}% Cost: {:.6f}'.format(correct_count / len(y_test) * 100, cost.item()))
  
  
# 실행
train(model, optimizer, x_train, y_train) # Loss : 0.98
test(model, optimizer, x_test, y_test) # Loss : 1.42

# 결과 : train set에서는 Loss가 Epoch에 따라 계속 내려가지만, test set에서는 Loss가 그만큼 내려가지 못하고 조금 올라와 있는 것을 알 수 있다. 오버피팅 발생 한 것!

7. Learning Rate : lr이 너무 큰 경우

Gradient descent는 lr 값에 따라 업데이트 된다.
lr이 너무 큰 경우 : lr 값이 너무 크면 기울기를 깎아 내려가지 못하고 반대편으로 팅겨 점차 더 높은 기울기로 점차 발산해버릴 수 있다.

model = SoftmaxClassifierModel()
optimizer = optim.SGD(model.parameters(), lr=le5) # lr이 너무 큰 경우
train(model, optimizer, x_train, y_train) # cost = 1198379 (발산해버림)

8. Learning Rate : lr이 너무 작은 경우

lr이 너무 작은 경우 : 기울기를 원하는 방향으로 깎아내려가기는 하지만, 너무 작게 깎아 내려가서 loss 변화가 적음.

model = SoftmaxClassifierModel()
optimizer = optim.SGD(model.parameters(), lr=le-10) # lr이 너무 작은 경우
train(model, optimizer, x_train, y_train) # cost = 3.187324 (Epoch에 따른 cost 변화가 크게 없음)

9. Learning Rate

lr을 잘 설정해야 한다. 데이터와 모델에 따라서 잘 설정해야 하기 때문에 정해진 값은 없고, 여러번 테스트 해서 최적의 값을 찾아야 한다.
lr을 처음에 0.1로 시도해 보고, 추이에 따라 cost가 발산하면 작게, cost가 안줄어들면 크게 조절한다.

10. Data Preprocessing (데이터 전처리)

모델이 학습을 잘 하게 데이터를 전처리 하는 것이 중요하다.
ex) 데이터가 [[1000, 0.1], [999, 0.2] ...]와 같다면 데이터의 두 컬럼 모두 중요한데, NN은 1000과 999만 중요하게 생각할 것이다. 따라서 이 컬럼들을 학습 하는데 온 힘을 쏟게 될 것이다. 하지만 이것을 우리가 미리 전처리 해 놓는다면, 두 컬럼 다 똑같은 범위의 값으로 만들 수 있기 때문에 NN이 두 컬럼 다 동등하게 학습할 수 있다.
아래는 데이터 전처리 중 standardization(정규분포)의 방법이다.
우리의 데이터를 mu와 sigma를 통해 정규분포화 한다.

x_train = torch.FloatTensor([73, 80, 75],
                            [93, 88, 93],
                            [89, 91, 90],
                            [96, 98, 100],
                            [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])

mu = x_train.mean(dim=0) # mu
sigma = x_train.std(dim=0) # sigma
norm_x_train = (x_train - mu) / sigma # mu와 sigma를 통해 normalization 한다.
print(norm_x_train) # [-1.2, 0.5, 0.8] ...

class MultivariateLinearRegressionModel(nn.Module): # nn.Module 상속받아서 이용
  def __init__(self):
    super().__init__()
    self.linear = nn.Linear(3, 1) # 독립변수로 3차원을 입력받아 종속변수로 1개의 class를 분류 
    
  def forward(self, x):
    return self.linear(x)

model = MultivariateLinearRegressionModel() # 모델 선언
optimizer = optim.SGD(model.parameters(), lr = le-1)

def train(model, optimizer, x_train, y_train):
  nb_epochs = 20
  for epoch in range(nb_epochs):
  prediction = model(x_train)
  cost = F.mse_loss(prediction, y_train)
  
  optimizer.zero_grad()
  cost.backward()
  optimizer.step()
  
  print('Epoch {:4d}/{} Cost: {:.6f}'.format(epoch, nb_epochs, cost.item()))

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

7-1. Tips.md

7-1. Tips.md

1. Maximum Likelihood Estimation (MLE)

2. Overfitting

3. Preventing Overfitting

4. Regularization

5. DNN 과정 훑어보기

6. 이론 실습

7. Learning Rate : lr이 너무 큰 경우

8. Learning Rate : lr이 너무 작은 경우

9. Learning Rate

10. Data Preprocessing (데이터 전처리)

Files

7-1. Tips.md

Latest commit

History

7-1. Tips.md

File metadata and controls

1. Maximum Likelihood Estimation (MLE)

2. Overfitting

3. Preventing Overfitting

4. Regularization

5. DNN 과정 훑어보기

6. 이론 실습

7. Learning Rate : lr이 너무 큰 경우

8. Learning Rate : lr이 너무 작은 경우

9. Learning Rate

10. Data Preprocessing (데이터 전처리)