1. Hypothesis function 복습

Hypothesis function은 인공신경망의 구조를 나타낸다.
H(x) = Wx + b (w = weight, b = bias)
주어진 인풋 x에 대해 어떤 아웃풋 y를 예측할지 알려준다.

W = torch.zeros(1, requires_grad = True) # 초기화
b = torch.zeros(1, requires_grad = True)
hypothesis = x_train * W + b

2. 사용할 모의 data 확인

인풋 값과 동일한 아웃풋을 출력하는 데이터를 이용해 본다.
W = 1일 때 가장 정확한 모델이다.

x_train = torch.FloatTensor([[1], [2], [3]])
y_train = torch.FloatTensor([[1], [2], [3]])

3. Cost function 이해

(1) Costfunction 이란?

Cost function : 모델의 예측 값이 실제 데이터와 얼마나 다른지 나타내는 값이다.
잘 학습된 모델일 수록 낮은 Cost(에러)를 갖는다.
모의 data 예제에서는 W = 1 일 때 cost가 0이고, 1에서 멀어질 수록 cost가 높아 진다.

(2) MSE

Linear regression에서는 MSE(Mean squared error)를 cost function으로 이용한다.
실제값과 예측값을 제곱한 평균을 구한다.

cost = torch.mean((hypothsis - y_train) ** 2)

4. Gradient descent 이론

(1) Gradient descent 란?

Cost function은 최소화 되어야 한다. 높은 cost를 갖고 있다면, 깎아 내려가며 cost(에러)를 낮춰야 한다!
기울기가 음수 일 때는 w를 키우고, 기울기가 양수 일때는 w를 낮추자.
기울기가 가파를 수록 cost가 크다. w를 크게 바꾸자. 기울기가 평평할 수록 cost가 낮다. w를 작게 바꾸자.
Gradient : 기울기

(2) 수식

W : = W -a▽W1 (a : learning rate, W1 : gradient)

5. Gradient descent 구현

gradient = 2 * torch.mean((W * x_train - y_train) * x_train) # 전체의 평균 gradient를 구한다.
lr = 0.1
W -= lr * gradient # W 업데이트!

6. Gradient descent 구현 full code

# 데이터
x_train = torch.FloatTensor([[1], [2], [3]])
y_train = torch.FloatTensor([[1], [2], [3]])
# 모델 초기화
W = torch.zeros(1)
# learning rate 설정
lr = 0.1

nb_epochs = 10 # 학습 횟수
for epoch in range(nb_epochs + 1):
    
    # H(x) 계산
    hypothesis = x_train * W
    
    # cost gradient 계산
    cost = torch.mean((hypothesis - y_train) ** 2)
    gradient = torch.sum((W * x_train - y_train) * x_train)

    print('Epoch {:4d}/{} W: {:.3f}, Cost: {:.6f}'.format(
        epoch, nb_epochs, W.item(), cost.item()
    )) # W는 점차 1에 수렴하고, cost는 줄어든다.

    # cost gradient로 H(x) 개선
    W -= lr * gradient

7. Gradient descent 구현 (nn.optim)

torch.optim 으로도 gradient descent 할 수 있다.

# 데이터
x_train = torch.FloatTensor([[1], [2], [3]])
y_train = torch.FloatTensor([[1], [2], [3]])
# 모델 초기화
W = torch.zeros(1, requires_grad=True)
# optimizer 설정
optimizer = optim.SGD([W], lr=0.15) #학습 가능 변수 W와 lr 지정

nb_epochs = 10
for epoch in range(nb_epochs + 1):
    
    # H(x) 계산
    hypothesis = x_train * W
    
    # cost 계산
    cost = torch.mean((hypothesis - y_train) ** 2)

    print('Epoch {:4d}/{} W: {:.3f} Cost: {:.6f}'.format(
        epoch, nb_epochs, W.item(), cost.item()
    ))

    # cost로 H(x) 개선
    optimizer.zero_grad() # optimizer에 저장되어 있는 모든 학습 가능한 변수의 gradient를 0으로 초기화
    cost.backward() # cost function의 미분에서 각 gradient 들의 변수들을 채운다.
    optimizer.step() # 저장된 gradient descent 값으로 gradient descent를 실행한다.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

3. Deeper Look at GD.md

3. Deeper Look at GD.md

1. Hypothesis function 복습

2. 사용할 모의 data 확인

3. Cost function 이해

(1) Costfunction 이란?

(2) MSE

4. Gradient descent 이론

(1) Gradient descent 란?

(2) 수식

5. Gradient descent 구현

6. Gradient descent 구현 full code

7. Gradient descent 구현 (nn.optim)

Files

3. Deeper Look at GD.md

Latest commit

History

3. Deeper Look at GD.md

File metadata and controls

1. Hypothesis function 복습

2. 사용할 모의 data 확인

3. Cost function 이해

(1) Costfunction 이란?

(2) MSE

4. Gradient descent 이론

(1) Gradient descent 란?

(2) 수식

5. Gradient descent 구현

6. Gradient descent 구현 full code

7. Gradient descent 구현 (nn.optim)