TensorFlow and Keras Framework for Stochastic Gradient Descent

Aug 27, 2020

Stochastic Gradient Descent (SGD) is the main method for training neural network models. It uses the gradient, or slope, to determine the size and direction of each parameter update so that the loss value moves toward the lowest point of the loss surface (a minimum). Understanding the idea behind SGD is therefore important for tuning neural networks, and deep learning models in particular, to perform better.

In this post we look at how the loss value moves from one training round to the next when training a linear regression model with the TensorFlow and Keras frameworks, using two methods:

1) Gradient Descent
2) Stochastic Gradient Descent

Gradient Descent Method

Two Dimensional Parabola Graph

Take the parabola y=x²-2x+3, which has its minimum at x = 1, as shown in the figure above. The goal of the Gradient Descent method is to find the value of x (a weight or bias) that makes y (the loss value) as small as possible, by adjusting x so that it gradually moves down the slope (descends) of the surface.
Suppose we are blindfolded and placed at x = -4. To walk to the lowest point, we have to rely on the feel of both feet to judge whether to go left or right. To make that judgment, we take the derivative of the function y with respect to x (that is, we find the gradient).

So at x = -4 the slope of the parabola is -10.

Gradient = 2x-2
         = (2)(-4)-(2)
         = -10
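The hand-derived slope can be sanity-checked numerically. The sketch below is an illustration only: the helper names f, analytic_gradient and numeric_gradient are hypothetical and not part of the original notebook. It compares the formula 2x-2 with a central-difference estimate at x = -4.

```python
def f(x):
    """The example parabola y = x^2 - 2x + 3."""
    return x ** 2 - 2 * x + 3


def analytic_gradient(x):
    """Derivative of f worked out by hand: dy/dx = 2x - 2."""
    return 2 * x - 2


def numeric_gradient(func, x, h=1e-5):
    """Central-difference estimate of the slope of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)


x0 = -4.0
print(analytic_gradient(x0))            # slope from the formula
print(round(numeric_gradient(f, x0), 6))  # slope estimated from nearby points
```

Both values should agree at about -10, which is the slope used in the update that follows.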

Because the slope is negative, we know the ground we are standing on slopes down to the right, so we walk 10 steps to the point x = 6.

Update x = x-(-10)
         = (-4)-(-10)
         = 6
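Repeating this unscaled update a few more times shows where it leads. A minimal sketch, assuming the same parabola and a hypothetical helper named gradient:

```python
def gradient(x):
    """Slope of y = x^2 - 2x + 3."""
    return 2 * x - 2


x = -4.0
path = [x]
for _ in range(4):
    x = x - gradient(x)  # full-size step, gradient not scaled down
    path.append(x)

print(path)  # [-4.0, 6.0, -4.0, 6.0, -4.0]
```

With a full-size step the position just bounces between x = -4 and x = 6 and never settles at the minimum at x = 1.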

However, walking a full 10 steps carries us to the other side of the valley, where the slope is positive, instead of gradually descending to the lowest point. When actually training a model, x must therefore be updated in smaller steps, by scaling the gradient down with a Learning Rate between 0 and 1.

Learning_Rate = 0.01
Update x = x - Learning_Rate*Gradient
         = (-4)-(0.01)(-10)
         = -3.9
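The single update above can be repeated until x settles near the minimum. The following is a small sketch of that loop in plain Python; the function name gradient_descent and the step counts are choices made for this illustration, not part of the original notebook.

```python
def gradient(x):
    """Slope of y = x^2 - 2x + 3."""
    return 2 * x - 2


def gradient_descent(x_start, learning_rate, steps):
    """Apply x = x - learning_rate * gradient(x) a fixed number of times."""
    x = x_start
    for _ in range(steps):
        x = x - learning_rate * gradient(x)
    return x


first = gradient_descent(-4.0, 0.01, 1)     # one step, as in the worked example
far = gradient_descent(-4.0, 0.01, 1000)    # many small steps
print(first, far)
```

For this parabola each step multiplies the distance to the minimum by (1 - 2 × Learning_Rate), so with 0.01 the distance shrinks by 2% per step and x approaches 1 after enough steps.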

Linear Regression with TensorFlow


We start by implementing a linear regression neural network model with the TensorFlow framework in order to study how the loss value moves under the Gradient Descent method.

First, import the libraries we need.

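# The graph and session code below uses the TensorFlow 1.x API;
# on TensorFlow 2.x it is reached through the compat.v1 module.
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()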

import pandas as pd  
import numpy as np  
import matplotlib.pyplot as plt  
import seaborn as seabornInstance 
from sklearn.model_selection import train_test_split
from sklearn import metrics

%matplotlib inline

Set the random seed and the number of epochs to train.

np.random.seed(seed=13)

EPOCH = 500

First we need to prepare the dataset, which in this example is “Salary.csv”.
Load Salary.csv, which has 35 rows in total. We use the YearsExperience column as the input data, or independent variable (predictor), and the Salary column as the target, or dependent variable (response).

dataset = pd.read_csv('Salary.csv')
dataset.shape

Running this displays the shape of the dataset.

Plot YearsExperience against Salary to see what the data looks like.

dataset.plot(x='YearsExperience', y='Salary', style='o')  
plt.title('YearsExperience vs Salary')  
plt.xlabel('YearsExperience')  
plt.ylabel('Salary')
plt.savefig('min_max_temp.jpeg', dpi=300)
plt.show()

Look at the distribution of YearsExperience.

plt.figure(figsize=(15,10))
plt.tight_layout()
seabornInstance.distplot(dataset['YearsExperience'])
plt.savefig('dis_user_favourites.jpeg', dpi=300)

Split the dataset into input data (X) and target (y).

X = dataset['YearsExperience'].values.reshape(-1,1)
y = dataset['Salary'].values.reshape(-1,1)

X.shape

Randomly split the data into two sets: 80% for training and 20% for testing.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle= True)

X_train.shape, X_test.shape, y_train.shape, y_test.shape

Define the model with TensorFlow. The whole of X_train (80% of the 35 rows) is fed into the model in a single batch.

W = tf.Variable(tf.random.uniform([1], -1.0, 1.0))
b = tf.Variable(tf.random.uniform([1], -1.0, 1.0))

y = W * X_train + b

Define the loss function as Mean Squared Error (MSE).

loss = tf.reduce_mean(tf.square(y - y_train))
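To make the loss concrete, here is a rough plain-Python sketch of the same Mean Squared Error for the model y = W*x + b, together with its gradients with respect to W and b, which are the quantities the optimizer needs. The function names and the toy data are invented for this illustration.

```python
def mse(w, b, xs, ys):
    """Mean of the squared differences between predictions and targets."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def mse_gradients(w, b, xs, ys):
    """Partial derivatives of the MSE with respect to w and b."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return grad_w, grad_b


# Made-up toy data lying exactly on the line y = 2x + 1
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

print(mse(0.0, 0.0, xs, ys))            # loss with both parameters at zero
print(mse_gradients(0.0, 0.0, xs, ys))  # both negative: increase w and b
print(mse(2.0, 1.0, xs, ys))            # 0.0 at the true parameters
```

Both gradients are negative at W = 0, b = 0, so a gradient descent step increases both parameters, just as the negative slope at x = -4 moved us to the right on the parabola.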

Set the optimizer and the learning rate.

optimizer = tf.train.GradientDescentOptimizer(0.0001)

train = optimizer.minimize(loss)

Create the op that initializes the TensorFlow variables.

init = tf.global_variables_initializer()

Create a session and run init so that the variables are actually initialized.

sess = tf.Session()
sess.run(init)

Train Model (sess.run(train))

his=[]
wb = []

for step in range(EPOCH):
    sess.run(train)
    his.append(sess.run(loss))
    print(step, sess.run(W), sess.run(b), sess.run(loss))
    wb.append([sess.run(W)[0], sess.run(b)[0], sess.run(loss)])
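The loop above performs one full-batch update per epoch and records the weight, bias and loss afterwards. As a rough sketch of what each sess.run(train) amounts to, here is the same idea in plain Python. The function name train_full_batch, the toy data, the learning rate and the zero starting values are all assumptions made for this illustration; the notebook itself starts from random values and uses the salary data.

```python
def train_full_batch(xs, ys, learning_rate, epochs):
    """Gradient descent on MSE for y = w*x + b, using all rows in every step."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        # Record the loss after the update, as the session loop does
        loss = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n
        history.append((w, b, loss))
    return history


# Made-up toy data lying exactly on the line y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

history = train_full_batch(xs, ys, learning_rate=0.05, epochs=2000)
w_final, b_final, loss_final = history[-1]
print(w_final, b_final, loss_final)
```

On this toy data the parameters approach w = 2 and b = 1, and the recorded loss falls toward zero.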

Take the weight (W) and bias (b) to build the linear regression model.

M = sess.run(W)
C = sess.run(b)

Define the predict function.

def predict(X, M, C):
    y = M*X+C
    return y[0]

Convert the list of loss values into a DataFrame.

df = pd.DataFrame(his, columns=['loss'])

Plot Loss

import plotly
import plotly.graph_objs as go

plotly.offline.init_notebook_mode(connected=True)

h1 = go.Scatter(y=df['loss'], 
                    mode="lines", line=dict(
                    width=2,
                    color='blue'),
                    name="loss")

data = [h1]

layout1 = go.Layout(title='Loss',
                   xaxis=dict(title='epochs'),
                   yaxis=dict(title=''))
fig1 = go.Figure(data, layout=layout1)
plotly.offline.iplot(fig1)

Predict Salary

y_pred = [predict(i, M, C) for i in X_test]

y_test.shape

y_test = y_test.reshape(-1)
y_test.shape

Show the first 10 rows of the predictions.

df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
df.head(10)

Plot a chart comparing the predictions with the actual values.

df1 = df.head(25)
df1.plot(kind='bar',figsize=(16,10))
plt.grid(which='major', linestyle='-', linewidth='0.5', color='green')
plt.grid(which='minor', linestyle=':', linewidth='0.5', color='black')
plt.show()

Show the model obtained after training epoch 1.

M = [i[0] for i in wb]
L = [i[2] for i in wb]
C = [i[1] for i in wb]
y_pred = [predict(i, M[0], C[0]) for i in X_test]

plt.scatter(X_test, y_test,  color='gray')
plt.plot(X_test, y_pred, color='red', linewidth=2)
plt.show()

Show the model obtained after training epoch 5.

y_pred = [predict(i, M[4], C[4]) for i in X_test]

plt.scatter(X_test, y_test,  color='gray')
plt.plot(X_test, y_pred, color='red', linewidth=2)
plt.savefig('5_model.jpeg', dpi=300)
plt.show()

Show the model obtained after training epoch 10.

y_pred = [predict(i, M[9], C[9]) for i in X_test]

plt.scatter(X_test, y_test,  color='gray')
plt.plot(X_test, y_pred, color='red', linewidth=2)
plt.savefig('10_model.jpeg', dpi=300)
plt.show()

Show the model obtained after training for 500 epochs.

y_pred = [predict(i, M[499], C[499]) for i in X_test]

plt.scatter(X_test, y_test,  color='gray')
plt.plot(X_test, y_pred, color='red', linewidth=2)
plt.show()

Look at the loss value against the weight.

plt.scatter(M, L,  color='gray')
plt.savefig('weight.jpeg', dpi=300)
plt.show()

Look at the loss value against the bias.

plt.scatter(C, L, color='gray')
plt.savefig('bias.jpeg', dpi=300)
plt.show()

Look at the loss value against both the weight and the bias.

import plotly.express as px

df = pd.DataFrame({'W' : M, 'Bias' : C, 'Loss' : L})
fig = px.scatter_3d(df, x='W', y='Bias', z='Loss')
fig.show()

Measure the model's performance with Mean Absolute Error, Mean Squared Error and Root Mean Squared Error.

print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))  
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))  
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
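For readers who want to see what these three scikit-learn metrics compute, the sketch below spells them out in plain Python. The function names mirror the metric names but are written here for illustration only, and the sample values are made up.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average size of the errors, ignoring their sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of the squared errors; large errors weigh more."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, in the same unit as the target."""
    return math.sqrt(mean_squared_error(actual, predicted))


# Made-up values: errors of 1, 0 and -2
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

print(mean_absolute_error(actual, predicted))
print(mean_squared_error(actual, predicted))
print(root_mean_squared_error(actual, predicted))
```

Because RMSE is in the same unit as the target, it is usually the easiest of the three to read next to the Salary values.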

Stochastic Gradient Descent Method

Linear Regression with Keras

Next we train a linear regression neural network model with the Keras framework. Training a model with Keras takes a fair amount of time, so feeding the dataset into the model in one large chunk is not a good fit.
Instead, we randomly split the dataset into small chunks of 64 rows (a Batch Size of 64) and train the model on those. Gradient Descent that works on randomly drawn small chunks of the dataset like this is what we call Stochastic Gradient Descent (more precisely, mini-batch SGD).
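The idea of updating on small random chunks can be sketched in plain Python as follows. This is an illustration under stated assumptions, not how Keras implements it internally: the function name minibatch_sgd, the toy data, the learning rate and the batch size of 2 are all made up for the example.

```python
import random


def minibatch_sgd(xs, ys, learning_rate, epochs, batch_size, seed=13):
    """Fit y = w*x + b by updating on small random batches of rows."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # a new random order every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [w * xs[i] + b - ys[i] for i in batch]
            # Gradients of the MSE computed on this batch only
            grad_w = 2 * sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = 2 * sum(errors) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Made-up toy data lying exactly on the line y = 2x + 1
xs = [float(i) for i in range(1, 9)]
ys = [2.0 * x + 1.0 for x in xs]

w, b = minibatch_sgd(xs, ys, learning_rate=0.01, epochs=3000, batch_size=2)
print(w, b)
```

Each update sees only a few rows, so the individual steps are noisier than full-batch Gradient Descent, but each one is cheaper and there are several updates per epoch.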

Import Library

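# Import everything from the public tensorflow.keras package so that the
# model, layers and backend all come from the same Keras implementation
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

from tensorflow.keras import backend as K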

Define Root Mean Squared Error.

def rmse(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))

Define the model.

model = Sequential()
model.add(Dense(1, input_dim=1, kernel_initializer='random_uniform', activation='linear'))
model.summary()

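Compile Model. Note that optimizer='adam' selects Adam, an adaptive-learning-rate variant of stochastic gradient descent; pass optimizer='sgd' to use plain SGD instead.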

model.compile(loss='mse', optimizer='adam', metrics=['mae', 'mse', rmse])

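Train the model, using 80% of X_train for training and holding out 20% for validation, with a Batch Size of 64. Keras takes the validation rows from the end of X_train, which train_test_split has already shuffled.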

history = model.fit(X_train, y_train, epochs=EPOCH, batch_size=64,  verbose=1, validation_split=0.2, shuffle=True)
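It can help to work out how many parameter updates one epoch contains for a given dataset size. The sketch below is back-of-the-envelope arithmetic, not Keras code: the helper name fit_plan is hypothetical, the 28 rows assume that X_train holds 80% of the 35 rows mentioned earlier, and the split rule assumes the first int(n_rows * (1 - validation_split)) rows are kept for training.

```python
import math


def fit_plan(n_rows, validation_split, batch_size):
    """Return (training rows, validation rows, updates per epoch)."""
    n_train = int(n_rows * (1 - validation_split))  # rows kept for training
    n_val = n_rows - n_train                        # rows held out for validation
    steps = math.ceil(n_train / batch_size)         # batches, so updates, per epoch
    return n_train, n_val, steps


print(fit_plan(28, 0.2, 64))    # (22, 6, 1)
print(fit_plan(1000, 0.2, 64))  # (800, 200, 13)
```

Under these assumptions a batch size of 64 is larger than the training set, so each epoch is a single update on all training rows; the mini-batch behaviour only appears on datasets with more rows than the batch size.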

Plot the training loss and the validation loss.

h2 = go.Scatter(y=history.history['loss'], 
                    mode="lines", line=dict(
                    width=2,
                    color='blue'),
                    name="loss")

h3 = go.Scatter(y=history.history['val_loss'], 
                    mode="lines", line=dict(
                    width=2,
                    color='green'),
                    name="val_loss")
                    
data = [h2, h3]

layout1 = go.Layout(title='Loss',
                   xaxis=dict(title='epochs'),
                   yaxis=dict(title=''))
fig1 = go.Figure(data, layout=layout1)
plotly.offline.iplot(fig1)

Predict

y_pred = model.predict(X_test)

Convert to a DataFrame.

y_pred = y_pred.flatten()

df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
df.head(10)

Plot a chart comparing the predictions with the actual values.

df1 = df.head(25)
df1.plot(kind='bar',figsize=(16,10))
plt.grid(which='major', linestyle='-', linewidth='0.5', color='green')
plt.grid(which='minor', linestyle=':', linewidth='0.5', color='black')
plt.show()

Show the model obtained after training for 500 epochs.

plt.scatter(X_test, y_test,  color='gray')
plt.plot(X_test, y_pred, color='red', linewidth=2)
plt.savefig('keras_500_model.jpeg', dpi=300)
plt.show()

Measure the model's performance with Mean Absolute Error, Mean Squared Error and Root Mean Squared Error.

print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))  
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))  
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))