Segmentation Model(HRNet) 소개

Date: October 27, 2021

HRNet(High Resolution Network)

최근 많은 연구들이 이뤄져, HRNet 기반으로 많은 모델들이 나오고 있음
Semantic Segmentation 1,2등 모델이 HRNet 기반으로 만들어짐

History

Image Classification Task의 경우 단순하게 image의 해상도를 줄여가며 특징을 뽑아내는 작업으로 이뤄졌으나, Segmentation Task의 경우 위치정보가 중요하기 때문에 해상도를 다시 올리는 방식으로 진행되어 왔다
- 해상도를 줄이는 이유
  1. 이미지 내의 모든 특징이 필요하지 않음
  2. 효율적인 연산 및 넓은 receptive field를 갖음
  3. 중요한 특징만 추출하여 과적합 방지

Segmentation 특징
- 이미지의 공간적인 정보가 중요하고 픽셀 주변의 context 파악이 중요해짐
- 모든 픽셀의 자세한 정보가 필요함(pooling을 할 경우 픽셀 정보를 잃기 때문에 불리해짐)
segmentation model들
- FCN, DeconvNet, SegNet, U-Net..
- Receptive field를 높이기 위해 해상도를 극단적으로 낮춘 후 다시 upsampling을 통해 해상도를 복구하여 사용하게됐는데, 해상도를 낮추는 과정에서 정보손실이 일어나 복구해도 해상도가 낮게됨

기존 Segmentation model의 단점 보완(DeepLabV3+, DilatedNet)
- 해상도는 높게 유지, 넓은 Receptive Field를 갖을 수 있도록
- Dilated Convolution 사용
- 하지만 아직 해상도를 줄여서 예측함

이전 segmentation Model들을 보완하기 위해서 중/저 해상도 → 고해상도 복원이 아닌 고해상도를 유지하기 위한 방법으로 HRNet이 고안됨

구성

고해상도의 image를 끝까지 가져가며 중간에 중/저 해상도의 image 정보를 받아서 처리하는 형태
고해상도(HR)을 유지하며 연산이진행됨(실제 이미지의 1/4크기)
1/4로 줄이는데 왜 HR로 불릴까? 지금까지 Deeplabv3+(1/16), U-net(1/20)의 경우 input image의 해상도를 낮춰서 연산해왔었지만, HRNet의 경우 1/4 정도로만 줄여서 사용했기 때문에 상대적으로 고해상도의 visual recognition이 가능함

고해상도 ↔ 저해상도 summation을 통해 위치 민감도와 픽셀에 대한 Semantic 정보를 적절하게 섞어줌

Task에 따른 Representation Head 종류

HRNetV1 : 최종 출력으로 저해상도 정보는 빼고 고해상도 특징만을 사용하여 PoseEstimation Task에 활용

HRNetV2 : 저해상도 특징들을 Bilinear Upsampling을 통해 고해상도 크기로 변환하여 출력으로 사용하여 Semantic Segmentation task에 활용

HRNetV2P : HRNetV2의 결과에서 Downsampling 진행 후 출력하여 Object Detection에 활용

구현

Stage1

block1 : ("input(ch.64)→conv(1x1,ch.64)→BN→relu→conv(1x1,ch.256)→BN→relu→conv(1x1)→BN" + "input(ch.64)→conv(1x1,ch.256)→BN")→relu 
block2 : ("input(ch.256)→conv(1x1,ch.256)→BN→relu→conv(1x1,ch.256)→BN→relu→conv(1x1)→BN" + "input(ch.256)")→relu
block3 : ("input(ch.256)→conv(1x1,ch.256)→BN→relu→conv(1x1,ch.256)→BN→relu→conv(1x1)→BN" + "input(ch.256)")→relu    
block4 : ("input(ch.256)→conv(1x1,ch.256)→BN→relu→conv(1x1,ch.256)→BN→relu→conv(1x1)→BN" + "input(ch.256)")→relu
high→high block : "input(ch.256)→conv(3x3,ch.48)→BN→relu"
high→low block : "input(ch.256)→conv(3x3,ch.96,stride=2)→BN→relu" 이미지 크기 1/2

Stage2

block1_lv1 x4: ("input(ch.48)→conv(3x3,ch.48)→BN→relu→conv(3x3,ch.48)→BN" + "input(ch.48)")→relu
block1_lv2 x4: ("input(ch.96)→conv(3x3,ch.96)→BN→relu→conv(3x3,ch.96)→BN" + "input(ch.96)")→relu 

high→high block : "input(ch)→conv(3x3,ch)→BN→relu"
high→low block : "input(ch*2^n)→conv(3x3,ch*2^n)→BN→relu"이미지 크기 1/2배
low→high block : "input(ch/2^n)→conv(3x3,ch/2^n)→BN→relu" 이미지 크기 2배

Stage3 x4

block1_lv1 x4: ("input(ch.48)→conv(3x3,ch.48)→BN→relu→conv(3x3,ch.48)→BN" + "input(ch.48)")→relu
block1_lv2 x4: ("input(ch.96)→conv(3x3,ch.96)→BN→relu→conv(3x3,ch.96)→BN" + "input(ch.96)")→relu 
block1_lv3 x4: ("input(ch.192)→conv(3x3,ch.192)→BN→relu→conv(3x3,ch.192)→BN" + "input(ch.192)")→relu 
high→high block : "input(ch)→conv(3x3,ch)→BN→relu"
high→low block : "input(ch*2^n)→conv(3x3,ch*2^n)→BN→relu"이미지 크기 1/2^n배
low→high block : "input(ch/2^n)→conv(3x3,ch/2^n)→BN→relu" 이미지 크기 2^n배

Stage4 x3

block1_lv1 x4: ("input(ch.48)→conv(3x3,ch.48)→BN→relu→conv(3x3,ch.48)→BN" + "input(ch.48)")→relu 
block1_lv2 x4: ("input(ch.96)→conv(3x3,ch.96)→BN→relu→conv(3x3,ch.96)→BN" + "input(ch.96)")→relu 
block1_lv3 x4: ("input(ch.192)→conv(3x3,ch.192)→BN→relu→conv(3x3,ch.192)→BN" + "input(ch.192)")→relu 
block1_lv4 x4: ("input(ch.384)→conv(3x3,ch.384)→BN→relu→conv(3x3,ch.384)→BN" + "input(ch.384)")→relu 
high→high block : "input(ch)→conv(3x3,ch)→BN→relu"
high→low block : "input(ch*2^n)→conv(3x3,ch*2^n)→BN→relu"이미지 크기 1/2^n배
low→high block : "input(ch/2^n)→conv(3x3,ch/2^n)→BN→relu" 이미지 크기 2^n배

📌reference

boostcourse AI tech

💡 수정 필요한 내용은 댓글이나 메일로 알려주시면 감사하겠습니다!💡 

Hong Yohan

Segmentation Model(HRNet) 소개

Date: October 27, 2021

HRNet(High Resolution Network)

History

구성

Task에 따른 Representation Head 종류

구현

댓글

관련 글