COCO bounding box format, scale factor

침닦는수건 2024. 4. 25. 10:20

According to the documentation, the COCO bounding box format is (x, y, w, h): the top-left corner (x, y) of the rectangle, followed by its (width, height).
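As a small illustration of that convention, the sketch below turns a top-left (x, y, w, h) box into its two opposite corners. The function name `xywh_to_corners` and the sample numbers are made up for this example and do not come from any library.

```python
def xywh_to_corners(x, y, w, h):
    """Convert a top-left (x, y, w, h) box to (x_min, y_min, x_max, y_max)."""
    return x, y, x + w, y + h


# Hypothetical box: top-left corner at (10, 20), 30 wide and 40 high.
print(xywh_to_corners(10.0, 20.0, 30.0, 40.0))  # (10.0, 20.0, 40.0, 60.0)
```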

In the actual label files, however, the values turn out to be normalized to the 0~1 range, as shown below.

45 0.479492 0.688771 0.955609 0.5955
45 0.736516 0.247188 0.498875 0.476417
50 0.637063 0.732938 0.494125 0.510583
45 0.339438 0.418896 0.678875 0.7815
49 0.646836 0.132552 0.118047 0.0969375
49 0.773148 0.129802 0.0907344 0.0972292
49 0.668297 0.226906 0.131281 0.146896
49 0.642859 0.0792187 0.148063 0.148062

I spent a while figuring out how these values were normalized, so I am writing it down here.

Denormalization

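The first value is, of course, the class id.

The remaining values are indeed in (x, y, w, h) order, but they are normalized using two things: the image resolution and the box width/height. In other words, the stored (x, y) is the box center divided by the image size, not the top-left corner. (This normalized, center-based layout is the one commonly known as the YOLO-style .txt label format, which is how the coco128 labels read by the code further down are stored.)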

w_denorm = w * img_w
h_denorm = h * img_h

x_denorm = x * img_w - w_denorm/2
y_denorm = y * img_h - h_denorm/2

(width, height) only need to be multiplied by the image width and height.

The top-left (x, y) is obtained by multiplying by the image width and height and then subtracting half of the denormalized width and height.
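The formulas can be tried on a single label line without loading any image. The following is a minimal sketch: `denormalize_label` is a helper name invented here, and the 640x480 resolution is only an assumed example size, not the real size of the image that the sample line belongs to.

```python
def denormalize_label(line, img_w, img_h):
    """Parse one label line and return (class_id, x, y, w, h) in pixels.

    The returned (x, y) is the top-left corner of the box.
    """
    fields = line.split()
    class_id = int(fields[0])
    x_n, y_n, w_n, h_n = (float(v) for v in fields[1:5])

    # Width and height only need the image resolution.
    w = w_n * img_w
    h = h_n * img_h
    # x and y additionally need half of the denormalized size subtracted.
    x = x_n * img_w - w / 2
    y = y_n * img_h - h / 2
    return class_id, x, y, w, h


# First sample line from the listing above, with an assumed 640x480 image.
class_id, x, y, w, h = denormalize_label(
    "45 0.479492 0.688771 0.955609 0.5955", 640, 480
)
print(class_id, round(x, 2), round(y, 2), round(w, 2), round(h, 2))
```

With these assumed dimensions the box starts close to the left edge (x is about 1.08) and spans about 611.59 pixels in width, which matches a normalized width of roughly 0.96.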

The following code is a quick way to check this visually.

import os
import cv2
import numpy as np

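def draw_bbox(img, xywh=None, class_id=None):
    """Draw a box given as top-left (x, y, w, h) in pixels, optionally labeled with its class id."""
    canvas = img.copy()
    x, y, w, h = xywh
    # Four corners of the rectangle in integer pixel coordinates.
    lt = (int(x), int(y))
    rt = (int(x+w), int(y))
    lb = (int(x), int(y+h))
    rb = (int(x+w), int(y+h))

    # Draw the four edges of the box.
    cv2.line(canvas, lt, rt, (200, 0, 0), 2)
    cv2.line(canvas, lt, lb, (200, 0, 0), 2)
    cv2.line(canvas, rt, rb, (200, 0, 0), 2)
    cv2.line(canvas, lb, rb, (200, 0, 0), 2)

    if class_id is not None:
        font = cv2.FONT_HERSHEY_PLAIN
        # int() so that an id passed as a float is printed as "45" rather than "45.0".
        canvas = cv2.putText(canvas, str(int(class_id)), lt, font, 1, (0, 255, 0), 2, cv2.LINE_AA)
    return canvas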

if __name__ == "__main__":
    root = ".../datasets/coco128"
    img_dir = os.path.join(root, "images", "train2017")
    gt_dir = os.path.join(root, "labels", "train2017")

    img_names = sorted(os.listdir(img_dir))

    img_paths = [os.path.join(img_dir, img_name) for img_name in img_names]
    # splitext keeps file names that contain extra dots intact.
    gt_paths = [os.path.join(gt_dir, os.path.splitext(img_name)[0]+".txt") for img_name in img_names]

    for img_path, gt_path in zip(img_paths, gt_paths):

        img = cv2.imread(img_path)
        # Skip files that cannot be read as images or that have no label file.
        if img is None or not os.path.exists(gt_path):
            continue
        img_h, img_w = img.shape[:2]
        gts = []
        with open(gt_path, 'r') as f:
            for line in f:
                line_split = line.split()
                if len(line_split) < 5:
                    continue  # skip blank or malformed lines
                class_id = int(line_split[0])
                # Width and height: scale by the image resolution.
                w = float(line_split[3]) * img_w
                h = float(line_split[4]) * img_h
                # x and y: scale, then subtract half the size to get the top-left corner.
                x = float(line_split[1]) * img_w - w/2
                y = float(line_split[2]) * img_h - h/2

                gts.append((class_id, x, y, w, h))

        for class_id, x, y, w, h in gts:
            img = draw_bbox(img, (x, y, w, h), class_id)

        cv2.imshow("vis", img)
        cv2.waitKey(0)
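When no display is available, a rough non-visual check is also possible: if the denormalization is right, every box should fall inside the image. The sketch below assumes this property holds for the labels being checked. The helper name `boxes_inside_image`, the 1-pixel tolerance, and the 640x480 size are all choices made for this example.

```python
def boxes_inside_image(label_text, img_w, img_h, tol=1.0):
    """Return True if every denormalized box lies inside the image (within tol pixels)."""
    for line in label_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        x_n, y_n, w_n, h_n = (float(v) for v in fields[1:5])
        w, h = w_n * img_w, h_n * img_h
        x, y = x_n * img_w - w / 2, y_n * img_h - h / 2
        if x < -tol or y < -tol or x + w > img_w + tol or y + h > img_h + tol:
            return False
    return True


# Two of the sample lines from the listing above, with an assumed 640x480 image.
labels = """45 0.479492 0.688771 0.955609 0.5955
49 0.646836 0.132552 0.118047 0.0969375"""
print(boxes_inside_image(labels, 640, 480))
```

If the stored (x, y) were treated as the top-left corner instead, the first box would extend far past the right edge of the image, which is one way to notice that the half-size offset is needed.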

License: Attribution, NonCommercial, NoDerivatives
