Fine-Tuning Faster R-CNN on a Sea Rescue Dataset: Small Object Detection in PyTorch

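Detecting small objects in aerial imagery is a distinctive challenge, especially for critical applications such as maritime rescue, where spotting a person in the water in time can mean the difference between life and death. Our work focuses on fine-tuning Faster R-CNN, a robust two-stage object detector, to meet this need.

At the core of this work is the SeaDronesSee dataset, a collection of images for training models to recognize people in distress at sea. We preprocess the images into patches so the model can focus on smaller, more detailed regions, which improves detection accuracy. We also explore how this approach combines with SAHI's slicing technique and compare the two.

Our approach underlines the importance of data preprocessing and post-processing. By tailoring these steps to the specific challenges of small object detection, we aim for strong results in aerial image analysis.

Join us as we explore this application of fine-tuned Faster R-CNN, with the goal of saving lives!

How will Faster R-CNN handle this difficult scene? Keep scrolling to see the detection results.

To access the code in this article and try fine-tuning Faster R-CNN with PyTorch yourself, use the code download link given further down.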

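Why fine-tune Faster R-CNN in 2024?
Understanding the dataset
Patch creation as a preprocessing technique
Code walkthrough: fine-tuning Faster R-CNN
Dataset class preparation: fine-tuning Faster R-CNN
PyTorch training configuration for fine-tuning Faster R-CNN
Prediction: fine-tuning Faster R-CNN
Combining SAHI with the fine-tuned Faster R-CNN
Comparison of Faster R-CNN detection with SAHI, without SAHI, and with patches as input
Key takeaways
Conclusion
References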

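Why fine-tune Faster R-CNN in 2024?

Despite the arrival of newer detectors that are highly accurate or very low-latency, Faster R-CNN remains one of the reliable choices for detecting small objects, which suits our application well.

Faster R-CNN uses a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, which makes generating candidate bounding boxes both more efficient and more accurate. This sharing is particularly helpful for small objects, since it lets the network spend more of its capacity on the subtle features and distinctions inside small regions of interest. As a result, Faster R-CNN copes well with scenes where the objects of interest are small and need precise localization, which is essential when monitoring and detecting objects across a wide stretch of sea.

For more on how the Region Proposal Network (RPN) in Faster R-CNN works, a quick read on the topic is worthwhile.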

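Understanding the dataset

Unmanned aerial vehicles (UAVs) are quick to deploy, relatively cheap, and far less risky than traditional methods. Equipped with a variety of sensors, they give a comprehensive overview of the scene and can autonomously cover large areas in search of objects or people.

The project aims to develop a UAV that assists in humanitarian search-and-rescue scenarios. It is a collaboration between Collins Aerospace and the University of Tübingen. Onboard vision sensors and telemetry data help search for objects of interest and report detected anomalies to an operator at a ground station.

The illustration below shows the components involved in this solution.

Picture a drone soaring over the ocean, searching for survivors. That is the goal behind SeaDronesSee, a large-scale dataset designed for training computer vision systems for search and rescue (SAR) missions.

The dataset serves as a training ground for embedded computer vision. It contains real footage of maritime environments, where the challenge is to spot people in the water.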

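SeaDronesSee is divided into three tracks:

Object detection: teaches the system to recognize objects, such as people, in the vast ocean.
Single-object tracking: once a person has been spotted, the system learns to follow them even as they move around.
Multi-object tracking: a real SAR mission may involve several survivors. This track trains the system to track all of them simultaneously.

By learning from this data, drones can assist search-and-rescue missions more capably and become smarter lifeguards.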

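This article focuses on the Object Detection v2 subset of the SeaDronesSee dataset, which contains:

8,930 training images
1,547 validation images
3,750 test images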

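A key challenge with this kind of dataset is labeling and recognizing objects accurately, particularly because many of the classes are very small and hard to detect.

Note that image dimensions are not consistent across the dataset.

The image dimensions (width, height) found in the dataset are: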

(5436,3632)
(3840,2160)
(1230,932)
(1231,933)
(3632,5456)
(1920,1080)

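Classes:
0: 'ignored', 1: 'swimmer', 2: 'boat', 3: 'jetski', 4: 'life_saving_appliances', 5: 'buoy'

The 'ignored' regions contain objects that are hard to annotate because of low resolution or dense crowds, or that are not wanted in the dataset.

We also observed that the dataset is imbalanced: the number of instances differs markedly between small-object classes such as swimmers, buoys, and life-saving appliances.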

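Patch creation as a preprocessing technique

The images in our dataset are high resolution, reaching 4K and above. Their sheer size is a challenge because it drives up compute and memory requirements. By splitting each image into patches and saving them, we can process these smaller pieces independently, which reduces the computational load and lets the model focus on finer detail. This is particularly useful for our use case of detecting small objects, such as swimmers or boats far away in the vast ocean.

In our approach we use a patch overlap ratio of 0.2. The overlap helps ensure that key information is not lost at patch boundaries. Objects in the overlapping regions appear in more than one patch, so the model sees them at different positions and learns their salient features. Patch creation also acts as a form of data augmentation, effectively increasing the number of training samples.

Now guess what? The small object detection problem has turned into a regular object detection problem. Sounds intuitive, right?

Creating image patches mirrors how convolutional neural networks (CNNs) operate: both process local regions of an image to extract and learn feature representations efficiently.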

To follow along with this tutorial, download the code from the link below. It is free.

http://www.gitpp.com/opencv/learnopencv-cn

Code walkthrough: fine-tuning Faster R-CNN

Let's start by downloading the dataset from Kaggle with the following commands:

# !pip install -qq torch torchvision kaggle
# !sudo apt-get install unzip -y
!sudo apt-get install tree

!kaggle datasets download -d ubiratanfilho/sds-dataset

The downloaded dataset is structured as follows:

compressed
├── annotations
│   ├── instances_train.json
│   └── instances_val.json
├── images
│   ├── train
│   └── val
└── test

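instances_train.json and instances_val.json contain the image IDs together with metadata from the drone (the camera source), such as latitude, longitude, and speed. Alongside this metadata there are annotations such as bounding boxes and category IDs, which are the fields we are interested in.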

"annotations": [
    {"id": 14579, "image_id": 3388, "bbox": [3619, 1409, 75, 38], "area": 2850, "category_id": 2},
    {"id": 14581, "image_id": 3389, "bbox": [3524, 1408, 73, 37], "area": 2701, "category_id": 2},
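Since the dataset is described as imbalanced, it can help to count boxes per class straight from this annotation structure. The following is a minimal sketch, not the authors' code: the helper name count_instances_per_class is hypothetical, and the two records are simply the sample entries shown above (a real run would load instances_train.json with json.load instead).

```python
from collections import Counter

# Two records copied from the annotation sample above.
sample = {
    "annotations": [
        {"id": 14579, "image_id": 3388, "bbox": [3619, 1409, 75, 38],
         "area": 2850, "category_id": 2},
        {"id": 14581, "image_id": 3389, "bbox": [3524, 1408, 73, 37],
         "area": 2701, "category_id": 2},
    ]
}

# Class names as listed for this dataset.
CLASS_NAMES = {0: "ignored", 1: "swimmer", 2: "boat", 3: "jetski",
               4: "life_saving_appliances", 5: "buoy"}


def count_instances_per_class(coco_dict):
    """Return {class_name: number_of_boxes} for a COCO-style dict (hypothetical helper)."""
    counts = Counter(ann["category_id"] for ann in coco_dict["annotations"])
    return {CLASS_NAMES.get(cid, str(cid)): n for cid, n in counts.items()}


print(count_instances_per_class(sample))
```

Run on the full training file, the same function would give a quick view of how unevenly the classes are represented.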

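Installing dependencies

We will use the torchvision library to set up the fine-tuning pipeline, and torchmetrics together with pycocotools to compute the evaluation metrics.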

# !pip install -qq torchvision
# !pip install -qq torch
!pip install -qq torchmetrics[detection]
!pip install -qq pycocotools
!pip install -qq tensorboard

To reuse torchvision's object detection training code and utilities, we simply clone the official torchvision repository.

!git clone https://github.com/pytorch/vision.git  # Training metric utilities from torchvision

Importing libraries

Next, import the necessary libraries.

import os
import gc
import json
import math
import random
import requests
import zipfile
import numpy as np
from PIL import Image, ImageDraw, ImageFont, ImageOps, ImageStat
import PIL
import torch
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torch.utils.tensorboard import SummaryWriter

import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.patches import Patch
import logging
from tqdm import tqdm

from torchmetrics.detection.mean_ap import MeanAveragePrecision
from dataclasses import dataclass
import torchvision
from vision.references.detection import utils

import torchvision.transforms as T
from torchvision.transforms import v2 as Tv2
from torchvision import tv_tensors
from torchvision.transforms import functional as F
from torchvision.transforms.functional import to_pil_image

import torchvision.models.detection as detection
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.transform import GeneralizedRCNNTransform

Let's set the random seeds for reproducibility.

def set_seeds():
    # Fix random seeds
    SEED_VALUE = 42
    random.seed(SEED_VALUE)
    np.random.seed(SEED_VALUE)
    torch.manual_seed(SEED_VALUE)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(SEED_VALUE)
        torch.cuda.manual_seed_all(SEED_VALUE)
        torch.backends.cudnn.deterministic = True
        # benchmark=True lets cuDNN auto-tune kernels, which can make runs
        # non-deterministic, so it is disabled here for reproducibility.
        torch.backends.cudnn.benchmark = False


set_seeds()

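Downloading the patched dataset

To save time and get straight to training, download the patches we created by running the script below. Use it if you want to skip the patch-creation preprocessing step.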

if not os.path.exists('SeaDroneSee'):
    os.mkdir('SeaDroneSee')

!wget -O SeaDroneSee/SeaDroneSee.zip "https://www.dropbox.com/scl/fi/0oyv9pki57laqgmq7matd/SeaDroneSee.zip?rlkey=yasyxr0u3450dylv5musks1s0&st=q12t3tc3&dl=1"
!wget -O SeaDroneSee/SeaDroneSee_test.zip "https://www.dropbox.com/scl/fi/4qidpahgu9mogam33uxlz/SeaDroneSee_test.zip?rlkey=1gt6mebuppxg4ehzhicwqafav&st=5g01mcdb&dl=1"

def download_file(url, save_name):
    if not os.path.exists(save_name):
        # Handling potential redirection in requests
        with requests.get(url, allow_redirects=True) as r:
            if r.status_code == 200:
                with open(save_name, 'wb') as f:
                    f.write(r.content)
            else:
                print("Failed to download the file, status code:", r.status_code)


def unzip(zip_file=None, target_dir=None):
    try:
        with zipfile.ZipFile(zip_file, 'r') as z:
            z.extractall(target_dir)
            print("Extracted all to:", target_dir)
    except zipfile.BadZipFile:
        print("Invalid file or error during extraction: Bad Zip File")
    except Exception as e:
        print("An error occurred:", e)


save_path = 'SeaDroneSee/SeaDroneSee.zip'
model_ckpt_url = 'https://www.dropbox.com/scl/fi/xmftrum0a8rgjp82j6n65/model_ckpt.zip?rlkey=aywwl28rbcbiejggdps0durfu&st=dda61bld&dl=1'
model_save_path = 'SeaDroneSee/Model_ckpt.zip'

download_file(model_ckpt_url, model_save_path)
unzip(zip_file=model_save_path, target_dir='SeaDroneSee')  # Specify target directory for the model checkpoint
unzip(zip_file=save_path)
test_save_path = 'SeaDroneSee/SeaDroneSee_test.zip'
unzip(zip_file=test_save_path, target_dir='SeaDroneSee')

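To create patches with a patch size, overlap ratio, and number of saved patches of your own choosing, read on through the patch-creation code below.

Utilities for fine-tuning Faster R-CNN

Let's set up the class mapping and assign a unique color to each label (category ID).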

classes_to_idx = {
    0: 'ignored',
    1: 'swimmer',
    2: 'boat',
    3: 'jetski',
    4: 'life_saving_appliances',
    5: "buoy"
}

# Mapping category IDs to colors
category_colors = {
    0: 'black',   # ignored
    1: 'red',     # swimmer
    2: 'orange',  # boat
    3: 'blue',    # jetski
    4: 'purple',  # life saving appliances
    5: 'yellow'   # buoy
}

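Understanding the dataset is a key step in any deep learning task, so we will spend a good deal of time on preprocessing techniques for this dataset.

To inspect and visualize the ground-truth annotations, let's define the draw_bounding_boxes utility.
A key aspect of object detection is the bounding box format; handled incorrectly, it will break our Faster R-CNN fine-tuning pipeline. Since the dataset annotations are in XYWH format, we convert them to XYXY, which is the format PIL's drawing functions expect.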

def draw_bounding_boxes(image, bboxes):
    draw = ImageDraw.Draw(image)
    font_size = int(min(image.size) * 0.02)  # Adjust font size based on image size
    font_path = "/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf"
    font = ImageFont.truetype(font_path, font_size) if os.path.exists(font_path) else ImageFont.load_default()

    for bbox, category_id in bboxes:
        x, y, w, h = bbox
        x1, y1, x2, y2 = x, y, x + w, y + h
        color = category_colors.get(category_id, 'white')  # Default to white if category_id is unknown
        draw.rectangle([x1, y1, x2, y2], outline=color, width=4)
        draw.text((x1, y1 - font_size), str(category_id), fill=color, font=font)
    return image
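The XYWH to XYXY conversion used inside the drawing utility is easy to get wrong, so here it is in isolation. This is a small sketch with hypothetical helper names (xywh_to_xyxy, xyxy_to_xywh), using the first sample annotation from the dataset as input.

```python
def xywh_to_xyxy(bbox):
    """Convert [x, y, width, height] to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def xyxy_to_xywh(bbox):
    """Convert [x1, y1, x2, y2] back to [x, y, width, height]."""
    x1, y1, x2, y2 = bbox
    return [x1, y1, x2 - x1, y2 - y1]


box_xywh = [3619, 1409, 75, 38]      # sample annotation in COCO-style XYWH
box_xyxy = xywh_to_xyxy(box_xywh)
print(box_xyxy)                      # [3619, 1409, 3694, 1447]
print(xyxy_to_xywh(box_xyxy))        # round trip gives the original box
```

Keeping both directions as named helpers makes it explicit which format a given function expects.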

The load_annotations utility takes the path to instances_train.json or instances_val.json, loads the file, and returns its annotations.

def load_annotations(annotation_path):
    with open(annotation_path, 'r') as f:
        annotations = json.load(f)
    return annotations

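After loading them, we iterate over every bounding box and group the boxes by image ID. Then, using matplotlib, we plot training and validation images together with their ground-truth annotations for as many samples as we like.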

def visualize_samples(image_dir, annotation_path, num_samples=5):
    annotations = load_annotations(annotation_path)
    images_info = annotations['images']
    bboxes_info = annotations['annotations']

    images_with_bboxes = {}
    for bbox in bboxes_info:
        image_id = bbox['image_id']
        if image_id not in images_with_bboxes:
            images_with_bboxes[image_id] = []
        images_with_bboxes[image_id].append((bbox['bbox'], bbox['category_id']))

    # Shuffle list of images
    random.shuffle(images_info)

    # Visualize samples
    plt.figure(figsize=(15, num_samples * 5))
    sample_count = 0
    for image_info in images_info:
        if sample_count >= num_samples:
            break
        image_path = os.path.join(image_dir, image_info['file_name'])
        if not os.path.exists(image_path):
            continue  # Skip this image if the file does not exist
        image = Image.open(image_path)
        image_id = image_info['id']
        # print(f"Img ID: {image_id} Image Dimension: {image.size}")

        if image_id in images_with_bboxes:
            bboxes = images_with_bboxes[image_id]
            image = draw_bounding_boxes(image, bboxes)

        plt.subplot(num_samples, 1, sample_count + 1)
        plt.imshow(image)
        plt.axis('off')
        plt.title(f"Image ID: {image_id}")
        sample_count += 1

    plt.tight_layout()
    plt.show()

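Data cleaning

Let's go through all training and validation images, draw their ground-truth annotations, and save the results to disk for manual visual inspection.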

# Directories
image_dir = 'compressed/images/train'
output_dir = 'compressed/train_gt/bbox_ann_images'
annotation_path = 'compressed/annotations/instances_train.json'

# Create the output directory if it doesn't exist
if not os.path.exists(output_dir):
    os.makedirs(output_dir)


# Process all images according to annotations
def process_and_save_annotated_images(image_dir, output_dir, annotations):
    image_annotations = {}
    for annot in annotations['annotations']:
        image_annotations.setdefault(annot['image_id'], []).append(annot)

    for image_id, annots in image_annotations.items():
        image_path = os.path.join(image_dir, f"{image_id}.jpg")
        if os.path.exists(image_path):
            image = Image.open(image_path)
            # draw_bounding_boxes takes (image, bboxes) and expects
            # (bbox, category_id) pairs, so unpack the annotation dicts first
            bboxes = [(a['bbox'], a['category_id']) for a in annots]
            annotated_image = draw_bounding_boxes(image, bboxes)
            output_image_path = os.path.join(output_dir, f"annotated_{image_id}.jpg")
            annotated_image.save(output_image_path)
            print(f"Saved annotated image to {output_image_path}")


# Load annotations
annotations = load_annotations(annotation_path)

# Annotate and save images
process_and_save_annotated_images(image_dir, output_dir, annotations)

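Looking closely at all the saved images, we found about 73 samples of moving objects whose ground-truth bounding boxes are offset or wrong. We remove these images manually by filename, since they would add noise when fine-tuning the Faster R-CNN model.

A well-known principle in deep learning is "garbage in, garbage out" (GIGO).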

# List of file names to remove
train_file_remove_list = [
    3391, 3392, 3393, 3413, 3414, 3415, 3416, 3417,
    6952, 6957, 7002, 7000, 6999, 7023, 7046, 7093, 7092, 7091,
    7527, 7558, 7611, 7631, 7987, 7988,
    8091, 8097, 8098, 8099, 8113, 8114, 8422, 8438, 8441, 8443,
    10246, 10260, 10263, 10264, 10265, 10266, 10269, 10271,
    10330, 10348, 10369, 10368, 10379,
    11785, 11814, 11828, 11862, 11865, 11869, 11877, 11887, 11891,
    11908, 11910, 12001, 12003, 12195, 12312, 12327, 12332, 12417,
    13035, 15809, 15808, 15913, 15914, 16140, 16270, 16271
]

val_file_remove_list = [10465]

# Directory containing the images
image_dir = './compressed/images/train'

# Iterate over the file list and attempt to remove each file
for file_id in train_file_remove_list:
    file_path = os.path.join(image_dir, f"{file_id}.jpg")
    if os.path.exists(file_path):
        try:
            os.remove(file_path)
            print(f"Removed: {file_path}")
        except OSError as e:
            print(f"Error removing {file_path}: {e}")
    else:
        print(f"File does not exist: {file_path}")

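Data preprocessing: patch creation

As mentioned earlier, some images contain masked-out regions, and we ignore bounding box annotations that fall inside them. We do this by averaging the pixel colors inside each bounding box: if the region is predominantly black, the box is excluded from the annotation file.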

def is_bbox_ignored(image, bbox, threshold=10):
    """Check if the entire region inside a bounding box is predominantly black.

    Args:
        image (PIL.Image): The image to check.
        bbox (list): The bounding box [x, y, width, height].
        threshold (int): The threshold below which a region is considered black.

    Returns:
        bool: True if the region is predominantly black, False otherwise.
    """
    x, y, w, h = bbox
    cropped_image = image.crop((x, y, x + w, y + h))
    stat = ImageStat.Stat(cropped_image)
    avg_color = stat.mean  # Average color (R, G, B)
    # Check if all color channels are below the threshold
    return all(channel < threshold for channel in avg_color)

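To keep a consistent landscape orientation (width greater than height) across all images in the data loader, we convert portrait images into landscape ones.

If an image's height is greater than its width, we rotate it 90 degrees counter-clockwise with PIL. Passing expand=True enlarges the output canvas so the whole rotated image fits, rather than cropping it to the original dimensions.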

def check_and_rotate_image(image):
    """Rotate the image if its height is greater than its width and return the image and a flag indicating rotation."""
    width, height = image.size
    if height > width:
        image = image.rotate(90, expand=True)  # Rotates 90 counter-clockwise
        return image, True
    return image, False

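Next comes a crucial step: adjusting the bounding boxes to match the rotation. When the image is rotated counter-clockwise, its width and height swap, and each box's coordinates have to be remapped accordingly.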

def adjust_bbox_for_rotation(bbox, image_width, image_height):
    """Adjust bounding boxes for 90 degree counter clockwise rotation."""
    x, y, w, h = bbox
    new_x = y
    new_y = image_width - (x + w)
    new_w = h
    new_h = w
    return [new_x, new_y, new_w, new_h]


def rotate_image_and_adjust_bbox(image, annotations, original_dims):
    """Rotate image and adjust bounding boxes accordingly."""
    rotated_image = image.rotate(90, expand=True)
    new_annotations = []
    original_width, original_height = original_dims

    for ann in annotations:
        x, y, w, h = ann['bbox']
        new_x = y
        new_y = original_width - (x + w)
        new_w = h
        new_h = w

        new_ann = ann.copy()
        new_ann['bbox'] = [new_x, new_y, new_w, new_h]
        new_annotations.append(new_ann)
    return rotated_image, new_annotations
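The rotation formula is worth checking on a toy example. The sketch below is an illustration only: it rotates a small grid of 0/1 cells counter-clockwise with plain lists (standing in for the image) and confirms that the box predicted by the formula new_x = y, new_y = W - (x + w) matches the box actually found in the rotated grid. All helper names are hypothetical.

```python
def rotate_grid_ccw(grid):
    """Rotate a row-major 2D list by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def rotate_bbox_ccw(bbox, image_width):
    """Apply the same XYWH formula as the rotation helpers above."""
    x, y, w, h = bbox
    return [y, image_width - (x + w), h, w]


def bbox_of_marked_cells(grid):
    """Tight XYWH box around all non-zero cells."""
    xs = [c for row in grid for c, v in enumerate(row) if v]
    ys = [r for r, row in enumerate(grid) for v in row if v]
    return [min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1]


# A portrait "image" 4 wide and 6 tall, with a 2x3 box whose corner is at (1, 2).
W, H = 4, 6
box = [1, 2, 2, 3]
grid = [[1 if box[0] <= c < box[0] + box[2] and box[1] <= r < box[1] + box[3] else 0
         for c in range(W)]
        for r in range(H)]

rotated = rotate_grid_ccw(grid)
print(rotate_bbox_ccw(box, W))        # box predicted by the formula
print(bbox_of_marked_cells(rotated))  # box measured in the rotated grid
```

Both lines print the same box, which is the property the preprocessing relies on. Note that the formula needs the width of the original, unrotated image.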

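Next comes the main part of our preprocessing: the patch creation logic. We covered the intuition earlier; now let's implement it in code.

This function creates patches half the size of the image in each dimension, with an overlap ratio of 0.2. Sliding over the image in a 2x2 grid gives four patches. The patches are saved, and their coordinates (left, top, right, bottom) are returned, which is essential for adjusting the bounding boxes relative to each patch of the original image.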

def create_patches(image, output_dir, image_filename, overlap_ratio=0.2):
    """Create image patches and handle image rotation if necessary."""
    image, was_rotated = check_and_rotate_image(image)
    width, height = image.size
    patch_width = int(width / 2)
    patch_height = int(height / 2)
    overlap_width = int(patch_width * overlap_ratio)
    overlap_height = int(patch_height * overlap_ratio)

    patches = []
    for i in range(2):  # Two columns (i steps along the width)
        for j in range(2):  # Two rows (j steps along the height)
            left = i * (patch_width - overlap_width)
            top = j * (patch_height - overlap_height)
            right = left + patch_width
            bottom = top + patch_height

            patch = image.crop((left, top, right, bottom))
            patch_filename = f'{os.path.splitext(image_filename)[0]}_{i}_{j}.jpg'
            patch_path = os.path.join(output_dir, patch_filename)
            patch.save(patch_path)
            patches.append((patch_filename, left, top, right, bottom, was_rotated))
    return patches
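To see what the grid looks like in numbers, the sketch below repeats only the coordinate arithmetic of create_patches, without touching any image files. The function name patch_grid is hypothetical, and 3840x2160 is used because it is one of the image sizes listed for the dataset.

```python
def patch_grid(width, height, overlap_ratio=0.2):
    """Return (left, top, right, bottom) for each patch of the 2x2 grid."""
    patch_w, patch_h = int(width / 2), int(height / 2)
    step_w = patch_w - int(patch_w * overlap_ratio)
    step_h = patch_h - int(patch_h * overlap_ratio)
    boxes = []
    for i in range(2):          # columns
        for j in range(2):      # rows
            left, top = i * step_w, j * step_h
            boxes.append((left, top, left + patch_w, top + patch_h))
    return boxes


for box in patch_grid(3840, 2160):
    print(box)
```

For a 3840x2160 image this gives 1920x1080 patches with a stride of 1536x864. Reading the arithmetic as written, the last patch ends at (3456, 1944), so a strip of 384 px on the right and 216 px at the bottom falls outside all four patches. Objects in that strip would not appear in the training patches unless the grid is extended or the last patch is shifted to the image edge.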

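The bounding boxes are then adjusted to the patch coordinates. We clamp the values so that the new annotations never extend beyond the patch, and we make sure no box with a non-positive width or height is saved.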

def adjust_bbox_for_patch(bbox, patch_coords):
    """Adjust the bounding box to the coordinates of the patch with enhanced error handling."""
    left, top, right, bottom = patch_coords
    x, y, w, h = bbox
    x1, y1, x2, y2 = x, y, x + w, y + h

    logging.debug(f"Original bbox: {bbox}")
    logging.debug(f"Patch coordinates: {patch_coords}")

    # Ensure the bounding box intersects with the patch
    if x2 <= left or x1 >= right or y2 <= top or y1 >= bottom:
        # logging.warning("Bounding box does not intersect with the patch.")
        return None  # No intersection

    # Clamp the bounding box to the patch boundaries
    clamped_x1 = max(x1, left)
    clamped_y1 = max(y1, top)
    clamped_x2 = min(x2, right)
    clamped_y2 = min(y2, bottom)

    adjusted_width = clamped_x2 - clamped_x1
    adjusted_height = clamped_y2 - clamped_y1

    # Check for non-positive dimensions
    if adjusted_width <= 0 or adjusted_height <= 0:
        logging.warning("Adjusted bounding box has non-positive dimensions.")
        return None

    # Check if adjusted bounding box exceeds patch size
    if adjusted_width > (right - left) or adjusted_height > (bottom - top):
        logging.warning("Adjusted bounding box exceeds patch dimensions.")
        return None

    adjusted_bbox = [clamped_x1 - left, clamped_y1 - top, adjusted_width, adjusted_height]
    logging.debug(f"Adjusted bbox: {adjusted_bbox}")

    return adjusted_bbox

The following function combines all the annotation utilities and returns the instance annotations inside each patch.

def get_annotations_for_patches(image, annotations, patches, original_image_id):
    """Adjust annotations for each patch."""
    patch_annotations = []
    annotation_id = 0

    for patch_filename, left, top, right, bottom, was_rotated in patches:
        patch_coords = (left, top, right, bottom)
        patch_annots = []

        for ann in annotations:
            if ann['image_id'] != original_image_id:
                continue

            bbox = ann['bbox']
            if was_rotated:
                bbox = adjust_bbox_for_rotation(bbox, right - left, bottom - top)

            # Check if the bbox should be ignored
            if is_bbox_ignored(image, bbox):
                continue

            adjusted_bbox = adjust_bbox_for_patch(bbox, patch_coords)
            if adjusted_bbox:
                new_ann = {
                    "id": annotation_id,
                    "image_id": patch_filename,
                    "bbox": adjusted_bbox,
                    "area": (adjusted_bbox[2] * adjusted_bbox[3]),
                    "category_id": ann['category_id']
                }
                patch_annots.append(new_ann)
                annotation_id += 1

        if patch_annots:
            patch_annotations.extend(patch_annots)

    return patch_annotations

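Now it's time to put all these preprocessing steps together. We read the annotation file, iterate over its images, rotate portrait images, create the patches, and finally adjust the bounding boxes and save everything to the output directories for the training and validation sets.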

def process_images_and_annotations(base_dir):
    annotation_files = ['instances_train.json', 'instances_val.json']
    image_dirs = ['train', 'val']

    for annotation_file, image_dir in zip(annotation_files, image_dirs):
        # Start a fresh list per split so the validation file does not
        # also contain the training annotations
        all_new_annotations = {"annotations": []}

        annotation_path = os.path.join(base_dir, 'annotations', annotation_file)
        with open(annotation_path, 'r') as f:
            annotations = json.load(f)

        for image_info in annotations['images']:
            image_filename = image_info['file_name']
            image_path = os.path.join(base_dir, 'images', image_dir, image_filename)
            if not os.path.exists(image_path):
                continue

            original_dims = (image_info['width'], image_info['height'])
            image = Image.open(image_path)
            if image_info['height'] > image_info['width']:
                rotated_image, image_annotations = rotate_image_and_adjust_bbox(image.copy(), annotations['annotations'], original_dims)
            else:
                rotated_image = image.copy()
                image_annotations = annotations['annotations']

            output_dir = os.path.join(base_dir, 'output_patches', 'images', image_dir)
            os.makedirs(output_dir, exist_ok=True)
            patches = create_patches(rotated_image, output_dir, image_filename)

            new_annotations = get_annotations_for_patches(rotated_image, image_annotations, patches, image_info['id'])
            all_new_annotations["annotations"].extend(new_annotations)

        annotation_dir = os.path.join(base_dir, 'output_patches', 'annotations')
        os.makedirs(annotation_dir, exist_ok=True)
        annotations_output_path = os.path.join(base_dir, 'output_patches', 'annotations', f'instances_patches_{image_dir}.json')
        with open(annotations_output_path, 'w') as f:
            json.dump(all_new_annotations, f, indent=4)


base_dir = './compressed/'
process_images_and_annotations(base_dir)

The annotations for the patches of an image in instances_patches_train.json and instances_patches_val.json now look like this:

"annotations": [
    {"id": 0, "image_id": "3390_0_1.jpg", "bbox": [1863, 542, 57, 36], "area": 2052, "category_id": 2},
    {"id": 0, "image_id": "3399_0_2.jpg", "bbox": [1731, 288, 70, 35], "area": 2450, "category_id": 2},

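Dataset class preparation: fine-tuning Faster R-CNN

The following code defines a data class named DatasetConfig that stores the dataset's configuration parameters.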

@dataclass
class DatasetConfig:
    root: str
    annotations_file: str
    train_img_size: tuple
    subset: str = 'train'  # Default to 'train'
    transforms: any = None

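The CustomAerialDataset class is designed to handle the aerial image dataset: it loads images, processes annotations, and prepares the data for fine-tuning the Faster R-CNN model.

Here is a brief overview of its main features:

The class takes a DatasetConfig object containing the root directory, image size, subset (train/val/test), and any transforms.
It initializes the paths to the images and annotations and calls the methods that load them.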

class CustomAerialDataset(Dataset):
    def __init__(self, config: DatasetConfig):
        self.root = config.root
        self.transforms = config.transforms
        self.train_img_size = config.train_img_size
        self.subset = config.subset
        self.annotations_file = os.path.join(self.root, 'annotations', f'instances_patches_{self.subset}.json')
        self.imgs = []
        self.img_annotations = {}
        self._load_images()
        self._load_annotations()

    def __len__(self):
        return len(self.imgs)

The _load_images method scans the given subset directory and appends valid image file paths to the imgs list.
Each image initially gets an empty annotation.

class CustomAerialDataset(Dataset):
    ...

    def _load_images(self):
        # Load all images from the subset directory
        images_path = os.path.join(self.root, 'images', self.subset)
        for image_filename in os.listdir(images_path):
            image_path = os.path.join(images_path, image_filename)
            if os.path.isfile(image_path) and image_path.endswith(('.png', '.jpg', '.jpeg')):
                self.imgs.append(image_path)
                image_id = os.path.basename(image_path)
                # Initialize empty annotations for each image
                if image_id not in self.img_annotations:
                    self.img_annotations[image_id] = {'boxes': [], 'labels': []}

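The _load_annotations method then reads the JSON file containing the bounding box annotations.
It matches each annotation to its image and stores the bounding box coordinates and category ID.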

class CustomAerialDataset(Dataset):
    ...

    def _load_annotations(self):
        with open(self.annotations_file, 'r') as f:
            data = json.load(f)
        for annotation in data['annotations']:
            image_id = annotation["image_id"]
            bbox = annotation["bbox"]
            category_id = annotation["category_id"]
            image_path = os.path.join(self.root, 'images', self.subset, image_id)
            if image_id in self.img_annotations:
                self.img_annotations[image_id]['boxes'].append(bbox)
                self.img_annotations[image_id]['labels'].append(category_id)

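The __getitem__ method retrieves an image and its annotations by index.
If the image file is missing, it returns a zero tensor and a dummy target, and an image without any instances likewise gets a dummy box. This is done because the loss computation expects a target of shape (N, 4) when iterating over a batch of images and their targets. Passing an image with no instances as-is would raise an error along the lines of: expected a tensor of shape (N, 4), but got torch.Size([0]).
The image is then resized to the specified size and the bounding boxes are scaled accordingly, which cuts training time and GPU hours.
The target is a dictionary holding a box tensor of dtype float32 and the corresponding label tensor of dtype int64.
Whatever torchvision transforms are supplied are applied to both the image and the target, so that augmentations stay valid and help model performance.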

class CustomAerialDataset(Dataset):
    ...

    def __getitem__(self, idx):
        img_path = self.imgs[idx]
        if not os.path.exists(img_path):
            # Return a default image (like a zero tensor) and a dummy target
            default_img = torch.zeros(3, *self.train_img_size)  # Assuming 3 color channels
            default_target = {'boxes': torch.tensor([[0, 0, 0, 0]], dtype=torch.float32),
                              'labels': torch.tensor([0], dtype=torch.int64)}  # Background
            return default_img, default_target

        img = Image.open(img_path).convert("RGB")
        orig_width, orig_height = img.size
        scale_x = self.train_img_size[0] / orig_width
        scale_y = self.train_img_size[1] / orig_height

        img = img.resize(self.train_img_size, Image.BILINEAR)
        img = F.to_tensor(img)

        annotations = self.img_annotations[os.path.basename(img_path)]
        if annotations['boxes']:
            scaled_boxes = [[max(0, min(bbox[0] * scale_x, self.train_img_size[0])),
                             max(0, min(bbox[1] * scale_y, self.train_img_size[1])),
                             max(0, min((bbox[0] + bbox[2]) * scale_x, self.train_img_size[0])),
                             max(0, min((bbox[1] + bbox[3]) * scale_y, self.train_img_size[1]))]
                            for bbox in annotations['boxes']]
            labels = annotations['labels']
        else:
            scaled_boxes = [[0, 0, 0, 0]]
            labels = [0]

        boxes = torch.tensor(scaled_boxes, dtype=torch.float32)
        labels = torch.tensor(labels, dtype=torch.int64)
        target = {'boxes': boxes, 'labels': labels}

        if self.transforms:
            img, target = self.transforms(img, target)
        return img, target
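The box scaling inside __getitem__ can be followed with a worked example. The sketch below is a standalone illustration with a hypothetical helper name; it assumes a 1920x1080 patch (half of a 3840x2160 image) resized to (384, 216), and uses the box from the patch annotation sample shown earlier.

```python
def scale_xywh_to_xyxy(bbox, orig_size, target_size):
    """Scale an XYWH box from orig_size to target_size and return clamped XYXY."""
    orig_w, orig_h = orig_size
    tgt_w, tgt_h = target_size
    scale_x, scale_y = tgt_w / orig_w, tgt_h / orig_h
    x, y, w, h = bbox

    def clamp(value, upper):
        # Keep the coordinate inside [0, upper]
        return max(0.0, min(float(value), float(upper)))

    return [clamp(x * scale_x, tgt_w),
            clamp(y * scale_y, tgt_h),
            clamp((x + w) * scale_x, tgt_w),
            clamp((y + h) * scale_y, tgt_h)]


# Assumed sizes: 1920x1080 patch, resized to 384x216 (a scale factor of 0.2).
scaled = scale_xywh_to_xyxy([1863, 542, 57, 36], (1920, 1080), (384, 216))
print([round(v, 1) for v in scaled])
```

A 57x36 px object shrinks to roughly 11x7 px at this input size, which is worth keeping in mind when choosing train_img_size for small-object detection.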

If transforms are specified, they are applied to the image and its annotations before they are returned.

def get_transform(train):
    transforms = []
    # if train:
    #     transforms.append(Tv2.RandomHorizontalFlip(0.5))
    transforms.append(Tv2.ToDtype(torch.float, scale=True))
    transforms.append(Tv2.ToPureTensor())
    return Tv2.Compose(transforms)

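The CustomAerialDataset class we defined provides a solid framework for preparing the data loaders, ensuring that images and annotations are loaded and formatted correctly for model training.

Next, initialize the training and validation configurations. We resize both training and validation images to (384, 216), given as (W, H).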

root = "SeaDroneSee/output_patches"

# Configuration for training and validation datasets
train_config = DatasetConfig(root,
                             annotations_file='',  # This is now set based on subset in the __init__
                             train_img_size=(384, 216),
                             subset='train',
                             transforms=get_transform(train=True))
val_config = DatasetConfig(root,
                           annotations_file='',
                           train_img_size=(384, 216),
                           subset='val',
                           transforms=get_transform(train=False))

train_dataset = CustomAerialDataset(train_config)
val_dataset = CustomAerialDataset(val_config)

print(f"Length of Train Dataset: {len(train_dataset)}")
print(f"Length of Validation Dataset: {len(val_dataset)}")

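After patch creation there are 35,270 training images and 6,188 validation images, which form the final input set for fine-tuning the Faster R-CNN model.

Now let's define a custom collate function that handles images without annotations. We still need to pass these empty images through, because they can improve model performance and help avoid false positives (where background is mistaken for an object instance).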

def collate_fn(batch):
    imgs, targets = zip(*batch)
    imgs = torch.stack(imgs, dim=0)
    real_targets = []
    for target in targets:
        # Filter out dummy boxes
        mask = target['boxes'].sum(dim=1) > 0
        real_targets.append({'boxes': target['boxes'][mask], 'labels': target['labels'][mask]})
    return imgs, real_targets


train_data_loader = DataLoader(train_dataset, batch_size=10, shuffle=True, collate_fn=collate_fn, num_workers=12)
val_data_loader = DataLoader(val_dataset, batch_size=10, shuffle=False, collate_fn=collate_fn, num_workers=12)

Let's visualize samples from train_data_loader to check that our custom dataset class is defined correctly.

def show_image_with_boxes(img, targets, ax, category_colors):
    """Plot an image with its bounding boxes on an axis object."""
    # Convert tensor image to PIL for display if needed
    if isinstance(img, torch.Tensor):
        img = to_pil_image(img)
    print(img.size)
    ax.imshow(img)

    # Check and plot each bounding box with class-specific color
    if 'boxes' in targets and 'labels' in targets:
        boxes = targets['boxes'].cpu().numpy()
        labels = targets['labels'].cpu().numpy()
        for bbox, label in zip(boxes, labels):
            w = bbox[2] - bbox[0]
            h = bbox[3] - bbox[1]
            color = category_colors.get(label, 'gray')  # Use gray for unmapped classes
            rect = patches.Rectangle((bbox[0], bbox[1]), w, h, linewidth=2, edgecolor=color, facecolor='none')
            ax.add_patch(rect)
            ax.text(bbox[0], bbox[1], str(label), color='white', fontsize=12,
                    bbox=dict(facecolor=color, alpha=0.5))


def visualize_samples(data_loader, category_colors, num_samples=20):
    """Visualize a specified number of samples from a DataLoader in a single column."""
    num_rows = num_samples  # All samples in a single column
    num_cols = 1
    fig, axs = plt.subplots(nrows=num_rows, ncols=num_cols, figsize=(15, 25 * num_rows // 4))  # Adjust height based on rows

    samples_visualized = 0
    for images, targets in data_loader:
        for i, ax in enumerate(axs.flat):
            if samples_visualized >= num_samples:
                break  # Stop after displaying the desired number of samples
            show_image_with_boxes(images[i], targets[i], ax, category_colors)
            ax.axis('off')  # Turn off axis for cleaner look
            samples_visualized += 1

        # If enough samples visualized, break the loop to avoid extra iterations
        if samples_visualized >= num_samples:
            break

    plt.tight_layout()
    plt.show()


visualize_samples(train_data_loader, category_colors, num_samples=4)

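We can see that everything looks fine and the bounding boxes are scaled correctly. We now move from data preparation to model preparation, another key aspect of fine-tuning Faster R-CNN, or of any deep learning training.

PyTorch training configuration for fine-tuning Faster R-CNN

We will fine-tune for 50 epochs and initialize best_map to -inf, so that the first computed evaluation metric always exceeds it and the first evaluated weights become the initial best baseline.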

num_epochs = 50
best_map = -float('inf')  # Training loop
# print(best_map)
DEVICE = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

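torchvision provides four Faster R-CNN detection models, with different backbones, pretrained on the COCO dataset and available for fine-tuning. We will use these pretrained weights to reach very good detection accuracy in fewer epochs.

If you would like to experiment, there are other object detection architectures as well, such as SSD and RetinaNet.

To fit within the memory of our Google Colab T4 GPU, we choose the lightweight MobileNetV3-Large backbone, which has roughly 19.4M parameters, 4.49 GFLOPS, and 32.8 box mAP on the MS COCO dataset.

Since our dataset has six classes, we replace the last layer of the pretrained classification head to match the number of classes in SeaDronesSee. In torchvision's convention the class count includes the background at index 0, which here lines up with the 'ignored' label. We also use an SGD optimizer with momentum 0.9 and an initial learning rate of 5e-4, together with a StepLR scheduler that decays the learning rate every num_epochs // 2 epochs (that is, at epoch 25 out of 50).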

def get_model(num_classes):
    model = detection.fasterrcnn_mobilenet_v3_large_fpn(weights="DEFAULT")
    # Get the number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # Replace pretrained head with new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model


num_classes = 6
model = get_model(num_classes)
model.to(DEVICE)

print(model)
# print(model.fc1(x).size())
params = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(params, lr=0.0005, momentum=0.9, weight_decay=0.0005)
# and a learning rate scheduler
lr_scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer,
    step_size=num_epochs // 2,
    gamma=0.1
)
scaler = torch.cuda.amp.GradScaler()
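The effect of the StepLR settings can be worked out by hand. The sketch below is plain arithmetic rather than a call into PyTorch: it applies the step-decay rule lr = initial_lr * gamma ** (epoch // step_size), under the assumption that the scheduler is stepped once per epoch. The function name step_lr is hypothetical.

```python
def step_lr(initial_lr, epoch, step_size, gamma):
    """Learning rate at a given epoch under step decay."""
    return initial_lr * gamma ** (epoch // step_size)


num_epochs = 50
step_size = num_epochs // 2   # 25, as in the configuration above

for epoch in (0, 24, 25, 49):
    print(epoch, step_lr(5e-4, epoch, step_size, gamma=0.1))
```

With these values the learning rate stays at 5e-4 for epochs 0 to 24 and drops by a factor of ten, to about 5e-5, for epochs 25 to 49.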

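To save compute and training time, we use CUDA automatic mixed precision (AMP). Under torch.cuda.amp.autocast, selected operations run in lower precision (16-bit) while critical parts stay in single precision (32-bit) to preserve accuracy, and torch.cuda.amp.GradScaler() scales the loss so that small gradients do not underflow in 16-bit.

We use TensorBoard, through torch.utils.tensorboard and its add_scalar and add_figure methods, to monitor all training and validation metrics as well as the validation predictions. For every batch from the data loader, the images and targets are moved to the specified device (CUDA or CPU). The model is set to training mode, and the predictions and losses are computed. The loss is then backpropagated, the optimizer updates the weights, and the learning rate scheduler adjusts the learning rate. For multi-GPU training, the loss is averaged across all GPUs. Our training pipeline uses the metric logger from the torchvision utilities to display the metrics at the end of each epoch.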

# Initialize TensorBoard writer
writer = SummaryWriter(log_dir='runs/aerial_detection')


def train_one_epoch(model, data_loader, device, optimizer, print_freq, epoch, scaler=None):
    model.train()
    metric_logger = utils.MetricLogger(delimiter=" ")
    metric_logger.add_meter("lr", utils.SmoothedValue(window_size=1, fmt="{value:.6f}"))
    header = f"Training Epoch {epoch}:"
    model.to(device)

    with tqdm(data_loader, desc=header) as tq:
        lr_scheduler = None
        for i, (images, targets) in enumerate(tq):
            images = [img.to(device) for img in images]
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

            with torch.cuda.amp.autocast(enabled=scaler is not None):
                loss_dict = model(images, targets)
                losses = sum(loss for loss in loss_dict.values())

            loss_value = losses.item()

            optimizer.zero_grad()
            if scaler is not None:
                scaler.scale(losses).backward()
                scaler.step(optimizer)
                scaler.update()
            else:
                losses.backward()
                optimizer.step()

            if lr_scheduler is not None:
                lr_scheduler.step()

            metric_logger.update(loss=losses, **loss_dict)
            metric_logger.update(lr=optimizer.param_groups[0]["lr"])

            # Update tqdm postfix to display loss on the progress bar
            tq.set_postfix(loss=losses.item(), lr=optimizer.param_groups[0]["lr"])

            # Log losses to TensorBoard
            writer.add_scalar('Loss/train', losses.item(), epoch * len(data_loader) + i)
            for k, v in loss_dict.items():
                writer.add_scalar(f'Loss/train_{k}', v.item(), epoch * len(data_loader) + i)

    print(f"Average Loss: {metric_logger.meters['loss'].global_avg:.4f}")
    writer.add_scalar('Loss/avg_train', metric_logger.meters['loss'].global_avg, epoch)

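Next, we define the evaluation function, which sets the model to evaluation mode. Under torch.no_grad, no gradients are computed and no weights are updated. Object detection models are evaluated by their mAP50 or mAP50-95 (mean average precision), and the MeanAveragePrecision class from the torchmetrics library is useful for this. We pass it the predictions and ground truths from the validation data loader.

Put simply, Average Precision (AP) is the area under the precision-recall curve. Mean Average Precision (mAP) is the average of the AP over all classes.

mAP = (1/n) * sum(AP_i), where n is the number of classes and AP_i is the AP of class i.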
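Two small pieces of arithmetic sit underneath these metrics: the IoU between a predicted and a ground-truth box (mAP50 counts a detection as correct when IoU is at least 0.5), and the average of the per-class APs. The sketch below illustrates both with hypothetical helper names; the per-class AP values are made-up numbers for illustration, not results from this model.

```python
def iou_xyxy(box_a, box_b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_average_precision(ap_per_class):
    """mAP = (1/n) * sum(AP_i) over the n classes."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Two 10x10 boxes shifted by 5 px: intersection 50, union 150.
print(iou_xyxy([0, 0, 10, 10], [5, 0, 15, 10]))

# Illustrative AP values only (not measured results).
example_ap = {"swimmer": 0.5, "boat": 0.9, "buoy": 0.7}
print(mean_average_precision(example_ap))
```

The shifted pair has an IoU of one third, so it would not count as a match at the 0.5 threshold. For small objects, a shift of a few pixels is enough to cause this, which is part of why small-object mAP tends to be low.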

def evaluate(model, data_loader, device, epoch, save_dir):
    model.eval()
    metric = MeanAveragePrecision(iou_type="bbox")
    total_iou = 0
    total_detections = 0
    header = "Validation:"
    total_steps = len(data_loader)
    samples = []

    with torch.no_grad(), tqdm(total=total_steps, desc=header) as progress_bar:
        for i, (images, targets) in enumerate(data_loader):
            images = [img.to(device) for img in images]
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
            outputs = model(images)

            # Convert outputs for torchmetrics
            preds = [
                {"boxes": out["boxes"], "scores": out["scores"], "labels": out["labels"]}
                for out in outputs
            ]
            targs = [
                {"boxes": tgt["boxes"], "labels": tgt["labels"]}
                for tgt in targets
            ]

            # Update metric for mAP calculation
            # ... (the rest of evaluate() and the start of the prediction
            # visualization code are missing from the source; the fragment
            # below is the tail of that visualization code)

    num_cols = 1
    fig, axs = plt.subplots(nrows=num_rows, ncols=num_cols, figsize=(15, 25 * num_rows // 4))  # Adjust height based on rows

    for idx, (img, output) in enumerate(zip(images, outputs)):
        if idx >= num_samples:
            break  # Stop after displaying the desired number of samples
        show_image_with_boxes(img.cpu(), output, axs[idx], category_colors)
        axs[idx].axis('off')  # Turn off axis for cleaner look

    plt.tight_layout()
    plt.show()

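We pick a few samples from one batch of the validation data loader and plot the results.

As we can see, fine-tuning Faster R-CNN on patches gives very good results; the model captures even very small instances.

Combining SAHI with the Fine-Tuned Faster R-CNN

Traditional object detection models often struggle with small objects, because the objects cover few pixels and the image offers little contextual information about them. This is where SAHI shines. SAHI addresses the problem by slicing images into smaller patches, in which small objects take up a larger share of the frame and become easier to detect.

To learn more about Slicing Aided Hyper Inference (SAHI), bookmark this for later.

Let's install and import the SAHI dependencies.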

!pip install -qq -U sahi

import os

import PIL.Image
from PIL import ImageDraw, ImageFont  # used later for drawing boxes

from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction, predict, get_prediction
from sahi.utils.file import download_from_url
from sahi.prediction import visualize_object_predictions
from sahi.utils.cv import read_image
from IPython.display import Image

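We choose torchvision as the model type and use SAHI's AutoDetectionModel module. We start by setting the confidence threshold to 0.7 and the image size to the longest dimension of the input image, since our images are rectangular.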

detection_model = AutoDetectionModel.from_pretrained(
    model_type='torchvision',
    model=model,  # Faster R-CNN model
    confidence_threshold=0.7,
    image_size=5436,  # Image's longest dimension
    device="cpu",  # or "cuda:0"
    load_at_init=True,
)

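The slice height and slice width control the dimensions of the sliding window. Because our model was trained on patches half the size of the image in each dimension, we choose the slice width and slice height accordingly.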

img_path = 'test/7882.jpg'
img_filename_temp = img_path.split('/')[1]
img_filename = img_filename_temp.split('.')[0]
# print(img_filename)
img_pil = PIL.Image.open(img_path)
W, H = img_pil.size
# print(W)
s_h, s_w = H/2, W/2
s_h, s_w = int(s_h), int(s_w)
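To illustrate how the slice size and an overlap ratio translate into sliding windows, here is a minimal sketch in plain Python. The function `sliding_window_boxes` is a hypothetical helper written for this illustration and is not SAHI's implementation; the 3840x2160 image size is an assumption taken from the dataset's list of resolutions, and the 0.2 overlap ratio is an illustrative default.

```python
def sliding_window_boxes(image_w, image_h, slice_w, slice_h,
                         overlap_w=0.2, overlap_h=0.2):
    """Return (x1, y1, x2, y2) windows that cover the image with overlap.

    Windows that would run past the right or bottom edge are shifted
    back inside the image so every window keeps the full slice size.
    """
    step_x = max(1, slice_w - int(slice_w * overlap_w))
    step_y = max(1, slice_h - int(slice_h * overlap_h))
    boxes = []
    y = 0
    while True:
        y2 = min(y + slice_h, image_h)
        y1 = max(0, y2 - slice_h)
        x = 0
        while True:
            x2 = min(x + slice_w, image_w)
            x1 = max(0, x2 - slice_w)
            boxes.append((x1, y1, x2, y2))
            if x2 >= image_w:
                break
            x += step_x
        if y2 >= image_h:
            break
        y += step_y
    return boxes


# Assumed image size; slices are half the image in each dimension.
W, H = 3840, 2160
windows = sliding_window_boxes(W, H, W // 2, H // 2)
for box in windows:
    print(box)
```

With half-size slices and some overlap, the windows form a 3x3 grid rather than 2x2, so an object that straddles the middle of the image still falls entirely inside at least one window.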

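get_sliced_prediction returns a list of detected object instances together with their bbox, score, and category id. Here we can see that the class id is correct, but the corresponding label name follows the COCO classes. We work around this by defining a few custom functions that perform the class mapping and draw the bounding boxes that match each category id.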

result = get_sliced_prediction(
    img_path,
    detection_model,
    slice_height=s_h,
    slice_width=s_w,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)
result.object_prediction_list

[ObjectPrediction<
    bbox: BoundingBox: <(1754.8331298828125, 1062.62841796875, 1823.0999755859375, 1103.5548362731934), w: 68.266845703125, h: 40.92641830444336>,
    mask: None,
    score: PredictionScore: <value: 0.9949936270713806>,
    category: Category: <id: 1, name: person>>]
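The class mapping itself can be sketched as a simple lookup from category id to dataset class name. The snippet below is a hedged illustration that uses plain dictionaries instead of SAHI's ObjectPrediction objects; `SEADRONESSEE_CLASSES`, `remap_category_names`, and the record layout are assumptions made for this example, with the ids and names taken from the class list given earlier in the article.

```python
# Class ids and names as listed for the dataset earlier in the article.
SEADRONESSEE_CLASSES = {
    0: "ignored",
    1: "swimmer",
    2: "boat",
    3: "jetski",
    4: "life_saving_appliances",
    5: "buoy",
}


def remap_category_names(predictions, id_to_name=SEADRONESSEE_CLASSES):
    """Return copies of the predictions with names looked up by category id.

    `predictions` is a list of plain dicts (a hypothetical stand-in for
    SAHI's prediction objects); the input list is left unmodified.
    """
    remapped = []
    for pred in predictions:
        fixed = dict(pred)
        fixed["category_name"] = id_to_name.get(pred["category_id"], "unknown")
        remapped.append(fixed)
    return remapped


# One record mimicking the detection above: id 1, but a COCO name.
raw = [{
    "bbox_xywh": [1754.83, 1062.63, 68.27, 40.93],
    "score": 0.995,
    "category_id": 1,
    "category_name": "person",
}]
fixed = remap_category_names(raw)
print(fixed[0]["category_name"])  # swimmer
```

Only the displayed name changes; the category id that the fine-tuned model predicts is already correct.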

The custom draw_bounding_boxes() utility takes an image and the object_prediction_list returned by SAHI's get_sliced_prediction and draws clean, easy-to-read predictions.

def draw_bounding_boxes(image, object_prediction_list):
    draw = ImageDraw.Draw(image)
    font_size = int(min(image.size) * 0.008)  # Adjust font size based on image size
    font_path = "/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf"
    font = ImageFont.truetype(font_path, font_size) if os.path.exists(font_path) else ImageFont.load_default()

    for prediction in object_prediction_list:
        bbox = prediction.bbox.to_xywh()
        category_id = prediction.category.id
        x, y, w, h = bbox
        x1, y1, x2, y2 = x, y, x + w, y + h
        color = category_colors.get(category_id, 'white')  # Default to white if category_id is unknown
        draw.rectangle([x1, y1, x2, y2], outline=color, width=6)
        # draw.text((x1, y1 - font_size), str(classes_to_idx[category_id]), fill=color, font=font)

    return image

# Draw bounding boxes
image_with_bboxes = draw_bounding_boxes(img_pil, result.object_prediction_list)

# Define the output path
output_directory = 'sahi_ouput_data'
output_path = os.path.join(output_directory, f'result_{img_filename}.png')

# Create the directory if it doesn't exist
os.makedirs(output_directory, exist_ok=True)

# Save the resulting image
image_with_bboxes.save(output_path)

# Display the image (optional, if running in an environment that supports it)
image_with_bboxes.show()

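Comparison of Faster R-CNN Detections With SAHI, Without SAHI, and With Patches as Input

Comparison 1: Fine-Tuned Faster R-CNN with a MobileNet v3 Large Backbone

Forward pass on the original image

Let's resize the original image directly to the training image size (382,216), without SAHI and without patch creation, and pass it to the Faster R-CNN fine-tuned with the MobileNet v3 Large backbone.

Figure 15: Test – 7882.jpg forward pass – Fine-tuned Faster R-CNN

Figure 16: Test – 1070.jpg forward pass – Fine-tuned Faster R-CNN

Figure 17: Test – 2843.jpg forward pass – Fine-tuned Faster R-CNN

Here we can see that the model misses many instances entirely and performs very poorly.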
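A quick back-of-the-envelope calculation suggests why the direct forward pass struggles. The sketch below scales a box of roughly the size reported in the SAHI output earlier (about 68 x 41 pixels) down to the (382,216) training size. The 3840x2160 source resolution and the helper `scaled_box_size` are assumptions for illustration; the actual resolution of each test image may differ.

```python
def scaled_box_size(box_w, box_h, src_size, dst_size):
    """Width and height of a box after resizing the image from src to dst."""
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    return box_w * dst_w / src_w, box_h * dst_h / src_h


TRAIN_SIZE = (382, 216)

# Whole image resized directly (assumed 3840x2160 source).
full = scaled_box_size(68.27, 40.93, (3840, 2160), TRAIN_SIZE)

# Half-size patch (1920x1080) resized to the same training size.
patch = scaled_box_size(68.27, 40.93, (1920, 1080), TRAIN_SIZE)

print(f"direct resize: {full[0]:.1f} x {full[1]:.1f} px")
print(f"patch resize:  {patch[0]:.1f} x {patch[1]:.1f} px")
```

Under these assumptions the swimmer shrinks to only a few pixels on each side when the whole image is resized, while the same object keeps twice that extent per side when a half-size patch is resized instead.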

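Faster R-CNN MobileNet v3 Large inference with patches as input

Now, after applying the same preprocessing steps used during fine-tuning (patch creation and resizing to the training image size), the model captures almost all instances, with only a few left undetected.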
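The patch-based inference path can be sketched as follows: split the image into a grid of patches, run the detector on each resized patch, and map the predicted boxes back to full-image coordinates. This is a minimal illustration under stated assumptions: a 2x2 grid (inferred from the half-size patches mentioned above), a 3840x2160 source image, and hypothetical helpers `make_patches` and `box_to_full_image`. The detector call itself is omitted.

```python
def make_patches(image_w, image_h, rows=2, cols=2):
    """Return (x1, y1, x2, y2) patch windows on a rows x cols grid."""
    patch_w, patch_h = image_w // cols, image_h // rows
    return [
        (c * patch_w, r * patch_h, (c + 1) * patch_w, (r + 1) * patch_h)
        for r in range(rows)
        for c in range(cols)
    ]


def box_to_full_image(box, patch, resized_size):
    """Map a box predicted on a resized patch back to full-image pixels."""
    px1, py1, px2, py2 = patch
    resized_w, resized_h = resized_size
    scale_x = (px2 - px1) / resized_w  # undo the resize
    scale_y = (py2 - py1) / resized_h
    x1, y1, x2, y2 = box
    return (
        px1 + x1 * scale_x,  # then shift by the patch offset
        py1 + y1 * scale_y,
        px1 + x2 * scale_x,
        py1 + y2 * scale_y,
    )


patches = make_patches(3840, 2160)

# A made-up box predicted on the bottom-right patch at training size.
full_box = box_to_full_image((191, 108, 200, 118), patches[3], (382, 216))
print(patches[3], full_box)
```

Unlike SAHI's overlapping windows, a plain grid has no overlap, so an object cut by a patch border may be detected twice or missed, which is one plausible source of the differences between the two approaches.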

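Figure 18: Test – 7882.jpg as patches – Fine-tuned Faster R-CNN

Figure 19: Test – 1070.jpg as patches – Fine-tuned Faster R-CNN

Figure 20: Test – 2843.jpg as patches – Fine-tuned Faster R-CNN

Faster R-CNN MobileNet v3 Large inference with SAHI

With SAHI, the detections come out very clean and the bounding boxes align tightly with the objects. We can also see that the set of detected swimmer instances differs with and without SAHI, which reflects the difference between the two techniques.

Comparison 2: Fine-Tuned Faster R-CNN with a ResNet50 v2 Backbone

Forward pass on the original image

As in Comparison 1, let's resize the original image directly to the training image size (382,216), without SAHI and without patch creation, pass it to the Faster R-CNN fine-tuned with the ResNet50 v2 backbone, and look at the results.

Faster R-CNN ResNet50 v2 inference with patches as input

Figure 27: Test – 7832.jpg as patches – Fine-tuned Faster R-CNN

Figure 28: Test – 6166.jpg as patches – Fine-tuned Faster R-CNN

Figure 29: Test – 1669.jpg as patches – Fine-tuned Faster R-CNN

Faster R-CNN ResNet50 v2 inference with SAHI

With the fine-tuned Faster R-CNN ResNet50 v2, the predictions improve significantly over the fine-tuned Faster R-CNN MobileNet. The larger parameter count and higher mAP of Faster R-CNN ResNet50 v2 make it a strong contender in 2024.

Impressive results, right? Scroll up to learn more about the actual code implementation.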

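Key Takeaways

Patch work does the magic: Splitting the training images into patches lets the model focus on fine detail, which helps it see tiny objects better. Even without SAHI, our fine-tuned Faster R-CNN is nearly eagle-eyed and almost matches the SAHI results.
SAHI's approach: Integrating our fine-tuned Faster R-CNN model with SAHI improved detection accuracy further. SAHI's technique of slicing images into smaller parts was a game changer, effectively reducing false positives and producing near-perfect bounding boxes. The combination shows how well data preparation and strong post-processing work together.
Data quality matters: Our experiments underline the importance of data preparation and preprocessing and show the value of thoughtful data augmentation. It all comes down to the groundwork!

Conclusion

The aim of our study was to highlight the importance of careful data preparation on challenging datasets such as SeaDronesSee. Although Faster R-CNN has high latency and GFLOPs, it proves to be a worthy candidate even in 2024. The experiment could be improved further by exploring lightweight models with lower latency, real-time processing, and higher accuracy. Drone and robotics developers can use our findings to refine and strengthen the detection systems behind their mission-critical tasks.

What is the impact of our research? With better detection and response capabilities, it has the potential to save countless lives. Now that is a real superhero rescue mission.

Indeed, let's make it happen together!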