Contents
1. The EfficientNet network
2. depth, width, resolution
3. Structure of the EfficientNet network
4. Training the network from the command line
5. Code
5.1 model
5.2 dataset
5.3 utils
5.4 train
5.5 predict
EfficientNet explores the three parameters that matter most to a network: image resolution, network width, and network depth.
Image resolution: the size of the feature map; h*w is the spatial resolution of the image.
Network width: the number of feature maps, i.e. the number of convolution kernels, or equivalently the number of output channels.
Network depth: the number of layers in the network, as in resnet34, resnet101, and so on.
As illustrated below:
At some point, a 224*224 input resolution seems to have become the de facto standard for neural networks, and nearly all later networks adopted 224*224 as their input size.
Even when a network nominally specifies a 224*224 input, feeding it another size such as 300*300 usually still works, because most implementations place an adaptive pooling layer right before the fully connected layer.
Without that adaptive pooling, an incorrect input size would change the number of features reaching the fully connected layer and raise an error.
With the resolution fixed by this convention, later networks concentrated their effort on width or depth instead; ResNet, for example, can be scaled up to a depth of 1000 layers.
Here is a brief summary of what each of the three parameters does.
Width: increasing the number of channels. Wider networks tend to capture finer-grained features and are easier to train. However, very wide but shallow networks struggle to capture higher-level features, and empirical results show that accuracy saturates quickly once the network gets wider and w grows large.
Depth: increasing the number of layers. Scaling depth is the most common approach for convolutional networks. Deeper ConvNets can capture richer and more complex features and generalize well to new tasks, but they are also harder to train because of the vanishing-gradient problem. Although techniques such as shortcuts and batch normalization alleviate the training problem, the accuracy gain from going deeper diminishes.
Resolution: with higher-resolution input images, convolutions can potentially capture finer-grained patterns. Higher resolution does improve accuracy, but for very high resolutions the accuracy gain diminishes as well.
The authors' conclusion:
EfficientNet argues that balancing the scaling of these three parameters is crucial, because the scaling dimensions are not independent of one another. Intuitively, for higher-resolution images we should increase network depth, so that the larger receptive fields can help capture similar features that span more pixels in the larger image. Correspondingly, at higher resolutions we should also increase network width, in order to capture finer-grained patterns across the additional pixels. These intuitions suggest that the scaling dimensions need to be coordinated and balanced, rather than scaled along a single dimension as in conventional practice.
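The paper formalizes this as compound scaling: a single coefficient phi scales all three dimensions at once, with depth = alpha^phi, width = beta^phi and resolution = gamma^phi, under the constraint alpha * beta^2 * gamma^2 ~= 2 so that FLOPs roughly double per unit of phi. Below is a minimal sketch (not part of the original post), using the alpha, beta, gamma values reported in the EfficientNet paper:

# A minimal sketch of compound scaling; alpha/beta/gamma are the values the
# paper found by grid search on the B0 baseline (alpha*beta^2*gamma^2 ~= 2).
alpha, beta, gamma = 1.2, 1.1, 1.15  # depth, width, resolution bases

def compound_scale(phi: float):
    """Return (depth, width, resolution) multipliers for a given phi."""
    depth_mult = alpha ** phi
    width_mult = beta ** phi
    res_mult = gamma ** phi
    # FLOPs grow roughly by (alpha * beta**2 * gamma**2) ** phi ~= 2 ** phi
    return depth_mult, width_mult, res_mult

for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")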
The basic building block of EfficientNet is called MBConv: a 1*1 convolution first expands the channel dimension, then a depthwise (dw) convolution is applied, followed by an SE attention module; a 1*1 convolution then reduces the dimension again, followed by a DropPath (stochastic-depth) dropout. When a shortcut is used, the input is added to this output.
The SE attention module (global average pooling, a 1*1 convolution that squeezes the channels, SiLU, a second 1*1 convolution that restores them, and a sigmoid whose output reweights each channel) is shown below:
The structure of EfficientNet-B0 is as follows:
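(The structure table in the original post is an image. For reference, this is the B0 backbone that the default_cnf list in the code below encodes:)

Stage | Operator            | Kernel | Stride | Out channels | Layers
1     | Conv3x3 (stem)      | 3x3    | 2      | 32           | 1
2     | MBConv1             | 3x3    | 1      | 16           | 1
3     | MBConv6             | 3x3    | 2      | 24           | 2
4     | MBConv6             | 5x5    | 2      | 40           | 2
5     | MBConv6             | 3x3    | 2      | 80           | 3
6     | MBConv6             | 5x5    | 1      | 112          | 3
7     | MBConv6             | 5x5    | 2      | 192          | 4
8     | MBConv6             | 3x3    | 1      | 320          | 1
9     | Conv1x1 & Pool & FC | 1x1    | 1      | 1280         | 1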
EfficientNet B1 - B7 simply scale the width and depth coefficients on top of B0; when these two coefficients change, the input image size must also be changed manually according to the table below.
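These are the coefficients, dropout rates, and expected input sizes used by the factory functions and the img_size dict in the code below:

Model | width_coefficient | depth_coefficient | dropout_rate | input size
B0    | 1.0               | 1.0               | 0.2          | 224
B1    | 1.0               | 1.1               | 0.2          | 240
B2    | 1.1               | 1.2               | 0.3          | 260
B3    | 1.2               | 1.4               | 0.3          | 300
B4    | 1.4               | 1.8               | 0.4          | 380
B5    | 1.6               | 2.2               | 0.4          | 456
B6    | 1.8               | 2.6               | 0.5          | 528
B7    | 2.0               | 3.1               | 0.5          | 600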
Run the training script with -h to list the configurable arguments; here epochs is set to 30.
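Assuming the training script from section 5.4 is saved as train.py (the file name is an assumption), a typical invocation looks like:

python train.py -h            # list all configurable arguments
python train.py --epochs 30   # train for 30 epochs, other arguments left at their defaults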
Training process:
Prediction: only a single image is predicted here.
5. Code
5.1 model
The code of the EfficientNet network:
import math
import copy
from functools import partial
from collections import OrderedDict
from typing import Optional, Callable
import torch
import torch.nn as nn
from torch import Tensor
from torch.nn import functional as F
def _make_divisible(ch, divisor=8, min_ch=None):
    # Round the channel count to the nearest multiple of `divisor`
    # (MobileNet-style rounding), never dropping below 90% of the input.
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch
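# A quick check of the rounding behaviour (easy to verify by hand):
#   _make_divisible(32 * 1.1) -> 32   (35.2 rounds down to the nearest multiple of 8)
#   _make_divisible(40 * 1.1) -> 48   (44.0 rounds up)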
def drop_path(x, drop_prob: float = 0., training: bool = False):
    """
    Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
    "Deep Networks with Stochastic Depth", https://arxiv.org/pdf/1603.09382.pdf
    This function is taken from rwightman. It can be seen here:
    https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/layers/drop.py#L140
    """
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # broadcast over every dim except batch
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
    random_tensor.floor_()  # binarize: 1 keeps the sample's path, 0 drops it
    output = x.div(keep_prob) * random_tensor
    return output
class DropPath(nn.Module):
    """
    Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
    "Deep Networks with Stochastic Depth", https://arxiv.org/pdf/1603.09382.pdf
    """
    def __init__(self, drop_prob=None):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        return drop_path(x, self.drop_prob, self.training)
class ConvBNActivation(nn.Sequential):
    def __init__(self,
                 in_planes: int,
                 out_planes: int,
                 kernel_size: int = 3,
                 stride: int = 1,
                 groups: int = 1,
                 norm_layer: Optional[Callable[..., nn.Module]] = None,
                 activation_layer: Optional[Callable[..., nn.Module]] = None):
        padding = (kernel_size - 1) // 2
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if activation_layer is None:
            activation_layer = nn.SiLU
        super(ConvBNActivation, self).__init__(nn.Conv2d(in_channels=in_planes,
                                                         out_channels=out_planes,
                                                         kernel_size=kernel_size,
                                                         stride=stride,
                                                         padding=padding,
                                                         groups=groups,
                                                         bias=False),
                                               norm_layer(out_planes),
                                               activation_layer())
class SqueezeExcitation(nn.Module):
    def __init__(self,
                 input_c: int,    # channels of the block input (sizes the bottleneck)
                 expand_c: int,   # channels of the expanded feature map being recalibrated
                 squeeze_factor: int = 4):
        super(SqueezeExcitation, self).__init__()
        squeeze_c = input_c // squeeze_factor
        self.fc1 = nn.Conv2d(expand_c, squeeze_c, 1)  # squeeze: 1x1 conv acts as an FC layer
        self.ac1 = nn.SiLU()
        self.fc2 = nn.Conv2d(squeeze_c, expand_c, 1)  # excite: back up to expand_c
        self.ac2 = nn.Sigmoid()

    def forward(self, x: Tensor) -> Tensor:
        scale = F.adaptive_avg_pool2d(x, output_size=(1, 1))  # global average pooling
        scale = self.fc1(scale)
        scale = self.ac1(scale)
        scale = self.fc2(scale)
        scale = self.ac2(scale)
        return scale * x  # channel-wise reweighting
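# Example: in an MBConv block with input_c=40 and expand_c=240 (expand ratio 6),
# the SE module squeezes 240 channels down to 40 // 4 = 10 and back up to 240,
# producing one sigmoid weight per channel; an (N, 240, H, W) input keeps its shape.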
class InvertedResidualConfig:
    def __init__(self,
                 kernel: int,
                 input_c: int,
                 out_c: int,
                 expanded_ratio: int,
                 stride: int,
                 use_se: bool,
                 drop_rate: float,
                 index: str,
                 width_coefficient: float):
        self.input_c = self.adjust_channels(input_c, width_coefficient)
        self.kernel = kernel
        self.expanded_c = self.input_c * expanded_ratio
        self.out_c = self.adjust_channels(out_c, width_coefficient)
        self.use_se = use_se
        self.stride = stride
        self.drop_rate = drop_rate
        self.index = index

    @staticmethod
    def adjust_channels(channels: int, width_coefficient: float):
        return _make_divisible(channels * width_coefficient, 8)
class InvertedResidual(nn.Module):
    def __init__(self,
                 cnf: InvertedResidualConfig,
                 norm_layer: Callable[..., nn.Module]):
        super(InvertedResidual, self).__init__()
        if cnf.stride not in [1, 2]:
            raise ValueError("illegal stride value.")
        # the shortcut is only used when stride == 1 and in/out channels match
        self.use_res_connect = (cnf.stride == 1 and cnf.input_c == cnf.out_c)
        layers = OrderedDict()
        activation_layer = nn.SiLU
        # 1x1 conv to expand channels (skipped when the expand ratio is 1)
        if cnf.expanded_c != cnf.input_c:
            layers.update({"expand_conv": ConvBNActivation(cnf.input_c,
                                                           cnf.expanded_c,
                                                           kernel_size=1,
                                                           norm_layer=norm_layer,
                                                           activation_layer=activation_layer)})
        # depthwise conv (groups == channels)
        layers.update({"dwconv": ConvBNActivation(cnf.expanded_c,
                                                  cnf.expanded_c,
                                                  kernel_size=cnf.kernel,
                                                  stride=cnf.stride,
                                                  groups=cnf.expanded_c,
                                                  norm_layer=norm_layer,
                                                  activation_layer=activation_layer)})
        if cnf.use_se:
            layers.update({"se": SqueezeExcitation(cnf.input_c, cnf.expanded_c)})
        # 1x1 conv to project back down; no activation (nn.Identity)
        layers.update({"project_conv": ConvBNActivation(cnf.expanded_c,
                                                        cnf.out_c,
                                                        kernel_size=1,
                                                        norm_layer=norm_layer,
                                                        activation_layer=nn.Identity)})
        self.block = nn.Sequential(layers)
        self.out_channels = cnf.out_c
        self.is_strided = cnf.stride > 1
        if self.use_res_connect and cnf.drop_rate > 0:
            self.dropout = DropPath(cnf.drop_rate)  # stochastic depth, not ordinary dropout
        else:
            self.dropout = nn.Identity()

    def forward(self, x: Tensor) -> Tensor:
        result = self.block(x)
        result = self.dropout(result)
        if self.use_res_connect:
            result += x
        return result
class EfficientNet(nn.Module):
    def __init__(self,
                 width_coefficient: float,
                 depth_coefficient: float,
                 num_classes: int = 1000,
                 dropout_rate: float = 0.2,
                 drop_connect_rate: float = 0.2,
                 block: Optional[Callable[..., nn.Module]] = None,
                 norm_layer: Optional[Callable[..., nn.Module]] = None
                 ):
        super(EfficientNet, self).__init__()
        # B0 baseline: kernel, in_c, out_c, expand_ratio, stride, use_se, drop_rate, repeats
        default_cnf = [[3, 32, 16, 1, 1, True, drop_connect_rate, 1],
                       [3, 16, 24, 6, 2, True, drop_connect_rate, 2],
                       [5, 24, 40, 6, 2, True, drop_connect_rate, 2],
                       [3, 40, 80, 6, 2, True, drop_connect_rate, 3],
                       [5, 80, 112, 6, 1, True, drop_connect_rate, 3],
                       [5, 112, 192, 6, 2, True, drop_connect_rate, 4],
                       [3, 192, 320, 6, 1, True, drop_connect_rate, 1]]

        def round_repeats(repeats):
            # scale the number of blocks per stage by the depth coefficient
            return int(math.ceil(depth_coefficient * repeats))

        if block is None:
            block = InvertedResidual
        if norm_layer is None:
            norm_layer = partial(nn.BatchNorm2d, eps=1e-3, momentum=0.1)
        adjust_channels = partial(InvertedResidualConfig.adjust_channels,
                                  width_coefficient=width_coefficient)
        bneck_conf = partial(InvertedResidualConfig,
                             width_coefficient=width_coefficient)

        b = 0
        num_blocks = float(sum(round_repeats(i[-1]) for i in default_cnf))
        inverted_residual_setting = []
        for stage, args in enumerate(default_cnf):
            cnf = copy.copy(args)
            for i in range(round_repeats(cnf.pop(-1))):
                if i > 0:
                    cnf[-3] = 1      # only the first block of a stage is strided
                    cnf[1] = cnf[2]  # in_channels = out_channels for the repeats
                cnf[-1] = args[-2] * b / num_blocks   # drop rate grows linearly with depth
                index = str(stage + 1) + chr(i + 97)  # e.g. "1a", "2a", "2b", ...
                inverted_residual_setting.append(bneck_conf(*cnf, index))
                b += 1

        layers = OrderedDict()
        layers.update({"stem_conv": ConvBNActivation(in_planes=3,
                                                     out_planes=adjust_channels(32),
                                                     kernel_size=3,
                                                     stride=2,
                                                     norm_layer=norm_layer)})
        for cnf in inverted_residual_setting:
            layers.update({cnf.index: block(cnf, norm_layer)})
        last_conv_input_c = inverted_residual_setting[-1].out_c
        last_conv_output_c = adjust_channels(1280)
        layers.update({"top": ConvBNActivation(in_planes=last_conv_input_c,
                                               out_planes=last_conv_output_c,
                                               kernel_size=1,
                                               norm_layer=norm_layer)})
        self.features = nn.Sequential(layers)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        classifier = []
        if dropout_rate > 0:
            classifier.append(nn.Dropout(p=dropout_rate, inplace=True))
        classifier.append(nn.Linear(last_conv_output_c, num_classes))
        self.classifier = nn.Sequential(*classifier)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out")
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def _forward_impl(self, x: Tensor) -> Tensor:
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

    def forward(self, x: Tensor) -> Tensor:
        return self._forward_impl(x)
def efficientnet_b0(num_classes=1000):
    return EfficientNet(width_coefficient=1.0, depth_coefficient=1.0, dropout_rate=0.2, num_classes=num_classes)

def efficientnet_b1(num_classes=1000):
    return EfficientNet(width_coefficient=1.0, depth_coefficient=1.1, dropout_rate=0.2, num_classes=num_classes)

def efficientnet_b2(num_classes=1000):
    return EfficientNet(width_coefficient=1.1, depth_coefficient=1.2, dropout_rate=0.3, num_classes=num_classes)

def efficientnet_b3(num_classes=1000):
    return EfficientNet(width_coefficient=1.2, depth_coefficient=1.4, dropout_rate=0.3, num_classes=num_classes)

def efficientnet_b4(num_classes=1000):
    return EfficientNet(width_coefficient=1.4, depth_coefficient=1.8, dropout_rate=0.4, num_classes=num_classes)

def efficientnet_b5(num_classes=1000):
    return EfficientNet(width_coefficient=1.6, depth_coefficient=2.2, dropout_rate=0.4, num_classes=num_classes)

def efficientnet_b6(num_classes=1000):
    return EfficientNet(width_coefficient=1.8, depth_coefficient=2.6, dropout_rate=0.5, num_classes=num_classes)

def efficientnet_b7(num_classes=1000):
    return EfficientNet(width_coefficient=2.0, depth_coefficient=3.1, dropout_rate=0.5, num_classes=num_classes)
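A quick sanity check of the model (a hypothetical snippet, not part of the original post):

# Hypothetical sanity check: B0 for 5 classes on a dummy 224x224 batch.
if __name__ == '__main__':
    net = efficientnet_b0(num_classes=5)
    dummy = torch.randn(1, 3, 224, 224)
    out = net(dummy)
    print(out.shape)  # expected: torch.Size([1, 5])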
5.2 dataset
from PIL import Image
import torch
from torch.utils.data import Dataset


class MyDataSet(Dataset):
    """Custom dataset."""
    def __init__(self, images_path: list, images_class: list, transform=None):
        self.images_path = images_path
        self.images_class = images_class
        self.transform = transform

    def __len__(self):
        return len(self.images_path)

    def __getitem__(self, item):
        img = Image.open(self.images_path[item])
        if img.mode != 'RGB':
            raise ValueError("image: {} isn't RGB mode.".format(self.images_path[item]))
        label = self.images_class[item]
        if self.transform is not None:
            img = self.transform(img)
        return img, label

    @staticmethod
    def collate_fn(batch):
        images, labels = tuple(zip(*batch))
        images = torch.stack(images, dim=0)
        labels = torch.as_tensor(labels)
        return images, labels
5.3 utils
import os
import sys
import json
import random
import torch
from tqdm import tqdm
import matplotlib.pyplot as plt
def read_split_data(root: str, val_rate: float = 0.2):
    random.seed(0)  # make the train/val split reproducible
    assert os.path.exists(root), "dataset root: {} does not exist.".format(root)
    flower_class = [cla for cla in os.listdir(root) if os.path.isdir(os.path.join(root, cla))]
    flower_class.sort()
    class_indices = dict((k, v) for v, k in enumerate(flower_class))
    json_str = json.dumps(dict((val, key) for key, val in class_indices.items()), indent=4)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)
    train_images_path = []
    train_images_label = []
    val_images_path = []
    val_images_label = []
    every_class_num = []
    supported = [".jpg", ".JPG", ".png", ".PNG"]
    for cla in flower_class:
        cla_path = os.path.join(root, cla)
        images = [os.path.join(root, cla, i) for i in os.listdir(cla_path)
                  if os.path.splitext(i)[-1] in supported]
        images.sort()
        image_class = class_indices[cla]
        every_class_num.append(len(images))
        val_path = random.sample(images, k=int(len(images) * val_rate))
        for img_path in images:
            if img_path in val_path:
                val_images_path.append(img_path)
                val_images_label.append(image_class)
            else:
                train_images_path.append(img_path)
                train_images_label.append(image_class)
    print("{} images were found in the dataset.".format(sum(every_class_num)))
    print("{} images for training.".format(len(train_images_path)))
    print("{} images for validation.".format(len(val_images_path)))
    assert len(train_images_path) > 0, "number of training images must be greater than 0."
    assert len(val_images_path) > 0, "number of validation images must be greater than 0."
    plot_image = False
    if plot_image:
        plt.bar(range(len(flower_class)), every_class_num, align='center')
        plt.xticks(range(len(flower_class)), flower_class)
        for i, v in enumerate(every_class_num):
            plt.text(x=i, y=v + 5, s=str(v), ha='center')
        plt.xlabel('image class')
        plt.ylabel('number of images')
        plt.title('flower class distribution')
        plt.show()
    return train_images_path, train_images_label, val_images_path, val_images_label
def train_one_epoch(model, optimizer, data_loader, device, epoch):
    model.train()
    loss_function = torch.nn.CrossEntropyLoss()
    running_loss = 0.0
    data_loader = tqdm(data_loader, file=sys.stdout)
    for step, (images, labels) in enumerate(data_loader):
        optimizer.zero_grad()
        pred = model(images.to(device))
        loss = loss_function(pred, labels.to(device))
        loss.backward()
        running_loss += loss.item()
        optimizer.step()
        # CrossEntropyLoss already averages over the batch, so report the running
        # mean over steps instead of dividing by the batch size again
        data_loader.desc = "[epoch {}] mean loss {}".format(epoch, round(running_loss / (step + 1), 3))
    return running_loss
@torch.no_grad()
def evaluate(model, data_loader, device):
    model.eval()
    sum_num = 0
    data_loader = tqdm(data_loader, file=sys.stdout)
    for images, labels in data_loader:
        pred = model(images.to(device))
        pred = torch.max(pred, dim=1)[1]
        sum_num += torch.eq(pred, labels.to(device)).sum()
    return sum_num
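Note that evaluate returns the number of correctly classified images, not a ratio; the training script below divides it by the size of the validation set to obtain the accuracy.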
5.4 train
import os
import math
import argparse
import torch
import torch.optim as optim
from torchvision import transforms
import torch.optim.lr_scheduler as lr_scheduler
from torch.utils.data import DataLoader
from model import efficientnet_b0 as create_model
from my_dataset import MyDataSet
from utils import read_split_data, train_one_epoch, evaluate
def main(args):
    device = torch.device(args.device if torch.cuda.is_available() else "cpu")
    if os.path.exists("./weights") is False:
        os.makedirs("./weights")
    train_images_path, train_images_label, val_images_path, val_images_label = read_split_data(args.data_path)
    # every variant expects its own input resolution (see the table above)
    img_size = {"B0": 224, "B1": 240, "B2": 260, "B3": 300, "B4": 380, "B5": 456, "B6": 528, "B7": 600}
    num_model = "B0"
    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(img_size[num_model]),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),
        "val": transforms.Compose([transforms.Resize(img_size[num_model]),
                                   transforms.CenterCrop(img_size[num_model]),
                                   transforms.ToTensor(),
                                   transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])}
    train_dataset = MyDataSet(images_path=train_images_path, images_class=train_images_label, transform=data_transform["train"])
    val_dataset = MyDataSet(images_path=val_images_path, images_class=val_images_label, transform=data_transform["val"])
    num_trainSet = len(train_dataset)
    num_valSet = len(val_dataset)
    batch_size = args.batch_size
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of dataloader workers
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, pin_memory=True, num_workers=nw, collate_fn=train_dataset.collate_fn)
    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, pin_memory=True, num_workers=nw, collate_fn=val_dataset.collate_fn)
    model = create_model(num_classes=args.num_classes).to(device)
    if args.weights != "":
        if os.path.exists(args.weights):
            weights_dict = torch.load(args.weights, map_location=device)
            # keep only the weights whose shapes match the current model
            # (the classifier differs when num_classes != 1000)
            load_weights_dict = {k: v for k, v in weights_dict.items() if model.state_dict()[k].numel() == v.numel()}
            print(model.load_state_dict(load_weights_dict, strict=False))
        else:
            raise FileNotFoundError("not found weights file: {}".format(args.weights))
    if args.freeze_layers:
        # freeze everything except the last conv layer and the classifier
        for name, para in model.named_parameters():
            if ("features.top" not in name) and ("classifier" not in name):
                para.requires_grad_(False)
            else:
                print("training {}".format(name))
    pg = [p for p in model.parameters() if p.requires_grad]
    optimizer = optim.SGD(pg, lr=args.lr, momentum=0.9, weight_decay=1E-4)
    # cosine learning-rate schedule decaying from lr to lr * lrf
    lf = lambda x: ((1 + math.cos(x * math.pi / args.epochs)) / 2) * (1 - args.lrf) + args.lrf
    scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)
    best_acc = 0.0
    for epoch in range(args.epochs):
        loss_all = train_one_epoch(model=model, optimizer=optimizer, data_loader=train_loader, device=device, epoch=epoch)
        scheduler.step()
        acc_all = evaluate(model=model, data_loader=val_loader, device=device)
        # loss_all is a sum of per-batch mean losses, so average it over the batches
        print("[epoch :%d],train loss:%.4f ,test accuracy: %.4f" % (epoch, loss_all / len(train_loader), acc_all / num_valSet))
        if acc_all > best_acc:
            best_acc = acc_all
            torch.save(model.state_dict(), './weights/efficientnet_b0.pth')
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--num_classes', type=int, default=5)
    parser.add_argument('--epochs', type=int, default=5)
    parser.add_argument('--batch-size', type=int, default=16)
    parser.add_argument('--lr', type=float, default=0.1)
    parser.add_argument('--lrf', type=float, default=0.01)
    parser.add_argument('--data-path', type=str, default="./data/flower")
    parser.add_argument('--weights', type=str, default='', help='initial weights path')
    parser.add_argument('--freeze-layers', type=bool, default=False)
    parser.add_argument('--device', default='cuda:0', help='device id (i.e. 0 or 0,1 or cpu)')
    opt = parser.parse_args()
    print('start training....')
    main(opt)
    print("finish training!!!")
5.5 predict
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'
import json
import torch
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt
from model import efficientnet_b0 as create_model
def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    img_size = {"B0": 224, "B1": 240, "B2": 260, "B3": 300, "B4": 380, "B5": 456, "B6": 528, "B7": 600}
    num_model = "B0"
    # validation-style preprocessing: resize, center crop, normalize
    data_transform = transforms.Compose(
        [transforms.Resize(img_size[num_model]),
         transforms.CenterCrop(img_size[num_model]),
         transforms.ToTensor(),
         transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
    img_path = "./OIP-C.jpg"
    img = Image.open(img_path)
    plt.imshow(img)
    img = data_transform(img)
    img = torch.unsqueeze(img, dim=0)  # add the batch dimension
    json_path = './class_indices.json'
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)
    with open(json_path, "r") as f:
        class_indict = json.load(f)
    model = create_model(num_classes=5).to(device)
    model_weight_path = "./weights/efficientnet_b0.pth"
    model.load_state_dict(torch.load(model_weight_path, map_location=device))
    model.eval()
    with torch.no_grad():
        output = torch.squeeze(model(img.to(device))).cpu()
        predict = torch.softmax(output, dim=0)
        predict_cla = torch.argmax(predict).numpy()
    print_res = "class: {} prob: {:.3}".format(class_indict[str(predict_cla)], predict[predict_cla].numpy())
    plt.title(print_res)
    for i in range(len(predict)):
        print("class: {:10} prob: {:.3}".format(class_indict[str(i)], predict[i].numpy()))
    plt.show()


if __name__ == '__main__':
    main()