Vision Encoder does not scale well on batched images as input
#38 · opened by Gear12312
Hi, awesome model! While using it, I noticed that the batched image encoding (following the `batch_answer()` function) with this dataloader:
```python
import os

from PIL import Image
from torch.utils.data import DataLoader, Dataset

class ImageFolderDataset(Dataset):
    def __init__(self, folder_path):
        self.folder_path = folder_path
        self.image_files = [f for f in os.listdir(folder_path)
                            if f.lower().endswith(('.png', '.jpg', '.jpeg'))]

    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, idx):
        image_path = os.path.join(self.folder_path, self.image_files[idx])
        image = Image.open(image_path).convert('RGB')
        return image

def collate_fn(batch):
    return batch  # return the raw list of PIL images, no tensor stacking

dataset = ImageFolderDataset(folder_path)
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                        num_workers=1, collate_fn=collate_fn, drop_last=True)
```
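(Note that `collate_fn` above hands the model a plain Python list of PIL images, so nothing is stacked into a single batch. If the encoder expects a real array batch, a generic stacking collate would look like the sketch below — the 224×224 size and the NumPy-based preprocessing are placeholders of mine, not the model's actual preprocessor.)

```python
import numpy as np
from PIL import Image

def collate_stack(batch, size=(224, 224)):
    # Resize each PIL image and stack into one (B, H, W, C) float array so
    # the encoder can process the whole batch in a single forward pass.
    # 224x224 is a placeholder; the real encoder's input size may differ.
    arrays = [np.asarray(img.resize(size), dtype=np.float32) / 255.0
              for img in batch]
    return np.stack(arrays)

imgs = [Image.new("RGB", (320, 240)) for _ in range(4)]
batch = collate_stack(imgs)
print(batch.shape)  # (4, 224, 224, 3)
```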
does not seem to show any batched speedup when I run it with different batch sizes in this loop:

```python
for i, batch in enumerate(dataloader):
    model.encode_image(batch)
```
With a batch size of 1 it takes around 1 s per image, but with a batch size of 10 it takes around 10 s per 10 images. Am I doing something wrong here?
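For reference, here is a minimal, self-contained timing sketch of what flat per-image throughput looks like when an encoder loops over the batch internally. The `encode_one`/`encode_batch` functions are dummy stand-ins of mine (a 10 ms sleep per image), not the model's API; in a real measurement `model.encode_image` would take their place.

```python
import time

def encode_one(image):
    # Dummy stand-in for a per-image encoder call: fixed 10 ms of "work".
    time.sleep(0.01)
    return [0.0]

def encode_batch(images):
    # If the encoder loops internally like this, batch time grows linearly
    # with batch size -- the symptom described above.
    return [encode_one(img) for img in images]

def per_image_seconds(batch_size, n_batches=3):
    # Placeholder objects; real code would pass PIL images.
    images = [f"img{i}" for i in range(batch_size)]
    start = time.perf_counter()
    for _ in range(n_batches):
        encode_batch(images)
    elapsed = time.perf_counter() - start
    return elapsed / (n_batches * batch_size)

t1 = per_image_seconds(1)
t10 = per_image_seconds(10)
print(f"batch=1:  {t1:.4f} s/image")
print(f"batch=10: {t10:.4f} s/image")
# The two per-image numbers come out roughly equal, which matches the
# "10 s per 10 images" observation at batch size 10.
```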