Unable to reproduce the same eval and test results


Dataset used: https://huggingface.co/datasets/conll2003

Evaluation metric: `load_metric("seqeval")`
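
For context, this is roughly how the dataset, metric, and label list are set up on my side (a minimal sketch; `raw_datasets` is just the name I use here, while `metric` and `label_names` are the names referenced in the code below):

```python
from datasets import load_dataset, load_metric

# Load the CoNLL-2003 dataset and the seqeval metric
raw_datasets = load_dataset("conll2003")
metric = load_metric("seqeval")

# The NER label names come from the dataset features
label_names = raw_datasets["train"].features["ner_tags"].feature.names
# ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC']
```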

**Results obtained:**

```
{'eval_loss': 2.3160810470581055,
 'eval_precision': 0.6153949670300094,
 'eval_recall': 0.7696061932009425,
 'eval_f1': 0.6839153518283106,
 'eval_accuracy': 0.9621769588508859,
 'eval_runtime': 556.8392,
 'eval_samples_per_second': 5.838,
 'eval_steps_per_second': 0.731}
```

NER label alignment code (from https://huggingface.co/course/chapter7/2):

```python
def align_labels_with_tokens(labels, word_ids):
    new_labels = []
    current_word = None
    for word_id in word_ids:
        if word_id != current_word:
            # Start of a new word!
            current_word = word_id
            label = -100 if word_id is None else labels[word_id]
            new_labels.append(label)
        elif word_id is None:
            # Special token
            new_labels.append(-100)
        else:
            # Same word as previous token
            label = labels[word_id]
            # If the label is B-XXX we change it to I-XXX
            if label % 2 == 1:
                label += 1
            new_labels.append(label)

    return new_labels
```
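
For completeness, the alignment function is applied during preprocessing via the tokenizer's `word_ids()`, following the same course chapter. A minimal sketch (the `bert-base-cased` checkpoint is only an example and may differ from the model being evaluated):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")  # example checkpoint

def tokenize_and_align_labels(examples):
    tokenized_inputs = tokenizer(
        examples["tokens"], truncation=True, is_split_into_words=True
    )
    all_labels = examples["ner_tags"]
    new_labels = []
    for i, labels in enumerate(all_labels):
        word_ids = tokenized_inputs.word_ids(i)  # maps each token to its word
        new_labels.append(align_labels_with_tokens(labels, word_ids))
    tokenized_inputs["labels"] = new_labels
    return tokenized_inputs

tokenized_datasets = raw_datasets.map(
    tokenize_and_align_labels,
    batched=True,
    remove_columns=raw_datasets["train"].column_names,
)
```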

**Compute metrics:**

```python
import numpy as np


def compute_metrics(eval_preds):
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)

    # References: dataset label ids mapped through label_names
    true_labels = [
        [label_names[l] for l in label if l != -100] for label in labels
    ]
    # Predictions: predicted ids mapped through the model's id2labels mapping
    true_predictions = [
        [id2labels[str(p)] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]
    all_metrics = metric.compute(predictions=true_predictions, references=true_labels)
    return {
        "precision": all_metrics["overall_precision"],
        "recall": all_metrics["overall_recall"],
        "f1": all_metrics["overall_f1"],
        "accuracy": all_metrics["overall_accuracy"],
    }
```

Note: I am using `id2labels` taken from your model's config here. Please comment on whether this is correct.
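
To make that concrete, this is the kind of check I have in mind (the model id is a placeholder): the references are built from the dataset's `label_names`, while the predictions go through the model config's `id2label`, so the two orderings have to agree.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("your-org/your-ner-model")  # placeholder model id
id2labels = {str(i): name for i, name in config.id2label.items()}  # as used in compute_metrics

# References use the dataset ordering, predictions use the model config ordering.
# If these two lists disagree, precision/recall/F1 will be off even when the
# raw predictions are correct.
print(label_names)       # from the conll2003 ner_tags feature
print(config.id2label)   # from the evaluated model's config
```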

Was there any update on this? Did you manage to reproduce the results?
