SUFEHeisenberg
committed on
Commit
•
e4079f7
1
Parent(s):
9bb9651
Update README.md
README.md
CHANGED
@@ -1,3 +1,43 @@
 ---
 license: apache-2.0
+datasets:
+- financial_phrasebank
+- pauri32/fiqa-2018
+- zeroshot/twitter-financial-news-sentiment
+language:
+- en
+metrics:
+- accuracy
+pipeline_tag: text-classification
+tags:
+- finance
 ---
+
+
+We collected financial domain terms from Investopedia's Financial terms dictionary, NYSSCPA's accounting terminology guide,
+and Harvey's Hypertextual Finance Glossary to expand RoBERTa's vocabulary.
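The vocabulary expansion described above might look roughly like the following sketch. It assumes the Hugging Face `transformers` library and the `roberta-base` checkpoint, neither of which is confirmed by this README; the helper names `select_new_terms` and `expand_roberta_vocab` are hypothetical. The filter uses exact string matching against the tokenizer vocabulary, so it is only a rough check for RoBERTa's byte-level BPE tokens.

```python
from typing import Iterable, List, Set


def select_new_terms(candidates: Iterable[str], existing_vocab: Set[str]) -> List[str]:
    """Return cleaned, de-duplicated terms that are not already in the vocabulary."""
    seen: Set[str] = set()
    new_terms: List[str] = []
    for term in candidates:
        term = term.strip()
        # Skip empty strings, terms already known, and repeats within the input.
        if not term or term in existing_vocab or term in seen:
            continue
        seen.add(term)
        new_terms.append(term)
    return new_terms


def expand_roberta_vocab(terms: Iterable[str], model_name: str = "roberta-base"):
    """Add terms to the tokenizer and resize the model's embedding matrix to match.

    Assumption: the checkpoint name and the use of transformers are illustrative.
    """
    # Imported lazily so select_new_terms can be used without transformers installed.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)

    new_terms = select_new_terms(terms, set(tokenizer.get_vocab()))
    num_added = tokenizer.add_tokens(new_terms)
    # The new embedding rows are freshly initialized, so they only become
    # useful after further pretraining on in-domain text.
    model.resize_token_embeddings(len(tokenizer))
    return tokenizer, model, num_added
```

In practice the `terms` argument would be the list scraped from the three glossaries; any sample terms passed in while experimenting are placeholders, not the actual list used for this model.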
+
+Starting from this RoBERTa with added financial terms, we continued pretraining the model on multiple financial corpora:
+
+- Financial Terms
+  - [Investopedia's Financial terms dictionary](https://www.investopedia.com/financial-term-dictionary-4769738)
+  - [NYSSCPA's accounting terminology guide](https://www.nysscpa.org/professional-resources/accounting-terminology-guide)
+  - [Harvey's Hypertextual Finance Glossary](https://people.duke.edu/~charvey/Classes/wpg/glossary.htm)
+- Financial Datasets
+  - [FPB](https://huggingface.co/datasets/financial_phrasebank)
+  - [FiQA SA](https://huggingface.co/datasets/pauri32/fiqa-2018)
+  - [SemEval2017 Task5](https://aclanthology.org/S17-2089/)
+  - [Twitter Financial News Sentiment](https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment)
+- Earnings Call
+  - Earnings call transcripts of NASDAQ 100 component stocks, 2016-2023.
+
+
+In the continual pretraining step, we applied the following experimental settings to achieve better fine-tuned results on the four financial datasets:
+
+1. Masking probability: 0.4 (instead of the default 0.15)
+2. Warmup steps: 0 (this gave better results than runs with warmup steps)
+3. Epochs: 1 (one epoch is enough; training longer risks overfitting)
+4. weight_decay: 0.01
+5. Train batch size: 64
+6. FP16
+
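As a rough illustration, the settings listed above could be expressed with Hugging Face `transformers` as in the sketch below. The README does not say which training tooling was used, so the library, the keyword names, the `output_dir` value, and the helper `build_trainer_inputs` are all assumptions. It is also an assumption that the batch size of 64 is per device rather than a global total.

```python
from typing import Any, Dict

# Values come from the settings list; the key names follow Hugging Face
# conventions and are an assumption about the tooling actually used.
MLM_PROBABILITY = 0.4  # share of tokens selected for masking (library default is 0.15)

TRAINING_KWARGS: Dict[str, Any] = {
    "num_train_epochs": 1,
    "warmup_steps": 0,
    "weight_decay": 0.01,
    "per_device_train_batch_size": 64,  # assumption: 64 is the per-device size
    "fp16": True,  # requires suitable GPU hardware
}


def build_trainer_inputs(tokenizer, output_dir: str = "continual-pretrain-out"):
    """Create the MLM data collator and training arguments.

    Needs transformers installed; output_dir is a hypothetical placeholder.
    """
    # Imported lazily so the constants above can be inspected without transformers.
    from transformers import DataCollatorForLanguageModeling, TrainingArguments

    collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=True, mlm_probability=MLM_PROBABILITY
    )
    args = TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
    return collator, args
```

The returned collator and arguments would then be passed to a `Trainer` together with the model and the tokenized pretraining corpus.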