Meet REFinD - Relation Extraction Financial Dataset

REFinD (pronounced "refined") is the largest relation-extraction dataset in the financial domain, with 29K instances.

The REFinD Challenge


The competition is now CLOSED. It was hosted on CodaLab.


Why REFinD?



HIGH-QUALITY

Verified by financial experts.



LARGE-SCALE

29k instances and 22 relations amongst 8 types of entity pairs



FINANCE-DOMAIN

Reasoning over financial reports.


Explore


The REFinD dataset is the first domain-specific financial relation-extraction dataset. It is built from the raw text of 10-X reports (10-K, 10-Q, etc.) of publicly traded companies, obtained from the US Securities and Exchange Commission (SEC) website. An example instance from the dataset is shown below:
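To make the structure of a relation-extraction instance concrete, here is a minimal sketch of how one example could be represented in Python. It assumes a TACRED-style layout (a token list, inclusive start/end token indices for the two entities, their entity types, and a relation label); the class name `REInstance`, its field names, the sentence, the entity types, and the label are all hypothetical and are not taken from the released files.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class REInstance:
    """One relation-extraction example in an assumed TACRED-style layout (hypothetical)."""
    tokens: List[str]
    e1_start: int  # inclusive token index where the first (head) entity begins
    e1_end: int    # inclusive token index where the head entity ends
    e2_start: int  # inclusive token index where the second (tail) entity begins
    e2_end: int    # inclusive token index where the tail entity ends
    e1_type: str
    e2_type: str
    relation: str

    def entity_text(self, start: int, end: int) -> str:
        """Join the tokens of an inclusive [start, end] span."""
        return " ".join(self.tokens[start:end + 1])

    def marked_sentence(self) -> str:
        """Wrap the head entity in [E1]...[/E1] and the tail in [E2]...[/E2]."""
        out: List[str] = []
        for i, tok in enumerate(self.tokens):
            if i == self.e1_start:
                out.append("[E1]")
            if i == self.e2_start:
                out.append("[E2]")
            out.append(tok)
            if i == self.e1_end:
                out.append("[/E1]")
            if i == self.e2_end:
                out.append("[/E2]")
        return " ".join(out)


# Invented sentence and hypothetical label, for illustration only.
example = REInstance(
    tokens="Acme Corp was incorporated in Delaware in 1998 .".split(),
    e1_start=0, e1_end=1,
    e2_start=7, e2_end=7,
    e1_type="ORG", e2_type="DATE",
    relation="org:date:formed_on",
)
print(example.entity_text(example.e1_start, example.e1_end))  # Acme Corp
print(example.marked_sentence())
# [E1] Acme Corp [/E1] was incorporated in Delaware in [E2] 1998 [/E2] .
```

Entity markers like these are one common way to feed an instance to a sentence-level relation classifier; whether and how they are used is a modeling choice, not part of the dataset itself.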

Statistics


REFinD is the largest annotated relation dataset in the financial domain, with 29K instances and 22 relations among 8 types of entity pairs, built entirely from financial documents. The figure below shows the 22 relations between the 8 entity types. Although the dataset is built on financial reports and highlights challenges specific to the financial domain, it can also be useful in related domains such as legal, risk modeling, and econometrics.
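Because each relation is defined over a pair of entity types, one way to check statistics like these on the downloaded data is to group the relation labels by their (head type, tail type) pair. The sketch below assumes each instance is a dict with hypothetical `e1_type`, `e2_type`, and `relation` keys; the toy records and labels are invented, and the real files may use different field names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set, Tuple


def relations_by_entity_pair(
    instances: Iterable[Mapping[str, str]],
) -> Dict[Tuple[str, str], Set[str]]:
    """Group the distinct relation labels by (head entity type, tail entity type)."""
    groups: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for inst in instances:
        groups[(inst["e1_type"], inst["e2_type"])].add(inst["relation"])
    return dict(groups)


# Invented toy records; real field names and labels may differ.
toy = [
    {"e1_type": "ORG", "e2_type": "DATE", "relation": "org:date:formed_on"},
    {"e1_type": "ORG", "e2_type": "GPE", "relation": "org:gpe:operations_in"},
    {"e1_type": "ORG", "e2_type": "DATE", "relation": "no_relation"},
]
groups = relations_by_entity_pair(toy)
print(len(groups))                         # 2 entity-pair types
print(len(set().union(*groups.values())))  # 3 distinct labels
print(sorted(groups[("ORG", "DATE")]))     # ['no_relation', 'org:date:formed_on']
```

On the real training data, whether the counts match the reported 22 relations may depend on how a negative (no-relation) label, if present, is counted.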


The zoomed-in statistics for the relation types unique to REFinD are shown below.


REFinD contains much longer sentences than TACRED: the average sentence length is 53.7 in REFinD versus 36.2 in TACRED. The distribution of sentence lengths in the two datasets is compared below.
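The two averages imply that REFinD sentences are about 53.7 / 36.2 ≈ 1.48 times as long as TACRED's, i.e. roughly 48% longer. A minimal sketch for reproducing this kind of statistic on tokenized data follows, assuming one sentence per instance and length counted in tokens; the function names and toy inputs are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def average_sentence_length(token_lists: Sequence[List[str]]) -> float:
    """Mean number of tokens per instance (assumes one sentence per instance)."""
    if not token_lists:
        raise ValueError("need at least one instance")
    return sum(len(tokens) for tokens in token_lists) / len(token_lists)


def length_histogram(token_lists: Sequence[List[str]], bin_width: int = 10) -> Dict[int, int]:
    """Count instances per length bucket; key k covers lengths k .. k + bin_width - 1."""
    counts = Counter((len(tokens) // bin_width) * bin_width for tokens in token_lists)
    return dict(sorted(counts.items()))


# Invented tokenized sentences of lengths 12, 35 and 58.
toy = [["w"] * 12, ["w"] * 35, ["w"] * 58]
print(average_sentence_length(toy))  # 35.0
print(length_histogram(toy))         # {10: 1, 30: 1, 50: 1}

# Ratio of the reported averages: REFinD (53.7) vs. TACRED (36.2).
print(round(53.7 / 36.2, 2))         # 1.48
```

Fixed-width buckets keep the histogram comparable across datasets; plotting two such histograms side by side gives a distribution comparison of the kind described above.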


Download (Train/Test Data, Code)


The competition dataset can be downloaded from CodaLab.
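After downloading, a split can be loaded with the standard library. This is a minimal sketch that assumes each split is a single JSON file containing an array of instance objects; the function `load_split` and the file name `train.json` are hypothetical, so adjust both to the files actually distributed.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def load_split(path: Path) -> List[Dict[str, Any]]:
    """Load one split, assuming the file is a JSON array of instance objects."""
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"expected a JSON array in {path}, got {type(data).__name__}")
    return data


# Demo on a throwaway file; "train.json" is a hypothetical name.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "train.json"
    demo_path.write_text(json.dumps([{"relation": "no_relation"}]), encoding="utf-8")
    records = load_split(demo_path)

print(len(records))  # 1
```

The type check fails early with a readable message if a file turns out to be in a different layout (for example, one JSON object per line), rather than failing later during processing.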

Paper


To read the dataset paper, please follow this link.

The paper has been accepted at the SIGIR conference. Please cite it in the following reference format:

        Simerjot Kaur∗, Charese Smiley∗, Akshat Gupta, Joy Sain, Dongsheng Wang, Suchetha Siddagangappa, Toyin Aguda, and Sameena Shah. 2023.
        REFinD: Relation Extraction Financial Dataset.
        In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
        (SIGIR ’23), July 23–27, 2023, Taipei, Taiwan. ACM, New York, NY, USA.

Contact



Have any questions or suggestions? Feel free to contact us!