Welcome to Nobias’ abstractive summarization challenge, in collaboration with the Finance Club, IIT Bombay. Nobias builds machine learning models that automatically tag financial news articles as either bullish or bearish. The first step in labeling an article is understanding what it is about. You will create a machine-learning model to do exactly that!
Your machine learning model should output a short (3-5 sentence) abstractive summary of a given news article. Your model should then use the summaries to rank each stock mentioned in the articles (the stock tickers are available in both the training and test data; no need to extract them yourself). To rank the stocks, give each one a rating from 1 to 5 using the following scale:
- 1 - Very bearish
- 2 - Slightly bearish
- 3 - Neutral
- 4 - Slightly bullish
- 5 - Very bullish
In NLP, there are two types of summarization tasks. Extractive summarization selects important phrases from the original source to create a concise summary of it. Abstractive summarization does not simply copy important phrases from the source text but may also generate new phrases that are relevant. This technique entails identifying key pieces, interpreting the context, and re-creating them in a new way. Because it requires both extracting relevant information from a document and automatically generating coherent text, abstractive summarization is considered a harder problem than extractive summarization.
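To make the contrast concrete, here is a minimal frequency-based extractive summarizer (the scoring heuristic is purely illustrative, not part of the challenge). It can only copy sentences out of the source; an abstractive model would instead generate new sentences of its own.

```python
import re
from collections import Counter

def extractive_summary(text: str, num_sentences: int = 2) -> str:
    """Pick the highest-scoring sentences, scoring each sentence by the
    total corpus-wide frequency of its words, normalised by length."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    scored = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"[a-z']+", s.lower()))
        / max(len(s.split()), 1),
        reverse=True,
    )
    chosen = set(scored[:num_sentences])
    # Re-emit the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in chosen)
```

Note how the output is always a verbatim subset of the input sentences; that limitation is exactly what the abstractive setting removes.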
The dataset
We constructed a dataset of 100k news articles about various US-listed stocks from Nobias’ database. Each data point contains the raw content of the news article, a one-sentence summary of the article, and the ticker of the stock we want to focus on for that article.
The dataset is a single JSON file containing an array of all articles. Each object in the array has five properties: “id”, “date”, “article”, “stock”, and “summary”.
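Loading the file is straightforward with the standard library. The record below is a hypothetical example built from the five documented properties, and the filename `data.json` is an assumption:

```python
import json

# Hypothetical record matching the documented schema.
sample = """[
  {"id": 1, "date": "2022-01-03", "article": "Full article text ...",
   "stock": "aapl", "summary": "Apple shares climbed on strong iPhone sales."}
]"""

articles = json.loads(sample)
# With the real file:  articles = json.load(open("data.json"))
for item in articles:
    print(item["stock"], "->", item["summary"])
```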
Requirements
Your models will be evaluated on a similarly constructed dataset of 20k news articles. We will use the ROUGE-L metric to compare your summaries against the human-written summaries. More specifically, we will use Google’s open-source implementation of this metric.
As a baseline, I’ve trained an off-the-shelf model on the same dataset, and evaluated it using the same test dataset we will use for your models. These are the scores from my model:
{'rouge1': 0.6646918466051159, 'rouge2': 0.5104217742791, 'rougeL': 0.5992350411986138, 'rougeLsum': 0.599405937015661}
Treat this baseline as a sanity-check for your own models, but beating it does not guarantee you will win this competition.
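ROUGE-L is based on the longest common subsequence (LCS) of tokens shared by a candidate and a reference summary. The sketch below computes the ROUGE-L F1 score from scratch; the whitespace tokenizer is a simplification of what Google's rouge-score package actually does, so treat it as an approximation of the official metric:

```python
def lcs_length(a, b):
    """Classic dynamic-programming longest-common-subsequence length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1 on lowercased whitespace tokens (simplified tokenizer)."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

For leaderboard-comparable numbers, use the official script linked below rather than this sketch.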
Similarly, for the stock rankings, we will evaluate the performance of your picks over a 6-month period. We will allocate funds to each ranking as follows:
- 1 - Short with $50
- 2 - Short with $20
- 3 - Do nothing
- 4 - Long with $20
- 5 - Long with $50
The portfolio with the highest balance at the end of the evaluation period (we will disclose this after submissions are over) will win this metric.
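The allocation rule above can be sketched as a simple mapping from rank to a signed dollar amount (negative for shorts). The P&L function and the example returns here are illustrative assumptions, not the official evaluation code:

```python
# Dollar allocation per rank, per the rules above (negative = short).
ALLOCATION = {1: -50, 2: -20, 3: 0, 4: 20, 5: 50}

def portfolio_pnl(rankings, returns):
    """Hypothetical P&L: `rankings` maps ticker -> rank (1-5), `returns`
    maps ticker -> fractional 6-month return (0.10 means +10%).
    A short position profits when the return is negative."""
    return sum(ALLOCATION[rank] * returns.get(ticker, 0.0)
               for ticker, rank in rankings.items())

# 50 * 0.10 + (-50) * (-0.05) = 7.5 dollars of profit.
pnl = portfolio_pnl({"meta": 5, "brk.b": 1}, {"meta": 0.10, "brk.b": -0.05})
```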
We will use a combination of these metrics to determine a winner, but will weight the portfolio balance more heavily than the ROUGE scores.
Submission
Along with your model, you must submit the predictions it makes. We will give you our test dataset a day before the competition ends. You must submit:
- Your model’s architecture as a Python file (model.py)
- Your model’s checkpoints/weights in whatever format your preferred ML framework uses (e.g., .ckpt in PyTorch)
- Your model’s predictions as a JSON file (predictions.json), in the following format:
{
  "predictions": [
    { "id": 1, "summary": "this is a sample summary." },
    { "id": 2, "summary": "this is another sample summary." }
  ],
  "rankings": [
    { "ticker": "brk.b", "rank": 5 },
    { "ticker": "meta", "rank": 3 }
  ]
}
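Before submitting, it is worth sanity-checking that predictions.json matches this structure. The checker below is a suggestion, not an official validator; the field names come from the format above, while the type and range rules are assumptions:

```python
import json

def validate_submission(sub: dict) -> None:
    """Sanity-check a parsed predictions.json against the documented format."""
    assert set(sub) == {"predictions", "rankings"}, "unexpected top-level keys"
    for p in sub["predictions"]:
        assert isinstance(p["id"], int) and isinstance(p["summary"], str)
    for r in sub["rankings"]:
        assert isinstance(r["ticker"], str) and r["rank"] in {1, 2, 3, 4, 5}

# Usage:  validate_submission(json.load(open("predictions.json")))
```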
You can use this script to generate a ROUGE score for yourself. We will use this exact script to evaluate all models in this competition.
Prizes
Winner
Judges
Tania Ahuja
Nobias
Ananth
Nobias
Judging Criteria
- Submission: the standard of the submission
Questions? Email the hackathon manager
