Dec 15th, 2020
This fall I took a class entirely led by students called CS196-Computer Science Honors here at University of Illinois at Urbana Champaign. This class is unique from other classes because it teaches skills that developers use in real life such as bash, git as well as specialized programming languages like Rust or in the past years, Haskell. However, the most unique component of this class is the project groups. Each group has students of different experience levels, backgrounds and skills. Students just like us lead these project groups and guide them into producing a Minimal Viable Product (MVP) and creating a pitch for this product by the end of the semester. My group and I created a Lecture Summarizing model called LCTRS (github url: https://github.com/CS196Illinois/Group6)
The MVP for this project was to create a website that summarizes lecture transcripts into useful text using Natural Language Processing (NLP) . We also wanted to avoid the main issues of summarization which include missing key points, or overly verbose summaries.
For pretraining, we used a variety of datasets like gigaword, reddit_tifu, aeslc, newsroom, arixiv and cnn-dailymail. However, most of these datasets don’t use academic language or are not conversational in tone like lecture transcripts tend to be. Also the length of the summary compared to the length of the transcript is very small, and typically you learn more than a couple of sentences worth of information from an hour lecture. So we created a custom dataset of lectures from all of the classes we were taking at the time. We had introductory math, science, english and social sciences classes represented in the sample.
My partner and I used Rouge Metrics to compare the language model’s automatically generated summary to the predefined summaries from our dataset. Rouge metrics compare how many words of the user generated summary match with the computer generated summary to quantify the model’s precision and recall. We compared the Rouge scores of DistillBERT, BERT, and T5 and we settled on T5 with it’s Rouge score of about 25. We chose this model since it was space saving and took only a small amount of fine tuning to perform well. After fine-tuning the model on our custom dataset, we were able to improve this score to ~30.
We used flask to render our web application. You can login using google O-auth and add summaries to your account for future viewing. In the dialog, you can put your summary in and the percentage you want to summarize to and the model will output the summary onto the page.
Prior to this project, I had absolutely no experience in NLP or models, however I’m really thankful for this experience and my PM who really helped guide our team towards creating an MVP that we can all be proud of.
Link to our final project presentation: https://drive.google.com/file/d/13qRtjcR-cRBddweiGvVDA5glAL3bco3r/view
<- back to Writing