University of Southern California

CSCI 544 — Applied Natural Language Processing

Research Project

Latest Announcements

Due dates


The research project is an in-depth activity that will be carried out in teams of four. The project can be on any aspect of natural language processing. You will formulate a research question, identify resources and tools to address the question, implement and evaluate a system that uses these resources and tools, demonstrate the system, and write up a report.


Proposal structure

The proposal describes your plan for the research project, and will serve as the skeleton for the final report. As a plan it is subject to change and does not represent a firm commitment, but it should show that you’ve thought through the relevant aspects of your research. The proposal should be a document of about 500 words, written in English in good academic style. Proposals that substantially exceed this length (above 600 words) will be penalized. The structure of the document should be as follows.

The proposal should be written after you have received some feedback about the general direction of your project.

You will receive written feedback about your proposal, which should help you with writing the final report; however, feedback on the proposal might take some time, so don’t delay collecting your data and implementing your system while waiting for comments on your proposal.

Research article presentation

Each project team has selected an original research article to present to the class, related to their research project. Time slots for the presentation are as follows (note that class starts at 3:30 PM as usual – I will use the first 10 minutes of class for announcements; also, there will be a lecture following the presentations).

March 24

March 31

April 7

Presenting your article

Code demonstration

Submit a 5-minute video which explains the work, demonstrates how the code works, and gives an idea of how you intend to proceed. Also submit a link to the code repository. The code demo is a progress check; you should have some working code to show, but it is not expected to be a final version.

To submit the video, upload it to a location on the web that the instructor can access, and submit the URL as a note on Piazza. One good option is to use Zoom with cloud recording, which should be available if you use your USC Zoom account. With Zoom, all teammates can speak and share screens on the same recording; just remember that Zoom may take several hours to process a recording, so leave time to review the final recording and retake it if necessary.

Poster presentation

Create a poster presentation of your work, including preliminary results. We will hold a poster session online, where you will present your posters to your classmates and the instructor, and view the posters of all of your classmates.

Procedure for the online poster session:

  1. The session will take place on a Discord server; the link to access the server has been published on Piazza.
  2. The server has 20 text channels and 20 voice channels: one “general” channel of each type, and one channel of each type for each team.
  3. The idea is that the channels are like the people congregating around a poster. So everyone who wants to discuss team X’s poster will be on the team X channel. People can move freely between channels. Each person can be in only one voice channel at a time.
  4. Every team should post their poster to the respective text channel by the beginning of class. You can attach a file by clicking the “+” button at the left of the message box.
  5. Each team should monitor their own channels to see if someone wants to discuss their poster. You’re not expected to be at your poster the whole time, but someone from the team should keep an eye out for new messages, answer text questions within a reasonable amount of time, and be ready to move to the voice channel if someone wants to talk.
  6. The class will start at 3:30 as usual. We will not take a scheduled break: instead we will finish 20 minutes early, at 6:30. Students may take breaks as they see fit, but please coordinate with your teammates so that your poster/channel is monitored by someone throughout the class.
  7. We will all start on the general voice channel at 3:30, and then split to viewing and presenting posters shortly thereafter.
  8. I will visit all the posters, which gives me a little less than 10 minutes for each poster, on average; some visits may be longer and some shorter. Students should visit whichever posters interest them.
  9. I understand that due to connectivity issues, time zone differences, and other difficulties, some students may not be able to attend the entire poster session. Please coordinate with your teammates to make sure the poster is presented. If you find that such issues prevent the presentation of the poster, please let me know ahead of time.

Final report

The final report describes the research you have done, reporting on the method and results, relating the research to other work in the field, and offering conclusions and directions for future work. The report should be about 2000 words long, not counting the references; reports that substantially exceed this length will be penalized. The structure is similar to the proposal, but with more detail, and two additional sections following the method section.

The six main content sections (introduction, materials, procedure, evaluation, results, and discussion) carry equal weight. Therefore, they should be of similar lengths – this means reserving about 300–350 words for each section. This is only a general guideline, as you may find that some sections require more text than others. However, if you find you have more to say than fits within the length requirement, then you’ll need to concentrate on the more important aspects of your project.

When giving examples of text in languages other than English, please use the following multi-line format, to make the examples readable to English speakers. Below is an example of how to present a sentence in Hindi.

किसने दवाई को खरीदा (the original text in its native script)
kisne davaaii ko khariidaa (a transcription into Latin script)
who-ERG medicine ACC bought (a word-by-word gloss)
‘Who bought the medicine?’ (a translation into English)

The explanations on the right (in parentheses) are part of the instructions: they do not need to be repeated with the example. The second line (transcription into Latin script) is not needed if the language natively uses a version of the Latin script.
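If you generate examples programmatically, the four-line format above can be produced with a small helper. The following is only a minimal sketch: the function name `format_gloss` is hypothetical, the hyphen in `who-ERG` is an assumed way of attaching the case label to the word, and column alignment is applied only to the Latin-script lines, since character counts do not reliably match display width in scripts such as Devanagari.

```python
def format_gloss(native, transcription, gloss, translation):
    """Build a four-line interlinear example.

    The transcription and gloss lines are padded so that each word
    starts in the same column as its gloss. The native-script line is
    passed through unchanged (assumption: no alignment is attempted
    for it).
    """
    words = transcription.split()
    glosses = gloss.split()
    if len(words) != len(glosses):
        raise ValueError("transcription and gloss must have the same number of words")

    # Each column is as wide as the longer of the word and its gloss.
    widths = [max(len(w), len(g)) for w, g in zip(words, glosses)]
    line2 = " ".join(w.ljust(n) for w, n in zip(words, widths)).rstrip()
    line3 = " ".join(g.ljust(n) for g, n in zip(glosses, widths)).rstrip()

    return "\n".join([native, line2, line3, f"‘{translation}’"])


example = format_gloss(
    "किसने दवाई को खरीदा",
    "kisne davaaii ko khariidaa",
    "who-ERG medicine ACC bought",
    "Who bought the medicine?",
)
print(example)
```

For a language that natively uses a version of the Latin script, the transcription line would simply be dropped from the output; the sketch does not handle that case.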


The grade for the assignment will be broken down as follows.

The research project counts for 30% of the overall course grade.