#1 DexOnline – Romanian Literature Crawler


My name is Alin Ungureanu, and this summer I’m coding for DexOnline. I was assigned the project “Romanian Literature Crawler”, which has the following objectives:

  • Find words that DEX online doesn’t know but that occur frequently on the Internet. Write a script to extract usage examples for these words and pass them on to a team of linguists, so that they can write definitions for them.
  • Show usage examples along with our definitions. Offer an interface where admins can select the most relevant examples.
  • Compute statistics on diacritics. For example, determine that in the context “abcdSefgh”, S occurs with 90% probability and Ș with 10% probability. These statistics can be used to insert diacritics into text written without them.
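The diacritics statistics from the third objective could be gathered roughly as follows. This is a minimal sketch, not the project’s actual design: it assumes NFC-normalized, lowercase text that uses the comma-below letters ș and ț, and a fixed window of characters on each side of the ambiguous letter (four, as in “abcdSefgh”). The names `VARIANTS`, `build_stats` and `probabilities` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from each diacritic-free letter to the forms it may
# stand for in Romanian text (lowercase, comma-below ș/ț, for brevity).
VARIANTS = {
    "a": ("a", "ă", "â"),
    "i": ("i", "î"),
    "s": ("s", "ș"),
    "t": ("t", "ț"),
}
# Reverse map: any form -> its diacritic-free base letter.
BASE = {form: base for base, forms in VARIANTS.items() for form in forms}


def strip_diacritics(text):
    """Replace every diacritic letter with its base letter (length is preserved)."""
    return "".join(BASE.get(ch, ch) for ch in text)


def build_stats(corpus, radius=4):
    """Count which form appears at each ambiguous position, keyed by context.

    The key is (left context, base letter, right context). The context is
    stripped of diacritics, as it would be in the text we later want to fix.
    """
    stats = defaultdict(Counter)
    plain = strip_diacritics(corpus)
    for i, ch in enumerate(corpus):
        base = BASE.get(ch)
        if base is None:
            continue  # not a letter that can carry a diacritic
        left = plain[max(0, i - radius):i]
        right = plain[i + 1:i + 1 + radius]
        stats[(left, base, right)][ch] += 1
    return stats


def probabilities(stats, left, base, right):
    """Return {form: probability} for a base letter in the given context."""
    counts = stats.get((left, base, right))
    if not counts:
        return {}  # context never seen in the corpus
    total = sum(counts.values())
    return {form: n / total for form, n in counts.items()}


# Tiny usage example with a one-character window on each side.
stats = build_stats("aș as aș", radius=1)
print(probabilities(stats, "a", "s", " "))  # {'ș': 0.5, 's': 0.5}
```

To insert diacritics, one would pick the most probable form for each ambiguous letter. Contexts never seen in the corpus return an empty result here; a real implementation would probably back off to a shorter window in that case.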

A week has passed and we still haven’t agreed on the design document: new ideas keep flowing, and some of them are big enough to be standalone internship projects. The document must be finalized by late next week, because the clock is ticking and I am eager to write some code :).