#7 dexonline – Approximate search using trigrams

And finally, my last post this summer.

The project is now finished. After analysing the data from the log file and comparing the two algorithms, Levenshtein and trigram, my mentor and I decided that the winner is… both of them :) . The trigram results are, from my point of view, better than I expected. At first I was a little sceptical that the suggestions would be the ones a user would expect, but they are actually pretty close to those given by Levenshtein. This is thanks to the fact that, before a word is split into trigrams, “##” is added at its beginning and “%%” at its end. So, for example, for the word “trigram”, the vector of trigrams is ['##t', '#tr', 'tri', 'rig', 'igr', 'gra', 'ram', 'am%', 'm%%'].
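As a rough illustration of the splitting step, here is a minimal Python sketch. It is not the dexonline code (which is written in PHP), and the function name `trigrams` is made up for this example; only the “##” and “%%” padding comes from the description above.

```python
def trigrams(word: str) -> list[str]:
    """Split a word into overlapping 3-character groups after padding it."""
    # Pad the start with "##" and the end with "%%", as described in the post,
    # so that the first and last letters each appear in padded trigrams.
    padded = "##" + word + "%%"
    # Slide a window of width 3 over the padded word, one character at a time.
    return [padded[i:i + 3] for i in range(len(padded) - 2)]
```

Calling `trigrams("trigram")` reproduces the nine trigrams listed above. The padding gives the beginning and end of a word extra weight, which may be why the suggestions feel close to what a user expects.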

Of course, the results are not as precise as those given by Levenshtein, because in that case the positions of the letters on the keyboard were taken into consideration. Despite this, and the fact that both are winners, I can say that the trigram’s gold medal is shinier, because its execution time is very low. Before the start of the project the average time per search was 0.6 seconds; with Levenshtein it is 0.9 seconds, and with trigrams only 0.2 seconds.

As for more statistics, only 25% of the searches don’t return a suggestion, whereas at the beginning the percentage was 74%. Moreover, 32% of them return a single suggestion and redirect to that page (before, only 13% were redirected).

The reason I said that both algorithms are winners is that, although at first only the trigram was, we decided today to combine them: Levenshtein is used as a filter on the trigram’s results, in order to sort the suggested words according to the position of the letters. Since this was the last day of the internship and I wouldn’t have had time to finish it, my mentor decided to help me by getting the job done. Because Levenshtein is applied to a maximum of 20 words (the limit on the number of trigram suggestions), and not to the whole dictionary as it was in the first place, the execution time for this part is insignificant, and the total remains very low.

All in all, it was a great experience to work on this project this summer. I can say that the results are great and that I learnt lots of things, considering that I hadn’t worked in PHP or MySQL before.
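The two-stage idea can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the plain Levenshtein distance, not the keyboard-aware variant from the project, and the names `levenshtein`, `rerank` and `MAX_CANDIDATES` are invented for the example. Only the limit of 20 comes from the text.

```python
MAX_CANDIDATES = 20  # the trigram stage returns at most 20 suggestions


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance (insert, delete, substitute all cost 1)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def rerank(query: str, trigram_candidates: list[str]) -> list[str]:
    """Sort the short trigram candidate list by edit distance to the query."""
    shortlist = trigram_candidates[:MAX_CANDIDATES]
    return sorted(shortlist, key=lambda word: levenshtein(query, word))
```

Because the expensive distance is computed for at most 20 words instead of the whole dictionary, its cost stays small compared with the trigram lookup.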

#6 Vmchecker

This is my last post on the vmchecker gui project.

I have finished the “Add course” page.

The page contains a dynamically added JavaScript calendar for selecting the dates.

It also has dynamically added input forms for the virtual machines.

If any mandatory field is not filled in, the course will not be created and the user will see a message.

If the course already exists, the user will be asked whether they want to configure it.

I have started working on “Evaluate assignment”.

I have already created most of the scripts. I just have to join them together and make it work.

I have thought of a method to make vmchecker more secure by moving the courses out of the public_html directory and making them inaccessible through a web browser.

I have fixed some minor bugs that appeared on the menu and other pages.


#6 [WHC] Weekly Report

Did you know? OpenCL is very error-prone. A bug in your program can crash the device driver and, if you use Linux, only a restart can help.

This is the 6th and probably the last blog post. WHC IDE is almost finished and is now marked as a beta version. I’ve managed to code all the important components and they work really well. Since my last blog post I’ve added two new key features:

- Device Query

- Execution with temporary folders and running on multiple devices simultaneously.

The device query implementation raised a big problem: to add or not to add an additional OpenCL dependency to the IDE. Theoretically you can’t work with no OpenCL SDK installed, because you can’t compile and execute, but we decided that the IDE might also be used as a plain text editor. So I added a built-in OpenCL library for querying the available devices.

The available devices are used when executing on multiple devices: you have an option to select which devices you want to use.

Temporary folders were added to extend diagram flexibility and to pack execution together with copying output files to multiple data folders and to allow task-to-task interconnections.

A few tasks still remain, but they are only basics: saving files before building, tuning the settings menu, displaying device info as a tree …

I want to say that I’m really happy with the result, because WHC IDE is a fully functional IDE and it will become a powerful tool for OpenCL developers.

#6 dexonline – Approximate search using trigrams

Not much time left until the end of RSoC. Since my last post, I’ve been working on the trigram algorithm and I can say it works pretty well.

I have created a new table in the DEX database, NGram, that contains the trigrams and the id of the word in which they occur. When a user searches for a word, it is split into trigrams and, after searching in the NGram table, the words that have the most trigrams in common with the misspelt one are suggested as corrections. The code is not online at this moment because it is a little messy and I have to improve it a little bit.

I have also made some improvements to the old code, the one with the Levenshtein algorithm, and the search time I was complaining about in my last post has improved considerably.

In this last week I will finish the trigram implementation and, after a few more tests, my mentor and I will decide on a final form for the project.
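The lookup can be pictured with a small in-memory Python sketch. This is an assumption-laden stand-in, not the real implementation: a dictionary plays the role of the NGram table, word ids are just list positions, and the names `trigrams`, `build_index` and `suggest` are hypothetical.

```python
from collections import Counter, defaultdict


def trigrams(word: str) -> set[str]:
    """All distinct 3-letter groups of a word (no padding in this sketch)."""
    return {word[i:i + 3] for i in range(len(word) - 2)}


def build_index(words: list[str]) -> dict[str, set[int]]:
    """Map each trigram to the ids of the words containing it, like the NGram table."""
    index: dict[str, set[int]] = defaultdict(set)
    for word_id, word in enumerate(words):
        for gram in trigrams(word):
            index[gram].add(word_id)
    return index


def suggest(query: str, words: list[str], index: dict[str, set[int]], limit: int = 3) -> list[str]:
    """Return the words sharing the most trigrams with the (misspelt) query."""
    shared = Counter()
    for gram in trigrams(query):
        for word_id in index.get(gram, ()):
            shared[word_id] += 1
    return [words[word_id] for word_id, _ in shared.most_common(limit)]
```

For example, with the words "dictionar", "dictator" and "masina" indexed, the misspelt query "dictonar" shares four trigrams with "dictionar" and two with "dictator", so "dictionar" is suggested first.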

kreator.js checkin

I’ve reached a feature freeze with the project and implemented all the functionality that I wanted, the last piece being image upload and resize. The files are “uploaded” using the File API and the user can manipulate them in the browser, but they are not included in the zip file because of size issues, so users have to add them themselves to the /img folder included in the archive. I have started cleaning up the code and looking for bugs to fix, and I’m looking at improving the overall quality of the code I have written so far. I want to improve the interface if possible: a first-time user should be able to get going with creating a presentation as soon as they land on the page. I think a splash page is in order, since right now the first impression might be confusing for many.

You can see the project in its latest version at http://piatra.jit.su/ or http://piatra.nodejitsu.com/

I want to implement a step-by-step walkthrough for first-time users: something basic, made up of speech bubbles that move around the interface and describe its basic usage, so as to eliminate any confusion. I am also going to make another screencast of the application in use.

#6 Chat, improved communication

The last two weeks passed very quickly, but the project has taken big steps towards the end.

During these weeks, Alex merged the code into the main project and put it online on wouso-next, where real users could test my creation. At the start there were a couple of bugs in the contact box, but they were solved.

Because some users have long names, we decided to create nicknames for chat users. At first I made the user name the default nickname, but this wasn’t enough, so I created a page where every user can change their nickname and first name. For the moment there are just those two fields, but there is room for more.

Another recent addition is a set of permissions that turn a chat user into a super chat user. This means that the user has access to some new commands. For the moment three commands are available: /kick user, /ban user, /unban user. All of these commands have to be written in the global chat, and they control whether a user is kept out of the chat page for a period of time.

At the moment, I am looking at what still has to be finished and what I can change to make the chat easier to use.
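A minimal Python sketch of how such commands might be recognised is shown below. It is purely illustrative and not the WoUSO code: the function name `parse_command` and the `is_super_user` flag are assumptions, and only the three command names come from the text.

```python
from typing import Optional

# The three moderation commands mentioned in the post.
COMMANDS = {"/kick", "/ban", "/unban"}


def parse_command(message: str, is_super_user: bool) -> Optional[tuple[str, str]]:
    """Return (action, target user) for a valid moderation command, else None."""
    parts = message.split()
    # A command is exactly "/name user"; anything else is an ordinary chat message.
    if len(parts) != 2 or parts[0] not in COMMANDS:
        return None
    # Ordinary chat users are not allowed to run moderation commands.
    if not is_super_user:
        return None
    return parts[0].lstrip("/"), parts[1]
```

Messages that do not parse as a command would simply be delivered to the global chat as normal text.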

#5 dexonline – Approximate search using trigrams

Nothing special in the last 2 weeks, because I had 12 days off during which I was abroad. First, I was in Hungary for a week with my high school teachers and a group of 40 students, and the day after I arrived home, I went to Vienna for 4 more days with some friends. It was real fun: we visited lots of interesting places and took hundreds of pictures.

But back to work now :) . In this period I analysed the log of the approximate search queries, and I concluded that, although the average search takes a little longer than I expected, 54% of the searches return at least one suggested correction, which I think is pretty good, considering that before, the percentage was just 25%. Anyway, the remaining 46% are searches that have nothing to do with the dictionary, like words in other languages or multiple words. If I showed you some of them, you’d probably laugh all day.

I’ve also been doing some research on trigrams, and I agreed with my mentor on the next steps.

#5 Vmchecker

I have finally finished my user stories [0] and also created some enhanced user stories [1].

I have created the menus for the:

student – [2]

teacher – [3]

administrator – [4]

[0] https://github.com/cosmin1123/vmchecker/wiki/User-Stories

[1] https://github.com/cosmin1123/vmchecker/wiki/Ehanced-user-stories

[2] https://elf.cs.pub.ro/vmchecker-rsoc/menuStudent.htm

[3] https://elf.cs.pub.ro/vmchecker-rsoc/menuTeacher.htm

[4] https://elf.cs.pub.ro/vmchecker-rsoc/menuAdmin.htm


#5 [WHC] Weekly Report

Did you know? The blog post’s title states that this is a weekly report, but it’s actually posted every 2 weeks :D

Hello, it’s time for another major feature, and this time I’ve managed to get some task execution working based on the workflow diagram. Well, I only managed to execute a task with one data folder as input and one output folder to store the result.

The first part of this feature was to figure out the order of task execution, because all tasks are stored in random order and a task can’t be executed if it requires data from another one that hasn’t run yet. While I was thinking about this problem, I remembered that we learnt an algorithm: Topological Sort. After a few hours all tasks were sorted, and I was ready to execute them.
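To illustrate the ordering step, here is a short Python sketch of a topological sort using Kahn's algorithm. It is not the WHC IDE code: the function name `execution_order` and the dictionary layout (each task mapped to the set of tasks whose data it needs, with every task present as a key) are assumptions made for this example.

```python
from collections import deque


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Order tasks so that each one comes after all the tasks it depends on."""
    # Copy the dependency sets so the caller's data is not modified.
    remaining = {task: set(deps) for task, deps in dependencies.items()}
    # Tasks with no unmet dependencies can run immediately.
    ready = deque(task for task, deps in remaining.items() if not deps)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        # Finishing this task may unblock the tasks that were waiting for it.
        for other, deps in remaining.items():
            if task in deps:
                deps.remove(task)
                if not deps:
                    ready.append(other)
    if len(order) != len(remaining):
        raise ValueError("the diagram contains a cycle or an unknown task")
    return order
```

With a "load" task feeding "blur", and both feeding "save", the result is load, blur, save, whatever order the tasks were stored in.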

For the second part, the execution itself, I used a queue to store each task’s location and command line arguments. This approach lets me easily change the way data are stored in the queue while the execution itself remains untouched. For now, the execution queue is a plain FIFO that simply takes a task from the execution order and adds all its inputs, but it’s not suited for workflow optimization. Another algorithm is coming soon.

Also, I must define a way to store outputs when multiple arrows leave a task’s output connector.

Execution was the last and most difficult part. From now on I must fix all bugs, implement missing features, optimize execution and test the entire application :)

Cosmin, tell us what you did these 2 weeks:
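The separation between filling the queue and running it could look roughly like this in Python. Everything here is hypothetical: the task fields `path`, `inputs` and `output`, and the functions `build_queue` and `run`, are invented for the sketch; the `execute` callback stands in for whatever actually launches a task.

```python
from collections import deque
from typing import Callable


def build_queue(order: list[str], tasks: dict[str, dict]) -> deque:
    """Turn the sorted task names into queued command lines (FIFO order)."""
    queue = deque()
    for name in order:
        task = tasks[name]
        # One entry = task location followed by its command line arguments.
        queue.append([task["path"], *task["inputs"], task["output"]])
    return queue


def run(queue: deque, execute: Callable[[list[str]], None]) -> None:
    """Execute entries strictly first in, first out."""
    while queue:
        execute(queue.popleft())
```

Only `build_queue` knows how entries are laid out, so a smarter scheduling strategy could replace it without touching `run`.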


This time I worked on the deployment part for Windows. The point was that I needed to copy all static dependencies into an install folder specified through an install wizard. First I needed to know all the dependencies, which I found with Dependency Walker; after that I used a script for Inno Setup Compiler to make the installer.

I also changed the highlight class to load the data into memory, and I began to make the interface for the editor settings.

#4 Kreator – Intelligent grid system

The project reached quite a milestone recently: I’ve implemented most of the features I’d hoped for, and I’m really proud of the latest addition, an intelligent grid system. The way it works is pretty straightforward: when you select the grid button, a canvas element is added to the DOM and you draw your lines on it (the grid system only works with horizontal and vertical lines for now). The start and end points (x, y) are stored in localStorage. When you move an element, its position is checked against each line, and if the element is in range of a certain line, that line is drawn on the canvas and the element snaps to it.

I am currently working on a File API way to store uploaded images, so you can add them several times in your slides or reuse them in other presentations. The way it would work is that you drop the images or select them via a file input button, and I use File API permanent storage to store them. I would also add a nice interface for the user to access the images later, and I think it adds up to an interesting feature.

I hope to add that latest feature as soon as possible so I can get on with refactoring some of the code and fixing the bugs :D
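The snapping check can be sketched in a few lines of Python (the project itself runs in the browser, so this is only an illustration of the logic). The function name `snap`, the `snap_range` of 10 pixels and the line representation as a pair of (x, y) points are assumptions; the sketch also ignores how long each line is and treats it as infinite.

```python
Point = tuple[float, float]


def snap(position: Point, lines: list[tuple[Point, Point]], snap_range: float = 10) -> Point:
    """Move a position onto any horizontal or vertical grid line within range."""
    x, y = position
    for (x1, y1), (x2, y2) in lines:
        if x1 == x2 and abs(x - x1) <= snap_range:
            x = x1  # vertical line: align the x coordinate
        elif y1 == y2 and abs(y - y1) <= snap_range:
            y = y1  # horizontal line: align the y coordinate
    return x, y
```

An element dragged to (103, 197) near a vertical line at x = 100 and a horizontal line at y = 200 would land on (100, 200), while an element far from every line keeps its position.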