#7 dexonline – Approximate search using trigrams

And finally, my last post this summer.

The project is now finished. After analysing the data from the log file and comparing the two algorithms, Levenshtein and trigram, my mentor and I decided that the winner is… both of them :). The trigram results are, from my point of view, better than I expected. Even though at first I was a little sceptical that the suggestions would be the ones a user would expect, they are actually pretty close to those given by Levenshtein. This is thanks to the fact that before a word is split into trigrams, “##” and “%%” are added at its beginning and end. So, for example, for the word “trigram”, the vector of trigrams is ['##t', '#tr', 'tri', 'rig', 'igr', 'gra', 'ram', 'am%', 'm%%'].
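The padding scheme described above can be sketched in a few lines of Python. This is only an illustration, not the project's PHP code; the function name `trigrams` and its parameters are hypothetical.

```python
def trigrams(word, start_pad="##", end_pad="%%"):
    """Split a word into overlapping trigrams after padding both ends.

    Hypothetical helper: the padding markers come from the post,
    everything else is an assumption made for illustration.
    """
    padded = f"{start_pad}{word}{end_pad}"
    # Slide a window of width 3 over the padded string.
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


print(trigrams("trigram"))
# ['##t', '#tr', 'tri', 'rig', 'igr', 'gra', 'ram', 'am%', 'm%%']
```

Because of the padding, the first and last letters of a word each appear in three trigrams, so a typo at either end of the word costs as many matches as a typo in the middle.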

Of course, the results are not as precise as those given by Levenshtein, because in that case the positions of the letters on the keyboard were taken into consideration. Despite this, and even though both are winners, I can say that the trigram’s gold medal shines brighter, because its execution time is very low. Before the start of the project the average time per search was 0.6 seconds; with Levenshtein it is 0.9 seconds, and with trigrams only 0.2 seconds.

As for more statistics, only 25% of the searches don’t return a suggestion, whereas at the beginning the percentage was 74%. Moreover, 32% of them return a single suggestion and redirect to that page (before, only 13% were redirected).

The reason I said that both algorithms are winners is that, although at first only the trigram was, we decided today to combine them by using Levenshtein as a filter on the trigram results, in order to sort the suggested words according to the position of the letters. Since this was the last day of the internship and I wouldn’t have had time to finish it, my mentor decided to help me by getting the job done. Because Levenshtein is applied to at most 20 words (the limit on the number of trigram suggestions), and not to the whole dictionary as it was in the first place, the execution time for this part is insignificant, and the total still remains very low.
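A rough Python sketch of this two-stage idea follows. It assumes plain edit distance with unit costs, whereas the project's Levenshtein variant also weighs keyboard positions, which is not reproduced here; the names `levenshtein`, `rerank` and `MAX_CANDIDATES` are hypothetical.

```python
def levenshtein(a, b):
    """Plain edit distance: insert, delete and substitute all cost 1.

    Assumption: the keyboard-aware weighting used in the project is
    left out of this sketch.
    """
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


MAX_CANDIDATES = 20  # cap on trigram suggestions mentioned in the post


def rerank(query, trigram_candidates):
    """Sort the (at most 20) trigram candidates by edit distance to the query."""
    shortlist = trigram_candidates[:MAX_CANDIDATES]
    # sorted() is stable, so candidates at equal distance keep their trigram order.
    return sorted(shortlist, key=lambda word: levenshtein(query, word))
```

The expensive distance computation runs on a short list only, which is why the extra step adds so little to the total search time.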

All in all, it was a great experience to work on this project this summer. I can say that the results are great and that I learnt lots of things, considering that I hadn’t worked with PHP or MySQL before.

#6 Vmchecker

This is my last post on the vmchecker gui project.

I have finished the “Add course” page.

The page contains a dynamically added JavaScript calendar for selecting the dates.

It also has dynamically added input forms for the virtual machines.

If not all the mandatory fields are filled in, the course will not be created and the user will see a message.

If the course already exists, the user will be asked whether they want to configure it.

I have started working on “Evaluate assignment”.

I have already created most of the scripts. I just have to join them together and make it work.

I have thought of a method to make vmchecker more secure: moving the courses out of the public_html directory so that they are inaccessible through a web browser.

I have fixed some minor bugs that appeared on the menu and other pages.


#6 [WHC] Weekly Report

Did you know? OpenCL is very error-prone. A bug in your program can crash the device driver, and if you use Linux, only a restart can help.

This is the 6th and probably the last blog post. WHC IDE is almost finished and is now marked as a beta version. I’ve managed to code all the important components and they work really well. Since my last blog post I’ve added two new key features:

- Device Query

- Execution with temporary folders and running on multiple devices simultaneously.

The device query implementation raised a big question: to add or not to add an additional OpenCL dependency to the IDE. In theory you can’t work without an OpenCL SDK installed, because you can’t compile and execute, but we decided that the IDE might also be used just as a text editor. So I added a built-in OpenCL library for querying the available devices.

The available devices are used when executing on multiple devices: you have an option to select which devices you want to use.

Temporary folders were added to extend diagram flexibility and to pack execution together with copying output files to multiple data folders and to allow task-to-task interconnections.

A few tasks still remain, but they are only basic ones: saving files before building, tuning the settings menu, displaying device info as a tree…

I want to say that I’m really happy with the result, because WHC IDE is a fully functional IDE and it will become a powerful tool for OpenCL developers.

#6 dexonline – Approximate search using trigrams

Not much time left until the end of RSoC. Since my last post, I’ve been working on the trigram algorithm and I can say it works pretty well.

I have created a new table in the DEX database, NGram, that contains the trigrams and the id of the word in which they occur. When a user searches for a word, it is split into trigrams and, after a lookup in the NGram table, the words that have the most trigrams in common with the misspelt one are suggested as corrections. The code is not online at the moment because it is a little messy and I have to clean it up a bit.
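The lookup described above can be sketched with an in-memory SQLite database standing in for MySQL. Only the table name NGram comes from the post; the `Word` table, the column names, the padding inside `trigrams`, and the helpers `build_index` and `suggest` are assumptions made for illustration.

```python
import sqlite3


def trigrams(word):
    """Pad the word and split it into overlapping trigrams (assumed scheme)."""
    padded = f"##{word}%%"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def build_index(words):
    """Create hypothetical Word and NGram tables and fill them."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Word (id INTEGER PRIMARY KEY, form TEXT)")
    db.execute("CREATE TABLE NGram (ngram TEXT, wordId INTEGER)")
    db.execute("CREATE INDEX idx_ngram ON NGram (ngram)")
    for form in words:
        word_id = db.execute(
            "INSERT INTO Word (form) VALUES (?)", (form,)
        ).lastrowid
        # Store each distinct trigram once per word (a simplification).
        db.executemany(
            "INSERT INTO NGram VALUES (?, ?)",
            [(gram, word_id) for gram in set(trigrams(form))],
        )
    return db


def suggest(db, query, limit=20):
    """Return the words sharing the most trigrams with the query."""
    grams = sorted(set(trigrams(query)))
    placeholders = ",".join("?" * len(grams))
    rows = db.execute(
        "SELECT w.form, COUNT(*) AS shared FROM NGram n "
        "JOIN Word w ON w.id = n.wordId "
        f"WHERE n.ngram IN ({placeholders}) "
        "GROUP BY w.id ORDER BY shared DESC, w.form LIMIT ?",
        (*grams, limit),
    ).fetchall()
    return [form for form, _shared in rows]


db = build_index(["trigram", "diagram", "program", "tree"])
print(suggest(db, "trigrma"))
```

Words that share no trigram with the query never appear in the result, so a heavily misspelt search can still come back empty.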

I have also made some improvements to the old code, the one using the Levenshtein algorithm, and the search time I was complaining about in my last post has improved considerably.

In this last week I will finish the trigram implementation and, after a few more tests, my mentor and I will decide on a final form for the project.

kreator.js checkin

I’ve reached a feature-freeze state with the project, having implemented all the functionality I wanted, the last piece being image upload and resize. The files are “uploaded” using the File API and the user can manipulate them in the browser, but they are not included in the zip file because of size issues, so the user has to add them to the /img folder in the archive themselves. I have started cleaning up the code and looking for bugs to fix; I’m looking to improve the overall quality of the code I have written so far. I also want to improve the interface if possible: a first-time user should be able to get going with creating a presentation as soon as they land on the page. I think a splash page is in order, since right now the first impression might be confusing for many.

You can see the project in its latest version at http://piatra.jit.su/ or http://piatra.nodejitsu.com/

I want to implement a step-by-step walkthrough for first-time users: something basic made up of speech bubbles that move around the interface and describe its basic usage, so as to eliminate any confusion. I am also going to make another screencast of the application in use.

#6 Chat, improved communication

The last two weeks passed very quickly, but the project has taken big steps towards the end.

During these weeks, Alex merged the code into the main project and put it online on wouso-next, where real users could test my creation. At the start there were a couple of bugs in the contact box, but they were solved.

Because some users have long names, we decided to create nicknames for chat users. At first I made the user name the default nickname, but this wasn’t enough, so I created a page where every user can change their nickname and first name. For the moment there are just those two fields, but there is room for more.

Another recent addition is a set of permissions that turn a chat user into a super chat user. This means that the user has access to some new commands. For the moment three commands are available: /kick user, /ban user, /unban user. All of these commands have to be written in the global chat, and their purpose is to kick a user from the chat page for a period of time.

At this moment, I have started to look at what still has to be finished and what I can change to make the chat easier to use.
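As a rough illustration of how such commands could be recognised, here is a minimal Python sketch. It is not the actual wouso code: the function `parse_command`, the `is_super_user` flag and the return format are all hypothetical.

```python
# The three commands listed in the post.
MODERATOR_COMMANDS = {"/kick", "/ban", "/unban"}


def parse_command(message, is_super_user):
    """Return (command, target) for a moderator command, else None.

    Hypothetical helper: a real implementation would also check that
    the message was sent in the global chat and that the target exists.
    """
    parts = message.strip().split()
    if len(parts) != 2 or parts[0] not in MODERATOR_COMMANDS:
        return None  # ordinary chat text or a malformed command
    if not is_super_user:
        return None  # regular chat users cannot run these commands
    return parts[0][1:], parts[1]  # drop the leading slash


print(parse_command("/kick alice", is_super_user=True))
```

Returning None for both ordinary text and unauthorised commands keeps the sketch short; a real chat would probably tell a regular user why the command was ignored.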