This week I learned how to use Idiorm, a PHP library for MySQL databases, and I implemented the database side of the crawler. Inserting rows with Idiorm turned out to be poorly documented: I couldn't find an example on the web, so I started reading the library's source. In the end I found that you call $obj = ORM::for_table('table_name')->create(); to get an object that exposes the table's fields as PHP properties, then set the corresponding values ($obj->field_1 = $val_1; $obj->field_n = $val_n;) and finally call $obj->save(); I wrote this code because the other DexOnline intern will need it.
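For reference, here is a minimal sketch of that create / set / save sequence. It assumes idiorm.php sits next to the script and that the pdo_sqlite extension is available. The crawled_page table and its columns are hypothetical names, not the real crawler schema. It uses an in-memory SQLite database only so it runs without a MySQL server; with MySQL only the configuration lines should differ.

```php
<?php
// Assumption: idiorm.php is in the same directory as this script.
require_once 'idiorm.php';

// In-memory SQLite keeps the sketch self-contained. For MySQL you would pass
// a 'mysql:host=...;dbname=...' DSN and configure 'username' and 'password'.
ORM::configure('sqlite::memory:');

// Hypothetical table, created here only so the example is complete.
ORM::get_db()->exec(
    'CREATE TABLE crawled_page (id INTEGER PRIMARY KEY, url TEXT, http_status INTEGER)'
);

// 1. create() returns an empty object bound to the table.
$page = ORM::for_table('crawled_page')->create();

// 2. Set the columns as ordinary properties.
$page->url = 'http://example.com/';
$page->http_status = 200;

// 3. save() issues the INSERT because the object is new.
$page->save();

// Read the row back through the id assigned by the database.
$saved = ORM::for_table('crawled_page')->find_one($page->id());
echo $saved->url, "\n";
```

Calling save() again on the same object after changing a property should issue an UPDATE instead of a second INSERT.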
I also wrote a mechanism to manipulate URLs: it transforms relative URLs into canonical ones and checks whether a URL has already been used (a hash plus some special cases).
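A rough sketch of what such a mechanism can look like follows. It is not the crawler's actual code: toCanonicalUrl and SeenUrls are hypothetical names, md5 is an assumed choice of hash, and the resolution is a simplified version of the usual relative-reference rules. It assumes the page URL is absolute and that links such as mailto: or javascript: have been filtered out beforehand.

```php
<?php
// Hypothetical helper: resolve $link against the absolute URL of the page it
// was found on, then normalize it (lowercase scheme/host, no fragment,
// no "." or ".." path segments).
function toCanonicalUrl(string $pageUrl, string $link): string
{
    $link = trim($link);
    $base = parse_url($pageUrl);
    $origin = strtolower($base['scheme'] . '://' . $base['host']);
    if (isset($base['port'])) {
        $origin .= ':' . $base['port'];
    }

    if ($link === '' || $link[0] === '#') {
        $absolute = $pageUrl;                          // points at the page itself
    } elseif (preg_match('#^[a-z][a-z0-9+.\-]*://#i', $link)) {
        $absolute = $link;                             // already absolute
    } elseif (strpos($link, '//') === 0) {
        $absolute = $base['scheme'] . ':' . $link;     // protocol-relative
    } elseif ($link[0] === '/') {
        $absolute = $origin . $link;                   // root-relative
    } else {
        $basePath = $base['path'] ?? '/';
        $dir = substr($basePath, 0, strrpos($basePath, '/') + 1);
        $absolute = $origin . $dir . $link;            // relative to the page's directory
    }

    $p = parse_url($absolute);
    $segments = [];
    foreach (explode('/', $p['path'] ?? '/') as $segment) {
        if ($segment === '..') {
            array_pop($segments);                      // go up one level
        } elseif ($segment !== '.') {
            $segments[] = $segment;
        }
    }
    $url = strtolower($p['scheme'] . '://' . $p['host']);
    if (isset($p['port'])) {
        $url .= ':' . $p['port'];
    }
    $url .= '/' . ltrim(implode('/', $segments), '/');
    if (isset($p['query'])) {
        $url .= '?' . $p['query'];
    }
    return $url;                                       // the fragment is dropped on purpose
}

// Hypothetical "already used?" check keyed by a hash of the canonical URL.
class SeenUrls
{
    private $hashes = [];

    // Returns true the first time a URL is offered, false on every repeat.
    public function markIfNew(string $canonicalUrl): bool
    {
        $key = md5($canonicalUrl);
        if (isset($this->hashes[$key])) {
            return false;
        }
        $this->hashes[$key] = true;
        return true;
    }
}
```

Hashing only after canonicalization matters: otherwise ../img/logo.png and /img/logo.png would count as two different pages.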
I got stuck on saving the rawPage and parsedText to the filesystem because of directory permissions. I didn't want to change the directory owner, so I moved the files to /tmp/DexContent/, but the files still aren't being saved. I'm using file_put_contents($filename, $string), and $filename contains only alphanumeric characters and the '_' character.
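One way to narrow this down is to make the write report why it fails. The sketch below is only a diagnostic aid under a few assumptions: saveContent is a hypothetical helper, and the checks cover the usual suspects rather than a confirmed cause. Note that file_put_contents does not create missing directories, so a target directory that was never created is one of the things it tests for.

```php
<?php
// Hypothetical helper: write $content to $dir/$name and say why if it fails.
function saveContent(string $dir, string $name, string $content): string
{
    // file_put_contents() will not create the directory, so do it here.
    if (!is_dir($dir) && !@mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create directory $dir");
    }

    // The user PHP runs as (often the web server user) needs write access.
    if (!is_writable($dir)) {
        throw new RuntimeException("$dir is not writable by the user running PHP");
    }

    $path = rtrim($dir, '/') . '/' . $name;
    if (@file_put_contents($path, $content) === false) {
        // error_get_last() still holds the warning that @ suppressed.
        $error = error_get_last();
        throw new RuntimeException(
            "Writing $path failed: " . ($error['message'] ?? 'unknown error')
        );
    }
    return $path;
}
```

A call such as saveContent('/tmp/DexContent', 'raw_page_1', $string) then either returns the full path of the saved file or throws an exception naming the step that failed.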