Progress Update: 1st Period Evaluation
I should have written this blog post earlier, but the work has mostly been done on schedule. The overall workflow of the OCR is to do detection first, then recognition, and finally merge the detection results. The detection is built on an R-CNN architecture, which is very fast to train while giving pretty robust results. However, during the 2nd and 3rd weeks I had a lot of trouble on the HPC: the CPU was too old for certain versions of TensorFlow, the installed CUDA module was incompatible with itself, and memory copying triggered a system error, so I could not train on multiple cores.
I managed to finish training the head layers on my own machine, but fine-tuning requires more than 8 GB of GPU RAM, which I don’t have. I will try to train it on another HPC later on.
That being said, to keep up with the work schedule, I moved on to other tasks. Right now the main framework is complete, and the module uses Tesseract OCR for the recognition part. This improves on the design of the Rosetta system, which can only recognize English text. In testing, Tesseract OCR worked poorly when reading an image that contains text of different sizes (it would return only one text result), but worked pretty well when the texts were separated out.
Now the code can read the results from the detection model, crop the detected areas out of each frame, and feed them into the OCR module.
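To make the detection-to-recognition hand-off concrete, here is a minimal sketch of that step. It is not the actual project code: the box format `(x1, y1, x2, y2)`, the `min_score` threshold, and the names `clip_box`, `crop`, and `read_text_regions` are all assumptions made for illustration, and the frame is a plain list of pixel rows so the example has no dependencies. The `recognize` argument stands in for whatever wraps Tesseract in the real module.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed formats (hypothetical, not taken from the project):
Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels, x2/y2 exclusive
Frame = List[List[int]]           # grayscale frame as rows of pixel values


def clip_box(box: Box, width: int, height: int) -> Box:
    """Clamp a detection box so it stays inside the frame."""
    x1, y1, x2, y2 = box
    return (max(0, x1), max(0, y1), min(width, x2), min(height, y2))


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the region covered by the box out of the frame."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


def read_text_regions(
    frame: Frame,
    detections: Sequence[Tuple[Box, float]],
    recognize: Callable[[Frame], str],
    min_score: float = 0.5,
) -> List[Tuple[Box, str]]:
    """Run the recognizer once per detected region instead of on the whole frame."""
    height = len(frame)
    width = len(frame[0]) if frame else 0
    results = []
    for box, score in detections:
        if score < min_score:
            continue  # skip low-confidence detections
        clipped = clip_box(box, width, height)
        x1, y1, x2, y2 = clipped
        if x2 <= x1 or y2 <= y1:
            continue  # box lies outside the frame
        text = recognize(crop(frame, clipped)).strip()
        if text:
            results.append((clipped, text))
    return results
```

Passing each cropped region to the recognizer separately matches the observation above that the OCR behaves better when texts are separated out. The recognizer is injected as a function so it can be swapped or stubbed out without touching the cropping logic.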
So far, although the schedule has changed a little, it is still manageable, and the work should be finished on more or less the same timeline.