Text Extraction from Product Images Using State-of-the-Art Deep Learning Techniques
Extracting text of various sizes, shapes, and orientations from images that contain multiple objects is an important problem in many contexts, particularly e-commerce, augmented-reality assistance in natural scenes, and content moderation on social media platforms. Text in an image can be a richer and more accurate source of data than human input, and it can feed applications such as attribute extraction, offensive text classification, product matching, and compliance use cases. Text extraction is done in two stages.
Text detection: The detector locates individual characters in an image and then groups characters that are close to each other into words, based on an affinity score that the network also predicts. Because the model works at the character level, it can detect text in any orientation. The detected text regions are then passed to the recognizer module.
Text recognition: Detected text regions are sent to a CRNN-CTC network to obtain the final text. CNNs extract image features, which are then passed to an LSTM network, as shown in the figure below. A Connectionist Temporal Classification (CTC) decoding step is then applied to the LSTM outputs across all time steps to obtain the raw text from the image.
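The CTC decoding step can be illustrated with a minimal sketch. The description does not say which decoder is used, so this assumes greedy (best-path) decoding, one common choice: take the most likely label at each time step, merge consecutive repeats, then drop the blank symbol. The label set, the blank index, and the probability values are made up for illustration; in practice they would be the per-time-step softmax outputs of the LSTM.

```python
from typing import List, Sequence

# Hypothetical label set; index 0 is assumed to be the CTC blank symbol.
LABELS: List[str] = ["<blank>", "c", "a", "t"]
BLANK = 0


def ctc_greedy_decode(
    probs: Sequence[Sequence[float]],
    labels: Sequence[str] = LABELS,
    blank: int = BLANK,
) -> str:
    """Best-path CTC decoding: argmax per step, merge repeats, drop blanks."""
    # Most likely label index at every time step.
    best_path = [max(range(len(step)), key=step.__getitem__) for step in probs]

    decoded = []
    previous = None
    for index in best_path:
        # Emit a label only when it differs from the previous step
        # and is not the blank symbol.
        if index != previous and index != blank:
            decoded.append(labels[index])
        previous = index
    return "".join(decoded)


# Made-up outputs for six time steps over LABELS.
EXAMPLE_PROBS = [
    [0.10, 0.70, 0.10, 0.10],  # c
    [0.10, 0.60, 0.20, 0.10],  # c (repeat, merged)
    [0.80, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.70, 0.10],  # a
    [0.10, 0.10, 0.10, 0.70],  # t
    [0.20, 0.10, 0.10, 0.60],  # t (repeat, merged)
]

if __name__ == "__main__":
    print(ctc_greedy_decode(EXAMPLE_PROBS))  # cat
```

The blank symbol is what lets the network emit genuinely doubled letters: a path such as "t, blank, t" decodes to "tt", while "t, t" collapses to a single "t".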
Key Takeaways:
1. Understanding the need for text extraction from product images.
2. Deep learning techniques for detecting highly oriented text.
3. End-to-end understanding of the CRNN-CTC network for text recognition with TensorFlow 2.0.
4. Why CTC loss is needed, and the theory behind it.
5. Uses of text extraction in various fields and domains.
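As a rough illustration of takeaway 4: the recognizer produces one prediction per time step, but the training labels are plain strings with no per-step alignment. CTC loss handles this by summing the probability of every alignment that collapses to the target and taking the negative log. The sketch below is not from the talk. It enumerates alignments by brute force, which is exponential in the number of time steps and only workable for toy inputs; real implementations, such as `tf.nn.ctc_loss` in TensorFlow 2, use dynamic programming instead. The function names and the blank index are assumptions made for this example.

```python
import itertools
import math
from typing import Sequence, Tuple


def collapse(path: Sequence[int], blank: int) -> Tuple[int, ...]:
    """Merge consecutive repeated labels, then drop blanks."""
    out = []
    previous = None
    for symbol in path:
        if symbol != previous and symbol != blank:
            out.append(symbol)
        previous = symbol
    return tuple(out)


def ctc_target_probability(
    probs: Sequence[Sequence[float]],
    target: Sequence[int],
    blank: int = 0,
) -> float:
    """Sum the probability of every alignment that collapses to `target`."""
    num_classes = len(probs[0])
    wanted = tuple(target)
    total = 0.0
    # Enumerate every possible label sequence of length T (brute force).
    for path in itertools.product(range(num_classes), repeat=len(probs)):
        if collapse(path, blank) == wanted:
            total += math.prod(step[s] for step, s in zip(probs, path))
    return total


def ctc_loss(
    probs: Sequence[Sequence[float]],
    target: Sequence[int],
    blank: int = 0,
) -> float:
    """Negative log-likelihood of the target (undefined if probability is 0)."""
    return -math.log(ctc_target_probability(probs, target, blank))


if __name__ == "__main__":
    # Two time steps, two classes: index 0 = blank, index 1 = the letter "a".
    uniform = [[0.5, 0.5], [0.5, 0.5]]
    # Alignments (a, a), (a, blank), (blank, a) all collapse to "a".
    print(ctc_target_probability(uniform, [1]))  # 0.75
```

In this toy case three of the four possible alignments map to the same target, which is why the loss has to marginalize over alignments rather than score a single one.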