In the ever-evolving world of e-commerce, organizing and classifying products correctly is essential—not only for seamless navigation and discovery but also for maximizing conversion rates. Recently, atronous.ai deployed a cutting-edge machine learning pipeline that brings automation, intelligence, and precision to product taxonomy assignments, reducing manual effort and inconsistencies.
🧠 The Challenge: From Chaos to Classification
Retailers often receive product data in various formats—images, URLs, spreadsheets, or PDFs—from a wide range of suppliers. These items lack standardized taxonomies or come with misaligned categories that don’t fit the retailer’s existing classification system. This leads to a fragmented shopping experience and lost revenue opportunities.
🔍 Step 1: Identifying the Object from an Image URL
Using advanced computer vision models, the platform ingests product image URLs and identifies the object type in the photo. For example, in a recent engagement with a leading retailer, the platform processed product entries supplied only as image URLs.
Using a combination of convolutional neural networks (CNNs) and pretrained models (like CLIP or BLIP), the system outputs a probable object class like Armchair, Sofa, or Barstool, without any manual labeling.
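The exact models behind this step aren’t published; the sketch below shows one way a zero-shot classifier such as CLIP (via the Hugging Face transformers library) could map an image URL to a candidate object class. The CANDIDATE_CLASSES list and the classify_product_image helper are illustrative assumptions, not atronous.ai’s actual code.

```python
import requests
from io import BytesIO
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Candidate object classes; in practice these would come from the retailer's data.
CANDIDATE_CLASSES = ["armchair", "sofa", "barstool", "coffee table", "floor lamp"]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def classify_product_image(image_url: str) -> tuple[str, float]:
    """Download a product image and return the most likely object class."""
    image = Image.open(BytesIO(requests.get(image_url, timeout=30).content)).convert("RGB")
    # CLIP scores the image against a natural-language prompt for each class.
    prompts = [f"a photo of a {c}" for c in CANDIDATE_CLASSES]
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, num_classes)
    probs = logits.softmax(dim=-1)[0]
    best = int(probs.argmax())
    return CANDIDATE_CLASSES[best], float(probs[best])

# Example: label, confidence = classify_product_image("https://example.com/product.jpg")
```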
🧬 Step 2: Training on the Retailer’s Current Taxonomy
What makes Atronous truly adaptive is its ability to learn from a retailer’s own classification structure. By training on the existing taxonomy tree—whether pulled from a Shopify store, ERP system, or internal database—the ML engine understands how categories are organized and how products typically flow through them.
For this retailer’s use case, Atronous automatically mapped these detected objects to specific categories such as:
– Furniture → Seating → Armchairs
– Furniture → Living Room → Sofas
– Furniture → Dining → Barstools
Once trained, the system uses similarity scoring between image embeddings and taxonomy embeddings to assign each product to the most appropriate category.
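As a rough illustration of that similarity scoring, the sketch below embeds the retailer’s taxonomy paths as text with the same CLIP model and picks the path whose embedding is closest to the image embedding. The TAXONOMY_PATHS list and the assign_category helper are assumptions for the example; the production pipeline’s embeddings and scoring details may differ.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative taxonomy paths; in practice these are pulled from the retailer's system.
TAXONOMY_PATHS = [
    "Furniture > Seating > Armchairs",
    "Furniture > Living Room > Sofas",
    "Furniture > Dining > Barstools",
]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def assign_category(image: Image.Image) -> tuple[str, float]:
    """Score an image against every taxonomy path and return the best match."""
    inputs = processor(text=TAXONOMY_PATHS, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # Normalise embeddings so the dot product is a cosine similarity.
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    sims = (img @ txt.T)[0]               # cosine similarity per taxonomy path
    probs = (sims * 100).softmax(dim=-1)  # rough confidence via softmax
    best = int(probs.argmax())
    return TAXONOMY_PATHS[best], float(probs[best])
```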
For instance:
– An image classified as Barstool with mid-century aesthetics was assigned to Furniture > Dining > Barstools with over 95% confidence.
– A decorative stool misclassified in the source data was automatically reclassified as an Accent Piece based on visual and semantic similarity.
This not only improves classification accuracy but also allows the taxonomy to adapt dynamically to the retailer’s style and structure.
🧾 Bonus Step: Taxonomy Auditing & Alternative Suggestions
In a final auditing pass, Atronous reviewed the current taxonomy assignments, flagged inconsistencies and misclassifications, and suggested potential improvements.
For example:
– Products categorized under Chairs that visually match Recliners were highlighted.
– Items split across Living Room Decor and Accent Chairs were recommended for consolidation or reclassification.
These suggestions are provided in an easy-to-review spreadsheet or directly synced into PIM systems.
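As a minimal sketch of what such an audit export could look like, the snippet below compares current and predicted categories with pandas and writes the flagged rows to a CSV for spreadsheet review. The column names, SKUs, and the build_audit_report helper are hypothetical.

```python
import pandas as pd

def build_audit_report(rows: list[dict], confidence_floor: float = 0.8) -> pd.DataFrame:
    """Flag products whose predicted category disagrees with the current one."""
    df = pd.DataFrame(rows)  # columns: sku, current_category, predicted_category, confidence
    df["flagged"] = (df["current_category"] != df["predicted_category"]) & (
        df["confidence"] >= confidence_floor
    )
    return df.sort_values("confidence", ascending=False)

report = build_audit_report([
    {"sku": "CH-1021", "current_category": "Chairs",
     "predicted_category": "Recliners", "confidence": 0.93},
    {"sku": "ST-0044", "current_category": "Living Room Decor",
     "predicted_category": "Accent Chairs", "confidence": 0.88},
])
report.to_csv("taxonomy_audit.csv", index=False)  # easy to review in a spreadsheet
```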
🚀 The Impact
This AI-first approach has transformed how retailers handle product onboarding:
– 90% reduction in manual taxonomy assignment time
– 30% increase in accuracy of product categorization
– Improved shopper UX via cleaner navigation and product grouping
As product catalogs grow and become more complex, automation like this isn’t just helpful; it becomes mission-critical.
Ready to Streamline Your Taxonomy?
atronous.ai is bringing intelligence to the very core of digital merchandising. If your team is struggling with inconsistent categorization, long onboarding cycles, or messy product feeds, we’d love to help.
Schematics are the backbone of retail product documentation, but extracting them from PDFs remains a persistent challenge. Marketers, merchandisers, and product developers often waste valuable time navigating through cluttered documents to find key diagrams such as shelf layouts, packaging designs, or product schematics. This inefficiency not only impacts project deadlines but also drives up costs.
At atronous.ai, we’ve developed the PDF Schematic Extractor, an automated solution that precisely extracts and labels schematics from PDFs, URLs, or HTML files. By streamlining this process, we help teams save time, improve accuracy, and accelerate product development, all while reducing the risk of costly delays.
Figure: on the left, a complex page from a product document; on the right, the clean schematic we extracted.
Overview of the PDF Schematic Extractor
The PDF Schematic Extractor is a Python package designed to convert inputs (URLs, HTML files) into PDFs and extract schematics using PyMuPDF, OpenCV, Tesseract, and Gemini for labeling. It supports both command-line and Streamlit web-based interfaces, enabling users to upload files, extract schematics, and download results as ZIP files. Recent enhancements have focused on improving the schematic extraction process, addressing accuracy, efficiency, and usability challenges.
Workflow of the PDF Schematic Extractor
The project follows a structured pipeline to process inputs and extract schematics, as illustrated in the flowchart:
1. Input Handling: The tool accepts three types of inputs – URLs, HTML files/strings, or PDFs. URLs and HTML inputs are processed using the url_to_pdf and html_to_pdf functions in core.py, which leverage wkhtmltopdf to convert them into PDFs. If the input is already a PDF, it is directly used for further processing.
2. PDF to Image Conversion: The PDF (whether input directly or converted from URL/HTML) is converted into images using PyMuPDF in the “Convert PDF into Images” step. Each page of the PDF is rendered as an individual image to facilitate image-based processing.
3. Image Processing and Text Detection: Each image undergoes preprocessing in the “Image Processing” step to enhance quality (e.g., adjusting contrast). Tesseract OCR is then applied in the “OCR Text Extraction” step to detect text, and a “Filter Text-Heavy Areas” step identifies and excludes regions dominated by text, ensuring focus on schematic-like areas.
4. Schematic Detection and Extraction: The “Contour Detection” step uses OpenCV to identify potential schematic regions by detecting contours. These contours are grouped in the “Group Contours” step to form cohesive schematic entities, reducing fragmentation. The “Extract Schematics from Images” step isolates these regions as individual schematic images (a simplified code sketch of steps 2–4 follows this list).
5. Labeling with Gemini: The “Schematic Validation” step uses Gemini to confirm that extracted regions are indeed schematics. Nearby text is extracted in the “Get Nearby Text” step, and Gemini generates specific labels in the “Generate Specific Labels with Gemini” step, providing context-aware labels (e.g., “Circuit Diagram: Power Supply Unit”).
6. Output Generation: The extracted schematics are saved as images in the “Schematic Images” step. A “Debugging with Bounding Boxes” option overlays bounding boxes and labels on the images for user review. The final output, including metadata (e.g., page number, label), is saved as a JSON file and can be downloaded as a ZIP file via the web interface.
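The sketch below approximates steps 2–4 with PyMuPDF, Tesseract, and OpenCV. It is a simplified stand-in, not the package’s actual implementation: the function names and defaults (e.g. min_area=5000) are illustrative and correspond only loosely to the tool’s --min-area, --padding, and --threshold parameters.

```python
import cv2
import fitz  # PyMuPDF
import numpy as np
import pytesseract

def pdf_pages_to_images(pdf_path: str, dpi: int = 200):
    """Render each PDF page as a BGR numpy array (workflow step 2)."""
    doc = fitz.open(pdf_path)
    for page in doc:
        pix = page.get_pixmap(dpi=dpi)
        img = np.frombuffer(pix.samples, dtype=np.uint8).reshape(pix.height, pix.width, pix.n)
        yield cv2.cvtColor(img, cv2.COLOR_RGB2BGR)

def text_mask(img_bgr, conf_floor: float = 60.0):
    """Mark word boxes found by Tesseract so text-heavy regions can be excluded (step 3)."""
    mask = np.zeros(img_bgr.shape[:2], dtype=np.uint8)
    data = pytesseract.image_to_data(img_bgr, output_type=pytesseract.Output.DICT)
    for i, word in enumerate(data["text"]):
        if word.strip() and float(data["conf"][i]) > conf_floor:
            x, y, w, h = data["left"][i], data["top"][i], data["width"][i], data["height"][i]
            mask[y:y + h, x:x + w] = 255
    return mask

def candidate_schematics(img_bgr, min_area: int = 5000, padding: int = 10):
    """Detect, group, and crop schematic-like regions (step 4)."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 21, 10)
    binary[text_mask(img_bgr) > 0] = 0  # drop text-dominated pixels
    # Dilation merges nearby strokes so fragmented drawings group into one contour.
    merged = cv2.dilate(binary, np.ones((15, 15), np.uint8), iterations=2)
    contours, _ = cv2.findContours(merged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w * h >= min_area:
            y0, x0 = max(0, y - padding), max(0, x - padding)
            yield img_bgr[y0:y + h + padding, x0:x + w + padding]
```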
Enhancements to Schematic Extraction
1. Improved Detection Accuracy with Contour Grouping and Filtering: The initial schematic detection often missed or fragmented schematics in complex PDFs. We introduced a “Group Contours” step (as shown in the flowchart) to cluster nearby contours into cohesive schematic entities based on proximity and size. Additionally, a “Filter Text-Heavy Areas” step using Tesseract OCR excludes text-dominated regions, ensuring focus on schematic-like areas (e.g., technical drawings). This has boosted detection accuracy, especially in mixed-content PDFs.
2. Enhanced Labeling with Gemini Integration: Labeling previously suffered from inconsistent OCR results. By integrating Gemini for “Schematic Validation” and “Generate Specific Labels with Gemini,” we improved label precision. Gemini validates schematic regions and generates context-aware labels by analyzing nearby text and visual content. For example, a circuit diagram might now be labeled as “Circuit Diagram: Power Supply Unit” instead of “Diagram,” enhancing usability for downstream applications like cataloging (a minimal labeling sketch follows this list).
3. Optimized Preprocessing for Faster Extraction: The original preprocessing pipeline was slow for large PDFs due to redundant conversions. We optimized the “Image Processing” and “Convert PDF into Images” stages by implementing parallel processing for multi-page PDFs and reducing intermediate image resolution without quality loss. The “Extract Schematics from Images” step now uses adaptive thresholding (via the --threshold parameter), cutting processing time while maintaining accuracy.
4. User Customization and Debugging Support: To address “No Schematics Extracted” issues, we enhanced configurability with tunable parameters (--min-area, --padding, --threshold) via the CLI. A “Debugging with Bounding Boxes” output option was added, generating schematic images with overlaid bounding boxes and labels. This transparency helps users debug and adjust extraction settings effectively.
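As a rough idea of what the Gemini validation-and-labeling call can look like, the snippet below uses the google-generativeai client to send a cropped region plus its nearby text and ask for a specific label. The prompt wording, model name, and label_schematic helper are assumptions, not the extractor’s exact code.

```python
import os
import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])     # assumes a Gemini API key is set
model = genai.GenerativeModel("gemini-1.5-flash")          # model name is an assumption

def label_schematic(crop: Image.Image, nearby_text: str) -> str:
    """Ask Gemini to validate the crop and return a specific label, or 'NOT_A_SCHEMATIC'."""
    prompt = (
        "You are labeling figures extracted from a product PDF. "
        "If this image is a schematic or technical drawing, reply with a short, specific "
        "label (e.g. 'Circuit Diagram: Power Supply Unit'); otherwise reply 'NOT_A_SCHEMATIC'.\n"
        f"Text found near the figure: {nearby_text}"
    )
    response = model.generate_content([prompt, crop])
    return response.text.strip()
```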
Future Possible Developments
1. Real-Time Processing for Web Interface
The Streamlit web app currently processes PDFs in batch mode, which is slow for large files. Implementing real-time processing, extracting and displaying schematics as each page is processed, would improve the user experience. This could leverage asynchronous processing and streaming inputs for faster feedback (a hypothetical sketch appears at the end of this section).
2. Enhanced Labeling with Contextual Understanding and Local LLMs/SLMs
While Gemini has improved labeling, it can misinterpret context due to limited document-wide understanding and reliance on an external API. Future work could integrate a natural language processing (NLP) model to analyze the entire PDF’s text, providing contextual cues (e.g., identifying the document’s domain as electrical engineering) to improve labeling accuracy. Additionally, incorporating local large language models (LLMs) or small language models (SLMs) such as LLaMA, Phi-4-multimodal, or DistilBERT would enable on-device processing, reducing dependency on external APIs like Gemini. This would enhance privacy, lower latency, and allow offline functionality. For instance, a local SLM could be fine-tuned on technical terminology to label a schematic as “Transistor Layout” with higher precision, even in resource-constrained environments.
3. Support for 3D Schematics and Interactive Outputs
Modern PDFs often include 3D or interactive schematics. Extending the tool to detect and extract these, potentially rendering them as interactive 3D models in the web app using libraries like Three.js, would add value. This requires algorithms to parse embedded 3D data in PDFs and render it dynamically.
4. Cross-Platform Compatibility and Cloud Deployment
Manual dependency installation (e.g., wkhtmltopdf, Tesseract) can be challenging for users. Containerizing the application with Docker would simplify cross-platform deployment. Additionally, deploying the tool as a cloud service on AWS or Google Cloud would enable users to process PDFs without local setup, increasing accessibility.
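As a hypothetical sketch of the incremental experience described in the first item above, the snippet below renders pages one at a time with PyMuPDF and pushes each result to the Streamlit UI as soon as it is ready. The extract_page_schematics generator here is a placeholder that yields whole pages rather than detected schematics.

```python
import fitz  # PyMuPDF
import numpy as np
import streamlit as st

def extract_page_schematics(pdf_bytes: bytes):
    """Placeholder generator: yields one rendered image per page as it finishes.
    In the real tool this is where per-page contour detection would run."""
    doc = fitz.open(stream=pdf_bytes, filetype="pdf")
    total = doc.page_count
    for i, page in enumerate(doc, start=1):
        pix = page.get_pixmap(dpi=150)
        img = np.frombuffer(pix.samples, dtype=np.uint8).reshape(pix.height, pix.width, pix.n)
        yield i, total, [img]

st.title("PDF Schematic Extractor (incremental preview)")
uploaded = st.file_uploader("Upload a PDF", type="pdf")
if uploaded:
    progress = st.progress(0.0)
    for page_no, total, crops in extract_page_schematics(uploaded.read()):
        for crop in crops:
            st.image(crop, caption=f"Page {page_no} (placeholder: full page shown)")
        progress.progress(page_no / total)
```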
Conclusion
The schematic extraction process in the PDF Schematic Extractor has been significantly enhanced through improved detection, labeling, preprocessing, and user support. These changes have made the tool more accurate, efficient, and user-friendly. Looking ahead, integrating machine learning, real-time processing, local LLMs/SLMs, and cloud deployment can further elevate the tool’s capabilities, making it a versatile solution for schematic extraction.