Autonomous Mobile Picking Robots: Bridging the Gap Between Warehouse Automation and Retail Fulfillment

Case Study from RealMan Robotics

The Growing Automation Gap in Order Picking

Order picking remains one of the most labor-intensive operations in retail and warehouse environments. Despite decades of progress in warehouse automation, the actual process of identifying, grasping, and transferring individual products from shelves to order containers has proven remarkably difficult to automate. According to industry estimates, manual picking accounts for over 50% of total warehouse operating costs, and the challenge intensifies as e-commerce order volumes continue to grow and consumer expectations for faster fulfillment rise.

The difficulty is not simply mechanical. Warehouses and retail stores stock thousands of SKUs that vary dramatically in size, shape, weight, and fragility. A single picking run might require handling a glass bottle, a soft-packaged snack, an irregularly shaped cosmetics item, and a small electronic accessory — all within the same order. Add to that the complexity of navigating dynamic environments where human workers, carts, and temporarily placed inventory create constantly shifting obstacles, and it becomes clear why fully autonomous picking has remained an unsolved challenge for most operations.

This article examines how an integrated autonomous picking system — combining a lightweight 6-DOF robotic arm, depth-camera-based vision, adaptive grasping, and intelligent task management — was developed and deployed to address these challenges. Rather than focusing on a single component, we explore the system-level engineering decisions that make reliable, multi-category autonomous picking possible in real-world retail and warehouse settings.

 

Why Autonomous Picking Demands a Robotic Solution

Before examining the technical architecture, it is worth understanding why order picking is particularly well-suited for robotic automation — and why the problem has resisted conventional approaches.

Order picking is characterized by three properties that make it a strong candidate for automation:

  • High repetition and physical intensity. Workers perform the same reach-grasp-transfer cycle hundreds or thousands of times per shift. Fatigue leads to declining accuracy, increased product damage, and higher injury rates over time.
  • Variability across deployments. Picking environments range from compact convenience stores of a few dozen square meters to large-scale distribution centers spanning thousands of square meters. A viable solution must adapt to different layouts, product mixes, and order volumes without extensive hardware reconfiguration.
  • Growing labor constraints. Rising labor costs and the difficulty of recruiting workers for repetitive, physically demanding warehouse roles create a strong economic case for automation — one that strengthens as operations scale.

At the same time, conventional fixed automation (conveyor-based systems, goods-to-person shuttle systems) often requires significant infrastructure investment and is difficult to retrofit into existing spaces. Mobile autonomous picking robots offer a fundamentally different approach: they bring the automation to the products, navigating existing layouts and working alongside or in place of human pickers.

 

System Architecture: From Cloud Scheduling to Physical Execution

The W1 autonomous picking robot was designed as a vertically integrated system, meaning that the hardware, perception, control, and task management layers were co-developed to work as a cohesive unit rather than assembled from loosely coupled third-party components. This design philosophy was driven by a core insight: reliable autonomous picking in unstructured environments requires tight coordination between perception, manipulation, and navigation — latency or miscommunication between any two layers can cause failures that cascade through the entire picking workflow.

The system is organized into four functional layers, following a distributed “cloud scheduling + local execution” architecture:

Business Layer

The topmost layer interfaces with the customer’s existing business systems — ERP platforms, warehouse management systems, or e-commerce order management tools. It ingests order data, product information, and inventory positions, then translates business requirements into structured picking task requests. This separation ensures the robotic system can be integrated into diverse operational environments without requiring changes to upstream business processes.
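As an illustration of this translation step, the sketch below turns a list of order lines into a structured picking task request. It is a minimal example only: the class names, field names, and the SKU-to-shelf mapping format are assumptions made for illustration and do not describe the actual interface between the business systems and TMS.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OrderLine:
    sku: str
    quantity: int


@dataclass(frozen=True)
class PickStep:
    sku: str
    shelf_location: str
    quantity: int


@dataclass
class PickingTaskRequest:
    order_id: str
    delivery_point: str
    steps: list = field(default_factory=list)


def build_task_request(order_id, delivery_point, lines, inventory_positions):
    """Translate order lines into a picking task request.

    inventory_positions maps SKU -> shelf location (hypothetical format).
    A SKU without a known location is rejected up front, so the problem
    surfaces at the business layer rather than on the shop floor.
    """
    steps = []
    for line in lines:
        if line.sku not in inventory_positions:
            raise ValueError(f"no shelf location for SKU {line.sku}")
        steps.append(PickStep(line.sku, inventory_positions[line.sku], line.quantity))
    return PickingTaskRequest(order_id, delivery_point, steps)


request = build_task_request(
    "order-001",
    "counter-A",
    [OrderLine("cola-330", 2), OrderLine("chips-70", 1)],
    {"cola-330": "aisle1-shelf3", "chips-70": "aisle2-shelf1"},
)
```

Keeping the request a plain data structure is one way to let the upstream ERP, WMS, or order management system stay unchanged, since only the mapping function needs to know its schema.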

Scheduling Layer (TMS)

A proprietary Task Management System (TMS) serves as the operational brain of the deployment. Running on a cloud-based platform, TMS is responsible for task parsing, optimal path planning, equipment scheduling, and operational data aggregation. When multiple W1 robots are deployed in the same environment, TMS coordinates their assignments to avoid conflicts, minimize idle time, and balance workloads. The scheduling layer also provides a real-time monitoring dashboard for supervisors, tracking robot status, task completion rates, and exception events.
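The article does not describe how TMS balances workloads internally. As a rough sketch of the idea, the example below uses a common greedy heuristic: sort tasks by estimated duration, longest first, and give each one to the robot with the least accumulated work. The function name, the robot IDs, and the duration estimates are all hypothetical.

```python
import heapq


def assign_tasks(task_durations, robot_ids):
    """Greedy balancing: longest tasks first, each to the least-loaded robot.

    task_durations maps task ID -> estimated duration in seconds.
    Returns a dict mapping robot ID -> list of assigned task IDs.
    """
    if not robot_ids:
        raise ValueError("at least one robot is required")
    # Min-heap of (accumulated load, robot ID); ties break on robot ID.
    heap = [(0.0, rid) for rid in robot_ids]
    heapq.heapify(heap)
    assignment = {rid: [] for rid in robot_ids}
    ordered = sorted(task_durations.items(), key=lambda kv: (-kv[1], kv[0]))
    for task_id, duration in ordered:
        load, rid = heapq.heappop(heap)
        assignment[rid].append(task_id)
        heapq.heappush(heap, (load + duration, rid))
    return assignment


plan = assign_tasks({"a": 90, "b": 60, "c": 30, "d": 30}, ["w1-1", "w1-2"])
```

A production scheduler would also need to account for travel distance and spatial conflicts between robots, which this heuristic ignores.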

Control Layer

Deployed on each robot’s onboard industrial PC (IPC), the control layer receives task instructions from TMS and orchestrates the robot’s subsystems — mobile base, robotic arm, suction gripper, elevator mechanism, and vision system — to execute the physical picking sequence. The control layer manages the precise timing and coordination between navigation, product identification, grasping, and placement operations. It continuously reports execution status back to TMS, enabling closed-loop monitoring and rapid fault response.

Execution Layer

The physical hardware constitutes the execution layer. The W1 integrates the following key components:

  • RM65-B 6-DOF robotic arm — an ultra-lightweight (7.2 kg including integrated controller) collaborative arm with a 5 kg payload capacity, 610 mm reach, and ±0.05 mm repeatability. Its compact form factor and integrated controller eliminate the need for an external control cabinet, a critical advantage for mobile deployment.
  • Intel RealSense stereo depth camera paired with a YOLO-based object detection model, providing real-time product recognition across diverse categories, including stacked and irregularly arranged items.
  • Adaptive suction gripper system with integrated pressure sensors and pneumatic control, capable of handling items up to 1.5 kg with automatic suction force adjustment based on product weight and surface properties.
  • Autonomous mobile base with dual-mode navigation (LiDAR for global path planning, vision-assisted for local obstacle avoidance), capable of speeds up to 1.5 m/s and fitted with infrared proximity sensors for collision prevention.
  • Stepper motor-driven elevator and tray mechanism for precise vertical positioning of product trays and the shopping container during picking operations.

Hardware Architecture Diagram — Annotated diagram showing all hardware components and their arrangement

 

The Picking Workflow: End-to-End Automation

A complete picking cycle proceeds through the following sequence, illustrating how the four system layers coordinate in practice:

1. Order ingestion. The business system sends order details (product list, shelf locations, delivery point) to TMS.

2. Task planning. TMS parses the order, plans an optimized picking route across the store or warehouse, and dispatches the task to an available W1 robot.

3. Container retrieval. The robot navigates to the container staging area. The elevator mechanism lowers, the stepper motor extends the tray arm, and the suction system secures a shopping container. The container is retracted and locked in place.

4. Navigation to product location. Using LiDAR-based SLAM for global positioning and vision-assisted local navigation, the robot travels to the first product station, dynamically rerouting around any detected obstacles.

5. Product identification and grasping. The elevator adjusts to shelf height. The stepper motor extends the tray to access the product bin. The RealSense camera activates, and the YOLO model identifies the target product’s position and orientation. The vision-guided grasping software calculates the optimal grasp point, calibrates for any positional offset, and commands the robotic arm to execute the pick. The suction gripper engages with automatically adjusted pressure.

6. Transfer to container. The robotic arm transfers the grasped product to the shopping container. Anti-vibration control is applied during the transfer to prevent item displacement.

7. Repeat or proceed. If additional items remain on the order, the robot navigates to the next product location and repeats the pick cycle. Multi-item orders are fulfilled in a single continuous run.

8. Container delivery. Once all items are picked, the robot navigates to the designated drop-off point and releases the completed container.

9. Task closure. The control layer reports task completion to TMS, which updates the order status and synchronizes with the business system. Full traceability data (timestamps, product IDs, pick success/failure) is logged.
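The nine steps above can be read as a small state machine in which step 7 is the only branch. The sketch below models that reading; the stage names and functions are illustrative assumptions, not the control layer's actual implementation.

```python
from enum import Enum, auto


class Stage(Enum):
    ORDER_INGESTION = auto()
    TASK_PLANNING = auto()
    CONTAINER_RETRIEVAL = auto()
    NAVIGATE_TO_PRODUCT = auto()
    IDENTIFY_AND_GRASP = auto()
    TRANSFER_TO_CONTAINER = auto()
    CONTAINER_DELIVERY = auto()
    TASK_CLOSURE = auto()
    DONE = auto()


# Stages with exactly one successor.
_LINEAR = {
    Stage.ORDER_INGESTION: Stage.TASK_PLANNING,
    Stage.TASK_PLANNING: Stage.CONTAINER_RETRIEVAL,
    Stage.CONTAINER_RETRIEVAL: Stage.NAVIGATE_TO_PRODUCT,
    Stage.NAVIGATE_TO_PRODUCT: Stage.IDENTIFY_AND_GRASP,
    Stage.IDENTIFY_AND_GRASP: Stage.TRANSFER_TO_CONTAINER,
    Stage.CONTAINER_DELIVERY: Stage.TASK_CLOSURE,
    Stage.TASK_CLOSURE: Stage.DONE,
}


def next_stage(stage, items_remaining):
    """Return the stage that follows `stage` (step 7 is the branch point)."""
    if stage is Stage.TRANSFER_TO_CONTAINER:
        if items_remaining > 0:
            return Stage.NAVIGATE_TO_PRODUCT
        return Stage.CONTAINER_DELIVERY
    return _LINEAR[stage]


def run_order(item_count):
    """Walk one order through the workflow and return the visited stages."""
    if item_count < 1:
        raise ValueError("an order needs at least one item")
    stage, remaining = Stage.ORDER_INGESTION, item_count
    trace = [stage]
    while stage is not Stage.DONE:
        if stage is Stage.TRANSFER_TO_CONTAINER:
            remaining -= 1  # one item has just been placed in the container
        stage = next_stage(stage, remaining)
        trace.append(stage)
    return trace


trace = run_order(3)
```

For a three-item order, the trace visits the navigate, grasp, and transfer stages three times each before moving on to delivery and closure.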

Control Logic Flowchart — End-to-end workflow from order receipt to task closure

 

Engineering Challenges and Solutions

Developing a reliable autonomous picking system required solving several interlocking technical problems. The following sections describe the four most significant challenges encountered during development and the engineering approaches used to address them.

Challenge 1: Reliable Grasping Across Diverse Product Categories

The fundamental difficulty in automated picking is that a single gripper must handle products with vastly different physical properties. A fragile glass bottle requires gentle suction at a precise angle; an irregularly packaged snack offers few clean surfaces for suction attachment; a small item like a nail clipper or earphone case is easily dropped during transfer.

The solution combined three complementary strategies:

  • Hardware: The adaptive suction gripper incorporates embedded pressure sensors that continuously monitor suction, and its pneumatic control adjusts the force in real time. For fragile items, suction is modulated to prevent surface damage; for heavier items (up to 1.5 kg), full suction force is applied. The gripper’s compliant design conforms to irregular surfaces, expanding the range of graspable geometries.
  • Vision algorithms: The YOLO detection model was augmented with a custom product feature library developed through iterative training on real-world product images. The system identifies optimal grasp points while avoiding edges and fragile zones, achieving positional accuracy of ±1 mm.
  • Motion control: A pre-grasp posture calibration module automatically adjusts the robotic arm’s approach angle to maximize surface contact area. Post-grasp anti-vibration control dampens oscillations during arm movement, preventing the picked item from shifting or falling during transfer to the container.
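To make the vision-to-arm handoff concrete, the sketch below shows the standard pinhole back-projection that converts a detected pixel and its depth reading into a 3D point in the camera frame. It is a simplified illustration: it uses the bounding-box center as the grasp pixel, whereas the system described here selects grasp points that avoid edges and fragile zones. The intrinsics (fx, fy, cx, cy) and the example values are assumed, and lens distortion is ignored.

```python
def bbox_center(x_min, y_min, x_max, y_max):
    """Center pixel of a detection bounding box (a naive grasp candidate)."""
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def deproject(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) at depth_m metres into camera coordinates.

    Pinhole model without distortion:
        x = (u - cx) * z / fx,  y = (v - cy) * z / fy,  z = depth
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive (0 usually means no reading)")
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Hypothetical intrinsics for a 640x480 depth stream.
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0

u, v = bbox_center(400, 220, 440, 260)          # detection box in pixels
grasp_point = deproject(u, v, 0.6, FX, FY, CX, CY)
```

The resulting point is still in the camera frame; a hand-eye calibration transform would be needed to express it in the arm's base frame before commanding the pick.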

Challenge 2: Autonomous Navigation in Dynamic Environments

Unlike controlled factory floors, retail stores and small warehouses are dynamic environments with unpredictable obstacles: customers, temporarily placed inventory, carts, and other mobile robots. Traditional single-mode navigation systems frequently break down in such conditions, either generating suboptimal paths or reacting too slowly to sudden obstacles.

The W1 employs a dual-mode navigation architecture. LiDAR-based SLAM handles global localization and route planning, providing a stable, real-time map of the environment and computing optimal picking paths. Simultaneously, the vision navigation subsystem handles local obstacle detection and avoidance with finer granularity, particularly for low-profile or transparent obstacles that LiDAR alone may miss. Infrared proximity sensors mounted around the base provide a final safety layer, triggering automatic deceleration when any object comes within 30 cm — preventing collisions even in scenarios where both primary navigation modes are briefly obstructed.
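The infrared safety layer can be pictured as a speed cap computed from the nearest proximity reading. In the sketch below, the 1.5 m/s maximum speed and the 30 cm trigger distance come from the figures above; the linear ramp and the 5 cm full-stop distance are assumptions chosen for illustration, since the article only states that deceleration is triggered.

```python
MAX_SPEED_M_S = 1.5          # top speed of the mobile base (from the article)
SLOWDOWN_DISTANCE_M = 0.30   # deceleration trigger distance (from the article)
STOP_DISTANCE_M = 0.05       # assumed full-stop distance, not from the article


def speed_limit(ir_distances_m):
    """Return the allowed base speed given all infrared distance readings.

    Full speed beyond the slowdown distance, zero at or inside the stop
    distance, and a linear ramp in between (the ramp shape is an assumption).
    """
    nearest = min(ir_distances_m, default=float("inf"))
    if nearest >= SLOWDOWN_DISTANCE_M:
        return MAX_SPEED_M_S
    if nearest <= STOP_DISTANCE_M:
        return 0.0
    fraction = (nearest - STOP_DISTANCE_M) / (SLOWDOWN_DISTANCE_M - STOP_DISTANCE_M)
    return MAX_SPEED_M_S * fraction


clear_aisle = speed_limit([1.2, 0.8, 2.0])
cart_nearby = speed_limit([1.2, 0.175])
```

Because the cap depends only on the closest reading, it works independently of the LiDAR and vision planners and can override whatever speed they request.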

Challenge 3: Robust Visual Recognition Under Variable Conditions

Warehouse and retail lighting conditions are rarely ideal. Backlighting near windows, dim aisles, and overhead fluorescent glare can all degrade vision system performance. Additionally, products are often stacked or partially occluded, hiding the visual features needed for reliable identification.

Hardware-side, the Intel RealSense stereo depth camera was selected for its ability to generate accurate depth maps even under varying illumination. An integrated supplemental lighting module automatically adjusts intensity based on ambient light levels, and the camera mounting angle was optimized through empirical testing to minimize specular reflections from product packaging.

On the algorithm side, transfer learning was applied to the YOLO model using training datasets that deliberately included difficult lighting conditions and stacked product arrangements. A feature fusion algorithm was also implemented so that a product partially occluded by stacking can still be identified reliably by matching its exposed features against the trained model.

Challenge 4: Multi-Module Coordination and System Stability

Perhaps the most underappreciated challenge in autonomous picking systems is ensuring stable, low-latency coordination across all hardware and software modules. The W1 system involves continuous communication between TMS (cloud), the onboard control system, the robotic arm controller, the vision pipeline, and the navigation subsystem. A delay or communication failure at any point can cascade: if the robotic arm and the vision system fall out of sync, or if navigation and task scheduling become decoupled, the entire picking operation can halt.

Three design principles were applied to address this:

  • Interface optimization: Communication latency between the cloud TMS and onboard control system was reduced to under 100 ms through protocol optimization. Redundant communication channels ensure automatic failover if the primary link is interrupted.
  • Parallel execution with sequential verification: The control system pipelines operations where possible — for instance, the camera begins pre-scanning the next product while the arm is still completing the current transfer. A sequential verification step confirms each operation’s success before the next dependent action begins.
  • Self-diagnostic monitoring: The system runs continuous hardware and software health checks during operation. If any subsystem reports an anomaly, the task is automatically paused and the issue is escalated to TMS before it can affect downstream operations. Periodic firmware updates and parameter recalibration further ensure long-term stability.
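The second principle, parallel execution with sequential verification, can be sketched with Python's asyncio. In this illustrative example the camera pre-scan for the next product runs concurrently with the current transfer, and each transfer is verified before the next dependent pick begins. The coroutine names and the stubbed hardware calls are hypothetical placeholders, not the W1 control software.

```python
import asyncio


async def prescan(sku, log):
    """Stub for the camera pre-scan; a real system would run detection here."""
    log.append(f"scan:{sku}")
    await asyncio.sleep(0)  # yield control, standing in for I/O latency
    return {"sku": sku, "found": True}


async def pick_and_transfer(detection, log):
    """Stub for the arm pick and transfer; returns True on success."""
    log.append(f"pick:{detection['sku']}")
    await asyncio.sleep(0)
    return True


async def run_pipeline(skus):
    log = []
    if not skus:
        return log
    detection = await prescan(skus[0], log)
    for i, sku in enumerate(skus):
        if not detection["found"]:
            raise RuntimeError(f"verification failed: {sku} not detected")
        # Start the transfer and, in parallel, the pre-scan of the next item.
        transfer = asyncio.create_task(pick_and_transfer(detection, log))
        next_scan = None
        if i + 1 < len(skus):
            next_scan = asyncio.create_task(prescan(skus[i + 1], log))
        # Sequential verification: confirm this transfer before moving on.
        if not await transfer:
            if next_scan is not None:
                next_scan.cancel()
            raise RuntimeError(f"verification failed: transfer of {sku}")
        log.append(f"verified:{sku}")
        if next_scan is not None:
            detection = await next_scan
    return log


events = asyncio.run(run_pipeline(["bottle", "snack"]))
```

In the resulting event log, the scan of the second item starts before the first transfer is verified, while the second pick only starts afterwards, which is the overlap the principle describes.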

 

Performance Results

The W1 system has been validated in a 300 m² warehouse environment handling a diverse product mix. Key performance metrics include:

| Metric | Result |
| --- | --- |
| Average order completion time | 1.5 minutes per order (3 items) |
| Throughput | 40 orders per hour (continuous operation) |
| Product size range | 5 cm to 30 cm (cross-category) |
| Operation mode | 24/7 unattended operation |
| Positioning repeatability | ±0.05 mm (robotic arm) |
| Grasp point accuracy | ±1 mm (vision-guided) |
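The first two metrics are consistent with each other, as a quick calculation shows. The helper below is only a worked check of that arithmetic; the function name is illustrative, and the per-item figure is an average that includes travel and container handling, not a pure pick-cycle time.

```python
def throughput(minutes_per_order, items_per_order):
    """Derive hourly throughput and average time per item from order time."""
    orders_per_hour = 60.0 / minutes_per_order
    items_per_hour = orders_per_hour * items_per_order
    seconds_per_item = 3600.0 / items_per_hour
    return orders_per_hour, items_per_hour, seconds_per_item


# 1.5 minutes per 3-item order, as reported above.
orders, items, seconds = throughput(1.5, 3)
print(orders, items, seconds)  # 40.0 120.0 30.0
```

At 1.5 minutes per three-item order, continuous operation yields the stated 40 orders per hour, or 120 items per hour at an average of 30 seconds per item.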

 

Beyond raw throughput, several operational capabilities distinguish the system:

  • Multi-category handling without changeover. The robot transitions seamlessly between fragile goods, irregularly shaped items, and standard packaged products within the same picking run, with no manual intervention or tooling changes required.
  • Rapid scene adaptation. Deploying in a new environment requires only software configuration and parameter tuning — the robot can be operational in a new store or warehouse layout without physical infrastructure modifications.
  • Full traceability. Every picking action is logged with timestamps, product identification data, and success/failure status, providing the data foundation for continuous process optimization and regulatory compliance.
  • Multi-robot coordination. TMS supports multi-robot deployments within the same facility, dynamically allocating tasks to balance workloads and avoid spatial conflicts between robots.

Retail Deployment Scene — W1 operating in a live convenience store environment

 

Scalability and Cross-Industry Applications

One of the core design goals for the W1 platform was replicability: the ability to deploy the same fundamental system across different industries with minimal hardware modification. Because the system’s intelligence resides primarily in software — vision models, path planning algorithms, and task scheduling logic — adapting to a new domain typically requires only retraining the vision model on new product categories and reconfiguring TMS task parameters.

Industries with high replication potential include:

  • Pharmaceutical warehousing and unmanned pharmacies: Handling small, fragile, and irregularly packaged pharmaceutical products while maintaining the traceability and precision standards required by regulatory frameworks.
  • Cosmetics fulfillment: Precise picking of small, high-value items (lipsticks, skincare samples, fragrance bottles) where product damage directly impacts revenue.
  • Convenience retail and micro-fulfillment: Compact robot footprint enables deployment in space-constrained urban retail environments where traditional warehouse automation is impractical.
  • Light manufacturing and assembly: Picking and kitting small components for production lines, interfacing with existing MES (Manufacturing Execution System) platforms for just-in-time material delivery.

For system integrators, the standardized interface architecture reduces integration complexity and cost. The TMS API supports direct connection to existing ERP, WMS, and MES systems, enabling integrators to embed the picking capability into broader automation solutions without developing custom middleware. For end users, the operational model is straightforward: the robot’s ongoing maintenance requirements are minimal, and scaling up means adding W1 units and letting TMS automatically balance the expanded fleet.

 

Looking Forward: The Evolution of Autonomous Picking

The deployment described in this article represents an early but meaningful proof point for what we believe is a broader industry transition. As vision models become more capable, robotic arms become lighter and more dexterous, and edge computing delivers more processing power in smaller packages, the performance envelope of autonomous picking systems will continue to expand.

Several development directions are particularly promising. Foundation models for robotic manipulation could reduce the need for product-specific training, enabling robots to handle novel items with little or no prior exposure. Improved sim-to-real transfer techniques could accelerate deployment by allowing systems to be largely trained in simulation before physical installation. And tighter integration between robotic picking systems and broader warehouse orchestration platforms could enable truly lights-out fulfillment operations.

For the near term, however, the most impactful advances will likely come not from any single technological breakthrough, but from the continued refinement of system-level integration — making all the pieces work together more reliably, more efficiently, and across a wider range of real-world conditions. That is where the hardest engineering problems remain, and where the greatest operational value is yet to be unlocked.

 

RealMan Group is a system-level infrastructure platform company built for the era of embodied intelligence. Powered by a three-pillar flywheel of "hardware + data + remote operation networks," RealMan has forged an integrated development path spanning power-dense actuator joints, robotic arms, and complete robot systems; a labor-capability network that reconstructs work across time and space; data assets; a global robot super-factory; and a broad partner ecosystem. By continuously driving the iteration and evolution of embodied intelligence technologies, RealMan is accelerating its vision of robots serving human society at scale.

For more information, visit: www.realman-robotics.com

 

The content & opinions in this article are the author’s and do not necessarily represent the views of RoboticsTomorrow
