RobotSmith: Generative Robotic Tool Design
for Acquisition of Complex Manipulation Skills

1University of Massachusetts Amherst, 2Massachusetts Institute of Technology, 3National University of Singapore, 4NVIDIA, 5MIT-IBM Watson AI Lab

Abstract

Endowing robots with tool design abilities is critical for enabling them to solve complex manipulation tasks that would otherwise be intractable. While recent generative frameworks can automatically synthesize task settings, such as 3D scenes and reward functions, they have not yet addressed tool-use scenarios. Simply retrieving human-designed tools is often not ideal, since many tools (e.g., a rolling pin) are difficult for robotic manipulators to handle. Furthermore, existing tool design approaches either rely on predefined templates with limited parameter tuning or apply generic 3D generation methods that are not optimized for tool creation. To address these limitations, we propose RobotSmith, an automated pipeline that combines the implicit physical knowledge embedded in vision-language models (VLMs) with the more accurate physics of simulation to design and use tools for robotic manipulation. Our system (1) iteratively proposes tool designs using collaborative VLM agents, (2) generates low-level robot trajectories for tool use, and (3) jointly optimizes tool geometry and usage for task performance. We evaluate our approach across a wide range of manipulation tasks involving rigid, deformable, and fluid objects. Experiments show that our method consistently outperforms strong baselines in both task success rate and overall performance. Notably, our approach achieves a 50.0% average success rate, significantly surpassing baselines such as 3D generation (21.4%) and tool retrieval (11.1%). Finally, we deploy our system in real-world settings, demonstrating that the generated tools and their usage plans transfer effectively to physical execution, which supports the practicality and generalization capabilities of our approach.
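To make step (3) concrete, here is a minimal sketch of jointly optimizing a tool parameter and a usage parameter on a toy, one-dimensional version of the "reach a cube outside workspace" task. Everything in it is an assumption made for illustration: the constants `CUBE_X` and `MAX_REACH`, the function names, the hand-written loss that stands in for a simulated rollout, and the random-search hill climber. The abstract does not say which optimizer or simulator interface RobotSmith uses, so this should not be read as the actual method.

```python
import random

CUBE_X = 1.5      # hypothetical cube position, beyond the arm's reach
MAX_REACH = 1.0   # hypothetical maximum arm extension


def task_loss(params):
    """Toy stand-in for a simulated rollout; lower is better."""
    tool_length, arm_extension = params
    # Clamp both parameters to physically meaningful ranges.
    tool_length = max(tool_length, 0.0)
    arm_extension = min(max(arm_extension, 0.0), MAX_REACH)
    tip_x = arm_extension + tool_length
    # Reach error plus a small penalty that favors shorter tools.
    return abs(CUBE_X - tip_x) + 0.05 * tool_length


def joint_optimize(init, iters=2000, step=0.1, seed=0):
    """Hill-climb over geometry and usage parameters at the same time."""
    rng = random.Random(seed)
    best, best_loss = list(init), task_loss(init)
    for _ in range(iters):
        # Perturb the tool length and the arm extension together.
        cand = [p + rng.gauss(0.0, step) for p in best]
        loss = task_loss(cand)
        if loss < best_loss:
            best, best_loss = cand, loss
    return best, best_loss


init = [0.1, 0.2]  # [tool_length, arm_extension], an arbitrary starting guess
best, best_loss = joint_optimize(init)
print("tool_length, arm_extension:", best, "loss:", best_loss)
```

Perturbing both parameters in one candidate lets the search trade tool length against arm extension, which a search that tunes geometry and usage separately could miss. In this toy setup the length penalty pushes the solution toward a shorter tool used at a larger arm extension.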

Task Gallery

Reach a cube outside workspace

Hold a phone upright

Lift a bowl

Lift a piggy

Make a calabash-shaped dough

Flatten the dough

Cut the dough into two halves

Transport water

Pour water into a bottle

Real-World Experiments

Make a calabash-shaped dough (1x)

Fill a bottle with water (1x)

Hold a phone upright (1x)

Make a pancake (16x)