Polyglot Translator Robot
OCR, YOLO, Python, ROS2, MoveIt2, Franka Emika Robot Arm
Authors: Allen Liu, Damien Koh, Kassidy Shedd, Henry Brown, Megan Black
GitHub: View this project on GitHub
Project Description
This project implements a multilingual translation robot system that processes input from either written text on a whiteboard or spoken audio through a microphone. The system performs real-time language detection, translation, and physically writes the translated output using a robotic arm.
System Workflow
```mermaid
flowchart TD
    START([System Ready]) --> INPUT{Input Type?}
    INPUT -->|Visual| CAMERA[Capture Whiteboard Image]
    INPUT -->|Audio| MIC[Record Audio]
    CAMERA --> HUMAN_DET[Human Detection<br/>YOLO]
    HUMAN_DET -->|Person Detected| OCR_PROC[OCR Processing]
    HUMAN_DET -->|No Person| CAMERA
    MIC --> SPEECH_REC[Speech Recognition]
    OCR_PROC --> LANG_DET[Language Detection]
    SPEECH_REC --> LANG_DET
    LANG_DET --> TRANS[Google Translate API]
    TRANS --> GEN_WP[Generate Writing Waypoints]
    GEN_WP --> CALIB[AprilTag Calibration]
    CALIB --> PLAN[MoveIt2 Cartesian Path]
    PLAN --> WRITE[Robot Writes Translation]
    WRITE --> END([Complete])

    style TRANS fill:#fff4e1
    style PLAN fill:#e1f5ff
    style OCR_PROC fill:#d4edda
```
The system integrates natural language processing, machine learning, computer vision, and robotics to create a cohesive multilingual translation platform with physical output capabilities.
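To make the detect-then-translate step in the workflow concrete, here is a minimal sketch. The project names only the Google Translate API, so the unofficial `googletrans` package below is a stand-in assumption, not the project's actual client:

```python
# Minimal sketch of the detect-then-translate step. The README names only
# "Google Translate API"; the unofficial `googletrans` package is used here
# as a stand-in, so the package and names are assumptions, not project code.
from googletrans import Translator

def detect_and_translate(text: str, target_lang: str = "en") -> str:
    translator = Translator()
    detected = translator.detect(text)          # e.g. detected.lang == "es"
    result = translator.translate(text, src=detected.lang, dest=target_lang)
    return result.text

print(detect_and_translate("Hola, mundo"))      # -> "Hello, world"
```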
System Architecture
This project integrates five specialized subsystems to enable multilingual translation and robotic writing.
```mermaid
graph TB
    subgraph Input["Input Processing"]
        TEXT[Text Input<br/>Whiteboard]
        AUDIO[Audio Input<br/>Microphone]
        YOLO[YOLO Object Detection]
        OCR[OCR Text Recognition]
        SPEECH[Speech Recognition]
    end
    subgraph Translation["Translation Engine"]
        DETECT_LANG[Language Detection]
        TRANSLATE[Google Translate API]
    end
    subgraph Planning["Path Planning"]
        STRING2WP[String to Waypoints<br/>Matplotlib]
        APRILTAG[AprilTag Detection<br/>Whiteboard Calibration]
    end
    subgraph Execution["Robot Control"]
        MOVEIT[MoveIt2 Cartesian Planner]
        FRANKA[Franka Emika Robot Arm]
    end

    TEXT --> YOLO
    YOLO --> OCR
    AUDIO --> SPEECH
    OCR --> DETECT_LANG
    SPEECH --> DETECT_LANG
    DETECT_LANG --> TRANSLATE
    TRANSLATE --> STRING2WP
    APRILTAG --> MOVEIT
    STRING2WP --> MOVEIT
    MOVEIT --> FRANKA

    style TRANSLATE fill:#fff4e1
    style MOVEIT fill:#e1f5ff
    style OCR fill:#d4edda
```
Subsystem Responsibilities:
- `writer` (Allen): Cartesian path planning using MoveIt2 for writing characters on the whiteboard, with AprilTag-based calibration
- `translation` (Damien): Google Translate API integration for language translation
- `computer_vision` (Megan): YOLO object detection and OCR for text recognition and human detection
- `string2waypoints` (Kassidy): Matplotlib-based waypoint generation for character trajectories (see the sketch after this list)
- `apriltags` (Henry): AprilTag detection for whiteboard localization and orientation
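To make the `string2waypoints` idea concrete, here is a minimal sketch that turns a string into 2D pen strokes with Matplotlib's `TextPath`. The function name and stroke representation are assumptions for illustration, not the project's actual implementation:

```python
# Illustrative sketch only: turn a string into 2D pen strokes using
# Matplotlib's TextPath. Names and the stroke representation are assumptions.
from matplotlib.textpath import TextPath

def string_to_waypoints(text: str, size: float = 0.05) -> list:
    """Return a list of strokes; each stroke is a list of (x, y) waypoints,
    with a pen lift between strokes."""
    path = TextPath((0.0, 0.0), text, size=size)
    # to_polygons() flattens the glyph outlines into discrete point lists
    return [[(float(x), float(y)) for x, y in poly]
            for poly in path.to_polygons()]

strokes = string_to_waypoints("Hola")
print(f"{len(strokes)} strokes, first has {len(strokes[0])} points")
```

Per the workflow above, these 2D strokes would then be mapped onto the whiteboard plane via the AprilTag calibration before Cartesian planning.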
Features
- Translate from Chinese to English
- Translate from German to French
- Translate from Spanish to Korean
- Translate from Simplified Chinese to Traditional Chinese
- Hindi voice to English
- Spanish voice to English
Challenges
- Cartesian Path Planner: When we first wired up the `find cartesian path` functionality through the `MoveIt` API, `RViz` reported that a path had been found, yet the robot would not execute it. We reviewed our call to the `ComputeCartesianPath` service against the official `MoveIt` documentation and found that a required parameter, `cartesian_speed_limit_link`, had never been set in our code. Once we supplied it, the robot executed the intended movements (see the request sketch after this list).
- TF tree when integrating `apriltags`: After first adding `apriltags` to the robot, it occasionally failed to move as intended, colliding when approaching certain orientations and positions. Debugging meant examining the robot's TF tree and sending commands in every possible direction. The key insight: introducing `apriltags` shifted the root of the TF tree from `panda_link0`, the robot's base frame, to `camera_link`, so our commands were being interpreted relative to `camera_link` rather than the base frame. After re-expressing the commands in the correct base frame, the robot moved flawlessly (see the frame-fix sketch after this list).
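For reference, a minimal rclpy sketch of the corrected Cartesian path request. Note that recent `moveit_msgs` releases spell this request field `cartesian_speed_limited_link`; verify the name against your installed version. The group and link names below are assumptions for a Franka setup, not taken from the project's code:

```python
# Sketch of the corrected ComputeCartesianPath request (rclpy). Field names
# follow recent moveit_msgs, where the parameter is spelled
# `cartesian_speed_limited_link`; check your installed version.
# Group/link names are assumptions for a Franka setup.
from rclpy.node import Node
from moveit_msgs.srv import GetCartesianPath

class CartesianClient(Node):
    def __init__(self):
        super().__init__("cartesian_client")
        self.client = self.create_client(GetCartesianPath,
                                         "compute_cartesian_path")

    def plan(self, waypoints):
        req = GetCartesianPath.Request()
        req.header.frame_id = "panda_link0"      # plan relative to the base frame
        req.group_name = "panda_manipulator"     # assumption: Franka planning group
        req.link_name = "panda_hand_tcp"         # assumption: end-effector link
        req.waypoints = waypoints                # list of geometry_msgs/Pose
        req.max_step = 0.01                      # interpolation resolution (m)
        req.avoid_collisions = True
        req.max_cartesian_speed = 0.1            # m/s cap while writing
        req.cartesian_speed_limited_link = "panda_hand_tcp"  # the missing piece
        return self.client.call_async(req)
```

And a minimal sketch of the frame fix: explicitly re-express every goal pose in `panda_link0` via TF before commanding the robot, so it no longer matters which frame happens to be the TF root. Names are again illustrative:

```python
# Sketch of the frame fix: re-express goal poses in the robot base frame
# (`panda_link0`) before commanding MoveIt2, regardless of the TF root.
from rclpy.duration import Duration
from rclpy.node import Node
from geometry_msgs.msg import PoseStamped
from tf2_ros import Buffer, TransformListener
import tf2_geometry_msgs  # noqa: F401 -- registers PoseStamped for Buffer.transform

class FrameFixer(Node):
    def __init__(self):
        super().__init__("frame_fixer")
        self.tf_buffer = Buffer()
        self.tf_listener = TransformListener(self.tf_buffer, self)

    def to_base_frame(self, goal: PoseStamped) -> PoseStamped:
        # goal.header.frame_id may be "camera_link" once apriltags are running;
        # transform it into panda_link0 so commands mean what we intend.
        return self.tf_buffer.transform(goal, "panda_link0",
                                        timeout=Duration(seconds=1.0))
```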
Possible Improvements
- Some script languages still fail to be detected: we could refine the language-detection model by training it on more data for those scripts, making them easier to detect.
- The camera source sometimes drops, requiring a full re-launch: this happens when the `realsense` camera package fails to detect the camera and throws an error. We could wrap the camera startup in error handling that catches the exception and retries (a sketch follows below).
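A minimal sketch of that retry idea, with `open_camera` as a hypothetical stand-in for whatever call initializes the RealSense pipeline in the real launch code:

```python
# Sketch of the retry idea. `open_camera` is a hypothetical stand-in for
# whatever call brings up the RealSense pipeline in the real launch code.
import time

def with_retries(fn, attempts: int = 5, delay_s: float = 2.0):
    last_err = None
    for i in range(1, attempts + 1):
        try:
            return fn()
        except RuntimeError as err:          # the camera error surfaces here
            last_err = err
            print(f"camera init failed (attempt {i}/{attempts}): {err}")
            time.sleep(delay_s)
    raise RuntimeError("camera never came up") from last_err

# usage: pipeline = with_retries(open_camera)
```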