Polyglot Translator Robot
Multilingual translation robot using OCR, speech recognition, and robotic writing
OCR, YOLO, Python, ROS2, MoveIt2, Franka Emika Robot Arm
Authors: Allen Liu, Damien Koh, Kassidy Shedd, Henry Brown, Megan Black
GitHub: View this project on GitHub
Project Description
This project implements a multilingual translation robot system that processes input from either written text on a whiteboard or spoken audio through a microphone. The system performs real-time language detection, translation, and physically writes the translated output using a robotic arm.
System Workflow
flowchart TD
START([System Ready]) --> INPUT{Input Type?}
INPUT -->|Visual| CAMERA[Capture Whiteboard Image]
INPUT -->|Audio| MIC[Record Audio]
CAMERA --> HUMAN_DET[Human Detection<br/>YOLO]
HUMAN_DET -->|Person Detected| OCR_PROC[OCR Processing]
HUMAN_DET -->|No Person| CAMERA
MIC --> SPEECH_REC[Speech Recognition]
OCR_PROC --> LANG_DET[Language Detection]
SPEECH_REC --> LANG_DET
LANG_DET --> TRANS[Google Translate API]
TRANS --> GEN_WP[Generate Writing Waypoints]
GEN_WP --> CALIB[AprilTag Calibration]
CALIB --> PLAN[MoveIt2 Cartesian Path]
PLAN --> WRITE[Robot Writes Translation]
WRITE --> END([Complete])
style TRANS fill:#fff4e1
style PLAN fill:#e1f5ff
style OCR_PROC fill:#d4edda
The system integrates natural language processing, machine learning, computer vision, and robotics to create a cohesive multilingual translation platform with physical output capabilities.
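The control flow in the flowchart above can be sketched in plain Python. This is only an illustration: every callable passed in (`capture_image`, `person_detected`, `run_ocr`, and so on) is a hypothetical placeholder for the corresponding subsystem, and the `max_frames` limit is an assumption added so the sketch cannot loop forever.

```python
from typing import Callable, Optional


def run_once(
    input_type: str,
    capture_image: Callable[[], object],
    person_detected: Callable[[object], bool],
    run_ocr: Callable[[object], str],
    record_audio: Callable[[], object],
    recognize_speech: Callable[[object], str],
    detect_language: Callable[[str], str],
    translate: Callable[[str, str], str],
    max_frames: int = 100,
) -> Optional[str]:
    """Return the translated text for one input, or None if no text was obtained."""
    text: Optional[str] = None
    if input_type == "audio":
        text = recognize_speech(record_audio())
    elif input_type == "visual":
        for _ in range(max_frames):
            frame = capture_image()
            # As in the flowchart, OCR only runs on frames where a person is detected;
            # otherwise the loop captures another frame.
            if person_detected(frame):
                text = run_ocr(frame)
                break
    else:
        raise ValueError(f"unknown input type: {input_type!r}")

    if not text:
        return None
    source_language = detect_language(text)
    # Waypoint generation, calibration, and writing would follow this step.
    return translate(text, source_language)
```

Passing the subsystems in as callables keeps the sketch independent of any particular OCR, speech, or translation library.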
System Architecture
This project integrates five specialized subsystems to enable multilingual translation and robotic writing.
graph TB
subgraph Input["Input Processing"]
TEXT[Text Input<br/>Whiteboard]
AUDIO[Audio Input<br/>Microphone]
YOLO[YOLO Object Detection]
OCR[OCR Text Recognition]
SPEECH[Speech Recognition]
end
subgraph Translation["Translation Engine"]
DETECT_LANG[Language Detection]
TRANSLATE[Google Translate API]
end
subgraph Planning["Path Planning"]
STRING2WP[String to Waypoints<br/>Matplotlib]
APRILTAG[AprilTag Detection<br/>Whiteboard Calibration]
end
subgraph Execution["Robot Control"]
MOVEIT[MoveIt2 Cartesian Planner]
FRANKA[Franka Emika Robot Arm]
end
TEXT --> YOLO
YOLO --> OCR
AUDIO --> SPEECH
OCR --> DETECT_LANG
SPEECH --> DETECT_LANG
DETECT_LANG --> TRANSLATE
TRANSLATE --> STRING2WP
APRILTAG --> MOVEIT
STRING2WP --> MOVEIT
MOVEIT --> FRANKA
style TRANSLATE fill:#fff4e1
style MOVEIT fill:#e1f5ff
style OCR fill:#d4edda
Subsystem Responsibilities:
- writer (Allen): Cartesian path planning using MoveIt2 for writing characters on the whiteboard, with AprilTag-based calibration
- translation (Damien): Google Translate API integration for language translation
- computer_vision (Megan): YOLO object detection and OCR for text recognition and human detection
- string2waypoints (Kassidy): Matplotlib-based waypoint generation for character trajectories
- apriltags (Henry): AprilTag detection for whiteboard localization and orientation
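To illustrate how character outlines might become pen motions, here is a minimal sketch that turns 2D strokes into waypoints with a pen lift between strokes. It does not reproduce the project's string2waypoints code: the stroke format, the `strokes_to_waypoints` name, and the `scale` and `lift` parameters are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def strokes_to_waypoints(
    strokes: Sequence[Sequence[Point2D]],
    scale: float = 1.0,
    lift: float = 0.02,
) -> List[Point3D]:
    """Convert 2D strokes into whiteboard-frame waypoints (x, y, z).

    z = 0 means the pen touches the board; z = lift means it hovers above it.
    """
    waypoints: List[Point3D] = []
    for stroke in strokes:
        if not stroke:
            continue
        x0, y0 = stroke[0]
        # Approach the start of the stroke with the pen lifted.
        waypoints.append((x0 * scale, y0 * scale, lift))
        # Trace the stroke with the pen on the board.
        for x, y in stroke:
            waypoints.append((x * scale, y * scale, 0.0))
        xn, yn = stroke[-1]
        # Lift the pen before travelling to the next stroke.
        waypoints.append((xn * scale, yn * scale, lift))
    return waypoints


# Two vertical strokes, scaled so that one glyph unit is 10 cm.
example = strokes_to_waypoints([[(0, 0), (0, 1)], [(1, 0), (1, 1)]], scale=0.1)
```

Waypoints produced this way are expressed relative to the whiteboard, so they would still need to be mapped into the robot's frame using the calibration before planning.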
Features
Translate from Chinese to English
Translate from German to French
Translate from Spanish to Korean
Translate from Simplified Chinese to Traditional Chinese
Hindi Voice to English
Spanish Voice to English
Challenges
- Cartesian Path Planner: When we first integrated the find Cartesian path functionality through the MoveIt API, RViz showed that the robot had found a path, but the robot could not execute it. We reviewed our MoveIt API code, focusing on the function that calls the ComputeCartesianPath service. Comparing our implementation with the official MoveIt documentation revealed that we had not set the cartesian_speed_limit_link parameter. Once we supplied it, the robot executed the planned motion.
- TF tree when integrating apriltags: After we first added apriltags to the robot, it occasionally failed to move as intended and collided when approaching certain positions and orientations. To debug this, we examined the robot's TF tree and commanded the robot to move in every direction. The TF tree showed that introducing apriltags had changed the root frame from panda_link0, the robot's base frame, to camera_link, so our commands were being interpreted relative to camera_link rather than the base frame. Once we expressed the commands in the correct base frame, the robot moved as intended.
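The frame mix-up described above comes down to which coordinate frame a target is expressed in. The sketch below shows the underlying math for re-expressing a point from a child frame (such as camera_link) in a parent frame (such as panda_link0). In a ROS 2 system this conversion would normally be handled by tf2; the `transform_point` helper and the transform values here are invented for illustration and are not taken from the project.

```python
from typing import Sequence, Tuple

Vector = Tuple[float, float, float]
Matrix = Sequence[Sequence[float]]


def transform_point(rotation: Matrix, translation: Vector, point: Vector) -> Vector:
    """Express `point` (given in the child frame) in the parent frame: p' = R p + t."""
    x, y, z = (
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    return (x, y, z)


# Made-up pose of camera_link in panda_link0: rotated 180 degrees about z,
# offset 1.0 m along x and 0.5 m along z.
R_BASE_CAMERA = [
    [-1.0, 0.0, 0.0],
    [0.0, -1.0, 0.0],
    [0.0, 0.0, 1.0],
]
T_BASE_CAMERA = (1.0, 0.0, 0.5)

target_in_camera = (0.25, 0.5, 0.0)
target_in_base = transform_point(R_BASE_CAMERA, T_BASE_CAMERA, target_in_camera)
# target_in_base is (0.75, -0.5, 0.5)
```

With these example values, sending `target_in_camera` to a planner that expects base-frame coordinates would aim the arm at a point roughly a metre away from the intended one, which is the kind of error that can lead to collisions.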
Possible Improvements
- Some script languages still fail to be detected: To resolve this, we could refine the language model by training it on more data in those script languages, making them easier to detect.
- The camera source is sometimes dropped, which requires relaunching everything: This happens because the realsense camera package sometimes fails to detect the camera and throws an error. To address this, we could wrap the call in a protective function that catches the error and retries.
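The catch-and-retry idea for the camera could look roughly like the following. This is a generic sketch, not code from the project: the `retry` helper, its parameters, and the choice of `RuntimeError` as the error to catch are assumptions, and the real exception type depends on how the camera node reports failure.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 3,
    delay_s: float = 1.0,
    exceptions: Tuple[Type[BaseException], ...] = (RuntimeError,),
) -> T:
    """Call `action` until it succeeds or `attempts` calls have failed."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise  # Out of retries: let the caller see the last error.
            time.sleep(delay_s)  # Give the device time to recover before retrying.
    raise AssertionError("unreachable")
```

A hypothetical `open_camera` function could then be called as `retry(open_camera, attempts=5, delay_s=2.0)`, so that a transient detection failure no longer forces a full relaunch.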