Machine Learning Tool Recognition

Olin Human Interactions Robotics Lab, 2016

You Only Look Once (YOLO)

YOLO is a real-time object detection algorithm published by Joseph Redmon et al. in 2015.

Before YOLO, object detection networks like R-CNN (Region-based Convolutional Neural Network) used a multistep process: first propose candidate object regions, then run a classifier on each region.

YOLO reduced this to a single-step process, which achieved mAP on the PASCAL VOC benchmark comparable to the state-of-the-art algorithms of the time while running at far higher FPS (frames per second). [1]
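The single step works by framing detection as one regression over a fixed-size output tensor. A small sketch of that encoding, using the values from the original paper (grid size S = 7, B = 2 boxes per cell, C = 20 PASCAL VOC classes):

```python
# Sketch of YOLO's single-step output encoding (constants from the
# original paper). The image is split into an S x S grid; each cell
# predicts B boxes (x, y, w, h, confidence) plus C class probabilities.
S, B, C = 7, 2, 20           # grid size, boxes per cell, VOC classes
per_cell = B * 5 + C         # 5 numbers per box + class scores
output_size = S * S * per_cell
print(output_size)           # 7 * 7 * 30 = 1470
```

The whole tensor comes out of one forward pass of the network, which is what eliminates the separate region-proposal step.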

Cooperative Task Completion

When I started this project in 2016, YOLO had just been released and follow-up versions were in active development. [2]

I wanted to explore the use of machine learning for real-time object detection in autonomous robotics, so I set out to create an immersive experience where a coworker robot assists a human at a workstation. Although the human may have their hands full with a task, they can verbally request a tool and the coworker robot will locate the requested tool and hand it to them.

For my coworker robot, I used the Olin Interactive Robotics Lab's [3] R-17 robotic arm as it was already equipped with a "head" sensor rig with endoscopic cameras and a microphone and thus required minimal retooling.

Software

The software was divided into three major functions: speech parsing, tool detection and classification, and tool grasp. Each function communicated via ROS publishers and subscribers.
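The publish/subscribe pattern those three functions used can be sketched with a minimal in-process message bus (the real project used ROS topics; the topic name "tool_request" and the class below are illustrative, not the project's actual names):

```python
# Minimal in-process sketch of the publish/subscribe pattern connecting
# the three functions. The real project used ROS publishers and
# subscribers; the topic name here is illustrative.
from collections import defaultdict

class Bus:
    """Routes published messages to every callback subscribed to a topic."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)

bus = Bus()
received = []

# The detection node listens for parsed speech commands...
bus.subscribe("tool_request", lambda tool: received.append(tool))

# ...and the speech node publishes the requested tool name.
bus.publish("tool_request", "hammer")
print(received)  # ['hammer']
```

Decoupling the nodes this way means each function can be developed and restarted independently, which is the main reason ROS structures robot software around topics.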

To parse speech, I used PyAudio to collect sound samples and PocketSphinx to decode the sound into a list of string commands, with some basic denoising.
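Once PocketSphinx produces a decoded string, it still has to be mapped to a tool the robot knows about. A sketch of that mapping step (the tool vocabulary and function name below are my own illustration, not the project's actual code):

```python
# Illustrative sketch of turning a decoded utterance into a tool command.
# The vocabulary and function name are assumptions, not the project's code.
KNOWN_TOOLS = {"hammer", "screwdriver", "wrench", "pliers"}

def parse_command(decoded_text):
    """Return the first known tool mentioned in an utterance, or None."""
    for word in decoded_text.lower().split():
        if word in KNOWN_TOOLS:
            return word
    return None

print(parse_command("please hand me the hammer"))  # hammer
print(parse_command("how is the weather"))         # None
```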

For object detection and classification, I trained YOLO v3 on a custom dataset of tools. [4] Because existing training databases like ImageNet didn't have enough images of tools, I used the Python package BeautifulSoup to write a web scraper that collected images of hammers, screwdrivers, and other tools.
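The core of such a scraper is parsing each page's HTML for image URLs. A minimal sketch of that step with BeautifulSoup (the sample page and helper function are illustrative; the real scraper also fetched pages and downloaded each image):

```python
# Sketch of the HTML-parsing step of the image scraper. The sample page
# and helper below are illustrative; the real scraper also fetched pages
# over HTTP and downloaded every image it found.
from bs4 import BeautifulSoup

def extract_image_urls(html):
    """Collect the src attribute of every <img> tag on a page."""
    soup = BeautifulSoup(html, "html.parser")
    return [img.get("src") for img in soup.find_all("img") if img.get("src")]

sample_page = """
<html><body>
  <img src="https://example.com/hammer1.jpg">
  <img src="https://example.com/screwdriver2.png">
</body></html>
"""
print(extract_image_urls(sample_page))
```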

My project partner, Yichen Jiang, wrote the tool grasp functionality to control the coworker robot to retrieve an input tool. We ran a live demo at Olin's 2017 Spring Exposition for the public.

Public Project Artifacts

YouTube: 2017 Olin College Spring Exposition

References

[1] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, You Only Look Once: Unified, Real-Time Object Detection (2015), arXiv:1506.02640

[2] J. Redmon and A. Farhadi, YOLOv3: An Incremental Improvement (2018), arXiv:1804.02767

[3] The Olin Interactive Robotics Lab's name has since changed to HIRo (Human Interactions Robotics Lab)

[4] A. Chakravarthy, Training YOLOv3 on Your Custom Dataset, Medium, https://medium.com/@anirudh.s.chakravarthy/training-yolov3-on-your-custom-dataset-19a1abbdaf09