Projects

Description: This project aims to develop a model capable of human behavior and motion understanding and prediction.

Description: The goal of this internship project was to create a novel multi-object tracking architecture using transformers. I investigated existing multi-object tracking methods such as Global Tracking Transformers (GTR), Trackformer, and MOTR and sought to improve them. None of these methods fully exploits a transformer's ability to handle long-range dependencies. GTR feeds detections into a transformer to perform tracking over a window of frames; however, by feeding in only detections it loses contextual information and relies heavily on an accurate detector. Trackformer and MOTR, on the other hand, autoregressively feed in pairs of full frames, not just detections, to detect and track objects in the next frame, but they only operate on a window of two frames at a time. During this internship I helped develop an object tracking architecture that fuses multiple frames of features inside the transformer and outputs multiple frames of object detections and tracks.
Work on this project is ongoing and experiments are underway. We refer to our model as Trackformer++ as it builds on Trackformer. A visual comparison is shown below, followed by a rough sketch of the multi-frame idea.

[Figure: Transformer architectures for multi-object tracking: Global Tracking Transformer, Trackformer, and Trackformer++ (ours).]
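To make the fused-window idea concrete, here is a minimal PyTorch sketch. The shapes, query counts, and prediction heads are hypothetical stand-ins for illustration, not the actual Trackformer++ design.

```python
# Hypothetical sketch: fuse backbone features from a window of frames in one
# transformer, producing detections/tracks for every frame in the window.
import torch
import torch.nn as nn

class MultiFrameTracker(nn.Module):
    def __init__(self, d_model=256, num_queries=100, num_frames=4):
        super().__init__()
        # One set of object queries per frame in the window.
        self.queries = nn.Parameter(torch.randn(num_frames * num_queries, d_model))
        # Learned temporal encoding so the transformer can tell frames apart.
        self.frame_pos = nn.Parameter(torch.randn(num_frames, 1, d_model))
        self.transformer = nn.Transformer(d_model=d_model, batch_first=True)
        self.box_head = nn.Linear(d_model, 4)  # (cx, cy, w, h) per query
        self.cls_head = nn.Linear(d_model, 2)  # object vs. background

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, tokens, d_model) backbone features
        b, f, t, d = frame_feats.shape
        # Flatten the whole window into one long memory sequence.
        memory = (frame_feats + self.frame_pos.view(1, f, 1, d)).reshape(b, f * t, d)
        tgt = self.queries.unsqueeze(0).expand(b, -1, -1)
        hs = self.transformer(memory, tgt)
        return self.box_head(hs), self.cls_head(hs)  # per-frame outputs
```

The key difference from the two-frame setups is that the memory sequence spans the whole window, so attention can associate objects across any pair of frames in it.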

Co-authors: Michael Laielli

This is a research project investigating various forms of speech/audio communication for managing a fleet of autonomous robots.

[Figure: Example of the simulation running, the user interface, and the real robots they represent.]
Abstract:

Agriculture is facing a labor crisis, leading to increased interest in fleets of small, under-canopy robots (agbots) that can perform precise, targeted actions (e.g., crop scouting, weeding, fertilization) while being supervised remotely by human operators. However, farmers are not necessarily experts in robotics technology and will not adopt technologies that add to their workload or do not provide an immediate payoff. In this work, we explore methods for communication between a remote human operator and multiple agbots and examine the impact of audio communication on the operator's preferences and productivity. We develop a simulation platform where agbots are deployed across a field, randomly encounter failures, and call for help from the operator. As the agbots report errors, various audio communication mechanisms are tested to convey which robot failed and what type of failure occurred. The human is tasked with verbally diagnosing the failure while completing a secondary task. A user study was conducted to test three audio communication methods: earcons, single-phrase commands, and full-sentence communication. Each user completed a survey to determine each method's overall effectiveness and their preferences. Our results suggest that the system using short phrases is the most positively perceived by participants and may allow the human to complete the secondary task more efficiently.
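As a rough illustration of the failure-and-alert loop described above, here is a hedged Python sketch; the failure types, rates, and announce() formatting are invented for illustration and are not the actual study implementation.

```python
# Hypothetical sketch of the simulation's failure/alert loop. Failure types,
# probabilities, and message wording are placeholders, not the study's values.
import random
import time

FAILURE_TYPES = ["stuck", "low_battery", "sensor_fault"]

def announce(robot_id, failure, mode):
    """Render one failure event in the selected audio communication mode."""
    if mode == "earcon":
        return f"<tone #{FAILURE_TYPES.index(failure)}>"          # abstract sound cue
    if mode == "phrase":
        return f"Robot {robot_id}: {failure.replace('_', ' ')}"   # short phrase
    return (f"Attention: robot {robot_id} has stopped because it "
            f"encountered a {failure.replace('_', ' ')} failure.")  # full sentence

def simulate(num_robots=5, mode="phrase", steps=10, p_fail=0.1):
    for _ in range(steps):
        for rid in range(num_robots):
            if random.random() < p_fail:             # robot randomly fails
                failure = random.choice(FAILURE_TYPES)
                print(announce(rid, failure, mode))  # operator hears the alert
        time.sleep(0.1)                              # simulation tick

simulate()
```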

Full paper accepted @ IEEE RO-MAN 2022: Examining Audio Communication Mechanisms for Supervising Fleets of Agricultural Robots
Co-authors: Tianchen Ji, Dr. Katie Driggs-Campbell

Description:

I worked in NVIDIA's Autonomous Machines unit, on the Jetson Dev Tech team. The Jetson product line is NVIDIA's embedded AI GPU for edge IoT and mobile robotics applications. The goal of my project was to develop an open-source Scene Text Recognition (STR) system for NVIDIA partners, as well as for NVIDIA's own internal projects. I first researched and benchmarked various state-of-the-art STR models (CSTR, STRN, EasyOCR) and chose the two-stage EasyOCR framework for further development, as it performed the best. The first stage is text detection, where bounding boxes are drawn around the text; the second stage is recognition, where the image is cropped to those bounding boxes and classification is performed on the letters/words. I used NVIDIA's TensorRT framework to speed up the model's inference on the V100 GPU and the Jetson AGX Xavier (JAX). TensorRT approximately doubled the model's inference throughput. The V100 is one of NVIDIA's top industry-grade GPUs and thus performed better; however, the JAX performed remarkably well for its small form factor, so much so that it was able to perform STR in real time on a video stream at approximately 30 fps. This real-time video application was packaged in a Docker container for easy deployment. Below are some diagrams of the project. The open-source code is on the NVIDIA-AI-IOT GitHub: Scene Text Recognition GitHub

[Figure: Two-stage STR pipeline and additional project diagrams]
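As a minimal sketch of the two-stage flow, the snippet below uses EasyOCR's stock Python API on a placeholder image; the TensorRT acceleration step is omitted.

```python
# Minimal sketch of the two-stage EasyOCR flow (detection + recognition).
# 'sign.jpg' is a placeholder image path; TensorRT speedup is not shown.
import easyocr

reader = easyocr.Reader(['en'], gpu=True)  # loads detector + recognizer models

# readtext() runs both stages: detect text boxes, then recognize each crop.
for bbox, text, confidence in reader.readtext('sign.jpg'):
    print(f'{text!r} (conf={confidence:.2f}) at {bbox}')
```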
This was a final project for CS 433 Machine Learning at EPFL in Switzerland. It was completed in collaboration with a civil engineering lab at EPFL to accurately classify buildings by their window-to-facade ratio using images pulled from Google Street View. The project involved labeling data and running a semantic segmentation task for building classification using ResNet CNNs; a sketch of the ratio computation follows the figure below.
[Figure: Semantic segmentation labeling windows and building facade in images for the building classification task, using a self-made labeling tool.]
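The classification target, the window-to-facade ratio, can be computed directly from a predicted mask. Below is a minimal sketch; the class ids are assumptions, not the project's actual label map.

```python
# Hypothetical sketch: derive the window-to-facade ratio from a predicted
# segmentation mask. Class ids (0=background, 1=facade, 2=window) are
# assumptions for illustration.
import numpy as np

def window_to_facade_ratio(mask: np.ndarray) -> float:
    """mask: (H, W) array of per-pixel class ids from the segmentation model."""
    window_px = np.count_nonzero(mask == 2)
    facade_px = np.count_nonzero(mask == 1)
    return window_px / facade_px if facade_px else 0.0
```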
Abstract:

There is currently a very diverse range of building materials and construction styles used in cities throughout the world. Being able to track these materials and methods is important, as they determine the procedures for rehabilitation and repairs. Determining the make of a building requires recent data about it, which is not always readily available. Identifying each building, if not automated, would require an enormous amount of manpower. We hope to automate this task, providing a framework capable of fetching images of buildings from Google Street-View and classifying them.

Full paper for course (not published): Classification of Buildings using Google Street-View
Co-authors: Francisco Lozano, Cary Chai
Designed a Maximum Power Point Tracker (MPPT) solar charge controller for an autonomous aerial vehicle; a sketch of the control loop follows below.
[Figure: Solar MPPT Charge Controller research poster]
Project Website
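For illustration, here is a minimal sketch of the classic perturb-and-observe MPPT loop; read_panel() and set_duty() are hypothetical stand-ins for the real ADC and PWM drivers.

```python
# Hypothetical sketch of perturb-and-observe MPPT. read_panel() and set_duty()
# stand in for the real ADC/PWM hardware drivers.
def mppt_perturb_and_observe(read_panel, set_duty, step=0.005, duty=0.5):
    last_power = 0.0
    direction = 1
    while True:
        voltage, current = read_panel()  # sample panel voltage and current
        power = voltage * current
        if power < last_power:           # moved away from the power peak,
            direction = -direction       # so reverse the perturbation
        duty = min(max(duty + direction * step, 0.0), 1.0)
        set_duty(duty)                   # adjust converter duty cycle
        last_power = power
```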

VREP Obstacle Avoidance

Using iRobot with Lidar

Using gmapping in hallways

A* Path Planning

Snippets of the project (an illustrative A* sketch follows below); see the website for full details.
Project Website
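As referenced above, here is a minimal sketch of grid-based A* in the spirit of the path planning demo; the grid, start, and goal are stand-ins, not the project's actual map.

```python
# Hypothetical sketch of grid-based A* with a Manhattan-distance heuristic.
import heapq

def astar(grid, start, goal):
    """grid: 2D list, 0 = free, 1 = obstacle; start/goal: (row, col)."""
    def h(a, b):  # admissible heuristic on a 4-connected grid
        return abs(a[0] - b[0]) + abs(a[1] - b[1])
    open_set = [(h(start, goal), 0, start, [start])]
    seen = set()
    while open_set:
        _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] == 0 and (nr, nc) not in seen):
                heapq.heappush(open_set, (g + 1 + h((nr, nc), goal),
                                          g + 1, (nr, nc), path + [(nr, nc)]))
    return None  # no path found

# Example: plan around a small obstacle wall.
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))
```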