Grasp planning is one of the most challenging tasks in robot manipulation. Apart from perception ambiguity, grasp robustness and successful execution rely heavily on the dynamics of the robotic hands. The student is expected to research and develop benchmarking environments and evaluation metrics for grasp planning. Development in simulation environments such as Isaac Sim and Gazebo will allow us to integrate and evaluate different robotic hands for grasping a variety of everyday objects. We will evaluate grasp performance using different metrics (e.g., object-category-wise, affordance-wise, etc.), and finally, test the sim2real gap when transferring such approaches from popular simulators to real robots.
The student will have the chance to work with different robotic hands (Justin hand, PAL TIAGo hands, Robotiq gripper, Panda gripper, etc.) and is expected to transfer the results to at least two robots (Rollin’ Justin at DLR and TIAGo++ at TU Darmstadt). The results of this thesis are intended to be made public (both the data and the benchmarking framework) for the benefit of the robotics community.
As this thesis is offered in collaboration with the DLR Institute of Robotics and Mechatronics in Oberpfaffenhofen near Munich, the student is expected to work at DLR for a period of 8 months for the thesis. On-site work at the premises of DLR can be expected but not guaranteed due to COVID-19 restrictions. A large part of the project can be carried out remotely.
 Collins, Jack, Shelvin Chand, Anthony Vanderkop, and David Howard. “A Review of Physics Simulators for Robotic Applications.” IEEE Access (2021).
 Bekiroglu, Y., Marturi, N., Roa, M. A., Adjigble, K. J. M., Pardi, T., Grimm, C., … & Stolkin, R. (2019). Benchmarking protocol for grasp planning algorithms. IEEE Robotics and Automation Letters, 5(2), 315-322.
Georgia Chalvatzaki gave a talk on Thursday, March 18 at the Learning and Intelligent Systems group of TU Berlin. She presented recent results, but also her vision and ongoing research within the iROSA group and her Emmy Noether research project.
Three papers got accepted in ICRA2021, whose topics will be directly extended in the context of the iROSA project.
Tosatto, S.; Chalvatzaki, G.; Peters, J. (2021). Contextual Latent-Movements Off-Policy Optimization for Robotic Manipulation Skills, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).
Li, Q.; Chalvatzaki, G.; Peters, J.; Wang, Y. (2021). Directed Acyclic Graph Neural Network for Human Motion Prediction, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).
Morgan, A.; Nandha, D.; Chalvatzaki, G.; D’Eramo, C.; Dollar, A.; Peters, J. (2021). Model Predictive Actor-Critic: Accelerating Robot Skill Acquisition with Deep Reinforcement Learning, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).
The newly founded Intelligent Robotics for Assistance (iROSA) group, led by Dr. Georgia Chalvatzaki, in cooperation with the Intelligent Autonomous Systems Lab (IAS) at the Technical University of Darmstadt (TU Darmstadt) is seeking a Ph.D. student with a strong interest in the following research topic:
Robot Learning Task and Motion Planning of Long-horizon (mobile) manipulation tasks.
The Ph.D. student will work on a highly interdisciplinary topic at the intersection of machine learning and classical robotics. Following the increasing demand for embodied AI agents that will serve as assistants in houses, workplaces, etc., we will research how intelligent behavior may be acquired by the continual purposeful interaction of an agent with an environment and the induced sensorimotor experience. Our central research question is “How can embodied AI systems, specifically mobile manipulator robots, acquire skills for performing long-horizon assistive tasks in human-inhabited environments?”. As planning for assistive tasks requires impractical computational time, coupling planning with learning methods is key to advancing the state-of-the-art in the field of mobile manipulation. Before the introduction of deep Reinforcement Learning, learning methods were not able to scale well to high-dimensional problems, thus prohibiting their use in real-world problems. iROSA group aims to create mobile manipulation robot assistants with the ability to intelligently acquire their skills, fluently interact with humans through handover tasks, and dynamically adapt their behavior for accomplishing long-horizon household tasks, like the fetch-carry-handover, in human-inhabited environments. For long-horizon planning, we will explore ideas from classic Task and Motion planning, Graph neural networks for combinatorial optimization, Hierarchical Reinforcement Learning, self-supervised Representation Learning, etc.
Students and researchers from the areas of robotics and robotics-related areas including machine learning, control engineering, and computer vision are welcome to apply. The candidates are expected to conduct independent research and at the same time contribute to the research topics listed above. Women and people of underrepresented minority groups are strongly encouraged to apply.
ABOUT THE APPLICANT
Ph.D. position applicants need to have a Master’s degree (high grade required) in a relevant field (e.g., Robotics, Computer Science, Engineering, Statistics & Optimization, Math, and Physics). Expertise in working with real robot systems (including, e.g., programming in ROS and sensor data processing) and/or in computer vision and deep learning is a big plus. Note that we strongly favor candidates with real robot experience.
The position is for a 36-month contract. Payment will be according to the German TVL payment scheme.
The position is planned to start between June 2021 and September 2021 depending on the candidate’s availability.
There is no official deadline, but we will adopt a first-come, first-served policy!
Ph.D. applicants should provide a comprehensive research statement on their research experience and motivation about the Ph.D. topic, a PDF with their CV, degrees (Bachelor’s and Master’s), and grade-sheets, and at least two references who are willing to write a recommendation letter.
Please state clearly how your experience in robotics, computer vision, and machine learning relates to the offered topics in your Research Statement.
Please be sure to include your date of availability for starting the Ph.D. position.
After submitting the application, send a quick notification with the subject line “Ph.D. student applicant <your name>” to Dr. Georgia Chalvatzaki (email@example.com) and include your application number in the e-mail.
ABOUT iROSA and IAS
The iROSA group (https://irosalab.com/) is a newly founded group on intelligent robotics for assistance led by Dr. Georgia Chalvatzaki (https://www.ias.informatik.tu-darmstadt.de/Team/GeorgiaChalvatzaki). Georgia, previously a postdoctoral researcher at the Intelligent Autonomous Systems group (IAS) in the Department of Computer Science at TU Darmstadt, has been accepted into the renowned Emmy Noether Programme (ENP) of the German Research Foundation (DFG) in 2021. This project was awarded within the ENP Artificial Intelligence call of the DFG – only 9 proposals out of 91 proposals were selected for funding. It enables outstanding young scientists to qualify for a university professorship by independently leading a junior research group over six years. In her research group iROSA, Dr. Chalvatzaki and her new team will research the topic of “Robot Learning of Mobile Manipulation for Assistive Robotics”. Dr. Chalvatzaki proposes new methods at the intersection of machine learning and classical robotics, taking one step further the research for embodied AI robotic assistants. The research in iROSA proposes novel methods for combined planning and learning for enabling mobile manipulator robots to solve complex tasks in house-like environments, with the human-in-the-loop of the interaction process. The iROSA group has access to two bi-manual manipulator robots TIAGo++ by PAL robotics, a dedicated OptiTrack Motion Capture System, Kinect Azure, and RealSense cameras, a cluster for accelerated computing, etc.
Dr. Chalvatzaki completed her Ph.D. studies in 2019 at the Intelligent Robotics and Automation Lab at the Electrical and Computer Engineering School of the National Technical University of Athens, Greece, with her thesis “Human-Centered Modeling for Assistive Robotics: Stochastic Estimation and Robot Learning in Decision Making.” During her career, she has worked on eight research projects, and she has published more than 35 papers (Google scholar), most of which in top-tier robotics and machine learning venues, e.g., ICRA, IROS, RA-L. Her research at the Computer Science department of TU Darmstadt has been about human-robot collaboration and joint action. In her recent work, she focused on robotic grasping, manipulation, and motion prediction, introducing novel methods for orientation attentive grasp synthesis, accelerated skill learning, and human intention prediction.
The IAS group of TUDa (https://www.ias.informatik.tu-darmstadt.de/) is considered one of the strongest robot learning groups in Europe with expertise ranging from the development of novel machine learning methods (e.g., novel reinforcement learning approaches, policy search, imitation learning, regression approaches, etc.) over semi-autonomy of intelligent systems (e.g., shared control, interaction primitives, human-collaboration during manufacturing) to fully autonomous robotics (e.g., robot learning architectures, motor skill representation acquisition & refinement, grasping, manipulation, tactile sensing, nonlinear control, operational space control, robot table tennis). IAS members are well-known researchers both in the machine learning and the robotics community. The lab collaborates with numerous universities in Germany, Europe, the USA, and Japan as well as companies such as ABB, Honda Research, Franka Emika, and Porsche Motorsport. The iROSA and the IAS lab are located in the city center campus of TU-Darmstadt close to the beautiful Herrngarten park.
ABOUT TU DARMSTADT
TU Darmstadt is one of Germany’s top technical universities and is well known for its research and teaching. It was one of the first universities in the world to introduce electrical engineering programs, and it is Germany’s first fully autonomous university. More information can be found on https://en.wikipedia.org/wiki/Technische_Universit%C3%A4t_Darmstadt
Darmstadt is a well-known high-tech center with essential activities in spacecraft operations (e.g., through the European Space Operations Centre, the European Organization for the Exploitation of Meteorological Satellites), chemistry, pharmacy, information technology, biotechnology, telecommunications, and mechatronics, and consistently ranked among the Top high-tech regions in Germany. Darmstadt’s important centers for arts, music, and theatre allow for versatile cultural activities, while the proximity of the Odenwald forest and the Rhine valley allows for many outdoor sports. The 33,547 students of Darmstadt’s three universities constitute a significant part of Darmstadt’s 140,000 inhabitants. Darmstadt is located close to the center of Europe. With just 17 minutes driving distance to the Frankfurt airport (closer than Frankfurt itself), it is one of Europe’s best-connected cities. Most major European cities can be reached within less than 2.5h from Darmstadt.
Abstract: Societal facts like the increase in life expectancy, the lack of nursing staff, the hectic rhythms of everyday life, and the recent unprecedented situation of the Covid-19 pandemic, make the need for intelligent robotic assistants more urgent than ever. Spanning their applications from home-environments to hospitals, workhouses to agricultural development, etc., the embodied AI robotic assistants are in the epicenter of modern robotics and AI research. In this talk, I will go through my research work for developing intelligent methods for such assistive agents. We will draw the big picture of intelligent service robots, and we will specifically focus on sub-problems that I have tackled in the last few years. The main research areas we will cover consider: the perception and recognition of human activities, combining classical methods like tracking with machine learning for extracting useful multi-sensor human-related information for robot action planning and control; algorithms for encoding object-features that allow 6D tracking and grasp-planning; we will discuss methods that can leverage human-centered information for learning intelligent robot behavior using reinforcement learning; and, I will elaborate on our recent work about accelerated policy learning of manipulation tasks both through the effective combination of imitation and reinforcement learning, but also through a novel method for model-predictive policy optimization. While these topics cover only partial aspects of the bigger problem, we will discuss the open research questions on the combination of learning, reasoning, and planning in unstructured environments using mobile manipulator robots. Mobile manipulators are the most emblematic systems to encapsulate the benefits of embodied AI research towards achieving the long-term vision of developing intelligent robotic assistants.
Abstract: Inherent morphological characteristics in objects may offer a wide range of plausible grasping orientations that obfuscates the visual learning of robotic grasping. Existing grasp generation approaches are cursed to construct discontinuous grasp maps by aggregating annotations for drastically different orientations per grasping point. Moreover, current methods generate grasp candidates across a single direction in the robot’s viewpoint, ignoring its feasibility constraints. In this paper, we propose a novel augmented grasp map representation, suitable for pixel-wise synthesis, that locally disentangles grasping orientations by partitioning the angle space into multiple bins. Furthermore, we introduce the ORientation AtteNtive Grasp synthEsis (ORANGE) framework, that jointly addresses classification into orientation bins and angle-value regression. The bin-wise orientation maps further serve as an attention mechanism for areas with higher graspability, i.e. probability of being an actual grasp point. We report new state-of-the-art 94.71% performance on Jacquard, with a simple U-Net using only depth images, outperforming even multi-modal approaches. Subsequent qualitative results with a real bi-manual robot validate ORANGE’s effectiveness in generating grasps for multiple orientations, hence allowing planning grasps that are feasible.
In this work, we tackle the problem of disentangling the possible orientations per grasp point. To this end, we propose a novel augmented grasp representation that parses annotated grasps into multiple orientation bins. Stemming from this representation, we introduce an orientation-attentive method for predicting pixel-wise grasp configurations from depth images. We classify the grasps according to their orientations into discrete bins, while we regress their values for continuous estimation of the grasp orientation per bin.
Moreover, this orientation map acts as a bin-wise attention mechanism over the grasp quality map, to teach a CNN-based model to focus its attention on the actual grasp points of the object. The proposed method, named ORANGE (ORientation AtteNtive Grasp synthEsis), is model-agnostic, as it can be interleaved with any CNN-based approach capable of performing segmentation while boosting their performance for improved grasp predictions. ORANGE achieves state-of-the-art results on the most challenging grasping dataset, acquiring 94.71% using only the depth modality, against all other related methods. Knowledge from ORANGE can also be easily transferred and leads to significantly accurate predictions on the much smaller dataset Cornell. Moreover, our analysis is supported by robotic experiments, both in simulation and with a real robot. Our physical experiments show the importance of disentangling the grasp orientation for achieving an efficient robot grasp planning while also highlighting other parameters that affect the grasp success.
What is a grasp map?
A grasp map was first introduced in [1] as a way of relating discrete grasp points over the depth map of an object. In particular, a planar grasp is a configuration containing the grasp center on the object to which the robotic hand should be aligned, the orientation $\phi$ around the z axis, and the required opening (width) $w$ of the fingers or jaws. A quality measure $q$ characterizes the success of the respective grasp configuration. For a (depth) image $I$, grasp synthesis is the problem of finding the grasp map $\mathcal{G} = (\Phi, W, Q)$, where $\Phi$, $W$, and $Q$ are each a real-valued map with the same spatial size as $I$, containing the pixel-wise values of $\phi$, $w$, and $q$, respectively. $\mathcal{G}$ can be approximated through a learned mapping $I \to \hat{\mathcal{G}}_\theta$ using a deep neural network ($\theta$ being its weights). The best visible grasp configuration can now be estimated as $g^* = \max_{\hat{Q}} \hat{\mathcal{G}}_\theta$.
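As a minimal sketch of the last step, the snippet below picks the grasp at the maximum of a quality map and reads the angle and width at the same pixel. The function name `best_grasp` and the nested-list layout of the maps are assumptions made for illustration; they are not taken from the original implementation.

```python
def best_grasp(quality, angle, width):
    """Return (row, col, angle, width, quality) at the quality maximum.

    All three maps are nested lists with the same height and width
    (an assumed layout, chosen only to keep the sketch dependency-free).
    """
    best = None
    for r, row in enumerate(quality):
        for c, q in enumerate(row):
            if best is None or q > best[4]:
                best = (r, c, angle[r][c], width[r][c], q)
    return best


# Toy 2x2 maps: the highest quality value sits at row 1, column 0.
quality = [[0.1, 0.2], [0.9, 0.3]]
angle = [[0.0, 0.5], [-0.7, 1.0]]
width = [[10.0, 12.0], [25.0, 8.0]]
grasp = best_grasp(quality, angle, width)
print(grasp)
```

The selection is greedy: it returns a single configuration and ignores every other orientation that may be valid at the same pixel.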
Why is this representation problematic?
The grasp maps constructed by current pixel-wise learning approaches [1-4] are prone to discontinuities that cause performance to saturate, due to the overlapping grasping orientations per point. Motivated by the need to acquire approaching grasp vectors from multiple orientations, we introduce an augmented grasp map representation that fuels both the continuous orientation estimation, commonly treated as a regression problem, and a discrete classification.
Let’s take as example the Jacquard dataset. Jacquard is currently one of the most diverse and densely annotated grasping datasets with images and million grasp annotations. Grasps are represented as rectangles with given center, angle, width (gripper’s opening) and height (jaws’ size). The annotations are simulated and not human-labeled, resulting into multiple overlapping boxes considering all possible grasp orientations per grasp point and many different jaw sizes. Box annotations are invariant to the jaws’ size, leaving it as a free variable to be arbitrarily chosen during evaluation. The authors of  proposed a grasp map representation, generating pixel-wise quality, angle and width maps, by iterating over the annotated boxes and stacking binary maps, equal to the value of interest inside the box and zero elsewhere. Since the quality map is binary, it is indifferent to the order of the boxes and equivalent to iterating only on the boxes with the maximum jaws’ size. For angle and width maps however, overlapping boxes with different centers and angles will be overwritten by the box that appears last in the list, hence leading to discontinuities. Crucially, a binary quality map does not ensure a valid maximum: all non-center points inside an annotated box are maxima as well, and have equal probability of being selected as a grasp center. Due to these facts, a hypothetical regressor that perfectly predicts the evaluation GT maps fails to reconstruct the annotated bounding boxes and scores only using the Jaccard (Intersection over Union-IoU)  index at the 0.25 threshold, while its performance degrades rapidly towards higher thresholds.
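The overwriting effect described above can be reproduced with a small sketch. It assumes axis-aligned boxes given as `(row_start, row_end, col_start, col_end, angle)` tuples, which is a simplification of the rotated rectangles in the dataset, and the helper name `paint_angle_map` is hypothetical.

```python
def paint_angle_map(height, width, boxes):
    """Write each box's angle into the map; later boxes overwrite earlier ones."""
    angle_map = [[0.0] * width for _ in range(height)]
    for r0, r1, c0, c1, angle in boxes:
        for r in range(r0, r1):
            for c in range(c0, c1):
                angle_map[r][c] = angle
    return angle_map


# Two overlapping annotations with very different angles (radians).
box_a = (0, 2, 0, 3, 0.3)
box_b = (1, 3, 1, 4, -1.2)

map_ab = paint_angle_map(3, 4, [box_a, box_b])
map_ba = paint_angle_map(3, 4, [box_b, box_a])

# Pixel (1, 1) lies inside both boxes, so its value depends on list order.
print(map_ab[1][1], map_ba[1][1])
```

The same pixel ends up with two different ground-truth angles depending only on annotation order, which is the discontinuity the text refers to.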
Augmented grasp map representation:
We start from recent approaches on pixel-wise grasp synthesis and partition the angle values into $N$ bins, to minimize the overlaps of multiple angles per point. Since we are dealing with antipodal grasps, it is sufficient to predict an angle in the range of $[-\pi/2, \pi/2]$. We thus proceed to construct 3-dimensional maps of size $H \times W \times N$, where each bin corresponds to a range of $180/N$ degrees.
Note, however, that we do not discretize the angles’ values: we instead place them inside the corresponding bins. For the remaining overlaps, we pick the value with the smallest angle, ensuring that the network is trained on a valid GT angle value, instead of some statistics of multiple values (e.g. mean or median), while remaining invariant to the order of the annotations.
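A minimal per-pixel sketch of this binning follows. It assumes angles in radians within [-π/2, π/2], equal-width bins, and that "smallest angle" means the smallest signed value; `num_bins` and the function name `bin_angles` are illustrative choices, not values from the text.

```python
import math


def bin_angles(angles, num_bins):
    """Place the annotated angles of one pixel into orientation bins.

    Angles are kept as continuous values (no discretization). When several
    angles fall into the same bin, the smallest one is kept, so the result
    does not depend on the order of the annotations.
    """
    bin_width = math.pi / num_bins  # assumed range: [-pi/2, pi/2]
    bins = [None] * num_bins        # None marks an empty bin
    for a in angles:
        idx = int((a + math.pi / 2) / bin_width)
        idx = max(0, min(idx, num_bins - 1))  # clamp the boundary value pi/2
        if bins[idx] is None or a < bins[idx]:
            bins[idx] = a
    return bins


annotations = [-1.5, -1.2, 0.1, 1.0]
print(bin_angles(annotations, 3))
print(bin_angles(list(reversed(annotations)), 3))
```

Both calls give the same bins, which illustrates the order invariance mentioned above.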
To overcome the information loss from constructing binary maps, we create soft quality maps that contain ones on the exact positions of the centers of the boxes, while their values degrade moving towards the boxes’ edges. We find this significant for the networks to learn to maximize the quality value on the actual grasp points; it also removes the need for strong Gaussian filtering and consequently reduces post-processing time. One remaining issue is the multiple instances of the same grasp centers and angles using different jaw sizes. We construct our augmented maps by picking the smallest jaw size available, i.e., the one closest to the boundaries of the object’s shape. Intuitively, the annotated quality map gives a rough estimate of the object’s segmentation mask, which appears important for extracting grasp regions. During evaluation, we adopt the half jaw size, as in prior work, to be directly comparable. Although having to estimate this parameter hurts performance, our approach still achieves a high reconstruction ability.
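A soft quality map of this kind can be sketched as follows. The text does not specify the decay profile, so the linear falloff, the axis-aligned box, and the name `soft_quality` are all assumptions made for illustration.

```python
def soft_quality(height, width, center, half_h, half_w):
    """Quality map that is 1.0 at the box center and decays to 0.0 at its edges.

    The decay is linear in the normalized distance to the center
    (an assumed profile; the original decay function is not given).
    """
    cr, cc = center
    q = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            d = max(abs(r - cr) / half_h, abs(c - cc) / half_w)
            if d <= 1.0:  # pixel lies inside the annotated box
                q[r][c] = 1.0 - d
    return q


quality_map = soft_quality(5, 5, center=(2, 2), half_h=2, half_w=2)
for row in quality_map:
    print(row)
```

Unlike a binary map, this map has a single maximum per box, located at the annotated grasp center.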
We reformulate the previous grasp map formalization to consider $N$ orientation bins, where $\Phi$ is the angle map.
To facilitate learning, we adopt the angle encoding suggested by [6] into cosine and sine components that lie in the range of $[-1, 1]$. Since antipodal grasps are symmetrical around $\pm\pi/2$, we employ the sub-maps for $\cos(2\Phi)$ and $\sin(2\Phi)$ with $N$ bins each. The angle maps are then computed as $\Phi = \frac{1}{2}\arctan\frac{\sin(2\Phi)}{\cos(2\Phi)}$. $W$ represents the gripper’s width map. $Q$ is a real-valued quality map, where 1 indicates a grasp point with maximum visible quality. $O$ is a binary orientation map, where 1 indicates a filled angle bin in the respective position. $G$ is the pixel-wise “graspability” map. This binary map contains 1s only in the annotated grasp points of the object w.r.t. the image $I$, and helps to assess the graspability of the pixels, i.e., the probability of representing grasp points of the real world.
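The sketch below illustrates a cosine/sine angle encoding of the kind described here, assuming the common double-angle form (cos 2φ, sin 2φ) so that a grasp and its 180-degree rotation share the same code. The function names are hypothetical.

```python
import math


def encode_angle(phi):
    """Encode a grasp angle as (cos 2*phi, sin 2*phi); both lie in [-1, 1]."""
    return math.cos(2.0 * phi), math.sin(2.0 * phi)


def decode_angle(cos2, sin2):
    """Recover the angle in [-pi/2, pi/2] from its double-angle components."""
    return 0.5 * math.atan2(sin2, cos2)


phi = 0.4
cos2, sin2 = encode_angle(phi)
recovered = decode_angle(cos2, sin2)
print(recovered)

# An antipodal grasp rotated by pi maps to (numerically) the same encoding.
cos2_rot, sin2_rot = encode_angle(phi + math.pi)
```

Using `atan2` rather than a plain arctangent of the ratio avoids a division by zero when the cosine component vanishes.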
ORANGE architecture: The proposed framework is model-agnostic; it suffices to employ any CNN-based model that has the capacity to segment regions of interest. An initial depth image is then processed to output an augmented grasp map. The predicted maps are combined to reconstruct the grasp centers, angles, and widths.
Training: Each map is separately supervised: we minimize the Mean Square Error (MSE) between the real-valued maps (quality, angle components, and width) and their respective ground truths, and we enforce a Binary Cross-Entropy (BCE) loss on the binary orientation and graspability maps. Next, we employ an attentive loss that directly minimizes the MSE between the element-wise product of the predicted quality and orientation maps and the ground truth quality map. This attention mechanism drives the network’s focus over regions of the feature map that correspond to filled bins and thus regions near a valid grasp center.
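A rough sketch of such a combined objective is given below, operating on flattened maps stored as plain lists. The dictionary keys, the equal weighting of the terms, and the function name `combined_loss` are assumptions for illustration; the original loss weights are not stated in the text.

```python
import math


def mse(pred, target):
    """Mean squared error between two flat lists of equal length."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def bce(pred, target, eps=1e-7):
    """Binary cross-entropy; predictions are clamped away from 0 and 1."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


def combined_loss(pred, gt):
    """Per-map supervision plus an attentive term (equal weights assumed)."""
    real = sum(mse(pred[k], gt[k]) for k in ("quality", "cos", "sin", "width"))
    binary = sum(bce(pred[k], gt[k]) for k in ("orientation", "graspability"))
    # Attentive term: predicted quality gated by the predicted orientation map.
    attended = [q * o for q, o in zip(pred["quality"], pred["orientation"])]
    return real + binary + mse(attended, gt["quality"])


gt = {
    "quality": [1.0, 0.0], "cos": [0.5, 0.0], "sin": [0.5, 0.0],
    "width": [10.0, 0.0], "orientation": [1.0, 0.0], "graspability": [1.0, 0.0],
}
perfect = {k: list(v) for k, v in gt.items()}
worse = dict(perfect, orientation=[0.2, 0.0])
print(combined_loss(perfect, gt), combined_loss(worse, gt))
```

In the second call the orientation map misses a filled bin, so both the BCE term and the attentive term grow, which is the coupling the attention mechanism is meant to create.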
Inference: First, the predicted quality map and graspability map are multiplied to obtain a graspability-refined quality map. This can be viewed as a pixel-wise prior regularization, where the graspability map is the prior probability of a pixel being a grasp point and the quality map is the posterior, measuring its grasping quality. This product is multiplied by the orientation map to filter out values in empty bins, resulting in the final quality map. Finally, we choose the optimal grasp center as the global maximum of the final quality map and retrieve the respective values of the angle and width maps to reconstruct a grasp box. Instead of this greedy approach, we can employ the discrete bins to explore the best grasps per bin, or even sample possible grasps through our disentangled latent representation or different possible configurations over grasp positions and orientations.
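The inference steps above can be sketched as follows. The map layout is an assumption: quality, orientation, angle, and width are indexed as `[bin][row][col]`, while graspability is indexed as `[row][col]`. The function name `refine_and_select` is hypothetical.

```python
def refine_and_select(quality, graspability, orientation, angle, width):
    """Return the global best grasp and the best grasp of every bin.

    Each grasp is a tuple (score, bin, row, col, angle, width), where
    score = quality * graspability * orientation at that bin and pixel.
    """
    best = None
    best_per_bin = []
    for b in range(len(quality)):
        bin_best = None
        for r in range(len(graspability)):
            for c in range(len(graspability[0])):
                score = quality[b][r][c] * graspability[r][c] * orientation[b][r][c]
                if bin_best is None or score > bin_best[0]:
                    bin_best = (score, b, r, c, angle[b][r][c], width[b][r][c])
        best_per_bin.append(bin_best)
        if best is None or bin_best[0] > best[0]:
            best = bin_best
    return best, best_per_bin


# Toy example: two orientation bins over a 1x2 image.
quality = [[[0.9, 0.4]], [[0.8, 0.7]]]
graspability = [[0.5, 1.0]]
orientation = [[[1.0, 0.0]], [[0.0, 1.0]]]
angle = [[[-1.0, -0.5]], [[0.2, 0.6]]]
width = [[[20.0, 22.0]], [[30.0, 18.0]]]

best, per_bin = refine_and_select(quality, graspability, orientation, angle, width)
print(best)
print(per_bin)
```

Returning the per-bin winners alongside the global maximum keeps the alternative orientations available, for example when the greedy choice is not reachable by the robot.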
Model zoo: We embed ORANGE into two off-the-shelf architectures, GGCNN2 [1] and the larger U-Net [7], as both are capable of performing segmentation. While these models have very different capacities, we show that both perform significantly better when trained with ORANGE. As we mainly focus on the advantages of grasp orientation disentanglement, we consider that any deep network capable of segmentation can benefit from the ORANGE framework.
Why does it matter to disentangle overlapping grasps for grasp planning?
We conducted experiments with the bi-manual mobile manipulator robot TIAGo++, which is equipped with one gripper and a five-fingered underactuated hand. We leverage our robot’s properties to study how the different orientations in the grasp maps can enable a successful robot grasp.
For this experiment, we chose a set of five objects from the YCB object set, for which we conduct grasps per item; for the left and for the right arm (with a gripper and a hand end-effector, respectively). We place the robot in front of a table and capture the object depth image from the robot’s built-in camera.
Note that this experiment is more challenging than usual bin-picking experiments: the camera viewpoint is much different than that of the training dataset setting, as well as compared to other related robotic experiments, that install a static camera facing the table vertically and plan planar pinching grasps. We, on the other hand, plan open-loop collision-free trajectories considering the robot’s arm and torso motion for planning the trajectory towards a generated target grasp vector by ORANGE. A grasp is successful when the robot holds the object in the air for seconds.
For this experiment, we collect the best grasp point across all bins (i.e., for all predicted orientations) and attempt the ones that are within the feasibility set of the workspace of each arm to showcase the importance of parsing the possible grasp angles.
While with the gripper we are able to grasp most objects ( grasp success), grasping with the robotic hand is more challenging. For the Chips Can, ORANGE delivers a very good grasp map and the robot reaches for the targeted grasp point; however, it is unable to lift the object in the air, due to the morphology of the hand and other parameters, e.g. low friction between object and hand. While using the gripper we are able to achieve good grasps in feasible positions for the robot’s left arm , we sometimes fail when it comes to objects like the mug, for which the predicted grasps are focusing on the handle. Interestingly, we are able to grasp the mug with the hand, as this grasp requires finer manipulation.
Next steps: The results of this experiment highlight two findings: (i) the advantage of acquiring a disentanglement of the potential grasp orientations provides a promising framework for planning feasible robot grasps, especially with bi-manual and mobile manipulator robots. A possible future research direction concerns the learning of a policy for selecting the grasp points per orientation; (ii) a good visual grasp generator can only be a good indicator for a successful grasp. We believe that a combination of the effectiveness of ORANGE fused with tactile feedback can potentially provide a more powerful tool for effective grasping.
[1] D. Morrison, P. Corke, and J. Leitner, “Learning robust, real-time, reactive robotic grasping,” IJRR, vol. 39, no. 2-3, 2020.
[2] Y. Song, J. Wen, Y. Fei, and C. Yu, “Deep robotic prediction with hierarchical rgb-d fusion,” arXiv preprint arXiv:1909.06585, 2019.
[3] S. Wang, X. Jiang, J. Zhao, X. Wang, W. Zhou, and Y. Liu, “Efficient fully convolution neural network for generating pixel wise robotic grasps with high resolution images,” in IEEE Int’l Conf. on Robotics and Biomimetics, Dec 2019.
[4] S. Kumra, S. Joshi, and F. Sahin, “Antipodal robotic grasping using generative residual convolutional neural network,” arXiv preprint arXiv:1909.04810, 2019.
[5] F. Chu, R. Xu, and P. A. Vela, “Real-world multiobject, multigrasp detection,” IEEE Robotics & Automation Letters (R-AL), vol. 3, no. 4, Oct 2018.
[6] K. Hara, R. Vemulapalli, and R. Chellappa, “Designing deep convolutional neural networks for continuous object orientation estimation,” arXiv preprint arXiv:1702.01499, 2017.
[7] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI), ser. LNCS, vol. 9351. Springer, 2015.
Work conducted with Nikolaos Gkanatsios (CMU), Petros Maragos (NTUA), Jan Peters (TU Darmstadt)
BMBF and German Informatics Society (GI) are currently organizing the KI-Camp 2021 on April 27, 2021.
The KI-Camp is a transdisciplinary research convention for young scientists, professionals, and artists of all disciplines under 35 years of age working on and with Artificial Intelligence. In seven parallel theme tracks with top-class representatives from research, industry, and society, participants will discuss, conduct research, and have the opportunity to network. KI-Camp 2021 will take place in digital space, in all probability with the possibility to attend individual Covid-19-compliant on-site workshops at different locations.
I am nominated for the award of the AI Newcomer in the field of Technical and Engineering Sciences. You can vote for me under the link:
We are excited to announce this year’s edition of the RSS Pioneers workshop.
About RSS Pioneers: RSS Pioneers is an intensive workshop for senior Ph.D. students and postdocs in the robotics community. Held in conjunction with the main Robotics: Science and Systems (RSS) conference, each year the RSS Pioneers brings together a cohort of the world’s top early-career researchers. The workshop aims to provide these promising researchers with networking opportunities and help to navigate their next career stages, and foster creativity and collaboration surrounding challenges in all areas of robotics.
Applications are due March 22, 2021 (AoE). Please note that this year’s pioneers workshop will be a virtual event. We are working on making this Pioneers event as interactive as possible to transfer a full experience to our participants. Stay tuned for more details.
Diversity, equity, and inclusion: RSS Pioneers is committed to ensuring and continually improving diversity, equity, and inclusion in all aspects of the workshop. Diversity, equity, and inclusion at our workshop mean new perspectives, ideas, and applications, all of which contribute to the success of our field. Individuals from groups underrepresented in robotics, such as minorities, women, and persons with disabilities, are encouraged to apply.
TIAGo++ can now combine a skilled pair of 7-DoF arms to perform coordinated dual-arm actions. The bi-manual robot continues being completely ROS based, fully customizable and expandable with extra sensors and devices, and robust.