@MISC{RobotCooking, author = {Akihiko Yamaguchi}, title = {Science of Robot Cooking}, howpublished = "\url{http://akihikoy.net/p/cook.html}", year = 2016}
This web page explores robot cooking by:
Video: Human demonstration of making pizza. We emphasize that in this scenario the "pouring" skill is dominant: pouring vegetables, meats, cheese, and seasonings. Several different pouring strategies are used, including shaking containers and rotating the seasoning bottle.
Manipulation of rigid objects is well researched in robotics. Cooking is a complex form of manipulation that involves many challenges beyond rigid-object manipulation. Here we discuss difficulties that do not arise in rigid-object manipulation.
In cooking, we manipulate many types of deformable objects, including vegetables, meats, liquids, powders, pizza dough, and noodles. Each has its own dynamics, which are hard to model. For example, a banana responds to cutting differently from an apple. Liquids and powders have very different dynamics. Grasping an egg with slightly too much force will break it.
Manipulation of deformable objects is more difficult than that of rigid objects. For example, peeling a banana is extremely hard for robots, while humans and even monkeys do it easily.
Fig: Peeling a banana with our Baxter robot.
Many objects have complicated structures. For example, an apple consists of skin, flesh, core, seeds, stem, and calyx.
Fig: Anatomical structure of an apple.
Beef has a more complicated anatomical structure: each body part has different physical and chemical properties (e.g. hardness, taste). Even a small cut of beef is not uniform; it consists of flesh, bones, muscles, etc.
The manipulation strategy (e.g. cutting strategy) changes with the object's components. For example, a peeling skill is used to remove skin. In cutting an apple, the amount of force (and maybe the knife motion) differs between the flesh and the core. Humans use a different skill to cut the muscle of beef (e.g. waving the knife quickly).
The final goal of cooking is making tasty dishes and serving them beautifully. Taste is determined by many factors, including chemical states such as salinity, sugar content (brix), and acidity. Making tasty dishes requires controlling these states. Sensors exist for some taste-related values, e.g. salinity, brix, and acidity sensors. On the other hand, only humans can rate the taste.
Audio information is not part of taste, but it tells us many things during cooking. For example, when frying ingredients in oil, the sound made by the ingredients, pan, and oil tells us about the state of frying, such as the temperature.
Control of dirt and pollution is also an important mission in cooking. Some dirt and pollution is visible, but some is invisible, such as bacteria. Even humans cannot perceive some kinds of bacteria. They can cause serious harm (food poisoning) to humans, so controlling these states is necessary.
A simple example is heating water, which changes its temperature. Sometimes we need to consider the three states of water: solid, liquid, and gas. Heating eggs or liquid pancake mix gradually changes their state from liquid to solid. This change is not just a symbolic change; for example, humans sometimes want to eat soft-cooked eggs.
Oxidation is an important kind of chemical dynamics. For example, apples and bananas change color and taste after being cut. The rate of oxidation changes if they are soaked in water. As another example, heating tomatoes reduces their acidity.
Fig: The first dish cooked by our Baxter robot. The apple turned brown because the robot took a long time to cook. The dish was not tasty.
Human taste rating is the most complicated process. It depends not only on chemical states (salinity, sugar content, acidity, etc.) but also on food texture and appearance. Ratings also vary with the individual and with health conditions. Sometimes the ratings even seem inconsistent.
Different types of dynamics are sometimes combined. For example, grasping fruit too strongly changes both its shape and its taste.
Using tools makes cooking easier. Even humans cannot cook without tools, so tool use is necessary. Examples include a knife to cut food, a peeler to remove skins, a spatula to stir ingredients in a frying pan, a funnel to pour into narrow-necked bottles, scissors, and so on.
Selection of a tool is not always obvious. For example, we can peel the skin of an apple with either a knife or a peeler.
Each tool has its own manipulation difficulties. For example, when cutting food with a knife held in a two-finger gripper (like the PR2 gripper or the Robotiq two-finger gripper), the grasping orientation is important. With a bad orientation, the knife will easily slip. Some tools are also non-rigid. For example, a peeler often has a passive joint, and a spatula tip is made of elastic material.
Fig: Grasping poses of the knife in cutting food: (a) grasping perpendicularly to the fingers, and (b) horizontally to the fingers. With (a), the knife easily slipped in the fingers. Note that we modified the knife (covering it with a plastic cuboid shell) so that it can be grasped as in (b).
Humans use a variety of skills to solve problems. Basic skills for cooking are listed below. An important point is that in each category of task (e.g. pouring), there is a range of skills to handle different situations. For example, skills for pouring include tipping, shaking, and squeezing a container.
Skills are useful and necessary for solving problems, but implementing them on robots is not trivial.
Each skill may come in a variety of styles. For example, there are many different types of cuts: julienne cuts, fine/small dice cuts, and so on. In addition, the way to cut depends on the thing being cut. For example, in cutting onions we exploit the structure of the onion. The style affects the cooking, including the taste: different cuts of vegetables need different boiling durations.
There are skills to control chemical dynamics. Oxidation can be slowed down by soaking vegetables or fruits in water. Heating tomatoes reduces their acidity, and there are other ways to reduce the acidity of tomatoes. Mixing seasonings also belongs to this category, since it modifies the taste.
There are various skills to reduce dirt and food pollution. For example, heating meat kills the bacteria inside it. Different species of bacteria live in different kinds of meat, and each has different dynamics (e.g. the temperature that kills it). Covering robotic grippers with clean plastic film (see the figures of Baxter cooking above) or gloves would help to avoid polluting food.
We explore existing cooking robotic systems in two categories: robotic systems for basic skills of cooking, and robotic systems for cooking entire dishes. Note that we do not survey special cooking devices or robots such as food processors.
Kunze et al. researched a robotic system that makes pancakes. Bollini et al. developed a robotic system to bake cookies, called Bakebot. Here is a video. Yamazaki et al. researched a robotic system to make salads. Gravot et al. proposed an integrated framework of robot cooking.
Some companies are making cooking robots for demonstrating their robots. For example Toyo Riki Co. made robotic behaviors to cook Okonomiyaki and Takoyaki (Japanese traditional foods). Here are videos: Okonomiyaki, and Takoyaki.
An impressive robot cooking project was featured in IEEE Spectrum (Jun 11, 2011): Robots Make Bavarian Breakfast Together. Their system covered not only cooking but also shopping (video). In addition, it was not a single-robot system: two robots collaborated in cooking (video).
Some developers tried to develop robotic bartenders. For example, MIT and Coke showed off a smartphone-controlled bartender. Here is a video. The NEXTAGE robot made coffee (video).
The Moley Co. is developing a robotic kitchen in which two robot arms with dexterous hands are installed. Their robot has been introduced in many articles, for example in TechCrunch, IBTimes, PSFK, Factor, and The Economist. There is a video on YouTube. They capture human motions to create the robot's behaviors (see the video in the Economist article).
The artificial intelligence system Watson, made by IBM, was used to invent new recipes (Cognitive Cooking with Chef Watson). Watson processed a recipe database and created unusual dishes. Chefs at the Institute of Culinary Education actually prepared and tasted the dishes. Eventually they published a book: Cognitive Cooking with Chef Watson: Recipes for Innovation from IBM & the Institute of Culinary Education.
An important message from surveying the cooking robots above is that robot cooking is a hard task but not an infeasible one, since some cooking robots already exist. Another observation is that many challenges in cooking resemble those in other everyday activities. For example, manipulating pizza dough is close to manipulating cloth. Therefore the methods, theories, and algorithms that we create for cooking would be useful for other everyday activities. From this point of view, cooking is a good research domain. The following is a summary of the challenges.
Robust policy generation is needed to manipulate things governed by various types of dynamics, such as deformable objects, objects with complex structures, chemical states such as salinity and acidity, taste, dirt, and pollution, and to manipulate objects with tools. We propose a library approach in which a skill library is built, together with optimization (dynamic programming) for skill selection and skill parameter adjustment.
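To make the idea of dynamic programming over a skill library concrete, here is a minimal sketch in Python. It is not the implementation used in our experiments: the two pouring skills, their parameters, the deterministic "grams poured" model, the effort costs, and the target amount are all hypothetical values chosen for illustration. The planner picks a skill and a parameter at each step so that the total effort plus the final error is minimized.

```python
from functools import lru_cache

# Hypothetical skill library (illustrative numbers only).
# State = grams already poured into the target container.
SKILLS = {
    # Tipping pours a lot per execution; parameter = tilt level.
    "tip": {"params": (1, 2, 3), "poured": lambda p: 20 * p, "effort": 1.0},
    # Shaking pours a little per execution; parameter = shake count.
    "shake": {"params": (1, 2, 3), "poured": lambda p: 5 * p, "effort": 1.5},
}

TARGET = 75    # desired amount in grams (assumption)
HORIZON = 4    # maximum number of skill executions (assumption)


def terminal_cost(amount):
    """Penalty for missing the target; overshoot is penalized more heavily."""
    error = amount - TARGET
    return 10.0 * error if error > 0 else float(-error)


@lru_cache(maxsize=None)
def plan(amount, steps_left):
    """Return (cost, actions) that minimize effort plus terminal error."""
    best = (terminal_cost(amount), ())  # option: stop pouring now
    if steps_left == 0:
        return best
    for name, skill in SKILLS.items():
        for p in skill["params"]:
            next_amount = amount + skill["poured"](p)
            future_cost, future_actions = plan(next_amount, steps_left - 1)
            total = skill["effort"] + future_cost
            if total < best[0]:
                best = (total, ((name, p),) + future_actions)
    return best


cost, actions = plan(0, HORIZON)
print(cost, actions)
```

In this toy setting the planner combines one large tip with one fine shake rather than repeating a single skill, which is the kind of skill selection and parameter adjustment the library approach is meant to automate. A real system would replace the deterministic `poured` model with learned or simulated dynamics.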
A knowledge base that lets robots understand these types of dynamics is also important for generating robust policies. We can use existing models (e.g. geometric models for collision calculation, rigid-body dynamics) in the form of simulators such as Open Dynamics Engine and MuJoCo. For dynamics that are hard to model, such as deformable-object dynamics, we propose to use learning methods such as neural networks.
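As a rough illustration of learning dynamics that are hard to model analytically, the following sketch fits a tiny one-hidden-layer neural network with plain gradient descent, using only the Python standard library. Everything here is an assumption made for the example: the data are synthetic (a saturating curve standing in for, say, the outcome of an action as its magnitude grows), and the network size, learning rate, and epoch count are arbitrary. It is not the model used in our work.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Synthetic stand-in for hard-to-model dynamics (not measured data):
# the outcome saturates as the action magnitude x goes from 0 to 1.
DATA = [(i / 10.0, 1.0 - math.exp(-3.0 * i / 10.0)) for i in range(11)]

HIDDEN = 4
params = {
    "w1": [random.uniform(-1.0, 1.0) for _ in range(HIDDEN)],
    "b1": [0.0] * HIDDEN,
    "w2": [random.uniform(-1.0, 1.0) for _ in range(HIDDEN)],
    "b2": 0.0,
}


def forward(x):
    """Return (prediction, hidden activations) of a tanh network."""
    hidden = [math.tanh(params["w1"][j] * x + params["b1"][j])
              for j in range(HIDDEN)]
    out = sum(params["w2"][j] * hidden[j] for j in range(HIDDEN)) + params["b2"]
    return out, hidden


def mse():
    """Mean squared prediction error over the toy data set."""
    return sum((forward(x)[0] - y) ** 2 for x, y in DATA) / len(DATA)


def train(epochs=2000, lr=0.05):
    """Full-batch gradient descent on the mean squared error."""
    for _ in range(epochs):
        g_w1, g_b1, g_w2 = [0.0] * HIDDEN, [0.0] * HIDDEN, [0.0] * HIDDEN
        g_b2 = 0.0
        for x, y in DATA:
            out, hidden = forward(x)
            d_out = 2.0 * (out - y) / len(DATA)
            g_b2 += d_out
            for j in range(HIDDEN):
                g_w2[j] += d_out * hidden[j]
                # Backpropagate through tanh: d tanh(u)/du = 1 - tanh(u)^2.
                d_pre = d_out * params["w2"][j] * (1.0 - hidden[j] ** 2)
                g_w1[j] += d_pre * x
                g_b1[j] += d_pre
        for j in range(HIDDEN):
            params["w1"][j] -= lr * g_w1[j]
            params["b1"][j] -= lr * g_b1[j]
            params["w2"][j] -= lr * g_w2[j]
        params["b2"] -= lr * g_b2


loss_before = mse()
train()
loss_after = mse()
print(f"MSE before: {loss_before:.4f}, after: {loss_after:.4f}")
```

Once trained, `forward(x)[0]` can serve as a predictive model inside a planner in place of an analytical simulator. In practice one would use an established machine learning library and real interaction data.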
Learning from humans is important. Humans can teach robots in many ways, including giving demonstrations and engineering skills (behavior programs) and dynamical models. There are many resources of such knowledge intended for humans; some of them are not well structured for computers, and sometimes there are no annotations. For example, many skills are written up on wikiHow and Allrecipes (cooking recipes). Many skills can be watched on YouTube (e.g. the pizza-making demo). Much symbol-level and qualitative knowledge of dynamics, including chemical dynamics, is also available. For example, some parts of some fish are poisonous (e.g. pufferfish). Such knowledge must be modeled beforehand (robots should not learn it from practice), and robots need to reason about their behaviors while taking such knowledge into account.
Dexterous robot hands capable of the basic skills above are necessary. Robotic tools (tools designed for robot hands) are another way to increase the capability of robot hands. Human tools are designed to extend the capability of human hands. Making robot hands that can use tools designed for humans is useful, but redesigning tools for robot hands is also a possible approach. For example, in the cutting experiments with the Baxter robot (cf. Fig: Grasping poses of knife), we created a cuboid plastic cover for the knife so that the gripper can hold it tightly.
Sensing devices, including tactile sensors, should be upgraded. In addition to general-purpose sensors (e.g. cameras on a robot head), special-purpose sensors (e.g. cameras on the gripper for manipulation) help to make perception robust. A unified sensor of salinity, sugar content (brix), acidity, and so on, like a human tongue, would be useful in robot cooking for "tasting". We developed an optical sensing skin for robot fingers that measures contact information (force fields) and provides proximity vision, and tested the sensor in cutting food with a Baxter robot. We found at least two useful cases: (1) We implemented an autonomous cutting behavior based on the force estimates, which told the robot when to stop cutting, namely when the knife reaches the cutting board. (2) The proximity vision makes it easier to detect slip of the knife during cutting.
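The force-based stopping rule in case (1) can be sketched as a simple threshold test. This is only an illustration of the idea, not our controller: the function name, the threshold, the requirement of several consecutive readings, and the sample force values are all hypothetical. Requiring consecutive readings is one simple way to avoid stopping on a single noisy spike.

```python
def find_stop_index(forces, threshold=8.0, min_consecutive=3):
    """Return the sample index at which cutting should stop, or None.

    The stop is triggered once `min_consecutive` successive force
    estimates are at or above `threshold`, which is taken here as a sign
    that the knife is pressing against the cutting board.
    """
    run = 0
    for i, force in enumerate(forces):
        run = run + 1 if force >= threshold else 0
        if run >= min_consecutive:
            return i
    return None


# Hypothetical force estimates (arbitrary units) during one cut:
# a brief spike at index 3, then sustained contact from index 5 on.
readings = [1.0, 1.5, 2.0, 9.0, 2.5, 8.5, 9.0, 9.5, 9.7]
stop_at = find_stop_index(readings)
print(stop_at)
```

The isolated spike at index 3 is ignored, and the stop is issued only after the force stays high for three samples in a row. Suitable values for the threshold and the run length would depend on the food, the knife, and the sensor.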
Robust perception is needed to understand what robot actions cause. There is not much computer vision work for robot cooking purposes. For example, since there was no good computer vision technique for liquid flow detection and tracking, we developed a stereo vision method that tracks liquid flows as a 3D point cloud. As far as we know, there is no good computer vision method to estimate the amount of liquid or material poured into containers or onto surfaces such as pizza dough.