 Bring a laptop PC for these experiments. A web browser is necessary.
 Note
 This text can be read online:
http://akihikoy.net/p/TULE2
In these experiments, we explore AI (artificial intelligence) technologies for robot control, in particular machine learning and optimization. The task is a throwing motion of a single-finger robot, where we control the behavior of the ball thrown by the robot. When the ball behavior is difficult to analyze, using AI tools is one approach to achieving the task. The goal of these experiments is to learn the use of AI technologies in the context of robot control.
Robot System Overview †
The robot we use is a single-finger robot with one degree of freedom. It consists of 3D-printed frames and an actuator, a Dynamixel servo motor. The Dynamixel servo is connected via USB to a mini PC (Raspberry Pi 3B), which is connected to a local network. All programs, including the Dynamixel controller and the AI tools, are installed on the mini PC.
Students connect their PCs to the network over WiFi, and log on to the mini PC from a web browser.
The following figure illustrates the system and the ball throwing task.
Programming the Robot †
We make programs with Python (a script language) on the mini PC. The Jupyter environment is installed on the mini PC, which enables us to write and run programs through a web browser. You do not have to install any special software for these experiments except for a web browser.
First Step: Move the Robot to Throw a Ball †
Proceed through the following steps with the lecturer:
Hello World †
 Connect your PC to the wireless network: ayexp
 Open a web browser on your PC and access:
 http://192.168.11.5X:8888/
 Replace X by the number of the mini PC (find aypiX on the mini PC).
 Type the pass phrase.
 If a message "Please use a different workspace" is displayed, enter a name like "lab2".
 In the directory panel (left side), go to HOME > prg > ai_ctrl_1 > student.
 From the File menu, choose New Launcher.
 Select Notebook > Python 2 on the right panel.
 Set file name BXXXX_NAME_NAME
 Right click the tab of the file, and select "Rename Notebook".
 BXXXX denotes your student ID.
 NAME_NAME denotes your name in alphabet (e.g. Taro_Yamada).
 Type as follows:
#!/usr/bin/python
import sys
sys.path.append('../sample')
from libaictrl1 import *
 Press Shift+Enter to run the code block (called a cell).
 The above code is for loading the library. It will display nothing.
 You can run the cell again by selecting the cell and pressing Shift+Enter.
 In the next cell, type the following:
print 'Hello world, my name is XXX XXX, ID: YYY'
 Replace XXX XXX by your name (e.g. Taro Yamada) and YYY by your student ID.
 Press Shift+Enter to run the cell.
 You will see the text displayed after the cell.
As you have experienced, this environment is a cell-by-cell interpreter programming interface for Python. It is called Jupyter Notebook.
 Exercise
 Play with Python and Jupyter Notebook.
Compute some expressions, e.g.
import math
print math.cos(math.pi)**2
Tips: Using Jupyter Notebook †
The current cell is highlighted by the mark on its left. Right-click it to see the editing menu. You can also click and drag a cell to change the order of the cells.
 Press Esc, A to add a cell above the current cell.
 Press Esc, B to add a cell below the current cell.
Throwing a Ball with the Robot †
 In the next cell, write the following code and run it.
dxl= SetupRobot()
 This code sets up the connection with the robot.
You will see output like:
Opened a port: /dev/ttyUSB0
Changed the baud rate to: 57600
Torque disabled
Torque enabled
 Move the robot to the initial pose with the following code:
 Be careful: the robot moves. Hold the base of the robot.
 Note that the robot is shared with the other students. Do it one by one.
MoveToTrg(dxl, 0.2, 1.0)
 Let's throw the ball:
 Place a ball on the robot arm.
 Type the following code and run:
MoveToTrg(dxl, 0.8, 0.7)
Reference: MoveToTrg †
MoveToTrg is a function to move the robot to a target position.
MoveToTrg(dxl, d_angle, effort)
where dxl is the variable returned by SetupRobot(), d_angle is the target angle in radians, and effort is the torque to use (should be in [0,1]). If effort=0, the robot will not move.
Theoretical Overview †
The motion you made in the previous step is a throwing motion, but the ball is not controlled. We are interested in controlling the flying distance of the ball (let's define the distance as the length between the robot mount and the first landing point). The flying distance is decided by several factors:
 Motion of the robot.
 We control the robot with target positions. At least two target positions are necessary: the initial position and the final position. The motion from the initial to the final position is controlled by the internal controller of the servo, but you can adjust it by changing the effort parameter (the PWM of the servo).
 Ball property (mass distribution, friction, etc.) that affects ball behavior.
 The elasticity of the robot.
 Air resistance.
Sometimes it is not easy to analyze the dynamics of the robot and the ball while considering all these factors.
Using machine learning tools is one solution.
In particular, we take a model-based reinforcement learning approach. Refer to the literature on machine learning, reinforcement learning, and optimization for details. Here we describe the ideas briefly:
 We use regression to model the unknown dynamics.
 In this case, the dynamics is a mapping from [input: an initial state and control parameter] to [output: the flying distance].
 Regression is a technology to make such a mapping (estimator from input to output) with pairs of (input, output) data.
 We use optimization to find a control parameter to achieve desired control.
 The goal is throwing a ball to a given target distance.
Regression †
When data of pairs of input x and output y is given ( \( \mathit{data} = \{(x_k, y_k) \mid k=1,2,...\} \) ), creating a model F that estimates y for a given x is called regression:
\[ y = F(x) \]
The model F is trained so that the errors between \( y_k \) and \( F(x_k) \) are minimized.
The design of the model F varies. The simplest model is a linear model. However, the capability of linear models is limited: the modeling error becomes large when modeling nonlinear functions.
There are nonlinear approaches to representing F. Examples are Gaussian process regression (GPR) and (artificial) neural networks. In these experiments, we use a type of neural network, the multilayer perceptron (MLP).
MLP can represent nonlinear functions in general. It has some parameters, including:
 The number of hidden layers and units that are designed by the users.
 Weights of the node connections that are learned from data with the backpropagation method.
Note that although an MLP can learn a function where x and y are multidimensional vectors, here we consider the case where both x and y are one-dimensional.
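As an illustration of how such connection weights are fitted by backpropagation, here is a minimal one-hidden-layer MLP in plain NumPy. This is a sketch, not the course's TrainMLPR; the hidden size, learning rate, and epoch count are arbitrary illustrative choices.

```python
import numpy as np

def train_mlp(data_x, data_y, n_hidden=8, lr=0.1, epochs=5000, seed=0):
    """Fit y = W2*tanh(W1*x + b1) + b2 to 1-D data by backpropagation."""
    rng = np.random.RandomState(seed)
    X = np.asarray(data_x, dtype=float).reshape(-1, 1)
    Y = np.asarray(data_y, dtype=float).reshape(-1, 1)
    W1 = rng.randn(1, n_hidden); b1 = np.zeros(n_hidden)
    W2 = rng.randn(n_hidden, 1); b2 = np.zeros(1)
    for _ in range(epochs):
        H = np.tanh(X.dot(W1) + b1)          # hidden activations
        err = H.dot(W2) + b2 - Y             # prediction error (dE/dP for MSE)
        dH = err.dot(W2.T) * (1.0 - H**2)    # backpropagate through tanh
        W2 -= lr * H.T.dot(err) / len(X); b2 -= lr * err.mean(0)
        W1 -= lr * X.T.dot(dH) / len(X);  b1 -= lr * dH.mean(0)
    return lambda x: (np.tanh(np.array([[x]]).dot(W1) + b1).dot(W2) + b2).item()

# Train on the same toy data used later in the experiments.
f = train_mlp([0.5, 0.6, 0.8, 1.0], [0.2, 0.4, 0.7, 0.8])
```

The returned closure plays the role of the learned f(x): after training, f(0.55) gives an interpolated estimate between the data points.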
Optimization †
When an objective function E(x) is given, finding the x that minimizes E is called optimization. E is a scalar function while x can be multidimensional. E is also called an evaluation function, cost function, loss function, etc. Note that maximizing E is the same class of problem; just minimize the negative of E (max E = min (−E)).
When the objective function E is simple, we could solve this problem analytically. However when E is complicated, finding analytic solution becomes difficult. In that case, we can take a numerical optimization approach.
A simple example is hill climbing. We start with a random (or chosen) initial value \( x_0 \), and compute the gradient of E with respect to x around \( x_0 \): \( \frac{dE}{dx}(x_0) \). Then, by modifying \( x_0 \) in the negative direction of the gradient, the value of the evaluation function at the modified x becomes smaller. Iterating this update, we finally find an x that minimizes E.
There are other gradient-based methods like hill climbing, for example the gradient descent method and Newton's method. The backpropagation method used to train MLPs is also a gradient method.
A drawback of gradient methods is that they are often trapped in local optima. You can imagine this by assuming E has multiple peaks. This issue often arises when E involves learned models such as MLPs.
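To make the gradient idea concrete, here is a minimal gradient-descent sketch on a toy objective E(x) = (x − 2)² with its analytic gradient dE/dx = 2(x − 2). The step size and iteration count are illustrative choices, not part of the course library.

```python
def grad_descent(dE, x0, lr=0.1, iters=100):
    """Repeatedly step against the gradient: x <- x - lr * dE/dx."""
    x = x0
    for _ in range(iters):
        x -= lr * dE(x)
    return x

# E(x) = (x - 2)**2, so dE/dx = 2*(x - 2); the minimum is at x = 2.
x_min = grad_descent(lambda x: 2.0 * (x - 2.0), x0=-5.0)
```

Starting far from the optimum, each step shrinks the distance to x = 2 by a constant factor, so the iterate converges quickly on this convex objective; with multiple peaks it would stop at whichever local minimum the initial value falls into.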
CMA-ES (Covariance Matrix Adaptation Evolution Strategy) is another approach, in which we do not have to give the gradient of E. It uses multiple search points (a population), which makes it more robust to noisy functions than gradient methods.
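A greatly simplified evolution-strategy sketch in plain Python is shown below, just to illustrate the population idea. Real CMA-ES (as used by the FMin function later) also adapts a full covariance matrix; here only a scalar step size decays, so this is an illustration, not CMA-ES itself.

```python
import random

def es_min(E, x_init, x_min, x_max, sigma=0.3, pop=10, gens=60, seed=1):
    """Sample a population around the current mean, keep the best half,
    recenter on their average, and shrink the step size each generation."""
    rnd = random.Random(seed)
    mean = x_init
    for _ in range(gens):
        xs = [min(x_max, max(x_min, rnd.gauss(mean, sigma)))
              for _ in range(pop)]
        xs.sort(key=E)                      # only E's values are needed, no gradient
        elite = xs[:pop // 2]
        mean = sum(elite) / len(elite)
        sigma *= 0.9                        # decay instead of covariance adaptation
    return mean

# Toy objective with minimum at x = 0.6, searched over [0, 1].
x_opt = es_min(lambda x: (x - 0.6) ** 2, 0.5, 0.0, 1.0)
```

Because only function values are compared, this family of methods tolerates noisy or non-differentiable objectives where a gradient method would fail.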
Model-based Reinforcement Learning for Throwing Motion †
Let us assume a control parameter x that changes the flying distance of the ball thrown by the robot. For example x is an effort parameter.
Let y denote the flying distance.
Let f denote the dynamics, i.e. y = f(x). Note that we give up modeling f analytically, i.e. we do not know f.
The ball throwing problem: for a given target flying distance \( y_\mathit{trg} \), find a control parameter \( x_\mathit{opt} \) that achieves \( y_\mathit{trg} \), i.e. \( y_\mathit{trg} \approx f(x_\mathit{opt}) \). Note that we need to solve this problem for any \( y_\mathit{trg} \).
The solution idea is as follows:
 Learning phase:
 Throwing the ball with random control parameters and observing the flying distances. We obtain \( \{(x_k, y_k) \mid k=1,2,...,N\} \).
 Training an MLP f with the data obtained above.
 Testing phase:
 For a given target \( y_\mathit{trg} \), finding a control parameter \( x_\mathit{opt} \) such that \( y_\mathit{trg} \approx f(x_\mathit{opt}) \). This problem is formulated as an optimization problem and solved by CMA-ES.
 Executing the solution \( x_\mathit{opt} \) and measuring the error to evaluate the performance.
The idea of finding the control parameter with optimization is as follows: we define the objective function as the squared error between \( y_\mathit{trg} \) and \( f(x) \):
\[ f_\mathit{error}(x) = (y_\mathit{trg} - f(x))^2 \]
We use CMA-ES to minimize this objective function with respect to x. The x found is \( x_\mathit{opt} \), satisfying \( y_\mathit{trg} \approx f(x_\mathit{opt}) \).
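The learning and testing phases can be sketched end-to-end on a synthetic stand-in for the robot. Everything here is an assumption for illustration: the "true" dynamics f_true(x) = 0.9x is invented, a polynomial fit replaces the MLP, and a dense grid search replaces CMA-ES, so the sketch stays self-contained.

```python
import numpy as np

# Learning phase: "throw" with random control parameters, record distances.
rng = np.random.RandomState(0)
f_true = lambda x: 0.9 * x                   # unknown dynamics (synthetic stand-in)
data_x = rng.uniform(0.3, 1.0, size=20)      # random control parameters
data_y = f_true(data_x) + rng.normal(0.0, 0.01, size=20)  # noisy observed distances

# Train a model of the dynamics (quadratic fit, standing in for the MLP).
coeffs = np.polyfit(data_x, data_y, deg=2)
f_model = lambda x: np.polyval(coeffs, x)

# Testing phase: minimize f_error(x) = (y_trg - f_model(x))^2 over the range.
y_trg = 0.6
xs = np.linspace(0.3, 1.0, 1001)
x_opt = float(xs[np.argmin((y_trg - f_model(xs)) ** 2)])
```

Executing x_opt on the "real" dynamics and comparing f_true(x_opt) with y_trg then plays the role of the evaluation step; on the robot, that comparison is the measured landing distance versus the target.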
Report & Email Addresses [UPDATED] †
The report consists of two parts, and you need to submit both.
 Report of experiments (Score: 70):
 Make a report of the experiments using Jupyter Notebook, and download it as a PDF (cf. Making Report with Jupyter Notebook).
 Send the PDF to the lecturer and TA via email.
 Deadline: You need to submit this report within the class (before you leave).
 Language: You must write this report in English.
 Report of discussion and survey (Score: 30):
 Make a report of further discussions about the experiment and a survey related to the experiments.
 Send the report to the lecturer and TA via email. If your report consists of text only, put the text in the email. Otherwise, attach a PDF.
 Deadline: 1 week after the class.
 Language: English or Japanese.
Emails †
Send the reports to the lecturer and TA via email.
 Email to: info+tule2@akihikoy.net AND kf.iqbal.29@gmail.com
 The subject should be: TU_LabExpIIBXXXX_NAME_NAME (YYYY)
 BXXXX: your student ID.
 NAME_NAME: your name as written in the student ID card (e.g. 山田太朗).
 YYYY: EXPERIMENTS or DISCUSSION (for "Report of experiments", "Report of discussion and survey" respectively).
Making Report with Jupyter Notebook †
In the Jupyter Notebook, you can write documents in cells. Use this function to make your report.
By default, the mode of a cell is Code. Change it to Markdown by selecting it from the list at the top right.
In the markdown cells, we write text with markdown style. Some useful styles are:
# Section header
## Subsection header
Normal text.
- List item 1
- List item 2
Press Shift+Enter to convert the plain text to the formatted text.
In the following experiments, insert short notes about the purpose, hypothesis, results of the experiments, and analysis, before and/or after each experiment as the markdown cells.
Finally, select File > Export Notebook As... > PDF to save the document as a PDF file (if there is trouble exporting as a PDF, you can export it as an HTML file instead).
Experiments †
We practice the theory mentioned above step by step.
Regression with MLP †
We train an MLP with manually generated data to see how to use MLP. Then we plot the learned function f(x) for visualizing the learning performance.
The steps are:
 Generate data (data_x, data_y) manually.
For example:
Note: data_x[i] corresponds to data_y[i] (e.g. 0.2 is the output for the input 0.5).
data_x= [0.5, 0.6, 0.8, 1.0]
data_y= [0.2, 0.4, 0.7, 0.8]
 Train an MLP with (data_x, data_y).
MLPR stands for MLP regression. This function returns f, which is a function f(x) that returns y.
f= TrainMLPR(data_x, data_y)
 Test the function f(x) by feeding x that is not in the data.
For example,
(You can omit "print" like this.)
f(0.55)
 Plot the function f(x).
PlotF plots the function f in the range [xmin, xmax]. With show=True, it shows the graph immediately; with show=False, it does not show the graph until plot.show() is executed. plot.plot adds data points to the graph with the marker 'o'. You will see a graph displayed.
plot= PlotF(f, xmin=0.0, xmax=1.0, show=False)
plot.plot(data_x, data_y, 'o')
plot.show()
 Exercise
 Change the data and other values in the above exercise. Try to learn a nonlinear function.
Optimization with CMAES †
We define an objective function f_error as the squared error between a target value y_trg and the f(x) learned in the previous step. Then we minimize f_error with respect to x using CMA-ES.
The steps are:
 Define the objective function f_error.
This is a Python-style function definition. Note that in Python, indentation is significant.
def f_error(x):
  y_trg= 0.6
  return (y_trg-f(x))**2
 Minimize f_error to find x_opt.
The FMin function uses CMA-ES to minimize the given function. 0.5 is the initial value, and [0.0, 1.0] is the search range (see below).
x_opt= FMin(f_error, 0.5, 0.0, 1.0)
 Print values of x_opt, f(x_opt), and f_error(x_opt) to see if the optimization succeeded.
print 'Solution:', x_opt, f(x_opt), f_error(x_opt)
 Plot a graph of f(x) with the optimized value (x_opt, f(x_opt)).
We plot f(x), the training data with the marker 'o', and the optimized value with the marker '*'.
plot= PlotF(f, xmin=0.0, xmax=1.0, show=False)
plot.plot(data_x, data_y, 'o')
plot.plot([x_opt], [f(x_opt)], '*')
plot.show()
 Exercise
 Change the target value y_trg. Use the MLP trained in the previous experiment.
Reference: FMin †
FMin(f, x_init, x_min, x_max)
where f is the function to be minimized, x_init is the initial value of the search, and [x_min, x_max] is the search range.
Learning Dynamics of Throwing Robot †
Let's do the above things with data obtained from the real robot.
The steps are:
 Repeat N times:
 Throw the ball with random control parameter x.
 You can decide the control parameter as you like. One way is to define it as the effort parameter.
 Observe the distance y.
 Put x, y in the list data_x, data_y respectively (just write them manually in the code).
 Train MLP with data_x, data_y and obtain f(x).
 Plot the function f(x).
 Exercise
 Complete the above steps.
 Hint

For collecting data, create two cells:
MoveToTrg(dxl, 0.2, 1.0)
Run the first cell to move the robot to the initial position. Place the ball. Edit x (the control parameter; here, we define it as the effort parameter). Press Shift+Enter to run the second cell. Measure the landing point of the ball (the flying distance) as y. Add the values of x and y to the lists manually. Repeat these steps N times.
x= 0.9
MoveToTrg(dxl, 0.8, x)
Control the Robot to Throw the Ball to a Target Location †
Referring to the experiment of Optimization with CMA-ES, make a program to throw a ball to a target location.
The steps are:
 Define the objective function f_error with y_trg.
 Minimize f_error to find x_opt.
 Print values of x_opt, f(x_opt), and f_error(x_opt) to see if the optimization succeeded.
 Plot a graph of f(x) with the optimized value (x_opt, f(x_opt)).
 Execute the solution x_opt.
 Measure the flying distance and compare it with y_trg.
 Exercise
 Complete the above steps. Repeat the above several times with different y_trg. Evaluate the control accuracy.
Discussion and Survey [UPDATED] †
 Discuss the results of the robot experiments.
 Choose one or two topics to survey:
 Reinforcement learning for robots.
 Model-based and model-free reinforcement learning (advantages, disadvantages).
 Recent research of robot learning and AI.
Programs and CAD Models †
The programs and CAD models of the robot are available on GitHub:
https://github.com/akihikoy/ai_ctrl_1
Notes for TA †
Setup before the Experiments †
 Place three Raspberry Pis and three robots.
 Make three pairs of (Raspberry Pi, finger robot) where the robot is connected to Raspberry Pi with USB.
 Connect a micro-USB power supply to each Raspberry Pi to turn it on.
 For each Raspberry Pi (X=6,7,8):
 Log in to Raspberry Pi over SSH.
 IP: 192.168.11.5X where X is the number of the Raspberry Pi (aypiX).
 User: hm
 Turn on JupyterLab:
 .local/bin/jupyter lab
Shutdown †
 For each Raspberry Pi (X=6,7,8):
 Close JupyterLab by pressing Ctrl+C.
 Shutdown Pi.
 sudo shutdown -h now