Reinforcement Learning for Web Testing at Global App Testing
Web application testing is a time-consuming process. Its primary goal is to ensure that every path a customer might take through the application can be completed and that the application's basic functionality works.
To put this simply, nothing should interfere with the user experience.
In this article, we explore how to speed up this process by combining cutting-edge technologies with our own existing methods and inventions.
Reinforcement Learning is a rather neglected domain of Machine Learning compared to, say, classification or clustering. Lately, it has gained momentum thanks to research on chess- and Go-playing agents, DeepMind's single model that plays multiple Atari games, and OpenAI's breakthrough in online RL methods such as Proximal Policy Optimization (PPO). These methods are now being widely applied in business, from classic approaches on the stock market to non-trivial autonomous vehicle control.
Reinforcement Learning (RL)
RL is a problem-solving approach in which an agent is presented with an observation O, selects an action A from a set of possible actions according to a policy P, and receives a reward R, repeating this cycle until a goal is met. The goal of training is to make policy P select the actions A that maximize the cumulative reward R, by modifying the agent's behaviour based on recent (online learning) or past (offline learning) experience.
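The observation, action, reward cycle can be sketched in a few lines of Python. The toy corridor environment, the tabular Q-learning update and every name and hyperparameter below are illustrative assumptions chosen for brevity; they are not the agent or algorithm used at GAT.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to the goal at cell 4."""
    ACTIONS = (-1, +1)  # step left, step right

    def reset(self):
        self.pos = 0
        return self.pos  # observation O

    def step(self, action):
        self.pos = min(4, max(0, self.pos + action))
        done = self.pos == 4
        reward = 1.0 if done else -0.1  # reward R: bonus at the goal, small cost per step
        return self.pos, reward, done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    env = CorridorEnv()
    q = {(s, a): 0.0 for s in range(5) for a in env.ACTIONS}
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            # Policy P: epsilon-greedy over the current value estimates.
            if rng.random() < epsilon:
                action = rng.choice(env.ACTIONS)
            else:
                action = max(env.ACTIONS, key=lambda a: q[(obs, a)])
            next_obs, reward, done = env.step(action)
            # Online learning: update from the most recent transition.
            best_next = 0.0 if done else max(q[(next_obs, a)] for a in env.ACTIONS)
            q[(obs, action)] += alpha * (reward + gamma * best_next - q[(obs, action)])
            obs = next_obs
    return q


q_table = train()
# Greedy action per non-terminal cell after training.
greedy_policy = {
    s: max(CorridorEnv.ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(4)
}
```

After training, the greedy policy steps right in every cell, which is the behaviour that maximizes the cumulative reward in this toy setting.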
Deep Graph Neural Networks
DGNNs are a special kind of neural network that can be fed graph-like data. They perform computations on nodes, edges and their types. The output can therefore be one or more of the following: node features, edge features or whole-graph features (after pooling), which feed further tasks such as regression, classification and/or clustering.
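As a rough illustration of node-level and graph-level outputs, the sketch below runs one round of message passing followed by mean pooling in plain Python. The mean aggregation, the fixed identity weight matrix and the three-node graph are assumptions made for the example; a real DGNN would learn its weights and usually stack several such layers.

```python
def matvec(weights, vec):
    """Multiply a weight matrix (list of rows) by a feature vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def relu(vec):
    return [max(0.0, x) for x in vec]


def message_passing_layer(features, edges, weights):
    """One round: average each node with its neighbours, then linear map + ReLU."""
    neighbours = {node: [node] for node in features}  # include the node itself
    for src, dst in edges:  # edges are treated as undirected
        neighbours[src].append(dst)
        neighbours[dst].append(src)
    updated = {}
    for node, group in neighbours.items():
        dim = len(features[node])
        mean = [sum(features[n][i] for n in group) / len(group) for i in range(dim)]
        updated[node] = relu(matvec(weights, mean))
    return updated


def mean_pool(features):
    """Collapse all node features into a single whole-graph feature vector."""
    rows = list(features.values())
    return [sum(col) / len(rows) for col in zip(*rows)]


# Hypothetical three-node path graph a - b - c with 2-dimensional features.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
edges = [("a", "b"), ("b", "c")]
weights = [[1.0, 0.0], [0.0, 1.0]]  # identity, standing in for learned weights

node_out = message_passing_layer(features, edges, weights)  # node features
graph_out = mean_pool(node_out)                             # whole-graph features
```

Here `node_out` could feed a per-node task (for example scoring individual elements), while `graph_out` could feed a task about the graph as a whole.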
Autonomous web testing - the GAT way
Reinforcement learning assumes that the observation is converted into a numerical form, which neural networks then process to produce a policy. The aforementioned game environments work on screenshots of the game, which keeps the observation space constant. Here our team faced its first issue: a web application is rendered at various screen sizes, can be scrolled vertically and horizontally, and so on. What we are ultimately dealing with is a variable observation space.
Agents that play games, operate on the market or control vehicles have a limited number of actions to choose from and put in a sequence: buy/sell, right/left, accelerate/brake. In web testing, there are as many targets for actions as there are nodes in the DOM tree (the internal structure of a page), and each of them can potentially be clicked, dragged or have text typed into it, which multiplies the number of possible actions. This introduces the notion of a variable and parameterized action space.
To overcome the aforementioned problems, completely new methods had to be invented. To begin, we tackled transforming the HTML structure into a form from which features could be calculated and extracted. As HTML itself is a tree structure, we decided to base our solution on graph convolutional networks, which allow us to extract global, page-wide features as well as local, node-level features that reflect each node's relation to other nodes in the graph. This method cannot be used while the values in the graph are text-based, which is why we also had to come up with a method to transform text into numerical values. Deep Graph Neural Networks became the backbone of our solution and allowed us to solve the variable observation space problem and focus on further problems.
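A minimal sketch of the HTML-to-graph step, using only the Python standard library, might look like the following. The article does not say how text is turned into numbers, so the hashing trick in `text_to_vector` is purely an assumed stand-in, and `DomGraphBuilder`, `html_to_graph` and the sample page are hypothetical names and data.

```python
import hashlib
from html.parser import HTMLParser

VOID_TAGS = {"input", "br", "img", "hr", "meta", "link"}  # elements without a closing tag


def text_to_vector(text, dim=8):
    """Assumed stand-in: hash each token into one of `dim` count buckets."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec


class DomGraphBuilder(HTMLParser):
    """Collects elements as nodes and parent-child links as edges."""

    def __init__(self):
        super().__init__()
        self.nodes = []   # one dict per element
        self.edges = []   # (parent_index, child_index)
        self._stack = []  # indices of currently open elements

    def handle_starttag(self, tag, attrs):
        index = len(self.nodes)
        self.nodes.append({"tag": tag, "attrs": dict(attrs), "text": ""})
        if self._stack:
            self.edges.append((self._stack[-1], index))
        if tag not in VOID_TAGS:
            self._stack.append(index)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        for depth in range(len(self._stack) - 1, -1, -1):
            if self.nodes[self._stack[depth]]["tag"] == tag:
                del self._stack[depth:]
                break

    def handle_data(self, data):
        if self._stack and data.strip():
            self.nodes[self._stack[-1]]["text"] += data.strip() + " "


def html_to_graph(html):
    builder = DomGraphBuilder()
    builder.feed(html)
    builder.close()
    # One fixed-size numeric feature vector per node, built from tag and text.
    features = [text_to_vector(n["tag"] + " " + n["text"]) for n in builder.nodes]
    return builder.nodes, builder.edges, features


page = '<form><input type="email" name="login"><button>Sign in</button></form>'
nodes, edges, features = html_to_graph(page)
```

Whatever the size of the page, every node ends up with a feature vector of the same length, which is what lets a graph network consume pages with a varying number of elements.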
As you can imagine, the number of possible actions on a web page is overwhelming. As a result, we decided to simplify this part as much as possible by introducing our own heuristics to assess which elements are actionable and which actions can be taken on them. We narrowed the actions down to clicking and typing.
Clicking an element is an easy task with a simple-to-observe outcome, whereas typing is the tricky part. If you decide to put text into a certain text field, what should the value be? We brainstormed many solutions, from autoencoders and generators to data-yielding libraries, but ultimately settled on a dictionary of usable clusters of texts, keyed by names such as correct_email, space, blank_text, simple_not_correct_password and so on. This solved the next problem: a variable action space with parameters (the text keys).
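The idea of a parameterized action space built from clicks and dictionary-keyed typing can be sketched as follows. The dictionary keys come from the text above; the values attached to them, the tag-based heuristics and the `enumerate_actions` helper are assumptions for illustration, not GAT's actual rules.

```python
# Keys are taken from the article; the values are made-up placeholders.
TEXT_CLUSTERS = {
    "correct_email": "tester@example.com",
    "space": " ",
    "blank_text": "",
    "simple_not_correct_password": "1234",
}

# Assumed heuristics for deciding what can be clicked or typed into.
CLICKABLE_TAGS = {"a", "button"}
CLICKABLE_INPUT_TYPES = {"submit", "button", "checkbox"}
TYPEABLE_INPUT_TYPES = {"text", "email", "password", "search"}


def enumerate_actions(nodes):
    """Return (action, node_index, text_key) tuples for every actionable node."""
    actions = []
    for index, node in enumerate(nodes):
        tag = node["tag"]
        input_type = node.get("attrs", {}).get("type", "text")
        if tag in CLICKABLE_TAGS or (tag == "input" and input_type in CLICKABLE_INPUT_TYPES):
            actions.append(("click", index, None))
        elif tag == "textarea" or (tag == "input" and input_type in TYPEABLE_INPUT_TYPES):
            # Typing is parameterized by a dictionary key, not by free text.
            for key in TEXT_CLUSTERS:
                actions.append(("type", index, key))
    return actions


# Hypothetical page: a form holding one email field and one button.
example_nodes = [
    {"tag": "form", "attrs": {}},
    {"tag": "input", "attrs": {"type": "email"}},
    {"tag": "button", "attrs": {}},
]
actions = enumerate_actions(example_nodes)
```

For this page the agent chooses among five actions: four ways of typing into the email field and one click on the button. The count changes with the page, which is exactly the variable, parameterized action space described above.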
To cover all the foundations of Reinforcement Learning, we need to address the final part, rewarding, which is how we express our expectations of the agent. We identified various reward schemes, ranging from rewards for completing simple tasks such as logging in, through negative rewards for each step taken, to state exploration metrics based on novelty scoring. Each was tested independently in various environments, which led us to combine them all into a single reward, and that combination became yet another of our solutions.
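One way such a combined reward could look is sketched below. The article does not give the actual formula, so the weights, the count-based novelty score and the `CombinedReward` class are all assumptions made for illustration.

```python
from collections import Counter


class CombinedReward:
    """Hypothetical mix of task bonus, per-step penalty and novelty bonus."""

    def __init__(self, task_bonus=1.0, step_penalty=0.01, novelty_weight=0.1):
        self.task_bonus = task_bonus
        self.step_penalty = step_penalty
        self.novelty_weight = novelty_weight
        self.visits = Counter()  # how often each state has been seen

    def __call__(self, state_id, task_completed):
        self.visits[state_id] += 1
        # Count-based novelty (an assumption): rarely seen states score higher.
        novelty = 1.0 / self.visits[state_id] ** 0.5
        reward = -self.step_penalty + self.novelty_weight * novelty
        if task_completed:
            reward += self.task_bonus
        return reward


reward_fn = CombinedReward()
first_visit = reward_fn("login_page", task_completed=False)
second_visit = reward_fn("login_page", task_completed=False)
logged_in = reward_fn("dashboard", task_completed=True)
```

Revisiting the same state yields a smaller reward than the first visit, while completing the task adds the full task bonus on top of the step penalty and novelty term.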
By training our agent to perform simple tasks and explore the page as much as possible, we built a system which is able to move around pages like a human tester, recognizing and reporting issues in parallel, with all attachments included and no delay in compiling reports.