Reporting for Hackathon Outcomes: Group One

Automation testing is one of the most important components helping an Agile development team achieve its goals. But designing automated tests is also a time-consuming process, and locating elements takes up a large part of that time.

Enhance is, therefore, looking for a smart way to make element location more efficient, so that automated tests can be developed faster and smarter.



Our team set out to create a feature that could automatically scrape the elements from an HTML file and define each one as an Enhance element in a new page object class.

  • Stage I: Analysing the Goal

We split the project into three parts: scraping a web page, retrieving and creating scraped elements, and generating a Page Object class in Java. To complete them, we tried to find the answers to the following three questions:

  1. What are the available tools for scraping a web page?
  2. How to retrieve and create scraped elements?
  3. How to generate a Java Page Object containing the collected elements?

Above: The initial plan for the feature's workflow


  • Stage II: Tools for Scraping Web Page

The very first step is to scrape the web page. After some research, we shortlisted four tools that seemed suitable for the solution. The details of each tool are:

  • Jsoup: a Java library that is used to parse the HTML documents. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.
  • HtmlUnit: a “GUI-Less browser for Java programs”. It models HTML documents and provides an API that allows you to simulate browser events just like how you do in your “normal” browser.
  • Jaunt: a Java library that can be used to extract data from HTML pages by using a headless browser. The browser provides web-scraping functionality, access to the DOM, and control over each HTTP Request/Response, but does not support JavaScript.
  • Jauntium: a Java library that allows you to easily automate Chrome, Firefox, Safari, Edge, IE, and other modern web browsers. With Jauntium, your Java programs can perform web-scraping and web-automation with full JavaScript support. The library is named “Jauntium” because it builds on both Jaunt and Selenium to overcome the limitations of each.

We prefer a Java library for functional testing that can be integrated with our automation framework. HtmlUnit focuses more on the unit-testing side. Jaunt does not support JavaScript, and its free licence expires on a monthly basis; we ruled these two approaches out on the basis of those limitations. Jauntium is a decent library for our solution, but it is not in the Maven repository and has to be downloaded manually rather than integrated into our framework. That left Jsoup as our choice.
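As a minimal sketch of the kind of call this approach relies on, the snippet below parses a small inline HTML fragment with Jsoup and lists the links and buttons it finds. The class name, HTML snippet, and `tag#id` output format are illustrative, not taken from our framework.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupSketch {
    /** Parse HTML and return simple "tag#id" descriptors for every link
     *  and button, the element types our first version targets. */
    public static List<String> linksAndButtons(String html) {
        Document doc = Jsoup.parse(html);
        List<String> found = new ArrayList<>();
        for (Element e : doc.select("a, button")) {
            found.add(e.tagName() + "#" + e.id());
        }
        return found;
    }

    public static void main(String[] args) {
        // Jsoup.connect(url).get() would fetch a live page instead;
        // an inline snippet keeps the sketch self-contained.
        String html = "<a id='home-link' href='/home'>Home</a>"
                + "<button id='submit-btn'>Submit</button>";
        System.out.println(linksAndButtons(html));
    }
}
```

`Jsoup.parse` works on any HTML string, so the same method can run against a stored HTML file or a freshly fetched page.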

  • Stage III: Retrieve & Create Scraped Elements

After using Jsoup to scrape the web page, we can construct a Jsoup Document that contains all the elements from the HTML file.

This is the toughest part of this project because of the number of situations we need to cover when retrieving elements.
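One of the situations to cover is choosing a locator for each scraped element. A sketch of one plausible preference order is below; the id-then-name-then-class ordering is an assumption for illustration, not necessarily the exact rule our framework uses.

```java
import java.util.Map;

public class LocatorBuilder {
    /**
     * Build a CSS selector from an element's tag name and attributes.
     * Preference order (an assumption for this sketch): id, then name,
     * then the first class, falling back to the bare tag name.
     */
    public static String cssSelector(String tag, Map<String, String> attrs) {
        String id = attrs.get("id");
        if (id != null && !id.isEmpty()) {
            return "#" + id;
        }
        String name = attrs.get("name");
        if (name != null && !name.isEmpty()) {
            return tag + "[name='" + name + "']";
        }
        String cls = attrs.get("class");
        if (cls != null && !cls.isEmpty()) {
            // A class attribute can hold several names; take the first.
            return tag + "." + cls.split("\\s+")[0];
        }
        return tag;
    }
}
```

For example, `cssSelector("input", Map.of("name", "email"))` yields `input[name='email']`, while an element with an id short-circuits straight to `#id`.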

  • Stage IV: Generate a Java Page Object Class

As per the POM (Page Object Model) design pattern we use in our automation framework, all the elements need to go into a page object class. We wrote code to create a new Java class file, name it, and add all the listed elements to the new class. We can then use the newly created elements in our automated tests.
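Generating the class comes down to rendering Java source text from a name-to-locator map and writing it to a file. The sketch below emits plain Selenium-style `@FindBy` fields as a stand-in for our internal Enhance element type, which is an assumption for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class PageObjectGenerator {
    /** Render a simple page object class from field names to CSS locators.
     *  Emitting @FindBy/WebElement fields is a stand-in here for the
     *  Enhance element declarations our framework actually generates. */
    public static String render(String className, Map<String, String> locators) {
        StringBuilder sb = new StringBuilder();
        sb.append("public class ").append(className).append(" {\n\n");
        for (Map.Entry<String, String> e : locators.entrySet()) {
            sb.append("    @FindBy(css = \"").append(e.getValue()).append("\")\n");
            sb.append("    private WebElement ").append(e.getKey()).append(";\n\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    /** Write the rendered class into the given source directory. */
    public static void writeClass(Path dir, String className,
                                  Map<String, String> locators) throws Exception {
        Files.writeString(dir.resolve(className + ".java"),
                render(className, locators));
    }
}
```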

  • Stage V: Update Changes for Web Page

A common issue is that when a web page is updated, the element locators change and tests break. We explored how to easily identify such changes and produce recommendations for the new element locators.

Our solution is to store the HTML file in the project whenever we generate a page object by scraping a page. When a component on the page is modified, we can compare the stored HTML file to the “live” one. By comparing multiple attributes to find the closest match, we can suggest a suitable replacement element locator.
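The closest-match idea can be sketched as a simple similarity score over attribute maps. Treating every attribute with equal weight is an assumption made for brevity; a fuller version might weight `id` or `name` more heavily.

```java
import java.util.List;
import java.util.Map;

public class LocatorRecommender {
    /** Count how many attributes two elements share with equal values.
     *  This sketch weights all attributes equally. */
    static int score(Map<String, String> oldAttrs, Map<String, String> candidate) {
        int matches = 0;
        for (Map.Entry<String, String> e : oldAttrs.entrySet()) {
            if (e.getValue().equals(candidate.get(e.getKey()))) {
                matches++;
            }
        }
        return matches;
    }

    /** Pick the element from the live page whose attributes best match
     *  the element recorded in the previously stored HTML. */
    public static Map<String, String> closestMatch(
            Map<String, String> oldAttrs,
            List<Map<String, String>> liveElements) {
        Map<String, String> best = null;
        int bestScore = -1;
        for (Map<String, String> cand : liveElements) {
            int s = score(oldAttrs, cand);
            if (s > bestScore) {
                bestScore = s;
                best = cand;
            }
        }
        return best;
    }
}
```

The winning candidate's attributes can then be fed back into the locator builder to produce the recommended replacement selector.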



We successfully created a new class of methods, integrated with our framework, to solve these challenges. Through a BDD interface we pass the URL and the page object class name. The execution creates a new Java page object class with the given name, containing all of the page’s Enhance elements that meet the conditions we added.

If the page is updated, the rechecking function is executed, showing the updated elements along with their recommended locators.



These are the questions and challenges we had during our process:

  • Identify a unique, formatted name for each element

After we scrape the elements from the documents, some elements may share the same name, or a name may contain special characters. We need to keep all element names unique and well formatted so that the scripts stay tidy and easy to maintain.

As a result, we decided to use Apache Commons, which helps us keep the element names formatted.
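Apache Commons Lang provides string helpers for this kind of cleanup; the sketch below shows the same idea using only the JDK so it stays self-contained. The camel-casing rule and the numeric suffix on collisions are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Set;

public class ElementNamer {
    private final Set<String> used = new HashSet<>();

    /** Turn raw element text like "Sign in!" into a valid, camel-cased
     *  identifier, appending a counter when names collide. */
    public String uniqueName(String raw) {
        // Strip special characters, then camel-case the word boundaries.
        String[] words = raw.replaceAll("[^A-Za-z0-9 ]", " ").trim().split("\\s+");
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            if (w.isEmpty()) {
                continue;
            }
            if (sb.length() == 0) {
                sb.append(w.toLowerCase());
            } else {
                sb.append(Character.toUpperCase(w.charAt(0)))
                  .append(w.substring(1).toLowerCase());
            }
        }
        String name = sb.length() == 0 ? "element" : sb.toString();
        // Ensure uniqueness by suffixing a counter on collision.
        String candidate = name;
        int i = 2;
        while (!used.add(candidate)) {
            candidate = name + i++;
        }
        return candidate;
    }
}
```

So two scraped links whose text is both "Sign in!" would become `signIn` and `signIn2`.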

  • Identify new page object class with formatted and valid name

Every time we create a new page object class, we must give it a unique, formatted, and valid name. We intended to use the name of the HTML file as the name of the page object class, but after trying some pages, we found that the file name is not always appropriate in all circumstances.

After discussion within the team, we decided it was more appropriate to enter the desired page object name manually.

  • Ignore the unnecessary ids of parent elements in a CSS Selector or XPath & replace the ids in a CSS Selector or XPath which are generated automatically

Not all of the initial locators we scrape from the web page can be used immediately; many of them need to be edited before they are useful. We must, therefore, add some conditions to obtain the correct element selector.

We added code to exclude certain generated id patterns, which was successful, but there are still more situations we need to consider.
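Excluding generated ids can be expressed as a regex check before an id is accepted into a locator. The specific patterns below (GWT-style `gwt-uid-12`, numeric-suffix `ember123`, long hex strings) are common examples chosen for illustration, not the exact list our code uses.

```java
import java.util.regex.Pattern;

public class IdFilter {
    // Patterns of ids that frameworks tend to generate automatically;
    // these are illustrative examples, not an exhaustive list.
    private static final Pattern GENERATED = Pattern.compile(
            "^(gwt-uid-\\d+|ember\\d+|[0-9a-f]{8,})$");

    /** Return true when an id looks auto-generated and should not be
     *  used in a locator, because it changes on every page build. */
    public static boolean isGenerated(String id) {
        return id == null || GENERATED.matcher(id).matches();
    }
}
```

An id that fails this check falls through to the other locator strategies (name, class, or a relative path without the generated id).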

  • Get all types of elements in the web page

Our result only showed examples of link and button elements. For a real web page, automation testing will require more than just links and buttons.

To get other types of elements (e.g. images, videos, tables), we can add those element types to our method. We can also specify the kinds of elements we prefer to ignore.
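Extending the coverage can be as simple as keeping the included and ignored element types in configuration lists and joining the included ones into a single selector. The particular tag names below are illustrative.

```java
import java.util.List;
import java.util.Set;

public class ElementTypes {
    // Element types to include in the scrape; extending this list is a
    // one-line change. The selection here is an illustrative example.
    static final List<String> INCLUDED =
            List.of("a", "button", "input", "select", "img", "video", "table");

    // Element types we prefer to ignore entirely.
    static final Set<String> IGNORED = Set.of("script", "style");

    /** Join the included types into a single CSS selector string that
     *  Jsoup's select() can consume in one call. */
    public static String selector() {
        return String.join(", ", INCLUDED);
    }
}
```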

  • Classify all elements and sort into categories

Some pages can be split into various parts, such as the header, body, and footer. If elements could be classified and sorted into these categories, a great deal of maintenance work could potentially be saved.

We found this difficult to get working, as it is hard to identify common parents at an appropriate level for labelling. For now, we chose to group elements by type rather than by parent or page section.



The future aim for this venture is to overcome the challenges we faced and refine our solution, ultimately making the generation of locators in page objects more robust and easier to maintain.