
TMX POV - Big Data & Machine Learning

TMX Global Analytics - Big Data and Machine Learning - The Next Frontier for Trading


Introduction

Digital transformation has impacted how organizations across the globe conduct business, and artificial intelligence has been under a particular spotlight in recent years. McKinsey estimates that, in 2016, total annual external investment in AI was between $8 billion and $12 billion, with ML accounting for close to 60% of that investment¹. Meanwhile, the International Data Corporation (IDC) predicts that AI and ML spend will continue to experience double-digit growth, from roughly $24 billion in 2018 to $77.6 billion by 2022².

The financial services sector has long been a leader in embracing and employing innovative technology solutions, with broker dealers and trading firms investing heavily in order management systems that automate ever-greater portions of the trade lifecycle, in more powerful smart order routers and algorithmic offerings and, more recently, in big data and analytics. Investment in these emerging technologies allows broker dealers to better serve their customers and trade more efficiently and effectively, while driving bottom-line growth.

Of course, along with the opportunities presented by these new and emerging technologies, there are also a number of obstacles to overcome. Many organizations have deeply embedded assumptions around information security that make exploring these techniques more expensive and time-consuming, and it is rare that firms generate sufficient executive-team sponsorship to invest over multiple years and take the risks necessary to build meaningful organizational capabilities in these emerging areas. Finally, leading teams along a complex path of obstacles requires different skill sets at various times to deliver a meaningful result that an organization can leverage or commercialize effectively.

Executive summary

Advances in technology have dramatically reshaped the equity and derivatives trading landscape in Canada over the past 15 years and will continue to do so going forward. Tools powered by big data and advanced technology have typically been available only to deep-pocketed trading desks willing to make substantial investments. Now, thanks to the evolution of the cloud, new open source technologies, greater availability of talent, and improved data warehousing techniques & market data product offerings, leveraging big data and exploring machine learning for trading applications is becoming accessible to a broader investment management and trading audience. Although the road to mass adoption of machine learning in trading will not be without challenges, exploring these technologies is paramount to remaining competitive in a continuously evolving industry – whether you represent a trading desk, a vendor or even a marketplace. This paper provides historical context against which these changes are taking place, explores some key possibilities for successful machine learning applications and outlines key challenges that companies should consider before introducing machine learning into their trading or investment strategies.

Progression of trading sophistication

To fully appreciate the direction in which the industry is headed, it's first important to understand where it came from and, even more importantly, the pace of change in recent years. In Canada, manual ticket waving was the norm from the turn of the century through much of the 1970s. In 1977, the TSX became the first exchange in the world to use technology to underpin its core trading function with the implementation of its Computer Assisted Trading System³. Technology continued to progress as more exchanges around the world automated: decimalization was introduced in 1996 and, in April 1997, the exchange's trading floor closed in favour of a fully electronic trading environment.

As the industry moved into the early 2000s, various market structure changes occurred globally that accelerated the need for better technology. The opening of several Alternative Trading Systems and Electronic Communication Networks, increased regulatory requirements stemming from a 'multiple marketplace' environment, and the emergence of highly sophisticated trading participants making intraday markets all led to greater adoption of electronic algorithmic trading by broker dealers and investment firms. The pace of trading continued to increase exponentially: by 2011, TSX was timestamping activity on its market data feeds with microsecond granularity, and in 2015 this was increased to nanoseconds.

In today's markets, TMX Group regularly observes over 200 million messages across the Canadian equity marketplace each day, with peak days exceeding 350 million messages. Adding in the US and Europe, this total exceeds 1 billion messages per day. With trading occurring at such velocity, machine learning techniques are becoming one of the fastest and most efficient ways to continually adapt to micro-shifts in trading behaviour and market conditions.

Save for the initial investment decision, firms now rely almost entirely on automated trading strategies for trade execution, through their order management systems, smart order routers and/or algos. To remain effective in a constantly evolving industry, these automated strategies all require historical market data in some capacity to fuel decision points, along with ongoing research and development to keep them optimized for current market conditions. Many firms review the performance of their technologies on a quarterly or even monthly basis, but few do it daily or intraday. Incorporating machine learning can allow a firm to shorten these review cycles.

Maintaining a historical market database has not traditionally been seen as a worthy case for investment by all trading desks, as many rely heavily on vendor systems, but it is quickly becoming evident that firms who invest in their own internal data storage and management capabilities gain a key competitive advantage, since it allows them to look back and study how orders were handled to inform future decisions. Most firms store their own private execution data for compliance reasons, but this doesn't provide the market context against which trades were executed; orders sent but not filled, and counterparty reactions, provide rich context when analyzing trade execution. Storing Level 1 (top of book) data for this kind of analysis, let alone Level 2 (full depth) data, is a challenge in itself, as big data frameworks and heavy infrastructure for storage and security are required even where the talent exists within the organization to handle it. But as big data and machine learning capabilities continue to mature, having historical data against which to train these algos and models becomes increasingly important, and companies investing in this area must budget for the acquisition and maintenance of this data.

FIGURE 1: Growth of Daily Trading Activity on TSX — Average Daily Volume (in millions)

FIGURE 2: Growth of Daily Trading Activity on TSX — Average Daily Trades (in thousands)

Key drivers of machine learning in the investment and trading industry

Recent advances in technology and decreasing costs have made machine learning more accessible and appealing in trading applications. There are several key drivers for this shift:

An increasing need to incorporate a quantitative layer into trading decisions to prove value to clients and help generate alpha – Smaller firms and large financial institutions alike are seeing increasing client demand to leverage the latest technologies and analytics in their trading strategies, and to report on them with greater timeliness and frequency. Trade desks are now being challenged to help their clients generate alpha through trade execution. This is increasingly difficult as more assets shift to passive investing and it becomes harder and harder to beat the benchmark.

Increased global competition – With the free movement of investment capital and greater access to technology, people from across the globe are competing and trading on markets like the TSX. This drives competition for commission revenue, and applies pressure on margins in trading.

Increased availability of new data sets and alternative data, both structured and unstructured – This means there are more opportunities than ever before to turn data into valuable, unique and actionable insights. Even though trade desks typically aren't driving the initial investment decision, alternative data often contains significant signals that present opportunities for better execution or trading ideas to share with buy-side clients.

Emerging supply of AI talent – The most recent crops of analysts entering the workforce have received extensive training in advanced technologies, making it easier, albeit still highly competitive, to build a team with the right skills. This talent pool is growing, and firms that invest in the right tools and support framework for their staff will be able to attract the best of the best.

Cost efficiencies – Costs associated with compliance, technology & market data fees, and sustained demand from management for double-digit returns on capital, are driving the need for advanced trading & analytic capabilities that are flexible and adaptable to multiple use cases and market conditions, helping to keep overhead costs in check.

Increased regulatory burden to prove Best Execution – Regulatory and compliance costs drive the need to continually adapt and take advantage of new technologies that help produce an edge.

Examples of use cases

Applied properly, machine learning models can be incorporated into a variety of components of the trading lifecycle. With the potential to process thousands of market scenarios and tweak hundreds of variables in a trading strategy, machine learning can help users discover the optimal configuration or trading approach for executing an order.

Consider an encyclopedia – like a data set, it offers a plethora of information. Depending on who is looking for information, and the type of information being sought, the insights acquired will vary greatly. Much the same can be said for data mining and applying machine learning – the insights firms derive depend on the type of data and the application at hand. The advantage is that these models can examine the data and predict or forecast optimal outputs significantly faster, and more reliably, than any human can.

The following use cases represent a very small sample of the models we've observed in use and provide a glimpse of what is possible:

  • Logistic regression, which is probably the simplest ML algorithm. It uses one or more explanatory variables to predict a binary outcome – for example, determining whether a security's price will move up or down following a specific trigger (a minimal sketch follows this list).
     
  • Applying neural networks, which construct a series of nodes – each representing a different concept – linked together in a fashion loosely modelled on the human mind. Neural networks can help decipher unstructured data to unlock new investment opportunities – for instance, interpreting satellite imagery showing healthy crop yields, or powering natural language processing of news/social media data to produce a positive or negative trading signal. Trade desks can use this research to advise clients and generate relevant trading ideas for peer symbols.

  • Applying decision tree learning or clustering models to determine customizations to an algorithm based on current market conditions for each symbol. For example, an algorithm might work well in a mean-reverting environment but not in a strongly trending one, or on a medium-to-high liquidity symbol but not on a low liquidity symbol. Identifying buckets of trading conditions, and then having a trading strategy that reacts automatically to them, can be incredibly powerful (see the clustering sketch after this list).
     
  • Applying random forests, which combine a multitude of decision trees calibrated over a specific period of time, to make even more accurate forecasts or predictions around order sizing and venue placement, or to provide alerts when dark activity may be occurring in specific symbols of interest.
     
  • Using clustering principles against historical displayed bid/ask volumes at multiple price levels, and volume trends, to help predict the most opportune times of day, or the type of algorithm to use, in order to trade size in each symbol.
     
  • Predicting the probability of being adversely selected when sending orders, by symbol, venue, time of day, order size or other order placement logic (a random forest sketch of this idea follows the list). These predictions can then be fed back to clients in order to set more reasonable expectations for trade instructions and arrival price – strengthening trust and the overall client-trader relationship.
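
As a minimal sketch of the first use case above, the snippet below fits a logistic regression on synthetic data to estimate whether a price moves up or down after a trigger. It uses scikit-learn (one of the libraries referenced later in this paper); the feature names, the synthetic data and the relationship between them are purely illustrative assumptions, not a production model.

```python
# Illustrative only: logistic regression predicting whether a security's
# price ticks up (1) or down (0) after a hypothetical trigger event.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 5000

# Hypothetical features observed at the trigger: order book imbalance,
# quoted spread (in ticks) and a short-term momentum signal.
X = np.column_stack([
    rng.uniform(-1, 1, n),   # order book imbalance
    rng.integers(1, 5, n),   # quoted spread in ticks
    rng.normal(0, 1, n),     # recent momentum
])

# Synthetic label: up-moves are more likely when imbalance and momentum are positive.
p_up = 1 / (1 + np.exp(-(1.5 * X[:, 0] + 0.5 * X[:, 2])))
y = (rng.uniform(0, 1, n) < p_up).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

print("Out-of-sample accuracy:", model.score(X_test, y_test))
print("P(up) for one new observation:", model.predict_proba([[0.4, 1, 0.2]])[0, 1])
```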
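
Similarly, the following sketch illustrates the clustering idea from the list: grouping symbols into trading-condition buckets with k-means so that an algorithm's parameters can be switched by bucket. The per-symbol statistics, column names and choice of two clusters are hypothetical placeholders.

```python
# Illustrative only: k-means clustering of symbols into trading-condition
# buckets (e.g. liquidity/volatility regimes) so that algorithm parameters
# can be switched automatically by bucket.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-symbol daily statistics (would normally come from a
# historical market data store).
stats = pd.DataFrame({
    "symbol": ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"],
    "avg_spread_bps": [2.1, 14.8, 3.0, 22.5, 2.4, 18.0],
    "avg_daily_volume": [9.5e6, 1.2e5, 7.8e6, 4.0e4, 1.1e7, 9.0e4],
    "realized_vol": [0.012, 0.045, 0.015, 0.060, 0.011, 0.052],
})

features = stats[["avg_spread_bps", "avg_daily_volume", "realized_vol"]]
scaled = StandardScaler().fit_transform(features)

# Two buckets purely for illustration; in practice the number of regimes
# would be chosen through validation.
stats["bucket"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)
print(stats[["symbol", "bucket"]])
```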
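
Finally, a sketch of the adverse-selection idea: a random forest trained on hypothetical order placement features to output a probability of being adversely selected. Again, every feature, label and data point below is a synthetic assumption used only to show the shape of the approach.

```python
# Illustrative only: a random forest estimating the probability that a
# child order will be adversely selected, from hypothetical placement features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
n = 10000

# Hypothetical features: order size as a fraction of average daily volume,
# minutes since the open, venue id and quoted spread at arrival (cents).
X = np.column_stack([
    rng.uniform(0, 0.05, n),
    rng.uniform(0, 390, n),
    rng.integers(0, 4, n),
    rng.uniform(0.5, 5.0, n),
])

# Synthetic label: larger orders sent into wider spreads are adversely
# selected more often.
p_adverse = np.clip(5 * X[:, 0] + 0.03 * X[:, 3], 0, 1)
y = (rng.uniform(0, 1, n) < p_adverse).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Probability of adverse selection for one hypothetical new order.
new_order = [[0.02, 35.0, 2, 3.5]]
print("P(adverse selection):", model.predict_proba(new_order)[0, 1])
```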

With trading applications, there is no "one size fits all" approach – the model applied and the specific use case can vary greatly depending on the goal and the data inputs (such as whether the data is time-series or unstructured). Once a model is constructed, further customizations are often required to optimize for processing power (which otherwise translates into slow run times and higher costs), to reduce the number of false positives and to detect when anomalies occur. It's a process that requires a blend of art, science and time to achieve the desired output.

Benefits of machine learning in trading

It is no secret that broker dealers today are being stretched to meet varying client demands, each having their own priorities and expectations. As client demands differ, it's increasingly difficult for firms to offer the flexibility to meet all of those needs. Having machine learning capabilities within an organization allows for additional flexibility by virtue of its ability to facilitate more accurate trading decisions across a wider variety of market situations.

For alpha-seeking firms, there is no doubt that machine learning offers incredible benefits. Most importantly, the time to market for new ideas is paramount and having machine learning embedded in a trading strategy can let firms find and act on opportunities much faster than before.

As firms struggle to balance profitability in an increasingly competitive world, machine learning, when used effectively, can help reduce research and development efforts by strengthening and centralizing core trading capabilities. Because ML models can improve themselves, they reduce the time and cost typically required to continually monitor and implement new updates across multiple divisions.

For all these benefits, it's important to remember that these technologies will likely not completely eliminate the need for human interpretation. The fact remains that skilled, smart professionals are – and will continue to be – the backbone of the industry. They are the ones who can derive deep insights from the data and develop strategies using these tools. Ultimately, AI and ML augment and empower market participants to do more with less, instead of completely replacing them and their decision-making abilities.

Common challenges to advancement

Although the industry has come a long way from the manual trading of the 1960s and 1970s, it is still in its infancy with respect to big data management and ML. The top challenges facing firms that want to build machine learning capabilities today include:

Sourcing talent – While there are more new graduates in AI and ML fields coming out of colleges and universities than ever before, they are still in very strong demand. Furthermore, experienced individuals with a background in successful AI applications are even harder and more expensive to find. Training on industry-specific nuances is paramount before they can deliver tangible results, so organizations' expectations with respect to return on investment need to reflect sufficient time for these initial training periods.

Sourcing data – Unless you have been saving historical market data on your own systems, you will likely need to purchase historical data, either through a vendor or from exchanges directly. The granularity needed for ML applications far exceeds the traditional open, high, low, close data that is readily available from most vendors today. Factors like intraday volatility, venue-specific behavioural traits, and backtesting across a multitude of stock splits/consolidations and symbol changes require comprehensive data sets to sufficiently train ML models. Furthermore, raw data feeds purchased from exchanges require significant data transformation effort to prepare them into a meaningful order book format for analysis.
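
To give a sense of the transformation effort involved, the sketch below rolls a handful of raw quote-update messages into a top-of-book (Level 1) view with pandas. The message layout and column names are hypothetical placeholders; real exchange feeds are far richer and typically also require sequencing, symbol mapping and corporate-action adjustments.

```python
# Illustrative only: turning raw quote-update messages into a top-of-book
# (Level 1) time series. The message layout is a hypothetical placeholder.
import pandas as pd

raw = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-02 09:30:00.000001",
        "2024-01-02 09:30:00.000045",
        "2024-01-02 09:30:00.000112",
    ]),
    "symbol": ["XYZ", "XYZ", "XYZ"],
    "side": ["bid", "ask", "bid"],
    "price": [10.00, 10.02, 10.01],
    "size": [500, 300, 200],
})

# Latest quoted price per side at each timestamp, forward-filled into a
# continuous best bid/ask view.
best = (
    raw.groupby(["timestamp", "side"])["price"].last()
       .unstack("side")
       .ffill()
       .rename(columns={"bid": "best_bid", "ask": "best_ask"})
)
best["spread"] = best["best_ask"] - best["best_bid"]
print(best)
```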

Hardware – Training a machine learning model requires massive computational resources. Technology experts may already be familiar with using graphics processing units (GPUs), given their capacity to process large amounts of data more efficiently than traditional central processing units (CPUs). As demand for these units grows, other lower-cost and more energy-efficient options are also emerging, including field programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs). Many cloud providers now offer massive computational power at scale, with access to these chips, for a fraction of the cost of acquiring them outright, but creating a full development stack with public and private data in the cloud is still an uphill battle for many information security-conscious firms.

Security – Trading applications are generally treated with intense confidentiality, particularly when client data is incorporated, and it can be daunting for firms that want to take advantage of third party scalable cloud infrastructure quickly. But while the information security hurdles within an organization can be challenging, they typically aren't insurmountable. Cloud security expertise is gradually becoming an in-house capability as more companies feel the pressure, and there are now many external resources and consulting firms available to assist in these conversations. Many cloud providers and vendors also openly share security certifications and best practices to help bring their clients onboard.

Top-down vision, bottom-up execution – To be successful over the long term, machine learning must be considered an organizational capability, not a skill set held by an individual staff member or group. At the end of the day, every individual should have a base level of knowledge of ML for it to become part of the company's core fibre. Of course, building organizational capabilities takes time, as well as leadership commitment to fund the expansion, training and investment in the data and hardware resources necessary to support it. It's therefore paramount that research is conducted and leadership is informed upfront, and continually updated on progress, to ensure that these initiatives remain on track and are balanced against other organizational priorities.

This list is not exhaustive, but it captures some of the core challenges faced in the industry today. As the commercial landscape continues to evolve, some of the current challenges will become less prevalent, while newer challenges will naturally emerge.

TMX Global Analytics for Machine Learning

TMX Global Analytics is a new globally-focused big data and analytics product designed with machine learning applications in mind. Loaded with full depth of book (Level 2) market data from over 75 exchanges and marketplaces globally, it can seamlessly supply much of the public market data needed for training machine learning models. Commonly used machine learning libraries such as, but not limited to, TensorFlow, Pandas, Theano, SKLearn, SageMaker, MLFlow and PyTorch can also be imported into the platform, and clients are free to construct their own technology stack. As a cloud-based product, it further leverages scalable compute power and passes this on to clients to scale up as needed.

Machine Learning-ready data sets – Since a comprehensive data set underlies any machine learning model, good quality data is paramount to the process. Raw market data purchased from exchanges normally requires a number of transformations and preparation steps before it is ready for ML applications. TMX's normalized and easily consumable historical data sets provide full depth-of-book, global equity tick data at a fraction of the cost and time of acquiring them individually.

Cloud-based – Accessibility is key when dealing with a data set of this magnitude. The TMX Global Analytics platform manages all the data preparation and warehousing, allowing clients to focus time on building, tweaking and debugging the ML model, and not on data engineering. This also enables the platform to power the model with access to both CPUs and GPUs.

Global coverage – The whole is greater than the sum of its parts. TMX markets are key for a Canadian view, but when TMX Global Analytics combines the Canadian view with over 75 other markets from around the world, it provides investors and traders with a truly comprehensive global perspective, along with other data sets often needed for historical analysis such as corporate actions, FX rates, sample proprietary models & analytics and a growing range of alternative data sets.

15-day free trial offer!
Test it to ensure it meets your needs before you commit. For more information, contact your TMX Analytics representative or email marketdata@tmx.com.

Conclusion

When it comes to machine learning for trading and investing, it's clear that more big changes in the pace and competitiveness of the industry are likely to come. Building ML capabilities within an organization takes time, leadership commitment and rational expectations around short-term return on investment. Barriers to entry are being addressed by various academic and commercial providers, with talent, data, hardware and security expertise becoming more readily accessible.

Machine learning provides many key benefits to organizations that adopt it – including greater flexibility for a more demanding client base, faster time to market for new ideas, and the ability to shift attention and resources away from hours of research and development toward other lines of business such as client service. Ultimately, by harnessing the power of technology and exploiting the latest developments in AI and ML, the industry stands to unlock lasting bottom-line value and take a bold step forward as it innovates for the future.

For more information

Please contact TMX Global Analytics to learn how you can try or buy data products.

TMX ANALYTICS
datasales@tmx.com 

 

1 Bughin, Jacques et al. "Artificial Intelligence, The Next Digital Frontier?" McKinsey & Company. June 2017. 
https://www.mckinsey.com/~/media/McKinsey/Industries/Advanced%20Electronics/Our%20Insights/How%20artificial%20intelligence%20can%20deliver%20real%20value%20to%20companies/MGI-Artificial-Intelligence-Discussion-paper.ashx 
2 International Data Corporation. "Worldwide Spending on Cognitive and Artificial Intelligence Systems Forecast to Reach $77.6 Billion in 2022." September 19, 2018. 
https://www.idc.com/getdoc.jsp?containerId=prUS44291818
3 The Canadian Encyclopedia. "Toronto Stock Exchange." Last modified December 16, 2013.
https://www.thecanadianencyclopedia.ca/en/article/toronto-stock-exchange