publications
2025
- Towards Music Industry 5.0: Perspectives on Artificial Intelligence. Alexander Williams and Mathieu Barthet. In Artificial Intelligence for Music Workshop at the 39th Annual AAAI Conference on Artificial Intelligence, Mar 2025.
Artificial Intelligence (AI) is a disruptive technology that is transforming many industries, including the music industry. Recently, the concept of Industry 5.0 has been proposed, emphasising principles of sustainability, resilience, and human-centricity to address current shortcomings in Industry 4.0 and its associated technologies, including AI. In line with these principles, this paper puts forward a position for ethical AI practices in the music industry. We outline the current state of AI in the music industry and its wider ethical and legal issues through an analysis and discussion of contemporary case studies. We list current commercial applications of AI in music, collect a range of perspectives on AI in the industry from diverse stakeholders, and comment on existing and forthcoming regulatory frameworks and industry initiatives. As a result, we provide several timely research directions, practical recommendations, and commercial opportunities to aid the transition to a human-centric, resilient, and sustainable music industry 5.0. This work focuses particularly on Western music industry case studies in the European Union (EU), United States of America (US), and United Kingdom (UK), but many of the issues raised are universal. While this work is not exhaustive, we nevertheless hope it guides researchers, businesses, and policymakers to develop responsible frameworks for deploying and regulating AI in the music industry.
- Temporal Considerations in DJ Mix Information Retrieval and Generation. Alexander Williams, Gregor Meehan, Stefan Lattner, Johan Pauwels, and Mathieu Barthet. In 32nd International Symposium on Temporal Representation and Reasoning, Mar 2025.
Music is the art of arranging sounds in time so as to produce a continuous, unified, and evocative composition. Electronic dance music (EDM) is a collection of musical sub-genres produced using computers and electronic instruments and often presented through the medium of DJing, where tracks are curated and mixed sequentially into a continuous stream of music to offer unique listening and dancing experiences over time periods ranging from several minutes to several hours. A DJ’s actions and decisions occur at several levels of temporal granularity, from real-time audio manipulation (e.g. of tempo) for smooth inter-track transitions to long-term planning of track selection and sequencing for mix content and flow. While human DJs can instinctively operate across these different temporal resolutions, replicating this capability in an end-to-end automated DJing system presents significant challenges. In this paper, we analyse existing works in DJ mix information retrieval and generation from this temporal perspective. We first explain the close link between DJing and the temporal notion of musical rhythm, then describe a framework for categorising DJing actions by temporal granularity. Using this framework, we summarise and contrast potential approaches for automating and augmenting sequential DJ decision making, and discuss the unique characteristics of DJ mix track selection as a sequential recommendation task. In doing so, we hope to facilitate the implementation of more robust and complete automated DJing systems in future research.
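As an illustration of the kind of temporal-granularity framework discussed above, the sketch below shows one way such a categorisation of DJ actions could be expressed in code; the class names and granularity levels are illustrative assumptions, not the taxonomy defined in the paper.

```python
from dataclasses import dataclass
from enum import Enum


class TemporalGranularity(Enum):
    """Illustrative levels at which DJ decisions could be categorised (assumed labels)."""
    REAL_TIME = "real-time"   # e.g. tempo or EQ adjustments during a transition
    PHRASE = "phrase"         # e.g. cue-point and loop choices over bars/phrases
    TRACK = "track"           # e.g. selecting the next track to mix in
    MIX = "mix"               # e.g. planning overall set flow and energy arc


@dataclass
class DJAction:
    """A DJ decision annotated with the temporal scale it operates on."""
    name: str
    granularity: TemporalGranularity


# Example annotations under this hypothetical scheme
actions = [
    DJAction("tempo adjustment", TemporalGranularity.REAL_TIME),
    DJAction("track selection", TemporalGranularity.TRACK),
    DJAction("set sequencing", TemporalGranularity.MIX),
]
```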
2024
- Deep Learning-based Audio Representations for the Analysis and Visualisation of Electronic Dance Music DJ Mixes. Alexander Williams, Haokun Tian, Stefan Lattner, Mathieu Barthet, and Charalampos Saitis. In AES International Symposium on AI and the Musician, Jun 2024.
Electronic dance music (EDM), produced using computers and electronic instruments, is a collection of musical subgenres that emphasise timbre and rhythm over melody and harmony. It is usually presented through the medium of DJing, where tracks are curated and mixed sequentially to offer unique listening and dancing experiences. However, unlike key and tempo annotations, DJs still rely on audition rather than metadata to examine and select tracks with complementary audio content. In this work, we investigate the use of deep learning-based representations (Complex Autoencoder and OpenL3) for analysing and visualising audio content on a corpus of DJ mixes with approximate transition timestamps and compare them with signal processing-based representations (joint time-frequency scattering transform and mel-frequency cepstral coefficients). Representations are computed once per second and visualised with UMAP dimensionality reduction. We propose heuristics based on the identification of observed patterns in visualisations and time-sensitive Euclidean distances in the representation space to compute DJ transition lengths, transition smoothness, and inter-song, song-to-song, and full-mix audio content consistency using audio representations along with rough DJ transition timestamps. Our method enables the visualisation of variations within music tracks, facilitating the analysis of DJ mixes and individual EDM tracks. This approach supports musicians in making informed creative decisions based on such visualisations. We share our code, dataset annotations, computed audio representations, and trained CAE model. We encourage researchers and music enthusiasts alike to analyse their own music using our tools: https://github.com/alexjameswilliams/EDMAudioRepresentations.
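For illustration, the sketch below computes per-second audio features, a 2-D UMAP embedding, and a simple Euclidean-distance measure around a transition point. It uses MFCCs via librosa and umap-learn as stand-ins; the paper's Complex Autoencoder, OpenL3, and scattering representations and its exact heuristics are not reproduced here, and the function names and window sizes are assumptions.

```python
# Minimal sketch: per-second MFCC features, UMAP projection for visualisation,
# and a before/after distance around an (approximate) DJ transition timestamp.
import librosa
import numpy as np
import umap


def per_second_mfcc(path, n_mfcc=20):
    """Return one averaged MFCC vector per second of audio."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)        # (n_mfcc, frames)
    frames_per_sec = mfcc.shape[1] / (len(y) / sr)
    n_secs = int(len(y) / sr)
    return np.stack([
        mfcc[:, int(i * frames_per_sec):int((i + 1) * frames_per_sec)].mean(axis=1)
        for i in range(n_secs)
    ])                                                            # (seconds, n_mfcc)


def song_to_song_distance(features, transition_sec, window=30):
    """Euclidean distance between mean features before and after a transition point."""
    before = features[max(0, transition_sec - window):transition_sec].mean(axis=0)
    after = features[transition_sec:transition_sec + window].mean(axis=0)
    return float(np.linalg.norm(before - after))


features = per_second_mfcc("mix.wav")                              # hypothetical file
embedding = umap.UMAP(n_components=2).fit_transform(features)      # 2-D points to plot
print(song_to_song_distance(features, transition_sec=300))
```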
2023
- A Reinforcement Learning Approach to Powertrain Optimisation. Hocine Matallah, Asad Javied, Alexander Williams, Ashraf Fahmy Abdo, and Fawzi Belblidia. In Sustainable Design and Manufacturing, Jun 2023.
Motivated by constraints in the motor manufacturing business, a strategy is provided to reduce computation time and improve minimisation performance in the optimisation of battery electric vehicle powertrains. This paper proposes a holistic design exploration approach to investigate and identify the optimal powertrain concept for cars based on component costs and energy consumption costs. Optimal powertrain design and component sizes are determined by analysing various powertrain configuration topologies, as well as single and multi-speed gearbox combinations. The impact of powertrain combinations on vehicle attributes and total costs is investigated further. Multi-objective optimisation in this domain considers a total of 29 component parameters of differing modalities. We apply a novel reinforcement learning-based framework to the simultaneous optimisation of these 29 parameters and demonstrate the feasibility of this optimisation method for this domain. Our results show that, in comparison to single rear-motor setups, multi-motor systems offer better vehicle attributes and lower total costs. We also show that load points with front and back axle motors may be shifted to a greater efficiency zone to achieve decreased energy consumption and expenses.
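To illustrate how such a parameter-optimisation problem might be framed for reinforcement learning, the sketch below defines a single-step gymnasium environment with a 29-dimensional continuous action space; the cost model, parameter bounds, and class name are placeholders rather than the vehicle model or framework used in the paper.

```python
# Minimal sketch: one episode = one candidate powertrain design; reward = negative total cost.
import gymnasium as gym
import numpy as np
from gymnasium import spaces


class PowertrainEnv(gym.Env):
    """Hypothetical single-step environment over 29 normalised design parameters."""

    def __init__(self, n_params=29):
        super().__init__()
        self.action_space = spaces.Box(low=0.0, high=1.0, shape=(n_params,), dtype=np.float32)
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(n_params,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        return np.zeros(self.observation_space.shape, dtype=np.float32), {}

    def step(self, action):
        # Placeholder for the component-cost plus energy-consumption-cost evaluation.
        total_cost = float(np.sum((action - 0.5) ** 2))
        return action.astype(np.float32), -total_cost, True, False, {}


env = PowertrainEnv()
obs, _ = env.reset(seed=0)
action = env.action_space.sample()          # replace with a trained policy
obs, reward, terminated, truncated, info = env.step(action)
print(reward)
```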
- Sound-and-Image-Informed Music Artwork Generation Using Text-to-Image Models. Alexander J. Williams, Stefan Lattner, and Mathieu Barthet. In Music Recommender Systems Workshop at the 17th ACM Conference on Recommender Systems, Sep 2023.
Music and its accompanying artwork have a symbiotic relationship. While some artists are involved in both domains, the creation of music and artwork requires different skill sets. The development of deep generative models for music and image generation has the potential to democratise these mediums and make multi-modal creation more accessible for casual creators and other stakeholders. In this work, we propose a co-creative pipeline for the generation of images to accompany a musical piece. This pipeline utilises state-of-the-art models for music-to-text, image-to-text, and subsequently text-to-image generation to recommend, via generation, visuals for a piece of music that are informed not only by the audio of the musical piece, but also by a user-recommended corpus of artworks and prompts, giving a meaningful grounding to the generated material. We demonstrate the potential of our pipeline using a corpus of material from artists with strongly connected visual and musical identities, and make it available in the form of a Python notebook so that users can easily generate their own musical and visual compositions using their chosen corpus, available here: https://github.com/alexjameswilliams/Music-Text-To-Image-Generation
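A minimal sketch of this kind of captioning-to-generation chain is given below, assuming a Hugging Face BLIP captioning model and Stable Diffusion via diffusers; the music-to-text step is left as a placeholder and the prompt construction is an assumption, not the pipeline released in the linked repository.

```python
# Sketch: caption a reference artwork, describe the music (placeholder), then
# combine both descriptions into a text-to-image prompt.
import torch
from diffusers import StableDiffusionPipeline
from transformers import pipeline


def music_to_text(audio_path: str) -> str:
    """Placeholder for a music captioning model (hypothetical)."""
    return "dreamy ambient electronic track with shimmering pads"


def artwork_to_text(image_path: str) -> str:
    """Caption a reference artwork from the user-provided corpus."""
    captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
    return captioner(image_path)[0]["generated_text"]


def generate_artwork(audio_path: str, reference_image: str):
    prompt = f"{music_to_text(audio_path)}, in the style of {artwork_to_text(reference_image)}"
    sd = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    return sd(prompt).images[0]


# image = generate_artwork("track.wav", "album_cover.jpg")   # requires model downloads and a GPU
```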
2021
- Survey of Energy Harvesting Technologies for Wireless Sensor Networks. Alexander J. Williams, Matheus F. Torquato, Ian M. Cameron, Ashraf A. Fahmy, and Johann Sienz. IEEE Access, Sep 2021.
Energy harvesting (EH) technologies could lead to self-sustaining wireless sensor networks (WSNs), which are set to be a key technology in Industry 4.0. There are numerous methods for small-scale EH, but these methods differ greatly in their environmental applicability, energy conversion characteristics, and physical form, which makes choosing a suitable EH method for a particular WSN application challenging and highly application-dependent. Furthermore, the choice of EH technology is intrinsically linked to non-trivial decisions on energy storage technologies and combinatorial architectures for a given WSN application. In this paper, we survey the current state of EH technology for small-scale WSNs in terms of EH methods, energy storage technologies, and EH system architectures for combining methods and storage, including multi-source and multi-storage architectures, and we highlight a number of other optimisation considerations. This work is intended to provide an introduction to EH technologies in terms of their general working principle, application potential, and other implementation considerations, with the aim of accelerating the development of sustainable WSN applications in industry.
- Cascade Optimisation of Battery Electric Vehicle Powertrains. Matheus F. Torquato, Kayalvizhi Lakshmanan, Natalia Narożańska, Ryan Potter, Alexander Williams, Fawzi Belblidia, Ashraf A. Fahmy, and Johann Sienz. In Procedia Computer Science, Jan 2021.
Motivated by challenges in the motor manufacturing industry, a solution to reduce computation time and improve minimisation performance in the optimisation of battery electric vehicle powertrains is presented. We propose a cascade optimisation method that takes advantage of two different vehicle models: the proprietary YASA MATLAB® vehicle model and a Python machine learning-based vehicle model derived from the proprietary model. Gearbox type, powertrain configuration, and motor parameters are included as input variables to the objective function explored in this work, while constraints related to acceleration time and top speed must be met. The combination of these two models in a constrained-optimisation genetic algorithm both reduced the computation time required and achieved better target values for minimising vehicle total cost than either the proprietary or the machine learning model alone. The coarse-to-fine approach used in the cascade optimisation was shown to be mainly responsible for the improved optimisation result. By using the final population of the machine learning vehicle model optimisation as the initial population of the subsequent simulation-based minimisation, the initial time-consuming search to produce a population satisfying all domain constraints was practically eliminated. The obtained results showed that the cascade optimisation was able to reduce the computation time by 53% and still achieve a minimisation value 14% lower than that of the YASA Vehicle Model Optimisation.
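To illustrate the cascade (coarse-to-fine) idea of seeding an expensive optimisation with the final population of a cheap surrogate optimisation, the sketch below uses a toy genetic algorithm and placeholder objectives; it does not reproduce the proprietary YASA model, the machine learning vehicle model, or the constraints used in the paper.

```python
# Sketch: coarse stage on a cheap surrogate objective, fine stage on an
# expensive objective seeded with the coarse stage's surviving population.
import numpy as np

rng = np.random.default_rng(0)


def genetic_minimise(objective, population, generations=50, mutation=0.1):
    """Very small (mu + lambda)-style GA over real-valued design vectors."""
    for _ in range(generations):
        children = population + rng.normal(0.0, mutation, population.shape)
        combined = np.vstack([population, children])
        scores = np.apply_along_axis(objective, 1, combined)
        population = combined[np.argsort(scores)[: len(population)]]   # keep the best
    return population


def surrogate_cost(x):          # cheap approximation (placeholder)
    return float(np.sum((x - 0.4) ** 2))


def simulation_cost(x):         # expensive high-fidelity model (placeholder)
    return float(np.sum((x - 0.5) ** 2) + 0.1 * np.sum(np.sin(5 * x)))


initial = rng.uniform(0.0, 1.0, size=(40, 29))                       # 29 design parameters
coarse = genetic_minimise(surrogate_cost, initial)                    # coarse stage (fast)
fine = genetic_minimise(simulation_cost, coarse, generations=10)      # fine stage seeded by coarse
best = fine[np.argmin(np.apply_along_axis(simulation_cost, 1, fine))]
print(simulation_cost(best))
```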
- Real-Time Hybrid Visual Servoing of a Redundant Manipulator via Deep Reinforcement Learning. Alexander Williams. Swansea University, Feb 2021.
Fixtureless assembly may be necessary in some manufacturing tasks and environments due to various constraints, but it poses challenges for automation due to non-deterministic characteristics not favoured by traditional approaches to industrial automation. Visual servoing methods of robotic control could be effective for sensitive manipulation tasks where the desired end-effector pose can be ascertained via visual cues. Visual data is complex and computationally expensive to process, but deep reinforcement learning has shown promise for robotic control in vision-based manipulation tasks. However, these methods are rarely used in industry due to the resources and expertise required to develop application-specific systems and prohibitive training costs. Training reinforcement learning models in simulated environments offers a number of benefits for the development of robust robotic control algorithms by reducing training time and costs, and by providing repeatable benchmarks on which algorithms can be tested, developed, and eventually deployed in real robotic control environments. In this work, we present a new simulated reinforcement learning environment for developing accurate robotic manipulation control systems in fixtureless environments. Our environment incorporates a contemporary collaborative industrial robot, the KUKA LBR iiwa, with the goal of positioning its end effector in a generic fixtureless environment based on a visual cue. Observational inputs comprise the robotic joint positions and velocities, as well as two cameras whose positioning reflects hybrid visual servoing, with one camera attached to the robotic end-effector and another observing the workspace. We propose a state-of-the-art deep reinforcement learning approach to solving the task environment and make preliminary assessments of the efficacy of this approach to hybrid visual servoing for the defined problem environment. We also conduct a series of experiments exploring the hyperparameter space of the proposed reinforcement learning method. Although our initial results could not prove the efficacy of a deep reinforcement learning approach to solving the task environment, we remain confident that such an approach could be feasible for solving this industrial manufacturing challenge, and that our contributions in this work, in terms of the novel software, provide a good basis for the exploration of reinforcement learning approaches to hybrid visual servoing in accurate manufacturing contexts.
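For illustration, the sketch below shows how the observation and action spaces of such a hybrid visual servoing environment could be declared with gymnasium spaces for a 7-DOF arm with an eye-in-hand camera and a workspace camera; the image resolutions, bounds, and key names are assumptions, not the thesis' actual environment definition.

```python
# Sketch: joint state plus two camera views as a dictionary observation space,
# with continuous joint commands as the action space.
import numpy as np
from gymnasium import spaces

observation_space = spaces.Dict({
    "joint_positions": spaces.Box(low=-np.pi, high=np.pi, shape=(7,), dtype=np.float32),
    "joint_velocities": spaces.Box(low=-2.0, high=2.0, shape=(7,), dtype=np.float32),
    "eye_in_hand_camera": spaces.Box(low=0, high=255, shape=(84, 84, 3), dtype=np.uint8),
    "workspace_camera": spaces.Box(low=0, high=255, shape=(84, 84, 3), dtype=np.uint8),
})

# Continuous joint commands for the 7 joints (illustrative bounds).
action_space = spaces.Box(low=-1.0, high=1.0, shape=(7,), dtype=np.float32)

print(observation_space.sample()["joint_positions"].shape)
```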