AI research atlas / v2
Learn AI papers in the right order.
Start with landmark ideas, move through foundations, then branch into LLMs, GenAI, agents, systems, and safety with a reading path that keeps the field from feeling random.
Build the mental timeline before going deep.
Move from foundations to modern systems.
Learning path
Where to start, and what to read next
Orientation / 1-2 weeks
Start Here
Read the papers everyone keeps referencing so the rest of the map has anchors.
Foundations / 2-4 weeks
Classical ML
Learn the statistical and probabilistic ideas that still sit under modern models.
Foundations / 1-2 weeks
Optimization
Understand the training mechanics behind gradient-based learning.
Builder / 3-5 weeks
Deep Learning Core
Move through representation learning, CNNs, residual networks, and scaling patterns.
Builder / 3-6 weeks
Sequence Models and LLMs
Study attention, transformers, language modeling, instruction tuning, and evaluation.
Specialist / 3-6 weeks
Generative AI
Compare GANs, diffusion, autoregressive generation, and modern GenAI workflows.
Specialist / 2-4 weeks
Multimodal and Retrieval
Connect language with images, retrieval, embeddings, and real-world knowledge access.
Specialist / 3-5 weeks
RL and Agents
Learn decision making, feedback, policy learning, and agent-style systems.
Practitioner / 2-4 weeks
Systems and Scaling
Understand the infrastructure and engineering papers behind large-scale training.
Practitioner / 2-4 weeks
Safety and Interpretability
Study robustness, alignment, transparency, and how to reason about model behavior.
Learning Paradigms
Trust and Deployment
Research library
Optimization
Showing papers for this learning path. Open any paper card to read the full paper and related resources.
Adam: A Method for Stochastic Optimization
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
ImageNet classification with deep convolutional neural networks
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, respectively, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully connected layers we employed a recently developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Dropout: a simple way to prevent neural networks from overfitting
Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different “thinned ” networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance. We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the "lottery ticket hypothesis:" dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective. We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.
TransMLA: Multi-Head Latent Attention Is All You Need
In this paper, we present TransMLA, a framework that seamlessly converts any GQA-based pre-trained model into an MLA-based model. Our approach enables direct compatibility with DeepSeek's codebase, allowing these models to fully leverage DeepSeek-specific optimizations such as vLLM and SGlang. By compressing 93% of the KV cache in LLaMA-2-7B, TransMLA achieves a 10.6x inference speedup at an 8K context length while preserving meaningful output quality. Additionally, the model requires only 6 billion tokens for fine-tuning to regain performance on par with the original across multiple benchmarks. TransMLA offers a practical solution for migrating GQA-based models to the MLA structure. When combined with DeepSeek's advanced features, such as FP8 quantization and Multi-Token Prediction, even greater inference acceleration can be realized.
The ethical evaluation of large language models and its optimization
No abstract available yet.
Position: AI Safety Must Embrace an Antifragile Perspective
This position paper contends that modern AI research must adopt an antifragile perspective on safety -- one in which the system's capacity to guarantee long-term AI safety such as handling rare or out-of-distribution (OOD) events expands over time. Conventional static benchmarks and single-shot robustness tests overlook the reality that environments evolve and that models, if left unchallenged, can drift into maladaptation (e.g., reward hacking, over-optimization, or atrophy of broader capabilities). We argue that an antifragile approach -- Rather than striving to rapidly reduce current uncertainties, the emphasis is on leveraging those uncertainties to better prepare for potentially greater, more unpredictable uncertainties in the future -- is pivotal for the long-term reliability of open-ended ML systems. In this position paper, we first identify key limitations of static testing, including scenario diversity, reward hacking, and over-alignment. We then explore the potential of antifragile solutions to manage rare events. Crucially, we advocate for a fundamental recalibration of the methods used to measure, benchmark, and continually improve AI safety over the long term, complementing existing robustness approaches by providing ethical and practical guidelines towards fostering an antifragile AI safety community.
Only 5% Attention Is All You Need: Efficient Long-range Document-level Neural Machine Translation
Document-level Neural Machine Translation (DocNMT) has been proven crucial for handling discourse phenomena by introducing document-level context information. One of the most important directions is to input the whole document directly to the standard Transformer model. In this case, efficiency becomes a critical concern due to the quadratic complexity of the attention module. Existing studies either focus on the encoder part, which cannot be deployed on sequence-to-sequence generation tasks, e.g., Machine Translation (MT), or suffer from a significant performance drop. In this work, we keep the translation performance while gaining 20\% speed up by introducing extra selection layer based on lightweight attention that selects a small portion of tokens to be attended. It takes advantage of the original attention to ensure performance and dimension reduction to accelerate inference. Experimental results show that our method could achieve up to 95\% sparsity (only 5\% tokens attended) approximately, and save 93\% computation cost on the attention module compared with the original Transformer, while maintaining the performance.
Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment
Large Language Models (LLMs) are increasingly used in healthcare, yet ensuring their safety and trustworthiness remains a barrier to deployment. Conversational medical assistants must avoid unsafe compliance without over-refusing benign queries. We present an iterative post-deployment alignment framework that applies Kahneman-Tversky Optimization (KTO) and Direct Preference Optimization (DPO) to refine models against domain-specific safety signals. Using the CARES-18K benchmark for adversarial robustness, we evaluate four LLMs (Llama-3B/8B, Meditron-8B, Mistral-7B) across multiple cycles. Our results show up to 42% improvement in safety-related metrics for harmful query detection, alongside interesting trade-offs against erroneous refusals, thereby exposing architecture-dependent calibration biases. We also perform ablation studies to identify when self-evaluation is reliable and when external or finetuned judges are necessary to maximize performance gains. Our findings underscore the importance of adopting best practices that balance patient safety, user trust, and clinical utility in the design of conversational medical assistants.
MerLin: A Discovery Engine for Photonic and Hybrid Quantum Machine Learning
Identifying where quantum models may offer practical benefits in near term quantum machine learning (QML) requires moving beyond isolated algorithmic proposals toward systematic and empirical exploration across models, datasets, and hardware constraints. We introduce MerLin, an open-source framework designed as a discovery engine for photonic and hybrid quantum machine learning. MerLin integrates optimized strong simulation of linear optical circuits into standard PyTorch and scikit learn workflows, enabling end-to-end differentiable training of quantum layers. MerLin is designed around systematic benchmarking and reproducibility. As an initial contribution, we reproduce eighteen state-of-the-art photonic and hybrid QML works spanning kernel methods, reservoir computing, convolutional and recurrent architectures, generative models, and modern training paradigms. These reproductions are released as reusable, modular experiments that can be directly extended and adapted, establishing a shared experimental baseline consistent with empirical benchmarking methodologies widely adopted in modern artificial intelligence. By embedding photonic quantum models within established machine learning ecosystems, MerLin allows practitioners to leverage existing tooling for ablation studies, cross-modality comparisons, and hybrid classical-quantum workflows. The framework already implements hardware-aware features, allowing tests on available quantum hardware while enabling exploration beyond its current capabilities, positioning MerLin as a forward-looking co-design tool linking algorithms, benchmarks, and hardware.
Composable Assurance for AI Alignment: A Framework for Propagating Formal Safety Properties Through MLOps
The increasing complexity of modern AI systems exposes a significant assurance gap: safety evidence from practices like red-teaming and robustness testing remains fragmented, lacking a formal mechanism for composition and propagation throughout the development lifecycle. This prevents the construction of rigorous, dynamic safety cases essential for trustworthy AI. We introduce the Composable Assurance Framework (CAF), a novel engineering methodology that integrates safety assurance directly into MLOps workflows. At its core is the Formal Safety Assertion (FSA), a standardized, machine-readable structure that verifiably links safety properties—such as robustness scores or the absence of deceptive circuits—to specific AI artifacts. We then define a Composition Calculus, a set of formal rules governing how FSAs are propagated and aggregated as components are combined into a system. This approach transforms the development pipeline into an automated evidence-gathering engine, whose output is a dynamic Directed Acyclic Graph (DAG) of assertions that constitutes a living safety case. Through a prototype and a Retrieval-Augmented Generation (RAG) case study, we demonstrate how CAF automatically enforces a predefined safety policy, blocking non-compliant deployments.
Mathematical Framework for Constitutional AI: Formal Structures and Constraint-Based Alignment
As artificial intelligence (AI) systems grow more complex and permeate critical decision environments, ensuring their alignment with safety-oriented principles remains a pivotal research challenge. Constitutional AI (CAI) leverages human-readable rules to direct model outputs toward safer, more consistent behavior. This paper introduces a rigorous mathematical framework formalizing CAI's structure, modelling rule sets as indexed collections of predicates—termed constitutional constraints—over model output spaces, embedded within optimization and logic frameworks. Drawing on set theory and order theory, we analyze constraint interactions, delineate feasible regions in output spaces, and establish a principled link between alignment objectives and constrained minimization problems. Central contributions include proofs of theoretical guarantees, such as convergence to safe optima and robustness bounds, under mild consistency conditions on constraint sets (e.g., non-contradiction and monotonicity). These results enable quantifiable safety assurances absent in prior heuristic approaches. We further discuss practical deployment implications for safety-critical domains like autonomous systems and medical diagnostics, including scalable constraint verification and runtime enforcement mechanisms. This framework bridges formal methods with AI alignment, paving the way for verifiable constitutional safeguards.
Multiclass Graph-Based Large Margin Classifiers: Unified Approach for Support Vectors and Neural Networks
While large margin classifiers are originally an outcome of an optimization framework, support vectors (SVs) can be obtained from geometric approaches. This article presents advances in the use of Gabriel graphs (GGs) in binary and multiclass classification problems. For Chipclass, a hyperparameter-less and optimization-less GG-based binary classifier, we discuss how activation functions and support edge (SE)-centered neurons affect the classification, proposing smoother functions and structural SV (SSV)-centered neurons to achieve margins with low probabilities and smoother classification contours. We extend the neural network architecture, which can be trained with backpropagation with a softmax function and a cross-entropy loss, or by solving a system of linear equations. A new subgraph-/distance-based membership function for graph regularization is also proposed, along with a new GG recomputation algorithm that is less computationally expensive than the standard approach. Experimental results with the Friedman test show that our method was better than previous GG-based classifiers and statistically equivalent to tree-based models.
Multi-Dimensional Safety Assessments of LLM-Assisted Driving Systems
Large language models (LLMs), AI systems trained to process and generate human language, are increasingly being integrated into autonomous vehicles, leading to the emergence of LLM-assisted driving systems (LADSs). Rigorous evaluation of LADSs is essential for driving technological progress and building user trust. However, there is currently a lack of specific evaluation metrics for LADSs, and existing LLM evaluation methods are not readily applicable to LADSs. To evaluate the performance of LADSs, this study proposes four assessment indices that collectively consider driving robustness, safety, and ethical decision-making. First, cosine similarity is employed to guide the injection of disturbances, establishing a basis for quantitative input-output analysis. Second, robustness and safety indices are proposed to characterize vehicle performance, while an LLM-based evaluator is used to assess ethical behavior. To enhance alignment with human judgment, a language-numerical optimization algorithm is developed for prompt tuning. By integrating the knowledge base, Cohen's Kappa (κ) between the experienced driver and the LLM-based evaluator reaches 0.81, indicating strong agreement. Additionally, this study first identifies and analyzes a novel phenomenon termed "extreme thinking". Building on these results, a multi-dimensional safety assessment index is proposed to evaluate LADSs. The proposed indices and methods are validated using over 1000 data segments collected from both simulations and experiments.
Fourier Learning Machines: Nonharmonic Fourier-Based Neural Networks for Scientific Machine Learning
We introduce the Fourier Learning Machine (FLM), a neural network (NN) architecture designed to represent a multidimensional nonharmonic Fourier series. The FLM uses a simple feedforward structure with cosine activation functions to learn the frequencies, amplitudes, and phase shifts of the series as trainable parameters. This design allows the model to create a problem-specific spectral basis adaptable to both periodic and nonperiodic functions. Unlike previous Fourier-inspired NN models, the FLM is the first architecture able to represent a multidimensional Fourier series with a complete set of basis functions in separable form, doing so by using a standard Multilayer Perceptron-like architecture. A one-to-one correspondence between the Fourier coefficients and amplitudes and phase-shifts is demonstrated, allowing for the translation between a full, separable basis form and the cosine phase-shifted one. Additionally, we evaluate the performance of FLMs on several scientific computing problems, including benchmark Partial Differential Equations (PDEs) and a family of Optimal Control Problems (OCPs). Computational experiments show that the performance of FLMs is comparable, and often superior, to that of established architectures like SIREN and vanilla feedforward NNs.
Generalizing Machine Learning Evaluation through the Integration of Shannon Entropy and Rough Set Theory
This research paper delves into the innovative integration of Shannon entropy and rough set theory, presenting a novel approach to generalize the evaluation approach in machine learning. The conventional application of entropy, primarily focused on information uncertainty, is extended through its combination with rough set theory to offer a deeper insight into data's intrinsic structure and the interpretability of machine learning models. We introduce a comprehensive framework that synergizes the granularity of rough set theory with the uncertainty quantification of Shannon entropy, applied across a spectrum of machine learning algorithms. Our methodology is rigorously tested on various datasets, showcasing its capability to not only assess predictive performance but also to illuminate the underlying data complexity and model robustness. The results underscore the utility of this integrated approach in enhancing the evaluation landscape of machine learning, offering a multi-faceted perspective that balances accuracy with a profound understanding of data attributes and model dynamics. This paper contributes a groundbreaking perspective to machine learning evaluation, proposing a method that encapsulates a holistic view of model performance, thereby facilitating more informed decision-making in model selection and application.
Multi-fidelity Machine Learning for Uncertainty Quantification and Optimization
In system analysis and design optimization, multiple computational models are typically available to represent a given physical system. These models can be broadly classified as high-fidelity models, which provide highly accurate predictions but require significant computational resources, and low-fidelity models, which are computationally efficient but less accurate. Multi-fidelity methods integrate high- and low-fidelity models to balance computational cost and predictive accuracy. This perspective paper provides an in-depth overview of the emerging field of machine learning-based multi-fidelity methods, with a particular emphasis on uncertainty quantification and optimization. For uncertainty quantification, a particular focus is on multi-fidelity graph neural networks, compared with multi-fidelity polynomial chaos expansion. For optimization, our emphasis is on multi-fidelity Bayesian optimization, offering a unified perspective on multi-fidelity priors and proposing an application strategy when the objective function is an integral or a weighted sum. We highlight the current state of the art, identify critical gaps in the literature, and outline key research opportunities in this evolving field.
Hyperparameter Optimization in Machine Learning
Hyperparameters are configuration variables controlling the behavior of machine learning algorithms. They are ubiquitous in machine learning and artificial intelligence and the choice of their values determines the effectiveness of systems based on these technologies. Manual hyperparameter search is often time-consuming and becomes infeasible when the number of hyperparameters is large. Automating the search is an important step towards advancing, streamlining, and systematizing machine learning, freeing researchers and practitioners alike from the burden of finding a good set of hyperparameters by trial and error. In this survey, we present a unified treatment of hyperparameter optimization, providing the reader with examples, insights into the state-of-the-art, and numerous links to further reading. We cover the main families of techniques to automate hyperparameter search, often referred to as hyperparameter optimization or tuning, including random and quasi-random search, bandit-, model-, population-, and gradient-based approaches. We further discuss extensions, including online, constrained, and multi-objective formulations, touch upon connections with other fields, such as meta-learning and neural architecture search, and conclude with open questions and future research directions.
ALERT-Transformer: Bridging Asynchronous and Synchronous Machine Learning for Real-Time Event-based Spatio-Temporal Data
We seek to enable classic processing of continuous ultra-sparse spatiotemporal data generated by event-based sensors with dense machine learning models. We propose a novel hybrid pipeline composed of asynchronous sensing and synchronous processing that combines several ideas: (1) an embedding based on PointNet models -- the ALERT module -- that can continuously integrate new and dismiss old events thanks to a leakage mechanism, (2) a flexible readout of the embedded data that allows to feed any downstream model with always up-to-date features at any sampling rate, (3) exploiting the input sparsity in a patch-based approach inspired by Vision Transformer to optimize the efficiency of the method. These embeddings are then processed by a transformer model trained for object and gesture recognition. Using this approach, we achieve performances at the state-of-the-art with a lower latency than competitors. We also demonstrate that our asynchronous model can operate at any desired sampling rate.
Privacy-preserving machine learning for healthcare: open challenges and future perspectives
Machine Learning (ML) has recently shown tremendous success in modeling various healthcare prediction tasks, ranging from disease diagnosis and prognosis to patient treatment. Due to the sensitive nature of medical data, privacy must be considered along the entire ML pipeline, from model training to inference. In this paper, we conduct a review of recent literature concerning Privacy-Preserving Machine Learning (PPML) for healthcare. We primarily focus on privacy-preserving training and inference-as-a-service, and perform a comprehensive review of existing trends, identify challenges, and discuss opportunities for future research directions. The aim of this review is to guide the development of private and efficient ML models in healthcare, with the prospects of translating research efforts into real-world settings.
Accelerated Gradient Algorithms with Adaptive Subspace Search for Instance-Faster Optimization
Gradient-based minimax optimal algorithms have greatly promoted the development of continuous optimization and machine learning. One seminal work due to Yurii Nesterov [Nes83a] established $\tilde{\mathcal{O}}(\sqrt{L/μ})$ gradient complexity for minimizing an $L$-smooth $μ$-strongly convex objective. However, an ideal algorithm would adapt to the explicit complexity of a particular objective function and incur faster rates for simpler problems, triggering our reconsideration of two defeats of existing optimization modeling and analysis. (i) The worst-case optimality is neither the instance optimality nor such one in reality. (ii) Traditional $L$-smoothness condition may not be the primary abstraction/characterization for modern practical problems. In this paper, we open up a new way to design and analyze gradient-based algorithms with direct applications in machine learning, including linear regression and beyond. We introduce two factors $(α, τ_α)$ to refine the description of the degenerated condition of the optimization problems based on the observation that the singular values of Hessian often drop sharply. We design adaptive algorithms that solve simpler problems without pre-known knowledge with reduced gradient or analogous oracle accesses. The algorithms also improve the state-of-art complexities for several problems in machine learning, thereby solving the open problem of how to design faster algorithms in light of the known complexity lower bounds. Specially, with the $\mathcal{O}(1)$-nuclear norm bounded, we achieve an optimal $\tilde{\mathcal{O}}(μ^{-1/3})$ (v.s. $\tilde{\mathcal{O}}(μ^{-1/2})$) gradient complexity for linear regression. We hope this work could invoke the rethinking for understanding the difficulty of modern problems in optimization.
Evolutionary Dynamic Optimization and Machine Learning
Evolutionary Computation (EC) has emerged as a powerful field of Artificial Intelligence, inspired by nature's mechanisms of gradual development. However, EC approaches often face challenges such as stagnation, diversity loss, computational complexity, population initialization, and premature convergence. To overcome these limitations, researchers have integrated learning algorithms with evolutionary techniques. This integration harnesses the valuable data generated by EC algorithms during iterative searches, providing insights into the search space and population dynamics. Similarly, the relationship between evolutionary algorithms and Machine Learning (ML) is reciprocal, as EC methods offer exceptional opportunities for optimizing complex ML tasks characterized by noisy, inaccurate, and dynamic objective functions. These hybrid techniques, known as Evolutionary Machine Learning (EML), have been applied at various stages of the ML process. EC techniques play a vital role in tasks such as data balancing, feature selection, and model training optimization. Moreover, ML tasks often require dynamic optimization, for which Evolutionary Dynamic Optimization (EDO) is valuable. This paper presents the first comprehensive exploration of reciprocal integration between EDO and ML. The study aims to stimulate interest in the evolutionary learning community and inspire innovative contributions in this domain.
Changing Data Sources in the Age of Machine Learning for Official Statistics
Data science has become increasingly essential for the production of official statistics, as it enables the automated collection, processing, and analysis of large amounts of data. With such data science practices in place, it enables more timely, more insightful and more flexible reporting. However, the quality and integrity of data-science-driven statistics rely on the accuracy and reliability of the data sources and the machine learning techniques that support them. In particular, changes in data sources are inevitable to occur and pose significant risks that are crucial to address in the context of machine learning for official statistics. This paper gives an overview of the main risks, liabilities, and uncertainties associated with changing data sources in the context of machine learning for official statistics. We provide a checklist of the most prevalent origins and causes of changing data sources; not only on a technical level but also regarding ownership, ethics, regulation, and public perception. Next, we highlight the repercussions of changing data sources on statistical reporting. These include technical effects such as concept drift, bias, availability, validity, accuracy and completeness, but also the neutrality and potential discontinuation of the statistical offering. We offer a few important precautionary measures, such as enhancing robustness in both data sourcing and statistical techniques, and thorough monitoring. In doing so, machine learning-based official statistics can maintain integrity, reliability, consistency, and relevance in policy-making, decision-making, and public discourse.
Physics-Inspired Interpretability Of Machine Learning Models
The ability to explain decisions made by machine learning models remains one of the most significant hurdles towards widespread adoption of AI in highly sensitive areas such as medicine, cybersecurity or autonomous driving. Great interest exists in understanding which features of the input data prompt model decision making. In this contribution, we propose a novel approach to identify relevant features of the input data, inspired by methods from the energy landscapes field, developed in the physical sciences. By identifying conserved weights within groups of minima of the loss landscapes, we can identify the drivers of model decision making. Analogues to this idea exist in the molecular sciences, where coordinate invariants or order parameters are employed to identify critical features of a molecule. However, no such approach exists for machine learning loss landscapes. We will demonstrate the applicability of energy landscape methods to machine learning models and give examples, both synthetic and from the real world, for how these methods can help to make models more interpretable.
Active learning for data streams: a survey
Online active learning is a paradigm in machine learning that aims to select the most informative data points to label from a data stream. The problem of minimizing the cost associated with collecting labeled observations has gained a lot of attention in recent years, particularly in real-world applications where data is only available in an unlabeled form. Annotating each observation can be time-consuming and costly, making it difficult to obtain large amounts of labeled data. To overcome this issue, many active learning strategies have been proposed in the last decades, aiming to select the most informative observations for labeling in order to improve the performance of machine learning models. These approaches can be broadly divided into two categories: static pool-based and stream-based active learning. Pool-based active learning involves selecting a subset of observations from a closed pool of unlabeled data, and it has been the focus of many surveys and literature reviews. However, the growing availability of data streams has led to an increase in the number of approaches that focus on online active learning, which involves continuously selecting and labeling observations as they arrive in a stream. This work aims to provide an overview of the most recently proposed approaches for selecting the most informative observations from data streams in real time. We review the various techniques that have been proposed and discuss their strengths and limitations, as well as the challenges and opportunities that exist in this area of research.
Explanatory machine learning for sequential human teaching
The topic of comprehensibility of machine-learned theories has recently drawn increasing attention. Inductive Logic Programming (ILP) uses logic programming to derive logic theories from small data based on abduction and induction techniques. Learned theories are represented in the form of rules as declarative descriptions of obtained knowledge. In earlier work, the authors provided the first evidence of a measurable increase in human comprehension based on machine-learned logic rules for simple classification tasks. In a later study, it was found that the presentation of machine-learned explanations to humans can produce both beneficial and harmful effects in the context of game learning. We continue our investigation of comprehensibility by examining the effects of the ordering of concept presentations on human comprehension. In this work, we examine the explanatory effects of curriculum order and the presence of machine-learned explanations for sequential problem-solving. We show that 1) there exist tasks A and B such that learning A before B has a better human comprehension with respect to learning B before A and 2) there exist tasks A and B such that the presence of explanations when learning A contributes to improved human comprehension when subsequently learning B. We propose a framework for the effects of sequential teaching on comprehension based on an existing definition of comprehensibility and provide evidence for support from data collected in human trials. Empirical results show that sequential teaching of concepts with increasing complexity a) has a beneficial effect on human comprehension and b) leads to human re-discovery of divide-and-conquer problem-solving strategies, and c) studying machine-learned explanations allows adaptations of human problem-solving strategy with better performance.
Learning Curves for Decision Making in Supervised Machine Learning: A Survey
Learning curves are a concept from social sciences that has been adopted in the context of machine learning to assess the performance of a learning algorithm with respect to a certain resource, e.g., the number of training examples or the number of training iterations. Learning curves have important applications in several machine learning contexts, most notably in data acquisition, early stopping of model training, and model selection. For instance, learning curves can be used to model the performance of the combination of an algorithm and its hyperparameter configuration, providing insights into their potential suitability at an early stage and often expediting the algorithm selection process. Various learning curve models have been proposed to use learning curves for decision making. Some of these models answer the binary decision question of whether a given algorithm at a certain budget will outperform a certain reference performance, whereas more complex models predict the entire learning curve of an algorithm. We contribute a framework that categorises learning curve approaches using three criteria: the decision-making situation they address, the intrinsic learning curve question they answer and the type of resources they use. We survey papers from the literature and classify them into this framework.
Modern graph neural networks do worse than classical greedy algorithms in solving combinatorial optimization problems like maximum independent set
The recent work ``Combinatorial Optimization with Physics-Inspired Graph Neural Networks'' [Nat Mach Intell 4 (2022) 367] introduces a physics-inspired unsupervised Graph Neural Network (GNN) to solve combinatorial optimization problems on sparse graphs. To test the performances of these GNNs, the authors of the work show numerical results for two fundamental problems: maximum cut and maximum independent set (MIS). They conclude that "the graph neural network optimizer performs on par or outperforms existing solvers, with the ability to scale beyond the state of the art to problems with millions of variables." In this comment, we show that a simple greedy algorithm, running in almost linear time, can find solutions for the MIS problem of much better quality than the GNN. The greedy algorithm is faster by a factor of $10^4$ with respect to the GNN for problems with a million variables. We do not see any good reason for solving the MIS with these GNN, as well as for using a sledgehammer to crack nuts. In general, many claims of superiority of neural networks in solving combinatorial problems are at risk of being not solid enough, since we lack standard benchmarks based on really hard problems. We propose one of such hard benchmarks, and we hope to see future neural network optimizers tested on these problems before any claim of superiority is made.
NCVX: A General-Purpose Optimization Solver for Constrained Machine and Deep Learning
Imposing explicit constraints is relatively new but increasingly pressing in deep learning, stimulated by, e.g., trustworthy AI that performs robust optimization over complicated perturbation sets and scientific applications that need to respect physical laws and constraints. However, it can be hard to reliably solve constrained deep learning problems without optimization expertise. The existing deep learning frameworks do not admit constraints. General-purpose optimization packages can handle constraints but do not perform auto-differentiation and have trouble dealing with nonsmoothness. In this paper, we introduce a new software package called NCVX, whose initial release contains the solver PyGRANSO, a PyTorch-enabled general-purpose optimization package for constrained machine/deep learning problems, the first of its kind. NCVX inherits auto-differentiation, GPU acceleration, and tensor variables from PyTorch, and is built on freely available and widely used open-source frameworks. NCVX is available at https://ncvx.org, with detailed documentation and numerous examples from machine/deep learning and other fields.
Teaching Uncertainty Quantification in Machine Learning through Use Cases
Uncertainty in machine learning is not generally taught as general knowledge in Machine Learning course curricula. In this paper we propose a short curriculum for a course about uncertainty in machine learning, and complement the course with a selection of use cases, aimed to trigger discussion and let students play with the concepts of uncertainty in a programming setting. Our use cases cover the concept of output uncertainty, Bayesian neural networks and weight distributions, sources of uncertainty, and out of distribution detection. We expect that this curriculum and set of use cases motivates the community to adopt these important concepts into courses for safety in AI.
Public Policymaking for International Agricultural Trade using Association Rules and Ensemble Machine Learning
International economics has a long history of improving our understanding of factors causing trade, and the consequences of free flow of goods and services across countries. The recent shocks to the free trade regime, especially trade disputes among major economies, as well as black swan events, such as trade wars and pandemics, raise the need for improved predictions to inform policy decisions. AI methods are allowing economists to solve such prediction problems in new ways. In this manuscript, we present novel methods that predict and associate food and agricultural commodities traded internationally. Association Rules (AR) analysis has been deployed successfully for economic scenarios at the consumer or store level, such as for market basket analysis. In our work however, we present analysis of imports and exports associations and their effects on commodity trade flows. Moreover, Ensemble Machine Learning methods are developed to provide improved agricultural trade predictions, outlier events' implications, and quantitative pointers to policy makers.
A Machine Learning Approach for Recruitment Prediction in Clinical Trial Design
Significant advancements have been made in recent years to optimize patient recruitment for clinical trials, however, improved methods for patient recruitment prediction are needed to support trial site selection and to estimate appropriate enrollment timelines in the trial design stage. In this paper, using data from thousands of historical clinical trials, we explore machine learning methods to predict the number of patients enrolled per month at a clinical trial site over the course of a trial's enrollment duration. We show that these methods can reduce the error that is observed with current industry standards and propose opportunities for further improvement.
SiReN: Sign-Aware Recommendation Using Graph Neural Networks
In recent years, many recommender systems using network embedding (NE) such as graph neural networks (GNNs) have been extensively studied in the sense of improving recommendation accuracy. However, such attempts have focused mostly on utilizing only the information of positive user-item interactions with high ratings. Thus, there is a challenge on how to make use of low rating scores for representing users' preferences since low ratings can be still informative in designing NE-based recommender systems. In this study, we present SiReN, a new sign-aware recommender system based on GNN models. Specifically, SiReN has three key components: 1) constructing a signed bipartite graph for more precisely representing users' preferences, which is split into two edge-disjoint graphs with positive and negative edges each, 2) generating two embeddings for the partitioned graphs with positive and negative edges via a GNN model and a multi-layer perceptron (MLP), respectively, and then using an attention model to obtain the final embeddings, and 3) establishing a sign-aware Bayesian personalized ranking (BPR) loss function in the process of optimization. Through comprehensive experiments, we empirically demonstrate that SiReN consistently outperforms state-of-the-art NE-aided recommendation methods.
Learning Objective Boundaries for Constraint Optimization Problems
Constraint Optimization Problems (COP) are often considered without sufficient knowledge on the boundaries of the objective variable to optimize. When available, tight boundaries are helpful to prune the search space or estimate problem characteristics. Finding close boundaries, that correctly under- and overestimate the optimum, is almost impossible without actually solving the COP. This paper introduces Bion, a novel approach for boundary estimation by learning from previously solved instances of the COP. Based on supervised machine learning, Bion is problem-specific and solver-independent and can be applied to any COP which is repeatedly solved with different data inputs. An experimental evaluation over seven realistic COPs shows that an estimation model can be trained to prune the objective variables' domains by over 80%. By evaluating the estimated boundaries with various COP solvers, we find that Bion improves the solving process for some problems, although the effect of closer bounds is generally problem-dependent.
On Hyperparameter Optimization of Machine Learning Algorithms: Theory and Practice
Machine learning algorithms have been used widely in various applications and areas. To fit a machine learning model into different problems, its hyper-parameters must be tuned. Selecting the best hyper-parameter configuration for machine learning models has a direct impact on the model's performance. It often requires deep knowledge of machine learning algorithms and appropriate hyper-parameter optimization techniques. Although several automatic optimization techniques exist, they have different strengths and drawbacks when applied to different types of problems. In this paper, optimizing the hyper-parameters of common machine learning models is studied. We introduce several state-of-the-art optimization techniques and discuss how to apply them to machine learning algorithms. Many available libraries and frameworks developed for hyper-parameter optimization problems are provided, and some open challenges of hyper-parameter optimization research are also discussed in this paper. Moreover, experiments are conducted on benchmark datasets to compare the performance of different optimization methods and provide practical examples of hyper-parameter optimization. This survey paper will help industrial users, data analysts, and researchers to better develop machine learning models by identifying the proper hyper-parameter configurations effectively.
DOME: Recommendations for supervised machine learning validation in biology
Modern biology frequently relies on machine learning to provide predictions and improve decision processes. There have been recent calls for more scrutiny on machine learning performance and possible limitations. Here we present a set of community-wide recommendations aiming to help establish standards of supervised machine learning validation in biology. Adopting a structured methods description for machine learning based on data, optimization, model, evaluation (DOME) will aim to help both reviewers and readers to better understand and assess the performance and limitations of a method or outcome. The recommendations are formulated as questions to anyone wishing to pursue implementation of a machine learning algorithm. Answers to these questions can be easily included in the supplementary material of published papers.
mFI-PSO: A Flexible and Effective Method in Adversarial Image Generation for Deep Neural Networks
Deep neural networks (DNNs) have achieved great success in image classification, but can be very vulnerable to adversarial attacks with small perturbations to images. To improve adversarial image generation for DNNs, we develop a novel method, called mFI-PSO, which utilizes a Manifold-based First-order Influence measure for vulnerable image and pixel selection and the Particle Swarm Optimization for various objective functions. Our mFI-PSO can thus effectively design adversarial images with flexible, customized options on the number of perturbed pixels, the misclassification probability, and the targeted incorrect class. Experiments demonstrate the flexibility and effectiveness of our mFI-PSO in adversarial attacks and its appealing advantages over some popular methods.
Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar
Automatic machine learning is an important problem in the forefront of machine learning. The strongest AutoML systems are based on neural networks, evolutionary algorithms, and Bayesian optimization. Recently AlphaD3M reached state-of-the-art results with an order of magnitude speedup using reinforcement learning with self-play. In this work we extend AlphaD3M by using a pipeline grammar and a pre-trained model which generalizes from many different datasets and similar tasks. Our results demonstrate improved performance compared with our earlier work and existing methods on AutoML benchmark datasets for classification and regression tasks. In the spirit of reproducible research we make our data, models, and code publicly available.
Reproducibility in Machine Learning for Health
Machine learning algorithms designed to characterize, monitor, and intervene on human health (ML4H) are expected to perform safely and reliably when operating at scale, potentially outside strict human supervision. This requirement warrants a stricter attention to issues of reproducibility than other fields of machine learning. In this work, we conduct a systematic evaluation of over 100 recently published ML4H research papers along several dimensions related to reproducibility. We find that the field of ML4H compares poorly to more established machine learning fields, particularly concerning data and code accessibility. Finally, drawing from success in other fields of science, we propose recommendations to data providers, academic publishers, and the ML4H research community in order to promote reproducible research moving forward.