Research Paper ML Hub

AI research atlas / v2

Learn AI papers in the right order.

Start with landmark ideas, move through foundations, then branch into LLMs, GenAI, agents, systems, and safety with a reading path that keeps the field from feeling random.

10 learning tracksFull-paper readerChatGPT handoff
Recommended firstLandmark papers

Build the mental timeline before going deep.

Then specializeLLMs, GenAI, safety

Move from foundations to modern systems.

Read modePDF + resources
Path-firstNo more random paper hopping
Research-nativearXiv links, PDFs, resources
Study loopTrack reading and discuss in ChatGPT

Learning path

Where to start, and what to read next

Start with landmarks
01

Orientation / 1-2 weeks

Start Here

Read the papers everyone keeps referencing so the rest of the map has anchors.

Know the landmark namesBuild historical contextPick a direction
Open papers
02

Foundations / 2-4 weeks

Classical ML

Learn the statistical and probabilistic ideas that still sit under modern models.

Bayesian thinkingModel evaluationUncertainty
Open papers
03

Foundations / 1-2 weeks

Optimization

Understand the training mechanics behind gradient-based learning.

Gradient descentGeneralizationTraining stability
Open papers
04

Builder / 3-5 weeks

Deep Learning Core

Move through representation learning, CNNs, residual networks, and scaling patterns.

CNN intuitionRepresentation learningBenchmark culture
Open papers
05

Builder / 3-6 weeks

Sequence Models and LLMs

Study attention, transformers, language modeling, instruction tuning, and evaluation.

AttentionPretrainingInstruction following
Open papers
06

Specialist / 3-6 weeks

Generative AI

Compare GANs, diffusion, autoregressive generation, and modern GenAI workflows.

DiffusionGANsGeneration tradeoffs
Open papers
07

Specialist / 2-4 weeks

Multimodal and Retrieval

Connect language with images, retrieval, embeddings, and real-world knowledge access.

Vision-languageEmbeddingsRetrieval
Open papers
08

Specialist / 3-5 weeks

RL and Agents

Learn decision making, feedback, policy learning, and agent-style systems.

PoliciesRewardsExploration
Open papers
09

Practitioner / 2-4 weeks

Systems and Scaling

Understand the infrastructure and engineering papers behind large-scale training.

Distributed trainingServingEfficiency
Open papers
10

Practitioner / 2-4 weeks

Safety and Interpretability

Study robustness, alignment, transparency, and how to reason about model behavior.

AlignmentRobustnessInterpretability
Open papers

Research library

Foundation Models

Showing papers for this learning path. Open any paper card to read the full paper and related resources.

40 papers shown
unread2017

Attention is All you Need

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

Ashish Vaswani, Noam Shazeer, Niki Parmar 175,029
Foundation Models
unread2014

Very Deep Convolutional Networks for Large-Scale Image Recognition

In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

Karen Simonyan, Andrew Zisserman 75,503
Foundation Models
unread2023

MizAR 60 for Mizar 50

As a present to Mizar on its 50th anniversary, we develop an AI/TP system that automatically proves about 60% of the Mizar theorems in the hammer setting. We also automatically prove 75% of the Mizar theorems when the automated provers are helped by using only the premises used in the human-written Mizar proofs. We describe the methods and large-scale experiments leading to these results. This includes in particular the E and Vampire provers, their ENIGMA and Deepire learning modifications, a number of learning-based premise selection methods, and the incremental loop that interleaves growing a corpus of millions of ATP proofs with training increasingly strong AI/TP systems on them. We also present a selection of Mizar problems that were proved automatically.

Jakubův, Jan, Chvalovský, Karel, Goertzel, Zarathustra 75,444
Foundation Models
unread2018

AI-Assisted Pipeline for Dynamic Generation of Trustworthy Health Supplement Content at Scale

Although geospatial question answering systems have received increasing attention in recent years, existing prototype systems struggle to properly answer qualitative spatial questions. In this work, we propose a unique framework for answering qualitative spatial questions, which comprises three main components: a geoparser that takes the input questions and extracts place semantic information from text, a reasoning system which is embedded with a crisp reasoner, and finally, answer extraction, which refines the solution space and generates final answers. We present an experimental design to evaluate our framework for point-based cardinal direction calculus (CDC) relations by developing an automated approach for generating three types of synthetic qualitative spatial questions. The initial evaluations of generated answers in our system are promising because a high proportion of answers were labelled correct.

Holter, Ole Magnus, Ell, Basil 45,527
Foundation Models
unread1981

Random sample consensus

A new paradigm, Random Sample Consensus (RANSAC), for fitting a model to experimental data is introduced. RANSAC is capable of interpreting/smoothing data containing a significant percentage of gross errors, and is thus ideally suited for applications in automated image analysis where interpretation is based on the data provided by error-prone feature detectors. A major portion of this paper describes the application of RANSAC to the Location Determination Problem (LDP): Given an image depicting a set of landmarks with known locations, determine that point in space from which the image was obtained. In response to a RANSAC requirement, new results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form. These results provide the basis for an automatic system that can solve the LDP under difficult viewing

Martin A. Fischler, Robert C. Bolles 25,258
Foundation Models
unread2008

The qualitative content analysis process

AIM: This paper is a description of inductive and deductive content analysis. BACKGROUND: Content analysis is a method that may be used with either qualitative or quantitative data and in an inductive or deductive way. Qualitative content analysis is commonly used in nursing studies but little has been published on the analysis process and many research books generally only provide a short description of this method. DISCUSSION: When using content analysis, the aim was to build a model to describe the phenomenon in a conceptual form. Both inductive and deductive analysis processes are represented as three main phases: preparation, organizing and reporting. The preparation phase is similar in both approaches. The concepts are derived from the data in inductive content analysis. Deductive content analysis is used when the structure of analysis is operationalized on the basis of previous knowledge. CONCLUSION: Inductive content analysis is used in cases where there are no previous studies dealing with the phenomenon or when it is fragmented. A deductive approach is useful if the general aim was to test a previous theory in a different situation or to compare categories at different time periods.

Satu Elo, Helvi Kyngäs 21,946
Foundation Models
unread2020

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov 21,478
Foundation Models
unread2016

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.

Martín Abadi, Ashish Agarwal, P. Barham 11,666
Foundation Models
unread2023

Segment Anything

We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive – often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at segment-anything.com to foster research into foundation models for computer vision. We recommend reading the full paper at: arxiv.org/abs/2304.02643.

Alexander M. Kirillov, Eric Mintun, Nikhila Ravi 8,614
Foundation Models
unread2021

Review of deep learning: concepts, CNN architectures, challenges, applications, future directions

In the last few years, the deep learning (DL) computing paradigm has been deemed the Gold Standard in the machine learning (ML) community. Moreover, it has gradually become the most widely used computational approach in the field of ML, thus achieving outstanding results on several complex cognitive tasks, matching or even beating those provided by human performance. One of the benefits of DL is the ability to learn massive amounts of data. The DL field has grown fast in the last few years and it has been extensively used to successfully address a wide range of traditional applications. More importantly, DL has outperformed well-known ML techniques in many domains, e.g., cybersecurity, natural language processing, bioinformatics, robotics and control, and medical information processing, among many others. Despite it has been contributed several works reviewing the State-of-the-Art on DL, all of them only tackled one aspect of the DL, which leads to an overall lack of knowledge about it. Therefore, in this contribution, we propose using a more holistic approach in order to provide a more suitable starting point from which to develop a full understanding of DL. Specifically, this review attempts to provide a more comprehensive survey of the most important aspects of DL and including those enhancements recently added to the field. In particular, this paper outlines the importance of DL, presents the types of DL techniques and networks. It then presents convolutional neural networks (CNNs) which the most utilized DL network type and describes the development of CNNs architectures together with their main features, e.g., starting with the AlexNet network and closing with the High-Resolution network (HR.Net). Finally, we further present the challenges and suggested solutions to help researchers understand the existing research gaps. It is followed by a list of the major DL applications. Computational tools including FPGA, GPU, and CPU are summarized along with a description of their influence on DL. The paper ends with the evolution matrix, benchmark datasets, and summary and conclusion.

Laith Alzubaidi, Jinglan Zhang, Amjad J. Humaidi 7,361
Foundation Models
unread2025

Attention Is All You Need

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

Ashish Vaswani, Noam Shazeer, Niki Parmar 6,533
Foundation Models
unread2021

Learning Transferable Visual Models From Natural Language Supervision

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained model weights at https://github.com/OpenAI/CLIP.

Alec Radford, Jong Wook Kim, Chris Hallacy 5,296
Foundation Models
unread2021

Machine Learning: Algorithms, Real-World Applications and Research Directions

In the current age of the Fourth Industrial Revolution (4IR or Industry 4.0), the digital world has a wealth of data, such as Internet of Things (IoT) data, cybersecurity data, mobile data, business data, social media data, health data, etc. To intelligently analyze these data and develop the corresponding smart and automated applications, the knowledge of artificial intelligence (AI), particularly, machine learning (ML) is the key. Various types of machine learning algorithms such as supervised, unsupervised, semi-supervised, and reinforcement learning exist in the area. Besides, the deep learning, which is part of a broader family of machine learning methods, can intelligently analyze the data on a large scale. In this paper, we present a comprehensive view on these machine learning algorithms that can be applied to enhance the intelligence and the capabilities of an application. Thus, this study’s key contribution is explaining the principles of different machine learning techniques and their applicability in various real-world application domains, such as cybersecurity systems, smart cities, healthcare, e-commerce, agriculture, and many more. We also highlight the challenges and potential research directions based on our study. Overall, this paper aims to serve as a reference point for both academia and industry professionals as well as for decision-makers in various real-world situations and application areas, particularly from the technical point of view.

Iqbal H. Sarker 4,270
Foundation Models
unread2023

LLaMA: Open and Efficient Foundation Language Models

We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

Hugo Touvron, Thibaut Lavril, Gautier Izacard 3,873
Foundation Models
unread2020

Language Models are Few-Shot Learners

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

T. B. Brown, Benjamin Mann, Nick Ryder 3,029
Foundation Models
unread2021

Is Space-Time Attention All You Need for Video Understanding?

We present a convolution-free approach to video classification built exclusively on self-attention over space and time. Our method, named"TimeSformer,"adapts the standard Transformer architecture to video by enabling spatiotemporal feature learning directly from a sequence of frame-level patches. Our experimental study compares different self-attention schemes and suggests that"divided attention,"where temporal attention and spatial attention are separately applied within each block, leads to the best video classification accuracy among the design choices considered. Despite the radically new design, TimeSformer achieves state-of-the-art results on several action recognition benchmarks, including the best reported accuracy on Kinetics-400 and Kinetics-600. Finally, compared to 3D convolutional networks, our model is faster to train, it can achieve dramatically higher test efficiency (at a small drop in accuracy), and it can also be applied to much longer video clips (over one minute long). Code and models are available at: https://github.com/facebookresearch/TimeSformer.

Gedas Bertasius, Heng Wang, L. Torresani 2,920
Foundation Models
unread2023

Llama 2: Open Foundation and Fine-Tuned Chat Models

In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.

Hugo Touvron, Louis Martin, Kevin H. Stone 2,609
Foundation Models
unread2024

A Survey on Evaluation of Large Language Models

Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications. As LLMs continue to play a vital role in both research and daily use, their evaluation becomes increasingly critical, not only at the task level, but also at the society level for better understanding of their potential risks. Over the past years, significant efforts have been made to examine LLMs from various perspectives. This paper presents a comprehensive review of these evaluation methods for LLMs, focusing on three key dimensions: what to evaluate , where to evaluate , and how to evaluate . Firstly, we provide an overview from the perspective of evaluation tasks, encompassing general natural language processing tasks, reasoning, medical usage, ethics, education, natural and social sciences, agent applications, and other areas. Secondly, we answer the ‘where’ and ‘how’ questions by diving into the evaluation methods and benchmarks, which serve as crucial components in assessing the performance of LLMs. Then, we summarize the success and failure cases of LLMs in different tasks. Finally, we shed light on several future challenges that lie ahead in LLMs evaluation. Our aim is to offer invaluable insights to researchers in the realm of LLMs evaluation, thereby aiding the development of more proficient LLMs. Our key point is that evaluation should be treated as an essential discipline to better assist the development of LLMs. We consistently maintain the related open-source materials at: https://github.com/MLGroupJLU/LLM-eval-survey

Yupeng Chang, Xu Wang, Jindong Wang 2,320
Foundation Models
unread2023

Sparks of Artificial General Intelligence: Early experiments with GPT-4

Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.

Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan 1,539
Foundation Models
unread2023

Artificial Intelligence, Machine Learning and Deep Learning in Advanced Robotics, A Review

No abstract available yet.

Mohsen Soori, B. Arezoo, Roza Dastres 874
Foundation Models
unread2019

A Survey on Distributed Machine Learning

The demand for artificial intelligence has grown significantly over the past decade, and this growth has been fueled by advances in machine learning techniques and the ability to leverage hardware acceleration. However, to increase the quality of predictions and render machine learning solutions feasible for more complex applications, a substantial amount of training data is required. Although small machine learning models can be trained with modest amounts of data, the input for training larger models such as neural networks grows exponentially with the number of parameters. Since the demand for processing training data has outpaced the increase in computation power of computing machinery, there is a need for distributing the machine learning workload across multiple machines, and turning the centralized into a distributed system. These distributed systems present new challenges: first and foremost, the efficient parallelization of the training process and the creation of a coherent model. This article provides an extensive overview of the current state-of-the-art in the field by outlining the challenges and opportunities of distributed machine learning over conventional (centralized) machine learning, discussing the techniques used for distributed machine learning, and providing an overview of the systems that are available.

Joost Verbraeken, Matthijs Wolting, J. Katzy 851
Foundation Models
unread2020

Attention Is All You Need In Speech Separation

Recurrent Neural Networks (RNNs) have long been the dominant architecture in sequence-to-sequence learning. RNNs, however, are inherently sequential models that do not allow parallelization of their computations. Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism.In this paper, we propose the SepFormer, a novel RNN-free Transformer-based neural network for speech separation. The Sep-Former learns short and long-term dependencies with a multi-scale approach that employs transformers. The proposed model achieves state-of-the-art (SOTA) performance on the standard WSJ0-2/3mix datasets. It reaches an SI-SNRi of 22.3 dB on WSJ0-2mix and an SI-SNRi of 19.5 dB on WSJ0-3mix. The SepFormer inherits the parallelization advantages of Transformers and achieves a competitive performance even when downsampling the encoded representation by a factor of 8. It is thus significantly faster and it is less memory-demanding than the latest speech separation systems with comparable performance.

Cem Subakan, M. Ravanelli, Samuele Cornell 756
Foundation Models
unread2022

RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning

Abstract The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), founding member of the Worldwide Protein Data Bank (wwPDB), is the US data center for the open-access PDB archive. As wwPDB-designated Archive Keeper, RCSB PDB is also responsible for PDB data security. Annually, RCSB PDB serves >10 000 depositors of three-dimensional (3D) biostructures working on all permanently inhabited continents. RCSB PDB delivers data from its research-focused RCSB.org web portal to many millions of PDB data consumers based in virtually every United Nations-recognized country, territory, etc. This Database Issue contribution describes upgrades to the research-focused RCSB.org web portal that created a one-stop-shop for open access to ∼200 000 experimentally-determined PDB structures of biological macromolecules alongside >1 000 000 incorporated Computed Structure Models (CSMs) predicted using artificial intelligence/machine learning methods. RCSB.org is a ‘living data resource.’ Every PDB structure and CSM is integrated weekly with related functional annotations from external biodata resources, providing up-to-date information for the entire corpus of 3D biostructure data freely available from RCSB.org with no usage limitations. Within RCSB.org, PDB structures and the CSMs are clearly identified as to their provenance and reliability. Both are fully searchable, and can be analyzed and visualized using the full complement of RCSB.org web portal capabilities.

S. Burley, Charmi Bhikadiya, Chunxiao Bi 667
Foundation Models
unread2023

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

Ensuring alignment, which refers to making models behave in accordance with human intentions [1,2], has become a critical task before deploying large language models (LLMs) in real-world applications. For instance, OpenAI devoted six months to iteratively aligning GPT-4 before its release [3]. However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations. This obstacle hinders systematic iteration and deployment of LLMs. To address this issue, this paper presents a comprehensive survey of key dimensions that are crucial to consider when assessing LLM trustworthiness. The survey covers seven major categories of LLM trustworthiness: reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. Each major category is further divided into several sub-categories, resulting in a total of 29 sub-categories. Additionally, a subset of 8 sub-categories is selected for further investigation, where corresponding measurement studies are designed and conducted on several widely-used LLMs. The measurement results indicate that, in general, more aligned models tend to perform better in terms of overall trustworthiness. However, the effectiveness of alignment varies across the different trustworthiness categories considered. This highlights the importance of conducting more fine-grained analyses, testing, and making continuous improvements on LLM alignment. By shedding light on these key dimensions of LLM trustworthiness, this paper aims to provide valuable insights and guidance to practitioners in the field. Understanding and addressing these concerns will be crucial in achieving reliable and ethically sound deployment of LLMs in various applications.

Yang Liu, Yuanshun Yao, Jean-François Ton 537
Foundation Models
unread2018

Artificial Intelligence, Machine Learning, Deep Learning, and Cognitive Computing: What Do These Terms Mean and How Will They Impact Health Care?

This article was presented at the 2017 annual meeting of the American Association of Hip and Knee Surgeons to introduce the members gathered as the audience to the concepts behind artificial intelligence (AI) and the applications that AI can have in the world of health care today. We discuss the origin of AI, progress to machine learning, and then discuss how the limits of machine learning lead data scientists to develop artificial neural networks and deep learning algorithms through biomimicry. We will place all these technologies in the context of practical clinical examples and show how AI can act as a tool to support and amplify human cognitive functions for physicians delivering care to increasingly complex patients. The aim of this article is to provide the reader with a basic understanding of the fundamentals of AI. Its purpose is to demystify this technology for practicing surgeons so they can better understand how and where to apply it.

S. Bini 529
Foundation Models
unread2019

Scaling Distributed Machine Learning with In-Network Aggregation

Training complex machine learning models in parallel is an increasingly important workload. We accelerate distributed parallel training by designing a communication primitive that uses a programmable switch dataplane to execute a key step of the training process. Our approach, SwitchML, reduces the volume of exchanged data by aggregating the model updates from multiple workers in the network. We co-design the switch processing with the end-host protocols and ML frameworks to provide a robust, efficient solution that speeds up training by up to 300%, and at least by 20% for a number of real-world benchmark models.

Amedeo Sapio, Marco Canini, Chen-Yu Ho 526
Foundation Models
unread2018

Building Trust in Artificial Intelligence, Machine Learning, and Robotics

No abstract available yet.

K. Siau, Weiyu Wang 498
Foundation Models
unread2018

Artificial intelligence, machine learning and health systems

Artificial Intelligence and machine learning have the potential to be the catalyst for transformation of health systems to improve efficiency and effectiveness, create headroom for universal health coverage and improve outcomes

T. Panch, Peter Szolovits, R. Atun 445
Foundation Models
unread2020

Channel Attention Is All You Need for Video Frame Interpolation

Prevailing video frame interpolation techniques rely heavily on optical flow estimation and require additional model complexity and computational cost; it is also susceptible to error propagation in challenging scenarios with large motion and heavy occlusion. To alleviate the limitation, we propose a simple but effective deep neural network for video frame interpolation, which is end-to-end trainable and is free from a motion estimation network component. Our algorithm employs a special feature reshaping operation, referred to as PixelShuffle, with a channel attention, which replaces the optical flow computation module. The main idea behind the design is to distribute the information in a feature map into multiple channels and extract motion information by attending the channels for pixel-level frame synthesis. The model given by this principle turns out to be effective in the presence of challenging motion and occlusion. We construct a comprehensive evaluation benchmark and demonstrate that the proposed approach achieves outstanding performance compared to the existing models with a component for optical flow computation.

Myungsub Choi, Heewon Kim, Bohyung Han 361
Foundation Models
unread2024

Ethical and Bias Considerations in Artificial Intelligence (AI)/Machine Learning.

As artificial intelligence (AI) gains prominence in pathology and medicine, the ethical implications and potential biases within such integrated AI models will require careful scrutiny. Ethics and bias are important considerations in our practice settings, especially as increased number of machine learning (ML) systems are being integrated within our various medical domains. Such machine learning based systems, have demonstrated remarkable capabilities in specified tasks such as but not limited to image recognition, natural language processing, and predictive analytics. However, the potential bias that may exist within such AI-ML models can also inadvertently lead to unfair and potentially detrimental outcomes. The source of bias within such machine learning models can be due to numerous factors but can be typically put in three main buckets (data bias, development bias and interaction bias). These could be due to the training data, algorithmic bias, feature engineering and selection issues, clinical and institutional bias (i.e. practice variability), reporting bias, and temporal bias (i.e. changes in technology, clinical practice or disease patterns). Therefore despite the potential of these AI-ML applications, their deployment in our day to day practice also raises noteworthy ethical concerns. To address ethics and bias in medicine, a comprehensive evaluation process is required which will encompass all aspects such systems, from model development through clinical deployment. Addressing these biases is crucial to ensure that AI-ML systems remain fair, transparent, and beneficial to all. This review will discuss the relevant ethical and bias considerations in AI-ML specifically within the pathology and medical domain.

Matthew G. Hanna, Liron Pantanowitz, Brian R. Jackson 290
Foundation Models
unread2017

Artificial intelligence, machine learning and deep learning

No abstract available yet.

Pariwat Ongsulee 283
Foundation Models
unread2019

Artificial Intelligence, Machine Learning, Automation, Robotics, Future of Work and Future of Humanity: A Review and Research Agenda

The exponential advancement in artificial intelligence (AI), machine learning, robotics, and automation are rapidly transforming industries and societies across the world. The way we work, the way we live, and the way we interact with others are expected to be transformed at a speed and scale beyond anything we have observed in human history. This new industrial revolution is expected, on one hand, to enhance and improve our lives and societies. On the other hand, it has the potential to cause major upheavals in our way of life and our societal norms. The window of opportunity to understand the impact of these technologies and to preempt their negative effects is closing rapidly. Humanity needs to be proactive, rather than reactive, in managing this new industrial revolution. This article looks at the promises, challenges, and future research directions of these transformative technologies. Not only are the technological aspects investigated, but behavioral, societal, policy, and governance issues are reviewed as well. This research contributes to the ongoing discussions and debates about AI, automation, machine learning, and robotics. It is hoped that this article will heighten awareness of the importance of understanding these disruptive technologies as a basis for formulating policies and regulations that can maximize the benefits of these advancements for humanity and, at the same time, curtail potential dangers and negative impacts.

Weiyu Wang, K. Siau 281
Foundation Models
unread2023

Attention is all you need: utilizing attention in AI-enabled drug discovery

Abstract Recently, attention mechanism and derived models have gained significant traction in drug development due to their outstanding performance and interpretability in handling complex data structures. This review offers an in-depth exploration of the principles underlying attention-based models and their advantages in drug discovery. We further elaborate on their applications in various aspects of drug development, from molecular screening and target binding to property prediction and molecule generation. Finally, we discuss the current challenges faced in the application of attention mechanisms and Artificial Intelligence technologies, including data quality, model interpretability and computational resource constraints, along with future directions for research. Given the accelerating pace of technological advancement, we believe that attention-based models will have an increasingly prominent role in future drug discovery. We anticipate that these models will usher in revolutionary breakthroughs in the pharmaceutical domain, significantly accelerating the pace of drug development.

Yang Zhang, Caiqi Liu, Mujiexin Liu 271
Foundation Models
unread2020

A Review of Further Directions for Artificial Intelligence, Machine Learning, and Deep Learning in Smart Logistics

Industry 4.0 concepts and technologies ensure the ongoing development of micro- and macro-economic entities by focusing on the principles of interconnectivity, digitalization, and automation. In this context, artificial intelligence is seen as one of the major enablers for Smart Logistics and Smart Production initiatives. This paper systematically analyzes the scientific literature on artificial intelligence, machine learning, and deep learning in the context of Smart Logistics management in industrial enterprises. Furthermore, based on the results of the systematic literature review, the authors present a conceptual framework, which provides fruitful implications based on recent research findings and insights to be used for directing and starting future research initiatives in the field of artificial intelligence (AI), machine learning (ML), and deep learning (DL) in Smart Logistics.

M. Woschank, E. Rauch, Helmut E. Zsifkovits 254
Foundation Models
unread2022

Has the Future Started? The Current Growth of Artificial Intelligence, Machine Learning, and Deep Learning

In the modern era, many terms related to artificial intelligence, machine learning, and deep learning are widely used in domains such as business, healthcare, industries, and military. In these fields, the accurate prediction and analysis of data are crucial, regardless of how large the data are. However, using big data is confusing due to the rapid growth and massive development in public life, which requires a tremendous human effort in order to deal with such type of data and extract worthy information from it. Thus, the role of artificial intelligence begins in analyzing big data based on scientific techniques, especially in machine learning, whereby it can identify patterns of decision-making and reduce human intervention. In this regard, the significance role of artificial intelligence, machine learning and deep learning is growing rapidly. In this article, the authors decide to highlight these sciences by discussing how to develop and apply them in many decision-making domains. In addition, the influence of artificial intelligence in healthcare and the gains this science provides in the face of the COVID-19 pandemic are highlighted. This article concludes that these sciences have a significant impact, especially in healthcare, as well as the ability to grow and improve their methodology in decision-making. Additionally, artificial intelligence is a vital science, especially in the face of COVID-19.

Maad M. Mijwil 245
Foundation Models
unread2024

Deliberative Alignment: Reasoning Enables Safer Language Models

As large-scale language models increasingly impact safety-critical domains, ensuring their reliable adherence to well-defined principles remains a fundamental challenge. We introduce Deliberative Alignment, a new paradigm that directly teaches the model safety specifications and trains it to explicitly recall and accurately reason over the specifications before answering. We used this approach to align OpenAI’s o-series models [1], and achieved highly precise adherence to OpenAI’s safety policies, without requiring human-written chain-of-thoughts or answers. Deliberative Alignment pushes the Pareto frontier by simultaneously increasing robustness to jailbreaks while decreasing overrefusal rates, and also improves out-of-distribution generalization. We demonstrate that reasoning over explicitly specified policies enables more scalable, trustworthy, and interpretable alignment.

Melody Y. Guan, Manas R. Joglekar, Eric Wallace 238
Foundation Models
unread2022

Artificial Intelligence, Machine Learning, and Deep Learning in Structural Engineering: A Scientometrics Review of Trends and Best Practices

No abstract available yet.

Arash Teymori Gharah Tapeh, M. Z. Naser 228
Foundation Models
unread2020

The need for a system view to regulate artificial intelligence/machine learning-based software as medical device

Artificial intelligence (AI) and Machine learning (ML) systems in medicine are poised to significantly improve health care, for example, by offering earlier diagnoses of diseases or recommending optimally individualized treatment plans. However, the emergence of AI/ML in medicine also creates challenges, which regulators must pay attention to. Which medical AI/ML-based products should be reviewed by regulators? What evidence should be required to permit marketing for AI/ML-based software as a medical device (SaMD)? How can we ensure the safety and effectiveness of AI/ML-based SaMD that may change over time as they are applied to new data? The U.S. Food and Drug Administration (FDA), for example, has recently proposed a discussion paper to address some of these issues. But it misses an important point: we argue that regulators like the FDA need to widen their scope from evaluating medical AI/ML-based products to assessing systems. This shift in perspective—from a product view to a system view—is central to maximizing the safety and efficacy of AI/ML in health care, but it also poses significant challenges for agencies like the FDA who are used to regulating products, not systems. We offer several suggestions for regulators to make this challenging but important transition.

S. Gerke, Boris Babic, T. Evgeniou 225
Foundation Models
unread2019

Promising Artificial Intelligence‐Machine Learning‐Deep Learning Algorithms in Ophthalmology

Abstract: The lifestyle of modern society has changed significantly with the emergence of artificial intelligence (AI), machine learning (ML), and deep learning (DL) technologies in recent years. Artificial intelligence is a multidimensional technology with various components such as advanced algorithms, ML and DL. Together, AI, ML, and DL are expected to provide automated devices to ophthalmologists for early diagnosis and timely treatment of ocular disorders in the near future. In fact, AI, ML, and DL have been used in ophthalmic setting to validate the diagnosis of diseases, read images, perform corneal topographic mapping and intraocular lens calculations. Diabetic retinopathy (DR), age‐related macular degeneration (AMD), and glaucoma are the 3 most common causes of irreversible blindness on a global scale. Ophthalmic imaging provides a way to diagnose and objectively detect the progression of a number of pathologies including DR, AMD, glaucoma, and other ophthalmic disorders. There are 2 methods of imaging used as diagnostic methods in ophthalmic practice: fundus digital photography and optical coherence tomography (OCT). Of note, OCT has become the most widely used imaging modality in ophthalmology settings in the developed world. Changes in population demographics and lifestyle, extension of average lifespan, and the changing pattern of chronic diseases such as obesity, diabetes, DR, AMD, and glaucoma create a rising demand for such images. Furthermore, the limitation of availability of retina specialists and trained human graders is a major problem in many countries. Consequently, given the current population growth trends, it is inevitable that analyzing such images is time‐consuming, costly, and prone to human error. Therefore, the detection and treatment of DR, AMD, glaucoma, and other ophthalmic disorders through unmanned automated applications system in the near future will be inevitable. We provide an overview of the potential impact of the current AI, ML, and DL methods and their applications on the early detection and treatment of DR, AMD, glaucoma, and other ophthalmic diseases.

L. Balyen, T. Peto 220
Foundation Models
unread2018

The state-of-the-art on Intellectual Property Analytics (IPA): A literature review on artificial intelligence, machine learning and deep learning methods for analysing intellectual property (IP) data

Abstract Big data is increasingly available in all areas of manufacturing and operations, which presents an opportunity for better decision making and discovery of the next generation of innovative technologies. Recently, there have been substantial developments in the field of patent analytics, which describes the science of analysing large amounts of patent information to discover trends. We define Intellectual Property Analytics (IPA) as the data science of analysing large amount of IP information, to discover relationships, trends and patterns for decision making. In this paper, we contribute to the ongoing discussion on the use of intellectual property analytics methods, i.e artificial intelligence methods, machine learning and deep learning approaches, to analyse intellectual property data. This literature review follows a narrative approach with search strategy, where we present the state-of-the-art in intellectual property analytics by reviewing 57 recent articles. The bibliographic information of the articles are analysed, followed by a discussion of the articles divided in four main categories: knowledge management, technology management, economic value, and extraction and effective management of information. We hope research scholars and industrial users, may find this review helpful when searching for the latest research efforts pertaining to intellectual property analytics.

L. Aristodemou, F. Tietze 195
Foundation Models