diff --git a/LLMs-papers.md b/LLMs-papers.md
index 1beb408..2fbe058 100644
--- a/LLMs-papers.md
+++ b/LLMs-papers.md
@@ -14,34 +14,3 @@
   Corpus Models](http://arxiv.org/abs/2112.07055v2)
 9. [Unsupervised Improvement of Factual Knowledge in Language Models](http://arxiv.org/abs/2304.01597v1)
 10. [Multilingual Text Classification for Dravidian Languages](http://arxiv.org/abs/2112.01705v1)
-11. [Can Large Language Models design a Robot?](http://arxiv.org/abs/2303.15324v1)
-12. [HinFlair: pre-trained contextual string embeddings for pos tagging and
-  text classification in the Hindi language](http://arxiv.org/abs/2101.06949v1)
-13. [Jigsaw: Large Language Models meet Program Synthesis](http://arxiv.org/abs/2112.02969v1)
-14. [Larger-Scale Transformers for Multilingual Masked Language Modeling](http://arxiv.org/abs/2105.00572v1)
-15. [Exploring Cross-lingual Textual Style Transfer with Large Multilingual
-  Language Models](http://arxiv.org/abs/2206.02252v1)
-16. [Low-Resource Language Modelling of South African Languages](http://arxiv.org/abs/2104.00772v1)
-17. [In What Languages are Generative Language Models the Most Formal?
-  Analyzing Formality Distribution across Languages](http://arxiv.org/abs/2302.12299v1)
-18. [GreenPLM: Cross-lingual pre-trained language models conversion with
-  (almost) no cost](http://arxiv.org/abs/2211.06993v2)
-19. [BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting](http://arxiv.org/abs/2212.09535v1)
-20. [LEALLA: Learning Lightweight Language-agnostic Sentence Embeddings with
-  Knowledge Distillation](http://arxiv.org/abs/2302.08387v1)
-21. [Benchmarking Language Models for Code Syntax Understanding](http://arxiv.org/abs/2210.14473v1)
-22. [Can Character-based Language Models Improve Downstream Task Performance
-  in Low-Resource and Noisy Language Scenarios?](http://arxiv.org/abs/2110.13658v1)
-23. [Learning Translation Quality Evaluation on Low Resource Languages from
-  Large Language Models](http://arxiv.org/abs/2302.03491v1)
-24. [Investigating the Translation Performance of a Large Multilingual
-  Language Model: the Case of BLOOM](http://arxiv.org/abs/2303.01911v1)
-25. [Investigating Robustness of Dialog Models to Popular Figurative Language
-  Constructs](http://arxiv.org/abs/2110.00687v1)
-26. [DPRK-BERT: The Supreme Language Model](http://arxiv.org/abs/2112.00567v1)
-27. [MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural
-  Code Generation](http://arxiv.org/abs/2208.08227v4)
-28. [Subword Segmental Language Modelling for Nguni Languages](http://arxiv.org/abs/2210.06525v1)
-29. [TabLLM: Few-shot Classification of Tabular Data with Large Language
-  Models](http://arxiv.org/abs/2210.10723v2)
-30. [Language Model Supervision for Handwriting Recognition Model Adaptation](http://arxiv.org/abs/1808.01423v1)
diff --git a/data/2023-04-17.json b/data/2023-04-17.json
new file mode 100644
index 0000000..1592c3f
--- /dev/null
+++ b/data/2023-04-17.json
@@ -0,0 +1 @@
+[{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper clarifies why bias cannot be completely mitigated in MachineLearning (ML) and proposes an end-to-end methodology to translate the ethicalprinciple of justice and fairness into the practice of ML development as anongoing agreement with stakeholders. The pro-ethical iterative processpresented in the paper aims to challenge asymmetric power dynamics in thefairness decision making within ML design and support ML development teams toidentify, mitigate and monitor bias at each step of ML systems development. Theprocess also provides guidance on how to explain the always imperfecttrade-offs in terms of bias to users.", "output": "Fairness: from the ethical principle to the practice of Machine Learning development as an ongoing agreement with stakeholders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Text-to-Image (T2I) generation is enabling new applications that supportcreators, designers, and general end users of productivity software bygenerating illustrative content with high photorealism starting from a givendescriptive text as a prompt. Such models are however trained on massiveamounts of web data, which surfaces the peril of potential harmful biases thatmay leak in the generation process itself. In this paper, we take amulti-dimensional approach to studying and quantifying common social biases asreflected in the generated images, by focusing on how occupations, personalitytraits, and everyday situations are depicted across representations of(perceived) gender, age, race, and geographical location. Through an extensiveset of both automated and human evaluation experiments we present findings fortwo popular T2I models: DALLE-v2 and Stable Diffusion. 
Our results reveal thatthere exist severe occupational biases of neutral prompts majorly excludinggroups of people from results for both models. Such biases can get mitigated byincreasing the amount of specification in the prompt itself, although theprompting mitigation will not address discrepancies in image quality or otherusages of the model or its representations in other scenarios. Further, weobserve personality traits being associated with only a limited set of peopleat the intersection of race, gender, and age. Finally, an analysis ofgeographical location representations on everyday situations (e.g., park, food,weddings) shows that for most situations, images generated through defaultlocation-neutral prompts are closer and more similar to images generated forlocations of United States and Germany.", "output": "Social Biases through the Text-to-Image Generation Lens."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Next generation cars embed intelligent assessment of car driving safetythrough innovative solutions often based on usage of artificial intelligence.The safety driving monitoring can be carried out using several methodologieswidely treated in scientific literature. In this context, the author proposesan innovative approach that uses ad-hoc bio-sensing system suitable toreconstruct the physio-based attentional status of the car driver. Toreconstruct the car driver physiological status, the author proposed the use ofa bio-sensing probe consisting of a coupled LEDs at Near infrared (NiR)spectrum with a photodetector. This probe placed over the monitored subjectallows to detect a physiological signal called PhotoPlethysmoGraphy (PPG). 
ThePPG signal formation is regulated by the change in oxygenated andnon-oxygenated hemoglobin concentration in the monitored subject bloodstreamwhich will be directly connected to cardiac activity in turn regulated by theAutonomic Nervous System (ANS) that characterizes the subject's attentionlevel. This so designed car driver drowsiness monitoring will be combined withfurther driving safety assessment based on correlated intelligent drivingscenario understanding.", "output": "Deep Learning Systems for Advanced Driving Assistance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We applied physics-informed neural networks to solve the constitutiverelations for nonlinear, path-dependent material behavior. As a result, thetrained network not only satisfies all thermodynamic constraints but alsoinstantly provides information about the current material state (i.e., freeenergy, stress, and the evolution of internal variables) under any givenloading scenario without requiring initial data. One advantage of this work isthat it bypasses the repetitive Newton iterations needed to solve nonlinearequations in complex material models. Additionally, strategies are provided toreduce the required order of derivation for obtaining the tangent operator. Thetrained model can be directly used in any finite element package (or othernumerical methods) as a user-defined material model. However, challenges remainin the proper definition of collocation points and in integrating severalnon-equality constraints that become active or non-active simultaneously. Wetested this methodology on rate-independent processes such as the classical vonMises plasticity model with a nonlinear hardening law, as well as local damagemodels for interface cracking behavior with a nonlinear softening law. 
Finally,we discuss the potential and remaining challenges for future developments ofthis new approach.", "output": "Learning solution of nonlinear constitutive material models using physics-informed neural networks: COMM-PINN."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Combinatorial optimization (CO) aims to efficiently find the best solution toNP-hard problems ranging from statistical physics to social media marketing. Awide range of CO applications can benefit from local search methods becausethey allow reversible action over greedy policies. Deep Q-learning (DQN) usingmessage-passing neural networks (MPNN) has shown promise in replicating thelocal search behavior and obtaining comparable results to the local searchalgorithms. However, the over-smoothing and the information loss during theiterations of message passing limit its robustness across applications, and thelarge message vectors result in memory inefficiency. Our paper introducesRELS-DQN, a lightweight DQN framework that exhibits the local search behaviorwhile providing practical scalability. 
Using the RELS-DQN model trained on oneapplication, it can generalize to various applications by providing solutionvalues higher than or equal to both the local search algorithms and theexisting DQN models while remaining efficient in runtime and memory.", "output": "RELS-DQN: A Robust and Efficient Local Search Framework for Combinatorial Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the continuous improvement of computing power and deep learningalgorithms in recent years, the foundation model has grown in popularity.Because of its powerful capabilities and excellent performance, this technologyis being adopted and applied by an increasing number of industries. In theintelligent transportation industry, artificial intelligence faces thefollowing typical challenges: few shots, poor generalization, and a lack ofmulti-modal techniques. Foundation model technology can significantly alleviatethe aforementioned issues. To address these, we designed the 1st FoundationModel Challenge, with the goal of increasing the popularity of foundation modeltechnology in traffic scenarios and promoting the rapid development of theintelligent transportation industry. The challenge is divided into two tracks:all-in-one and cross-modal image retrieval. Furthermore, we provide a newbaseline and benchmark for the two tracks, called Open-TransMind. According toour knowledge, Open-TransMind is the first open-source transportationfoundation model with multi-task and multi-modal capabilities. Simultaneously,Open-TransMind can achieve state-of-the-art performance on detection,classification, and segmentation datasets of traffic scenarios. 
Our source codeis available at ", "output": "Open-TransMind: A New Baseline and Benchmark for 1st Foundation Model Challenge of Intelligent Transportation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reinforcement learning demonstrates significant potential in automaticallybuilding control policies in numerous domains, but shows low efficiency whenapplied to robot manipulation tasks due to the curse of dimensionality. Tofacilitate the learning of such tasks, prior knowledge or heuristics thatincorporate inherent simplification can effectively improve the learningperformance. This paper aims to define and incorporate the natural symmetrypresent in physical robotic environments. Then, sample-efficient policies aretrained by exploiting the expert demonstrations in symmetrical environmentsthrough an amalgamation of reinforcement and behavior cloning, which gives theoff-policy learning process a diverse yet compact initiation. Furthermore, itpresents a rigorous framework for a recent concept and explores its scope forrobot manipulation tasks. The proposed method is validated via twopoint-to-point reaching tasks of an industrial arm, with and without anobstacle, in a simulation experiment study. A PID controller, which tracks thelinear joint-space trajectories with hard-coded temporal logic to produceinterim midpoints, is used to generate demonstrations in the study. The resultsof the study present the effect of the number of demonstrations and quantifythe magnitude of behavior cloning to exemplify the possible improvement ofmodel-free reinforcement learning in common manipulation tasks. 
A comparisonstudy between the proposed method and a traditional off-policy reinforcementlearning algorithm indicates its advantage in learning performance andpotential value for applications.", "output": "Exploiting Symmetry and Heuristic Demonstrations in Off-policy Reinforcement Learning for Robotic Manipulation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Simulation is essential to reinforcement learning (RL) before implementationin the real world, especially for safety-critical applications like robotmanipulation. Conventionally, RL agents are sensitive to the discrepanciesbetween the simulation and the real world, known as the sim-to-real gap. Theapplication of domain randomization, a technique used to fill this gap, islimited to the imposition of heuristic-randomized models. We investigate theproperties of intrinsic stochasticity of real-time simulation (RT-IS) ofoff-the-shelf simulation software and its potential to improve the robustnessof RL methods and the performance of domain randomization. Firstly, we conductanalytical studies to measure the correlation of RT-IS with the occupation ofthe computer hardware and validate its comparability with the naturalstochasticity of a physical robot. Then, we apply the RT-IS feature in thetraining of an RL agent. The simulation and physical experiment results verifythe feasibility and applicability of RT-IS to robust RL agent design for robotmanipulation tasks. The RT-IS-powered robust RL agent outperforms conventionalRL agents on robots with modeling uncertainties. It requires fewer heuristicrandomization and achieves better generalizability than the conventionaldomain-randomization-powered agents. 
Our findings provide a new perspective onthe sim-to-real problem in practical applications like robot manipulationtasks.", "output": "Exploiting Intrinsic Stochasticity of Real-Time Simulation to Facilitate Robust Reinforcement Learning for Robot Manipulation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In biomedical research and artificial intelligence, access to large,well-balanced, and representative datasets is crucial for developingtrustworthy applications that can be used in real-world scenarios. However,obtaining such datasets can be challenging, as they are often restricted tohospitals and specialized facilities. To address this issue, the study proposesto generate highly realistic synthetic faces exhibiting drug abuse traitsthrough augmentation. The proposed method, called \"3DG-GA\", Deep De-identifiedanonymous Dataset Generation, uses Genetics Algorithm as a strategy forsynthetic faces generation. The algorithm includes GAN artificial facegeneration, forgery detection, and face recognition. Initially, a dataset of120 images of actual facial drug abuse is used. By preserving, the drug traits,the 3DG-GA provides a dataset containing 3000 synthetic facial drug abuseimages. The dataset will be open to the scientific community, which canreproduce our results and benefit from the generated datasets while avoidinglegal or ethical restrictions.", "output": "Generation of artificial facial drug abuse images using Deep De-identified anonymous Dataset augmentation through Genetics Algorithm (3DG-GA)."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The short-form videos have explosive popularity and have dominated the newsocial media trends. 
Prevailing short-video platforms,~textit{e.g.}, Kuaishou(Kwai), TikTok, Instagram Reels, and YouTube Shorts, have changed the way weconsume and create content. For video content creation and understanding, theshot boundary detection (SBD) is one of the most essential components invarious scenarios. In this work, we release a new public Short video sHotbOundary deTection dataset, named SHOT, consisting of 853 complete short videosand 11,606 shot annotations, with 2,716 high quality shot boundary annotationsin 200 test videos. Leveraging this new data wealth, we propose to optimize themodel design for video SBD, by conducting neural architecture search in asearch space encapsulating various advanced 3D ConvNets and Transformers. Ourproposed approach, named AutoShot, achieves higher F1 scores than previousstate-of-the-art approaches, e.g., outperforming TransNetV2 by 4.2%, when beingderived and evaluated on our newly constructed SHOT dataset. Moreover, tovalidate the generalizability of the AutoShot architecture, we directlyevaluate it on another three public datasets: ClipShots, BBC and RAI, and theF1 scores of AutoShot outperform previous state-of-the-art approaches by 1.1%,0.9% and 1.2%, respectively. The SHOT dataset and code can be found in .", "output": "AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "IoT is one of the fastest-growing technologies and it is estimated that morethan a billion devices would be utilized across the globe by the end of 2030.To maximize the capability of these connected entities, trust and reputationamong IoT entities is essential. 
Several trust management models have beenproposed in the IoT environment; however, these schemes have not fullyaddressed the IoT devices features, such as devices role, device type and itsdynamic behavior in a smart environment. As a result, traditional trust andreputation models are insufficient to tackle these characteristics anduncertainty risks while connecting nodes to the network. Whilst continuousstudy has been carried out and various articles suggest promising solutions inconstrained environments, research on trust and reputation is still at itsinfancy. In this paper, we carry out a comprehensive literature review onstate-of-the-art research on the trust and reputation of IoT devices andsystems. Specifically, we first propose a new structure, namely a new taxonomy,to organize the trust and reputation models based on the ways trust is managed.The proposed taxonomy comprises of traditional trust management-based systemsand artificial intelligence-based systems, and combine both the classes whichencourage the existing schemes to adapt these emerging concepts. Thiscollaboration between the conventional mathematical and the advanced ML modelsresult in design schemes that are more robust and efficient. Then we drill downto compare and analyse the methods and applications of these systems based oncommunity-accepted performance metrics, e.g. scalability, delay,cooperativeness and efficiency. 
Finally, built upon the findings of theanalysis, we identify and discuss open research issues and challenges, andfurther speculate and point out future research directions.", "output": "IoT trust and reputation: a survey and taxonomy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "ChatGPT has recently gathered attention from the general public and academiaas a tool that is able to generate plausible and human-sounding text answers tovarious questions. One potential use, or abuse, of ChatGPT is in answeringvarious questions or even generating whole essays and research papers in anacademic or classroom setting. While recent works have explored the use ofChatGPT in the context of humanities, business school, or medical school, thiswork explores how ChatGPT performs in the context of an introductory computerengineering course. This work assesses ChatGPT's aptitude in answering quizzes,homework, exam, and laboratory questions in an introductory-level computerengineering course. This work finds that ChatGPT can do well on questionsasking about generic concepts. However, predictably, as a text-only tool, itcannot handle questions with diagrams or figures, nor can it generate diagramsand figures. Further, also clearly, the tool cannot do hands-on labexperiments, breadboard assembly, etc., but can generate plausible answers tosome laboratory manual questions. One of the key observations presented in thiswork is that the ChatGPT tool could not be used to pass all components of thecourse. Nevertheless, it does well on quizzes and short-answer questions. 
Onthe other hand, plausible, human-sounding answers could confuse students whengenerating incorrect but still plausible answers.", "output": "Analyzing ChatGPT's Aptitude in an Introductory Computer Engineering Course."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Artificial General Intelligence (AGI) is poised to revolutionize a variety ofsectors, including healthcare, finance, transportation, and education. Withinhealthcare, AGI is being utilized to analyze clinical medical notes, recognizepatterns in patient data, and aid in patient management. Agriculture is anothercritical sector that impacts the lives of individuals worldwide. It serves as afoundation for providing food, fiber, and fuel, yet faces several challenges,such as climate change, soil degradation, water scarcity, and food security.AGI has the potential to tackle these issues by enhancing crop yields, reducingwaste, and promoting sustainable farming practices. It can also help farmersmake informed decisions by leveraging real-time data, leading to more efficientand effective farm management. This paper delves into the potential futureapplications of AGI in agriculture, such as agriculture image processing,natural language processing (NLP), robotics, knowledge graphs, andinfrastructure, and their impact on precision livestock and precision crops. Byleveraging the power of AGI, these emerging technologies can provide farmerswith actionable insights, allowing for optimized decision-making and increasedproductivity. 
The transformative potential of AGI in agriculture is vast, andthis paper aims to highlight its potential to revolutionize the industry.", "output": "AGI for Agriculture."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The task of intercepting a target moving along a rectilinear or circulartrajectory by a Dubins' car is formulated as a time-optimal control problemwith an arbitrary direction of the car's velocity at the interception moment.To solve this problem and to synthesize interception trajectories, neuralnetwork methods of unsupervised learning based on the Deep Deterministic PolicyGradient algorithm are used. The analysis of the obtained control laws andinterception trajectories in comparison with the analytical solutions of theinterception problem is performed. The mathematical modeling for the parametersof the target movement that the neural network had not seen before duringtraining is carried out. Model experiments are conducted to test the stabilityof the neural solution. The effectiveness of using neural network methods forthe synthesis of interception trajectories for given classes of targetmovements is shown.", "output": "Neural Network Algorithm for Intercepting Targets Moving Along Known Trajectories by a Dubins' Car."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As technology progresses, smart automated systems will serve an increasinglyimportant role in the agricultural industry. Current existing vision systemsfor yield estimation face difficulties in occlusion and scalability as theyutilize a camera system that is large and expensive, which are unsuitable fororchard environments. 
To overcome these problems, this paper presents a sizemeasurement method combining a machine learning model and depth images capturedfrom three low cost RGBD cameras to detect and measure the height and width oftomatoes. The performance of the presented system is evaluated on a labenvironment with real tomato fruits and fake leaves to simulate occlusion inthe real farm environment. To improve accuracy by addressing fruit occlusion,our three-camera system was able to achieve a height measurement accuracy of0.9114 and a width accuracy of 0.9443.", "output": "Visual based Tomato Size Measurement System for an Indoor Farming Environment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Improving performance in multiple domains is a challenging task, and oftenrequires significant amounts of data to train and test models. Active learningtechniques provide a promising solution by enabling models to select the mostinformative samples for labeling, thus reducing the amount of labeled datarequired to achieve high performance. In this paper, we present an activelearning-based framework for improving performance across multiple domains. Ourapproach consists of two stages: first, we use an initial set of labeled datato train a base model, and then we iteratively select the most informativesamples for labeling to refine the model. We evaluate our approach on severalmulti-domain datasets, including image classification, sentiment analysis, andobject recognition. Our experiments demonstrate that our approach consistentlyoutperforms baseline methods and achieves state-of-the-art performance onseveral datasets. 
We also show that our method is highly efficient, requiringsignificantly fewer labeled samples than other active learning-based methods.Overall, our approach provides a practical and effective solution for improvingperformance across multiple domains using active learning techniques.", "output": "Optimizing Multi-Domain Performance with Active Learning-based Improvement Strategies."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-Agent Reinforcement Learning (MARL) discovers policies that maximizereward but do not have safety guarantees during the learning and deploymentphases. Although shielding with Linear Temporal Logic (LTL) is a promisingformal method to ensure safety in single-agent Reinforcement Learning (RL), itresults in conservative behaviors when scaling to multi-agent scenarios.Additionally, it poses computational challenges for synthesizing shields incomplex multi-agent environments. This work introduces Model-based DynamicShielding (MBDS) to support MARL algorithm design. Our algorithm synthesizesdistributive shields, which are reactive systems running in parallel with eachMARL agent, to monitor and rectify unsafe behaviors. The shields candynamically split, merge, and recompute based on agents' states. This designenables efficient synthesis of shields to monitor agents in complexenvironments without coordination overheads. We also propose an algorithm tosynthesize shields without prior knowledge of the dynamics model. The proposedalgorithm obtains an approximate world model by interacting with theenvironment during the early stage of exploration, making our MBDS enjoy formalsafety guarantees with high probability. 
We demonstrate in simulations that ourframework can surpass existing baselines in terms of safety guarantees andlearning performance.", "output": "Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present NeRFVS, a novel neural radiance fields (NeRF) based method toenable free navigation in a room. NeRF achieves impressive performance inrendering images for novel views similar to the input views while suffering fornovel views that are significantly different from the training views. Toaddress this issue, we utilize the holistic priors, including pseudo depth mapsand view coverage information, from neural reconstruction to guide the learningof implicit neural representations of 3D indoor scenes. Concretely, anoff-the-shelf neural reconstruction method is leveraged to generate a geometryscaffold. Then, two loss functions based on the holistic priors are proposed toimprove the learning of NeRF: 1) A robust depth loss that can tolerate theerror of the pseudo depth map to guide the geometry learning of NeRF; 2) Avariance loss to regularize the variance of implicit neural representations toreduce the geometry and color ambiguity in the learning procedure. These twoloss functions are modulated during NeRF optimization according to the viewcoverage information to reduce the negative influence brought by the viewcoverage imbalance. 
Extensive results demonstrate that our NeRFVS outperformsstate-of-the-art view synthesis methods quantitatively and qualitatively onindoor scenes, achieving high-fidelity free navigation results.", "output": "NeRFVS: Neural Radiance Fields for Free View Synthesis via Geometry Scaffolds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Heterogeneous graph neural networks aim to discover discriminative nodeembeddings and relations from multi-relational networks.One challenge ofheterogeneous graph learning is the design of learnable meta-paths, whichsignificantly influences the quality of learned embeddings.Thus, in this paper,we propose an Attributed Multi-Order Graph Convolutional Network (AMOGCN),which automatically studies meta-paths containing multi-hop neighbors from anadaptive aggregation of multi-order adjacency matrices. The proposed modelfirst builds different orders of adjacency matrices from manually designed nodeconnections. After that, an intact multi-order adjacency matrix is attachedfrom the automatic fusion of various orders of adjacency matrices. This processis supervised by the node semantic information, which is extracted from thenode homophily evaluated by attributes. Eventually, we utilize a one-layersimplifying graph convolutional network with the learned multi-order adjacencymatrix, which is equivalent to the cross-hop node information propagation withmulti-layer graph neural networks. 
Substantial experiments reveal that AMOGCN gains superior semi-supervised classification performance compared with state-of-the-art competitors.", "output": "Attributed Multi-order Graph Convolutional Network for Heterogeneous Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper proposes a framework for developing forecasting models by streamlining the connections between core components of the developmental process. The proposed framework enables swift and robust integration of new datasets, experimentation on different algorithms, and selection of the best models. We start with the datasets of different issues and apply pre-processing steps to clean and engineer meaningful representations of time-series data. To identify robust training configurations, we introduce a novel mechanism of multiple cross-validation strategies. We apply different evaluation metrics to find the best-suited models for varying applications. One of the referent applications is our participation in the intelligent forecasting competition held by the United States Agency of International Development (USAID). Finally, we leverage the flexibility of the framework by applying different evaluation metrics to assess the performance of the models in inventory management settings.", "output": "Streamlined Framework for Agile Forecasting Model Development towards Efficient Inventory Management."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The structural re-parameterization (SRP) technique is a novel deep learning technique that achieves interconversion between different network architectures through equivalent parameter transformations. 
This technique enables the mitigation of the extra costs for performance improvement during training, such as parameter size and inference time, through these transformations during inference, and therefore SRP has great potential for industrial and practical applications. The existing SRP methods have successfully considered many commonly used architectures, such as normalizations, pooling methods, and multi-branch convolution. However, the widely used self-attention modules cannot be directly implemented by SRP because these modules usually act on the backbone network in a multiplicative manner and the modules' output is input-dependent during inference, which limits the application scenarios of SRP. In this paper, we conduct extensive experiments from a statistical perspective and discover an interesting phenomenon, the Stripe Observation, which reveals that channel attention values quickly approach some constant vectors during training. This observation inspires us to propose a simple-yet-effective attention-alike structural re-parameterization (ASR) that allows us to achieve SRP for a given network while enjoying the effectiveness of the self-attention mechanism. Extensive experiments conducted on several standard benchmarks demonstrate the effectiveness of ASR in generally improving the performance of existing backbone networks, self-attention modules, and SRP methods without any elaborated model crafting. We also analyze the limitations and provide experimental or theoretical evidence for the strong robustness of the proposed ASR.", "output": "ASR: Attention-alike Structural Re-parameterization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transformer is beneficial for image denoising tasks since it can model long-range dependencies to overcome the limitations presented by inductive convolutional biases. 
However, directly applying the transformer structure to remove noise is challenging because its complexity grows quadratically with the spatial resolution. In this paper, we propose an efficient Dual-branch Deformable Transformer (DDT) denoising network which captures both local and global interactions in parallel. We divide features with a fixed patch size and a fixed number of patches in local and global branches, respectively. In addition, we apply a deformable attention operation in both branches, which helps the network focus on more important regions and further reduces computational complexity. We conduct extensive experiments on real-world and synthetic denoising tasks, and the proposed DDT achieves state-of-the-art performance with significantly lower computational cost.", "output": "DDT: Dual-branch Deformable Transformer for Image Denoising."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a generic framework for establishing the decidability of a wide range of logical entailment problems (briefly called querying), based on the existence of countermodels that are structurally simple, gauged by certain types of width measures (with treewidth and cliquewidth as popular examples). As an important special case of our framework, we identify logics exhibiting width-finite finitely universal model sets, warranting decidable entailment for a wide range of homomorphism-closed queries, subsuming a diverse set of practically relevant query languages. As a particularly powerful width measure, we propose Blumensath's partitionwidth, which subsumes various other commonly considered width measures and exhibits highly favorable computational and structural properties. 
Focusing on the formalism of existential rules as a popular showcase, we explain how finite partitionwidth sets of rules subsume other known abstract decidable classes but -- leveraging existing notions of stratification -- also cover a wide range of new rulesets. We expose natural limitations for fitting the class of finite unification sets into our picture and provide several options for remedy.", "output": "Decidability of Querying First-Order Theories via Countermodels of Finite Width."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Evaluating the general abilities of foundation models to tackle human-level tasks is a vital aspect of their development and application in the pursuit of Artificial General Intelligence (AGI). Traditional benchmarks, which rely on artificial datasets, may not accurately represent human-level capabilities. In this paper, we introduce AGIEval, a novel benchmark specifically designed to assess foundation models in the context of human-centric standardized exams, such as college entrance exams, law school admission tests, math competitions, and lawyer qualification tests. We evaluate several state-of-the-art foundation models, including GPT-4, ChatGPT, and Text-Davinci-003, using this benchmark. Impressively, GPT-4 surpasses average human performance on SAT, LSAT, and math competitions, attaining a 95% accuracy rate on the SAT Math test and a 92.5% accuracy on the English test of the Chinese national college entrance exam. This demonstrates the extraordinary performance of contemporary foundation models. In contrast, we also find that GPT-4 is less proficient in tasks that require complex reasoning or specific domain knowledge. 
Our comprehensive analyses of model capabilities (understanding, knowledge, reasoning, and calculation) reveal these models' strengths and limitations, providing valuable insights into future directions for enhancing their general capabilities. By concentrating on tasks pertinent to human cognition and decision-making, our benchmark delivers a more meaningful and robust evaluation of foundation models' performance in real-world scenarios. The data, code, and all model outputs are released in ", "output": "AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Exact computation of the partition function is known to be intractable, necessitating approximate inference techniques. Existing methods for approximate inference are slow to converge for many benchmarks. The control of accuracy-complexity trade-off is also non-trivial in many of these methods. We propose a novel incremental build-infer-approximate (IBIA) framework for approximate inference that addresses these issues. In this framework, the probabilistic graphical model is converted into a sequence of clique tree forests (SCTF) with bounded clique sizes. We show that the SCTF can be used to efficiently compute the partition function. We propose two new algorithms which are used to construct the SCTF and prove the correctness of both. The first is an algorithm for incremental construction of CTFs that is guaranteed to give a valid CTF with bounded clique sizes and the second is an approximation algorithm that takes a calibrated CTF as input and yields a valid and calibrated CTF with reduced clique sizes as the output. 
We have evaluated our method using several benchmark sets from recent UAI competitions and our results show good accuracies with competitive runtimes.", "output": "IBIA: An Incremental Build-Infer-Approximate Framework for Approximate Inference of Partition Function."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Semantic networks provide a useful tool to understand how related concepts are retrieved from memory. However, most current network approaches use pairwise links to represent memory recall patterns. Pairwise connections neglect higher-order associations, i.e. relationships between more than two concepts at a time. These higher-order interactions might covary with (and thus contain information about) how similar concepts are along psycholinguistic dimensions like arousal, valence, familiarity, gender and others. We overcome these limits by introducing feature-rich cognitive hypergraphs as quantitative models of human memory where: (i) concepts recalled together can all engage in hyperlinks involving also more than two concepts at once (cognitive hypergraph aspect), and (ii) each concept is endowed with a vector of psycholinguistic features (feature-rich aspect). We build hypergraphs from word association data and use evaluation methods from machine learning features to predict concept concreteness. Since concepts with similar concreteness tend to cluster together in human memory, we expect to be able to leverage this structure. Using word association data from the Small World of Words dataset, we compared a pairwise network and a hypergraph with N=3586 concepts/nodes. 
Interpretable artificial intelligence models trained on (1) psycholinguistic features only, (2) pairwise-based feature aggregations, and on (3) hypergraph-based aggregations show significant differences between pairwise and hypergraph links. Specifically, our results show that higher-order and feature-rich hypergraph models contain richer information than pairwise networks leading to improved prediction of word concreteness. The relation with previous studies about conceptual clustering and compartmentalisation in associative knowledge and human memory is discussed.", "output": "Towards hypergraph cognitive networks as feature-rich models of knowledge."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Being able to create meaningful symbols and proficiently use them for higher cognitive functions such as communication, reasoning, planning, etc., is essential and unique for human intelligence. Current deep neural networks are still far behind human's ability to create symbols for such higher cognitive functions. Here we propose a solution, named SEA-net, to endow neural networks with the ability of symbol creation, semantic understanding and communication. SEA-net generates symbols that dynamically configure the network to perform specific tasks. These symbols capture compositional semantic information that enables the system to acquire new functions purely by symbolic manipulation or communication. In addition, we found that these self-generated symbols exhibit an intrinsic structure resembling that of natural language, suggesting a common framework underlying the generation and understanding of symbols in both human brains and artificial neural networks. 
We hope that it will be instrumental in producing more capable systems in the future that can synergize the strengths of connectionist and symbolic approaches for AI.", "output": "Emergence of Symbols in Neural Networks for Semantic Understanding and Communication."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The lack of interpretability of the Vision Transformer may hinder its use in critical real-world applications despite its effectiveness. To overcome this issue, we propose a post-hoc interpretability method called VISION DIFFMASK, which uses the activations of the model's hidden layers to predict the relevant parts of the input that contribute to its final predictions. Our approach uses a gating mechanism to identify the minimal subset of the original input that preserves the predicted distribution over classes. We demonstrate the faithfulness of our method, by introducing a faithfulness task, and comparing it to other state-of-the-art attribution methods on CIFAR-10 and ImageNet-1K, achieving compelling results. To aid reproducibility and further extension of our work, we open source our implementation:", "output": "VISION DIFFMASK: Faithful Interpretation of Vision Transformers with Differentiable Patch Masking."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose a novel fully unsupervised framework that learns action representations suitable for the action segmentation task from the single input video itself, without requiring any training data. 
Our method is a deep metric learning approach rooted in a shallow network with a triplet loss operating on similarity distributions and a novel triplet selection strategy that effectively models temporal and semantic priors to discover actions in the new representational space. Under these circumstances, we successfully recover temporal boundaries in the learned action representations with higher quality compared with existing unsupervised approaches. The proposed method is evaluated on two widely used benchmark datasets for the action segmentation task and it achieves competitive performance by applying a generic clustering algorithm on the learned representations.", "output": "Leveraging triplet loss for unsupervised action segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Predicting high-fidelity future human poses, from a historically observed sequence, is decisive for intelligent robots to interact with humans. Deep end-to-end learning approaches, which typically train a generic pre-trained model on external datasets and then directly apply it to all test samples, emerge as the dominant solution to solve this issue. Despite encouraging progress, they remain non-optimal, as the unique properties (e.g., motion style, rhythm) of a specific sequence cannot be adapted. More generally, at test-time, once encountering unseen motion categories (out-of-distribution), the predicted poses tend to be unreliable. Motivated by this observation, we propose a novel test-time adaptation framework that leverages two self-supervised auxiliary tasks to help the primary forecasting network adapt to the test sequence. 
In the testing phase, our model can adjust the model parameters by several gradient updates to improve the generation quality. However, due to catastrophic forgetting, both auxiliary tasks typically tend to the low ability to automatically present the desired positive incentives for the final prediction performance. For this reason, we also propose a meta-auxiliary learning scheme for better adaptation. In terms of general setup, our approach obtains higher accuracy, and under two new experimental designs for out-of-distribution data (unseen subjects and categories), achieves significant improvements.", "output": "Meta-Auxiliary Learning for Adaptive Human Pose Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces a comprehensive, multi-stage machine learning methodology that effectively integrates information systems and artificial intelligence to enhance decision-making processes within the domain of operations research. The proposed framework adeptly addresses common limitations of existing solutions, such as the neglect of data-driven estimation for vital production parameters, exclusive generation of point forecasts without considering model uncertainty, and lacking explanations regarding the sources of such uncertainty. Our approach employs Quantile Regression Forests for generating interval predictions, alongside both local and global variants of SHapley Additive Explanations for the examined predictive process monitoring problem. The practical applicability of the proposed methodology is substantiated through a real-world production planning case study, emphasizing the potential of prescriptive analytics in refining decision-making procedures. 
This paper accentuates the imperative of addressing these challenges to fully harness the extensive and rich data resources accessible for well-informed decision-making.", "output": "Quantifying and Explaining Machine Learning Uncertainty in Predictive Process Monitoring: An Operations Research Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a systematic investigation into the effectiveness of Self-Supervised Learning (SSL) methods for Electrocardiogram (ECG) arrhythmia detection. We begin by conducting a novel distribution analysis on three popular ECG-based arrhythmia datasets: PTB-XL, Chapman, and Ribeiro. To the best of our knowledge, our study is the first to quantify these distributions in this area. We then perform a comprehensive set of experiments using different augmentations and parameters to evaluate the effectiveness of various SSL methods, namely SimCLR, BYOL, and SwAV, for ECG representation learning, where we observe the best performance achieved by SwAV. Furthermore, our analysis shows that SSL methods achieve highly competitive results to those achieved by supervised state-of-the-art methods. To further assess the performance of these methods on both In-Distribution (ID) and Out-of-Distribution (OOD) ECG data, we conduct cross-dataset training and testing experiments. Our comprehensive experiments show almost identical results when comparing ID and OOD schemes, indicating that SSL techniques can learn highly effective representations that generalize well across different OOD datasets. This finding can have major implications for ECG-based arrhythmia detection. 
Lastly, to further analyze our results, we perform detailed per-disease studies on the performance of the SSL methods on the three datasets.", "output": "In-Distribution and Out-of-Distribution Self-supervised ECG Representation Learning for Arrhythmia Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Certified defense methods against adversarial perturbations have been recently investigated in the black-box setting with a zeroth-order (ZO) perspective. However, these methods suffer from high model variance with low performance on high-dimensional datasets due to the ineffective design of the denoiser and are limited in their utilization of ZO techniques. To this end, we propose a certified ZO preprocessing technique for removing adversarial perturbations from the attacked image in the black-box setting using only model queries. We propose a robust UNet denoiser (RDUNet) that ensures the robustness of black-box models trained on high-dimensional datasets. We propose a novel black-box denoised smoothing (DS) defense mechanism, ZO-RUDS, by prepending our RDUNet to the black-box model, ensuring black-box defense. We further propose ZO-AE-RUDS in which RDUNet followed by autoencoder (AE) is prepended to the black-box model. We perform extensive experiments on four classification datasets, CIFAR-10, CIFAR-10, Tiny Imagenet, STL-10, and the MNIST dataset for image reconstruction tasks. 
Our proposed defense methods ZO-RUDS and ZO-AE-RUDS beat SOTA by a large margin of 35% and 9% for low-dimensional (CIFAR-10) datasets and by a margin of 20.61% and 23.51% for high-dimensional (STL-10) datasets, respectively.", "output": "Certified Zeroth-order Black-Box Defense with Robust UNet Denoiser."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vision transformers have been applied successfully for image recognition tasks. These have been based either on multi-headed self-attention (ViT, DeiT), similar to the original work in textual models, or more recently on spectral layers (FNet, GFNet, AFNO). We hypothesize that both spectral and multi-headed attention play a major role. We investigate this hypothesis through this work and observe that indeed combining spectral and multi-headed attention layers provides a better transformer architecture. We thus propose the novel SpectFormer architecture for transformers that combines spectral and multi-headed attention layers. We believe that the resulting representation allows the transformer to capture the feature representation appropriately and it yields improved performance over other transformer representations. For instance, it improves the top-1 accuracy by 2% on ImageNet compared to both GFNet-H and LiT. SpectFormer-S reaches 84.25% top-1 accuracy on ImageNet-1K (state of the art for small version). Further, SpectFormer-L achieves 85.7% that is the state of the art for the comparable base version of the transformers. We further ensure that we obtain reasonable results in other scenarios such as transfer learning on standard datasets such as CIFAR-10, CIFAR-100, Oxford-IIIT-flower, and Stanford Car datasets. 
We then investigate its use in downstream tasks such as object detection and instance segmentation on the MS-COCO dataset and observe that SpectFormer shows consistent performance that is comparable to the best backbones and can be further optimized and improved. Hence, we believe that combined spectral and attention layers are what are needed for vision transformers.", "output": "SpectFormer: Frequency and Attention is what you need in a Vision Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "AfriSenti-SemEval Shared Task 12 of SemEval-2023. The task aims to perform monolingual sentiment classification (sub-task A) for 12 African languages, multilingual sentiment classification (sub-task B), and zero-shot sentiment classification (task C). For sub-task A, we conducted experiments using classical machine learning classifiers, Afro-centric language models, and language-specific models. For task B, we fine-tuned multilingual pre-trained language models that support many of the languages in the task. For task C, we used a parameter-efficient Adapter approach that leverages monolingual texts in the target language for effective zero-shot transfer. Our findings suggest that using pre-trained Afro-centric language models improves performance for low-resource African languages. 
We also ran experiments using adapters for zero-shot tasks, and the results suggest that we can obtain promising results by using adapters with a limited amount of resources.", "output": "Masakhane-Afrisenti at SemEval-2023 Task 12: Sentiment Analysis using Afro-centric Language Models and Adapters for Low-resource African Languages."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Preserving the individuals' privacy in sharing spatial-temporal datasets is critical to prevent re-identification attacks based on unique trajectories. Existing privacy techniques tend to propose ideal privacy-utility tradeoffs but largely ignore the fairness implications of mobility models and whether such techniques perform equally for different groups of users. The quantification between fairness and privacy-aware models is still unclear and there barely exists any defined set of metrics for measuring fairness in the spatial-temporal context. In this work, we define a set of fairness metrics designed explicitly for human mobility, based on structural similarity and entropy of the trajectories. Under these definitions, we examine the fairness of two state-of-the-art privacy-preserving models that rely on GAN and representation learning to reduce the re-identification rate of users for data sharing. Our results show that while both models guarantee group fairness in terms of demographic parity, they violate individual fairness criteria, indicating that users with highly similar trajectories receive disparate privacy gain. 
We conclude that the tension between the re-identification task and individual fairness needs to be considered for future spatial-temporal data analysis and modelling to achieve a privacy-preserving fairness-aware setting.", "output": "Analysing Fairness of Privacy-Utility Mobility Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The ability of image and video generation models to create photorealistic images has reached unprecedented heights, making it difficult to distinguish between real and fake images in many cases. However, despite this progress, a gap remains between the quality of generated images and those found in the real world. To address this, we have reviewed a vast body of literature from both academic publications and social media to identify qualitative shortcomings in image generation models, which we have classified into five categories. By understanding these failures, we can identify areas where these models need improvement, as well as develop strategies for detecting deep fakes. The prevalence of deep fakes in today's society is a serious concern, and our findings can help mitigate their negative impact.", "output": "Qualitative Failures of Image Generation Models and Their Application in Detecting Deepfakes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sleep abnormalities can have severe health consequences. Automated sleep staging, i.e. labelling the sequence of sleep stages from the patient's physiological recordings, could simplify the diagnostic process. Previous work on automated sleep staging has achieved great results, mainly relying on the EEG signal. However, often multiple sources of information are available beyond EEG. 
This can be particularly beneficial when the EEG recordings are noisy or even missing completely. In this paper, we propose CoRe-Sleep, a Coordinated Representation multimodal fusion network that is particularly focused on improving the robustness of signal analysis on imperfect data. We demonstrate how appropriately handling multimodal information can be the key to achieving such robustness. CoRe-Sleep tolerates noisy or missing modality segments, allowing training on incomplete data. Additionally, it shows state-of-the-art performance when testing on both multimodal and unimodal data using a single model on SHHS-1, the largest publicly available study that includes sleep stage labels. The results indicate that training the model on multimodal data does positively influence performance when tested on unimodal data. This work aims at bridging the gap between automated analysis tools and their clinical utility.", "output": "CoRe-Sleep: A Multimodal Fusion Framework for Time Series Robust to Imperfect Modalities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "OpenAI has recently released GPT-4 (a.k.a. ChatGPT plus), which is demonstrated to be one small step for generative AI (GAI), but one giant leap for artificial general intelligence (AGI). Since its official release in November 2022, ChatGPT has quickly attracted numerous users with extensive media coverage. Such unprecedented attention has also motivated numerous researchers to investigate ChatGPT from various aspects. According to Google Scholar, there are more than 500 articles with ChatGPT in their titles or mentioning it in their abstracts. Considering this, a review is urgently needed, and our work fills this gap. Overall, this work is the first to survey ChatGPT with a comprehensive review of its underlying technology, applications, and challenges. 
Moreover, we present an outlook on how ChatGPT might evolve to realize general-purpose AIGC (a.k.a. AI-generated content), which will be a significant milestone for the development of AGI.", "output": "One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning-based wearable human activity recognition (WHAR) models enable the development of various smart and connected community applications such as sleep pattern monitoring, medication reminders, cognitive health assessment, sports analytics, etc. However, the widespread adoption of these WHAR models is impeded by their degraded performance in the presence of data distribution heterogeneities caused by the sensor placement at different body positions, inherent biases and heterogeneities across devices, and personal and environmental diversities. Various traditional machine learning algorithms and transfer learning techniques have been proposed in the literature to address the underpinning challenges of handling such data heterogeneities. Domain adaptation is one such transfer learning technique that has gained significant popularity in recent literature. 
In this paper, we survey the recent progress of domain adaptation techniques in the Inertial Measurement Unit (IMU)-based human activity recognition area and discuss potential future directions.", "output": "Domain Adaptation for Inertial Measurement Unit-based Human Activity Recognition: A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the Proposal for a Regulation laying down harmonised rules on Artificial Intelligence (AI Act) the European Union provides the first regulatory document that applies to the entire complex of AI systems. While some fear that the regulation leaves too much room for interpretation and thus brings little benefit to society, others expect that the regulation is too restrictive and, thus, blocks progress and innovation, as well as hinders the economic success of companies within the EU. Without a systematic approach, it is difficult to assess how it will actually impact the AI landscape. In this paper, we suggest a systematic approach that we applied on the initial draft of the AI Act that was released in April 2021. We went through several iterations of compiling the list of AI products and projects in and from Germany, which the Lernende Systeme platform lists, and then classified them according to the AI Act together with experts from the fields of computer science and law. Our study shows a need for more concrete formulation, since for some provisions it is often unclear whether they are applicable in a specific case or not. Apart from that, it turns out that only about 30% of the AI systems considered would be regulated by the AI Act, the rest would be classified as low-risk. However, as the database is not representative, the results only provide a first assessment. The process presented can be applied to any collections, and also repeated when regulations are about to change. 
This allows fears of over- or under-regulation to be investigated before the regulations comes into effect.", "output": "Quantitative study about the estimated impact of the AI Act."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Objective digital data is scarce yet needed in many domains to enable research that can transform the standard of healthcare. While data from consumer-grade wearables and smartphones is more accessible, there is critical need for similar data from clinical-grade devices used by patients with a diagnosed condition. The prevalence of wearable medical devices in the diabetes domain sets the stage for unique research and development within this field and beyond. However, the scarcity of open-source datasets presents a major barrier to progress. To facilitate broader research on diabetes-relevant problems and accelerate development of robust computational solutions, we provide the DiaTrend dataset. The DiaTrend dataset is composed of intensive longitudinal data from wearable medical devices, including a total of 27,561 days of continuous glucose monitor data and 8,220 days of insulin pump data from 54 patients with diabetes. 
This dataset is useful for developing novel analytic solutions that can reduce the disease burden for people living with diabetes and increase knowledge on chronic condition management in outpatient settings.", "output": "DiaTrend: A dataset from advanced diabetes technology to enable development of novel analytic solutions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This chapter presents some of the fundamental assumptions and principles that could form the philosophical foundation of GeoAI and spatial data science. Instead of reviewing the well-established characteristics of spatial data (analysis), including interaction, neighborhoods, and autocorrelation, the chapter highlights themes such as sustainability, bias in training data, diversity in schema knowledge, and the (potential lack of) neutrality of GeoAI systems from a unifying ethical perspective. Reflecting on our profession's ethical implications will assist us in conducting potentially disruptive research more responsibly, identifying pitfalls in designing, training, and deploying GeoAI-based systems, and developing a shared understanding of the benefits but also potential dangers of artificial intelligence and machine learning research across academic fields, all while sharing our unique (geo)spatial perspective with others.", "output": "Philosophical Foundations of GeoAI: Exploring Sustainability, Diversity, and Bias in GeoAI and Spatial Data Science."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Passive radio frequency (PRF)-based indoor positioning systems (IPS) have attracted researchers' attention due to their low price, easy and customizable configuration, and non-invasive design. 
This paper proposes a PRF-based three-dimensional (3D) indoor positioning system (PIPS), which is able to use signals of opportunity (SoOP) for positioning and also capture a scenario signature. PIPS passively monitors SoOPs containing scenario signatures through a single receiver. Moreover, PIPS leverages the Dynamic Data Driven Applications System (DDDAS) framework to devise and customize the sampling frequency, enabling the system to use the most impacted frequency band as the rated frequency band. Various regression methods within three ensemble learning strategies are used to train and predict the receiver position. The PRF spectrum of 60 positions is collected in the experimental scenario, and three criteria are applied to evaluate the performance of PIPS. Experimental results show that the proposed PIPS possesses the advantages of high accuracy, configurability, and robustness.", "output": "Passive Radio Frequency-based 3D Indoor Positioning System via Ensemble Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper considers reliable and secure Spectrum Sensing (SS) based on Federated Learning (FL) in the Cognitive Radio (CR) environment. Motivation, architectures, and algorithms of FL in SS are discussed. Security and privacy threats on these algorithms are overviewed, along with possible countermeasures to such attacks. 
Some illustrative examples are also provided, with design recommendations for FL-based SS in future CRs.", "output": "Secure Federated Learning for Cognitive Radio Sensing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Correct identification and categorization of plant diseases are crucial for ensuring the safety of the global food supply and the overall financial success of stakeholders. In this regard, a wide range of solutions has been made available by introducing deep learning-based classification systems for different staple crops. Despite being one of the most important commercial crops in many parts of the globe, research proposing a smart solution for automatically classifying apple leaf diseases remains relatively unexplored. This study presents a technique for identifying apple leaf diseases based on transfer learning. The system extracts features using a pretrained EfficientNetV2S architecture and passes to a classifier block for effective prediction. The class imbalance issues are tackled by utilizing runtime data augmentation. The effect of various hyperparameters, such as input resolution, learning rate, number of epochs, etc., has been investigated carefully. The competence of the proposed pipeline has been evaluated on the apple leaf disease subset from the publicly available `PlantVillage' dataset, where it achieved an accuracy of 99.21%, outperforming the existing works.", "output": "An Efficient Transfer Learning-based Approach for Apple Leaf Disease Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Power-seeking behavior is a key source of risk from advanced AI, but our theoretical understanding of this phenomenon is relatively limited. 
Building on existing theoretical results demonstrating power-seeking incentives for most reward functions, we investigate how the training process affects power-seeking incentives and show that they are still likely to hold for trained agents under some simplifying assumptions. We formally define the training-compatible goal set (the set of goals consistent with the training rewards) and assume that the trained agent learns a goal from this set. In a setting where the trained agent faces a choice to shut down or avoid shutdown in a new situation, we prove that the agent is likely to avoid shutdown. Thus, we show that power-seeking incentives can be probable (likely to arise for trained agents) and predictive (allowing us to predict undesirable behavior in new situations).", "output": "Power-seeking can be probable and predictive for trained agents."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper assesses the accuracy, reliability and bias of the Large Language Model (LLM) ChatGPT-4 on the text analysis task of classifying the political affiliation of a Twitter poster based on the content of a tweet. The LLM is compared to manual annotation by both expert classifiers and crowd workers, generally considered the gold standard for such tasks. We use Twitter messages from United States politicians during the 2020 election, providing a ground truth against which to measure accuracy. The paper finds that ChatGPT-4 has achieves higher accuracy, higher reliability, and equal or lower bias than the human classifiers. The LLM is able to correctly annotate messages that require reasoning on the basis of contextual knowledge, and inferences around the author's intentions - traditionally seen as uniquely human abilities. 
These findings suggest that LLM will have substantial impact on the use of textual data in the social sciences, by enabling interpretive research at a scale.", "output": "ChatGPT-4 Outperforms Experts and Crowd Workers in Annotating Political Twitter Messages with Zero-Shot Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning algorithms play an important role in a variety of important decision-making processes, including targeted advertisement displays, home loan approvals, and criminal behavior predictions. Given the far-reaching impact of these algorithms, it is crucial that they operate fairly, free from bias or prejudice towards certain groups in the population. Ensuring impartiality in these algorithms is essential for promoting equality and avoiding discrimination. To this end we introduce a unified framework for randomized subset selection that incorporates group fairness constraints. Our problem involves a global utility function and a set of group utility functions for each group, here a group refers to a group of individuals (e.g., people) sharing the same attributes (e.g., gender). Our aim is to generate a distribution across feasible subsets, specifying the selection probability of each feasible set, to maximize the global utility function while meeting a predetermined quota for each group utility function in expectation. Note that there may not necessarily be any direct connections between the global utility function and each group utility function. We demonstrate that this framework unifies and generalizes many significant applications in machine learning and operations research. 
Our algorithmic results either improves the best known result or provide the first approximation algorithms for new applications.", "output": "Beyond Submodularity: A Unified Framework of Randomized Set Selection with Group Fairness Constraints."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural network (DNN) models are valuable intellectual property of model owners, constituting a competitive advantage. Therefore, it is crucial to develop techniques to protect against model theft. Model ownership resolution (MOR) is a class of techniques that can deter model theft. A MOR scheme enables an accuser to assert an ownership claim for a suspect model by presenting evidence, such as a watermark or fingerprint, to show that the suspect model was stolen or derived from a source model owned by the accuser. Most of the existing MOR schemes prioritize robustness against malicious suspects, ensuring that the accuser will win if the suspect model is indeed a stolen model. In this paper, we show that common MOR schemes in the literature are vulnerable to a different, equally important but insufficiently explored, robustness concern: a malicious accuser. We show how malicious accusers can successfully make false claims against independent suspect models that were not stolen. Our core idea is that a malicious accuser can deviate (without detection) from the specified MOR process by finding (transferable) adversarial examples that successfully serve as evidence against independent suspect models. To this end, we first generalize the procedures of common MOR schemes and show that, under this generalization, defending against false claims is as challenging as preventing (transferable) adversarial examples. 
Via systematic empirical evaluation we demonstrate that our false claim attacks always succeed in all prominent MOR schemes with realistic configurations, including against a real-world model: Amazon's Rekognition API.", "output": "False Claims against Model Ownership Resolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "To address the challenges of digital intelligence in the digital economy, artificial intelligence-generated content (AIGC) has emerged. AIGC uses artificial intelligence to assist or replace manual content generation by generating content based on user-inputted keywords or requirements. The development of large model algorithms has significantly strengthened the capabilities of AIGC, which makes AIGC products a promising generative tool and adds convenience to our lives. As an upstream technology, AIGC has unlimited potential to support different downstream applications. It is important to analyze AIGC's current capabilities and shortcomings to understand how it can be best utilized in future applications. Therefore, this paper provides an extensive overview of AIGC, covering its definition, essential conditions, cutting-edge capabilities, and advanced features. 
Moreover, it discusses the benefits of large-scale pre-trained models and the industrial chain of AIGC. Furthermore, the article explores the distinctions between auxiliary generation and automatic generation within AIGC, providing examples of text generation. The paper also examines the potential integration of AIGC with the Metaverse. Lastly, the article highlights existing issues and suggests some future directions for application.", "output": "AI-Generated Content (AIGC): A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent approaches have attempted to personalize dialogue systems by leveraging profile information into models. However, this knowledge is scarce and difficult to obtain, which makes the extraction/generation of profile information from dialogues a fundamental asset. To surpass this limitation, we introduce the Profile Generation Task (PGTask). We contribute with a new dataset for this problem, comprising profile sentences aligned with related utterances, extracted from a corpus of dialogues. Furthermore, using state-of-the-art methods, we provide a benchmark for profile generation on this novel dataset. Our experiments disclose the challenges of profile generation, and we hope that this introduces a new research direction.", "output": "PGTask: Introducing the Task of Profile Generation from Dialogues."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Controllable text generation (CTG) by large language models has a huge potential to transform education for teachers and students alike. Specifically, high quality and diverse question generation can dramatically reduce the load on teachers and improve the quality of their educational content. 
Recent work in this domain has made progress with generation, but fails to show that real teachers judge the generated questions as sufficiently useful for the classroom setting; or if instead the questions have errors and/or pedagogically unhelpful content. We conduct a human evaluation with teachers to assess the quality and usefulness of outputs from combining CTG and question taxonomies (Bloom's and a difficulty taxonomy). The results demonstrate that the questions generated are high quality and sufficiently useful, showing their promise for widespread use in the classroom setting.", "output": "How Useful are Educational Questions Generated by Large Language Models?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The remarkable performance of overparameterized deep neural networks (DNNs) must arise from an interplay between network architecture, training algorithms, and structure in the data. To disentangle these three components, we apply a Bayesian picture, based on the functions expressed by a DNN, to supervised learning. The prior over functions is determined by the network, and is varied by exploiting a transition between ordered and chaotic regimes. For Boolean function classification, we approximate the likelihood using the error spectrum of functions on data. When combined with the prior, this accurately predicts the posterior, measured for DNNs trained with stochastic gradient descent. 
This analysis reveals that structured data, combined with an intrinsic Occam's razor-like inductive bias towards (Kolmogorov) simple functions that is strong enough to counteract the exponential growth of the number of functions with complexity, is a key to the success of DNNs.", "output": "Do deep neural networks have an inbuilt Occam's razor?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spatial control is a core capability in controllable image generation. Advancements in layout-guided image generation have shown promising results on in-distribution (ID) datasets with similar spatial configurations. However, it is unclear how these models perform when facing out-of-distribution (OOD) samples with arbitrary, unseen layouts. In this paper, we propose LayoutBench, a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape. We benchmark two recent representative layout-guided image generation methods and observe that the good ID layout control may not generalize well to arbitrary layouts in the wild (e.g., objects at the boundary). Next, we propose IterInpaint, a new baseline that generates foreground and background regions in a step-by-step manner via inpainting, demonstrating stronger generalizability than existing models on OOD layouts in LayoutBench. We perform quantitative and qualitative evaluation and fine-grained analysis on the four LayoutBench skills to pinpoint the weaknesses of existing models. Lastly, we show comprehensive ablation studies on IterInpaint, including training task ratio, crop&paste vs. repaint, and generation order. 
Project website: ", "output": "Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Few-shot learning (FSL) techniques seek to learn the underlying patterns in data using fewer samples, analogous to how humans learn from limited experience. In this limited-data scenario, the challenges associated with deep neural networks, such as shortcut learning and texture bias behaviors, are further exacerbated. Moreover, the significance of addressing shortcut learning is not yet fully explored in the few-shot setup. To address these issues, we propose LSFSL, which enforces the model to learn more generalizable features utilizing the implicit prior information present in the data. Through comprehensive analyses, we demonstrate that LSFSL-trained models are less vulnerable to alteration in color schemes, statistical correlations, and adversarial perturbations leveraging the global semantics in the data. Our findings highlight the potential of incorporating relevant priors in few-shot approaches to increase robustness and generalization.", "output": "LSFSL: Leveraging Shape Information in Few-shot Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Individual human decision-makers may benefit from different forms of support to improve decision outcomes. However, a key question is which form of support will lead to accurate decisions at a low cost. In this work, we propose learning a decision support policy that, for a given input, chooses which form of support, if any, to provide. 
We consider decision-makers for whom we have no prior information and formalize learning their respective policies as a multi-objective optimization problem that trades off accuracy and cost. Using techniques from stochastic contextual bandits, we propose THREAD, an online algorithm to personalize a decision support policy for each decision-maker, and devise a hyper-parameter tuning strategy to identify a cost-performance trade-off using simulated human behavior. We provide computational experiments to demonstrate the benefits of THREAD compared to offline baselines. We then introduce Modiste, an interactive tool that provides THREAD with an interface. We conduct human subject experiments to show how Modiste learns policies personalized to each decision-maker and discuss the nuances of learning decision support policies online for real users.", "output": "Learning Personalized Decision Support Policies."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a method to estimate the mechanical parameters of fabrics using a casual capture setup with a depth camera. Our approach enables to create mechanically-correct digital representations of real-world textile materials, which is a fundamental step for many interactive design and engineering applications. As opposed to existing capture methods, which typically require expensive setups, video sequences, or manual intervention, our solution can capture at scale, is agnostic to the optical appearance of the textile, and facilitates fabric arrangement by non-expert operators. 
To this end, we propose a sim-to-real strategy to train a learning-based framework that can take as input one or multiple images and outputs a full set of mechanical parameters. Thanks to carefully designed data augmentation and transfer learning protocols, our solution generalizes to real images despite being trained only on synthetic data, hence successfully closing the sim-to-real loop. Key in our work is to demonstrate that evaluating the regression accuracy based on the similarity at parameter space leads to an inaccurate distances that do not match the human perception. To overcome this, we propose a novel metric for fabric drape similarity that operates on the image domain instead on the parameter space, allowing us to evaluate our estimation within the context of a similarity rank. We show that out metric correlates with human judgments about the perception of drape similarity, and that our model predictions produce perceptually accurate results compared to the ground truth parameters.", "output": "How Will It Drape Like? Capturing Fabric Mechanics from Depth Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Understanding verbs is crucial to modelling how people and objects interact with each other and the environment through space and time. Recently, state-of-the-art video-language models based on CLIP have been shown to have limited verb understanding and to rely extensively on nouns, restricting their performance in real-world video applications that require action and temporal understanding. In this work, we improve verb understanding for CLIP-based video-language models by proposing a new Verb-Focused Contrastive (VFC) framework. 
This consists of two main components: (1) leveraging pretrained large language models (LLMs) to create hard negatives for cross-modal contrastive learning, together with a calibration strategy to balance the occurrence of concepts in positive and negative pairs; and (2) enforcing a fine-grained, verb phrase alignment loss. Our method achieves state-of-the-art results for zero-shot performance on three downstream tasks that focus on verb understanding: video-text matching, video question-answering and video classification. To the best of our knowledge, this is the first work which proposes a method to alleviate the verb understanding problem, and does not simply highlight it.", "output": "Verbs in Action: Improving verb understanding in video-language models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Interpretability methods are valuable only if their explanations faithfully describe the explained model. In this work, we consider neural networks whose predictions are invariant under a specific symmetry group. This includes popular architectures, ranging from convolutional to graph neural networks. Any explanation that faithfully explains this type of model needs to be in agreement with this invariance property. We formalize this intuition through the notion of explanation invariance and equivariance by leveraging the formalism from geometric deep learning. Through this rigorous formalism, we derive (1) two metrics to measure the robustness of any interpretability method with respect to the model symmetry group; (2) theoretical robustness guarantees for some popular interpretability methods and (3) a systematic approach to increase the invariance of any interpretability method with respect to a symmetry group. 
By empirically measuring our metrics for explanations of models associated with various modalities and symmetry groups, we derive a set of 5 guidelines to allow users and developers of interpretability methods to produce robust explanations.", "output": "Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Areas under ROC (AUROC) and precision-recall curves (AUPRC) are common metrics for evaluating classification performance for imbalanced problems. Compared with AUROC, AUPRC is a more appropriate metric for highly imbalanced datasets. While stochastic optimization of AUROC has been studied extensively, principled stochastic optimization of AUPRC has been rarely explored. In this work, we propose a principled technical method to optimize AUPRC for deep learning. Our approach is based on maximizing the averaged precision (AP), which is an unbiased point estimator of AUPRC. We cast the objective into a sum of dependent compositional functions with inner functions dependent on random variables of the outer level. We propose efficient adaptive and non-adaptive stochastic algorithms named SOAP with provable convergence guarantee under mild conditions by leveraging recent advances in stochastic compositional optimization. Extensive experimental results on image and graph datasets demonstrate that our proposed method outperforms prior methods on imbalanced problems in terms of AUPRC. To the best of our knowledge, our work represents the first attempt to optimize AUPRC with provable convergence. 
The SOAP has been implemented in the libAUC library at~url{", "output": "Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As consensus across the various published AI ethics principles is approached, a gap remains between high-level principles and practical techniques that can be readily adopted to design and develop responsible AI systems. We examine the practices and experiences of researchers and engineers from Australia's national scientific research agency (CSIRO), who are involved in designing and developing AI systems for many application areas. Semi-structured interviews were used to examine how the practices of the participants relate to and align with a set of high-level AI ethics principles proposed by the Australian Government. The principles comprise: (1) privacy protection and security, (2) reliability and safety, (3) transparency and explainability, (4) fairness, (5) contestability, (6) accountability, (7) human-centred values, (8) human, social and environmental wellbeing. Discussions on the gained insights from the interviews include various tensions and trade-offs between the principles, and provide suggestions for implementing each high-level principle. We also present suggestions aiming to enhance associated support mechanisms.", "output": "AI Ethics Principles in Practice: Perspectives of Designers and Developers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Natural language processing (NLP) aims at investigating the interactions between agents and humans, processing and analyzing large amounts of natural language data. Large-scale language models play an important role in current natural language processing. 
However, the challenges of explainability and complexity come along with the developments of language models. One way is to introduce logical relations and rules into natural language processing models, such as making use of Automated Planning. Automated planning (AI planning) focuses on building symbolic domain models and synthesizing plans to transit initial states to goals based on domain models. Recently, there have been plenty of works related to these two fields, which have the abilities to generate explicit knowledge, e.g., preconditions and effects of action models, and learn from tacit knowledge, e.g., neural models, respectively. Integrating AI planning and natural language processing effectively improves the communication between human and intelligent agents. This paper outlines the commons and relations between AI planning and natural language processing, argues that each of them can effectively impact on the other one by five areas: (1) planning-based text understanding, (2) planning-based natural language processing, (3) planning-based explainability, (4) text-based human-robot interaction, and (5) applications. We also explore some potential future issues between AI planning and natural language processing. To the best of our knowledge, this survey is the first work that addresses the deep connections between AI planning and Natural language processing.", "output": "Integrating AI Planning with Natural Language Processing: A Combination of Explicit and Tacit Knowledge."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "When cast into the Deep Reinforcement Learning framework, many robotics tasks require solving a long horizon and sparse reward problem, where learning algorithms struggle. In such context, Imitation Learning (IL) can be a powerful approach to bootstrap the learning process. 
However, most IL methods require several expert demonstrations which can be prohibitively difficult to acquire. Only a handful of IL algorithms have shown efficiency in the context of an extreme low expert data regime where a single expert demonstration is available. In this paper, we present a novel algorithm designed to imitate complex robotic tasks from the states of an expert trajectory. Based on a sequential inductive bias, our method divides the complex task into smaller skills. The skills are learned into a goal-conditioned policy that is able to solve each skill individually and chain skills to solve the entire task. We show that our method imitates a non-holonomic navigation task and scales to a complex simulated robotic manipulation task with very high sample efficiency.", "output": "Divide & Conquer Imitation Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A significant gap remains between today's visual pattern recognition models and human-level visual cognition especially when it comes to few-shot learning and compositional reasoning of novel concepts. We introduce Bongard-HOI, a new visual reasoning benchmark that focuses on compositional learning of human-object interactions (HOIs) from natural images. It is inspired by two desirable characteristics from the classical Bongard problems (BPs): 1) few-shot concept learning, and 2) context-dependent reasoning. We carefully curate the few-shot instances with hard negatives, where positive and negative images only disagree on action labels, making mere recognition of object categories insufficient to complete our benchmarks. We also design multiple test sets to systematically study the generalization of visual learning models, where we vary the overlap of the HOI concepts between the training and test sets of few-shot instances, from partial to no overlaps. 
Bongard-HOI presents a substantial challenge to today's visual recognition models. The state-of-the-art HOI detection model achieves only 62% accuracy on few-shot binary prediction while even amateur human testers on MTurk have 91% accuracy. With the Bongard-HOI benchmark, we hope to further advance research efforts in visual reasoning, especially in holistic perception-reasoning systems and better representation learning.", "output": "Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The most popular methods for measuring importance of the variables in a black box prediction algorithm make use of synthetic inputs that combine predictor variables from multiple subjects. These inputs can be unlikely, physically impossible, or even logically impossible. As a result, the predictions for such cases can be based on data very unlike any the black box was trained on. We think that users cannot trust an explanation of the decision of a prediction algorithm when the explanation uses such values. Instead we advocate a method called Cohort Shapley that is grounded in economic game theory and unlike most other game theoretic methods, it uses only actually observed data to quantify variable importance. Cohort Shapley works by narrowing the cohort of subjects judged to be similar to a target subject on one or more features. 
We illustrate it on an algorithmic fairness problem where it is essential to attribute importance to protected variables that the model was not trained on.", "output": "Variable importance without impossible data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Incorporating personal preference is crucial in advanced machine translation tasks. Despite the recent advancement of machine translation, it remains a demanding task to properly reflect personal style. In this paper, we introduce a personalized automatic post-editing framework to address this challenge, which effectively generates sentences considering distinct personal behaviors. To build this framework, we first collect post-editing data that connotes the user preference from a live machine translation system. Specifically, real-world users enter source sentences for translation and edit the machine-translated outputs according to the user's preferred style. We then propose a model that combines a discriminator module and user-specific parameters on the APE framework. Experimental results show that the proposed method outperforms other baseline models on four different metrics (i.e., BLEU, TER, YiSi-1, and human evaluation).", "output": "PePe: Personalized Post-editing Model utilizing User-generated Post-edits."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "How can we design protein sequences folding into the desired structures effectively and efficiently? AI methods for structure-based protein design have attracted increasing attention in recent years; however, few methods can simultaneously improve the accuracy and efficiency due to the lack of expressive features and autoregressive sequence decoder. 
To address these issues, we propose PiFold, which contains a novel residue featurizer and PiGNN layers to generate protein sequences in a one-shot way with improved recovery. Experiments show that PiFold could achieve 51.66% recovery on CATH 4.2, while the inference speed is 70 times faster than the autoregressive competitors. In addition, PiFold achieves 58.72% and 60.42% recovery scores on TS50 and TS500, respectively. We conduct comprehensive ablation studies to reveal the role of different types of protein features and model designs, inspiring further simplification and improvement. The PyTorch code is available at href{", "output": "PiFold: Toward effective and efficient protein inverse folding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Bayesian neural networks (BNNs) have received an increased interest in the last years. In BNNs, a complete posterior distribution of the unknown weight and bias parameters of the network is produced during the training stage. This probabilistic estimation offers several advantages with respect to point-wise estimates, in particular, the ability to provide uncertainty quantification when predicting new data. This feature inherent to the Bayesian paradigm, is useful in countless machine learning applications. It is particularly appealing in areas where decision-making has a crucial impact, such as medical healthcare or autonomous driving. The main challenge of BNNs is the computational cost of the training procedure since Bayesian techniques often face a severe curse of dimensionality. Adaptive importance sampling (AIS) is one of the most prominent Monte Carlo methodologies benefiting from sounded convergence guarantees and ease for adaptation. This work aims to show that AIS constitutes a successful approach for designing BNNs. 
More precisely, we propose a novel algorithm PMCnet that includes an efficient adaptation mechanism, exploiting geometric information on the complex (often multimodal) posterior distribution. Numerical results illustrate the excellent performance and the improved exploration capabilities of the proposed method for both shallow and deep neural networks.", "output": "Efficient Bayes Inference in Neural Networks through Adaptive Importance Sampling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Classifier-free guided diffusion models have recently been shown to be highly effective at high-resolution image generation, and they have been widely used in large-scale diffusion frameworks including DALLE-2, Stable Diffusion and Imagen. However, a downside of classifier-free guided diffusion models is that they are computationally expensive at inference time since they require evaluating two diffusion models, a class-conditional model and an unconditional model, tens to hundreds of times. To deal with this limitation, we propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from: Given a pre-trained classifier-free guided model, we first learn a single model to match the output of the combined conditional and unconditional models, and then we progressively distill that model to a diffusion model that requires much fewer sampling steps. 
For standard diffusion models trained on the pixel-space, our approach is able to generate images visually comparable to that of the original model using as few as 4 sampling steps on ImageNet 64x64 and CIFAR-10, achieving FID/IS scores comparable to that of the original model while being up to 256 times faster to sample from. For diffusion models trained on the latent-space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps, accelerating inference by at least 10-fold compared to existing methods on ImageNet 256x256 and LAION datasets. We further demonstrate the effectiveness of our approach on text-guided image editing and inpainting, where our distilled model is able to generate high-quality results using as few as 2-4 denoising steps.", "output": "On Distillation of Guided Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a new task to benchmark scene understanding of embodied agents: Situated Question Answering in 3D Scenes (SQA3D). Given a scene context (e.g., 3D scan), SQA3D requires the tested agent to first understand its situation (position, orientation, etc.) in the 3D scene as described by text, then reason about its surrounding environment and answer a question under that situation. Based upon 650 scenes from ScanNet, we provide a dataset centered around 6.8k unique situations, along with 20.4k descriptions and 33.4k diverse reasoning questions for these situations. These questions examine a wide spectrum of reasoning capabilities for an intelligent agent, ranging from spatial relation comprehension to commonsense understanding, navigation, and multi-hop reasoning. SQA3D imposes a significant challenge to current multi-modal especially 3D reasoning models. 
We evaluate various state-of-the-art approaches and find that the best one only achieves an overall score of 47.20%, while amateur human participants can reach 90.06%. We believe SQA3D could facilitate future embodied AI research with stronger situation understanding and reasoning capability.", "output": "SQA3D: Situated Question Answering in 3D Scenes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The development of mobility-on-demand services, rich transportation data sources, and autonomous vehicles (AVs) creates significant opportunities for shared-use AV mobility services (SAMSs) to provide accessible and demand-responsive personal mobility. SAMS fleet operation involves multiple interrelated decisions, with a primary focus on efficiently fulfilling passenger ride requests with a high level of service quality. This paper focuses on improving the efficiency and service quality of a SAMS vehicle fleet via anticipatory repositioning of idle vehicles. The rebalancing problem is formulated as a Markov Decision Process, which we propose solving using an advantage actor critic (A2C) reinforcement learning-based method. The proposed approach learns a rebalancing policy that anticipates future demand and cooperates with an optimization-based assignment strategy. The approach allows for centralized repositioning decisions and can handle large vehicle fleets since the problem size does not change with the fleet size. Using New York City taxi data and an agent-based simulation tool, two versions of the A2C AV repositioning approach are tested. The first version, A2C-AVR(A), learns to anticipate future demand based on past observations, while the second, A2C-AVR(B), uses demand forecasts. 
The models are compared to an optimization-based rebalancing approach and show significant reduction in mean passenger waiting times, with a slightly increased percentage of empty fleet miles travelled. The experiments demonstrate the model's ability to anticipate future demand and its transferability to cases unseen at the training stage.", "output": "Anticipatory Fleet Repositioning for Shared-use Autonomous Mobility Services: An Optimization and Learning-Based Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning (ML) enabled classification models are becoming increasingly popular for tackling the sheer volume and speed of online misinformation and other content that could be identified as harmful. In building these models, data scientists need to take a stance on the legitimacy, authoritativeness and objectivity of the sources of \"truth\" used for model training and testing. This has political, ethical and epistemic implications which are rarely addressed in technical papers. Despite (and due to) their reported high accuracy and performance, ML-driven moderation systems have the potential to shape online public debate and create downstream negative impacts such as undue censorship and the reinforcing of false beliefs. Using collaborative ethnography and theoretical insights from social studies of science and expertise, we offer a critical analysis of the process of building ML models for (mis)information classification: we identify a series of algorithmic contingencies--key moments during model development that could lead to different future outcomes, uncertainty and harmful effects as these tools are deployed by social media platforms. 
We conclude by offering a tentative path toward reflexive and responsible development of ML tools for moderating misinformation and other harmful content online.", "output": "Addressing contingency in algorithmic (mis)information classification: Toward a responsible machine learning agenda."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the problem of iterative machine teaching, where a teacher sequentially provides examples based on the status of a learner under a discrete input space (i.e., a pool of finite samples), which greatly limits the teacher's capability. To address this issue, we study iterative teaching under a continuous input space where the input example (i.e., image) can be either generated by solving an optimization problem or drawn directly from a continuous distribution. Specifically, we propose data hallucination teaching (DHT) where the teacher can generate input data intelligently based on labels, the learner's status and the target concept. We study a number of challenging teaching setups (e.g., linear/neural learners in omniscient and black-box settings). Extensive empirical results verify the effectiveness of DHT.", "output": "Iterative Teaching by Data Hallucination."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Due to the rapid advancements in recent years, medical image analysis is largely dominated by deep learning (DL). However, building powerful and robust DL models requires training with large multi-party datasets. While multiple stakeholders have provided publicly available datasets, the ways in which these data are labeled vary widely. 
For instance, an institution might provide a dataset of chest radiographs containing labels denoting the presence of pneumonia, while another institution might have a focus on determining the presence of metastases in the lung. Training a single AI model utilizing all these data is not feasible with conventional federated learning (FL). This prompts us to propose an extension to the widespread FL process, namely flexible federated learning (FFL) for collaborative training on such data. Using 695,000 chest radiographs from five institutions from across the globe - each with differing labels - we demonstrate that having heterogeneously labeled datasets, FFL-based training leads to significant performance increase compared to conventional FL training, where only the uniformly annotated images are utilized. We believe that our proposed algorithm could accelerate the process of bringing collaborative training methods from research and simulation phase to the real-world applications in healthcare.", "output": "Collaborative Training of Medical Artificial Intelligence Models with non-uniform Labels."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data augmentation is a widely used technique in machine learning to improve model performance. However, existing data augmentation techniques in natural language understanding (NLU) may not fully capture the complexity of natural language variations, and they can be challenging to apply to large datasets. This paper proposes the Random Position Noise (RPN) algorithm, a novel data augmentation technique that operates at the word vector level. RPN modifies the word embeddings of the original text by introducing noise based on the existing values of selected word vectors, allowing for more fine-grained modifications and better capturing natural language variations. 
Unlike traditional data augmentation methods, RPN does not require gradients in the computational graph during virtual sample updates, making it simpler to apply to large datasets. Experimental results demonstrate that RPN consistently outperforms existing data augmentation techniques across various NLU tasks, including sentiment analysis, natural language inference, and paraphrase detection. Moreover, RPN performs well in low-resource settings and is applicable to any model featuring a word embeddings layer. The proposed RPN algorithm is a promising approach for enhancing NLU performance and addressing the challenges associated with traditional data augmentation techniques in large-scale NLU tasks. Our experimental results demonstrated that the RPN algorithm achieved state-of-the-art performance in all seven NLU tasks, thereby highlighting its effectiveness and potential for real-world NLU applications.", "output": "RPN: A Word Vector Level Data Augmentation Algorithm in Deep Learning for Language Understanding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "AI-based code generators are an emerging solution for automatically writing programs starting from descriptions in natural language, by using deep neural networks (Neural Machine Translation, NMT). In particular, code generators have been used for ethical hacking and offensive security testing by generating proof-of-concept attacks. Unfortunately, the evaluation of code generators still faces several issues. The current practice uses output similarity metrics, i.e., automatic metrics that compute the textual similarity of generated code with ground-truth references. However, it is not clear what metric to use, and which metric is most suitable for specific contexts. This work analyzes a large set of output similarity metrics on offensive code generators. 
We apply the metrics on two state-of-the-art NMT models using two datasets containing offensive assembly and Python code with their descriptions in the English language. We compare the estimates from the automatic metrics with human evaluation and provide practical insights into their strengths and limitations.", "output": "Who Evaluates the Evaluators? On Automatic Metrics for Assessing AI-based Offensive Code Generators."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Text-guided image editing can have a transformative impact in supporting creative applications. A key challenge is to generate edits that are faithful to input text prompts, while consistent with input images. We present Imagen Editor, a cascaded diffusion model built, by fine-tuning Imagen on text-guided image inpainting. Imagen Editor's edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training. In addition, Imagen Editor captures fine details in the input image by conditioning the cascaded pipeline on the original high resolution image. To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting. EditBench evaluates inpainting edits on natural and generated images exploring objects, attributes, and scenes. 
Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.", "output": "Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semantic representations is the masking strategy; specifically, it is crucial to (a) sample target blocks with sufficiently large scale (semantic), and to (b) use a sufficiently informative (spatially distributed) context block. Empirically, when combined with Vision Transformers, we find I-JEPA to be highly scalable. 
For instance, we train a ViT-Huge/14 on ImageNet using 16 A100 GPUs in under 72 hours to achieve strong downstream performance across a wide range of tasks, from linear classification to object counting and depth prediction.", "output": "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "AI advice is becoming increasingly popular, e.g., in investment and medical treatment decisions. As this advice is typically imperfect, decision-makers have to exert discretion as to whether actually follow that advice: they have to \"appropriately\" rely on correct and turn down incorrect advice. However, current research on appropriate reliance still lacks a common definition as well as an operational measurement concept. Additionally, no in-depth behavioral experiments have been conducted that help understand the factors influencing this behavior. In this paper, we propose Appropriateness of Reliance (AoR) as an underlying, quantifiable two-dimensional measurement concept. We develop a research model that analyzes the effect of providing explanations for AI advice. In an experiment with 200 participants, we demonstrate how these explanations influence the AoR, and, thus, the effectiveness of AI advice. 
Our work contributes fundamental concepts for the analysis of reliance behavior and the purposeful design of AI advisors.", "output": "Appropriate Reliance on AI Advice: Conceptualization and the Effect of Explanations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reliability of machine learning evaluation -- the consistency of observed evaluation scores across replicated model training runs -- is affected by several sources of nondeterminism which can be regarded as measurement noise. Current tendencies to remove noise in order to enforce reproducibility of research results neglect inherent nondeterminism at the implementation level and disregard crucial interaction effects between algorithmic noise factors and data properties. This limits the scope of conclusions that can be drawn from such experiments. Instead of removing noise, we propose to incorporate several sources of variance, including their interaction with data properties, into an analysis of significance and reliability of machine learning evaluation, with the aim to draw inferences beyond particular instances of trained models. We show how to use linear mixed effects models (LMEMs) to analyze performance evaluation scores, and to conduct statistical inference with a generalized likelihood ratio test (GLRT). 
This allows us to incorporate arbitrary sources of noise like meta-parameter variations into statistical significance testing, and to assess performance differences conditional on data properties. Furthermore, a variance component analysis (VCA) enables the analysis of the contribution of noise sources to overall variance and the computation of a reliability coefficient by the ratio of substantial to total variance.", "output": "Towards Inferential Reproducibility of Machine Learning Research."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces Block Data Representations (BDR), a framework for exploring and evaluating a wide spectrum of narrow-precision formats for deep learning. It enables comparison of popular quantization standards, and through BDR, new formats based on shared microexponents (MX) are identified, which outperform other state-of-the-art quantization approaches, including narrow-precision floating-point and block floating-point. MX utilizes multiple levels of quantization scaling with ultra-fine scaling factors based on shared microexponents in the hardware. The effectiveness of MX is demonstrated on real-world models including large-scale generative pretraining and inferencing, and production-scale recommendation systems.", "output": "With Shared Microexponents, A Little Shifting Goes a Long Way."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Direct physical interaction with robots is becoming increasingly important in flexible production scenarios, but robots without protective fences also pose a greater risk to the operator. 
In order to keep the risk potential low, relatively simple measures are prescribed for operation, such as stopping the robot if there is physical contact or if a safety distance is violated. Although human injuries can be largely avoided in this way, all such solutions have in common that real cooperation between humans and robots is hardly possible and therefore the advantages of working with such systems cannot develop its full potential. In human-robot collaboration scenarios, more sophisticated solutions are required that make it possible to adapt the robot's behavior to the operator and/or the current situation. Most importantly, during free robot movement, physical contact must be allowed for meaningful interaction and not recognized as a collision. However, here lies a key challenge for future systems: detecting human contact by using robot proprioception and machine learning algorithms. This work uses the Deep Metric Learning (DML) approach to distinguish between non-contact robot movement, intentional contact aimed at physical human-robot interaction, and collision situations. The achieved results are promising and show that DML achieves 98.6% accuracy, which is 4% higher than the existing standards (i.e. a deep learning network trained without DML). 
It also indicates a promising generalization capability for easy portability to other robots (target robots) by detecting contact (distinguishing between contactless and intentional or accidental contact) without having to retrain the model with target robot data.", "output": "Improving safety in physical human-robot collaboration via deep metric learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning agent behaviors from observational data has shown to improve our understanding of their decision-making processes, advancing our ability to explain their interactions with the environment and other agents. While multiple learning techniques have been proposed in the literature, there is one particular setting that has not been explored yet: multi agent systems where agent identities remain anonymous. For instance, in financial markets labeled data that identifies market participant strategies is typically proprietary, and only the anonymous state-action pairs that result from the interaction of multiple market participants are publicly available. As a result, sequences of agent actions are not observable, restricting the applicability of existing work. In this paper, we propose a Policy Clustering algorithm, called K-SHAP, that learns to group anonymous state-action pairs according to the agent policies. We frame the problem as an Imitation Learning (IL) task, and we learn a world-policy able to mimic all the agent behaviors upon different environmental states. We leverage the world-policy to explain each anonymous observation through an additive feature attribution method called SHAP (SHapley Additive exPlanations). Finally, by clustering the explanations we show that we are able to identify different agent policies and group observations accordingly. We evaluate our approach on simulated synthetic market data and a real-world financial dataset. 
We show that our proposal significantly and consistently outperforms the existing methods, identifying different agent strategies.", "output": "K-SHAP: Policy Clustering Algorithm for Anonymous State-Action Pairs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Text-to-image generation (TTIG) models, a recent addition to creative AI, can generate images based on a text description. These models have begun to rival the work of professional creatives, and sparked discussions on the future of creative work, loss of jobs, and copyright issues, amongst other important implications. To support the sustainable adoption of TTIG, we must provide rich, reliable and transparent insights into how professionals perceive, adopt and use TTIG. Crucially though, the public debate is shallow, narrow and lacking transparency, while academic work has focused on studying the use of TTIG in a general artist population, but not on the perceptions and attitudes of professionals in a specific industry. In this paper, we contribute a qualitative, exploratory interview study on TTIG in the Finnish videogame industry. Through a Template Analysis on semi-structured interviews with 14 game professionals, we reveal 12 overarching themes, structured into 49 sub-themes on professionals' perception, adoption and use of TTIG systems in games industry practice. 
Experiencing (yet another) change of roles and creative processes, our participants' reflections can inform discussions within the industry, be used by policymakers to inform urgently needed legislation, and support researchers in games, HCI and AI to support the sustainable, professional use of TTIG to benefit people and games as cultural artefacts.", "output": "\"An Adapt-or-Die Type of Situation\": Perception, Adoption, and Use of Text-To-Image-Generation AI by Game Industry Professionals."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) have taken the scientific world by storm, changing the landscape of natural language processing and human-computer interaction. These powerful tools can answer complex questions and, surprisingly, perform challenging creative tasks (e.g., generate code and applications to solve problems, write stories, pieces of music, etc.). In this paper, we present a collaborative game design framework that combines interactive evolution and large language models to simulate the typical human design process. We use the former to exploit users' feedback for selecting the most promising ideas and large language models for a very complex creative task - the recombination and variation of ideas. In our framework, the process starts with a brief and a set of candidate designs, either generated using a language model or proposed by the users. Next, users collaborate on the design process by providing feedback to an interactive genetic algorithm that selects, recombines, and mutates the most promising designs. 
We evaluated our framework on three game design tasks with human designers who collaborated remotely.", "output": "ChatGPT and Other Large Language Models as Evolutionary Engines for Online Interactive Collaborative Game Design."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vibration signals have been increasingly utilized in various engineering fields for analysis and monitoring purposes, including structural health monitoring, fault diagnosis and damage detection, where vibration signals can provide valuable information about the condition and integrity of structures. In recent years, there has been a growing trend towards the use of vibration signals in the field of bioengineering. Activity-induced structural vibrations, particularly footstep-induced signals, are useful for analyzing the movement of biological systems such as the human body and animals, providing valuable information regarding an individual's gait, body mass, and posture, making them an attractive tool for health monitoring, security, and human-computer interaction. However, the presence of various types of noise can compromise the accuracy of footstep-induced signal analysis. In this paper, we propose a novel ensemble model that leverages both the ensemble of multiple signals and of recurrent and convolutional neural network predictions. The proposed model consists of three stages: preprocessing, hybrid modeling, and ensemble. In the preprocessing stage, features are extracted using the Fast Fourier Transform and wavelet transform to capture the underlying physics-governed dynamics of the system and extract spatial and temporal features. In the hybrid modeling stage, a bi-directional LSTM is used to denoise the noisy signal concatenated with FFT results, and a CNN is used to obtain a condensed feature representation of the signal. 
In the ensemble stage, three layers of a fully-connected neural network are used to produce the final denoised signal. The proposed model addresses the challenges associated with structural vibration signals, which outperforms the prevailing algorithms for a wide range of noise levels, evaluated using PSNR, SNR, and WMAPE.", "output": "Structural Vibration Signal Denoising Using Stacking Ensemble of Hybrid CNN-RNN."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present an artificial intelligence system to remotely assess the motor performance of individuals with Parkinson's disease (PD). Participants performed a motor task (i.e., tapping fingers) in front of a webcam, and data from 250 global participants were rated by three expert neurologists following the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS). The neurologists' ratings were highly reliable, with an intra-class correlation coefficient (ICC) of 0.88. We developed computer algorithms to obtain objective measurements that align with the MDS-UPDRS guideline and are strongly correlated with the neurologists' ratings. Our machine learning model trained on these measures outperformed an MDS-UPDRS certified rater, with a mean absolute error (MAE) of 0.59 compared to the rater's MAE of 0.79. However, the model performed slightly worse than the expert neurologists (0.53 MAE). 
The methodology can be replicated for similar motor tasks, providing the possibility of evaluating individuals with PD and other movement disorders remotely, objectively, and in areas with limited access to neurological care.", "output": "Using AI to Measure Parkinson's Disease Severity at Home."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Frank-Wolfe algorithm is a popular method in structurally constrained machine learning applications, due to its fast per-iteration complexity. However, one major limitation of the method is a slow rate of convergence that is difficult to accelerate due to erratic, zig-zagging step directions, even asymptotically close to the solution. We view this as an artifact of discretization; that is to say, the Frank-Wolfe \\emph{flow}, which is its trajectory at asymptotically small step sizes, does not zig-zag, and reducing discretization error will go hand-in-hand in producing a more stabilized method, with better convergence properties. We propose two improvements: a multistep Frank-Wolfe method that directly applies optimized higher-order discretization schemes; and an LMO-averaging scheme with reduced discretization error, and whose local convergence rate over general convex sets accelerates from a rate of $O(1/k)$ to up to $O(1/k^{3/2})$.", "output": "Reducing Discretization Error in the Frank-Wolfe Method."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL) where multiple agents interact in the same environment and whose goal is to maximize the individual returns. Challenges arise when scaling up the number of agents due to the resultant non-stationarity that the many agents introduce. 
In order to address this issue, Mean Field Games (MFG) rely on the symmetry and homogeneity assumptions to approximate games with very large populations. Recently, deep Reinforcement Learning has been used to scale MFG to games with larger number of states. Current methods rely on smoothing techniques such as averaging the q-values or the updates on the mean-field distribution. This work presents a different approach to stabilize the learning based on proximal updates on the mean-field policy. We name our algorithm Mean Field Proximal Policy Optimization (MF-PPO), and we empirically show the effectiveness of our method in the OpenSpiel framework.", "output": "Regularization of the policy updates for stabilizing Mean Field Games."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As influencers play considerable roles in social media marketing, companies increase the budget for influencer marketing. Hiring effective influencers is crucial in social influencer marketing, but it is challenging to find the right influencers among hundreds of millions of social media users. In this paper, we propose InfluencerRank that ranks influencers by their effectiveness based on their posting behaviors and social relations over time. To represent the posting behaviors and social relations, the graph convolutional neural networks are applied to model influencers with heterogeneous networks during different historical periods. By learning the network structure with the embedded node features, InfluencerRank can derive informative representations for influencers at each period. An attentive recurrent neural network finally distinguishes highly effective influencers from other influencers by capturing the knowledge of the dynamics of influencer representations over time. 
Extensive experiments have been conducted on an Instagram dataset that consists of 18,397 influencers with their 2,952,075 posts published within 12 months. The experimental results demonstrate that InfluencerRank outperforms existing baseline methods. An in-depth analysis further reveals that all of our proposed features and model components are beneficial to discover effective influencers.", "output": "InfluencerRank: Discovering Effective Influencers via Graph Convolutional Attentive Recurrent Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning image classification and image generation using the same set of network parameters is a challenging problem. Recent advanced approaches perform well in one task often exhibit poor performance in the other. This work introduces an energy-based classifier and generator, namely EGC, which can achieve superior performance in both tasks using a single neural network. Unlike a conventional classifier that outputs a label given an image (i.e., a conditional distribution $p(y|\\mathbf{x})$), the forward pass in EGC is a classifier that outputs a joint distribution $p(\\mathbf{x},y)$, enabling an image generator in its backward pass by marginalizing out the label $y$. This is done by estimating the energy and classification probability given a noisy image in the forward pass, while denoising it using the score function estimated in the backward pass. EGC achieves competitive generation results compared with state-of-the-art approaches on ImageNet-1k, CelebA-HQ and LSUN Church, while achieving superior classification accuracy and robustness against adversarial attacks on CIFAR-10. This work represents the first successful attempt to simultaneously excel in both tasks using a single set of network parameters. 
We believe that EGC bridges the gap between discriminative and generative learning.", "output": "EGC: Image Generation and Classification via a Diffusion Energy-Based Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Video captioning aims to convey dynamic scenes from videos using natural language, facilitating the understanding of spatiotemporal information within our environment. Although there have been recent advances, generating detailed and enriched video descriptions continues to be a substantial challenge. In this work, we introduce Video ChatCaptioner, an innovative approach for creating more comprehensive spatiotemporal video descriptions. Our method employs a ChatGPT model as a controller, specifically designed to select frames for posing video content-driven questions. Subsequently, a robust algorithm is utilized to answer these visual queries. This question-answer framework effectively uncovers intricate video details and shows promise as a method for enhancing video content. Following multiple conversational rounds, ChatGPT can summarize enriched video content based on previous conversations. We qualitatively demonstrate that our Video ChatCaptioner can generate captions containing more visual details about the videos. The code is publicly available at ", "output": "Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human intelligence has the remarkable ability to assemble basic skills into complex ones so as to solve complex tasks. 
This ability is equally important for Artificial Intelligence (AI), and thus, we assert that in addition to the development of large, comprehensive intelligent models, it is equally crucial to equip such models with the capability to harness various domain-specific expert models for complex task-solving in the pursuit of Artificial General Intelligence (AGI). Recent developments in Large Language Models (LLMs) have demonstrated remarkable learning and reasoning abilities, making them promising as a controller to select, synthesize, and execute external models to solve complex tasks. In this project, we develop OpenAGI, an open-source AGI research platform, specifically designed to offer complex, multi-step tasks and accompanied by task-specific datasets, evaluation metrics, and a diverse range of extensible models. OpenAGI formulates complex tasks as natural language queries, serving as input to the LLM. The LLM subsequently selects, synthesizes, and executes models provided by OpenAGI to address the task. Furthermore, we propose a Reinforcement Learning from Task Feedback (RLTF) mechanism, which uses the task-solving result as feedback to improve the LLM's task-solving ability. Thus, the LLM is responsible for synthesizing various external models for solving complex tasks, while RLTF provides feedback to improve its task-solving ability, enabling a feedback loop for self-improving AI. We believe that the paradigm of LLMs operating various expert models for complex task-solving is a promising approach towards AGI. 
To facilitate the community's long-term improvement and evaluation of AGI's ability, we open-source the code, benchmark, and evaluation methods of the OpenAGI project at ", "output": "OpenAGI: When LLM Meets Domain Experts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FL) is a new distributed learning paradigm, with privacy, utility, and efficiency as its primary pillars. Existing research indicates that it is unlikely to simultaneously attain infinitesimal privacy leakage, utility loss, and efficiency. Therefore, how to find an optimal trade-off solution is the key consideration when designing the FL algorithm. One common way is to cast the trade-off problem as a multi-objective optimization problem, i.e., the goal is to minimize the utility loss and efficiency reduction while constraining the privacy leakage not exceeding a predefined value. However, existing multi-objective optimization frameworks are very time-consuming, and do not guarantee the existence of the Pareto frontier, this motivates us to seek a solution to transform the multi-objective problem into a single-objective problem because it is more efficient and easier to be solved. To this end, we propose FedPAC, a unified framework that leverages PAC learning to quantify multiple objectives in terms of sample complexity, such quantification allows us to constrain the solution space of multiple objectives to a shared dimension, so that it can be solved with the help of a single-objective optimization algorithm. 
Specifically, we provide the results and detailed analyses of how to quantify the utility loss, privacy leakage, privacy-utility-efficiency trade-off, as well as the cost of the attacker from the PAC learning perspective.", "output": "Probably Approximately Correct Federated Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Appropriately regulating artificial intelligence is an increasingly urgent policy challenge. Legislatures and regulators lack the specialized knowledge required to best translate public demands into legal requirements. Overreliance on industry self-regulation fails to hold producers and users of AI systems accountable to democratic demands. Regulatory markets, in which governments require the targets of regulation to purchase regulatory services from a private regulator, are proposed. This approach to AI regulation could overcome the limitations of both command-and-control regulation and self-regulation. Regulatory market could enable governments to establish policy priorities for the regulation of AI, whilst relying on market forces and industry R&D efforts to pioneer the methods of regulation that best achieve policymakers' stated objectives.", "output": "Regulatory Markets: The Future of AI Governance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) have made significant progress in various domains, including healthcare. However, the specialized nature of clinical language understanding tasks presents unique challenges and limitations that warrant further investigation. In this study, we conduct a comprehensive evaluation of state-of-the-art LLMs, namely GPT-3.5, GPT-4, and Bard, within the realm of clinical language understanding tasks. 
These tasks span a diverse range, including named entity recognition, relation extraction, natural language inference, semantic textual similarity, document classification, and question-answering. We also introduce a novel prompting strategy, self-questioning prompting (SQP), tailored to enhance LLMs' performance by eliciting informative questions and answers pertinent to the clinical scenarios at hand. Our evaluation underscores the significance of task-specific learning strategies and prompting techniques for improving LLMs' effectiveness in healthcare-related tasks. Additionally, our in-depth error analysis on the challenging relation extraction task offers valuable insights into error distribution and potential avenues for improvement using SQP. Our study sheds light on the practical implications of employing LLMs in the specialized domain of healthcare, serving as a foundation for future research and the development of potential applications in healthcare settings.", "output": "Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Time series classification (TSC) is a challenging task due to the diversity of types of feature that may be relevant for different classification tasks, including trends, variance, frequency, magnitude, and various patterns. To address this challenge, several alternative classes of approach have been developed, including similarity-based, features and intervals, shapelets, dictionary, kernel, neural network, and hybrid approaches. While kernel, neural network, and hybrid approaches perform well overall, some specialized approaches are better suited for specific tasks. 
In this paper, we propose a new similarity-based classifier, Proximity Forest version 2.0 (PF 2.0), which outperforms previous state-of-the-art similarity-based classifiers across the UCR benchmark and outperforms state-of-the-art kernel, neural network, and hybrid methods on specific datasets in the benchmark that are best addressed by similarity-based methods. PF 2.0 incorporates three recent advances in time series similarity measures -- (1) computationally efficient early abandoning and pruning to speedup elastic similarity computations; (2) a new elastic similarity measure, Amerced Dynamic Time Warping (ADTW); and (3) cost function tuning. It rationalizes the set of similarity measures employed, reducing the eight base measures of the original PF to three and using the first derivative transform with all similarity measures, rather than a limited subset. We have implemented both PF 1.0 and PF 2.0 in a single C++ framework, making the PF framework more efficient.", "output": "Proximity Forest 2.0: A new effective and scalable similarity-based classifier for time series."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Homographs, words with the same spelling but different meanings, remain challenging in Neural Machine Translation (NMT). While recent works leverage various word embedding approaches to differentiate word sense in NMT, they do not focus on the pivotal components in resolving ambiguities of homographs in NMT: the hidden states of an encoder. In this paper, we propose a novel approach to tackle homographic issues of NMT in the latent space. We first train an encoder (aka \"HDR-encoder\") to learn universal sentence representations in a natural language inference (NLI) task. We further fine-tune the encoder using homograph-based synset sentences from WordNet, enabling it to learn word-level homographic disambiguation representations (HDR). 
The pre-trained HDR-encoder is subsequently integrated with a transformer-based NMT in various schemes to improve translation accuracy. Experiments on four translation directions demonstrate the effectiveness of the proposed method in enhancing the performance of NMT systems in the BLEU scores (up to +2.3 compared to a solid baseline). The effects can be verified by other metrics (F1, precision, and recall) of translation accuracy in an additional disambiguation task. Visualization methods like heatmaps, T-SNE and translation examples are also utilized to demonstrate the effects of the proposed method.", "output": "Learning Homographic Disambiguation Representation for Neural Machine Translation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The development of approaches for trajectory prediction requires metrics to validate and compare their performance. Currently established metrics are based on Euclidean distance, which means that errors are weighted equally in all directions. Euclidean metrics are insufficient for structured environments like roads, since they do not properly capture the agent's intent relative to the underlying lane. In order to provide a reasonable assessment of trajectory prediction approaches with regard to the downstream planning task, we propose a new metric that is lane distance-based: Lane Miss Rate (LMR). For the calculation of LMR, the ground-truth and predicted endpoints are assigned to lane segments, more precisely their centerlines. Measured by the distance along the lane segments, predictions that are within a certain threshold distance to the ground-truth count as hits, otherwise they count as misses. LMR is then defined as the ratio of sequences that yield a miss. Our results on three state-of-the-art trajectory prediction models show that LMR preserves the order of Euclidean distance-based metrics. 
In contrast to the Euclidean Miss Rate, qualitative results show that LMR yields misses for sequences where predictions are located on wrong lanes. Hits on the other hand result for sequences where predictions are located on the correct lane. This means that LMR implicitly weights Euclidean error relative to the lane and goes into the direction of capturing intents of traffic agents. The source code of LMR for Argoverse 2 is publicly available.", "output": "LMR: Lane Distance-Based Metric for Trajectory Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advancements in areas such as natural language processing and computer vision rely on intricate and massive models that have been trained using vast amounts of unlabelled or partly labeled data and training or deploying these state-of-the-art methods to resource constraint environments has been a challenge. Galaxy morphologies are crucial to understanding the processes by which galaxies form and evolve. Efficient methods to classify galaxy morphologies are required to extract physical information from modern-day astronomy surveys. In this paper, we introduce methods to learn from less amounts of data. We propose using a hybrid transformer-convolutional architecture drawing much inspiration from the success of CoAtNet and MaxViT. Concretely, we use the transformer-convolutional hybrid with a new stack design for the network, a different way of creating a relative self-attention layer, and pair it with a careful selection of data augmentation and regularization techniques. Our approach sets a new state-of-the-art on predicting galaxy morphologies from images on the Galaxy10 DECals dataset, a science objective, which consists of 17736 labeled images achieving $94.86\\%$ top-$1$ accuracy, beating the current state-of-the-art for this task by $4.62\\%$. 
Furthermore, this approach also sets a new state-of-the-art on CIFAR-100 and Tiny ImageNet. We also find that models and training methods used for larger datasets would often not work very well in the low-data regime. Our code and models will be released at a later date before the conference.", "output": "Astroformer: More Data Might Not be All You Need for Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Text-to-Image (T2I) generation is enabling new applications that support creators, designers, and general end users of productivity software by generating illustrative content with high photorealism starting from a given descriptive text as a prompt. Such models are however trained on massive amounts of web data, which surfaces the peril of potential harmful biases that may leak in the generation process itself. In this paper, we take a multi-dimensional approach to studying and quantifying common social biases as reflected in the generated images, by focusing on how occupations, personality traits, and everyday situations are depicted across representations of (perceived) gender, age, race, and geographical location. Through an extensive set of both automated and human evaluation experiments we present findings for two popular T2I models: DALLE-v2 and Stable Diffusion. Our results reveal that there exist severe occupational biases of neutral prompts majorly excluding groups of people from results for both models. Such biases can get mitigated by increasing the amount of specification in the prompt itself, although the prompting mitigation will not address discrepancies in image quality or other usages of the model or its representations in other scenarios. Further, we observe personality traits being associated with only a limited set of people at the intersection of race, gender, and age. 
Finally, an analysis of geographical location representations on everyday situations (e.g., park, food, weddings) shows that for most situations, images generated through default location-neutral prompts are closer and more similar to images generated for locations of United States and Germany.", "output": "Social Biases through the Text-to-Image Generation Lens."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the continuous improvement of computing power and deep learning algorithms in recent years, the foundation model has grown in popularity. Because of its powerful capabilities and excellent performance, this technology is being adopted and applied by an increasing number of industries. In the intelligent transportation industry, artificial intelligence faces the following typical challenges: few shots, poor generalization, and a lack of multi-modal techniques. Foundation model technology can significantly alleviate the aforementioned issues. To address these, we designed the 1st Foundation Model Challenge, with the goal of increasing the popularity of foundation model technology in traffic scenarios and promoting the rapid development of the intelligent transportation industry. The challenge is divided into two tracks: all-in-one and cross-modal image retrieval. Furthermore, we provide a new baseline and benchmark for the two tracks, called Open-TransMind. According to our knowledge, Open-TransMind is the first open-source transportation foundation model with multi-task and multi-modal capabilities. Simultaneously, Open-TransMind can achieve state-of-the-art performance on detection, classification, and segmentation datasets of traffic scenarios. 
Our source code is available at ", "output": "Open-TransMind: A New Baseline and Benchmark for 1st Foundation Model Challenge of Intelligent Transportation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "3D object retrieval is an important yet challenging task, which has drawn more and more attention in recent years. While existing approaches have made strides in addressing this issue, they are often limited to restricted settings such as image and sketch queries, which are often unfriendly interactions for common users. In order to overcome these limitations, this paper presents a novel SHREC challenge track focusing on text-based fine-grained retrieval of 3D animal models. Unlike previous SHREC challenge tracks, the proposed task is considerably more challenging, requiring participants to develop innovative approaches to tackle the problem of text-based retrieval. Despite the increased difficulty, we believe that this task has the potential to drive useful applications in practice and facilitate more intuitive interactions with 3D objects. Five groups participated in our competition, submitting a total of 114 runs. While the results obtained in our competition are satisfactory, we note that the challenges presented by this task are far from being fully solved. As such, we provide insights into potential areas for future research and improvements. 
We believe that we can help push the boundaries of 3D object retrieval and facilitate more user-friendly interactions via vision-language technologies.", "output": "TextANIMAR: Text-based 3D Animal Fine-Grained Retrieval."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Ultra-low-resolution Infrared (IR) array sensors offer a low-cost, energy-efficient, and privacy-preserving solution for people counting, with applications such as occupancy monitoring. Previous work has shown that Deep Learning (DL) can yield superior performance on this task. However, the literature was missing an extensive comparative analysis of various efficient DL architectures for IR array-based people counting, that considers not only their accuracy, but also the cost of deploying them on memory- and energy-constrained Internet of Things (IoT) edge nodes. In this work, we address this need by comparing 6 different DL architectures on a novel dataset composed of IR images collected from a commercial 8x8 array, which we made openly available. With a wide architectural exploration of each model type, we obtain a rich set of Pareto-optimal solutions, spanning cross-validated balanced accuracy scores in the 55.70-82.70% range. When deployed on a commercial Microcontroller (MCU) by STMicroelectronics, the STM32L4A6ZG, these models occupy 0.41-9.28kB of memory, and require 1.10-7.74ms per inference, while consuming 17.18-120.43 $\\mu$J of energy. Our models are significantly more accurate than a previous deterministic method (up to +39.9%), while being up to 3.53x faster and more energy efficient. Further, our models' accuracy is comparable to state-of-the-art DL solutions on similar resolution sensors, despite a much lower complexity. 
All our models enable continuous, real-time inference on a MCU-based IoT node, with years of autonomous operation without battery recharging.", "output": "Efficient Deep Learning Models for Privacy-preserving People Counting on Low-resolution Infrared Arrays."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training models to apply linguistic knowledge and visual concepts from 2D images to 3D world understanding is a promising direction that researchers have only recently started to explore. In this work, we design a novel 3D pre-training Vision-Language method that helps a model learn semantically meaningful and transferable 3D scene point cloud representations. We inject the representational power of the popular CLIP model into our 3D encoder by aligning the encoded 3D scene features with the corresponding 2D image and text embeddings produced by CLIP. To assess our model's 3D world reasoning capability, we evaluate it on the downstream task of 3D Visual Question Answering. Experimental quantitative and qualitative results show that our pre-training method outperforms state-of-the-art works in this task and leads to an interpretable representation of 3D scene features.", "output": "CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present Roto-Translation Equivariant Spherical Deconvolution (RT-ESD), an $E(3)\\times SO(3)$ equivariant framework for sparse deconvolution of volumes where each voxel contains a spherical signal. Such 6D data naturally arises in diffusion MRI (dMRI), a medical imaging modality widely used to measure microstructure and structural connectivity. 
As each dMRI voxel is typically a mixture of various overlapping structures, there is a need for blind deconvolution to recover crossing anatomical structures such as white matter tracts. Existing dMRI work takes either an iterative or deep learning approach to sparse spherical deconvolution, yet it typically does not account for relationships between neighboring measurements. This work constructs equivariant deep learning layers which respect to symmetries of spatial rotations, reflections, and translations, alongside the symmetries of voxelwise spherical rotations. As a result, RT-ESD improves on previous work across several tasks including fiber recovery on the DiSCo dataset, deconvolution-derived partial volume estimation on real-world in vivo human brain dMRI, and improved downstream reconstruction of fiber tractograms on the Tractometer dataset. Our implementation is available at", "output": "$E(3) \\times SO(3)$-Equivariant Networks for Spherical Deconvolution in Diffusion MRI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In biomedical research and artificial intelligence, access to large, well-balanced, and representative datasets is crucial for developing trustworthy applications that can be used in real-world scenarios. However, obtaining such datasets can be challenging, as they are often restricted to hospitals and specialized facilities. To address this issue, the study proposes to generate highly realistic synthetic faces exhibiting drug abuse traits through augmentation. The proposed method, called \"3DG-GA\", Deep De-identified anonymous Dataset Generation, uses Genetics Algorithm as a strategy for synthetic faces generation. The algorithm includes GAN artificial face generation, forgery detection, and face recognition. Initially, a dataset of 120 images of actual facial drug abuse is used. 
By preserving the drug traits, the 3DG-GA provides a dataset containing 3000 synthetic facial drug abuse images. The dataset will be open to the scientific community, which can reproduce our results and benefit from the generated datasets while avoiding legal or ethical restrictions.", "output": "Generation of artificial facial drug abuse images using Deep De-identified anonymous Dataset augmentation through Genetics Algorithm (3DG-GA)."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative models such as StyleGAN2 and Stable Diffusion have achieved state-of-the-art performance in computer vision tasks such as image synthesis, inpainting, and de-noising. However, current generative models for face inpainting often fail to preserve fine facial details and the identity of the person, despite creating aesthetically convincing image structures and textures. In this work, we propose Person Aware Tuning (PAT) of Mask-Aware Transformer (MAT) for face inpainting, which addresses this issue. Our proposed method, PATMAT, effectively preserves identity by incorporating reference images of a subject and fine-tuning a MAT architecture trained on faces. By using ~40 reference images, PATMAT creates anchor points in MAT's style module, and tunes the model using the fixed anchors to adapt the model to a new face identity. Moreover, PATMAT's use of multiple images per anchor during training allows the model to use fewer reference images than competing methods. We demonstrate that PATMAT outperforms state-of-the-art models in terms of image quality, the preservation of person-specific details, and the identity of the subject. 
Our results suggest that PATMAT can be a promising approach for improving the quality of personalized face inpainting.", "output": "PATMAT: Person Aware Tuning of Mask-Aware Transformer for Face Inpainting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, the joint detection-and-tracking paradigm has been a very popular way of tackling the multi-object tracking (MOT) task. Many of the methods following this paradigm use the object center keypoint for detection. However, we argue that the center point is not optimal since it is often not visible in crowded scenarios, which results in many missed detections when the objects are partially occluded. We propose TopTrack, a joint detection-and-tracking method that uses the top of the object as a keypoint for detection instead of the center because it is more often visible. Furthermore, TopTrack processes consecutive frames in separate streams in order to facilitate training. We performed experiments to show that using the object top as a keypoint for detection can reduce the amount of missed detections, which in turn leads to more complete trajectories and less lost trajectories. TopTrack manages to achieve competitive results with other state-of-the-art trackers on two MOT benchmarks.", "output": "TopTrack: Tracking Objects By Their Top."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The short-form videos have explosive popularity and have dominated the new social media trends. Prevailing short-video platforms, e.g., Kuaishou (Kwai), TikTok, Instagram Reels, and YouTube Shorts, have changed the way we consume and create content. 
For video content creation and understanding, the shot boundary detection (SBD) is one of the most essential components in various scenarios. In this work, we release a new public Short video sHot bOundary deTection dataset, named SHOT, consisting of 853 complete short videos and 11,606 shot annotations, with 2,716 high quality shot boundary annotations in 200 test videos. Leveraging this new data wealth, we propose to optimize the model design for video SBD, by conducting neural architecture search in a search space encapsulating various advanced 3D ConvNets and Transformers. Our proposed approach, named AutoShot, achieves higher F1 scores than previous state-of-the-art approaches, e.g., outperforming TransNetV2 by 4.2%, when being derived and evaluated on our newly constructed SHOT dataset. Moreover, to validate the generalizability of the AutoShot architecture, we directly evaluate it on another three public datasets: ClipShots, BBC and RAI, and the F1 scores of AutoShot outperform previous state-of-the-art approaches by 1.1%, 0.9% and 1.2%, respectively. The SHOT dataset and code can be found in .", "output": "AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the significant progress in face recognition in the past years, they are often treated as \"black boxes\" and have been criticized for lacking explainability. It becomes increasingly important to understand the characteristics and decisions of deep face recognition systems to make them more acceptable to the public. Explainable face recognition (XFR) refers to the problem of interpreting why the recognition model matches a probe face with one identity over others. Recent studies have explored use of visual saliency maps as an explanation, but they often lack a deeper analysis in the context of face recognition. 
This paper starts by proposing a rigorous definition of explainable face recognition (XFR) which focuses on the decision-making process of the deep recognition model. Following the new definition, a similarity-based RISE algorithm (S-RISE) is then introduced to produce high-quality visual saliency maps. Furthermore, an evaluation approach is proposed to systematically validate the reliability and accuracy of general visual saliency-based XFR methods.", "output": "Explanation of Face Recognition via Saliency Maps."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "An ego vehicle following a virtual lead vehicle planned route is an essential component when autonomous and non-autonomous vehicles interact. Yet, there is a question about the driver's ability to follow the planned lead vehicle route. Thus, predicting the trajectory of the ego vehicle route given a lead vehicle route is of interest. We introduce a new dataset, the FollowMe dataset, which offers a motion and behavior prediction problem by answering the latter question of the driver's ability to follow a lead vehicle. We also introduce a deep spatio-temporal graph model FollowMe-STGCNN as a baseline for the dataset. In our experiments and analysis, we show the design benefits of FollowMe-STGCNN in capturing the interactions that lie within the dataset. 
We contrast the performance of FollowMe-STGCNN with prior motion prediction models showing the need to have a different design mechanism to address the lead vehicle following settings.", "output": "FollowMe: Vehicle Behaviour Prediction in Autonomous Vehicle Settings."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Detecting digital face manipulation in images and video has attracted extensive attention due to the potential risk to public trust. To counteract the malicious usage of such techniques, deep learning-based deepfake detection methods have been employed and have exhibited remarkable performance. However, the performance of such detectors is often assessed on related benchmarks that hardly reflect real-world situations. For example, the impact of various image and video processing operations and typical workflow distortions on detection accuracy has not been systematically measured. In this paper, a more reliable assessment framework is proposed to evaluate the performance of learning-based deepfake detectors in more realistic settings. To the best of our acknowledgment, it is the first systematic assessment approach for deepfake detectors that not only reports the general performance under real-world conditions but also quantitatively measures their robustness toward different processing operations. To demonstrate the effectiveness and usage of the framework, extensive experiments and detailed analysis of three popular deepfake detection methods are further presented in this paper. 
In addition, a stochastic degradation-based data augmentation method driven by realistic processing operations is designed, which significantly improves the robustness of deepfake detectors.", "output": "Assessment Framework for Deepfake Detection in Real-world Situations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Concept bottleneck models (CBM) are a popular way of creating more interpretable neural networks by having hidden layer neurons correspond to human-understandable concepts. However, existing CBMs and their variants have two crucial limitations: first, they need to collect labeled data for each of the predefined concepts, which is time consuming and labor intensive; second, the accuracy of a CBM is often significantly lower than that of a standard neural network, especially on more complex datasets. This poor performance creates a barrier for adopting CBMs in practical real world applications. Motivated by these challenges, we propose Label-free CBM which is a novel framework to transform any neural network into an interpretable CBM without labeled concept data, while retaining a high accuracy. Our Label-free CBM has many advantages, it is: scalable - we present the first CBM scaled to ImageNet, efficient - creating a CBM takes only a few hours even for very large datasets, and automated - training it for a new dataset requires minimal human effort. Our code is available at ", "output": "Label-Free Concept Bottleneck Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While deep learning models have become the predominant method for medical image segmentation, they are typically not capable of generalizing to unseen segmentation tasks involving new anatomies, image modalities, or labels. 
Given a new segmentation task, researchers generally have to train or fine-tune models, which is time-consuming and poses a substantial barrier for clinical researchers, who often lack the resources and expertise to train neural networks. We present UniverSeg, a method for solving unseen medical segmentation tasks without additional training. Given a query image and example set of image-label pairs that define a new segmentation task, UniverSeg employs a new Cross-Block mechanism to produce accurate segmentation maps without the need for additional training. To achieve generalization to new tasks, we have gathered and standardized a collection of 53 open-access medical segmentation datasets with over 22,000 scans, which we refer to as MegaMedical. We used this collection to train UniverSeg on a diverse set of anatomies and imaging modalities. We demonstrate that UniverSeg substantially outperforms several related methods on unseen tasks, and thoroughly analyze and draw insights about important aspects of the proposed system. The UniverSeg source code and model weights are freely available at ", "output": "UniverSeg: Universal Medical Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As deep learning models increasingly find applications in critical domains such as medical imaging, the need for transparent and trustworthy decision-making becomes paramount. Many explainability methods provide insights into how these models make predictions by attributing importance to input features. As Vision Transformer (ViT) becomes a promising alternative to convolutional neural networks for image classification, its interpretability remains an open research question. 
This paper investigates the performance of various interpretation methods on a ViT applied to classify chest X-ray images. We introduce the notion of evaluating faithfulness, sensitivity, and complexity of ViT explanations. The obtained results indicate that Layerwise relevance propagation for transformers outperforms Local interpretable model-agnostic explanations and Attention visualization, providing a more accurate and reliable representation of what a ViT has actually learned. Our findings provide insights into the applicability of ViT explanations in medical imaging and highlight the importance of using appropriate evaluation criteria for comparing them.", "output": "Towards Evaluating Explanations of Vision Transformers for Medical Imaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Denoising diffusion probabilistic models (DDPMs) employ a sequence of white Gaussian noise samples to generate an image. In analogy with GANs, those noise maps could be considered as the latent code associated with the generated image. However, this native noise space does not possess a convenient structure, and is thus challenging to work with in editing tasks. Here, we propose an alternative latent noise space for DDPM that enables a wide range of editing operations via simple means, and present an inversion method for extracting these edit-friendly noise maps for any given image (real or synthetically generated). As opposed to the native DDPM noise space, the edit-friendly noise maps do not have a standard normal distribution and are not statistically independent across timesteps. However, they allow perfect reconstruction of any desired image, and simple transformations on them translate into meaningful manipulations of the output image (e.g., shifting, color edits). 
Moreover, in text-conditional models, fixing those noise maps while changing the text prompt, modifies semantics while retaining structure. We illustrate how this property enables text-based editing of real images via the diverse DDPM sampling scheme (in contrast to the popular non-diverse DDIM inversion). We also show how it can be used within existing diffusion-based editing methods to improve their quality and diversity.", "output": "An Edit Friendly DDPM Noise Space: Inversion and Manipulations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As technology progresses, smart automated systems will serve an increasingly important role in the agricultural industry. Current existing vision systems for yield estimation face difficulties in occlusion and scalability as they utilize a camera system that is large and expensive, which are unsuitable for orchard environments. To overcome these problems, this paper presents a size measurement method combining a machine learning model and depth images captured from three low cost RGBD cameras to detect and measure the height and width of tomatoes. The performance of the presented system is evaluated on a lab environment with real tomato fruits and fake leaves to simulate occlusion in the real farm environment. To improve accuracy by addressing fruit occlusion, our three-camera system was able to achieve a height measurement accuracy of 0.9114 and a width accuracy of 0.9443.", "output": "Visual based Tomato Size Measurement System for an Indoor Farming Environment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. 
However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth observations. Rather than treating each voxel equally, we optimize the process by dynamically modifying the grid and assigning more finer-scale voxels to regions with higher complexity, allowing us to capture more intricate details. Furthermore, we develop a scheme to quantify the dynamic subdivision of voxel grid during optimization without requiring any priors. The proposed approach is able to generate high-quality 3D reconstructions with fine details on both synthetic and real-world data, while maintaining computational efficiency, which is substantially faster than the baseline method NeuralRGBD.", "output": "Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Keypoint detection & descriptors are foundational technologies for computer vision tasks like image matching, 3D reconstruction and visual odometry. Hand-engineered methods like Harris corners, SIFT, and HOG descriptors have been used for decades; more recently, there has been a trend to introduce learning in an attempt to improve keypoint detectors. On inspection however, the results are difficult to interpret; recent learning-based methods employ a vast diversity of experimental setups and design choices: empirical results are often reported using different backbones, protocols, datasets, types of supervisions or tasks. Since these differences are often coupled together, it raises a natural question on what makes a good learned keypoint detector. 
In this work, we revisit the design of existing keypoint detectors by deconstructing their methodologies and identifying the key components. We re-design each component from first-principle and propose Simple Learned Keypoints (SiLK) that is fully-differentiable, lightweight, and flexible. Despite its simplicity, SiLK advances new state-of-the-art on Detection Repeatability and Homography Estimation tasks on HPatches and 3D Point-Cloud Registration task on ScanNet, and achieves competitive performance to state-of-the-art on camera pose estimation in 2022 Image Matching Challenge and ScanNet.", "output": "SiLK -- Simple Learned Keypoints."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current top-leading solutions for video object segmentation (VOS) typically follow a matching-based regime: for each query frame, the segmentation mask is inferred according to its correspondence to previously processed and the first annotated frames. They simply exploit the supervisory signals from the groundtruth masks for learning mask prediction only, without posing any constraint on the space-time correspondence matching, which, however, is the fundamental building block of such regime. To alleviate this crucial yet commonly ignored issue, we devise a correspondence-aware training framework, which boosts matching-based VOS solutions by explicitly encouraging robust correspondence matching during network learning. Through comprehensively exploring the intrinsic coherence in videos on pixel and object levels, our algorithm reinforces the standard, fully supervised training of mask segmentation with label-free, contrastive correspondence learning. 
Without neither requiring extra annotation cost during training, nor causing speed delay during deployment, nor incurring architectural modification, our algorithm provides solid performance gains on four widely used benchmarks, i.e., DAVIS2016&2017, and YouTube-VOS2018&2019, on the top of famous matching-based VOS solutions.", "output": "Boosting Video Object Segmentation via Space-time Correspondence Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose an embarrassingly simple yet highly effective zero-shot semantic segmentation (ZS3) method, based on the pre-trained vision-language model CLIP. First, our study provides a couple of key discoveries: (i) the global tokens (a.k.a [CLS] tokens in Transformer) of the text branch in CLIP provide a powerful representation of semantic information and (ii) these text-side [CLS] tokens can be regarded as category priors to guide CLIP visual encoder pay more attention on the corresponding region of interest. Based on that, we build upon the CLIP model as a backbone which we extend with a One-Way [CLS] token navigation from text to the visual branch that enables zero-shot dense prediction, dubbed ClsCLIP. Specifically, we use the [CLS] token output from the text branch, as an auxiliary semantic prompt, to replace the [CLS] token in shallow layers of the ViT-based visual encoder. This one-way navigation embeds such global category prior earlier and thus promotes semantic segmentation. Furthermore, to better segment tiny objects in ZS3, we further enhance ClsCLIP with a local zoom-in strategy, which employs a region proposal pre-processing and we get ClsCLIP+. 
Extensive experiments demonstrate that our proposed ZS3 method achieves a SOTA performance, and it is even comparable with those few-shot semantic segmentation methods.", "output": "[CLS] Token is All You Need for Zero-Shot Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose a novel two-component loss for biomedical image segmentation tasks called the Instance-wise and Center-of-Instance (ICI) loss, a loss function that addresses the instance imbalance problem commonly encountered when using pixel-wise loss functions such as the Dice loss. The Instance-wise component improves the detection of small instances or \"blobs\" in image datasets with both large and small instances. The Center-of-Instance component improves the overall detection accuracy. We compared the ICI loss with two existing losses, the Dice loss and the blob loss, in the task of stroke lesion segmentation using the ATLAS R2.0 challenge dataset from MICCAI 2022. Compared to the other losses, the ICI loss provided a better balanced segmentation, and significantly outperformed the Dice loss with an improvement of $1.7-3.7%$ and the blob loss by $0.6-5.0%$ in terms of the Dice similarity coefficient on both validation and test set, suggesting that the ICI loss is a potential solution to the instance imbalance problem.", "output": "Improving Segmentation of Objects with Varying Sizes in Biomedical Images using Instance-wise and Center-of-Instance Segmentation Loss Function."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Stereo image super-resolution aims to improve the quality of high-resolution stereo image pairs by exploiting complementary information across views. 
To attain superior performance, many methods have prioritized designing complex modules to fuse similar information across views, yet overlooking the importance of intra-view information for high-resolution reconstruction. It also leads to problems of wrong texture in recovered images. To address this issue, we explore the interdependencies between various hierarchies from intra-view and propose a novel method, named Cross-View-Hierarchy Network for Stereo Image Super-Resolution (CVHSSR). Specifically, we design a cross-hierarchy information mining block (CHIMB) that leverages channel attention and large kernel convolution attention to extract both global and local features from the intra-view, enabling the efficient restoration of accurate texture details. Additionally, a cross-view interaction module (CVIM) is proposed to fuse similar features from different views by utilizing cross-view attention mechanisms, effectively adapting to the binocular scene. Extensive experiments demonstrate the effectiveness of our method. CVHSSR achieves the best stereo image super-resolution performance than other state-of-the-art methods while using fewer parameters. The source code and pre-trained models are available at ", "output": "Cross-View Hierarchy Network for Stereo Image Super-Resolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural image compression methods have seen increasingly strong performance in recent years. However, they suffer orders of magnitude higher computational complexity compared to traditional codecs, which stands in the way of real-world deployment. This paper takes a step forward in closing this gap in decoding complexity by adopting shallow or even linear decoding transforms. 
To compensate for the resulting drop in compression performance, we exploit the often asymmetrical computation budget between encoding and decoding, by adopting more powerful encoder networks and iterative encoding. We theoretically formalize the intuition behind, and our experimental results establish a new frontier in the trade-off between rate-distortion and decoding complexity for neural image compression. Specifically, we achieve rate-distortion performance competitive with the established mean-scale hyperprior architecture of Minnen et al. (2018), while reducing the overall decoding complexity by 80 %, or over 90 % for the synthesis transform alone. Our code can be found at ", "output": "Asymmetrically-powered Neural Image Compression with Shallow Decoders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present ShapeClipper, a novel method that reconstructs 3D object shapes from real-world single-view RGB images. Instead of relying on laborious 3D, multi-view or camera pose annotation, ShapeClipper learns shape reconstruction from a set of single-view segmented images. The key idea is to facilitate shape learning via CLIP-based shape consistency, where we encourage objects with similar CLIP encodings to share similar shapes. We also leverage off-the-shelf normals as an additional geometric constraint so the model can learn better bottom-up reasoning of detailed surface geometry. These two novel consistency constraints, when used to regularize our model, improve its ability to learn both global shape structure and local geometric details. 
We evaluate our method over three challenging real-world datasets, Pix3D, Pascal3D+, and OpenImages, where we achieve superior performance over state-of-the-art methods.", "output": "ShapeClipper: Scalable 3D Shape Learning from Single-View Images via Geometric and CLIP-based Consistency."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Set-based face recognition (SFR) aims to recognize the face sets in the unconstrained scenario, where the appearance of same identity may change dramatically with extreme variances (e.g., illumination, pose, expression). We argue that the two crucial issues in SFR, the face quality and burstiness, are both identity-irrelevant and variance-relevant. The quality and burstiness assessment are interfered with by the entanglement of identity, and the face recognition is interfered with by the entanglement of variance. Thus we propose to separate the identity features with the variance features in a light-weighted set-based disentanglement framework. Beyond disentanglement, the variance features are fully utilized to indicate face quality and burstiness in a set, rather than being discarded after training. To suppress face burstiness in the sets, we propose a vocabulary-based burst suppression (VBS) method which quantizes faces with a reference vocabulary. With inter-word and intra-word normalization operations on the assignment scores, the face burstiness degrees are appropriately estimated. The extensive illustrations and experiments demonstrate the effect of the disentanglement framework with VBS, which gets new state-of-the-art on the SFR benchmarks. 
The code will be released at", "output": "Set-Based Face Recognition Beyond Disentanglement: Burstiness Suppression With Variance Vocabulary."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, Transformers have shown promising performance in various vision tasks. However, the high costs of global self-attention remain challenging for Transformers, especially for high-resolution vision tasks. Local self-attention runs attention computation within a limited region for the sake of efficiency, resulting in insufficient context modeling as their receptive fields are small. In this work, we introduce two new attention modules to enhance the global modeling capability of the hierarchical vision transformer, namely, random sampling windows (RS-Win) and important region windows (IR-Win). Specifically, RS-Win sample random image patches to compose the window, following a uniform distribution, i.e., the patches in RS-Win can come from any position in the image. IR-Win composes the window according to the weights of the image patches in the attention map. Notably, RS-Win is able to capture global information throughout the entire model, even in earlier, high-resolution stages. IR-Win enables the self-attention module to focus on important regions of the image and capture more informative features. 
Incorporated with these designs, RSIR-Win Transformer demonstrates competitive performance on common vision tasks.", "output": "RSIR Transformer: Hierarchical Vision Transformer using Random Sampling Windows and Important Region Windows."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Exemplar-based image colorization aims to colorize a target grayscale image based on a color reference image, and the key is to establish accurate pixel-level semantic correspondence between these two images. Previous methods search for correspondence across the entire reference image, and this type of global matching is prone to mismatches. We summarize the difficulties in two aspects: (1) When the reference image only contains a part of the objects related to the target image, improper correspondence will be established in unrelated regions. (2) Mismatches are likely in regions where the shape or texture of the object is easily confused. To overcome these issues, we propose SPColor, a semantic prior guided exemplar-based image colorization framework. Different from previous methods, SPColor first coarsely classifies pixels of the reference and target images into several pseudo-classes under the guidance of semantic prior, then the correspondences are only established locally between the pixels in the same class via the newly designed semantic prior guided correspondence network. In this way, improper correspondence between different semantic classes is explicitly excluded, and the mismatch is clearly alleviated. Besides, to better preserve the color from the reference, a similarity masked perceptual loss is designed. Note that the carefully designed SPColor utilizes the semantic prior provided by an unsupervised segmentation model, which requires no additional manual semantic annotations. 
Experiments demonstrate that our model outperforms recent state-of-the-art methods both quantitatively and qualitatively on a public dataset.", "output": "SPColor: Semantic Prior Guided Exemplar-based Image Colorization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent applications of deep convolutional neural networks in medical imaging raise concerns about their interpretability. While most explainable deep learning applications use post hoc methods (such as GradCAM) to generate feature attribution maps, there is a new type of case-based reasoning models, namely ProtoPNet and its variants, which identify prototypes during training and compare input image patches with those prototypes. We propose the first medical prototype network (MProtoNet) to extend ProtoPNet to brain tumor classification with 3D multi-parametric magnetic resonance imaging (mpMRI) data. To address different requirements between 2D natural images and 3D mpMRIs, especially in terms of localizing attention regions, a new attention module with soft masking and online-CAM loss is introduced. Soft masking helps sharpen attention maps, while online-CAM loss directly utilizes image-level labels when training the attention module. MProtoNet achieves statistically significant improvements in interpretability metrics of both correctness and localization coherence (with a best activation precision of $0.713 \\pm 0.058$) without human-annotated labels during training, when compared with GradCAM and several ProtoPNet variants. 
The source code is available at", "output": "MProtoNet: A Case-Based Interpretable Model for Brain Tumor Classification with 3D Multi-parametric Magnetic Resonance Imaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Manipulatives used in the right way help improve mathematical concepts leading to better learning outcomes. In this paper, we present a phygital (physical + digital) curriculum inspired teaching system for kids aged 5-8 to learn geometry using shape tile manipulatives. Combining smaller shapes to form larger ones is an important skill kids learn early on which requires shape tiles to be placed close to each other in the play area. This introduces a challenge of oriented object detection for densely packed objects with arbitrary orientations. Leveraging simulated data for neural network training and light-weight mobile architectures, we enable our system to understand user interactions and provide real-time audiovisual feedback. Experimental results show that our network runs real-time with high precision/recall on consumer devices, thereby providing a consistent and enjoyable learning experience.", "output": "Gamifying Math Education using Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transformer-based image denoising methods have achieved encouraging results in the past year. However, they must use linear operations to model long-range dependencies, which greatly increases model inference time and consumes GPU storage space. Compared with convolutional neural network-based methods, current Transformer-based image denoising methods cannot achieve a balance between performance improvement and resource consumption. 
In this paper, we propose an Efficient Wavelet Transformer (EWT) for image denoising. Specifically, we use Discrete Wavelet Transform (DWT) and Inverse Wavelet Transform (IWT) for downsampling and upsampling, respectively. This method can fully preserve the image features while reducing the image resolution, thereby greatly reducing the device resource consumption of the Transformer model. Furthermore, we propose a novel Dual-stream Feature Extraction Block (DFEB) to extract image features at different levels, which can further reduce model inference time and GPU memory usage. Experiments show that our method speeds up the original Transformer by more than 80%, reduces GPU memory usage by more than 60%, and achieves excellent denoising results. All code will be public.", "output": "EWT: Efficient Wavelet-Transformer for Single Image Denoising."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the success of multimodal learning in cross-modal retrieval tasks, the remarkable progress relies on the correct correspondence among multimedia data. However, collecting such ideal data is expensive and time-consuming. In practice, most widely used datasets are harvested from the Internet and inevitably contain mismatched pairs. Training on such noisy correspondence datasets causes performance degradation because the cross-modal retrieval methods can wrongly enforce the mismatched data to be similar. To tackle this problem, we propose a Meta Similarity Correction Network (MSCN) to provide reliable similarity scores. We view a binary classification task as the meta-process that encourages the MSCN to learn discrimination from positive and negative meta-data. To further alleviate the influence of noise, we design an effective data purification strategy using meta-data as prior knowledge to remove the noisy samples. 
Extensive experiments are conducted to demonstrate the strengths of our method in both synthetic and real-world noises, including Flickr30K, MS-COCO, and Conceptual Captions.", "output": "Noisy Correspondence Learning with Meta Similarity Correction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Improving performance in multiple domains is a challenging task, and often requires significant amounts of data to train and test models. Active learning techniques provide a promising solution by enabling models to select the most informative samples for labeling, thus reducing the amount of labeled data required to achieve high performance. In this paper, we present an active learning-based framework for improving performance across multiple domains. Our approach consists of two stages: first, we use an initial set of labeled data to train a base model, and then we iteratively select the most informative samples for labeling to refine the model. We evaluate our approach on several multi-domain datasets, including image classification, sentiment analysis, and object recognition. Our experiments demonstrate that our approach consistently outperforms baseline methods and achieves state-of-the-art performance on several datasets. 
We also show that our method is highly efficient, requiring significantly fewer labeled samples than other active learning-based methods. Overall, our approach provides a practical and effective solution for improving performance across multiple domains using active learning techniques.", "output": "Optimizing Multi-Domain Performance with Active Learning-based Improvement Strategies."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automated interpretation of electrocardiograms (ECG) has garnered significant attention with the advancements in machine learning methodologies. Despite the growing interest in automated ECG interpretation using machine learning, most current studies focus solely on classification or regression tasks and overlook a crucial aspect of clinical cardio-disease diagnosis: the diagnostic report generated by experienced human clinicians. In this paper, we introduce a novel approach to ECG interpretation, leveraging recent breakthroughs in Large Language Models (LLMs) and Vision-Transformer (ViT) models. Rather than treating ECG diagnosis as a classification or regression task, we propose an alternative method of automatically identifying the most similar clinical cases based on the input ECG data. Also, since interpreting ECG as images is more affordable and accessible, we process ECG as encoded images and adopt a vision-language learning paradigm to jointly learn vision-language alignment between encoded ECG images and ECG diagnosis reports. Encoding ECG into images can result in an efficient ECG retrieval system, which will be highly practical and useful in clinical applications. 
More importantly, our findings could serve as a crucial resource for providing diagnostic services in regions where only paper-printed ECG images are accessible due to past underdevelopment.", "output": "Converting ECG Signals to Images for Efficient Image-text Retrieval via Encoding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present NeRFVS, a novel neural radiance fields (NeRF) based method to enable free navigation in a room. NeRF achieves impressive performance in rendering images for novel views similar to the input views while suffering for novel views that are significantly different from the training views. To address this issue, we utilize the holistic priors, including pseudo depth maps and view coverage information, from neural reconstruction to guide the learning of implicit neural representations of 3D indoor scenes. Concretely, an off-the-shelf neural reconstruction method is leveraged to generate a geometry scaffold. Then, two loss functions based on the holistic priors are proposed to improve the learning of NeRF: 1) A robust depth loss that can tolerate the error of the pseudo depth map to guide the geometry learning of NeRF; 2) A variance loss to regularize the variance of implicit neural representations to reduce the geometry and color ambiguity in the learning procedure. These two loss functions are modulated during NeRF optimization according to the view coverage information to reduce the negative influence brought by the view coverage imbalance. 
Extensive results demonstrate that our NeRFVS outperforms state-of-the-art view synthesis methods quantitatively and qualitatively on indoor scenes, achieving high-fidelity free navigation results.", "output": "NeRFVS: Neural Radiance Fields for Free View Synthesis via Geometry Scaffolds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a novel Text-to-Image Generation Network, Adaptive Layout Refinement Generative Adversarial Network (ALR-GAN), to adaptively refine the layout of synthesized images without any auxiliary information. The ALR-GAN includes an Adaptive Layout Refinement (ALR) module and a Layout Visual Refinement (LVR) loss. The ALR module aligns the layout structure (which refers to locations of objects and background) of a synthesized image with that of its corresponding real image. In the ALR module, we propose an Adaptive Layout Refinement (ALR) loss to balance the matching of hard and easy features, for more efficient layout structure matching. Based on the refined layout structure, the LVR loss further refines the visual representation within the layout area. Experimental results on two widely-used datasets show that ALR-GAN performs competitively at the Text-to-Image generation task.", "output": "ALR-GAN: Adaptive Layout Refinement for Text-to-Image Synthesis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper proposes a novel module called middle spectrum grouped convolution (MSGC) for efficient deep convolutional neural networks (DCNNs) with the mechanism of grouped convolution. It explores the broad \"middle spectrum\" area between channel pruning and conventional grouped convolution. 
Compared with channel pruning, MSGC can retain most of the information from the input feature maps due to the group mechanism; compared with grouped convolution, MSGC benefits from the learnability, the core of channel pruning, for constructing its group topology, leading to better channel division. The middle spectrum area is unfolded along four dimensions: group-wise, layer-wise, sample-wise, and attention-wise, making it possible to reveal more powerful and interpretable structures. As a result, the proposed module acts as a booster that can reduce the computational cost of the host backbones for general image recognition with even improved predictive accuracy. For example, in the experiments on ImageNet dataset for image classification, MSGC can reduce the multiply-accumulates (MACs) of ResNet-18 and ResNet-50 by half but still increase the Top-1 accuracy by more than 1%. With 35% reduction of MACs, MSGC can also increase the Top-1 accuracy of the MobileNetV2 backbone. Results on MS COCO dataset for object detection show similar observations. Our code and trained models are available at ", "output": "Boosting Convolutional Neural Networks with Middle Spectrum Grouped Convolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large-scale pre-training has brought unimodal fields such as computer vision and natural language processing to a new era. Following this trend, the size of multi-modal learning models constantly increases, leading to an urgent need to reduce the massive computational cost of finetuning these models for downstream tasks. In this paper, we propose an efficient and flexible multimodal fusion method, namely PMF, tailored for fusing unimodally pre-trained transformers. Specifically, we first present a modular multimodal fusion framework that exhibits high flexibility and facilitates mutual interactions among different modalities. 
In addition, we disentangle vanilla prompts into three types in order to learn different optimizing objectives for multimodal learning. It is also worth noting that we propose to add prompt vectors only on the deep layers of the unimodal transformers, thus significantly reducing the training memory usage. Experiment results show that our proposed method achieves comparable performance to several other multimodal finetuning methods with less than 3% trainable parameters and up to 66% saving of training memory usage.", "output": "Efficient Multimodal Fusion via Interactive Prompting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Few-shot learning (FSL) via customization of a deep learning network with limited data has emerged as a promising technique to achieve personalized user experiences on edge devices. However, existing FSL methods primarily assume independent and identically distributed (IID) data and utilize either computational backpropagation updates for each task or a common model with task-specific prototypes. Unfortunately, the former solution is infeasible for edge devices that lack on-device backpropagation capabilities, while the latter often struggles with limited generalization ability, especially for out-of-distribution (OOD) data. This paper proposes a lightweight, plug-and-play FSL module called Task-aware Normalization (TANO) that enables efficient and task-aware adaptation of a deep neural network without backpropagation. TANO covers the properties of multiple user groups by coordinating the updates of several groups of the normalization statistics during meta-training and automatically identifies the appropriate normalization group for a downstream few-shot task. Consequently, TANO provides stable but task-specific estimations of the normalization statistics to close the distribution gaps and achieve efficient model adaptation. 
Results on both intra-domain and out-of-domain generalization experiments demonstrate that TANO outperforms recent methods in terms of accuracy, inference speed, and model size. Moreover, TANO achieves promising results on widely-used FSL benchmarks and data from real applications.", "output": "Out-of-distribution Few-shot Learning For Edge Devices without Model Fine-tuning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Autonomous vehicles rely on a variety of sensors to gather information about their surroundings. The vehicle's behavior is planned based on the environment perception, making its reliability crucial for safety reasons. The active LiDAR sensor is able to create an accurate 3D representation of a scene, making it a valuable addition for environment perception for autonomous vehicles. Due to light scattering and occlusion, the LiDAR's performance changes under adverse weather conditions like fog, snow or rain. This limitation recently fostered a large body of research on approaches to alleviate the decrease in perception performance. In this survey, we gathered, analyzed, and discussed different aspects of dealing with adverse weather conditions in LiDAR-based environment perception. We address topics such as the availability of appropriate data, raw point cloud processing and denoising, robust perception algorithms and sensor fusion to mitigate adverse weather induced shortcomings. 
We furthermore identify the most pressing gaps in the current literature and pinpoint promising research directions.", "output": "Survey on LiDAR Perception in Adverse Weather Conditions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Light fields are a type of image data that capture both spatial and angular scene information by recording light rays emitted by a scene from different orientations. In this context, spatial information is defined as features that remain static regardless of perspectives, while angular information refers to features that vary between viewpoints. We propose a novel neural network that, by design, can separate angular and spatial information of a light field. The network represents spatial information using spatial kernels shared among all Sub-Aperture Images (SAIs), and angular information using sets of angular kernels for each SAI. To further improve the representation capability of the network without increasing parameter number, we also introduce angular kernel allocation and kernel tensor decomposition mechanisms. Extensive experiments demonstrate the benefits of information separation: when applied to the compression task, our network outperforms other state-of-the-art methods by a large margin. And angular information can be easily transferred to other scenes for rendering dense views, showing the successful separation and the potential use case for the view synthesis task. 
We plan to release the code upon acceptance of the paper to encourage further research on this topic.", "output": "Learning-based Spatial and Angular Information Separation for Light Field Compression."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Monocular depth estimation is fundamental for 3D scene understanding and downstream applications. However, even under the supervised setup, it is still challenging and ill-posed due to the lack of full geometric constraints. Although a scene can consist of millions of pixels, there are fewer high-level patterns. We propose iDisc to learn those patterns with internal discretized representations. The method implicitly partitions the scene into a set of high-level patterns. In particular, our new module, Internal Discretization (ID), implements a continuous-discrete-continuous bottleneck to learn those concepts without supervision. In contrast to state-of-the-art methods, the proposed model does not enforce any explicit constraints or priors on the depth output. The whole network with the ID module can be trained end-to-end, thanks to the bottleneck module based on attention. Our method sets the new state of the art with significant improvements on NYU-Depth v2 and KITTI, outperforming all published methods on the official KITTI benchmark. iDisc can also achieve state-of-the-art results on surface normal estimation. Further, we explore the model generalization capability via zero-shot testing. We observe the compelling need to promote diversification in the outdoor scenario. Hence, we introduce splits of two autonomous driving datasets, DDAD and Argoverse. 
Code is available at http URL.", "output": "iDisc: Internal Discretization for Monocular Depth Estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The dominant majority of 3D models that appear in gaming, VR/AR, and those we use to train geometric deep learning algorithms are incomplete, since they are modeled as surface meshes and missing their interior structures. We present a learning framework to recover the shape interiors (RoSI) of existing 3D models with only their exteriors from multi-view and multi-articulation images. Given a set of RGB images that capture a target 3D object in different articulated poses, possibly from only few views, our method infers the interior planes that are observable in the input images. Our neural architecture is trained in a category-agnostic manner and it consists of a motion-aware multi-view analysis phase including pose, depth, and motion estimations, followed by interior plane detection in images and 3D space, and finally multi-view plane fusion. In addition, our method also predicts part articulations and is able to realize and even extrapolate the captured motions on the target 3D object. We evaluate our method by quantitative and qualitative comparisons to baselines and alternative solutions, as well as testing on untrained object categories and real image inputs to assess its generalization capabilities.", "output": "RoSI: Recovering 3D Shape Interiors from Few Articulation Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The structural re-parameterization (SRP) technique is a novel deep learning technique that achieves interconversion between different network architectures through equivalent parameter transformations. 
This technique enables the mitigation of the extra costs for performance improvement during training, such as parameter size and inference time, through these transformations during inference, and therefore SRP has great potential for industrial and practical applications. The existing SRP methods have successfully considered many commonly used architectures, such as normalizations, pooling methods, and multi-branch convolution. However, the widely used self-attention modules cannot be directly implemented by SRP because these modules usually act on the backbone network in a multiplicative manner and the modules' output is input-dependent during inference, which limits the application scenarios of SRP. In this paper, we conduct extensive experiments from a statistical perspective and discover an interesting phenomenon, Stripe Observation, which reveals that channel attention values quickly approach some constant vectors during training. This observation inspires us to propose a simple-yet-effective attention-alike structural re-parameterization (ASR) that allows us to achieve SRP for a given network while enjoying the effectiveness of the self-attention mechanism. Extensive experiments conducted on several standard benchmarks demonstrate the effectiveness of ASR in generally improving the performance of existing backbone networks, self-attention modules, and SRP methods without any elaborated model crafting. We also analyze the limitations and provide experimental or theoretical evidence for the strong robustness of the proposed ASR.", "output": "ASR: Attention-alike Structural Re-parameterization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transformer is beneficial for image denoising tasks since it can model long-range dependencies to overcome the limitations presented by inductive convolutional biases. 
However, directly applying the transformer structure to remove noise is challenging because its complexity grows quadratically with the spatial resolution. In this paper, we propose an efficient Dual-branch Deformable Transformer (DDT) denoising network which captures both local and global interactions in parallel. We divide features with a fixed patch size and a fixed number of patches in local and global branches, respectively. In addition, we apply deformable attention operation in both branches, which helps the network focus on more important regions and further reduces computational complexity. We conduct extensive experiments on real-world and synthetic denoising tasks, and the proposed DDT achieves state-of-the-art performance with significantly fewer computational costs.", "output": "DDT: Dual-branch Deformable Transformer for Image Denoising."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, event cameras have shown large applicability in several computer vision fields especially concerning tasks that require high temporal resolution. In this work, we investigate the usage of such kind of data for emotion recognition by presenting NEFER, a dataset for Neuromorphic Event-based Facial Expression Recognition. NEFER is composed of paired RGB and event videos representing human faces labeled with the respective emotions and also annotated with face bounding boxes and facial landmarks. We detail the data acquisition process as well as providing a baseline method for RGB and event data. The collected data captures subtle micro-expressions, which are hard to spot with RGB data, yet emerge in the event domain. 
We report a double recognition accuracy for the event-based approach, proving the effectiveness of a neuromorphic approach for analyzing fast and hardly detectable expressions and the emotions they conceal.", "output": "Neuromorphic Event-based Facial Expression Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose the gradient-weighted Object Detector Activation Maps (ODAM), a visualized explanation technique for interpreting the predictions of object detectors. Utilizing the gradients of detector targets flowing into the intermediate feature maps, ODAM produces heat maps that show the influence of regions on the detector's decision for each predicted attribute. Compared to previous work on classification activation maps (CAM), ODAM generates instance-specific explanations rather than class-specific ones. We show that ODAM is applicable to both one-stage detectors and two-stage detectors with different types of detector backbones and heads, and produces higher-quality visual explanations than the state-of-the-art both effectively and efficiently. We next propose a training scheme, Odam-Train, to improve the explanation ability on object discrimination of the detector through encouraging consistency between explanations for detections on the same object, and distinct explanations for detections on different objects. Based on the heat maps produced by ODAM with Odam-Train, we propose Odam-NMS, which considers the information of the model's explanation for each prediction to distinguish the duplicate detected objects. 
We present a detailed analysis of the visualized explanations of detectors and carry out extensive experiments to validate the effectiveness of the proposed ODAM.", "output": "ODAM: Gradient-based instance-specific visual explanations for object detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning the hash representation of multi-view heterogeneous data is an important task in multimedia retrieval. However, existing methods fail to effectively fuse the multi-view features and utilize the metric information provided by the dissimilar samples, leading to limited retrieval precision. Current methods utilize weighted sum or concatenation to fuse the multi-view features. We argue that these fusion methods cannot capture the interaction among different views. Furthermore, these methods ignore the information provided by the dissimilar samples. We propose a novel deep metric multi-view hashing (DMMVH) method to address the mentioned problems. Extensive empirical evidence is presented to show that gate-based fusion is better than typical methods. We introduce deep metric learning to the multi-view hashing problems, which can utilize metric information of dissimilar samples. On the MIR-Flickr25K, MS COCO, and NUS-WIDE, our method outperforms the current state-of-the-art methods by a large margin (up to 15.28 mean Average Precision (mAP) improvement).", "output": "Deep Metric Multi-View Hashing for Multimedia Retrieval."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Driver Monitoring Systems (DMSs) are crucial for safe hand-over actions in Level-2+ self-driving vehicles. 
State-of-the-art DMSs leverage multiple sensors mounted at different locations to monitor the driver and the vehicle's interior scene and employ decision-level fusion to integrate these heterogenous data. However, this fusion method may not fully utilize the complementarity of different data sources and may overlook their relative importance. To address these limitations, we propose a novel multiview multimodal driver monitoring system based on feature-level fusion through multi-head self-attention (MHSA). We demonstrate its effectiveness by comparing it against four alternative fusion strategies (Sum, Conv, SE, and AFF). We also present a novel GPU-friendly supervised contrastive learning framework SuMoCo to learn better representations. Furthermore, We fine-grained the test split of the DAD dataset to enable the multi-class recognition of drivers' activities. Experiments on this enhanced database demonstrate that 1) the proposed MHSA-based fusion method (AUC-ROC: 97.0%) outperforms all baselines and previous approaches, and 2) training MHSA with patch masking can improve its robustness against modality/view collapses. The code and annotations are publicly available.", "output": "Robust Multiview Multimodal Driver Monitoring System Using Masked Multi-Head Self-Attention."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The advances in automatic sign language translation (SLT) to spoken languages have been mostly benchmarked with datasets of limited size and restricted domains. Our work advances the state of the art by providing the first baseline results on How2Sign, a large and broad dataset. We train a Transformer over I3D video features, using the reduced BLEU as a reference metric for validation, instead of the widely used BLEU score.
We report a result of 8.03 on the BLEU score, and publish the first open-source implementation of its kind to promote further advances.", "output": "Sign Language Translation from Instructional Videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce Flatlandia, a novel problem for visual localization of an image from object detections composed of two specific tasks: i) Coarse Map Localization: localizing a single image observing a set of objects in respect to a 2D map of object landmarks; ii) Fine-grained 3DoF Localization: estimating latitude, longitude, and orientation of the image within a 2D map. Solutions for these new tasks exploit the wide availability of open urban maps annotated with GPS locations of common objects (eg via surveying or crowd-sourced). Such maps are also more storage-friendly than standard large-scale 3D models often used in visual localization while additionally being privacy-preserving. As existing datasets are unsuited for the proposed problem, we provide the Flatlandia dataset, designed for 3DoF visual localization in multiple urban settings and based on crowd-sourced data from five European cities. We use the Flatlandia dataset to validate the complexity of the proposed tasks.", "output": "You are here! Finding position and orientation on a 2D map from a single image: The Flatlandia localization problem and dataset."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It is well known that a band-limited signal can be reconstructed from its uniformly spaced samples if the sampling rate is sufficiently high.
More recently, it has been proved that one can reconstruct a 1D band-limited signal even if the exact sample locations are unknown, but given just the distribution of the sample locations and their ordering in 1D. In this work, we extend the analytical bounds on the reconstruction error in such scenarios for quasi-bandlimited signals. We also prove that the method for such a reconstruction is resilient to a certain proportion of errors in the specification of the sample location ordering. We then express the problem of tomographic reconstruction of 2D images from 1D Radon projections under unknown angles with known angle distribution, as a special case for reconstruction of quasi-bandlimited signals from samples at unknown locations with known distribution. Building upon our theoretical background, we present asymptotic bounds for 2D quasi-bandlimited image reconstruction from 1D Radon projections in the unknown angles setting, which commonly occurs in cryo-electron microscopy (cryo-EM). To the best of our knowledge, this is the first piece of work to perform such an analysis for 2D cryo-EM, even though the associated reconstruction algorithms have been known for a long time.", "output": "Analysis of Tomographic Reconstruction of 2D Images using the Distribution of Unknown Projection Angles."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Magnetic Resonance (MR) images suffer from various types of artifacts due to motion, spatial resolution, and under-sampling. Conventional deep learning methods deal with removing a specific type of artifact, leading to separately trained models for each artifact type that lack the shared knowledge generalizable across artifacts. Moreover, training a model for each type and amount of artifact is a tedious process that consumes more training time and storage of models.
On the other hand, the shared knowledge learned by jointly training the model on multiple artifacts might be inadequate to generalize under deviations in the types and amounts of artifacts. Model-agnostic meta-learning (MAML), a nested bi-level optimization framework is a promising technique to learn common knowledge across artifacts in the outer level of optimization, and artifact-specific restoration in the inner level. We propose curriculum-MAML (CMAML), a learning process that integrates MAML with curriculum learning to impart the knowledge of variable artifact complexity to adaptively learn restoration of multiple artifacts during training. Comparative studies against Stochastic Gradient Descent and MAML, using two cardiac datasets reveal that CMAML exhibits (i) better generalization with improved PSNR for 83% of unseen types and amounts of artifacts and improved SSIM in all cases, and (ii) better artifact suppression in 4 out of 5 cases of composite artifacts (scans with multiple artifacts).", "output": "Generalizable Deep Learning Method for Suppressing Unseen and Multiple MRI Artifacts Using Meta-learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper explores a hierarchical prompting mechanism for the hierarchical image classification (HIC) task. Different from prior HIC methods, our hierarchical prompting is the first to explicitly inject ancestor-class information as a tokenized hint that benefits the descendant-class discrimination. We think it well imitates human visual recognition, i.e., humans may use the ancestor class as a prompt to draw focus on the subtle differences among descendant classes. We model this prompting mechanism into a Transformer with Hierarchical Prompting (TransHP).
TransHP consists of three steps: 1) learning a set of prompt tokens to represent the coarse (ancestor) classes, 2) on-the-fly predicting the coarse class of the input image at an intermediate block, and 3) injecting the prompt token of the predicted coarse class into the intermediate feature. Though the parameters of TransHP maintain the same for all input images, the injected coarse-class prompt conditions (modifies) the subsequent feature extraction and encourages a dynamic focus on relatively subtle differences among the descendant classes. Extensive experiments show that TransHP improves image classification on accuracy (e.g., improving ViT-B/16 by +2.83% ImageNet classification accuracy), training data efficiency (e.g., +12.69% improvement under 10% ImageNet training data), and model explainability. Moreover, TransHP also performs favorably against prior HIC methods, showing that TransHP well exploits the hierarchical information.", "output": "TransHP: Image Classification with Hierarchical Prompting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The lack of interpretability of the Vision Transformer may hinder its use in critical real-world applications despite its effectiveness. To overcome this issue, we propose a post-hoc interpretability method called VISION DIFFMASK, which uses the activations of the model's hidden layers to predict the relevant parts of the input that contribute to its final predictions. Our approach uses a gating mechanism to identify the minimal subset of the original input that preserves the predicted distribution over classes. We demonstrate the faithfulness of our method, by introducing a faithfulness task, and comparing it to other state-of-the-art attribution methods on CIFAR-10 and ImageNet-1K, achieving compelling results.
To aid reproducibility and further extension of our work, we open source our implementation:", "output": "VISION DIFFMASK: Faithful Interpretation of Vision Transformers with Differentiable Patch Masking."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose an ultrafast automated model compression framework called SeerNet for flexible network deployment. Conventional non-differentiable methods discretely search the desirable compression policy based on the accuracy from exhaustively trained lightweight models, and existing differentiable methods optimize an extremely large supernet to obtain the required compressed model for deployment. They both cause heavy computational cost due to the complex compression policy search and evaluation process. On the contrary, we obtain the optimal efficient networks by directly optimizing the compression policy with an accurate performance predictor, where the ultrafast automated model compression for various computational cost constraint is achieved without complex compression policy search and evaluation. Specifically, we first train the performance predictor based on the accuracy from uncertain compression policies actively selected by efficient evolutionary search, so that informative supervision is provided to learn the accurate performance predictor with acceptable cost. Then we leverage the gradient that maximizes the predicted performance under the barrier complexity constraint for ultrafast acquisition of the desirable compression policy, where adaptive update stepsizes with momentum are employed to enhance optimality of the acquired pruning and quantization strategy.
Compared with the state-of-the-art automated model compression methods, experimental results on image classification and object detection show that our method achieves competitive accuracy-complexity trade-offs with significant reduction of the search cost.", "output": "Learning Accurate Performance Predictors for Ultrafast Automated Model Compression."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "More information leads to better decisions and predictions, right? Confirming this hypothesis, several studies concluded that the simultaneous use of optical and thermal images leads to better predictions in crowd counting. However, the way multimodal models extract enriched features from both modalities is not yet fully understood. Since the use of multimodal data usually increases the complexity, inference time, and memory requirements of the models, it is relevant to examine the differences and advantages of multimodal compared to monomodal models. In this work, all available multimodal datasets for crowd counting are used to investigate the differences between monomodal and multimodal models. To do so, we designed a monomodal architecture that considers the current state of research on monomodal crowd counting. In addition, several multimodal architectures have been developed using different multimodal learning strategies. The key components of the monomodal architecture are also used in the multimodal architectures to be able to answer whether multimodal models perform better in crowd counting in general. Surprisingly, no general answer to this question can be derived from the existing datasets. We found that the existing datasets hold a bias toward thermal images. This was determined by analyzing the relationship between the brightness of optical images and crowd count as well as examining the annotations made for each dataset.
Since answering this question is important for future real-world applications of crowd counting, this paper establishes criteria for a potential dataset suitable for answering whether multimodal models perform better in crowd counting in general.", "output": "Why Existing Multimodal Crowd Counting Datasets Can Lead to Unfulfilled Expectations in Real-World Applications."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose a novel fully unsupervised framework that learns action representations suitable for the action segmentation task from the single input video itself, without requiring any training data. Our method is a deep metric learning approach rooted in a shallow network with a triplet loss operating on similarity distributions and a novel triplet selection strategy that effectively models temporal and semantic priors to discover actions in the new representational space. Under these circumstances, we successfully recover temporal boundaries in the learned action representations with higher quality compared with existing unsupervised approaches. The proposed method is evaluated on two widely used benchmark datasets for the action segmentation task and it achieves competitive performance by applying a generic clustering algorithm on the learned representations.", "output": "Leveraging triplet loss for unsupervised action segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Detecting fake images is becoming a major goal of computer vision. This need is becoming more and more pressing with the continuous improvement of synthesis methods based on Generative Adversarial Networks (GAN), and even more with the appearance of powerful methods based on Diffusion Models (DM).
Towards this end, it is important to gain insight into which image features better discriminate fake images from real ones. In this paper we report on our systematic study of a large number of image generators of different families, aimed at discovering the most forensically relevant characteristics of real and generated images. Our experiments provide a number of interesting observations and shed light on some intriguing properties of synthetic images: (1) not only the GAN models but also the DM and VQ-GAN (Vector Quantized Generative Adversarial Networks) models give rise to visible artifacts in the Fourier domain and exhibit anomalous regular patterns in the autocorrelation; (2) when the dataset used to train the model lacks sufficient variety, its biases can be transferred to the generated images; (3) synthetic and real images exhibit significant differences in the mid-high frequency signal content, observable in their radial and angular spectral power distributions.", "output": "Intriguing properties of synthetic images: from generative adversarial networks to diffusion models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Predicting high-fidelity future human poses, from a historically observed sequence, is decisive for intelligent robots to interact with humans. Deep end-to-end learning approaches, which typically train a generic pre-trained model on external datasets and then directly apply it to all test samples, emerge as the dominant solution to solve this issue. Despite encouraging progress, they remain non-optimal, as the unique properties (e.g., motion style, rhythm) of a specific sequence cannot be adapted. More generally, at test-time, once encountering unseen motion categories (out-of-distribution), the predicted poses tend to be unreliable.
Motivated by this observation, we propose a novel test-time adaptation framework that leverages two self-supervised auxiliary tasks to help the primary forecasting network adapt to the test sequence. In the testing phase, our model can adjust the model parameters by several gradient updates to improve the generation quality. However, due to catastrophic forgetting, both auxiliary tasks typically tend to the low ability to automatically present the desired positive incentives for the final prediction performance. For this reason, we also propose a meta-auxiliary learning scheme for better adaptation. In terms of general setup, our approach obtains higher accuracy, and under two new experimental designs for out-of-distribution data (unseen subjects and categories), achieves significant improvements.", "output": "Meta-Auxiliary Learning for Adaptive Human Pose Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most model-free visual object tracking methods formulate the tracking task as object location estimation given by a 2D segmentation or a bounding box in each video frame. We argue that this representation is limited and instead propose to guide and improve 2D tracking with an explicit object representation, namely the textured 3D shape and 6DoF pose in each video frame. Our representation tackles a complex long-term dense correspondence problem between all 3D points on the object for all video frames, including frames where some points are invisible. To achieve that, the estimation is driven by re-rendering the input video frames as well as possible through differentiable rendering, which has not been used for tracking before. The proposed optimization minimizes a novel loss function to estimate the best 3D shape, texture, and 6DoF pose.
We improve the state-of-the-art in 2D segmentation tracking on three different datasets with mostly rigid objects.", "output": "Tracking by 3D Model Estimation of Unknown Objects in Videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Certified defense methods against adversarial perturbations have been recently investigated in the black-box setting with a zeroth-order (ZO) perspective. However, these methods suffer from high model variance with low performance on high-dimensional datasets due to the ineffective design of the denoiser and are limited in their utilization of ZO techniques. To this end, we propose a certified ZO preprocessing technique for removing adversarial perturbations from the attacked image in the black-box setting using only model queries. We propose a robust UNet denoiser (RDUNet) that ensures the robustness of black-box models trained on high-dimensional datasets. We propose a novel black-box denoised smoothing (DS) defense mechanism, ZO-RUDS, by prepending our RDUNet to the black-box model, ensuring black-box defense. We further propose ZO-AE-RUDS in which RDUNet followed by autoencoder (AE) is prepended to the black-box model. We perform extensive experiments on four classification datasets, CIFAR-10, CIFAR-10, Tiny Imagenet, STL-10, and the MNIST dataset for image reconstruction tasks.
Our proposed defense methods ZO-RUDS and ZO-AE-RUDS beat SOTA with a huge margin of 35% and 9%, for low dimensional (CIFAR-10) and with a margin of 20.61% and 23.51% for high-dimensional (STL-10) datasets, respectively.", "output": "Certified Zeroth-order Black-Box Defense with Robust UNet Denoiser."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a novel method for Zero-Shot Anomaly Localization that leverages a bidirectional mapping derived from the 1-dimensional Wasserstein Distance. The proposed approach allows pinpointing the anomalous regions in a texture with increased precision by aggregating the contribution of a pixel to the errors of all nearby patches. We validate our solution on several datasets and obtain more than a 40% reduction in error over the previous state of the art on the MVTec AD dataset in a zero-shot setting.", "output": "High-Fidelity Zero-Shot Texture Anomaly Localization Using Feature Correspondence Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Video quality assessment (VQA) aims to simulate the human perception of video quality, which is influenced by factors ranging from low-level color and texture details to high-level semantic content. To effectively model these complicated quality-related factors, in this paper, we decompose video into three levels (ie, patch level, frame level, and clip level), and propose a novel Zoom-VQA architecture to perceive spatio-temporal features at different levels.
It integrates three components: patch attention module, frame pyramid alignment, and clip ensemble strategy, respectively for capturing region-of-interest in the spatial dimension, multi-level information at different feature levels, and distortions distributed over the temporal dimension. Owing to the comprehensive design, Zoom-VQA obtains state-of-the-art results on four VQA benchmarks and achieves 2nd place in the NTIRE 2023 VQA challenge. Notably, Zoom-VQA has outperformed the previous best results on two subsets of LSVQ, achieving 0.8860 (+1.0%) and 0.7985 (+1.9%) of SRCC on the respective subsets. Adequate ablation studies further verify the effectiveness of each component. Codes and models are released in", "output": "Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vision transformers have been applied successfully for image recognition tasks. There have been either multi-headed self-attention based (ViTcite{dosovitskiy2020image}, DeIT, cite{touvron2021training}) similar to the original work in textual models or more recently based on spectral layers (Fnetcite{lee2021fnet}, GFNetcite{rao2021global}, AFNOcite{guibas2021efficient}). We hypothesize that both spectral and multi-headed attention plays a major role. We investigate this hypothesis through this work and observe that indeed combining spectral and multi-headed attention layers provides a better transformer architecture. We thus propose the novel Spectformer architecture for transformers that combines spectral and multi-headed attention layers. We believe that the resulting representation allows the transformer to capture the feature representation appropriately and it yields improved performance over other transformer representations.
For instance, it improves the top-1 accuracy by 2% on ImageNet compared to both GFNet-H and LiT. SpectFormer-S reaches 84.25% top-1 accuracy on ImageNet-1K (state of the art for small version). Further, Spectformer-L achieves 85.7% that is the state of the art for the comparable base version of the transformers. We further ensure that we obtain reasonable results in other scenarios such as transfer learning on standard datasets such as CIFAR-10, CIFAR-100, Oxford-IIIT-flower, and Standford Car datasets. We then investigate its use in downstream tasks such of object detection and instance segmentation on the MS-COCO dataset and observe that Spectformer shows consistent performance that is comparable to the best backbones and can be further optimized and improved. Hence, we believe that combined spectral and attention layers are what are needed for vision transformers.", "output": "SpectFormer: Frequency and Attention is what you need in a Vision Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Document-based Visual Question Answering examines the document understanding of document images in conditions of natural language questions. We proposed a new document-based VQA dataset, PDF-VQA, to comprehensively examine the document understanding from various aspects, including document element recognition, document layout structural understanding as well as contextual understanding and key information extraction. Our PDF-VQA dataset extends the current scale of document understanding that limits on the single document page to the new scale that asks questions over the full document of multiple pages. We also propose a new graph-based VQA model that explicitly integrates the spatial and hierarchically structural relationships between different document elements to boost the document structural understanding.
The performances are compared with several baselines over different question types and tasks. The full dataset will be released after paper acceptance.", "output": "PDF-VQA: A New Dataset for Real-World VQA on PDF Documents."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the development of high-definition display devices, the practical scenario of Super-Resolution (SR) usually needs to super-resolve large input like 2K to higher resolution (4K/8K). To reduce the computational and memory cost, current methods first split the large input into local patches and then merge the SR patches into the output. These methods adaptively allocate a subnet for each patch. Quantization is a very important technique for network acceleration and has been used to design the subnets. Current methods train an MLP bit selector to determine the propoer bit for each layer. However, they uniformly sample subnets for training, making simple subnets overfitted and complicated subnets underfitted. Therefore, the trained bit selector fails to determine the optimal bit. Apart from this, the introduced bit selector brings additional cost to each layer of the SR network. In this paper, we propose a novel method named Content-Aware Bit Mapping (CABM), which can remove the bit selector without any performance loss. CABM also learns a bit selector for each layer during training. After training, we analyze the relation between the edge information of an input patch and the bit of each layer. We observe that the edge information can be an effective metric for the selected bit. Therefore, we design a strategy to build an Edge-to-Bit lookup table that maps the edge score of a patch to the bit of each layer during inference. The bit configuration of SR network can be determined by the lookup tables of all layers.
Our strategy can find better bit configuration, resulting in more efficient mixed precision networks. We conduct detailed experiments to demonstrate the generalization ability of our method. The code will be released.", "output": "CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network with Large Input."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Self-supervised learning (SSL) has made remarkable progress in visual representation learning. Some studies combine SSL with knowledge distillation (SSL-KD) to boost the representation learning performance of small models. In this study, we propose a Multi-mode Online Knowledge Distillation method (MOKD) to boost self-supervised visual representation learning. Different from existing SSL-KD methods that transfer knowledge from a static pre-trained teacher to a student, in MOKD, two different models learn collaboratively in a self-supervised manner. Specifically, MOKD consists of two distillation modes: self-distillation and cross-distillation modes. Among them, self-distillation performs self-supervised learning for each model independently, while cross-distillation realizes knowledge interaction between different models. In cross-distillation, a cross-attention feature search strategy is proposed to enhance the semantic feature alignment between different models. As a result, the two models can absorb knowledge from each other to boost their representation learning performance. Extensive experimental results on different backbones and datasets demonstrate that two heterogeneous models can benefit from MOKD and outperform their independently trained baseline.
In addition, MOKD also outperforms existing SSL-KD methods for both the student and teacher models.", "output": "Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The ability of image and video generation models to create photorealistic images has reached unprecedented heights, making it difficult to distinguish between real and fake images in many cases. However, despite this progress, a gap remains between the quality of generated images and those found in the real world. To address this, we have reviewed a vast body of literature from both academic publications and social media to identify qualitative shortcomings in image generation models, which we have classified into five categories. By understanding these failures, we can identify areas where these models need improvement, as well as develop strategies for detecting deep fakes. The prevalence of deep fakes in today's society is a serious concern, and our findings can help mitigate their negative impact.", "output": "Qualitative Failures of Image Generation Models and Their Application in Detecting Deepfakes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "OpenAI has recently released GPT-4 (a.k.a. ChatGPT plus), which is demonstrated to be one small step for generative AI (GAI), but one giant leap for artificial general intelligence (AGI). Since its official release in November 2022, ChatGPT has quickly attracted numerous users with extensive media coverage. Such unprecedented attention has also motivated numerous researchers to investigate ChatGPT from various aspects.
According to Google scholar, there are more than 500 articles with ChatGPT in their titles or mentioning it in their abstracts. Considering this, a review is urgently needed, and our work fills this gap. Overall, this work is the first to survey ChatGPT with a comprehensive review of its underlying technology, applications, and challenges. Moreover, we present an outlook on how ChatGPT might evolve to realize general-purpose AIGC (a.k.a. AI-generated content), which will be a significant milestone for the development of AGI.", "output": "One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Super-Resolution (SR) has gained increasing research attention over the past few years. With the development of Deep Neural Networks (DNNs), many super-resolution methods based on DNNs have been proposed. Although most of these methods are aimed at ordinary frames, there are few works on super-resolution of omnidirectional frames. In these works, omnidirectional frames are projected from the 3D sphere to a 2D plane by Equi-Rectangular Projection (ERP). Although ERP has been widely used for projection, it has severe projection distortion near poles. Current DNN-based SR methods use 2D convolution modules, which is more suitable for the regular grid. In this paper, we find that different projection methods have great impact on the performance of DNNs. To study this problem, a comprehensive comparison of projections in omnidirectional super-resolution is conducted. We compare the SR results of different projection methods. Experimental results show that Equi-Angular cube map projection (EAC), which has minimal distortion, achieves the best result in terms of WS-PSNR compared with other projections.
Code and data will be released.", "output": "A Comprehensive Comparison of Projections in Omnidirectional Super-Resolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Convolutional neural networks learns spatial features and are heavily interlinked within kernels. The SE module have broken the traditional route of neural networks passing the entire result to next layer. Instead SE only passes important features to be learned with its squeeze and excitation (SE) module. We propose variations of the SE module which improvises the process of squeeze and excitation and enhances the performance. The proposed squeezing or exciting the layer makes it possible for having a smooth transition of layer weights. These proposed variations also retain the characteristics of SE module. The experimented results are carried out on residual networks and the results are tabulated.", "output": "Variations of Squeeze and Excitation networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Correct identification and categorization of plant diseases are crucial for ensuring the safety of the global food supply and the overall financial success of stakeholders. In this regard, a wide range of solutions has been made available by introducing deep learning-based classification systems for different staple crops. Despite being one of the most important commercial crops in many parts of the globe, research proposing a smart solution for automatically classifying apple leaf diseases remains relatively unexplored. This study presents a technique for identifying apple leaf diseases based on transfer learning. The system extracts features using a pretrained EfficientNetV2S architecture and passes to a classifier block for effective prediction. 
The class imbalance issues are tackled by utilizing runtime data augmentation. The effect of various hyperparameters, such as input resolution, learning rate, number of epochs, etc., has been investigated carefully. The competence of the proposed pipeline has been evaluated on the apple leaf disease subset from the publicly available `PlantVillage' dataset, where it achieved an accuracy of 99.21%, outperforming the existing works.", "output": "An Efficient Transfer Learning-based Approach for Apple Leaf Disease Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "3D scanning as a technique to digitize objects in reality and create their 3D models, is used in many fields and areas. Though the quality of 3D scans depends on the technical characteristics of the 3D scanner, the common drawback is the smoothing of fine details, or the edges of an object. We introduce SepicNet, a novel deep network for the detection and parametrization of sharp edges in 3D shapes as primitive curves. To make the network end-to-end trainable, we formulate the curve fitting in a differentiable manner. We develop an adaptive point cloud sampling technique that captures the sharp features better than uniform sampling. The experiments were conducted on a newly introduced large-scale dataset of 50k 3D scans, where the sharp edge annotations were extracted from their parametric CAD models, and demonstrate significant improvement over state-of-the-art methods.", "output": "SepicNet: Sharp Edges Recovery by Parametric Inference of Curves in 3D Shapes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper proposes a novel method for human hands tracking using data from an event camera. 
The event camera detects changes in brightness, measuring motion, with low latency, no motion blur, low power consumption and high dynamic range. Captured frames are analysed using lightweight algorithms reporting 3D hand position data. The chosen pick-and-place scenario serves as an example input for collaborative human-robot interactions and in obstacle avoidance for human-robot safety applications. Events data are pre-processed into intensity frames. The regions of interest (ROI) are defined through object edge event activity, reducing noise. ROI features are extracted for use in-depth perception. Event-based tracking of human hand demonstrated feasible, in real time and at a low computational cost. The proposed ROI-finding method reduces noise from intensity images, achieving up to 89% of data reduction in relation to the original, while preserving the features. The depth estimation error in relation to ground truth (measured with wearables), measured using dynamic time warping and using a single event camera, is from 15 to 30 millimetres, depending on the plane it is measured. Tracking of human hands in 3D space using a single event camera data and lightweight algorithms to define ROI features (hands tracking in space).", "output": "Event-based tracking of human hands."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "How to estimate the uncertainty of a given model is a crucial problem. Current calibration techniques treat different classes equally and thus implicitly assume that the distribution of training data is balanced, but ignore the fact that real-world data often follows a long-tailed distribution. In this paper, we explore the problem of calibrating the model trained from a long-tailed distribution. 
Due to the difference between the imbalanced training distribution and balanced test distribution, existing calibration methods such as temperature scaling can not generalize well to this problem. Specific calibration methods for domain adaptation are also not applicable because they rely on unlabeled target domain instances which are not available. Models trained from a long-tailed distribution tend to be more overconfident to head classes. To this end, we propose a novel knowledge-transferring-based calibration method by estimating the importance weights for samples of tail classes to realize long-tailed calibration. Our method models the distribution of each class as a Gaussian distribution and views the source statistics of head classes as a prior to calibrate the target distributions of tail classes. We adaptively transfer knowledge from head classes to get the target probability density of tail classes. The importance weight is estimated by the ratio of the target probability density over the source probability density. Extensive experiments on CIFAR-10-LT, MNIST-LT, CIFAR-100-LT, and ImageNet-LT datasets demonstrate the effectiveness of our method.", "output": "Transfer Knowledge from Head to Tail: Uncertainty Calibration under Long-tailed Distribution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spiking neural networks have attracted extensive attention from researchers in many fields due to their brain-like information processing mechanism. The proposal of surrogate gradient enables the spiking neural networks to migrate to more complex tasks, and gradually close the gap with the conventional artificial neural networks. 
Current spiking neural networks utilize the output of all moments to produce the final prediction, which compromises their temporal characteristics and causes a reduction in performance and efficiency. We propose a temporal knowledge sharing approach (TKS) that enables the interaction of information between different moments, by selecting the output of specific moments to compose teacher signals to guide the training of the network along with the real labels. We have validated TKS on both static datasets CIFAR10, CIFAR100, ImageNet-1k and neuromorphic datasets DVS-CIFAR10, NCALTECH101. Our experimental results indicate that we have achieved the current optimal performance in comparison with other algorithms. Experiments on Fine-grained classification datasets further demonstrate our algorithm's superiority with CUB-200-2011, StanfordDogs, and StanfordCars. TKS algorithm helps the model to have stronger temporal generalization capability, allowing the network to guarantee performance with large time steps in the training phase and with small time steps in the testing phase. This greatly facilitates the deployment of SNNs on edge devices.", "output": "Temporal Knowledge Sharing enable Spiking Neural Network Learning from Past and Future."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing implicit neural representation (INR) methods do not fully exploit spatiotemporal redundancies in videos. Index-based INRs ignore the content-specific spatial features and hybrid INRs ignore the contextual dependency on adjacent frames, leading to poor modeling capability for scenes with large motion or dynamics. We analyze this limitation from the perspective of function fitting and reveal the importance of frame difference. 
To use explicit motion information, we propose Difference Neural Representation for Videos (DNeRV), which consists of two streams for content and frame difference. We also introduce a collaborative content unit for effective feature fusion. We test DNeRV for video compression, inpainting, and interpolation. DNeRV achieves competitive results against the state-of-the-art neural compression approaches and outperforms existing implicit methods on downstream inpainting and interpolation for $960 \\times 1920$ videos.", "output": "DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A reliable perception has to be robust against challenging environmental conditions. Therefore, recent efforts focused on the use of radar sensors in addition to camera and lidar sensors for perception applications. However, the sparsity of radar point clouds and the poor data availability remain challenging for current perception methods. To address these challenges, a novel graph neural network is proposed that does not just use the information of the points themselves but also the relationships between the points. The model is designed to consider both point features and point-pair features, embedded in the edges of the graph. Furthermore, a general approach for achieving transformation invariance is proposed which is robust against unseen scenarios and also counteracts the limited data availability. The transformation invariance is achieved by an invariant data representation rather than an invariant model architecture, making it applicable to other methods. The proposed RadarGNN model outperforms all previous methods on the RadarScenes dataset. In addition, the effects of different invariances on the object detection and semantic segmentation quality are investigated. 
The code is made available as open-source software under", "output": "RadarGNN: Transformation Invariant Graph Neural Network for Radar-based Perception."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper proposes a novel approach to real-time automatic rim detection, classification, and inspection by combining traditional computer vision and deep learning techniques. At the end of every automotive assembly line, a quality control process is carried out to identify any potential defects in the produced cars. Common yet hazardous defects are related, for example, to incorrectly mounted rims. Routine inspections are mostly conducted by human workers that are negatively affected by factors such as fatigue or distraction. We have designed a new prototype to validate whether all four wheels on a single car match in size and type. Additionally, we present three comprehensive open-source databases, CWD1500, WHEEL22, and RB600, for wheel, rim, and bolt detection, as well as rim classification, which are free-to-use for scientific purposes.", "output": "Real-Time Wheel Detection and Rim Classification in Automotive Production."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce NeRD, a new demosaicking method for generating full-color images from Bayer patterns. Our approach leverages advancements in neural fields to perform demosaicking by representing an image as a coordinate-based neural network with sine activation functions. The inputs to the network are spatial coordinates and a low-resolution Bayer pattern, while the outputs are the corresponding RGB values. 
An encoder network, which is a blend of ResNet and U-net, enhances the implicit neural representation of the image to improve its quality and ensure spatial consistency through prior learning. Our experimental results demonstrate that NeRD outperforms traditional and state-of-the-art CNN-based methods and significantly closes the gap to transformer-based methods.", "output": "NeRD: Neural field-based Demosaicking."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial examples have been found for various deep as well as shallow learning models, and have at various times been suggested to be either fixable model-specific bugs, or else inherent dataset feature, or both. We present theoretical and empirical results to show that adversarial examples are approximate discontinuities resulting from models that specify approximately bijective maps $f: \\Bbb R^n \\to \\Bbb R^m; n \\neq m$ over their inputs, and this discontinuity follows from the topological invariance of dimension.", "output": "Adversarial Examples from Dimensional Invariance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Age is an important variable to describe the expected brain's anatomy status across the normal aging trajectory. The deviation from that normative aging trajectory may provide some insights into neurological diseases. In neuroimaging, predicted brain age is widely used to analyze different diseases. However, using only the brain age gap information (ie the difference between the chronological age and the estimated age) can be not enough informative for disease classification problems. In this paper, we propose to extend the notion of global brain age by estimating brain structure ages using structural magnetic resonance imaging. 
To this end, an ensemble of deep learning models is first used to estimate a 3D aging map (ie voxel-wise age estimation). Then, a 3D segmentation mask is used to obtain the final brain structure ages. This biomarker can be used in several situations. First, it enables to accurately estimate the brain age for the purpose of anomaly detection at the population level. In this situation, our approach outperforms several state-of-the-art methods. Second, brain structure ages can be used to compute the deviation from the normal aging process of each brain structure. This feature can be used in a multi-disease classification task for an accurate differential diagnosis at the subject level. Finally, the brain structure age deviations of individuals can be visualized, providing some insights about brain abnormality and helping clinicians in real medical contexts.", "output": "Brain Structure Ages -- A new biomarker for multi-disease classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent works have shown that large models pretrained on common visual learning tasks can provide useful representations for a wide range of specialized perception problems, as well as a variety of robotic manipulation tasks. While prior work on robotic manipulation has predominantly used frozen pretrained features, we demonstrate that in robotics this approach can fail to reach optimal performance, and that fine-tuning of the full model can lead to significantly better results. Unfortunately, fine-tuning disrupts the pretrained visual representation, and causes representational drift towards the fine-tuned task thus leading to a loss of the versatility of the original model. We introduce \"lossless adaptation\" to address this shortcoming of classical fine-tuning. 
We demonstrate that appropriate placement of our parameter efficient adapters can significantly reduce the performance gap between frozen pretrained representations and full end-to-end fine-tuning without changes to the original representation and thus preserving original capabilities of the pretrained model. We perform a comprehensive investigation across three major model architectures (ViTs, NFNets, and ResNets), supervised (ImageNet-1K classification) and self-supervised pretrained weights (CLIP, BYOL, Visual MAE) in 3 task domains and 35 individual tasks, and demonstrate that our claims are strongly validated in various settings.", "output": "Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Humans possess the capacity to reason about the future based on a sparse collection of visual cues acquired over time. In order to emulate this ability, we introduce a novel task called Anticipation Captioning, which generates a caption for an unseen oracle image using a sparsely temporally-ordered set of images. To tackle this new task, we propose a model called A-CAP, which incorporates commonsense knowledge into a pre-trained vision-language model, allowing it to anticipate the caption. Through both qualitative and quantitative evaluations on a customized visual storytelling dataset, A-CAP outperforms other image captioning methods and establishes a strong baseline for anticipation captioning. 
We also address the challenges inherent in this task.", "output": "A-CAP: Anticipation Captioning with Commonsense Knowledge."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper investigates the problem of class-incremental object detection for agricultural applications where a model needs to learn new plant species and diseases incrementally without forgetting the previously learned ones. We adapt two public datasets to include new categories over time, simulating a more realistic and dynamic scenario. We then compare three class-incremental learning methods that leverage different forms of knowledge distillation to mitigate catastrophic forgetting. Our experiments show that all three methods suffer from catastrophic forgetting, but the recent Dynamic Y-KD approach, which additionally uses a dynamic architecture that grows new branches to learn new tasks, outperforms ILOD and Faster-ILOD in most scenarios both on new and old classes. These results highlight the challenges and opportunities of continual object detection for agricultural applications. In particular, the large intra-class and small inter-class variability that is typical of plant images exacerbate the difficulty of learning new categories without interfering with previous knowledge. 
We publicly release our code to encourage future work.", "output": "Class-Incremental Learning of Plant and Disease Detection: Growing Branches with Knowledge Distillation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Without access to the source data, source-free domain adaptation (SFDA) transfers knowledge from a source-domain trained model to target domains. Recently, SFDA has gained popularity due to the need to protect the data privacy of the source domain, but it suffers from catastrophic forgetting on the source domain due to the lack of data. To systematically investigate the mechanism of catastrophic forgetting, we first reimplement previous SFDA approaches within a unified framework and evaluate them on four benchmarks. We observe that there is a trade-off between adaptation gain and forgetting loss, which motivates us to design a consistency regularization to mitigate forgetting. In particular, we propose a continual source-free domain adaptation approach named CoSDA, which employs a dual-speed optimized teacher-student model pair and is equipped with consistency learning capability. Our experiments demonstrate that CoSDA outperforms state-of-the-art approaches in continuous adaptation. Notably, our CoSDA can also be integrated with other SFDA methods to alleviate forgetting.", "output": "CoSDA: Continual Source-Free Domain Adaptation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have proven to be highly effective in generating high-quality images. However, adapting large pre-trained diffusion models to new domains remains an open challenge, which is critical for real-world applications. 
This paper proposes DiffFit, a parameter-efficient strategy to fine-tune large pre-trained diffusion models that enable fast adaptation to new domains. DiffFit is embarrassingly simple that only fine-tunes the bias term and newly-added scaling factors in specific layers, yet resulting in significant training speed-up and reduced model storage costs. Compared with full fine-tuning, DiffFit achieves 2$\\times$ training speed-up and only needs to store approximately 0.12% of the total model parameters. Intuitive theoretical analysis has been provided to justify the efficacy of scaling factors on fast adaptation. On 8 downstream datasets, DiffFit achieves superior or competitive performances compared to the full fine-tuning while being more efficient. Remarkably, we show that DiffFit can adapt a pre-trained low-resolution generative model to a high-resolution one by adding minimal cost. Among diffusion-based methods, DiffFit sets a new state-of-the-art FID of 3.02 on ImageNet 512$\\times$512 benchmark by fine-tuning only 25 epochs from a public pre-trained ImageNet 256$\\times$256 checkpoint while being 30$\\times$ more training efficient than the closest competitor.", "output": "DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Due to the limitations of inadequate Whole-Slide Image (WSI) samples with weak labels, pseudo-bag-based multiple instance learning (MIL) appears as a vibrant prospect in WSI classification. However, the pseudo-bag dividing scheme, often crucial for classification performance, is still an open topic worth exploring. Therefore, this paper proposes a novel scheme, ProtoDiv, using a bag prototype to guide the division of WSI pseudo-bags. 
Rather than designing complex network architecture, this scheme takes a plugin-and-play approach to safely augment WSI data for effective training while preserving sample consistency. Furthermore, we specially devise an attention-based prototype that could be optimized dynamically in training to adapt to a classification task. We apply our ProtoDiv scheme on seven baseline models, and then carry out a group of comparison experiments on two public WSI datasets. Experiments confirm our ProtoDiv could usually bring obvious performance improvements to WSI classification.", "output": "ProtoDiv: Prototype-guided Division of Consistent Pseudo-bags for Whole-slide Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Breast cancer has reached the highest incidence rate worldwide among all malignancies since 2020. Breast imaging plays a significant role in early diagnosis and intervention to improve the outcome of breast cancer patients. In the past decade, deep learning has shown remarkable progress in breast cancer imaging analysis, holding great promise in interpreting the rich information and complex context of breast imaging modalities. Considering the rapid improvement in the deep learning technology and the increasing severity of breast cancer, it is critical to summarize past progress and identify future challenges to be addressed. In this paper, we provide an extensive survey of deep learning-based breast cancer imaging research, covering studies on mammogram, ultrasound, magnetic resonance imaging, and digital pathology images over the past decade. The major deep learning methods, publicly available datasets, and applications on imaging-based screening, diagnosis, treatment response prediction, and prognosis are described in detail. 
Drawn from the findings of this survey, we present a comprehensive discussion of the challenges and potential avenues for future research in deep learning-based breast cancer imaging.", "output": "Deep Learning in Breast Cancer Imaging: A Decade of Progress and Future Directions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth annotations for training, which are expensive to create. Interactive segmentation networks help generate such annotations based on an image and the corresponding user interactions such as clicks. Existing methods for this task can only process a single instance at a time and each user interaction requires a full forward pass through the entire deep network. We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries to a Transformer decoder with a potential to segment multiple object instances in a single iteration. Our architecture also alleviates any need to re-compute image features during refinement, and requires fewer interactions for segmenting multiple instances in a single image when compared to other methods. 
DynaMITe achieves state-of-the-art results on multiple existing interactive segmentation benchmarks, and also on the new multi-instance benchmark that we propose in this paper.", "output": "DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spatial control is a core capability in controllable image generation. Advancements in layout-guided image generation have shown promising results on in-distribution (ID) datasets with similar spatial configurations. However, it is unclear how these models perform when facing out-of-distribution (OOD) samples with arbitrary, unseen layouts. In this paper, we propose LayoutBench, a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape. We benchmark two recent representative layout-guided image generation methods and observe that the good ID layout control may not generalize well to arbitrary layouts in the wild (e.g., objects at the boundary). Next, we propose IterInpaint, a new baseline that generates foreground and background regions in a step-by-step manner via inpainting, demonstrating stronger generalizability than existing models on OOD layouts in LayoutBench. We perform quantitative and qualitative evaluation and fine-grained analysis on the four LayoutBench skills to pinpoint the weaknesses of existing models. Lastly, we show comprehensive ablation studies on IterInpaint, including training task ratio, crop&paste vs. repaint, and generation order. 
Project website: ", "output": "Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Few-shot learning (FSL) techniques seek to learn the underlying patterns in data using fewer samples, analogous to how humans learn from limited experience. In this limited-data scenario, the challenges associated with deep neural networks, such as shortcut learning and texture bias behaviors, are further exacerbated. Moreover, the significance of addressing shortcut learning is not yet fully explored in the few-shot setup. To address these issues, we propose LSFSL, which enforces the model to learn more generalizable features utilizing the implicit prior information present in the data. Through comprehensive analyses, we demonstrate that LSFSL-trained models are less vulnerable to alteration in color schemes, statistical correlations, and adversarial perturbations leveraging the global semantics in the data. Our findings highlight the potential of incorporating relevant priors in few-shot approaches to increase robustness and generalization.", "output": "LSFSL: Leveraging Shape Information in Few-shot Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have recently become the de-facto approach for generative modeling in the 2D domain. However, extending diffusion models to 3D is challenging due to the difficulties in acquiring 3D ground truth data for training. On the other hand, 3D GANs that integrate implicit 3D representations into GANs have shown remarkable 3D-aware generation when trained only on single-view image datasets. However, 3D GANs do not provide straightforward ways to precisely control image synthesis. 
To address these challenges, We present Control3Diff, a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis for single-view datasets. Control3Diff explicitly models the underlying latent distribution (optionally conditioned on external inputs), thus enabling direct control during the diffusion process. Moreover, our approach is general and applicable to any type of controlling input, allowing us to train it with the same diffusion objective without any auxiliary supervision. We validate the efficacy of Control3Diff on standard image generation benchmarks, including FFHQ, AFHQ, and ShapeNet, using various conditioning inputs such as images, sketches, and text prompts. Please see the project website (url{ for video comparisons.", "output": "Learning Controllable 3D Diffusion Models from Single-view Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Burst image processing is becoming increasingly popular in recent years. However, it is a challenging task since individual burst images undergo multiple degradations and often have mutual misalignments resulting in ghosting and zipper artifacts. Existing burst restoration methods usually do not consider the mutual correlation and non-local contextual information among burst frames, which tends to limit these approaches in challenging cases. Another key challenge lies in the robust up-sampling of burst frames. The existing up-sampling methods cannot effectively utilize the advantages of single-stage and progressive up-sampling strategies with conventional and/or recent up-samplers at the same time. 
To address these challenges, we propose a novel Gated Multi-Resolution Transfer Network (GMTNet) to reconstruct a spatially precise high-quality image from a burst of low-quality raw images. GMTNet consists of three modules optimized for burst processing tasks: Multi-scale Burst Feature Alignment (MBFA) for feature denoising and alignment, Transposed-Attention Feature Merging (TAFM) for multi-frame feature aggregation, and Resolution Transfer Feature Up-sampler (RTFU) to up-scale merged features and construct a high-quality output image. Detailed experimental analysis on five datasets validates our approach and sets a state-of-the-art for burst super-resolution, burst denoising, and low-light burst enhancement.", "output": "Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a method to estimate the mechanical parameters of fabrics using a casual capture setup with a depth camera. Our approach enables the creation of mechanically correct digital representations of real-world textile materials, which is a fundamental step for many interactive design and engineering applications. As opposed to existing capture methods, which typically require expensive setups, video sequences, or manual intervention, our solution can capture at scale, is agnostic to the optical appearance of the textile, and facilitates fabric arrangement by non-expert operators. 
To this end, we propose a sim-to-real strategy to train a learning-based framework that can take as input one or multiple images and output a full set of mechanical parameters. Thanks to carefully designed data augmentation and transfer learning protocols, our solution generalizes to real images despite being trained only on synthetic data, hence successfully closing the sim-to-real loop. Key in our work is to demonstrate that evaluating the regression accuracy based on the similarity at parameter space leads to inaccurate distances that do not match human perception. To overcome this, we propose a novel metric for fabric drape similarity that operates on the image domain instead of on the parameter space, allowing us to evaluate our estimation within the context of a similarity rank. We show that our metric correlates with human judgments about the perception of drape similarity, and that our model predictions produce perceptually accurate results compared to the ground truth parameters.", "output": "How Will It Drape Like? Capturing Fabric Mechanics from Depth Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural Radiance Field training can be accelerated through the use of grid-based representations in NeRF's learned mapping from spatial coordinates to colors and volumetric density. However, these grid-based approaches lack an explicit understanding of scale and therefore often introduce aliasing, usually in the form of jaggies or missing scene content. Anti-aliasing has previously been addressed by mip-NeRF 360, which reasons about sub-volumes along a cone rather than points along a ray, but this approach is not natively compatible with current grid-based techniques. 
We show how ideas from rendering and signal processing can be used to construct a technique that combines mip-NeRF 360 and grid-based models such as Instant NGP to yield error rates that are 8% - 76% lower than either prior technique, and that trains 22x faster than mip-NeRF 360.", "output": "Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, there has been an arms race of pose forecasting methods aimed at solving the spatio-temporal task of predicting a sequence of future 3D poses of a person given a sequence of past observed ones. However, the lack of unified benchmarks and limited uncertainty analysis have hindered progress in the field. To address this, we first develop an open-source library for human pose forecasting, featuring multiple models, datasets, and standardized evaluation metrics, with the aim of promoting research and moving toward a unified and fair evaluation. Second, we devise two types of uncertainty in the problem to increase performance and convey better trust: 1) we propose a method for modeling aleatoric uncertainty by using uncertainty priors to inject knowledge about the behavior of uncertainty. This focuses the capacity of the model in the direction of more meaningful supervision while reducing the number of learned parameters and improving stability; 2) we introduce a novel approach for quantifying the epistemic uncertainty of any model through clustering and measuring the entropy of its assignments. 
Our experiments demonstrate up to 25% improvements in accuracy and better performance in uncertainty estimation.", "output": "Toward Reliable Human Pose Forecasting with Uncertainty."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Understanding verbs is crucial to modelling how people and objects interact with each other and the environment through space and time. Recently, state-of-the-art video-language models based on CLIP have been shown to have limited verb understanding and to rely extensively on nouns, restricting their performance in real-world video applications that require action and temporal understanding. In this work, we improve verb understanding for CLIP-based video-language models by proposing a new Verb-Focused Contrastive (VFC) framework. This consists of two main components: (1) leveraging pretrained large language models (LLMs) to create hard negatives for cross-modal contrastive learning, together with a calibration strategy to balance the occurrence of concepts in positive and negative pairs; and (2) enforcing a fine-grained, verb phrase alignment loss. Our method achieves state-of-the-art results for zero-shot performance on three downstream tasks that focus on verb understanding: video-text matching, video question-answering and video classification. 
To the best of our knowledge, this is the first work which proposes a method to alleviate the verb understanding problem, and does not simply highlight it.", "output": "Verbs in Action: Improving verb understanding in video-language models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current transformer-based change detection (CD) approaches either employ a pre-trained model trained on large-scale image classification ImageNet dataset or rely on first pre-training on another CD dataset and then fine-tuning on the target benchmark. This current strategy is driven by the fact that transformers typically require a large amount of training data to learn inductive biases, which is insufficient in standard CD datasets due to their small size. We develop an end-to-end CD approach with transformers that is trained from scratch and yet achieves state-of-the-art performance on four public benchmarks. Instead of using conventional self-attention that struggles to capture inductive biases when trained from scratch, our architecture utilizes a shuffled sparse-attention operation that focuses on selected sparse informative regions to capture the inherent characteristics of the CD data. Moreover, we introduce a change-enhanced feature fusion (CEFF) module to fuse the features from input image pairs by performing a per-channel re-weighting. Our CEFF module aids in enhancing the relevant semantic changes while suppressing the noisy ones. Extensive experiments on four CD datasets reveal the merits of the proposed contributions, achieving gains as high as 14.27% in intersection-over-union (IoU) score, compared to the best-published results in the literature. 
Code is available at ", "output": "Remote Sensing Change Detection With Transformers Trained from Scratch."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We address the problem of learning person-specific facial priors from a small number (e.g., 20) of portrait photos of the same person. This enables us to edit this specific person's facial appearance, such as expression and lighting, while preserving their identity and high-frequency facial details. Key to our approach, which we dub DiffusionRig, is a diffusion model conditioned on, or \"rigged by,\" crude 3D face models estimated from single in-the-wild images by an off-the-shelf estimator. On a high level, DiffusionRig learns to map simplistic renderings of 3D face models to realistic photos of a given person. Specifically, DiffusionRig is trained in two stages: It first learns generic facial priors from a large-scale face dataset and then person-specific priors from a small portrait photo collection of the person of interest. By learning the CGI-to-photo mapping with such personalized priors, DiffusionRig can \"rig\" the lighting, facial expression, head pose, etc. of a portrait photo, conditioned only on coarse 3D models while preserving this person's identity and other high-frequency characteristics. Qualitative and quantitative experiments show that DiffusionRig outperforms existing approaches in both identity preservation and photorealism. 
Please see the project website: for the supplemental material, video, code, and data.", "output": "DiffusionRig: Learning Personalized Priors for Facial Appearance Editing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large-scale Vision-Language Models, such as CLIP, learn powerful image-text representations that have found numerous applications, from zero-shot classification to text-to-image generation. Despite that, their capabilities for solving novel discriminative tasks via prompting fall behind those of large language models, such as GPT-3. Here we explore the idea of visual prompt engineering for solving computer vision tasks beyond classification by editing in image space instead of text. In particular, we discover an emergent ability of CLIP, where, by simply drawing a red circle around an object, we can direct the model's attention to that region, while also maintaining global information. We show the power of this simple approach by achieving state-of-the-art in zero-shot referring expressions comprehension and strong performance in keypoint localization tasks. Finally, we draw attention to some potential ethical concerns of large language-vision models.", "output": "What does CLIP know about a red circle? Visual prompt engineering for VLMs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images. Despite numerous task-specific methods, developing a comprehensive model remains challenging. In this paper, we present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects. 
Previous studies have used two-stage approaches that rely on pretrained NeRFs as real data to train diffusion models. In contrast, we propose a new single-stage training paradigm with an end-to-end objective that jointly optimizes a NeRF auto-decoder and a latent diffusion model, enabling simultaneous 3D reconstruction and prior learning, even from sparsely available views. At test time, we can directly sample the diffusion prior for unconditional generation, or combine it with arbitrary observations of unseen objects for NeRF reconstruction. SSDNeRF demonstrates robust results comparable to or better than leading task-specific methods in unconditional generation and single/sparse-view 3D reconstruction.", "output": "Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large-scale models pre-trained on large-scale datasets have profoundly advanced the development of deep learning. However, the state-of-the-art models for medical image segmentation are still small-scale, with their parameters only in the tens of millions. Further scaling them up to higher orders of magnitude is rarely explored. An overarching goal of exploring large-scale models is to train them on large-scale medical segmentation datasets for better transfer capacities. In this work, we design a series of Scalable and Transferable U-Net (STU-Net) models, with parameter sizes ranging from 14 million to 1.4 billion. Notably, the 1.4B STU-Net is the largest medical image segmentation model to date. Our STU-Net is based on the nnU-Net framework due to its popularity and impressive performance. We first refine the default convolutional blocks in nnU-Net to make them scalable. 
Then, we empirically evaluate different scaling combinations of network depth and width, discovering that it is optimal to scale model depth and width together. We train our scalable STU-Net models on a large-scale TotalSegmentator dataset and find that increasing model size brings a stronger performance gain. This observation reveals that a large model is promising in medical image segmentation. Furthermore, we evaluate the transferability of our model on 14 downstream datasets for direct inference and 3 datasets for further fine-tuning, covering various modalities and segmentation targets. We observe good performance of our pre-trained model in both direct inference and fine-tuning. The code and pre-trained models are available at ", "output": "STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces a novel representation of volumetric videos for real-time view synthesis of dynamic scenes. Recent advances in neural scene representations demonstrate their remarkable capability to model and render complex static scenes, but extending them to represent dynamic scenes is not straightforward due to their slow rendering speed or high storage cost. To solve this problem, our key idea is to represent the radiance field of each frame as a set of shallow MLP networks whose parameters are stored in 2D grids, called MLP maps, and dynamically predicted by a 2D CNN decoder shared by all frames. Representing 3D scenes with shallow MLPs significantly improves the rendering speed, while dynamically predicting MLP parameters with a shared 2D CNN instead of explicitly storing them leads to low storage cost. 
Experiments show that the proposed approach achieves state-of-the-art rendering quality on the NHR and ZJU-MoCap datasets, while being efficient for real-time rendering with a speed of 41.7 fps for 512 × 512 images on an RTX 3090 GPU. The code is available at ", "output": "Representing Volumetric Videos as Dynamic MLP Maps."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the growing demand for interactive AI systems, there have been few comprehensive studies on human-AI interaction in visual understanding, e.g., segmentation. Inspired by the development of prompt-based universal interfaces for LLMs, this paper presents SEEM, a promptable, interactive model for Segmenting Everything Everywhere all at once in an image. SEEM has four desiderata: i) Versatility: by introducing a versatile prompting engine for different types of prompts, including points, boxes, scribbles, masks, texts, and referred regions of another image; ii) Compositionality: by learning a joint visual-semantic space for visual and textual prompts to compose queries on the fly for inference as shown in Fig 1; iii) Interactivity: by incorporating learnable memory prompts to retain dialog history information via mask-guided cross-attention; and iv) Semantic-awareness: by using a text encoder to encode text queries and mask labels for open-vocabulary segmentation.", "output": "Segment Everything Everywhere All at Once."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent advances in camera-based bird's eye view (BEV) representation exhibit great potential for in-vehicle 3D perception. 
Despite the substantial progress achieved on standard benchmarks, the robustness of BEV algorithms has not been thoroughly examined, which is critical for safe operations. To bridge this gap, we introduce RoboBEV, a comprehensive benchmark suite that encompasses eight distinct corruptions, including Bright, Dark, Fog, Snow, Motion Blur, Color Quant, Camera Crash, and Frame Lost. Based on it, we undertake extensive evaluations across a wide range of BEV-based models to understand their resilience and reliability. Our findings indicate a strong correlation between absolute performance on in-distribution and out-of-distribution datasets. Nonetheless, there are considerable variations in relative performance across different approaches. Our experiments further demonstrate that pre-training and depth-free BEV transformation have the potential to enhance out-of-distribution robustness. Additionally, utilizing long and rich temporal information largely helps with robustness. Our findings provide valuable insights for designing future BEV models that can achieve both accuracy and robustness in real-world deployments.", "output": "RoboBEV: Towards Robust Bird's Eye View Perception under Corruptions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Plain text has become a prevalent interface for text-to-image synthesis. However, its limited customization options hinder users from accurately describing desired outputs. For example, plain text makes it hard to specify continuous quantities, such as the precise RGB color value or importance of each word. Furthermore, creating detailed text prompts for complex scenes is tedious for humans to write and challenging for text encoders to interpret. To address these challenges, we propose using a rich-text editor supporting formats such as font style, size, color, and footnote. 
We extract each word's attributes from rich text to enable local style control, explicit token reweighting, precise color rendering, and detailed region synthesis. We achieve these capabilities through a region-based diffusion process. We first obtain each word's region based on cross-attention maps of a vanilla diffusion process using plain text. For each region, we enforce its text attributes by creating region-specific detailed prompts and applying region-specific guidance. We present various examples of image generation from rich text and demonstrate that our method outperforms strong baselines with quantitative evaluations.", "output": "Expressive Text-to-Image Generation with Rich Text."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Univariate and multivariate normal probability distributions are widely used when modeling decisions under uncertainty. Computing the performance of such models requires integrating these distributions over specific domains, which can vary widely across models. Besides some special cases, there exist no general analytical expressions, standard numerical methods or software for these integrals. Here we present mathematical results and open-source software that provide (i) the probability in any domain of a normal in any dimensions with any parameters, (ii) the probability density, cumulative distribution, and inverse cumulative distribution of any function of a normal vector, (iii) the classification errors among any number of normal distributions, the Bayes-optimal discriminability index and relation to the operating characteristic, (iv) dimension reduction and visualizations for such problems, and (v) tests for how reliably these methods may be used on given data. 
We demonstrate these tools with vision research applications of detecting occluding objects in natural scenes, and detecting camouflage.", "output": "A method to integrate and classify normal distributions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Areas under ROC (AUROC) and precision-recall curves (AUPRC) are common metrics for evaluating classification performance for imbalanced problems. Compared with AUROC, AUPRC is a more appropriate metric for highly imbalanced datasets. While stochastic optimization of AUROC has been studied extensively, principled stochastic optimization of AUPRC has been rarely explored. In this work, we propose a principled technical method to optimize AUPRC for deep learning. Our approach is based on maximizing the averaged precision (AP), which is an unbiased point estimator of AUPRC. We cast the objective into a sum of dependent compositional functions with inner functions dependent on random variables of the outer level. We propose efficient adaptive and non-adaptive stochastic algorithms named SOAP with provable convergence guarantee under mild conditions by leveraging recent advances in stochastic compositional optimization. Extensive experimental results on image and graph datasets demonstrate that our proposed method outperforms prior methods on imbalanced problems in terms of AUPRC. To the best of our knowledge, our work represents the first attempt to optimize AUPRC with provable convergence. 
The SOAP has been implemented in the libAUC library at ", "output": "Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The human-object interaction (HOI) detection task refers to localizing humans, localizing objects, and predicting the interactions between each human-object pair. HOI is considered one of the fundamental steps in truly understanding complex visual scenes. For detecting HOI, it is important to utilize relative spatial configurations and object semantics to find salient spatial regions of images that highlight the interactions between human object pairs. This issue is addressed by the novel self-attention based guided transformer network, GTNet. GTNet encodes this spatial contextual information in human and object visual features via self-attention while achieving state of the art results on both the V-COCO and HICO-DET datasets. Code will be made available online.", "output": "GTNet:Guided Transformer Network for Detecting Human-Object Interactions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As a fundamental yet challenging problem in intelligent transportation systems, point cloud registration attracts vast attention and has been attained with various deep learning-based algorithms. The unsupervised registration algorithms take advantage of deep neural network-enabled novel representation learning while requiring no human annotations, making them applicable to industrial applications. 
However, unsupervised methods mainly depend on global descriptors, which ignore the high-level representations of local geometries. In this paper, we propose to jointly use both global and local descriptors to register point clouds in a self-supervised manner, which is motivated by a critical observation that all local geometries of point clouds are transformed consistently under the same transformation. Therefore, local geometries can be employed to enhance the representation ability of the feature extraction module. Moreover, the proposed local descriptor is flexible and can be integrated into most existing registration methods and improve their performance. Besides, we also utilize point cloud reconstruction and normal estimation to enhance the transformation awareness of global and local descriptors. Lastly, extensive experimental results on one synthetic and three real-world datasets demonstrate that our method outperforms existing state-of-the-art unsupervised registration methods and even surpasses supervised ones in some cases. Robustness and computational efficiency evaluations also indicate that the proposed method applies to intelligent vehicles.", "output": "Self-Supervised Point Cloud Registration with Deep Versatile Descriptors."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Convolutional Neural Networks are the de facto models for image recognition. However, 3D CNNs, the straightforward extension of 2D CNNs for video recognition, have not achieved the same success on standard action recognition benchmarks. One of the main reasons for this reduced performance of 3D CNNs is the increased computational complexity requiring large scale annotated datasets to train them at scale. 3D kernel factorization approaches have been proposed to reduce the complexity of 3D CNNs. 
Existing kernel factorization approaches follow hand-designed and hard-wired techniques. In this paper we propose Gate-Shift-Fuse (GSF), a novel spatio-temporal feature extraction module which controls interactions in spatio-temporal decomposition and learns to adaptively route features through time and combine them in a data dependent manner. GSF leverages grouped spatial gating to decompose the input tensor and channel weighting to fuse the decomposed tensors. GSF can be inserted into existing 2D CNNs to convert them into an efficient and high performing spatio-temporal feature extractor, with negligible parameter and compute overhead. We perform an extensive analysis of GSF using two popular 2D CNN families and achieve state-of-the-art or competitive performance on five standard action recognition benchmarks. Code and models will be made publicly available at", "output": "Gate-Shift-Fuse for Video Action Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A significant gap remains between today's visual pattern recognition models and human-level visual cognition especially when it comes to few-shot learning and compositional reasoning of novel concepts. We introduce Bongard-HOI, a new visual reasoning benchmark that focuses on compositional learning of human-object interactions (HOIs) from natural images. It is inspired by two desirable characteristics from the classical Bongard problems (BPs): 1) few-shot concept learning, and 2) context-dependent reasoning. We carefully curate the few-shot instances with hard negatives, where positive and negative images only disagree on action labels, making mere recognition of object categories insufficient to complete our benchmarks. 
We also design multiple test sets to systematically study the generalization of visual learning models, where we vary the overlap of the HOI concepts between the training and test sets of few-shot instances, from partial to no overlaps. Bongard-HOI presents a substantial challenge to today's visual recognition models. The state-of-the-art HOI detection model achieves only 62% accuracy on few-shot binary prediction while even amateur human testers on MTurk have 91% accuracy. With the Bongard-HOI benchmark, we hope to further advance research efforts in visual reasoning, especially in holistic perception-reasoning systems and better representation learning.", "output": "Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While diffusion models have shown great success in image generation, their noise-inverting generative process does not explicitly consider the structure of images, such as their inherent multi-scale nature. Inspired by diffusion models and the empirical success of coarse-to-fine modelling, we propose a new diffusion-like model that generates images through stochastically reversing the heat equation, a PDE that locally erases fine-scale information when run over the 2D plane of the image. We interpret the solution of the forward heat equation with constant additive noise as a variational approximation in the diffusion latent variable model. Our new model shows emergent qualitative properties not seen in standard diffusion models, such as disentanglement of overall colour and shape in images. 
Spectral analysis on natural images highlights connections to diffusion models and reveals an implicit coarse-to-fine inductive bias in them.", "output": "Generative Modelling With Inverse Heat Dissipation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Densely annotating LiDAR point clouds is costly, which restrains the scalability of fully-supervised learning methods. In this work, we study the underexplored semi-supervised learning (SSL) in LiDAR segmentation. Our core idea is to leverage the strong spatial cues of LiDAR point clouds to better exploit unlabeled data. We propose LaserMix to mix laser beams from different LiDAR scans, and then encourage the model to make consistent and confident predictions before and after mixing. Our framework has three appealing properties: 1) Generic: LaserMix is agnostic to LiDAR representations (e.g., range view and voxel), and hence our SSL framework can be universally applied. 2) Statistically grounded: We provide a detailed analysis to theoretically explain the applicability of the proposed framework. 3) Effective: Comprehensive experimental analysis on popular LiDAR segmentation datasets (nuScenes, SemanticKITTI, and ScribbleKITTI) demonstrates our effectiveness and superiority. Notably, we achieve competitive results over fully-supervised counterparts with 2x to 5x fewer labels and improve the supervised-only baseline significantly by 10.8% on average. We hope this concise yet high-performing framework could facilitate future research in semi-supervised LiDAR segmentation. 
Code is publicly available.", "output": "LaserMix for Semi-Supervised LiDAR Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As two fundamental representation modalities of 3D objects, 3D point clouds and multi-view 2D images record shape information from different domains of geometric structures and visual appearances. In the current deep learning era, remarkable progress in processing such two data modalities has been achieved through respectively customizing compatible 3D and 2D network architectures. However, unlike multi-view image-based 2D visual modeling paradigms, which have shown leading performance in several common 3D shape recognition benchmarks, point cloud-based 3D geometric modeling paradigms are still highly limited by insufficient learning capacity, due to the difficulty of extracting discriminative features from irregular geometric signals. In this paper, we explore the possibility of boosting deep 3D point cloud encoders by transferring visual knowledge extracted from deep 2D image encoders under a standard teacher-student distillation workflow. Generally, we propose PointMCD, a unified multi-view cross-modal distillation architecture, including a pretrained deep image encoder as the teacher and a deep point encoder as the student. To perform heterogeneous feature alignment between 2D visual and 3D geometric domains, we further investigate visibility-aware feature projection (VAFP), by which point-wise embeddings are reasonably aggregated into view-specific geometric descriptors. By pair-wisely aligning multi-view visual and geometric descriptors, we can obtain more powerful deep point encoders without exhausting and complicated network modification. Experiments on 3D shape classification, part segmentation, and unsupervised learning strongly validate the effectiveness of our method. 
The code and data will be publicly available at ", "output": "PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal Distillation for 3D Shape Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose a novel transfer-based targeted attack method that optimizes the adversarial perturbations without any extra training efforts for auxiliary networks on training data. Our new attack method is proposed based on the observation that highly universal adversarial perturbations tend to be more transferable for targeted attacks. Therefore, we propose to make the perturbation agnostic to different local regions within one image, which we call self-universality. Instead of optimizing the perturbations on different images, optimizing on different regions to achieve self-universality can get rid of using extra data. Specifically, we introduce a feature similarity loss that encourages the learned perturbations to be universal by maximizing the feature similarity between adversarial perturbed global images and randomly cropped local regions. With the feature similarity loss, our method makes the features from adversarial perturbations more dominant than those of benign images, hence improving targeted transferability. We name the proposed attack method the Self-Universality (SU) attack. Extensive experiments demonstrate that SU can achieve high success rates for transfer-based targeted attacks. On ImageNet-compatible dataset, SU yields an improvement of 12% compared with existing state-of-the-art methods. 
Code is available at ", "output": "Enhancing the Self-Universality for Transferable Targeted Attacks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although analog semantic communication systems have received considerable attention in the literature, there is less work on digital semantic communication systems. In this paper, we develop a deep learning (DL)-enabled vector quantized (VQ) semantic communication system for image transmission, named VQ-DeepSC. Specifically, we propose a convolutional neural network (CNN)-based transceiver to extract multi-scale semantic features of images and introduce multi-scale semantic embedding spaces to perform semantic feature quantization, rendering the data compatible with digital communication systems. Furthermore, we employ adversarial training to improve the quality of received images by introducing a PatchGAN discriminator. Experimental results demonstrate that the proposed VQ-DeepSC is more robust than BPG in digital communication systems and has comparable MS-SSIM performance to the DeepJSCC method.", "output": "Vector Quantized Semantic Communication System."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, neural distance functions trained via volumetric ray marching have been widely adopted for multi-view 3D reconstruction. These methods, however, apply the ray marching procedure for the entire scene volume, leading to reduced sampling efficiency and, as a result, lower reconstruction quality in the areas of high-frequency details. In this work, we address this problem via joint training of the implicit function and our new coarse sphere-based surface reconstruction. 
We use the coarse representation to efficiently exclude the empty volume of the scene from the volumetric ray marching procedure without additional forward passes of the neural surface network, which leads to an increased fidelity of the reconstructions compared to the base systems. We evaluate our approach by incorporating it into the training procedures of several implicit surface modeling methods and observe uniform improvements across both synthetic and real-world datasets. Our codebase can be accessed via the project page:", "output": "Sphere-Guided Training of Neural Implicit Surfaces."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Classifier-free guided diffusion models have recently been shown to be highly effective at high-resolution image generation, and they have been widely used in large-scale diffusion frameworks including DALLE-2, Stable Diffusion and Imagen. However, a downside of classifier-free guided diffusion models is that they are computationally expensive at inference time since they require evaluating two diffusion models, a class-conditional model and an unconditional model, tens to hundreds of times. To deal with this limitation, we propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from: Given a pre-trained classifier-free guided model, we first learn a single model to match the output of the combined conditional and unconditional models, and then we progressively distill that model to a diffusion model that requires much fewer sampling steps. 
For standard diffusion models trained on the pixel-space, our approach is able to generate images visually comparable to that of the original model using as few as 4 sampling steps on ImageNet 64x64 and CIFAR-10, achieving FID/IS scores comparable to that of the original model while being up to 256 times faster to sample from. For diffusion models trained on the latent-space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps, accelerating inference by at least 10-fold compared to existing methods on ImageNet 256x256 and LAION datasets. We further demonstrate the effectiveness of our approach on text-guided image editing and inpainting, where our distilled model is able to generate high-quality results using as few as 2-4 denoising steps.", "output": "On Distillation of Guided Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a new task to benchmark scene understanding of embodied agents: Situated Question Answering in 3D Scenes (SQA3D). Given a scene context (e.g., 3D scan), SQA3D requires the tested agent to first understand its situation (position, orientation, etc.) in the 3D scene as described by text, then reason about its surrounding environment and answer a question under that situation. Based upon 650 scenes from ScanNet, we provide a dataset centered around 6.8k unique situations, along with 20.4k descriptions and 33.4k diverse reasoning questions for these situations. These questions examine a wide spectrum of reasoning capabilities for an intelligent agent, ranging from spatial relation comprehension to commonsense understanding, navigation, and multi-hop reasoning. SQA3D imposes a significant challenge to current multi-modal especially 3D reasoning models. 
We evaluate various state-of-the-art approaches and find that the best one only achieves an overall score of 47.20%, while amateur human participants can reach 90.06%. We believe SQA3D could facilitate future embodied AI research with stronger situation understanding and reasoning capability.", "output": "SQA3D: Situated Question Answering in 3D Scenes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the problem of iterative machine teaching, where a teacher sequentially provides examples based on the status of a learner under a discrete input space (i.e., a pool of finite samples), which greatly limits the teacher's capability. To address this issue, we study iterative teaching under a continuous input space where the input example (i.e., image) can be either generated by solving an optimization problem or drawn directly from a continuous distribution. Specifically, we propose data hallucination teaching (DHT) where the teacher can generate input data intelligently based on labels, the learner's status and the target concept. We study a number of challenging teaching setups (e.g., linear/neural learners in omniscient and black-box settings). Extensive empirical results verify the effectiveness of DHT.", "output": "Iterative Teaching by Data Hallucination."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a post-processor, called NeighborTrack, that leverages neighbor information of the tracking target to validate and improve single-object tracking (SOT) results. It requires no additional data or retraining. Instead, it uses the confidence score predicted by the backbone SOT network to automatically derive neighbor information and then uses this information to improve the tracking results. 
When tracking an occluded target, its appearance features are untrustworthy. However, a general siamese network often cannot tell whether the tracked object is occluded by reading the confidence score alone, because it could be misled by neighbors with high confidence scores. Our proposed NeighborTrack takes advantage of unoccluded neighbors' information to reconfirm the tracking target and reduces false tracking when the target is occluded. It not only reduces the impact caused by occlusion, but also fixes tracking problems caused by object appearance changes. NeighborTrack is agnostic to SOT networks and post-processing methods. For the VOT challenge dataset commonly used in short-term object tracking, we improve three famous SOT networks, Ocean, TransT, and OSTrack, by an average of 1.92% EAO and 2.11% robustness. For the mid- and long-term tracking experiments based on OSTrack, we achieve state-of-the-art 72.25% AUC on LaSOT and 75.7% AO on GOT-10K. Code duplication can be found in", "output": "NeighborTrack: Improving Single Object Tracking by Bipartite Matching with Neighbor Tracklets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose Token Turing Machines (TTM), a sequential, autoregressive Transformer model with memory for real-world sequential visual understanding. Our model is inspired by the seminal Neural Turing Machine, and has an external memory consisting of a set of tokens which summarise the previous history (i.e., frames). This memory is efficiently addressed, read and written using a Transformer as the processing unit/controller at each step. The model's memory module ensures that a new observation will only be processed with the contents of the memory (and not the entire history), meaning that it can efficiently process long sequences with a bounded computational cost at each step. 
We show that TTM outperforms other alternatives, such as other Transformer models designed for long sequences and recurrent neural networks, on two real-world sequential visual understanding tasks: online temporal activity detection from videos and vision-based robot action policy learning. Code is publicly available at:", "output": "Token Turing Machines."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present Tensor4D, an efficient yet effective approach to dynamic scene modeling. The key of our solution is an efficient 4D tensor decomposition method so that the dynamic scene can be directly represented as a 4D spatio-temporal tensor. To tackle the accompanying memory issue, we decompose the 4D tensor hierarchically by projecting it first into three time-aware volumes and then nine compact feature planes. In this way, spatial information over time can be simultaneously captured in a compact and memory-efficient manner. When applying Tensor4D for dynamic scene reconstruction and rendering, we further factorize the 4D fields to different scales in the sense that structural motions and dynamic detailed changes can be learned from coarse to fine. The effectiveness of our method is validated on both synthetic and real-world scenes. Extensive experiments show that our method is able to achieve high-quality dynamic reconstruction and rendering from sparse-view camera rigs or even a monocular camera. 
The code and dataset will be released at ", "output": "Tensor4D : Efficient Neural 4D Decomposition for High-fidelity Dynamic Reconstruction and Rendering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning continuous image representations is recently gaining popularity for image super-resolution (SR) because of its ability to reconstruct high-resolution images with arbitrary scales from low-resolution inputs. Existing methods mostly ensemble nearby features to predict the new pixel at any queried coordinate in the SR image. Such a local ensemble suffers from some limitations: i) it has no learnable parameters and it neglects the similarity of the visual features; ii) it has a limited receptive field and cannot ensemble relevant features in a large field which are important in an image. To address these issues, this paper proposes a continuous implicit attention-in-attention network, called CiaoSR. We explicitly design an implicit attention network to learn the ensemble weights for the nearby local features. Furthermore, we embed a scale-aware attention in this implicit attention network to exploit additional non-local information. Extensive experiments on benchmark datasets demonstrate CiaoSR significantly outperforms the existing single image SR methods with the same backbone. In addition, CiaoSR also achieves the state-of-the-art performance on the arbitrary-scale SR task. 
The effectiveness of the method is also demonstrated on the real-world SR setting. More importantly, CiaoSR can be flexibly integrated into any backbone to improve the SR performance.", "output": "CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Text-guided image editing can have a transformative impact in supporting creative applications. A key challenge is to generate edits that are faithful to input text prompts, while consistent with input images. We present Imagen Editor, a cascaded diffusion model built by fine-tuning Imagen on text-guided image inpainting. Imagen Editor's edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training. In addition, Imagen Editor captures fine details in the input image by conditioning the cascaded pipeline on the original high resolution image. To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting. EditBench evaluates inpainting edits on natural and generated images exploring objects, attributes, and scenes. 
Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.", "output": "Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Our goal with this survey is to provide an overview of the state of the art deep learning technologies for face generation and editing. We will cover popular latest architectures and discuss key ideas that make them work, such as inversion, latent representation, loss functions, training procedures, editing methods, and cross domain style transfer. We particularly focus on GAN-based architectures that have culminated in the StyleGAN approaches, which allow generation of high-quality face images and offer rich interfaces for controllable semantics editing and preserving photo quality. We aim to provide an entry point into the field for readers that have basic knowledge about the field of deep learning and are looking for an accessible introduction and overview.", "output": "Face Generation and Editing with StyleGAN: A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep convolutional neural networks (DCNNs) based remote sensing (RS) image semantic segmentation technology has achieved great success used in many real-world applications such as geographic element analysis. However, strong dependency on annotated data of specific scene makes it hard for DCNNs to fit different RS scenes. 
To solve this problem, recent works gradually focus on cross-domain RS image semantic segmentation task. In this task, different ground sampling distance, remote sensing sensor variation and different geographical landscapes are three main factors causing dramatic domain shift between source and target images. To decrease the negative influence of domain shift, we propose a self-training guided disentangled adaptation network (ST-DASegNet). We first propose source student backbone and target student backbone to respectively extract the source-style and target-style feature for both source and target images. Towards the intermediate output feature maps of each backbone, we adopt adversarial learning for alignment. Then, we propose a domain disentangled module to extract the universal feature and purify the distinct feature of source-style and target-style features. Finally, these two features are fused and served as input of source student decoder and target student decoder to generate final predictions. Based on our proposed domain disentangled module, we further propose exponential moving average (EMA) based cross-domain separated self-training mechanism to ease the instability and disadvantageous effect during adversarial optimization. Extensive experiments and analysis on benchmark RS datasets show that ST-DASegNet outperforms previous methods on cross-domain RS image semantic segmentation task and achieves state-of-the-art (SOTA) results. Our code is available at", "output": "Self-Training Guided Disentangled Adaptation for Cross-Domain Remote Sensing Image Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce LYSTO, the Lymphocyte Assessment Hackathon, which was held in conjunction with the MICCAI 2019 Conference in Shenzhen (China). 
The competition required participants to automatically assess the number of lymphocytes, in particular T-cells, in histopathological images of colon, breast, and prostate cancer stained with CD3 and CD8 immunohistochemistry. Differently from other challenges set up in medical image analysis, LYSTO participants were given only a few hours to address this problem. In this paper, we describe the goal and the multi-phase organization of the hackathon; we describe the proposed methods and the on-site results. Additionally, we present post-competition results where we show how the presented methods perform on an independent set of lung cancer slides, which was not part of the initial competition, as well as a comparison on lymphocyte assessment between presented methods and a panel of pathologists. We show that some of the participants were capable of achieving pathologist-level performance at lymphocyte assessment. After the hackathon, LYSTO was left as a lightweight plug-and-play benchmark dataset on grand-challenge website, together with an automatic evaluation platform. LYSTO has supported a number of research efforts in lymphocyte assessment in oncology. LYSTO will be a long-lasting educational challenge for deep learning and digital pathology; it is available at ", "output": "LYSTO: The Lymphocyte Assessment Hackathon and Benchmark Dataset."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-target multi-camera tracking (MTMCT) of vehicles, i.e. tracking vehicles across multiple cameras, is a crucial application for the development of smart city and intelligent traffic system. The main challenges of MTMCT of vehicles include the intra-class variability of the same vehicle and inter-class similarity between different vehicles and how to associate the same vehicle accurately across different cameras under large search space. 
Previous methods for MTMCT usually use hierarchical clustering of trajectories to conduct cross camera association. However, the search space can be large and does not take spatial and temporal information into consideration. In this paper, we propose a transformer-based camera link model with spatial and temporal filtering to conduct cross camera tracking. Our method achieves 73.68% IDF1 on the Nvidia Cityflow V2 dataset test set, showing the effectiveness of our camera link model on multi-target multi-camera tracking.", "output": "Multi-target multi-camera vehicle tracking using transformer-based camera link model and spatial-temporal information."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semantic representations is the masking strategy; specifically, it is crucial to (a) sample target blocks with sufficiently large scale (semantic), and to (b) use a sufficiently informative (spatially distributed) context block. Empirically, when combined with Vision Transformers, we find I-JEPA to be highly scalable. 
For instance, we train a ViT-Huge/14 on ImageNet using 16 A100 GPUs in under 72 hours to achieve strong downstream performance across a wide range of tasks, from linear classification to object counting and depth prediction.", "output": "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recognition of facial emotions is an essential aspect of human communication, allowing individuals to understand emotions conveyed by facial expressions and vocal tones. The field of Facial Emotion Recognition (FER) is of great significance in the areas of computer vision and artificial intelligence, with vast commercial and academic potential in fields such as security, advertising, and entertainment. We propose a FER framework that employs Swin Vision Transformers (SwinT) and squeeze and excitation block (SE) to address vision tasks. The approach uses a transformer model with an attention mechanism, SE, and SAM to improve the efficiency of the model, as transformers often require a large amount of data. Our focus was to create an efficient FER model based on SwinT architecture that can recognize facial emotions using minimal data. 
We trained our model on a hybrid dataset and evaluated its performance on the AffectNet dataset, achieving an F1-score of 0.5420, which surpassed the winner of the Affective Behavior Analysis in the Wild (ABAW) Competition held at the European Conference on Computer Vision (ECCV) 2022.", "output": "Facial Expression Recognition using Squeeze and Excitation-powered Swin Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Remote sensing hyperspectral and more generally spectral instruments are common tools to decipher surface features in Earth and Planetary science. While linear mixture is the most common approximation for compounds detection (mineral, water, ice, etc.), the transfer of light in surface and atmospheric medium is highly non-linear. The exact simulation of non-linearities can be estimated at very high numerical cost. Here I propose a very simple non-linear form (that includes the regular linear area mixture) of radiative transfer to approximate surface spectral feature. I demonstrate that this analytical form is able to approximate the grain size and intimate mixture dependence of surface features. In addition, the same analytical form can approximate the effect of Martian mineral aerosols. Unfortunately, Earth aerosols are more complex (water droplet, water ice, soot, ...) and are not expected to follow the same trend.", "output": "Approximation of radiative transfer for surface spectral features."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transfer learning has been widely utilized to mitigate the data scarcity problem in the field of Alzheimer's disease (AD). Conventional transfer learning relies on re-using models trained on AD-irrelevant tasks such as natural image classification. 
However, it often leads to negative transfer due to the discrepancy between the non-medical source and target medical domains. To address this, we present evidence-empowered transfer learning for AD diagnosis. Unlike conventional approaches, we leverage an AD-relevant auxiliary task, namely morphological change prediction, without requiring additional MRI data. In this auxiliary task, the diagnosis model learns the evidential and transferable knowledge from morphological features in MRI scans. Experimental results demonstrate that our framework is not only effective in improving detection performance regardless of model capacity, but also more data-efficient and faithful.", "output": "Evidence-empowered Transfer Learning for Alzheimer's Disease."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Local feature matching is challenging due to textureless and repetitive patterns. Existing methods focus on using appearance features and global interaction and matching, while the importance of geometry priors in local feature matching has not been fully exploited. Different from these methods, in this paper, we delve into the importance of geometry prior and propose Structured Epipolar Matcher (SEM) for local feature matching, which can leverage the geometric information in an iterative matching way. The proposed model enjoys several merits. First, our proposed Structured Feature Extractor can model the relative positional relationship between pixels and high-confidence anchor points. Second, our proposed Epipolar Attention and Matching can filter out irrelevant areas by utilizing the epipolar constraint. Extensive experimental results on five standard benchmarks demonstrate the superior performance of our SEM compared to state-of-the-art methods. 
Project page: ", "output": "Structured Epipolar Matcher for Local Feature Matching."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The robustness of 3D perception systems under natural corruptions from environments and sensors is pivotal for safety-critical applications. Existing large-scale 3D perception datasets often contain data that are meticulously cleaned. Such configurations, however, cannot reflect the reliability of perception models during the deployment stage. In this work, we present Robo3D, the first comprehensive benchmark heading toward probing the robustness of 3D detectors and segmentors under out-of-distribution scenarios against natural corruptions that occur in real-world environments. Specifically, we consider eight corruption types stemming from adversarial weather conditions, external disturbances, and internal sensor failure. We uncover that, although promising results have been progressively achieved on standard benchmarks, state-of-the-art 3D perception models are at risk of being vulnerable to corruptions. We draw key observations on the use of data representations, augmentation schemes, and training strategies, that could severely affect the model's performance. To pursue better robustness, we propose a density-insensitive training framework along with a simple flexible voxelization strategy to enhance the model resiliency. We hope our benchmark and approach could inspire future research in designing more robust and reliable 3D perception models. 
Our robustness benchmark suite is publicly available.", "output": "Robo3D: Towards Robust and Reliable 3D Perception against Corruptions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large-scale text-to-image diffusion models achieve unprecedented success in image generation and editing. However, how to extend such success to video editing is unclear. Recent initial attempts at video editing require significant text-to-video data and computation resources for training, which is often not accessible. In this work, we propose vid2vid-zero, a simple yet effective method for zero-shot video editing. Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video. At the core of our method is a null-text inversion module for text-to-video alignment, a cross-frame modeling module for temporal consistency, and a spatial regularization module for fidelity to the original video. Without any training, we leverage the dynamic nature of the attention mechanism to enable bi-directional temporal modeling at test time. Experiments and analyses show promising results in editing attributes, subjects, places, etc., in real-world videos. Code is made available at ", "output": "Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning image classification and image generation using the same set of network parameters is a challenging problem. Recent advanced approaches that perform well in one task often exhibit poor performance in the other. 
This work introduces an energy-based classifier and generator, namely EGC, which can achieve superior performance in both tasks using a single neural network. Unlike a conventional classifier that outputs a label given an image (i.e., a conditional distribution $p(y|\\mathbf{x})$), the forward pass in EGC is a classifier that outputs a joint distribution $p(\\mathbf{x},y)$, enabling an image generator in its backward pass by marginalizing out the label $y$. This is done by estimating the energy and classification probability given a noisy image in the forward pass, while denoising it using the score function estimated in the backward pass. EGC achieves competitive generation results compared with state-of-the-art approaches on ImageNet-1k, CelebA-HQ and LSUN Church, while achieving superior classification accuracy and robustness against adversarial attacks on CIFAR-10. This work represents the first successful attempt to simultaneously excel in both tasks using a single set of network parameters. We believe that EGC bridges the gap between discriminative and generative learning.", "output": "EGC: Image Generation and Classification via a Diffusion Energy-Based Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spatial attention has been widely used to improve the performance of convolutional neural networks by allowing them to focus on important information. However, it has certain limitations. In this paper, we propose a new perspective on the effectiveness of spatial attention, which is that it can solve the problem of convolutional kernel parameter sharing. Despite this, the information contained in the attention map generated by spatial attention is not sufficient for large-size convolutional kernels. Therefore, we introduce a new attention mechanism called Receptive-Field Attention (RFA). 
While previous attention mechanisms such as the Convolutional Block Attention Module (CBAM) and Coordinate Attention (CA) only focus on spatial features, they cannot fully address the issue of convolutional kernel parameter sharing. In contrast, RFA not only focuses on the receptive-field spatial feature but also provides effective attention weights for large-size convolutional kernels. The Receptive-Field Attention convolutional operation (RFAConv), developed by RFA, represents a new approach to replace the standard convolution operation. It offers nearly negligible increment of computational cost and parameters, while significantly improving network performance. We conducted a series of experiments on ImageNet-1k, MS COCO, and VOC datasets, which demonstrated the superiority of our approach in various tasks including classification, object detection, and semantic segmentation. Of particular importance, we believe that it is time to shift focus from spatial features to receptive-field spatial features for current spatial attention mechanisms. By doing so, we can further improve network performance and achieve even better results. The code and pre-trained models for the relevant tasks can be found at", "output": "RFAConv: Innovating Spatital Attention and Standard Convolutional Operation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Video captioning aims to convey dynamic scenes from videos using natural language, facilitating the understanding of spatiotemporal information within our environment. Although there have been recent advances, generating detailed and enriched video descriptions continues to be a substantial challenge. In this work, we introduce Video ChatCaptioner, an innovative approach for creating more comprehensive spatiotemporal video descriptions. 
Our method employs a ChatGPT model as a controller, specifically designed to select frames for posing video content-driven questions. Subsequently, a robust algorithm is utilized to answer these visual queries. This question-answer framework effectively uncovers intricate video details and shows promise as a method for enhancing video content. Following multiple conversational rounds, ChatGPT can summarize enriched video content based on previous conversations. We qualitatively demonstrate that our Video ChatCaptioner can generate captions containing more visual details about the videos. The code is publicly available at ", "output": "Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Avoiding the introduction of ghosts when synthesising LDR images as high dynamic range (HDR) images is a challenging task. Convolutional neural networks (CNNs) are effective for HDR ghost removal in general, but are challenging to deal with the LDR images if there are large movements or oversaturation/undersaturation. Existing dual-branch methods combining CNN and Transformer omit part of the information from non-reference images, while the features extracted by the CNN-based branch are bound to the kernel size with small receptive field, which are detrimental to the deblurring and the recovery of oversaturated/undersaturated regions. In this paper, we propose a novel hierarchical dual Transformer method for ghost-free HDR (HDT-HDR) images generation, which extracts global features and local features simultaneously. First, we use a CNN-based head with spatial attention mechanisms to extract features from all the LDR images. Second, the LDR features are delivered to the Hierarchical Dual Transformer (HDT). 
In each Dual Transformer (DT), the global features are extracted by the window-based Transformer, while the local details are extracted using the channel attention mechanism with deformable CNNs. Finally, the ghost free HDR image is obtained by dimensional mapping on the HDT output. Abundant experiments demonstrate that our HDT-HDR achieves the state-of-the-art performance among existing HDR ghost removal methods.", "output": "High Dynamic Range Imaging with Context-aware Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "High dynamic range (HDR) video reconstruction is attracting more and more attention due to the superior visual quality compared with those of low dynamic range (LDR) videos. The availability of LDR-HDR training pairs is essential for the HDR reconstruction quality. However, there are still no real LDR-HDR pairs for dynamic scenes due to the difficulty in capturing LDR-HDR frames simultaneously. In this work, we propose to utilize a staggered sensor to capture two alternate exposure images simultaneously, which are then fused into an HDR frame in both raw and sRGB domains. In this way, we build a large scale LDR-HDR video dataset with 85 scenes and each scene contains 60 frames. Based on this dataset, we further propose a Raw-HDRNet, which utilizes the raw LDR frames as inputs. We propose a pyramid flow-guided deformation convolution to align neighboring frames. Experimental results demonstrate that 1) the proposed dataset can improve the HDR reconstruction performance on real scenes for three benchmark networks; 2) Compared with sRGB inputs, utilizing raw inputs can further improve the reconstruction quality and our proposed Raw-HDRNet is a strong baseline for raw HDR reconstruction. 
Our dataset and code will be released after the acceptance of this paper.", "output": "HDR Video Reconstruction with a Large Dynamic Dataset in Raw and sRGB Domains."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-object tracking in sports scenes plays a critical role in gathering players statistics, supporting further analysis, such as automatic tactical analysis. Yet existing MOT benchmarks cast little attention on the domain, limiting its development. In this work, we present a new large-scale multi-object tracking dataset in diverse sports scenes, coined as SportsMOT, where all players on the court are supposed to be tracked. It consists of 240 video sequences, over 150K frames (almost 15× MOT17) and over 1.6M bounding boxes (3× MOT17) collected from 3 sports categories, including basketball, volleyball and football. Our dataset is characterized with two key properties: 1) fast and variable-speed motion and 2) similar yet distinguishable appearance. We expect SportsMOT to encourage the MOT trackers to promote in both motion-based association and appearance-based association. We benchmark several state-of-the-art trackers and reveal the key challenge of SportsMOT lies in object association. To alleviate the issue, we further propose a new multi-object tracking framework, termed as MixSort, introducing a MixFormer-like structure as an auxiliary association model to prevailing tracking-by-detection trackers. By integrating the customized appearance-based association with the original motion-based association, MixSort achieves state-of-the-art performance on SportsMOT and MOT17. Based on MixSort, we give an in-depth analysis and provide some profound insights into SportsMOT. 
The dataset and code will be available at", "output": "SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, Meta AI Research approaches a general, promptable Segment Anything Model (SAM) pre-trained on an unprecedentedly large segmentation dataset (SA-1B). Without a doubt, the emergence of SAM will yield significant benefits for a wide array of practical image segmentation applications. In this study, we conduct a series of intriguing investigations into the performance of SAM across various applications, particularly in the fields of natural images, agriculture, manufacturing, remote sensing, and healthcare. We analyze and discuss the benefits and limitations of SAM and provide an outlook on future development of segmentation tasks. Note that our work does not intend to propose new algorithms or theories, but rather provide a comprehensive view of SAM in practice. This work is expected to provide insights that facilitate future research activities toward generic segmentation.", "output": "Segment Anything Is Not Always Perfect: An Investigation of SAM on Different Real-world Applications."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The development of approaches for trajectory prediction requires metrics to validate and compare their performance. Currently established metrics are based on Euclidean distance, which means that errors are weighted equally in all directions. Euclidean metrics are insufficient for structured environments like roads, since they do not properly capture the agent's intent relative to the underlying lane. 
In order to provide a reasonable assessment of trajectory prediction approaches with regard to the downstream planning task, we propose a new metric that is lane distance-based: Lane Miss Rate (LMR). For the calculation of LMR, the ground-truth and predicted endpoints are assigned to lane segments, more precisely their centerlines. Measured by the distance along the lane segments, predictions that are within a certain threshold distance to the ground-truth count as hits, otherwise they count as misses. LMR is then defined as the ratio of sequences that yield a miss. Our results on three state-of-the-art trajectory prediction models show that LMR preserves the order of Euclidean distance-based metrics. In contrast to the Euclidean Miss Rate, qualitative results show that LMR yields misses for sequences where predictions are located on wrong lanes. Hits on the other hand result for sequences where predictions are located on the correct lane. This means that LMR implicitly weights Euclidean error relative to the lane and goes into the direction of capturing intents of traffic agents. The source code of LMR for Argoverse 2 is publicly available.", "output": "LMR: Lane Distance-Based Metric for Trajectory Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present ImageReward -- the first general-purpose text-to-image human preference reward model -- to address various prevalent issues in generative models and align them with human values and preferences. Its training is based on our systematic annotation pipeline that covers both the rating and ranking components, collecting a dataset of 137k expert comparisons to date. In human evaluation, ImageReward outperforms existing scoring methods (e.g., CLIP by 38.6%), making it a promising automatic metric for evaluating and improving text-to-image synthesis. 
The reward model is publicly available via the image-reward package at url{", "output": "ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The swift and precise detection of vehicles holds significant research significance in intelligent transportation systems (ITS). However, current vehicle detection algorithms encounter challenges such as high computational complexity, low detection rate, and limited feasibility on mobile devices. To address these issues, this paper proposes a lightweight vehicle detection algorithm for YOLOv7-tiny called Ghost-YOLOv7. The model first scales the width multiple to 0.5 and replaces the standard convolution of the backbone network with Ghost convolution to achieve a lighter network and improve the detection speed; secondly, a Ghost bi-directional feature pyramid network (Ghost-BiFPN) neck network is designed to enhance feature extraction capability of the algorithm and enrich semantic information; thirdly, a Ghost Decoupled Head (GDH) is employed for accurate prediction of vehicle location and class, enhancing model accuracy; finally, a coordinate attention mechanism is introduced in the output layer to suppress environmental interference, and the WIoU loss function is employed to enhance the detection accuracy further. Experimental results on the PASCAL VOC dataset demonstrate that Ghost-YOLOv7 outperforms the original YOLOv7-tiny model, achieving a 29.8% reduction in computation, 37.3% reduction in the number of parameters, 35.1% reduction in model weights, and 1.1% higher mean average precision (mAP), while achieving a detection speed of 428 FPS. 
These results validate the effectiveness of the proposed method.", "output": "Fast vehicle detection algorithm based on lightweight YOLO7-tiny."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advancements in areas such as natural language processing and computer vision rely on intricate and massive models that have been trained using vast amounts of unlabelled or partly labeled data and training or deploying these state-of-the-art methods to resource constraint environments has been a challenge. Galaxy morphologies are crucial to understanding the processes by which galaxies form and evolve. Efficient methods to classify galaxy morphologies are required to extract physical information from modern-day astronomy surveys. In this paper, we introduce methods to learn from less amounts of data. We propose using a hybrid transformer-convolutional architecture drawing much inspiration from the success of CoAtNet and MaxViT. Concretely, we use the transformer-convolutional hybrid with a new stack design for the network, a different way of creating a relative self-attention layer, and pair it with a careful selection of data augmentation and regularization techniques. Our approach sets a new state-of-the-art on predicting galaxy morphologies from images on the Galaxy10 DECals dataset, a science objective, which consists of 17736 labeled images achieving $94.86%$ top-$1$ accuracy, beating the current state-of-the-art for this task by $4.62%$. Furthermore, this approach also sets a new state-of-the-art on CIFAR-100 and Tiny ImageNet. We also find that models and training methods used for larger datasets would often not work very well in the low-data regime. 
Our code and models will be released at a later date before the conference.", "output": "Astroformer: More Data Might Not be All You Need for Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper clarifies why bias cannot be completely mitigated in Machine Learning (ML) and proposes an end-to-end methodology to translate the ethical principle of justice and fairness into the practice of ML development as an ongoing agreement with stakeholders. The pro-ethical iterative process presented in the paper aims to challenge asymmetric power dynamics in the fairness decision making within ML design and support ML development teams to identify, mitigate and monitor bias at each step of ML systems development. The process also provides guidance on how to explain the always imperfect trade-offs in terms of bias to users.", "output": "Fairness: from the ethical principle to the practice of Machine Learning development as an ongoing agreement with stakeholders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning models for scoring sleep stages based on single-channel EEG have been proposed as a promising method for remote sleep monitoring. However, applying these models to new datasets, particularly from wearable devices, raises two questions. First, when annotations on a target dataset are unavailable, which different data characteristics affect the sleep stage scoring performance the most and by how much? Second, when annotations are available, which dataset should be used as the source of transfer learning to optimize performance? In this paper, we propose a novel method for computationally quantifying the impact of different data characteristics on the transferability of deep learning models. 
Quantification is accomplished by training and evaluating two models with significant architectural differences, TinySleepNet and U-Time, under various transfer configurations in which the source and target datasets have different recording channels, recording environments, and subject conditions. For the first question, the environment had the highest impact on sleep stage scoring performance, with performance degrading by over 14% when sleep annotations were unavailable. For the second question, the most useful transfer sources for TinySleepNet and the U-Time models were MASS-SS1 and ISRUC-SG1, containing a high percentage of N1 (the rarest sleep stage) relative to the others. The frontal and central EEGs were preferred for TinySleepNet. The proposed approach enables full utilization of existing sleep datasets for training and planning model transfer to maximize the sleep stage scoring performance on a target problem when sleep annotations are limited or unavailable, supporting the realization of remote sleep monitoring.", "output": "Quantifying the Impact of Data Characteristics on the Transferability of Sleep Stage Scoring Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reinforcement learning (RL) is a branch of machine learning that has been used in a variety of applications such as robotics, game playing, and autonomous systems. In recent years, there has been growing interest in applying RL to quantitative trading, where the goal is to make profitable trades in financial markets. This paper explores the use of RL in quantitative trading and presents a case study of a RL-based trading algorithm. The results show that RL can be a powerful tool for quantitative trading, and that it has the potential to outperform traditional trading algorithms. 
The use of reinforcement learning in quantitative trading represents a promising area of research that can potentially lead to the development of more sophisticated and effective trading systems. Future work could explore the use of alternative reinforcement learning algorithms, incorporate additional data sources, and test the system on different asset classes. Overall, our research demonstrates the potential of using reinforcement learning in quantitative trading and highlights the importance of continued research and development in this area. By developing more sophisticated and effective trading systems, we can potentially improve the efficiency of financial markets and generate greater returns for investors.", "output": "Quantitative Trading using Deep Q Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Wearable devices for seizure monitoring detection could significantly improve the quality of life of epileptic patients. However, existing solutions that mostly rely on full electrode set of electroencephalogram (EEG) measurements could be inconvenient for every day use. In this paper, we propose a novel knowledge distillation approach to transfer the knowledge from a sophisticated seizure detector (called the teacher) trained on data from the full set of electrodes to learn new detectors (called the student). They are both providing lightweight implementations and significantly reducing the number of electrodes needed for recording the EEG. We consider the case where the teacher and the student seizure detectors are graph neural networks (GNN), since these architectures actively use the connectivity information. 
We consider two cases (a) when a single student is learnt for all the patients using preselected channels; and (b) when personalized students are learnt for every individual patient, with personalized channel selection using a Gumbel-softmax approach. Our experiments on the publicly available Temple University Hospital EEG Seizure Data Corpus (TUSZ) show that both knowledge-distillation and personalization play significant roles in improving performance of seizure detection, particularly for patients with scarce EEG data. We observe that using as few as two channels, we are able to obtain competitive seizure detection performance. This, in turn, shows the potential of our approach in more realistic scenario of wearable devices for personalized monitoring of seizures, even with few recordings.", "output": "Knowledge-Distilled Graph Neural Networks for Personalized Epileptic Seizure Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The proliferation of multi-unit cortical recordings over the last two decades, especially in macaques and during motor-control tasks, has generated interest in neural \"population dynamics\": the time evolution of neural activity across a group of neurons working together. A good model of these dynamics should be able to infer the activity of unobserved neurons within the same population and of the observed neurons at future times. Accordingly, Pandarinath and colleagues have introduced a benchmark to evaluate models on these two (and related) criteria: four data sets, each consisting of firing rates from a population of neurons, recorded from macaque cortex during movement-related tasks. 
Here we show that simple, general-purpose architectures based on recurrent neural networks (RNNs) outperform more \"bespoke\" models, and indeed outperform all published models on all four data sets in the benchmark. Performance can be improved further still with a novel, hybrid architecture that augments the RNN with self-attention, as in transformer networks. But pure transformer models fail to achieve this level of performance, either in our work or that of other groups. We argue that the autoregressive bias imposed by RNNs is critical for achieving the highest levels of performance. We conclude, however, by proposing that the benchmark be augmented with an alternative evaluation of latent dynamics that favors generative over discriminative models like the ones we propose in this report.", "output": "Inferring Population Dynamics in Macaque Cortex."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Next generation cars embed intelligent assessment of car driving safety through innovative solutions often based on usage of artificial intelligence. The safety driving monitoring can be carried out using several methodologies widely treated in scientific literature. In this context, the author proposes an innovative approach that uses ad-hoc bio-sensing system suitable to reconstruct the physio-based attentional status of the car driver. To reconstruct the car driver physiological status, the author proposed the use of a bio-sensing probe consisting of a coupled LEDs at Near infrared (NiR) spectrum with a photodetector. This probe placed over the monitored subject allows to detect a physiological signal called PhotoPlethysmoGraphy (PPG). 
The PPG signal formation is regulated by the change in oxygenated and non-oxygenated hemoglobin concentration in the monitored subject bloodstream which will be directly connected to cardiac activity in turn regulated by the Autonomic Nervous System (ANS) that characterizes the subject's attention level. This so designed car driver drowsiness monitoring will be combined with further driving safety assessment based on correlated intelligent driving scenario understanding.", "output": "Deep Learning Systems for Advanced Driving Assistance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent surge in the number of Electric Vehicles have created a need to develop inexpensive energy-dense Battery Storage Systems. Many countries across the planet have put in place concrete measures to reduce and subsequently limit the number of vehicles powered by fossil fuels. Lithium-ion based batteries are presently dominating the electric automotive sector. Energy research efforts are also focussed on accurate computation of State-of-Charge of such batteries to provide reliable vehicle range estimates. Although such estimation algorithms provide precise estimates, all such techniques available in literature presume availability of superior quality battery datasets. In reality, gaining access to proprietary battery usage datasets is very tough for battery scientists. Moreover, open access datasets lack the diverse battery charge/discharge patterns needed to build generalized models. Curating battery measurement data is time consuming and needs expensive equipment. To surmount such limited data scenarios, we introduce few Deep Learning-based methods to synthesize high-fidelity battery datasets, these augmented synthetic datasets will help battery researchers build better estimation models in the presence of limited data. 
We have released the code and dataset used in the present approach to generate synthetic data. The battery data augmentation techniques introduced here will alleviate limited battery dataset challenges.", "output": "A Deep Learning Approach Towards Generating High-fidelity Diverse Synthetic Battery Datasets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We applied physics-informed neural networks to solve the constitutive relations for nonlinear, path-dependent material behavior. As a result, the trained network not only satisfies all thermodynamic constraints but also instantly provides information about the current material state (i.e., free energy, stress, and the evolution of internal variables) under any given loading scenario without requiring initial data. One advantage of this work is that it bypasses the repetitive Newton iterations needed to solve nonlinear equations in complex material models. Additionally, strategies are provided to reduce the required order of derivation for obtaining the tangent operator. The trained model can be directly used in any finite element package (or other numerical methods) as a user-defined material model. However, challenges remain in the proper definition of collocation points and in integrating several non-equality constraints that become active or non-active simultaneously. We tested this methodology on rate-independent processes such as the classical von Mises plasticity model with a nonlinear hardening law, as well as local damage models for interface cracking behavior with a nonlinear softening law. 
Finally, we discuss the potential and remaining challenges for future developments of this new approach.", "output": "Learning solution of nonlinear constitutive material models using physics-informed neural networks: COMM-PINN."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Combinatorial optimization (CO) aims to efficiently find the best solution to NP-hard problems ranging from statistical physics to social media marketing. A wide range of CO applications can benefit from local search methods because they allow reversible action over greedy policies. Deep Q-learning (DQN) using message-passing neural networks (MPNN) has shown promise in replicating the local search behavior and obtaining comparable results to the local search algorithms. However, the over-smoothing and the information loss during the iterations of message passing limit its robustness across applications, and the large message vectors result in memory inefficiency. Our paper introduces RELS-DQN, a lightweight DQN framework that exhibits the local search behavior while providing practical scalability. Using the RELS-DQN model trained on one application, it can generalize to various applications by providing solution values higher than or equal to both the local search algorithms and the existing DQN models while remaining efficient in runtime and memory.", "output": "RELS-DQN: A Robust and Efficient Local Search Framework for Combinatorial Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Over the past decade, neural network (NN)-based controllers have demonstrated remarkable efficacy in a variety of decision-making tasks. 
However, their black-box nature and the risk of unexpected behaviors and surprising results pose a challenge to their deployment in real-world systems with strong guarantees of correctness and safety. We address these limitations by investigating the transformation of NN-based controllers into equivalent soft decision tree (SDT)-based controllers and its impact on verifiability. Differently from previous approaches, we focus on discrete-output NN controllers including rectified linear unit (ReLU) activation functions as well as argmax operations. We then devise an exact but cost-effective transformation algorithm, in that it can automatically prune redundant branches. We evaluate our approach using two benchmarks from the OpenAI Gym environment. Our results indicate that the SDT transformation can benefit formal verification, showing runtime improvements of up to 21x and 2x for MountainCar-v0 and CartPole-v0, respectively.", "output": "Exact and Cost-Effective Automated Transformation of Neural Network Controllers to Decision Tree Controllers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the continuous improvement of computing power and deep learning algorithms in recent years, the foundation model has grown in popularity. Because of its powerful capabilities and excellent performance, this technology is being adopted and applied by an increasing number of industries. In the intelligent transportation industry, artificial intelligence faces the following typical challenges: few shots, poor generalization, and a lack of multi-modal techniques. Foundation model technology can significantly alleviate the aforementioned issues. 
To address these, we designed the 1st Foundation Model Challenge, with the goal of increasing the popularity of foundation model technology in traffic scenarios and promoting the rapid development of the intelligent transportation industry. The challenge is divided into two tracks: all-in-one and cross-modal image retrieval. Furthermore, we provide a new baseline and benchmark for the two tracks, called Open-TransMind. According to our knowledge, Open-TransMind is the first open-source transportation foundation model with multi-task and multi-modal capabilities. Simultaneously, Open-TransMind can achieve state-of-the-art performance on detection, classification, and segmentation datasets of traffic scenarios. Our source code is available at ", "output": "Open-TransMind: A New Baseline and Benchmark for 1st Foundation Model Challenge of Intelligent Transportation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deploying deep learning models in real-world certified systems requires the ability to provide confidence estimates that accurately reflect their uncertainty. In this paper, we demonstrate the use of the conformal prediction framework to construct reliable and trustworthy predictors for detecting railway signals. Our approach is based on a novel dataset that includes images taken from the perspective of a train operator and state-of-the-art object detectors. We test several conformal approaches and introduce a new method based on conformal risk control. 
Our findings demonstrate the potential of the conformal prediction framework to evaluate model performance and provide practical guidance for achieving formally guaranteed uncertainty bounds.", "output": "Confident Object Detection via Conformal Prediction and Conformal Risk Control: an Application to Railway Signaling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Landslide susceptibility prediction has always been an important and challenging content. However, there are some uncertain problems to be solved in susceptibility modeling, such as the error of landslide samples and the complex nonlinear relationship between environmental factors. A self-screening graph convolutional network and long short-term memory network (SGCN-LSTM) is proposed in this paper to overcome the above problems in landslide susceptibility prediction. The SGCN-LSTM model has the advantages of wide width and good learning ability. The landslide samples with large errors outside the set threshold interval are eliminated by self-screening network, and the nonlinear relationship between environmental factors can be extracted from both spatial nodes and time series, so as to better simulate the nonlinear relationship between environmental factors. 
The SGCN-LSTM model was applied to landslide susceptibility prediction in Anyuan County, Jiangxi Province, China, and compared with Cascade-parallel Long Short-Term Memory and Conditional Random Fields (CPLSTM-CRF), Random Forest (RF), Support Vector Machine (SVM), Stochastic Gradient Descent (SGD) and Logistic Regression (LR) models. The landslide prediction experiment in Anyuan County showed that the total accuracy and AUC of SGCN-LSTM model were the highest among the six models, and the total accuracy reached 92.38%, which was 5.88%, 12.44%, 19.65%, 19.92% and 20.34% higher than those of CPLSTM-CRF, RF, SVM, SGD and LR models, respectively. The AUC value reached 0.9782, which was 0.0305, 0.0532, 0.1875, 0.1909 and 0.1829 higher than the other five models, respectively. In conclusion, compared with some existing traditional machine learning, the SGCN-LSTM model proposed in this paper has higher landslide prediction accuracy and better robustness, and has a good application prospect in the LSP field.", "output": "Landslide Susceptibility Prediction Modeling Based on Self-Screening Deep Learning Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Fairness in AI has garnered quite some attention in research, and increasingly also in society. The so-called \"Impossibility Theorem\" has been one of the more striking research results with both theoretical and practical consequences, as it states that satisfying a certain combination of fairness measures is impossible. To date, this negative result has not yet been complemented with a positive one: a characterization of which combinations of fairness notions are possible. This work aims to fill this gap by identifying maximal sets of commonly used fairness measures that can be simultaneously satisfied. 
The fairness measures used are demographic parity, equal opportunity, false positive parity, predictive parity, predictive equality, overall accuracy equality and treatment equality. We conclude that in total 12 maximal sets of these fairness measures are possible, among which seven combinations of two measures, and five combinations of three measures. Our work raises interesting questions regarding the practical relevance of each of these 12 maximal fairness notions in various scenarios.", "output": "Maximal Fairness."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Ultra-low-resolution Infrared (IR) array sensors offer a low-cost, energy-efficient, and privacy-preserving solution for people counting, with applications such as occupancy monitoring. Previous work has shown that Deep Learning (DL) can yield superior performance on this task. However, the literature was missing an extensive comparative analysis of various efficient DL architectures for IR array-based people counting, that considers not only their accuracy, but also the cost of deploying them on memory- and energy-constrained Internet of Things (IoT) edge nodes. In this work, we address this need by comparing 6 different DL architectures on a novel dataset composed of IR images collected from a commercial 8x8 array, which we made openly available. With a wide architectural exploration of each model type, we obtain a rich set of Pareto-optimal solutions, spanning cross-validated balanced accuracy scores in the 55.70-82.70% range. When deployed on a commercial Microcontroller (MCU) by STMicroelectronics, the STM32L4A6ZG, these models occupy 0.41-9.28kB of memory, and require 1.10-7.74ms per inference, while consuming 17.18-120.43 $\\mu$J of energy. Our models are significantly more accurate than a previous deterministic method (up to +39.9%), while being up to 3.53x faster and more energy efficient. 
Further, our models' accuracy is comparable to state-of-the-art DL solutions on similar resolution sensors, despite a much lower complexity. All our models enable continuous, real-time inference on a MCU-based IoT node, with years of autonomous operation without battery recharging.", "output": "Efficient Deep Learning Models for Privacy-preserving People Counting on Low-resolution Infrared Arrays."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Energy-Based Models (EBMs) are known in the Machine Learning community for the decades. Since the seminal works devoted to EBMs dating back to the noughties there have been appearing a lot of efficient methods which solve the generative modelling problem by means of energy potentials (unnormalized likelihood functions). In contrast, the realm of Optimal Transport (OT) and, in particular, neural OT solvers is much less explored and limited by few recent works (excluding WGAN based approaches which utilize OT as a loss function and do not model OT maps themselves). In our work, we bridge the gap between EBMs and Entropy-regularized OT. We present the novel methodology which allows utilizing the recent developments and technical improvements of the former in order to enrich the latter. 
We validate the applicability of our method on toy 2D scenarios as well as standard unpaired image-to-image translation problems. For the sake of simplicity, we choose simple short- and long- run EBMs as a backbone of our Energy-guided Entropic OT method, leaving the application of more sophisticated EBMs for future research.", "output": "Energy-guided Entropic Neural Optimal Transport."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "N-body simulations are the most powerful method to study the non-linear evolution of large-scale structure. However, they require large amounts of computational resources, making unfeasible their direct adoption in scenarios that require broad explorations of parameter spaces. In this work, we show that it is possible to perform fast dark matter density field emulations with competitive accuracy using simple machine-learning approaches. We build an emulator based on dimensionality reduction and machine learning regression combining simple Principal Component Analysis and supervised learning methods. For the estimations with a single free parameter, we train on the dark matter density parameter, $\\Omega_m$, while for emulations with two free parameters, we train on a range of $\\Omega_m$ and redshift. The method first adopts a projection of a grid of simulations on a given basis; then, a machine learning regression is trained on this projected grid. Finally, new density cubes for different cosmological parameters can be estimated without relying directly on new N-body simulations by predicting and de-projecting the basis coefficients. We show that the proposed emulator can generate density cubes at non-linear cosmological scales with density distributions within a few percent compared to the corresponding N-body simulations. 
The method enables gains of three orders of magnitude in CPU run times compared to performing a full N-body simulation while reproducing the power spectrum and bispectrum within $\\sim 1\\%$ and $\\sim 3\\%$, respectively, for the single free parameter emulation and $\\sim 5\\%$ and $\\sim 15\\%$ for two free parameters. This can significantly accelerate the generation of density cubes for a wide variety of cosmological models, opening the doors to previously unfeasible applications, such as parameter and model inferences at full survey scales as the ESA/NASA Euclid mission.", "output": "Fast emulation of cosmological density fields based on dimensionality reduction and supervised machine-learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper studies the problem of online performance optimization of constrained closed-loop control systems, where both the objective and the constraints are unknown black-box functions affected by exogenous time-varying contextual disturbances. A primal-dual contextual Bayesian optimization algorithm is proposed that achieves sublinear cumulative regret with respect to the dynamic optimal solution under certain regularity conditions. Furthermore, the algorithm achieves zero time-average constraint violation, ensuring that the average value of the constraint function satisfies the desired constraint. The method is applied to both sampled instances from Gaussian processes and a continuous stirred tank reactor parameter tuning problem; simulation results show that the method simultaneously provides close-to-optimal performance and maintains constraint feasibility on average. 
This contrasts current state-of-the-art methods, which either suffer from large cumulative regret or severe constraint violations for the case studies presented.", "output": "Primal-Dual Contextual Bayesian Optimization for Control System Online Optimization with Time-Average Constraints."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative models such as StyleGAN2 and Stable Diffusion have achieved state-of-the-art performance in computer vision tasks such as image synthesis, inpainting, and de-noising. However, current generative models for face inpainting often fail to preserve fine facial details and the identity of the person, despite creating aesthetically convincing image structures and textures. In this work, we propose Person Aware Tuning (PAT) of Mask-Aware Transformer (MAT) for face inpainting, which addresses this issue. Our proposed method, PATMAT, effectively preserves identity by incorporating reference images of a subject and fine-tuning a MAT architecture trained on faces. By using ~40 reference images, PATMAT creates anchor points in MAT's style module, and tunes the model using the fixed anchors to adapt the model to a new face identity. Moreover, PATMAT's use of multiple images per anchor during training allows the model to use fewer reference images than competing methods. We demonstrate that PATMAT outperforms state-of-the-art models in terms of image quality, the preservation of person-specific details, and the identity of the subject. 
Our results suggest that PATMAT can be a promising approach for improving the quality of personalized face inpainting.", "output": "PATMAT: Person Aware Tuning of Mask-Aware Transformer for Face Inpainting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The short-form videos have explosive popularity and have dominated the new social media trends. Prevailing short-video platforms, e.g., Kuaishou (Kwai), TikTok, Instagram Reels, and YouTube Shorts, have changed the way we consume and create content. For video content creation and understanding, the shot boundary detection (SBD) is one of the most essential components in various scenarios. In this work, we release a new public Short video sHot bOundary deTection dataset, named SHOT, consisting of 853 complete short videos and 11,606 shot annotations, with 2,716 high quality shot boundary annotations in 200 test videos. Leveraging this new data wealth, we propose to optimize the model design for video SBD, by conducting neural architecture search in a search space encapsulating various advanced 3D ConvNets and Transformers. Our proposed approach, named AutoShot, achieves higher F1 scores than previous state-of-the-art approaches, e.g., outperforming TransNetV2 by 4.2%, when being derived and evaluated on our newly constructed SHOT dataset. Moreover, to validate the generalizability of the AutoShot architecture, we directly evaluate it on another three public datasets: ClipShots, BBC and RAI, and the F1 scores of AutoShot outperform previous state-of-the-art approaches by 1.1%, 0.9% and 1.2%, respectively. 
The SHOT dataset and code can be found in .", "output": "AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Concept bottleneck models (CBM) are a popular way of creating more interpretable neural networks by having hidden layer neurons correspond to human-understandable concepts. However, existing CBMs and their variants have two crucial limitations: first, they need to collect labeled data for each of the predefined concepts, which is time consuming and labor intensive; second, the accuracy of a CBM is often significantly lower than that of a standard neural network, especially on more complex datasets. This poor performance creates a barrier for adopting CBMs in practical real world applications. Motivated by these challenges, we propose Label-free CBM which is a novel framework to transform any neural network into an interpretable CBM without labeled concept data, while retaining a high accuracy. Our Label-free CBM has many advantages, it is: scalable - we present the first CBM scaled to ImageNet, efficient - creating a CBM takes only a few hours even for very large datasets, and automated - training it for a new dataset requires minimal human effort. Our code is available at ", "output": "Label-Free Concept Bottleneck Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While deep learning models have become the predominant method for medical image segmentation, they are typically not capable of generalizing to unseen segmentation tasks involving new anatomies, image modalities, or labels. 
Given a new segmentation task, researchers generally have to train or fine-tune models, which is time-consuming and poses a substantial barrier for clinical researchers, who often lack the resources and expertise to train neural networks. We present UniverSeg, a method for solving unseen medical segmentation tasks without additional training. Given a query image and example set of image-label pairs that define a new segmentation task, UniverSeg employs a new Cross-Block mechanism to produce accurate segmentation maps without the need for additional training. To achieve generalization to new tasks, we have gathered and standardized a collection of 53 open-access medical segmentation datasets with over 22,000 scans, which we refer to as MegaMedical. We used this collection to train UniverSeg on a diverse set of anatomies and imaging modalities. We demonstrate that UniverSeg substantially outperforms several related methods on unseen tasks, and thoroughly analyze and draw insights about important aspects of the proposed system. The UniverSeg source code and model weights are freely available at ", "output": "UniverSeg: Universal Medical Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As deep learning models increasingly find applications in critical domains such as medical imaging, the need for transparent and trustworthy decision-making becomes paramount. Many explainability methods provide insights into how these models make predictions by attributing importance to input features. As Vision Transformer (ViT) becomes a promising alternative to convolutional neural networks for image classification, its interpretability remains an open research question. 
This paper investigates the performance of various interpretation methods on a ViT applied to classify chest X-ray images. We introduce the notion of evaluating faithfulness, sensitivity, and complexity of ViT explanations. The obtained results indicate that Layerwise relevance propagation for transformers outperforms Local interpretable model-agnostic explanations and Attention visualization, providing a more accurate and reliable representation of what a ViT has actually learned. Our findings provide insights into the applicability of ViT explanations in medical imaging and highlight the importance of using appropriate evaluation criteria for comparing them.", "output": "Towards Evaluating Explanations of Vision Transformers for Medical Imaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Denoising diffusion probabilistic models (DDPMs) employ a sequence of white Gaussian noise samples to generate an image. In analogy with GANs, those noise maps could be considered as the latent code associated with the generated image. However, this native noise space does not possess a convenient structure, and is thus challenging to work with in editing tasks. Here, we propose an alternative latent noise space for DDPM that enables a wide range of editing operations via simple means, and present an inversion method for extracting these edit-friendly noise maps for any given image (real or synthetically generated). As opposed to the native DDPM noise space, the edit-friendly noise maps do not have a standard normal distribution and are not statistically independent across timesteps. However, they allow perfect reconstruction of any desired image, and simple transformations on them translate into meaningful manipulations of the output image (e.g., shifting, color edits). 
Moreover, in text-conditional models, fixing those noise maps while changing the text prompt, modifies semantics while retaining structure. We illustrate how this property enables text-based editing of real images via the diverse DDPM sampling scheme (in contrast to the popular non-diverse DDIM inversion). We also show how it can be used within existing diffusion-based editing methods to improve their quality and diversity.", "output": "An Edit Friendly DDPM Noise Space: Inversion and Manipulations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The growclusters package for R implements an enhanced version of k-means clustering that allows discovery of local clusterings or partitions for a collection of data sets that each draw their cluster means from a single, global partition. The package contains functions to estimate a partition structure for multivariate data. Estimation is performed under a penalized optimization derived from Bayesian non-parametric formulations. This paper describes some of the functions and capabilities of the growclusters package, including the creation of R Shiny applications designed to visually illustrate the operation and functionality of the growclusters package.", "output": "The growclusters Package for R."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we consider the problem of learning a neural network controller for a system required to satisfy a Signal Temporal Logic (STL) specification. We exploit STL quantitative semantics to define a notion of robust satisfaction. Guaranteeing the correctness of a neural network controller, i.e., ensuring the satisfaction of the specification by the controlled system, is a difficult problem that received a lot of attention recently. 
We provide a general procedure to construct a set of trainable High Order Control Barrier Functions (HOCBFs) enforcing the satisfaction of formulas in a fragment of STL. We use the BarrierNet, implemented by a differentiable Quadratic Program (dQP) with HOCBF constraints, as the last layer of the neural network controller, to guarantee the satisfaction of the STL formulas. We train the HOCBFs together with other neural network parameters to further improve the robustness of the controller. Simulation results demonstrate that our approach ensures satisfaction and outperforms existing algorithms.", "output": "Learning Robust and Correct Controllers from Signal Temporal Logic Specifications Using BarrierNet."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As more connected devices are implemented in a cyber-physical world and data is expected to be collected and processed in real time, the ability to handle time series data has become increasingly significant. To help analyze time series in data mining applications, many time series representation approaches have been proposed to convert a raw time series into another series for representing the original time series. However, existing approaches are not designed for open-ended time series (which is a sequence of data points being continuously collected at a fixed interval without any length limit) because these approaches need to know the total length of the target time series in advance and pre-process the entire time series using normalization methods. Furthermore, many representation approaches require users to configure and tune some parameters beforehand in order to achieve satisfactory representation results. 
In this paper, we propose NP-Free, a real-time Normalization-free and Parameter-tuning-free representation approach for open-ended time series. Without needing to use any normalization method or tune any parameter, NP-Free can generate a representation for a raw time series on the fly by converting each data point of the time series into a root-mean-square error (RMSE) value based on Long Short-Term Memory (LSTM) and a Look-Back and Predict-Forward strategy. To demonstrate the capability of NP-Free in representing time series, we conducted several experiments based on real-world open-source time series datasets. We also evaluated the time consumption of NP-Free in generating representations.", "output": "NP-Free: A Real-Time Normalization-free and Parameter-tuning-free Representation Approach for Open-ended Time Series."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transition state (TS) search is key in chemistry for elucidating reaction mechanisms and exploring reaction networks. The search for accurate 3D TS structures, however, requires numerous computationally intensive quantum chemistry calculations due to the complexity of potential energy surfaces. Here, we developed an object-aware SE(3) equivariant diffusion model that satisfies all physical symmetries and constraints for generating pairs of structures, i.e., reactant, TS, and product, in an elementary reaction. Provided reactant and product, this model generates a TS structure in seconds instead of the hours required when performing quantum chemistry-based optimizations. The generated TS structures achieve an average error of 0.13 Å root mean square deviation compared to true TS. 
With a confidence scoring model for uncertainty quantification, we approach an accuracy required for reaction rate estimation (2.6 kcal/mol) by only performing quantum chemistry-based optimizations on 14% of the most challenging reactions. We envision the proposed approach to be useful in constructing and pruning large reaction networks with unknown mechanisms.", "output": "Accurate transition state generation with an object-aware equivariant elementary reaction diffusion model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a policy parameterization for learning-based control on nonlinear, partially-observed dynamical systems. The parameterization is based on a nonlinear version of the Youla parameterization and the recently proposed Recurrent Equilibrium Network (REN) class of models. We prove that the resulting Youla-REN parameterization automatically satisfies stability (contraction) and user-tunable robustness (Lipschitz) conditions on the closed-loop system. This means it can be used for safe learning-based control with no additional constraints or projections required to enforce stability or robustness. 
We test the new policy class in simulation on two reinforcement learning tasks: 1) magnetic suspension, and 2) inverting a rotary-arm pendulum. We find that the Youla-REN performs similarly to existing learning-based and optimal control methods while also ensuring stability and exhibiting improved robustness to adversarial disturbances.", "output": "Learning Over All Contracting and Lipschitz Closed-Loops for Partially-Observed Nonlinear Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Modeling the mechanics of fluid in complex scenes is vital to applications in design, graphics, and robotics. Learning-based methods provide fast and differentiable fluid simulators, however most prior work is unable to accurately model how fluids interact with genuinely novel surfaces not seen during training. We introduce SURFSUP, a framework that represents objects implicitly using signed distance functions (SDFs), rather than an explicit representation of meshes or particles. This continuous representation of geometry enables more accurate simulation of fluid-object interactions over long time periods while simultaneously making computation more efficient. Moreover, SURFSUP trained on simple shape primitives generalizes considerably out-of-distribution, even to complex real-world scenes and objects. Finally, we show we can invert our model to design simple objects to manipulate fluid flow.", "output": "SURFSUP: Learning Fluid Simulation for Novel Surfaces."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Early warning systems (EWS) are prediction algorithms that have recently taken a central role in efforts to improve graduation rates in public schools across the US. 
These systems assist in targeting interventions at individual students by predicting which students are at risk of dropping out. Despite significant investments and adoption, there remain significant gaps in our understanding of the efficacy of EWS. In this work, we draw on nearly a decade's worth of data from a system used throughout Wisconsin to provide the first large-scale evaluation of the long-term impact of EWS on graduation outcomes. We present evidence that risk assessments made by the prediction system are highly accurate, including for students from marginalized backgrounds. Despite the system's accuracy and widespread use, we find no evidence that it has led to improved graduation rates. We surface a robust statistical pattern that can explain why these seemingly contradictory insights hold. Namely, environmental features, measured at the level of schools, contain significant signal about dropout risk. Within each school, however, academic outcomes are essentially independent of individual student performance. This empirical observation indicates that assigning all students within the same school the same probability of graduation is a nearly optimal prediction. Our work provides an empirical backbone for the robust, qualitative understanding among education researchers and policy-makers that dropout is structurally determined. 
The primary barrier to improving outcomes lies not in identifying students at risk of dropping out within specific schools, but rather in overcoming structural differences across different school districts. Our findings indicate that we should carefully evaluate the decision to fund early warning systems without also devoting resources to interventions tackling structural barriers.", "output": "Difficult Lessons on Social Prediction from Wisconsin Public Schools."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Real-time forecasting of travel demand during wildfire evacuations is crucial for emergency managers and transportation planners to make timely and better-informed decisions. However, few studies focus on accurate travel demand forecasting in large-scale emergency evacuations. Therefore, this study develops and tests a new methodological framework for modeling trip generation in wildfire evacuations by using (a) large-scale GPS data generated by mobile devices and (b) state-of-the-art AI technologies. The proposed methodology aims at forecasting evacuation trips and other types of trips. Based on the travel demand inferred from the GPS data, we develop a new deep learning model, i.e., Situational-Aware Multi-Graph Convolutional Recurrent Network (SA-MGCRN), along with a model updating scheme to achieve real-time forecasting of travel demand during wildfire evacuations. The proposed methodological framework is tested in this study for a real-world case study: the 2019 Kincade Fire in Sonoma County, CA. The results show that SA-MGCRN significantly outperforms all the selected state-of-the-art benchmarks in terms of prediction performance. 
Our finding suggests that the most important model components of SA-MGCRN are evacuation order/warning information, proximity to fire, and population change, which are consistent with behavioral theories and empirical findings.", "output": "Situational-Aware Multi-Graph Convolutional Recurrent Network (SA-MGCRN) for Travel Demand Forecasting During Wildfires."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Our recent intensive study has found that physics-informed neural networks (PINN) tend to be local approximators after training. This observation leads to this novel physics-informed radial basis network (PIRBN), which can maintain the local property throughout the entire training process. Compared to deep neural networks, a PIRBN comprises of only one hidden layer and a radial basis \"activation\" function. Under appropriate conditions, we demonstrated that the training of PIRBNs using gradient descent methods can converge to Gaussian processes. Besides, we studied the training dynamics of PIRBN via the neural tangent kernel (NTK) theory. In addition, comprehensive investigations regarding the initialisation strategies of PIRBN were conducted. Based on numerical examples, PIRBN has been demonstrated to be more effective and efficient than PINN in solving PDEs with high-frequency features and ill-posed computational domains. Moreover, the existing PINN numerical techniques, such as adaptive learning, decomposition and different types of loss functions, are applicable to PIRBN. 
The programs that can regenerate all numerical results can be found at ", "output": "Physics-informed radial basis network (PIRBN): A local approximation neural network for solving nonlinear PDEs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurate delineation of key waveforms in an ECG is a critical initial step in extracting relevant features to support the diagnosis and treatment of heart conditions. Although deep learning based methods using a segmentation model to locate P, QRS and T waves have shown promising results, their ability to handle signals exhibiting arrhythmia remains unclear. In this study, we propose a novel approach that leverages a deep learning model to accurately delineate signals with a wide range of arrhythmia. Our approach involves training a segmentation model using a hybrid loss function that combines segmentation with the task of arrhythmia classification. In addition, we use a diverse training set containing various arrhythmia types, enabling our model to handle a wide range of challenging cases. Experimental results show that our model accurately delineates signals with a broad range of abnormal rhythm types, and the combined training with classification guidance can effectively reduce false positive P wave predictions, particularly during atrial fibrillation and atrial flutter. Furthermore, our proposed method shows competitive performance with previous delineation algorithms on the Lobachevsky University Database (LUDB).", "output": "An Arrhythmia Classification-Guided Segmentation Model for Electrocardiogram Delineation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural image compression methods have seen increasingly strong performance in recent years. 
However, they suffer orders of magnitude higher computational complexity compared to traditional codecs, which stands in the way of real-world deployment. This paper takes a step forward in closing this gap in decoding complexity by adopting shallow or even linear decoding transforms. To compensate for the resulting drop in compression performance, we exploit the often asymmetrical computation budget between encoding and decoding, by adopting more powerful encoder networks and iterative encoding. We theoretically formalize the intuition behind, and our experimental results establish a new frontier in the trade-off between rate-distortion and decoding complexity for neural image compression. Specifically, we achieve rate-distortion performance competitive with the established mean-scale hyperprior architecture of Minnen et al. (2018), while reducing the overall decoding complexity by 80 %, or over 90 % for the synthesis transform alone. Our code can be found at ", "output": "Asymmetrically-powered Neural Image Compression with Shallow Decoders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph neural networks (GNNs) demonstrate great performance in compound property and activity prediction due to their capability to efficiently learn complex molecular graph structures. However, two main limitations persist including compound representation and model interpretability. While atom-level molecular graph representations are commonly used because of their ability to capture natural topology, they may not fully express important substructures or functional groups which significantly influence molecular properties. Consequently, recent research proposes alternative representations employing reduction techniques to integrate higher-level information and leverages both representations for model learning. 
However, there is still a lack of study about different molecular graph representations on model learning and interpretation. Interpretability is also crucial for drug discovery as it can offer chemical insights and inspiration for optimization. Numerous studies attempt to include model interpretation to explain the rationale behind predictions, but most of them focus solely on individual prediction with little analysis of the interpretation on different molecular graph representations. This research introduces multiple molecular graph representations that incorporate higher-level information and investigates their effects on model learning and interpretation from diverse perspectives. The results indicate that combining atom graph representation with reduced molecular graph representation can yield promising model performance. Furthermore, the interpretation results can provide significant features and potential substructures consistently aligning with background knowledge. These multiple molecular graph representations and interpretation analysis can bolster model comprehension and facilitate relevant applications in drug discovery.", "output": "Enhancing Model Learning and Interpretation Using Multiple Molecular Graph Representations for Compound Property and Activity Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent applications of deep convolutional neural networks in medical imaging raise concerns about their interpretability. While most explainable deep learning applications use post hoc methods (such as GradCAM) to generate feature attribution maps, there is a new type of case-based reasoning models, namely ProtoPNet and its variants, which identify prototypes during training and compare input image patches with those prototypes. 
We propose the first medical prototype network (MProtoNet) to extend ProtoPNet to brain tumor classification with 3D multi-parametric magnetic resonance imaging (mpMRI) data. To address different requirements between 2D natural images and 3D mpMRIs especially in terms of localizing attention regions, a new attention module with soft masking and online-CAM loss is introduced. Soft masking helps sharpen attention maps, while online-CAM loss directly utilizes image-level labels when training the attention module. MProtoNet achieves statistically significant improvements in interpretability metrics of both correctness and localization coherence (with a best activation precision of $0.713\\pm0.058$) without human-annotated labels during training, when compared with GradCAM and several ProtoPNet variants. The source code is available at", "output": "MProtoNet: A Case-Based Interpretable Model for Brain Tumor Classification with 3D Multi-parametric Magnetic Resonance Imaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A method for the nonintrusive and structure-preserving model reduction of canonical and noncanonical Hamiltonian systems is presented. Based on the idea of operator inference, this technique is provably convergent and reduces to a straightforward linear solve given snapshot data and gray-box knowledge of the system Hamiltonian. 
Examples involving several hyperbolic partial differential equations show that the proposed method yields reduced models which, in addition to being accurate and stable with respect to the addition of basis modes, preserve conserved quantities well outside the range of their training data.", "output": "Canonical and Noncanonical Hamiltonian Operator Inference."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Improving performance in multiple domains is a challenging task, and often requires significant amounts of data to train and test models. Active learning techniques provide a promising solution by enabling models to select the most informative samples for labeling, thus reducing the amount of labeled data required to achieve high performance. In this paper, we present an active learning-based framework for improving performance across multiple domains. Our approach consists of two stages: first, we use an initial set of labeled data to train a base model, and then we iteratively select the most informative samples for labeling to refine the model. We evaluate our approach on several multi-domain datasets, including image classification, sentiment analysis, and object recognition. Our experiments demonstrate that our approach consistently outperforms baseline methods and achieves state-of-the-art performance on several datasets. 
We also show that our method is highly efficient, requiring significantly fewer labeled samples than other active learning-based methods. Overall, our approach provides a practical and effective solution for improving performance across multiple domains using active learning techniques.", "output": "Optimizing Multi-Domain Performance with Active Learning-based Improvement Strategies."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases. Although shielding with Linear Temporal Logic (LTL) is a promising formal method to ensure safety in single-agent Reinforcement Learning (RL), it results in conservative behaviors when scaling to multi-agent scenarios. Additionally, it poses computational challenges for synthesizing shields in complex multi-agent environments. This work introduces Model-based Dynamic Shielding (MBDS) to support MARL algorithm design. Our algorithm synthesizes distributive shields, which are reactive systems running in parallel with each MARL agent, to monitor and rectify unsafe behaviors. The shields can dynamically split, merge, and recompute based on agents' states. This design enables efficient synthesis of shields to monitor agents in complex environments without coordination overheads. We also propose an algorithm to synthesize shields without prior knowledge of the dynamics model. The proposed algorithm obtains an approximate world model by interacting with the environment during the early stage of exploration, making our MBDS enjoy formal safety guarantees with high probability. 
We demonstrate in simulations that our framework can surpass existing baselines in terms of safety guarantees and learning performance.", "output": "Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Labeling mistakes are frequently encountered in real-world applications. If not treated well, the labeling mistakes can deteriorate the classification performances of a model seriously. To address this issue, we propose an improved Naive Bayes method for text classification. It is analytically simple and free of subjective judgements on the correct and incorrect labels. By specifying the generating mechanism of incorrect labels, we optimize the corresponding log-likelihood function iteratively by using an EM algorithm. Our simulation and experiment results show that the improved Naive Bayes method greatly improves the performances of the Naive Bayes method with mislabeled data.", "output": "Improved Naive Bayes with Mislabeled Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Soft-sensors are gaining popularity due to their ability to provide estimates of key process variables with little intervention required on the asset and at a low cost. In oil and gas production, virtual flow metering (VFM) is a popular soft-sensor that attempts to estimate multiphase flow rates in real time. VFMs are based on models, and these models require calibration. The calibration is highly dependent on the application, both due to the great diversity of the models, and in the available measurements. 
The most accurate calibration is achieved by careful tuning of the VFM parameters to well tests, but this can be work intensive, and not all wells have frequent well test data available. This paper presents a calibration method based on the measurement provided by the production separator, and the assumption that the observed flow should be equal to the sum of flow rates from each individual well. This allows us to jointly calibrate the VFMs continuously. The method applies Sequential Monte Carlo (SMC) to infer a tuning factor and the flow composition for each well. The method is tested on a case with ten wells, using both synthetic and real data. The results are promising and the method is able to provide reasonable estimates of the parameters without relying on well tests. However, some challenges are identified and discussed, particularly related to the process noise and how to manage varying data quality.", "output": "Sequential Monte Carlo applied to virtual flow meter calibration."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial training and data augmentation with noise are widely adopted techniques to enhance the performance of neural networks. This paper investigates adversarial training and data augmentation with noise in the context of regularized regression in a reproducing kernel Hilbert space (RKHS). We establish the limiting formula for these techniques as the attack and noise size, as well as the regularization parameter, tend to zero. Based on this limiting formula, we analyze specific scenarios and demonstrate that, without appropriate regularization, these two methods may have larger generalization error and Lipschitz constant than standard kernel regression. 
However, by selecting the appropriate regularization parameter, these two methods can outperform standard kernel regression and achieve smaller generalization error and Lipschitz constant. These findings support the empirical observations that adversarial training can lead to overfitting, and appropriate regularization methods, such as early stopping, can alleviate this issue.", "output": "Understanding Overfitting in Adversarial Training in Kernel Regression."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "When choosing between competing symbolic models for a data set, a human will naturally prefer the \"simpler\" expression or the one which more closely resembles equations previously seen in a similar context. This suggests a non-uniform prior on functions, which is, however, rarely considered within a symbolic regression (SR) framework. In this paper we develop methods to incorporate detailed prior information on both functions and their parameters into SR. Our prior on the structure of a function is based on a $n$-gram language model, which is sensitive to the arrangement of operators relative to one another in addition to the frequency of occurrence of each operator. We also develop a formalism based on the Fractional Bayes Factor to treat numerical parameter priors in such a way that models may be fairly compared through the Bayesian evidence, and explicitly compare Bayesian, Minimum Description Length and heuristic methods for model selection. 
We demonstrate the performance of our priors relative to literature standards on benchmarks and a real-world dataset from the field of cosmology.", "output": "Priors for symbolic regression."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Falls are the public health issue for the elderly all over the world since the fall-induced injuries are associated with a large amount of healthcare cost. Falls can cause serious injuries, even leading to death if the elderly suffers a \"long-lie\". Hence, a reliable fall detection (FD) system is required to provide an emergency alarm for first aid. Due to the advances in wearable device technology and artificial intelligence, some fall detection systems have been developed using machine learning and deep learning methods to analyze the signal collected from accelerometer and gyroscopes. In order to achieve better fall detection performance, an ensemble model that combines a coarse-fine convolutional neural network and gated recurrent unit is proposed in this study. The parallel structure design used in this model restores the different grains of spatial characteristics and capture temporal dependencies for feature representation. This study applies the FallAllD public dataset to validate the reliability of the proposed model, which achieves a recall, precision, and F-score of 92.54%, 96.13%, and 94.26%, respectively. 
The results demonstrate the reliability of the proposed ensemble model in discriminating falls from daily living activities and its superior performance compared to the state-of-the-art convolutional neural network long short-term memory (CNN-LSTM) for FD.", "output": "Deep Learning-based Fall Detection Algorithm Using Ensemble Model of Coarse-fine CNN and GRU Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Heterogeneous graph neural networks aim to discover discriminative node embeddings and relations from multi-relational networks. One challenge of heterogeneous graph learning is the design of learnable meta-paths, which significantly influences the quality of learned embeddings. Thus, in this paper, we propose an Attributed Multi-Order Graph Convolutional Network (AMOGCN), which automatically studies meta-paths containing multi-hop neighbors from an adaptive aggregation of multi-order adjacency matrices. The proposed model first builds different orders of adjacency matrices from manually designed node connections. After that, an intact multi-order adjacency matrix is attached from the automatic fusion of various orders of adjacency matrices. This process is supervised by the node semantic information, which is extracted from the node homophily evaluated by attributes. Eventually, we utilize a one-layer simplifying graph convolutional network with the learned multi-order adjacency matrix, which is equivalent to the cross-hop node information propagation with multi-layer graph neural networks. 
Substantial experiments reveal that AMOGCN gains superior semi-supervised classification performance compared with state-of-the-art competitors.", "output": "Attributed Multi-order Graph Convolutional Network for Heterogeneous Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper proposes a framework for developing forecasting models by streamlining the connections between core components of the developmental process. The proposed framework enables swift and robust integration of new datasets, experimentation on different algorithms, and selection of the best models. We start with the datasets of different issues and apply pre-processing steps to clean and engineer meaningful representations of time-series data. To identify robust training configurations, we introduce a novel mechanism of multiple cross-validation strategies. We apply different evaluation metrics to find the best-suited models for varying applications. One of the referent applications is our participation in the intelligent forecasting competition held by the United States Agency of International Development (USAID). Finally, we leverage the flexibility of the framework by applying different evaluation metrics to assess the performance of the models in inventory management settings.", "output": "Streamlined Framework for Agile Forecasting Model Development towards Efficient Inventory Management."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Effective quantification of uncertainty is an essential and still missing step towards a greater adoption of deep-learning approaches in different applications, including mission-critical ones. 
In particular, investigations on the predictive uncertainty of deep-learning models describing non-linear dynamical systems are very limited to date. This paper is aimed at filling this gap and presents preliminary results on uncertainty quantification for system identification with neural state-space models. We frame the learning problem in a Bayesian probabilistic setting and obtain posterior distributions for the neural network's weights and outputs through approximate inference techniques. Based on the posterior, we construct credible intervals on the outputs and define a surprise index which can effectively diagnose usage of the model in a potentially dangerous out-of-distribution regime, where predictions cannot be trusted.", "output": "Neural State-Space Models: Empirical Evaluation of Uncertainty Quantification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Exact computation of the partition function is known to be intractable, necessitating approximate inference techniques. Existing methods for approximate inference are slow to converge for many benchmarks. The control of accuracy-complexity trade-off is also non-trivial in many of these methods. We propose a novel incremental build-infer-approximate (IBIA) framework for approximate inference that addresses these issues. In this framework, the probabilistic graphical model is converted into a sequence of clique tree forests (SCTF) with bounded clique sizes. We show that the SCTF can be used to efficiently compute the partition function. We propose two new algorithms which are used to construct the SCTF and prove the correctness of both. 
The first is an algorithm for incremental construction of CTFs that is guaranteed to give a valid CTF with bounded clique sizes and the second is an approximation algorithm that takes a calibrated CTF as input and yields a valid and calibrated CTF with reduced clique sizes as the output. We have evaluated our method using several benchmark sets from recent UAI competitions and our results show good accuracies with competitive runtimes.", "output": "IBIA: An Incremental Build-Infer-Approximate Framework for Approximate Inference of Partition Function."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sepsis is a life-threatening organ malfunction caused by the host's inability to fight infection, which can lead to death without proper and immediate treatment. Therefore, early diagnosis and medical treatment of sepsis in critically ill populations at high risk for sepsis and sepsis-associated mortality are vital to providing the patient with rapid therapy. Studies show that advancing sepsis detection by 6 hours leads to earlier administration of antibiotics, which is associated with improved mortality. However, clinical scores like Sequential Organ Failure Assessment (SOFA) are not applicable for early prediction, while machine learning algorithms can help capture the progressing pattern for early prediction. Therefore, we aim to develop a machine learning algorithm that predicts sepsis onset 6 hours before it is suspected clinically. Although some machine learning algorithms have been applied to sepsis prediction, many of them did not consider the fact that six hours is not a small gap. To overcome this big gap challenge, we explore a multi-subset approach in which the likelihood of sepsis occurring earlier than 6 hours is output from a previous subset and fed to the target subset as additional features. 
Moreover, we use the hourly sampled data like vital signs in an observation window to derive a temporal change trend to further assist, which however is often ignored by previous studies. Our empirical study shows that both the multi-subset approach to alleviating the 6-hour gap and the added temporal trend features can help improve the performance of sepsis-related early prediction.", "output": "Multi-Subset Approach to Early Sepsis Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The lack of interpretability of the Vision Transformer may hinder its use in critical real-world applications despite its effectiveness. To overcome this issue, we propose a post-hoc interpretability method called VISION DIFFMASK, which uses the activations of the model's hidden layers to predict the relevant parts of the input that contribute to its final predictions. Our approach uses a gating mechanism to identify the minimal subset of the original input that preserves the predicted distribution over classes. We demonstrate the faithfulness of our method, by introducing a faithfulness task, and comparing it to other state-of-the-art attribution methods on CIFAR-10 and ImageNet-1K, achieving compelling results. To aid reproducibility and further extension of our work, we open source our implementation:", "output": "VISION DIFFMASK: Faithful Interpretation of Vision Transformers with Differentiable Patch Masking."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose an ultrafast automated model compression framework called SeerNet for flexible network deployment. 
Conventional non-differentiable methods discretely search the desirable compression policy based on the accuracy from exhaustively trained lightweight models, and existing differentiable methods optimize an extremely large supernet to obtain the required compressed model for deployment. They both cause heavy computational cost due to the complex compression policy search and evaluation process. On the contrary, we obtain the optimal efficient networks by directly optimizing the compression policy with an accurate performance predictor, where the ultrafast automated model compression for various computational cost constraint is achieved without complex compression policy search and evaluation. Specifically, we first train the performance predictor based on the accuracy from uncertain compression policies actively selected by efficient evolutionary search, so that informative supervision is provided to learn the accurate performance predictor with acceptable cost. Then we leverage the gradient that maximizes the predicted performance under the barrier complexity constraint for ultrafast acquisition of the desirable compression policy, where adaptive update stepsizes with momentum are employed to enhance optimality of the acquired pruning and quantization strategy. 
Compared with the state-of-the-art automated model compression methods, experimental results on image classification and object detection show that our method achieves competitive accuracy-complexity trade-offs with significant reduction of the search cost.", "output": "Learning Accurate Performance Predictors for Ultrafast Automated Model Compression."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces a comprehensive, multi-stage machine learning methodology that effectively integrates information systems and artificial intelligence to enhance decision-making processes within the domain of operations research. The proposed framework adeptly addresses common limitations of existing solutions, such as the neglect of data-driven estimation for vital production parameters, exclusive generation of point forecasts without considering model uncertainty, and lacking explanations regarding the sources of such uncertainty. Our approach employs Quantile Regression Forests for generating interval predictions, alongside both local and global variants of SHapley Additive Explanations for the examined predictive process monitoring problem. The practical applicability of the proposed methodology is substantiated through a real-world production planning case study, emphasizing the potential of prescriptive analytics in refining decision-making procedures. 
This paper accentuates the imperative of addressing these challenges to fully harness the extensive and rich data resources accessible for well-informed decision-making.", "output": "Quantifying and Explaining Machine Learning Uncertainty in Predictive Process Monitoring: An Operations Research Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a systematic investigation into the effectiveness of Self-Supervised Learning (SSL) methods for Electrocardiogram (ECG) arrhythmia detection. We begin by conducting a novel distribution analysis on three popular ECG-based arrhythmia datasets: PTB-XL, Chapman, and Ribeiro. To the best of our knowledge, our study is the first to quantify these distributions in this area. We then perform a comprehensive set of experiments using different augmentations and parameters to evaluate the effectiveness of various SSL methods, namely SimCLR, BYOL, and SwAV, for ECG representation learning, where we observe the best performance achieved by SwAV. Furthermore, our analysis shows that SSL methods achieve highly competitive results to those achieved by supervised state-of-the-art methods. To further assess the performance of these methods on both In-Distribution (ID) and Out-of-Distribution (OOD) ECG data, we conduct cross-dataset training and testing experiments. Our comprehensive experiments show almost identical results when comparing ID and OOD schemes, indicating that SSL techniques can learn highly effective representations that generalize well across different OOD datasets. This finding can have major implications for ECG-based arrhythmia detection. 
Lastly, to further analyze our results, we perform detailed per-disease studies on the performance of the SSL methods on the three datasets.", "output": "In-Distribution and Out-of-Distribution Self-supervised ECG Representation Learning for Arrhythmia Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vision transformers have been applied successfully for image recognition tasks. There have been either multi-headed self-attention based (ViT \\cite{dosovitskiy2020image}, DeIT \\cite{touvron2021training}) similar to the original work in textual models or more recently based on spectral layers (Fnet \\cite{lee2021fnet}, GFNet \\cite{rao2021global}, AFNO \\cite{guibas2021efficient}). We hypothesize that both spectral and multi-headed attention plays a major role. We investigate this hypothesis through this work and observe that indeed combining spectral and multi-headed attention layers provides a better transformer architecture. We thus propose the novel Spectformer architecture for transformers that combines spectral and multi-headed attention layers. We believe that the resulting representation allows the transformer to capture the feature representation appropriately and it yields improved performance over other transformer representations. For instance, it improves the top-1 accuracy by 2% on ImageNet compared to both GFNet-H and LiT. SpectFormer-S reaches 84.25% top-1 accuracy on ImageNet-1K (state of the art for small version). Further, Spectformer-L achieves 85.7% that is the state of the art for the comparable base version of the transformers. We further ensure that we obtain reasonable results in other scenarios such as transfer learning on standard datasets such as CIFAR-10, CIFAR-100, Oxford-IIIT-flower, and Stanford Car datasets. 
We then investigate its use in downstream tasks such as object detection and instance segmentation on the MS-COCO dataset and observe that Spectformer shows consistent performance that is comparable to the best backbones and can be further optimized and improved. Hence, we believe that combined spectral and attention layers are what are needed for vision transformers.", "output": "SpectFormer: Frequency and Attention is what you need in a Vision Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Preserving the individuals' privacy in sharing spatial-temporal datasets is critical to prevent re-identification attacks based on unique trajectories. Existing privacy techniques tend to propose ideal privacy-utility tradeoffs, however, largely ignore the fairness implications of mobility models and whether such techniques perform equally for different groups of users. The quantification between fairness and privacy-aware models is still unclear and there barely exists any defined sets of metrics for measuring fairness in the spatial-temporal context. In this work, we define a set of fairness metrics designed explicitly for human mobility, based on structural similarity and entropy of the trajectories. Under these definitions, we examine the fairness of two state-of-the-art privacy-preserving models that rely on GAN and representation learning to reduce the re-identification rate of users for data sharing. Our results show that while both models guarantee group fairness in terms of demographic parity, they violate individual fairness criteria, indicating that users with highly similar trajectories receive disparate privacy gain. 
We conclude that the tension between the re-identification task and individual fairness needs to be considered for future spatial-temporal data analysis and modelling to achieve a privacy-preserving fairness-aware setting.", "output": "Analysing Fairness of Privacy-Utility Mobility Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Classifying EEG data is integral to the performance of Brain Computer Interfaces (BCI) and their applications. However, external noise often obstructs EEG data due to its biological nature and complex data collection process. Especially when dealing with classification tasks, standard EEG preprocessing approaches extract relevant events and features from the entire dataset. However, these approaches treat all relevant cognitive events equally and overlook the dynamic nature of the brain over time. In contrast, we are inspired by neuroscience studies to use a novel approach that integrates feature selection and time segmentation of EEG data. When tested on the EEGEyeNet dataset, our proposed method significantly increases the performance of Machine Learning classifiers while reducing their respective computational complexity.", "output": "Two Heads are Better than One: A Bio-inspired Method for Improving Classification on EEG-ET Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Wi-Fi channel state information (CSI) has become a promising solution for non-invasive breathing and body motion monitoring during sleep. Sleep disorders of apnea and periodic limb movement disorder (PLMD) are often unconscious and fatal. The existing researches detect abnormal sleep disorders in impractically controlled environments. 
Moreover, it leads to compelling challenges to classify complex macro- and micro-scales of sleep movements as well as entangled similar waveforms of cases of apnea and PLMD. In this paper, we propose the attention-based learning for sleep apnea and limb movement detection (ALESAL) system that can jointly detect sleep apnea and PLMD under different sleep postures across a variety of patients. ALESAL contains antenna-pair and time attention mechanisms for mitigating the impact of modest antenna pairs and emphasizing the duration of interest, respectively. Performance results show that our proposed ALESAL system can achieve a weighted F1-score of 84.33, outperforming the other existing non-attention based methods of support vector machine and deep multilayer perceptron.", "output": "Attention-based Learning for Sleep Apnea and Limb Movement Detection using Wi-Fi CSI Signals."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Electrocardiography is the most common method to investigate the condition of the heart through the observation of cardiac rhythm and electrical activity, for both diagnosis and monitoring purposes. Analysis of electrocardiograms (ECGs) is commonly performed through the investigation of specific patterns, which are visually recognizable by trained physicians and are known to reflect cardiac (dis)function. In this work we study the use of $\\beta$-variational autoencoders (VAEs) as an explainable feature extractor, and improve on its predictive capacities by jointly optimizing signal reconstruction and cardiac function prediction. The extracted features are then used for cardiac function prediction using logistic regression. The method is trained and tested on data from 7255 patients, who were treated for acute coronary syndrome at the Leiden University Medical Center between 2010 and 2021. 
The results show that our method significantly improved prediction and explainability compared to a vanilla $\\beta$-VAE, while still yielding similar reconstruction performance.", "output": "Joint optimization of a $\\beta$-VAE for ECG task-specific feature extraction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Non-Fungible Tokens (NFTs) are non-interchangeable assets, usually digital art, which are stored on the blockchain. Preliminary studies find that female and darker-skinned NFTs are valued less than their male and lighter-skinned counterparts. However, these studies analyze only the CryptoPunks collection. We test the statistical significance of race and gender biases in the prices of CryptoPunks and present the first study of gender bias in the broader NFT market. We find evidence of racial bias but not gender bias. Our work also introduces a dataset of gender-labeled NFT collections to advance the broader study of social equity in this emerging market.", "output": "Exploring Gender and Race Biases in the NFT Market."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sleep abnormalities can have severe health consequences. Automated sleep staging, i.e. labelling the sequence of sleep stages from the patient's physiological recordings, could simplify the diagnostic process. Previous work on automated sleep staging has achieved great results, mainly relying on the EEG signal. However, often multiple sources of information are available beyond EEG. This can be particularly beneficial when the EEG recordings are noisy or even missing completely. 
In this paper, we propose CoRe-Sleep, a Coordinated Representation multimodal fusion network that is particularly focused on improving the robustness of signal analysis on imperfect data. We demonstrate how appropriately handling multimodal information can be the key to achieving such robustness. CoRe-Sleep tolerates noisy or missing modalities segments, allowing training on incomplete data. Additionally, it shows state-of-the-art performance when testing on both multimodal and unimodal data using a single model on SHHS-1, the largest publicly available study that includes sleep stage labels. The results indicate that training the model on multimodal data does positively influence performance when tested on unimodal data. This work aims at bridging the gap between automated analysis tools and their clinical utility.", "output": "CoRe-Sleep: A Multimodal Fusion Framework for Time Series Robust to Imperfect Modalities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "OpenAI has recently released GPT-4 (a.k.a. ChatGPT plus), which is demonstrated to be one small step for generative AI (GAI), but one giant leap for artificial general intelligence (AGI). Since its official release in November 2022, ChatGPT has quickly attracted numerous users with extensive media coverage. Such unprecedented attention has also motivated numerous researchers to investigate ChatGPT from various aspects. According to Google scholar, there are more than 500 articles with ChatGPT in their titles or mentioning it in their abstracts. Considering this, a review is urgently needed, and our work fills this gap. Overall, this work is the first to survey ChatGPT with a comprehensive review of its underlying technology, applications, and challenges. Moreover, we present an outlook on how ChatGPT might evolve to realize general-purpose AIGC (a.k.a. 
AI-generated content), which will be a significant milestone for the development of AGI.", "output": "One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning-based wearable human activity recognition (WHAR) models enable the development of various smart and connected community applications such as sleep pattern monitoring, medication reminders, cognitive health assessment, sports analytics, etc. However, the widespread adoption of these WHAR models is impeded by their degraded performance in the presence of data distribution heterogeneities caused by the sensor placement at different body positions, inherent biases and heterogeneities across devices, and personal and environmental diversities. Various traditional machine learning algorithms and transfer learning techniques have been proposed in the literature to address the underpinning challenges of handling such data heterogeneities. Domain adaptation is one such transfer learning technique that has gained significant popularity in recent literature. In this paper, we survey the recent progress of domain adaptation techniques in the Inertial Measurement Unit (IMU)-based human activity recognition area and discuss potential future directions.", "output": "Domain Adaptation for Inertial Measurement Unit-based Human Activity Recognition: A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The demand for device-free indoor localization using commercial Wi-Fi devices has rapidly increased in various fields due to its convenience and versatile applications. 
However, random frequency offset (RFO) in wireless channels poses challenges to the accuracy of indoor localization when using fluctuating channel state information (CSI). To mitigate the RFO problem, an error vector spectrum (EVS) is conceived thanks to its higher resolution of signal and robustness to RFO. To address these challenges, this paper proposed a novel error vector assisted learning (EVAL) for device-free indoor localization. The proposed EVAL scheme employs deep neural networks to classify the location of a person in the indoor environment by extracting ample channel features from the physical layer signals. We conducted realistic experiments based on OpenWiFi project to extract both EVS and CSI to examine the performance of different device-free localization techniques. Experimental results show that our proposed EVAL scheme outperforms conventional machine learning methods and benchmarks utilizing either CSI amplitude or phase information. Compared to most existing CSI-based localization schemes, a new paradigm with higher positioning accuracy by adopting EVS is revealed by our proposed EVAL system.", "output": "A New Paradigm for Device-free Indoor Localization: Deep Learning with Error Vector Spectrum in Wi-Fi Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Various faults can occur during the operation of PV arrays, and both the dust-affected operating conditions and various diode configurations make the faults more complicated. However, current methods for fault diagnosis based on I-V characteristic curves only utilize partial feature information and often rely on calibrating the field characteristic curves to standard test conditions (STC). 
It is difficult to apply it in practice and to accurately identify multiple complex faults with similarities in different blocking diodes configurations of PV arrays under the influence of dust. Therefore, a novel fault diagnosis method for PV arrays considering dust impact is proposed. In the preprocessing stage, the Isc-Voc normalized Gramian angular difference field (GADF) method is presented, which normalizes and transforms the resampled PV array characteristic curves from the field including I-V and P-V to obtain the transformed graphical feature matrices. Then, in the fault diagnosis stage, the model of convolutional neural network (CNN) with convolutional block attention modules (CBAM) is designed to extract fault differentiation information from the transformed graphical matrices containing full feature information and to classify faults. And different graphical feature transformation methods are compared through simulation cases, and different CNN-based classification methods are also analyzed. The results indicate that the developed method for PV arrays with different blocking diodes configurations under various operating conditions has high fault diagnosis accuracy and reliability.", "output": "Fault diagnosis for PV arrays considering dust impact based on transformed graphical feature of characteristic curves and convolutional neural network with CBAM modules."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neurophysiological time series recordings like the electroencephalogram (EEG) or local field potentials are obtained from multiple sensors. They can be decoded by machine learning models in order to estimate the ongoing brain state of a patient or healthy user. 
In a brain-computer interface (BCI), this decoded brain state information can be used with minimal time delay to either control an application, e.g., for communication or for rehabilitation after stroke, or to passively monitor the ongoing brain state of the subject, e.g., in a demanding work environment. A specific challenge in such decoding tasks is posed by the small dataset sizes in BCI compared to other domains of machine learning like computer vision or natural language processing. A possibility to tackle classification or regression problems in BCI despite small training datasets is through transfer learning, which utilizes data from other sessions, subjects or even datasets to train a model. In this exploratory study, we propose novel domain-specific embeddings for neurophysiological data. Our approach is based on metric learning and builds upon the recently proposed ladder loss. Using embeddings allowed us to benefit, both from the good generalisation abilities and robustness of deep learning and from the fast training of classical machine learning models for subject-specific calibration. In offline analyses using EEG data of 14 subjects, we tested the embeddings' feasibility and compared their efficiency with state-of-the-art deep learning models and conventional machine learning pipelines. In summary, we propose the use of metric learning to obtain pre-trained embeddings of EEG-BCI data as a means to incorporate domain knowledge and to reach competitive performance on novel subjects with minimal calibration requirements.", "output": "An embedding for EEG signals learned using a triplet loss."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Electroencephalography (EEG) is an objective tool for emotion recognition and shows promising performance. 
However, the label scarcity problem is a main challenge in this field, which limits the wide application of EEG-based emotion recognition. In this paper, we propose a novel semi-supervised learning framework (EEGMatch) to leverage both labeled and unlabeled EEG data. First, an EEG-Mixup based data augmentation method is developed to generate more valid samples for model learning. Second, a semi-supervised two-step pairwise learning method is proposed to bridge prototype-wise and instance-wise pairwise learning, where the prototype-wise pairwise learning measures the global relationship between EEG data and the prototypical representation of each emotion class and the instance-wise pairwise learning captures the local intrinsic relationship among EEG data. Third, a semi-supervised multi-domain adaptation is introduced to align the data representation among multiple domains (labeled source domain, unlabeled source domain, and target domain), where the distribution mismatch is alleviated. Extensive experiments are conducted on two benchmark databases (SEED and SEED-IV) under a cross-subject leave-one-subject-out cross-validation evaluation protocol. The results show the proposed EEGMatch performs better than the state-of-the-art methods under different incomplete label conditions (with 6.89% improvement on SEED and 1.44% improvement on SEED-IV), which demonstrates the effectiveness of the proposed EEGMatch in dealing with the label scarcity problem in emotion recognition using EEG signals. The source code is available at", "output": "EEGMatch: Learning with Incomplete Labels for Semi-Supervised EEG-based Cross-Subject Emotion Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Convolutional neural networks learn spatial features and are heavily interlinked within kernels. 
The SE module has broken the traditional route of neural networks passing the entire result to next layer. Instead SE only passes important features to be learned with its squeeze and excitation (SE) module. We propose variations of the SE module which improvises the process of squeeze and excitation and enhances the performance. The proposed squeezing or exciting the layer makes it possible for having a smooth transition of layer weights. These proposed variations also retain the characteristics of SE module. The experimented results are carried out on residual networks and the results are tabulated.", "output": "Variations of Squeeze and Excitation networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This chapter presents some of the fundamental assumptions and principles that could form the philosophical foundation of GeoAI and spatial data science. Instead of reviewing the well-established characteristics of spatial data (analysis), including interaction, neighborhoods, and autocorrelation, the chapter highlights themes such as sustainability, bias in training data, diversity in schema knowledge, and the (potential lack of) neutrality of GeoAI systems from a unifying ethical perspective. 
Reflecting on our profession's ethical implications will assist us in conducting potentially disruptive research more responsibly, identifying pitfalls in designing, training, and deploying GeoAI-based systems, and developing a shared understanding of the benefits but also potential dangers of artificial intelligence and machine learning research across academic fields, all while sharing our unique (geo)spatial perspective with others.", "output": "Philosophical Foundations of GeoAI: Exploring Sustainability, Diversity, and Bias in GeoAI and Spatial Data Science."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cellular user positioning is a promising service provided by Fifth Generation New Radio (5G NR) networks. Besides, Machine Learning (ML) techniques are foreseen to become an integrated part of 5G NR systems improving radio performance and reducing complexity. In this paper, we investigate ML techniques for positioning using 5G NR fingerprints consisting of uplink channel estimates from the physical layer channel. We show that it is possible to use Sounding Reference Signals (SRS) channel fingerprints to provide sufficient data to infer user position. Furthermore, we show that small fully-connected moderately Deep Neural Networks, even when applied to very sparse SRS data, can achieve successful outdoor user positioning with meter-level accuracy in a commercial 5G environment.", "output": "ML-Enabled Outdoor User Positioning in 5G NR Systems via Uplink SRS Channel Estimates."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper considers reliable and secure Spectrum Sensing (SS) based on Federated Learning (FL) in the Cognitive Radio (CR) environment. 
Motivation, architectures, and algorithms of FL in SS are discussed. Security and privacy threats on these algorithms are overviewed, along with possible countermeasures to such attacks. Some illustrative examples are also provided, with design recommendations for FL-based SS in future CRs.", "output": "Secure Federated Learning for Cognitive Radio Sensing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Correct identification and categorization of plant diseases are crucial for ensuring the safety of the global food supply and the overall financial success of stakeholders. In this regard, a wide range of solutions has been made available by introducing deep learning-based classification systems for different staple crops. Despite being one of the most important commercial crops in many parts of the globe, research proposing a smart solution for automatically classifying apple leaf diseases remains relatively unexplored. This study presents a technique for identifying apple leaf diseases based on transfer learning. The system extracts features using a pretrained EfficientNetV2S architecture and passes to a classifier block for effective prediction. The class imbalance issues are tackled by utilizing runtime data augmentation. The effect of various hyperparameters, such as input resolution, learning rate, number of epochs, etc., has been investigated carefully. 
The competence of the proposed pipeline has been evaluated on the apple leaf disease subset from the publicly available `PlantVillage' dataset, where it achieved an accuracy of 99.21%, outperforming the existing works.", "output": "An Efficient Transfer Learning-based Approach for Apple Leaf Disease Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spiking neural networks have attracted extensive attention from researchers in many fields due to their brain-like information processing mechanism. The proposal of surrogate gradient enables the spiking neural networks to migrate to more complex tasks, and gradually close the gap with the conventional artificial neural networks. Current spiking neural networks utilize the output of all moments to produce the final prediction, which compromises their temporal characteristics and causes a reduction in performance and efficiency. We propose a temporal knowledge sharing approach (TKS) that enables the interaction of information between different moments, by selecting the output of specific moments to compose teacher signals to guide the training of the network along with the real labels. We have validated TKS on both static datasets CIFAR10, CIFAR100, ImageNet-1k and neuromorphic datasets DVS-CIFAR10, NCALTECH101. Our experimental results indicate that we have achieved the current optimal performance in comparison with other algorithms. Experiments on Fine-grained classification datasets further demonstrate our algorithm's superiority with CUB-200-2011, StanfordDogs, and StanfordCars. TKS algorithm helps the model to have stronger temporal generalization capability, allowing the network to guarantee performance with large time steps in the training phase and with small time steps in the testing phase. 
This greatly facilitates the deployment of SNNs on edge devices.", "output": "Temporal Knowledge Sharing enable Spiking Neural Network Learning from Past and Future."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents two computationally efficient algorithms for the orientation estimation of inertial measurement units (IMUs): the correntropy-based gradient descent (CGD) and the correntropy-based decoupled orientation estimation (CDOE). Traditional methods, such as gradient descent (GD) and decoupled orientation estimation (DOE), rely on the mean squared error (MSE) criterion, making them vulnerable to external acceleration and magnetic interference. To address this issue, we demonstrate that the multi-kernel correntropy loss (MKCL) is an optimal objective function for maximum likelihood estimation (MLE) when the noise follows a type of heavy-tailed distribution. In certain situations, the estimation error of the MKCL is bounded even in the presence of arbitrarily large outliers. By replacing the standard MSE cost function with MKCL, we develop the CGD and CDOE algorithms. We evaluate the effectiveness of our proposed methods by comparing them with existing algorithms in various situations. Experimental results indicate that our proposed methods (CGD and CDOE) outperform their conventional counterparts (GD and DOE), especially when faced with external acceleration and magnetic disturbances. 
Furthermore, the new algorithms demonstrate significantly lower computational complexity than Kalman filter-based approaches, making them suitable for applications with low-cost microprocessors.", "output": "Multi-kernel Correntropy-based Orientation Estimation of IMUs: Gradient Descent Methods."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Unmanned aerial vehicles (UAV) or drones play many roles in a modern smart city such as the delivery of goods, mapping real-time road traffic and monitoring pollution. The ability of drones to perform these functions often requires the support of machine learning technology. However, traditional machine learning models for drones encounter data privacy problems, communication costs and energy limitations. Federated Learning, an emerging distributed machine learning approach, is an excellent solution to address these issues. Federated learning (FL) allows drones to train local models without transmitting raw data. However, existing FL requires a central server to aggregate the trained model parameters of the UAV. A failure of the central server can significantly impact the overall training. In this paper, we propose two aggregation methods: Commutative FL and Alternate FL, based on the existing architecture of decentralised Federated Learning for UAV Networks (DFL-UN) by adding a unique aggregation method of decentralised FL. Those two methods can effectively control energy consumption and communication cost by controlling the number of local training epochs, local communication, and global communication. The simulation results of the proposed training methods are also presented to verify the feasibility and efficiency of the architecture compared with two benchmark methods (e.g. standard machine learning training and standard single aggregation server training). 
The simulation results show that the proposed methods outperform the benchmark methods in terms of operational stability, energy consumption and communication cost.", "output": "Decentralized federated learning methods for reducing communication cost and energy consumption in UAV networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep reinforcement learning (DRL) has demonstrated its potential in solving complex manufacturing decision-making problems, especially in a context where the system learns over time with actual operation in the absence of training data. One interesting and challenging application for such methods is the assembly sequence planning (ASP) problem. In this paper, we propose an approach to the implementation of DRL methods in ASP. The proposed approach introduces in the RL environment parametric actions to improve training time and sample efficiency and uses two different reward signals: (1) user's preferences and (2) total assembly time duration. The user's preferences signal addresses the difficulties and non-ergonomic properties of the assembly faced by the human and the total assembly time signal enforces the optimization of the assembly. Three of the most powerful deep RL methods were studied, Advantage Actor-Critic (A2C), Deep Q-Learning (DQN), and Rainbow, in two different scenarios: a stochastic and a deterministic one. Finally, the performance of the DRL algorithms was compared to tabular Q-Learning's performance. After 10,000 episodes, the system achieved near optimal behaviour for the algorithms tabular Q-Learning, A2C, and Rainbow. Though, for more complex scenarios, the algorithm tabular Q-Learning is expected to underperform in comparison to the other 2 algorithms. 
The results support the potential for the application of deep reinforcement learning in assembly sequence planning problems with human interaction.", "output": "Deep reinforcement learning applied to an assembly sequence planning problem with user preferences."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Counterfactual explanation methods provide information on how feature values of individual observations must be changed to obtain a desired prediction. Despite the increasing amount of proposed methods in research, only a few implementations exist whose interfaces and requirements vary widely. In this work, we introduce the counterfactuals R package, which provides a modular and unified R6-based interface for counterfactual explanation methods. We implemented three existing counterfactual explanation methods and propose some optional methodological extensions to generalize these methods to different scenarios and to make them more comparable. We explain the structure and workflow of the package using real use cases and show how to integrate additional counterfactual explanation methods into the package. 
In addition, we compared the implemented methods for a variety of models and datasets with regard to the quality of their counterfactual explanations and their runtime behavior.", "output": "counterfactuals: An R Package for Counterfactual Explanation Methods."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training a classifier with noisy labels typically requires the learner to specify the distribution of label noise, which is often unknown in practice. Although there have been some recent attempts to relax that requirement, we show that the Bayes decision rule is unidentified in most classification problems with noisy labels. This suggests it is generally not possible to bypass/relax the requirement. In the special cases in which the Bayes decision rule is identified, we develop a simple algorithm to learn the Bayes decision rule, that does not require knowledge of the noise distribution.", "output": "Bayes classifier cannot be learned from noisy responses with unknown noise rates."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial examples have been found for various deep as well as shallow learning models, and have at various times been suggested to be either fixable model-specific bugs, or else inherent dataset feature, or both. 
We present theoretical and empirical results to show that adversarial examples are approximate discontinuities resulting from models that specify approximately bijective maps $f: \\Bbb R^n \\to \\Bbb R^m; n \\neq m$ over their inputs, and this discontinuity follows from the topological invariance of dimension.", "output": "Adversarial Examples from Dimensional Invariance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models have become ubiquitous in modern life, finding applications in various domains such as natural language processing, language translation, and speech recognition. Recently, a breakthrough work [Zhao, Panigrahi, Ge, and Arora Arxiv 2023] explains the attention model from probabilistic context-free grammar (PCFG). One of the central computation task for computing probability in PCFG is formulating a particular tensor low rank approximation problem, we can call it tensor cycle rank. Given an $n \\times n \\times n$ third order tensor $A$, we say that $A$ has cycle rank-$k$ if there exists three $n \\times k^2$ size matrices $U , V$, and $W$ such that for each entry in each \\begin{align*} A_{a,b,c} = \\sum_{i=1}^k \\sum_{j=1}^k \\sum_{l=1}^k U_{a,i+k(j-1)} \\otimes V_{b, j + k(l-1)} \\otimes W_{c, l + k(i-1) } \\end{align*} for all $a \\in [n], b \\in [n], c \\in [n]$. For the tensor classical rank, tucker rank and train rank, it has been well studied in [Song, Woodruff, Zhong SODA 2019]. 
In this paper, we generalize the previous ``rotation and sketch'' technique in page 186 of [Song, Woodruff, Zhong SODA 2019] and show an input sparsity time algorithm for cycle rank.", "output": "Solving Tensor Low Cycle Rank Approximation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning algorithms play an important role in a variety of important decision-making processes, including targeted advertisement displays, home loan approvals, and criminal behavior predictions. Given the far-reaching impact of these algorithms, it is crucial that they operate fairly, free from bias or prejudice towards certain groups in the population. Ensuring impartiality in these algorithms is essential for promoting equality and avoiding discrimination. To this end we introduce a unified framework for randomized subset selection that incorporates group fairness constraints. Our problem involves a global utility function and a set of group utility functions for each group, here a group refers to a group of individuals (e.g., people) sharing the same attributes (e.g., gender). Our aim is to generate a distribution across feasible subsets, specifying the selection probability of each feasible set, to maximize the global utility function while meeting a predetermined quota for each group utility function in expectation. Note that there may not necessarily be any direct connections between the global utility function and each group utility function. We demonstrate that this framework unifies and generalizes many significant applications in machine learning and operations research. 
Our algorithmic results either improves the best known result or provide the first approximation algorithms for new applications.", "output": "Beyond Submodularity: A Unified Framework of Randomized Set Selection with Group Fairness Constraints."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent works have shown that large models pretrained on common visual learning tasks can provide useful representations for a wide range of specialized perception problems, as well as a variety of robotic manipulation tasks. While prior work on robotic manipulation has predominantly used frozen pretrained features, we demonstrate that in robotics this approach can fail to reach optimal performance, and that fine-tuning of the full model can lead to significantly better results. Unfortunately, fine-tuning disrupts the pretrained visual representation, and causes representational drift towards the fine-tuned task thus leading to a loss of the versatility of the original model. We introduce \"lossless adaptation\" to address this shortcoming of classical fine-tuning. We demonstrate that appropriate placement of our parameter efficient adapters can significantly reduce the performance gap between frozen pretrained representations and full end-to-end fine-tuning without changes to the original representation and thus preserving original capabilities of the pretrained model. 
We perform a comprehensive investigation across three major model architectures (ViTs, NFNets, and ResNets), supervised (ImageNet-1K classification) and self-supervised pretrained weights (CLIP, BYOL, Visual MAE) in 3 task domains and 35 individual tasks, and demonstrate that our claims are strongly validated in various settings.", "output": "Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Without access to the source data, source-free domain adaptation (SFDA) transfers knowledge from a source-domain trained model to target domains. Recently, SFDA has gained popularity due to the need to protect the data privacy of the source domain, but it suffers from catastrophic forgetting on the source domain due to the lack of data. To systematically investigate the mechanism of catastrophic forgetting, we first reimplement previous SFDA approaches within a unified framework and evaluate them on four benchmarks. We observe that there is a trade-off between adaptation gain and forgetting loss, which motivates us to design a consistency regularization to mitigate forgetting. In particular, we propose a continual source-free domain adaptation approach named CoSDA, which employs a dual-speed optimized teacher-student model pair and is equipped with consistency learning capability. Our experiments demonstrate that CoSDA outperforms state-of-the-art approaches in continuous adaptation. 
Notably, our CoSDA can also be integrated with other SFDA methods to alleviate forgetting.", "output": "CoSDA: Continual Source-Free Domain Adaptation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent approaches have attempted to personalize dialogue systems by leveraging profile information into models. However, this knowledge is scarce and difficult to obtain, which makes the extraction/generation of profile information from dialogues a fundamental asset. To surpass this limitation, we introduce the Profile Generation Task (PGTask). We contribute with a new dataset for this problem, comprising profile sentences aligned with related utterances, extracted from a corpus of dialogues. Furthermore, using state-of-the-art methods, we provide a benchmark for profile generation on this novel dataset. Our experiments disclose the challenges of profile generation, and we hope that this introduces a new research direction.", "output": "PGTask: Introducing the Task of Profile Generation from Dialogues."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Controllable text generation (CTG) by large language models has a huge potential to transform education for teachers and students alike. Specifically, high quality and diverse question generation can dramatically reduce the load on teachers and improve the quality of their educational content. Recent work in this domain has made progress with generation, but fails to show that real teachers judge the generated questions as sufficiently useful for the classroom setting; or if instead the questions have errors and/or pedagogically unhelpful content. 
We conduct a human evaluation with teachers to assess the quality and usefulness of outputs from combining CTG and question taxonomies (Bloom's and a difficulty taxonomy). The results demonstrate that the questions generated are high quality and sufficiently useful, showing their promise for widespread use in the classroom setting.", "output": "How Useful are Educational Questions Generated by Large Language Models?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It has been reported that clustering-based topic models, which cluster high-quality sentence embeddings with an appropriate word selection method, can generate better topics than generative probabilistic topic models. However, these approaches suffer from the inability to select appropriate parameters and incomplete models that overlook the quantitative relation between words with topics and topics with text. To solve these issues, we propose graph to topic (G2T), a simple but effective framework for topic modelling. The framework is composed of four modules. First, document representation is acquired using pretrained language models. Second, a semantic graph is constructed according to the similarity between document representations. Third, communities in document semantic graphs are identified, and the relationship between topics and documents is quantified accordingly. Fourth, the word--topic distribution is computed based on a variant of TFIDF. Automatic evaluation suggests that G2T achieved state-of-the-art performance on both English and Chinese documents with different lengths. Human judgements demonstrate that G2T can produce topics with better interpretability and coverage than baselines. In addition, G2T can not only determine the topic number automatically but also give the probabilistic distribution of words in topics and topics in documents. 
Finally, G2T is publicly available, and the distillation experiments provide instruction on how it works.", "output": "G2T: A simple but versatile framework for topic modeling based on pretrained language model and community detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper considers distributed optimization algorithms, with application in binary classification via distributed support-vector-machines (D-SVM) over multi-agent networks subject to some link nonlinearities. The agents solve a consensus-constraint distributed optimization cooperatively via continuous-time dynamics, while the links are subject to strongly sign-preserving odd nonlinear conditions. Logarithmic quantization and clipping (saturation) are two examples of such nonlinearities. In contrast to existing literature that mostly considers ideal links and perfect information exchange over linear channels, we show how general sector-bounded models affect the convergence to the optimizer (i.e., the SVM classifier) over dynamic balanced directed networks. In general, any odd sector-bounded nonlinear mapping can be applied to our dynamics. The main challenge is to show that the proposed system dynamics always have one zero eigenvalue (associated with the consensus) and the other eigenvalues all have negative real parts. This is done by recalling arguments from matrix perturbation theory. Then, the solution is shown to converge to the agreement state under certain conditions. For example, the gradient tracking (GT) step size is tighter than the linear case by factors related to the upper/lower sector bounds. 
To the best of our knowledge, no existing work in distributed optimization and learning literature considers non-ideal link conditions.", "output": "D-SVM over Networked Systems with Non-Ideal Linking Conditions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The remarkable performance of overparameterized deep neural networks (DNNs) must arise from an interplay between network architecture, training algorithms, and structure in the data. To disentangle these three components, we apply a Bayesian picture, based on the functions expressed by a DNN, to supervised learning. The prior over functions is determined by the network, and is varied by exploiting a transition between ordered and chaotic regimes. For Boolean function classification, we approximate the likelihood using the error spectrum of functions on data. When combined with the prior, this accurately predicts the posterior, measured for DNNs trained with stochastic gradient descent. This analysis reveals that structured data, combined with an intrinsic Occam's razor-like inductive bias towards (Kolmogorov) simple functions that is strong enough to counteract the exponential growth of the number of functions with complexity, is a key to the success of DNNs.", "output": "Do deep neural networks have an inbuilt Occam's razor?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spatial control is a core capability in controllable image generation. Advancements in layout-guided image generation have shown promising results on in-distribution (ID) datasets with similar spatial configurations. However, it is unclear how these models perform when facing out-of-distribution (OOD) samples with arbitrary, unseen layouts. 
In this paper, we propose LayoutBench, a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape. We benchmark two recent representative layout-guided image generation methods and observe that the good ID layout control may not generalize well to arbitrary layouts in the wild (e.g., objects at the boundary). Next, we propose IterInpaint, a new baseline that generates foreground and background regions in a step-by-step manner via inpainting, demonstrating stronger generalizability than existing models on OOD layouts in LayoutBench. We perform quantitative and qualitative evaluation and fine-grained analysis on the four LayoutBench skills to pinpoint the weaknesses of existing models. Lastly, we show comprehensive ablation studies on IterInpaint, including training task ratio, crop&paste vs. repaint, and generation order. Project website: ", "output": "Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-objective portfolio optimisation is a critical problem researched across various fields of study as it achieves the objective of maximising the expected return while minimising the risk of a given portfolio at the same time. However, many studies fail to include realistic constraints in the model, which limits practical trading strategies. This study introduces realistic constraints, such as transaction and holding costs, into an optimisation model. Due to the non-convex nature of this problem, metaheuristic algorithms, such as NSGA-II, R-NSGA-II, NSGA-III and U-NSGA-III, will play a vital role in solving the problem. Furthermore, a learnheuristic approach is taken as surrogate models enhance the metaheuristics employed. 
These algorithms are then compared to the baseline metaheuristic algorithms, which solve a constrained, multi-objective optimisation problem without using learnheuristics. The results of this study show that, despite taking significantly longer to run to completion, the learnheuristic algorithms outperform the baseline algorithms in terms of hypervolume and rate of convergence. Furthermore, the backtesting results indicate that utilising learnheuristics to generate weights for asset allocation leads to a lower risk percentage, higher expected return and higher Sharpe ratio than backtesting without using learnheuristics. This leads us to conclude that using learnheuristics to solve a constrained, multi-objective portfolio optimisation problem produces superior and preferable results than solving the problem without using learnheuristics.", "output": "A Learnheuristic Approach to A Constrained Multi-Objective Portfolio Optimisation Problem."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider an important problem in scientific discovery, identifying sparse governing equations for nonlinear dynamical systems. This involves solving sparse ridge regression problems to provable optimality in order to determine which terms drive the underlying dynamics. We propose a fast algorithm, OKRidge, for sparse ridge regression, using a novel lower bound calculation involving, first, a saddle point formulation, and from there, either solving (i) a linear system or (ii) using an ADMM-based approach, where the proximal operators can be efficiently evaluated by solving another linear system and an isotonic regression problem. We also propose a method to warm-start our solver, which leverages a beam search. 
Experimentally, our methods attain provable optimality with run times that are orders of magnitude faster than those of the existing MIP formulations solved by the commercial solver Gurobi.", "output": "OKRidge: Scalable Optimal k-Sparse Ridge Regression for Learning Dynamical Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a novel way of solving the issue of classification of out-of-vocabulary gestures using Artificial Neural Networks (ANNs) trained in the Generative Adversarial Network (GAN) framework. A generative model augments the data set in an online fashion with new samples and stochastic target vectors, while a discriminative model determines the class of the samples. The approach was evaluated on the UC2017 SG and UC2018 DualMyo data sets. The generative models performance was measured with a distance metric between generated and real samples. The discriminative models were evaluated by their accuracy on trained and novel classes. In terms of sample generation quality, the GAN is significantly better than a random distribution (noise) in mean distance, for all classes. In the classification tests, the baseline neural network was not capable of identifying untrained gestures. When the proposed methodology was implemented, we found that there is a trade-off between the detection of trained and untrained gestures, with some trained samples being mistaken as novelty. 
Nevertheless, a novelty detection accuracy of 95.4% or 90.2% (depending on the data set) was achieved with just 5% loss of accuracy on trained classes.", "output": "Improving novelty detection with generative adversarial networks on hand gesture data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have recently become the de-facto approach for generative modeling in the 2D domain. However, extending diffusion models to 3D is challenging due to the difficulties in acquiring 3D ground truth data for training. On the other hand, 3D GANs that integrate implicit 3D representations into GANs have shown remarkable 3D-aware generation when trained only on single-view image datasets. However, 3D GANs do not provide straightforward ways to precisely control image synthesis. To address these challenges, we present Control3Diff, a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis for single-view datasets. Control3Diff explicitly models the underlying latent distribution (optionally conditioned on external inputs), thus enabling direct control during the diffusion process. Moreover, our approach is general and applicable to any type of controlling input, allowing us to train it with the same diffusion objective without any auxiliary supervision. We validate the efficacy of Control3Diff on standard image generation benchmarks, including FFHQ, AFHQ, and ShapeNet, using various conditioning inputs such as images, sketches, and text prompts. 
Please see the project website for video comparisons.", "output": "Learning Controllable 3D Diffusion Models from Single-view Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Individual human decision-makers may benefit from different forms of support to improve decision outcomes. However, a key question is which form of support will lead to accurate decisions at a low cost. In this work, we propose learning a decision support policy that, for a given input, chooses which form of support, if any, to provide. We consider decision-makers for whom we have no prior information and formalize learning their respective policies as a multi-objective optimization problem that trades off accuracy and cost. Using techniques from stochastic contextual bandits, we propose $\\texttt{THREAD}$, an online algorithm to personalize a decision support policy for each decision-maker, and devise a hyper-parameter tuning strategy to identify a cost-performance trade-off using simulated human behavior. We provide computational experiments to demonstrate the benefits of $\\texttt{THREAD}$ compared to offline baselines. We then introduce $\\texttt{Modiste}$, an interactive tool that provides $\\texttt{THREAD}$ with an interface. We conduct human subject experiments to show how $\\texttt{Modiste}$ learns policies personalized to each decision-maker and discuss the nuances of learning decision support policies online for real users.", "output": "Learning Personalized Decision Support Policies."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural Radiance Field training can be accelerated through the use of grid-based representations in NeRF's learned mapping from spatial coordinates to colors and volumetric density. 
However, these grid-based approaches lack an explicit understanding of scale and therefore often introduce aliasing, usually in the form of jaggies or missing scene content. Anti-aliasing has previously been addressed by mip-NeRF 360, which reasons about sub-volumes along a cone rather than points along a ray, but this approach is not natively compatible with current grid-based techniques. We show how ideas from rendering and signal processing can be used to construct a technique that combines mip-NeRF 360 and grid-based models such as Instant NGP to yield error rates that are 8% - 76% lower than either prior technique, and that trains 22x faster than mip-NeRF 360.", "output": "Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Interpretability methods are valuable only if their explanations faithfully describe the explained model. In this work, we consider neural networks whose predictions are invariant under a specific symmetry group. This includes popular architectures, ranging from convolutional to graph neural networks. Any explanation that faithfully explains this type of model needs to be in agreement with this invariance property. We formalize this intuition through the notion of explanation invariance and equivariance by leveraging the formalism from geometric deep learning. Through this rigorous formalism, we derive (1) two metrics to measure the robustness of any interpretability method with respect to the model symmetry group; (2) theoretical robustness guarantees for some popular interpretability methods and (3) a systematic approach to increase the invariance of any interpretability method with respect to a symmetry group. 
By empirically measuring our metrics for explanations of models associated with various modalities and symmetry groups, we derive a set of 5 guidelines to allow users and developers of interpretability methods to produce robust explanations.", "output": "Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Plain text has become a prevalent interface for text-to-image synthesis. However, its limited customization options hinder users from accurately describing desired outputs. For example, plain text makes it hard to specify continuous quantities, such as the precise RGB color value or importance of each word. Furthermore, creating detailed text prompts for complex scenes is tedious for humans to write and challenging for text encoders to interpret. To address these challenges, we propose using a rich-text editor supporting formats such as font style, size, color, and footnote. We extract each word's attributes from rich text to enable local style control, explicit token reweighting, precise color rendering, and detailed region synthesis. We achieve these capabilities through a region-based diffusion process. We first obtain each word's region based on cross-attention maps of a vanilla diffusion process using plain text. For each region, we enforce its text attributes by creating region-specific detailed prompts and applying region-specific guidance. 
We present various examples of image generation from rich text and demonstrate that our method outperforms strong baselines with quantitative evaluations.", "output": "Expressive Text-to-Image Generation with Rich Text."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Previous hypergraph expansions are solely carried out on either vertex level or hyperedge level, thereby missing the symmetric nature of data co-occurrence, and resulting in information loss. To address the problem, this paper treats vertices and hyperedges equally and proposes a new hypergraph formulation named the \\emph{line expansion (LE)} for hypergraphs learning. The new expansion bijectively induces a homogeneous structure from the hypergraph by treating vertex-hyperedge pairs as \"line nodes\". By reducing the hypergraph to a simple graph, the proposed \\emph{line expansion} makes existing graph learning algorithms compatible with the higher-order structure and has been proven as a unifying framework for various hypergraph expansions. We evaluate the proposed line expansion on five hypergraph datasets, the results show that our method beats SOTA baselines by a significant margin.", "output": "Semi-supervised Hypergraph Node Classification on Hypergraph Line Expansion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Overfitting data is a well-known phenomenon related with the generation of a model that mimics too closely (or exactly) a particular instance of data, and may therefore fail to predict future observations reliably. In practice, this behaviour is controlled by various--sometimes heuristics--regularization techniques, which are motivated by developing upper bounds to the generalization error. 
In this work, we study the generalization error of classifiers relying on stochastic encodings trained on the cross-entropy loss, which is often used in deep learning for classification problems. We derive bounds to the generalization error showing that there exists a regime where the generalization error is bounded by the mutual information between input features and the corresponding representations in the latent space, which are randomly generated according to the encoding distribution. Our bounds provide an information-theoretic understanding of generalization in the so-called class of variational classifiers, which are regularized by a Kullback-Leibler (KL) divergence term. These results give theoretical grounds for the highly popular KL term in variational inference methods that was already recognized to act effectively as a regularization penalty. We further observe connections with well studied notions such as Variational Autoencoders, Information Dropout, Information Bottleneck and Boltzmann Machines. Finally, we perform numerical experiments on MNIST and CIFAR datasets and show that mutual information is indeed highly representative of the behaviour of the generalization error.", "output": "The Role of Mutual Information in Variational Classifiers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Univariate and multivariate normal probability distributions are widely used when modeling decisions under uncertainty. Computing the performance of such models requires integrating these distributions over specific domains, which can vary widely across models. Besides some special cases, there exist no general analytical expressions, standard numerical methods or software for these integrals. 
Here we present mathematical results and open-source software that provide (i) the probability in any domain of a normal in any dimensions with any parameters, (ii) the probability density, cumulative distribution, and inverse cumulative distribution of any function of a normal vector, (iii) the classification errors among any number of normal distributions, the Bayes-optimal discriminability index and relation to the operating characteristic, (iv) dimension reduction and visualizations for such problems, and (v) tests for how reliably these methods may be used on given data. We demonstrate these tools with vision research applications of detecting occluding objects in natural scenes, and detecting camouflage.", "output": "A method to integrate and classify normal distributions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Areas under ROC (AUROC) and precision-recall curves (AUPRC) are common metrics for evaluating classification performance for imbalanced problems. Compared with AUROC, AUPRC is a more appropriate metric for highly imbalanced datasets. While stochastic optimization of AUROC has been studied extensively, principled stochastic optimization of AUPRC has been rarely explored. In this work, we propose a principled technical method to optimize AUPRC for deep learning. Our approach is based on maximizing the averaged precision (AP), which is an unbiased point estimator of AUPRC. We cast the objective into a sum of {\\it dependent compositional functions} with inner functions dependent on random variables of the outer level. We propose efficient adaptive and non-adaptive stochastic algorithms named SOAP with {\\it provable convergence guarantee under mild conditions} by leveraging recent advances in stochastic compositional optimization. 
Extensive experimental results on image and graph datasets demonstrate that our proposed method outperforms prior methods on imbalanced problems in terms of AUPRC. To the best of our knowledge, our work represents the first attempt to optimize AUPRC with provable convergence. The SOAP has been implemented in the libAUC library.", "output": "Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the infinite-horizon LQR control problem. Motivated by competitive analysis in online learning, as a criterion for controller design we introduce the dynamic regret, defined as the difference between the LQR cost of a causal controller (that has only access to past disturbances) and the LQR cost of the \\emph{unique} clairvoyant one (that has also access to future disturbances) that is known to dominate all other controllers. The regret itself is a function of the disturbances, and we propose to find a causal controller that minimizes the worst-case regret over all bounded energy disturbances. The resulting controller has the interpretation of guaranteeing the smallest regret compared to the best non-causal controller that can see the future. We derive explicit formulas for the optimal regret and for the regret-optimal controller for the state-space setting. These explicit solutions are obtained by showing that the regret-optimal control problem can be reduced to a Nehari extension problem that can be solved explicitly. The regret-optimal controller is shown to be linear and can be expressed as the sum of the classical $H_2$ state-feedback law and an $n$-th order controller ($n$ is the state dimension), and its construction simply requires a solution to the standard LQR Riccati equation and two Lyapunov equations. 
Simulations over a range of plants demonstrate that the regret-optimal controller interpolates nicely between the $H_2$ and the $H_\\infty$ optimal controllers, and generally has $H_2$ and $H_\\infty$ costs that are simultaneously close to their optimal values. The regret-optimal controller thus presents itself as a viable option for control systems design.", "output": "Regret-Optimal LQR Control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In federated learning (FL), data does not leave personal devices when they are jointly training a machine learning model. Instead, these devices share gradients, parameters, or other model updates, with a central party (e.g., a company) coordinating the training. Because data never \"leaves\" personal devices, FL is often presented as privacy-preserving. Yet, recently it was shown that this protection is but a thin facade, as even a passive, honest-but-curious attacker observing gradients can reconstruct data of individual users contributing to the protocol. In this work, we show a novel data reconstruction attack which allows an active and dishonest central party to efficiently extract user data from the received gradients. While prior work on data reconstruction in FL relies on solving computationally expensive optimization problems or on making easily detectable modifications to the shared model's architecture or parameters, in our attack the central party makes inconspicuous changes to the shared model's weights before sending them out to the users. We call the modified weights of our attack trap weights. Our active attacker is able to recover user data perfectly, i.e., with zero error, even when this data stems from the same class. Recovery comes with near-zero costs: the attack requires no complex optimization objectives. 
Instead, our attacker exploits inherent data leakage from model gradients and simply amplifies this effect by maliciously altering the weights of the shared model through the trap weights. These specificities enable our attack to scale to fully-connected and convolutional deep neural networks trained with large mini-batches of data. For example, for the high-dimensional vision dataset ImageNet, we perfectly reconstruct more than 50% of the training data points from mini-batches as large as 100 data points.", "output": "When the Curious Abandon Honesty: Federated Learning Is Not Private."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vertical federated learning (VFL) system has recently become prominent as a concept to process data distributed across many individual sources without the need to centralize it. Multiple participants collaboratively train models based on their local data in a privacy-aware manner. To date, VFL has become a de facto solution to securely learn a model among organizations, allowing knowledge to be shared without compromising privacy of any individuals. Despite the prosperous development of VFL systems, we find that certain inputs of a participant, named adversarial dominating inputs (ADIs), can dominate the joint inference towards the direction of the adversary's will and force other (victim) participants to make negligible contributions, losing rewards that are usually offered regarding the importance of their contributions in federated learning scenarios. We conduct a systematic study on ADIs by first proving their existence in typical VFL systems. We then propose gradient-based methods to synthesize ADIs of various formats and exploit common VFL systems. 
We further launch greybox fuzz testing, guided by the saliency score of ``victim'' participants, to perturb adversary-controlled inputs and systematically explore the VFL attack surface in a privacy-preserving manner. We conduct an in-depth study on the influence of critical parameters and settings in synthesizing ADIs. Our study reveals new VFL attack opportunities, promoting the identification of unknown threats before breaches and building more secure VFL systems.", "output": "ADI: Adversarial Dominating Inputs in Vertical Federated Learning Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning optimal control policies directly on physical systems is challenging since even a single failure can lead to costly hardware damage. Most existing model-free learning methods that guarantee safety, i.e., no failures, during exploration are limited to local optima. A notable exception is the GoSafe algorithm, which, unfortunately, cannot handle high-dimensional systems and hence cannot be applied to most real-world dynamical systems. This work proposes GoSafeOpt as the first algorithm that can safely discover globally optimal policies for high-dimensional systems while giving safety and optimality guarantees. We demonstrate the superiority of GoSafeOpt over competing model-free safe learning methods on a robot arm that would be prohibitive for GoSafe.", "output": "GoSafeOpt: Scalable Safe Exploration for Global Optimization of Dynamical Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Methods for teaching motion skills to robots focus on training for a single skill at a time. 
Robots capable of learning from demonstration can considerably benefit from the added ability to learn new movement skills without forgetting what was learned in the past. To this end, we propose an approach for continual learning from demonstration using hypernetworks and neural ordinary differential equation solvers. We empirically demonstrate the effectiveness of this approach in remembering long sequences of trajectory learning tasks without the need to store any data from past demonstrations. Our results show that hypernetworks outperform other state-of-the-art continual learning approaches for learning from demonstration. In our experiments, we use the popular LASA benchmark, and two new datasets of kinesthetic demonstrations collected with a real robot that we introduce in this paper called the HelloWorld and RoboTasks datasets. We evaluate our approach on a physical robot and demonstrate its effectiveness in learning real-world robotic tasks involving changing positions as well as orientations. We report both trajectory error metrics and continual learning metrics, and we propose two new continual learning metrics. Our code, along with the newly collected datasets, is available at ", "output": "Continual Learning from Demonstration of Robotics Skills."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a new consistency-based approach for defining and solving nonnegative/positive matrix and tensor completion problems. The novelty of the framework is that instead of artificially making the problem well-posed in the form of an application-arbitrary optimization problem, e.g., minimizing a bulk structural measure such as rank or norm, we show that a single property/constraint: preserving unit-scale consistency, guarantees the existence of both a solution and, under relatively weak support assumptions, uniqueness. 
The framework and solution algorithms also generalize directly to tensors of arbitrary dimensions while maintaining computational complexity that is linear in problem size for fixed dimension d. In the context of recommender system (RS) applications, we prove that two reasonable properties that should be expected to hold for any solution to the RS problem are sufficient to permit uniqueness guarantees to be established within our framework. Key theoretical contributions include a general unit-consistent tensor-completion framework with proofs of its properties, e.g., consensus-order and fairness, and algorithms with optimal runtime and space complexities, e.g., O(1) term-completion with preprocessing complexity that is linear in the number of known terms of the matrix/tensor. From a practical perspective, the seamless ability of the framework to generalize to exploit high-dimensional structural relationships among key state variables, e.g., user and product attributes, offers a means for extracting significantly more information than is possible for alternative methods that cannot generalize beyond direct user-product relationships. Finally, we propose our consensus ordering property as an admissibility criterion for any proposed RS method.", "output": "Tensor Completion with Provable Consistency and Fairness Guarantees for Recommender Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "``Benign overfitting'', the ability of certain algorithms to interpolate noisy training data and yet perform well out-of-sample, has been a topic of considerable recent interest. We show, using a fixed design setup, that an important class of predictors, kernel machines with translation-invariant kernels, does not exhibit benign overfitting in fixed dimensions. 
In particular, the estimated predictor does not converge to the ground truth with increasing sample size, for any non-zero regression function and any (even adaptive) bandwidth selection. To prove these results, we give exact expressions for the generalization error, and its decomposition in terms of an approximation error and an estimation error that elicits a trade-off based on the selection of the kernel bandwidth. Our results apply to commonly used translation-invariant kernels such as Gaussian, Laplace, and Cauchy.", "output": "On the Inconsistency of Kernel Ridgeless Regression in Fixed Dimensions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A significant gap remains between today's visual pattern recognition models and human-level visual cognition especially when it comes to few-shot learning and compositional reasoning of novel concepts. We introduce Bongard-HOI, a new visual reasoning benchmark that focuses on compositional learning of human-object interactions (HOIs) from natural images. It is inspired by two desirable characteristics from the classical Bongard problems (BPs): 1) few-shot concept learning, and 2) context-dependent reasoning. We carefully curate the few-shot instances with hard negatives, where positive and negative images only disagree on action labels, making mere recognition of object categories insufficient to complete our benchmarks. We also design multiple test sets to systematically study the generalization of visual learning models, where we vary the overlap of the HOI concepts between the training and test sets of few-shot instances, from partial to no overlaps. Bongard-HOI presents a substantial challenge to today's visual recognition models. 
The state-of-the-art HOI detection model achieves only 62% accuracy on few-shot binary prediction while even amateur human testers on MTurk have 91% accuracy. With the Bongard-HOI benchmark, we hope to further advance research efforts in visual reasoning, especially in holistic perception-reasoning systems and better representation learning.", "output": "Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The most popular methods for measuring importance of the variables in a black box prediction algorithm make use of synthetic inputs that combine predictor variables from multiple subjects. These inputs can be unlikely, physically impossible, or even logically impossible. As a result, the predictions for such cases can be based on data very unlike any the black box was trained on. We think that users cannot trust an explanation of the decision of a prediction algorithm when the explanation uses such values. Instead we advocate a method called Cohort Shapley that is grounded in economic game theory and unlike most other game theoretic methods, it uses only actually observed data to quantify variable importance. Cohort Shapley works by narrowing the cohort of subjects judged to be similar to a target subject on one or more features. 
We illustrate it on an algorithmic fairness problem where it is essential to attribute importance to protected variables that the model was not trained on.", "output": "Variable importance without impossible data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While diffusion models have shown great success in image generation, their noise-inverting generative process does not explicitly consider the structure of images, such as their inherent multi-scale nature. Inspired by diffusion models and the empirical success of coarse-to-fine modelling, we propose a new diffusion-like model that generates images through stochastically reversing the heat equation, a PDE that locally erases fine-scale information when run over the 2D plane of the image. We interpret the solution of the forward heat equation with constant additive noise as a variational approximation in the diffusion latent variable model. Our new model shows emergent qualitative properties not seen in standard diffusion models, such as disentanglement of overall colour and shape in images. Spectral analysis on natural images highlights connections to diffusion models and reveals an implicit coarse-to-fine inductive bias in them.", "output": "Generative Modelling With Inverse Heat Dissipation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Densely annotating LiDAR point clouds is costly, which restrains the scalability of fully-supervised learning methods. In this work, we study the underexplored semi-supervised learning (SSL) in LiDAR segmentation. Our core idea is to leverage the strong spatial cues of LiDAR point clouds to better exploit unlabeled data. 
We propose LaserMix to mix laser beams from different LiDAR scans, and then encourage the model to make consistent and confident predictions before and after mixing. Our framework has three appealing properties: 1) Generic: LaserMix is agnostic to LiDAR representations (e.g., range view and voxel), and hence our SSL framework can be universally applied. 2) Statistically grounded: We provide a detailed analysis to theoretically explain the applicability of the proposed framework. 3) Effective: Comprehensive experimental analysis on popular LiDAR segmentation datasets (nuScenes, SemanticKITTI, and ScribbleKITTI) demonstrates our effectiveness and superiority. Notably, we achieve competitive results over fully-supervised counterparts with 2x to 5x fewer labels and improve the supervised-only baseline significantly by 10.8% on average. We hope this concise yet high-performing framework could facilitate future research in semi-supervised LiDAR segmentation. Code is publicly available.", "output": "LaserMix for Semi-Supervised LiDAR Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Given its vast application on online social networks, Influence Maximization (IM) has garnered considerable attention over the last couple of decades. Due to the intricacy of IM, most current research concentrates on estimating the first-order contribution of the nodes to select a seed set, disregarding the higher-order interplay between different seeds. Consequently, the actual influence spread frequently deviates from expectations, and it remains unclear how the seed set quantitatively contributes to this deviation. To address this deficiency, this work dissects the influence exerted on individual seeds and their higher-order interactions utilizing the Sobol index, a variance-based sensitivity analysis. 
To adapt to IM contexts, seed selection is phrased as binary variables and split into distributions of varying orders. Based on our analysis with various Sobol indices, an IM algorithm dubbed SIM is proposed to improve the performance of current IM algorithms by over-selecting nodes followed by strategic pruning. A case study is carried out to demonstrate that the explanation of the impact effect can dependably identify the key higher-order interactions among seeds. SIM is empirically proved to be superior in effectiveness and competitive in efficiency by experiments on synthetic and real-world graphs.", "output": "Understanding Influence Maximization via Higher-Order Decomposition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data-driven reduced-order models often fail to make accurate forecasts of high-dimensional nonlinear dynamical systems that are sensitive along coordinates with low-variance because such coordinates are often truncated, e.g., by proper orthogonal decomposition, kernel principal component analysis, and autoencoders. Such systems are encountered frequently in shear-dominated fluid flows where non-normality plays a significant role in the growth of disturbances. In order to address these issues, we employ ideas from active subspaces to find low-dimensional systems of coordinates for model reduction that balance adjoint-based information about the system's sensitivity with the variance of states along trajectories. The resulting method, which we refer to as covariance balancing reduction using adjoint snapshots (CoBRAS), is analogous to balanced truncation with state and adjoint-based gradient covariance matrices replacing the system Gramians and obeying the same key transformation laws. Here, the extracted coordinates are associated with an oblique projection that can be used to construct Petrov-Galerkin reduced-order models. 
We provide an efficient snapshot-based computational method analogous to balanced proper orthogonal decomposition. This also leads to the observation that the reduced coordinates can be computed relying on inner products of state and gradient samples alone, allowing us to find rich nonlinear coordinates by replacing the inner product with a kernel function. In these coordinates, reduced-order models can be learned using regression. We demonstrate these techniques and compare to a variety of other methods on a simple, yet challenging three-dimensional system and a nonlinear axisymmetric jet flow simulation with $10^5$ state variables.", "output": "Model Reduction for Nonlinear Systems by Balanced Truncation of State and Gradient Covariance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the problem of graph structure identification, i.e., of recovering the graph of dependencies among time series. We model these time series data as components of the state of linear stochastic networked dynamical systems. We assume partial observability, where the state evolution of only a subset of nodes comprising the network is observed. We devise a new feature vector computed from the observed time series and prove that these features are linearly separable, i.e., there exists a hyperplane that separates the cluster of features associated with connected pairs of nodes from those associated with disconnected pairs. This renders the features amenable to train a variety of classifiers to perform causal inference. In particular, we use these features to train Convolutional Neural Networks (CNNs). The resulting causal inference mechanism outperforms state-of-the-art counterparts w.r.t. sample-complexity. The trained CNNs generalize well over structurally distinct networks (dense or sparse) and noise-level profiles. 
Remarkably, they also generalize well to real-world networks while trained over a synthetic network (realization of a random graph). Finally, the proposed method consistently reconstructs the graph in a pairwise manner, that is, by deciding if an edge or arrow is present or absent in each pair of nodes, from the corresponding time series of each pair. This fits the framework of large-scale systems, where observation or processing of all nodes in the network is prohibitive.", "output": "Recovering the Graph Underlying Networked Dynamical Systems under Partial Observability: A Deep Learning Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural Network (Deep Learning) is a modern model in Artificial Intelligence and it has been exploited in Survival Analysis. Although several improvements have been shown by previous works, training an excellent deep learning model requires a huge amount of data, which may not hold in practice. To address this challenge, we develop a Kullback-Leibler-based (KL) deep learning procedure to integrate external survival prediction models with newly collected time-to-event data. Time-dependent KL discrimination information is utilized to measure the discrepancy between the external and internal data. This is the first work considering using prior information to deal with short data problem in Survival Analysis for deep learning. Simulation and real data results show that the proposed model achieves better performance and higher robustness compared with previous works.", "output": "KL-divergence Based Deep Learning for Discrete Time Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "How can we design protein sequences folding into the desired structures effectively and efficiently? 
AI methods for structure-based protein design have attracted increasing attention in recent years; however, few methods can simultaneously improve the accuracy and efficiency due to the lack of expressive features and autoregressive sequence decoder. To address these issues, we propose PiFold, which contains a novel residue featurizer and PiGNN layers to generate protein sequences in a one-shot way with improved recovery. Experiments show that PiFold could achieve 51.66% recovery on CATH 4.2, while the inference speed is 70 times faster than the autoregressive competitors. In addition, PiFold achieves 58.72% and 60.42% recovery scores on TS50 and TS500, respectively. We conduct comprehensive ablation studies to reveal the role of different types of protein features and model designs, inspiring further simplification and improvement. The PyTorch code is available at href{", "output": "PiFold: Toward effective and efficient protein inverse folding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The method of random Fourier features (RFF), proposed in a seminal paper by Rahimi and Recht (NIPS'07), is a powerful technique to find approximate low-dimensional representations of points in (high-dimensional) kernel space, for shift-invariant kernels. While RFF has been analyzed under various notions of error guarantee, the ability to preserve the kernel distance with \\emph{relative} error is less understood. We show that for a significant range of kernels, including the well-known Laplacian kernels, RFF cannot approximate the kernel distance with small relative error using low dimensions. 
We complement this by showing as long as the shift-invariant kernel is analytic, RFF with $\\mathrm{poly}(\\epsilon^{-1} \\log n)$ dimensions achieves $\\epsilon$-relative error for pairwise kernel distance of $n$ points, and the dimension bound is improved to $\\mathrm{poly}(\\epsilon^{-1} \\log k)$ for the specific application of kernel $k$-means. Finally, going beyond RFF, we make the first step towards data-oblivious dimension-reduction for general shift-invariant kernels, and we obtain a similar $\\mathrm{poly}(\\epsilon^{-1} \\log n)$ dimension bound for Laplacian kernels. We also validate the dimension-error tradeoff of our methods on simulated datasets, and they demonstrate superior performance compared with other popular methods including random-projection and Nyström methods.", "output": "On The Relative Error of Random Fourier Features for Preserving Kernel Distance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Bayesian neural networks (BNNs) have received an increased interest in the last years. In BNNs, a complete posterior distribution of the unknown weight and bias parameters of the network is produced during the training stage. This probabilistic estimation offers several advantages with respect to point-wise estimates, in particular, the ability to provide uncertainty quantification when predicting new data. This feature inherent to the Bayesian paradigm, is useful in countless machine learning applications. It is particularly appealing in areas where decision-making has a crucial impact, such as medical healthcare or autonomous driving. The main challenge of BNNs is the computational cost of the training procedure since Bayesian techniques often face a severe curse of dimensionality. Adaptive importance sampling (AIS) is one of the most prominent Monte Carlo methodologies benefiting from sounded convergence guarantees and ease for adaptation. 
This work aims to show that AIS constitutes a successful approach for designing BNNs. More precisely, we propose a novel algorithm PMCnet that includes an efficient adaptation mechanism, exploiting geometric information on the complex (often multimodal) posterior distribution. Numerical results illustrate the excellent performance and the improved exploration capabilities of the proposed method for both shallow and deep neural networks.", "output": "Efficient Bayes Inference in Neural Networks through Adaptive Importance Sampling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Classifier-free guided diffusion models have recently been shown to be highly effective at high-resolution image generation, and they have been widely used in large-scale diffusion frameworks including DALLE-2, Stable Diffusion and Imagen. However, a downside of classifier-free guided diffusion models is that they are computationally expensive at inference time since they require evaluating two diffusion models, a class-conditional model and an unconditional model, tens to hundreds of times. To deal with this limitation, we propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from: Given a pre-trained classifier-free guided model, we first learn a single model to match the output of the combined conditional and unconditional models, and then we progressively distill that model to a diffusion model that requires much fewer sampling steps. 
For standard diffusion models trained on the pixel-space, our approach is able to generate images visually comparable to that of the original model using as few as 4 sampling steps on ImageNet 64x64 and CIFAR-10, achieving FID/IS scores comparable to that of the original model while being up to 256 times faster to sample from. For diffusion models trained on the latent-space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps, accelerating inference by at least 10-fold compared to existing methods on ImageNet 256x256 and LAION datasets. We further demonstrate the effectiveness of our approach on text-guided image editing and inpainting, where our distilled model is able to generate high-quality results using as few as 2-4 denoising steps.", "output": "On Distillation of Guided Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Progress in deep learning highlights the tremendous potential of utilizing diverse robotic datasets for attaining effective generalization and makes it enticing to consider leveraging broad datasets for attaining robust generalization in robotic learning as well. However, in practice, we often want to learn a new skill in a new environment that is unlikely to be contained in the prior data. Therefore we ask: how can we leverage existing diverse offline datasets in combination with small amounts of task-specific data to solve new tasks, while still enjoying the generalization benefits of training on large amounts of data? In this paper, we demonstrate that end-to-end offline RL can be an effective approach for doing this, without the need for any representation learning or vision-based pre-training. 
We present pre-training for robots (PTR), a framework based on offline RL that attempts to effectively learn new tasks by combining pre-training on existing robotic datasets with rapid fine-tuning on a new task, with as few as 10 demonstrations. PTR utilizes an existing offline RL method, conservative Q-learning (CQL), but extends it to include several crucial design decisions that enable PTR to actually work and outperform a variety of prior methods. To our knowledge, PTR is the first RL method that succeeds at learning new tasks in a new domain on a real WidowX robot with as few as 10 task demonstrations, by effectively leveraging an existing dataset of diverse multi-task robot data collected in a variety of toy kitchens. We also demonstrate that PTR can enable effective autonomous fine-tuning and improvement in a handful of trials, without needing any demonstrations. An accompanying overview video can be found in the supplementary material and at this anonymous URL:", "output": "Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a new task to benchmark scene understanding of embodied agents: Situated Question Answering in 3D Scenes (SQA3D). Given a scene context (e.g., 3D scan), SQA3D requires the tested agent to first understand its situation (position, orientation, etc.) in the 3D scene as described by text, then reason about its surrounding environment and answer a question under that situation. Based upon 650 scenes from ScanNet, we provide a dataset centered around 6.8k unique situations, along with 20.4k descriptions and 33.4k diverse reasoning questions for these situations. 
These questions examine a wide spectrum of reasoning capabilities for an intelligent agent, ranging from spatial relation comprehension to commonsense understanding, navigation, and multi-hop reasoning. SQA3D imposes a significant challenge to current multi-modal especially 3D reasoning models. We evaluate various state-of-the-art approaches and find that the best one only achieves an overall score of 47.20%, while amateur human participants can reach 90.06%. We believe SQA3D could facilitate future embodied AI research with stronger situation understanding and reasoning capability.", "output": "SQA3D: Situated Question Answering in 3D Scenes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning (ML) enabled classification models are becoming increasingly popular for tackling the sheer volume and speed of online misinformation and other content that could be identified as harmful. In building these models, data scientists need to take a stance on the legitimacy, authoritativeness and objectivity of the sources of ``truth\" used for model training and testing. This has political, ethical and epistemic implications which are rarely addressed in technical papers. Despite (and due to) their reported high accuracy and performance, ML-driven moderation systems have the potential to shape online public debate and create downstream negative impacts such as undue censorship and the reinforcing of false beliefs. Using collaborative ethnography and theoretical insights from social studies of science and expertise, we offer a critical analysis of the process of building ML models for (mis)information classification: we identify a series of algorithmic contingencies--key moments during model development that could lead to different future outcomes, uncertainty and harmful effects as these tools are deployed by social media platforms. 
We conclude by offering a tentative path toward reflexive and responsible development of ML tools for moderating misinformation and other harmful content online.", "output": "Addressing contingency in algorithmic (mis)information classification: Toward a responsible machine learning agenda."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurate time prediction of patients' critical events is crucial in urgent scenarios where timely decision-making is important. Though many studies have proposed automatic prediction methods using Electronic Health Records (EHR), their coarse-grained time resolutions limit their practical usage in urgent environments such as the emergency department (ED) and intensive care unit (ICU). Therefore, in this study, we propose an hourly prediction method based on self-supervised predictive coding and multi-modal fusion for two critical tasks: mortality and vasopressor need prediction. Through extensive experiments, we prove significant performance gains from both multi-modal fusion and self-supervised predictive regularization, most notably in far-future prediction, which becomes especially important in practice.
Our uni-modal/bi-modal/bi-modal self-supervision scored 0.846/0.877/0.897 (0.824/0.855/0.886) and 0.817/0.820/0.858 (0.807/0.81/0.855) with mortality (far-future mortality) and with vasopressor need (far-future vasopressor need) prediction data in AUROC, respectively.", "output": "Self-Supervised Predictive Coding with Multimodal Fusion for Patient Deterioration Prediction in Fine-grained Time Resolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the problem of iterative machine teaching, where a teacher sequentially provides examples based on the status of a learner under a discrete input space (i.e., a pool of finite samples), which greatly limits the teacher's capability. To address this issue, we study iterative teaching under a continuous input space where the input example (i.e., image) can be either generated by solving an optimization problem or drawn directly from a continuous distribution. Specifically, we propose data hallucination teaching (DHT) where the teacher can generate input data intelligently based on labels, the learner's status and the target concept. We study a number of challenging teaching setups (e.g., linear/neural learners in omniscient and black-box settings). Extensive empirical results verify the effectiveness of DHT.", "output": "Iterative Teaching by Data Hallucination."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the problem of computing mixed Nash equilibria of two-player zero-sum games with continuous sets of pure strategies and with first-order access to the payoff function. This problem arises for example in game-theory-inspired machine learning applications, such as distributionally-robust learning.
In those applications, the strategy sets are high-dimensional and thus methods based on discretisation cannot tractably return high-accuracy solutions. In this paper, we introduce and analyze a particle-based method that enjoys guaranteed local convergence for this problem. This method consists in parametrizing the mixed strategies as atomic measures and applying proximal point updates to both the atoms' weights and positions. It can be interpreted as a time-implicit discretization of the \"interacting\" Wasserstein-Fisher-Rao gradient flow. We prove that, under non-degeneracy assumptions, this method converges at an exponential rate to the exact mixed Nash equilibrium from any initialization satisfying a natural notion of closeness to optimality. We illustrate our results with numerical experiments and discuss applications to max-margin and distributionally-robust classification using two-layer neural networks, where our method has a natural interpretation as a simultaneous training of the network's weights and of the adversarial distribution.", "output": "An Exponentially Converging Particle Method for the Mixed Nash Equilibrium of Continuous Games."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The task of testing whether two uncharacterized quantum devices behave in the same way is crucial for benchmarking near-term quantum computers and quantum simulators, but has so far remained open for continuous-variable quantum systems. In this Letter, we develop a machine learning algorithm for comparing unknown continuous variable states using limited and noisy data. The algorithm works on non-Gaussian quantum states for which similarity testing could not be achieved with previous techniques.
Our approach is based on a convolutional neural network that assesses the similarity of quantum states based on a lower-dimensional state representation built from measurement data. The network can be trained offline with classically simulated data from a fiducial set of states sharing structural similarities with the states to be tested, or with experimental data generated by measurements on the fiducial states, or with a combination of simulated and experimental data. We test the performance of the model on noisy cat states and states generated by arbitrary selective number-dependent phase gates. Our network can also be applied to the problem of comparing continuous variable states across different experimental platforms, with different sets of achievable measurements, and to the problem of experimentally testing whether two states are equivalent up to Gaussian unitary transformations.", "output": "Quantum Similarity Testing with Convolutional Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Causal DAG (Directed Acyclic Graph) usually lies in a 2D plane without distinguishing correlation changes and causal effects. Also, the causal effect is often approximately estimated by averaging the population's correlation changes. Now, AI (Artificial Intelligence) enables much larger-scale structural modeling, whose complex hidden confoundings make the approximation errors no longer ignorable but can snowball to considerable population-level Causal Representation Bias. Such bias has caused significant problems: ungeneralizable causal models, unrevealed individual-level features, not utilizable causal knowledge in DL (Deep Learning), etc.
In short, DAG must be redefined to enable a new framework for causal AI. Observational time series can only reflect correlation changes in statistics. But the DL-based autoencoder can represent them as individual-level feature changes in latent space to reflect causal effects. In this paper, we introduce the redefined do-DAG concept and propose Causal Representation Learning (CRL) framework as the generic solution, along with a novel architecture to realize CRL and experimentally verify its feasibility.", "output": "Realization of Causal Representation Learning and Redefined DAG for Causal AI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose Token Turing Machines (TTM), a sequential, autoregressive Transformer model with memory for real-world sequential visual understanding. Our model is inspired by the seminal Neural Turing Machine, and has an external memory consisting of a set of tokens which summarise the previous history (i.e., frames). This memory is efficiently addressed, read and written using a Transformer as the processing unit/controller at each step. The model's memory module ensures that a new observation will only be processed with the contents of the memory (and not the entire history), meaning that it can efficiently process long sequences with a bounded computational cost at each step.
We show that TTM outperforms other alternatives, such as other Transformer models designed for long sequences and recurrent neural networks, on two real-world sequential visual understanding tasks: online temporal activity detection from videos and vision-based robot action policy learning. Code is publicly available at:", "output": "Token Turing Machines."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing Graph Neural Networks (GNNs) follow the message-passing mechanism that conducts information interaction among nodes iteratively. While considerable progress has been made, such node interaction paradigms still have the following limitation. First, the scalability limitation precludes the broad application of GNNs in large-scale industrial settings since the node interaction among rapidly expanding neighbors incurs high computation and memory costs. Second, the over-smoothing problem restricts the discrimination ability of nodes, i.e., node representations of different classes will converge to indistinguishable after repeated node interactions. In this work, we propose a novel hop interaction paradigm to address these limitations simultaneously. The core idea is to convert the interaction target among nodes to pre-processed multi-hop features inside each node. We design a simple yet effective HopGNN framework that can easily utilize existing GNNs to achieve hop interaction. Furthermore, we propose a multi-task learning strategy with a self-supervised learning objective to enhance HopGNN. We conduct extensive experiments on 12 benchmark datasets in a wide range of domains, scales, and smoothness of graphs. Experimental results show that our methods achieve superior performance while maintaining high scalability and efficiency.
The code is at", "output": "From Node Interaction to Hop Interaction: New Effective and Scalable Graph Learning Paradigm."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Due to the rapid advancements in recent years, medical image analysis is largely dominated by deep learning (DL). However, building powerful and robust DL models requires training with large multi-party datasets. While multiple stakeholders have provided publicly available datasets, the ways in which these data are labeled vary widely. For instance, an institution might provide a dataset of chest radiographs containing labels denoting the presence of pneumonia, while another institution might have a focus on determining the presence of metastases in the lung. Training a single AI model utilizing all these data is not feasible with conventional federated learning (FL). This prompts us to propose an extension to the widespread FL process, namely flexible federated learning (FFL) for collaborative training on such data. Using 695,000 chest radiographs from five institutions from across the globe - each with differing labels - we demonstrate that having heterogeneously labeled datasets, FFL-based training leads to significant performance increase compared to conventional FL training, where only the uniformly annotated images are utilized.
We believe that our proposed algorithm could accelerate the process of bringing collaborative training methods from research and simulation phase to the real-world applications in healthcare.", "output": "Collaborative Training of Medical Artificial Intelligence Models with non-uniform Labels."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Our goal with this survey is to provide an overview of the state of the art deep learning technologies for face generation and editing. We will cover popular latest architectures and discuss key ideas that make them work, such as inversion, latent representation, loss functions, training procedures, editing methods, and cross domain style transfer. We particularly focus on GAN-based architectures that have culminated in the StyleGAN approaches, which allow generation of high-quality face images and offer rich interfaces for controllable semantics editing and preserving photo quality. We aim to provide an entry point into the field for readers that have basic knowledge about the field of deep learning and are looking for an accessible introduction and overview.", "output": "Face Generation and Editing with StyleGAN: A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a new approach to learning the subgrid-scale model when simulating partial differential equations (PDEs) solved by the method of lines and their representation in chaotic ordinary differential equations, based on neural ordinary differential equations (NODEs). Solving systems with fine temporal and spatial grid scales is an ongoing computational challenge, and closure models are generally difficult to tune.
Machine learning approaches have increased the accuracy and efficiency of computational fluid dynamics solvers. In this approach neural networks are used to learn the coarse- to fine-grid map, which can be viewed as subgrid-scale parameterization. We propose a strategy that uses the NODE and partial knowledge to learn the source dynamics at a continuous level. Our method inherits the advantages of NODEs and can be used to parameterize subgrid scales, approximate coupling operators, and improve the efficiency of low-order solvers. Numerical results with the two-scale Lorenz 96 ODE, the convection-diffusion PDE, and the viscous Burgers' PDE are used to illustrate this approach.", "output": "Learning Subgrid-scale Models with Neural Ordinary Differential Equations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FL) is a framework for users to jointly train a machine learning model. FL is promoted as a privacy-enhancing technology (PET) that provides data minimization: data never \"leaves\" personal devices and users share only model updates with a server (e.g., a company) coordinating the distributed training. While prior work showed that in vanilla FL a malicious server can extract users' private data from the model updates, in this work we take it further and demonstrate that a malicious server can reconstruct user data even in hardened versions of the protocol. More precisely, we propose an attack against FL protected with distributed differential privacy (DDP) and secure aggregation (SA). Our attack method is based on the introduction of sybil devices that deviate from the protocol to expose individual users' data for reconstruction by the server.
The underlying root cause for the vulnerability to our attack is a power imbalance: the server orchestrates the whole protocol and users are given little guarantees about the selection of other users participating in the protocol. Moving forward, we discuss requirements for privacy guarantees in FL. We conclude that users should only participate in the protocol when they trust the server or they apply local primitives such as local DP, shifting power away from the server. Yet, the latter approaches come at significant overhead in terms of performance degradation of the trained model, making them less likely to be deployed in practice.", "output": "Reconstructing Individual Data Points in Federated Learning Hardened with Differential Privacy and Secure Aggregation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "$L_{p}$-norm regularization schemes such as $L_{0}$, $L_{1}$, and $L_{2}$-norm regularization and $L_{p}$-norm-based regularization techniques such as weight decay and group LASSO compute a quantity which depends on model weights considered in isolation from one another. This paper describes a novel regularizer which is not based on an $L_{p}$-norm. In contrast with $L_{p}$-norm-based regularization, this regularizer is concerned with the spatial arrangement of weights within a weight matrix. This regularizer is an additive term for the loss function and is differentiable, simple and fast to compute, scale-invariant, requires a trivial amount of additional memory, and can easily be parallelized.
Empirically this method yields approximately a one order-of-magnitude improvement in the number of nonzero model parameters at a given level of accuracy.", "output": "A Novel Sparse Regularizer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semantic representations is the masking strategy; specifically, it is crucial to (a) sample target blocks with sufficiently large scale (semantic), and to (b) use a sufficiently informative (spatially distributed) context block. Empirically, when combined with Vision Transformers, we find I-JEPA to be highly scalable. For instance, we train a ViT-Huge/14 on ImageNet using 16 A100 GPUs in under 72 hours to achieve strong downstream performance across a wide range of tasks, from linear classification to object counting and depth prediction.", "output": "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural Architecture Search (NAS) is widely used to automatically obtain the neural network with the best performance among a large number of candidate architectures. To reduce the search time, zero-shot NAS aims at designing training-free proxies that can predict the test performance of a given architecture.
However, as shown recently, none of the zero-shot proxies proposed to date can actually work consistently better than a naive proxy, namely, the number of network parameters (#Params). To improve this state of affairs, as the main theoretical contribution, we first reveal how some specific gradient properties across different samples impact the convergence rate and generalization capacity of neural networks. Based on this theoretical analysis, we propose a new zero-shot proxy, ZiCo, the first proxy that works consistently better than #Params. We demonstrate that ZiCo works better than State-Of-The-Art (SOTA) proxies on several popular NAS-Benchmarks (NASBench101, NATSBench-SSS/TSS, TransNASBench-101) for multiple applications (e.g., image classification/reconstruction and pixel-level prediction). Finally, we demonstrate that the optimal architectures found via ZiCo are as competitive as the ones found by one-shot and multi-shot NAS methods, but with much less search time. For example, ZiCo-based NAS can find optimal architectures with 78.1%, 79.4%, and 80.4% test accuracy under inference budgets of 450M, 600M, and 1000M FLOPs, respectively, on ImageNet within 0.4 GPU days. Our code is available at ", "output": "ZiCo: Zero-shot NAS via Inverse Coefficient of Variation on Gradients."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present two new classes of algorithms for efficient field integration on graphs encoding point clouds. The first class, SeparatorFactorization(SF), leverages the bounded genus of point cloud mesh graphs, while the second class, RFDiffusion(RFD), uses popular epsilon-nearest-neighbor graph representations for point clouds. Both can be viewed as providing the functionality of Fast Multipole Methods (FMMs), which have had a tremendous impact on efficient integration, but for non-Euclidean spaces.
We focus on geometries induced by distributions of walk lengths between points (e.g., shortest-path distance). We provide an extensive theoretical analysis of our algorithms, obtaining new results in structural graph theory as a byproduct. We also perform exhaustive empirical evaluation, including on-surface interpolation for rigid and deformable objects (particularly for mesh-dynamics modeling), Wasserstein distance computations for point clouds, and the Gromov-Wasserstein variant.", "output": "Efficient Graph Field Integrators Meet Point Clouds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reliability of machine learning evaluation -- the consistency of observed evaluation scores across replicated model training runs -- is affected by several sources of nondeterminism which can be regarded as measurement noise. Current tendencies to remove noise in order to enforce reproducibility of research results neglect inherent nondeterminism at the implementation level and disregard crucial interaction effects between algorithmic noise factors and data properties. This limits the scope of conclusions that can be drawn from such experiments. Instead of removing noise, we propose to incorporate several sources of variance, including their interaction with data properties, into an analysis of significance and reliability of machine learning evaluation, with the aim to draw inferences beyond particular instances of trained models. We show how to use linear mixed effects models (LMEMs) to analyze performance evaluation scores, and to conduct statistical inference with a generalized likelihood ratio test (GLRT).
This allows us to incorporate arbitrary sources of noise like meta-parameter variations into statistical significance testing, and to assess performance differences conditional on data properties. Furthermore, a variance component analysis (VCA) enables the analysis of the contribution of noise sources to overall variance and the computation of a reliability coefficient by the ratio of substantial to total variance.", "output": "Towards Inferential Reproducibility of Machine Learning Research."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Over the last few years, key architectural advances have been proposed for neural network interatomic potentials (NNIPs), such as incorporating message-passing networks, equivariance, or many-body expansion terms. Although modern NNIP models exhibit small differences in energy/forces errors, improvements in accuracy are still considered the main target when developing new NNIP architectures. In this work, we show how architectural and optimization choices influence the generalization of NNIPs, revealing trends in molecular dynamics (MD) stability, data efficiency, and loss landscapes. Using the 3BPA dataset, we show that test errors in NNIP follow a scaling relation and can be robust to noise, but cannot predict MD stability in the high-accuracy regime. To circumvent this problem, we propose the use of loss landscape visualizations and a metric of loss entropy for predicting the generalization power of NNIPs. With a large-scale study on NequIP and MACE, we show that the loss entropy predicts out-of-distribution error and MD stability despite being computed only on the training set. Using this probe, we demonstrate how the choice of optimizers, loss function weighting, data normalization, and other architectural decisions influence the extrapolation behavior of NNIPs.
Finally, we relate loss entropy to data efficiency, demonstrating that flatter landscapes also predict learning curve slopes. Our work provides a deep learning justification for the extrapolation performance of many common NNIPs, and introduces tools beyond accuracy metrics that can be used to inform the development of next-generation models.", "output": "Data efficiency and extrapolation trends in neural network interatomic potentials."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces Block Data Representations (BDR), a framework for exploring and evaluating a wide spectrum of narrow-precision formats for deep learning. It enables comparison of popular quantization standards, and through BDR, new formats based on shared microexponents (MX) are identified, which outperform other state-of-the-art quantization approaches, including narrow-precision floating-point and block floating-point. MX utilizes multiple levels of quantization scaling with ultra-fine scaling factors based on shared microexponents in the hardware. The effectiveness of MX is demonstrated on real-world models including large-scale generative pretraining and inferencing, and production-scale recommendation systems.", "output": "With Shared Microexponents, A Little Shifting Goes a Long Way."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning agent behaviors from observational data has shown to improve our understanding of their decision-making processes, advancing our ability to explain their interactions with the environment and other agents.
While multiple learning techniques have been proposed in the literature, there is one particular setting that has not been explored yet: multi agent systems where agent identities remain anonymous. For instance, in financial markets labeled data that identifies market participant strategies is typically proprietary, and only the anonymous state-action pairs that result from the interaction of multiple market participants are publicly available. As a result, sequences of agent actions are not observable, restricting the applicability of existing work. In this paper, we propose a Policy Clustering algorithm, called K-SHAP, that learns to group anonymous state-action pairs according to the agent policies. We frame the problem as an Imitation Learning (IL) task, and we learn a world-policy able to mimic all the agent behaviors upon different environmental states. We leverage the world-policy to explain each anonymous observation through an additive feature attribution method called SHAP (SHapley Additive exPlanations). Finally, by clustering the explanations we show that we are able to identify different agent policies and group observations accordingly. We evaluate our approach on simulated synthetic market data and a real-world financial dataset. We show that our proposal significantly and consistently outperforms the existing methods, identifying different agent strategies.", "output": "K-SHAP: Policy Clustering Algorithm for Anonymous State-Action Pairs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transfer learning has been widely utilized to mitigate the data scarcity problem in the field of Alzheimer's disease (AD). Conventional transfer learning relies on re-using models trained on AD-irrelevant tasks such as natural image classification.
However, it often leads to negative transfer due to the discrepancy between the non-medical source and target medical domains. To address this, we present evidence-empowered transfer learning for AD diagnosis. Unlike conventional approaches, we leverage an AD-relevant auxiliary task, namely morphological change prediction, without requiring additional MRI data. In this auxiliary task, the diagnosis model learns the evidential and transferable knowledge from morphological features in MRI scans. Experimental results demonstrate that our framework is not only effective in improving detection performance regardless of model capacity, but also more data-efficient and faithful.", "output": "Evidence-empowered Transfer Learning for Alzheimer's Disease."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) have taken the scientific world by storm, changing the landscape of natural language processing and human-computer interaction. These powerful tools can answer complex questions and, surprisingly, perform challenging creative tasks (e.g., generate code and applications to solve problems, write stories, pieces of music, etc.). In this paper, we present a collaborative game design framework that combines interactive evolution and large language models to simulate the typical human design process. We use the former to exploit users' feedback for selecting the most promising ideas and large language models for a very complex creative task - the recombination and variation of ideas. In our framework, the process starts with a brief and a set of candidate designs, either generated using a language model or proposed by the users. Next, users collaborate on the design process by providing feedback to an interactive genetic algorithm that selects, recombines, and mutates the most promising designs.
We evaluated our framework on three game design tasks with human designers who collaborated remotely.", "output": "ChatGPT and Other Large Language Models as Evolutionary Engines for Online Interactive Collaborative Game Design."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Rendering and inverse-rendering algorithms that drive conventional computer graphics have recently been superseded by neural representations (NR). NRs have recently been used to learn the geometric and the material properties of the scenes and use the information to synthesize photorealistic imagery, thereby promising a replacement for traditional rendering algorithms with scalable quality and predictable performance. In this work we ask the question: Does neural graphics (NG) need hardware support? We studied representative NG applications showing that, if we want to render 4k res. at 60FPS there is a gap of 1.5X-55X in the desired performance on current GPUs. For AR/VR applications, there is an even larger gap of 2-4 OOM between the desired performance and the required system power. We identify that the input encoding and the MLP kernels are the performance bottlenecks, consuming 72%, 60% and 59% of application time for multi res. hashgrid, multi res. densegrid and low res. densegrid encodings, respectively. We propose a NG processing cluster, a scalable and flexible hardware architecture that directly accelerates the input encoding and MLP kernels through dedicated engines and supports a wide range of NG applications. We also accelerate the rest of the kernels by fusing them together in Vulkan, which leads to 9.94X kernel-level performance improvement compared to un-fused implementation of the pre-processing and the post-processing kernels. Our results show that, NGPC gives up to 58X end-to-end application-level performance improvement, for multi res.
hashgrid encoding on average across the four NG applications, the performance benefits are 12X,20X,33X and 39X for the scaling factor of 8,16,32 and 64, respectively. Our results show that with multi res. hashgrid encoding, NGPC enables the rendering of 4k res. at 30FPS for NeRF and 8k res. at 120FPS for all our other NG applications.", "output": "Hardware Acceleration of Neural Graphics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the problem of finding the optimal value of n in the n-step temporal difference (TD) learning algorithm. We find the optimal n by resorting to a model-free optimization technique involving a one-simulation simultaneous perturbation stochastic approximation (SPSA) based procedure that we adopt to the discrete optimization setting by using a random projection approach. We prove the convergence of our proposed algorithm, SDPSA, using a differential inclusions approach and show that it finds the optimal value of n in n-step TD. Through experiments, we show that the optimal value of n is achieved with SDPSA for arbitrary initial values.", "output": "n-Step Temporal Difference Learning with Optimal n."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph neural networks (GNNs) have emerged as a popular strategy for handling non-Euclidean data due to their state-of-the-art performance. However, most of the current GNN model designs mainly focus on task accuracy, lacking in considering hardware resources limitation and real-time requirements of edge application scenarios. Comprehensive profiling of typical GNN models indicates that their execution characteristics are significantly affected across different computing platforms, which demands hardware awareness for efficient GNN designs. 
In this work, HGNAS is proposed as the first Hardware-aware Graph Neural Architecture Search framework targeting resource constraint edge devices. By decoupling the GNN paradigm, HGNAS constructs a fine-grained design space and leverages an efficient multi-stage search strategy to explore optimal architectures within a few GPU hours. Moreover, HGNAS achieves hardware awareness during the GNN architecture design by leveraging a hardware performance predictor, which could balance the GNN model accuracy and efficiency corresponding to the characteristics of targeted devices. Experimental results show that HGNAS can achieve about $10.6\\times$ speedup and $88.2\\%$ peak memory reduction with a negligible accuracy loss compared to DGCNN on various edge devices, including Nvidia RTX3080, Jetson TX2, Intel i7-8700K and Raspberry Pi 3B+.", "output": "Hardware-Aware Graph Neural Network Automated Design for Edge Computing Platforms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vibration signals have been increasingly utilized in various engineering fields for analysis and monitoring purposes, including structural health monitoring, fault diagnosis and damage detection, where vibration signals can provide valuable information about the condition and integrity of structures. In recent years, there has been a growing trend towards the use of vibration signals in the field of bioengineering. Activity-induced structural vibrations, particularly footstep-induced signals, are useful for analyzing the movement of biological systems such as the human body and animals, providing valuable information regarding an individual's gait, body mass, and posture, making them an attractive tool for health monitoring, security, and human-computer interaction. However, the presence of various types of noise can compromise the accuracy of footstep-induced signal analysis. 
In this paper, we propose a novel ensemble model that leverages both the ensemble of multiple signals and of recurrent and convolutional neural network predictions. The proposed model consists of three stages: preprocessing, hybrid modeling, and ensemble. In the preprocessing stage, features are extracted using the Fast Fourier Transform and wavelet transform to capture the underlying physics-governed dynamics of the system and extract spatial and temporal features. In the hybrid modeling stage, a bi-directional LSTM is used to denoise the noisy signal concatenated with FFT results, and a CNN is used to obtain a condensed feature representation of the signal. In the ensemble stage, three layers of a fully-connected neural network are used to produce the final denoised signal. The proposed model addresses the challenges associated with structural vibration signals, which outperforms the prevailing algorithms for a wide range of noise levels, evaluated using PSNR, SNR, and WMAPE.", "output": "Structural Vibration Signal Denoising Using Stacking Ensemble of Hybrid CNN-RNN."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Executing machine learning inference tasks on resource-constrained edge devices requires careful hardware-software co-design optimizations. Recent examples have shown how transformer-based deep neural network models such as ALBERT can be used to enable the execution of natural language processing (NLP) inference on mobile systems-on-chip housing custom hardware accelerators. However, while these existing solutions are effective in alleviating the latency, energy, and area costs of running single NLP tasks, achieving multi-task inference requires running computations over multiple variants of the model parameters, which are tailored to each of the targeted tasks. 
This approach leads to either prohibitive on-chip memory requirements or paying the cost of off-chip memory access. This paper proposes adapter-ALBERT, an efficient model optimization for maximal data reuse across different tasks. The proposed model's performance and robustness to data compression methods are evaluated across several language tasks from the GLUE benchmark. Additionally, we demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator to extrapolate performance, power, and area improvements over the execution of a traditional ALBERT model on the same hardware platform.", "output": "Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present an artificial intelligence system to remotely assess the motor performance of individuals with Parkinson's disease (PD). Participants performed a motor task (i.e., tapping fingers) in front of a webcam, and data from 250 global participants were rated by three expert neurologists following the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS). The neurologists' ratings were highly reliable, with an intra-class correlation coefficient (ICC) of 0.88. We developed computer algorithms to obtain objective measurements that align with the MDS-UPDRS guideline and are strongly correlated with the neurologists' ratings. Our machine learning model trained on these measures outperformed an MDS-UPDRS certified rater, with a mean absolute error (MAE) of 0.59 compared to the rater's MAE of 0.79. However, the model performed slightly worse than the expert neurologists (0.53 MAE). 
The methodology can be replicated for similar motor tasks, providing the possibility of evaluating individuals with PD and other movement disorders remotely, objectively, and in areas with limited access to neurological care.", "output": "Using AI to Measure Parkinson's Disease Severity at Home."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Frank-Wolfe algorithm is a popular method in structurally constrained machine learning applications, due to its fast per-iteration complexity. However, one major limitation of the method is a slow rate of convergence that is difficult to accelerate due to erratic, zig-zagging step directions, even asymptotically close to the solution. We view this as an artifact of discretization; that is to say, the Frank-Wolfe \\emph{flow}, which is its trajectory at asymptotically small step sizes, does not zig-zag, and reducing discretization error will go hand-in-hand in producing a more stabilized method, with better convergence properties. We propose two improvements: a multistep Frank-Wolfe method that directly applies optimized higher-order discretization schemes; and an LMO-averaging scheme with reduced discretization error, and whose local convergence rate over general convex sets accelerates from a rate of $O(1/k)$ to up to $O(1/k^{3/2})$.", "output": "Reducing Discretization Error in the Frank-Wolfe Method."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning image classification and image generation using the same set of network parameters is a challenging problem. Recent advanced approaches perform well in one task often exhibit poor performance in the other. 
This work introduces an energy-based classifier and generator, namely EGC, which can achieve superior performance in both tasks using a single neural network. Unlike a conventional classifier that outputs a label given an image (i.e., a conditional distribution $p(y|\\mathbf{x})$), the forward pass in EGC is a classifier that outputs a joint distribution $p(\\mathbf{x},y)$, enabling an image generator in its backward pass by marginalizing out the label $y$. This is done by estimating the energy and classification probability given a noisy image in the forward pass, while denoising it using the score function estimated in the backward pass. EGC achieves competitive generation results compared with state-of-the-art approaches on ImageNet-1k, CelebA-HQ and LSUN Church, while achieving superior classification accuracy and robustness against adversarial attacks on CIFAR-10. This work represents the first successful attempt to simultaneously excel in both tasks using a single set of network parameters. We believe that EGC bridges the gap between discriminative and generative learning.", "output": "EGC: Image Generation and Classification via a Diffusion Energy-Based Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Response curves exhibit the magnitude of the response of a sensitive system to a varying stimulus. However, response of such systems may be sensitive to multiple stimuli (i.e., input features) that are not necessarily independent. As a consequence, the shape of response curves generated for a selected input feature (referred to as \"active feature\") might depend on the values of the other input features (referred to as \"passive features\"). In this work, we consider the case of systems whose response is approximated using regression neural networks. 
We propose to use counterfactual explanations (CFEs) for the identification of the features with the highest relevance on the shape of response curves generated by neural network black boxes. CFEs are generated by a genetic algorithm-based approach that solves a multi-objective optimization problem. In particular, given a response curve generated for an active feature, a CFE finds the minimum combination of passive features that need to be modified to alter the shape of the response curve. We tested our method on a synthetic dataset with 1-D inputs and two crop yield prediction datasets with 2-D inputs. The relevance ranking of features and feature combinations obtained on the synthetic dataset coincided with the analysis of the equation that was used to generate the problem. Results obtained on the yield prediction datasets revealed that the impact on fertilizer responsivity of passive features depends on the terrain characteristics of each field.", "output": "Counterfactual Explanations of Neural Network-Generated Response Curves."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In applied fields where the speed of inference and model flexibility are crucial, the use of Bayesian inference for models with a stochastic process as their prior, e.g. Gaussian processes (GPs) is ubiquitous. Recent literature has demonstrated that the computational bottleneck caused by GP priors or their finite realizations can be encoded using deep generative models such as variational autoencoders (VAEs), and the learned generators can then be used instead of the original priors during Markov chain Monte Carlo (MCMC) inference in a drop-in manner. 
While this approach enables fast and highly efficient inference, it loses information about the stochastic process hyperparameters, and, as a consequence, makes inference over hyperparameters impossible and the learned priors indistinct. We propose to resolve this issue and disentangle the learned priors by conditioning the VAE on stochastic process hyperparameters. This way, the hyperparameters are encoded alongside GP realisations and can be explicitly estimated at the inference stage. We believe that the new method, termed PriorCVAE, will be a useful tool among approximate inference approaches and has the potential to have a large impact on spatial and spatiotemporal inference in crucial real-life applications. Code showcasing PriorCVAE can be found on GitHub: ", "output": "PriorCVAE: scalable MCMC parameter inference with Bayesian deep generative modelling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human intelligence has the remarkable ability to assemble basic skills into complex ones so as to solve complex tasks. This ability is equally important for Artificial Intelligence (AI), and thus, we assert that in addition to the development of large, comprehensive intelligent models, it is equally crucial to equip such models with the capability to harness various domain-specific expert models for complex task-solving in the pursuit of Artificial General Intelligence (AGI). Recent developments in Large Language Models (LLMs) have demonstrated remarkable learning and reasoning abilities, making them promising as a controller to select, synthesize, and execute external models to solve complex tasks. In this project, we develop OpenAGI, an open-source AGI research platform, specifically designed to offer complex, multi-step tasks and accompanied by task-specific datasets, evaluation metrics, and a diverse range of extensible models. 
OpenAGI formulates complex tasks as natural language queries, serving as input to the LLM. The LLM subsequently selects, synthesizes, and executes models provided by OpenAGI to address the task. Furthermore, we propose a Reinforcement Learning from Task Feedback (RLTF) mechanism, which uses the task-solving result as feedback to improve the LLM's task-solving ability. Thus, the LLM is responsible for synthesizing various external models for solving complex tasks, while RLTF provides feedback to improve its task-solving ability, enabling a feedback loop for self-improving AI. We believe that the paradigm of LLMs operating various expert models for complex task-solving is a promising approach towards AGI. To facilitate the community's long-term improvement and evaluation of AGI's ability, we open-source the code, benchmark, and evaluation methods of the OpenAGI project at ", "output": "OpenAGI: When LLM Meets Domain Experts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FL) is a new distributed learning paradigm, with privacy, utility, and efficiency as its primary pillars. Existing research indicates that it is unlikely to simultaneously attain infinitesimal privacy leakage, utility loss, and efficiency. Therefore, how to find an optimal trade-off solution is the key consideration when designing the FL algorithm. One common way is to cast the trade-off problem as a multi-objective optimization problem, i.e., the goal is to minimize the utility loss and efficiency reduction while constraining the privacy leakage not exceeding a predefined value. 
However, existing multi-objective optimization frameworks are very time-consuming, and do not guarantee the existence of the Pareto frontier, this motivates us to seek a solution to transform the multi-objective problem into a single-objective problem because it is more efficient and easier to be solved. To this end, we propose FedPAC, a unified framework that leverages PAC learning to quantify multiple objectives in terms of sample complexity, such quantification allows us to constrain the solution space of multiple objectives to a shared dimension, so that it can be solved with the help of a single-objective optimization algorithm. Specifically, we provide the results and detailed analyses of how to quantify the utility loss, privacy leakage, privacy-utility-efficiency trade-off, as well as the cost of the attacker from the PAC learning perspective.", "output": "Probably Approximately Correct Federated Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Score-based diffusion models learn to reverse a stochastic differential equation that maps data to noise. However, for complex tasks, numerical error can compound and result in highly unnatural samples. Previous work mitigates this drift with thresholding, which projects to the natural data domain (such as pixel space for images) after each diffusion step, but this leads to a mismatch between the training and generative processes. To incorporate data constraints in a principled manner, we present Reflected Diffusion Models, which instead reverse a reflected stochastic differential equation evolving on the support of the data. Our approach learns the perturbed score function through a generalized score matching loss and extends key components of standard diffusion models including diffusion guidance, likelihood-based training, and ODE sampling. 
We also bridge the theoretical gap with thresholding: such schemes are just discretizations of reflected SDEs. On standard image benchmarks, our method is competitive with or surpasses the state of the art and, for classifier-free guidance, our approach enables fast exact sampling with ODEs and produces more faithful samples under high guidance weight.", "output": "Reflected Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a deep-learning based approach for measuring small planetary radial velocities in the presence of stellar variability. We use neural networks to reduce stellar RV jitter in three years of HARPS-N sun-as-a-star spectra. We develop and compare dimensionality-reduction and data splitting methods, as well as various neural network architectures including single line CNNs, an ensemble of single line CNNs, and a multi-line CNN. We inject planet-like RVs into the spectra and use the network to recover them. We find that the multi-line CNN is able to recover planets with 0.2 m/s semi-amplitude, 50 day period, with 8.8% error in the amplitude and 0.7% in the period. This approach shows promise for mitigating stellar RV variability and enabling the detection of small planetary RVs with unprecedented precision.", "output": "Deep-learning based measurement of planetary radial velocities in the presence of stellar variability."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent data regulations necessitate machine unlearning (MU): The removal of the effect of specific examples from the model. 
While exact unlearning is possible by conducting a model retraining with the remaining data from scratch, its computational cost has led to the development of approximate but efficient unlearning schemes. Beyond data-centric MU solutions, we advance MU through a novel model-based viewpoint: sparsification via weight pruning. Our results in both theory and practice indicate that model sparsity can boost the multi-criteria unlearning performance of an approximate unlearner, closing the approximation gap, while continuing to be efficient. With this insight, we develop two new sparsity-aware unlearning meta-schemes, termed `prune first, then unlearn' and `sparsity-aware unlearning'. Extensive experiments show that our findings and proposals consistently benefit MU in various scenarios, including class-wise data scrubbing, random data scrubbing, and backdoor data forgetting. One highlight is the 77% unlearning efficacy gain of fine-tuning (one of the simplest approximate unlearning methods) in the proposed sparsity-aware unlearning paradigm. Codes are available at", "output": "Model Sparsification Can Simplify Machine Unlearning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Time series classification (TSC) is a challenging task due to the diversity of types of feature that may be relevant for different classification tasks, including trends, variance, frequency, magnitude, and various patterns. To address this challenge, several alternative classes of approach have been developed, including similarity-based, features and intervals, shapelets, dictionary, kernel, neural network, and hybrid approaches. While kernel, neural network, and hybrid approaches perform well overall, some specialized approaches are better suited for specific tasks. 
In this paper, we propose a new similarity-based classifier, Proximity Forest version 2.0 (PF 2.0), which outperforms previous state-of-the-art similarity-based classifiers across the UCR benchmark and outperforms state-of-the-art kernel, neural network, and hybrid methods on specific datasets in the benchmark that are best addressed by similarity-base methods. PF 2.0 incorporates three recent advances in time series similarity measures -- (1) computationally efficient early abandoning and pruning to speedup elastic similarity computations; (2) a new elastic similarity measure, Amerced Dynamic Time Warping (ADTW); and (3) cost function tuning. It rationalizes the set of similarity measures employed, reducing the eight base measures of the original PF to three and using the first derivative transform with all similarity measures, rather than a limited subset. We have implemented both PF 1.0 and PF 2.0 in a single C++ framework, making the PF framework more efficient.", "output": "Proximity Forest 2.0: A new effective and scalable similarity-based classifier for time series."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The development of approaches for trajectory prediction requires metrics to validate and compare their performance. Currently established metrics are based on Euclidean distance, which means that errors are weighted equally in all directions. Euclidean metrics are insufficient for structured environments like roads, since they do not properly capture the agent's intent relative to the underlying lane. In order to provide a reasonable assessment of trajectory prediction approaches with regard to the downstream planning task, we propose a new metric that is lane distance-based: Lane Miss Rate (LMR). For the calculation of LMR, the ground-truth and predicted endpoints are assigned to lane segments, more precisely their centerlines. 
Measured by the distance along the lane segments, predictions that are within a certain threshold distance to the ground-truth count as hits, otherwise they count as misses. LMR is then defined as the ratio of sequences that yield a miss. Our results on three state-of-the-art trajectory prediction models show that LMR preserves the order of Euclidean distance-based metrics. In contrast to the Euclidean Miss Rate, qualitative results show that LMR yields misses for sequences where predictions are located on wrong lanes. Hits on the other hand result for sequences where predictions are located on the correct lane. This means that LMR implicitly weights Euclidean error relative to the lane and goes into the direction of capturing intents of traffic agents. The source code of LMR for Argoverse 2 is publicly available.", "output": "LMR: Lane Distance-Based Metric for Trajectory Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present ImageReward -- the first general-purpose text-to-image human preference reward model -- to address various prevalent issues in generative models and align them with human values and preferences. Its training is based on our systematic annotation pipeline that covers both the rating and ranking components, collecting a dataset of 137k expert comparisons to date. In human evaluation, ImageReward outperforms existing scoring methods (e.g., CLIP by 38.6%), making it a promising automatic metric for evaluating and improving text-to-image synthesis. 
The reward model is publicly available via the \\texttt{image-reward} package at \\url{", "output": "ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advancements in areas such as natural language processing and computer vision rely on intricate and massive models that have been trained using vast amounts of unlabelled or partly labeled data and training or deploying these state-of-the-art methods to resource constraint environments has been a challenge. Galaxy morphologies are crucial to understanding the processes by which galaxies form and evolve. Efficient methods to classify galaxy morphologies are required to extract physical information from modern-day astronomy surveys. In this paper, we introduce methods to learn from less amounts of data. We propose using a hybrid transformer-convolutional architecture drawing much inspiration from the success of CoAtNet and MaxViT. Concretely, we use the transformer-convolutional hybrid with a new stack design for the network, a different way of creating a relative self-attention layer, and pair it with a careful selection of data augmentation and regularization techniques. Our approach sets a new state-of-the-art on predicting galaxy morphologies from images on the Galaxy10 DECals dataset, a science objective, which consists of 17736 labeled images achieving $94.86\\%$ top-$1$ accuracy, beating the current state-of-the-art for this task by $4.62\\%$. Furthermore, this approach also sets a new state-of-the-art on CIFAR-100 and Tiny ImageNet. We also find that models and training methods used for larger datasets would often not work very well in the low-data regime. Our code and models will be released at a later date before the conference.", "output": "Astroformer: More Data Might Not be All You Need for Classification."}]
\ No newline at end of file