From 31e70750ea367318affb4f923f5b2a36986b7e98 Mon Sep 17 00:00:00 2001 From: wangrongsheng Date: Tue, 25 Apr 2023 14:40:48 +0800 Subject: [PATCH] * update 2023-04-25 14:40:48 --- data/2023-04-25.json | 1 + 1 file changed, 1 insertion(+) create mode 100644 data/2023-04-25.json diff --git a/data/2023-04-25.json b/data/2023-04-25.json new file mode 100644 index 0000000..44a3819 --- /dev/null +++ b/data/2023-04-25.json @@ -0,0 +1 @@ +[{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Language models have become very popular recently and many claims have been made about their abilities, including for commonsense reasoning. Given the increasingly better results of current language models on previous static benchmarks for commonsense reasoning, we explore an alternative dialectical evaluation. The goal of this kind of evaluation is not to obtain an aggregate performance value but to find failures and map the boundaries of the system. Dialoguing with the system gives the opportunity to check for consistency and get more reassurance of these boundaries beyond anecdotal evidence. In this paper we conduct some qualitative investigations of this kind of evaluation for the particular case of spatial reasoning (which is a fundamental aspect of commonsense reasoning). We conclude with some suggestions for future work both to improve the capabilities of language models and to systematise this kind of dialectical evaluation.", "output": "Dialectical language model evaluation: An initial appraisal of the commonsense spatial reasoning abilities of LLMs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human cognition has a ``large-scale first'' cognitive mechanism, therefore possesses adaptive multi-granularity description capabilities.
This results in computational characteristics such as efficiency, robustness, and interpretability. Although most existing artificial intelligence learning methods have certain multi-granularity features, they do not fully align with the ``large-scale first'' cognitive mechanism. Multi-granularity granular-ball computing is an important model method developed in recent years. This method can use granular-balls of different sizes to adaptively represent and cover the sample space, and perform learning based on granular-balls. Since the number of coarse-grained \"granular-ball\" is smaller than the number of sample points, granular-ball computing is more efficient; the coarse-grained characteristics of granular-balls are less likely to be affected by fine-grained sample points, making them more robust; the multi-granularity structure of granular-balls can produce topological structures and coarse-grained descriptions, providing natural interpretability. Granular-ball computing has now been effectively extended to various fields of artificial intelligence, developing theoretical methods such as granular-ball classifiers, granular-ball clustering methods, granular-ball neural networks, granular-ball rough sets, and granular-ball evolutionary computation, significantly improving the efficiency, noise robustness, and interpretability of existing methods. It has good innovation, practicality, and development potential.
This article provides a systematic introduction to these methods and analyzes the main problems currently faced by granular-ball computing, discussing both the primary applicable scenarios for granular-ball computing and offering references and suggestions for future researchers to improve this theory.", "output": "Granular ball computing: an efficient, robust, and interpretable adaptive multi-granularity representation and computation method."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Meta-learning performs adaptation through a limited amount of support set, which may cause a sample bias problem. To solve this problem, transductive meta-learning is getting more and more attention, going beyond the conventional inductive learning perspective. This paper proposes so-called task-adaptive pseudo labeling for transductive meta-learning. Specifically, pseudo labels for unlabeled query sets are generated from labeled support sets through label propagation. Pseudo labels enable to adopt the supervised setting as it is and also use the unlabeled query set in the adaptation process. As a result, the proposed method is able to deal with more examples in the adaptation process than inductive ones, which can result in better classification performance of the model. Note that the proposed method is the first approach of applying task adaptation to pseudo labeling.
Experiments show that the proposed method outperforms the state-of-the-art (SOTA) technique in 5-way 1-shot few-shot classification.", "output": "Task-Adaptive Pseudo Labeling for Transductive Meta-Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we explore the impact of adding tactile sensation to video prediction models for physical robot interactions. Predicting the impact of robotic actions on the environment is a fundamental challenge in robotics. Current methods leverage visual and robot action data to generate video predictions over a given time period, which can then be used to adjust robot actions. However, humans rely on both visual and tactile feedback to develop and maintain a mental model of their physical surroundings. In this paper, we investigate the impact of integrating tactile feedback into video prediction models for physical robot interactions. We propose three multi-modal integration approaches and compare the performance of these tactile-enhanced video prediction models. Additionally, we introduce two new datasets of robot pushing that use a magnetic-based tactile sensor for unsupervised learning. The first dataset contains visually identical objects with different physical properties, while the second dataset mimics existing robot-pushing datasets of household object clusters.
Our results demonstrate that incorporating tactile feedback into video prediction models improves scene prediction accuracy and enhances the agent's perception of physical interactions and understanding of cause-effect relationships during physical robot interactions.", "output": "Combining Vision and Tactile Sensation for Video Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-task learning has shown considerable promise for improving the performance of deep learning-driven vision systems for the purpose of robotic grasping. However, high architectural and computational complexity can result in poor suitability for deployment on embedded devices that are typically leveraged in robotic arms for real-world manufacturing and warehouse environments. As such, the design of highly efficient multi-task deep neural network architectures tailored for computer vision tasks for robotic grasping on the edge is highly desired for widespread adoption in manufacturing environments. Motivated by this, we propose Fast GraspNeXt, a fast self-attention neural network architecture tailored for embedded multi-task learning in computer vision tasks for robotic grasping. To build Fast GraspNeXt, we leverage a generative network architecture search strategy with a set of architectural constraints customized to achieve a strong balance between multi-task learning performance and embedded inference efficiency.
Experimental results on the MetaGraspNet benchmark dataset show that the Fast GraspNeXt network design achieves the highest performance (average precision (AP), accuracy, and mean squared error (MSE)) across multiple computer vision tasks when compared to other efficient multi-task network architecture designs, while having only 17.8M parameters (about >5x smaller), 259 GFLOPs (as much as >5x lower) and as much as >3.15x faster on a NVIDIA Jetson TX2 embedded processor.", "output": "Fast GraspNeXt: A Fast Self-Attention Neural Network Architecture for Multi-task Learning in Computer Vision Tasks for Robotic Grasping on the Edge."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present SSS3D, a fast multi-objective NAS framework designed to find computationally efficient 3D semantic scene segmentation networks. It uses RandLA-Net, an off-the-shelf point-based network, as a super-network to enable weight sharing and reduce search time by 99.67% for single-stage searches. SSS3D has a complex search space composed of sampling and architectural parameters that can form 2.88 * 10^17 possible networks.
To further reduce search time, SSS3D splits the complete search space and introduces a two-stage search that finds optimal subnetworks in 54% of the time required by single-stage searches.", "output": "SSS3D: Fast Neural Architecture Search For Efficient Three-Dimensional Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The application of Artificial intelligence for teaching and learning in the academic sphere is a trending subject of interest in the computing education. ChatGPT, as an AI-based tool, provides various advantages, such as heightened student involvement, cooperation, accessibility and availability. This paper addresses the prospects and obstacles associated with utilizing ChatGPT as a tool for learning and assessment in undergraduate Computer Science curriculum in particular to teaching and learning fundamental programming courses. Students having completed the course work for a Data Structures and Algorithms (a sophomore level course) participated in this study. Two groups of students were given programming challenges to solve within a short period of time. The control group (group A) had access to text books and notes of programming courses, however no Internet access was provided. Group B students were given access to ChatGPT and were encouraged to use it to help solve the programming challenges. The challenge was conducted in a computer lab environment using PC2 environment. Each team of students address the problem by writing executable code that satisfies certain number of test cases. Student teams were scored based on their performance in terms of number of successful passed test cases. Results show that students using ChatGPT had an advantage in terms of earned scores, however there were inconsistencies and inaccuracies in the submitted code consequently affecting the overall performance.
After a thorough analysis, the paper's findings indicate that incorporating AI in higher education brings about various opportunities and challenges.", "output": "Exploring the Use of ChatGPT as a Tool for Learning and Assessment in Undergraduate Computer Science Curriculum: Opportunities and Challenges."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This article explores the ethical problems arising from the use of ChatGPT as a kind of generative AI and suggests responses based on the Human-Centered Artificial Intelligence (HCAI) framework. The HCAI framework is appropriate because it understands technology above all as a tool to empower, augment, and enhance human agency while referring to human wellbeing as a grand challenge, thus perfectly aligning itself with ethics, the science of human flourishing. Further, HCAI provides objectives, principles, procedures, and structures for reliable, safe, and trustworthy AI which we apply to our ChatGPT assessments. The main danger ChatGPT presents is the propensity to be used as a weapon of mass deception (WMD) and an enabler of criminal activities involving deceit. We review technical specifications to better comprehend its potentials and limitations. We then suggest both technical (watermarking, styleme, detectors, and fact-checkers) and non-technical measures (terms of use, transparency, educator considerations, HITL) to mitigate ChatGPT misuse or abuse and recommend best uses (creative writing, non-creative writing, teaching and learning).
We conclude with considerations regarding the role of humans in ensuring the proper use of ChatGPT for individual and social wellbeing.", "output": "ChatGPT: More than a Weapon of Mass Deception, Ethical challenges and responses from the Human-Centered Artificial Intelligence (HCAI) perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Fairness is central to the ethical and responsible development and use of AI systems, with a large number of frameworks and formal notions of algorithmic fairness being available. However, many of the fairness solutions proposed revolve around technical considerations and not the needs of and consequences for the most impacted communities. We therefore want to take the focus away from definitions and allow for the inclusion of societal and relational aspects to represent how the effects of AI systems impact and are experienced by individuals and social groups. In this paper, we do this by means of proposing the ACROCPoLis framework to represent allocation processes with a modeling emphasis on fairness aspects. The framework provides a shared vocabulary in which the factors relevant to fairness assessments for different situations and procedures are made explicit, as well as their interrelationships. This enables us to compare analogous situations, to highlight the differences in dissimilar situations, and to capture differing interpretations of the same situation by different stakeholders.", "output": "ACROCPoLis: A Descriptive Framework for Making Sense of Fairness."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Public attention towards explainability of artificial intelligence (AI) systems has been rising in recent years to offer methodologies for human oversight.
This has translated into the proliferation of research outputs, such as from Explainable AI, to enhance transparency and control for system debugging and monitoring, and intelligibility of system process and output for user services. Yet, such outputs are difficult to adopt on a practical level due to a lack of a common regulatory baseline, and the contextual nature of explanations. Governmental policies are now attempting to tackle such exigence, however it remains unclear to what extent published communications, regulations, and standards adopt an informed perspective to support research, industry, and civil interests. In this study, we perform the first thematic and gap analysis of this plethora of policies and standards on explainability in the EU, US, and UK. Through a rigorous survey of policy documents, we first contribute an overview of governmental regulatory trajectories within AI explainability and its sociotechnical impacts. We find that policies are often informed by coarse notions and requirements for explanations. This might be due to the willingness to conciliate explanations foremost as a risk management tool for AI oversight, but also due to the lack of a consensus on what constitutes a valid algorithmic explanation, and how feasible the implementation and deployment of such explanations are across stakeholders of an organization.
Informed by AI explainability research, we conduct a gap analysis of existing policies, leading us to formulate a set of recommendations on how to address explainability in regulations for AI systems, especially discussing the definition, feasibility, and usability of explanations, as well as allocating accountability to explanation providers.", "output": "Explainability in AI Policies: A Critical Review of Communications, Reports, Regulations, and Standards in the EU, US, and UK."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We investigate how to build and train spatial representations for robot decision making with Transformers. In particular, for robots to operate in a range of environments, we must be able to quickly train or fine-tune robot sensorimotor policies that are robust to clutter, data efficient, and generalize well to different circumstances. As a solution, we propose Spatial Language Attention Policies (SLAP). SLAP uses three-dimensional tokens as the input representation to train a single multi-task, language-conditioned action prediction policy. Our method shows 80% success rate in the real world across eight tasks with a single model, and a 47.5% success rate when unseen clutter and unseen object configurations are introduced, even with only a handful of examples per task. This represents an improvement of 30% over prior work (20% given unseen distractors and configurations).", "output": "Spatial-Language Attention Policies for Efficient Robot Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Maritime obstacle detection is critical for safe navigation of autonomous surface vehicles (ASVs).
While the accuracy of image-based detection methods has advanced substantially, their computational and memory requirements prohibit deployment on embedded devices. In this paper we analyze the currently best-performing maritime obstacle detection network WaSR. Based on the analysis we then propose replacements for the most computationally intensive stages and propose its embedded-compute-ready variant eWaSR. In particular, the new design follows the most recent advancements of transformer-based lightweight networks. eWaSR achieves comparable detection results to state-of-the-art WaSR with only 0.52% F1 score performance drop and outperforms other state-of-the-art embedded-ready architectures by over 9.74% in F1 score. On a standard GPU, eWaSR runs 10x faster than the original WaSR (115 FPS vs 11 FPS). Tests on a real embedded device OAK-D show that, while WaSR cannot run due to memory restrictions, eWaSR runs comfortably at 5.5 FPS. This makes eWaSR the first practical embedded-compute-ready maritime obstacle detection network. The source code and trained eWaSR models are publicly available here:", "output": "eWaSR -- an embedded-compute-ready maritime obstacle detection network."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Written answers to open-ended questions can have a higher long-term effect on learning than multiple-choice questions. However, it is critical that teachers immediately review the answers, and ask to redo those that are incoherent. This can be a difficult task and can be time-consuming for teachers. A possible solution is to automate the detection of incoherent answers. One option is to automate the review with Large Language Models (LLM). In this paper, we analyze the responses of fourth graders in mathematics using three LLMs: GPT-3, BLOOM, and YOU. We used them with zero, one, two, three and four shots.
We compared their performance with the results of various classifiers trained with Machine Learning (ML). We found that LLMs perform worse than MLs in detecting incoherent answers. The difficulty seems to reside in recursive questions that contain both questions and answers, and in responses from students with typical fourth-grader misspellings. Upon closer examination, we have found that the ChatGPT model faces the same challenges.", "output": "Who's the Best Detective? LLMs vs. MLs in Detecting Incoherent Fourth Grade Math Answers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Robustness to natural distribution shifts has seen remarkable progress thanks to recent pre-training strategies combined with better fine-tuning methods. However, such fine-tuning assumes access to large amounts of labelled data, and the extent to which the observations hold when the amount of training data is not as high remains unknown. We address this gap by performing the first in-depth study of robustness to various natural distribution shifts in different low-shot regimes: spanning datasets, architectures, pre-trained initializations, and state-of-the-art robustness interventions. Most importantly, we find that there is no single model of choice that is often more robust than others, and existing interventions can fail to improve robustness on some datasets even if they do so in the full-shot regime.
We hope that our work will motivate the community to focus on this problem of practical importance.", "output": "Benchmarking Low-Shot Robustness to Natural Distribution Shifts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It is widely acknowledged that large models have the potential to deliver superior performance across a broad range of domains. Despite the remarkable progress made in the field of machine learning systems research, which has enabled the development and exploration of large models, such abilities remain confined to a small group of advanced users and industry leaders, resulting in an implicit technical barrier for the wider community to access and leverage these technologies. In this paper, we introduce PyTorch Fully Sharded Data Parallel (FSDP) as an industry-grade solution for large model training. FSDP has been closely co-designed with several key PyTorch core components including Tensor implementation, dispatcher system, and CUDA memory caching allocator, to provide non-intrusive user experiences and high training efficiency. Additionally, FSDP natively incorporates a range of techniques and settings to optimize resource utilization across a variety of hardware configurations. The experimental results demonstrate that FSDP is capable of achieving comparable performance to Distributed Data Parallel while providing support for significantly larger models with near-linear scalability in terms of TFLOPS.", "output": "PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The illusion of consensus occurs when people believe there is consensus across multiple sources, but the sources are the same and thus there is no \"true\" consensus.
We explore this phenomenon in the context of an AI-based intelligent agent designed to augment metacognition on social media. Misinformation, especially on platforms like Twitter, is a global problem for which there is currently no good solution. As an explainable AI (XAI) system, the agent provides explanations for its decisions on the misinformed nature of social media content. In this late-breaking study, we explored the roles of trust (attitude) and reliance (behaviour) as key elements of XAI user experience (UX) and whether these influenced the illusion of consensus. Findings show no effect of trust, but an effect of reliance on consensus-based explanations. This work may guide the design of anti-misinformation systems that use XAI, especially the user-centred design of explanations.", "output": "Trust and Reliance in Consensus-Based Explanations from an Anti-Misinformation Agent."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The commercial use of Machine Learning (ML) is spreading; at the same time, ML models are becoming more complex and more expensive to train, which makes Intellectual Property Protection (IPP) of trained models a pressing issue. Unlike other domains that can build on a solid understanding of the threats, attacks and defenses available to protect their IP, the ML-related research in this regard is still very fragmented. This is also due to a missing unified view as well as a common taxonomy of these aspects. In this paper, we systematize our findings on IPP in ML, while focusing on threats and attacks identified and defenses proposed at the time of writing.
We develop a comprehensive threat model for IP in ML, categorizing attacks and defenses within a unified and consolidated taxonomy, thus bridging research from both the ML and security communities.", "output": "Identifying Appropriate Intellectual Property Protection Mechanisms for Machine Learning Models: A Systematization of Watermarking, Fingerprinting, Model Access, and Attacks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The energy inefficiency of the apps can be a major issue for the app users which is discussed on App Stores extensively. Previous research has shown the importance of investigating the energy related app reviews to identify the major causes or categories of energy related user feedback. However, there is no study that efficiently extracts the energy related app reviews automatically. In this paper, we empirically study different techniques for automatic extraction of the energy related user feedback. We compare the accuracy, F1-score and run time of numerous machine-learning models with relevant feature combinations and relatively modern Neural Network-based models. In total, 60 machine learning models are compared to 30 models that we build using six neural network architectures and three word embedding models. We develop a visualization tool for this study through which a developer can traverse through this large-scale result set. The results show that neural networks outperform the other machine learning techniques and can achieve the highest F1-score of 0.935. To replicate the research results, we have open sourced the interactive visualization tool. After identifying the best results and extracting the energy related reviews, we further compare various techniques to help the developers automatically investigate the emerging issues that might be responsible for energy inefficiency of the apps.
We experiment the previously used string matching with results obtained from applying two of the state-of-the-art topic modeling algorithms, OBTM and AOLDA. Finally, we run a qualitative study performed in collaboration with developers and students from different institutions to determine their preferences for identifying necessary topics from previously categorized reviews, which shows OBTM produces the most helpful results.", "output": "On the Identification of the Energy related Issues from the App Reviews."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Communication between people is characterized by a broad range of nonverbal cues. Transferring these cues into the design of robots and other artificial agents that interact with people may foster more natural, inviting, and accessible experiences. In this position paper, we offer a series of definitive nonverbal codes for human-robot interaction (HRI) that address the five human sensory systems (visual, auditory, haptic, olfactory, gustatory) drawn from the field of communication studies. We discuss how these codes can be translated into design patterns for HRI using a curated sample of the communication studies and HRI literatures. As nonverbal codes are an essential mode in human communication, we argue that integrating robotic nonverbal codes in HRI will afford robots a feeling of \"aliveness\" or \"social agency\" that would otherwise be missing.
We end with suggestions for research directions to stimulate work on nonverbal communication within the field of HRI and improve communication between human and robots.", "output": "Nonverbal Cues in Human-Robot Interaction: A Communication Studies Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As a prominent instance of vandalism edits, Wiki search poisoning for illicit promotion is a cybercrime in which the adversary aims at editing Wiki articles to promote illicit businesses through Wiki search results of relevant queries. In this paper, we report a study that, for the first time, shows that such stealthy blackhat SEO on Wiki can be automated. Our technique, called MAWSEO, employs adversarial revisions to achieve real-world cybercriminal objectives, including rank boosting, vandalism detection evasion, topic relevancy, semantic consistency, user awareness (but not alarming) of promotional content, etc. Our evaluation and user study demonstrate that MAWSEO is able to effectively and efficiently generate adversarial vandalism edits, which can bypass state-of-the-art built-in Wiki vandalism detectors, and also get promotional content through to Wiki users without triggering their alarms.
In addition, we investigated potential defense, including coherence based detection and adversarial training of vandalism detection, against our attack in the Wiki ecosystem.", "output": "MAWSEO: Adversarial Wiki Search Poisoning for Illicit Online Promotion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose lookahead diffusion probabilistic models (LA-DPMs) to exploit the correlation in the outputs of the deep neural networks (DNNs) over subsequent timesteps in diffusion probabilistic models (DPMs) to refine the mean estimation of the conditional Gaussian distributions in the backward process. A typical DPM first obtains an estimate of the original data sample $\\boldsymbol{x}$ by feeding the most recent state $\\boldsymbol{z}_i$ and index $i$ into the DNN model and then computes the mean vector of the conditional Gaussian distribution for $\\boldsymbol{z}_{i-1}$. We propose to calculate a more accurate estimate for $\\boldsymbol{x}$ by performing extrapolation on the two estimates of $\\boldsymbol{x}$ that are obtained by feeding $(\\boldsymbol{z}_{i+1},i+1)$ and $(\\boldsymbol{z}_{i},i)$ into the DNN model. The extrapolation can be easily integrated into the backward process of existing DPMs by introducing an additional connection over two consecutive timesteps, and fine-tuning is not required.
Extensive experiments showed that plugging in the additional connection into DDPM, DDIM, DEIS, S-PNDM, and high-order DPM-Solvers leads to a significant performance gain in terms of FID score.", "output": "Lookahead Diffusion Probabilistic Models for Refining Mean Estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a deep learning based model predictive control (MPC) algorithm for systems with unmatched and bounded state-action dependent uncertainties of unknown structure. We utilize a deep neural network (DNN) as an oracle in the underlying optimization problem of learning based MPC (LBMPC) to estimate unmatched uncertainties. Generally, non-parametric oracles such as DNN are considered difficult to employ with LBMPC due to the technical difficulties associated with estimation of their coefficients in real time. We employ a dual-timescale adaptation mechanism, where the weights of the last layer of the neural network are updated in real time while the inner layers are trained on a slower timescale using the training data collected online and selectively stored in a buffer. Our results are validated through a numerical experiment on the compression system model of jet engine. These results indicate that the proposed approach is implementable in real time and carries the theoretical guarantees of LBMPC.", "output": "Unmatched uncertainty mitigation through neural network supported model predictive control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The spread of misinformation in social media outlets has become a prevalent societal problem and is the cause of many kinds of social unrest. Curtailing its prevalence is of great importance and machine learning has shown significant promise.
However, there are two main challenges when applyingmachine learning to this problem. First, while much too prevalent in onerespect, misinformation, actually, represents only a minor proportion of allthe postings seen on social media. Second, labeling the massive amount of datanecessary to train a useful classifier becomes impractical. Considering thesechallenges, we propose a simple semi-supervised learning framework in order todeal with extreme class imbalances that has the advantage, over otherapproaches, of using actual rather than simulated data to inflate the minorityclass. We tested our framework on two sets of Covid-related Twitter data andobtained significant improvement in F1-measure on extremely imbalancedscenarios, as compared to simple classical and deep-learning data generationmethods such as SMOTE, ADASYN, or GAN-based data generation.", "output": "A Semi-Supervised Framework for Misinformation Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Segment Anything Model (SAM) is a recently developed large model forgeneral-purpose segmentation for computer vision tasks. SAM was trained using11 million images with over 1 billion masks and can produce segmentationresults for a wide range of objects in natural scene images. SAM can be viewedas a general perception model for segmentation (partitioning images intosemantically meaningful regions). Thus, how to utilize such a large foundationmodel for medical image segmentation is an emerging research target. This papershows that although SAM does not immediately give high-quality segmentation formedical images, its generated masks, features, and stability scores are usefulfor building and training better medical image segmentation models. 
Inparticular, we demonstrate how to use SAM to augment image inputs for acommonly-used medical image segmentation model (e.g., U-Net). Experiments ontwo datasets show the effectiveness of our proposed method.", "output": "Input Augmentation with SAM: Boosting Medical Image Segmentation with Segmentation Foundation Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce GEDI, a Bayesian framework that combines existingself-supervised learning objectives with likelihood-based generative models.This framework leverages the benefits of both GEnerative and DIscriminativeapproaches, resulting in improved symbolic representations over standalonesolutions. Additionally, GEDI can be easily integrated and trained jointly withexisting neuro-symbolic frameworks without the need for additional supervisionor costly pre-training steps. We demonstrate through experiments on real-worlddata, including SVHN, CIFAR10, and CIFAR100, that GEDI outperforms existingself-supervised learning strategies in terms of clustering performance by asignificant margin. The symbolic component further allows it to leverageknowledge in the form of logical constraints to improve performance in thesmall data regime.", "output": "Learning Symbolic Representations Through Joint GEnerative and DIscriminative Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial attacks aim to disturb the functionality of a target system byadding specific noise to the input samples, bringing potential threats tosecurity and robustness when applied to facial recognition systems. 
Althoughexisting defense techniques achieve high accuracy in detecting some specificadversarial faces (adv-faces), new attack methods especially GAN-based attackswith completely different noise patterns circumvent them and reach a higherattack success rate. Even worse, existing techniques require attack data beforeimplementing the defense, making it impractical to defend newly emergingattacks that are unseen to defenders. In this paper, we investigate theintrinsic generality of adv-faces and propose to generate pseudo adv-faces byperturbing real faces with three heuristically designed noise patterns. We arethe first to train an adv-face detector using only real faces and theirself-perturbations, agnostic to victim facial recognition systems, and agnosticto unseen attacks. By regarding adv-faces as out-of-distribution data, we thennaturally introduce a novel cascaded system for adv-face detection, whichconsists of training data self-perturbations, decision boundary regularization,and a max-pooling-based binary classifier focusing on abnormal local coloraberrations. Experiments conducted on LFW and CelebA-HQ datasets with eightgradient-based and two GAN-based attacks validate that our method generalizesto a variety of unseen adversarial attacks.", "output": "Detecting Adversarial Faces Using Only Real Face Self-Perturbations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Public opinion is a crucial factor in shaping political decision-making.Nowadays, social media has become an essential platform for individuals toengage in political discussions and express their political views, presentingresearchers with an invaluable resource for analyzing public opinion. In thispaper, we focus on the 2020 US presidential election and create a large-scaledataset from Twitter. 
To detect political opinions in tweets, we build auser-tweet bipartite graph based on users' posting and retweeting behaviors andconvert the task into a Graph Neural Network (GNN)-based node classificationproblem. Then, we introduce a novel skip aggregation mechanism that makes tweetnodes aggregate information from second-order neighbors, which are also tweetnodes due to the graph's bipartite nature, effectively leveraging userbehavioral information. The experimental results show that our proposed modelsignificantly outperforms several competitive baselines. Further analysesdemonstrate the significance of user behavioral information and theeffectiveness of skip aggregation.", "output": "Detecting Political Opinions in Tweets through Bipartite Graph Analysis: A Skip Aggregation Graph Convolution Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Strong foundations in basic AI techniques are key to understanding moreadvanced concepts. We believe that introducing AI techniques, such as searchmethods, early in higher education helps create a deeper understanding of theconcepts seen later in more advanced AI and algorithms courses. We present aproject-based and competition-based bachelor course that gives second-yearstudents an introduction to search methods applied to board games. In groups oftwo, students have to use network programming and AI methods to build an AIagent to compete in a board game tournament-othello was this year's game.Students are evaluated based on the quality of their projects and on theirperformance during the final tournament. 
We believe that the introduction ofgamification, in the form of competition-based learning, allows for a betterlearning experience for the students.", "output": "Stimulating student engagement with an AI board game tournament."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The utilization of consumer electronics, such as televisions, set-top boxes,home theaters, and air conditioners, has become increasingly prevalent inmodern society as technology continues to evolve. As new devices enter ourhomes each year, the accumulation of multiple infrared remote controls tooperate them not only results in a waste of energy and resources, but alsocreates a cumbersome and cluttered environment for the user. This paperpresents a novel system, named SimplyMime, which aims to eliminate the need formultiple remote controls for consumer electronics and provide the user withintuitive control without the need for additional devices. SimplyMime leveragesa dynamic hand gesture recognition architecture, incorporating ArtificialIntelligence and Human-Computer Interaction, to create a sophisticated systemthat enables users to interact with a vast majority of consumer electronicswith ease. Additionally, SimplyMime has a security aspect where it can verifyand authenticate the user utilising the palmprint, which ensures that onlyauthorized users can control the devices. The performance of the proposedmethod for detecting and recognizing gestures in a stream of motion wasthoroughly tested and validated using multiple benchmark datasets, resulting incommendable accuracy levels. One of the distinct advantages of the proposedmethod is its minimal computational power requirements, making it highlyadaptable and reliable in a wide range of circumstances. 
The paper proposesincorporating this technology into all consumer electronic devices thatcurrently require a secondary remote for operation, thus promoting a moreefficient and sustainable living environment.", "output": "SimplyMime: A Control at Our Fingertips."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning and symbolic learning are two frequently employed methods inSequential Recommendation (SR). Recent neural-symbolic SR models demonstratetheir potential to enable SR to be equipped with concurrent perception andcognition capacities. However, neural-symbolic SR remains a challenging problemdue to open issues like representing users and items in logical reasoning. Inthis paper, we combine the Deep Neural Network (DNN) SR models with logicalreasoning and propose a general framework named Sequential Recommendation withProbabilistic Logical Reasoning (short for SR-PLR). This framework allowsSR-PLR to benefit from both similarity matching and logical reasoning bydisentangling feature embedding and logic embedding in the DNN andprobabilistic logic network. To better capture the uncertainty and evolution ofuser tastes, SR-PLR embeds users and items with a probabilistic method andconducts probabilistic logical reasoning on users' interaction patterns. Thenthe feature and logic representations learned from the DNN and logic networkare concatenated to make the prediction. 
Finally, experiments on varioussequential recommendation models demonstrate the effectiveness of the SR-PLR.", "output": "Sequential Recommendation with Probabilistic Logical Reasoning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "LiDAR point cloud segmentation is one of the most fundamental tasks forautonomous driving scene understanding. However, it is difficult for existingmodels to achieve both high inference speed and accuracy simultaneously. Forexample, voxel-based methods perform well in accuracy, while Bird's-Eye-View(BEV)-based methods can achieve real-time inference. To overcome this issue, wedevelop an effective 3D-to-BEV knowledge distillation method that transfersrich knowledge from 3D voxel-based models to BEV-based models. Our frameworkmainly consists of two modules: the voxel-to-pillar distillation module and thelabel-weight distillation module. Voxel-to-pillar distillation distills sparse3D features to BEV features for middle layers to make the BEV-based model awareof more structural and geometric information. Label-weight distillation helpsthe model pay more attention to regions with more height information. Finally,we conduct experiments on the SemanticKITTI dataset and Paris-Lille-3D. Theresults on SemanticKITTI show more than 5% improvement on the test set,especially for classes such as motorcycle and person, with more than 15%improvement. The code can be accessed at", "output": "Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The challenging problem of non-line-of-sight (NLOS) localization is criticalfor many wireless networking applications. 
The lack of available datasets hasmade NLOS localization difficult to tackle with ML-driven methods, but recentdevelopments in synthetic dataset generation have provided new opportunitiesfor research. This paper explores three different input representations: (i)single wireless radio path features, (ii) wireless radio link features(multi-path), and (iii) image-based representations. Inspired by the two latternew representations, we design two convolutional neural networks (CNNs) and wedemonstrate that, although not significantly improving the NLOS localizationperformance, they are able to support richer prediction outputs, thus allowingdeeper analysis of the predictions. In particular, the richer outputs enablereliable identification of non-trustworthy predictions and support theprediction of the top-K candidate locations for a given instance. We alsomeasure how the availability of various features (such as angles of signaldeparture and arrival) affects the model's performance, providing insightsabout the types of data that should be collected for enhanced NLOSlocalization. Our insights motivate future work on building more efficientneural architectures and input representations for improved NLOS localizationperformance, along with additional useful application features.", "output": "ML-based Approaches for Wireless NLOS Localization: Input Representations and Uncertainty Estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Online movie review platforms are providing crowdsourced feedback for thefilm industry and the general public, while spoiler reviews greatly compromiseuser experience. 
Although preliminary research efforts were made toautomatically identify spoilers, they merely focus on the review contentitself, while robust spoiler detection requires putting the review into thecontext of facts and knowledge regarding movies, user behavior on film reviewplatforms, and more. In light of these challenges, we first curate alarge-scale network-based spoiler detection dataset LCS and a comprehensive andup-to-date movie knowledge base UKM. We then propose MVSD, a novel Multi-ViewSpoiler Detection framework that takes into account the external knowledgeabout movies and user activities on movie review platforms. Specifically, MVSDconstructs three interconnecting heterogeneous information networks to modeldiverse data sources and their multi-view attributes, while we design andemploy a novel heterogeneous graph neural network architecture for spoilerdetection as node-level classification. Extensive experiments demonstrate thatMVSD advances the state-of-the-art on two spoiler detection datasets, while theintroduction of external knowledge and user interactions help ground robustspoiler detection. Our data and code are available at", "output": "Detecting Spoilers in Movie Reviews with External Movie Knowledge and User Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative models have attracted significant interest due to their ability tohandle uncertainty by learning the inherent data distributions. However, twoprominent generative models, namely Generative Adversarial Networks (GANs) andVariational AutoEncoders (VAEs), exhibit challenges that impede achievingoptimal performance in sequential recommendation tasks. Specifically, GANssuffer from unstable optimization, while VAEs are prone to posterior collapseand over-smoothed generations. 
The sparse and noisy nature of sequentialrecommendation further exacerbates these issues. In response to theselimitations, we present a conditional denoising diffusion model, which includesa sequence encoder, a cross-attentive denoising decoder, and a step-wisediffuser. This approach streamlines the optimization and generation process bydividing it into easier and tractable steps in a conditional autoregressivemanner. Furthermore, we introduce a novel optimization schema that incorporatesboth cross-divergence loss and contrastive loss. This novel training schemaenables the model to generate high-quality sequence/item representations andmeanwhile precluding collapse. We conducted comprehensive experiments on fourbenchmark datasets, and the superior performance achieved by our model atteststo its efficacy.", "output": "Conditional Denoising Diffusion for Sequential Recommendation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated Learning with Model Distillation (FedMD) is a nascent collaborativelearning paradigm, where only output logits of public datasets are transmittedas distilled knowledge, instead of passing on private model parameters that aresusceptible to gradient inversion attacks, a known privacy risk in federatedlearning. In this paper, we found that even though sharing output logits ofpublic datasets is safer than directly sharing gradients, there still exists asubstantial risk of data exposure caused by carefully designed maliciousattacks. Our study shows that a malicious server can inject a PLI(Paired-Logits Inversion) attack against FedMD and its variants by training aninversion neural network that exploits the confidence gap between the serverand client models. 
Experiments on multiple facial recognition datasets validate that under FedMD-like schemes, by using paired server-client logits of public datasets only, the malicious server is able to reconstruct private images on all tested benchmarks with a high success rate.", "output": "Breaching FedMD: Image Recovery via Paired-Logits Inversion Attack."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Denoising diffusion probabilistic models (DDPMs) are a class of powerful generative models. The past few years have witnessed the great success of DDPMs in generating high-fidelity samples. A significant limitation of the DDPMs is the slow sampling procedure. DDPMs generally need hundreds or thousands of sequential function evaluations (steps) of neural networks to generate a sample. This paper aims to develop a fast sampling method for DDPMs requiring much fewer steps while retaining high sample quality. The inference process of DDPMs approximates solving the corresponding diffusion ordinary differential equations (diffusion ODEs) in the continuous limit. This work analyzes how the backward error affects the diffusion ODEs and the sample quality in DDPMs. We propose fast sampling through the \\textbf{Restricting Backward Error schedule (RBE schedule)} based on dynamically moderating the long-time backward error. Our method accelerates DDPMs without any further training. Our experiments show that sampling with an RBE schedule generates high-quality samples within only 8 to 20 function evaluations on various benchmark datasets. 
We achieved 12.01 FID in 8 function evaluations on the ImageNet $128\\times128$, and a $20\\times$ speedup compared with previous baseline samplers.", "output": "Fast Diffusion Probabilistic Model Sampling through the lens of Backward Error Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Given a visual scene, humans have strong intuitions about how a scene can evolve over time under given actions. The intuition, often termed visual intuitive physics, is a critical ability that allows us to make effective plans to manipulate the scene to achieve desired outcomes without relying on extensive trial and error. In this paper, we present a framework capable of learning 3D-grounded visual intuitive physics models from videos of complex scenes with fluids. Our method is composed of a conditional Neural Radiance Field (NeRF)-style visual frontend and a 3D point-based dynamics prediction backend, using which we can impose strong relational and structural inductive bias to capture the structure of the underlying environment. Unlike existing intuitive point-based dynamics works that rely on the supervision of dense point trajectory from simulators, we relax the requirements and only assume access to multi-view RGB images and (imperfect) instance masks acquired using color prior. This enables the proposed model to handle scenarios where accurate point estimation and tracking are hard or impossible. We generate datasets including three challenging scenarios involving fluid, granular materials, and rigid objects in the simulation. The datasets do not include any dense particle information so most previous 3D-based intuitive physics pipelines can barely deal with that. We show our model can make long-horizon future predictions by learning from raw images and significantly outperforms models that do not employ an explicit 3D representation space. 
We also show that once trained, ourmodel can achieve strong generalization in complex scenarios under extrapolatesettings.", "output": "3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Pre-trained models (PTMs) have shown great promise in the speech and audiodomain. Embeddings leveraged from these models serve as inputs for learningalgorithms with applications in various downstream tasks. One such crucial taskis Speech Emotion Recognition (SER) which has a wide range of applications,including dynamic analysis of customer calls, mental health assessment, andpersonalized language learning. PTM embeddings have helped advance SER,however, a comprehensive comparison of these PTM embeddings that considermultiple facets such as embedding model architecture, data used forpre-training, and the pre-training procedure being followed is missing. Athorough comparison of PTM embeddings will aid in the faster and more efficientdevelopment of models and enable their deployment in real-world scenarios. Inthis work, we exploit this research gap and perform a comparative analysis ofembeddings extracted from eight speech and audio PTMs (wav2vec 2.0, data2vec,wavLM, UniSpeech-SAT, wav2clip, YAMNet, x-vector, ECAPA). We perform anextensive empirical analysis with four speech emotion datasets (CREMA-D, TESS,SAVEE, Emo-DB) by training three algorithms (XGBoost, Random Forest, FCN) onthe derived embeddings. The results of our study indicate that the bestperformance is achieved by algorithms trained on embeddings derived from PTMstrained for speaker recognition followed by wav2clip and UniSpeech-SAT. 
Thiscan relay that the top performance by embeddings from speaker recognition PTMsis most likely due to the model taking up information about numerous speechfeatures such as tone, accent, pitch, and so on during its speaker recognitiontraining. Insights from this work will assist future studies in their selectionof embeddings for applications related to SER.", "output": "A Comparative Study of Pre-trained Speech and Audio Embeddings for Speech Emotion Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) have demonstrated remarkable zero-shotgeneralization abilities: state-of-the-art chatbots can provide plausibleanswers to many common questions that arise in daily life. However, so far,LLMs cannot reliably solve long-horizon planning problems. By contrast,classical planners, once a problem is given in a formatted way, can useefficient search algorithms to quickly identify correct, or even optimal,plans. In an effort to get the best of both worlds, this paper introducesLLM+P, the first framework that incorporates the strengths of classicalplanners into LLMs. LLM+P takes in a natural language description of a planningproblem, then returns a correct (or optimal) plan for solving that problem innatural language. LLM+P does so by first converting the language descriptioninto a file written in the planning domain definition language (PDDL), thenleveraging classical planners to quickly find a solution, and then translatingthe found solution back into natural language. Along with LLM+P, we define adiverse set of different benchmark problems taken from common planningscenarios. 
Via a comprehensive set of experiments on these benchmark problems, we find that LLM+P is able to provide optimal solutions for most problems, while LLMs fail to provide even feasible plans for most problems. The code and results are publicly available at", "output": "LLM+P: Empowering Large Language Models with Optimal Planning Proficiency."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Individuals involved in gang-related activity use mainstream social media including Facebook and Twitter to express taunts and threats as well as grief and memorializing. However, identifying the impact of gang-related activity in order to serve community member needs through social media sources has a unique set of challenges. This includes the difficulty of ethically identifying training data of individuals impacted by gang activity and the need to account for a non-standard language style commonly used in the tweets from these individuals. Our study provides evidence of methods where natural language processing tools can be helpful in efficiently identifying individuals who may be in need of community care resources such as counselors, conflict mediators, or academic/professional training programs. We demonstrate that our binary logistic classifier outperforms baseline standards in identifying individuals impacted by gang-related violence using a sample of gang-related tweets associated with Chicago. 
We ultimately found that the language of a tweet ishighly relevant and that uses of ``big data'' methods or machine learningmodels need to better understand how language impacts the model's performanceand how it discriminates among populations.", "output": "Understanding Lexical Biases when Identifying Gang-related Social Media Communications."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) excel in many tasks in 2023, but they still facechallenges in complex reasoning. Theory-of-mind (ToM) tasks, which requireunderstanding agents' beliefs, goals, and mental states, are essential forcommon-sense reasoning involving humans, making it crucial to enhance LLMperformance in this area. This study measures the ToM performance of GPT-4 andthree GPT-3.5 variants (Davinci-2, Davinci-3, GPT-3.5-Turbo), and investigatesthe effectiveness of in-context learning in improving their ToM comprehension.We evaluated prompts featuring two-shot chain of thought reasoning andstep-by-step thinking instructions. We found that LLMs trained withReinforcement Learning from Human Feedback (RLHF) (all models excludingDavinci-2) improved their ToM accuracy via in-context learning. GPT-4 performedbest in zero-shot settings, reaching nearly 80% ToM accuracy, but still fellshort of the 87% human accuracy on the test set. However, when supplied withprompts for in-context learning, all RLHF-trained LLMs exceeded 80% ToMaccuracy, with GPT-4 reaching 100%. 
These results demonstrate that appropriateprompting enhances LLM ToM reasoning, and they underscore the context-dependentnature of LLM cognitive capacities.", "output": "Boosting Theory-of-Mind Performance in Large Language Models via Prompting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Traffic congestion caused by non-recurring incidents such as vehicle crashesand debris is a key issue for Traffic Management Centers (TMCs). Clearingincidents in a timely manner is essential for improving safety and reducingdelays and emissions for the traveling public. However, TMCs and otherresponders face a challenge in predicting the duration of incidents (until theroadway is clear), making decisions of what resources to deploy difficult. Toaddress this problem, this research developed an analytical framework andend-to-end machine-learning solution for predicting incident duration based oninformation available as soon as an incident report is received. Qualitypredictions of incident duration can help TMCs and other responders take aproactive approach in deploying responder services such as tow trucks,maintenance crews or activating alternative routes. The predictions use acombination of classification and regression machine learning modules. Theperformance of the developed solution has been evaluated based on the MeanAbsolute Error (MAE), or deviation from the actual incident duration as well asArea Under the Curve (AUC) and Mean Absolute Percentage Error (MAPE). 
Theresults showed that the framework significantly improved incident durationprediction compared to methods from previous research.", "output": "Machine learning framework for end-to-end implementation of Incident duration prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the rapid development of Internet of Things technologies, the nextgeneration traffic monitoring infrastructures are connected via the web, to aidtraffic data collection and intelligent traffic management. One of the mostimportant tasks in traffic is anomaly detection, since abnormal drivers canreduce traffic efficiency and cause safety issues. This work focuses ondetecting abnormal driving behaviors from trajectories produced by highwayvideo surveillance systems. Most of the current abnormal driving behaviordetection methods focus on a limited category of abnormal behaviors that dealwith a single vehicle without considering vehicular interactions. In this work,we consider the problem of detecting a variety of socially abnormal drivingbehaviors, i.e., behaviors that do not conform to the behavior of other nearbydrivers. This task is complicated by the variety of vehicular interactions andthe spatial-temporal varying nature of highway traffic. To solve this problem,we propose an autoencoder with a Recurrent Graph Attention Network that cancapture the highway driving behaviors contextualized on the surrounding cars,and detect anomalies that deviate from learned patterns. Our model is scalableto large freeways with thousands of cars. Experiments on data generated fromtraffic simulation software show that our model is the only one that can spotthe exact vehicle conducting socially abnormal behaviors, among thestate-of-the-art anomaly detection models. 
We further show the performance on real world HighD traffic dataset, where our model detects vehicles that violate the local driving norms.", "output": "Detecting Socially Abnormal Highway Driving Behaviors via Recurrent Graph Attention Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural Architecture Search (NAS) has become a popular method for discovering effective model architectures, especially for target hardware. As such, NAS methods that find optimal architectures under constraints are essential. In our paper, we propose LayerNAS to address the challenge of multi-objective NAS by transforming it into a combinatorial optimization problem, which effectively constrains the search complexity to be polynomial. For a model architecture with $L$ layers, we perform layerwise-search for each layer, selecting from a set of search options $\\mathbb{S}$. LayerNAS groups model candidates based on one objective, such as model size or latency, and searches for the optimal model based on another objective, thereby splitting the cost and reward elements of the search. This approach limits the search complexity to $O(H \\cdot |\\mathbb{S}| \\cdot L)$, where $H$ is a constant set in LayerNAS. Our experiments show that LayerNAS is able to consistently discover superior models across a variety of search spaces in comparison to strong baselines, including search spaces derived from NATS-Bench, MobileNetV2 and MobileNetV3.", "output": "LayerNAS: Neural Architecture Search in Polynomial Complexity."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Text summarization is essential for information aggregation and demands large amounts of training data. However, concerns about data privacy and security limit data collection and model training. 
To eliminate this concern, we propose a federated learning text summarization scheme, which allows users to share the global model in a cooperative learning manner without sharing raw data. Personalized federated learning (PFL) balances personalization and generalization in the process of optimizing the global model, to guide the training of local models. However, multiple local data have different distributions of semantics and context, which may cause the local model to learn deviated semantic and context information. In this paper, we propose FedSUMM, a dynamic gradient adapter to provide more appropriate local parameters for local model. Simultaneously, FedSUMM uses differential privacy to prevent parameter leakage during distributed training. Experimental evidence verifies FedSUMM can achieve faster model convergence on PFL algorithm for task-specific text summarization, and the method achieves superior performance for different optimization metrics for text summarization.", "output": "Personalized Federated Learning via Gradient Modulation for Heterogeneous Text Summarization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep reinforcement learning (DRL) for fluidic pinball, three individually rotating cylinders in the uniform flow arranged in an equilaterally triangular configuration, can learn the efficient flow control strategies due to the validity of self-learning and data-driven state estimation for complex fluid dynamic problems. In this work, we present a DRL-based real-time feedback strategy to control the hydrodynamic force on fluidic pinball, i.e., force extremum and tracking, from cylinders' rotation. 
By adequately designing reward functions and encoding historical observations, and after automatic learning of thousands of iterations, the DRL-based control was shown to make reasonable and valid control decisions in nonparametric control parameter space, which is comparable to and even better than the optimal policy found through lengthy brute-force searching. Subsequently, one of these results was analyzed by a machine learning model that enabled us to shed light on the basis of decision-making and physical mechanisms of the force tracking process. The finding from this work can control hydrodynamic force on the operation of fluidic pinball system and potentially pave the way for exploring efficient active flow control strategies in other complex fluid dynamic problems.", "output": "How to Control Hydrodynamic Force on Fluidic Pinball via Deep Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Artificial intelligence (AI) methods have great potential to revolutionize numerous medical care by enhancing the experience of medical experts and patients. AI based computer-assisted diagnosis tools can have a tremendous benefit if they can outperform or perform similarly to the level of a clinical expert. As a result, advanced healthcare services can be affordable in developing nations, and the problem of a lack of expert medical practitioners can be addressed. AI based tools can save time, resources, and overall cost for patient treatment. Furthermore, in contrast to humans, AI can uncover complex relations in the data from a large set of inputs and even lead to new evidence-based knowledge in medicine. However, integrating AI in healthcare raises several ethical and philosophical concerns, such as bias, transparency, autonomy, responsibility and accountability, which must be addressed before integrating such tools into clinical settings. 
In this article, we emphasize recent advances in AI-assisted medical image analysis, existing standards, and the significance of comprehending ethical issues and best practices for the applications of AI in clinical settings. We cover the technical and ethical challenges of AI and the implications of deploying AI in hospitals and public organizations. We also discuss promising key measures and techniques to address the ethical challenges, data scarcity, racial bias, lack of transparency, and algorithmic bias. Finally, we provide our recommendation and future directions for addressing the ethical challenges associated with AI in healthcare applications, with the goal of deploying AI into the clinical settings to make the workflow more efficient, accurate, accessible, transparent, and reliable for the patient worldwide.", "output": "Ensuring Trustworthy Medical Artificial Intelligence through Ethical and Philosophical Principles."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Chain-of-thought (CoT) prompting combined with large language models (LLMs) have achieved encouraging results on complex reasoning tasks. Text-to-SQL is a critical semantic parsing task that converts natural language questions into SQL statements, involving a complex reasoning process. However, there is little work about using CoT prompting to activate LLM's reasoning capabilities on Text-to-SQL tasks. In this work, we propose a new paradigm for prompting Text-to-SQL tasks, called Divide-and-Prompt, which first divides the task into subtasks, and then approach each subtask through CoT. We present 3 prompting-based methods to enhance the Text-to-SQL ability of LLMs. 
Experiments show that these prompts guide LLMs to generate Text-to-SQL with higher execution accuracy.", "output": "Divide and Prompt: Chain of Thought Prompting for Text-to-SQL."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The flexible duplex (FD) technique, including dynamic time-division duplex (D-TDD) and dynamic frequency-division duplex (D-FDD), is regarded as a promising solution to achieving a more flexible uplink/downlink transmission in 5G-Advanced or 6G mobile communication systems. However, it may introduce serious cross-link interference (CLI). For better mitigating the impact of CLI, we first present a more realistic base station (BS)-to-BS channel model incorporating the radio frequency (RF) chain characteristics, which exhibit a hardware-dependent nonlinear property, and hence the accuracy of conventional channel modelling is inadequate for CLI cancellation. Then, we propose a channel parameter estimation based polynomial CLI canceller and two machine learning (ML) based CLI cancellers that use the lightweight feedforward neural network (FNN). Our simulation results and analysis show that the ML based CLI cancellers achieve notable performance improvement and dramatic reduction of computational complexity, in comparison with the polynomial CLI canceller.", "output": "Lightweight Machine Learning for Digital Cross-Link Interference Cancellation with RF Chain Characteristics in Flexible Duplex MIMO Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Background: Large language models such as ChatGPT are capable of generating grammatically perfect and human-like text content, and a large number of ChatGPT-generated texts have appeared on the Internet. 
However, medical texts such as clinical notes and diagnoses require rigorous validation, and erroneous medical content generated by ChatGPT could potentially lead to disinformation that poses significant harm to healthcare and the general public. Objective: This research is among the first studies on responsible and ethical AIGC (Artificial Intelligence Generated Content) in medicine. We focus on analyzing the differences between medical texts written by human experts and generated by ChatGPT, and designing machine learning workflows to effectively detect and differentiate medical texts generated by ChatGPT. Methods: We first construct a suite of datasets containing medical texts written by human experts and generated by ChatGPT. In the next step, we analyze the linguistic features of these two types of content and uncover differences in vocabulary, part-of-speech, dependency, sentiment, perplexity, etc. Finally, we design and implement machine learning methods to detect medical text generated by ChatGPT. Results: Medical texts written by humans are more concrete, more diverse, and typically contain more useful information, while medical texts generated by ChatGPT pay more attention to fluency and logic, and usually express general terminologies rather than effective information specific to the context of the problem. A BERT-based model can effectively detect medical texts generated by ChatGPT, and the F1 exceeds 95%.", "output": "Differentiate ChatGPT-generated and Human-written Medical Texts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Meta-structures are widely used to define which subset of neighbors to aggregate information in heterogeneous information networks (HINs). 
In this work, we investigate existing meta-structures, including meta-path and meta-graph, and observe that they are initially designed manually with fixed patterns and hence are insufficient to encode various rich semantic information on diverse HINs. Through reflection on their limitation, we define a new concept called meta-multigraph as a more expressive and flexible generalization of meta-graph, and propose a stable differentiable search method to automatically optimize the meta-multigraph for specific HINs and tasks. As the flexibility of meta-multigraphs may propagate redundant messages, we further introduce a complex-to-concise (C2C) meta-multigraph that propagates messages from complex to concise along the depth of meta-multigraph. Moreover, we observe that the differentiable search typically suffers from unstable search and a significant gap between the meta-structures in search and evaluation. To this end, we propose a progressive search algorithm by implicitly narrowing the search space to improve search stability and reduce inconsistency. Extensive experiments are conducted on six medium-scale benchmark datasets and one large-scale benchmark dataset over two representative tasks, i.e., node classification and recommendation. Empirical results demonstrate that our search methods can automatically find expressive meta-multigraphs and C2C meta-multigraphs, enabling our model to outperform state-of-the-art heterogeneous graph neural networks.", "output": "Meta-multigraph Search: Rethinking Meta-structure on Heterogeneous Information Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent work known as Segment Anything (SA) has made significant strides in pushing the boundaries of semantic segmentation into the era of foundation models. 
The impact of SA has sparked extremely active discussions and ushered in an encouraging new wave of developing foundation models for the diverse tasks in the Euclidean domain, such as object detection and image inpainting. Despite the promising advances led by SA, the concept has yet to be extended to the non-Euclidean graph domain. In this paper, we explore a novel Segment Non-Euclidean Anything (SNA) paradigm that strives to develop foundation models that can handle the diverse range of graph data within the non-Euclidean domain, seeking to expand the scope of SA and lay the groundwork for future research in this direction. To achieve this goal, we begin by discussing the recent achievements in foundation models associated with SA. We then shed light on the unique challenges that arise when applying the SA concept to graph analysis, which involves understanding the differences between the Euclidean and non-Euclidean domains from both the data and task perspectives. Motivated by these observations, we present several preliminary solutions to tackle the challenges of SNA and detail their corresponding limitations, along with several potential directions to pave the way for future SNA research. Experiments on five Open Graph Benchmark (OGB) datasets across various tasks, including graph property classification and regression, as well as multi-label prediction, demonstrate that the performance of the naive SNA solutions has considerable room for improvement, pointing towards a promising avenue for future exploration of Graph General Intelligence.", "output": "Segment Anything in Non-Euclidean Domains: Challenges and Opportunities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual representation based on covariance matrix has demonstrated its efficacy for image classification by characterising the pairwise correlation of different channels in convolutional 
feature maps. However, pairwise correlation will become misleading once there is another channel correlating with both channels of interest, resulting in the ``confounding'' effect. For this case, ``partial correlation'' which removes the confounding effect shall be estimated instead. Nevertheless, reliably estimating partial correlation requires to solve a symmetric positive definite matrix optimisation, known as sparse inverse covariance estimation (SICE). How to incorporate this process into CNN remains an open issue. In this work, we formulate SICE as a novel structured layer of CNN. To ensure end-to-end trainability, we develop an iterative method to solve the above matrix optimisation during forward and backward propagation steps. Our work obtains a partial correlation based deep visual representation and mitigates the small sample problem often encountered by covariance matrix estimation in CNN. Computationally, our model can be effectively trained with GPU and works well with a large number of channels of advanced CNNs. Experiments show the efficacy and superior classification performance of our deep visual representation compared to covariance matrix based counterparts.", "output": "Learning Partial Correlation based Deep Visual Representation for Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Few-shot learning (FSL) is popular due to its ability to adapt to novel classes. Compared with inductive few-shot learning, transductive models typically perform better as they leverage all samples of the query set. The two existing classes of methods, prototype-based and graph-based, have the disadvantages of inaccurate prototype estimation and sub-optimal graph construction with kernel functions, respectively. In this paper, we propose a novel prototype-based label propagation to solve these issues. 
Specifically, our graph construction is based on the relation between prototypes and samples rather than between samples. As prototypes are being updated, the graph changes. We also estimate the label of each prototype instead of considering a prototype to be the class centre. On mini-ImageNet, tiered-ImageNet, CIFAR-FS and CUB datasets, we show the proposed method outperforms other state-of-the-art methods in transductive FSL and semi-supervised FSL when some unlabeled data accompanies the novel few-shot task.", "output": "Transductive Few-shot Learning with Prototype-based Label Propagation by Iterative Graph Refinement."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Negative sampling (NS) is widely used in knowledge graph embedding (KGE), which aims to generate negative triples to make a positive-negative contrast during training. However, existing NS methods are unsuitable when multi-modal information is considered in KGE models. They are also inefficient due to their complex design. In this paper, we propose Modality-Aware Negative Sampling (MANS) for multi-modal knowledge graph embedding (MMKGE) to address the mentioned problems. MANS could align structural and visual embeddings for entities in KGs and learn meaningful embeddings to perform better in multi-modal KGE while keeping lightweight and efficient. 
Empirical results on two benchmarks demonstrate that MANS outperforms existing NS methods. Meanwhile, we make further explorations about MANS to confirm its effectiveness.", "output": "Modality-Aware Negative Sampling for Multi-modal Knowledge Graph Embedding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Interpreting remote sensing imagery enables numerous downstream applications ranging from land-use planning to deforestation monitoring. Robustly classifying this data is challenging due to the Earth's geographic diversity. While many distinct satellite and aerial image classification datasets exist, there is yet to be a benchmark curated that suitably covers this diversity. In this work, we introduce SATellite ImageNet (SATIN), a metadataset curated from 27 existing remotely sensed datasets, and comprehensively evaluate the zero-shot transfer classification capabilities of a broad range of vision-language (VL) models on SATIN. We find SATIN to be a challenging benchmark: the strongest method we evaluate achieves a classification accuracy of 52.0%. We provide a $href{ to guide and track the progress of VL models in this important domain.", "output": "SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In aggregated variables the impact of interventions is typically ill-defined because different micro-realizations of the same macro-intervention can result in different changes of downstream macro-variables. We show that this ill-definedness of causality on aggregated variables can turn unconfounded causal relations into confounded ones and vice versa, depending on the respective micro-realization. 
We argue that it is practically infeasible to only use aggregated causal systems when we are free from this ill-definedness. Instead, we need to accept that macro causal relations are typically defined only with reference to the micro states. On the positive side, we show that cause-effect relations can be aggregated when the macro interventions are such that the distribution of micro states is the same as in the observational distribution and also discuss generalizations of this observation.", "output": "Meaningful Causal Aggregation and Paradoxical Confounding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Skeleton-based action recognition has achieved remarkable results in human action recognition with the development of graph convolutional networks (GCNs). However, the recent works tend to construct complex learning mechanisms with redundant training and exist a bottleneck for long time-series. To solve these problems, we propose the Temporal-Spatio Graph ConvNeXt (TSGCNeXt) to explore efficient learning mechanism of long temporal skeleton sequences. Firstly, a new graph learning mechanism with simple structure, Dynamic-Static Separate Multi-graph Convolution (DS-SMG) is proposed to aggregate features of multiple independent topological graphs and avoid the node information being ignored during dynamic convolution. Next, we construct a graph convolution training acceleration mechanism to optimize the back-propagation computing of dynamic graph learning with 55.08% speed-up. Finally, the TSGCNeXt restructure the overall structure of GCN with three Spatio-temporal learning modules, efficiently modeling long temporal features. In comparison with existing previous methods on large-scale datasets NTU RGB+D 60 and 120, TSGCNeXt outperforms on single-stream networks. 
In addition, with the ema model introduced into the multi-stream fusion, TSGCNeXt achieves SOTA levels. On the cross-subject and cross-set of the NTU 120, accuracies reach 90.22% and 91.74%.", "output": "TSGCNeXt: Dynamic-Static Multi-Graph Convolution for Efficient Skeleton-Based Action Recognition with Long-term Learning Potential."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "MOBA games, e.g., Dota2 and Honor of Kings, have been actively used as the testbed for the recent AI research on games, and various AI systems have been developed at the human level so far. However, these AI systems mainly focus on how to compete with humans, less on exploring how to collaborate with humans. To this end, this paper makes the first attempt to investigate human-agent collaboration in MOBA games. In this paper, we propose to enable humans and agents to collaborate through explicit communication by designing an efficient and interpretable Meta-Command Communication-based framework, dubbed MCC, for accomplishing effective human-agent collaboration in MOBA games. The MCC framework consists of two pivotal modules: 1) an interpretable communication protocol, i.e., the Meta-Command, to bridge the communication gap between humans and agents; 2) a meta-command value estimator, i.e., the Meta-Command Selector, to select a valuable meta-command for each agent to achieve effective human-agent collaboration. 
Experimental results in Honor of Kings demonstrate that MCC agents can collaborate reasonably well with human teammates and even generalize to collaborate with different levels and numbers of human teammates. Videos are available at ", "output": "Towards Effective and Interpretable Human-Agent Collaboration in MOBA Games: A Communication Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper studies temporal planning in probabilistic environments, modeled as labeled Markov decision processes (MDPs), with user preferences over multiple temporal goals. Existing works reflect such preferences as a prioritized list of goals. This paper introduces a new specification language, termed prioritized qualitative choice linear temporal logic on finite traces, which augments linear temporal logic on finite traces with prioritized conjunction and ordered disjunction from prioritized qualitative choice logic. This language allows for succinctly specifying temporal objectives with corresponding preferences accomplishing each temporal task. The finite traces that describe the system's behaviors are ranked based on their dissatisfaction scores with respect to the formula. We propose a systematic translation from the new language to a weighted deterministic finite automaton. Utilizing this computational model, we formulate and solve a problem of computing an optimal policy that minimizes the expected score of dissatisfaction given user preferences. 
We demonstrate the efficacy and applicability of the logic and the algorithm on several case studies with detailed analyses for each.", "output": "Probabilistic Planning with Prioritized Preferences over Temporal Logic Objectives."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) can achieve highly effective performance on various reasoning tasks by incorporating step-by-step chain-of-thought (CoT) prompting as demonstrations. However, the reasoning chains of demonstrations generated by LLMs are prone to errors, which can subsequently lead to incorrect reasoning during inference. Furthermore, inappropriate exemplars (overly simplistic or complex), can affect overall performance among varying levels of difficulty. We introduce Iter-CoT (Iterative bootstrapping in Chain-of-Thoughts Prompting), an iterative bootstrapping approach for selecting exemplars and generating reasoning chains. By utilizing iterative bootstrapping, our approach enables LLMs to autonomously rectify errors, resulting in more precise and comprehensive reasoning chains. Simultaneously, our approach selects challenging yet answerable questions accompanied by reasoning chains as exemplars with a moderate level of difficulty, which enhances the LLMs' generalizability across varying levels of difficulty. Experimental results indicate that Iter-CoT exhibits superiority, achieving competitive performance across three distinct reasoning tasks on eleven datasets.", "output": "Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep equilibrium models (DEQs) have proven to be very powerful for learning data representations. 
The idea is to replace traditional (explicit) feedforward neural networks with an implicit fixed-point equation, which allows to decouple the forward and backward passes. In particular, training DEQ layers becomes very memory-efficient via the implicit function theorem. However, backpropagation through DEQ layers still requires solving an expensive Jacobian-based equation. In this paper, we introduce a simple but effective strategy to avoid this computational burden. Our method relies on the Jacobian approximation of Broyden's method after the forward pass to compute the gradients during the backward pass. Experiments show that simply re-using this approximation can significantly speed up the training while not causing any performance degradation.", "output": "Efficient Training of Deep Equilibrium Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Nowadays, one of the main challenges for Question Answering Systems is to answer complex questions using various sources of information. Multi-hop questions are a type of complex questions that require multi-step reasoning to answer. In this article, the IslamicPCQA dataset is introduced. This is the first Persian dataset for answering complex questions based on non-structured information sources and consists of 12,282 question-answer pairs extracted from 9 Islamic encyclopedias. This dataset has been created inspired by the HotpotQA English dataset approach, which was customized to suit the complexities of the Persian language. Answering questions in this dataset requires more than one paragraph and reasoning. The questions are not limited to any prior knowledge base or ontology, and to provide robust reasoning ability, the dataset also includes supporting facts and key sentences. 
The prepared dataset covers a wide range of Islamic topics and aims to facilitate answering complex Persian questions within this subject matter.", "output": "IslamicPCQA: A Dataset for Persian Multi-hop Complex Question Answering in Islamic Text Resources."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The discriminability of feature representation is the key to open-set face recognition. Previous methods rely on the learnable weights of the classification layer that represent the identities. However, the evaluation process learns no identity representation and drops the classifier from training. This inconsistency could confuse the feature encoder in understanding the evaluation goal and hinder the effect of identity-based methods. To alleviate the above problem, we propose a novel approach namely Contrastive Regularization for Face recognition (CoReFace) to apply image-level regularization in feature representation learning. Specifically, we employ sample-guided contrastive learning to regularize the training with the image-image relationship directly, which is consistent with the evaluation process. To integrate contrastive learning into face recognition, we augment embeddings instead of images to avoid the image quality degradation. Then, we propose a novel contrastive loss for the representation distribution by incorporating an adaptive margin and a supervised contrastive mask to generate steady loss values and avoid the collision with the classification supervision signal. 
Finally, we discover and solve the semantically repetitive signal problem in contrastive learning by exploring new pair coupling protocols. Extensive experiments demonstrate the efficacy and efficiency of our CoReFace which is highly competitive with the state-of-the-art approaches.", "output": "CoReFace: Sample-Guided Contrastive Regularization for Deep Face Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper studies semi-supervised graph classification, a crucial task with a wide range of applications in social network analysis and bioinformatics. Recent works typically adopt graph neural networks to learn graph-level representations for classification, failing to explicitly leverage features derived from graph topology (e.g., paths). Moreover, when labeled data is scarce, these methods are far from satisfactory due to their insufficient topology exploration of unlabeled data. We address the challenge by proposing a novel semi-supervised framework called Twin Graph Neural Network (TGNN). To explore graph structural information from complementary views, our TGNN has a message passing module and a graph kernel module. To fully utilize unlabeled data, for each module, we calculate the similarity of each unlabeled graph to other labeled graphs in the memory bank and our consistency loss encourages consistency between two similarity distributions in different embedding spaces. The two twin modules collaborate with each other by exchanging instance similarity knowledge to fully explore the structure information of both labeled and unlabeled data. 
We evaluate our TGNN on various public datasets and show that it achieves strong performance.", "output": "TGNN: A Joint Semi-supervised Framework for Graph-level Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Semi-cooperative behaviors are intrinsic properties of human drivers and should be considered for autonomous driving. In addition, new autonomous planners can consider the social value orientation (SVO) of human drivers to generate socially-compliant trajectories. Yet the overall impact on traffic flow for this new class of planners remain to be understood. In this work, we present study of implicit semi-cooperative driving where agents deploy a game-theoretic version of iterative best response assuming knowledge of the SVOs of other agents. We simulate nominal traffic flow and investigate whether the proportion of prosocial agents on the road impact individual or system-wide driving performance. Experiments show that the proportion of prosocial agents has a minor impact on overall traffic flow and that benefits of semi-cooperation disproportionally affect egoistic and high-speed drivers.", "output": "Studying the Impact of Semi-Cooperative Drivers on Overall Highway Flow."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper discusses the representation of ontologies in the first-order logical environment {\\ttfamily FOLE}. An ontology defines the primitives with which to model the knowledge resources for a community of discourse. These primitives consist of classes, relationships and properties. An ontology uses formal axioms to constrain the interpretation of these primitives. In short, an ontology specifies a logical theory. 
This paper continues the discussion of the representation and interpretation of ontologies in the first-order logical environment {\\ttfamily FOLE}. The formalism and semantics of (many-sorted) first-order logic can be developed in both a \\emph{classification form} and an \\emph{interpretation form}. Two papers, the current paper, defining the concept of a structure, and ``The {\\ttfamily ERA} of {\\ttfamily FOLE}: Superstructure'', defining the concept of a sound logic, represent the \\emph{classification form}, corresponding to ideas discussed in the ``Information Flow Framework''. Two papers, ``The {\\ttfamily FOLE} Table'', defining the concept of a relational table, and ``The {\\ttfamily FOLE} Database'', defining the concept of a relational database, represent the \\emph{interpretation form}, expanding on material found in the paper ``Database Semantics''. Although the classification form follows the entity-relationship-attribute data model of Chen, the interpretation form incorporates the relational data model of Codd. A fifth paper ``{\\ttfamily FOLE} Equivalence'' proves that the classification form is equivalent to the interpretation form. In general, the {\\ttfamily FOLE} representation uses a conceptual structures approach, that is completely compatible with the theory of institutions, formal concept analysis and information flow.", "output": "The ERA of FOLE: Foundation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Bone age assessment is challenging in clinical practice due to the complicated bone age assessment process. Current automatic bone age assessment methods were designed with rare consideration of the diagnostic logistics and thus may yield certain uninterpretable hidden states and outputs. Consequently, doctors can find it hard to cooperate with such models harmoniously because it is difficult to check the correctness of the model predictions. 
In this work, we propose a new graph-based deep learning framework for bone age assessment with hand radiographs, called Doctor Imitator (DI). The architecture of DI is designed to learn the diagnostic logistics of doctors using the scoring methods (e.g., the Tanner-Whitehouse method) for bone age assessment. Specifically, the convolutions of DI capture the local features of the anatomical regions of interest (ROIs) on hand radiographs and predict the ROI scores by our proposed Anatomy-based Group Convolution, summing up for bone age prediction. Besides, we develop a novel Dual Graph-based Attention module to compute patient-specific attention for ROI features and context attention for ROI scores. As far as we know, DI is the first automatic bone age assessment framework following the scoring methods without fully supervised hand radiographs. Experiments on hand radiographs with only bone age supervision verify that DI can achieve excellent performance with sparse parameters and provide more interpretability.", "output": "Doctor Imitator: Hand-Radiography-based Bone Age Assessment by Imitating Scoring Methods."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The computational principles underlying attention allocation in complex goal-directed tasks remain elusive. Goal-directed reading, i.e., reading a passage to answer a question in mind, is a common real-world task that strongly engages attention. Here, we investigate what computational models can explain attention distribution in this complex task. We show that the reading time on each word is predicted by the attention weights in transformer-based deep neural networks (DNNs) optimized to perform the same reading task. Eye-tracking further reveals that readers separately attend to basic text features and question-relevant information during first-pass reading and rereading, respectively. 
Similarly, text features and question relevance separately modulate attention weights in shallow and deep DNN layers. Furthermore, when readers scan a passage without a question in mind, their reading time is predicted by DNNs optimized for a word prediction task. Therefore, attention during real-world reading can be interpreted as the consequence of task optimization.", "output": "Human Attention during Goal-directed Reading Comprehension Relies on Task Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data missingness and quality are common problems in machine learning, especially for high-stakes applications such as healthcare. Developers often train machine learning models on carefully curated datasets using only high quality data; however, this reduces the utility of such models in production environments. We propose a novel neural network modification to mitigate the impacts of low quality and missing data which involves replacing the fixed weights of a fully-connected layer with a function of an additional input. This is inspired from neuromodulation in biological neural networks where the cortex can up- and down-regulate inputs based on their reliability and the presence of other data. In testing, with reliability scores as a modulating signal, models with modulating layers were found to be more robust against degradation of data quality, including additional missingness. These models are superior to imputation as they save on training time by completely skipping the imputation process and further allow the introduction of other data quality measures that imputation cannot handle. 
Our results suggest that explicitly accounting for reduced information quality with a modulating fully connected layer can enable the deployment of artificial intelligence systems in real-time applications.", "output": "A Modulation Layer to Increase Neural Network Robustness Against Data Quality Issues."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, dense passage retrieval has become a mainstream approach to finding relevant information in various natural language processing tasks. A number of studies have been devoted to improving the widely adopted dual-encoder architecture. However, most of the previous studies only consider query-centric similarity relation when learning the dual-encoder retriever. In order to capture more comprehensive similarity relations, we propose a novel approach that leverages both query-centric and PAssage-centric sImilarity Relations (called PAIR) for dense passage retrieval. To implement our approach, we make three major technical contributions by introducing formal formulations of the two kinds of similarity relations, generating high-quality pseudo labeled data via knowledge distillation, and designing an effective two-stage training procedure that incorporates passage-centric similarity relation constraint. Extensive experiments show that our approach significantly outperforms previous state-of-the-art models on both MSMARCO and Natural Questions datasets.", "output": "PAIR: Leveraging Passage-Centric Similarity Relation for Improving Dense Passage Retrieval."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vision Transformer (ViT) is known to be highly nonlinear like other classical neural networks and could be easily fooled by both natural and adversarial patch perturbations. 
This limitation could pose a threat to the deployment of ViT in the real industrial environment, especially in safety-critical scenarios. In this work, we propose PatchCensor, aiming to certify the patch robustness of ViT by applying exhaustive testing. We try to provide a provable guarantee by considering the worst patch attack scenarios. Unlike empirical defenses against adversarial patches that may be adaptively breached, certified robust approaches can provide a certified accuracy against arbitrary attacks under certain conditions. However, existing robustness certifications are mostly based on robust training, which often requires substantial training efforts and the sacrifice of model performance on normal samples. To bridge the gap, PatchCensor seeks to improve the robustness of the whole system by detecting abnormal inputs instead of training a robust model and asking it to give reliable results for every input, which may inevitably compromise accuracy. Specifically, each input is tested by voting over multiple inferences with different mutated attention masks, where at least one inference is guaranteed to exclude the abnormal patch. This can be seen as complete-coverage testing, which could provide a statistical guarantee on inference at the test time. Our comprehensive evaluation demonstrates that PatchCensor is able to achieve high certified accuracy (e.g. 67.1% on ImageNet for 2%-pixel adversarial patches), significantly outperforming state-of-the-art techniques while achieving similar clean accuracy (81.8% on ImageNet). 
Meanwhile, our technique also supports flexible configurations to handle different adversarial patch sizes (up to 25%) by simply changing the masking strategy.", "output": "PatchCensor: Patch Robustness Certification for Transformers via Exhaustive Testing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Object detection in autonomous driving applications implies that the detection and tracking of semantic objects are commonly native to urban driving environments, as pedestrians and vehicles. One of the major challenges in state-of-the-art deep-learning based object detection are false positives which occur with overconfident scores. This is highly undesirable in autonomous driving and other critical robotic-perception domains because of safety concerns. This paper proposes an approach to alleviate the problem of overconfident predictions by introducing a novel probabilistic layer to deep object detection networks in testing. The suggested approach avoids the traditional Sigmoid or Softmax prediction layer which often produces overconfident predictions. It is demonstrated that the proposed technique reduces overconfidence in the false positives without degrading the performance on the true positives. The approach is validated on the 2D-KITTI objection detection through the YOLOV4 and SECOND (Lidar-based detector). The proposed approach enables interpretable probabilistic predictions without the requirement of re-training the network and therefore is very practical.", "output": "Probabilistic Approach for Road-Users Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper considers the problem of learning temporal task specifications, e.g. automata and temporal logic, from expert demonstrations. 
Task specifications are a class of sparse memory augmented rewards with explicit support for temporal and Boolean composition. Three features make learning temporal task specifications difficult: (1) the (countably) infinite number of tasks under consideration; (2) an a-priori ignorance of what memory is needed to encode the task; and (3) the discrete solution space - typically addressed by (brute force) enumeration. To overcome these hurdles, we propose Demonstration Informed Specification Search (DISS): a family of algorithms requiring only black box access to a maximum entropy planner and a task sampler from labeled examples. DISS then works by alternating between conjecturing labeled examples to make the provided demonstrations less surprising and sampling tasks consistent with the conjectured labeled examples. We provide a concrete implementation of DISS in the context of tasks described by Deterministic Finite Automata, and show that DISS is able to efficiently identify tasks from only one or two expert demonstrations.", "output": "Demonstration Informed Specification Search."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep reinforcement learning has gathered much attention recently. Impressive results were achieved in activities as diverse as autonomous driving, game playing, molecular recombination, and robotics. In all these fields, computer programs have taught themselves to solve difficult problems. They have learned to fly model helicopters and perform aerobatic manoeuvers such as loops and rolls. In some applications they have even become better than the best humans, such as in Atari, Go, poker and StarCraft. The way in which deep reinforcement learning explores complex environments reminds us of how children learn, by playfully trying out things, getting feedback, and trying again. 
The computer seems to truly possess aspects of human learning; this goes to the heart of the dream of artificial intelligence. The successes in research have not gone unnoticed by educators, and universities have started to offer courses on the subject. The aim of this book is to provide a comprehensive overview of the field of deep reinforcement learning. The book is written for graduate students of artificial intelligence, and for researchers and practitioners who wish to better understand deep reinforcement learning methods and their challenges. We assume an undergraduate-level of understanding of computer science and artificial intelligence; the programming language of this book is Python. We describe the foundations, the algorithms and the applications of deep reinforcement learning. We cover the established model-free and model-based methods that form the basis of the field. Developments go quickly, and we also cover advanced topics: deep multi-agent reinforcement learning, deep hierarchical reinforcement learning, and deep meta learning.", "output": "Deep Reinforcement Learning, a textbook."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Utilizing multi-modal neuroimaging data has been proved to be effective to investigate human cognitive activities and certain pathologies. However, it is not practical to obtain the full set of paired neuroimaging data centrally since the collection faces several constraints, e.g., high examination cost, long acquisition time, and image corruption. In addition, these data are dispersed into different medical institutions and thus cannot be aggregated for centralized training considering the privacy issues. There is a clear need to launch a federated learning and facilitate the integration of the dispersed data from different institutions. 
In this paper, we propose a new benchmark for federated domain translation on unsupervised brain image synthesis (termed as FedMed-GAN) to bridge the gap between federated learning and medical GAN. FedMed-GAN mitigates the mode collapse without sacrificing the performance of generators, and is widely applied to different proportions of unpaired and paired data with variation adaptation property. We treat the gradient penalties by federally averaging algorithm and then leveraging differential privacy gradient descent to regularize the training dynamics. A comprehensive evaluation is provided for comparing FedMed-GAN and other centralized methods, which shows the new state-of-the-art performance by our FedMed-GAN. Our code has been released on the website: ", "output": "FedMed-GAN: Federated Domain Translation on Unsupervised Cross-Modality Brain Image Synthesis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Optimization is a ubiquitous modeling tool and is often deployed in settings which repeatedly solve similar instances of the same problem. Amortized optimization methods use learning to predict the solutions to problems in these settings, exploiting the shared structure between similar problem instances. These methods have been crucial in variational inference and reinforcement learning and are capable of solving optimization problems many orders of magnitudes times faster than traditional optimization methods that do not use amortization. This tutorial presents an introduction to the amortized optimization foundations behind these advancements and overviews their applications in variational inference, sparse coding, gradient-based meta-learning, control, reinforcement learning, convex optimization, optimal transport, and deep equilibrium networks. 
The source code for this tutorial is available at", "output": "Tutorial on amortized optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This research studies graph-based approaches for Answer Sentence Selection (AS2), an essential component for retrieval-based Question Answering (QA) systems. During offline learning, our model constructs a small-scale relevant training graph per question in an unsupervised manner, and integrates with Graph Neural Networks. Graph nodes are question sentence to answer sentence pairs. We train and integrate state-of-the-art (SOTA) models for computing scores between question-question, question-answer, and answer-answer pairs, and use thresholding on relevance scores for creating graph edges. Online inference is then performed to solve the AS2 task on unseen queries. Experiments on two well-known academic benchmarks and a real-world dataset show that our approach consistently outperforms SOTA QA baseline models.", "output": "Question-Answer Sentence Graph for Joint Modeling Answer Selection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Proof Blocks is a software tool that allows students to practice writing mathematical proofs by dragging and dropping lines instead of writing proofs from scratch. Proof Blocks offers the capability of assigning partial credit and providing solution quality feedback to students. This is done by computing the edit distance from a student's submission to some predefined set of solutions. In this work, we propose an algorithm for the edit distance problem that significantly outperforms the baseline procedure of exhaustively enumerating over the entire search space. Our algorithm relies on a reduction to the minimum vertex cover problem. 
We benchmark our algorithm on thousands of student submissions from multiple courses, showing that the baseline algorithm is intractable, and that our proposed algorithm is critical to enable classroom deployment. Our new algorithm has also been used for problems in many other domains where the solution space can be modeled as a DAG, including but not limited to Parsons Problems for writing code, helping students understand packet ordering in networking protocols, and helping students sketch solution steps for physics problems. Integrated into multiple learning management systems, the algorithm serves thousands of students each year.", "output": "Efficient Feedback and Partial Credit Grading for Proof Blocks Problems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep reinforcement learning (DRL) gives the promise that an agent learns good policy from high-dimensional information, whereas representation learning removes irrelevant and redundant information and retains pertinent information. In this work, we demonstrate that the learned representation of the $Q$-network and its target $Q$-network should, in theory, satisfy a favorable distinguishable representation property. Specifically, there exists an upper bound on the representation similarity of the value functions of two adjacent time steps in a typical DRL setting. However, through illustrative experiments, we show that the learned DRL agent may violate this property and lead to a sub-optimal policy. Therefore, we propose a simple yet effective regularizer called Policy Evaluation with Easy Regularization on Representation (PEER), which aims to maintain the distinguishable representation property via explicit regularization on internal representations. And we provide the convergence rate guarantee of PEER. Implementing PEER requires only one line of code. 
Our experiments demonstrate that incorporating PEER into DRL can significantly improve performance and sample efficiency. Comprehensive experiments show that PEER achieves state-of-the-art performance on all 4 environments on PyBullet, 9 out of 12 tasks on DMControl, and 19 out of 26 games on Atari. To the best of our knowledge, PEER is the first work to study the inherent representation property of Q-network and its target. Our code is available at", "output": "Frustratingly Easy Regularization on Representation Can Boost Deep Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a new type of multi-agent interactive classifier that provides provable interpretability guarantees even for complex agents such as neural networks. These guarantees consist of bounds on the mutual information of the features selected by this classifier. Our results are inspired by the Merlin-Arthur protocol from Interactive Proof Systems and express these bounds in terms of measurable metrics such as soundness and completeness. Compared to existing interactive setups we do not rely on optimal agents or on the assumption that features are distributed independently. Instead, we use the relative strength of the agents as well as the new concept of Asymmetric Feature Correlation which captures the precise kind of correlations that make interpretability guarantees difficult. %relates the information carried by sets of features to one of the individual features. 
We test our results through numerical experiments on two small-scale datasets where high mutual information can be verified explicitly.", "output": "Formal Interpretability with Merlin-Arthur Classifiers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural message passing is a basic feature extraction unit for graph-structured data considering neighboring node features in network propagation from one layer to the next. We model such process by an interacting particle system with attractive and repulsive forces and the Allen-Cahn force arising in the modeling of phase transition. The dynamics of the system is a reaction-diffusion process which can separate particles without blowing up. This induces an Allen-Cahn message passing (ACMP) for graph neural networks where the numerical iteration for the particle system solution constitutes the message passing propagation. ACMP which has a simple implementation with a neural ODE solver can propel the network depth up to one hundred of layers with theoretically proven strictly positive lower bound of the Dirichlet energy. It thus provides a deep model of GNNs circumventing the common GNN problem of oversmoothing. GNNs with ACMP achieve state of the art performance for real-world node classification tasks on both homophilic and heterophilic datasets.", "output": "ACMP: Allen-Cahn Message Passing for Graph Neural Networks with Particle Phase Transition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone require considerable costs in compute and energy. 
Here we focus on the scaling of error with dataset size and show how in theory we can break beyond power law scaling and potentially even reduce it to exponential scaling instead if we have access to a high-quality data pruning metric that ranks the order in which training examples should be discarded to achieve any pruned dataset size. We then test this improved scaling prediction with pruned dataset size empirically, and indeed observe better than power law scaling in practice on ResNets trained on CIFAR-10, SVHN, and ImageNet. Next, given the importance of finding high-quality pruning metrics, we perform the first large-scale benchmarking study of ten different data pruning metrics on ImageNet. We find most existing high performing metrics scale poorly to ImageNet, while the best are computationally intensive and require labels for every image. We therefore developed a new simple, cheap and scalable self-supervised pruning metric that demonstrates comparable performance to the best supervised metrics. Overall, our work suggests that the discovery of good data-pruning metrics may provide a viable path forward to substantially improved neural scaling laws, thereby reducing the resource costs of modern deep learning.", "output": "Beyond neural scaling laws: beating power law scaling via data pruning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "How might we use cognitive modeling to consider the ways in which antiblackness, and racism more broadly, impact the design and development of AI systems? We provide a discussion and an example towards an answer to this question. We use the ACT-R/{\\Phi} cognitive architecture and an existing knowledge graph system, ConceptNet, to consider this question not only from a cognitive and sociocultural perspective, but also from a physiological perspective. 
In addition to using a cognitive modeling as a means to explore how antiblackness may manifest in the design and development of AI systems (particularly from a software engineering perspective), we also introduce connections between antiblackness, the Human, and computational cognitive modeling. We argue that the typical eschewing of sociocultural processes and knowledge structures in cognitive architectures and cognitive modeling implicitly furthers a colorblind approach to cognitive modeling and hides sociocultural context that is always present in human behavior and affects cognitive processes.", "output": "Using a Cognitive Architecture to consider antiblackness in design and development of AI systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020). As the conventional (big) Q-function collapses in continuous time, we consider its first-order approximation and coin the term ``(little) q-function\". This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We develop a ``q-learning\" theory around the q-function that is independent of time discretization. Given a stochastic policy, we jointly characterize the associated q-function and value function by martingale conditions of certain stochastic processes, in both on-policy and off-policy settings. We then apply the theory to devise different actor-critic algorithms for solving underlying RL problems, depending on whether or not the density function of the Gibbs measure generated from the q-function can be computed explicitly. 
One of our algorithms interprets the well-known Q-learning algorithm SARSA, and another recovers a policy gradient (PG) based continuous-time algorithm proposed in Jia and Zhou (2022b). Finally, we conduct simulation experiments to compare the performance of our algorithms with those of PG-based algorithms in Jia and Zhou (2022b) and time-discretized conventional Q-learning algorithms.", "output": "q-Learning in Continuous Time."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training machine learning (ML) algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets. As a result, processor-centric systems (e.g., CPU, GPU) suffer from costly data movement between memory units and processing units, which consumes large amounts of energy and execution cycles. Memory-centric computing systems, i.e., with processing-in-memory (PIM) capabilities, can alleviate this data movement bottleneck. Our goal is to understand the potential of modern general-purpose PIM architectures to accelerate ML training. To do so, we (1) implement several representative classic ML algorithms (namely, linear regression, logistic regression, decision tree, K-Means clustering) on a real-world general-purpose PIM architecture, (2) rigorously evaluate and characterize them in terms of accuracy, performance and scaling, and (3) compare to their counterpart implementations on CPU and GPU. Our evaluation on a real memory-centric computing system with more than 2500 PIM cores shows that general-purpose PIM architectures can greatly accelerate memory-bound ML workloads, when the necessary operations and datatypes are natively supported by PIM hardware. 
For example, our PIM implementation of decision tree is $27\\times$ faster than a state-of-the-art CPU version on an 8-core Intel Xeon, and $1.34\\times$ faster than a state-of-the-art GPU version on an NVIDIA A100. Our K-Means clustering on PIM is $2.8\\times$ and $3.2\\times$ faster than state-of-the-art CPU and GPU versions, respectively. To our knowledge, our work is the first one to evaluate ML training on a real-world PIM architecture. We conclude with key observations, takeaways, and recommendations that can inspire users of ML workloads, programmers of PIM architectures, and hardware designers & architects of future memory-centric computing systems.", "output": "An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing visual question answering methods tend to capture the cross-modal spurious correlations and fail to discover the true causal mechanism that facilitates reasoning truthfully based on the dominant visual evidence and the question intention. Additionally, the existing methods usually ignore the cross-modal event-level understanding that requires jointly modeling event temporality, causality, and dynamics. In this work, we focus on event-level visual question answering from a new perspective, i.e., cross-modal causal relational reasoning, by introducing causal intervention methods to discover the true causal structures for visual and linguistic modalities. Specifically, we propose a novel event-level visual question answering framework named Cross-Modal Causal RelatIonal Reasoning (CMCIR), to achieve robust causality-aware visual-linguistic question answering. 
To discover cross-modalcausal structures, the Causality-aware Visual-Linguistic Reasoning (CVLR)module is proposed to collaboratively disentangle the visual and linguisticspurious correlations via front-door and back-door causal interventions. Tomodel the fine-grained interactions between linguistic semantics andspatial-temporal representations, we build a Spatial-Temporal Transformer (STT)that creates multi-modal co-occurrence interactions between visual andlinguistic content. To adaptively fuse the causality-ware visual and linguisticfeatures, we introduce a Visual-Linguistic Feature Fusion (VLFF) module thatleverages the hierarchical linguistic semantic relations as the guidance tolearn the global semantic-aware visual-linguistic representations adaptively.Extensive experiments on four event-level datasets demonstrate the superiorityof our CMCIR in discovering visual-linguistic causal structures and achievingrobust event-level visual question answering.", "output": "Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We extend conformal prediction to control the expected value of any monotoneloss function. The algorithm generalizes split conformal prediction togetherwith its coverage guarantee. Like conformal prediction, the conformal riskcontrol procedure is tight up to an $mathcal{O}(1/n)$ factor. 
Worked examples from computer vision and natural language processing demonstrate the usage of our algorithm to bound the false negative rate, graph distance, and token-level F1-score.", "output": "Conformal Risk Control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neuro-Symbolic (NeSy) integration combines symbolic reasoning with Neural Networks (NNs) for tasks requiring perception and reasoning. Most NeSy systems rely on continuous relaxation of logical knowledge, and no discrete decisions are made within the model pipeline. Furthermore, these methods assume that the symbolic rules are given. In this paper, we propose Deep Symbolic Learning (DSL), a NeSy system that learns NeSy-functions, i.e., the composition of a (set of) perception functions which map continuous data to discrete symbols, and a symbolic function over the set of symbols. DSL learns simultaneously the perception and symbolic functions while being trained only on their composition (NeSy-function). The key novelty of DSL is that it can create internal (interpretable) symbolic representations and map them to perception inputs within a differentiable NN learning pipeline. The created symbols are automatically selected to generate symbolic functions that best explain the data. We provide experimental analysis to substantiate the efficacy of DSL in simultaneously learning perception and symbolic functions.", "output": "Deep Symbolic Learning: Discovering Symbols and Rules from Perceptions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing techniques for image-to-image translation commonly have suffered from two critical problems: heavy reliance on per-sample domain annotation and/or inability of handling multiple attributes per image. 
Recenttruly-unsupervised methods adopt clustering approaches to easily provideper-sample one-hot domain labels. However, they cannot account for thereal-world setting: one sample may have multiple attributes. In addition, thesemantics of the clusters are not easily coupled to the human understanding. Toovercome these, we present a LANguage-driven Image-to-image Translation model,dubbed LANIT. We leverage easy-to-obtain candidate attributes given in textsfor a dataset: the similarity between images and attributes indicatesper-sample domain labels. This formulation naturally enables multi-hot label sothat users can specify the target domain with a set of attributes in language.To account for the case that the initial prompts are inaccurate, we alsopresent prompt learning. We further present domain regularization loss thatenforces translated images be mapped to the corresponding domain. Experimentson several standard benchmarks demonstrate that LANIT achieves comparable orsuperior performance to existing models.", "output": "LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Context. We study the benefits of using a large public neuroimaging databasecomposed of fMRI statistic maps, in a self-taught learning framework, forimproving brain decoding on new tasks. First, we leverage the NeuroVaultdatabase to train, on a selection of relevant statistic maps, a convolutionalautoencoder to reconstruct these maps. Then, we use this trained encoder toinitialize a supervised convolutional neural network to classify tasks orcognitive processes of unseen statistic maps from large collections of theNeuroVault database. Results. 
We show that such a self-taught learning process always improves the performance of the classifiers but the magnitude of the benefits strongly depends on the number of samples available both for pre-training and finetuning the models and on the complexity of the targeted downstream task. Conclusion. The pre-trained model improves the classification performance and displays more generalizable features, less sensitive to individual differences.", "output": "On the benefits of self-taught learning for brain decoding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The idea of embedding optimization problems into deep neural networks as optimization layers to encode constraints and inductive priors has taken hold in recent years. Most existing methods focus on implicitly differentiating Karush-Kuhn-Tucker (KKT) conditions in a way that requires expensive computations on the Jacobian matrix, which can be slow and memory-intensive. In this paper, we developed a new framework, named Alternating Differentiation (Alt-Diff), that differentiates optimization problems (here, specifically in the form of convex optimization problems with polyhedral constraints) in a fast and recursive way. Alt-Diff decouples the differentiation procedure into a primal update and a dual update in an alternating way. Accordingly, Alt-Diff substantially decreases the dimensions of the Jacobian matrix especially for optimization with large-scale constraints and thus increases the computational speed of implicit differentiation. We show that the gradients obtained by Alt-Diff are consistent with those obtained by differentiating KKT conditions. In addition, we propose to truncate Alt-Diff to further accelerate the computational speed. Under some standard assumptions, we show that the truncation error of gradients is upper bounded by the same order of variables' estimation error. 
Therefore, Alt-Diff can be truncated to further increasecomputational speed without sacrificing much accuracy. A series ofcomprehensive experiments validate the superiority of Alt-Diff.", "output": "Alternating Differentiation for Optimization Layers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a smoothly broken power law functional form (referred to by us asa Broken Neural Scaling Law (BNSL)) that accurately models and extrapolates thescaling behaviors of deep neural networks (i.e. how the evaluation metric ofinterest varies as the amount of compute used for training, number of modelparameters, training dataset size, model input size, number of training steps,or upstream performance varies) for various architectures and for each ofvarious tasks within a large and diverse set of upstream and downstream tasks,in zero-shot, prompted, and fine-tuned settings. This set includes large-scalevision, language, audio, video, diffusion, generative modeling, multimodallearning, contrastive learning, AI alignment, robotics, out-of-distribution(OOD) generalization, continual learning, transfer learning, uncertaintyestimation / calibration, out-of-distribution detection, adversarialrobustness, distillation, sparsity, retrieval, quantization, pruning, fairness,molecules, computer programming/coding, math word problems, \"emergent\" \"phasetransitions / changes\", arithmetic, unsupervised/self-supervised learning, &reinforcement learning (single agent & multi-agent). When compared to otherfunctional forms for neural scaling behavior, this functional form yieldsextrapolations of scaling behavior that are considerably more accurate on thisset. 
Moreover, this functional form accurately models & extrapolates scalingbehavior that other functional forms are incapable of expressing such as thenon-monotonic transitions present in the scaling behavior of phenomena such asdouble descent & the delayed, sharp inflection points present in the scalingbehavior of tasks such as arithmetic. Lastly, we use this functional form toglean insights about the limit of the predictability of scaling behavior. Codeis available at ", "output": "Broken Neural Scaling Laws."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Embedding knowledge graphs (KGs) for multi-hop logical reasoning is achallenging problem due to massive and complicated structures in many KGs.Recently, many promising works projected entities and queries into a geometricspace to efficiently find answers. However, it remains challenging to model thenegation and union operator. The negation operator has no strict boundaries,which generates overlapped embeddings and leads to obtaining ambiguous answers.An additional limitation is that the union operator is non-closure, whichundermines the model to handle a series of union operators. To address theseproblems, we propose a novel probabilistic embedding model, namely GammaEmbeddings (GammaE), for encoding entities and queries to answer differenttypes of FOL queries on KGs. We utilize the linear property and strong boundarysupport of the Gamma distribution to capture more features of entities andqueries, which dramatically reduces model uncertainty. Furthermore, GammaEimplements the Gamma mixture method to design the closed union operator. 
Theperformance of GammaE is validated on three large logical query datasets.Experimental results show that GammaE significantly outperformsstate-of-the-art models on public benchmarks.", "output": "GammaE: Gamma Embeddings for Logical Queries on Knowledge Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-agent artificial intelligence research promises a path to developintelligent technologies that are more human-like and more human-compatiblethan those produced by \"solipsistic\" approaches, which do not considerinteractions between agents. Melting Pot is a research tool developed tofacilitate work on multi-agent artificial intelligence, and provides anevaluation protocol that measures generalization to novel social partners in aset of canonical test scenarios. Each scenario pairs a physical environment (a\"substrate\") with a reference set of co-players (a \"background population\"), tocreate a social situation with substantial interdependence between theindividuals involved. For instance, some scenarios were inspired byinstitutional-economics-based accounts of natural resource management andpublic-good-provision dilemmas. Others were inspired by considerations fromevolutionary biology, game theory, and artificial life. Melting Pot aims tocover a maximally diverse set of interdependencies and incentives. It includesthe commonly-studied extreme cases of perfectly-competitive (zero-sum)motivations and perfectly-cooperative (shared-reward) motivations, but does notstop with them. As in real-life, a clear majority of scenarios in Melting Pothave mixed incentives. They are neither purely competitive nor purelycooperative and thus demand successful agents be able to navigate the resultingambiguity. Here we describe Melting Pot 2.0, which revises and expands onMelting Pot. 
We also introduce support for scenarios with asymmetric roles, andexplain how to integrate them into the evaluation protocol. This report alsocontains: (1) details of all substrates and scenarios; (2) a completedescription of all baseline algorithms and results. Our intention is for it toserve as a reference for researchers using Melting Pot 2.0.", "output": "Melting Pot 2.0."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Non-parametric episodic memory can be used to quickly latch ontohigh-rewarded experience in reinforcement learning tasks. In contrast toparametric deep reinforcement learning approaches in which reward signals needto be back-propagated slowly, these methods only need to discover the solutiononce, and may then repeatedly solve the task. However, episodic controlsolutions are stored in discrete tables, and this approach has so far only beenapplied to discrete action space problems. Therefore, this paper introducesContinuous Episodic Control (CEC), a novel non-parametric episodic memoryalgorithm for sequential decision making in problems with a continuous actionspace. Results on several sparse-reward continuous control environments showthat our proposed method learns faster than state-of-the-art model-free RL andmemory-augmented RL algorithms, while maintaining good long-run performance aswell. In short, CEC can be a fast approach for learning in continuous controltasks.", "output": "Continuous Episodic Control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Over-parameterization of deep neural networks (DNNs) has shown highprediction accuracy for many applications. 
Although effective, the large numberof parameters hinders its popularity on resource-limited devices and has anoutsize environmental impact. Sparse training (using a fixed number of nonzeroweights in each iteration) could significantly mitigate the training costs byreducing the model size. However, existing sparse training methods mainly useeither random-based or greedy-based drop-and-grow strategies, resulting inlocal minimal and low accuracy. In this work, we consider the dynamic sparsetraining as a sparse connectivity search problem and design an exploitation andexploration acquisition function to escape from local optima and saddle points.We further design an acquisition function and provide the theoreticalguarantees for the proposed method and clarify its convergence property.Experimental results show that sparse models (up to 98% sparsity) obtained byour proposed method outperform the SOTA sparse training methods on a widevariety of deep learning tasks. On VGG-19 / CIFAR-100, ResNet-50 / CIFAR-10,ResNet-50 / CIFAR-100, our method has even higher accuracy than dense models.On ResNet-50 / ImageNet, the proposed method has up to 8.2% accuracyimprovement compared to SOTA sparse training methods.", "output": "Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks have emerged as the workhorse for a large section ofrobotics and control applications, especially as models for dynamical systems.Such data-driven models are in turn used for designing and verifying autonomoussystems. They are particularly useful in modeling medical systems where datacan be leveraged to individualize treatment. In safety-critical applications,it is important that the data-driven model is conformant to establishedknowledge from the natural sciences. 
Such knowledge is often available or canoften be distilled into a (possibly black-box) model. For instance, an F1racing car should conform to Newton's laws (which are encoded within a unicyclemodel). In this light, we consider the following problem - given a model $M$and a state transition dataset, we wish to best approximate the system modelwhile being a bounded distance away from $M$. We propose a method to guaranteethis conformance. Our first step is to distill the dataset into a fewrepresentative samples called memories, using the idea of a growing neural gas.Next, using these memories we partition the state space into disjoint subsetsand compute bounds that should be respected by the neural network in eachsubset. This serves as a symbolic wrapper for guaranteed conformance. We arguetheoretically that this only leads to a bounded increase in approximationerror; which can be controlled by increasing the number of memories. Weexperimentally show that on three case studies (Car Model, Drones, andArtificial Pancreas), our constrained neurosymbolic models conform to specifiedmodels (each encoding various constraints) with order-of-magnitude improvementscompared to the augmented Lagrangian and vanilla training methods. Our code canbe found at: ", "output": "Guaranteed Conformance of Neurosymbolic Models to Natural Constraints."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This article presents a dataset of 10,917 news articles with hierarchicalnews categories collected between 1 January 2019 and 31 December 2019. Wemanually labeled the articles based on a hierarchical taxonomy with 17first-level and 109 second-level categories. 
This dataset can be used to train machine learning models for automatically classifying news articles by topic. This dataset can be helpful for researchers working on news structuring, classification, and predicting future events based on released news.", "output": "MN-DS: A Multilabeled News Dataset for News Articles Hierarchical Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FL) is an emerging paradigm to train models with distributed data from numerous Internet of Things (IoT) devices. It inherently assumes a uniform capacity among participants. However, due to different conditions such as differing energy budgets or executing parallel unrelated tasks, participants have diverse computational resources in practice. Participants with insufficient computation budgets must plan for the use of restricted computational resources appropriately, otherwise they would be unable to complete the entire training procedure, resulting in model performance decline. To address this issue, we propose a strategy for estimating local models without computationally intensive iterations. Based on it, we propose Computationally Customized Federated Averaging (CC-FedAvg), which allows participants to determine whether to perform traditional local training or model estimation in each round based on their current computational budgets. Both theoretical analysis and exhaustive experiments indicate that CC-FedAvg has the same convergence rate and comparable performance as FedAvg without resource constraints. 
Furthermore, CC-FedAvg can be viewed as a computation-efficient version of FedAvg that retains model performance while considerably lowering computation overhead.", "output": "CC-FedAvg: Computationally Customized Federated Averaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, by constructing extremely hard examples of CSP (with large domains) and SAT (with long clauses), we prove that such examples cannot be solved without exhaustive search, which implies a weaker conclusion P $\\neq$ NP. This constructive approach for proving impossibility results is very different (and missing) from those currently used in computational complexity theory, but is similar to that used by Kurt Gödel in proving his famous logical impossibility results. Just as shown by Gödel's results that proving formal unprovability is feasible in mathematics, the results of this paper show that proving computational hardness is not hard in mathematics. Specifically, proving lower bounds for many problems, such as 3-SAT, can be challenging because these problems have various effective strategies available for avoiding exhaustive search. However, in cases of extremely hard examples, exhaustive search may be the only viable option, and proving its necessity becomes more straightforward. 
Consequently, it makes the separation between SAT (with long clauses) and 3-SAT much easier than that between 3-SAT and 2-SAT. Finally, the main results of this paper demonstrate that the fundamental difference between the syntax and the semantics revealed by Gödel's results also exists in CSP and SAT.", "output": "SAT Requires Exhaustive Search."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Huge challenges exist for old landslide detection because their morphology features have been partially or strongly transformed over a long time and have little difference from their surroundings. Besides, the small-sample problem also restricts in-depth learning. In this paper, an iterative classification and semantic segmentation network (ICSSN) is developed, which can greatly enhance both object-level and pixel-level classification performance by iteratively upgrading the feature extractor shared by the two networks. An object-level contrastive learning (OCL) strategy is employed in the object classification sub-network featuring a Siamese network to realize global feature extraction, and a sub-object-level contrastive learning (SOCL) paradigm is designed in the semantic segmentation sub-network to efficiently extract salient features from boundaries of landslides. Moreover, an iterative training strategy is elaborated to fuse features in semantic space such that both object-level and pixel-level classification performance are improved. The proposed ICSSN is evaluated on the real landslide data set, and the experimental results show that ICSSN can greatly improve the classification and segmentation accuracy of old landslide detection. 
For the semantic segmentation task, compared to the baseline, the F1 score increases from 0.5054 to 0.5448, the mIoU improves from 0.6405 to 0.6610, the landslide IoU improves from 0.3381 to 0.3743, and the object-level detection accuracy of old landslides is enhanced from 0.55 to 0.9. For the object classification task, the F1 score increases from 0.8846 to 0.9230, and the accuracy score is up from 0.8375 to 0.8875.", "output": "An Iterative Classification and Semantic Segmentation Network for Old Landslide Detection Using High-Resolution Remote Sensing Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reconstructing perceived natural images or decoding their categories from fMRI signals are challenging tasks with great scientific significance. Due to the lack of paired samples, most existing methods fail to generate semantically recognizable reconstruction and are difficult to generalize to novel classes. In this work, we propose, for the first time, a task-agnostic brain decoding model by unifying the visual stimulus classification and reconstruction tasks in a semantic space. We denote it as BrainCLIP, which leverages CLIP's cross-modal generalization ability to bridge the modality gap between brain activities, images, and texts. Specifically, BrainCLIP is a VAE-based architecture that transforms fMRI patterns into the CLIP embedding space by combining visual and textual supervision. Note that previous works rarely use multi-modal supervision for visual stimulus decoding. Our experiments demonstrate that textual supervision can significantly boost the performance of decoding models compared to the condition where only image supervision exists. BrainCLIP can be applied to multiple scenarios like fMRI-to-image generation, fMRI-image-matching, and fMRI-text-matching. 
Compared with BraVL, a recentlyproposed multi-modal method for fMRI-based brain decoding, BrainCLIP achievessignificantly better performance on the novel class classification task.BrainCLIP also establishes a new state-of-the-art for fMRI-based natural imagereconstruction in terms of high-level image features.", "output": "BrainCLIP: Bridging Brain and Visual-Linguistic Representation via CLIP for Generic Natural Visual Stimulus Decoding from fMRI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper describes the submission of UZH_CLyp for the SemEval 2023 Task 9\"Multilingual Tweet Intimacy Analysis\". We achieved second-best results in all10 languages according to the official Pearson's correlation regressionevaluation measure. Our cross-lingual transfer learning approach explores thebenefits of using a Head-First Fine-Tuning method (HeFiT) that first updatesonly the regression head parameters and then also updates the pre-trainedtransformer encoder parameters at a reduced learning rate. Additionally, westudy the impact of using a small set of automatically generated examples (inour case, from ChatGPT) for low-resource settings where no human-labeled datais available. Our study shows that HeFiT stabilizes training and consistentlyimproves results for pre-trained models that lack domain adaptation to tweets.Our study also shows a noticeable performance increase in cross-linguallearning when synthetic data is used, confirming the usefulness of current textgeneration systems to improve zero-shot baseline results. 
Finally, we examine how possible inconsistencies in the annotated data contribute to cross-lingual interference issues.", "output": "UZH_CLyp at SemEval-2023 Task 9: Head-First Fine-Tuning and ChatGPT Data Generation for Cross-Lingual Learning in Tweet Intimacy Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Teacher-Student Framework (TSF) is a reinforcement learning setting where a teacher agent guards the training of a student agent by intervening and providing online demonstrations. Assumed to be optimal, the teacher policy has the perfect timing and capability to intervene in the learning process of the student agent, providing safety guarantee and exploration guidance. Nevertheless, in many real-world settings it is expensive or even impossible to obtain a well-performing teacher policy. In this work, we relax the assumption of a well-performing teacher and develop a new method that can incorporate arbitrary teacher policies with modest or inferior performance. We instantiate an Off-Policy Reinforcement Learning algorithm, termed Teacher-Student Shared Control (TS2C), which incorporates teacher intervention based on trajectory-based value estimation. Theoretical analysis validates that the proposed TS2C algorithm attains efficient exploration and substantial safety guarantee without being affected by the teacher's own performance. Experiments on various continuous control tasks show that our method can exploit teacher policies at different performance levels while maintaining a low training cost. Moreover, the student policy surpasses the imperfect teacher policy in terms of higher accumulated reward in held-out testing environments. 
Code is available at ", "output": "Guarded Policy Optimization with Imperfect Online Demonstrations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multimodal hate detection, which aims to identify harmful content online such as memes, is crucial for building a wholesome internet environment. Previous work has made enlightening exploration in detecting explicit hate remarks. However, most of their approaches neglect the analysis of implicit harm, which is particularly challenging as explicit text markers and demographic visual cues are often twisted or missing. The leveraged cross-modal attention mechanisms also suffer from the distributional modality gap and lack logical interpretability. To address these semantic gap issues, we propose TOT: a topology-aware optimal transport framework to decipher the implicit harm in the meme scenario, which formulates the cross-modal aligning problem as solutions for optimal transportation plans. Specifically, we leverage an optimal transport kernel method to capture complementary information from multiple modalities. The kernel embedding provides a non-linear transformation ability to reproduce a kernel Hilbert space (RKHS), which reflects significance for eliminating the distributional modality gap. Moreover, we perceive the topology information based on aligned representations to conduct bipartite graph path reasoning. 
The newly achieved state-of-the-art performance on two publiclyavailable benchmark datasets, together with further visual analysis,demonstrate the superiority of TOT in capturing implicit cross-modal alignment.", "output": "TOT: Topology-Aware Optimal Transport For Multimodal Hate Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In Task Oriented Dialogue (TOD) system, detecting and inducing new intentsare two main challenges to apply the system in the real world. In this paper,we suggest the semantic multi-view model to resolve these two challenges: (1)SBERT for General Embedding (GE), (2) Multi Domain Batch (MDB) for dialoguedomain knowledge, and (3) Proxy Gradient Transfer (PGT) for cluster-specializedsemantic. MDB feeds diverse dialogue datasets to the model at once to tacklethe multi-domain problem by learning the multiple domain knowledge. Weintroduce a novel method PGT, which employs the Siamese network to fine-tunethe model with a clustering method directly.Our model can learn how to clusterdialogue utterances by using PGT. Experimental results demonstrate that ourmulti-view model with MDB and PGT significantly improves the Open IntentInduction performance compared to baseline systems.", "output": "Multi-View Zero-Shot Open Intent Induction from Dialogues: Multi Domain Batch and Proxy Gradient Transfer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Scholarly publications are key to the transfer of knowledge from scholars toothers. However, research papers are information-dense, and as the volume ofthe scientific literature grows, the need for new technology to support thereading process grows. 
In contrast to the process of finding papers, which has been transformed by Internet technology, the experience of reading research papers has changed little in decades. The PDF format for sharing research papers is widely used due to its portability, but it has significant downsides including: static content, poor accessibility for low-vision readers, and difficulty reading on mobile devices. This paper explores the question \"Can recent advances in AI and HCI power intelligent, interactive, and accessible reading interfaces -- even for legacy PDFs?\" We describe the Semantic Reader Project, a collaborative effort across multiple institutions to explore automatic creation of dynamic reading interfaces for research papers. Through this project, we've developed ten research prototype interfaces and conducted usability studies with more than 300 participants and real-world users showing improved reading experiences for scholars. We've also released a production reading interface for research papers that will incorporate the best features as they mature. We structure this paper around challenges scholars and the public face when reading research papers -- Discovery, Efficiency, Comprehension, Synthesis, and Accessibility -- and present an overview of our progress and remaining open challenges.", "output": "The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. 
As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various NLP tasks. Since researchers have found that model scaling can lead to performance improvement, they further study the scaling effect by increasing the model size to an even larger size. Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve a significant performance improvement but also show some special abilities that are not present in small-scale language models. To discriminate the difference in parameter scale, the research community has coined the term large language models (LLM) for the PLMs of significant size. Recently, the research on LLMs has been largely advanced by both academia and industry, and a remarkable progress is the launch of ChatGPT, which has attracted widespread attention from society. The technical evolution of LLMs has been making an important impact on the entire AI community, which would revolutionize the way how we develop and use AI algorithms. In this survey, we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. 
Besides, we also summarize the available resources for developing LLMs and discuss the remaining issues for future directions.", "output": "A Survey of Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper describes our submission to Task 10 at SemEval 2023-Explainable Detection of Online Sexism (EDOS), divided into three subtasks. The recent rise in social media platforms has seen an increase in disproportionate levels of sexism experienced by women on social media platforms. This has made detecting and explaining online sexist content more important than ever to make social media safer and more accessible for women. Our approach consists of experimenting and finetuning BERT-based models and using a Majority Voting ensemble model that outperforms individual baseline model scores. Our system achieves a macro F1 score of 0.8392 for Task A, 0.6092 for Task B, and 0.4319 for Task C.", "output": "SSS at SemEval-2023 Task 10: Explainable Detection of Online Sexism using Majority Voted Fine-Tuned Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Given their flexibility and encouraging performance, deep-learning models are becoming standard for motion prediction in autonomous driving. However, with great flexibility comes a lack of interpretability and possible violations of physical constraints. Accompanying these data-driven methods with differentially-constrained motion models to provide physically feasible trajectories is a promising future direction. The foundation for this work is a previously introduced graph-neural-network-based model, MTP-GO. The neural network learns to compute the inputs to an underlying motion model to provide physically feasible trajectories. 
This research investigates the performance of various motion models in combination with numerical solvers for the prediction task. The study shows that simpler models, such as low-order integrator models, are preferred over more complex, e.g., kinematic models, to achieve accurate predictions. Further, the numerical solver can have a substantial impact on performance, advising against commonly used first-order methods like Euler forward. Instead, a second-order method like Heun's can greatly improve predictions.", "output": "Evaluation of Differentially Constrained Motion Models for Graph-Based Trajectory Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Structured reconstruction is a non-trivial dense prediction problem, which extracts structural information (e.g., building corners and edges) from a raster image, then reconstructs it to a 2D planar graph accordingly. Compared with common segmentation or detection problems, it significantly relies on the capability that leveraging holistic geometric information for structural reasoning. Current transformer-based approaches tackle this challenging problem in a two-stage manner, which detect corners in the first model and classify the proposed edges (corner-pairs) in the second model. However, they separate two-stage into different models and only share the backbone encoder. Unlike the existing modeling strategies, we present an enhanced corner representation method: 1) It fuses knowledge between the corner detection and edge prediction by sharing feature in different granularity; 2) Corner candidates are proposed in four heatmap channels w.r.t its direction. Both qualitative and quantitative evaluations demonstrate that our proposed method can better reconstruct fine-grained structures, such as adjacent corners and tiny edges. 
Consequently, it outperforms the state-of-the-art model by +1.9%@F-1 on Corner and +3.0%@F-1 on Edge.", "output": "CornerFormer: Boosting Corner Representation for Fine-Grained Structured Reconstruction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Temporal facts, the facts for characterizing events that hold in specific time periods, are attracting rising attention in the knowledge graph (KG) research communities. In terms of quality management, the introduction of time restrictions brings new challenges to maintaining the temporal consistency of KGs and detecting potential temporal conflicts. Previous studies rely on manually enumerated temporal constraints to detect conflicts, which are labor-intensive and may have granularity issues. We start from the common pattern of temporal facts and constraints and propose a pattern-based temporal constraint mining method, PaTeCon. PaTeCon uses automatically determined graph patterns and their relevant statistical information over the given KG instead of human experts to generate time constraints. Specifically, PaTeCon dynamically attaches class restriction to candidate constraints according to their measuring scores. We evaluate PaTeCon on two large-scale datasets based on Wikidata and Freebase respectively. 
The experimental results show that pattern-based automatic constraint mining is powerful in generating valuable temporal constraints.", "output": "PaTeCon: A Pattern-Based Temporal Constraint Mining Method for Conflict Detection on Knowledge Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In populous countries, pending legal cases have been growing exponentially. There is a need for developing NLP-based techniques for processing and automatically understanding legal documents. To promote research in the area of Legal NLP we organized the shared task LegalEval - Understanding Legal Texts at SemEval 2023. LegalEval task has three sub-tasks: Task-A (Rhetorical Roles Labeling) is about automatically structuring legal documents into semantically coherent units, Task-B (Legal Named Entity Recognition) deals with identifying relevant entities in a legal document and Task-C (Court Judgement Prediction with Explanation) explores the possibility of automatically predicting the outcome of a legal case along with providing an explanation for the prediction. In total 26 teams (approx. 100 participants spread across the world) submitted systems paper. In each of the sub-tasks, the proposed systems outperformed the baselines; however, there is a lot of scope for improvement. This paper describes the tasks, and analyzes techniques proposed by various teams.", "output": "SemEval 2023 Task 6: LegalEval -- Understanding Legal Texts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While large language models (LLMs) have been successfully applied to various tasks, they still face challenges with hallucinations and generating erroneous content. 
Augmenting LLMs with domain-specific tools such as database utilities has the potential to facilitate more precise and straightforward access to specialized knowledge. In this paper, we present GeneGPT, a novel method for teaching LLMs to use the Web Application Programming Interfaces (APIs) of the National Center for Biotechnology Information (NCBI) and answer genomics questions. Specifically, we prompt Codex (code-davinci-002) to solve the GeneTuring tests with few-shot URL requests of NCBI API calls as demonstrations for in-context learning. During inference, we stop the decoding once a call request is detected and make the API call with the generated URL. We then append the raw execution results returned by NCBI APIs to the generated texts and continue the generation until the answer is found or another API call is detected. Our preliminary results show that GeneGPT achieves state-of-the-art results on three out of four one-shot tasks and four out of five zero-shot tasks in the GeneTuring dataset. Overall, GeneGPT achieves a macro-average score of 0.76, which is much higher than retrieval-augmented LLMs such as the New Bing (0.44), biomedical LLMs such as BioMedLM (0.08) and BioGPT (0.04), as well as other LLMs such as GPT-3 (0.16) and ChatGPT (0.12).", "output": "GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Languages are not created randomly but rather to communicate information. There is a strong association between languages and their underlying meanings, resulting in a sparse joint distribution that is heavily peaked according to their correlations. Moreover, these peak values happen to match with the marginal distribution of languages due to the sparsity. 
With the advent of LLMs trained on big data and large models, we can now precisely assess the marginal distribution of languages, providing a convenient means of exploring the sparse structures in the joint distribution for effective inferences. In this paper, we categorize languages as either unambiguous or {epsilon}-ambiguous and present quantitative results to demonstrate that the emergent abilities of LLMs, such as language understanding, in-context learning, chain-of-thought prompting, and effective instruction fine-tuning, can all be attributed to Bayesian inference on the sparse joint distribution of languages.", "output": "A Latent Space Theory for Emergent Abilities in Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While deep reinforcement learning has shown important empirical success, it tends to learn relatively slow due to slow propagation of rewards information and slow update of parametric neural networks. Non-parametric episodic memory, on the other hand, provides a faster learning alternative that does not require representation learning and uses maximum episodic return as state-action values for action selection. Episodic memory and reinforcement learning both have their own strengths and weaknesses. Notably, humans can leverage multiple memory systems concurrently during learning and benefit from all of them. In this work, we propose a method called Two-Memory reinforcement learning agent (2M) that combines episodic memory and reinforcement learning to distill both of their strengths. The 2M agent exploits the speed of the episodic memory part and the optimality and the generalization capacity of the reinforcement learning part to complement each other. 
Our experiments demonstrate that the 2M agent is more data efficient and outperforms both pure episodic memory and pure reinforcement learning, as well as a state-of-the-art memory-augmented RL agent. Moreover, the proposed approach provides a general framework that can be used to combine any episodic memory agent with other off-policy reinforcement learning algorithms.", "output": "Two-Memory Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The release of ChatGPT has uncovered a range of possibilities whereby large language models (LLMs) can substitute human intelligence. In this paper, we seek to understand whether ChatGPT has the potential to reproduce human-generated label annotations in social computing tasks. Such an achievement could significantly reduce the cost and complexity of social computing research. As such, we use ChatGPT to relabel five seminal datasets covering stance detection (2x), sentiment analysis, hate speech, and bot detection. Our results highlight that ChatGPT does have the potential to handle these data annotation tasks, although a number of challenges remain. ChatGPT obtains an average accuracy 0.609. Performance is highest for the sentiment analysis dataset, with ChatGPT correctly annotating 64.9% of tweets. Yet, we show that performance varies substantially across individual labels. We believe this work can open up new lines of analysis and act as a basis for future research into the exploitation of ChatGPT for human annotation tasks.", "output": "Can ChatGPT Reproduce Human-Generated Labels? 
A Study of Social Computing Tasks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Casually captured Neural Radiance Fields (NeRFs) suffer from artifacts such as floaters or flawed geometry when rendered outside the camera trajectory. Existing evaluation protocols often do not capture these effects, since they usually only assess image quality at every 8th frame of the training capture. To push forward progress in novel-view synthesis, we propose a new dataset and evaluation procedure, where two camera trajectories are recorded of the scene: one used for training, and the other for evaluation. In this more challenging in-the-wild setting, we find that existing hand-crafted regularizers do not remove floaters nor improve scene geometry. Thus, we propose a 3D diffusion-based method that leverages local 3D priors and a novel density-based score distillation sampling loss to discourage artifacts during NeRF optimization. We show that this data-driven prior removes floaters and improves scene geometry for casual captures.", "output": "Nerfbusters: Removing Ghostly Artifacts from Casually Captured NeRFs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The transformer is a neural network component that can be used to learn useful representations of sequences or sets of datapoints. The transformer has driven recent advances in natural language processing, computer vision, and spatio-temporal modelling. There are many introductions to transformers, but most do not contain precise mathematical descriptions of the architecture and the intuitions behind the design choices are often also missing. Moreover, as research takes a winding path, the explanations for the components of the transformer can be idiosyncratic. 
In this note we aim for a mathematically precise, intuitive, and clean description of the transformer architecture.", "output": "An Introduction to Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While Feedforward Neural Networks (FNNs) have achieved remarkable success in various tasks, they are vulnerable to adversarial examples. Several techniques have been developed to verify the adversarial robustness of FNNs, but most of them focus on robustness verification against the local perturbation neighborhood of a single data point. There is still a large research gap in global robustness analysis. The global-robustness verifiable framework DeepGlobal has been proposed to identify all possible Adversarial Dangerous Regions (ADRs) of FNNs, not limited to data samples in a test set. In this paper, we propose a complete specification and implementation of DeepGlobal utilizing the SMT solver Z3 for more explicit definition, and propose several improvements to DeepGlobal for more efficient verification. To evaluate the effectiveness of our implementation and improvements, we conduct extensive experiments on a set of benchmark datasets. Visualization of our experiment results shows the validity and effectiveness of the approach.", "output": "Using Z3 for Formal Modeling and Verification of FNN Global Robustness."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated Learning (FL) is a machine learning approach that enables the creation of shared models for powerful applications while allowing data to remain on devices. This approach provides benefits such as improved data privacy, security, and reduced latency. 
However, in some systems, direct communication between clients and servers may not be possible, such as remote areas without proper communication infrastructure. To overcome this challenge, a new framework called FedEx (Federated Learning via Model Express Delivery) is proposed. This framework employs mobile transporters, such as UAVs, to establish indirect communication channels between the server and clients. These transporters act as intermediaries and allow for model information exchange. The use of indirect communication presents new challenges for convergence analysis and optimization, as the delay introduced by the transporters' movement creates issues for both global model dissemination and local model collection. To address this, two algorithms, FedEx-Sync and FedEx-Async, are proposed for synchronized and asynchronized learning at the transporter level. Additionally, a bi-level optimization algorithm is proposed to solve the joint client assignment and route planning problem. Experimental validation using two public datasets in a simulated network demonstrates consistent results with the theory, proving the efficacy of FedEx.", "output": "Joint Client Assignment and UAV Route Planning for Indirect-Communication Federated Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Evaluating the relevance of an exogenous data series is the first step in improving the prediction capabilities of a forecast algorithm. Inspired by existing metrics for time series similarity, we introduce a new approach named FARM - Forward Aligned Relevance Metric. Our forward method relies on an angular measure that compares changes in subsequent data points to align time-warped series in an efficient way. The proposed algorithm combines local and global measures to provide a balanced relevance metric. 
This results in considering also partial, intermediate matches as relevant indicators for exogenous data series significance. As a first validation step, we present the application of our FARM approach to synthetic but representative signals. While demonstrating the improved capabilities with respect to existing approaches, we also discuss existing constraints and limitations of our idea.", "output": "Exogenous Data in Forecasting: FARM -- A New Measure for Relevance Evaluation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we aim to develop a large language model (LLM) with the reasoning ability on complex graph data. Currently, LLMs have achieved very impressive performance on various natural language learning tasks, extensions of which have also been applied to study the vision tasks with multi-modal data. However, when it comes to the graph learning tasks, existing LLMs present very serious flaws due to their several inherited weaknesses in performing multi-step logic reasoning, precise mathematical calculation and perception about the spatial and temporal factors. To address such challenges, in this paper, we will investigate the principles, methodologies and algorithms to empower existing LLMs with graph reasoning ability, which will have tremendous impacts on the current research of both LLMs and graph learning. Inspired by the latest ChatGPT and Toolformer models, we propose the Graph-ToolFormer (Graph Reasoning oriented Toolformer) framework to teach LLMs themselves with prompts augmented by ChatGPT to use external graph reasoning API tools. 
Specifically, we will investigate to teach Graph-ToolFormer to handle various graph data reasoning tasks in this paper, including both (1) very basic graph data loading and graph property reasoning tasks, ranging from simple graph order and size to the graph diameter and periphery, and (2) more advanced reasoning tasks on real-world graph data, such as bibliographic networks, protein molecules, sequential recommender systems, social networks and knowledge graphs. To demonstrate the effectiveness of Graph-ToolFormer, we conduct some preliminary experimental studies on various graph reasoning datasets and tasks, and will launch an LLM demo online with various graph reasoning abilities.", "output": "Graph-ToolFormer: To Empower LLMs with Graph Reasoning Ability via Prompt Augmented by ChatGPT."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce altiro3D, a free extended library developed to represent reality starting from a given original RGB image or flat video. It allows to generate a light-field (or Native) image or video and get a realistic 3D experience. To synthesize N-number of virtual images and add them sequentially into a Quilt collage, we apply MiDaS models for the monocular depth estimation, simple OpenCV and Telea inpainting techniques to map all pixels, and implement a 'Fast' algorithm to handle 3D projection camera and scene transformations along N-viewpoints. We use the degree of depth to move proportionally the pixels, assuming the original image to be at the center of all the viewpoints. altiro3D can also be used with DIBR algorithm to compute intermediate snapshots from an equivalent 'Real (slower)' camera with N-geometric viewpoints, which requires to calibrate a priori several intrinsic and extrinsic camera parameters. We adopt a pixel- and device-based Lookup Table to optimize computing time. 
The multiple viewpoints and video generated from a single image or frame can be displayed in a free-view LCD display.", "output": "altiro3D: Scene representation from single image and novel view synthesis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work presents a novel label-efficient selfsupervised representation learning-based approach for classifying diabetic retinopathy (DR) images in cross-domain settings. Most of the existing DR image classification methods are based on supervised learning which requires a lot of time-consuming and expensive medical domain experts-annotated data for training. The proposed approach uses the prior learning from the source DR image dataset to classify images drawn from the target datasets. The image representations learned from the unlabeled source domain dataset through contrastive learning are used to classify DR images from the target domain dataset. Moreover, the proposed approach requires a few labeled images to perform successfully on DR image classification tasks in cross-domain settings. The proposed work experiments with four publicly available datasets: EyePACS, APTOS 2019, MESSIDOR-I, and Fundus Images for self-supervised representation learning-based DR image classification in cross-domain settings. The proposed method achieves state-of-the-art results on binary and multiclassification of DR images, even in cross-domain settings. The proposed method outperforms the existing DR image binary and multi-class classification methods proposed in the literature. The proposed method is also validated qualitatively using class activation maps, revealing that the method can learn explainable image representations. 
The source code and trained models are published on GitHub.", "output": "Learning Self-Supervised Representations for Label Efficient Cross-Domain Knowledge Transfer on Diabetic Retinopathy Fundus Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we explore the impact of adding tactile sensation to video prediction models for physical robot interactions. Predicting the impact of robotic actions on the environment is a fundamental challenge in robotics. Current methods leverage visual and robot action data to generate video predictions over a given time period, which can then be used to adjust robot actions. However, humans rely on both visual and tactile feedback to develop and maintain a mental model of their physical surroundings. In this paper, we investigate the impact of integrating tactile feedback into video prediction models for physical robot interactions. We propose three multi-modal integration approaches and compare the performance of these tactile-enhanced video prediction models. Additionally, we introduce two new datasets of robot pushing that use a magnetic-based tactile sensor for unsupervised learning. The first dataset contains visually identical objects with different physical properties, while the second dataset mimics existing robot-pushing datasets of household object clusters. 
Our results demonstrate that incorporating tactile feedback into video prediction models improves scene prediction accuracy and enhances the agent's perception of physical interactions and understanding of cause-effect relationships during physical robot interactions.", "output": "Combining Vision and Tactile Sensation for Video Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-task learning has shown considerable promise for improving the performance of deep learning-driven vision systems for the purpose of robotic grasping. However, high architectural and computational complexity can result in poor suitability for deployment on embedded devices that are typically leveraged in robotic arms for real-world manufacturing and warehouse environments. As such, the design of highly efficient multi-task deep neural network architectures tailored for computer vision tasks for robotic grasping on the edge is highly desired for widespread adoption in manufacturing environments. Motivated by this, we propose Fast GraspNeXt, a fast self-attention neural network architecture tailored for embedded multi-task learning in computer vision tasks for robotic grasping. To build Fast GraspNeXt, we leverage a generative network architecture search strategy with a set of architectural constraints customized to achieve a strong balance between multi-task learning performance and embedded inference efficiency. 
Experimental results on the MetaGraspNet benchmark dataset show that the Fast GraspNeXt network design achieves the highest performance (average precision (AP), accuracy, and mean squared error (MSE)) across multiple computer vision tasks when compared to other efficient multi-task network architecture designs, while having only 17.8M parameters (about >5x smaller), 259 GFLOPs (as much as >5x lower) and as much as >3.15x faster on a NVIDIA Jetson TX2 embedded processor.", "output": "Fast GraspNeXt: A Fast Self-Attention Neural Network Architecture for Multi-task Learning in Computer Vision Tasks for Robotic Grasping on the Edge."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present SSS3D, a fast multi-objective NAS framework designed to find computationally efficient 3D semantic scene segmentation networks. It uses RandLA-Net, an off-the-shelf point-based network, as a super-network to enable weight sharing and reduce search time by 99.67% for single-stage searches. SSS3D has a complex search space composed of sampling and architectural parameters that can form 2.88 * 10^17 possible networks. To further reduce search time, SSS3D splits the complete search space and introduces a two-stage search that finds optimal subnetworks in 54% of the time required by single-stage searches.", "output": "SSS3D: Fast Neural Architecture Search For Efficient Three-Dimensional Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Concealed scene understanding (CSU) is a hot computer vision topic aiming to perceive objects with camouflaged properties. 
The current boom in its advanced techniques and novel applications makes it timely to provide an up-to-date survey to enable researchers to understand the global picture of the CSU field, including both current achievements and major challenges. This paper makes four contributions: (1) For the first time, we present a comprehensive survey of the deep learning techniques oriented at CSU, including a background with its taxonomy, task-unique challenges, and a review of its developments in the deep learning era via surveying existing datasets and deep techniques. (2) For a quantitative comparison of the state-of-the-art, we contribute the largest and latest benchmark for Concealed Object Segmentation (COS). (3) To evaluate the transferability of deep CSU in practical scenarios, we re-organize the largest concealed defect segmentation dataset termed CDS2K with the hard cases from diversified industrial scenarios, on which we construct a comprehensive benchmark. (4) We discuss open problems and potential research directions for this community. Our code and datasets are available at which will be updated continuously to watch and summarize the advancements in this rapidly evolving field.", "output": "Advances in Deep Concealed Scene Understanding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Purpose: The aim of this work is to introduce a single model-based deep network that can provide high-quality reconstructions from undersampled parallel MRI data acquired with multiple sequences, acquisition settings and field strengths. Methods: A single unrolled architecture, which offers good reconstructions for multiple acquisition settings, is introduced. The proposed scheme adapts the model to each setting by scaling the CNN features and the regularization parameter with appropriate weights. 
The scaling weights and regularization parameter are derived using a multi-layer perceptron model from conditional vectors, which represents the specific acquisition setting. The perceptron parameters and the CNN weights are jointly trained using data from multiple acquisition settings, including differences in field strengths, acceleration, and contrasts. The conditional network is validated using datasets acquired with different acquisition settings. Results: The comparison of the adaptive framework, which trains a single model using the data from all the settings, shows that it can offer consistently improved performance for each acquisition condition. The comparison of the proposed scheme with networks that are trained independently for each acquisition setting shows that it requires less training data per acquisition setting to offer good performance. Conclusion: The Ada-MoDL framework enables the use of a single model-based unrolled network for multiple acquisition settings. In addition to eliminating the need to train and store multiple networks for different acquisition settings, this approach reduces the training data needed for each acquisition setting.", "output": "Adapting model-based deep learning to multiple acquisition conditions: Ada-MoDL."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Implicit representations such as Neural Radiance Fields (NeRF) have been shown to be very effective at novel view synthesis. However, these models typically require manual and careful human data collection for training. In this paper, we present AutoNeRF, a method to collect data required to train NeRFs using autonomous embodied agents. Our method allows an agent to explore an unseen environment efficiently and use the experience to build an implicit map representation autonomously. 
We compare the impact of different exploration strategies including handcrafted frontier-based exploration and modular approaches composed of trained high-level planners and classical low-level path followers. We train these models with different reward functions tailored to this problem and evaluate the quality of the learned representations on four different downstream tasks: classical viewpoint rendering, map reconstruction, planning, and pose refinement. Empirical results show that NeRFs can be trained on actively collected data using just a single episode of experience in an unseen environment, and can be used for several downstream robotic tasks, and that modular trained exploration models significantly outperform the classical baselines.", "output": "AutoNeRF: Training Implicit Scene Representations with Autonomous Agents."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Maritime obstacle detection is critical for safe navigation of autonomous surface vehicles (ASVs). While the accuracy of image-based detection methods has advanced substantially, their computational and memory requirements prohibit deployment on embedded devices. In this paper we analyze the currently best-performing maritime obstacle detection network WaSR. Based on the analysis we then propose replacements for the most computationally intensive stages and propose its embedded-compute-ready variant eWaSR. In particular, the new design follows the most recent advancements of transformer-based lightweight networks. eWaSR achieves comparable detection results to state-of-the-art WaSR with only 0.52% F1 score performance drop and outperforms other state-of-the-art embedded-ready architectures by over 9.74% in F1 score. On a standard GPU, eWaSR runs 10x faster than the original WaSR (115 FPS vs 11 FPS). 
Tests on a real embedded device OAK-D show that, while WaSR cannot run due to memory restrictions, eWaSR runs comfortably at 5.5 FPS. This makes eWaSR the first practical embedded-compute-ready maritime obstacle detection network. The source code and trained eWaSR models are publicly available here:", "output": "eWaSR -- an embedded-compute-ready maritime obstacle detection network."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Robustness to natural distribution shifts has seen remarkable progress thanks to recent pre-training strategies combined with better fine-tuning methods. However, such fine-tuning assumes access to large amounts of labelled data, and the extent to which the observations hold when the amount of training data is not as high remains unknown. We address this gap by performing the first in-depth study of robustness to various natural distribution shifts in different low-shot regimes: spanning datasets, architectures, pre-trained initializations, and state-of-the-art robustness interventions. Most importantly, we find that there is no single model of choice that is often more robust than others, and existing interventions can fail to improve robustness on some datasets even if they do so in the full-shot regime. We hope that our work will motivate the community to focus on this problem of practical importance.", "output": "Benchmarking Low-Shot Robustness to Natural Distribution Shifts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The rapid development and application of foundation models have revolutionized the field of artificial intelligence. Large diffusion models have gained significant attention for their ability to generate photorealistic images and support various tasks. 
On-device deployment of these models provides benefits such as lower server costs, offline functionality, and improved user privacy. However, common large diffusion models have over 1 billion parameters and pose challenges due to restricted computational and memory resources on devices. We present a series of implementation optimizations for large diffusion models that achieve the fastest reported inference latency to-date (under 12 seconds for Stable Diffusion 1.4 without int8 quantization on Samsung S23 Ultra for a 512x512 image with 20 iterations) on GPU-equipped mobile devices. These enhancements broaden the applicability of generative AI and improve the overall user experience across a wide range of devices.", "output": "Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-label image recognition aims to predict a set of labels that present in an image. The key to deal with such problem is to mine the associations between image contents and labels, and further obtain the correct assignments between images and their labels. In this paper, we treat each image as a bag of instances, and formulate the task of multi-label image recognition as an instance-label matching selection problem. To model such problem, we propose an innovative Semantic-aware Graph Matching framework for Multi-Label image recognition (ML-SGM), in which Graph Matching mechanism is introduced owing to its good performance of excavating the instance and label relationship. 
The framework explicitly establishes category correlations and instance-label correspondences by modeling the relation among content-aware (instance) and semantic-aware (label) category representations, to facilitate multi-label image understanding and reduce the dependency of large amounts of training samples for each category. Specifically, we first construct an instance spatial graph and a label semantic graph respectively and then incorporate them into a constructed assignment graph by connecting each instance to all labels. Subsequently, the graph network block is adopted to aggregate and update all nodes and edges state on the assignment graph to form structured representations for each instance and label. Our network finally derives a prediction score for each instance-label correspondence and optimizes such correspondence with a weighted cross-entropy loss. Empirical results conducted on generic multi-label image recognition demonstrate the superiority of our proposed method. Moreover, the proposed method also shows advantages in multi-label recognition with partial labels and multi-label few-shot learning, as well as outperforms current state-of-the-art methods with a clear margin.", "output": "Semantic-Aware Graph Matching Mechanism for Multi-Label Image Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a novel approach for visible-thermal infrared stereoscopy, focusing on the estimation of disparities of human silhouettes. Visible-thermal infrared stereo poses several challenges, including occlusions and differently textured matching regions in both spectra. Finding matches between two spectra with varying colors, textures, and shapes adds further complexity to the task. 
To address the aforementioned challenges, this paper proposes a novel approach where a high-resolution convolutional neural network is used to better capture relationships between the two spectra. To do so, a modified HRNet backbone is used for feature extraction. This HRNet backbone is capable of capturing fine details and textures as it extracts features at multiple scales, thereby enabling the utilization of both local and global information. For matching visible and thermal infrared regions, our method extracts features on each patch using two modified HRNet streams. Features from the two streams are then combined for predicting the disparities by concatenation and correlation. Results on public datasets demonstrate the effectiveness of the proposed approach by improving the results by approximately 18 percentage points on the $leq$ 1 pixel error, highlighting its potential for improving accuracy in this task. The code of VisiTherS is available on GitHub at the following link", "output": "VisiTherS: Visible-thermal infrared stereo disparity estimation of human silhouette."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the increased accuracy of modern computer vision technology, many access control systems are equipped with face recognition functions for faster identification. In order to maintain high recognition accuracy, it is necessary to keep the face database up-to-date. However, it is impractical to collect the latest facial picture of the system's user through human effort. Thus, we propose a bottom-up training method for our proposed network to address this challenge. Essentially, our proposed network is a translation pipeline that cascades two CycleGAN blocks (a widely used unpaired image-to-image translation generative adversarial network) called BiTrackGAN. 
By bottom-up training, it induces an ideal intermediate state between these two CycleGAN blocks, namely the constraint mechanism. Experimental results show that BiTrackGAN achieves more reasonable and diverse cross-age facial synthesis than other CycleGAN-related methods. As far as we know, it is a novel and effective constraint mechanism for more reason and accurate aging synthesis through the CycleGAN approach.", "output": "BiTrackGAN: Cascaded CycleGANs to Constraint Face Aging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing image-to-image (I2I) translation methods achieve state-of-the-art performance by incorporating the patch-wise contrastive learning into Generative Adversarial Networks. However, patch-wise contrastive learning only focuses on the local content similarity but neglects the global structure constraint, which affects the quality of the generated images. In this paper, we propose a new unpaired I2I translation framework based on dual contrastive regularization and spectral normalization, namely SN-DCR. To maintain consistency of the global structure and texture, we design the dual contrastive regularization using different feature spaces respectively. In order to improve the global structure information of the generated images, we formulate a semantically contrastive loss to make the global semantic structure of the generated images similar to the real images from the target domain in the semantic feature space. We use Gram Matrices to extract the style of texture from images. Similarly, we design style contrastive loss to improve the global texture information of the generated images. Moreover, to enhance the stability of model, we employ the spectral normalized convolutional network in the design of our generator. 
We conduct the comprehensive experiments to evaluate the effectiveness of SN-DCR, and the results prove that our method achieves SOTA in multiple tasks.", "output": "Spectral normalized dual contrastive regularization for image-to-image translation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Hyperspectral unmixing is a critical yet challenging task in hyperspectral image interpretation. Recently, great efforts have been made to solve the hyperspectral unmixing task via deep autoencoders. However, existing networks mainly focus on extracting spectral features from mixed pixels, and the employment of spatial feature prior knowledge is still insufficient. To this end, we put forward a spatial attention weighted unmixing network, dubbed as SAWU-Net, which learns a spatial attention network and a weighted unmixing network in an end-to-end manner for better spatial feature exploitation. In particular, we design a spatial attention module, which consists of a pixel attention block and a window attention block to efficiently model pixel-based spectral information and patch-based spatial information, respectively. While in the weighted unmixing framework, the central pixel abundance is dynamically weighted by the coarse-grained abundances of surrounding pixels. In addition, SAWU-Net generates dynamically adaptive spatial weights through the spatial attention mechanism, so as to dynamically integrate surrounding pixels more effectively. 
Experimental results on real and synthetic datasets demonstrate the better accuracy and superiority of SAWU-Net, which reflects the effectiveness of the proposed spatial attention mechanism.", "output": "SAWU-Net: Spatial Attention Weighted Unmixing Network for Hyperspectral Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present view-synthesis autoencoders (VSA) in this paper, which is a self-supervised learning framework designed for vision transformers. Different from traditional 2D pretraining methods, VSA can be pre-trained with multi-view data. In each iteration, the input to VSA is one view (or multiple views) of a 3D object and the output is a synthesized image in another target pose. The decoder of VSA has several cross-attention blocks, which use the source view as value, source pose as key, and target pose as query. They achieve cross-attention to synthesize the target view. This simple approach realizes large-angle view synthesis and learns spatial invariant representation, where the latter is decent initialization for transformers on downstream tasks, such as 3D classification on ModelNet40, ShapeNet Core55, and ScanObjectNN. VSA outperforms existing methods significantly for linear probing and is competitive for fine-tuning. The code will be made publicly available.", "output": "Self-supervised Learning by View Synthesis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Segment Anything Model (SAM) is a recently developed large model for general-purpose segmentation for computer vision tasks. SAM was trained using 11 million images with over 1 billion masks and can produce segmentation results for a wide range of objects in natural scene images. 
SAM can be viewed as a general perception model for segmentation (partitioning images into semantically meaningful regions). Thus, how to utilize such a large foundation model for medical image segmentation is an emerging research target. This paper shows that although SAM does not immediately give high-quality segmentation for medical images, its generated masks, features, and stability scores are useful for building and training better medical image segmentation models. In particular, we demonstrate how to use SAM to augment image inputs for a commonly-used medical image segmentation model (e.g., U-Net). Experiments on two datasets show the effectiveness of our proposed method.", "output": "Input Augmentation with SAM: Boosting Medical Image Segmentation with Segmentation Foundation Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current arbitrary style transfer models are limited to either image or video domains. In order to achieve satisfying image and video style transfers, two different models are inevitably required with separate training processes on image and video domains, respectively. In this paper, we show that this can be precluded by introducing UniST, a Unified Style Transfer framework for both images and videos. At the core of UniST is a domain interaction transformer (DIT), which first explores context information within the specific domain and then interacts contextualized domain information for joint learning. In particular, DIT enables exploration of temporal information from videos for the image style transfer task and meanwhile allows rich appearance texture from images for video style transfer, thus leading to mutual benefits. 
Considering heavy computation of traditional multi-head self-attention, we present a simple yet effective axial multi-head self-attention (AMSA) for DIT, which improves computational efficiency while maintains style transfer performance. To verify the effectiveness of UniST, we conduct extensive experiments on both image and video style transfer tasks and show that UniST performs favorably against state-of-the-art approaches on both tasks. Our code and results will be released.", "output": "Two Birds, One Stone: A Unified Framework for Joint Learning of Image and Video Style Transfers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "3D representation disentanglement aims to identify, decompose, and manipulate the underlying explanatory factors of 3D data, which helps AI fundamentally understand our 3D world. This task is currently under-explored and poses great challenges: (i) the 3D representations are complex and in general contains much more information than 2D image; (ii) many 3D representations are not well suited for gradient-based optimization, let alone disentanglement. To address these challenges, we use NeRF as a differentiable 3D representation, and introduce a self-supervised Navigation to identify interpretable semantic directions in the latent space. To our best knowledge, this novel method, dubbed NaviNeRF, is the first work to achieve fine-grained 3D disentanglement without any priors or supervisions. Specifically, NaviNeRF is built upon the generative NeRF pipeline, and equipped with an Outer Navigation Branch and an Inner Refinement Branch. They are complementary -- the outer navigation is to identify global-view semantic directions, and the inner refinement dedicates to fine-grained attributes. A synergistic loss is further devised to coordinate two branches. 
Extensive experiments demonstrate that NaviNeRF has a superior fine-grained 3D disentanglement ability than the previous 3D-aware models. Its performance is also comparable to editing-oriented models relying on semantic or geometry priors.", "output": "NaviNeRF: NeRF-based 3D Representation Disentanglement by Latent Semantic Navigation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Airborne particles are the medium for SARS-CoV-2 to invade the human body. Light also reflects through suspended particles in the air, allowing people to see a colorful world. Impressionism is the most prominent art school that explores the spectrum of color created through color reflection of light. We find similarities of color structure and color stacking in the Impressionist paintings and the illustrations of the novel coronavirus by artists around the world. With computerized data analysis through the main tones, the way of color layout, and the way of color stacking in the paintings of the Impressionists, we train computers to draw the novel coronavirus in an Impressionist style using a Generative Adversarial Network to create our artwork \"Medium. Permeation\". This artwork is composed of 196 randomly generated viral pictures arranged in a 14 by 14 matrix to form a large-scale painting. In addition, we have developed an extended work: Gradual Change, which is presented as video art. We use Graph Neural Network to present 196 paintings of the new coronavirus to the audience one by one in a gradual manner. In front of LED TV screen, audience will find 196 virus paintings whose colors will change continuously. This large video painting symbolizes that worldwide 196 countries have been invaded by the epidemic, and every nation continuously pops up mutant viruses. The speed of vaccine development cannot keep up with the speed of virus mutation. 
This is also the first generative art in the world based on the common features and a metaphorical symbiosis between Impressionist art and the novel coronavirus. This work warns us of the unprecedented challenges posed by the SARS-CoV-2, implying that the world should not ignore the invisible enemy who uses air as a medium.", "output": "Medium. Permeation: SARS-COV-2 Painting Creation by Generative Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work studies the multi-human parsing problem. Existing methods, either following top-down or bottom-up two-stage paradigms, usually involve expensive computational costs. We instead present a high-performance Single-stage Multi-human Parsing (SMP) deep architecture that decouples the multi-human parsing problem into two fine-grained sub-problems, i.e., locating the human body and parts. SMP leverages the point features in the barycenter positions to obtain their segmentation and then generates a series of offsets from the barycenter of the human body to the barycenters of parts, thus performing human body and parts matching without the grouping process. Within the SMP architecture, we propose a Refined Feature Retain module to extract the global feature of instances through generated mask attention and a Mask of Interest Reclassify module as a trainable plug-in module to refine the classification results with the predicted segmentation. Extensive experiments on the MHPv2.0 dataset demonstrate the best effectiveness and efficiency of the proposed method, surpassing the state-of-the-art method by 2.1% in AP50p, 1.0% in APvolp, and 1.2% in PCP50. In particular, the proposed method requires fewer training epochs and a less complex model architecture. 
We will release our source codes, pretrained models, and online demos to facilitate further studies.", "output": "Single-stage Multi-human Parsing via Point Sets and Center-based Offsets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial attacks aim to disturb the functionality of a target system by adding specific noise to the input samples, bringing potential threats to security and robustness when applied to facial recognition systems. Although existing defense techniques achieve high accuracy in detecting some specific adversarial faces (adv-faces), new attack methods especially GAN-based attacks with completely different noise patterns circumvent them and reach a higher attack success rate. Even worse, existing techniques require attack data before implementing the defense, making it impractical to defend newly emerging attacks that are unseen to defenders. In this paper, we investigate the intrinsic generality of adv-faces and propose to generate pseudo adv-faces by perturbing real faces with three heuristically designed noise patterns. We are the first to train an adv-face detector using only real faces and their self-perturbations, agnostic to victim facial recognition systems, and agnostic to unseen attacks. By regarding adv-faces as out-of-distribution data, we then naturally introduce a novel cascaded system for adv-face detection, which consists of training data self-perturbations, decision boundary regularization, and a max-pooling-based binary classifier focusing on abnormal local color aberrations. 
Experiments conducted on LFW and CelebA-HQ datasets with eight gradient-based and two GAN-based attacks validate that our method generalizes to a variety of unseen adversarial attacks.", "output": "Detecting Adversarial Faces Using Only Real Face Self-Perturbations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While unsupervised change detection using contrastive learning has been significantly improved the performance of literature techniques, at present, it only focuses on the bi-temporal change detection scenario. Previous state-of-the-art models for image time-series change detection often use features obtained by learning for clustering or training a model from scratch using pseudo labels tailored to each scene. However, these approaches fail to exploit the spatial-temporal information of image time-series or generalize to unseen scenarios. In this work, we propose a two-stage approach to unsupervised change detection in satellite image time-series using contrastive learning with feature tracking. By deriving pseudo labels from pre-trained models and using feature tracking to propagate them among the image time-series, we improve the consistency of our pseudo labels and address the challenges of seasonal changes in long-term remote sensing image time-series. We adopt the self-training algorithm with ConvLSTM on the obtained pseudo labels, where we first use supervised contrastive loss and contrastive random walks to further improve the feature correspondence in space-time. Then a fully connected layer is fine-tuned on the pre-trained multi-temporal features for generating the final change maps. 
Through comprehensive experiments on two datasets, we demonstrate consistent improvements in accuracy on fitting and inference scenarios.", "output": "Unsupervised CD in satellite image time series by contrastive learning and feature tracking."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Semantic map construction under bird's-eye view (BEV) plays an essential role in autonomous driving. In contrast to camera image, LiDAR provides the accurate 3D observations to project the captured 3D features onto BEV space inherently. However, the vanilla LiDAR-based BEV feature often contains many indefinite noises, where the spatial features have little texture and semantic cues. In this paper, we propose an effective LiDAR-based method to build semantic map. Specifically, we introduce a BEV pyramid feature decoder that learns the robust multi-scale BEV features for semantic map construction, which greatly boosts the accuracy of the LiDAR-based method. To mitigate the defects caused by lacking semantic cues in LiDAR data, we present an online Camera-to-LiDAR distillation scheme to facilitate the semantic learning from image to point cloud. Our distillation scheme consists of feature-level and logit-level distillation to absorb the semantic information from camera in BEV. The experimental results on challenging nuScenes dataset demonstrate the efficacy of our proposed LiDAR2Map on semantic map construction, which significantly outperforms the previous LiDAR-based methods over 27.9% mIoU and even performs better than the state-of-the-art camera-based approaches. 
Source code is available at: ", "output": "LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The mechanism of connecting multimodal signals through self-attention operation is a key factor in the success of multimodal Transformer networks in remote sensing data fusion tasks. However, traditional approaches assume access to all modalities during both training and inference, which can lead to severe degradation when dealing with modal-incomplete inputs in downstream applications. To address this limitation, our proposed approach introduces a novel model for incomplete multimodal learning in the context of remote sensing data fusion. This approach can be used in both supervised and self-supervised pretraining paradigms and leverages the additional learned fusion tokens in combination with Bi-LSTM attention and masked self-attention mechanisms to collect multimodal signals. The proposed approach employs reconstruction and contrastive loss to facilitate fusion in pre-training while allowing for random modality combinations as inputs in network training. Our approach delivers state-of-the-art performance on two multimodal datasets for tasks such as building instance / semantic segmentation and land-cover mapping tasks when dealing with incomplete inputs during inference.", "output": "Incomplete Multimodal Learning for Remote Sensing Data Fusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "LiDAR point cloud segmentation is one of the most fundamental tasks for autonomous driving scene understanding. However, it is difficult for existing models to achieve both high inference speed and accuracy simultaneously. 
For example, voxel-based methods perform well in accuracy, while Bird's-Eye-View (BEV)-based methods can achieve real-time inference. To overcome this issue, we develop an effective 3D-to-BEV knowledge distillation method that transfers rich knowledge from 3D voxel-based models to BEV-based models. Our framework mainly consists of two modules: the voxel-to-pillar distillation module and the label-weight distillation module. Voxel-to-pillar distillation distills sparse 3D features to BEV features for middle layers to make the BEV-based model aware of more structural and geometric information. Label-weight distillation helps the model pay more attention to regions with more height information. Finally, we conduct experiments on the SemanticKITTI dataset and Paris-Lille-3D. The results on SemanticKITTI show more than 5% improvement on the test set, especially for classes such as motorcycle and person, with more than 15% improvement. The code can be accessed at", "output": "Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Fast and accurate MRI reconstruction is a key concern in modern clinical practice. Recently, numerous Deep-Learning methods have been proposed for MRI reconstruction, however, they usually fail to reconstruct sharp details from the subsampled k-space data. To solve this problem, we propose a lightweight and accurate Edge Attention MRI Reconstruction Network (EAMRI) to reconstruct images with edge guidance. Specifically, we design an efficient Edge Prediction Network to directly predict accurate edges from the blurred image. Meanwhile, we propose a novel Edge Attention Module (EAM) to guide the image reconstruction utilizing the extracted edge priors, as inspired by the popular self-attention mechanism. 
EAM first projects the input image and edges into Q_image, K_edge, and V_image, respectively. Then EAM pairs the Q_image with K_edge along the channel dimension, such that 1) it can search globally for the high-frequency image features that are activated by the edge priors; 2) the overall computation burdens are largely reduced compared with the traditional spatial-wise attention. With the help of EAM, the predicted edge priors can effectively guide the model to reconstruct high-quality MR images with accurate edges. Extensive experiments show that our proposed EAMRI outperforms other methods with fewer parameters and can recover more accurate edges.", "output": "Fast MRI Reconstruction via Edge Attention."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, synthetic aperture radar (SAR) image change detection has become an interesting yet challenging direction due to the presence of speckle noise. Although both traditional and modern learning-driven methods attempted to overcome this challenge, deep convolutional neural networks (DCNNs)-based methods are still hindered by the lack of interpretability and the requirement of large computation power. To overcome this drawback, wavelet scattering network (WSN) and Fourier scattering network (FSN) are proposed. Combining respective merits of WSN and FSN, we propose Stockwell scattering network (SSN) based on Stockwell transform which is widely applied against noisy signals and shows advantageous characteristics in speckle reduction. The proposed SSN provides noise-resilient feature representation and obtains state-of-art performance in SAR image change detection as well as high computational efficiency. 
Experimental results on three real SAR image datasets demonstrate the effectiveness of the proposed method.", "output": "SSN: Stockwell Scattering Network for SAR Image Change Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep Image Prior (DIP) shows that some network architectures naturally bias towards smooth images and resist noises, a phenomenon known as spectral bias. Image denoising is an immediate application of this property. Although DIP has removed the requirement of large training sets, it still presents two practical challenges for denoising: architectural design and noise-fitting, which are often intertwined. Existing methods mostly handcraft or search for the architecture from a large design space, due to the lack of understanding on how the architectural choice corresponds to the image. In this study, we analyze from a frequency perspective to demonstrate that the unlearnt upsampling is the main driving force behind the denoising phenomenon in DIP. This finding then leads to strategies for estimating a suitable architecture for every image without a laborious search. Extensive experiments show that the estimated architectures denoise and preserve the textural details better than current methods with up to 95% fewer parameters. 
The under-parameterized nature also makes them especially robust to a higher level of noise.", "output": "The Devil is in the Upsampling: Architectural Decisions Made Simpler for Denoising with Deep Image Prior."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As an important task in remote sensing image analysis, remote sensing change detection (RSCD) aims to identify changes of interest in a region from spatially co-registered multi-temporal remote sensing images, so as to monitor the local development. Existing RSCD methods usually formulate RSCD as a binary classification task, representing changes of interest by merely feature concatenation or feature subtraction and recovering the spatial details via densely connected change representations, whose performances need further improvement. In this paper, we propose STNet, a RSCD network based on spatial and temporal feature fusions. Specifically, we design a temporal feature fusion (TFF) module to combine bi-temporal features using a cross-temporal gating mechanism for emphasizing changes of interest; a spatial feature fusion module is deployed to capture fine-grained information using a cross-scale attention mechanism for recovering the spatial details of change representations. Experimental results on three benchmark datasets for RSCD demonstrate that the proposed method achieves the state-of-the-art performance. 
Code is available at", "output": "STNet: Spatial and Temporal feature fusion network for change detection in remote sensing images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spatial attention mechanism has been widely used in semantic segmentation of remote sensing images given its capability to model long-range dependencies. Many methods adopting spatial attention mechanism aggregate contextual information using direct relationships between pixels within an image, while ignoring the scene awareness of pixels (i.e., being aware of the global context of the scene where the pixels are located and perceiving their relative positions). Given the observation that scene awareness benefits context modeling with spatial correlations of ground objects, we design a scene-aware attention module based on a refined spatial attention mechanism embedding scene awareness. Besides, we present a local-global class attention mechanism to address the problem that general attention mechanism introduces excessive background noises while hardly considering the large intra-class variance in remote sensing images. In this paper, we integrate both scene-aware and class attentions to propose a scene-aware class attention network (SACANet) for semantic segmentation of remote sensing images. Experimental results on three datasets show that SACANet outperforms other state-of-the-art methods and validate its effectiveness. 
Code is available at", "output": "SACANet: scene-aware class attention network for semantic segmentation of remote sensing images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Video captioning (VC) is a fast-moving, cross-disciplinary area of research that bridges work in the fields of computer vision, natural language processing (NLP), linguistics, and human-computer interaction. In essence, VC involves understanding a video and describing it with language. Captioning is used in a host of applications from creating more accessible interfaces (e.g., low-vision navigation) to video question answering (V-QA), video retrieval and content generation. This survey covers deep learning-based VC, including but, not limited to, attention-based architectures, graph networks, reinforcement learning, adversarial networks, dense video captioning (DVC), and more. We discuss the datasets and evaluation metrics used in the field, and limitations, applications, challenges, and future directions for VC.", "output": "A Review of Deep Learning for Video Captioning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Variability in staining protocols, such as different slide preparation techniques, chemicals, and scanner configurations, can result in a diverse set of whole slide images (WSIs). This distribution shift can negatively impact the performance of deep learning models on unseen samples, presenting a significant challenge for developing new computational pathology applications. In this study, we propose a method for improving the generalizability of convolutional neural networks (CNNs) to stain changes in a single-source setting for semantic segmentation. 
Recent studies indicate that style features mainly exist as covariances in earlier network layers. We design a channel attention mechanism based on these findings that detects stain-specific features and modify the previously proposed stain-invariant training scheme. We reweigh the outputs of earlier layers and pass them to the stain-adversarial training branch. We evaluate our method on multi-center, multi-stain datasets and demonstrate its effectiveness through interpretability analysis. Our approach achieves substantial improvements over baselines and competitive performance compared to other methods, as measured by various evaluation metrics. We also show that combining our method with stain augmentation leads to mutually beneficial results and outperforms other techniques. Overall, our study makes significant contributions to the field of computational pathology.", "output": "Improving Stain Invariance of CNNs for Segmentation by Fusing Channel Attention and Domain-Adversarial Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Denoising diffusion probabilistic models (DDPMs) are a class of powerful generative models. The past few years have witnessed the great success of DDPMs in generating high-fidelity samples. A significant limitation of the DDPMs is the slow sampling procedure. DDPMs generally need hundreds or thousands of sequential function evaluations (steps) of neural networks to generate a sample. This paper aims to develop a fast sampling method for DDPMs requiring much fewer steps while retaining high sample quality. The inference process of DDPMs approximates solving the corresponding diffusion ordinary differential equations (diffusion ODEs) in the continuous limit. This work analyzes how the backward error affects the diffusion ODEs and the sample quality in DDPMs. 
We propose fast sampling through the Restricting Backward Error schedule (RBE schedule) based on dynamically moderating the long-time backward error. Our method accelerates DDPMs without any further training. Our experiments show that sampling with an RBE schedule generates high-quality samples within only 8 to 20 function evaluations on various benchmark datasets. We achieved 12.01 FID in 8 function evaluations on the ImageNet $128\\times128$, and a $20\\times$ speedup compared with previous baseline samplers.", "output": "Fast Diffusion Probabilistic Model Sampling through the lens of Backward Error Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural Radiance Field (NeRF) has received much attention in recent years due to the impressively high quality in 3D scene reconstruction and novel view synthesis. However, image degradation caused by the scattering of atmospheric light and object light by particles in the atmosphere can significantly decrease the reconstruction quality when shooting scenes in hazy conditions. To address this issue, we propose Dehazing-NeRF, a method that can recover clear NeRF from hazy image inputs. Our method simulates the physical imaging process of hazy images using an atmospheric scattering model, and jointly learns the atmospheric scattering model and a clean NeRF model for both image dehazing and novel view synthesis. Different from previous approaches, Dehazing-NeRF is an unsupervised method with only hazy images as the input, and also does not rely on hand-designed dehazing priors. By jointly combining the depth estimated from the NeRF 3D scene with the atmospheric scattering model, our proposed model breaks through the ill-posed problem of single-image dehazing while maintaining geometric consistency. 
Besides, to alleviate the degradation of image quality caused by information loss, soft margin consistency regularization, as well as atmospheric consistency and contrast discriminative loss, are addressed during the model training process. Extensive experiments demonstrate that our method outperforms the simple combination of single-image dehazing and NeRF on both image dehazing and novel view image synthesis.", "output": "Dehazing-NeRF: Neural Radiance Fields from Hazy Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Medical image segmentation is crucial for the development of computer-aided diagnostic and therapeutic systems, but still faces numerous difficulties. In recent years, the commonly used encoder-decoder architecture based on CNNs has been applied effectively in medical image segmentation, but has limitations in terms of learning global context and spatial relationships. Some researchers have attempted to incorporate transformers into both the decoder and encoder components, with promising results, but this approach still requires further improvement due to its high computational complexity. This paper introduces Dilated-UNet, which combines a Dilated Transformer block with the U-Net architecture for accurate and fast medical image segmentation. Image patches are transformed into tokens and fed into the U-shaped encoder-decoder architecture, with skip-connections for local-global semantic feature learning. The encoder uses a hierarchical Dilated Transformer with a combination of Neighborhood Attention and Dilated Neighborhood Attention Transformer to extract local and sparse global attention. 
The results of our experiments show that Dilated-UNet outperforms other models on several challenging medical image segmentation datasets, such as ISIC and Synapse.", "output": "Dilated-UNet: A Fast and Accurate Medical Image Segmentation Approach using a Dilated Transformer and U-Net Architecture."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In many Vietnamese schools, grades are still being inputted into the database manually, which is not only inefficient but also prone to human error. Thus, the automation of this process is highly necessary, which can only be achieved if we can extract information from academic transcripts. In this paper, we test our improved CRNN model in extracting information from 126 transcripts, with 1008 vertical lines, 3859 horizontal lines, and 2139 handwritten test scores. Then, this model is compared to the Baseline model. The results show that our model significantly outperforms the Baseline model with an accuracy of 99.6% in recognizing vertical lines, 100% in recognizing horizontal lines, and 96.11% in recognizing handwritten test scores.", "output": "An approach to extract information from academic transcripts of HUST."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Language-based object detection is a promising direction towards building a natural interface to describe objects in images that goes far beyond plain category names. While recent methods show great progress in that direction, proper evaluation is lacking. With OmniLabel, we propose a novel task definition, dataset, and evaluation metric. The task subsumes standard- and open-vocabulary detection as well as referring expressions. 
With more than 28K unique object descriptions on over 25K images, OmniLabel provides a challenging benchmark with diverse and complex object descriptions in a naturally open-vocabulary setting. Moreover, a key differentiation to existing benchmarks is that our object descriptions can refer to one, multiple or even no object, hence, providing negative examples in free-form text. The proposed evaluation handles the large label space and judges performance via a modified average precision metric, which we validate by evaluating strong language-based baselines. OmniLabel indeed provides a challenging test bed for future research on language-based detection.", "output": "OmniLabel: A Challenging Benchmark for Language-Based Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Given a visual scene, humans have strong intuitions about how a scene can evolve over time under given actions. The intuition, often termed visual intuitive physics, is a critical ability that allows us to make effective plans to manipulate the scene to achieve desired outcomes without relying on extensive trial and error. In this paper, we present a framework capable of learning 3D-grounded visual intuitive physics models from videos of complex scenes with fluids. Our method is composed of a conditional Neural Radiance Field (NeRF)-style visual frontend and a 3D point-based dynamics prediction backend, using which we can impose strong relational and structural inductive bias to capture the structure of the underlying environment. Unlike existing intuitive point-based dynamics works that rely on the supervision of dense point trajectory from simulators, we relax the requirements and only assume access to multi-view RGB images and (imperfect) instance masks acquired using color prior. 
This enables the proposed model to handle scenarios where accurate point estimation and tracking are hard or impossible. We generate datasets including three challenging scenarios involving fluid, granular materials, and rigid objects in the simulation. The datasets do not include any dense particle information so most previous 3D-based intuitive physics pipelines can barely deal with that. We show our model can make long-horizon future predictions by learning from raw images and significantly outperforms models that do not employ an explicit 3D representation space. We also show that once trained, our model can achieve strong generalization in complex scenarios under extrapolate settings.", "output": "3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In computer vision, unsupervised domain adaptation (UDA) is an approach to transferring knowledge from a label-rich source domain to a fully-unlabeled target domain. Conventional UDA approaches have two problems. The first problem is that a class classifier can be biased to the source domain because it is trained using only source samples. The second is that previous approaches align image-level features regardless of foreground and background, although the classifier requires foreground features. To solve these problems, we introduce Weight-based Mask Network (WEMNet) composed of Domain Ignore Module (DIM) and Semantic Enhancement Module (SEM). DIM obtains domain-agnostic feature representations via the weight of the domain discriminator and predicts categories. In addition, SEM obtains class-related feature representations using the classifier weight and focuses on the foreground features for domain adaptation. 
Extensive experimental results reveal that the proposed WEMNet outperforms the competitive accuracy on representative UDA datasets.", "output": "Weight-based Mask for Domain Adaptation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurate and timely monitoring of forest canopy heights is critical for assessing forest dynamics, biodiversity, carbon sequestration as well as forest degradation and deforestation. Recent advances in deep learning techniques, coupled with the vast amount of spaceborne remote sensing data offer an unprecedented opportunity to map canopy height at high spatial and temporal resolutions. Current techniques for wall-to-wall canopy height mapping correlate remotely sensed 2D information from optical and radar sensors to the vertical structure of trees using LiDAR measurements. While studies using deep learning algorithms have shown promising performances for the accurate mapping of canopy heights, they have limitations due to the type of architectures and loss functions employed. Moreover, mapping canopy heights over tropical forests remains poorly studied, and the accurate height estimation of tall canopies is a challenge due to signal saturation from optical and radar sensors, persistent cloud covers and sometimes the limited penetration capabilities of LiDARs. Here, we map heights at 10 m resolution across the diverse landscape of Ghana with a new vision transformer (ViT) model optimized concurrently with a classification (discrete) and a regression (continuous) loss function. This model achieves better accuracy than previously used convolutional based approaches (ConvNets) optimized with only a continuous loss function. The ViT model results show that our proposed discrete/continuous loss significantly increases the sensitivity for very tall trees (i.e., > 35m), for which other approaches show saturation effects. 
The height maps generated by the ViT also have better ground sampling distance and better sensitivity to sparse vegetation in comparison to a convolutional model. Our ViT model has a RMSE of 3.12m in comparison to a reference dataset while the ConvNet model has a RMSE of 4.3m.", "output": "Vision Transformers, a new approach for high-resolution and large-scale mapping of canopy heights."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although computational aesthetics evaluation has made certain achievements in many fields, its research of music performance remains to be explored. At present, subjective evaluation is still a ultimate method of music aesthetics research, but it will consume a lot of human and material resources. In addition, the music performance generated by AI is still mechanical, monotonous and lacking in beauty. In order to guide the generation task of AI music performance, and to improve the performance effect of human performers, this paper uses Birkhoff's aesthetic measure to propose a method of objective measurement of beauty. The main contributions of this paper are as follows: Firstly, we put forward an objective aesthetic evaluation method to measure the music performance aesthetic; Secondly, we propose 10 basic music features and 4 aesthetic music features. Experiments show that our method performs well on performance assessment.", "output": "An Order-Complexity Model for Aesthetic Quality Assessment of Homophony Music Performance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Optical flow is an indispensable building block for various important computer vision tasks, including motion estimation, object tracking, and disparity measurement. 
In this work, we propose TransFlow, a pure transformer architecture for optical flow estimation. Compared to dominant CNN-based methods, TransFlow demonstrates three advantages. First, it provides more accurate correlation and trustworthy matching in flow estimation by utilizing spatial self-attention and cross-attention mechanisms between adjacent frames to effectively capture global dependencies; Second, it recovers more compromised information (e.g., occlusion and motion blur) in flow estimation through long-range temporal association in dynamic scenes; Third, it enables a concise self-learning paradigm and effectively eliminate the complex and laborious multi-stage pre-training procedures. We achieve the state-of-the-art results on the Sintel, KITTI-15, as well as several downstream tasks, including video object detection, interpolation and stabilization. For its efficacy, we hope TransFlow could serve as a flexible baseline for optical flow estimation.", "output": "TransFlow: Transformer as Flow Learner."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Medical image analysis is a hot research topic because of its usefulness in different clinical applications, such as early disease diagnosis and treatment. Convolutional neural networks (CNNs) have become the de-facto standard in medical image analysis tasks because of their ability to learn complex features from the available datasets, which makes them surpass humans in many image-understanding tasks. In addition to CNNs, transformer architectures also have gained popularity for medical image analysis tasks. However, despite progress in the field, there are still potential areas for improvement. This study uses different CNNs and transformer-based methods with a wide range of data augmentation techniques. We evaluated their performance on three medical image datasets from different modalities. 
We evaluated and compared the performance of the vision transformer model with other state-of-the-art (SOTA) pre-trained CNN networks. For Chest X-ray, our vision transformer model achieved the highest F1 score of 0.9532, recall of 0.9533, Matthews correlation coefficient (MCC) of 0.9259, and ROC-AUC score of 0.97. Similarly, for the Kvasir dataset, we achieved an F1 score of 0.9436, recall of 0.9437, MCC of 0.9360, and ROC-AUC score of 0.97. For the Kvasir-Capsule (a large-scale VCE dataset), our ViT model achieved a weighted F1-score of 0.7156, recall of 0.7182, MCC of 0.3705, and ROC-AUC score of 0.57. We found that our transformer-based models were better or more effective than various CNN models for classifying different anatomical structures, findings, and abnormalities. Our model showed improvement over the CNN-based approaches and suggests that it could be used as a new benchmarking algorithm for algorithm development.", "output": "Vision Transformer for Efficient Chest X-ray and Gastrointestinal Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Semi-supervised semantic segmentation aims to learn from a small amount of labeled data and plenty of unlabeled ones for the segmentation task. The most common approach is to generate pseudo-labels for unlabeled images to augment the training data. However, the noisy pseudo-labels will lead to cumulative classification errors and aggravate the local inconsistency in prediction. This paper proposes a Region Relevance Network (RRN) to alleviate the problem mentioned above. Specifically, we first introduce a local pseudo-label filtering module that leverages discriminator networks to assess the accuracy of the pseudo-label at the region level. A local selection loss is proposed to mitigate the negative impact of wrong pseudo-labels in consistency regularization training. 
In addition, we propose a dynamic region-loss correction module, which takes the merit of network diversity to further rate the reliability of pseudo-labels and correct the convergence direction of the segmentation network with a dynamic region loss. Extensive experiments are conducted on PASCAL VOC 2012 and Cityscapes datasets with varying amounts of labeled data, demonstrating that our proposed approach achieves state-of-the-art performance compared to current counterparts.", "output": "Semi-Supervised Semantic Segmentation With Region Relevance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "KBody is a method for fitting a low-dimensional body model to an image. It follows a predict-and-optimize approach, relying on data-driven model estimates for the constraints that will be used to solve for the body's parameters. Compared to other approaches, it introduces virtual joints to identify higher quality correspondences and disentangles the optimization between the pose and shape parameters to achieve a more balanced result in terms of pose and shape capturing capacity, as well as pixel alignment.", "output": "KBody: Balanced monocular whole-body estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Lane detection is challenging due to the complicated on road scenarios and line deformation from different camera perspectives. Lots of solutions were proposed, but can not deal with corner lanes well. To address this problem, this paper proposes a new top-down deep learning lane detection approach, CANET. A lane instance is first responded by the heat-map on the U-shaped curved guide line at global semantic level, thus the corresponding features of each lane are aggregated at the response point. 
Then CANET obtains the heat-map response of the entire lane through conditional convolution, and finally decodes the point set to describe lanes via adaptive decoder. The experimental results show that CANET reaches SOTA in different metrics. Our code will be released soon.", "output": "CANet: Curved Guide Line Network with Adaptive Decoder for Lane Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A number of problems in computer vision and related fields would be mitigated if camera spectral sensitivities were known. As consumer cameras are not designed for high-precision visual tasks, manufacturers do not disclose spectral sensitivities. Their estimation requires a costly optical setup, which triggered researchers to come up with numerous indirect methods that aim to lower cost and complexity by using color targets. However, the use of color targets gives rise to new complications that make the estimation more difficult, and consequently, there currently exists no simple, low-cost, robust go-to method for spectral sensitivity estimation. Furthermore, even if not limited by hardware or cost, researchers frequently work with imagery from multiple cameras that they do not have in their possession. To provide a practical solution to this problem, we propose a framework for spectral sensitivity estimation that not only does not require any hardware, but also does not require physical access to the camera itself. Similar to other work, we formulate an optimization problem that minimizes a two-term objective function: a camera-specific term from a system of equations, and a universal term that bounds the solution space. Different than other work, we use publicly available high-quality calibration data to construct both terms. 
We use the colorimetric mapping matrices provided by the Adobe DNG Converter to formulate the camera-specific system of equations, and constrain the solutions using an autoencoder trained on a database of ground-truth curves. On average, we achieve reconstruction errors as low as those that can arise due to manufacturing imperfections between two copies of the same camera. We provide predicted sensitivities for more than 1,000 cameras that the Adobe DNG Converter currently supports, and discuss which tasks can become trivial when camera responses are available.", "output": "Spectral Sensitivity Estimation Without a Camera."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Since stroke is the main cause of various cerebrovascular diseases, deep learning-based stroke lesion segmentation on magnetic resonance (MR) images has attracted considerable attention. However, the existing methods often neglect the domain shift among MR images collected from different sites, which has limited performance improvement. To address this problem, we intend to change style information without affecting high-level semantics via adaptively changing the low-frequency amplitude components of the Fourier transform so as to enhance model robustness to varying domains. Thus, we propose a novel FAN-Net, a U-Net--based segmentation network incorporated with a Fourier-based adaptive normalization (FAN) and a domain classifier with a gradient reversal layer. The FAN module is tailored for learning adaptive affine parameters for the amplitude components of different domains, which can dynamically normalize the style information of source images. Then, the domain classifier provides domain-agnostic knowledge to endow FAN with strong domain generalizability. 
The experimental results on the ATLAS dataset, which consists of MR images from 9 sites, show the superior performance of the proposed FAN-Net compared with baseline methods.", "output": "FAN-Net: Fourier-Based Adaptive Normalization For Cross-Domain Stroke Lesion Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work, we report an autoencoder-based 2D representation to classify a time-series as stochastic or non-stochastic, to understand the underlying physical process. Content-aware conversion of 1D time-series to 2D representation, that simultaneously utilizes time- and frequency-domain characteristics, is proposed. An autoencoder is trained with a loss function to learn latent space (using both time- and frequency domains) representation that is designed to be time-invariant. Every element of the time-series is represented as a tuple with two components, one each, from latent space representation in time- and frequency-domains, forming a binary image. In this binary image, those tuples that represent the points in the time-series, together form the ``Latent Space Signature\" (LSS) of the input time-series. The obtained binary LSS images are fed to a classification network. The EfficientNetv2-S classifier is trained using 421 synthetic time-series, with fair representation from both categories. The proposed methodology is evaluated on publicly available astronomical data which are 12 distinct temporal classes of time-series pertaining to the black hole GRS 1915 + 105, obtained from RXTE satellite. Results obtained using the proposed methodology are compared with existing techniques. Concurrence in labels obtained across the classes illustrates the efficacy of the proposed 2D representation using the latent space co-ordinates. 
The proposed methodology also outputs the confidence in the classification label.", "output": "Identifying Stochasticity in Time-Series with Autoencoder-Based Content-aware 2D Representation: Application to Black Hole Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial attacks can mislead deep neural networks (DNNs) by adding imperceptible perturbations to benign examples. The attack transferability enables adversarial examples to attack black-box DNNs with unknown architectures or parameters, which poses threats to many real-world applications. We find that existing transferable attacks do not distinguish between style and content features during optimization, limiting their attack transferability. To improve attack transferability, we propose a novel attack method called style-less perturbation (StyLess). Specifically, instead of using a vanilla network as the surrogate model, we advocate using stylized networks, which encode different style features by perturbing an adaptive instance normalization. Our method can prevent adversarial examples from using non-robust style features and help generate transferable perturbations. Comprehensive experiments show that our method can significantly improve the transferability of adversarial examples. Furthermore, our approach is generic and can outperform state-of-the-art transferable attacks when combined with other attack techniques.", "output": "StyLess: Boosting the Transferability of Adversarial Examples."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Object detection is one of the key tasks in many applications of computer vision. Deep Neural Networks (DNNs) are undoubtedly a well-suited approach for object detection. 
However, such DNNs need highly adapted hardware together with hardware-specific optimization to guarantee high efficiency during inference. This is especially the case when aiming for efficient object detection in video streaming applications on limited hardware such as edge devices. Comparing vendor-specific hardware and related optimization software pipelines in a fair experimental setup is a challenge. In this paper, we propose a framework that uses a host computer with a host software application together with a light-weight interface based on the Message Queuing Telemetry Transport (MQTT) protocol. Various different target devices with target apps can be connected via MQTT with this host computer. With well-defined and standardized MQTT messages, object detection results can be reported to the host computer, where the results are evaluated without harming or influencing the processing on the device. With this quite generic framework, we can measure the object detection performance, the runtime, and the energy efficiency at the same time. The effectiveness of this framework is demonstrated in multiple experiments that offer deep insights into the optimization of DNNs.", "output": "A Framework for Benchmarking Real-Time Embedded Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Two-stage point-to-box network acts as a critical role in the recent popular 3D Siamese tracking paradigm, which first generates proposals and then predicts corresponding proposal-wise scores. However, such a network suffers from tedious hyper-parameter tuning and task misalignment, limiting the tracking performance. 
Towards these concerns, we propose a simple yet effective one-stage point-to-box network for point cloud-based 3D single object tracking. It synchronizes 3D proposal generation and center-ness score prediction by a parallel predictor without tedious hyper-parameters. To guide a task-aligned score ranking of proposals, a center-aware focal loss is proposed to supervise the training of the center-ness branch, which enhances the network's discriminative ability to distinguish proposals of different quality. Besides, we design a binary target classifier to identify target-relevant points. By integrating the derived classification scores with the center-ness scores, the resulting network can effectively suppress interference proposals and further mitigate task misalignment. Finally, we present a novel one-stage Siamese tracker OSP2B equipped with the designed network. Extensive experiments on challenging benchmarks including KITTI and Waymo SOT Dataset show that our OSP2B achieves leading performance with a considerable real-time speed.", "output": "OSP2B: One-Stage Point-to-Box Network for 3D Siamese Tracking."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Application of electronic railway systems as well as the implication of Automatic Train Control (ATC) System has increased the safety of rail transportation. However, one of the most important causes of accidents on the railway is rail damage and breakage. In this paper, we have proposed a method that the rail region is first recognized from the observation area, then by investigating the image texture processing data, the types of rail defects including cracks, wear, peeling, disintegration, and breakage are detected. In order to reduce the computational cost, the image is changed from the RGB color spectrum to the gray spectrum. 
Image texture processing data is obtained by the two-dimensional Gray Levels Co-occurrence Matrix (GLCM) at different angles; this data demonstrates second-order features of the images. Large data of features has a negative effect on the overall accuracy of the classifiers. To tackle this issue and acquire faster response, Principal Component Analysis (PCA) algorithm is used, before entering the band into the classifier. Then the features extracted from the images are compared by three different classifiers including Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbor (KNN) classification. The results obtained from this method indicate that the Random Forest classifier has better performance (accuracy 97%, precision 96%, and recall 96%) than other classifiers.", "output": "Broken Rail Detection With Texture Image Processing Using Two-Dimensional Gray Level Co-occurrence Matrix."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent work known as Segment Anything (SA) has made significant strides in pushing the boundaries of semantic segmentation into the era of foundation models. The impact of SA has sparked extremely active discussions and ushered in an encouraging new wave of developing foundation models for the diverse tasks in the Euclidean domain, such as object detection and image inpainting. Despite the promising advances led by SA, the concept has yet to be extended to the non-Euclidean graph domain. In this paper, we explore a novel Segment Non-Euclidean Anything (SNA) paradigm that strives to develop foundation models that can handle the diverse range of graph data within the non-Euclidean domain, seeking to expand the scope of SA and lay the groundwork for future research in this direction. To achieve this goal, we begin by discussing the recent achievements in foundation models associated with SA. 
We then shed light on the unique challenges that arise when applying the SA concept to graph analysis, which involves understanding the differences between the Euclidean and non-Euclidean domains from both the data and task perspectives. Motivated by these observations, we present several preliminary solutions to tackle the challenges of SNA and detail their corresponding limitations, along with several potential directions to pave the way for future SNA research. Experiments on five Open Graph Benchmark (OGB) datasets across various tasks, including graph property classification and regression, as well as multi-label prediction, demonstrate that the performance of the naive SNA solutions has considerable room for improvement, pointing towards a promising avenue for future exploration of Graph General Intelligence.", "output": "Segment Anything in Non-Euclidean Domains: Challenges and Opportunities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual representation based on covariance matrix has demonstrated its efficacy for image classification by characterising the pairwise correlation of different channels in convolutional feature maps. However, pairwise correlation will become misleading once there is another channel correlating with both channels of interest, resulting in the ``confounding'' effect. For this case, ``partial correlation'' which removes the confounding effect shall be estimated instead. Nevertheless, reliably estimating partial correlation requires solving a symmetric positive definite matrix optimisation, known as sparse inverse covariance estimation (SICE). How to incorporate this process into CNN remains an open issue. In this work, we formulate SICE as a novel structured layer of CNN. 
To ensure end-to-end trainability, we develop an iterative method to solve the above matrix optimisation during forward and backward propagation steps. Our work obtains a partial correlation based deep visual representation and mitigates the small sample problem often encountered by covariance matrix estimation in CNN. Computationally, our model can be effectively trained with GPU and works well with a large number of channels of advanced CNNs. Experiments show the efficacy and superior classification performance of our deep visual representation compared to covariance matrix based counterparts.", "output": "Learning Partial Correlation based Deep Visual Representation for Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Few-shot learning (FSL) is popular due to its ability to adapt to novel classes. Compared with inductive few-shot learning, transductive models typically perform better as they leverage all samples of the query set. The two existing classes of methods, prototype-based and graph-based, have the disadvantages of inaccurate prototype estimation and sub-optimal graph construction with kernel functions, respectively. In this paper, we propose a novel prototype-based label propagation to solve these issues. Specifically, our graph construction is based on the relation between prototypes and samples rather than between samples. As prototypes are being updated, the graph changes. We also estimate the label of each prototype instead of considering a prototype to be the class centre. 
On mini-ImageNet, tiered-ImageNet, CIFAR-FS and CUB datasets, we show the proposed method outperforms other state-of-the-art methods in transductive FSL and semi-supervised FSL when some unlabeled data accompanies the novel few-shot task.", "output": "Transductive Few-shot Learning with Prototype-based Label Propagation by Iterative Graph Refinement."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generating coherent and natural movement is the key challenge in video generation. This research proposes to condense video generation into a problem of motion generation, to improve the expressiveness of motion and make video generation more manageable. This can be achieved by breaking down the video generation process into latent motion generation and video reconstruction. We present a latent motion diffusion (LaMD) framework, which consists of a motion-decomposed video autoencoder and a diffusion-based motion generator, to implement this idea. Through careful design, the motion-decomposed video autoencoder can compress patterns in movement into a concise latent motion representation. Meanwhile, the diffusion-based motion generator is able to efficiently generate realistic motion on a continuous latent space under multi-modal conditions, at a cost that is similar to that of image diffusion models. Results show that LaMD generates high-quality videos with a wide range of motions, from stochastic dynamics to highly controllable movements. It achieves new state-of-the-art performance on benchmark datasets, including BAIR, Landscape and CATER-GENs, for Image-to-Video (I2V) and Text-Image-to-Video (TI2V) generation. 
The source code of LaMD will be made available soon.", "output": "LaMD: Latent Motion Diffusion for Video Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Click-based interactive segmentation enables productive pixel-level annotation and image editing with simple user clicks, whereas target ambiguity remains a problem hindering precise segmentation. That is, in scenes with rich context, one click may refer to multiple potential targets residing in corresponding masks, while most interactive segmentors can only generate one single mask and fail to capture the rich context. To resolve target ambiguity, we here propose PiClick to produce semantically diversified masks. PiClick leverages a transformer network design wherein mutually interactive mask queries are integrated to infuse target priors. Moreover, a Target Reasoning Module is designed in PiClick to automatically imply the best-matched mask from all proposals, significantly relieving target ambiguity as well as extra human intervention. Extensive experiments conducted on all 9 interactive segmentation datasets not only demonstrate the state-of-the-art segmentation performance of PiClick, but also show that it reduces human interventions with multiple proposal generation and target reasoning. To promote direct usage and future endeavors, we release the source code of PiClick together with a plug-and-play annotation tool at", "output": "PiClick: Picking the desired mask in click-based interactive segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Interpreting remote sensing imagery enables numerous downstream applications ranging from land-use planning to deforestation monitoring. 
Robustly classifying this data is challenging due to the Earth's geographic diversity. While many distinct satellite and aerial image classification datasets exist, there is yet to be a benchmark curated that suitably covers this diversity. In this work, we introduce SATellite ImageNet (SATIN), a metadataset curated from 27 existing remotely sensed datasets, and comprehensively evaluate the zero-shot transfer classification capabilities of a broad range of vision-language (VL) models on SATIN. We find SATIN to be a challenging benchmark-the strongest method we evaluate achieves a classification accuracy of 52.0%. We provide a $href{ to guide and track the progress of VL models in this important domain.", "output": "SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Skeleton-based action recognition has achieved remarkable results in human action recognition with the development of graph convolutional networks (GCNs). However, the recent works tend to construct complex learning mechanisms with redundant training and exist a bottleneck for long time-series. To solve these problems, we propose the Temporal-Spatio Graph ConvNeXt (TSGCNeXt) to explore efficient learning mechanism of long temporal skeleton sequences. Firstly, a new graph learning mechanism with simple structure, Dynamic-Static Separate Multi-graph Convolution (DS-SMG) is proposed to aggregate features of multiple independent topological graphs and avoid the node information being ignored during dynamic convolution. Next, we construct a graph convolution training acceleration mechanism to optimize the back-propagation computing of dynamic graph learning with 55.08% speed-up. 
Finally, the TSGCNeXt restructures the overall structure of GCN with three Spatio-temporal learning modules, efficiently modeling long temporal features. In comparison with existing previous methods on large-scale datasets NTU RGB+D 60 and 120, TSGCNeXt outperforms on single-stream networks. In addition, with the ema model introduced into the multi-stream fusion, TSGCNeXt achieves SOTA levels. On the cross-subject and cross-set of the NTU 120, accuracies reach 90.22% and 91.74%.", "output": "TSGCNeXt: Dynamic-Static Multi-Graph Convolution for Efficient Skeleton-Based Action Recognition with Long-term Learning Potential."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "One fundamental limitation to the research of bird strike prevention is the lack of a large-scale dataset taken directly from real-world airports. Existing relevant datasets are either small in size or not dedicated for this purpose. To advance the research and practical solutions for bird strike prevention, in this paper, we present a large-scale challenging dataset AirBirds that consists of 118,312 time-series images, where a total of 409,967 bounding boxes of flying birds are manually, carefully annotated. The average size of all annotated instances is smaller than 10 pixels in 1920x1080 images. Images in the dataset are captured over 4 seasons of a whole year by a network of cameras deployed at a real-world airport, covering diverse bird species, lighting conditions and 13 meteorological scenarios. To the best of our knowledge, it is the first large-scale image dataset that directly collects flying birds in real-world airports for bird strike prevention. 
This dataset is publicly available at ", "output": "AirBirds: A Large-scale Challenging Dataset for Bird Strike Prevention in Real-world Airports."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The discriminability of feature representation is the key to open-set face recognition. Previous methods rely on the learnable weights of the classification layer that represent the identities. However, the evaluation process learns no identity representation and drops the classifier from training. This inconsistency could confuse the feature encoder in understanding the evaluation goal and hinder the effect of identity-based methods. To alleviate the above problem, we propose a novel approach namely Contrastive Regularization for Face recognition (CoReFace) to apply image-level regularization in feature representation learning. Specifically, we employ sample-guided contrastive learning to regularize the training with the image-image relationship directly, which is consistent with the evaluation process. To integrate contrastive learning into face recognition, we augment embeddings instead of images to avoid the image quality degradation. Then, we propose a novel contrastive loss for the representation distribution by incorporating an adaptive margin and a supervised contrastive mask to generate steady loss values and avoid the collision with the classification supervision signal. 
Finally, we discover and solve the semantically repetitive signal problem in contrastive learning by exploring new pair coupling protocols. Extensive experiments demonstrate the efficacy and efficiency of our CoReFace which is highly competitive with the state-of-the-art approaches.", "output": "CoReFace: Sample-Guided Contrastive Regularization for Deep Face Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, as various realistic face forgery techniques known as DeepFake improve by leaps and bounds, more and more DeepFake detection techniques have been proposed. These methods typically rely on detecting statistical differences between natural (i.e., real) and DeepFake-generated images in both spatial and frequency domains. In this work, we propose to explicitly minimize the statistical differences to evade state-of-the-art DeepFake detectors. To this end, we propose a statistical consistency attack (StatAttack) against DeepFake detectors, which contains two main parts. First, we select several statistical-sensitive natural degradations (i.e., exposure, blur, and noise) and add them to the fake images in an adversarial way. Second, we find that the statistical differences between natural and DeepFake images are positively associated with the distribution shifting between the two kinds of images, and we propose to use a distribution-aware loss to guide the optimization of different degradations. As a result, the feature distributions of generated adversarial examples are close to the natural images. Furthermore, we extend the StatAttack to a more powerful version, MStatAttack, where we extend the single-layer degradation to multi-layer degradations sequentially and use the loss to tune the combination weights jointly. 
Comprehensive experimental results on four spatial-based detectors and two frequency-based detectors with four datasets demonstrate the effectiveness of our proposed attack method in both white-box and black-box settings.", "output": "Evading DeepFake Detectors via Adversarial Statistical Consistency."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, deep learning-based compressed sensing (CS) has achieved great success in reducing the sampling and computational cost of sensing systems and improving the reconstruction quality. These approaches, however, largely overlook the issue of the computational cost; they rely on complex structures and task-specific operator designs, resulting in extensive storage and high energy consumption in CS imaging systems. In this paper, we propose a lightweight but effective deep neural network based on recurrent learning to achieve a sustainable CS system; it requires a smaller number of parameters but obtains high-quality reconstructions. Specifically, our proposed network consists of an initial reconstruction sub-network and a residual reconstruction sub-network. While the initial reconstruction sub-network has a hierarchical structure to progressively recover the image, reducing the number of parameters, the residual reconstruction sub-network facilitates recurrent residual feature extraction via recurrent learning to perform both feature fusion and deep reconstructions across different scales. In addition, we also demonstrate that, after the initial reconstruction, feature maps with reduced sizes are sufficient to recover the residual information, and thus we achieved a significant reduction in the amount of memory required. 
Extensive experiments illustrate that our proposed model can achieve a better reconstruction quality than existing state-of-the-art CS algorithms, and it also has a smaller number of network parameters than these algorithms. Our source codes are available at:", "output": "A Lightweight Recurrent Learning Network for Sustainable Compressed Sensing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, indiscernible scene understanding has attracted a lot of attention in the vision community. We further advance the frontier of this field by systematically studying a new challenge named indiscernible object counting (IOC), the goal of which is to count objects that are blended with respect to their surroundings. Due to a lack of appropriate IOC datasets, we present a large-scale dataset IOCfish5K which contains a total of 5,637 high-resolution images and 659,024 annotated center points. Our dataset consists of a large number of indiscernible objects (mainly fish) in underwater scenes, making the annotation process all the more challenging. IOCfish5K is superior to existing datasets with indiscernible scenes because of its larger scale, higher image resolutions, more annotations, and denser scenes. All these aspects make it the most challenging dataset for IOC so far, supporting progress in this area. For benchmarking purposes, we select 14 mainstream methods for object counting and carefully evaluate them on IOCfish5K. 
Furthermore, we propose IOCFormer, a new strong baseline that combines density and regression branches in a unified framework and can effectively tackle object counting under concealed scenes. Experiments show that IOCFormer achieves state-of-the-art scores on IOCfish5K.", "output": "Indiscernible Object Counting in Underwater Scenes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We address the need for a large-scale database of children's faces by using generative adversarial networks (GANs) and face age progression (FAP) models to synthesize a realistic dataset referred to as HDA-SynChildFaces. To this end, we proposed a processing pipeline that initially utilizes StyleGAN3 to sample adult subjects, which are subsequently progressed to children of varying ages using InterFaceGAN. Intra-subject variations, such as facial expression and pose, are created by further manipulating the subjects in their latent space. Additionally, the presented pipeline allows to evenly distribute the races of subjects, allowing to generate a balanced and fair dataset with respect to race distribution. The created HDA-SynChildFaces consists of 1,652 subjects and a total of 188,832 images, each subject being present at various ages and with many different intra-subject variations. Subsequently, we evaluate the performance of various facial recognition systems on the generated database and compare the results of adults and children at different ages. The study reveals that children consistently perform worse than adults, on all tested systems, and the degradation in performance is proportional to age. 
Additionally, our study uncovers some biases in the recognition systems, with Asian and Black subjects and females performing worse than White and Latino Hispanic subjects and males.", "output": "Child Face Recognition at Scale: Synthetic Data Generation and Performance Benchmark."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A large amount of annotated training images is critical for training accurate and robust deep network models but the collection of a large amount of annotated training images is often time-consuming and costly. Image synthesis alleviates this constraint by generating annotated training images automatically by machines which has attracted increasing interest in the recent deep learning research. We develop an innovative image synthesis technique that composes annotated training images by realistically embedding foreground objects of interest (OOI) into background images. The proposed technique consists of two key components that in principle boost the usefulness of the synthesized images in deep network training. The first is context-aware semantic coherence which ensures that the OOI are placed around semantically coherent regions within the background image. The second is harmonious appearance adaptation which ensures that the embedded OOI are agreeable to the surrounding background from both geometry alignment and appearance realism. The proposed technique has been evaluated over two related but very different computer vision challenges, namely, scene text detection and scene text recognition. 
Experiments over a number of public datasets demonstrate the effectiveness of our proposed image synthesis technique - the use of our synthesized images in deep network training is capable of achieving similar or even better scene text detection and scene text recognition performance as compared with using real images.", "output": "Scene Text Synthesis for Efficient and Effective Deep Network Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work, we study robust deep learning against abnormal training data from the perspective of example weighting built in empirical loss functions, i.e., gradient magnitude with respect to logits, an angle that is not thoroughly studied so far. Consequently, we have two key findings: (1) Mean Absolute Error (MAE) Does Not Treat Examples Equally. We present new observations and insightful analysis about MAE, which is theoretically proved to be noise-robust. First, we reveal its underfitting problem in practice. Second, we analyse that MAE's noise-robustness is from emphasising on uncertain examples instead of treating training samples equally, as claimed in prior work. (2) The Variance of Gradient Magnitude Matters. We propose an effective and simple solution to enhance MAE's fitting ability while preserving its noise-robustness. Without changing MAE's overall weighting scheme, i.e., what examples get higher weights, we simply change its weighting variance non-linearly so that the impact ratio between two examples are adjusted. Our solution is termed Improved MAE (IMAE). We prove IMAE's effectiveness using extensive experiments: image classification under clean labels, synthetic label noise, and real-world unknown noise. 
We conclude IMAE is superior to CCE, the most popular loss for training DNNs.", "output": "IMAE for Noise-Robust Learning: Mean Absolute Error Does Not Treat Examples Equally and Gradient Magnitude's Variance Matters."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advances in generative adversarial networks (GANs) have achieved great success in automated image composition that generates new images by embedding interested foreground objects into background images automatically. On the other hand, most existing works deal with foreground objects in two-dimensional (2D) images though foreground objects in three-dimensional (3D) models are more flexible with 360-degree view freedom. This paper presents an innovative View Alignment GAN (VA-GAN) that composes new images by embedding 3D models into 2D background images realistically and automatically. VA-GAN consists of a texture generator and a differential discriminator that are inter-connected and end-to-end trainable. The differential discriminator guides to learn geometric transformation from background images so that the composed 3D models can be aligned with the background images with realistic poses and views. The texture generator adopts a novel view encoding mechanism for generating accurate object textures for the 3D models under the estimated views. 
Extensive experiments over two synthesis tasks (car synthesis with KITTI and pedestrian synthesis with Cityscapes) show that VA-GAN achieves high-fidelity composition qualitatively and quantitatively as compared with state-of-the-art generation methods.", "output": "Towards Realistic 3D Embedding via View Alignment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Bone age assessment is challenging in clinical practice due to the complicated bone age assessment process. Current automatic bone age assessment methods were designed with rare consideration of the diagnostic logistics and thus may yield certain uninterpretable hidden states and outputs. Consequently, doctors can find it hard to cooperate with such models harmoniously because it is difficult to check the correctness of the model predictions. In this work, we propose a new graph-based deep learning framework for bone age assessment with hand radiographs, called Doctor Imitator (DI). The architecture of DI is designed to learn the diagnostic logistics of doctors using the scoring methods (e.g., the Tanner-Whitehouse method) for bone age assessment. Specifically, the convolutions of DI capture the local features of the anatomical regions of interest (ROIs) on hand radiographs and predict the ROI scores by our proposed Anatomy-based Group Convolution, summing up for bone age prediction. Besides, we develop a novel Dual Graph-based Attention module to compute patient-specific attention for ROI features and context attention for ROI scores. As far as we know, DI is the first automatic bone age assessment framework following the scoring methods without fully supervised hand radiographs. 
Experiments on hand radiographs with only bone age supervision verify that DI can achieve excellent performance with sparse parameters and provide more interpretability.", "output": "Doctor Imitator: Hand-Radiography-based Bone Age Assessment by Imitating Scoring Methods."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "An object's interior material properties, while invisible to the human eye, determine motion observed on its surface. We propose an approach that estimates heterogeneous material properties of an object from a monocular video of its surface vibrations. Specifically, we show how to estimate Young's modulus and density throughout a 3D object with known geometry. Knowledge of how these values change across the object is useful for simulating its motion and characterizing any defects. Traditional non-destructive testing approaches, which often require expensive instruments, generally estimate only homogenized material properties or simply identify the presence of defects. In contrast, our approach leverages monocular video to (1) identify image-space modes from an object's sub-pixel motion, and (2) directly infer spatially-varying Young's modulus and density values from the observed modes. We demonstrate our approach on both simulated and real videos.", "output": "Visual Vibration Tomography: Estimating Interior Material Properties from Monocular Video."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vision Transformer (ViT) is known to be highly nonlinear like other classical neural networks and could be easily fooled by both natural and adversarial patch perturbations. This limitation could pose a threat to the deployment of ViT in the real industrial environment, especially in safety-critical scenarios. 
In this work, we propose PatchCensor, aiming to certify the patch robustness of ViT by applying exhaustive testing. We try to provide a provable guarantee by considering the worst patch attack scenarios. Unlike empirical defenses against adversarial patches that may be adaptively breached, certified robust approaches can provide a certified accuracy against arbitrary attacks under certain conditions. However, existing robustness certifications are mostly based on robust training, which often requires substantial training efforts and the sacrifice of model performance on normal samples. To bridge the gap, PatchCensor seeks to improve the robustness of the whole system by detecting abnormal inputs instead of training a robust model and asking it to give reliable results for every input, which may inevitably compromise accuracy. Specifically, each input is tested by voting over multiple inferences with different mutated attention masks, where at least one inference is guaranteed to exclude the abnormal patch. This can be seen as complete-coverage testing, which could provide a statistical guarantee on inference at the test time. Our comprehensive evaluation demonstrates that PatchCensor is able to achieve high certified accuracy (e.g. 67.1% on ImageNet for 2%-pixel adversarial patches), significantly outperforming state-of-the-art techniques while achieving similar clean accuracy (81.8% on ImageNet). 
Meanwhile, our technique also supports flexible configurations to handle different adversarial patch sizes (up to 25%) by simply changing the masking strategy.", "output": "PatchCensor: Patch Robustness Certification for Transformers via Exhaustive Testing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Object detection in autonomous driving applications implies that the detection and tracking of semantic objects are commonly native to urban driving environments, as pedestrians and vehicles. One of the major challenges in state-of-the-art deep-learning based object detection are false positives which occur with overconfident scores. This is highly undesirable in autonomous driving and other critical robotic-perception domains because of safety concerns. This paper proposes an approach to alleviate the problem of overconfident predictions by introducing a novel probabilistic layer to deep object detection networks in testing. The suggested approach avoids the traditional Sigmoid or Softmax prediction layer which often produces overconfident predictions. It is demonstrated that the proposed technique reduces overconfidence in the false positives without degrading the performance on the true positives. The approach is validated on the 2D-KITTI objection detection through the YOLOV4 and SECOND (Lidar-based detector). The proposed approach enables interpretable probabilistic predictions without the requirement of re-training the network and therefore is very practical.", "output": "Probabilistic Approach for Road-Users Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Geometric feature learning for 3D surfaces is critical for many applications in computer graphics and 3D vision. 
However, deep learning currently lags in hierarchical modeling of 3D surfaces due to the lack of required operations and/or their efficient implementations. In this paper, we propose a series of modular operations for effective geometric feature learning from 3D triangle meshes. These operations include novel mesh convolutions, efficient mesh decimation and associated mesh (un)poolings. Our mesh convolutions exploit spherical harmonics as orthonormal bases to create continuous convolutional filters. The mesh decimation module is GPU-accelerated and able to process batched meshes on-the-fly, while the (un)pooling operations compute features for up/down-sampled meshes. We provide open-source implementation of these operations, collectively termed Picasso. Picasso supports heterogeneous mesh batching and processing. Leveraging its modular operations, we further contribute a novel hierarchical neural network for perceptual parsing of 3D surfaces, named PicassoNet++. It achieves highly competitive performance for shape analysis and scene segmentation on prominent 3D benchmarks. The code, data and trained models are available at", "output": "Mesh Convolution with Continuous Filters for 3D Surface Parsing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent research efforts on 3D point cloud semantic segmentation (PCSS) have achieved outstanding performance by adopting neural networks. However, the robustness of these complex models have not been systematically analyzed. Given that PCSS has been applied in many safety-critical applications like autonomous driving, it is important to fill this knowledge gap, especially, how these models are affected under adversarial samples. As such, we present a comparative study of PCSS robustness. First, we formally define the attacker's objective under performance degradation and object hiding. 
Then, we develop new attack by whether to bound the norm. We evaluate different attack options on two datasets and three PCSS models. We found all the models are vulnerable and attacking point color is more effective. With this study, we call the attention of the research community to develop new approaches to harden PCSS models.", "output": "On Adversarial Robustness of Point Cloud Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Our objective is to locate and provide a unique identifier for each mouse in a cluttered home-cage environment through time, as a precursor to automated behaviour recognition for biological research. This is a very challenging problem due to (i) the lack of distinguishing visual features for each mouse, and (ii) the close confines of the scene with constant occlusion, making standard visual tracking approaches unusable. However, a coarse estimate of each mouse's location is available from a unique RFID implant, so there is the potential to optimally combine information from (weak) tracking with coarse information on identity. To achieve our objective, we make the following key contributions: (a) the formulation of the object identification problem as an assignment problem (solved using Integer Linear Programming), and (b) a novel probabilistic model of the affinity between tracklets and RFID data. The latter is a crucial part of the model, as it provides a principled probabilistic treatment of object detections given coarse localisation. 
Our approach achieves 77% accuracy on this animal identification problem, and is able to reject spurious detections when the animals are hidden.", "output": "Persistent Animal Identification Leveraging Non-Visual Markers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As information exists in various modalities in real world, effective interaction and fusion among multimodal information plays a key role for the creation and perception of multimodal data in computer vision and deep learning research. With superb power in modeling the interaction among multimodal information, multimodal image synthesis and editing has become a hot research topic in recent years. Instead of providing explicit guidance for network training, multimodal guidance offers intuitive and flexible means for image synthesis and editing. On the other hand, this field is also facing several challenges in alignment of multimodal features, synthesis of high-resolution images, faithful evaluation metrics, etc. In this survey, we comprehensively contextualize the advance of the recent multimodal image synthesis and editing and formulate taxonomies according to data modalities and model types. We start with an introduction to different guidance modalities in image synthesis and editing, and then describe multimodal image synthesis and editing approaches extensively according to their model types. After that, we describe benchmark datasets and evaluation metrics as well as corresponding experimental results. Finally, we provide insights about the current research challenges and possible directions for future research. 
A project associated with this survey is available at ", "output": "Multimodal Image Synthesis and Editing: A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a method that learns to camouflage 3D objects within scenes. Given an object's shape and a distribution of viewpoints from which it will be seen, we estimate a texture that will make it difficult to detect. Successfully solving this task requires a model that can accurately reproduce textures from the scene, while simultaneously dealing with the highly conflicting constraints imposed by each viewpoint. We address these challenges with a model based on texture fields and adversarial learning. Our model learns to camouflage a variety of object shapes from randomly sampled locations and viewpoints within the input scene, and is the first to address the problem of hiding complex object shapes. Using a human visual search study, we find that our estimated textures conceal objects significantly better than previous methods. Project site: ", "output": "GANmouflage: 3D Object Nondetection with Texture Fields."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Utilizing multi-modal neuroimaging data has been proved to be effective to investigate human cognitive activities and certain pathologies. However, it is not practical to obtain the full set of paired neuroimaging data centrally since the collection faces several constraints, e.g., high examination cost, long acquisition time, and image corruption. In addition, these data are dispersed into different medical institutions and thus cannot be aggregated for centralized training considering the privacy issues. 
There is a clear need to launch a federated learning and facilitate the integration of the dispersed data from different institutions. In this paper, we propose a new benchmark for federated domain translation on unsupervised brain image synthesis (termed as FedMed-GAN) to bridge the gap between federated learning and medical GAN. FedMed-GAN mitigates the mode collapse without sacrificing the performance of generators, and is widely applied to different proportions of unpaired and paired data with variation adaptation property. We treat the gradient penalties by federally averaging algorithm and then leveraging differential privacy gradient descent to regularize the training dynamics. A comprehensive evaluation is provided for comparing FedMed-GAN and other centralized methods, which shows the new state-of-the-art performance by our FedMed-GAN. Our code has been released on the website: ", "output": "FedMed-GAN: Federated Domain Translation on Unsupervised Cross-Modality Brain Image Synthesis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Zigzag flattening (ZF) is commonly used in computer vision as a default option to unfold matrices, eg in patch slicing for Vision Transformer (ViT). However, when decomposing multi-scale-object web images, ZF cannot preserve the smoothness of local information well. To address this, we draw inspiration from Space-Filling Curves (SFC) and investigate Hilbert flattening (HF) as an alternative for visual models. We provide a comprehensive theoretical discussion and practical analysis, demonstrating the superiority of HF over other SFC in locality and multi-scale robustness. We leverage HF to alleviate the problem of the lack of locality bias in the shallow layers of ViT, which formulates our Localformer. 
Extensive experiments demonstrate that Localformer consistently improves performance for several common visual tasks. Additionally, upon inspection, we find that Localformer enhances representation learning and length extrapolation abilities of ViT.", "output": "Localformer: a Locality-Preserving Vision Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a shape fitting/registration method based on a Gaussian Processes formulation, suitable for shapes with extensive regions of missing data. Gaussian Processes are a proven powerful tool, as they provide a unified setting for shape modelling and fitting. While the existing methods in this area prove to work well for the general case of the human head, when looking at more detailed and deformed data, with a high prevalence of missing data, such as the ears, the results are not satisfactory. In order to overcome this, we formulate the shape fitting problem as a multi-annotator Gaussian Process Regression and establish a parallel with the standard probabilistic registration. The achieved method SFGP shows better performance when dealing with extensive areas of missing data when compared to a state-of-the-art registration method and current approaches for registration with pre-existing shape models. 
Experiments are conducted both for a 2D small dataset with diverse transformations and a 3D dataset of ears.", "output": "Probabilistic Registration for Gaussian Process 3D shape modelling in the presence of extensive missing data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce compositional soft prompting (CSP), a parameter-efficient learning technique to improve the zero-shot compositionality of large-scale pretrained vision-language models (VLMs) like CLIP. We develop CSP for compositional zero-shot learning, the task of predicting unseen attribute-object compositions (e.g., old cat and young tiger). VLMs have a flexible text encoder that can represent arbitrary classes as natural language prompts but they often underperform task-specific architectures on the compositional zero-shot benchmark datasets. CSP treats the attributes and objects that define classes as learnable tokens of vocabulary. During training, the vocabulary is tuned to recognize classes that compose tokens in multiple ways (e.g., old cat and white cat). At test time, we recompose the learned attribute-object vocabulary in new combinations to recognize novel classes. We show that CSP outperforms the CLIP on benchmark datasets by an average of 10.9 percentage points on AUC. CSP also outperforms CoOp, a soft prompting method that fine-tunes the prefix context tokens, by an average of 5.8 percentage points on AUC. We perform additional experiments to show that CSP improves generalization to higher-order attribute-attribute-object compositions (e.g., old white cat) and combinations of pretrained attributes and fine-tuned objects. 
The code is available at ", "output": "Learning to Compose Soft Prompts for Compositional Zero-Shot Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Humans intuitively understand that inanimate objects do not move by themselves, but that state changes are typically caused by human manipulation (e.g., the opening of a book). This is not yet the case for machines. In part this is because there exist no datasets with ground-truth 3D annotations for the study of physically consistent and synchronised motion of hands and articulated objects. To this end, we introduce ARCTIC -- a dataset of two hands that dexterously manipulate objects, containing 2.1M video frames paired with accurate 3D hand and object meshes and detailed, dynamic contact information. It contains bi-manual articulation of objects such as scissors or laptops, where hand poses and object states evolve jointly in time. We propose two novel articulated hand-object interaction tasks: (1) Consistent motion reconstruction: Given a monocular video, the goal is to reconstruct two hands and articulated objects in 3D, so that their motions are spatio-temporally consistent. (2) Interaction field estimation: Dense relative hand-object distances must be estimated from images. We introduce two baselines ArcticNet and InterField, respectively and evaluate them qualitatively and quantitatively on ARCTIC. Our code and data are available at ", "output": "ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Contrastive learning has revolutionized the field of computer vision, learning rich representations from unlabeled data, which generalize well to diverse vision tasks. 
Consequently, it has become increasingly important to explain these approaches and understand their inner workings mechanisms. Given that contrastive models are trained with interdependent and interacting inputs and aim to learn invariance through data augmentation, the existing methods for explaining single-image systems (e.g., image classification models) are inadequate as they fail to account for these factors. Additionally, there is a lack of evaluation metrics designed to assess pairs of explanations, and no analytical studies have been conducted to investigate the effectiveness of different techniques used to explaining contrastive learning. In this work, we design visual explanation methods that contribute towards understanding similarity learning tasks from pairs of images. We further adapt existing metrics, used to evaluate visual explanations of image classification systems, to suit pairs of explanations and evaluate our proposed methods with these metrics. Finally, we present a thorough analysis of visual explainability methods for contrastive learning, establish their correlation with downstream tasks and demonstrate the potential of our approaches to investigate their merits and drawbacks.", "output": "Visualizing and Understanding Contrastive Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone require considerable costs in compute and energy. 
Here we focus on the scaling of error with dataset size and show how in theory we can break beyond power law scaling and potentially even reduce it to exponential scaling instead if we have access to a high-quality data pruning metric that ranks the order in which training examples should be discarded to achieve any pruned dataset size. We then test this improved scaling prediction with pruned dataset size empirically, and indeed observe better than power law scaling in practice on ResNets trained on CIFAR-10, SVHN, and ImageNet. Next, given the importance of finding high-quality pruning metrics, we perform the first large-scale benchmarking study of ten different data pruning metrics on ImageNet. We find most existing high performing metrics scale poorly to ImageNet, while the best are computationally intensive and require labels for every image. We therefore developed a new simple, cheap and scalable self-supervised pruning metric that demonstrates comparable performance to the best supervised metrics. Overall, our work suggests that the discovery of good data-pruning metrics may provide a viable path forward to substantially improved neural scaling laws, thereby reducing the resource costs of modern deep learning.", "output": "Beyond neural scaling laws: beating power law scaling via data pruning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We focus on the problem of producing well-calibrated out-of-distribution (OOD) detectors, in order to enable safe deployment of medical image classifiers. Motivated by the difficulty of curating suitable calibration datasets, synthetic augmentations have become highly prevalent for inlier/outlier specification. 
While there have been rapid advances in data augmentation techniques, this paper makes a striking finding that the space in which the inliers and outliers are synthesized, in addition to the type of augmentation, plays a critical role in calibrating OOD detectors. Using the popular energy-based OOD detection framework, we find that the optimal protocol is to synthesize latent-space inliers along with diverse pixel-space outliers. Based on empirical studies with multiple medical imaging benchmarks, we demonstrate that our approach consistently leads to superior OOD detection ($15% - 35%$ in AUROC) over the state-of-the-art in a variety of open-set recognition settings.", "output": "Know Your Space: Inlier and Outlier Construction for Calibrating Medical OOD Detectors."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing visual question answering methods tend to capture the cross-modal spurious correlations and fail to discover the true causal mechanism that facilitates reasoning truthfully based on the dominant visual evidence and the question intention. Additionally, the existing methods usually ignore the cross-modal event-level understanding that requires to jointly model event temporality, causality, and dynamics. In this work, we focus on event-level visual question answering from a new perspective, i.e., cross-modal causal relational reasoning, by introducing causal intervention methods to discover the true causal structures for visual and linguistic modalities. Specifically, we propose a novel event-level visual question answering framework named Cross-Modal Causal RelatIonal Reasoning (CMCIR), to achieve robust causality-aware visual-linguistic question answering. 
To discover cross-modal causal structures, the Causality-aware Visual-Linguistic Reasoning (CVLR) module is proposed to collaboratively disentangle the visual and linguistic spurious correlations via front-door and back-door causal interventions. To model the fine-grained interactions between linguistic semantics and spatial-temporal representations, we build a Spatial-Temporal Transformer (STT) that creates multi-modal co-occurrence interactions between visual and linguistic content. To adaptively fuse the causality-ware visual and linguistic features, we introduce a Visual-Linguistic Feature Fusion (VLFF) module that leverages the hierarchical linguistic semantic relations as the guidance to learn the global semantic-aware visual-linguistic representations adaptively. Extensive experiments on four event-level datasets demonstrate the superiority of our CMCIR in discovering visual-linguistic causal structures and achieving robust event-level visual question answering.", "output": "Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose to model the video dynamics by learning the trajectory of independently inverted latent codes from GANs. The entire sequence is seen as discrete-time observations of a continuous trajectory of the initial latent code, by considering each latent code as a moving particle and the latent space as a high-dimensional dynamic system. The latent codes representing different frames are therefore reformulated as state transitions of the initial frame, which can be modeled by neural ordinary differential equations. The learned continuous trajectory allows us to perform infinite frame interpolation and consistent video manipulation. 
The latter task is reintroduced for video editing with the advantage of requiring the core operations to be applied to the first frame only while maintaining temporal consistency across all frames. Extensive experiments demonstrate that our method achieves state-of-the-art performance but with much less computation. Code is available at ", "output": "Modelling Latent Dynamics of StyleGAN using Neural ODEs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing techniques for image-to-image translation commonly have suffered from two critical problems: heavy reliance on per-sample domain annotation and/or inability of handling multiple attributes per image. Recent truly-unsupervised methods adopt clustering approaches to easily provide per-sample one-hot domain labels. However, they cannot account for the real-world setting: one sample may have multiple attributes. In addition, the semantics of the clusters are not easily coupled to the human understanding. To overcome these, we present a LANguage-driven Image-to-image Translation model, dubbed LANIT. We leverage easy-to-obtain candidate attributes given in texts for a dataset: the similarity between images and attributes indicates per-sample domain labels. This formulation naturally enables multi-hot label so that users can specify the target domain with a set of attributes in language. To account for the case that the initial prompts are inaccurate, we also present prompt learning. We further present domain regularization loss that enforces translated images be mapped to the corresponding domain. 
Experiments on several standard benchmarks demonstrate that LANIT achieves comparable or superior performance to existing models.", "output": "LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The visual quality of point clouds has been greatly emphasized since the ever-increasing 3D vision applications are expected to provide cost-effective and high-quality experiences for users. Looking back on the development of point cloud quality assessment (PCQA) methods, the visual quality is usually evaluated by utilizing single-modal information, i.e., either extracted from the 2D projections or 3D point cloud. The 2D projections contain rich texture and semantic information but are highly dependent on viewpoints, while the 3D point clouds are more sensitive to geometry distortions and invariant to viewpoints. Therefore, to leverage the advantages of both point cloud and projected image modalities, we propose a novel no-reference point cloud quality assessment (NR-PCQA) metric in a multi-modal fashion. In specific, we split the point clouds into sub-models to represent local geometry distortions such as point shift and down-sampling. Then we render the point clouds into 2D image projections for texture feature extraction. To achieve the goals, the sub-models and projected images are encoded with point-based and image-based neural networks. Finally, symmetric cross-modal attention is employed to fuse multi-modal quality-aware information. Experimental results show that our approach outperforms all compared state-of-the-art methods and is far ahead of previous NR-PCQA methods, which highlights the effectiveness of the proposed method. 
The code is available at ", "output": "MM-PCQA: Multi-Modal Learning for No-reference Point Cloud Quality Assessment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "3D visual grounding aims to find the object within point clouds mentioned by free-form natural language descriptions with rich semantic cues. However, existing methods either extract the sentence-level features coupling all words or focus more on object names, which would lose the word-level information or neglect other attributes. To alleviate these issues, we present EDA that Explicitly Decouples the textual attributes in a sentence and conducts Dense Alignment between such fine-grained language and point cloud objects. Specifically, we first propose a text decoupling module to produce textual features for every semantic component. Then, we design two losses to supervise the dense matching between two modalities: position alignment loss and semantic alignment loss. On top of that, we further introduce a new visual grounding task, locating objects without object names, which can thoroughly evaluate the model's dense alignment capacity. Through experiments, we achieve state-of-the-art performance on two widely-adopted 3D visual grounding datasets, ScanRefer and SR3D/NR3D, and obtain absolute leadership on our newly-proposed task. The source code is available at", "output": "EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Humans tend to decompose a sentence into different parts like ``sth do sth at someplace'' and then fill each part with certain content. 
Inspired by this, we follow the principle of modular design to propose a novel image captioner: learning to Collocate Visual-Linguistic Neural Modules (CVLNM). Unlike the widely used neural module networks in VQA, where the language (i.e., question) is fully observable, the task of collocating visual-linguistic modules is more challenging. This is because the language is only partially observable, for which we need to dynamically collocate the modules during the process of image captioning. To sum up, we make the following technical contributions to design and train our CVLNM: 1) distinguishable module design -- four modules in the encoder including one linguistic module for function words and three visual modules for different content words (i.e., noun, adjective, and verb) and another linguistic one in the decoder for commonsense reasoning, 2) a self-attention based module controller for robustifying the visual reasoning, 3) a part-of-speech based syntax loss imposed on the module controller for further regularizing the training of our CVLNM. Extensive experiments on the MS-COCO dataset show that our CVLNM is more effective, e.g., achieving a new state-of-the-art 129.5 CIDEr-D, and more robust, e.g., being less likely to overfit to dataset bias and suffering less when fewer training samples are available. Codes are available at ", "output": "Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The ideally disentangled latent space in GAN involves the global representation of latent space with semantic attribute coordinates. In other words, considering that this disentangled latent space is a vector space, there exists the global semantic basis where each basis component describes one attribute of generated images. 
In this paper, we propose an unsupervised method for finding this global semantic basis in the intermediate latent space in GANs. This semantic basis represents sample-independent meaningful perturbations that change the same semantic attribute of an image on the entire latent space. The proposed global basis, called Fréchet basis, is derived by introducing Fréchet mean to the local semantic perturbations in a latent space. Fréchet basis is discovered in two stages. First, the global semantic subspace is discovered by the Fréchet mean in the Grassmannian manifold of the local semantic subspaces. Second, Fréchet basis is found by optimizing a basis of the semantic subspace via the Fréchet mean in the Special Orthogonal Group. Experimental results demonstrate that Fréchet basis provides better semantic factorization and robustness compared to the previous methods. Moreover, we suggest the basis refinement scheme for the previous methods. The quantitative experiments show that the refined basis achieves better semantic factorization while constrained on the same semantic subspace given by the previous method.", "output": "Finding the global semantic representation in GAN through Frechet Mean."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Coronavirus Disease 2019 (COVID-19) pandemic has increased the public health burden and brought profound disaster to humans. For the particularity of the COVID-19 medical images with blurred boundaries, low contrast and different sizes of infection sites, some researchers have improved the segmentation accuracy by adding model complexity. However, this approach has severe limitations. 
Increasing the computational complexity and the number of parameters is unfavorable for model transfer from laboratory to clinic. Meanwhile, the current COVID-19 infections segmentation DCNN-based methods only apply to a single modality. To solve the above issues, this paper proposes a symmetric Encoder-Decoder segmentation framework named MS-DCANet. We introduce Tokenized MLP block, a novel attention scheme that uses a shift-window mechanism similar to the Transformer to acquire self-attention and achieve local-to-global semantic dependency. MS-DCANet also uses several Dual Channel blocks and a Res-ASPP block to expand the receptive field and extract multi-scale features. On multi-modality COVID-19 tasks, MS-DCANet achieved state-of-the-art performance compared with other U-shape models. It can well trade off the accuracy and complexity. To prove the strong generalization ability of our proposed model, we apply it to other tasks (ISIC 2018 and BAA) and achieve satisfactory results.", "output": "MS-DCANet: A Novel Segmentation Network For Multi-Modality COVID-19 Medical Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Abnormal event detection in videos is a challenging problem, partly due to the multiplicity of abnormal patterns and the lack of their corresponding annotations. In this paper, we propose new constrained pretext tasks to learn object level normality patterns. Our approach consists in learning a mapping between down-scaled visual queries and their corresponding normal appearance and motion characteristics at the original resolution. The proposed tasks are more challenging than reconstruction and future frame prediction tasks which are widely used in the literature, since our model learns to jointly predict spatial and temporal features rather than reconstructing them. 
We believe that more constrained pretext tasks induce a better learning of normality patterns. Experiments on several benchmark datasets demonstrate the effectiveness of our approach to localize and track anomalies as it outperforms or reaches the current state-of-the-art on spatio-temporal evaluation metrics.", "output": "Spatio-temporal predictive tasks for abnormal event detection in videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work presents a novel deep-learning-based pipeline for the inverse problem of image deblurring, leveraging augmentation and pre-training with synthetic data. Our results build on our winning submission to the recent Helsinki Deblur Challenge 2021, whose goal was to explore the limits of state-of-the-art deblurring algorithms in a real-world data setting. The task of the challenge was to deblur out-of-focus images of random text, thereby in a downstream task, maximizing an optical-character-recognition-based score function. A key step of our solution is the data-driven estimation of the physical forward model describing the blur process. This enables a stream of synthetic data, generating pairs of ground-truth and blurry images on-the-fly, which is used for an extensive augmentation of the small amount of challenge data provided. The actual deblurring pipeline consists of an approximate inversion of the radial lens distortion (determined by the estimated forward model) and a U-Net architecture, which is trained end-to-end. Our algorithm was the only one passing the hardest challenge level, achieving over 70% character recognition accuracy. Our findings are well in line with the paradigm of data-centric machine learning, and we demonstrate its effectiveness in the context of inverse problems. 
Apart from a detailed presentation of our methodology, we also analyze the importance of several design choices in a series of ablation studies. The code of our challenge submission is available under ", "output": "Let's Enhance: A Deep Learning Approach to Extreme Deblurring of Text Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene. State-of-the-art methods based on temporally varying Neural Radiance Fields (aka dynamic NeRFs) have shown impressive results on this task. However, for long videos with complex object motions and uncontrolled camera trajectories, these methods can produce blurry or inaccurate renderings, hampering their use in real-world applications. Instead of encoding the entire dynamic scene within the weights of MLPs, we present a new approach that addresses these limitations by adopting a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views in a scene-motion-aware manner. Our system retains the advantages of prior methods in its ability to model complex scenes and view-dependent effects, but also enables synthesizing photo-realistic novel views from long videos featuring complex scene dynamics with unconstrained camera trajectories. We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets, and also apply our approach to in-the-wild videos with challenging camera and object motion, where prior methods fail to produce high-quality renderings. 
Our project webpage is at dynibar.github.io.", "output": "DynIBaR: Neural Dynamic Image-Based Rendering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO series. DAMO-YOLO is extended from YOLO with some new technologies, including Neural Architecture Search (NAS), efficient Reparameterized Generalized-FPN (RepGFPN), a lightweight head with AlignedOTA label assignment, and distillation enhancement. In particular, we use MAE-NAS, a method guided by the principle of maximum entropy, to search our detection backbone under the constraints of low latency and high performance, producing ResNet/CSP-like structures with spatial pyramid pooling and focus modules. In the design of necks and heads, we follow the rule of ``large neck, small head''. We import Generalized-FPN with accelerated queen-fusion to build the detector neck and upgrade its CSPNet with efficient layer aggregation networks (ELAN) and reparameterization. Then we investigate how detector head size affects detection performance and find that a heavy neck with only one task projection layer would yield better results. In addition, AlignedOTA is proposed to solve the misalignment problem in label assignment. And a distillation schema is introduced to improve performance to a higher level. Based on these new techs, we build a suite of models at various scales to meet the needs of different scenarios. For general industry requirements, we propose DAMO-YOLO-T/S/M/L. They can achieve 43.6/47.7/50.2/51.9 mAPs on COCO with the latency of 2.78/3.83/5.62/7.95 ms on T4 GPUs respectively. Additionally, for edge devices with limited computing power, we have also proposed DAMO-YOLO-Ns/Nm/Nl lightweight models. 
They can achieve 32.3/38.2/40.5 mAPs on COCO with the latency of 4.08/5.05/6.69 ms on X86-CPU. Our proposed general and lightweight models have outperformed other YOLO series models in their respective application scenarios.", "output": "DAMO-YOLO : A Report on Real-Time Object Detection Design."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Over-parameterization of deep neural networks (DNNs) has shown high prediction accuracy for many applications. Although effective, the large number of parameters hinders its popularity on resource-limited devices and has an outsize environmental impact. Sparse training (using a fixed number of nonzero weights in each iteration) could significantly mitigate the training costs by reducing the model size. However, existing sparse training methods mainly use either random-based or greedy-based drop-and-grow strategies, resulting in local minima and low accuracy. In this work, we consider the dynamic sparse training as a sparse connectivity search problem and design an exploitation and exploration acquisition function to escape from local optima and saddle points. We further design an acquisition function and provide the theoretical guarantees for the proposed method and clarify its convergence property. Experimental results show that sparse models (up to 98% sparsity) obtained by our proposed method outperform the SOTA sparse training methods on a wide variety of deep learning tasks. 
On VGG-19 / CIFAR-100, ResNet-50 / CIFAR-10, ResNet-50 / CIFAR-100, our method has even higher accuracy than dense models. On ResNet-50 / ImageNet, the proposed method has up to 8.2% accuracy improvement compared to SOTA sparse training methods.", "output": "Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose ClipFace, a novel self-supervised approach for text-guided editing of textured 3D morphable model of faces. Specifically, we employ user-friendly language prompts to enable control of the expressions as well as appearance of 3D faces. We leverage the geometric expressiveness of 3D morphable models, which inherently possess limited controllability and texture expressivity, and develop a self-supervised generative model to jointly synthesize expressive, textured, and articulated faces in 3D. We enable high-quality texture generation for 3D faces by adversarial self-supervised training, guided by differentiable rendering against collections of real RGB images. Controllable editing and manipulation are given by language prompts to adapt texture and expression of the 3D morphable model. To this end, we propose a neural network that predicts both texture and expression latent codes of the morphable model. Our model is trained in a self-supervised fashion by exploiting differentiable rendering and losses based on a pre-trained CLIP model. Once trained, our model jointly predicts face textures in UV-space, along with expression parameters to capture both geometry and texture changes in facial expressions in a single forward pass. 
We further show the applicability of our method to generate temporally changing textures for a given animation sequence.", "output": "ClipFace: Text-guided Editing of Textured 3D Morphable Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing state-of-the-art method for audio-visual conditioned video prediction uses the latent codes of the audio-visual frames from a multimodal stochastic network and a frame encoder to predict the next visual frame. However, a direct inference of per-pixel intensity for the next visual frame from the latent codes is extremely challenging because of the high-dimensional image space. To this end, we propose to decouple the audio-visual conditioned video prediction into motion and appearance modeling. The first part is the multimodal motion estimation module that learns motion information as optical flow from the given audio-visual clip. The second part is the context-aware refinement module that uses the predicted optical flow to warp the current visual frame into the next visual frame and refines it based on the given audio-visual context. Experimental results show that our method achieves competitive results on existing benchmarks.", "output": "Motion and Context-Aware Audio-Visual Conditioned Video Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automotive radar sensors provide valuable information for advanced driving assistance systems (ADAS). Radars can reliably estimate the distance to an object and the relative velocity, regardless of weather and light conditions. However, radar sensors suffer from low resolution and huge intra-class variations in the shape of objects. 
Exploiting the time information (e.g., multiple frames) has been shown to help to capture better the dynamics of objects and, therefore, the variation in the shape of objects. Most temporal radar object detectors use 3D convolutions to learn spatial and temporal information. However, these methods are often non-causal and unsuitable for real-time applications. This work presents RECORD, a new recurrent CNN architecture for online radar object detection. We propose an end-to-end trainable architecture mixing convolutions and ConvLSTMs to learn spatio-temporal dependencies between successive frames. Our model is causal and requires only the past information encoded in the memory of the ConvLSTMs to detect objects. Our experiments show such a method's relevance for detecting objects in different radar representations (range-Doppler, range-angle) and outperform state-of-the-art models on the ROD2021 and CARRADA datasets while being less computationally expensive.", "output": "A recurrent CNN for online object detection on raw radar frames."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Video action segmentation under timestamp supervision has recently received much attention due to lower annotation costs. Most existing methods generate pseudo-labels for all frames in each video to train the segmentation model. However, these methods suffer from incorrect pseudo-labels, especially for the semantically unclear frames in the transition region between two consecutive actions, which we call ambiguous intervals. To address this issue, we propose a novel framework from the perspective of clustering, which includes the following two parts. First, pseudo-label ensembling generates incomplete but high-quality pseudo-label sequences, where the frames in ambiguous intervals have no pseudo-labels. 
Second, iterative clustering iteratively propagates the pseudo-labels to the ambiguous intervals by clustering, and thus updates the pseudo-label sequences to train the model. We further introduce a clustering loss, which encourages the features of frames within the same action segment more compact. Extensive experiments show the effectiveness of our method.", "output": "Timestamp-Supervised Action Segmentation from the Perspective of Clustering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The ability to jointly learn from multiple modalities, such as text, audio, and visual data, is a defining feature of intelligent systems. While there have been promising advances in designing neural networks to harness multimodal data, the enormous success of data augmentation currently remains limited to single-modality tasks like image classification. Indeed, it is particularly difficult to augment each modality while preserving the overall semantic structure of the data; for example, a caption may no longer be a good description of an image after standard augmentations have been applied, such as translation. Moreover, it is challenging to specify reasonable transformations that are not tailored to a particular modality. In this paper, we introduce LeMDA, Learning Multimodal Data Augmentation, an easy-to-use method that automatically learns to jointly augment multimodal data in feature space, with no constraints on the identities of the modalities or the relationship between modalities. 
We show that LeMDA can (1) profoundly improve the performance of multimodal deep learning architectures, (2) apply to combinations of modalities that have not been previously considered, and (3) achieve state-of-the-art results on a wide range of applications comprised of image, text, and tabular data.", "output": "Learning Multimodal Data Augmentation in Feature Space."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "People often use their hands to make contact with the world and apply pressure. Machine perception of this important human activity could be widely applied. Prior research has shown that deep models can estimate hand pressure based on a single RGB image. Yet, evaluations have been limited to controlled settings, since performance relies on training data with high-resolution pressure measurements that are difficult to obtain. We present a novel approach that enables diverse data to be captured with only an RGB camera and a cooperative participant. Our key insight is that people can be prompted to perform actions that correspond with categorical labels describing contact pressure (contact labels), and that the resulting weakly labeled data can be used to train models that perform well under varied conditions. We demonstrate the effectiveness of our approach by training on a novel dataset with 51 participants making fingertip contact with instrumented and uninstrumented objects. 
Our network, ContactLabelNet, dramatically outperforms prior work, performs well under diverse conditions, and matched or exceeded the performance of human annotators.", "output": "Visual Estimation of Fingertip Pressure on Diverse Surfaces using Easily Captured Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Modeling strong gravitational lenses in order to quantify the distortions in the images of background sources and to reconstruct the mass density in the foreground lenses has been a difficult computational challenge. As the quality of gravitational lens images increases, the task of fully exploiting the information they contain becomes computationally and algorithmically more difficult. In this work, we use a neural network based on the Recurrent Inference Machine (RIM) to simultaneously reconstruct an undistorted image of the background source and the lens mass density distribution as pixelated maps. The method iteratively reconstructs the model parameters (the image of the source and a pixelated density map) by learning the process of optimizing the likelihood given the data using the physical model (a ray-tracing simulation), regularized by a prior implicitly learned by the neural network through its training data. 
When compared to more traditional parametric models, the proposed method is significantly more expressive and can reconstruct complex mass distributions, which we demonstrate by using realistic lensing galaxies taken from the IllustrisTNG cosmological hydrodynamic simulation.", "output": "Pixelated Reconstruction of Foreground Density and Background Surface Brightness in Gravitational Lensing Systems using Recurrent Inference Machines."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Consistency regularization and pseudo labeling-based semi-supervised methods perform co-training using the pseudo labels from multi-view inputs. However, such co-training models tend to converge early to a consensus, degenerating to the self-training ones, and produce low-confidence pseudo labels from the perturbed inputs during training. To address these issues, we propose an Uncertainty-guided Collaborative Mean-Teacher (UCMT) for semi-supervised semantic segmentation with the high-confidence pseudo labels. Concretely, UCMT consists of two main components: 1) collaborative mean-teacher (CMT) for encouraging model disagreement and performing co-training between the sub-networks, and 2) uncertainty-guided region mix (UMIX) for manipulating the input images according to the uncertainty maps of CMT and facilitating CMT to produce high-confidence pseudo labels. Combining the strengths of UMIX with CMT, UCMT can retain model disagreement and enhance the quality of pseudo labels for the co-training segmentation. Extensive experiments on four public medical image datasets including 2D and 3D modalities demonstrate the superiority of UCMT over the state-of-the-art. 
Code is available at:", "output": "Co-training with High-Confidence Pseudo Labels for Semi-supervised Medical Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "HybrIK relies on a combination of analytical inverse kinematics and deep learning to produce more accurate 3D pose estimation from 2D monocular images. HybrIK has three major components: (1) pretrained convolution backbone, (2) deconvolution to lift 3D pose from 2D convolution features, (3) analytical inverse kinematics pass correcting deep learning prediction using learned distribution of plausible twist and swing angles. In this paper we propose an enhancement of the 2D to 3D lifting module, replacing deconvolution with Transformer, resulting in accuracy and computational efficiency improvement relative to the original HybrIK method. We demonstrate our results on commonly used H36M, PW3D, COCO and HP3D datasets. Our code is publicly available", "output": "3D Human Pose and Shape Estimation via HybrIK-Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Relation-focused cross-modal information retrieval focuses on retrieving information based on relations expressed in user queries, and it is particularly important in information retrieval applications and next-generation search engines. While pre-trained networks like Contrastive Language-Image Pre-training (CLIP) have achieved state-of-the-art performance in cross-modal learning tasks, the Vision Transformer (ViT) used in these networks is limited in its ability to focus on image region relations. Specifically, ViT is trained to match images with relevant descriptions at the global level, without considering the alignment between image regions and descriptions. 
This paper introduces VITR, a novel network that enhances ViT by extracting and reasoning about image region relations based on a Local encoder. VITR comprises two main components: (1) extending the capabilities of ViT-based cross-modal networks to extract and reason with region relations in images; and (2) aggregating the reasoned results with the global knowledge to predict the similarity scores between images and descriptions. Experiments were carried out by applying the proposed network to relation-focused cross-modal information retrieval tasks on the Flickr30K, RefCOCOg, and CLEVR datasets. The results revealed that the proposed VITR network outperformed various other state-of-the-art networks including CLIP, VSE∞, and VSRN++ on both image-to-text and text-to-image cross-modal information retrieval tasks.", "output": "VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information Retrieval."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Huge challenges exist for old landslide detection because their morphology features have been partially or strongly transformed over a long time and have little difference from their surroundings. Besides, the small-sample problem also restricts in-depth learning. In this paper, an iterative classification and semantic segmentation network (ICSSN) is developed, which can greatly enhance both object-level and pixel-level classification performance by iteratively upgrading the feature extractor shared by two networks. An object-level contrastive learning (OCL) strategy is employed in the object classification sub-network featuring a siamese network to realize the global features extraction, and a sub-object-level contrastive learning (SOCL) paradigm is designed in the semantic segmentation sub-network to efficiently extract salient features from boundaries of landslides. 
Moreover, an iterative training strategy is elaborated to fuse features in semantic space such that both object-level and pixel-level classification performance are improved. The proposed ICSSN is evaluated on the real landslide data set, and the experimental results show that ICSSN can greatly improve the classification and segmentation accuracy of old landslide detection. For the semantic segmentation task, compared to the baseline, the F1 score increases from 0.5054 to 0.5448, the mIoU improves from 0.6405 to 0.6610, the landslide IoU improved from 0.3381 to 0.3743, and the object-level detection accuracy of old landslides is enhanced from 0.55 to 0.9. For the object classification task, the F1 score increases from 0.8846 to 0.9230, and the accuracy score is up from 0.8375 to 0.8875.", "output": "An Iterative Classification and Semantic Segmentation Network for Old Landslide Detection Using High-Resolution Remote Sensing Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reconstructing perceived natural images or decoding their categories from fMRI signals are challenging tasks with great scientific significance. Due to the lack of paired samples, most existing methods fail to generate semantically recognizable reconstruction and are difficult to generalize to novel classes. In this work, we propose, for the first time, a task-agnostic brain decoding model by unifying the visual stimulus classification and reconstruction tasks in a semantic space. We denote it as BrainCLIP, which leverages CLIP's cross-modal generalization ability to bridge the modality gap between brain activities, images, and texts. Specifically, BrainCLIP is a VAE-based architecture that transforms fMRI patterns into the CLIP embedding space by combining visual and textual supervision. Note that previous works rarely use multi-modal supervision for visual stimulus decoding. 
Our experiments demonstrate that textual supervision can significantly boost the performance of decoding models compared to the condition where only image supervision exists. BrainCLIP can be applied to multiple scenarios like fMRI-to-image generation, fMRI-image-matching, and fMRI-text-matching. Compared with BraVL, a recently proposed multi-modal method for fMRI-based brain decoding, BrainCLIP achieves significantly better performance on the novel class classification task. BrainCLIP also establishes a new state-of-the-art for fMRI-based natural image reconstruction in terms of high-level image features.", "output": "BrainCLIP: Bridging Brain and Visual-Linguistic Representation via CLIP for Generic Natural Visual Stimulus Decoding from fMRI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The performance of video action recognition has been significantly boosted by using motion representations within a two-stream Convolutional Neural Network (CNN) architecture. However, there are a few challenging problems in action recognition in real scenarios, e.g., the variations in viewpoints and poses, and the changes in backgrounds. The domain discrepancy between the training data and the test data causes the performance drop. To improve the model robustness, we propose a novel method to determine the task-irrelevant content in inputs which increases the domain discrepancy. The method is based on a human parsing model (HP model) which jointly conducts dense correspondence labelling and semantic part segmentation. The predictions from the HP model also function as re-rendering the human regions in each video using the same set of textures to make humans appearances in all classes be the same. 
A revised dataset is generated for training and testing and makes the action recognition model exhibit invariance to the irrelevant content in the inputs. Moreover, the predictions from the HP model are used to enrich the inputs to the AR model during both training and testing. Experimental results show that our proposed model is superior to existing models for action recognition on the HMDB-51 dataset and the Penn Action dataset.", "output": "Texture-Based Input Feature Selection for Action Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Class-agnostic object counting aims to count object instances of an arbitrary class at test time. It is challenging but also enables many potential applications. Current methods require human-annotated exemplars as inputs which are often unavailable for novel categories, especially for autonomous systems. Thus, we propose zero-shot object counting (ZSC), a new setting where only the class name is available during test time. Such a counting system does not require human annotators in the loop and can operate automatically. Starting from a class name, we propose a method that can accurately identify the optimal patches which can then be used as counting exemplars. Specifically, we first construct a class prototype to select the patches that are likely to contain the objects of interest, namely class-relevant patches. Furthermore, we introduce a model that can quantitatively measure how suitable an arbitrary patch is as a counting exemplar. By applying this model to all the candidate patches, we can select the most suitable patches as exemplars for counting. Experimental results on a recent class-agnostic counting dataset, FSC-147, validate the effectiveness of our method. 
Code is available at", "output": "Zero-shot Object Counting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks provide state-of-the-art accuracy for vision tasks but they require significant resources for training. Thus, they are trained on cloud servers far from the edge devices that acquire the data. This issue increases communication cost, runtime and privacy concerns. In this study, a novel hierarchical training method for deep neural networks is proposed that uses early exits in a divided architecture between edge and cloud workers to reduce the communication cost, training runtime and privacy concerns. The method proposes a brand-new use case for early exits to separate the backward pass of neural networks between the edge and the cloud during the training phase. We address the issues of most available methods that due to the sequential nature of the training phase, cannot train the levels of hierarchy simultaneously or they do it with the cost of compromising privacy. In contrast, our method can use both edge and cloud workers simultaneously, does not share the raw input data with the cloud and does not require communication during the backward pass. Several simulations and on-device experiments for different neural network architectures demonstrate the effectiveness of this method. It is shown that the proposed method reduces the training runtime by 29% and 61% in CIFAR-10 classification experiment for VGG-16 and ResNet-18 when the communication with the cloud is done at a low bit rate channel. This gain in the runtime is achieved whilst the accuracy drop is negligible. 
This method is advantageous for online learning of high-accuracy deep neural networks on low-resource devices such as mobile phones or robots as a part of an edge-cloud system, making them more flexible in facing new tasks and classes of data.", "output": "Hierarchical Training of Deep Neural Networks Using Early Exiting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work presents RiDDLE, short for Reversible and Diversified De-identification with Latent Encryptor, to protect the identity information of people from being misused. Built upon a pre-learned StyleGAN2 generator, RiDDLE manages to encrypt and decrypt the facial identity within the latent space. The design of RiDDLE has three appealing properties. First, the encryption process is cipher-guided and hence allows diverse anonymization using different passwords. Second, the true identity can only be decrypted with the correct password, otherwise the system will produce another de-identified face to maintain the privacy. Third, both encryption and decryption share an efficient implementation, benefiting from a carefully tailored lightweight encryptor. Comparisons with existing alternatives confirm that our approach accomplishes the de-identification task with better quality, higher diversity, and stronger reversibility. We further demonstrate the effectiveness of RiDDLE in anonymizing videos. 
Code and models will be made publicly available.", "output": "RiDDLE: Reversible and Diversified De-identification with Latent Encryptor."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent rise of Self-Supervised Learning (SSL) as one of the preferred strategies for learning with limited labeled data, and abundant unlabeled data has led to the widespread use of these models. They are usually pretrained, finetuned, and evaluated on the same data distribution, i.e., within an in-distribution setting. However, they tend to perform poorly in out-of-distribution evaluation scenarios, a challenge that Unsupervised Domain Generalization (UDG) seeks to address. This paper introduces a novel method to standardize the styles of images in a batch. Batch styles standardization, relying on Fourier-based augmentations, promotes domain invariance in SSL by preventing spurious correlations from leaking into the features. The combination of batch styles standardization with the well-known contrastive-based method SimCLR leads to a novel UDG method named CLaSSy (Contrastive Learning with Standardized Styles). CLaSSy offers serious advantages over prior methods, as it does not rely on domain labels and is scalable to handle a large number of domains. Experimental results on various UDG datasets demonstrate the superior performance of CLaSSy compared to existing UDG methods. 
Finally, the versatility of the proposed batch styles standardization is demonstrated by extending respectively the contrastive-based and non-contrastive-based SSL methods, SWaV and MSN, while considering different backbone architectures (convolutional-based, transformers-based).", "output": "Improving Domain-Invariance in Self-Supervised Learning via Batch Styles Standardization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Achieving high-quality semantic segmentation predictions using only image-level labels enables a new level of real-world applicability. Although state-of-the-art networks deliver reliable predictions, the amount of handcrafted pixel-wise annotations to enable these results are not feasible in many real-world applications. Hence, several works have already targeted this bottleneck, using classifier-based networks like Class Activation Maps (CAMs) as a base. Addressing CAM's weaknesses of fuzzy borders and incomplete predictions, state-of-the-art approaches rely only on adding regulations to the classifier loss or using pixel-similarity-based refinement after the fact. We propose a framework that introduces an additional module using object perimeters for improved saliency. We define object perimeter information as the line separating the object and background. Our new PerimeterFit module will be applied to pre-refine the CAM predictions before using the pixel-similarity-based network. In this way, our PerimeterFit increases the quality of the CAM prediction while simultaneously improving the false negative rate. We investigated a wide range of state-of-the-art unsupervised semantic segmentation networks and edge detection techniques to create useful perimeter maps, which enable our framework to predict object locations with sharper perimeters. We achieved up to 1.5% improvement over frameworks without our PerimeterFit module. 
We conduct an exhaustive analysis to illustrate that SILOP enhances existing state-of-the-art frameworks for image-level-based semantic segmentation. The framework is open-source and accessible online at ", "output": "SILOP: An Automated Framework for Semantic Segmentation Using Image Labels Based on Object Perimeters."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Robust point cloud classification is crucial for real-world applications, as consumer-type 3D sensors often yield partial and noisy data, degraded by various artifacts. In this work we propose a general ensemble framework, based on partial point cloud sampling. Each ensemble member is exposed to only partial input data. Three sampling strategies are used jointly, two local ones, based on patches and curves, and a global one of random sampling. We demonstrate the robustness of our method to various local and global degradations. We show that our framework significantly improves the robustness of top classification networks by a large margin. Our experimental setting uses the recently introduced ModelNet-C database by Ren et al. [24], where we reach SOTA both on unaugmented and on augmented data. Our unaugmented mean Corruption Error (mCE) is 0.64 (current SOTA is 0.86) and 0.50 for augmented data (current SOTA is 0.57). We analyze and explain these remarkable results through diversity analysis. Our code is available at:", "output": "EPiC: Ensemble of Partial Point Clouds for Robust Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing Transformer-based RGBT tracking methods either use cross-attention to fuse the two modalities, or use self-attention and cross-attention to model both modality-specific and modality-sharing information. 
However, the significant appearance gap between modalities limits the feature representation ability of certain modalities during the fusion process. To address this problem, we propose a novel Progressive Fusion Transformer called ProFormer, which progressively integrates single-modality information into the multimodal representation for robust RGBT tracking. In particular, ProFormer first uses a self-attention module to collaboratively extract the multimodal representation, and then uses two cross-attention modules to interact it with the features of the dual modalities respectively. In this way, the modality-specific information can well be activated in the multimodal representation. Finally, a feed-forward network is used to fuse two interacted multimodal representations for the further enhancement of the final multimodal representation. In addition, existing learning methods of RGBT trackers either fuse multimodal features into one for final classification, or exploit the relationship between unimodal branches and fused branch through a competitive learning strategy. However, they either ignore the learning of single-modality branches or result in one branch failing to be well optimized. To solve these problems, we propose a dynamically guided learning algorithm that adaptively uses well-performing branches to guide the learning of other branches, for enhancing the representation ability of each branch. 
Extensive experiments demonstrate that our proposed ProFormer sets a new state-of-the-art performance on RGBT210, RGBT234, LasHeR, and VTUAV datasets.", "output": "RGBT Tracking via Progressive Fusion Transformer with Dynamically Guided Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Patch-to-point matching has become a robust way of point cloud registration. However, previous patch-matching methods employ superpoints with poor localization precision as nodes, which may lead to ambiguous patch partitions. In this paper, we propose a HybridPoint-based network to find more robust and accurate correspondences. Firstly, we propose to use salient points with prominent local features as nodes to increase patch repeatability, and introduce some uniformly distributed points to complete the point cloud, thus constituting hybrid points. Hybrid points not only have better localization precision but also give a complete picture of the whole point cloud. Furthermore, based on the characteristic of hybrid points, we propose a dual-classes patch matching module, which leverages the matching results of salient points and filters the matching noise of non-salient points. Experiments show that our model achieves state-of-the-art performance on 3DMatch, 3DLoMatch, and KITTI odometry, especially with 93.0% Registration Recall on the 3DMatch dataset. Our code and models are available at", "output": "HybridPoint: Point Cloud Registration Based on Hybrid Point Sampling and Matching."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While recent research has made significant progress in speech-driven talking face generation, the quality of the generated video still lags behind that of real recordings. 
One reason for this is the use of handcrafted intermediate representations like facial landmarks and 3DMM coefficients, which are designed based on human knowledge and are insufficient to precisely describe facial movements. Additionally, these methods require an external pretrained model for extracting these representations, whose performance sets an upper bound on talking face generation. To address these limitations, we propose a novel method called DAE-Talker that leverages data-driven latent representations obtained from a diffusion autoencoder (DAE). DAE contains an image encoder that encodes an image into a latent vector and a DDIM image decoder that reconstructs the image from it. We train our DAE on talking face video frames and then extract their latent representations as the training target for a Conformer-based speech2latent model. This allows DAE-Talker to synthesize full video frames and produce natural head movements that align with the content of speech, rather than relying on a predetermined head pose from a template video. We also introduce pose modelling in speech2latent for pose controllability. Additionally, we propose a novel method for generating continuous video frames with the DDIM image decoder trained on individual frames, eliminating the need for modelling the joint distribution of consecutive frames directly. Our experiments show that DAE-Talker outperforms existing popular methods in lip-sync, video fidelity, and pose naturalness. 
We also conduct ablation studies to analyze the effectiveness of the proposed techniques and demonstrate the pose controllability of DAE-Talker.", "output": "DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Whether by processing videos with fixed resolution from start to end or incorporating pooling and down-scaling strategies, existing video transformers process the whole video content throughout the network without specially handling the large portions of redundant information. In this paper, we present a Supertoken Video Transformer (SVT) that incorporates a Semantic Pooling Module (SPM) to aggregate latent representations along the depth of visual transformer based on their semantics, and thus, reduces redundancy inherent in video inputs. Qualitative results show that our method can effectively reduce redundancy by merging latent representations with similar semantics and thus increase the proportion of salient information for downstream tasks. Quantitatively, our method improves the performance of both ViT and MViT while requiring significantly less computations on the Kinetics and Something-Something-V2 benchmarks. More specifically, with our SPM, we improve the accuracy of MAE-pretrained ViT-B and ViT-L by 1.5% with 33% less GFLOPs and by 0.2% with 55% less FLOPs, respectively, on the Kinetics-400 benchmark, and improve the accuracy of MViTv2-B by 0.2% and 0.3% with 22% less GFLOPs on Kinetics-400 and Something-Something-V2, respectively.", "output": "SVT: Supertoken Video Transformer for Efficient Video Understanding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, source-free unsupervised domain adaptation (SFUDA) 
has emerged as a more practical and feasible approach compared to unsupervised domain adaptation (UDA) which assumes that labeled source data are always accessible. However, significant limitations associated with SFUDA approaches are often overlooked, which limits their practicality in real-world applications. These limitations include a lack of principled ways to determine optimal hyperparameters and performance degradation when the unlabeled target data fail to meet certain requirements such as a closed-set and identical label distribution to the source data. All these limitations stem from the fact that SFUDA entirely relies on unlabeled target data. We empirically demonstrate the limitations of existing SFUDA methods in real-world scenarios including out-of-distribution and label distribution shifts in target data, and verify that none of these methods can be safely applied to real-world settings. Based on our experimental results, we claim that fine-tuning a source pretrained model with a few labeled data (e.g., 1- or 3-shot) is a practical and reliable solution to circumvent the limitations of SFUDA. Contrary to common belief, we find that carefully fine-tuned models do not suffer from overfitting even when trained with only a few labeled data, and also show little change in performance due to sampling bias. Our experimental results on various domain adaptation benchmarks demonstrate that the few-shot fine-tuning approach performs comparatively under the standard SFUDA settings, and outperforms comparison methods under realistic scenarios. Our code is available at .", "output": "Few-shot Fine-tuning is All You Need for Source-free Domain Adaptation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Thermal imaging has numerous advantages over regular visible-range imaging since it performs well in low-light circumstances. 
Super-Resolution approaches can broaden their usefulness by replicating accurate high-resolution thermal pictures using measurements from low-cost, low-resolution thermal sensors. Because of the spectral range mismatch between the images, Guided Super-Resolution of thermal images utilizing visible range images is difficult. However, failure to capture visible range images can prevent the operation of applications in critical areas. We present a novel data fusion framework and regularization technique for Guided Super Resolution of Thermal images. The proposed architecture is computationally inexpensive and lightweight with the ability to maintain performance despite missing one of the modalities, i.e., high-resolution RGB image or the lower-resolution thermal image, and is designed to be robust in the presence of missing data. The proposed method presents a promising solution to the frequently occurring problem of missing modalities in a real-world scenario. Code is available at .", "output": "CoReFusion: Contrastive Regularized Fusion for Guided Thermal Super-Resolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data augmentation is a promising technique for unsupervised anomaly detection in industrial applications, where the availability of positive samples is often limited due to factors such as commercial competition and sample collection difficulties. In this paper, how to effectively select and apply data augmentation methods for unsupervised anomaly detection is studied. The impact of various data augmentation methods on different anomaly detection algorithms is systematically investigated through experiments. 
The experimental results show that the performance of different industrial image anomaly detection (termed as IAD) algorithms is not significantly affected by the specific data augmentation method employed and that combining multiple data augmentation methods does not necessarily yield further improvements in the accuracy of anomaly detection, although it can achieve excellent results on specific methods. These findings provide useful guidance on selecting appropriate data augmentation methods for different requirements in IAD.", "output": "What makes a good data augmentation for few-shot unsupervised image anomaly detection?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Portrait retouching aims to improve the aesthetic quality of input portrait photos and especially requires human-region priority. The deep learning-based methods largely elevate the retouching efficiency and provide promising retouched results. However, existing portrait retouching methods focus on automatic retouching, which treats all human-regions equally and ignores users' preferences for specific individuals, thus suffering from limited flexibility in interactive scenarios. In this work, we emphasize the importance of users' intents and explore the interactive portrait retouching task. Specifically, we propose a region-aware retouching framework with two branches: an automatic branch and an interactive branch. The automatic branch involves an encoding-decoding process, which searches region candidates and performs automatic region-aware retouching without user guidance. The interactive branch encodes sparse user guidance into a priority condition vector and modulates latent features with a region selection module to further emphasize the user-specified regions. 
Experimental results show that our interactive branch effectively captures users' intents and generalizes well to unseen scenes with sparse user guidance, while our automatic branch also outperforms the state-of-the-art retouching methods due to improved region-awareness.", "output": "Region-Aware Portrait Retouching with Sparse Interactive Guidance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the development of deep generative models, recent years have seen great success of Chinese landscape painting generation. However, few works focus on controllable Chinese landscape painting generation due to the lack of data and limited modeling capabilities. In this work, we propose a controllable Chinese landscape painting generation method named CCLAP, which can generate painting with specific content and style based on Latent Diffusion Model. Specifically, it consists of two cascaded modules, i.e., content generator and style aggregator. The content generator module guarantees the content of generated paintings specific to the input text. While the style aggregator module is to generate paintings of a style corresponding to a reference image. Moreover, a new dataset of Chinese landscape paintings named CLAP is collected for comprehensive evaluation. Both the qualitative and quantitative results demonstrate that our method achieves state-of-the-art performance, especially in artfully-composed and artistic conception. 
Codes are available at", "output": "CCLAP: Controllable Chinese Landscape Painting Generation via Latent Diffusion Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Gene expression can be used to subtype breast cancer with improved prediction of risk of recurrence and treatment responsiveness over that obtained using routine immunohistochemistry (IHC). However, in the clinic, molecular profiling is primarily used for ER+ cancer and is costly and tissue destructive, requires specialized platforms and takes several weeks to obtain a result. Deep learning algorithms can effectively extract morphological patterns in digital histopathology images to predict molecular phenotypes quickly and cost-effectively. We propose a new, computationally efficient approach called hist2RNA inspired by bulk RNA-sequencing techniques to predict the expression of 138 genes (incorporated from six commercially available molecular profiling tests), including luminal PAM50 subtype, from hematoxylin and eosin (H&E) stained whole slide images (WSIs). The training phase involves the aggregation of extracted features for each patient from a pretrained model to predict gene expression at the patient level using annotated H&E images from The Cancer Genome Atlas (TCGA, n=335). We demonstrate successful gene prediction on a held-out test set (n=160, corr=0.82 across patients, corr=0.29 across genes) and perform exploratory analysis on an external tissue microarray (TMA) dataset (n=498) with known IHC and survival information. 
Our model is able to predict gene expression and luminal PAM50 subtype (Luminal A versus Luminal B) on the TMA dataset with prognostic significance for overall survival in univariate analysis (c-index=0.56, hazard ratio=2.16 (95% CI 1.12-3.06), p<5x10-3), and independent significance in multivariate analysis incorporating standard clinicopathological variables (c-index=0.65, hazard ratio=1.85 (95% CI 1.30-2.68), p<5x10-3).", "output": "hist2RNA: An efficient deep learning architecture to predict gene expression from breast cancer histopathology images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Kinship recognition aims to determine whether the subjects in two facial images are kin or non-kin, which is an emerging and challenging problem. However, most previous methods focus on heuristic designs without considering the spatial correlation between face images. In this paper, we aim to learn discriminative kinship representations embedded with the relation information between face components (e.g., eyes, nose, etc.). To achieve this goal, we propose the Face Componential Relation Network, which learns the relationship between face components among images with a cross-attention mechanism, which automatically learns the important facial regions for kinship recognition. Moreover, we propose Face Componential Relation Network (FaCoRNet), which adapts the loss function by the guidance from cross-attention to learn more discriminative feature representations. The proposed FaCoRNet outperforms previous state-of-the-art methods by large margins for the largest public kinship recognition FIW benchmark. 
The code will be publicly released upon acceptance.", "output": "Kinship Representation Learning with Face Componential Relation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Document-based Visual Question Answering examines the document understanding of document images in conditions of natural language questions. We proposed a new document-based VQA dataset, PDF-VQA, to comprehensively examine the document understanding from various aspects, including document element recognition, document layout structural understanding as well as contextual understanding and key information extraction. Our PDF-VQA dataset extends the current scale of document understanding that limits on the single document page to the new scale that asks questions over the full document of multiple pages. We also propose a new graph-based VQA model that explicitly integrates the spatial and hierarchically structural relationships between different document elements to boost the document structural understanding. The performances are compared with several baselines over different question types and tasks. The full dataset will be released after paper acceptance.", "output": "PDFVQA: A New Dataset for Real-World VQA on PDF Documents."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Pretrained backbones with fine-tuning have been widely adopted in 2D vision and natural language processing tasks and demonstrated significant advantages to task-specific networks. In this paper, we present a pretrained 3D backbone, named Swin3D, which first outperforms all state-of-the-art methods in downstream 3D indoor scene understanding tasks. 
Our backbone network is based on a 3D Swin transformer and carefully designed to efficiently conduct self-attention on sparse voxels with linear memory complexity and capture the irregularity of point signals via generalized contextual relative positional embedding. Based on this backbone design, we pretrained a large Swin3D model on a synthetic Structured3D dataset that is 10 times larger than the ScanNet dataset and fine-tuned the pretrained model in various downstream real-world indoor scene understanding tasks. The results demonstrate that our model pretrained on the synthetic dataset not only exhibits good generality in both downstream segmentation and detection on real 3D point datasets, but also surpasses the state-of-the-art methods on downstream tasks after fine-tuning with +2.3 mIoU and +2.2 mIoU on S3DIS Area5 and 6-fold semantic segmentation, +2.1 mIoU on ScanNet segmentation (val), +1.9 mAP@0.5 on ScanNet detection, +8.1 mAP@0.5 on S3DIS detection. Our method demonstrates the great potential of pretrained 3D backbones with fine-tuning for 3D understanding tasks. The code and models are available at .", "output": "Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Structured reconstruction is a non-trivial dense prediction problem, which extracts structural information (eg, building corners and edges) from a raster image, then reconstructs it to a 2D planar graph accordingly. Compared with common segmentation or detection problems, it significantly relays on the capability that leveraging holistic geometric information for structural reasoning. Current transformer-based approaches tackle this challenging problem in a two-stage manner, which detect corners in the first model and classify the proposed edges (corner-pairs) in the second model. 
However, they separate two-stage into different models and only share the backbone encoder. Unlike the existing modeling strategies, we present an enhanced corner representation method: 1) It fuses knowledge between the corner detection and edge prediction by sharing feature in different granularity; 2) Corner candidates are proposed in four heatmap channels w.r.t its direction. Both qualitative and quantitative evaluations demonstrate that our proposed method can better reconstruct fine-grained structures, such as adjacent corners and tiny edges. Consequently, it outperforms the state-of-the-art model by +1.9%@F-1 on Corner and +3.0%@F-1 on Edge.", "output": "CornerFormer: Boosting Corner Representation for Fine-Grained Structured Reconstruction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Extracting single-cell information from microscopy data requires accurate instance-wise segmentations. Obtaining pixel-wise segmentations from microscopy imagery remains a challenging task, especially with the added complexity of microstructured environments. This paper presents a novel dataset for segmenting yeast cells in microstructures. We offer pixel-wise instance segmentation labels for both cells and trap microstructures. In total, we release 493 densely annotated microscopy images. To facilitate a unified comparison between novel segmentation algorithms, we propose a standardized evaluation strategy for our dataset. 
The aim of the dataset and evaluation strategy is to facilitate the development of new cell segmentation approaches. The dataset is publicly available at .", "output": "An Instance Segmentation Dataset of Yeast Cells in Microstructures."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The drastic variation of motion in spatial and temporal dimensions makes the video prediction task extremely challenging. Existing RNN models obtain higher performance by deepening or widening the model. They obtain the multi-scale features of the video only by stacking layers, which is inefficient and brings unbearable training costs (such as memory, FLOPs, and training time). Different from them, this paper proposes a spatiotemporal multi-scale model called MS-LSTM wholly from a multi-scale perspective. On the basis of stacked layers, MS-LSTM incorporates two additional efficient multi-scale designs to fully capture spatiotemporal context information. Concretely, we employ LSTMs with mirrored pyramid structures to construct spatial multi-scale representations and LSTMs with different convolution kernels to construct temporal multi-scale representations. Detailed comparison experiments with eight baseline models on four video datasets show that MS-LSTM has better performance but lower training costs.", "output": "MS-LSTM: Exploring Spatiotemporal Multiscale Representations in Video Prediction Domain."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Forecasting players in sports has grown in popularity due to the potential for a tactical advantage and the applicability of such research to multi-agent interaction systems. Team sports contain a significant social component that influences interactions between teammates and opponents. 
However, it still needs to be fully exploited. In this work, we hypothesize that each participant has a specific function in each action and that role-based interaction is critical for predicting players' future moves. We create RolFor, a novel end-to-end model for Role-based Forecasting. RolFor uses a new module we developed called Ordering Neural Networks (OrderNN) to permute the order of the players such that each player is assigned to a latent role. The latent role is then modeled with a RoleGCN. Thanks to its graph representation, it provides a fully learnable adjacency matrix that captures the relationships between roles and is subsequently used to forecast the players' future trajectories. Extensive experiments on a challenging NBA basketball dataset back up the importance of roles and justify our goal of modeling them using optimizable models. When an oracle provides roles, the proposed RolFor compares favorably to the current state-of-the-art (it ranks first in terms of ADE and second in terms of FDE errors). However, training the end-to-end RolFor incurs the issues of differentiability of permutation methods, which we experimentally review. Finally, this work restates differentiable ranking as a difficult open problem and its great potential in conjunction with graph-based interaction models. Project is available at: ", "output": "About latent roles in forecasting players in team sports."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "3D shape completion from point clouds is a challenging task, especially from scans of real-world objects. Considering the paucity of 3D shape ground truths for real scans, existing works mainly focus on benchmarking this task on synthetic data, e.g. 3D computer-aided design models. 
However, the domain gap between synthetic and real data limits the generalizability of these methods. Thus, we propose a new task, SCoDA, for the domain adaptation of real scan shape completion from synthetic data. A new dataset, ScanSalon, is contributed with a bunch of elaborate 3D models created by skillful artists according to scans. To address this new task, we propose a novel cross-domain feature fusion method for knowledge transfer and a novel volume-consistent self-training framework for robust learning from real data. Extensive experiments prove our method is effective to bring an improvement of 6%~7% mIoU.", "output": "SCoDA: Domain Adaptive Shape Completion for Real Scans."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While lightweight ViT framework has made tremendous progress in image super-resolution, its uni-dimensional self-attention modeling, as well as homogeneous aggregation scheme, limit its effective receptive field (ERF) to include more comprehensive interactions from both spatial and channel dimensions. To tackle these drawbacks, this work proposes two enhanced components under a new Omni-SR architecture. First, an Omni Self-Attention (OSA) block is proposed based on dense interaction principle, which can simultaneously model pixel-interaction from both spatial and channel dimensions, mining the potential correlations across omni-axis (i.e., spatial and channel). Coupling with mainstream window partitioning strategies, OSA can achieve superior performance with compelling computational budgets. Second, a multi-scale interaction scheme is proposed to mitigate sub-optimal ERF (i.e., premature saturation) in shallow models, which facilitates local propagation and meso-/global-scale interactions, rendering an omni-scale aggregation building block. 
Extensive experiments demonstrate that Omni-SR achieves record-high performance on lightweight super-resolution benchmarks (e.g., 26.95 dB@Urban100 $\\times 4$ with only 792K parameters). Our code is available at .", "output": "Omni Aggregation Networks for Lightweight Image Super-Resolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Casually captured Neural Radiance Fields (NeRFs) suffer from artifacts such as floaters or flawed geometry when rendered outside the camera trajectory. Existing evaluation protocols often do not capture these effects, since they usually only assess image quality at every 8th frame of the training capture. To push forward progress in novel-view synthesis, we propose a new dataset and evaluation procedure, where two camera trajectories are recorded of the scene: one used for training, and the other for evaluation. In this more challenging in-the-wild setting, we find that existing hand-crafted regularizers do not remove floaters nor improve scene geometry. Thus, we propose a 3D diffusion-based method that leverages local 3D priors and a novel density-based score distillation sampling loss to discourage artifacts during NeRF optimization. We show that this data-driven prior removes floaters and improves scene geometry for casual captures.", "output": "Nerfbusters: Removing Ghostly Artifacts from Casually Captured NeRFs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "For years, Single Image Super Resolution (SISR) has been an interesting and ill-posed problem in computer vision. The traditional super-resolution (SR) imaging approaches involve interpolation, reconstruction, and learning-based methods. 
Interpolation methods are fast and uncomplicated to compute, but they are not so accurate and reliable. Reconstruction-based methods are better compared with interpolation methods, but they are time-consuming and the quality degrades as the scaling increases. Even though learning-based methods like Markov random chains are far better than all the previous ones, they are unable to match the performance of deep learning models for SISR. This study examines the Residual Dense Networks architecture proposed by Yhang et al. [17] and analyzes the importance of its components. By leveraging hierarchical features from original low-resolution (LR) images, this architecture achieves superior performance, with a network structure comprising four main blocks, including the residual dense block (RDB) as the core. Through investigations of each block and analyses using various loss metrics, the study evaluates the effectiveness of the architecture and compares it to other state-of-the-art models that differ in both architecture and components.", "output": "Ultra Sharp : Study of Single Image Super Resolution using Residual Dense Network."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human cognition has a ``large-scale first'' cognitive mechanism, therefore possesses adaptive multi-granularity description capabilities. This results in computational characteristics such as efficiency, robustness, and interpretability. Although most existing artificial intelligence learning methods have certain multi-granularity features, they do not fully align with the ``large-scale first'' cognitive mechanism. Multi-granularity granular-ball computing is an important model method developed in recent years. This method can use granular-balls of different sizes to adaptively represent and cover the sample space, and perform learning based on granular-balls. 
Since the number of coarse-grained \"granular-ball\" is smaller than the number of sample points, granular-ball computing is more efficient; the coarse-grained characteristics of granular-balls are less likely to be affected by fine-grained sample points, making them more robust; the multi-granularity structure of granular-balls can produce topological structures and coarse-grained descriptions, providing natural interpretability. Granular-ball computing has now been effectively extended to various fields of artificial intelligence, developing theoretical methods such as granular-ball classifiers, granular-ball clustering methods, granular-ball neural networks, granular-ball rough sets, and granular-ball evolutionary computation, significantly improving the efficiency, noise robustness, and interpretability of existing methods. It has good innovation, practicality, and development potential. This article provides a systematic introduction to these methods and analyzes the main problems currently faced by granular-ball computing, discussing both the primary applicable scenarios for granular-ball computing and offering references and suggestions for future researchers to improve this theory.", "output": "Granular ball computing: an efficient, robust, and interpretable adaptive multi-granularity representation and computation method."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Meta-learning performs adaptation through a limited amount of support set, which may cause a sample bias problem. To solve this problem, transductive meta-learning is getting more and more attention, going beyond the conventional inductive learning perspective. This paper proposes so-called task-adaptive pseudo labeling for transductive meta-learning. Specifically, pseudo labels for unlabeled query sets are generated from labeled support sets through label propagation. 
Pseudo labels enable to adopt the supervised setting as it is and also use the unlabeled query set in the adaptation process. As a result, the proposed method is able to deal with more examples in the adaptation process than inductive ones, which can result in better classification performance of the model. Note that the proposed method is the first approach of applying task adaptation to pseudo labeling. Experiments show that the proposed method outperforms the state-of-the-art (SOTA) technique in 5-way 1-shot few-shot classification.", "output": "Task-Adaptive Pseudo Labeling for Transductive Meta-Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Discovering nonlinear differential equations that describe system dynamics from empirical data is a fundamental challenge in contemporary science. Here, we propose a methodology to automatically identify dynamical laws by integrating denoising techniques, sparse regression, and bootstrap confidence intervals. We evaluate our method on well-known ordinary differential equations with an ensemble of random initial conditions, time series of increasing length, and varying signal-to-noise ratios. Our algorithm consistently identifies three-dimensional systems, given moderately-sized time series and high signal quality levels relative to background noise. 
By accurately identifying dynamical systems, our methodology has the potential to impact diverse fields, such as the physical and biological sciences, as well as engineering, where understanding complex systems is crucial.", "output": "Automatically identifying dynamical systems from data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-task learning has shown considerable promise for improving the performance of deep learning-driven vision systems for the purpose of robotic grasping. However, high architectural and computational complexity can result in poor suitability for deployment on embedded devices that are typically leveraged in robotic arms for real-world manufacturing and warehouse environments. As such, the design of highly efficient multi-task deep neural network architectures tailored for computer vision tasks for robotic grasping on the edge is highly desired for widespread adoption in manufacturing environments. Motivated by this, we propose Fast GraspNeXt, a fast self-attention neural network architecture tailored for embedded multi-task learning in computer vision tasks for robotic grasping. To build Fast GraspNeXt, we leverage a generative network architecture search strategy with a set of architectural constraints customized to achieve a strong balance between multi-task learning performance and embedded inference efficiency. 
Experimental results on the MetaGraspNet benchmark dataset show that the Fast GraspNeXt network design achieves the highest performance (average precision (AP), accuracy, and mean squared error (MSE)) across multiple computer vision tasks when compared to other efficient multi-task network architecture designs, while having only 17.8M parameters (about >5x smaller), 259 GFLOPs (as much as >5x lower) and as much as >3.15x faster on a NVIDIA Jetson TX2 embedded processor.", "output": "Fast GraspNeXt: A Fast Self-Attention Neural Network Architecture for Multi-task Learning in Computer Vision Tasks for Robotic Grasping on the Edge."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We observe that the traditional use of DP with the Adam optimizer introduces a bias in the second moment estimation, due to the addition of independent noise in the gradient computation. This bias leads to a different scaling for low variance parameter updates, that is inconsistent with the behavior of non-private Adam, and Adam's sign descent interpretation. Empirically, correcting the bias introduced by DP noise significantly improves the optimization performance of DP-Adam.", "output": "DP-Adam: Correcting DP Bias in Adam's Second Moment Estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Development of robust concrete mixes with a lower environmental impact is challenging due to natural variability in constituent materials and a multitude of possible combinations of mix proportions. Making reliable property predictions with machine learning can facilitate performance-based specification of concrete, reducing material inefficiencies and improving the sustainability of concrete construction. 
In this work, we develop a machine learning algorithm that can utilize intermediate target variables and their associated noise to predict the final target variable. We apply the methodology to specify a concrete mix that has high resistance to carbonation, and another concrete mix that has low environmental impact. Both mixes also fulfill targets on the strength, density, and cost. The specified mixes are experimentally validated against their predictions. Our generic methodology enables the exploitation of noise in machine learning, which has a broad range of applications in structural engineering and beyond.", "output": "Probabilistic selection and design of concrete using machine learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "$L_0$ regularization of neural networks is a fundamental problem. In addition to regularizing models for better generalizability, $L_0$ regularization also applies to selecting input features and training sparse neural networks. There is a large body of research on related topics, some with quite complicated methods. In this paper, we show that a straightforward formulation, BinMask, which multiplies weights with deterministic binary masks and uses the identity straight-through estimator for backpropagation, is an effective $L_0$ regularizer. We evaluate BinMask on three tasks: feature selection, network sparsification, and model regularization. Despite its simplicity, BinMask achieves competitive performance on all the benchmarks without task-specific tuning compared to methods designed for each task. 
Our results suggest that decoupling weights from mask optimization, which has been widely adopted by previous work, is a key component for effective $L_0$ regularization.", "output": "Effective Neural Network $L_0$ Regularization With BinMask."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Purpose: The aim of this work is to introduce a single model-based deep network that can provide high-quality reconstructions from undersampled parallel MRI data acquired with multiple sequences, acquisition settings and field strengths. Methods: A single unrolled architecture, which offers good reconstructions for multiple acquisition settings, is introduced. The proposed scheme adapts the model to each setting by scaling the CNN features and the regularization parameter with appropriate weights. The scaling weights and regularization parameter are derived using a multi-layer perceptron model from conditional vectors, which represents the specific acquisition setting. The perceptron parameters and the CNN weights are jointly trained using data from multiple acquisition settings, including differences in field strengths, acceleration, and contrasts. The conditional network is validated using datasets acquired with different acquisition settings. Results: The comparison of the adaptive framework, which trains a single model using the data from all the settings, shows that it can offer consistently improved performance for each acquisition condition. The comparison of the proposed scheme with networks that are trained independently for each acquisition setting shows that it requires less training data per acquisition setting to offer good performance. Conclusion: The Ada-MoDL framework enables the use of a single model-based unrolled network for multiple acquisition settings. 
In addition to eliminating the need to train and store multiple networks for different acquisition settings, this approach reduces the training data needed for each acquisition setting.", "output": "Adapting model-based deep learning to multiple acquisition conditions: Ada-MoDL."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Implicit representations such as Neural Radiance Fields (NeRF) have been shown to be very effective at novel view synthesis. However, these models typically require manual and careful human data collection for training. In this paper, we present AutoNeRF, a method to collect data required to train NeRFs using autonomous embodied agents. Our method allows an agent to explore an unseen environment efficiently and use the experience to build an implicit map representation autonomously. We compare the impact of different exploration strategies including handcrafted frontier-based exploration and modular approaches composed of trained high-level planners and classical low-level path followers. We train these models with different reward functions tailored to this problem and evaluate the quality of the learned representations on four different downstream tasks: classical viewpoint rendering, map reconstruction, planning, and pose refinement. 
Empirical results show that NeRFs can be trained on actively collected data using just a single episode of experience in an unseen environment, and can be used for several downstream robotic tasks, and that modular trained exploration models significantly outperform the classical baselines.", "output": "AutoNeRF: Training Implicit Scene Representations with Autonomous Agents."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Finding the distribution of the velocities and pressures of a fluid (by solving the Navier-Stokes equations) is a principal task in the chemical, energy, and pharmaceutical industries, as well as in mechanical engineering and the design of pipeline systems. With existing solvers, such as OpenFOAM and Ansys, simulations of fluid dynamics in intricate geometries are computationally expensive and require re-simulation whenever the geometric parameters or the initial and boundary conditions are altered. Physics-informed neural networks (PINNs) are a promising tool for simulating fluid flows in complex geometries, as they can adapt to changes in the geometry and mesh definitions, allowing for generalization across different shapes. We present a hybrid quantum physics-informed neural network that simulates laminar fluid flows in 3D Y-shaped mixers. Our approach combines the expressive power of a quantum model with the flexibility of a PINN, resulting in a 21% higher accuracy compared to a purely classical neural network. Our findings highlight the potential of machine learning approaches, and in particular quantum PINNs, for complex shape optimization tasks in computational fluid dynamics. 
By improving the accuracy of fluid simulations in complex geometries, our research using quantum PINNs contributes to the development of more efficient and reliable fluid dynamics solvers.", "output": "Quantum physics-informed neural networks for simulating computational fluid dynamics in complex shapes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Maritime obstacle detection is critical for safe navigation of autonomous surface vehicles (ASVs). While the accuracy of image-based detection methods has advanced substantially, their computational and memory requirements prohibit deployment on embedded devices. In this paper we analyze the currently best-performing maritime obstacle detection network WaSR. Based on the analysis we then propose replacements for the most computationally intensive stages and propose its embedded-compute-ready variant eWaSR. In particular, the new design follows the most recent advancements of transformer-based lightweight networks. eWaSR achieves comparable detection results to state-of-the-art WaSR with only 0.52% F1 score performance drop and outperforms other state-of-the-art embedded-ready architectures by over 9.74% in F1 score. On a standard GPU, eWaSR runs 10x faster than the original WaSR (115 FPS vs 11 FPS). Tests on a real embedded device OAK-D show that, while WaSR cannot run due to memory restrictions, eWaSR runs comfortably at 5.5 FPS. This makes eWaSR the first practical embedded-compute-ready maritime obstacle detection network. 
The source code and trained eWaSR models are publicly available here:", "output": "eWaSR -- an embedded-compute-ready maritime obstacle detection network."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Bayesian models are a powerful tool for studying complex data, allowing the analyst to encode rich hierarchical dependencies and leverage prior information. Most importantly, they facilitate a complete characterization of uncertainty through the posterior distribution. Practical posterior computation is commonly performed via MCMC, which can be computationally infeasible for high dimensional models with many observations. In this article we discuss the potential to improve posterior computation using ideas from machine learning. Concrete future directions are explored in vignettes on normalizing flows, Bayesian coresets, distributed Bayesian inference, and variational inference.", "output": "Machine Learning and the Future of Bayesian Computation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Written answers to open-ended questions can have a higher long-term effect on learning than multiple-choice questions. However, it is critical that teachers immediately review the answers, and ask to redo those that are incoherent. This can be a difficult task and can be time-consuming for teachers. A possible solution is to automate the detection of incoherent answers. One option is to automate the review with Large Language Models (LLM). In this paper, we analyze the responses of fourth graders in mathematics using three LLMs: GPT-3, BLOOM, and YOU. We used them with zero, one, two, three and four shots. We compared their performance with the results of various classifiers trained with Machine Learning (ML). 
We found that LLMs perform worse than MLs in detecting incoherent answers. The difficulty seems to reside in recursive questions that contain both questions and answers, and in responses from students with typical fourth-grader misspellings. Upon closer examination, we have found that the ChatGPT model faces the same challenges.", "output": "Who's the Best Detective? LLMs vs. MLs in Detecting Incoherent Fourth Grade Math Answers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Robustness to natural distribution shifts has seen remarkable progress thanks to recent pre-training strategies combined with better fine-tuning methods. However, such fine-tuning assumes access to large amounts of labelled data, and the extent to which the observations hold when the amount of training data is not as high remains unknown. We address this gap by performing the first in-depth study of robustness to various natural distribution shifts in different low-shot regimes: spanning datasets, architectures, pre-trained initializations, and state-of-the-art robustness interventions. Most importantly, we find that there is no single model of choice that is often more robust than others, and existing interventions can fail to improve robustness on some datasets even if they do so in the full-shot regime. We hope that our work will motivate the community to focus on this problem of practical importance.", "output": "Benchmarking Low-Shot Robustness to Natural Distribution Shifts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Parkinson's disease (PD) is a neurodegenerative disease with frequently changing motor symptoms where continuous symptom monitoring enables more targeted treatment. 
Classical time series classification (TSC) and deep learning techniques have limited performance for PD symptom monitoring using wearable accelerometer data because PD movement patterns are complex, but datasets are small. We investigate InceptionTime and RandOm Convolutional KErnel Transform (ROCKET) because they are state-of-the-art for TSC and promising for PD symptom monitoring: InceptionTime's high learning capacity is suited to modeling complex movement patterns while ROCKET is suited to small datasets. We used a random search to find the highest-scoring InceptionTime architecture and compared it to ROCKET with a ridge classifier and a multi-layer perceptron (MLP) on wrist motions of PD patients. We find that all approaches are suitable for estimating tremor severity and bradykinesia presence but struggle with detecting dyskinesia. ROCKET performs better for dyskinesia, whereas InceptionTime is slightly better for tremor and bradykinesia but has much higher variability in performance. Both outperform the MLP. In conclusion, both InceptionTime and ROCKET are suitable for continuous symptom monitoring, with the choice depending on the symptom of interest and desired robustness.", "output": "Time Series Classification for Detecting Parkinson's Disease from Wrist Motions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The rapid development and application of foundation models have revolutionized the field of artificial intelligence. Large diffusion models have gained significant attention for their ability to generate photorealistic images and support various tasks. On-device deployment of these models provides benefits such as lower server costs, offline functionality, and improved user privacy. However, common large diffusion models have over 1 billion parameters and pose challenges due to restricted computational and memory resources on devices. 
We present a series of implementation optimizations for large diffusion models that achieve the fastest reported inference latency to-date (under 12 seconds for Stable Diffusion 1.4 without int8 quantization on Samsung S23 Ultra for a 512x512 image with 20 iterations) on GPU-equipped mobile devices. These enhancements broaden the applicability of generative AI and improve the overall user experience across a wide range of devices.", "output": "Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It is widely acknowledged that large models have the potential to deliver superior performance across a broad range of domains. Despite the remarkable progress made in the field of machine learning systems research, which has enabled the development and exploration of large models, such abilities remain confined to a small group of advanced users and industry leaders, resulting in an implicit technical barrier for the wider community to access and leverage these technologies. In this paper, we introduce PyTorch Fully Sharded Data Parallel (FSDP) as an industry-grade solution for large model training. FSDP has been closely co-designed with several key PyTorch core components including Tensor implementation, dispatcher system, and CUDA memory caching allocator, to provide non-intrusive user experiences and high training efficiency. Additionally, FSDP natively incorporates a range of techniques and settings to optimize resource utilization across a variety of hardware configurations. 
The experimental results demonstrate that FSDP is capable of achieving comparable performance to Distributed Data Parallel while providing support for significantly larger models with near-linear scalability in terms of TFLOPS.", "output": "PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The commercial use of Machine Learning (ML) is spreading; at the same time, ML models are becoming more complex and more expensive to train, which makes Intellectual Property Protection (IPP) of trained models a pressing issue. Unlike other domains that can build on a solid understanding of the threats, attacks and defenses available to protect their IP, the ML-related research in this regard is still very fragmented. This is also due to a missing unified view as well as a common taxonomy of these aspects. In this paper, we systematize our findings on IPP in ML, while focusing on threats and attacks identified and defenses proposed at the time of writing. We develop a comprehensive threat model for IP in ML, categorizing attacks and defenses within a unified and consolidated taxonomy, thus bridging research from both the ML and security communities.", "output": "Identifying Appropriate Intellectual Property Protection Mechanisms for Machine Learning Models: A Systematization of Watermarking, Fingerprinting, Model Access, and Attacks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The energy inefficiency of the apps can be a major issue for the app users which is discussed on App Stores extensively. Previous research has shown the importance of investigating the energy related app reviews to identify the major causes or categories of energy related user feedback. 
However, there is no study that efficiently extracts the energy related app reviews automatically. In this paper, we empirically study different techniques for automatic extraction of the energy related user feedback. We compare the accuracy, F1-score and run time of numerous machine-learning models with relevant feature combinations and relatively modern Neural Network-based models. In total, 60 machine learning models are compared to 30 models that we build using six neural network architectures and three word embedding models. We develop a visualization tool for this study through which a developer can traverse through this large-scale result set. The results show that neural networks outperform the other machine learning techniques and can achieve the highest F1-score of 0.935. To replicate the research results, we have open sourced the interactive visualization tool. After identifying the best results and extracting the energy related reviews, we further compare various techniques to help the developers automatically investigate the emerging issues that might be responsible for energy inefficiency of the apps. We experiment the previously used string matching with results obtained from applying two of the state-of-the-art topic modeling algorithms, OBTM and AOLDA. 
Finally, we run a qualitative study performed in collaboration with developers and students from different institutions to determine their preferences for identifying necessary topics from previously categorized reviews, which shows OBTM produces the most helpful results.", "output": "On the Identification of the Energy related Issues from the App Reviews."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose lookahead diffusion probabilistic models (LA-DPMs) to exploit the correlation in the outputs of the deep neural networks (DNNs) over subsequent timesteps in diffusion probabilistic models (DPMs) to refine the mean estimation of the conditional Gaussian distributions in the backward process. A typical DPM first obtains an estimate of the original data sample $\\boldsymbol{x}$ by feeding the most recent state $\\boldsymbol{z}_i$ and index $i$ into the DNN model and then computes the mean vector of the conditional Gaussian distribution for $\\boldsymbol{z}_{i-1}$. We propose to calculate a more accurate estimate for $\\boldsymbol{x}$ by performing extrapolation on the two estimates of $\\boldsymbol{x}$ that are obtained by feeding $(\\boldsymbol{z}_{i+1},i+1)$ and $(\\boldsymbol{z}_{i},i)$ into the DNN model. The extrapolation can be easily integrated into the backward process of existing DPMs by introducing an additional connection over two consecutive timesteps, and fine-tuning is not required. 
Extensive experiments showed that plugging in the additional connection into DDPM, DDIM, DEIS, S-PNDM, and high-order DPM-Solvers leads to a significant performance gain in terms of FID score.", "output": "Lookahead Diffusion Probabilistic Models for Refining Mean Estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a deep learning based model predictive control (MPC) algorithm for systems with unmatched and bounded state-action dependent uncertainties of unknown structure. We utilize a deep neural network (DNN) as an oracle in the underlying optimization problem of learning based MPC (LBMPC) to estimate unmatched uncertainties. Generally, non-parametric oracles such as DNN are considered difficult to employ with LBMPC due to the technical difficulties associated with estimation of their coefficients in real time. We employ a dual-timescale adaptation mechanism, where the weights of the last layer of the neural network are updated in real time while the inner layers are trained on a slower timescale using the training data collected online and selectively stored in a buffer. Our results are validated through a numerical experiment on the compression system model of jet engine. These results indicate that the proposed approach is implementable in real time and carries the theoretical guarantees of LBMPC.", "output": "Unmatched uncertainty mitigation through neural network supported model predictive control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A common explanation for the failure of out-of-distribution (OOD) generalization is that the model trained with empirical risk minimization (ERM) learns spurious features instead of the desired invariant features. 
However, several recent studies challenged this explanation and found that deep networks may have already learned sufficiently good features for OOD generalization. The debate extends to the in-distribution and OOD performance correlations along with training or fine-tuning neural nets across a variety of OOD generalization tasks. To understand these seemingly contradicting phenomena, we conduct a theoretical investigation and find that ERM essentially learns both spurious features and invariant features. On the other hand, the quality of learned features during ERM pre-training significantly affects the final OOD performance, as OOD objectives rarely learn new features. Failing to capture all the underlying useful features during pre-training will further limit the final OOD performance. To remedy the issue, we propose Feature Augmented Training (FAT), to enforce the model to learn all useful features by retaining the already learned features and augmenting new ones by multiple rounds. In each round, the retention and augmentation operations are performed on different subsets of the training data that capture distinct features. Extensive experiments show that FAT effectively learns richer features and consistently improves the OOD performance when applied to various objectives.", "output": "Towards Understanding Feature Learning in Out-of-Distribution Generalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "One popular diffusion-based sampling strategy attempts to solve the reverse ordinary differential equations (ODEs) effectively. The coefficients of the obtained ODE solvers are pre-determined by the ODE formulation, the reverse discrete timesteps, and the employed ODE methods. In this paper, we consider accelerating several popular ODE-based sampling processes by optimizing certain coefficients via improved integration approximation (IIA). 
At each reverse timestep, we propose to minimize a mean squared error (MSE) function with respect to certain selected coefficients. The MSE is constructed by applying the original ODE solver for a set of fine-grained timesteps which in principle provides a more accurate integration approximation in predicting the next diffusion hidden state. Given a pre-trained diffusion model, the procedure for IIA for a particular number of neural functional evaluations (NFEs) only needs to be conducted once over a batch of samples. The obtained optimal solutions for those selected coefficients via minimum MSE (MMSE) can be restored and reused later on to accelerate the sampling process. Extensive experiments on EDM and DDIM show the IIA technique leads to significant performance gain when the numbers of NFEs are small.", "output": "On Accelerating Diffusion-Based Sampling Process via Improved Integration Approximation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Segment Anything Model (SAM) is a recently developed large model for general-purpose segmentation for computer vision tasks. SAM was trained using 11 million images with over 1 billion masks and can produce segmentation results for a wide range of objects in natural scene images. SAM can be viewed as a general perception model for segmentation (partitioning images into semantically meaningful regions). Thus, how to utilize such a large foundation model for medical image segmentation is an emerging research target. This paper shows that although SAM does not immediately give high-quality segmentation for medical images, its generated masks, features, and stability scores are useful for building and training better medical image segmentation models. In particular, we demonstrate how to use SAM to augment image inputs for a commonly-used medical image segmentation model (e.g., U-Net). 
Experiments on two datasets show the effectiveness of our proposed method.", "output": "Input Augmentation with SAM: Boosting Medical Image Segmentation with Segmentation Foundation Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Synthetic data has been hailed as the silver bullet for privacy preserving data analysis. If a record is not real, then how could it violate a person's privacy? In addition, deep-learning based generative models are employed successfully to approximate complex high-dimensional distributions from data and draw realistic samples from this learned distribution. It is often overlooked though that generative models are prone to memorising many details of individual training records and often generate synthetic data that too closely resembles the underlying sensitive training data, hence violating strong privacy regulations as, e.g., encountered in health care. Differential privacy is the well-known state-of-the-art framework for guaranteeing protection of sensitive individuals' data, allowing aggregate statistics and even machine learning models to be released publicly without compromising privacy. The training mechanisms however often add too much noise during the training process, and thus severely compromise the utility of these private models. Even worse, the tight privacy budgets do not allow for many training epochs so that model quality cannot be properly controlled in practice. In this paper we explore an alternative approach for privately generating data that makes direct use of the inherent stochasticity in generative models, e.g., variational autoencoders. The main idea is to appropriately constrain the continuity modulus of the deep models instead of adding another noise mechanism on top. 
For this approach, we derive mathematically rigorous privacy guarantees and illustrate its effectiveness with practical experiments.", "output": "Differentially Private Synthetic Data Generation via Lipschitz-Regularised Variational Autoencoders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce GEDI, a Bayesian framework that combines existing self-supervised learning objectives with likelihood-based generative models. This framework leverages the benefits of both GEnerative and DIscriminative approaches, resulting in improved symbolic representations over standalone solutions. Additionally, GEDI can be easily integrated and trained jointly with existing neuro-symbolic frameworks without the need for additional supervision or costly pre-training steps. We demonstrate through experiments on real-world data, including SVHN, CIFAR10, and CIFAR100, that GEDI outperforms existing self-supervised learning strategies in terms of clustering performance by a significant margin. The symbolic component further allows it to leverage knowledge in the form of logical constraints to improve performance in the small data regime.", "output": "Learning Symbolic Representations Through Joint GEnerative and DIscriminative Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Provisioning dynamic machine learning (ML) inference as a service for artificial intelligence (AI) applications of edge devices faces many challenges, including the trade-off among accuracy loss, carbon emission, and unknown future costs. Besides, many governments are launching carbon emission rights (CER) for operators to reduce carbon emissions further to reverse climate change. 
Facing these challenges, to achieve carbon-aware ML task offloading under limited carbon emission rights thus to achieve green edge AI, we establish a joint ML task offloading and CER purchasing problem, intending to minimize the accuracy loss under the long-term time-averaged cost budget of purchasing the required CER. However, considering the uncertainty of the resource prices, the CER purchasing prices, the carbon intensity of sites, and ML tasks' arrivals, it is hard to decide the optimal policy online over a long-running period time. To overcome this difficulty, we leverage the two-timescale Lyapunov optimization technique, of which the $T$-slot drift-plus-penalty methodology inspires us to propose an online algorithm that purchases CER in multiple timescales (on-preserved in carbon future market and on-demanded in the carbon spot market) and makes decisions about where to offload ML tasks. Considering the NP-hardness of the $T$-slot problems, we further propose the resource-restricted randomized dependent rounding algorithm to help to gain the near-optimal solution with no help of any future information. Our theoretical analysis and extensive simulation results driven by the real carbon intensity trace show the superior performance of the proposed algorithms.", "output": "Towards Carbon-Neutral Edge Computing: Greening Edge AI by Harnessing Spot and Future Carbon Markets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This article presents a comprehensive overview of the digital twin technology and its capability levels, with a specific focus on its applications in the wind energy industry. It consolidates the definitions of digital twin and its capability levels on a scale from 0-5; 0-standalone, 1-descriptive, 2-diagnostic, 3-predictive, 4-prescriptive, 5-autonomous. 
It then, from an industrial perspective, identifies the current state of the art and research needs in the wind energy sector. The article proposes approaches to the identified challenges from the perspective of research institutes and offers a set of recommendations for diverse stakeholders to facilitate the acceptance of the technology. The contribution of this article lies in its synthesis of the current state of knowledge and its identification of future research needs and challenges from an industry perspective, ultimately providing a roadmap for future research and development in the field of digital twin and its applications in the wind energy industry.", "output": "Digital Twins in Wind Energy: Emerging Technologies and Industry-Informed Future Directions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Mixture of Experts (MoE) model becomes an important choice of large language models nowadays because of its scalability with sublinear computational complexity for training and inference. However, existing MoE models suffer from two critical drawbacks, 1) tremendous inner-node and inter-node communication overhead introduced by all-to-all dispatching and gathering, and 2) limited scalability for the backbone because of the bound data parallel and expert parallel to scale in the expert dimension. In this paper, we systematically analyze these drawbacks in terms of training efficiency in the parallel framework view and propose a novel MoE architecture called Pipeline MoE (PPMoE) to tackle them. PPMoE builds expert parallel incorporating with tensor parallel and replaces communication-intensive all-to-all dispatching and gathering with a simple tensor index slicing and inner-node all-reduce. Besides, it is convenient for PPMoE to integrate pipeline parallel to further scale the backbone due to its flexible parallel architecture. 
Extensive experiments show that PPMoE not only achieves a more than $1.75\\times$ speed up compared to existing MoE architectures but also reaches $90\\%$ throughput of its corresponding backbone model that is $20\\times$ smaller.", "output": "Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vertical federated learning (VFL) is a cloud-edge collaboration paradigm that enables edge nodes, comprising resource-constrained Internet of Things (IoT) devices, to cooperatively train artificial intelligence (AI) models while retaining their data locally. This paradigm facilitates improved privacy and security for edges and IoT devices, making VFL an essential component of Artificial Intelligence of Things (AIoT) systems. Nevertheless, the partitioned structure of VFL can be exploited by adversaries to inject a backdoor, enabling them to manipulate the VFL predictions. In this paper, we aim to investigate the vulnerability of VFL in the context of binary classification tasks. To this end, we define a threat model for backdoor attacks in VFL and introduce a universal adversarial backdoor (UAB) attack to poison the predictions of VFL. The UAB attack, consisting of universal trigger generation and clean-label backdoor injection, is incorporated during the VFL training at specific iterations. This is achieved by alternately optimizing the universal trigger and model parameters of VFL sub-problems. Our work distinguishes itself from existing studies on designing backdoor attacks for VFL, as those require the knowledge of auxiliary information not accessible within the split VFL architecture. In contrast, our approach does not necessitate any additional data to execute the attack. 
On the LendingClub and Zhongyuan datasets, our approach surpasses existing state-of-the-art methods, achieving up to 100% backdoor task performance while maintaining the main task performance. Our results in this paper make a major advance to revealing the hidden backdoor risks of VFL, hence paving the way for the future development of secure AIoT.", "output": "Universal Adversarial Backdoor Attacks to Fool Vertical Federated Learning in Cloud-Edge Collaboration."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative models have attracted significant interest due to their ability to handle uncertainty by learning the inherent data distributions. However, two prominent generative models, namely Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), exhibit challenges that impede achieving optimal performance in sequential recommendation tasks. Specifically, GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations. The sparse and noisy nature of sequential recommendation further exacerbates these issues. In response to these limitations, we present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser. This approach streamlines the optimization and generation process by dividing it into easier and tractable steps in a conditional autoregressive manner. Furthermore, we introduce a novel optimization schema that incorporates both cross-divergence loss and contrastive loss. This novel training schema enables the model to generate high-quality sequence/item representations and meanwhile precluding collapse. 
We conducted comprehensive experiments on four benchmark datasets, and the superior performance achieved by our model attests to its efficacy.", "output": "Conditional Denoising Diffusion for Sequential Recommendation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The multilingual Sentence-BERT (SBERT) models map different languages to common representation space and are useful for cross-language similarity and mining tasks. We propose a simple yet effective approach to convert vanilla multilingual BERT models into multilingual sentence BERT models using synthetic corpus. We simply aggregate translated NLI or STS datasets of the low-resource target languages together and perform SBERT-like fine-tuning of the vanilla multilingual BERT model. We show that multilingual BERT models are inherent cross-lingual learners and this simple baseline fine-tuning approach without explicit cross-lingual training yields exceptional cross-lingual properties. We show the efficacy of our approach on 10 major Indic languages and also show the applicability of our approach to non-Indic languages German and French. Using this approach, we further present L3Cube-IndicSBERT, the first multilingual sentence representation model specifically for Indian languages Hindi, Marathi, Kannada, Telugu, Malayalam, Tamil, Gujarati, Odia, Bengali, and Punjabi. The IndicSBERT exhibits strong cross-lingual capabilities and performs significantly better than alternatives like LaBSE, LASER, and paraphrase-multilingual-mpnet-base-v2 on Indic cross-lingual and monolingual sentence similarity tasks. We also release monolingual SBERT models for each of the languages and show that IndicSBERT performs competitively with its monolingual counterparts. 
These models have been evaluated using embedding similarity scores and classification accuracy.", "output": "L3Cube-IndicSBERT: A simple approach for learning cross-lingual sentence representations using multilingual BERT."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Tensor-oriented multi-view subspace clustering has achieved significant strides in assessing high-order correlations and improving clustering analysis of multi-view data. Nevertheless, most of existing investigations are typically hampered by the two flaws. First, self-representation based tensor subspace learning usually induces high time and space complexity, and is limited in perceiving nonlinear local structure in the embedding space. Second, the tensor singular value decomposition (t-SVD) model redistributes each singular value equally without considering the diverse importance among them. To well cope with the issues, we propose a hyper-Laplacian regularized concept factorization (HLRCF) in low-rank tensor space for multi-view clustering. Specifically, we adopt the concept factorization to explore the latent cluster-wise representation of each view. Further, the hypergraph Laplacian regularization endows the model with the capability of extracting the nonlinear local structures in the latent space. Considering that different tensor singular values associate structural information with unequal importance, we develop a self-weighted tensor Schatten p-norm to constrain the tensor comprised of all cluster-wise representations. Notably, the tensor with smaller size greatly decreases the time and space complexity in the low-rank optimization. 
Finally, experimental results on eight benchmark datasets exhibit that HLRCF outperforms other multi-view methods, showcasing its superior performance.", "output": "Hyper-Laplacian Regularized Concept Factorization in Low-rank Tensor Space for Multi-view Clustering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated Learning with Model Distillation (FedMD) is a nascent collaborative learning paradigm, where only output logits of public datasets are transmitted as distilled knowledge, instead of passing on private model parameters that are susceptible to gradient inversion attacks, a known privacy risk in federated learning. In this paper, we found that even though sharing output logits of public datasets is safer than directly sharing gradients, there still exists a substantial risk of data exposure caused by carefully designed malicious attacks. Our study shows that a malicious server can inject a PLI (Paired-Logits Inversion) attack against FedMD and its variants by training an inversion neural network that exploits the confidence gap between the server and client models. Experiments on multiple facial recognition datasets validate that under FedMD-like schemes, by using paired server-client logits of public datasets only, the malicious server is able to reconstruct private images on all tested benchmarks with a high success rate.", "output": "Breaching FedMD: Image Recovery via Paired-Logits Inversion Attack."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Unsupervised anomaly detection (AD) is critical for a wide range of practical applications, from network security to health and medical tools. Due to the diversity of problems, no single algorithm has been found to be superior for all AD tasks. 
Choosing an algorithm, otherwise known as the Algorithm Selection Problem (ASP), has been extensively examined in supervised classification problems, through the use of meta-learning and AutoML, however, it has received little attention in unsupervised AD tasks. This research proposes a new meta-learning approach that identifies an appropriate unsupervised AD algorithm given a set of meta-features generated from the unlabelled input dataset. The performance of the proposed meta-learner is superior to the current state of the art solution. In addition, a mixed model statistical analysis has been conducted to examine the impact of the meta-learner components: the meta-model, meta-features, and the base set of AD algorithms, on the overall performance of the meta-learner. The analysis was conducted using more than 10,000 datasets, which is significantly larger than previous studies. Results indicate that a relatively small number of meta-features can be used to identify an appropriate AD algorithm, but the choice of a meta-model in the meta-learner has a considerable impact.", "output": "Constructing a meta-learner for unsupervised anomaly detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The problem of reinforcement learning is considered where the environment or the model undergoes a change. An algorithm is proposed that an agent can apply in such a problem to achieve the optimal long-time discounted reward. The algorithm is model-free and learns the optimal policy by interacting with the environment. It is shown that the proposed algorithm has strong optimality properties. The effectiveness of the algorithm is also demonstrated using simulation results. The proposed algorithm exploits a fundamental reward-detection trade-off present in these problems and uses a quickest change detection algorithm to detect the model change. 
Recommendations are provided for faster detection of model changes and for smart initialization strategies.", "output": "Reinforcement Learning with an Abrupt Model Change."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This is a tutorial paper on Recurrent Neural Network (RNN), Long Short-Term Memory Network (LSTM), and their variants. We start with a dynamical system and backpropagation through time for RNN. Then, we discuss the problems of gradient vanishing and explosion in long-term dependencies. We explain close-to-identity weight matrix, long delays, leaky units, and echo state networks for solving this problem. Then, we introduce LSTM gates and cells, history and variants of LSTM, and Gated Recurrent Units (GRU). Finally, we introduce bidirectional RNN, bidirectional LSTM, and the Embeddings from Language Model (ELMo) network, for processing a sequence in both directions.", "output": "Recurrent Neural Networks and Long Short-Term Memory Networks: Tutorial and Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advances have extended the scope of Bayesian optimization (BO) to expensive-to-evaluate black-box functions with dozens of dimensions, aspiring to unlock impactful applications, for example, in the life sciences, neural architecture search, and robotics. However, a closer examination reveals that the state-of-the-art methods for high-dimensional Bayesian optimization (HDBO) suffer from degrading performance as the number of dimensions increases or even risk failure if certain unverifiable assumptions are not met. This paper proposes BAxUS that leverages a novel family of nested random subspaces to adapt the space it optimizes over to the problem. 
This ensures high performance while removing the risk of failure, which we assert via theoretical guarantees. A comprehensive evaluation demonstrates that BAxUS achieves better results than the state-of-the-art methods for a broad set of applications.", "output": "Increasing the Scope as You Learn: Adaptive Bayesian Optimization in Nested Subspaces."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Pre-trained models (PTMs) have shown great promise in the speech and audio domain. Embeddings leveraged from these models serve as inputs for learning algorithms with applications in various downstream tasks. One such crucial task is Speech Emotion Recognition (SER) which has a wide range of applications, including dynamic analysis of customer calls, mental health assessment, and personalized language learning. PTM embeddings have helped advance SER, however, a comprehensive comparison of these PTM embeddings that consider multiple facets such as embedding model architecture, data used for pre-training, and the pre-training procedure being followed is missing. A thorough comparison of PTM embeddings will aid in the faster and more efficient development of models and enable their deployment in real-world scenarios. In this work, we exploit this research gap and perform a comparative analysis of embeddings extracted from eight speech and audio PTMs (wav2vec 2.0, data2vec, wavLM, UniSpeech-SAT, wav2clip, YAMNet, x-vector, ECAPA). We perform an extensive empirical analysis with four speech emotion datasets (CREMA-D, TESS, SAVEE, Emo-DB) by training three algorithms (XGBoost, Random Forest, FCN) on the derived embeddings. The results of our study indicate that the best performance is achieved by algorithms trained on embeddings derived from PTMs trained for speaker recognition followed by wav2clip and UniSpeech-SAT. 
This can relay that the top performance by embeddings from speaker recognition PTMs is most likely due to the model taking up information about numerous speech features such as tone, accent, pitch, and so on during its speaker recognition training. Insights from this work will assist future studies in their selection of embeddings for applications related to SER.", "output": "A Comparative Study of Pre-trained Speech and Audio Embeddings for Speech Emotion Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As ecommerce continues growing, huge investments in ML and NLP for Information Retrieval are following. While the vector space model dominated retrieval modelling in product search - even as vectorization itself greatly changed with the advent of deep learning -, our position paper argues in a contrarian fashion that program synthesis provides significant advantages for many queries and a significant number of players in the market. We detail the industry significance of the proposed approach, sketch implementation details, and address common objections drawing from our experience building a similar system at Tooso.", "output": "(Vector) Space is Not the Final Frontier: Product Search as Program Synthesis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Individuals involved in gang-related activity use mainstream social media including Facebook and Twitter to express taunts and threats as well as grief and memorializing. However, identifying the impact of gang-related activity in order to serve community member needs through social media sources has a unique set of challenges. 
This includes the difficulty of ethically identifying training data of individuals impacted by gang activity and the need to account for a non-standard language style commonly used in the tweets from these individuals. Our study provides evidence of methods where natural language processing tools can be helpful in efficiently identifying individuals who may be in need of community care resources such as counselors, conflict mediators, or academic/professional training programs. We demonstrate that our binary logistic classifier outperforms baseline standards in identifying individuals impacted by gang-related violence using a sample of gang-related tweets associated with Chicago. We ultimately found that the language of a tweet is highly relevant and that uses of ``big data'' methods or machine learning models need to better understand how language impacts the model's performance and how it discriminates among populations.", "output": "Understanding Lexical Biases when Identifying Gang-related Social Media Communications."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This short note describes the concept of guided training of deep neural networks (DNNs) to learn physically reasonable solutions. DNNs are being widely used to predict phenomena in physics and mechanics. One of the issues of DNNs is that their output does not always satisfy physical equations. One approach to consider physical equations is adding a residual of equations into the loss function; this is called physics-informed neural network (PINN). One feature of PINNs is that the physical equations and corresponding residual must be implemented as part of a neural network model. In addition, the residual does not always converge to a small value. 
The proposed model is a physics-guided generative adversarial network (PG-GAN) that uses a GAN architecture in which physical equations are used to judge whether the neural network's output is consistent with physics. The proposed method was applied to a simple problem to assess its potential usability.", "output": "Physics-guided generative adversarial network to learn physical models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Customer churn describes terminating a relationship with a business or reducing customer engagement over a specific period. Customer acquisition cost can be five to six times that of customer retention, hence investing in customers with churn risk is wise. Causal analysis of the churn model can predict whether a customer will churn in the foreseeable future and identify effects and possible causes for churn. In general, this study presents a conceptual framework to discover the confounding features that correlate with independent variables and are causally related to those dependent variables that impact churn. We combine different algorithms including the SMOTE, ensemble ANN, and Bayesian networks to address churn prediction problems on a massive and high-dimensional finance data that is usually generated in financial institutions due to employing interval-based features used in Customer Relationship Management systems. The effects of the curse and blessing of dimensionality assessed by utilising the Recursive Feature Elimination method to overcome the high dimension feature space problem. Moreover, a causal discovery performed to find possible interpretation methods to describe cause probabilities that lead to customer churn. Evaluation metrics on validation data confirm the random forest and our ensemble ANN model, with 86% accuracy, outperformed other approaches. 
Causal analysis results confirm that some independent causal variables representing the level of super guarantee contribution, account growth, and account balance amount were identified as confounding variables that cause customer churn with a high degree of belief. This article provides a real-world customer churn analysis from current status inference to future directions in local superannuation funds.", "output": "Improved Churn Causal Analysis Through Restrained High-Dimensional Feature Space Effects in Financial Institutions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Traffic congestion caused by non-recurring incidents such as vehicle crashes and debris is a key issue for Traffic Management Centers (TMCs). Clearing incidents in a timely manner is essential for improving safety and reducing delays and emissions for the traveling public. However, TMCs and other responders face a challenge in predicting the duration of incidents (until the roadway is clear), making decisions of what resources to deploy difficult. To address this problem, this research developed an analytical framework and end-to-end machine-learning solution for predicting incident duration based on information available as soon as an incident report is received. Quality predictions of incident duration can help TMCs and other responders take a proactive approach in deploying responder services such as tow trucks, maintenance crews or activating alternative routes. The predictions use a combination of classification and regression machine learning modules. The performance of the developed solution has been evaluated based on the Mean Absolute Error (MAE), or deviation from the actual incident duration as well as Area Under the Curve (AUC) and Mean Absolute Percentage Error (MAPE). 
The results showed that the framework significantly improved incident duration prediction compared to methods from previous research.", "output": "Machine learning framework for end-to-end implementation of Incident duration prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Security has always been a critical issue in machine learning (ML) applications. Due to the high cost of model training -- such as collecting relevant samples, labeling data, and consuming computing power -- model-stealing attack is one of the most fundamental but vitally important issues. When it comes to quantum computing, such a quantum machine learning (QML) model-stealing attack also exists and it is even more severe because the traditional encryption method can hardly be directly applied to quantum computation. On the other hand, due to the limited quantum computing resources, the monetary cost of training QML model can be even higher than classical ones in the near term. Therefore, a well-tuned QML model developed by a company can be delegated to a quantum cloud provider as a service to be used by ordinary users. In this case, the QML model will be leaked if the cloud provider is under attack. To address such a problem, we propose a novel framework, namely QuMoS, to preserve model security. Instead of applying encryption algorithms, we propose to distribute the QML model to multiple physically isolated quantum cloud providers. As such, even if the adversary in one provider can obtain a partial model, the information of the full model is maintained in the QML service company. Although promising, we observed an arbitrary model design under distributed settings cannot provide model security. 
We further developed a reinforcement learning-based security engine, which can automatically optimize the model design under the distributed setting, such that a good trade-off between model performance and security can be made. Experimental results on four datasets show that the model design proposed by QuMoS can achieve a close accuracy to the model designed with neural architecture search under centralized settings while providing the highest security than the baselines.", "output": "QuMoS: A Framework for Preserving Security of Quantum Machine Learning Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural Architecture Search (NAS) has become a popular method for discovering effective model architectures, especially for target hardware. As such, NAS methods that find optimal architectures under constraints are essential. In our paper, we propose LayerNAS to address the challenge of multi-objective NAS by transforming it into a combinatorial optimization problem, which effectively constrains the search complexity to be polynomial. For a model architecture with $L$ layers, we perform layerwise-search for each layer, selecting from a set of search options $\\mathbb{S}$. LayerNAS groups model candidates based on one objective, such as model size or latency, and searches for the optimal model based on another objective, thereby splitting the cost and reward elements of the search. 
This approach limits the search complexity to $O(H \\cdot |\\mathbb{S}| \\cdot L)$, where $H$ is a constant set in LayerNAS. Our experiments show that LayerNAS is able to consistently discover superior models across a variety of search spaces in comparison to strong baselines, including search spaces derived from NATS-Bench, MobileNetV2 and MobileNetV3.", "output": "LayerNAS: Neural Architecture Search in Polynomial Complexity."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the simplicity, stochastic gradient descent (SGD)-like algorithms are successful in training deep neural networks (DNNs). Among various attempts to improve SGD, weight averaging (WA), which averages the weights of multiple models, has recently received much attention in the literature. Broadly, WA falls into two categories: 1) online WA, which averages the weights of multiple models trained in parallel, is designed for reducing the gradient communication overhead of parallel mini-batch SGD, and 2) offline WA, which averages the weights of one model at different checkpoints, is typically used to improve the generalization ability of DNNs. Though online and offline WA are similar in form, they are seldom associated with each other. Besides, these methods typically perform either offline parameter averaging or online parameter averaging, but not both. In this work, we firstly attempt to incorporate online and offline WA into a general training framework termed Hierarchical Weight Averaging (HWA). By leveraging both the online and offline averaging manners, HWA is able to achieve both faster convergence speed and superior generalization performance without any fancy learning rate adjustment. Besides, we also analyze the issues faced by existing WA methods, and how our HWA address them, empirically. 
Finally, extensive experiments verify that HWA outperforms the state-of-the-art methods significantly.", "output": "Hierarchical Weight Averaging for Deep Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In dynamic interaction graphs, user-item interactions usually follow heterogeneous patterns, represented by different structural information, such as user-item co-occurrence, sequential information of user interactions and the transition probabilities of item pairs. However, the existing methods cannot simultaneously leverage all three structural information, resulting in suboptimal performance. To this end, we propose TriSIM4Rec, a triple structural information modeling method for accurate, explainable and interactive recommendation on dynamic interaction graphs. Specifically, TriSIM4Rec consists of 1) a dynamic ideal low-pass graph filter to dynamically mine co-occurrence information in user-item interactions, which is implemented by incremental singular value decomposition (SVD); 2) a parameter-free attention module to capture sequential information of user interactions effectively and efficiently; and 3) an item transition matrix to store the transition probabilities of item pairs. Then, we fuse the predictions from the triple structural information sources to obtain the final recommendation results. By analyzing the relationship between the SVD-based and the recently emerging graph signal processing (GSP)-based collaborative filtering methods, we find that the essence of SVD is an ideal low-pass graph filter, so that the interest vector space in TriSIM4Rec can be extended to achieve explainable and interactive recommendation, making it possible for users to actively break through the information cocoons. 
Experiments on six public datasets demonstrated the effectiveness of TriSIM4Rec in accuracy, explainability and interactivity.", "output": "Triple Structural Information Modelling for Accurate, Explainable and Interactive Recommendation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent graph neural networks (GNNs) with the attention mechanism have historically been limited to small-scale homogeneous graphs (HoGs). However, GNNs handling heterogeneous graphs (HeGs), which contain several entity and relation types, all have shortcomings in handling attention. Most GNNs that learn graph attention for HeGs learn either node-level or relation-level attention, but not both, limiting their ability to predict both important entities and relations in the HeG. Even the best existing method that learns both levels of attention has the limitation of assuming graph relations are independent and that its learned attention disregards this dependency association. To effectively model both multi-relational and multi-entity large-scale HeGs, we present Bi-Level Attention Graph Neural Networks (BA-GNN), scalable neural networks (NNs) that use a novel bi-level graph attention mechanism. BA-GNN models both node-node and relation-relation interactions in a personalized way, by hierarchically attending to both types of information from local neighborhood contexts instead of the global graph context. 
Rigorous experiments on seven real-world HeGs show BA-GNN consistently outperforms all baselines, and demonstrate quality and transferability of its learned relation-level attention to improve performance of other GNNs.", "output": "Bi-Level Attention Graph Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work, we report an autoencoder-based 2D representation to classify a time-series as stochastic or non-stochastic, to understand the underlying physical process. Content-aware conversion of 1D time-series to 2D representation, that simultaneously utilizes time- and frequency-domain characteristics, is proposed. An autoencoder is trained with a loss function to learn latent space (using both time- and frequency domains) representation, that is designed to be, time-invariant. Every element of the time-series is represented as a tuple with two components, one each, from latent space representation in time- and frequency-domains, forming a binary image. In this binary image, those tuples that represent the points in the time-series, together form the ``Latent Space Signature\" (LSS) of the input time-series. The obtained binary LSS images are fed to a classification network. The EfficientNetv2-S classifier is trained using 421 synthetic time-series, with fair representation from both categories. The proposed methodology is evaluated on publicly available astronomical data which are 12 distinct temporal classes of time-series pertaining to the black hole GRS 1915 + 105, obtained from RXTE satellite. Results obtained using the proposed methodology are compared with existing techniques. Concurrence in labels obtained across the classes, illustrates the efficacy of the proposed 2D representation using the latent space co-ordinates. 
The proposed methodology also outputs the confidence in the classification label.", "output": "Identifying Stochasticity in Time-Series with Autoencoder-Based Content-aware 2D Representation: Application to Black Hole Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the deployment of GPS-enabled devices and data acquisition technology, the massively generated GPS trajectory data provide a core support for advancing spatial-temporal data mining research. Nonetheless, GPS trajectories comprise personal geo-location information, rendering inevitable privacy concerns on plain data. One promising solution to this problem is trajectory generation, replacing the original data with the generated privacy-free ones. However, owing to the complex and stochastic behavior of human activities, generating high-quality trajectories is still in its infancy. To achieve the objective, we propose a diffusion-based trajectory generation (Diff-Traj) framework, effectively integrating the generation capability of the diffusion model and learning from the spatial-temporal features of trajectories. Specifically, we gradually convert real trajectories to noise through a forward trajectory noising process. Then, Diff-Traj reconstructs forged trajectories from the noise by a reverse trajectory denoising process. In addition, we design a trajectory UNet (Traj-UNet) structure to extract trajectory features for noise level prediction during the reverse process. 
Experiments on two real-world datasets show that Diff-Traj can be intuitively applied to generate high-quality trajectories while retaining the original distribution.", "output": "Diffusion Model for GPS Trajectory Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reinforcement learning agents naturally learn from extensive exploration. Exploration is costly and can be unsafe in $\\textit{safety-critical}$ domains. This paper proposes a novel framework for incorporating domain knowledge to help guide safe exploration and boost sample efficiency. Previous approaches impose constraints, such as regularisation parameters in neural networks, that rely on large sample sets and often are not suitable for safety-critical domains where agents should almost always avoid unsafe actions. In our approach, called $\\textit{System III}$, which is inspired by psychologists' notions of the brain's $\\textit{System I}$ and $\\textit{System II}$, we represent domain expert knowledge of safety in form of first-order logic. We evaluate the satisfaction of these constraints via p-norms in state vector space. In our formulation, constraints are analogous to hazards, objects, and regions of state that have to be avoided during exploration. We evaluated the effectiveness of the proposed method on OpenAI's Gym and Safety-Gym environments. 
In all tasks, including classic Control and Safety Games, we show that our approach results in safer exploration and sample efficiency.", "output": "System III: Learning with Domain Knowledge for Safety Constraints."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent work known as Segment Anything (SA) has made significant strides in pushing the boundaries of semantic segmentation into the era of foundation models. The impact of SA has sparked extremely active discussions and ushered in an encouraging new wave of developing foundation models for the diverse tasks in the Euclidean domain, such as object detection and image inpainting. Despite the promising advances led by SA, the concept has yet to be extended to the non-Euclidean graph domain. In this paper, we explore a novel Segment Non-Euclidean Anything (SNA) paradigm that strives to develop foundation models that can handle the diverse range of graph data within the non-Euclidean domain, seeking to expand the scope of SA and lay the groundwork for future research in this direction. To achieve this goal, we begin by discussing the recent achievements in foundation models associated with SA. We then shed light on the unique challenges that arise when applying the SA concept to graph analysis, which involves understanding the differences between the Euclidean and non-Euclidean domains from both the data and task perspectives. 
Motivated by these observations, we present several preliminary solutions to tackle the challenges of SNA and detail their corresponding limitations, along with several potential directions to pave the way for future SNA research. Experiments on five Open Graph Benchmark (OGB) datasets across various tasks, including graph property classification and regression, as well as multi-label prediction, demonstrate that the performance of the naive SNA solutions has considerable room for improvement, pointing towards a promising avenue for future exploration of Graph General Intelligence.", "output": "Segment Anything in Non-Euclidean Domains: Challenges and Opportunities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual representation based on covariance matrix has demonstrated its efficacy for image classification by characterising the pairwise correlation of different channels in convolutional feature maps. However, pairwise correlation will become misleading once there is another channel correlating with both channels of interest, resulting in the ``confounding'' effect. For this case, ``partial correlation'' which removes the confounding effect shall be estimated instead. Nevertheless, reliably estimating partial correlation requires solving a symmetric positive definite matrix optimisation, known as sparse inverse covariance estimation (SICE). How to incorporate this process into CNN remains an open issue. In this work, we formulate SICE as a novel structured layer of CNN. To ensure end-to-end trainability, we develop an iterative method to solve the above matrix optimisation during forward and backward propagation steps. Our work obtains a partial correlation based deep visual representation and mitigates the small sample problem often encountered by covariance matrix estimation in CNN. 
Computationally, our model can be effectively trained with GPU and works well with a large number of channels of advanced CNNs. Experiments show the efficacy and superior classification performance of our deep visual representation compared to covariance matrix based counterparts.", "output": "Learning Partial Correlation based Deep Visual Representation for Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Few-shot learning (FSL) is popular due to its ability to adapt to novel classes. Compared with inductive few-shot learning, transductive models typically perform better as they leverage all samples of the query set. The two existing classes of methods, prototype-based and graph-based, have the disadvantages of inaccurate prototype estimation and sub-optimal graph construction with kernel functions, respectively. In this paper, we propose a novel prototype-based label propagation to solve these issues. Specifically, our graph construction is based on the relation between prototypes and samples rather than between samples. As prototypes are being updated, the graph changes. We also estimate the label of each prototype instead of considering a prototype be the class centre. On mini-ImageNet, tiered-ImageNet, CIFAR-FS and CUB datasets, we show the proposed method outperforms other state-of-the-art methods in transductive FSL and semi-supervised FSL when some unlabeled data accompanies the novel few-shot task.", "output": "Transductive Few-shot Learning with Prototype-based Label Propagation by Iterative Graph Refinement."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Interpreting remote sensing imagery enables numerous downstream applications ranging from land-use planning to deforestation monitoring. 
Robustly classifying this data is challenging due to the Earth's geographic diversity. While many distinct satellite and aerial image classification datasets exist, there is yet to be a benchmark curated that suitably covers this diversity. In this work, we introduce SATellite ImageNet (SATIN), a metadataset curated from 27 existing remotely sensed datasets, and comprehensively evaluate the zero-shot transfer classification capabilities of a broad range of vision-language (VL) models on SATIN. We find SATIN to be a challenging benchmark-the strongest method we evaluate achieves a classification accuracy of 52.0%. We provide a $href{ to guide and track the progress of VL models in this important domain.", "output": "SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Wasserstein Barycenter Problem (WBP) has recently received much attention in the field of artificial intelligence. In this paper, we focus on the decentralized setting for WBP and propose an asynchronous decentralized algorithm (A$^2$DWB). A$^2$DWB is induced by a novel stochastic block coordinate descent method to optimize the dual of entropy regularized WBP. To our knowledge, A$^2$DWB is the first asynchronous decentralized algorithm for WBP. 
Unlike its synchronous counterpart, it updates local variables in a manner that only relies on the stale neighbor information, which effectively alleviate the waiting overhead, and thus substantially improve the time efficiency. Empirical results validate its superior performance compared to the latest synchronous algorithm.", "output": "An Asynchronous Decentralized Algorithm for Wasserstein Barycenter Problem."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a rigorous framework for stochastic cell transmission models for general traffic networks. The performance of traffic systems is evaluated based on preference functionals and acceptable designs. The numerical implementation combines simulation, Gaussian process regression, and a stochastic exploration procedure. The approach is illustrated in two case studies.", "output": "Stochastic Cell Transmission Models of Traffic Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph contrastive learning defines a contrastive task to pull similar instances close and push dissimilar instances away. It learns discriminative node embeddings without supervised labels, which has aroused increasing attention in the past few years. Nevertheless, existing methods of graph contrastive learning ignore the differences between diverse semantics existed in graphs, which learn coarse-grained node embeddings and lead to sub-optimal performances on downstream tasks. To bridge this gap, we propose a novel Fine-grained Semantics enhanced Graph Contrastive Learning (FSGCL) in this paper. Concretely, FSGCL first introduces a motif-based graph construction, which employs graph motifs to extract diverse semantics existed in graphs from the perspective of input data. 
Then, the semantic-level contrastive task isexplored to further enhance the utilization of fine-grained semantics from theperspective of model training. Experiments on five real-world datasetsdemonstrate the superiority of our proposed FSGCL over state-of-the-artmethods. To make the results reproducible, we will make our codes public onGitHub after this paper is accepted.", "output": "Capturing Fine-grained Semantics in Contrastive Graph Representation Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep equilibrium models (DEQs) have proven to be very powerful for learningdata representations. The idea is to replace traditional (explicit) feedforwardneural networks with an implicit fixed-point equation, which allows to decouplethe forward and backward passes. In particular, training DEQ layers becomesvery memory-efficient via the implicit function theorem. However,backpropagation through DEQ layers still requires solving an expensiveJacobian-based equation. In this paper, we introduce a simple but effectivestrategy to avoid this computational burden. Our method relies on the Jacobianapproximation of Broyden's method after the forward pass to compute thegradients during the backward pass. Experiments show that simply re-using thisapproximation can significantly speed up the training while not causing anyperformance degradation.", "output": "Efficient Training of Deep Equilibrium Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Nowadays, algorithms with fast convergence, small memory footprints, and lowper-iteration complexity are particularly favorable for artificial intelligenceapplications. 
In this paper, we propose a doubly stochastic algorithm with anovel accelerating multi-momentum technique to solve large scale empirical riskminimization problem for learning tasks. While enjoying a provably superiorconvergence rate, in each iteration, such algorithm only accesses a mini batchof samples and meanwhile updates a small block of variable coordinates, whichsubstantially reduces the amount of memory reference when both the massivesample size and ultra-high dimensionality are involved. Empirical studies onhuge scale datasets are conducted to illustrate the efficiency of our method inpractice.", "output": "Accelerated Doubly Stochastic Gradient Algorithm for Large-scale Empirical Risk Minimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deployment of solutions based on TinyML requires meeting several challenges.These include hardware heterogeneity, microprocessor (MCU) architectures, andresource availability constraints. Another challenge is the variety ofoperating systems for MCU, limited memory management implementations andlimited software interoperability between devices. A number of these challengesare solved by dedicated programming libraries and the ability to compile codefor specific devices. Nevertheless, the challenge discussed in the paper is theissue of network connectivity for such solutions. We point out that moreemphasis should be placed on standard protocols, interoperability of solutionsand security. 
Finally, the paper discusses how the LwM2M protocol can solve theidentified challenges related to network connectivity and interoperability.", "output": "Device management and network connectivity as missing elements in TinyML landscape."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, deep learning-based compressed sensing (CS) has achieved greatsuccess in reducing the sampling and computational cost of sensing systems andimproving the reconstruction quality. These approaches, however, largelyoverlook the issue of the computational cost; they rely on complex structuresand task-specific operator designs, resulting in extensive storage and highenergy consumption in CS imaging systems. In this paper, we propose alightweight but effective deep neural network based on recurrent learning toachieve a sustainable CS system; it requires a smaller number of parameters butobtains high-quality reconstructions. Specifically, our proposed networkconsists of an initial reconstruction sub-network and a residual reconstructionsub-network. While the initial reconstruction sub-network has a hierarchicalstructure to progressively recover the image, reducing the number ofparameters, the residual reconstruction sub-network facilitates recurrentresidual feature extraction via recurrent learning to perform both featurefusion and deep reconstructions across different scales. In addition, we alsodemonstrate that, after the initial reconstruction, feature maps with reducedsizes are sufficient to recover the residual information, and thus we achieveda significant reduction in the amount of memory required. Extensive experimentsillustrate that our proposed model can achieve a better reconstruction qualitythan existing state-of-the-art CS algorithms, and it also has a smaller numberof network parameters than these algorithms. 
Our source codes are available at:", "output": "A Lightweight Recurrent Learning Network for Sustainable Compressed Sensing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper studies semi-supervised graph classification, a crucial task witha wide range of applications in social network analysis and bioinformatics.Recent works typically adopt graph neural networks to learn graph-levelrepresentations for classification, failing to explicitly leverage featuresderived from graph topology (e.g., paths). Moreover, when labeled data isscarce, these methods are far from satisfactory due to their insufficienttopology exploration of unlabeled data. We address the challenge by proposing anovel semi-supervised framework called Twin Graph Neural Network (TGNN). Toexplore graph structural information from complementary views, our TGNN has amessage passing module and a graph kernel module. To fully utilize unlabeleddata, for each module, we calculate the similarity of each unlabeled graph toother labeled graphs in the memory bank and our consistency loss encouragesconsistency between two similarity distributions in different embedding spaces.The two twin modules collaborate with each other by exchanging instancesimilarity knowledge to fully explore the structure information of both labeledand unlabeled data. 
We evaluate our TGNN on various public datasets and show that it achieves strong performance.", "output": "TGNN: A Joint Semi-supervised Framework for Graph-level Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks based on batch normalization and ReLU-like activation functions can experience instability during the early stages of training due to the high gradient induced by temporal gradient explosion. We explain how ReLU reduces variance more than expected, and how batch normalization amplifies the gradient during recovery, which causes gradient explosion while forward propagation remains stable. Additionally, we discuss how the dynamics of a deep neural network change during training and how the correlation between inputs can alleviate this problem. Lastly, we propose a better adaptive learning rate algorithm inspired by second-order optimization algorithms, which outperforms existing learning rate scaling methods in large batch training and can also replace WarmUp in small batch training.", "output": "The Disharmony Between BN and ReLU Causes Gradient Explosion, but is Offset by the Correlation Between Activations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a novel technique for analyzing adaptive sampling called the Simulator. Our approach differs from the existing methods by considering not how much information could be gathered by any fixed sampling strategy, but how difficult it is to distinguish a good sampling strategy from a bad one given the limited amount of data collected up to any given time. This change of perspective allows us to match the strength of both Fano and change-of-measure techniques, without succumbing to the limitations of either method.
For concreteness, we apply our techniques to a structured multi-arm bandit problem in the fixed-confidence pure exploration setting, where we show that the constraints on the means imply a substantial gap between the moderate-confidence sample complexity, and the asymptotic sample complexity as $\\delta \\to 0$ found in the literature. We also prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log-factors. Moreover, our lower bounds zero-in on the number of times each individual arm needs to be pulled, uncovering new phenomena which are drowned out in the aggregate sample complexity. Our new analysis inspires a simple and near-optimal algorithm for the best-arm and top-k identification, the first practical algorithm of its kind for the latter problem which removes extraneous log factors, and outperforms the state-of-the-art in experiments.", "output": "The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work, we study robust deep learning against abnormal training data from the perspective of example weighting built in empirical loss functions, i.e., gradient magnitude with respect to logits, an angle that is not thoroughly studied so far. Consequently, we have two key findings: (1) Mean Absolute Error (MAE) Does Not Treat Examples Equally. We present new observations and insightful analysis about MAE, which is theoretically proved to be noise-robust. First, we reveal its underfitting problem in practice. Second, we analyse that MAE's noise-robustness is from emphasising on uncertain examples instead of treating training samples equally, as claimed in prior work. (2) The Variance of Gradient Magnitude Matters.
We propose an effectiveand simple solution to enhance MAE's fitting ability while preserving itsnoise-robustness. Without changing MAE's overall weighting scheme, i.e., whatexamples get higher weights, we simply change its weighting variancenon-linearly so that the impact ratio between two examples are adjusted. Oursolution is termed Improved MAE (IMAE). We prove IMAE's effectiveness usingextensive experiments: image classification under clean labels, synthetic labelnoise, and real-world unknown noise. We conclude IMAE is superior to CCE, themost popular loss for training DNNs.", "output": "IMAE for Noise-Robust Learning: Mean Absolute Error Does Not Treat Examples Equally and Gradient Magnitude's Variance Matters."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce and study a learning theory which is roughly automatic, that is,it does not require but a minimum of initial programming, and is based on thepotential computational phenomenon of self-reference, (i.e. the potentialability of an algorithm to have its program as an input).The conclusions agree with scientific findings in both biology andneuroscience and provide a plethora of explanations both (in conjunction withDarwinism) about evolution, as well as for the functionality and learningcapabilities of human brain, (most importantly), as we perceive them inourselves.", "output": "Recursion, evolution and conscious self."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We investigate gradient descent training of wide neural networks and thecorresponding implicit bias in function space. 
For univariate regression, we show that the solution of training a width-$n$ shallow ReLU network is within $n^{- 1/2}$ of the function which fits the training data and whose difference from the initial function has the smallest 2-norm of the second derivative weighted by a curvature penalty that depends on the probability distribution that is used to initialize the network parameters. We compute the curvature penalty function explicitly for various common initialization procedures. For instance, asymmetric initialization with a uniform distribution yields a constant curvature penalty, and thence the solution function is the natural cubic spline interpolation of the training data. For stochastic gradient descent we obtain the same implicit bias result. We obtain a similar result for different activation functions. For multivariate regression we show an analogous result, whereby the second derivative is replaced by the Radon transform of a fractional Laplacian. For initialization schemes that yield a constant penalty function, the solutions are polyharmonic splines. Moreover, we show that the training trajectories are captured by trajectories of smoothing splines with decreasing regularization strength.", "output": "Implicit Bias of Gradient Descent for Mean Squared Error Regression with Two-Layer Wide Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We use the lens of weak signal asymptotics to study a class of sequentially randomized experiments, including those that arise in solving multi-armed bandit problems. In an experiment with $n$ time steps, we let the mean reward gaps between actions scale to the order $1/\\sqrt{n}$ so as to preserve the difficulty of the learning task as $n$ grows.
In this regime, we show that thesample paths of a class of sequentially randomized experiments -- adapted tothis scaling regime and with arm selection probabilities that vary continuouslywith state -- converge weakly to a diffusion limit, given as the solution to astochastic differential equation. The diffusion limit enables us to deriverefined, instance-specific characterization of stochastic dynamics, and toobtain several insights on the regret and belief evolution of a number ofsequential experiments including Thompson sampling (but not UCB, which does notsatisfy our continuity assumption). We show that all sequential experimentswhose randomization probabilities have a Lipschitz-continuous dependence on theobserved data suffer from sub-optimal regret performance when the reward gapsare relatively large. Conversely, we find that a version of Thompson samplingwith an asymptotically uninformative prior variance achieves near-optimalinstance-specific regret scaling, including with large reward gaps, but thesegood regret properties come at the cost of highly unstable posterior beliefs.", "output": "Weak Signal Asymptotics for Sequentially Randomized Experiments."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In many practical control applications, the performance level of aclosed-loop system degrades over time due to the change of plantcharacteristics. Thus, there is a strong need for redesigning a controllerwithout going through the system modeling process, which is often difficult forclosed-loop systems. 
Reinforcement learning (RL) is one of the promisingapproaches that enable model-free redesign of optimal controllers for nonlineardynamical systems based only on the measurement of the closed-loop system.However, the learning process of RL usually requires a considerable number oftrial-and-error experiments using the poorly controlled system that mayaccumulate wear on the plant. To overcome this limitation, we propose amodel-free two-step design approach that improves the transient learningperformance of RL in an optimal regulator redesign problem for unknownnonlinear systems. Specifically, we first design a linear control law thatattains some degree of control performance in a model-free manner, and then,train the nonlinear optimal control law with online RL by using the designedlinear control law in parallel. We introduce an offline RL algorithm for thedesign of the linear control law and theoretically guarantee its convergence tothe LQR controller under mild assumptions. Numerical simulations show that theproposed approach improves the transient learning performance and efficiency inhyperparameter tuning of RL.", "output": "Two-step reinforcement learning for model-free redesign of nonlinear optimal regulator."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In learning theory, a standard assumption is that the data is generated froma finite mixture model. But what happens when the number of components is notknown in advance? The problem of estimating the number of components, alsocalled model selection, is important in its own right but there are essentiallyno known efficient algorithms with provable guarantees let alone ones that cantolerate adversarial corruptions. In this work, we study the problem of robustmodel selection for univariate Gaussian mixture models (GMMs). 
Given $\\textsf{poly}(k/\\epsilon)$ samples from a distribution that is $\\epsilon$-close in TV distance to a GMM with $k$ components, we can construct a GMM with $\\widetilde{O}(k)$ components that approximates the distribution to within $\\widetilde{O}(\\epsilon)$ in $\\textsf{poly}(k/\\epsilon)$ time. Thus we are able to approximately determine the minimum number of components needed to fit the distribution within a logarithmic factor. Prior to our work, the only known algorithms for learning arbitrary univariate GMMs either output significantly more than $k$ components (e.g. $k/\\epsilon^2$ components for kernel density estimates) or run in time exponential in $k$. Moreover, by adapting our techniques we obtain similar results for reconstructing Fourier-sparse signals.", "output": "Robust Model Selection and Nearly-Proper Learning for GMMs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While maximizing deep neural networks' (DNNs') acceleration efficiency requires a joint search/design of three different yet highly coupled aspects, including the networks, bitwidths, and accelerators, the challenges associated with such a joint search have not yet been fully understood and addressed. The key challenges include (1) the dilemma of whether to explode the memory consumption due to the huge joint space or achieve sub-optimal designs, (2) the discrete nature of the accelerator design space that is coupled yet different from that of the networks and bitwidths, and (3) the chicken and egg problem associated with network-accelerator co-search, i.e., co-search requires operation-wise hardware cost, which is lacking during search as the optimal accelerator depending on the whole network is still unknown during search.
Totackle these daunting challenges towards optimal and fast development of DNNaccelerators, we propose a framework dubbed Auto-NBA to enable jointlysearching for the Networks, Bitwidths, and Accelerators, by efficientlylocalizing the optimal design within the huge joint design space for eachtarget dataset and acceleration specification. Our Auto-NBA integrates aheterogeneous sampling strategy to achieve unbiased search with constant memoryconsumption, and a novel joint-search pipeline equipped with a genericdifferentiable accelerator search engine. Extensive experiments and ablationstudies validate that both Auto-NBA generated networks and acceleratorsconsistently outperform state-of-the-art designs (includingco-search/exploration techniques, hardware-aware NAS methods, and DNNaccelerators), in terms of search time, task accuracy, and acceleratorefficiency. Our codes are available at: ", "output": "Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data missingness and quality are common problems in machine learning,especially for high-stakes applications such as healthcare. Developers oftentrain machine learning models on carefully curated datasets using only highquality data; however, this reduces the utility of such models in productionenvironments. We propose a novel neural network modification to mitigate theimpacts of low quality and missing data which involves replacing the fixedweights of a fully-connected layer with a function of an additional input. Thisis inspired from neuromodulation in biological neural networks where the cortexcan up- and down-regulate inputs based on their reliability and the presence ofother data. 
In testing, with reliability scores as a modulating signal, modelswith modulating layers were found to be more robust against degradation of dataquality, including additional missingness. These models are superior toimputation as they save on training time by completely skipping the imputationprocess and further allow the introduction of other data quality measures thatimputation cannot handle. Our results suggest that explicitly accounting forreduced information quality with a modulating fully connected layer can enablethe deployment of artificial intelligence systems in real-time applications.", "output": "A Modulation Layer to Increase Neural Network Robustness Against Data Quality Issues."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing libraries for supervised classification implement techniques thatare based on empirical risk minimization and utilize surrogate losses. Wepresent MRCpy library that implements minimax risk classifiers (MRCs) that arebased on robust risk minimization and can utilize 0-1-loss. Such techniquesgive rise to a manifold of classification methods that can provide tight boundson the expected loss. MRCpy provides a unified interface for different variantsof MRCs and follows the standards of popular Python libraries. The presentedlibrary also provides implementation for popular techniques that can be seen asMRCs such as L1-regularized logistic regression, zero-one adversarial, andmaximum entropy machines. In addition, MRCpy implements recent feature mappingssuch as Fourier, ReLU, and threshold features. 
The library is designed with an object-oriented approach that facilitates collaborators and users.", "output": "MRCpy: A Library for Minimax Risk Classifiers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The successes of modern deep machine learning methods are founded on their ability to transform inputs across multiple layers to build good high-level representations. It is therefore critical to understand this process of representation learning. However, standard theoretical approaches (formally NNGPs) involving infinite width limits eliminate representation learning. We therefore develop a new infinite width limit, the Bayesian representation learning limit, that exhibits representation learning mirroring that in finite-width models, yet at the same time, retains some of the simplicity of standard infinite-width limits. In particular, we show that Deep Gaussian processes (DGPs) in the Bayesian representation learning limit have exactly multivariate Gaussian posteriors, and the posterior covariances can be obtained by optimizing an interpretable objective combining a log-likelihood to improve performance with a series of KL-divergences which keep the posteriors close to the prior. We confirm these results experimentally in wide but finite DGPs. Next, we introduce the possibility of using this limit and objective as a flexible, deep generalisation of kernel methods, that we call deep kernel machines (DKMs). Like most naive kernel methods, DKMs scale cubically in the number of datapoints. We therefore use methods from the Gaussian process inducing point literature to develop a sparse DKM that scales linearly in the number of datapoints.
Finally, we extend these approaches to NNs (which havenon-Gaussian posteriors) in the Appendices.", "output": "A theory of representation learning in deep neural networks gives a deep generalisation of kernel methods."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a streaming framework for analyzing stochasticapproximation/optimization problems. This streaming framework is analogous tosolving optimization problems using time-varying mini-batches that arrivesequentially. We provide non-asymptotic convergence rates of variousgradient-based algorithms; this includes the famous Stochastic Gradient (SG)descent (a.k.a. Robbins-Monro algorithm), mini-batch SG and time-varyingmini-batch SG algorithms, as well as their iterated averages (a.k.a.Polyak-Ruppert averaging). We show i) how to accelerate convergence by choosingthe learning rate according to the time-varying mini-batches, ii) thatPolyak-Ruppert averaging achieves optimal convergence in terms of attaining theCramer-Rao lower bound, and iii) how time-varying mini-batches together withPolyak-Ruppert averaging can provide variance reduction and accelerateconvergence simultaneously, which is advantageous for many learning problems,such as online, sequential, and large-scale learning. We further demonstratethese favorable effects for various time-varying mini-batches.", "output": "Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Streaming Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Object detection in autonomous driving applications implies that thedetection and tracking of semantic objects are commonly native to urban drivingenvironments, as pedestrians and vehicles. 
One of the major challenges in state-of-the-art deep-learning based object detection is false positives which occur with overconfident scores. This is highly undesirable in autonomous driving and other critical robotic-perception domains because of safety concerns. This paper proposes an approach to alleviate the problem of overconfident predictions by introducing a novel probabilistic layer to deep object detection networks in testing. The suggested approach avoids the traditional Sigmoid or Softmax prediction layer which often produces overconfident predictions. It is demonstrated that the proposed technique reduces overconfidence in the false positives without degrading the performance on the true positives. The approach is validated on the 2D-KITTI object detection through the YOLOV4 and SECOND (Lidar-based detector). The proposed approach enables interpretable probabilistic predictions without the requirement of re-training the network and therefore is very practical.", "output": "Probabilistic Approach for Road-Users Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Among various distance functions for graphs, graph and subgraph edit distances (GED and SED respectively) are two of the most popular and expressive measures. Unfortunately, exact computations for both are NP-hard. To overcome this computational bottleneck, neural approaches to learn and predict edit distance in polynomial time have received much interest. While considerable progress has been made, there exist limitations that need to be addressed. First, the efficacy of an approximate distance function lies not only in its approximation accuracy, but also in the preservation of its properties. To elaborate, although GED is a metric, its neural approximations do not provide such a guarantee.
This prohibits their usage in higher order tasks that rely onmetric distance functions, such as clustering or indexing. Second, severalexisting frameworks for GED do not extend to SED due to SED being asymmetric.In this work, we design a novel siamese graph neural network called GREED,which through a carefully crafted inductive bias, learns GED and SED in aproperty-preserving manner. Through extensive experiments across 10 real graphdatasets containing up to 7 million edges, we establish that GREED is not onlymore accurate than the state of the art, but also up to 3 orders of magnitudefaster. Even more significantly, due to preserving the triangle inequality, thegenerated embeddings are indexable and consequently, even in a CPU-onlyenvironment, GREED is up to 50 times faster than GPU-powered baselines forgraph / subgraph retrieval.", "output": "GREED: A Neural Framework for Learning Graph Distance Functions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep reinforcement learning has gathered much attention recently. Impressiveresults were achieved in activities as diverse as autonomous driving, gameplaying, molecular recombination, and robotics. In all these fields, computerprograms have taught themselves to solve difficult problems. They have learnedto fly model helicopters and perform aerobatic manoeuvers such as loops androlls. In some applications they have even become better than the best humans,such as in Atari, Go, poker and StarCraft. The way in which deep reinforcementlearning explores complex environments reminds us of how children learn, byplayfully trying out things, getting feedback, and trying again. The computerseems to truly possess aspects of human learning; this goes to the heart of thedream of artificial intelligence. 
The successes in research have not goneunnoticed by educators, and universities have started to offer courses on thesubject. The aim of this book is to provide a comprehensive overview of thefield of deep reinforcement learning. The book is written for graduate studentsof artificial intelligence, and for researchers and practitioners who wish tobetter understand deep reinforcement learning methods and their challenges. Weassume an undergraduate-level of understanding of computer science andartificial intelligence; the programming language of this book is Python. Wedescribe the foundations, the algorithms and the applications of deepreinforcement learning. We cover the established model-free and model-basedmethods that form the basis of the field. Developments go quickly, and we alsocover advanced topics: deep multi-agent reinforcement learning, deephierarchical reinforcement learning, and deep meta learning.", "output": "Deep Reinforcement Learning, a textbook."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Optimization is a ubiquitous modeling tool and is often deployed in settingswhich repeatedly solve similar instances of the same problem. Amortizedoptimization methods use learning to predict the solutions to problems in thesesettings, exploiting the shared structure between similar problem instances.These methods have been crucial in variational inference and reinforcementlearning and are capable of solving optimization problems many orders ofmagnitudes times faster than traditional optimization methods that do not useamortization. 
This tutorial presents an introduction to the amortizedoptimization foundations behind these advancements and overviews theirapplications in variational inference, sparse coding, gradient-basedmeta-learning, control, reinforcement learning, convex optimization, optimaltransport, and deep equilibrium networks. The source code for this tutorial isavailable at", "output": "Tutorial on amortized optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Over the years, most research towards defenses against adversarial attacks onmachine learning models has been in the image recognition domain. The ML-basedmalware detection domain has received less attention despite its importance.Moreover, most work exploring these defenses has focused on several methods butwith no strategy when applying them. In this paper, we introduce StratDef,which is a strategic defense system based on a moving target defense approach.We overcome challenges related to the systematic construction, selection, andstrategic use of models to maximize adversarial robustness. StratDefdynamically and strategically chooses the best models to increase theuncertainty for the attacker while minimizing critical aspects in theadversarial ML domain, like attack transferability. We provide the firstcomprehensive evaluation of defenses against adversarial attacks on machinelearning for malware detection, where our threat model explores differentlevels of threat, attacker knowledge, capabilities, and attack intensities. Weshow that StratDef performs better than other defenses even when facing thepeak adversarial threat. 
We also show that, of the existing defenses, only afew adversarially-trained models provide substantially better protection thanjust using vanilla models but are still outperformed by StratDef.", "output": "StratDef: Strategic Defense Against Adversarial Attacks in ML-based Malware Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Standard Bayesian learning is known to have suboptimal generalizationcapabilities under misspecification and in the presence of outliers. PAC-Bayestheory demonstrates that the free energy criterion minimized by Bayesianlearning is a bound on the generalization error for Gibbs predictors (i.e., forsingle models drawn at random from the posterior) under the assumption ofsampling distributions uncontaminated by outliers. This viewpoint provides ajustification for the limitations of Bayesian learning when the model ismisspecified, requiring ensembling, and when data is affected by outliers. Inrecent work, PAC-Bayes bounds -- referred to as PAC$^m$ -- were derived tointroduce free energy metrics that account for the performance of ensemblepredictors, obtaining enhanced performance under misspecification. This workpresents a novel robust free energy criterion that combines the generalizedlogarithm score function with PAC$^m$ ensemble bounds. 
The proposed free energytraining criterion produces predictive distributions that are able toconcurrently counteract the detrimental effects of misspecification -- withrespect to both likelihood and prior distribution -- and outliers.", "output": "Robust PAC$^m$: Training Ensemble Models Under Misspecification and Outliers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural networks have been successfully employed in various domains such asclassification, regression and clustering, etc. Generally, the back propagation(BP) based iterative approaches are used to train the neural networks, however,it results in the issues of local minima, sensitivity to learning rate and slowconvergence. To overcome these issues, randomization based neural networks suchas random vector functional link (RVFL) network have been proposed. RVFL modelhas several characteristics such as fast training speed, direct links, simplearchitecture, and universal approximation capability, that make it a viablerandomized neural network. This article presents the first comprehensive reviewof the evolution of RVFL model, which can serve as the extensive summary forthe beginners as well as practitioners. We discuss the shallow RVFLs, ensembleRVFLs, deep RVFLs and ensemble deep RVFL models. The variations, improvementsand applications of RVFL models are discussed in detail. Moreover, we discussthe different hyperparameter optimization techniques followed in the literatureto improve the generalization performance of the RVFL model. 
Finally, we givepotential future research directions/opportunities that can inspire theresearchers to improve the RVFL's architecture and learning algorithm further.", "output": "Random vector functional link network: recent developments, applications, and future directions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce compositional soft prompting (CSP), a parameter-efficientlearning technique to improve the zero-shot compositionality of large-scalepretrained vision-language models (VLMs) like CLIP. We develop CSP forcompositional zero-shot learning, the task of predicting unseenattribute-object compositions (e.g., old cat and young tiger). VLMs have aflexible text encoder that can represent arbitrary classes as natural languageprompts but they often underperform task-specific architectures on thecompositional zero-shot benchmark datasets. CSP treats the attributes andobjects that define classes as learnable tokens of vocabulary. During training,the vocabulary is tuned to recognize classes that compose tokens in multipleways (e.g., old cat and white cat). At test time, we recompose the learnedattribute-object vocabulary in new combinations to recognize novel classes. Weshow that CSP outperforms the CLIP on benchmark datasets by an average of 10.9percentage points on AUC. CSP also outperforms CoOp, a soft prompting methodthat fine-tunes the prefix context tokens, by an average of 5.8 percentagepoints on AUC. We perform additional experiments to show that CSP improvesgeneralization to higher-order attribute-attribute-object compositions (e.g.,old white cat) and combinations of pretrained attributes and fine-tunedobjects. 
The code is available at ", "output": "Learning to Compose Soft Prompts for Compositional Zero-Shot Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a simple yet powerful extension of Bayesian Additive RegressionTrees which we name Hierarchical Embedded BART (HE-BART). The model allows forrandom effects to be included at the terminal node level of a set of regressiontrees, making HE-BART a non-parametric alternative to mixed effects modelswhich avoids the need for the user to specify the structure of the randomeffects in the model, whilst maintaining the prediction and uncertaintycalibration properties of standard BART. Using simulated and real-worldexamples, we demonstrate that this new extension yields superior predictionsfor many of the standard mixed effects models' example data sets, and yet stillprovides consistent estimates of the random effect variances. In a futureversion of this paper, we outline its use in larger, more advanced data setsand structures.", "output": "Hierarchical Embedded Bayesian Additive Regression Trees."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Battery cycle life prediction using early degradation data has many potentialapplications throughout the battery product life cycle. For that reason,various data-driven methods have been proposed for point prediction of batterycycle life with minimum knowledge of the battery degradation mechanisms.However, managing the rapidly increasing amounts of batteries at end-of-lifewith lower economic and technical risk requires prediction of cycle life withquantified uncertainty, which is still lacking. 
The interpretability (i.e., thereason for high prediction accuracy) of these advanced data-driven methods isalso worthy of investigation. Here, a Quantile Regression Forest (QRF) model,having the advantage of not assuming any specific distribution of cycle life,is introduced to make cycle life range prediction with uncertainty quantifiedas the width of the prediction interval, in addition to point predictions withhigh accuracy. The hyperparameters of the QRF model are optimized with aproposed alpha-logistic-weighted criterion so that the coverage probabilitiesassociated with the prediction intervals are calibrated. The interpretabilityof the final QRF model is explored with two global model-agnostic methods,namely permutation importance and partial dependence plot.", "output": "Interpretable Battery Cycle Life Range Prediction Using Early Degradation Data at Cell Level."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The configuration of radar networks is a complex problem that is oftenperformed manually by experts with the help of a simulator. Different numbersand types of radars as well as different locations that the radars shall covergive rise to different instances of the radar configuration problem. The exactmodeling of these instances is complex, as the quality of the configurationsdepends on a large number of parameters, on internal radar processing, and onthe terrains on which the radars need to be placed. Classic optimizationalgorithms can therefore not be applied to this problem, and we rely on\"trial-and-error\" black-box approaches.In this paper, we study the performances of 13 black-box optimizationalgorithms on 153 radar network configuration problem instances. The algorithmsperform considerably better than human experts. 
Their ranking, however, dependson the budget of configurations that can be evaluated and on the elevationprofile of the location. We therefore also investigate automated algorithmselection approaches. Our results demonstrate that a pipeline that extractsinstance features from the elevation of the terrain performs on par with theclassical, far more expensive approach that extracts features from theobjective function.", "output": "Automated Algorithm Selection for Radar Network Configuration."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Variational quantum algorithms are the leading candidate for advantage onnear-term quantum hardware. When training a parametrized quantum circuit inthis setting to solve a specific problem, the choice of ansatz is one of themost important factors that determines the trainability and performance of thealgorithm. In quantum machine learning (QML), however, the literature onansatzes that are motivated by the training data structure is scarce. In thiswork, we introduce an ansatz for learning tasks on weighted graphs thatrespects an important graph symmetry, namely equivariance under nodepermutations. We evaluate the performance of this ansatz on a complex learningtask, namely neural combinatorial optimization, where a machine learning modelis used to learn a heuristic for a combinatorial optimization problem. 
Weanalytically and numerically study the performance of our model, and ourresults strengthen the notion that symmetry-preserving ansatzes are a key tosuccess in QML.", "output": "Equivariant quantum circuits for learning on weighted graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-hop logical reasoning over knowledge graph (KG) plays a fundamentalrole in many artificial intelligence tasks. Recent complex query embedding(CQE) methods for reasoning focus on static KGs, while temporal knowledgegraphs (TKGs) have not been fully explored. Reasoning over TKGs has twochallenges: 1. The query should answer entities or timestamps; 2. The operatorsshould consider both set logic on entity set and temporal logic on timestampset. To bridge this gap, we define the multi-hop logical reasoning problem onTKGs. With generated three datasets, we propose the first temporal CQE namedTemporal Feature-Logic Embedding framework (TFLEX) to answer the temporalcomplex queries. We utilize vector logic to compute the logic part of TemporalFeature-Logic embeddings, thus naturally modeling all First-Order Logic (FOL)operations on entity set. In addition, our framework extends vector logic ontimestamp set to cope with three extra temporal operators (After, Before andBetween). 
Experiments on numerous query patterns demonstrate the effectivenessof our method.", "output": "TFLEX: Temporal Feature-Logic Embedding Framework for Complex Reasoning over Temporal Knowledge Graph."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep reinforcement learning (DRL) gives the promise that an agent learns goodpolicy from high-dimensional information, whereas representation learningremoves irrelevant and redundant information and retains pertinent information.In this work, we demonstrate that the learned representation of the $Q$-networkand its target $Q$-network should, in theory, satisfy a favorabledistinguishable representation property. Specifically, there exists an upperbound on the representation similarity of the value functions of two adjacenttime steps in a typical DRL setting. However, through illustrative experiments,we show that the learned DRL agent may violate this property and lead to asub-optimal policy. Therefore, we propose a simple yet effective regularizercalled Policy Evaluation with Easy Regularization on Representation (PEER),which aims to maintain the distinguishable representation property via explicitregularization on internal representations. And we provide the convergence rateguarantee of PEER. Implementing PEER requires only one line of code. Ourexperiments demonstrate that incorporating PEER into DRL can significantlyimprove performance and sample efficiency. Comprehensive experiments show thatPEER achieves state-of-the-art performance on all 4 environments on PyBullet, 9out of 12 tasks on DMControl, and 19 out of 26 games on Atari. To the best ofour knowledge, PEER is the first work to study the inherent representationproperty of Q-network and its target. 
Our code is available at", "output": "Frustratingly Easy Regularization on Representation Can Boost Deep Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a new type of multi-agent interactive classifier that providesprovable interpretability guarantees even for complex agents such as neuralnetworks. These guarantees consist of bounds on the mutual information of thefeatures selected by this classifier. Our results are inspired by theMerlin-Arthur protocol from Interactive Proof Systems and express these boundsin terms of measurable metrics such as soundness and completeness. Compared toexisting interactive setups we do not rely on optimal agents or on theassumption that features are distributed independently. Instead, we use therelative strength of the agents as well as the new concept of AsymmetricFeature Correlation which captures the precise kind of correlations that makeinterpretability guarantees difficult. %relates the information carried by setsof features to one of the individual features. We test our results throughnumerical experiments on two small-scale datasets where high mutual informationcan be verified explicitly.", "output": "Formal Interpretability with Merlin-Arthur Classifiers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Inference tasks in signal processing are often characterized by theavailability of reliable statistical modeling with some missinginstance-specific parameters. One conventional approach uses data to estimatethese missing parameters and then infers based on the estimated model.Alternatively, data can also be leveraged to directly learn the inferencemapping end-to-end. 
These approaches for combining partially-known statisticalmodels and data in inference are related to the notions of generative anddiscriminative models used in the machine learning literature, typicallyconsidered in the context of classifiers. The goal of this lecture note is tointroduce the concepts of generative and discriminative learning for inferencewith a partially-known statistical model. While machine learning systems oftenlack the interpretability of traditional signal processing methods, we focus ona simple setting where one can interpret and compare the approaches in atractable manner that is accessible and relevant to signal processing readers.In particular, we exemplify the approaches for the task of Bayesian signalestimation in a jointly Gaussian setting with the mean-squared error (MSE)objective, i.e., a linear estimation setting.", "output": "Discriminative and Generative Learning for Linear Estimation of Random Signals [Lecture Notes]."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural message passing is a basic feature extraction unit forgraph-structured data considering neighboring node features in networkpropagation from one layer to the next. We model such process by an interactingparticle system with attractive and repulsive forces and the Allen-Cahn forcearising in the modeling of phase transition. The dynamics of the system is areaction-diffusion process which can separate particles without blowing up.This induces an Allen-Cahn message passing (ACMP) for graph neural networkswhere the numerical iteration for the particle system solution constitutes themessage passing propagation. ACMP which has a simple implementation with aneural ODE solver can propel the network depth up to one hundred of layers withtheoretically proven strictly positive lower bound of the Dirichlet energy. 
Itthus provides a deep model of GNNs circumventing the common GNN problem ofoversmoothing. GNNs with ACMP achieve state of the art performance forreal-world node classification tasks on both homophilic and heterophilicdatasets.", "output": "ACMP: Allen-Cahn Message Passing for Graph Neural Networks with Particle Phase Transition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Isolation forest (iForest) has been emerging as arguably the most popularanomaly detector in recent years due to its general effectiveness acrossdifferent benchmarks and strong scalability. Nevertheless, its linearaxis-parallel isolation method often leads to (i) failure in detecting hardanomalies that are difficult to isolate inhigh-dimensional/non-linear-separable data space, and (ii) notoriousalgorithmic bias that assigns unexpectedly lower anomaly scores to artefactregions. These issues contribute to high false negative errors. Several iForestextensions are introduced, but they essentially still employ shallow, lineardata partition, restricting their power in isolating true anomalies. Therefore,this paper proposes deep isolation forest. We introduce a new representationscheme that utilises casually initialised neural networks to map original datainto random representation ensembles, where random axis-parallel cuts aresubsequently applied to perform the data partition. This representation schemefacilitates high freedom of the partition in the original data space(equivalent to non-linear partition on subspaces of varying sizes), encouraginga unique synergy between random representations and random partition-basedisolation. 
Extensive experiments show that our model achieves significant improvement over state-of-the-art isolation-based methods and deep detectors on tabular, graph and time series datasets; our model also inherits desired scalability from iForest.", "output": "Deep Isolation Forest for Anomaly Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we study a class of bilevel optimization problems, also known as simple bilevel optimization, where we minimize a smooth objective function over the optimal solution set of another convex constrained optimization problem. Several iterative methods have been developed for tackling this class of problems. Alas, their convergence guarantees are either asymptotic for the upper-level objective, or the convergence rates are slow and sub-optimal. To address this issue, in this paper, we introduce a novel bilevel optimization method that locally approximates the solution set of the lower-level problem via a cutting plane, and then runs a conditional gradient update to decrease the upper-level objective. When the upper-level objective is convex, we show that our method requires ${\\mathcal{O}}(\\max\\{1/\\epsilon_f,1/\\epsilon_g\\})$ iterations to find a solution that is $\\epsilon_f$-optimal for the upper-level objective and $\\epsilon_g$-optimal for the lower-level objective. Moreover, when the upper-level objective is non-convex, our method requires ${\\mathcal{O}}(\\max\\{1/\\epsilon_f^2,1/(\\epsilon_f\\epsilon_g)\\})$ iterations to find an $(\\epsilon_f,\\epsilon_g)$-optimal solution. We also prove stronger convergence guarantees under the Hölderian error bound assumption on the lower-level problem. 
To the best of our knowledge, our method achieves thebest-known iteration complexity for the considered class of bilevel problems.", "output": "A Conditional Gradient-based Method for Simple Bilevel Optimization with Convex Lower-level Problem."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Widely observed neural scaling laws, in which error falls off as a power ofthe training set size, model size, or both, have driven substantial performanceimprovements in deep learning. However, these improvements through scalingalone require considerable costs in compute and energy. Here we focus on thescaling of error with dataset size and show how in theory we can break beyondpower law scaling and potentially even reduce it to exponential scaling insteadif we have access to a high-quality data pruning metric that ranks the order inwhich training examples should be discarded to achieve any pruned dataset size.We then test this improved scaling prediction with pruned dataset sizeempirically, and indeed observe better than power law scaling in practice onResNets trained on CIFAR-10, SVHN, and ImageNet. Next, given the importance offinding high-quality pruning metrics, we perform the first large-scalebenchmarking study of ten different data pruning metrics on ImageNet. We findmost existing high performing metrics scale poorly to ImageNet, while the bestare computationally intensive and require labels for every image. We thereforedeveloped a new simple, cheap and scalable self-supervised pruning metric thatdemonstrates comparable performance to the best supervised metrics. 
Overall,our work suggests that the discovery of good data-pruning metrics may provide aviable path forward to substantially improved neural scaling laws, therebyreducing the resource costs of modern deep learning.", "output": "Beyond neural scaling laws: beating power law scaling via data pruning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the continuous-time counterpart of Q-learning for reinforcementlearning (RL) under the entropy-regularized, exploratory diffusion processformulation introduced by Wang et al. (2020). As the conventional (big)Q-function collapses in continuous time, we consider its first-orderapproximation and coin the term ``(little) q-function\". This function isrelated to the instantaneous advantage rate function as well as theHamiltonian. We develop a ``q-learning\" theory around the q-function that isindependent of time discretization. Given a stochastic policy, we jointlycharacterize the associated q-function and value function by martingaleconditions of certain stochastic processes, in both on-policy and off-policysettings. We then apply the theory to devise different actor-critic algorithmsfor solving underlying RL problems, depending on whether or not the densityfunction of the Gibbs measure generated from the q-function can be computedexplicitly. One of our algorithms interprets the well-known Q-learningalgorithm SARSA, and another recovers a policy gradient (PG) basedcontinuous-time algorithm proposed in Jia and Zhou (2022b). 
Finally, we conductsimulation experiments to compare the performance of our algorithms with thoseof PG-based algorithms in Jia and Zhou (2022b) and time-discretizedconventional Q-learning algorithms.", "output": "q-Learning in Continuous Time."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we leverage low-level compiler intermediate representations(IR) to improve code translation. Traditional transpilers rely on syntacticinformation and handcrafted rules, which limits their applicability andproduces unnatural-looking code. Applying neural machine translation (NMT)approaches to code has successfully broadened the set of programs on which onecan get a natural-looking translation. However, they treat the code assequences of text tokens, and still do not differentiate well enough betweensimilar pieces of code which have different semantics in different languages.The consequence is low quality translation, reducing the practicality of NMT,and stressing the need for an approach significantly increasing its accuracy.Here we propose to augment code translation with IRs, specifically LLVM IR,with results on the C++, Java, Rust, and Go languages. Our method improves uponthe state of the art for unsupervised code translation, increasing the numberof correct translations by 11% on average, and up to 79% for the Java -> Rustpair with greedy decoding. We extend previous test sets for code translation,by adding hundreds of Go and Rust functions. 
Additionally, we train models withhigh performance on the problem of IR decompilation, generating programmingsource code from IR, and study using IRs as intermediary pivot for translation.", "output": "Code Translation with Compiler Representations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Currently deployed public-key cryptosystems will be vulnerable to attacks byfull-scale quantum computers. Consequently, \"quantum resistant\" cryptosystemsare in high demand, and lattice-based cryptosystems, based on a hard problemknown as Learning With Errors (LWE), have emerged as strong contenders forstandardization. In this work, we train transformers to perform modulararithmetic and combine half-trained models with statistical cryptanalysistechniques to propose SALSA: a machine learning attack on LWE-basedcryptographic schemes. SALSA can fully recover secrets for small-to-mid sizeLWE instances with sparse binary secrets, and may scale to attack real-worldLWE-based cryptosystems.", "output": "SALSA: Attacking Lattice Cryptography with Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We focus on the problem of producing well-calibrated out-of-distribution(OOD) detectors, in order to enable safe deployment of medical imageclassifiers. Motivated by the difficulty of curating suitable calibrationdatasets, synthetic augmentations have become highly prevalent forinlier/outlier specification. While there have been rapid advances in dataaugmentation techniques, this paper makes a striking finding that the space inwhich the inliers and outliers are synthesized, in addition to the type ofaugmentation, plays a critical role in calibrating OOD detectors. 
Using the popular energy-based OOD detection framework, we find that the optimal protocol is to synthesize latent-space inliers along with diverse pixel-space outliers. Based on empirical studies with multiple medical imaging benchmarks, we demonstrate that our approach consistently leads to superior OOD detection ($15\\% - 35\\%$ in AUROC) over the state-of-the-art in a variety of open-set recognition settings.", "output": "Know Your Space: Inlier and Outlier Construction for Calibrating Medical OOD Detectors."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training machine learning (ML) algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets. As a result, processor-centric systems (e.g., CPU, GPU) suffer from costly data movement between memory units and processing units, which consumes large amounts of energy and execution cycles. Memory-centric computing systems, i.e., with processing-in-memory (PIM) capabilities, can alleviate this data movement bottleneck. Our goal is to understand the potential of modern general-purpose PIM architectures to accelerate ML training. To do so, we (1) implement several representative classic ML algorithms (namely, linear regression, logistic regression, decision tree, K-Means clustering) on a real-world general-purpose PIM architecture, (2) rigorously evaluate and characterize them in terms of accuracy, performance and scaling, and (3) compare to their counterpart implementations on CPU and GPU. Our evaluation on a real memory-centric computing system with more than 2500 PIM cores shows that general-purpose PIM architectures can greatly accelerate memory-bound ML workloads, when the necessary operations and datatypes are natively supported by PIM hardware. 
For example, our PIM implementation of decision tree is $27\\times$ faster than a state-of-the-art CPU version on an 8-core Intel Xeon, and $1.34\\times$ faster than a state-of-the-art GPU version on an NVIDIA A100. Our K-Means clustering on PIM is $2.8\\times$ and $3.2\\times$ faster than state-of-the-art CPU and GPU versions, respectively. To our knowledge, our work is the first one to evaluate ML training on a real-world PIM architecture. We conclude with key observations, takeaways, and recommendations that can inspire users of ML workloads, programmers of PIM architectures, and hardware designers & architects of future memory-centric computing systems.", "output": "An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Emotion Recognition in Conversation (ERC) plays a significant part in Human-Computer Interaction (HCI) systems since it can provide empathetic services. Multimodal ERC can mitigate the drawbacks of uni-modal approaches. Recently, Graph Neural Networks (GNNs) have been widely used in a variety of fields due to their superior performance in relation modeling. In multimodal ERC, GNNs are capable of extracting both long-distance contextual information and inter-modal interactive information. Unfortunately, since existing methods such as MMGCN directly fuse multiple modalities, redundant information may be generated and diverse information may be lost. In this work, we present a directed Graph based Cross-modal Feature Complementation (GraphCFC) module that can efficiently model contextual and interactive information. GraphCFC alleviates the problem of heterogeneity gap in multimodal fusion by utilizing multiple subspace extractors and Pair-wise Cross-modal Complementary (PairCC) strategy. 
We extract various types of edges from the constructed graph for encoding, thus enabling GNNs to extract crucial contextual and interactive information more accurately when performing message passing. Furthermore, we design a GNN structure called GAT-MLP, which can provide a new unified network framework for multimodal learning. The experimental results on two benchmark datasets show that our GraphCFC outperforms the state-of-the-art (SOTA) approaches.", "output": "GraphCFC: A Directed Graph Based Cross-Modal Feature Complementation Approach for Multimodal Conversational Emotion Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We extend conformal prediction to control the expected value of any monotone loss function. The algorithm generalizes split conformal prediction together with its coverage guarantee. Like conformal prediction, the conformal risk control procedure is tight up to an $\\mathcal{O}(1/n)$ factor. Worked examples from computer vision and natural language processing demonstrate the usage of our algorithm to bound the false negative rate, graph distance, and token-level F1-score.", "output": "Conformal Risk Control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Discrete dislocation dynamics (DDD) is a widely employed computational method to study plasticity at the mesoscale that connects the motion of dislocation lines to the macroscopic response of crystalline materials. However, the computational cost of DDD simulations remains a bottleneck that limits its range of applicability. Here, we introduce a new DDD-GNN framework in which the expensive time-integration of dislocation motion is entirely substituted by a graph neural network (GNN) model trained on DDD trajectories. 
As a first application, we demonstrate the feasibility and potential of our method on a simple yet relevant model of a dislocation line gliding through an array of obstacles. We show that the DDD-GNN model is stable and reproduces very well unseen ground-truth DDD simulation responses for a range of straining rates and obstacle densities, without the need to explicitly compute nodal forces or dislocation mobilities during time-integration. Our approach opens new promising avenues to accelerate DDD simulations and to incorporate more complex dislocation motion behaviors.", "output": "Accelerating discrete dislocation dynamics simulations with graph neural networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generating new molecules with specified chemical and biological properties via generative models has emerged as a promising direction for drug discovery. However, existing methods require extensive training/fine-tuning with a large dataset, often unavailable in real-world generation tasks. In this work, we propose a new retrieval-based framework for controllable molecule generation. We use a small set of exemplar molecules, i.e., those that (partially) satisfy the design criteria, to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria. We design a retrieval mechanism that retrieves and fuses the exemplar molecules with the input molecule, which is trained by a new self-supervised objective that predicts the nearest neighbor of the input molecule. We also propose an iterative refinement process to dynamically update the generated molecules and retrieval database for better generalization. Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning. 
On various tasks ranging from simple design criteria to a challenging real-world scenario for designing lead compounds that bind to the SARS-CoV-2 main protease, we demonstrate our approach extrapolates well beyond the retrieval database, and achieves better performance and wider applicability than previous methods. Code is available at ", "output": "Retrieval-based Controllable Molecule Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neuro-Symbolic (NeSy) integration combines symbolic reasoning with Neural Networks (NNs) for tasks requiring perception and reasoning. Most NeSy systems rely on continuous relaxation of logical knowledge, and no discrete decisions are made within the model pipeline. Furthermore, these methods assume that the symbolic rules are given. In this paper, we propose Deep Symbolic Learning (DSL), a NeSy system that learns NeSy-functions, i.e., the composition of a (set of) perception functions which map continuous data to discrete symbols, and a symbolic function over the set of symbols. DSL learns simultaneously the perception and symbolic functions while being trained only on their composition (NeSy-function). The key novelty of DSL is that it can create internal (interpretable) symbolic representations and map them to perception inputs within a differentiable NN learning pipeline. The created symbols are automatically selected to generate symbolic functions that best explain the data. 
We provide experimental analysis to substantiate the efficacy of DSL in simultaneously learning perception and symbolic functions.", "output": "Deep Symbolic Learning: Discovering Symbols and Rules from Perceptions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we present Tetris, a new task of Goal-Oriented Script Completion. Unlike previous work, it considers a more realistic and general setting, where the input includes not only the goal but also additional user context, including preferences and history. To address this problem, we propose a novel approach, which uses two techniques to improve performance: (1) concept prompting, and (2) script-oriented contrastive learning that addresses step repetition and hallucination problems. On our WikiHow-based dataset, we find that both methods improve performance. The dataset, repository, and models will be publicly available to facilitate further research on this new task.", "output": "Incorporating Task-specific Concept Knowledge into Script Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose a novel Dual Inexact Splitting Algorithm (DISA) for distributed convex composite optimization problems, where the local loss function consists of a smooth term and a possibly nonsmooth term composed with a linear mapping. DISA, for the first time, eliminates the dependence of the convergent step-size range on the Euclidean norm of the linear mapping, while inheriting the advantages of the classic Primal-Dual Proximal Splitting Algorithm (PD-PSA): simple structure and easy implementation. This indicates that DISA can be executed without prior knowledge of the norm, and tiny step-sizes can be avoided when the norm is large. 
Additionally, we prove sublinear and linear convergence rates of DISA under general convexity and metric subregularity, respectively. Moreover, we provide a variant of DISA with approximate proximal mapping and prove its global convergence and sublinear convergence rate. Numerical experiments corroborate our theoretical analyses and demonstrate a significant acceleration of DISA compared to existing PD-PSAs.", "output": "DISA: A Dual Inexact Splitting Algorithm for Distributed Convex Composite Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning training is an expensive process that extensively uses GPUs, but not all model training saturates modern powerful GPUs. Multi-Instance GPU (MIG) is a new technology introduced by NVIDIA that can partition a GPU to better fit workloads that do not require all the memory and compute resources of a full GPU. In this paper, we examine the performance of a MIG-enabled A100 GPU under deep learning workloads containing various sizes and combinations of models. We contrast the benefits of MIG to older workload collocation methods on GPUs: naively submitting multiple processes on the same GPU and utilizing Multi-Process Service (MPS). Our results demonstrate that collocating multiple model training runs may yield significant benefits. In certain cases, it can lead to up to four times the training throughput despite increased epoch time. On the other hand, the aggregate memory footprint and compute needs of the models trained in parallel must fit the available memory and compute resources of the GPU. MIG can be beneficial thanks to its interference-free partitioning, especially when the sizes of the models align with the MIG partitioning options. MIG's rigid partitioning, however, may create sub-optimal GPU utilization for more dynamic mixed workloads. 
In general, we recommend MPS as the best performing and most flexible form of collocation for model training for a single user submitting training jobs.", "output": "An Analysis of Collocation on GPUs for Deep Learning Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Natarajan dimension is a fundamental tool for characterizing multi-class PAC learnability, generalizing the Vapnik-Chervonenkis (VC) dimension from binary to multi-class classification problems. This work establishes upper bounds on Natarajan dimensions for certain function classes, including (i) multi-class decision trees and random forests, and (ii) multi-class neural networks with binary, linear and ReLU activations. These results may be relevant for describing the performance of certain multi-class learning algorithms.", "output": "Upper bounds on the Natarajan dimensions of some function classes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While PAC-Bayes is now an established learning framework for light-tailed losses (e.g., subgaussian or subexponential), its extension to the case of heavy-tailed losses remains largely uncharted and has attracted a growing interest in recent years. We contribute PAC-Bayes generalisation bounds for heavy-tailed losses under the sole assumption of bounded variance of the loss function. Under that assumption, we extend previous results from \\citet{kuzborskij2019efron}. Our key technical contribution is exploiting an extension of Markov's inequality for supermartingales. 
Our proof technique unifies and extends different PAC-Bayesian frameworks by providing bounds for unbounded martingales as well as bounds for batch and online learning with heavy-tailed losses.", "output": "PAC-Bayes Generalisation Bounds for Heavy-Tailed Losses through Supermartingales."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The idea of embedding optimization problems into deep neural networks as optimization layers to encode constraints and inductive priors has taken hold in recent years. Most existing methods focus on implicitly differentiating Karush-Kuhn-Tucker (KKT) conditions in a way that requires expensive computations on the Jacobian matrix, which can be slow and memory-intensive. In this paper, we developed a new framework, named Alternating Differentiation (Alt-Diff), that differentiates optimization problems (here, specifically in the form of convex optimization problems with polyhedral constraints) in a fast and recursive way. Alt-Diff decouples the differentiation procedure into a primal update and a dual update in an alternating way. Accordingly, Alt-Diff substantially decreases the dimensions of the Jacobian matrix especially for optimization with large-scale constraints and thus increases the computational speed of implicit differentiation. We show that the gradients obtained by Alt-Diff are consistent with those obtained by differentiating KKT conditions. In addition, we propose to truncate Alt-Diff to further accelerate the computational speed. Under some standard assumptions, we show that the truncation error of gradients is upper bounded by the same order of variables' estimation error. Therefore, Alt-Diff can be truncated to further increase computational speed without sacrificing much accuracy. 
A series of comprehensive experiments validate the superiority of Alt-Diff.", "output": "Alternating Differentiation for Optimization Layers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Tabular data is among the oldest and most ubiquitous forms of data. However, the generation of synthetic samples with the original data's characteristics remains a significant challenge for tabular data. While many generative models from the computer vision domain, such as variational autoencoders or generative adversarial networks, have been adapted for tabular data generation, less research has been directed towards recent transformer-based large language models (LLMs), which are also generative in nature. To this end, we propose GReaT (Generation of Realistic Tabular data), which exploits an auto-regressive generative LLM to sample synthetic and yet highly realistic tabular data. Furthermore, GReaT can model tabular data distributions by conditioning on any subset of features; the remaining features are sampled without additional overhead. We demonstrate the effectiveness of the proposed approach in a series of experiments that quantify the validity and quality of the produced data samples from multiple angles. We find that GReaT maintains state-of-the-art performance across numerous real-world and synthetic data sets with heterogeneous feature types coming in various sizes.", "output": "Language Models are Realistic Tabular Data Generators."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep reinforcement learning (DRL) is one of the most powerful tools for synthesizing complex robotic behaviors. 
But training DRL models is incredibly compute and memory intensive, requiring large training datasets and replay buffers to achieve performant results. This poses a challenge for the next generation of field robots that will need to learn on the edge to adapt to their environment. In this paper, we begin to address this issue through observation space quantization. We evaluate our approach using four simulated robot locomotion tasks and two state-of-the-art DRL algorithms, the on-policy Proximal Policy Optimization (PPO) and off-policy Soft Actor-Critic (SAC), and find that observation space quantization reduces overall memory costs by as much as 4.2x without impacting learning performance.", "output": "Just Round: Quantized Observation Spaces Enable Memory Efficient Learning of Dynamic Locomotion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we describe a universal method for extracting the underlying monotonic trend factor from time series data. We propose an approach related to the Mann-Kendall test, a standard monotonic trend detection method and call it contrastive trend estimation (CTE). We show that the CTE method identifies any hidden trend underlying temporal data while avoiding the standard assumptions used for monotonic trend identification. In particular, CTE can take any type of temporal data (vector, images, graphs, time series, etc.) as input. We finally illustrate the interest of our CTE method through several experiments on different types of data and problems.", "output": "Universal hidden monotonic trend estimation with contrastive learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Uncertainty quantification is a central challenge in reliable and trustworthy machine learning. 
Naive measures such as last-layer scores are well-known to yield overconfident estimates in the context of overparametrized neural networks. Several methods, ranging from temperature scaling to different Bayesian treatments of neural networks, have been proposed to mitigate overconfidence, most often supported by the numerical observation that they yield better calibrated uncertainty measures. In this work, we provide a sharp comparison between popular uncertainty measures for binary classification in a mathematically tractable model for overparametrized neural networks: the random features model. We discuss a trade-off between classification accuracy and calibration, unveiling a double descent like behavior in the calibration curve of optimally regularized estimators as a function of overparametrization. This is in contrast with the empirical Bayes method, which we show to be well calibrated in our setting despite the higher generalization error and overparametrization.", "output": "On double-descent in uncertainty quantification in overparametrized models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a smoothly broken power law functional form (referred to by us as a Broken Neural Scaling Law (BNSL)) that accurately models and extrapolates the scaling behaviors of deep neural networks (i.e. how the evaluation metric of interest varies as the amount of compute used for training, number of model parameters, training dataset size, model input size, number of training steps, or upstream performance varies) for various architectures and for each of various tasks within a large and diverse set of upstream and downstream tasks, in zero-shot, prompted, and fine-tuned settings. 
This set includes large-scale vision, language, audio, video, diffusion, generative modeling, multimodal learning, contrastive learning, AI alignment, robotics, out-of-distribution (OOD) generalization, continual learning, transfer learning, uncertainty estimation / calibration, out-of-distribution detection, adversarial robustness, distillation, sparsity, retrieval, quantization, pruning, fairness, molecules, computer programming/coding, math word problems, \"emergent\" \"phase transitions / changes\", arithmetic, unsupervised/self-supervised learning, & reinforcement learning (single agent & multi-agent). When compared to other functional forms for neural scaling behavior, this functional form yields extrapolations of scaling behavior that are considerably more accurate on this set. Moreover, this functional form accurately models & extrapolates scaling behavior that other functional forms are incapable of expressing such as the non-monotonic transitions present in the scaling behavior of phenomena such as double descent & the delayed, sharp inflection points present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior. Code is available at ", "output": "Broken Neural Scaling Laws."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper proposes a vehicular edge federated learning (VEFL) solution, where an edge server leverages highly mobile connected vehicles' (CVs') onboard central processing units (CPUs) and local datasets to train a global model. Convergence analysis reveals that the VEFL training loss depends on the successful receptions of the CVs' trained models over the intermittent vehicle-to-infrastructure (V2I) wireless links. 
Owing to high mobility, in the full device participation case (FDPC), the edge server aggregates client model parameters based on a weighted combination according to the CVs' dataset sizes and sojourn periods, while it selects a subset of CVs in the partial device participation case (PDPC). We then devise joint VEFL and radio access technology (RAT) parameters optimization problems under delay, energy and cost constraints to maximize the probability of successful reception of the locally trained models. Considering that the optimization problem is NP-hard, we decompose it into a VEFL parameter optimization sub-problem, given the estimated worst-case sojourn period, delay and energy expense, and an online RAT parameter optimization sub-problem. Finally, extensive simulations are conducted to validate the effectiveness of the proposed solutions with a practical 5G new radio (5G-NR) RAT under a realistic microscopic mobility model.", "output": "Resource Constrained Vehicular Edge Federated Learning with Highly Mobile Connected Vehicles."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Embedding knowledge graphs (KGs) for multi-hop logical reasoning is a challenging problem due to massive and complicated structures in many KGs. Recently, many promising works projected entities and queries into a geometric space to efficiently find answers. However, it remains challenging to model the negation and union operator. The negation operator has no strict boundaries, which generates overlapped embeddings and leads to obtaining ambiguous answers. An additional limitation is that the union operator is non-closure, which undermines the model's ability to handle a series of union operators. To address these problems, we propose a novel probabilistic embedding model, namely Gamma Embeddings (GammaE), for encoding entities and queries to answer different types of FOL queries on KGs. 
We utilize the linear property and strong boundary support of the Gamma distribution to capture more features of entities and queries, which dramatically reduces model uncertainty. Furthermore, GammaE implements the Gamma mixture method to design the closed union operator. The performance of GammaE is validated on three large logical query datasets. Experimental results show that GammaE significantly outperforms state-of-the-art models on public benchmarks.", "output": "GammaE: Gamma Embeddings for Logical Queries on Knowledge Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "To enhance documentation and maintenance practices, developers conventionally establish links between related software artifacts manually. Empirical research has revealed that developers frequently overlook this practice, resulting in significant information loss. To address this issue, automatic link recovery techniques have been proposed. However, these approaches primarily focused on improving prediction accuracy on randomly-split datasets, with limited attention given to the impact of data leakage and the generalizability of the predictive models. LinkFormer seeks to address these limitations. Our approach not only preserves and improves the accuracy of existing predictions but also enhances their alignment with real-world settings and their generalizability. First, to better utilize contextual information for prediction, we employ the Transformer architecture and fine-tune multiple pre-trained models on both textual and metadata information of issues and commits. 
Next, to gauge the effect of time on model performance, we employ two splitting policies during both the training and testing phases: randomly- and temporally-split datasets. Finally, in pursuit of a generic model that can demonstrate high performance across a range of projects, we undertake additional fine-tuning of LinkFormer within two distinct transfer-learning settings. Our findings support that to simulate real-world scenarios effectively, researchers must maintain the temporal flow of data when training models. Furthermore, the results demonstrate that LinkFormer outperforms existing methodologies by a significant margin, achieving a 48% improvement in F1-measure within a project-based setting. Finally, the performance of LinkFormer in the cross-project setting is comparable to its average performance within the project-based scenario.", "output": "An Empirical Study on Data Leakage and Generalizability of Link Prediction Models for Issues and Commits."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a method, MMD-B-Fair, to learn fair representations of data via kernel two-sample testing. We find neural features of our data where a maximum mean discrepancy (MMD) test cannot distinguish between representations of different sensitive groups, while preserving information about the target attributes. Minimizing the power of an MMD test is more difficult than maximizing it (as done in previous work), because the test threshold's complex behavior cannot be simply ignored. Our method exploits the simple asymptotics of block testing schemes to efficiently find fair representations without requiring complex adversarial optimization or generative modelling schemes widely used by existing work on fair representation learning. 
We evaluate our approach on various datasets, showing its ability to ``hide'' information about sensitive attributes, and its effectiveness in downstream transfer tasks.", "output": "MMD-B-Fair: Learning Fair Representations with Statistical Testing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the last decade, Federated Learning (FL) has gained relevance in training collaborative models without sharing sensitive data. Since its birth, Centralized FL (CFL) has been the most common approach in the literature, where a central entity creates a global model. However, a centralized approach leads to increased latency due to bottlenecks, heightened vulnerability to system failures, and trustworthiness concerns affecting the entity responsible for the global model creation. Decentralized Federated Learning (DFL) emerged to address these concerns by promoting decentralized model aggregation and minimizing reliance on centralized architectures. However, despite the work done in DFL, the literature has not (i) studied the main aspects differentiating DFL and CFL; (ii) analyzed DFL frameworks to create and evaluate new solutions; and (iii) reviewed application scenarios using DFL. Thus, this article identifies and analyzes the main fundamentals of DFL in terms of federation architectures, topologies, communication mechanisms, security approaches, and key performance indicators. Additionally, the paper at hand explores existing mechanisms to optimize critical DFL fundamentals. Then, the most relevant features of the current DFL frameworks are reviewed and compared. After that, it analyzes the most used DFL application scenarios, identifying solutions based on the fundamentals and frameworks previously defined. 
Finally, the evolution of existing DFL solutions is studied to provide a list of trends, lessons learned, and open challenges.", "output": "Decentralized Federated Learning: Fundamentals, State-of-the-art, Frameworks, Trends, and Challenges."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "DAG (Directed Acyclic Graph) from causal inference does not differentiate causal effects and correlated changes. And the general effect of a population is usually approximated by averaging correlations over all individuals. Since AI (Artificial Intelligence) enables large-scale structure modeling on big data, the complex hidden confoundings have made these approximation errors no longer ignorable; they snowball into considerable modeling bias. Such Causal Representation Bias (CRB) leads to many problems: ungeneralizable causal models, unrevealed individual-level features, hardly utilized causal knowledge in DL (Deep Learning), etc. In short, DAG must be redefined to enable a new framework for causal AI. The observational time series in statistics can only represent correlated changes, while the DL-based autoencoder can represent them as individualized feature changes in latent space to estimate the causal effects directly. 
In this paper, we introduce the redefined do-DAG to visualize CRB, propose a generic solution Causal Representation Learning (CRL) framework, along with a novel architecture for its realization, and experimentally verify the feasibility.", "output": "Realization of Causal Representation Learning and Redefined DAG for Causal AI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work presents a novel deep-learning-based pipeline for the inverse problem of image deblurring, leveraging augmentation and pre-training with synthetic data. Our results build on our winning submission to the recent Helsinki Deblur Challenge 2021, whose goal was to explore the limits of state-of-the-art deblurring algorithms in a real-world data setting. The task of the challenge was to deblur out-of-focus images of random text, thereby in a downstream task, maximizing an optical-character-recognition-based score function. A key step of our solution is the data-driven estimation of the physical forward model describing the blur process. This enables a stream of synthetic data, generating pairs of ground-truth and blurry images on-the-fly, which is used for an extensive augmentation of the small amount of challenge data provided. The actual deblurring pipeline consists of an approximate inversion of the radial lens distortion (determined by the estimated forward model) and a U-Net architecture, which is trained end-to-end. Our algorithm was the only one passing the hardest challenge level, achieving over $70\\%$ character recognition accuracy. Our findings are well in line with the paradigm of data-centric machine learning, and we demonstrate its effectiveness in the context of inverse problems. Apart from a detailed presentation of our methodology, we also analyze the importance of several design choices in a series of ablation studies. 
The code of our challenge submission is available under ", "output": "Let's Enhance: A Deep Learning Approach to Extreme Deblurring of Text Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we introduce Optimal Classification Forests, a new family of classifiers that takes advantage of an optimal ensemble of decision trees to derive accurate and interpretable classifiers. We propose a novel mathematical optimization-based methodology in which a given number of trees are simultaneously constructed, each of them providing a predicted class for the observations in the feature space. The classification rule is derived by assigning to each observation its most frequently predicted class among the trees in the forest. We provide a mixed integer linear programming formulation for the problem. We report the results of our computational experiments, from which we conclude that our proposed method has equal or superior performance compared with state-of-the-art tree-based classification methods. More importantly, it achieves high prediction accuracy with, for example, orders of magnitude fewer trees than random forests. We also present three real-world case studies showing that our methodology has very interesting implications in terms of interpretability.", "output": "A Mathematical Programming Approach to Optimal Classification Forests."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Non-parametric episodic memory can be used to quickly latch onto high-rewarded experience in reinforcement learning tasks. 
In contrast to parametric deep reinforcement learning approaches in which reward signals need to be back-propagated slowly, these methods only need to discover the solution once, and may then repeatedly solve the task. However, episodic control solutions are stored in discrete tables, and this approach has so far only been applied to discrete action space problems. Therefore, this paper introduces Continuous Episodic Control (CEC), a novel non-parametric episodic memory algorithm for sequential decision making in problems with a continuous action space. Results on several sparse-reward continuous control environments show that our proposed method learns faster than state-of-the-art model-free RL and memory-augmented RL algorithms, while maintaining good long-run performance as well. In short, CEC can be a fast approach for learning in continuous control tasks.", "output": "Continuous Episodic Control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Label noise is a significant obstacle in deep learning model training. It can have a considerable impact on the performance of image classification models, particularly deep neural networks, which are especially susceptible because they have a strong propensity to memorise noisy labels. In this paper, we have examined the fundamental concept underlying related label noise approaches. A transition matrix estimator has been created, and its effectiveness against the actual transition matrix has been demonstrated. In addition, we examined the label noise robustness of two convolutional neural network classifiers with LeNet and AlexNet designs. The two FashionMNIST datasets have revealed the robustness of both models. 
We are not efficiently able to demonstrate theinfluence of the transition matrix noise correction on robustness enhancementsdue to our inability to correctly tune the complex convolutional neural networkmodel due to time and computing resource constraints. There is a need foradditional effort to fine-tune the neural network model and explore theprecision of the estimated transition model in future research.", "output": "Establishment of Neural Networks Robust to Label Noise."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph Neural Networks usually rely on the assumption that the graph topologyis available to the network as well as optimal for the downstream task. Latentgraph inference allows models to dynamically learn the intrinsic graphstructure of problems where the connectivity patterns of data may not bedirectly accessible. In this work, we generalize the discrete DifferentiableGraph Module (dDGM) for latent graph learning. The original dDGM architectureused the Euclidean plane to encode latent features based on which the latentgraphs were generated. By incorporating Riemannian geometry into the model andgenerating more complex embedding spaces, we can improve the performance of thelatent graph inference system. In particular, we propose a computationallytractable approach to produce product manifolds of constant curvature modelspaces that can encode latent features of varying structure. The latentrepresentations mapped onto the inferred product manifold are used to computericher similarity measures that are leveraged by the latent graph learningmodel to obtain optimized latent graphs. Moreover, the curvature of the productmanifold is learned during training alongside the rest of the networkparameters and based on the downstream task, rather than it being a staticembedding space. 
Our novel approach is tested on a wide range of datasets, and outperforms the original dDGM model.", "output": "Latent Graph Inference using Product Manifolds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Over-parameterization of deep neural networks (DNNs) has shown high prediction accuracy for many applications. Although effective, the large number of parameters hinders its popularity on resource-limited devices and has an outsize environmental impact. Sparse training (using a fixed number of nonzero weights in each iteration) could significantly mitigate the training costs by reducing the model size. However, existing sparse training methods mainly use either random-based or greedy-based drop-and-grow strategies, resulting in local minima and low accuracy. In this work, we consider the dynamic sparse training as a sparse connectivity search problem and design an exploitation and exploration acquisition function to escape from local optima and saddle points. We further design an acquisition function and provide the theoretical guarantees for the proposed method and clarify its convergence property. Experimental results show that sparse models (up to 98% sparsity) obtained by our proposed method outperform the SOTA sparse training methods on a wide variety of deep learning tasks. 
On VGG-19 / CIFAR-100, ResNet-50 / CIFAR-10,ResNet-50 / CIFAR-100, our method has even higher accuracy than dense models.On ResNet-50 / ImageNet, the proposed method has up to 8.2% accuracyimprovement compared to SOTA sparse training methods.", "output": "Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks have emerged as the workhorse for a large section ofrobotics and control applications, especially as models for dynamical systems.Such data-driven models are in turn used for designing and verifying autonomoussystems. They are particularly useful in modeling medical systems where datacan be leveraged to individualize treatment. In safety-critical applications,it is important that the data-driven model is conformant to establishedknowledge from the natural sciences. Such knowledge is often available or canoften be distilled into a (possibly black-box) model. For instance, an F1racing car should conform to Newton's laws (which are encoded within a unicyclemodel). In this light, we consider the following problem - given a model $M$and a state transition dataset, we wish to best approximate the system modelwhile being a bounded distance away from $M$. We propose a method to guaranteethis conformance. Our first step is to distill the dataset into a fewrepresentative samples called memories, using the idea of a growing neural gas.Next, using these memories we partition the state space into disjoint subsetsand compute bounds that should be respected by the neural network in eachsubset. This serves as a symbolic wrapper for guaranteed conformance. We arguetheoretically that this only leads to a bounded increase in approximationerror; which can be controlled by increasing the number of memories. 
Weexperimentally show that on three case studies (Car Model, Drones, andArtificial Pancreas), our constrained neurosymbolic models conform to specifiedmodels (each encoding various constraints) with order-of-magnitude improvementscompared to the augmented Lagrangian and vanilla training methods. Our code canbe found at: ", "output": "Guaranteed Conformance of Neurosymbolic Models to Natural Constraints."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Edge Impulse is a cloud-based machine learning operations (MLOps) platformfor developing embedded and edge ML (TinyML) systems that can be deployed to awide range of hardware targets. Current TinyML workflows are plagued byfragmented software stacks and heterogeneous deployment hardware, making MLmodel optimizations difficult and unportable. We present Edge Impulse, apractical MLOps platform for developing TinyML systems at scale. Edge Impulseaddresses these challenges and streamlines the TinyML design cycle bysupporting various software and hardware optimizations to create an extensibleand portable software stack for a multitude of embedded systems. As of Oct.2022, Edge Impulse hosts 118,185 projects from 50,953 developers.", "output": "Edge Impulse: An MLOps Platform for Tiny Machine Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph neural networks (GNNs) have recently emerged as a promising learningparadigm in learning graph-structured data and have demonstrated wide successacross various domains such as recommendation systems, social networks, andelectronic design automation (EDA). Like other deep learning (DL) methods, GNNsare being deployed in sophisticated modern hardware systems, as well asdedicated accelerators. 
However, despite the popularity of GNNs and the recent efforts of bringing GNNs to hardware, the fault tolerance and resilience of GNNs have generally been overlooked. Inspired by the inherent algorithmic resilience of DL methods, this paper conducts, for the first time, a large-scale and empirical study of GNN resilience, aiming to understand the relationship between hardware faults and GNN accuracy. By developing a customized fault injection tool on top of PyTorch, we perform extensive fault injection experiments on various GNN models and application datasets. We observe that the error resilience of GNN models varies by orders of magnitude with respect to different models and application datasets. Further, we explore a low-cost error mitigation mechanism for GNN to enhance its resilience. This GNN resilience study aims to open up new directions and opportunities for future GNN accelerator design and architectural optimization.", "output": "PyGFI: Analyzing and Enhancing Robustness of Graph Neural Networks Against Hardware Errors."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This article presents a dataset of 10,917 news articles with hierarchical news categories collected between 1 January 2019 and 31 December 2019. We manually labeled the articles based on a hierarchical taxonomy with 17 first-level and 109 second-level categories. 
This dataset can be used to train machine learning models for automatically classifying news articles by topic. This dataset can be helpful for researchers working on news structuring, classification, and predicting future events based on released news.", "output": "MN-DS: A Multilabeled News Dataset for News Articles Hierarchical Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FL) is an emerging paradigm to train models with distributed data from numerous Internet of Things (IoT) devices. It inherently assumes a uniform capacity among participants. However, due to different conditions such as differing energy budgets or executing parallel unrelated tasks, participants have diverse computational resources in practice. Participants with insufficient computation budgets must plan for the use of restricted computational resources appropriately, otherwise they would be unable to complete the entire training procedure, resulting in model performance decline. To address this issue, we propose a strategy for estimating local models without computationally intensive iterations. Based on it, we propose Computationally Customized Federated Averaging (CC-FedAvg), which allows participants to determine whether to perform traditional local training or model estimation in each round based on their current computational budgets. Both theoretical analysis and exhaustive experiments indicate that CC-FedAvg has the same convergence rate and comparable performance as FedAvg without resource constraints. 
Furthermore, CC-FedAvg can be viewed as a computation-efficient version of FedAvg that retains model performance while considerably lowering computation overhead.", "output": "CC-FedAvg: Computationally Customized Federated Averaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The ability to jointly learn from multiple modalities, such as text, audio, and visual data, is a defining feature of intelligent systems. While there have been promising advances in designing neural networks to harness multimodal data, the enormous success of data augmentation currently remains limited to single-modality tasks like image classification. Indeed, it is particularly difficult to augment each modality while preserving the overall semantic structure of the data; for example, a caption may no longer be a good description of an image after standard augmentations have been applied, such as translation. Moreover, it is challenging to specify reasonable transformations that are not tailored to a particular modality. In this paper, we introduce LeMDA, Learning Multimodal Data Augmentation, an easy-to-use method that automatically learns to jointly augment multimodal data in feature space, with no constraints on the identities of the modalities or the relationship between modalities. 
We show that LeMDA can (1) profoundly improve the performance ofmultimodal deep learning architectures, (2) apply to combinations of modalitiesthat have not been previously considered, and (3) achieve state-of-the-artresults on a wide range of applications comprised of image, text, and tabulardata.", "output": "Learning Multimodal Data Augmentation in Feature Space."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "An emerging trend in deep learning research focuses on the applications ofgraph neural networks (GNNs) for mesh-based continuum mechanics simulations.Most of these learning frameworks operate on graphs wherein each edge connectstwo nodes. Inspired by the data connectivity in the finite element method, wepresent a method to construct a hypergraph by connecting the nodes by elementsrather than edges. A hypergraph message-passing network is defined on such anode-element hypergraph that mimics the calculation process of local stiffnessmatrices. We term this method a finite element-inspired hypergraph neuralnetwork, in short FEIH($phi$)-GNN. We further equip the proposed network withrotation equivariance, and explore its capability for modeling unsteady fluidflow systems. The effectiveness of the network is demonstrated on two commonbenchmark problems, namely the fluid flow around a circular cylinder andairfoil configurations. Stabilized and accurate temporal roll-out predictionscan be obtained using the $phi$-GNN framework within the interpolationReynolds number range. 
The network is also able to extrapolate moderately towards a higher Reynolds number domain outside the training range.", "output": "A Finite Element-Inspired Hypergraph Neural Network: Application to Fluid Dynamics Simulations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, Hamiltonian neural networks (HNN) have been introduced to incorporate prior physical knowledge when learning the dynamical equations of Hamiltonian systems. Hereby, the symplectic system structure is preserved despite the data-driven modeling approach. However, preserving symmetries requires additional attention. In this research, we enhance HNN with a Lie algebra framework to detect and embed symmetries in the neural network. This approach allows simultaneous learning of the symmetry group action and the total energy of the system. As illustrative examples, a pendulum on a cart and a two-body problem from astrodynamics are considered.", "output": "Hamiltonian Neural Networks with Automatic Symmetry Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Bagging is an important technique for stabilizing machine learning models. In this paper, we derive a finite-sample guarantee on the stability of bagging for any model. Our result places no assumptions on the distribution of the data, on the properties of the base algorithm, or on the dimensionality of the covariates. Our guarantee applies to many variants of bagging and is optimal up to a constant. 
Empirical results validate our findings, showing that baggingsuccessfully stabilizes even highly unstable base algorithms.", "output": "Bagging Provides Assumption-free Stability."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Language Models (LMs) have been shown to leak information about training datathrough sentence-level membership inference and reconstruction attacks.Understanding the risk of LMs leaking Personally Identifiable Information (PII)has received less attention, which can be attributed to the false assumptionthat dataset curation techniques such as scrubbing are sufficient to preventPII leakage. Scrubbing techniques reduce but do not prevent the risk of PIIleakage: in practice scrubbing is imperfect and must balance the trade-offbetween minimizing disclosure and preserving the utility of the dataset. On theother hand, it is unclear to which extent algorithmic defenses such asdifferential privacy, designed to guarantee sentence- or user-level privacy,prevent PII disclosure. In this work, we introduce rigorous game-baseddefinitions for three types of PII leakage via black-box extraction, inference,and reconstruction attacks with only API access to an LM. We empiricallyevaluate the attacks against GPT-2 models fine-tuned with and without defensesin three domains: case law, health care, and e-mails. Our main contributionsare (i) novel attacks that can extract up to 10$times$ more PII sequences thanexisting attacks, (ii) showing that sentence-level differential privacy reducesthe risk of PII disclosure but still leaks about 3% of PII sequences, and (iii)a subtle connection between record-level membership inference and PIIreconstruction. 
Code to reproduce all experiments in the paper is available at", "output": "Analyzing Leakage of Personally Identifiable Information in Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Heterogeneous graphs offer powerful data representations for traffic, giventheir ability to model the complex interaction effects among a varying numberof traffic participants and the underlying road infrastructure. With the recentadvent of graph neural networks (GNNs) as the accompanying deep learningframework, the graph structure can be efficiently leveraged for various machinelearning applications such as trajectory prediction. As a first of its kind,our proposed Python framework offers an easy-to-use and fully customizable dataprocessing pipeline to extract standardized graph datasets from trafficscenarios. Providing a platform for GNN-based autonomous driving research, itimproves comparability between approaches and allows researchers to focus onmodel implementation instead of dataset curation.", "output": "Geometric Deep Learning for Autonomous Driving: Unlocking the Power of Graph Neural Networks With CommonRoad-Geometric."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Finding saddle points of dynamical systems is an important problem inpractical applications such as the study of rare events of molecular systems.Gentlest ascent dynamics (GAD) is one of a number of algorithms in existencethat attempt to find saddle points in dynamical systems. It works by deriving anew dynamical system in which saddle points of the original system becomestable equilibria. 
GAD has been recently generalized to the study of dynamical systems on manifolds (differential algebraic equations) described by equality constraints and given in an extrinsic formulation. In this paper, we present an extension of GAD to manifolds defined by point-clouds, formulated using the intrinsic viewpoint. These point-clouds are adaptively sampled during an iterative process that drives the system from the initial conformation (typically in the neighborhood of a stable equilibrium) to a saddle point. Our method requires the reactant (initial conformation), does not require the explicit constraint equations to be specified, and is purely data-driven.", "output": "Gentlest ascent dynamics on manifolds defined by adaptively sampled point-clouds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider policy gradient methods for stochastic optimal control problems in continuous time. In particular, we analyze the gradient flow for the control, viewed as a continuous time limit of the policy gradient method. We prove the global convergence of the gradient flow and establish a convergence rate under some regularity assumptions. The main novelty in the analysis is the notion of a local optimal control function, which is introduced to characterize the local optimality of the iterate.", "output": "A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the widespread deployment of deep neural networks (DNNs), ensuring the reliability of DNN-based systems is of great importance. Serious reliability issues such as system failures can be caused by numerical defects, one of the most frequent defects in DNNs. 
To assure high reliability against numericaldefects, in this paper, we propose the RANUM approach including noveltechniques for three reliability assurance tasks: detection of potentialnumerical defects, confirmation of potential-defect feasibility, and suggestionof defect fixes. To the best of our knowledge, RANUM is the first approach thatconfirms potential-defect feasibility with failure-exhibiting tests andsuggests fixes automatically. Extensive experiments on the benchmarks of 63real-world DNN architectures show that RANUM outperforms state-of-the-artapproaches across the three reliability assurance tasks. In addition, when theRANUM-generated fixes are compared with developers' fixes on open-sourceprojects, in 37 out of 40 cases, RANUM-generated fixes are equivalent to oreven better than human fixes.", "output": "Reliability Assurance for Deep Neural Network Architectures Against Numerical Defects."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Differentially private training offers a protection which is usuallyinterpreted as a guarantee against membership inference attacks. By proxy, thisguarantee extends to other threats like reconstruction attacks attempting toextract complete training examples. Recent works provide evidence that if onedoes not need to protect against membership attacks but instead only wants toprotect against training data reconstruction, then utility of private modelscan be improved because less noise is required to protect against these moreambitious attacks. We investigate this further in the context of DP-SGD, astandard algorithm for private deep learning, and provide an upper bound on thesuccess of any reconstruction attack against DP-SGD together with an attackthat empirically matches the predictions of our bound. 
Together, these two results open the door to fine-grained investigations on how to set the privacy parameters of DP-SGD in practice to protect against reconstruction attacks. Finally, we use our methods to demonstrate that different settings of the DP-SGD parameters leading to the same DP guarantees can result in significantly different success rates for reconstruction, indicating that the DP guarantee alone might not be a good proxy for controlling the protection against reconstruction attacks.", "output": "Bounding Training Data Reconstruction in DP-SGD."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, deep learning has achieved remarkable success in various fields such as image recognition, natural language processing, and speech recognition. The effectiveness of deep learning largely depends on the optimization methods used to train deep neural networks. In this paper, we provide an overview of first-order optimization methods such as Stochastic Gradient Descent, Adagrad, Adadelta, and RMSprop, as well as recent momentum-based and adaptive gradient methods such as Nesterov accelerated gradient, Adam, Nadam, AdaMax, and AMSGrad. We also discuss the challenges associated with optimization in deep learning and explore techniques for addressing these challenges, including weight initialization, batch normalization, and layer normalization. 
Finally, we provide recommendations for selecting optimization methods for different deep learning tasks and datasets. This paper serves as a comprehensive guide to optimization methods in deep learning and can be used as a reference for researchers and practitioners in the field.", "output": "Optimization Methods in Deep Learning: A Comprehensive Overview."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a new approach for generating sequences of implied volatility (IV) surfaces across multiple assets that is faithful to historical prices. We do so using a combination of functional data analysis and neural stochastic differential equations (SDEs) combined with a probability integral transform penalty to reduce model misspecification. We demonstrate that learning the joint dynamics of IV surfaces and prices produces market scenarios that are consistent with historical features and lie within the sub-manifold of surfaces that are essentially free of static arbitrage. Finally, we demonstrate that delta hedging using the simulated surfaces generates profit and loss (P&L) distributions that are consistent with realised P&Ls.", "output": "FuNVol: A Multi-Asset Implied Volatility Market Simulator using Functional Principal Components and Neural SDEs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Teacher-Student Framework (TSF) is a reinforcement learning setting where a teacher agent guards the training of a student agent by intervening and providing online demonstrations. 
Assuming optimal, the teacher policy has theperfect timing and capability to intervene in the learning process of thestudent agent, providing safety guarantee and exploration guidance.Nevertheless, in many real-world settings it is expensive or even impossible toobtain a well-performing teacher policy. In this work, we relax the assumptionof a well-performing teacher and develop a new method that can incorporatearbitrary teacher policies with modest or inferior performance. We instantiatean Off-Policy Reinforcement Learning algorithm, termed Teacher-Student SharedControl (TS2C), which incorporates teacher intervention based ontrajectory-based value estimation. Theoretical analysis validates that theproposed TS2C algorithm attains efficient exploration and substantial safetyguarantee without being affected by the teacher's own performance. Experimentson various continuous control tasks show that our method can exploit teacherpolicies at different performance levels while maintaining a low training cost.Moreover, the student policy surpasses the imperfect teacher policy in terms ofhigher accumulated reward in held-out testing environments. Code is availableat ", "output": "Guarded Policy Optimization with Imperfect Online Demonstrations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks provide state-of-the-art accuracy for vision tasks butthey require significant resources for training. Thus, they are trained oncloud servers far from the edge devices that acquire the data. This issueincreases communication cost, runtime and privacy concerns. In this study, anovel hierarchical training method for deep neural networks is proposed thatuses early exits in a divided architecture between edge and cloud workers toreduce the communication cost, training runtime and privacy concerns. 
Themethod proposes a brand-new use case for early exits to separate the backwardpass of neural networks between the edge and the cloud during the trainingphase. We address the issues of most available methods that due to thesequential nature of the training phase, cannot train the levels of hierarchysimultaneously or they do it with the cost of compromising privacy. Incontrast, our method can use both edge and cloud workers simultaneously, doesnot share the raw input data with the cloud and does not require communicationduring the backward pass. Several simulations and on-device experiments fordifferent neural network architectures demonstrate the effectiveness of thismethod. It is shown that the proposed method reduces the training runtime by29% and 61% in CIFAR-10 classification experiment for VGG-16 and ResNet-18 whenthe communication with the cloud is done at a low bit rate channel. This gainin the runtime is achieved whilst the accuracy drop is negligible. This methodis advantageous for online learning of high-accuracy deep neural networks onlow-resource devices such as mobile phones or robots as a part of an edge-cloudsystem, making them more flexible in facing new tasks and classes of data.", "output": "Hierarchical Training of Deep Neural Networks Using Early Exiting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the influence of different activation functions in the output layerof deep neural network models for soft and hard label prediction in thelearning with disagreement task. In this task, the goal is to quantify theamount of disagreement via predicting soft labels. To predict the soft labels,we use BERT-based preprocessors and encoders and vary the activation functionused in the output layer, while keeping other parameters constant. The softlabels are then used for the hard label prediction. 
The activation functions considered are sigmoid as well as a step-function that is added to the model post-training and a sinusoidal activation function, which is introduced for the first time in this paper.", "output": "Lon-ea at SemEval-2023 Task 11: A Comparison of Activation Functions for Soft and Hard Label Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Robust point cloud classification is crucial for real-world applications, as consumer-type 3D sensors often yield partial and noisy data, degraded by various artifacts. In this work we propose a general ensemble framework, based on partial point cloud sampling. Each ensemble member is exposed to only partial input data. Three sampling strategies are used jointly, two local ones, based on patches and curves, and a global one of random sampling. We demonstrate the robustness of our method to various local and global degradations. We show that our framework significantly improves the robustness of top classification networks by a large margin. Our experimental setting uses the recently introduced ModelNet-C database by Ren et al. [24], where we reach SOTA both on unaugmented and on augmented data. Our unaugmented mean Corruption Error (mCE) is 0.64 (current SOTA is 0.86) and 0.50 for augmented data (current SOTA is 0.57). We analyze and explain these remarkable results through diversity analysis. Our code is available at:", "output": "EPiC: Ensemble of Partial Point Clouds for Robust Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A central task in control theory, artificial intelligence, and formal methods is to synthesize reward-maximizing strategies for agents that operate in partially unknown environments. 
In environments modeled by gray-box Markov decision processes (MDPs), the impact of the agents' actions is known in terms of successor states but not the stochastics involved. In this paper, we devise a strategy synthesis algorithm for gray-box MDPs via reinforcement learning that utilizes interval MDPs as internal model. To compete with limited sampling access in reinforcement learning, we incorporate two novel concepts into our algorithm, focusing on rapid and successful learning rather than on stochastic guarantees and optimality: lower confidence bound exploration reinforces variants of already learned practical strategies and action scoping reduces the learning action space to promising actions. We illustrate benefits of our algorithms by means of a prototypical implementation applied on examples from the AI and formal methods communities.", "output": "Strategy Synthesis in Markov Decision Processes Under Limited Sampling Access."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "PCA-Net is a recently proposed neural operator architecture which combines principal component analysis (PCA) with neural networks to approximate operators between infinite-dimensional function spaces. The present work develops approximation theory for this approach, improving and significantly extending previous work in this direction: First, a novel universal approximation result is derived, under minimal assumptions on the underlying operator and the data-generating distribution. Then, two potential obstacles to efficient operator learning with PCA-Net are identified, and made precise through lower complexity bounds; the first relates to the complexity of the output distribution, measured by a slow decay of the PCA eigenvalues. 
The other obstacle relates to the inherent complexity of the space of operators between infinite-dimensional input and output spaces, resulting in a rigorous and quantifiable statement of the curse of dimensionality. In addition to these lower bounds, upper complexity bounds are derived. A suitable smoothness criterion is shown to ensure an algebraic decay of the PCA eigenvalues. Furthermore, it is shown that PCA-Net can overcome the general curse of dimensionality for specific operators of interest, arising from the Darcy flow and the Navier-Stokes equations.", "output": "Operator learning with PCA-Net: upper and lower complexity bounds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, source-free unsupervised domain adaptation (SFUDA) has emerged as a more practical and feasible approach compared to unsupervised domain adaptation (UDA) which assumes that labeled source data are always accessible. However, significant limitations associated with SFUDA approaches are often overlooked, which limits their practicality in real-world applications. These limitations include a lack of principled ways to determine optimal hyperparameters and performance degradation when the unlabeled target data fail to meet certain requirements such as a closed-set and identical label distribution to the source data. All these limitations stem from the fact that SFUDA entirely relies on unlabeled target data. We empirically demonstrate the limitations of existing SFUDA methods in real-world scenarios including out-of-distribution and label distribution shifts in target data, and verify that none of these methods can be safely applied to real-world settings. Based on our experimental results, we claim that fine-tuning a source pretrained model with a few labeled data (e.g., 1- or 3-shot) is a practical and reliable solution to circumvent the limitations of SFUDA. 
Contrary to common belief, we find that carefully fine-tuned models do not suffer from overfitting even when trained with only a few labeled data, and also show little change in performance due to sampling bias. Our experimental results on various domain adaptation benchmarks demonstrate that the few-shot fine-tuning approach performs comparatively under the standard SFUDA settings, and outperforms comparison methods under realistic scenarios. Our code is available at .", "output": "Few-shot Fine-tuning is All You Need for Source-free Domain Adaptation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Thermal imaging has numerous advantages over regular visible-range imaging since it performs well in low-light circumstances. Super-Resolution approaches can broaden their usefulness by replicating accurate high-resolution thermal pictures using measurements from low-cost, low-resolution thermal sensors. Because of the spectral range mismatch between the images, Guided Super-Resolution of thermal images utilizing visible range images is difficult. However, failure to capture visible range images can prevent the operation of applications in critical areas. We present a novel data fusion framework and regularization technique for Guided Super Resolution of Thermal images. The proposed architecture is computationally inexpensive and lightweight with the ability to maintain performance despite missing one of the modalities, i.e., high-resolution RGB image or the lower-resolution thermal image, and is designed to be robust in the presence of missing data. The proposed method presents a promising solution to the frequently occurring problem of missing modalities in a real-world scenario. 
Code is available at .", "output": "CoReFusion: Contrastive Regularized Fusion for Guided Thermal Super-Resolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Processing-in-memory (PIM) promises to alleviate the data movement bottleneck in modern computing systems. However, current real-world PIM systems have the inherent disadvantage that their hardware is more constrained than in conventional processors (CPU, GPU), due to the difficulty and cost of building processing elements near or inside the memory. As a result, general-purpose PIM architectures support fairly limited instruction sets and struggle to execute complex operations such as transcendental functions and other hard-to-calculate operations (e.g., square root). These operations are particularly important for some modern workloads, e.g., activation functions in machine learning applications. In order to provide support for transcendental (and other hard-to-calculate) functions in general-purpose PIM systems, we present TransPimLib, a library that provides CORDIC-based and LUT-based methods for trigonometric functions, hyperbolic functions, exponentiation, logarithm, square root, etc. We develop an implementation of TransPimLib for the UPMEM PIM architecture and perform a thorough evaluation of TransPimLib's methods in terms of performance and accuracy, using microbenchmarks and three full workloads (Blackscholes, Sigmoid, Softmax). 
We open-source all our code and datasets at", "output": "TransPimLib: A Library for Efficient Transcendental Functions on Processing-in-Memory Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper describes our submission to Task 10 at SemEval 2023: Explainable Detection of Online Sexism (EDOS), divided into three subtasks. The recent rise in social media platforms has seen an increase in disproportionate levels of sexism experienced by women on social media platforms. This has made detecting and explaining online sexist content more important than ever to make social media safer and more accessible for women. Our approach consists of experimenting and finetuning BERT-based models and using a Majority Voting ensemble model that outperforms individual baseline model scores. Our system achieves a macro F1 score of 0.8392 for Task A, 0.6092 for Task B, and 0.4319 for Task C.", "output": "SSS at SemEval-2023 Task 10: Explainable Detection of Online Sexism using Majority Voted Fine-Tuned Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Unsupervised discovery of stories with correlated news articles in real-time helps people digest massive news streams without expensive human annotations. A common approach of the existing studies for unsupervised online story discovery is to represent news articles with symbolic- or graph-based embedding and incrementally cluster them into stories. Recent large language models are expected to improve the embedding further, but a straightforward adoption of the models by indiscriminately encoding all information in articles is ineffective to deal with text-rich and evolving news streams. 
In this work, we propose a novel thematic embedding with an off-the-shelf pretrained sentence encoder to dynamically represent articles and stories by considering their shared temporal themes. To realize the idea for unsupervised online story discovery, a scalable framework USTORY is introduced with two main techniques, theme- and time-aware dynamic embedding and novelty-aware adaptive clustering, fueled by lightweight story summaries. A thorough evaluation with real news data sets demonstrates that USTORY achieves higher story discovery performances than baselines while being robust and scalable to various streaming settings.", "output": "Unsupervised Story Discovery from Continuous News Streams via Scalable Thematic Embedding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The monotone Variational Inequality (VI) is an important problem in machine learning. In numerous instances, the VI problems are accompanied by function constraints which can possibly be data-driven, making the projection operator challenging to compute. In this paper, we present novel first-order methods for function constrained VI (FCVI) problem under various settings, including smooth or nonsmooth problems with a stochastic operator and/or stochastic constraints. First, we introduce the OpConEx method and its stochastic variants, which employ extrapolation of the operator and constraint evaluations to update the variables and the Lagrangian multipliers. These methods achieve optimal operator or sample complexities when the FCVI problem is either (i) deterministic nonsmooth, or (ii) stochastic, including smooth or nonsmooth stochastic constraints. Notably, our algorithms are simple single-loop procedures and do not require the knowledge of Lagrange multipliers to attain these complexities. 
Second, to obtain the optimal operator complexity for smooth deterministic problems, we present a novel single-loop Adaptive Lagrangian Extrapolation (AdLagEx) method that can adaptively search for and explicitly bound the Lagrange multipliers. Furthermore, we show that all of our algorithms can be easily extended to saddle point problems with coupled function constraints, hence achieving similar complexity results for the aforementioned cases. To our best knowledge, many of these complexities are obtained for the first time in the literature.", "output": "First-order methods for Stochastic Variational Inequality problems with Function Constraints."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automatic speech recognition (ASR) has gained a remarkable success thanks to recent advances of deep learning, but it usually degrades significantly under real-world noisy conditions. Recent works introduce speech enhancement (SE) as front-end to improve speech quality, which is proved effective but may not be optimal for downstream ASR due to speech distortion problem. Based on that, latest works combine SE and currently popular self-supervised learning (SSL) to alleviate distortion and improve noise robustness. Despite the effectiveness, the speech distortion caused by conventional SE still cannot be completely eliminated. In this paper, we propose a self-supervised framework named Wav2code to implement a generalized SE without distortions for noise-robust ASR. 
First, in the pre-training stage, the clean speech representations from the SSL model are sent to look up a discrete codebook via nearest-neighbor feature matching; the resulting code sequence is then exploited to reconstruct the original clean representations, in order to store them in the codebook as a prior. Second, during finetuning we propose a Transformer-based code predictor to accurately predict clean codes by modeling the global dependency of input noisy representations, which enables discovery and restoration of high-quality clean representations without distortions. Furthermore, we propose an interactive feature fusion network to combine original noisy and the restored clean representations to consider both fidelity and quality, resulting in even more informative features for downstream ASR. Finally, experiments on both synthetic and real noisy datasets demonstrate that Wav2code can solve the speech distortion and improve ASR performance under various noisy conditions, resulting in stronger robustness.", "output": "Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Given their flexibility and encouraging performance, deep-learning models are becoming standard for motion prediction in autonomous driving. However, with great flexibility comes a lack of interpretability and possible violations of physical constraints. Accompanying these data-driven methods with differentially-constrained motion models to provide physically feasible trajectories is a promising future direction. The foundation for this work is a previously introduced graph-neural-network-based model, MTP-GO. The neural network learns to compute the inputs to an underlying motion model to provide physically feasible trajectories. 
This research investigates the performance of various motion models in combination with numerical solvers for the prediction task. The study shows that simpler models, such as low-order integrator models, are preferred over more complex, e.g., kinematic models, to achieve accurate predictions. Further, the numerical solver can have a substantial impact on performance, advising against commonly used first-order methods like Euler forward. Instead, a second-order method like Heun's can greatly improve predictions.", "output": "Evaluation of Differentially Constrained Motion Models for Graph-Based Trajectory Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "There is a growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when to treat and how to treat based on the user's context (e.g., prior activity level, location, etc.). Online RL is a promising data-driven approach for this problem as it learns based on each user's historical responses and uses that knowledge to personalize these decisions. However, to decide whether the RL algorithm should be included in an ``optimized'' intervention for real-world deployment, we must assess the data evidence indicating that the RL algorithm is actually personalizing the treatments to its users. Due to the stochasticity in the RL algorithm, one may get a false impression that it is learning in certain states and using this learning to provide specific treatments. We use a working definition of personalization and introduce a resampling-based methodology for investigating whether the personalization exhibited by the RL algorithm is an artifact of the RL algorithm stochasticity. 
We illustrate our methodology with a case study by analyzing the data from a physical activity clinical trial called HeartSteps, which included the use of an online RL algorithm. We demonstrate how our approach enhances data-driven truth-in-advertising of algorithm personalization both across all users as well as within specific users in the study.", "output": "Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In populous countries, pending legal cases have been growing exponentially. There is a need for developing NLP-based techniques for processing and automatically understanding legal documents. To promote research in the area of Legal NLP we organized the shared task LegalEval - Understanding Legal Texts at SemEval 2023. LegalEval task has three sub-tasks: Task-A (Rhetorical Roles Labeling) is about automatically structuring legal documents into semantically coherent units, Task-B (Legal Named Entity Recognition) deals with identifying relevant entities in a legal document and Task-C (Court Judgement Prediction with Explanation) explores the possibility of automatically predicting the outcome of a legal case along with providing an explanation for the prediction. In total 26 teams (approx. 100 participants spread across the world) submitted system papers. In each of the sub-tasks, the proposed systems outperformed the baselines; however, there is a lot of scope for improvement. 
This paper describes the tasks, and analyzes techniques proposed by various teams.", "output": "SemEval 2023 Task 6: LegalEval -- Understanding Legal Texts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Languages are not created randomly but rather to communicate information. There is a strong association between languages and their underlying meanings, resulting in a sparse joint distribution that is heavily peaked according to their correlations. Moreover, these peak values happen to match with the marginal distribution of languages due to the sparsity. With the advent of LLMs trained on big data and large models, we can now precisely assess the marginal distribution of languages, providing a convenient means of exploring the sparse structures in the joint distribution for effective inferences. In this paper, we categorize languages as either unambiguous or $\\epsilon$-ambiguous and present quantitative results to demonstrate that the emergent abilities of LLMs, such as language understanding, in-context learning, chain-of-thought prompting, and effective instruction fine-tuning, can all be attributed to Bayesian inference on the sparse joint distribution of languages.", "output": "A Latent Space Theory for Emergent Abilities in Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While deep reinforcement learning has shown important empirical success, it tends to learn relatively slow due to slow propagation of rewards information and slow update of parametric neural networks. Non-parametric episodic memory, on the other hand, provides a faster learning alternative that does not require representation learning and uses maximum episodic return as state-action values for action selection. 
Episodic memory and reinforcement learning both have their own strengths and weaknesses. Notably, humans can leverage multiple memory systems concurrently during learning and benefit from all of them. In this work, we propose a method called Two-Memory reinforcement learning agent (2M) that combines episodic memory and reinforcement learning to distill both of their strengths. The 2M agent exploits the speed of the episodic memory part and the optimality and the generalization capacity of the reinforcement learning part to complement each other. Our experiments demonstrate that the 2M agent is more data efficient and outperforms both pure episodic memory and pure reinforcement learning, as well as a state-of-the-art memory-augmented RL agent. Moreover, the proposed approach provides a general framework that can be used to combine any episodic memory agent with other off-policy reinforcement learning algorithms.", "output": "Two-Memory Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent progress in deep learning, a special form of machine learning, has led to remarkable capabilities machines can now be endowed with: they can read and understand free flowing text, reason and bargain with human counterparts, translate texts between languages, learn how to take decisions to maximize certain outcomes, etc. Today, machines have revolutionized the detection of cancer, the prediction of protein structures, the design of drugs, the control of nuclear fusion reactors etc. Although these capabilities are still in their infancy, it seems clear that their continued refinement and application will result in a technological impact on nearly all social and economic areas of human activity, the likes of which we have not seen before. 
In this article, I will share my view as to how AI will likely impact asset management in general and I will provide a mental framework that will equip readers with a simple criterion to assess whether and to what degree a given fund really exploits deep learning and whether a large disruption risk from deep learning exists.", "output": "The impact of the AI revolution on asset management."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The transformer is a neural network component that can be used to learn useful representations of sequences or sets of datapoints. The transformer has driven recent advances in natural language processing, computer vision, and spatio-temporal modelling. There are many introductions to transformers, but most do not contain precise mathematical descriptions of the architecture and the intuitions behind the design choices are often also missing. Moreover, as research takes a winding path, the explanations for the components of the transformer can be idiosyncratic. In this note we aim for a mathematically precise, intuitive, and clean description of the transformer architecture.", "output": "An Introduction to Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While Feedforward Neural Networks (FNNs) have achieved remarkable success in various tasks, they are vulnerable to adversarial examples. Several techniques have been developed to verify the adversarial robustness of FNNs, but most of them focus on robustness verification against the local perturbation neighborhood of a single data point. There is still a large research gap in global robustness analysis. 
The global-robustness verifiable framework DeepGlobal has been proposed to identify all possible Adversarial Dangerous Regions (ADRs) of FNNs, not limited to data samples in a test set. In this paper, we propose a complete specification and implementation of DeepGlobal utilizing the SMT solver Z3 for more explicit definition, and propose several improvements to DeepGlobal for more efficient verification. To evaluate the effectiveness of our implementation and improvements, we conduct extensive experiments on a set of benchmark datasets. Visualization of our experiment results shows the validity and effectiveness of the approach.", "output": "Using Z3 for Formal Modeling and Verification of FNN Global Robustness."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The use of machine learning (ML) inference for various applications is growing drastically. ML inference services engage with users directly, requiring fast and accurate responses. Moreover, these services face dynamic workloads of requests, imposing changes in their computing resources. Failing to right-size computing resources results in either latency service level objectives (SLOs) violations or wasted computing resources. Adapting to dynamic workloads considering all the pillars of accuracy, latency, and resource cost is challenging. In response to these challenges, we propose InfAdapter, which proactively selects a set of ML model variants with their resource allocations to meet latency SLO while maximizing an objective function composed of accuracy and cost. 
InfAdapter decreases SLO violation and costs up to 65% and 33%, respectively, compared to a popular industry autoscaler (Kubernetes Vertical Pod Autoscaler).", "output": "Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We investigate the potential of GPT-4 to perform Neural Architecture Search (NAS) -- the task of designing effective neural architectures. Our proposed approach, GPT-4 Enhanced Neural archItectUre Search (GENIUS), leverages the generative capabilities of GPT-4 as a black-box optimiser to quickly navigate the architecture search space, pinpoint promising candidates, and iteratively refine these candidates to improve performance. We assess GENIUS across several benchmarks, comparing it with existing state-of-the-art NAS techniques to illustrate its effectiveness. Rather than targeting state-of-the-art performance, our objective is to highlight GPT-4's potential to assist research on a challenging technical problem through a simple prompting scheme that requires relatively limited domain expertise (code available at ). More broadly, we believe our preliminary results point to future research that harnesses general purpose language models for diverse optimisation tasks. We also highlight important limitations to our study, and note implications for AI safety.", "output": "Can GPT-4 Perform Neural Architecture Search?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Evaluating the relevance of an exogenous data series is the first step in improving the prediction capabilities of a forecast algorithm. 
Inspired by existing metrics for time series similarity, we introduce a new approach named FARM - Forward Aligned Relevance Metric. Our forward method relies on an angular measure that compares changes in subsequent data points to align time-warped series in an efficient way. The proposed algorithm combines local and global measures to provide a balanced relevance metric. This results in considering also partial, intermediate matches as relevant indicators for exogenous data series significance. As a first validation step, we present the application of our FARM approach to synthetic but representative signals. While demonstrating the improved capabilities with respect to existing approaches, we also discuss existing constraints and limitations of our idea.", "output": "Exogenous Data in Forecasting: FARM -- A New Measure for Relevance Evaluation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a new general-purpose algorithm for learning classes of $[0,1]$-valued functions in a generalization of the prediction model, and prove a general upper bound on the expected absolute error of this algorithm in terms of a scale-sensitive generalization of the Vapnik dimension proposed by Alon, Ben-David, Cesa-Bianchi and Haussler. We give lower bounds implying that our upper bounds cannot be improved by more than a constant factor in general. We apply this result, together with techniques due to Haussler and to Benedek and Itai, to obtain new upper bounds on packing numbers in terms of this scale-sensitive notion of dimension. Using a different technique, we obtain new bounds on packing numbers in terms of Kearns and Schapire's fat-shattering function. We show how to apply both packing bounds to obtain improved general bounds on the sample complexity of agnostic learning. 
For each $\\epsilon > 0$, we establish weaker sufficient and stronger necessary conditions for a class of $[0,1]$-valued functions to be agnostically learnable to within $\\epsilon$, and to be an $\\epsilon$-uniform Glivenko-Cantelli class. This is a manuscript that was accepted by JCSS, together with a correction.", "output": "Prediction, Learning, Uniform Convergence, and Scale-sensitive Dimensions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we aim to develop a large language model (LLM) with the reasoning ability on complex graph data. Currently, LLMs have achieved very impressive performance on various natural language learning tasks, extensions of which have also been applied to study the vision tasks with multi-modal data. However, when it comes to the graph learning tasks, existing LLMs present very serious flaws due to their several inherited weaknesses in performing multi-step logic reasoning, precise mathematical calculation and perception about the spatial and temporal factors. To address such challenges, in this paper, we will investigate the principles, methodologies and algorithms to empower existing LLMs with graph reasoning ability, which will have tremendous impacts on the current research of both LLMs and graph learning. Inspired by the latest ChatGPT and Toolformer models, we propose the Graph-ToolFormer (Graph Reasoning oriented Toolformer) framework to teach LLMs themselves with prompts augmented by ChatGPT to use external graph reasoning API tools. 
Specifically, we will investigate to teach Graph-ToolFormer to handle various graph data reasoning tasks in this paper, including both (1) very basic graph data loading and graph property reasoning tasks, ranging from simple graph order and size to the graph diameter and periphery, and (2) more advanced reasoning tasks on real-world graph data, such as bibliographic networks, protein molecules, sequential recommender systems, social networks and knowledge graphs. To demonstrate the effectiveness of Graph-ToolFormer, we conduct some preliminary experimental studies on various graph reasoning datasets and tasks, and will launch a LLM demo online with various graph reasoning abilities.", "output": "Graph-ToolFormer: To Empower LLMs with Graph Reasoning Ability via Prompt Augmented by ChatGPT."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The fastMRI brain and knee dataset has enabled significant advances in exploring reconstruction methods for improving speed and image quality for Magnetic Resonance Imaging (MRI) via novel, clinically relevant reconstruction approaches. In this study, we describe the April 2023 expansion of the fastMRI dataset to include biparametric prostate MRI data acquired on a clinical population. The dataset consists of raw k-space and reconstructed images for T2-weighted and diffusion-weighted sequences along with slice-level labels that indicate the presence and grade of prostate cancer. As has been the case with fastMRI, increasing accessibility to raw prostate MRI data will further facilitate research in MR image reconstruction and evaluation with the larger goal of improving the utility of MRI for prostate cancer detection and evaluation. The dataset is available at ", "output": "FastMRI Prostate: A Publicly Available, Biparametric MRI Dataset to Advance Machine Learning for Prostate Cancer Imaging."}] \ No newline at end of file