Join us for a Research Symposium in Computational and Data Science. There will be three sessions: the morning session will focus on machine learning, the afternoon session on network science, and the evening session on keynotes and a reception. Leading researchers will present cutting-edge research and discuss new directions in these fields. The event will conclude with keynote lectures, followed by a poster session and reception in the Center for the Arts Atrium, where graduate students from UB and the surrounding area will present research posters. The event is open to all attendees. The reception will include refreshments and networking.
Human–Computer Partnerships – Advancing Artificial Intelligence
Earlier this year, the University at Buffalo launched the UB Artificial Intelligence Institute to bring together university, industry, government, and community partners to advance core AI technologies, apply them in ways that optimize human-machine partnerships, and provide the complementary tools and skills to understand their societal impact. We are facilitating interaction among faculty across UB who have a vested interest in advancing core AI technologies and cutting-edge AI applications, developing new educational initiatives, and exploring new ways of interacting with local industry. In this talk, I will overview some of our new and innovative programs and map out the future of AI at UB.
Fake News on Twitter During the 2016 U.S. Presidential Election
The spread of fake news on social media became a public concern in the United States after the 2016 presidential election. We examined exposure to and sharing of fake news by registered voters on Twitter and found that engagement with fake news sources was extremely concentrated. Only 1% of individuals accounted for 80% of fake news source exposures, and 0.1% accounted for nearly 80% of fake news sources shared. Individuals most likely to engage with fake news sources were conservative leaning, older, and highly engaged with political news. A cluster of fake news sources shared overlapping audiences on the extreme right, but for people across the political spectrum, most political news exposure still came from mainstream media outlets.
From “Big Data” to “Small Data”: Classification Through Approximate Clustering
One of the great challenges in computational theory is the extraction of patterns from massive and high-dimensional data sets, e.g., clustering and classification. Geometric clustering is extensively studied and has a wide spectrum of applications, including image processing, genomics, bioinformatics, and social networks.
In this talk, I will survey a unified approach to the design of efficient clustering and classification algorithms for increasingly ambitious and descriptive forms of data analysis. The typical data object, in both the statistical and algorithmic literature, is a point in geometric space. The suggested approach treats data objects not as points but rather as abstract functions that characterize the cost of associating a given input with a certain cluster center. Using this generalized view, a link between the combinatorial complexity of the function family at hand (measured in terms of classical VC dimension) and the paradigm of coresets, which are a compressed representation of the input set, is forged. A recent case study on outlier-resistant L1-norm principal component analysis will be discussed.
Keynote: Towards Building Non-Polarizing Recommender Systems
An inherent trait of recommendation systems is that they tend to influence their users. Often this influence is unintentional, and sometimes it polarizes the users. Consider a social media agency interested in recommending news articles to its users over multiple days. If the agency simply tries to predict what each user might like and greedily provides recommendations, it might end up polarizing its users. To illustrate this phenomenon, consider a news agency that recommends articles about fruits. Say we have a user who initially likes apples and oranges equally, happens to receive an article about apples, and indicates to the system that she might like apples. The recommender system, learning of this, will start to recommend, with a mild bias, articles about apples and their health benefits. Subsequent rounds of interaction with the system instill in the user a stronger opinion about apples, and the user might start to prefer apples over oranges, all the while the system further strengthens its belief that the user really prefers apples over oranges. Continuous interaction with such a system turns this user, who started out neutral about apples versus oranges, into an apple fanatic. Clearly this was just happenstance: just as easily, the initial interactions could have swayed the user toward liking oranges. Polarization is further worsened by the confirmation bias of users, who perceive content differently in each round based on their prior beliefs, which might further speed up the polarization. Additionally, polarization by recommender systems can be worsened when one considers that users might be part of a social network and tend to share ideas and opinions. Users are often part of user groups or cliques, and these groups tend to further influence user preferences within the group.
Specifically, there is an intrinsic bias for users to follow the herd, so to speak, and users can be more easily convinced to agree with their group's view while disagreeing strongly with those outside the group. Hence, a recommendation system, by making the greedy choice of which articles to show its users, might inadvertently polarize them into groups with strongly opposing opinions on issues.
In this talk I will present some of our initial attempts at building theory and algorithm design principles for machine learning systems that not only aim to predict or recommend with high accuracy but also aim not to further polarize their users. Specifically, we assume that the users of the system are interconnected via a social network and that the machine learning algorithm has access to the structure of this network. We then build on and extend existing mathematical models for the formation and evolution of users' opinions based on what is recommended to them and on their interactions with friends in the social network. Finally, we provide an algorithm design principle for building recommendation systems that use knowledge of the underlying social network to provide recommendations that not only aim for high accuracy but simultaneously aim to reduce a natural measure of polarization we propose. We show that under our model of opinion formation dynamics (which subsumes existing models of opinion dynamics), our recommendation algorithm provably has a low polarization effect.
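The apples-versus-oranges feedback loop described above can be sketched in a few lines of Python. This is a toy illustration only, not the model from the talk: the update rule, step size, and damping factor are all hypothetical choices made for the example.

```python
def simulate(initial_nudge, rounds=50, step=0.05):
    """Toy greedy-recommender feedback loop: a user's opinion x in [-1, 1]
    (oranges vs. apples) drifts toward whatever the system recommends,
    while the system's belief tracks the user's drifting opinion."""
    x = initial_nudge          # user starts almost neutral
    belief = initial_nudge     # system's initial estimate of the preference
    for _ in range(rounds):
        rec = 1 if belief > 0 else -1            # greedy: show what the system believes the user likes
        x = max(-1.0, min(1.0, x + step * rec))  # opinion drifts toward the recommendation
        belief = 0.8 * belief + 0.2 * x          # system reinforces its belief from the new opinion
    return x

# A tiny accidental nudge decides which extreme the user ends up at:
print(simulate(+0.01))   # drifts to +1.0 (apple fanatic)
print(simulate(-0.01))   # drifts to -1.0 (orange fanatic)
```

Either sign of the initial happenstance drives the user to the corresponding extreme, which is exactly the polarization-by-happenstance the abstract describes.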
The Dynamic Organization of Cities and Its Connection With Urban Indicators
The city is a complex system that evolves through its inherent social and economic interactions. Mediating the movements of people and resources, urban street networks offer a spatial footprint of these activities. Of particular interest is the interplay between street structure and its functional usage. Studying the shape of spatiotemporally optimized travel routes in the 92 most populated cities in the world reveals a collective morphology that exhibits a directional bias influenced by the attractive (or repulsive) forces resulting from congestion, accessibility, and travel demand. We develop a simple geometric measure, inness, that maps this force field. In particular, cities with common inness patterns cluster together in groups that are correlated with their putative stage of urban development as measured by a series of socio-economic and infrastructural indicators, suggesting a strong connection between urban development, increasing physical connectivity, and diversity of road hierarchies.
Integrating Simulation Modeling, Signal Processing and Machine Learning in the Study of Large Social Systems
The big data about our social systems gathered from the Internet of Things and social media call for new computational tools to study those systems and help people. In this tutorial, I will introduce the discrete event model to specify the complex system dynamics of large social systems in terms of how the individuals in the system interact with one another and how the interactions change their states. I will discuss variational and sampling-based inference algorithms to track and predict the interactions, and the applications of these algorithms in predicting road traffic, urban socio-economic development, epidemic spreading, and network formation. I hope to introduce the audience to this framework, which brings together modelers and data miners by turning the real world into a living lab.
Information Flow and Prediction Limits In Online Networks
Modern society depends on the flow of information over online social networks, and users of popular platforms generate substantial behavioral data about themselves and their social ties. How useful are these data? What fundamental limits may exist when using these data to infer the activities and interests of an individual, and to what accuracy can such predictions be made using an individual's social ties? In this talk, I will describe research on the social flow of information and on how predictable, or "profile-able," an individual is via their social ties. We have found that as much as 95% of the potential predictive accuracy for an individual is achievable using their social ties alone, without requiring that individual's data. We used information-theoretic tools to estimate the predictive information in the writings of Twitter users, providing an upper bound on the available predictive information that holds for any predictive or machine learning method. As few as 8–9 of an individual's contacts are sufficient to obtain predictability comparable with that of the individual alone. Distinct temporal and social effects are visible when measuring information flow along social ties, allowing us to better study the dynamics of online activity. That information is so strongly embedded socially underscores the power of the social network: by knowing who an individual's social ties are and what those ties' activities are, our results show that one can, in principle, accurately profile even individuals who are not present in the data. This has distinct privacy implications as social platform providers continue to enrich their understanding of the social network.
Keynote: Inherent Trade-Offs in Algorithmic Fairness
As data science has broadened its scope in recent years, a number of domains have applied computational methods for classification and prediction to evaluate individuals in high-stakes settings. These developments have led to an active line of recent discussion in the public sphere about the consequences of algorithmic prediction for notions of fairness and equity, including competing notions of what it means for such algorithms to be fair to different groups. We consider several of the key fairness conditions that lie at the heart of these debates, and in particular how these properties operate when the goal is to rank-order a set of applicants by some criterion of interest, and then to select the top-ranking applicants.
This talk is based on joint work with Sendhil Mullainathan, Manish Raghavan, and Maithra Raghu.
Keynote: Making Reliable Computational Predictions: Is it Possible?
One of the most challenging and important applications of computational models of physical systems is to make predictions when no observations of the quantities being predicted are available. This is the usual situation when model results are to be used to support decisions (e.g. design or operations decisions) where predictions are needed precisely because observational data are not available when the decision must be made. Predictions, then, are essentially extrapolations of available information to the quantities and scenarios of interest. The challenge is to assess whether such an extrapolation can be made reliably. Computational models of physical systems are typically based on a reliable theoretical foundation (e.g. conservation laws) composed with various more-or-less reliable embedded models (e.g. constitutive relations). This composite model structure can enable reliable predictions provided the less reliable embedded models are used within the domain in which they have been tested against data. In this case, a reliable extrapolation is possible through the reliable theory, whose validity in the context of the prediction is not in doubt. In this lecture, we will explore techniques for assessing the validity of predictions in the context of this composite model structure and see that it is indeed possible to make reliable predictions based on computational models.
Python and Jupyter for Programmers
A hands-on introduction to Python and the Jupyter Notebook for those with experience programming in other languages. Participants must bring a laptop with an up-to-date installation of the Anaconda distribution of Python 3.
Requirements: Anaconda distribution of Python 3.7. Installation instructions and general workshop information can be found here.
The "Intro to R" workshop is designed to provide a hands-on, whirlwind tour through some of the capabilities of the R language. We will introduce basic syntax, useful development environments, and the use of packages to extend R's capabilities. We'll tinker with data structures and objects, use several different plotting packages, and perform some regressions. Everything we use will be open-source and you can take this knowledge away with you. Attendees should bring a laptop or tablet to participate in workshop exercises. We will work in RStudio and Jupyter.
An overview of the research computing services of UB's Center for Computational Research, with particular emphasis on the high performance computing resources and associated tools.
Computational science is now dependent on a shared software infrastructure which enables the use of cutting edge hardware, optimal algorithms, and sophisticated data analysis and visualization strategies. In this tutorial, we will enable students to use the Git system for version control (including a useful policy layer), configuration and build tools, and the LaTeX documentation system.
Clustering of High Dimensional Data With Uncertainty Assessment
In this talk, we address challenges in high dimensional clustering from three perspectives. First, we propose the Hidden Markov model on Variable Blocks (HMM-VB) as a density estimate for high dimensional data. A new mode search algorithm called Modal Baum-Welch (MBW) has been derived for mode-association clustering. Second, we have developed an ensemble clustering algorithm to enhance clustering by any baseline method using bootstrap samples. Third, we have developed novel measures to quantify variation/uncertainty in the clustering result at the levels of both overall partitions and individual clusters. In scientific data analysis, clusters identified computationally often substantiate existing hypotheses or motivate new ones. Yet the combinatorial nature of the clustering result, which is a partition rather than a set of parameters or a function, blurs notions of mean and variance. This intrinsic difficulty hinders the development of methods to improve clustering by aggregation or to assess the uncertainty of clusters generated. We overcome that barrier by aligning clusters via optimal transport. Equipped with this technique, we construct a unified framework for aggregating multiple clustering results and for assessing uncertainty in clustering. Applications to multiple research areas will be presented.
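The cluster-alignment idea can be illustrated with a toy stand-in: before clustering results can be aggregated or their variation assessed, their arbitrary label sets must be matched. The sketch below brute-forces the best relabeling over permutations, which is feasible only for small k; the talk's method uses optimal transport, which also handles unequal and soft cluster weights. This brute-force matching is illustrative and is not the authors' implementation.

```python
from itertools import permutations

def align_labels(labels_a, labels_b, k):
    """Relabel clustering B so that it agrees with clustering A as much
    as possible. Cluster labels are arbitrary: two runs of the same
    algorithm can produce identical partitions with permuted labels,
    so alignment is the first step in aggregating or comparing them."""
    best_agree, best_perm = -1, None
    for perm in permutations(range(k)):   # perm[j] = new label for B's cluster j
        agree = sum(perm[b] == a for a, b in zip(labels_a, labels_b))
        if agree > best_agree:
            best_agree, best_perm = agree, perm
    return [best_perm[b] for b in labels_b]

# Two identical partitions that differ only by a label permutation
# align perfectly:
a = [0, 0, 1, 1, 2, 2]
b = [2, 2, 0, 0, 1, 1]
print(align_labels(a, b, 3))  # [0, 0, 1, 1, 2, 2]
```

Once bootstrap clusterings are aligned to a common labeling, agreement rates across replicates give a simple notion of per-cluster stability, which is the kind of uncertainty the abstract's optimal-transport framework quantifies rigorously.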
Integrating the R Package Mrgsolve With Available Optimization Routines for Parameter Estimation With PK, PK-PD and QSP Models
After a brief introduction to mrgsolve (https://mrgsolve.github.io), we will discuss concepts and applications for using the package in R to estimate parameters given a model and data. We will establish a basic framework for running optimization in R and work hands-on examples using different optimizers, including local and global search algorithms. Building on this framework, we will also illustrate related workflows including local sensitivity analysis and likelihood profiling. Finally, a simulation-based workflow for generating virtual populations (parameters) according to the method of Allen et al. (https://doi.org/10.1002/psp4.12063) will be demonstrated.
Business Intelligence with Power BI
As data collection has increased exponentially, so has the need for people skilled at using and interacting with data. Some of the skills needed to interact with data include being able to communicate with data visualizations. Power BI is quickly gaining popularity among professionals in data science as a cloud-based service that helps them easily visualize and share insights from their organizations’ data.
This course is designed to give an overview of the fundamentals of Business Intelligence and working with data visualizations. This course starts with the basics and assumes no prior knowledge or skills in Power BI or data analysis. In addition to setting this foundation, this workshop will also discuss methods to import and transform data, explore data, create dashboards, and use custom visualizations.
Requirements: Power BI Desktop installation (instructions)
Analytical functions in SQL can take you beyond traditional approaches to querying data for business intelligence. They make complex post-query processing simpler with clearer and more concise code. Other advantages of analytical SQL functions include improved query speed, more manageable code, and a reduced learning effort. A basic understanding and hands-on knowledge of SQL queries is a prerequisite to learning SQL analytical functions.
Some of the most widely used analytical functions in SQL include the following:
Apart from learning the definition of the above functions and where these can be used, it is equally important to use them in practice to understand their application.
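As a taste of that practice, the snippet below runs two commonly taught analytic functions, RANK() and LAG(), through SQLite's window-function support from Python (requires SQLite 3.25 or newer). The table and column names are invented for the example and are not from the workshop materials.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("east", 100), ("east", 300), ("west", 200), ("west", 200)])

# RANK() orders rows within each region partition; LAG() peeks at the
# previous row of the partition (NULL for the first row).
rows = con.execute("""
    SELECT region, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk,
           LAG(amount) OVER (PARTITION BY region ORDER BY amount) AS prev
    FROM sales
    ORDER BY region, amount
""").fetchall()
for row in rows:
    print(row)
```

Within each region, RANK() gives the largest amount rank 1 and lets ties share a rank, while LAG() returns NULL for the first row of each partition; computing either result without window functions would require self-joins or correlated subqueries.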
Prerequisite: Knowledge of basic SQL including joins, subqueries
Requirements:
Join us for TED-Style Talks focused on Innovation in Data Science Across Industry. There will be four twenty-minute talks by exemplary industry leaders, followed by a one-hour Q&A session, ending with a reception in the Bansal Atrium of Davis Hall.
Introduction to Blockchain and Decentralized Application Development
This workshop will introduce the emerging technology of blockchain and decentralized application development. It will consist of two conceptual components and one hands-on project, then conclude with Q&A on application development.
Bioconductor for Everyone: Exploring, Analyzing and Visualizing Large Data Sets in the R Programming Language
We'll take a fast-paced tour through R and the software project I work on, Bioconductor (https://bioconductor.org), learning how to explore and visualize large cancer-related data sets. We'll work through two particular analyses. In the process, we'll learn some pretty significant new R skills. For instance, we will learn about formal classes for representing complex data, strategies for iteration and parallel processing, and accessing 'remote' resources through web-based interfaces. This workshop should be interesting to people who know a bit of R and want to learn more!
Requirements: Students should come with a laptop able to access the UB internet; we'll work through examples using RStudio running in the cloud and accessible through your browser.
Curious Case of Wind Turbines: Life, Death and Life Extension
(Author: Vijayant Kumar, PhD; Vice President – Data Science & Engineering, Sentient Science)
Wind turbines are a prudent solution to mitigate climate change and improve global low-cost energy availability; however, increased component failures (aka poor reliability) in aging wind turbines adversely affect that goal.
For the betterment of us and our children's future, it makes sense for wind turbines to fail less and last longer. We first help the audience understand why turbines fail and why it is so hard to predict why and when they will fail.
This presentation will take you on a journey through how wind turbines relate to climate change, why it is a big deal when wind turbines fail prematurely, why it is so hard to predict why and when they fail, and what it takes to work on a multi-faceted problem like wind turbine life extension. We will show, with humorous effect, how bad things happen when the fledgling yet boastful "industrial AI" touches the physical world of wind turbines, and why it needs help from its older cousin "science" to make an impact. And last but not least, why, despite all these complex modeling techniques and data, we still need a human to make an impact whenever AI touches the physical world!
And the end result after all this: a platform that combines materials science and data science to optimize wind turbine reliability and enable life extension; a platform currently deployed on 20,000 (and growing) wind turbines globally!
Using Social Data Science to Understand What Really Drives Healthcare Outcomes
At Socially Determined, we are fusing clinical and claims data with outside contextual data to build holistic models of people, the communities they live in, and their interactions with the healthcare system. Our analytics go well beyond gathering data about people, as we seek to understand the most important questions in healthcare:
UB Symposium on Job and Career Perspectives in the Computational Sciences
The symposium will address the questions of students who work (or plan to work) in the computational research groups at UB regarding their career prospects once they conclude their university education. Every year, five speakers are invited to this full-day event, and their combined experience has so far covered jobs at BASF, BigDataBio, Bosch, Cryos Net Consultancy, Dow Chemical, Eastman Kodak, HGST/Western Digital, IBM, Kitware, NASA, Nexight Group, NIH, NIST, NREL, Pfizer, PNNL, Q-Chem, Schlumberger/OneSubsea, and Wiley. Speakers have typically conducted computational research during graduate school (including at UB) and then transitioned into employment outside the university domain. They share their experience of finding jobs coming from a computational background, their move from academia to their new environment, and their thoughts on the computational R&D landscape in the industry. They report on the job situation in their chosen profession, give insights into the nature of their work, and talk about opportunities with their specific employers. Each presentation is followed by a Q&A session and we hold a joint panel discussion with all speakers at the end of the symposium. Students have the chance for personal conversations with the speakers during a light lunch.