In recent years, a growing number of data sources from both the public and private sectors have become available for academic research. Management research has also seen a rise in the use of machine learning, natural language processing, and other cutting-edge methods. Consequently, the field needs a shared understanding of how to use big data and data analysis methods in novel and more rigorous ways, especially in studies of causal inference. However, in the big data era, it is not easy to construct the appropriate sample (Marx & Hsu 2021; Arora, Belenzon & Sheer 2021; Furman & Teodoridis 2020), select the proper models (Starr & Goldfarb 2020; King, Goldfarb & Simcoe 2021), or use machine learning toolkits to generate research insights (Choudhury, Wang, Carlson & Khanna 2019; Choudhury, Allen & Endres 2021). Therefore, this symposium brings together a group of prominent scholars who are pushing the frontiers of quantitative analysis in the fields of strategy, innovation, and entrepreneurship. We will discuss methodological novelty and rigor in two back-to-back panels. The first panel will discuss the opportunities in applying machine learning methods, and the second will cover rigor in applying conventional methods.
During the first session, we will discuss the use of machine learning methods to leverage big data, create variables, and identify patterns in strategy, entrepreneurship and innovation research. Specifically, Professor David Hsu will share his insights on constructing datasets in entrepreneurship and innovation to train models and establish the "ground truth". Professor Prithwiraj Choudhury will discuss how to use machine learning and textual methods to measure the novelty of work products. Professor Florenta Teodoridis will discuss how topic modeling can shed light on innovation and strategy research by mapping the knowledge landscape as captured in relevant text documents. Overall, the panel discussions will be organized around the novel use of machine learning methods to advance the frontiers of management research.
In the second session, we will discuss novel processes and practices that scholars can employ to ensure the robustness and rigor of their results. We begin with the importance of data sources as the first step in the quantitative analysis process. Professor Matt Marx will discuss the role of public data sources in promoting transparency and reproducibility of results, as well as the challenges young scholars may face in using these datasets. Following the discussion of data sources, the panel turns to identification strategies, the assumptions underlying our analyses, and recent advancements in assessing results when these assumptions are violated. We will also hear from Professor Goldfarb on how historical methods complement the statistical analysis of archival data to generate explanations closer to the truth.
Panelists. We are honored to have experts who use machine learning in their research as our panelists in Session 1: Opportunities in applying machine learning methods. We are equally honored to have experts with rich experience in identifying, collecting, and processing large-scale datasets share their views and methodologies in Session 2. The panelists are listed below in alphabetical order.
Summary of Discussion (in order of presentation)
David Hsu
The Wharton School of Business
University of Pennsylvania
Professor Hsu will discuss how to use datasets in entrepreneurship and innovation (E&I) to train models and establish the "ground truth". Establishing the "ground truth" is difficult in any area, and it is particularly tricky in E&I research. To illustrate one solution, Professor Hsu will present his work on valuing patents from their text using deep learning models. Relatedly, he will discuss how to move from very big data to smaller (yet still statistically meaningful) data in order to better understand and exploit the institutional features of the context. He will illustrate this approach with a dataset covering the early-stage technical and managerial labor market (AngelList Talent data on job postings and applicants), and then show how focusing on one sector (cryptocurrency) and the changes in that sector can improve our understanding of labor market dynamics.
Prithwiraj (Raj) Choudhury
Harvard Business School
Professor Choudhury will discuss the use of machine learning and textual methods to study how patterns of hybrid work correlate with the novelty of the work products that workers generate. In particular, he and his coauthors use the MD5 hashing method and the Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) method to code the novelty of email attachments sent by workers in a randomized controlled trial of hybrid work.
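The exact-duplicate part of such a pipeline can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that MD5 digests are used to detect attachments that are byte-for-byte copies of earlier ones, and it leaves out the BIRCH step, which would presumably group near-duplicate documents. The function names and sample attachments are hypothetical.

```python
import hashlib


def md5_digest(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of an attachment's raw bytes."""
    return hashlib.md5(data).hexdigest()


def flag_novel(attachments):
    """Mark each attachment as novel if its digest has not been seen before.

    `attachments` is a list of (name, content_bytes) pairs in chronological
    order. Returns a list of (name, is_novel) pairs.
    """
    seen = set()
    flags = []
    for name, content in attachments:
        digest = md5_digest(content)
        flags.append((name, digest not in seen))
        seen.add(digest)
    return flags


# Hypothetical attachments: the second file is an exact copy of the first.
sample = [
    ("report_v1.txt", b"Quarterly plan, draft one."),
    ("report_copy.txt", b"Quarterly plan, draft one."),
    ("report_v2.txt", b"Quarterly plan, revised draft."),
]
novelty_flags = flag_novel(sample)
```

An exact hash only catches identical files, since changing a single byte yields a different digest. That limitation is why a clustering step for near-duplicates is a natural complement.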
Florenta Teodoridis
Marshall School of Business
University of Southern California
Professor Teodoridis will discuss how to capture the far-reaching benefits of machine learning (ML) in innovation research, given that ML algorithms are focused on prediction rather than inference. For example, topic modeling can be employed not only to identify similarities but also to map the knowledge landscape as captured in relevant text documents. Mapping the knowledge space enables developing measures of distance and movement, attributes that would benefit research on the role of heterogeneous firm knowledge in achieving competitive advantage, the impact of knowledge distance on the ex-ante probability of alliances or acquisitions, the innovation trajectories of competitor firms, and the heterogeneous innovation responses to competitive events.
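To make the idea of distance and movement concrete, the sketch below assumes that a topic model has already been fitted and that each firm-year is summarized by a vector of topic proportions. The firm names, years, and proportions are invented for illustration, and the Hellinger distance is only one of several reasonable choices of metric; nothing here is taken from Professor Teodoridis's own work.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two topic distributions (0 = same, 1 = disjoint)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * squared)


# Hypothetical topic proportions per firm and year (each vector sums to 1).
topic_shares = {
    "firm_a": {2018: [0.70, 0.20, 0.10], 2021: [0.30, 0.30, 0.40]},
    "firm_b": {2018: [0.10, 0.20, 0.70], 2021: [0.10, 0.30, 0.60]},
}

# Movement: how far each firm's knowledge position shifted between the two years.
movement = {
    firm: hellinger(years[2018], years[2021])
    for firm, years in topic_shares.items()
}

# Distance: how far apart the two firms are in each year.
distance = {
    year: hellinger(topic_shares["firm_a"][year], topic_shares["firm_b"][year])
    for year in (2018, 2021)
}
```

In this toy data, firm_a moves further than firm_b between the two years, and the two firms end up closer together in 2021 than they were in 2018. Measures of this kind could then enter a regression as explanatory or outcome variables.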
Brent Goldfarb
Robert H. Smith School of Business
University of Maryland
In this symposium, Professor Goldfarb will discuss how historical methods (the analysis and representation of the past through the interpretation of records) complement the statistical analysis of archival data to generate explanations closer to the truth. Historical methods achieve this goal by providing the understanding needed to balance an explanation's consistency with statistical patterns against its consistency with contextual detail. Further, he will suggest that to infer the best explanation from archival data, as scholars in strategy and entrepreneurship predominantly do, researchers need to reconcile incommensurable and sometimes contradictory explanatory virtues. This process inevitably requires an act of judgment about non-quantifiable information about the past. Historical methods facilitate this judgment by giving strategy scholars a pathway to assess the relative merits of competing virtues.
Matt Marx
Johnson College of Business
Cornell University
Scholars can promote transparency, reproducibility, and cumulative research by a broader set of participants when they employ open datasets rather than proprietary ones. In this talk, Professor Marx will survey key open datasets in the areas of academic science, innovation, and entrepreneurship, and the linkages among them. These include the Microsoft Academic Graph (and its successor, OpenAlex) versus proprietary alternatives for data on academic publications; paid versus free sources of full-text patent data; Reliance on Science linkages from patents to scientific articles; mappings of patent assignees to public firms (DISCERN vs. KPSS vs. UVA) and to startups; and an upcoming public database of patent-paper pairs.
In this panel, Professor Starr will discuss recent advancements in assessing the plausibility of estimated effects when specific identifying assumptions are violated. For instance, Cinelli and Hazlett (2020) derive a variety of novel techniques to assess the extent to which selection on unobservables might drive an effect of interest. Conley et al. (2012) derive tools to understand how sensitive IV estimates are to violations of the exclusion restriction, while Rambachan and Roth (2020) consider the sensitivity of difference-in-differences estimates to violations of the parallel trends assumption. Although some of this work has been in the literature for a while, it is not used in strategy research with any regularity. The discussion is intended to encourage scholars to think carefully about the identifying assumptions they rely on in their analyses, and about how robust their results are to violations of those assumptions.
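As one concrete example of this kind of sensitivity analysis, the sketch below computes the partial R² of a treatment with the outcome and the robustness value described by Cinelli and Hazlett (2020), using only a reported t-statistic and the residual degrees of freedom. The formulas reflect our reading of that paper, and the input numbers are hypothetical; for applied work, the authors' own software should be preferred over this illustration.

```python
import math


def partial_r2(t_stat: float, dof: int) -> float:
    """Partial R^2 of the treatment with the outcome, from its t-statistic."""
    return t_stat ** 2 / (t_stat ** 2 + dof)


def robustness_value(t_stat: float, dof: int, q: float = 1.0) -> float:
    """Robustness value RV_q.

    Share of the residual variance of both treatment and outcome that an
    unobserved confounder would need to explain in order to reduce the
    point estimate by a fraction q (q = 1 means all the way to zero).
    """
    f_q = q * abs(t_stat) / math.sqrt(dof)  # scaled partial Cohen's f
    return 0.5 * (math.sqrt(f_q ** 4 + 4 * f_q ** 2) - f_q ** 2)


# Hypothetical regression output: t = 4.18 with 783 residual degrees of freedom.
r2 = partial_r2(4.18, 783)
rv = robustness_value(4.18, 783)
print(f"partial R^2 = {r2:.3f}, robustness value = {rv:.3f}")
```

With these inputs the robustness value is roughly 0.14: an omitted confounder would need to explain about 14 percent of the residual variance of both the treatment and the outcome to fully account for the estimated effect. Whether a confounder of that strength is plausible is a judgment that depends on the research context.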
Arora, A., Belenzon, S., & Sheer, L. 2021. Matching patents to Compustat firms, 1980–2015: Dynamic reassignment, name changes, and ownership structures. Research Policy, 50(5): 104217.
Choudhury, P., Allen, R. T., & Endres, M. G. 2021. Machine learning for pattern discovery in management research. Strategic Management Journal, 42(1): 30–57.
Choudhury, P., Wang, D., Carlson, N. A., & Khanna, T. 2019. Machine learning approaches to facial and text analysis: Discovering CEO oral communication styles. Strategic Management Journal, 40(11): 1705–1732.
Furman, J. L., & Teodoridis, F. 2020. Machine learning could improve innovation policy. Nature Machine Intelligence, 2(2): 84.
King, A., Goldfarb, B., & Simcoe, T. 2021. Learning from testimony on quantitative research in management. Academy of Management Review, 46(3): 465–488.
Marx, M., & Hsu, D. H. 2021. Revisiting the entrepreneurial commercialization of academic science: Evidence from "twin" discoveries. Management Science.
Starr, E., & Goldfarb, B. 2020. Binned scatterplots: A simple tool to make research easier and better. Strategic Management Journal, 41(12): 2261–2274.