Hierarchical and Matrix Structures in a Large Organizational Email Network: Visualization and Modeling Approaches

This paper presents findings from a study of the email network of a large scientific research organization, focusing on methods for visualizing and modeling organizational hierarchies within large, complex network datasets. In the first part of the p…

Authors: Benjamin H. Sims, Nikolai Sinitsyn, Stephan J. Eidenbenz

Affiliation: Los Alamos National Laboratory, Los Alamos, New Mexico 87545

Abstract

This paper presents findings from a study of the email network of a large scientific research organization, focusing on methods for visualizing and modeling organizational hierarchies within large, complex network datasets. In the first part of the paper, we find that visualization and interpretation of complex organizational network data is facilitated by integration of network data with information on formal organizational divisions and levels. By aggregating and visualizing email traffic between organizational units at various levels, we derive several insights into how large subdivisions of the organization interact with each other and with outside organizations. Our analysis shows that line and program management interactions in this organization systematically deviate from the idealized pattern of interaction prescribed by “matrix management.” In the second part of the paper, we propose a power law model for predicting degree distribution of organizational email traffic based on hierarchical relationships between managers and employees. This model considers the influence of global email announcements sent from managers to all employees under their supervision, and the role support staff play in generating email traffic, acting as agents for managers. We also analyze patterns in email traffic volume over the course of a work week.

1 Introduction

In this paper, we present results of our analysis of a large organizational email dataset, comprising nearly complete email traffic records for Los Alamos National Laboratory (LANL) over a period of several months. (Footnote 1: This document is an extended version of [1] as submitted to Lecture Notes in Social Networks (Springer).) Very few organizational communication networks of this scale have been analyzed in the literature. Analyzing such large email datasets from complex organizations poses a number of challenges. First, considerable work is required to parse large quantities of raw data from network logs and convert it into a format suitable for network analysis and visualization. Second, a great deal of care is required to analyze and visualize network data in a way that makes sense of complex formal organizational structures: in our case, 456 organizational units that are connected through diverse organizational hierarchies and management chains. Finally, it can be difficult to sort out the effects of email traffic generated by mass announcements and communications along management chains from the more chaotic, less hierarchical traffic generated by everyday interactions among colleagues.

This paper addresses these complexities in two ways. First, we demonstrate methods for understanding large-scale structural relationships between organizational units by using carefully thought-out visualization strategies and basic graph statistics. Second, we propose a power law model for predicting the degree distribution of email traffic for nodes of large degree that engage in mass emails along hierarchical lines of communication. This likely characterizes a significant portion of email traffic from managers (and their agents) to employees under their supervision.

2 Analysis of Organizational Structure

While many analysts have examined ways of extracting structural features from corporate email exchange networks, they have typically focused at the level of email exchanges between individuals (albeit sometimes large numbers of individuals), bringing little or no information about formal organizational structures into their analysis [2, 3, 4].
Aggregating relationships based on formal organizational structures offers another important level of insight, which can be particularly useful for managers and analysts interested in interactions among business units, capabilities, or functions rather than individuals. Automatically collected email data has significant advantages for capturing interactions among organizational units: although email does not capture all relevant interactions, it provides comprehensive coverage across the entire organization without the overhead involved in large-scale survey-based studies. In order to locate individuals within organizational structures, we used organizational telephone directory data to associate email addresses with low-level organizational units, and information from organization charts to generate mappings of these units to higher-level ones.

2.1 Structural relationships between elements of the organization

Our analysis of structural relationships within LANL focuses on two broad, cross-cutting distinctions: program vs. line organizations, and technical research and development functions vs. operations functions (safety, physical plant, etc.).

LANL is a hybrid matrix management organization. In a fully matrixed organization, each employee has two managers: a line manager and a program or project manager (Fig. 1A). The employee is assigned to a line management unit based on their skill set and capabilities. For example, a computer scientist might be assigned to a Computational Modeling group, or an engineer to a Structural Engineering group. Line management plays little or no role in guiding the day-to-day work of employees, however. Instead, the employee is assigned to work on one or more projects, each of which is supervised by a program or project manager. A project is generally directed toward a specific product or deliverable, such as design of a particular model of aircraft or completion of a particular research task.
The day-to-day work of the employee toward these particular goals is directed by the program or project manager. Both line and program managers usually report, through some management chain, to upper level general managers. The idealized communication pattern that results is one in which program and line managers communicate primarily vertically, interacting with both upper management and employees (Fig. 1B). In order to keep things running smoothly, however, program and line managers must also periodically communicate laterally, to ensure a good fit between capabilities and projects.

The matrix management model became popular in the aerospace industry with the rise of program management in the 1950s, and was in part influenced by the organizational structure of the Manhattan Project [5], in which Los Alamos played a major role. At LANL today, line and program organizations play less distinct roles. The base-level line units that house most employees are called groups, which may be built around programs or capabilities. In our analysis, we draw a distinction between groups and higher-level line management organizations, which aren't directly involved in technical or operations work. Program organizations play a variety of coordinating roles among groups, management, and outside organizations, and sometimes conduct technical or operations work as well. Despite this flexible definition, our analysis reveals that technical program organizations occupy a very well-defined structural space within the organization as a whole.

Fig. 2 shows email traffic between all organizational units at LANL over a period of 25 days, laid out using a force-vector algorithm. The units are colored according to the classification described above, and their sizes represent betweenness centrality. There are some visible patterns in this layout.
First, a number of operations groups have the highest centrality, probably because they provide services to most of the other organizational units at the laboratory. Ranking the nodes by betweenness centrality confirms this: 17 of the top 20 nodes are operations organizations. In addition, operations units and technical units occupy distinct portions of the graph; this indicates that there is generally more interaction within these categories than between them. The highly central operations groups appear to play a bridging role between the two categories. Administration units appear to be somewhat more closely associated with technical units than operations units, although this is difficult to state with certainty.

Figure 1: A) Schematic representation of a typical organizational chart for a fully matrixed organization. Each employee reports to one line and one program manager, and line and program managers independently report to upper management. B) The idealized communication pattern that results from A. Dotted line indicates less frequent communication. C) The actual communication pattern at LANL, revealed through analysis of email data. (UM = upper management, PM = program/project management, LM = line management, E = employee.)

Figure 2: Email traffic between organizational units at LANL, using a force-vector layout. Node size represents betweenness centrality. Edge color is a mix of the colors of the connected nodes. Although individual edges are difficult to discern at this scale, the overall color field reflects the type of units that are most connected in a given region.

Some of the ambiguities in interpretation can be clarified by grouping all units in a given category into a single node, resulting in the 7-node graph shown in Fig. 3.
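
The grouping of units into category-level nodes can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' actual pipeline: the two lookup tables (address to unit, unit to category), the example addresses, and the function names are hypothetical stand-ins for the telephone directory and organization chart data described in the text.

```python
from collections import Counter

# Hypothetical lookup tables with made-up values (not LANL data).
ADDRESS_TO_UNIT = {
    "alice@example.org": "unit-A",
    "bob@example.org": "unit-B",
    "carol@example.org": "unit-C",
}
UNIT_TO_CATEGORY = {
    "unit-A": "technical group",
    "unit-B": "technical program",
    "unit-C": "operations group",
}


def aggregate_by_category(emails, address_to_unit, unit_to_category):
    """Count directed email volume between unit categories.

    `emails` is an iterable of (sender, recipient) address pairs.
    Messages whose sender or recipient cannot be mapped are skipped.
    """
    volume = Counter()
    for sender, recipient in emails:
        src = unit_to_category.get(address_to_unit.get(sender))
        dst = unit_to_category.get(address_to_unit.get(recipient))
        if src is None or dst is None:
            continue
        volume[(src, dst)] += 1
    return volume


def total_degree(volume):
    """Total incoming plus outgoing email count for each category."""
    degree = Counter()
    for (src, dst), count in volume.items():
        degree[src] += count
        degree[dst] += count
    return degree
```

In a plot of this kind, the directed counts would set edge widths and the totals would set node diameters. How unmapped addresses should be handled is a choice the text does not specify; skipping them is only one option.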
This view, which uses a simple circular layout, reveals that there is a large amount of email traffic (in both directions) on the technical side of the organization along the path Administration - Management - Program - Group, and relatively little traffic between these entities along any other path. The operations side of the organization does not display this pattern, indicating that relationships between groups, programs, and management are more fluid there. This suggests that technical program organizations at LANL, rather than representing an independent chain of command (as in a true matrix organization), have instead evolved to play an intermediary role between technical groups and technical management. The structure of this relationship at LANL is depicted in Fig. 1C.

Figure 3: Email traffic between organization types at LANL. Node diameter represents total degree (i.e. total number of incoming and outgoing emails) of the node; edge width represents email volume in the direction indicated.

Another way of understanding the roles different types of organizational units play is in terms of their relationships with outside entities. Fig. 4 plots the number of emails each type of organization sends and receives to/from commercial vs. non-commercial domains. This indicates that all types of operational units communicate significantly more with commercial entities, which is probably driven by relationships with suppliers and contractors. Technical groups, technical management, and administration communicate about equally with commercial and non-commercial domains. The outlier here is technical programs, which are much more highly connected to non-commercial domains, particularly .gov addresses. This further expands on the role of technical programs, suggesting that they are a nexus for coordination of technical work both internally, among line management organizations, and externally, between LANL and outside funding agencies.
This is a potentially important finding, with implications for how program organizations should be supported and managed.

Figure 4: Total emails to/from commercial (.com, .net, .info) vs. non-commercial (.gov, .edu, .mil, etc.) domains, by organization type.

2.2 Structural relationships within organizational units

Email network maps can also be used to visualize relations among members of an organizational unit. Figures 5 and 6 show email networks that were obtained from email exchange records among the members of two LANL groups over a period of two weeks. We intentionally chose groups that do similar work (theoretical research). In the smaller group in Fig. 5, the two nodes with highest betweenness centrality are group managers, and the third is technical support staff. Thus, the group has a relatively unified hierarchical structure with management and support staff at the center. In the larger group, managers were still among the most central nodes, but many other nodes had similar betweenness centrality (Fig. 6). These include administrative assistants, seminar organizers, and several project leaders. This indicates a flatter, less centralized organizational structure. Application of a community detection algorithm to this graph reveals two main communities. As it happens, this group was created recently by merging two previously existing groups, and the detected communities generally correspond to those groups.

Figure 5: Email network for 2 week period in smaller group. Size of a node is proportional to the logarithm of its betweenness centrality. Nodes with different colors correspond to different communities that were identified by application of the Girvan-Newman algorithm to the group's email network [6, 7]. Link widths are proportional to the logarithm of the number of emails exchanged along these links.
The network was visualized by assigning repulsion forces among nodes and spring constants proportional to the link weights, and then finding an equilibrium state.

Figure 6: Email network for 2 week period in larger group.

3 Node connectivity distribution as a function of organizational hierarchy

Several network types, including biological metabolic networks [8], the World Wide Web, and actor networks [9], are conjectured to have power law distributions of node connectivity. In the case of metabolic networks, the interpretation of scale free behavior is complicated by the lack of complete knowledge and relatively small sizes ($\sim 10^3$ nodes) of such networks, while the mechanisms of self-similarity in many large social networks are still the subject of debate. However, organizational hierarchy has been shown to generate degree distributions for contacts between individuals that follow power laws [10].

Managers prefer to use email to communicate with subordinates in many different communication contexts [11]. We propose that node connectivity patterns in the email networks of large, formal organizations are driven, in part, by management hierarchy and specific patterns of email use by managers, in particular the mass broadcast of email announcements. Based on this observation, we develop a scale-free behavioral model that takes into account features specific to email communications in organizations. In this model, the self-similarity of the connectivity distribution of the email network is a consequence of the static self-similarity of the management structure, rather than resulting from a dynamic process, such as preferential attachment [12] or optimization strategies [13].
More specifically, self-similarity is due to the ability of a manager to continuously and directly communicate only with a relatively small number of people, while communications with other employees have to be conveyed in the form of broad announcements.

Suppose that the top manager in an organization sends emails to all employees from time to time. This manager must correspond to the node in the email network that has the highest connectivity $N$. Suppose that the top manager also talks directly (in person) to $l$ managers that are only one step lower in the director's hierarchy (let's call them 1st level managers). Each of those 1st level managers, presumably, controls their own subdivision in the organization. Assuming roughly equal spans of managerial control, we can expect that, typically, one 1st level manager sends emails to $N/l$ people. In reality, each manager also has a support team, such as assistants, administrators, technicians, etc., who also may send announcements to the whole subdivision. Let us introduce a coefficient $a$ which says how many support team employees are involved in sending global email announcements in the division on the same scale as their manager. We can then conclude that at the 1st level from the top there are $al$ persons who send emails to $N/l$ employees at a lower level.

Each 1st level manager controls $l$ 2nd level ones and we can iterate our arguments, leading to the conclusion that there should be $(al)^2$ managers on the 2nd level who should be connected to $N/l^2$ people in their corresponding subdivisions. Continuing these arguments to the lower levels of the hierarchy, we find that, at a given level $x$, there should be $(al)^x$ managers (or their proxies) who write email announcements to $N/l^x$ people in their subdivision.

Consider a plot that shows the number of nodes $n$ vs. the weight of those nodes, i.e. their outdegree $w$.
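
The level-by-level counting argument can be tabulated directly. The sketch below is illustrative only: the function names are hypothetical, and the parameter values used in the usage note (N = 10000, l = 5, a = 2) are made-up numbers chosen for readability, not estimates for LANL.

```python
import math


def hierarchy_levels(N, l, a, max_level):
    """Return (level, senders, audience) for levels 0..max_level.

    senders  = (a*l)**x  managers or proxies sending announcements at level x
    audience = N / l**x  recipients reached by each of those senders
    """
    return [(x, (a * l) ** x, N / l ** x) for x in range(max_level + 1)]


def loglog_slopes(levels):
    """Slope of log(senders) against log(audience) between adjacent levels."""
    slopes = []
    for (_, n0, w0), (_, n1, w1) in zip(levels, levels[1:]):
        rise = math.log(n1) - math.log(n0)
        run = math.log(w1) - math.log(w0)
        slopes.append(rise / run)
    return slopes


if __name__ == "__main__":
    levels = hierarchy_levels(N=10000, l=5, a=2, max_level=3)
    for level, senders, audience in levels:
        print(level, senders, audience)
    print(loglog_slopes(levels))
```

For these values the successive slopes are all equal to $-\log(al)/\log(l)$, so the (audience, senders) points fall on a straight line in log-log coordinates, which is what one would expect of a power law.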
Considering the previous arguments, we find that the weight $w = N/l^x$ should correspond to $n = (al)^x$ nodes. Eliminating the variable $x$ (since $x = \log(N/w)/\log(l)$ and $\log(n) = x \log(al)$), we find

$$\log(n) = \frac{\log(al)}{\log(l)} \left( \log(N) - \log(w) \right), \qquad (1)$$

where log is the natural logarithm.

Eq. (1) shows that the distribution of connectivity, $n(w)$, in a hierarchical organizational email network should generally be a power law with exponent $\log(al)/\log(l) > 1$. Obviously, at some level $x$, this hierarchy should terminate around the point at which $(al)^x = N/l^x$, because the number of managers should not normally exceed the number of employees. Hence the power law (1) is expected to hold only for nodes with heavy weights, e.g. $w > 50$, i.e. for nodes that send announcement-like one-to-many communications, and at lower $w$ this model predicts a transition to some different pattern of degree distribution. At this level, it is likely that non-hierarchical communication patterns begin to dominate in any case.

In order to compare this model to actual network data, we analyzed the statistics of node connectivity in email records at LANL during a two-week time interval (Fig. 7). We removed nodes not in the domain lanl.gov and cleaned the database of various automatically generated messages, such as bounced emails that do not find their target domain. However, we kept domains that do not correspond to specific employees, such as emails sent from software support services. Our remaining network consisted of $N \approx 32000$ nodes, which is still about three times the number of employees at LANL. This is partially attributed to the fact that we did not exclude domains that are not attached to specific people, and also the fact that a significant fraction of employees have more than one email address for various practical reasons.
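
An out-degree distribution $n(w)$ of the kind analyzed here can be tabulated from sender-recipient records along the following lines. This is a minimal sketch under assumptions that the text does not settle: the function name is hypothetical, out-degree is taken to mean the number of distinct recipients per sender, and a record is kept only when both addresses end in the given domain.

```python
from collections import Counter, defaultdict


def outdegree_distribution(records, domain="lanl.gov"):
    """Return {w: n}, the number of nodes n with exactly w outgoing links.

    `records` is an iterable of (sender, recipient) address pairs.
    Assumption: a link is a distinct (sender, recipient) pair, and both
    addresses must end in "@<domain>" for the record to be kept.
    Nodes that never send mail (w = 0) do not appear in the result.
    """
    suffix = "@" + domain
    recipients = defaultdict(set)
    for sender, recipient in records:
        sender, recipient = sender.lower(), recipient.lower()
        if sender.endswith(suffix) and recipient.endswith(suffix):
            recipients[sender].add(recipient)
    return Counter(len(r) for r in recipients.values())
```

Removing automatically generated messages such as bounces would need an additional filter whose details depend on the log format, which is not described in the text.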
Numerical analysis, in principle, should allow us to obtain information about parameters $l$, $x$ and $a$, from which one can make some very coarse-grained conclusions about the structure of the organization. Such an analysis should, of course, always be applied with a certain degree of skepticism due to potential issues with data quality, the simplicity of the model, and logarithmic dependence of the power law on some of these parameters [14]. We found that our data for $w > 40$ could be well fitted by $\log(n) \approx 14.0 - 2.47 \log(w)$ (Fig. 8). If, e.g., we assume $l = 4$, then $a \approx 7$, i.e. each manager typically has the support of $a - 1 = 6$ people, who help her post various announcements to her domain of control. The power law should terminate at the level of hierarchy $x$ given by $(al)^x = N/l^x$, which corresponds to $x \approx 3$, i.e. the email network data suggest that there are typically $x = 3$ managers of different ranks between the working employee and the top manager of the organization. The typical number of email domains to which the lowest rank manager sends announcements is $w_{\min} \approx N/l^x \approx 48$. This should also be the degree of the nodes at which the power law (1) should no longer be justified. Indeed, we find the breakdown of the power law (1) at $w < 40$. This estimate also predicts that a typical working employee receives emails from $(x + 1)a = 28$ managers or their support teams.

Figure 7: Log-log plot of the distribution of the number of nodes $n$ having the number of outgoing links $w$.

Figure 8: Zoom of Figure 7 for $w > 40$. Red line is a linear fit corresponding to $\log(n) \approx 14.0 - 2.47 \log(w)$.

Comparing these results to the actual structure of the organization is very difficult due to the large excess of email addresses over the number of actual employees, and the lack of empirical data on many of the model parameters.
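
As a rough illustration of how a fitted line of this kind relates to the model parameters, the sketch below performs an ordinary least-squares fit in log-log coordinates and then inverts the exponent $\log(al)/\log(l)$ for $a$, given an assumed $l$. It is a sketch under stated assumptions rather than the authors' fitting code: the function names are hypothetical, and, as [14] cautions, least squares on log-log data is a crude way to estimate power law exponents.

```python
import math


def fit_loglog(ws, ns):
    """Least-squares fit of log(n) = intercept - exponent * log(w).

    Returns (intercept, exponent) using natural logarithms.
    """
    xs = [math.log(w) for w in ws]
    ys = [math.log(n) for n in ns]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, -slope


def support_coefficient(exponent, l):
    """Solve exponent = log(a*l) / log(l) for a, i.e. a = l**(exponent - 1)."""
    return l ** (exponent - 1)
```

With the fitted exponent 2.47 and the assumed $l = 4$, `support_coefficient` returns a value between 7 and 8, which is of the same order as the $a \approx 7$ quoted in the text.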
Keeping in mind the difficulties of such a comparison, the estimated model parameters seem to be generally consistent with the actual organizational structure. In reality, LANL has 5 possible layers of line management between an employee and the laboratory director, but this is complicated by the facts that the lowest layer is often not used, and some employees work for organizations that report directly to a higher-level manager. So the estimate of $x \approx 3$ given above might be consistent with the actual organization structure. The average group size at LANL is difficult to determine quantitatively from available data, but appears to be generally in the 20-40 person range, which is somewhat lower than the number of domains (48) to which the lowest-level manager sends emails based on model estimates. Again, although these results might suggest possible conclusions about the accuracy of the model, we do not currently have data of sufficient quality to make a rigorous comparison between model estimates and real-world organizational structure in this case.

4 Email traffic in real time

Fig. 9 shows total email traffic and number of addresses sending email over one week with a one minute resolution. Working days have a bi-modal distribution with heaviest activity at the beginning and end of the day. The lower level of activity on Friday is related to an alternative work schedule that most LANL employees follow. This schedule enables employees to take every other Friday off in exchange for working longer hours Monday-Thursday. As a consequence, only slightly more than 50% of the workforce is at work on a given Friday. This is directly reflected in the amount of email traffic on Fridays.

Figure 9: The number of emails sent per minute (top) and number of addresses sending email per minute (bottom) over a one week time interval.

5 Conclusion

Visualizing and modeling email traffic in complex organizations remains a challenging problem.
Visualizing email data in terms of formal organizational units reduces complexity and provides results that are more intelligible to organization members and analysts interested in understanding organizational structure at a macro level. For predicting the degree distribution of high-degree nodes in an organization, we find that it is useful to take into account both organizational hierarchy and email-specific behavior (in particular, the use of mass emails within line management chains). These findings suggest that considering information about formal organizational structures alongside email network data can provide significant new insights into the functioning of large, complex organizations.

References

[1] B. H. Sims, N. Sinitsyn, and S. J. Eidenbenz, “Visualization and modeling of structural features of a large organizational email network,” in Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. New York: ACM, 2013, pp. 787-791.

[2] J. Diesner, T. L. Frantz, and K. M. Carley, “Communication networks from the Enron email corpus ‘It's always about the people. Enron is no different.’” Computational and Mathematical Organization Theory, vol. 11, no. 3, pp. 201-228, Oct. 2005.

[3] A. Chapanond, M. S. Krishnamoorthy, and B. Yener, “Graph theoretic and spectral analysis of Enron email data,” Computational and Mathematical Organization Theory, vol. 11, no. 3, pp. 265-281, Oct. 2005.

[4] T. Karagiannis and M. Vojnovic. (2008, May). Email Information Flow in Large-Scale Enterprises [Online]. Available: http://research.microsoft.com/pubs/70586/tr-2008-76.pdf.

[5] G. E. Bugos, “Programming the American aerospace industry, 1954-1964: The business structures of technical transactions,” Business and Economic History, vol. 22, no. 1, pp. 210-222, Fall 1993.

[6] D. L. Hansen, B. Shneiderman, and M. A. Smith, Analyzing Social Media Networks with NodeXL: Insights from a Connected World. Burlington, MA: Elsevier, 2011.

[7] M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” PNAS, vol. 99, no. 12, pp. 7821-7826, Jun. 2002.

[8] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A.-L. Barabasi, “Hierarchical organization of modularity in metabolic networks,” Science, vol. 297, no. 5586, pp. 1551-1555, Aug. 2002.

[9] E. Ravasz and A.-L. Barabasi, “Hierarchical organization in complex networks,” Phys. Rev. E, vol. 67, no. 2, 026112, Feb. 2003.

[10] A.-L. Barabasi, E. Ravasz, and T. Vicsek, “Deterministic scale-free networks,” Physica A, vol. 299, no. 3-4, pp. 559-564, Oct. 2001.

[11] M. L. Markus, “Electronic mail as the medium of managerial choice,” Organization Science, vol. 5, no. 4, pp. 502-527.

[12] M. Mitzenmacher, “A brief history of generative models for power-law and lognormal distributions,” Internet Mathematics, vol. 1, p. 226, 2004.

[13] F. Papadopoulos, M. Kitsak, M. A. Serrano, M. Boguna, and D. Krioukov, “Popularity versus similarity in growing networks,” Nature, vol. 489, p. 537, 2012.

[14] A. Clauset, C. R. Shalizi, and M. E. J. Newman, “Power-law distributions in empirical data,” SIAM Review, vol. 51, no. 4, pp. 661-703, 2009.
