Are We Related? The Cradle of Industries - A Short Summary of the Data and Findings

Introduction

Why are industries located where they are? How did these industries develop? What effect did a developing industry have on companies that were already present in the region? Why did some companies choose to settle in a region while others chose to exit it? And what part did, and do, the people who work in these industries play in the macro-economic process of industry evolution?

Industries consist of companies, companies provide work to employees, and these employees are hired based on skills suitable for the occupation. Viewed top-down, from the macro to the micro level, this allows us to analyze existing industries, their companies and the employees of these companies in search of commonalities and discrepancies that may explain how industries evolve. By discovering these commonalities and discrepancies, key ingredients of industry evolution may be found. With knowledge of such key ingredients, effective policy measures can then be designed across regions and countries to initiate, support and propel the development of industry to the next level. Scholars seek such knowledge to advance knowledge itself, while policymakers could use it to advance the livelihood of an entire country. Given the potential impact of understanding industry evolution, a wide array of research covers or touches upon this subject. In this paper we focus on three concepts that touch upon industry evolution, and on how relatedness plays a central role in all three: path dependence, related diversification and agglomeration externalities. Besides reviewing past research on these concepts, we elaborate on our choice of these three concepts and on how, by linking them together, a relatedness-based theoretical framework for industry evolution can be formed. In the section on theory and hypotheses we also present hypotheses derived from this theoretical framework, which are tested against U.S. Census data (1850-2000) in the section on data and methods.

Before we continue with our theoretical framework, we first provide a short overview of U.S. history to set the background for the focus of our study: the regional development of industries in the United States of America.

U.S. history overview
Social and economic events are shaped by, and gain meaning only when matched with, the history of a nation. In the following overview we briefly sketch the main historic events through which the United States became the most developed and industrialized nation in the world.
In the very beginning North America differed little from any other newly discovered land. The Spanish, French and English conquerors used the same brutal methods as anywhere else. Around 1600 a difference became distinct: the colonists and pioneers who were captivated by this land held ideals of freedom. Europe was overcrowded and offered little room in many respects: land, new economic activities, and freedom of religious or political thought. These ideals shaped the American people and made them more entrepreneurial than people elsewhere. The combination of economic oppression under the British system and the ideals of freedom led to the American Revolution, followed by the Declaration of Independence in 1776 and the creation of the confederation.
Around 1850 the United States could not reconcile important differences between the northern and the southern states in their approach to law, economics and African-American slavery. After Abraham Lincoln won the 1860 election, eleven Southern states seceded from the Union, triggering the American Civil War (which lasted until 1865). Its most important legacies are the ending of slavery and the restoration of the Union. The fact that the North won the war also had a huge impact on the subsequent development and growth of American industries. Successive technological advances such as the railroad, the telegraph, the telephone and the internal combustion engine also contributed to this development. The expansion of America and the connection of the frontier with the eastern states were made possible by industrial developments, while industries in turn grew because of the immense needs of expansion.
By the end of the 19th century, American industrial production and per capita income exceeded those of all other nations except Great Britain. Driven by the industrial revolution and the territorial expansion of the United States, an unprecedented wave of immigrants entered the country, providing labour and creating diverse communities in undeveloped areas. From 1880 to 1914, more than 22 million people migrated to the United States.
The Progressive Era followed from 1890 until 1918 and was characterized by corruption, greater federal regulation (for example anti-trust laws and the regulation of drugs), and further railroad construction. In this era the United States also began to be seen as an international power, with a substantial population, industrial growth and many military actions abroad. After winning World War I, the U.S. grew steadily in economic and military power. During the 1920s the U.S. experienced great economic prosperity, although farm prices and wages fell while new industries and industrial profits grew. The stock market became inflated, leading to the Crash of 1929 and the worldwide Great Depression. The United States experienced deflation, unemployment increased from 3% in 1929 to 25% in 1933, and manufacturing output decreased by one-third. The economy had fully recovered only by 1940.
The United States also joined World War II against Germany and Japan. Its participation was decisive for the Allied victory, and the United States emerged as one of the two dominant superpowers. Between 1955 and 1970, Americans migrated from farms into the cities and experienced a period of sustained economic expansion. The first stage of the Cold War with the Soviet Union (1945-1964) also gave rise to a rush of activity in many different fields such as physics, chemistry, mathematics, computing and aerospace. As a result many new industries were created like the Information
By contributing science and technology, the U.S. further strengthened its industrial development and economic success. During the 1990s the United States experienced its longest period of economic expansion and became the world superpower.

Theoretical framework and hypotheses
When we look at highly developed industries in any region or country, one observation stands out: there is a certain amount of cohesion between the industries in a region. We do not find, for example, revved-up automobile manufacturers in Hollywood or action-packed movie studios at Oil Creek, Pennsylvania, nor do we expect to find drilling rigs toiling for oil in Detroit. The industries and companies we do find in such regions of specialized fame display a degree of closeness in resources, activities and target market that is difficult to explain using classical economic theory. Through the lens of the classic theories of supply and demand and of competition, one would expect Adam Smith’s (1759) “invisible hand” to spread such similar companies and industries out across the country until equilibrium is reached, because a concentration of similar companies would drive up the prices of resources and drive down the prices of products. It would therefore make more sense for a company to locate where there is less demand for the same resources and less supply of the same products, until a Nash equilibrium (1950) is reached in which no company wants to relocate because relocation would increase the costs of its resources or decrease the price of its products. The resulting equilibrium should then consist of regions with more or less the same portfolio of industries and companies. Yet in reality we do see highly specialized clusters of industries and companies, and considering their success, the benefits of being where the action is seem to outweigh the costs of competing for the same resources and market.
In order to better understand the industry evolution that may have occurred from an agricultural United States to the highly successful and specialized industry clusters we find in some U.S. states today, we will first discuss why and how industries already present in a region may diversify into technologically related industries and how the agglomeration of such related industries can yield beneficial externalities. We will then address the concept of path dependence and how the self-organizing effect of related diversification and resulting positive feedbacks of agglomeration externalities can cause the emergence of a region with highly successful clusters consisting of similar and related industries (Arthur, 1987). Finally in this theoretical section and before we go on to the testing of the hypotheses formulated in this section, we will reconcile the concept of path dependence which is rooted in historical economics with the seemingly ahistorical economic train of thought that had been developed in the past, and is still persistent now in ahistorical economic modelling and theories (David, 2001).

Diversification and the development of an industry
An investor’s portfolio of assets shows many similarities with a region’s portfolio of assets. Granted, the level of analysis is different, but the dynamics behind the development of an investor’s asset portfolio and of a region’s industry portfolio are surprisingly similar. As it is probably easier for us, humans of flesh and blood, to relate to an investor, which we all can become (and perhaps already are), than to relate to an industry or even an entire region like a U.S. state, we invite you to look through the eyes of an investor in this section in order to illustrate the role diversification plays in the development of industries in a region.

Through the eyes of an investor
Stage 0: Like an investor without any assets, a region at the very beginning of its history is empty and undeveloped. A town could then be considered the region’s first asset.
Stage 1: An investor’s first asset would be one he deems profitable. And indeed, in the region’s first asset, the town, we will mostly find the development of businesses its inhabitants deem profitable, which would be businesses that address the basic needs of shelter, food and clothing.
Stage 2: When an investor succeeds in making his first profits he may decide to invest more in the same asset or invest in other assets he thinks profitable. In the same way we see a town develop through the enhancement of its current activities (diversifying in the kinds of shelter, food and clothing) and by attracting new kinds of profitable business (theatres for entertainment perhaps or dining establishments for example).
Stage 3: As an investor gains more experience and wealth he will find the benefits of keeping a well diversified portfolio for a steady return on investment. This too we see in reality, where we find towns with a well diversified portfolio of businesses that fulfil the more basic needs (small investors) and cities with an equally well diversified, but larger portfolio of businesses that fulfil more than the basic needs (big investors).
Stage 4: By dealing with different kinds of investment an investor may learn where he excels and where he can make the most profit. In the same manner a town, city or region could at some point “discover” where it excels and develop a local or region-specific specialty.

Up to this point we have illustrated how the development of an investor and his asset portfolio is analogous to the development of a town, city or region and its business and industry portfolio. You may now rightly wonder why such an analogy would be of importance to explaining the role diversification plays in regional industry evolution. The development up to stage 4 is rather basic and may provide only a few new insights into regional development. But if the analogy holds up to stage 4, it may also hold for stage 5, and that is where the interesting high-value (but also high-risk) developments take place. At stage 5 the analogy is between the knowledgeable and specialized investor who actively trades to get the highest returns and his macro-level counterpart: the specialized industry cluster. We will first examine stage 5 from an economic point of view and then formulate hypotheses to test whether related diversification has played a significant role in the industry development of U.S. regions. This section concludes by completing the analogy from the investor’s point of view, discussing the potential insights gained from it and suggesting possible implications for established views of regional development.

Stage 5 of regional development: The importance of related diversification
In a recent empirical study of the development of Swedish industries, Neffke et al. (2009) found “that industries that were technologically related to pre-existing industries in a region had a higher probability to enter the region, as compared to unrelated industries.” They also found that these unrelated industries had a higher probability of exiting the region. Their findings mean that over time a region’s industry portfolio seems to diversify into technologically related industries, while unrelated industries either leave the region or go out of business for good. Technological relatedness therefore seems to have played an important part in the regional diversification and development of Sweden in the years under examination (1969-2002) (Neffke et al., 2009).
When we consider stage 4 of regional development to be the time when a region “discovers” where it excels, we would indeed expect a higher likelihood of related industries entering the region because additional profit can be made. But even before stage 4 we would expect related diversification to play a role in the development of a region. From stage 1 to stage 2 there would be related diversification in the food, clothing and building industries. From stage 2 to stage 3, as a town or region grows more prosperous, the existing industries for food, clothing and building can further diversify into related industries that offer an even greater variety of these basic goods. New industries pertaining to, for example, leisure may also enter the region as people have sufficiently satisfied their basic needs (Maslow, 1943) and have money to spare for other goods and services. New industries will find it profitable to enter until there is a well-rounded portfolio of activities. From stage 3 to stage 4, new industries such as leisure that have entered the region may further diversify into related industries, giving rise to specialties like movie theatres or theme parks. Regions can also emerge as more competent in one industry, giving that industry a competitive advantage and allowing it to grow and expand into related industries.
If there were a stage 5 of industry development, we would expect it to be a continuation and enhancement of stage 4. That is, continued entry of related industries and exit of unrelated industries would ultimately result in a highly specialized region. An enhancement of stage 4 would mean that this highly specialized region is also highly successful and distinctly different from a “normal” region-specific specialty.
When we view the evolution of industries in U.S. regions, we can say that a highly specialized, successful and distinct area such as Silicon Valley fits the description of developmental stage 5. Boschma and Frenken (2009) provide four knowledge transfer mechanisms that are likely to support the process of related diversification, or regional branching as they term it, in developmental stage 5. The mechanisms they describe are firm diversification, spin-offs, labour mobility and social networking, all of which are very present in Silicon Valley. Through these knowledge transfer mechanisms new entrants to Silicon Valley can connect to the knowledge available. The efficiency of knowledge transfer is also very high at stage 5 because the occupants of such a region are close to one another in their way of thinking, their way of doing business, their social networks and, of course, their location. Boschma (2005) states, however, that proximity seems to display an inverted U-shaped relation with innovation: with too much proximity there is too much of the same knowledge, and with too little proximity knowledge cannot be transferred effectively enough to aid in creating new knowledge and innovation. The magical combination for innovation therefore seems to be enough proximity for knowledge transfer and enough diversity to ensure that there is also new knowledge. By combining existing knowledge with new knowledge, new combinations of knowledge can be made. When we take into account that Schumpeter defined the engine of economic growth, the entrepreneur, as an innovator who carries out new combinations, we can see how related diversification may play a crucial role in regional development.
In order to empirically test the importance of related diversification on regional development we have formulated the following hypotheses:
(A picture will be inserted around here to illustrate the differences between the hypotheses and the economic thought behind it.)
Hypothesis 1a
The higher the closeness of an industry outside the state portfolio (but present in the state) to the state portfolio, the higher the probability that this industry will enter the state portfolio.
Hypothesis 1b
The higher the closeness of an industry outside the state portfolio (but present in the state) to the largest industry in the portfolio, the higher the probability that this industry will enter the state portfolio.

Hypothesis 2a
The higher the closeness of an industry outside the state to the state portfolio, the higher the probability that this industry will enter the state.
Hypothesis 2b
The higher the closeness of an industry outside the state to the state portfolio, the higher the probability that this industry will enter the state portfolio.
Hypothesis 2c
The higher the closeness of an industry outside the state to the largest industry in the state portfolio, the higher the probability that this industry will enter the state.
Hypothesis 2d
The higher the closeness of an industry outside the state to the largest industry in the state portfolio, the higher the probability that this industry will enter the state portfolio.

Agglomeration externalities and the growth of an industry
Agglomeration externalities are the advantages or disadvantages that accrue to firms because they are located close to other firms. A firm’s tendency to diversify in a specific region will depend on its level of relatedness with the other industries in that region. By diversifying into related activities, firms may benefit from the resulting synergies, and industries may grow as a result. Firms that are similar in their use of technology, resources and the skills needed by their workers will experience knowledge spillovers and labour market pooling. For example, experienced oil field engineers will tend to live close to the concentration of oil industry, which lowers the costs of hiring.
Spillovers make room for new entrants, especially in related activities, and as a consequence the industry will grow. Spillovers might be best promoted when there is a degree of relatedness between relatively distinct sectors in terms of products, knowledge base, technology or skills. Such spillovers may foster growth if innovations and improvements in one firm bring external benefits to other firms without the beneficiaries paying full compensation (Glaeser et al., 1992).
Agglomeration externalities are also seen as a consequence of a path-dependent process, in which the development of an industry in a certain state initially depended on the past location of a few firms in that industry. The Hollywood film industry in California started out from four studios and developed into a well-diversified network of related firms.
H3: The higher the location quotient (LQ) of industries in the state portfolio, the higher the growth rate of these industries.

H4a: The higher the level of closeness of industries in a state portfolio to the state portfolio the higher the growth rate of these industries.

H4b: The higher the level of closeness of industries outside a state portfolio to the state portfolio the higher the growth rate of these industries.

Path dependence and the origin of an industry
We will further revise and append this part by elaborating on path dependence and how the self-organizing effect of related diversification and resulting positive feedbacks of agglomeration externalities can cause the emergence of a region with highly successful clusters consisting of similar and related industries (Arthur, 1987). We will also try to reconcile the concept of path dependence, which is rooted in historical economics, with the seemingly ahistorical economic train of thought that had been developed in the past, and is still persistent now in ahistorical economic modelling and theories (David, 2001).

When we go back in time to the origin of any industry, we may find certain events that have determined its developmental path. Such historic events, for example the decision of a company to locate in a certain region, may cause this region to “lock in” to a new pathway of development (Martin and Sunley, 2006). In the case of Hollywood, such a lock-in might have occurred when big players such as Warner Bros, Paramount and Columbia decided to “set up shop” in Hollywood. At the moment one or a certain number of these big players settled in Hollywood, history might have been written: once lock-in had occurred, Hollywood was destined to develop into the blockbuster-producing movie capital we know today. Besides historic events such as firm establishment, path dependence may arise for a variety of reasons, such as sunk costs of local assets and infrastructure, region-specific institutions and the presence of natural resources. Whatever the reason, relatedness plays a large role in the concept of path dependence. To illustrate its importance we discuss several sources of path dependence and how relatedness plays an integral part in each.
Relatedness and natural resources: A region might be blessed with rich mineral veins or large oil fields. The discovery of these natural resources can then set off a lock-in to the development of an equally rich industry, as was the case when oil was discovered in 1859 at Oil Creek, Pennsylvania. The concept of path dependence need not be confined to one region; an event may also trigger national path dependence. In the case of the U.S. oil industry, the discovery of oil in Pennsylvania caused the lock-in to the development of the oil industry in that region. This development in turn triggered drilling and the development of the oil industry in other regions of the U.S. The subsequent discovery of oil in Kansas, Texas and other U.S. states set the stage for the internationally operating U.S. oil industry we know today.
The presence and exploitation of these natural resources ensure that companies related to the oil industry stay in the region (or country) where the natural resources were discovered. Furthermore, if the development of an industry in a region is path dependent, then without a proper trigger, such as the discovery of natural resources, lock-in to this new developmental path will not occur in other regions. That is, related industries will continue to grow in the region after lock-in has occurred, while these industries will be absent or grow less in regions where there was no trigger to induce lock-in. If path dependence does play a role in the formation of a region’s industry portfolio, one would expect these path-dependent industries to display staying power and remain in the region’s portfolio. To investigate this we hypothesize:

H5a: The higher the closeness of an industry to the state portfolio, the higher the likelihood that this industry will stay in the state portfolio.

Relatedness and sunk costs of local assets and infrastructure:
As an industry grows in size, investments are made to facilitate this growth. Large plants may have been built, a skilled and dedicated labour pool may have developed, and infrastructure serving the industry may have been created. The costs of these local assets, once incurred, may bind an industry to the region because relocating would be too costly. This is especially true for the largest industries, such as the “Big Three” U.S. automakers in Detroit. These large industries will have invested vast amounts in local assets such as factories, physical infrastructure and a trained work force. Not only may such an industry lock in to a developmental path once it has grown past a certain size; industries that are closely related to it may also find themselves locked in. This is the case when related industries benefit from the built-up local assets of the large industry to the point that relocating becomes too costly for them, as they will not find such a skilled work force, dedicated facilities, business partners or infrastructure in other regions. To investigate whether such asset-driven path dependence of large industries binds related industries to the region we hypothesize:

H5b: The higher the closeness of an industry to the largest industry in the state portfolio, the higher the likelihood that this industry will stay in the state portfolio.

Data and methods
Source of data
In order to test our hypotheses we obtained a large dataset containing United States census data. These data are provided by the Integrated Public Use Microdata Series USA project (Ruggles et al., 2010). The dataset contains decennial census data on United States households for the period 1850-2000. The occupational and industrial classification of the year 1950 is used throughout the entire dataset to enhance comparability across years. In this sense we use a harmonized classification system throughout our analysis. Our initial dataset consisted of five variables provided by IPUMS-USA. With these variables we constructed other variables in order to test our hypotheses.

Variables
The initial dataset consisted of five variables with U.S. census data: year, industry classification, occupational classification, state and the number of employees. A detailed description of all the variables is available in appendix #. With these variables a location quotient is calculated (Haig, 1926). This allows for the construction of closeness variables of two distinct types: the average closeness to all industries in the state portfolio and the closeness to the largest industry in the state portfolio. With these variables and the initial variables in the dataset we are able to test our hypotheses.
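The two types of closeness variable can be illustrated with a minimal sketch. It assumes that a pairwise closeness score between industries is already available; how that score is measured is not specified at this point, so the `closeness` table, the industry names and the employment figures below are purely hypothetical toy values.

```python
from statistics import mean

# Hypothetical pairwise closeness scores between industries (symmetric, 0..1).
# The actual closeness measure is not defined here; these are toy values.
closeness = {
    "oil":       {"chemicals": 0.75, "film": 0.25},
    "chemicals": {"oil": 0.75, "film": 0.5},
    "film":      {"oil": 0.25, "chemicals": 0.5},
}

# Toy state portfolio: industry -> number of employees in the state.
portfolio_employment = {"oil": 5000, "chemicals": 1200}


def avg_closeness_to_portfolio(industry, portfolio, closeness):
    """Mean closeness of `industry` to every other industry in the portfolio."""
    others = [j for j in portfolio if j != industry]
    if not others:
        return None  # undefined when there is nothing to compare with
    return mean(closeness[industry][j] for j in others)


def closeness_to_largest(industry, portfolio_employment, closeness):
    """Closeness of `industry` to the portfolio industry with most employees."""
    largest = max(portfolio_employment, key=portfolio_employment.get)
    if largest == industry:
        return None  # an industry is not compared with itself
    return closeness[industry][largest]


film_avg = avg_closeness_to_portfolio("film", portfolio_employment, closeness)
film_largest = closeness_to_largest("film", portfolio_employment, closeness)
print(film_avg, film_largest)
```

The first function corresponds to the average closeness to the state portfolio, the second to the closeness to the largest industry in the portfolio.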

year
Variable that provides the four-digit census year from 1850 to 2000 in 10-year steps (e.g. 1850, 1860, 1870, etc.).

statefip
Variable that reports the state using the Federal Information Processing Standards (FIPS) coding scheme.

occ1950
Variable that reports occupational data using the 1950 Census Bureau occupational classification system.

ind1950
Variable that reports the industry using the 1950 Census Bureau industrial classification system.

count
Variable that reports how many employees are working in the industry (provided by ind1950).

total_empl_state
Variable that reports the total employment in a state (provided by statefip) for a given year.

total_empl_us
Variable that reports the total employment in the United States for a given year.

total_empl_per_ind1950_cat
Variable that reports the total employment in an industry (given by ind1950) for a given year.

LQ
Variable that reports the location quotient of an industry (given by ind1950) in a state (given by statefip) for a given year. The location quotient is based on economic base analysis. It identifies the basic industries of a state by comparing the share of an industry in the state’s total employment with the share of that industry in total United States employment (Haig, 1926). The location quotient of industry i in state r is derived with the following formula:

\[ LQ_{i,r} = \frac{E_{i,r} / E_{r}}{E_{i} / E} \]

where E_{i,r} is the number of employees in industry i within state r, E_{r} is the number of employees in state r, E_{i} is the number of employees in industry i (total in the United States) and E is the total number of employees (in the United States). A location quotient greater than 1 indicates that an industry is over-represented in the particular state. A location quotient smaller than 1 indicates that an industry is under-represented in the particular state. A location quotient of 1 is equal to the national average.
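As a minimal sketch of this calculation, assume the data are available as rows of (year, statefip, ind1950, count). The function name, the row layout and the toy numbers below are illustrative assumptions and not taken from the dataset; the industry codes 100 and 200 are placeholders rather than real ind1950 categories.

```python
from collections import defaultdict


def location_quotients(rows):
    """Compute LQ = (E_ir / E_r) / (E_i / E) per (year, state, industry).

    rows: iterable of (year, statefip, ind1950, count) tuples. Counts for the
    same cell (e.g. split by occupation) are summed first.
    """
    e_ir = defaultdict(int)  # employment per year, state and industry
    e_r = defaultdict(int)   # total employment per year and state
    e_i = defaultdict(int)   # national employment per year and industry
    e = defaultdict(int)     # national total employment per year
    for year, state, ind, count in rows:
        e_ir[(year, state, ind)] += count
        e_r[(year, state)] += count
        e_i[(year, ind)] += count
        e[year] += count
    return {
        (year, state, ind): (n / e_r[(year, state)]) / (e_i[(year, ind)] / e[year])
        for (year, state, ind), n in e_ir.items()
    }


# Toy example with two states and two placeholder industry codes.
rows = [
    (1900, 42, 100, 75), (1900, 42, 200, 25),
    (1900, 48, 100, 25), (1900, 48, 200, 75),
]
lq = location_quotients(rows)
for cell in sorted(lq):
    print(cell, lq[cell])
```

In this toy example industry 100 makes up 75% of employment in state 42 but only 50% nationally, so it is over-represented there, while industry 200 is under-represented.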

state_PF_LQ10
Dummy variable with value 1 if a particular industry is in the state portfolio. An industry is defined as being inside the state portfolio if its location quotient is greater than 1.
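A minimal sketch of this dummy, assuming the location quotients are held in a dictionary keyed by (year, statefip, ind1950); the dictionary layout, the function name and the values are illustrative assumptions. Note that the strict inequality means an industry with a location quotient of exactly 1 is treated as outside the portfolio.

```python
def state_pf_dummy(lq_by_cell, threshold=1.0):
    """Return 1 for cells whose location quotient exceeds the threshold, else 0."""
    return {cell: int(value > threshold) for cell, value in lq_by_cell.items()}


# Toy location quotients keyed by (year, statefip, ind1950); hypothetical values.
toy_lq = {
    (1900, 42, 100): 1.5,
    (1900, 42, 200): 0.5,
    (1900, 42, 300): 1.0,
}
state_PF_LQ10 = state_pf_dummy(toy_lq)
print(state_PF_LQ10)
```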

Strategy analysis 1
Logistic regression model
In order to test hypotheses one, two and five we construct a logistic regression model with a changing dependent and independent variable and the same set of control variables. The control variables are state, year and industry classification. We choose these control variables because each of them also influences the relation between our dependent and independent variables. With this logistic model we can test the probabilities that are stated in the hypotheses. The advantages of a logistic regression model are that it can cope with our binary dependent variable and that it relaxes the linear-regression assumptions of normally distributed and homoscedastic errors; it does still assume independent observations and limited collinearity among the predictors.
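The general shape of such a model can be sketched as follows. This is a simplified illustration under stated assumptions, not the estimation procedure used in the study: it fits a logistic regression by plain gradient ascent on toy data, with closeness as the only predictor, and it omits the state, year and industry control variables. The function names and all numbers are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(beta, x):
    """Predicted probability for one observation; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def fit_logit(X, y, lr=0.5, n_iter=5000):
    """Fit a logistic regression by gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - predict(beta, xi)
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Toy data: closeness of an industry to the portfolio and a 0/1 entry dummy.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8]]
y = [0, 0, 1, 0, 1, 0, 1, 1]
beta = fit_logit(X, y)
print("intercept and slope:", beta)
print("P(entry | closeness=0.8):", predict(beta, [0.8]))
```

A positive slope on the closeness variable would be in line with the hypotheses: the closer an industry is to the portfolio, the higher its estimated probability of entry. In practice an established statistics package would be used and the control variables would enter as additional columns of X.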

Hypothesis 1a
The higher the closeness of an industry outside the state portfolio (but present in the state) to the state portfolio, the higher the probability that this industry will enter the state portfolio.
The dependent variable is a dummy variable PF_entry which indicates if an industry enters the portfolio or not and the independent variable defines the closeness as average closeness to the portfolio.

Hypothesis 1b
The higher the closeness of an industry outside the state portfolio (but present in the state) to the largest industry in the portfolio, the higher the probability that this industry will enter the state portfolio.
The dependent variable is a dummy variable PF_entry which indicates if an industry enters the portfolio or not and the independent variable defines the closeness as the closeness to the largest industry in the portfolio.

Hypothesis 2a
The higher the closeness of an industry outside the state to the state portfolio, the higher the probability that this industry will enter the state.
The dependent variable is a dummy variable REG_entry which indicates if an industry enters the state or not and the independent variable defines the closeness as average closeness to the portfolio.

Hypothesis 2b
The higher the closeness of an industry outside the state to the state portfolio, the higher the probability that this industry will enter the state portfolio.
The dependent variable is a dummy variable REG_AND_PF_entry which indicates if an industry enters the state and the state portfolio or not and the independent variable defines the closeness as average closeness to the portfolio.

Hypothesis 2c
The higher the closeness of an industry outside the state to the largest industry in the state portfolio, the higher the probability that this industry will enter the state.
The dependent variable is a dummy variable REG_entry which indicates if an industry enters the state or not and the independent variable defines the closeness as the closeness to the largest industry in the portfolio.

Hypothesis 2d
The higher the closeness of an industry outside the state to the largest industry in the state portfolio, the higher the probability that this industry will enter the state portfolio.
The dependent variable is a dummy variable REG_AND_PF_entry which indicates if an industry enters the state and the state portfolio or not and the independent variable defines the closeness as the closeness to the largest industry in the portfolio.

Hypothesis 5a
The higher the closeness of an industry inside the state portfolio to the state portfolio, the higher the probability that this industry will stay in the state portfolio.
The dependent variable is a dummy variable PF_stay which indicates if an industry stays in the portfolio or not and the independent variable defines the closeness as average closeness to the portfolio.

Hypothesis 5b
The higher the closeness of an industry inside the state portfolio to the largest industry in the portfolio, the higher the probability that this industry will stay in the state portfolio.
The dependent variable is a dummy variable PF_stay which indicates if an industry stays in the portfolio or not and the independent variable defines the closeness as the closeness to the largest industry in the portfolio.

Strategy analysis 2
Linear regression model
In order to test hypotheses three and four we construct a linear regression model with a fixed dependent variable, the growth rate, and an independent variable that changes per hypothesis. The model uses the same set of control variables: state, year and industry classification. With this linear regression model we can test whether the impact on the growth rate stated in the hypotheses exists.
Assumptions (see appendix for outcomes)
Before we can interpret the results of the model and answer our hypotheses and research questions, we need to check that the assumptions of linear regression are met. To improve our calculations we eliminated all severe low and severe high outliers from the dataset before running the tests, and all variables are log transformed. The first assumption is that the relationship between the variables is linear. To check this we created a scatter plot, which shows an acceptable degree of linearity. Second, the errors need to be normally distributed. To test this we graphed the standardized normal probability plot of the residuals. Third, we tested for multicollinearity by means of the variance inflation factor (VIF). Values of 1.08 for H3 and 1.01 for H4a and H4b indicate no collinearity problems, meaning that the explanatory variables are not strongly linearly related to one another. Fourth, we ran the Breusch-Pagan test for homoskedasticity. With a p-value of 0.000 we reject the null hypothesis of constant variance, so the errors are heteroskedastic. This leaves the coefficient estimates unbiased but makes the conventional standard errors unreliable, so the significance levels of the linear models should be read with this caveat. The last assumption, the absence of autocorrelation, is checked by means of a scatter plot of the residuals against the time variable year. Apart from homoskedasticity, none of the assumptions is violated, so we use this linear regression model to test our hypotheses.
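
The logic of the Breusch-Pagan check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure run for this paper: the data are synthetic, the model has a single regressor, the function names are hypothetical, and the statistic is the n times R-squared variant (Koenker's studentized form) rather than necessarily the variant reported by a given statistics package. A small p-value leads to rejecting the null hypothesis of constant variance.

```python
import math
import random

def ols(x, y):
    """OLS of y on x with intercept; returns (intercept, slope, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid

def r_squared(x, y):
    """Share of the variance of y explained by a linear fit on x."""
    _, _, resid = ols(x, y)
    my = sum(y) / len(y)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum(e ** 2 for e in resid)
    return 1.0 - ss_res / ss_tot

def breusch_pagan(x, y):
    """LM = n * R^2 from regressing squared residuals on x.

    Under the null of constant variance LM is approximately chi-square
    with 1 degree of freedom (one regressor).
    """
    _, _, resid = ols(x, y)
    squared = [e ** 2 for e in resid]
    lm = max(len(x) * r_squared(x, squared), 0.0)
    p_value = math.erfc(math.sqrt(lm / 2.0))  # chi-square(1) upper tail
    return lm, p_value

# Synthetic data (assumption): same mean relation, different error structure.
rng = random.Random(1)
x = [rng.uniform(1.0, 10.0) for _ in range(500)]
y_hom = [1.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]       # constant variance
y_het = [1.0 + 0.5 * xi + rng.gauss(0.0, 0.3 * xi) for xi in x]  # variance grows with x

lm_hom, p_hom = breusch_pagan(x, y_hom)
lm_het, p_het = breusch_pagan(x, y_het)
print("constant variance: LM = %.2f, p = %.4f" % (lm_hom, p_hom))
print("growing variance:  LM = %.2f, p = %.4g" % (lm_het, p_het))
```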

Results
The logit regressions that were carried out can now be used to see whether the formulated hypotheses 1, 2 and 5 can be accepted. The results of the logit regressions are presented in tables XX in appendix XX.

H1a
Table X in appendix XX shows the results of the Hosmer and Lemeshow goodness-of-fit test. Based on this test we conclude that the model fits the data well. The results of the logit regression used to test hypothesis 1a are presented in table X in appendix XX. We see that the coefficient of the independent variable is positive and significant. Thus we can accept the hypothesis that the closeness of an industry outside the state portfolio, but in the state, to the state portfolio has a positive effect on the probability that this industry will enter the state portfolio.

H1b
The Hosmer and Lemeshow goodness-of-fit test shows a significance level of 0.1283 (Table XX in appendix XX). Because this value lies above 0.05, we can state that the model’s estimates fit the data well. The results of the tests concerning hypothesis 1b are presented in Table XX in appendix XX. We can conclude that the independent variable, closeness of an industry outside the state portfolio, but in the state, to the largest industry in the portfolio, has a positive effect on the chance that this industry will enter the state portfolio. The effect is significant at the 0.1 level. Therefore we accept hypothesis 1b.

H2a
Table XX in appendix XX shows the results of the logit regression used to test hypothesis 2a. The coefficient of the independent variable, closeness, is negative and significant. We therefore reject the hypothesis that a higher closeness of an industry outside the state to the state portfolio will increase the probability that this industry will enter the state. We note, however, that our database may not be well suited to test this hypothesis. The low value of the Hosmer and Lemeshow goodness-of-fit test (0.000) supports this assumption (Table XX in appendix XX).

H2b
Based on the results of the Hosmer and Lemeshow test (Table XX in appendix XX) we can state that the model’s estimates do not fit the data at an acceptable level. However, with a dataset as large as ours this test is unlikely to indicate a good fit, because even small deviations between predicted and observed frequencies become statistically significant. We therefore use this model. The coefficient of the independent variable in the logit regression for hypothesis 2b (Table XX in appendix XX) is positive and significant. This means that we can accept hypothesis 2b. We can conclude that closeness has a positive effect on the probability of an industry entering the state portfolio in the future.
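
The sensitivity of the Hosmer and Lemeshow test to the size of the dataset can be illustrated with a short Python sketch. It rests on assumptions that are not part of this study: the data are simulated, the predicted probabilities are set 0.03 above the true ones instead of coming from a fitted model, and the function names are hypothetical. The same small miscalibration is evaluated once on 500 and once on 100,000 observations.

```python
import math
import random

def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-square distribution with even df,
    using the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic over equal-sized groups of observations
    sorted by predicted probability; p-value uses groups - 2 df."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        expected = sum(pi for pi, _ in chunk)  # expected number of events
        observed = sum(yi for _, yi in chunk)  # observed number of events
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return stat, chi2_sf_even(stat, groups - 2)

def simulate(n, rng, bias=0.03):
    """Draw outcomes from true probabilities; predict with a small bias."""
    y, p = [], []
    for _ in range(n):
        true_p = rng.uniform(0.1, 0.9)
        y.append(1 if rng.random() < true_p else 0)
        p.append(true_p + bias)
    return y, p

rng = random.Random(2)
stat_small, p_small = hosmer_lemeshow(*simulate(500, rng))
stat_large, p_large = hosmer_lemeshow(*simulate(100000, rng))
print("n = 500:    statistic = %.1f, p = %.4g" % (stat_small, p_small))
print("n = 100000: statistic = %.1f, p = %.4g" % (stat_large, p_large))
```

With the larger sample the statistic grows roughly in proportion to the number of observations, so a deviation of the same size leads to a much smaller p-value.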

H2c
The low results of the Hosmer and Lemeshow goodness-of-fit test (Table XX in appendix XX) mean that the model’s estimates do not fit the data at an acceptable level. This result is probably caused by the large size of the database. We therefore use the model. We can see in Table XX (appendix XX) that the independent variable, closeness of an industry outside the state to the largest industry in the state portfolio, has a small, positive and insignificant effect on the dependent variable, the probability that this industry enters the state. Thus we reject hypothesis 2c.

H2d
Table X in appendix XX shows the results of the Hosmer and Lemeshow goodness-of-fit test. The results indicate that the model does not fit the data well. However, because of our very large dataset this goodness-of-fit test is unlikely to indicate a good fit, and we therefore use this model. The results of the logit regression used to test hypothesis 2d are presented in table X in appendix XX. We see that the coefficient of the independent variable, closeness to the largest industry in the state portfolio, is positive and significant. Thus we can accept the hypothesis.

H3
With a positive coefficient and a p-value of 0.000 we can state that there is a positive relationship between the growth rate per decade and the LQ of industries in the state portfolio (Table XX in appendix XX). Given the assumption checks reported for this model and the significant effect, we accept this hypothesis. Furthermore we can state that the relationship between the two variables is fairly strong. So if an industry is overrepresented in a certain state, this industry is likely to grow faster than other industries in the same state with a lower LQ.

H4a
There is a positive relationship between growth and portfolio closeness, and this effect is significant with a p-value of 0.027 (Table XX in appendix XX). However, we must stress that the relationship found in this model is not very strong. Compared to the previous hypothesis this result is remarkable, since we expected the level of closeness to be a more important indicator for industry growth than overrepresentation in a state measured by a location quotient.

H4b
The effect tested in this hypothesis is also significant, with a p-value of 0.000, and the relation is positive (Table XX in appendix XX). However, as with the previous hypothesis, the relationship found is relatively small. We can still state that closeness matters when it comes to industries outside the state portfolio: the higher the closeness, the higher the growth rate in these industries tends to be.

H5a
The Hosmer and Lemeshow goodness-of-fit test shows a significance level below 0.05 (table XX; Appendix XX). This finding means that the model’s estimates do not fit the data at an acceptable level. Given the large size of the dataset, however, the test was unlikely to indicate a good fit. Therefore we will use this model. We can see in table X in appendix XX that the independent variable, closeness of an industry inside the state portfolio to the state portfolio, has a significant and positive effect on the dependent variable, the probability that this industry will stay in the state portfolio. Thus we can accept hypothesis 5a.

H5b
Based on the Hosmer and Lemeshow test presented in table XX (Appendix XX) we can state that the model does not fit the data well. The low value of this test could be caused by the large size of the database. We therefore use this model to test hypothesis 5b. Table X in appendix XX shows that closeness of an industry inside the state portfolio to the largest industry in the state portfolio has a small, positive and insignificant effect on the probability that this industry will stay in the state portfolio. Therefore we reject the hypothesis.