+景,这一部分为定制化计价。
+
+
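+### 2.4 Open-source tools flourish in the AI application layer
+
+#### 2.4.1 Open-source tools flourish in the application layer
+
+Application-layer AI is developing like a garden in full bloom, showing a striking diversity of technologies and breadth of applications. Its influence keeps expanding: some applications target consumer (C-end) users and provide services covering every aspect of daily life, such as entertainment, social networking, music, and personal health assistants; others play an important role in more specialized business (B-end) fields, such as market analysis, legal work, and intelligent design. These applications demonstrate the depth and breadth of AI technology; they not only improve efficiency and convenience but also, to a large extent, drive innovation and technological progress.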
+
+
+
+
\ No newline at end of file
diff --git a/en/commercialization.md b/en/commercialization.md
new file mode 100644
index 0000000..1a4ad0a
--- /dev/null
+++ b/en/commercialization.md
@@ -0,0 +1,1511 @@
+---
+outline: deep
+---
+# OSS Commercialization
+
+## 1. Overview
+In the Commercialization chapter of the Open-source Annual Report for the past two years, we presented the underlying drivers of successful open-source software commercialization, possible commercialization paths for open-source software companies, investors' decision-making criteria for open-source projects, and case studies. Last year, in light of trends and changes in the market environment at the time, we discussed the drivers, challenges, and realization paths for domestic open-source projects pursuing globalization and commercial development, which triggered a lively discussion among many open-source peers.
+
+In 2022-2023, the field of AI saw an explosion of pre-trained large language model (LLM) technology, which has sparked widespread interest across society and is predicted to keep deepening its impact on life and work. In this wave of AI technology iteration, the open-source ecosystem has also played an essential role in promoting technological development, and many open-source models and open-source projects are actively seeking commercialization. However, there are numerous differences between open-source models and traditional open-source software. In such an era, the commercial development of AI open-source projects and open-source models has become a topic worthy of in-depth research and discussion.
+
+The security and controllability of open-source projects, including open-source software and open-source models, is one of the key considerations for business users in the commercialization process. Combined with the current technology trends, the analysis of the security of open-source software, the controllability of open-source models, and open-source commercial licenses are topics of interest.
+
+Capital is an important participant in promoting the development of open-source markets. When judging an open-source project, investment institutions often consider the following points: in the product development stage, the focus is on whether the team has ownership and control of the code and whether it is internationally competitive; in the community operation stage, the main question is whether the operating ability is strong enough; in the commercialization stage, market fit and the maturity of the business model become the main focus.
+
+As the first organization in the field to focus on open-source and continue to work on it, Yunqi Partners has successfully identified and invested in open-source companies such as PingCAP, Zilliz, Jina AI, RisingWave Lab, TabbyML, etc., and continues to participate in the construction of the open-source ecosystem.
+
+To further enrich the content of the report, this year we were honored to jointly organize a series of closed-door discussion meetups with Open-source Society. We held in-depth discussions on open-source commercialization as it relates to AI infrastructure and open-source LLMs, together with industry guests from Microsoft, Google, Apple, Meta, Huawei, Baidu and other domestic and international manufacturers; from Stanford University, Shanghai Jiao Tong University, China University of Science and Technology, UCSD and other universities and research institutes; and with a large number of domestic and international front-line entrepreneurs working on open source and open-source LLMs. Some of the key insights are included in this report.
+
+This chapter was written by the investment team of Yunqi Partners. The topics discussed this year focus on cutting-edge trends and technology, together with some outlook and predictions. We combined industry participants' experience and opinions to put forward our views; if anything is ill-considered or you hold a different view, further discussion is highly welcome.
+
+Key elements include:
+
+**Open source ecosystem for rapid AI growth**
+
+**Open source security challenges**
+
+**Capital market situation for open source projects**
+
+
+## 2. Open source ecosystem fuels rapid AI development
+
+### 2.1 The proliferation of pre-trained LLM is strongly driven by open source
+
+#### 2.1.1 Rapid development of pre-trained LLMs
+
+The development of pre-trained LLMs has been groundbreaking over the past few years, and they have become a major landmark in the field of AI. These models are not only growing in scale but have also made huge leaps in their intelligent processing capabilities. From complex language processing to fine-grained image processing and in-depth data analysis, these models demonstrate unprecedented capability and precision. Especially in the field of Natural Language Processing (NLP), pre-trained LLMs such as the GPT series learn from large amounts of textual data to model complex human language and achieve high-quality text generation, translation, and comprehension. These models not only show significant improvements in fluency but also an increasing ability to understand context and capture subtle linguistic differences.
+
+In addition, these LLMs perform extremely well in complex data analysis. They are capable of extracting meaningful patterns and correlations from huge data sets to support a wide range of fields such as scientific research, financial analysis, and market forecasting. It is worth noting that the development of these models is not limited to their own enhancements. As they are popularized and applied, they are driving technological advances across the industry and society as a whole, facilitating the creation of new applications such as intelligent assistants, automated writing tools, and advanced diagnostic systems. Their development opens up new directions for future AI applications and research, signaling a new round of technological innovation.
+
+Public enthusiasm for AI is surging. ChatGPT reached 100 million users in just 2 months, compared with the 9 months it took TikTok. This is not only a huge commercial success but also a major milestone in the history of AI technology development.
+
+
+Figure 2.1 Time to reach 100 million users for major apps (in months)
+
+
+Along with the growing popularity of AI, the global AI market is also expanding rapidly. According to Deloitte, it grew at a CAGR of 23% during 2017-2022 and is expected to reach seven trillion dollars in 2025.
+
+
+Figure 2.2 Global AI Market Size (Trillions of Dollars)
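+
+For reference, a compound annual growth rate (CAGR) relates a starting value and an ending value as shown below. This is a minimal worked example rather than a figure from the Deloitte data: the symbols $V_0$, $V_n$, $r$, and $n$ are illustrative, and it assumes five compounding years between 2017 and 2022.
+
+$$
+V_n = V_0\,(1 + r)^n, \qquad (1 + 0.23)^5 \approx 2.82
+$$
+
+Under that assumption, a 23% CAGR sustained over five years corresponds to a market roughly 2.8 times its starting size.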
+
+#### 2.1.2 Open Source Power for AI
+
+The open-source ecosystem has played an essential role in these great strides in pre-trained models. This includes not only research from academia but also support from industry. Through the joint efforts of the open-source ecosystem, the performance of open-source LLMs is improving rapidly and gradually rivaling that of closed-source models.
+
+**The power of open source from academia has contributed significantly to the evolution of AI technology**
+
+Since Princeton University published ImageNet, a significant paper in computer vision, in 2009, the number of papers on AI and machine learning has gradually increased. Over the years, researchers have proposed many open-source algorithms. By 2017, the number of AI and machine learning papers on arXiv had reached over 25,000. The "Attention Is All You Need" paper was published that same year, introducing the open-source Transformer model. Its publication led to a concentrated surge in research and papers on LLMs. As a result, from 2017 to 2023, the number of arXiv papers related to LLMs surged to over 100,000. This surge also considerably accelerated the open-source development of related models and laid the theoretical foundation for the subsequent explosion of LLM technology.
+
+
+Figure 2.3 Cumulative number of AI / Machine Learning related papers published on arXiv
+
+
+:::info Expert Review
+**Willem Ning JIANG**: This insight is quite exciting, and academic open-source plays a very important role.
+:::
+
+**The industry's open source power fuels rapid development of LLM**
+
+With the popularity of ChatGPT, more and more engineers are devoting themselves to the research and development of LLMs. In addition to closed-source products, many great open-source models are also leading the industry. Stable Diffusion, released in 2022, with its powerful image generation capabilities and community strength, quickly caught up with Midjourney, a famous closed-source image generation model, and has already taken the lead in some aspects; the robust capabilities of open-source large language models, represented by Meta LLaMA 2, have made Google researchers reflect that "we don't have a moat, and neither does OpenAI"; and there are also emerging open-source leaders in various fields, such as Dolly and Falcon. With their powerful community resources and lower cost of use, open-source LLMs have quickly gained many business and individual users and become an indispensable force in the development of LLMs.
+
+
+
+
+**Performance of open-source LLMs is rapidly catching up with closed-source**
+
+Closed-source LLMs, represented by OpenAI's ChatGPT4, started earlier, and in the early stage their parameter counts and performance metrics were ahead of those of open-source models. However, open-source models have strong community and technical support, resulting in rapid performance growth. The most mature version of ChatGPT4 scored 1,181, while Llama 2, an open-source LLM launched less than four months earlier, scored 1,051, a difference of only 11%. It is worth noting that the models ranked 4th to 9th are all open-source LLMs, indicating that the growth in open-source LLM performance is not an isolated case but an industry trend. Open-source LLMs are highly cost-effective because of their low usage costs and narrowing performance gap with closed-source LLMs, which makes them attractive to a growing number of business and individual users. Costs are discussed in more detail later in this chapter.
+
+Benefiting from the open nature of open-source models, users can easily fine-tune LLMs to fit different vertical application scenarios. Fine-tuned LLMs are more industry-specific than general-purpose LLMs, which is an advantage that closed-source models cannot provide.
+
+
+
+Figure 2.5 [ELO ratings](https://en.wikipedia.org/wiki/Elo_rating_system) of LLMs based on user feedback
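+
+To put the rating gap in perspective, the following is a minimal worked example. It assumes the leaderboard uses the standard Elo logistic model with a 400-point scale (see the Elo rating system link above); the symbols $R_A$, $R_B$, and $E_A$ are illustrative and are not taken from the leaderboard itself.
+
+$$
+E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}} = \frac{1}{1 + 10^{(1051 - 1181)/400}} \approx 0.68
+$$
+
+Under that assumption, a 130-point gap means users would be expected to prefer the higher-rated model in roughly two out of three head-to-head comparisons. The 11% figure is the relative difference in ratings, $(1181 - 1051)/1181 \approx 0.11$.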
+
+
+#### 2.1.3 The three layers of the LLM
+
+The technical architecture of the LLM is divided into three main layers, as shown in the figure below. Open source has made significant contributions to the model layer, the development tools layer, and the application layer. Each layer has its unique function and importance, and together they form the complete architecture of LLM technology. The subsequent sections (2.2, 2.3, 2.4) discuss each layer in detail.
+
+
+
+
+- **Model layer**
+
+The model layer is the foundation of the entire architecture and includes the core algorithms and computational frameworks that make up the LLM; typical models such as GPT and Diffusion are the core of generative AI. This layer involves model training, including pre-processing of large amounts of data, feature extraction, model optimization, and parameter tuning. The key to the model layer is efficient algorithm design and large-scale data processing capabilities.
+
+- **Development tools layer**
+
+The development tools layer provides the necessary tools and platforms to support the development and deployment of LLMs, including various machine learning frameworks (e.g., TensorFlow, PyTorch) and APIs that simplify the process of building, training, and testing models. This layer may also include cloud services and computing resources that support model training and deployment. In addition, it is responsible for version control, testing, maintenance, and updating of the model.
+
+- **Application layer**
+
+The application layer mainly considers how to access the LLM capabilities in real applications. This layer integrates models into specific business scenarios, such as intelligent assistants, automated customer service, personalized recommendation systems, etc. The key to the application layer is translating complex models into user-friendly, efficient, and valuable applications while ensuring good performance and scalability.
+
+These three layers are interdependent and constitute the complete architecture of the LLM technology; from the basic construction of the model to the realization of specific applications, each layer plays an important role. The corresponding open-source content for each of the three layers is discussed in detail next.
+
+### 2.2 Open source is the second driving force fuelling the development of foundation models
+
+#### 2.2.1 Supply side: Concentrate on R&D
+
+**Reducing the number of developers needed and concentrating R&D capabilities**
+
+The development of AI models requires technical expertise, and there is a shortage of related talent in China. Open-source technology can promote the development of advanced AI functionality and alleviate pressure on SMEs. Open-source language models lower the entry barrier and save development time, enabling more researchers to access advanced AI technologies directly.
+
+Based on efficient pre-trained models, developers can innovate and improve in a targeted way rather than being distracted by building infrastructure. This concentration on innovation rather than infrastructure has greatly contributed to rapid technological advances and the expansion of applications. At the same time, sharing open-source models facilitates the dissemination of knowledge and technology, providing a platform for developers worldwide to learn and collaborate, which plays a crucial role in driving overall progress across the industry.
+
+**Saving computational power and avoiding reinventing the wheel**
+
+As the performance of LLMs continues to grow, so does their number of parameters, which has jumped 1,000 times in the past five years. According to estimates, ChatGPT requires more than 30,000 NVIDIA A100 GPUs, corresponding to an initial investment of about 800 million U.S. dollars, with daily electricity costs of about $50,000. Training is becoming ever more computationally expensive, so reinventing the wheel over and over again is a massive waste of resources. Coupled with the U.S. ban on supplying NVIDIA's A100/H100 to mainland China, it is becoming increasingly difficult for domestic companies to train LLMs. Open-source pre-trained LLMs are therefore an attractive choice that eases this dilemma, allowing more companies to leverage LLMs for secondary development.
+
+LLM training requires four steps: pre-training, supervised fine-tuning, reward modeling, and reinforcement learning. Pre-training accounts for more than 99% of the computational time of the entire training cycle. Open-source models therefore let developers skip the step that incurs 99% of the cost and invest their limited funds and time in fine-tuning, which is a significant help to most application-layer developers. Many SMEs need model service providers to customize models for them. The open-source ecosystem can save a great deal of cost in the secondary development of LLMs and can thus give birth to many startups.
+
+
+Figure 2.7 Increasing number of large model parameters
+
+
+**Open source allows for exploration of a wider range of technological possibilities**
+
+Whether the world-changing Transformer model is the optimal solution remains unanswered, and whether the next best thing is an RNN (Recurrent Neural Network) is still in question. Thanks to the open-source ecosystem, however, developers can experiment on different branches of the AI family, bringing together various new development forces and ensuring the diversity of technological development. Human exploration of LLMs will therefore not be confined to a local optimum, which keeps open the possibility of continuous development of AI technology in all directions.
+
+#### 2.2.2 Demand side: lowering the barriers to capture the market
+
+**Open source models significantly reduce costs for model users**
+
+Deploying an open-source model requires some initial investment, but as usage increases it exhibits economies of scale, and the cost of usage is more controllable than with closed-source models. If the average daily request volume in your usage scenario stays below a certain threshold, directly calling the API is less expensive. If the request volume is higher, deploying the open-source model costs less, so you should choose the appropriate method based on your actual usage.
+
+
+Figure 2.8 Cost Comparison of Calling OpenAI APIs and Deploying Open Source Models on the AWS Cloud
+
+
+Take as an example a comparison between directly calling OpenAI's API and deploying the Flan UL2 model on a public cloud:
+
+According to the latest data from OpenAI's official website, the ChatGPT4 model costs $0.03 / 1000 tokens for input and $0.06 / 1000 tokens for output. Considering the ratio of input to output, assume an average cost of $0.04 / 1000 tokens. Each token is about 3/4 of an English word, and the number of tokens in a request equals the prompt tokens plus the output tokens. Assuming a block of text is 500 words, or about 670 tokens, the cost of a block of text is 670 x 0.04 / 1000 = $0.0268.
+
+Suppose the open-source model is deployed on the AWS cloud, taking as an example the Flan UL2 model with 20 billion parameters mentioned in the related tutorial published by AWS. In that case, the cost consists of three parts:
+
+- The fixed cost of deploying the model as an endpoint using AWS SageMaker is about $5-6 per hour, or about $150 per day.
+- Connecting the SageMaker endpoint to AWS Lambda: assume responses are returned to users in 5 s, using 128 MB of memory. The price per request is 5000 x 0.0000000021 (unit price per millisecond for 128 MB) = $0.00001.
+- Exposing this Lambda function as an API via API Gateway: the Gateway costs about $1 per 1 million requests, or $0.000001 per request.
+
+Based on the above data, the total costs of the two options are equal at about 5,600 requests per day. At 100,000 requests per day, using ChatGPT4 costs about $2,680, while the open-source model costs about $151; at 1,000,000 requests per day, using ChatGPT4 costs about $26,800, while the open-source model costs about $161. The cost savings of the open-source model become significant as the request volume increases.
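+
+As a rough sketch, the comparison above can be written as a linear break-even problem. The symbols $p$, $F$, $v$, and $n$ are introduced here for illustration and do not appear in the original sources; the values plugged in are the ones quoted above.
+
+$$
+C_{\text{API}}(n) = p\,n, \qquad C_{\text{OSS}}(n) = F + v\,n, \qquad n^{*} = \frac{F}{p - v}
+$$
+
+Here $n$ is the number of requests per day, $p$ is the API cost per request in USD, $F \approx 150$ USD is the daily fixed cost of the endpoint, and $v \approx 0.00001 + 0.000001 = 0.000011$ USD is the per-request cost of Lambda plus API Gateway. Because $v$ is tiny compared with $p$, the break-even volume $n^{*}$ is almost inversely proportional to the per-request API price: a 670-token request at 0.04 USD per 1,000 tokens gives $p = 0.0268$ and $n^{*} \approx 5{,}600$, whereas a price ten times lower, $p = 0.00268$, gives $n^{*} \approx 56{,}200$.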
+
+**Open source improves the explainability and transparency of models and lowers the barrier to technology adoption**
+
+Open-source models are easier to evaluate than closed models. Open-source models provide access to their pre-training results, and some even disclose their training datasets, model architectures, and more, making it easier for researchers and users to conduct in-depth analyses of the LLMs and comprehend their strengths and weaknesses. Scientists and developers worldwide can review, evaluate, explore, and understand the models' underlying principles, improving security, reliability, explainability, and trust. Sharing knowledge widely is crucial to promoting technological progress and also helps to reduce the possibility of technology misuse. Closed-source models can only be evaluated through performance tests and are essentially a "black box." It is hard to assess their strengths and weaknesses, applicable scenarios, and other factors, and their explainability and transparency are considerably lower than those of open-source models.
+
+Closed-source models can face questions about their originality. Users cannot be sure that closed-source models are genuinely original, leading to concerns about copyright and sustainability. Open-source models, on the other hand, are more convincing to users because the code is available to validate their originality. According to comments from a Hugging Face engineer, open-source models like Llama2, which have published details of training data, methods, labeling, and so on, are more transparent than the black box of closed-source LLMs. With transparency in the papers and the code, users know what they are getting when they use the model.
+
+Higher explainability and transparency are conducive to enhancing users' trust, especially business users, in the LLMs.
+
+**Business users can realize specific needs with open source base models**
+
+Business users have many types of specific needs, such as industry-specific fine-tuning and local deployment to ensure privacy.
+
+As the number of LLM parameters continues to increase, training costs continue to rise. Simply growing the parameter count is not the best way to improve performance; fine-tuning for a specific problem can quickly improve an LLM's performance on that problem and achieve better results with less effort. For example, WizardMath, an open-source mathematics LLM fine-tuned by Microsoft from LLaMA2, has only 70 billion parameters, yet on the GSM8k dataset its mathematical ability beats that of several LLMs such as ChatGPT, Claude Instant 1, and PaLM 2-540B. This shows the critical role of fine-tuning in improving the professional problem-solving ability of LLMs, which is also a significant advantage of open-source LLMs.
+
+
+
+
+Many business users have extremely high data privacy requirements, and the ability to deploy open-source LLMs locally greatly protects business privacy. When clients call closed-source LLMs, the models always run on the servers of companies such as OpenAI, so clients can only send their data remotely to the LLM provider's servers, which is very unfavorable to privacy protection. Enterprises in China also face related compliance issues. By contrast, open-source LLMs can be deployed locally so that all data is processed within the company that owns it, even offline, which significantly protects clients' data security.
+
+**Open source model facilitates long-lasting customer experience**
+
+Creating a reliable dataset is crucial for enterprises to keep up with the constant changes in open-source models. Open-source models can be customized to fit an enterprise's specific needs, but this requires a high-quality dataset. By investing in a dataset, enterprises can fine-tune multiple models and avoid constantly replacing them with newer versions, which saves money in the long run, as the dataset does not need to be updated continuously. Enterprises can leverage the model's capabilities without incurring significant costs.
+
+Open-source models are updated quickly to meet the changing needs of users. The R&D power of the open-source community quickly fills the gaps in open-source LLMs. LLaMA2 itself lacks a Chinese corpus, leading to unsatisfactory Chinese comprehension; however, just one day after LLaMA2 was made available, the first open-source Chinese LLaMA2 model, "Chinese LLaMA2 7B", appeared in the community and could be downloaded and run. Adequate community support can meet the different needs of users. In contrast, closed-source companies usually struggle to cater comprehensively to the distinct needs of various types of users.
+
+**Open source helps to capture market opportunities**
+
+Open-source models are more accessible to users and can expand the market quickly due to their low barrier to entry. Stable Diffusion, an open-source image generation model, has become a major competitor to MidJourney, a closed-source model, because of its large developer community and diverse application scenarios. Although not as good as MidJourney in some ways, Stable Diffusion has captured a significant share of the image generation market by being open source and free, making it one of the most popular image generation models. Its success has also brought widespread attention and investment to the companies behind it, RunwayML and Stability AI.
+
+#### 2.2.3 Ecological Side: Converging Diversity for Long-Term Growth
+
+**Open source helps large model companies quickly seize ecosystem resources**
+
+The low threshold and easy accessibility of open-source models also help them quickly capture relevant ecosystem resources. Stable Diffusion is an open-source project that has received positive responses and support from many freelance developers worldwide. Many enthusiastic programmers were actively involved in building easy-to-use graphical user interfaces (GUIs). Many LoRA modules have been developed to give Stable Diffusion features such as faster generation and more vivid images. According to the official website of Stable Diffusion, one month after the release of Stable Diffusion 2.0, four of the top ten apps in the Apple App Store were AI painting apps based on Stable Diffusion. A thriving ecosystem has become the solid foundation of Stable Diffusion.
+
+When the open-source LLM LLaMA2 was first released, there were 5,600 projects on GitHub containing the "LLaMA" keyword and 4,100 projects containing the "GPT4" keyword. Two weeks later, the LLaMA-related ecosystem had grown significantly, with 6,200 related projects compared to 4,400 "GPT4"-related projects. For LLM companies, an ecosystem means markets, technological power, and an inexhaustible driver of growth. With lower barriers, open-source models can capture ecosystem resources faster than closed-source models. Therefore, open-source LLM companies should seize this advantage, strengthen communication with community developers, and provide them with sufficient support to promote the rapid development of the relevant ecosystems.
+
+**Open source helps large model vendors break into the market and gain business alliances**
+
+After LLaMA2 was open-sourced for commercial use, Meta quickly partnered with Microsoft and Qualcomm. That Microsoft, the major shareholder of OpenAI, chose to collaborate with the open-source vendor Meta shows that open source has become a force to be reckoned with. Regarding the collaboration, Meta stated that users of the Microsoft Azure cloud service will be able to fine-tune and deploy Llama2 directly in the cloud. Microsoft disclosed that Llama2 has been optimized for Windows and can run locally on Windows.
+
+The collaboration between the two companies highlights that open-source LLMs and cloud vendors have a natural cooperation foundation. Not coincidentally, there is a similar trend in domestic open-source LLM vendors: Baidu's ERNIE and Ali's Qwen are both open-source LLMs. Although users usually do not pay for using open-source LLMs, they need to pay for the computational power using Baidu Cloud and Ali Cloud as computational platforms.
+
+Meta's partnership with Qualcomm also signals its expansion into the mobile sector. Because open-source LLMs have a broad audience, can be deployed locally, and offer other advantages, mobile phones are set to become a vital carrier for convenient LLM use in the future. This also attracts mobile phone chip manufacturers to collaborate with open-source model vendors.
+
+In summary, an open-source LLM, with its broad reach, helps the company behind it find partners and break into the market.
+
+**Open source can mobilize a wide range of communities and bring together diverse development forces**
+
+The power of the community has always been an essential strength of open source. As shown in the figure below, the number of generative AI projects on GitHub grew rapidly in 2022, soaring from 17,000 to 60,000. A rapidly growing community can not only quickly provide a large amount of technical feedback to open-source LLM developers but also extend the reach of open-source LLMs to end users and fine-tune open-source models for various vertical domains, bringing more users to the LLMs.
+
+
+Figure 2.10 Changes in the number of generative AI-related projects open-sourced on GitHub (Source: GitHub)
+
+
+Open-source LLMs are built with contributions from developers worldwide, from different cultures, regions, and technical backgrounds, in contrast to closed-source models. The graph below shows that, alongside the United States, contributors from various countries, including China, India, Japan, and Brazil, have made significant contributions to the open-source generative AI community. By including contributions from developers worldwide, open-source LLMs can be adapted to suit different regions' customs, languages, industries, and usage habits. This makes open-source LLMs more versatile and appealing to a broader audience.
+
+
+Figure 2.11 Top 10 global communities creating the most generative AI projects on GitHub (Source: GitHub)
+
+
+**Domestic open source base LLM is booming, keeping pace with global leaders**
+
+Based on the domestic ecosystem of tech companies, the country's open-source pre-trained foundation LLMs are also booming, keeping pace with global leaders.
+
+In June, Tsinghua's ChatGLM was upgraded to its second generation, which took the top spot on the Chinese C-Eval leaderboard. ChatGLM3, launched in October, not only offers multimodal performance comparable to that of GPT-4V but is also the first LLM product in China with code interaction capability (Code Interpreter).
+
+In October, the Aquila LLM series was fully upgraded to Aquila2, and Aquila2-34B, with 34 billion parameters, was added. At that time, across 22 evaluation benchmarks in four dimensions, namely, code generation, examination, comprehension, reasoning, and language, Aquila2-34B took first place on several leaderboards.
+
+On November 6, the LLM startup company Zero One Everything, led by Dr. Kai-Fu Lee, officially open-sourced and released its first pre-trained LLM, Yi-34B, which has achieved amazing results in a number of leaderboards, including Hugging Face's Open LLM Leaderboard.
+
+In December, Qwen-72B, a model with 72 billion parameters from AliCloud's Tongyi Qianwen, topped the Open LLM Leaderboard of Hugging Face, the world's largest modeling community, outperforming domestic and international open-source LLMs such as Llama 2.
+
+Domestic open-source pre-trained base LLMs are far more numerous than those listed above; the booming ecosystem is exciting and includes academic institutions, Internet giants, and some excellent startups. At the end of the report, we summarize statistics on startups and models with open-sourced LLMs.
+
+#### 2.2.4 Paths to Commercialization of Open-source LLMs
+
+We are currently in an era of rapid development of open-source LLM technology, a field that, while promising, also faces significant challenges in business model exploration. Based on exchanges with practitioners and case studies, this section attempts to summarize some of the directions of commercialization exploration at this stage.
+
+**Provision of support services**
+
+As more and more foundational open-source technologies emerged, the complexity and specialization of software increased dramatically, and users' demand for software stability rose with it, requiring professional technical support. Against this background, companies represented by Red Hat began to commercialize operations based on open-source software. Their main business model is the "support services" model, which provides paid technical support and consulting services to customers using open-source software. Today's foundation models are likewise highly complex and specialized, and users need professional technical support as well.
+
+In the LLM space, Zhipu AI's business model is similar to Red Hat's. It provides enterprises with on-premises private deployment of ChatGLM, its self-developed LLM, along with efficient data processing, model training, and deployment services. Zhipu AI provides the LLM files and related toolkits so that users can train their own fine-tuned models and deploy inference services; on top of this, Zhipu AI provides technical support and consulting related to deployment and applications, as well as updates to the base model. With this solution, companies can achieve complete control of their data and run their models securely.
+
+
Figure 2.12 Zhipu AI's Pricing Model for Private Deployment
+
+
+**Provision of cloud hosting services**
+
+Cloud growth has continued to exceed expectations since the development of cloud computing technology. The growing need for flexible and scalable infrastructure is driving IT organizations' cloud spending and increasing cloud penetration worldwide. Against this technological backdrop, users increasingly want to reduce software O&M costs. Cloud hosting services are SaaS offerings that let customers skip on-premises deployment and use software hosted directly on a cloud platform. By subscribing to SaaS services, clients can turn high upfront capital expenditures into small recurring expenditures and relieve much of their O&M pressure. Successful open-source software companies that use this model include Databricks and HashiCorp.
+
+In the field of LLMs, Zhipu AI directly provides standard API products based on ChatGLM so that customers can quickly build their own LLM applications, priced by the number of tokens the model actually processes. The service suits scenarios that demand a high level of knowledge, reasoning ability, and creativity, such as advertising copywriting, novel writing, knowledge-based writing, and code generation. The price is 0.005 yuan per thousand tokens.
+
+Zhipu AI also provides APIs for its super-anthropomorphic LLM (supporting character-based role-playing, extended multi-turn memory, and individualized character dialogue) and its vector (embedding) model (which vectorizes input text so that it can be combined with a vector database to give LLMs an external knowledge base and improve the accuracy of LLM inference).
+
+Hugging Face also offers a cloud-hosted business model. The Hugging Face platform hosts a large number of open-source models and also offers a cloud-based solution, the Hugging Face Inference API, which allows users to easily deploy and run these models in the cloud via an API. This model combines the accessibility of open-source models with the convenience of cloud hosting, allowing users to use them on demand without having to set up and manage a large infrastructure on their own.
+
+
+
+
+**Development of commercial applications based on a foundation model**
+
+Charging for applications built on a base model means that the vendor open-sources its base model for free, develops a series of commercial applications on top of it, and charges for those applications. A typical case is Tongyi Qianwen.
+
+Alibaba Cloud has developed eight applications based on its open-source base model Tongyi Qianwen: Tongyi Tingwu (speech recognition), Tongyi Xiaomi (customer service efficiency), Tongyi Zhiwen (text comprehension), Tongyi Xingchen (personalized roles), Tongyi Lingma (programming assistance), Tongyi Farui (legal industry), Tongyi Renxin (pharmaceutical industry), and Tongyi Dianjin (financial industry). Each of these applications has a corresponding enterprise-level payment model. Some of them, such as Tongyi Tingwu, also include an individual-level payment model. Tongyi Tingwu mainly provides voice-to-text services such as meeting minutes, and its charges are mainly calculated based on the length of the audio.
+
+
+
+
+**Model-as-a-Service business model**
+
+At its core, Model as a Service (MaaS) means treating the model as a key production factor, designing products and technologies around the model life cycle, and offering a wide range of products and services that start from model development, including data processing, feature engineering, model training and tuning, and model serving.
+
+Alibaba Cloud launched the ModelScope community as an advocate of MaaS. To realize MaaS, Alibaba Cloud has prepared in two ways. The first is a model repository that collects models, provides high-quality data, and supports tuning for business scenarios. Model usage is paired with computing resources so that a wide range of developers can quickly try out a model's effects without writing code. The second is abstract interfaces or APIs that let developers do secondary development on a model. For a specific application scenario, developers can easily optimize the model further with few-shot or zero-shot samples, which is what truly allows a model to be applied across different scenarios.
+
+
+
+
+**LLM business models need to be explored and experimented with**
+
+Currently, the business path of open-source LLM companies has not yet been validated by the market, so a large number of companies are actively exploring different business models without sticking to a single pricing strategy. So far, however, no business model has proven able to cover their high development, operating, and maintenance costs, which makes their economic sustainability questionable. This situation reflects, to some extent, the nature of this emerging industry: while technological breakthroughs have been made, the question of how to translate these technologies into economic benefits remains open.
+
+However, it is worth noting that despite such challenges, the rise and development of open-source LLMs still marks the birth of a new industry. This industry has its own unique value and potential, offering unprecedented technical support and innovation possibilities for a wide range of industries. In this process, all participants (including research institutions, enterprises, developers and users) are actively exploring and trying to find a model that can balance technological innovation and economic returns.
+
+This exploration is not an overnight process; it takes time, experimentation, and a deep understanding of market and technology trends. We are likely to see a variety of innovative business models emerge, such as technical support services, cloud hosting, MaaS, etc. as mentioned above. Although the current business models for these open-source LLMs are not yet mature, it is this kind of exploration and experimentation that will drive the entire LLM field forward and ultimately find a business path to sustainable growth with profitable returns.
+
+### 2.3 Making AI developer tools open-source has become an industry consensus at this stage
+
+#### 2.3.1 Developer Tools Play an Important Role in the AI Chain
+
+The developer tools layer is an important link in the chain of AI LLM development. As shown in the figure below, it is the middle layer that connects the layers above and below it:
+
+Toward computational resources, the developer tools layer plays a PaaS-like role. Cloud-based platforms help LLM developers more easily deploy compute and development environments and invoke and allocate resources, allowing them to focus on the logic and functionality of model development and realize their own innovations.
+
+For linking pre-trained models, the development tool layer provides a series of tools to accelerate the development of the model layer, including dataset cleaning and labeling tools.
+
+
Figure 2.16 Location of Developer Tools in the AI LLM Chain
+
+
+For AI application development, the developer tools layer plays an essential role in helping enterprises and individual developers build and deploy their final products. For enterprise developers, developer tools help deploy LLMs in industry and monitor them to keep the enterprise's models running normally. Other related functions include model evaluation, database inference, and supplementing the model's runtime process. For individual developers, developer tools simplify deployment steps and reduce development costs, inspiring the creation of more fine-tuned models for specific functions. One example is AutoTrain by Hugging Face, which allows developers to fine-tune open-source models on private data with just a few mouse clicks. Developer tools also help connect end users with LLM apps and even deploy LLMs on end-user devices.
+
+With the increasing maturity and advancement of development tools, more and more developers are venturing into development related to LLMs. These tools not only improve development efficiency but also lower the barrier to entry, enabling more innovative-thinking talent to participate in the field. From data processing and model training to performance optimization, these tools provide comprehensive support for developers. As a result, we have witnessed the birth of a diverse and active LLM development community with some cutting-edge projects and innovative applications.
+
+
+
+
+LLM development tools are blossoming, covering everything from data preparation and model construction to performance tuning, and they continue to push the frontiers of AI technology. Some tools focus on data annotation and cleaning so that developers can more easily obtain high-quality data; some aim to make fine-tuning more efficient so that the LLM better fits customization needs; others monitor LLMs in operation and give timely feedback to developers and users. These diverse tools promote technological innovation and give developers more choices, together building a vibrant and creative ecosystem for LLM development. There is no shortage of great open-source projects that greatly benefit both users and open-source companies.
+
+
Figure 2.18 Large number of development tools covering different levels of LLM development
+
+
+#### 2.3.2 Open source for developer tools is important
+
+**Supply-side benefits**
+
+Open-sourcing developer tools helps polish and upgrade the product across different scenarios, which contributes to its rapid maturity. One of the main advantages of open-source developer tools is that they get an extensive testing and application environment. Because open-source tools are freely available for use and modification by a variety of users and organizations, they are often applied and tested in diverse real-world scenarios and are thus "battle-tested." This extensive use and feedback helps the product identify and fix potential defects more quickly while facilitating the development of new features and improvements to existing ones. Especially for startups, this is the fastest and most cost-effective way to get product feedback, promote product improvement, and bring more mature commercial products to market quickly.
+
+Developer tools are underlying products with high user stickiness, so open-sourcing them helps capture the market quickly. As mentioned earlier, developer tools contain many indispensable components of the LLM development process. Once developers become accustomed to specific tools, they tend to keep using them because changing tools means relearning and adapting to the new tool's features and usage. Therefore, these products naturally have high user stickiness.
+
+![FW-_aFHXEAMjI09](https://hackmd.io/_uploads/By-9g0d5T.jpg)
+
+
Figure 2.19 High user stickiness of open source development tools
+
+
+The chart shows the net revenue retention rate for major SaaS products, which reflects how well a product retains existing customers, their continued ability to pay, and their loyalty to the product. Developer product stickiness is generally higher than the median, with Snowflake at the top of the list at 174% and HashiCorp, GitLab, and Confluent at over 120%.
+
+As you can see, with such high stickiness, the faster the customer acquisition rate, the higher future revenues will be. When these tools are available as open source, they can be more quickly and widely adopted because open source lowers the barrier to trying and adopting new tools. This rapid market expansion is critical to building brand awareness and a user base.
+
+**Demand-side benefits**
+
+Open-source developer tools reduce the cost for SMEs to enter the LLM market, making it easier for them to focus more on application layer development. For SMEs, entering the market to develop large-scale models and complex systems often requires significant technical investment and financial support. Open-source developer tools lower this barrier because they are usually free or less expensive overall and contain many proven features and components. SMEs can utilize these resources to develop and test their products without creating all the essential elements from scratch. In this way, they can focus more resources and energy on application-level innovations and solutions for specific business needs rather than spend much time and money building the underlying technology. This reduces the cost of entry and speeds up product development, enabling SMEs to compete more effectively with larger firms.
+
+Due to the ecosystem effect of open-source development tools, their technology iterations usually outpace closed-source tools. In such an open-source ecosystem, the latest research results from the lab can be quickly integrated and shared, and such a mechanism ensures the rapid updating and dissemination of technology. Active participation in the open-source community facilitates the rapid exchange of innovative ideas and technologies, making the latest development tools and technological achievements accessible and usable by many developers. The strength of this open-source culture is that it is open and collaborative, providing developers with a quick and easy way to access and utilize state-of-the-art tools. It not only accelerates the development of technology but also offers individual developers or small teams the opportunity to compete with large corporations, promoting the healthy development and innovation of the entire technology sector.
+
+#### 2.3.3 Open-source developer tools need to emphasize ecosystem building
+
+**Making developer tools open source requires technical support to maintain a stable community ecosystem**
+
+Open-source development tools rely on the support and maintenance provided by the community and partners. This is essential to ensure the stability and reliability of the tool. For example, the success of an open-source database management system depends not only on its functionality but also on the community's ability to respond to user-reported problems and provide fixes promptly. At the same time, market feedback from partners and users in the ecosystem is critical to optimizing open-source development tools. If an open-source code analysis tool is widely used in an enterprise environment, the feedback from those enterprise users will directly influence the future direction of the tool. This feedback helps developers understand which features are most popular and which need improvement to tailor the tool to market needs.
+
+**Open source developer tools need to complement the strengths of cloud vendors to expand market reach and user base**
+
+Developer tools are themselves deployed on platforms provided by cloud vendors. The tool vendor's strength lies in its specialization and technical depth, while the cloud vendor's advantage lies in supplying the essential computing platform and a broader user base. When the two collaborate, developer tool vendors can use the cloud vendor's more favorable compute pricing to attract more users while benefiting from the cloud vendor's own sales channels to reach more end customers. This virtuous cycle extends open-source development tools to a broader user base, which increases the tools' visibility and provides more opportunities for practical application and improvement. More users mean more feedback, which promotes continuous optimization and adaptation to changing market needs.
+
+MongoDB, for example, started its cloud transformation early by launching Atlas, a SaaS service. Atlas accounted for only 1% of total revenue when MongoDB went public in 2017, and by then MongoDB had already built all of its systems around the Open Core model, yet it still spent a lot of resources on building SaaS-related products and marketing systems. Since then, Atlas's revenue has grown at a compound annual growth rate of more than 40%. In contrast, its competitor Couchbase relied too heavily on its traditional model and spent a lot of effort on mobile platform support services, a slow-growing market that has dragged the company into a quagmire. SaaS-based product systems are essential for developer tool vendors today, and these vendors must emphasize cooperation with cloud vendors.
+
+
+
+
+**Building an ecosystem helps establish open-source industry standards**
+
+Developer tools, as the underlying tool layer, largely determine the basic architecture of the model development built on top of them. Collaboration with partners such as cloud vendors and open-source model vendors helps to build consensus and establish industry standards, which is critical to ensuring interoperability, compatibility, and a consistent user experience across development tools. Standardization reduces compatibility issues and makes different products and services easier to integrate and use. For example, MongoDB leveraged its community to become the de facto industry standard for NoSQL databases. This active community not only brought high-quality, low-cost licenses to the early commercial versions of MongoDB but also served as the basis for the later Atlas (managed service). Building on collaboration with the open-source community, Milvus launched VectorDBBench, which measures the performance of vector databases on key metrics so that they can be tuned to their full potential. In this way it is gradually establishing an industry benchmark for vector databases and helping users select the vector database that fits their needs.
+
+
+
+
+#### 2.3.4 Exploring the Commercialization Path of Open Source Developer Tools
+
+The commercialization of AI developer tools can draw on the experience of traditional software developer tools, but overall it is still at an early, exploratory stage. Based on research and analysis of open-source developer tools that have attempted commercialization, we found the following commercial paths:
+
+**Cloud Hosting Managed Service - Consumption-Based Pricing**
+
+With the popularity of cloud computing, more and more developer tools serve users by default through resources hosted on the cloud. Such cloud-hosted services not only lower the barrier to use but also deliver the latest and most professional product services directly. Where there are no data, security, or privacy concerns, this is a good commercialization option for open-source AI developer tool projects.
+
+Under the business model of hosting services on the cloud, more and more projects are choosing Consumption-Based Pricing (CBP) with different product offerings; the pricing unit can be computational resources, data volume, number of requests, etc.
+
+AutoTrain by Hugging Face is a platform that automatically selects suitable models and fine-tunes them based on a user-supplied dataset. It has selectable model categories, including text categorization, text regression, entity recognition, summarization, question answering, translation, and tables. AutoTrain provides non-researchers with the ability to train high-performance NLP models and deploy them at scale quickly and efficiently. AutoTrain's pricing rules are not disclosed; rather, an estimated fee is charged before training based on the amount of training data and model variants.
+
+Scale AI focuses on data annotation products with a simple pricing model: Scale Image starts at 2 cents per image and 6 cents per annotation; Scale Video starts at 13 cents per frame and 3 cents per annotation; Scale Text starts at 5 cents per task and 3 cents per annotation; and Scale Document AI starts at 2 cents per task and 7 cents per annotation. In addition, there are enterprise-specific charging options based on the amount of data and services for specific enterprise-level projects.
+
+**Cloud Hosting Managed Service - tiered subscription pricing**
+
+Some development tool layer projects also use Cloud Hosting Managed Services but offer subscription services yearly or monthly.
+
+
+
+
+The subscription business model allows different tiers to balance cost and price according to users' needs and willingness to pay. The company Dify.ai, pictured above, for example, has tiered pricing for different volumes of users: There is a free version for individual users, but given the cost overhead, there are many limitations set; for professional individual developers and small teams, there are fewer limitations for a lower price, but there is still an upper limit on usage; and for medium-sized teams, there is a higher price for a relatively complete service.
+
+However, Cloud Hosting Managed Services, whether priced by consumption or by tiered subscription, can only offer standardized product services, and the data needs to flow to the public cloud. Some large enterprises still need private, customized deployments, which this business model cannot provide.
+
+**Private Cloud / Dedicated cloud / Customized Deployment**
+
+While more and more projects offer services hosted directly on the cloud, cloud-hosted services stop being an option when larger enterprises have stricter privacy and customization requirements.
+
+Under this business model, a project usually offers users several deployment options. The Bring Your Own Cloud (BYOC) model is prevalent in North America, while on-premises deployment is better suited to scenarios that are more sensitive to data compliance.
+
+Open-source projects at the development tool layer often offer several commercial options at once, including all three of the business models above. This reflects the diversity and complexity of customer demand at this layer. In exploring business models, projects are also pursuing several paths in parallel. How this develops is worth sustained, long-term attention.
+
+#### 2.3.5 Successful cases of open source on the developer's tool side
+
+Zilliz is a next-generation data processing and analytics platform for AI that provides the underlying technology for application-oriented enterprises. Zilliz developed Mega, a GPU-accelerated AI data middleware solution, which includes MegaETL for data ETL, MegaWise for database, MegaLearning for model training in the Hadoop ecosystem, and Milvus for feature vector retrieval. These systems can meet the traditional scenarios and needs of accelerated data ETL, accelerated data warehousing, and accelerated data analytics, as well as emerging AI application scenarios.
+
+
Figure 2.23 Zilliz Global Users (from company website)
+
+
+Zilliz's success shows that a GPU-based massive-data accelerator can be an effective solution to organizations' growing data analytics needs. Zilliz's core project, the vector similarity search engine Milvus, is the world's first GPU-accelerated massive feature vector matching and retrieval engine. Relying on GPU acceleration, Milvus provides high-speed feature vector matching and multi-dimensional joint queries (across features, labels, images, videos, text, and speech) and supports automatic database sharding and multiple replicas. It can interface with AI frameworks such as TensorFlow, PyTorch, and MXNet, enabling second-level queries over billions of feature vectors. Milvus was open-sourced on GitHub in October 2019, and its star count has continued to grow quickly, reaching 25k+ in December 2023, with a developer community of over 200 contributors and 4,000+ users. In the capital market, Zilliz received $43 million in Series B funding, the largest single Series B financing for open-source infrastructure software worldwide. This indicates that investment institutions are optimistic about Zilliz's potential for future development.
+
+
+
+
+Zilliz's main product is the vector database, a key developer tool. It is a database system specialized in storing, indexing, and querying embedding vectors. This allows LLMs to store and read knowledge bases more efficiently and fine-tune models at a much lower cost. It will also play an important role in the evolution of AI-native applications.
+
+Zilliz is commercialized as Zilliz Cloud, with a monthly subscription business model. It is deployed in the form of SaaS, and determines the monthly subscription fee based on the number of vectors, vector dimensions, computational unit (CU) type, and average data length. Zilliz also offers a PaaS-based proprietary deployment service for scenarios with a high focus on data privacy and compliance, which is based on customized pricing.
+
+
+
+
+
+### 2.4 Open-source tools for the AI application layer are blooming
+
+#### 2.4.1 Application Layer Open Source Tools Bloom
+
+Application-layer AI is blossoming, presenting a spectacular picture of technological diversity and breadth of application. The influence of application-layer AI keeps expanding. Some applications serve consumers, covering all aspects of daily life, such as entertainment, social networking, music, and personal health assistants; others play an important role in more specialized business fields, such as market analysis, legal work, and intelligent design. These applications demonstrate the depth and breadth of AI technology, which not only improves efficiency and convenience but also greatly promotes innovation and technological advancement.
+
+
Figure 2.26 A wildly diverse array of AI application layer products (source:Sequoia)
+
+
+A large number of open-source application-layer products have also emerged, most of them based on LLMs and fine-tuned with industry-specific datasets. Application-layer tools customized for an industry offer better performance than generic LLMs, and their open-source nature helps business and consumer users customize these applications further to better fit their needs.
+
+Open-source tools at the application layer facilitate integration across disciplines and industries. For example, industries such as medicine, finance, education, and retail are utilizing open-source AI tools to solve industry-specific problems, driving the adoption of the technology across all sectors. Open-source tools encourage experimentation and innovation due to low cost and low risk. Developers are free to experiment with new ideas and technologies, and this spirit of experimentation has greatly contributed to the application layer boom.
+
+
Figure 2.27 Mapping of open-source tools for application testing (with examples of selected products in each domain)
+
+
+#### 2.4.2 Drivers of open source at the application layer
+
+**Open-source application layer products have a low threshold for use and are more easily accepted by users**
+
+Application-layer open-source tools cost less, which fits domestic enterprises' low willingness to pay. According to iResearch, domestic enterprises' internal management processes are not sufficiently professionalized, they place a low value on software, and they are more willing to pay for manpower. Vendors need to educate companies indirectly, give them a buffer period for accepting the product, and gradually unlock demand. Against this background, open-source tools meet the needs of these markets with their low cost, making organizations more willing to try and adopt them. For domestic companies with limited budgets, low cost is a significant advantage. Low- or no-cost tools allow these organizations to access and use advanced technology without additional financial burden.
+
+At the same time, the low cost of open-source tools encourages companies to make long-term investments. Firms can build and expand their technology infrastructure over time without taking on significant financial risk. As an enterprise's understanding of and dependence on an open-source product deepen, the product can gradually introduce value-added services and thereby acquire customers over the long term.
+
+Open-source products also make it easier to integrate seamlessly with other systems, which enhances the user experience. A distinguishing feature of open-source application-layer products is that they are often highly flexible and customizable, allowing users to modify and adapt them to their specific needs. This means that open-source products can be customized to better fit existing systems and workflows. Many open-source projects follow industry standards, which helps ensure compatibility between different systems and components. Standardization promotes interoperability between different software products and simplifies the integration process, thereby improving the overall user experience. Open-source communities are typically made up of developers and users from around the world who work together to improve products and provide support. This collaborative spirit not only fosters continuous improvement of the product but also provides a resource for solving problems that may be encountered during integration.
+
+**Open-source application layer products can receive contributions from the community to facilitate technology iteration and broaden the application scenarios**
+
+Application-layer open-source projects can draw strong support from community developers. Because application scenarios are diverse and fragmented, the needs of different sub-scenarios vary widely, and each scenario demands specialized expertise from contributors. Stable Diffusion (SD) is an open-source text-to-image application that, with the power of the community, has been rapidly catching up in performance since its release and in some ways surpasses the closed-source text-to-image application Midjourney. While Stable Diffusion has some inconveniences in use, users have access to hundreds of LoRAs, fine-tuning settings, and text embeddings from the community. For example, users of Stable Diffusion found it limited in its ability to render hands. The community reacted quickly, and within a few weeks a LoRA fix was developed specifically for the hand image issue. This timely and professional feedback from the community greatly contributes to the rapid advancement and improvement of application-layer open-source tools.
+
+Because of their lower barrier to use, open-source products may be adopted by users from different industries and backgrounds in a variety of environments and contexts as soon as they are released. These application scenarios may go far beyond the developer's initial design and imagination. When products are used in such diverse scenarios, they may reveal new potential or needs and surface previously unnoticed use cases. This gives product developers valuable insight into how their products perform in real-world use and where there is room for improvement. Faced with these newly discovered use cases, developers have the opportunity to innovate and improve. They can add new features, optimize existing ones, or redesign products to better meet these needs based on actual user experience in different environments. Iteration based on real-world use cases is a key driver of the continued progress of open-source products.
+
+**Application layer open-source products have Product-Led Growth (PLG) model features that can drive paid conversions**
+
+The PLG model focuses on customer acquisition through a bottom-up sales model, where the product is at the center of the entire sales process. The PLG model's growth flywheel has three main phases: acquisition, conversion, and retention. In all three phases, open-source has advantages that distinguish it from traditional business models.
+
+In the customer acquisition phase, the open-source operating model reduces the cost of customer acquisition and makes the process more targeted. The interactions between developers and the community-based collaboration enabled by platforms such as GitHub accelerate the spread of a product. The initial customers of open-source products are usually participants in the open-source community, often developers or IT staff in an organization. By nurturing these high-quality prospects, a company also builds a broad user base. Communities help open up the boundaries of the enterprise and make word-of-mouth spread of good open-source projects and products possible. Users spontaneously download and use the software to solve their own problems and pain points. At this point, an open-source product is not just a way to solve user problems through functionality; it is also a vehicle that helps the organization spread and grow. In the long run, this reduces the organization's customer acquisition costs, makes acquisition increasingly automated, and lowers sales expenses.
+
+At the conversion stage, open-source software tends to have a higher paid conversion rate than traditional commercial software. On the one hand, once a user has tried the free version and found that it meets their needs well, they can be converted to a paying customer within a shorter cycle and retained as a long-term user. On the other hand, companies can conduct targeted follow-up and up-selling by observing how users behave with the free version, for example by providing the sales team with a list of customers who have exceeded their usage limits and are ready to pay. In addition to traditional sales conversions, conversions can also happen through self-service purchasing paths, which greatly reduces the cost of sales.
+
+In the retention phase, open-source software allows users to avoid the risk of vendor lock-in, making them willing to commit to long-term use. Multiple downstream vendors may offer software with similar functionality based on the same open-source project, and switching vendors costs relatively little, so users can be confident in their choice of software for the long term. By contrast, a customer who uses a closed-source product and later wants to switch to other software must redeploy hardware, migrate data, and so on, which results in a significant switching cost. Users who choose closed-source software may therefore end up stuck with a product whose later development no longer meets their needs, or abandon it only at a high switching cost.
+
+
+
+
+#### 2.4.3 Market Status of LLM Application Layer Open Source
+
+**Internet giants and startups working together**
+
+There are opportunities for both Internet giants and startups to participate and compete in the LLM application layer open-source market. This is due to several factors: 1) A lower technology barrier. The open-sourcing of the model layer and the developer tools layer lowers the threshold for acquiring and applying the technology. Instead of having to develop complex LLM algorithms from scratch, startups can use open-source models and tools to build solutions that meet specific needs. 2) Cost-effectiveness. Open-source models often do not require costly licenses or API fees, which is especially beneficial for SMEs with relatively limited capital. 3) Innovation and flexibility. Startups are often able to adapt more quickly to market changes and innovate for specific market segments or application scenarios.
+
+At present, the Internet giants mainly build on their own LLMs and extend them into a series of vertical applications. For example, Alibaba recently released Tongyi Qianwen 2.0 and derived eight applications from it: Tongyi Tingwu (speech recognition), Tongyi Xiaomi (improving customer service efficiency), Tongyi Zhiwen (understanding text), Tongyi Xingchen (personalized roles), Tongyi Lingma (assisted programming), Tongyi Farui (legal industry), Tongyi Renxin (pharmaceutical industry), and Tongyi Dianjin (financial industry).
+
+Startups mainly pick a niche industry and go deep. Examples include Langboat Technology's self-developed LLM, which focuses on marketing, finance, cultural creativity, and other scenarios; XrayGPT, which focuses on medical radiology image analysis; and Finchat, which focuses on models for the financial field. Yunqi Partners has supported two open-source application layer startups this year: TabbyML, a programming assistant, and RealChar, an AI personal assistant that allows real-time customization. Both have quickly amassed a large number of users on GitHub.
+
+**The competitive landscapes of the business and consumer markets differ**
+
+The competitive landscape of the LLM application layer open-source market differs significantly between the business side and the consumer side:
+
+- **To-Business Markets**: Enterprise-oriented applications typically focus on improving efficiency, reducing costs, and enhancing decision-making capabilities. In this area, open-source LLMs can be used for process automation, data analytics, customer service optimization, and more. Competition here centers on the practicality of the technology and the ability to customize it.
+- **To-Consumer Markets**: Consumer-oriented applications focus more on user experience, interactivity, and ease of use. They include personalized recommendations, virtual assistants, entertainment and social media apps, and more. Competition in the consumer market is more about innovative user interfaces and new features that appeal to users.
+
+**Many sub-scenarios remain blue ocean markets with no clear leader**
+
+As technology evolves, market demand for AI applications becomes more segmented. Industries such as healthcare, law, finance, and education each have their own unique needs and challenges. These market segments offer a great deal of opportunity, but they also require targeted solutions. Relevant applications are emerging in each of these areas, but most are at the start-up stage and none has yet become a flagship application. Because the industry has so many segments, competition within each one is limited, which makes this a good time to enter. In these blue ocean markets, no clear market leader has formed yet because the market is new and constantly evolving. This gives new entrants and innovators the opportunity to capture market share through unique solutions or innovative business models.
+
+**Expect innovative applications to emerge based on the new capabilities of LLMs**
+
+Although significant progress has been made in LLM technology, its deep integration and innovative application in specific fields is still in its infancy. This means that there is plenty of room to explore and implement new ways of applying it in many sub-scenarios. With the rapid development of large-scale AI models, we are entering a new era of potential and innovation. These models will not only optimize and improve existing technology applications; more importantly, they will open up completely new markets and application areas. We can look forward to a wide variety of powerful new applications that will be integrated into our daily lives in unprecedented ways. These emerging markets and applications will open up new possibilities for far-reaching social and cultural change. They will stimulate human creativity and imagination, pushing us to break through existing technological boundaries and explore a wider world.
+
+In this dynamic and innovative era, we will witness the seamless integration of technology into our daily lives and experience the convenience and efficiency that comes with intelligence. The synergy between humans and machines will open up new modes of cooperation and innovation, leading us to a smarter, more efficient, and more personalized future. It's a time of great anticipation, with every step of technological advancement building a more exciting, rich and diverse world for us. In this new era, we will witness and create unprecedented miracles together, and explore together the infinite possibilities of the common development of science and technology and mankind.
+
+### 2.5 The commercialization of open-source LLMs is encountering difficulties
+
+#### 2.5.1 Technology is evolving at a rapid pace and open-source projects need to be continuously iterated to remain competitive
+
+In the field of artificial intelligence and LLMs, technology is evolving at an extremely fast pace, with new algorithms, data processing techniques, optimization methods, and computational architectures continuing to emerge. For open-source projects, this means that constant updates and upgrades are needed to keep the technology current and effective. This need for continuous updating is a challenge in terms of resources and time. Open-source projects, especially those with relatively limited financial and human resources, can struggle to keep up with this rapid pace of iteration. Not only do they have to race against the clock, they also face stiff competition from commercial companies and other open-source projects. If a project is not kept up to date with the latest technological advances, it can quickly become obsolete and lose the interest and support of users and community members.
+
+Facing well-funded tech giants such as OpenAI and Alibaba, small and medium-sized companies may see the LLMs they have spent heavily to develop quickly surpassed, leaving them with a serious funding gap. Large vendors can sustain a cash-burning strategy that small and medium-sized companies cannot afford, which could dampen the currently flourishing LLM market and reduce its diversity.
+
+#### 2.5.2 Difficulty in defining the scope of plagiarism / inspiration
+
+The original intention of open-source LLMs was to allow more users to access and use LLMs, but in practice disputes often arise over code attribution, licenses, and other issues. Because open-sourcing LLMs is a relatively new practice, the relevant legal and regulatory framework is incomplete, and many cases also involve cross-border issues, there is no clear boundary between plagiarizing and borrowing from an LLM. The recent "shell controversy" over Zero One Everything (01.AI) and LLaMA attracted a great deal of attention. Leaving aside any final judgment, the heart of the disagreement is the difficulty of defining the scope of plagiarism versus borrowing.
+
+Some argue that Zero One Everything's software uses Llama's source code without attribution, making it appear that the team developed that part themselves, and that this may violate the right of attribution, i.e., may constitute plagiarism. Others hold that the structural design of the Zero One Everything LLM is based on a mature architecture that draws on publicly available, top-tier industry results. Since LLM technology is still at a very early stage, keeping the architecture consistent with the industry mainstream is more conducive to overall adaptation and future iteration. Meanwhile, the Zero One Everything team has done a lot of work on understanding models and training, and continues to explore fundamental breakthroughs at the level of model architecture.
+
+Making this determination becomes even more complex while LLM technologies are still in their infancy and laws and regulations are incomplete. As technology evolves and the legal system improves, balancing the protection of innovation with the promotion of cooperation will require continuous exploration and refinement. Ultimately, this is not only a legal and technical issue, but also an ethical one that concerns the healthy development of the entire industry.
+
+#### 2.5.3 Difficulty for community participants to provide direct contributions to model iterations
+
+In the process of building and iterating large-scale AI models, participants in the eco-community face a notable challenge: due to the complexity of model training, it is often difficult for them to contribute directly to the development of the models. These LLMs, such as Llama or other advanced machine learning models, typically require highly specialized technical knowledge and resources, including large-scale data-processing capabilities, deep algorithmic understanding, and expensive hardware resources. For ordinary community members, these demands are often beyond their means.
+
+As a result, while community members may be enthusiastic and willing to participate, they are limited in their ability to substantially iterate on the model. This lack of expertise means that even the most active community members may only be able to play a role in relatively peripheral areas such as model application, feedback collection, or elementary debugging. This limitation not only affects the extent to which the community contributes to the development of the model, but may also lead to a weakened sense of community involvement and belonging during the model development process. Finding appropriate ways to enable a wider range of community participants to contribute their wisdom and efforts effectively is therefore an important topic in the development of the LLM.
+
+#### 2.5.4 Rapid development of open source technology and high cost of late updates
+
+One of the main advantages of open-source software is that it reduces the initial cost to the user. Enterprises can acquire and use open-source LLMs without paying expensive license fees. This is especially attractive to small businesses or startups with limited budgets, as they can use advanced technology without a significant financial burden. However, while open-source software saves money in the initial phase, it can come with higher update costs over the long run.
+
+Open-source projects are often known for their speed of innovation and community-driven dynamism, which drive technology to progress and evolve. However, as technology rapidly updates and iterates, the cost of maintaining and upgrading existing systems increases. Such costs include not only direct financial inputs, such as hardware upgrades or the purchase of new services, but also indirect costs, such as training staff to adapt to new technologies and the time and labor involved in migrating existing systems to newer versions. Keeping up with the latest open-source technologies is particularly challenging for long-term projects. Every major update or technology transition can involve complex adaptation efforts and compatibility testing that require significant human and technical resources. In addition, frequent updates may lead to system stability and security issues, increasing potential operational risks.
+
+Therefore, while open-source technologies offer great advantages in terms of innovation and flexibility, organizations and developers must carefully consider the update costs associated with adopting and maintaining these technologies, and how to find a balance between continuous innovation and cost-effectiveness.
+
+Although open-source LLMs currently face numerous challenges, such as rapid technology iteration, the risk of plagiarism, the limitations of community contributions, and rising maintenance costs, their future remains promising. Open-source LLMs have shown great potential to drive technological innovation, facilitate knowledge sharing, and accelerate R&D processes. Realizing this potential and overcoming the current challenges will require a concerted effort by parties from different fields and backgrounds.
+
+
+
+
+## 3. Open source security challenges
+
+Security is an important factor in determining whether an open-source product can be successfully commercialized. Business users usually need to conduct a comprehensive security assessment of the products they use to ensure that the overall business is secure and controllable, which includes cyber-attack security, data security, and commercial license controllability.
+
+According to Synopsys, by the end of 2022, 84% of codebases contained at least one known open-source vulnerability and 48% contained high-risk vulnerabilities. In addition, 34% of survey respondents said they had experienced an attack exploiting a known vulnerability in open-source software in the past 12 months. Open-source security is an issue that requires a great deal of attention, and it greatly affects customer trust in open-source software, as well as whether the broader open-source ecosystem can remain stable in the future. Only by ensuring security can open-source software go further on the road to commercialization.
+
+
Figure 3.1 Open Source Codebase Vulnerabilities (Data Source: Synopsys)
+
+
+### 3.1 Open source software cybersecurity
+
+#### 3.1.1 Open source software security vulnerabilities can be exploited with serious consequences
+
+Open-source software plays a key role in driving technological innovation and facilitating knowledge sharing, but it is also inherently at risk of security vulnerabilities. The root causes of these vulnerabilities usually lie in how open-source code is managed and maintained: programming errors, a lack of continuous security reviews, and delays in applying updates and patches. Particularly in projects that are not active enough or lack effective oversight, vulnerabilities may go unrecognized or unfixed for long periods of time. Historically, several serious security incidents have been caused by vulnerabilities in open-source software, resulting in sensitive data breaches and financial losses.
+
+In April 2014, a major security vulnerability known as Heartbleed was disclosed in the widely used open-source component OpenSSL. The vulnerability had existed since the OpenSSL 1.0.1 release in March 2012 and allowed an attacker to obtain data containing certificate private keys, usernames, passwords, email addresses, and other sensitive information. Because it went undetected for about two years, its impact was extremely widespread and almost impossible to measure accurately. Then, in December 2021, another widely used open-source component, Apache Log4j2, was found to have a serious remote code execution vulnerability called Log4Shell. Because Apache Log4j2 is widely adopted for its high performance and the vulnerability is easy to exploit, the impact spread quickly around the world, affecting a number of well-known companies and service platforms, including Steam, Twitter, Amazon, and others.
+
+#### 3.1.2 The relative prevalence of open source software cybersecurity issues
+
+**Open source software is inherently more vulnerable**
+
+According to the 2022 [QiAnXin (QAX)](https://en.qianxin.com/) Open-source Project Inspection Program, the overall defect density of open-source software was 21.06 defects per thousand lines of code, and the density of high-risk defects was 1.29 per thousand lines. Both the overall and the high-risk defect density have risen for three consecutive years, at an accelerating pace. The overall detection rate of the ten categories of typical defects in open-source software was 72.3%, compared with only 56.3% two years earlier. This rapid rise in the detection rate suggests that the security problems of the software itself are quite serious.
+
+
+
+
Figure 3.2 Three-Year Comparison of Average Defect Density of Open Source Software
+
(Source: 2023 China Software Supply Chain Security Analysis Report)
+
+
+
+In terms of the absolute number of open-source software flaws and vulnerabilities, according to data from [QiAnXin (QAX)](https://en.qianxin.com/), 57,610 vulnerabilities related to open-source software had been included in the public vulnerability database by the end of 2022. Of these, 7,682 were added in 2022, an increase of about 15% over the previous total, which is a worrying trend.
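
The "about 15%" figure can be checked with a few lines of arithmetic. The sketch below assumes that the percentage is measured against the cumulative total at the end of 2021 (the 2022 total minus the 2022 additions); that baseline is an interpretation, not something the QAX data states explicitly.

```python
# Figures quoted in the text (QAX data, end of 2022).
total_end_2022 = 57_610  # open-source vulnerabilities in the public database
new_in_2022 = 7_682      # vulnerabilities added during 2022

# Assumption: the growth rate is relative to the total at the end of 2021.
total_end_2021 = total_end_2022 - new_in_2022
growth_rate = new_in_2022 / total_end_2021

print(f"Total at end of 2021: {total_end_2021:,}")
print(f"Year-over-year growth: {growth_rate:.1%}")
```

Under that assumption the growth rate comes out at roughly 15.4%, which is consistent with the rounded figure in the text.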
+
+:::info Expert Review
+**Yu Jie**: The security of open-source software urgently needs sufficient attention, and the strength of individual communities alone is clearly not enough to deal with it. How to build effective systems and mechanisms that comprehensively protect the security of open-source software has become a major issue that cannot be avoided as open source develops rapidly.
+:::
+
+**Open-source projects with too low or too high levels of activity are more likely to have security risks**
+
+Open-source software that is too inactive and updated too infrequently will result in vulnerabilities not being fixed in a timely manner, thus increasing the risk exposure of the software; if it is too active and updated too quickly, it will also result in users not being able to update accordingly in a timely manner, which puts more pressure on security operations and maintenance.
+
+According to QAX data, if open-source projects that have not been updated for more than a year are regarded as inactive, there were 3,967,204 inactive projects in the mainstream open-source package ecosystems in 2022, or 72.1% of the total. The ratio was 69.9% in 2021 and 61.6% in 2020. The trend indicates that open-source authors' overall motivation to maintain their software has declined, which is unfavorable to the long-term security of the open-source software ecosystem.
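
The inactivity rule above (no update for more than a year) is simple enough to express in code. The following is a minimal sketch of that classification; the project names and dates are hypothetical, and the 365-day threshold is an assumed reading of "more than a year", not QAX's actual methodology.

```python
from datetime import date, timedelta

# Assumed reading of "not updated for more than a year".
INACTIVITY_THRESHOLD = timedelta(days=365)


def is_inactive(last_update: date, as_of: date) -> bool:
    """Return True if the project has gone more than a year without an update."""
    return as_of - last_update > INACTIVITY_THRESHOLD


def inactive_share(last_updates: dict[str, date], as_of: date) -> float:
    """Fraction of projects classified as inactive on the given date."""
    if not last_updates:
        return 0.0
    inactive = sum(is_inactive(d, as_of) for d in last_updates.values())
    return inactive / len(last_updates)


# Hypothetical projects and last-update dates, for illustration only.
sample = {
    "pkg-a": date(2022, 11, 1),
    "pkg-b": date(2021, 6, 15),
    "pkg-c": date(2020, 1, 10),
    "pkg-d": date(2022, 1, 5),
}
share = inactive_share(sample, as_of=date(2022, 12, 31))
print(f"Inactive share: {share:.1%}")
```

Applied to the reported figures, 3,967,204 inactive projects at 72.1% implies roughly 5.5 million projects in scope.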
+
+
+
Figure 3.3 Statistics of Inactive Open Source Projects
+
+
+Against the backdrop of generally low activity, some open-source projects are overly active, which also puts considerable security O&M pressure on users. According to QAX, there were 22,403 open-source projects with more than 100 versions in the mainstream open-source package ecosystems in 2022, compared with 19,265 in 2021 and 13,411 in 2020.
+
+
+
+
+
+Too little or too much activity poses a high security risk to users of the open-source ecosystem, and a balance is urgently needed to ensure the healthy and sustainable development of open-source software. A more scientific version management and release mechanism is needed to ensure that updates respond to security and functionality needs in a timely manner without disturbing users too frequently. For projects with insufficient activity, their activity can be enhanced by increasing community participation and providing incentives. For projects with frequent updates, more attention should be paid to communicating with users, providing clear update logs and support guidelines to help users better understand and adapt to these changes.
+
+At the same time, users should also be encouraged to actively participate in the feedback and contribution of the open-source project to form a positive interaction. Users' actual experience and feedback are important references for adjusting the update pace and optimizing software functions. By establishing a healthy user-developer interaction mechanism, we can effectively balance the activity and update frequency to ensure the safety and usability of the software.
+
+**Some users run outdated software, and version usage is disorganized**
+
+According to QAX, many software projects use very outdated versions of open-source software, some released nearly 30 years ago, with many vulnerabilities and very high risk exposure. One of the oldest is IJG JPEG 6, released in 1995, which is still used by many projects. Older versions often come with older vulnerabilities, and some software projects still contain very old open-source vulnerabilities. The oldest dates from 2002, 21 years ago, and is still present in 11 projects.
+
+
+
Figure 3.5 Aged Open Source Vulnerabilities and Their Usage
+
+
+Version usage of open-source software is disorganized, and many deployments are not up to date. For example, 181 different versions of Spring Framework are in use. Running earlier versions means that many vulnerabilities already fixed in newer versions can still be exploited maliciously, which poses a significant security risk.
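
One way an organization might measure this kind of version sprawl internally is to count the distinct versions of each component across its projects. The sketch below assumes a dependency inventory is already available as (project, component, version) records; the project names and the `example-json-lib` component are hypothetical.

```python
from collections import defaultdict


def versions_in_use(records):
    """Map each component to the sorted list of distinct versions found.

    `records` is an iterable of (project, component, version) tuples.
    Versions are sorted as plain strings, not by semantic-version order.
    """
    found = defaultdict(set)
    for _project, component, version in records:
        found[component].add(version)
    return {component: sorted(versions) for component, versions in found.items()}


# Hypothetical inventory, for illustration only.
records = [
    ("billing-service", "spring-framework", "5.3.20"),
    ("portal", "spring-framework", "4.3.9"),
    ("reporting", "spring-framework", "5.3.20"),
    ("portal", "example-json-lib", "2.1.0"),
]
inventory = versions_in_use(records)
for component, versions in inventory.items():
    print(f"{component}: {len(versions)} version(s) in use -> {versions}")
```

A component with many distinct versions in such a report is a natural candidate for consolidation onto a single supported release.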
+
+#### 3.1.3 Strategies for dealing with vulnerability risks in open source software
+
+**Regular security audits and code checks**
+
+A clear audit process needs to be defined that includes a comprehensive review of the overall architecture, codebase, and dependencies of the software. These audits can be performed by assembling specialized security teams or utilizing third-party security services. These teams or service providers should have an in-depth understanding of open-source software.
+
+Regular code review meetings should also be held to encourage team members to review each other's code, which not only helps identify potential security issues, but also improves the team's programming skills and code quality. Audits and code reviews should be a continuous process, with the code base constantly monitored and updated in response to newly discovered vulnerabilities and security threats.
+
+**Using Software Composition Analysis (SCA) tools**
+
+Software Composition Analysis (SCA) is a methodology for managing the security of open-source components, enabling development teams to quickly track and analyze the open-source components used in their projects. SCA tools identify all relevant components and supporting libraries, as well as the direct and indirect dependencies between them. In addition, they can check software licenses, identify deprecated dependencies, and discover potential vulnerabilities and threats. An SCA scan produces a software bill of materials (SBOM) that contains a complete list of the project's software assets.
+
+With the widespread use of open-source components in software development, SCA is emerging as a key component of application security, although the concept itself is not new. The number of SCA tools has grown with its importance. In modern software development practices, including DevSecOps, SCA not only needs to provide ease of use for developers, but also needs to guide and direct developers safely throughout the software development lifecycle (SDLC).
+
+When using SCA for open-source security, the following points should be considered:
+
+- Adopt developer-friendly SCA tools: Developers are often busy writing and optimizing code, and they need tools that promote efficient thinking and rapid iteration. Unfriendly SCA tools can slow down the development process. An easy-to-use SCA tool simplifies setup and operation. Such tools should integrate easily with existing development workflows and tools, and should be implemented early in the software development life cycle (SDLC). It is important that developers understand the importance of SCA and incorporate its security checking process into their daily work to minimize code rewrites due to security issues.
+- Integrate SCA into the CI/CD process: Using SCA tools does not mean that they will interfere with the development, testing, and production processes. Instead, organizations should integrate SCA scanning into Continuous Integration/Continuous Deployment (CI/CD) processes so that vulnerabilities can be identified and remediated as a functional part of the software development and build process. This approach also helps developers make code security part of their daily workflow.
+- Effective Use of Reports and Software Bills of Materials: Many organizations, including the U.S. Federal Government, require a software bill of materials (SBOM) when purchasing software. Providing a detailed SBOM means that organizations recognize the importance of keeping track of every component within an application. Clear security scanning and remediation reports are also critical, as they provide detailed information about an organization's security practices and the number of vulnerabilities remediated, demonstrating a commitment to and actual action on software security.
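
To make the SCA workflow described above more concrete, the following is a toy sketch of its core steps: resolving direct and indirect dependencies, listing them as a minimal SBOM, and matching the list against known advisories. All component names, versions, and the advisory identifier are hypothetical, and real SCA tools rely on package manifests and vulnerability databases rather than hand-written dictionaries.

```python
from collections import deque


def resolve_dependencies(direct, graph):
    """Collect direct and transitive dependencies with a breadth-first walk."""
    seen = set()
    queue = deque(direct)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        queue.extend(graph.get(name, []))
    return seen


def build_sbom(direct, graph, versions):
    """Return a sorted list of (component, version) pairs: a toy SBOM."""
    return sorted((name, versions[name]) for name in resolve_dependencies(direct, graph))


def find_vulnerable(sbom, advisories):
    """Return SBOM entries that match a known advisory."""
    return [
        (name, version, advisories[(name, version)])
        for name, version in sbom
        if (name, version) in advisories
    ]


# Hypothetical dependency graph, versions, and advisory, for illustration only.
graph = {
    "web-framework": ["json-parser", "logging-lib"],
    "logging-lib": ["string-utils"],
}
versions = {
    "web-framework": "3.0.1",
    "json-parser": "1.4.0",
    "logging-lib": "1.2.0",
    "string-utils": "0.9.2",
}
advisories = {("logging-lib", "1.2.0"): "EXAMPLE-ADVISORY-0001"}

sbom = build_sbom(["web-framework"], graph, versions)
findings = find_vulnerable(sbom, advisories)
print(f"{len(sbom)} components in SBOM, {len(findings)} with known advisories")
```

Here the affected component is only an indirect dependency, which illustrates why tracking transitive dependencies matters. Running such a check on every build is one way to realize the CI/CD integration described above.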
+
+**Enhancing education and training**
+
+Conduct regular security awareness training for developers to increase their knowledge of security threats and best security practices, including educating them on identifying common security vulnerabilities and attack tactics. Use hands-on simulation exercises and workshops to allow developers to learn how to handle security incidents in a secure environment. These exercises can include vulnerability mining, code remediation, and security testing.
+
+Given the rapid changes in the security landscape, encourage developers to continuously learn and update their knowledge, including by participating in online courses, seminars and industry conferences. Create a platform, such as an internal forum or regular meetings, for developers to share their knowledge and experience in security to foster learning and collaboration among teams.
+
+
+### 3.2 Controllable open source licenses
+
+#### 3.2.1 Open source licenses are a constraint on users of open source resources, with a wide range of categories
+
+An open-source license is a constraint on users of open-source resources (including, but not limited to, software, code, and web content). Under the license, the user obtains the right to use, modify, and share the open-source resources. If software carries no license, the copyright holder retains all rights, and others can only view the source code, not use it. An open-source license is therefore essentially a legal permit that protects both project contributors and users of open-source resources. It ensures that contributors can open-source the resources they own in the way they want to, and that users can use those resources in a reasonable and legal way without being caught in intellectual property disputes, which greatly contributes to the prosperity of the open-source community.
+
+Open-source licenses are divided into three overall categories based on how restrictive the license is: Permissive, Weak Copyleft, and Strong Copyleft.
+
+
+
+
+**The Permissive category** is the most flexible category of licenses, including BSD, MIT, Apache, ISC, etc., which provide extremely permissive licensing conditions that allow people to freely use, modify, copy, and distribute the software. They equally support the use of software for commercial or non-commercial purposes. The only requirement is that the appropriate license text and copyright information be included in each copy of the software.
+
+**The Weak Copyleft category** is a more restrictive license than the Permissive category, including LGPL, MPL, etc., which requires that any changes made to the code be released under the same license. Also, the modified code must contain the license and copyright information of the original code. However, they do not mandate that the entire project be released under the same license.
+
+**The Strong Copyleft category** is an even more restrictive type of license, including GPL, AGPL, CPL, etc. This type of license states that the entire project must be released under the same license, including those cases where only a portion of the software is used. In addition, these licenses require that all modified versions of the code be publicly released.
+
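The three categories above can be sketched as a simple lookup table. This is a minimal illustration that uses only the license families named in the text; the keys are informal family names rather than exact SPDX identifiers, and the function names `categorize` and `requires_whole_project_release` are hypothetical.

```python
# Illustrative lookup built only from the examples named in the text above.
# Keys are informal license family names, not exact SPDX identifiers.
LICENSE_CATEGORIES = {
    "BSD": "permissive",
    "MIT": "permissive",
    "APACHE": "permissive",
    "ISC": "permissive",
    "LGPL": "weak-copyleft",
    "MPL": "weak-copyleft",
    "GPL": "strong-copyleft",
    "AGPL": "strong-copyleft",
    "CPL": "strong-copyleft",
}


def categorize(license_name: str) -> str:
    """Return the category of a license family, or 'unknown' if it is not listed."""
    return LICENSE_CATEGORIES.get(license_name.strip().upper(), "unknown")


def requires_whole_project_release(license_name: str) -> bool:
    """True when the text above says the entire project must share the license."""
    return categorize(license_name) == "strong-copyleft"


if __name__ == "__main__":
    for name in ("MIT", "LGPL", "AGPL", "Unlicense"):
        print(name, "->", categorize(name))
```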
+Under these broad categories, specific licenses and license families will have unique restrictions, permissions, and specific differences in additional parameters, and the overall logical relationship of licenses is organized as follows:
+
+
+
+
+Kaiyuanshe provides an open-source license filter that helps readers find a suitable license quickly. It is highly recommended for readers who need it: https://kaiyuanshe.cn/tool/license-filter
+
+#### 3.2.2 Risk of infringement by using open source resources without complying with the license
+
+**Open source license infringement**
+
+"Open-source license infringement" is the use of open-source software without complying with the terms and conditions of the open-source license associated with the software, thereby violating the legal constraints imposed by the license. Such behavior can lead to a host of legal and ethical problems. While open-source software is freely available to the public for use and modification, such use and modification is still subject to certain limitations, which are specified by the corresponding open-source license.
+
+Specific instances include, but are not limited to, the following:
+
+Ignoring Copyright Notices and Attribution: Many open-source licenses require that original copyright notices and author attributions be retained when copying, distributing, or modifying software. Ignoring this requirement, such as removing the original author's copyright information or failing to properly attribute the work, is considered an infringement.
+
+Non-availability of Source Code: Some licenses, such as the GPL (General Public License), require that the source code be made available along with the distribution of the software. Distributing software based on such a license without making the source code available at the same time also constitutes infringement.
+
+Restrictive Use: Some licenses restrict the scenarios in which the software can be used. For example, certain licenses may prohibit the use of the software in certain types of business activities. Violating these restrictions also constitutes infringement.
+
+Violating Conditions for Distribution and Re-licensing: Copyleft open-source licenses such as the GPL require that any modifications and derivative works based on GPL-licensed software also be released under the GPL. Violations of this rule, such as privatizing GPL code or distributing derivative works under non-GPL licenses, constitute copyright infringement.
+
+Violation of Specific Terms: In addition to the common scenarios described above, there are specific license terms that may be violated under certain circumstances. This depends on the requirements of the particular license.
+
+**License Reciprocity Requirement Leads to Expanded Scope of Open Source Copyright Problems**
+
+The "reciprocity requirement" of an open-source license concerns whether a derivative work must follow the license of the original work. The terms and conditions of an open-source license tend to keep applying as the software is copied, modified, operated, redistributed, and displayed. The permissions and limitations of such licenses can extend vertically to derivative works and modified versions built on the original software, and even horizontally to other parts of a program developed on top of such open-source software.
+
+Of the many open-source licenses, the GPL has the strongest reciprocity requirements and the most lawsuits associated with it. The main reason is that any derivative software based on modified GPL code must itself be open source. If a piece of software contains GPL code, even if only a portion, the software as a whole is usually required to be open source (unless it meets the terms of a specific exception). Failing to open-source the portions of proprietary software affected by the GPL may put the user in breach of the license's obligations. Moreover, the GPL is extremely complex, containing 17 terms. It imposes stringent requirements on users; once these are violated, the user's license is terminated, and continued use of the GPL-licensed software may constitute copyright infringement.
+
+
+
+
+**Infringement of open source licenses may lead to serious consequences**
+
+Once a use of open-source software is found to infringe its license, the loss to the defendant company or individual goes far beyond compensation payments and extends to issues such as reputation and partnerships:
+
+Lawsuits and Fines: In 2017, Versata Software sued Ameriprise Financial for violating Versata's patents. While this is not a pure case of open-source license infringement, it involves software licensing and copyright issues. The case eventually ended in a settlement, but the legal fees and time costs involved were considerable.
+
+Enforcing Compliance with License Requirements: A famous case is the 2015 VMware vs. Hellwig case. Hellwig, a Linux kernel developer, accused VMware of using GPL-based Linux code in its ESXi products without following the open-source requirements of the GPL license. Although the court did not ultimately rule in Hellwig's favor, the case sparked a broader discussion about GPL license obligations and derivative works.
+
+Reputational Damage: Red Hat filed a lawsuit against Speakeasy, Inc. in 2004 for allegedly failing to comply with the requirements of the GPL license. Although the case was settled, Speakeasy's reputation suffered, especially in the open-source community.
+
+Business Impact: Cisco was sued by the Free Software Foundation (FSF) in 2008 for violating the GPL license in its Linksys products. Cisco ultimately agreed to comply with the GPL license and pay an undisclosed amount as a donation. The lawsuit led Cisco to reconsider its open-source strategy for its products.
+
+Partnership Damage: If a company is found to be in violation of an open-source license, its business partners may reevaluate their relationship with the company, especially if the collaborative project involves open-source software.
+
+#### 3.2.3 Open source large model licenses are largely distinct from traditional licenses
+
+Open-source LLMs are still evolving and iterating, and two of the year's most influential open-source LLMs, Llama2 and Falcon, have both been questioned as to whether they are truly "open source" because of the terms of their licenses. Neither uses a standard open-source license; they use their own "LLAMA 2 COMMUNITY LICENSE AGREEMENT" and "TII Falcon LLM License", respectively, and both impose additional restrictions on commercial use.
+
+**Difference in open source licenses for Llama2**
+
+Much of the discussion about whether Llama2 departs from open-source principles stems from its unusual terms:
+
+- The Llama2 open-source model may not be used in products or service platforms with more than 700 million monthly active users (MAU), unless approved and licensed by Meta;
+- The Llama2 open-source model may not be used in any manner that violates applicable laws or regulations, including trade compliance laws. Use in languages other than English is also out of scope;
+- Llama2 and its outputs may not be used to improve other LLMs (not including Llama2 or its derivatives).
+
+The Open Source Initiative (OSI) has published a ten-point definition of open source, which is currently recognized internationally, and the Llama2 license conflicts with two of its points:
+
+- Non-Discrimination Against Individuals or Groups: The Llama License prevents enterprise users with more than 700 million monthly users from obtaining licenses directly through this License.
+- Non-Discrimination Against Fields: The license shall not restrict anyone from using the program in a particular field. The Llama License prohibits the use of Llama2 outputs to improve other LLMs, which restricts the field of use. Llama2's language restrictions also limit its use in Chinese-language settings.
+
+**Difference in open source licenses for Falcon**
+
+The TII Falcon LLM License makes some key changes from the Apache License. The Apache License is a popular open-source license that is friendly to commercial use and allows users to distribute or sell their modified code as an open-source or commercial product after meeting certain conditions.
+
+Falcon's license is similar to the Apache License in that it also grants broad permissions to use, modify, and distribute the licensed work, requires that the license text be included in the distribution with proper attribution, and contains a limitation of liability and a disclaimer of warranties.
+
+However, the TII Falcon LLM License introduces additional commercial use terms that require commercial applications to pay a 10% license fee on annual revenues in excess of $1 million. It also places additional restrictions on the manner in which the work may be published or distributed, such as emphasizing the need for attribution to "Falcon LLM technology from the Technology Innovation Institute."
+
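The fee described above can be worked through as simple arithmetic. The sketch below assumes one reading of the summary, namely that the 10% rate applies only to the portion of annual revenue above $1 million; the function name and defaults are hypothetical, and the license text itself remains the authority on how the fee is calculated.

```python
def falcon_style_royalty(annual_revenue_usd: float,
                         threshold_usd: float = 1_000_000,
                         rate: float = 0.10) -> float:
    """Fee under the ASSUMED reading: `rate` applies only to revenue above the threshold."""
    excess = max(0.0, annual_revenue_usd - threshold_usd)
    return excess * rate


if __name__ == "__main__":
    # Revenue below the threshold owes nothing under this reading.
    print(falcon_style_royalty(500_000))
    # $3M of revenue leaves $2M above the threshold, so the fee is 10% of $2M.
    print(falcon_style_royalty(3_000_000))
```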
+**The purpose of open-sourcing LLMs differs from that of traditional open-source software**
+
+In the case of Llama2, for example, the license is essentially a guiding framework for organizations that intend to develop and deploy AI systems while adhering to Meta's established specifications and standards. The purpose of this framework is to ensure that these organizations meet specific rules and standards set by Meta when developing and deploying AI technologies. Such an approach helps Meta manage the scope and manner in which its AI technology is applied, thereby safeguarding its business interests and brand image.
+
+The Llama2 license may constitute a compliance requirement that must be adhered to for those who plan to conduct AI development on the Meta platform. This means that these organizations must follow Meta's specific specifications and requirements when using Meta-provided tools and resources to develop and deploy AI models. In doing so, these companies may need to apply to Meta for the appropriate licenses, of which the Llama2 license is a part.
+
+#### 3.2.4 Means of securing controllable licenses
+
+**Document the use of open source components**
+
+As an enterprise's or individual user's software grows, the burden of managing the open-source components it includes becomes heavier, and components that are not managed in time can lead to infringement problems. According to Synopsys, 89% of codebases contain open-source code that is at least four years out of date, and 88% contain components that have been inactive for the past two years as well as components that are not the latest version. In many cases, developers may have completely forgotten which open-source components they used and cannot react in time when the licenses of those components change, leading to infringement issues. Managing open-source components systematically is therefore essential.
+
+Developers can manually or automatically maintain a detailed dependency list of all used open-source components and their version information in the project's documentation. For example, in many programming languages, dependencies can be tracked using files such as requirements.txt (Python), package.json (Node.js), and so on.
+
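As a rough sketch of the dependency list described above, the snippet below turns the text of a `requirements.txt` file into a list of component names and pinned versions. It is deliberately simplified: extras, environment markers, and URL-based requirements are not handled, and the function name `parse_requirements` is a hypothetical helper rather than part of pip.

```python
import re

# Simplified pattern: a package name, optionally followed by an exact "==" pin.
_REQ = re.compile(r"^([A-Za-z0-9._-]+)\s*(?:==\s*([^\s;]+))?")


def parse_requirements(text: str) -> list[tuple[str, str]]:
    """Return (name, version) pairs; version is 'unpinned' without an exact pin."""
    inventory = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        match = _REQ.match(line)
        if match is None:                     # e.g. local paths; ignored in this sketch
            continue
        inventory.append((match.group(1).lower(), match.group(2) or "unpinned"))
    return inventory


if __name__ == "__main__":
    sample = "requests==2.31.0\n# tooling\nflask>=2.0\n-r extra.txt\n"
    for name, version in parse_requirements(sample):
        print(f"{name}\t{version}")
```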
+Create an internal document or knowledge base that records all relevant information about the open-source components used, including their origin, license information, and how they are used. Track in detail which open-source components are used, and add comments at the corresponding places in the code to indicate this. Include a link to each license's website so that it can be checked regularly and changes to the license terms are noticed in time. Also document in the code how the applicable license conditions have been met.
+
+For larger volumes of development work, manually maintained records may not meet the project's needs. In that case, related tools such as software composition analysis (SCA) software can be used. These tools automatically identify and document the open-source components used in a project. They are usually able to provide detailed reports that include component license information, versions, and possible security vulnerabilities.
+
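To give a flavor of what such automated inventories collect, the sketch below lists the installed Python distributions in the current environment together with the license each one declares in its own metadata. It uses only the standard-library `importlib.metadata` module; it is not a substitute for a real SCA tool, since declared metadata can be missing or inaccurate and no vulnerability data is included. The function name `installed_license_report` is hypothetical.

```python
from importlib import metadata


def installed_license_report() -> list[dict]:
    """List name, version, and self-declared license for each installed distribution."""
    report = []
    for dist in metadata.distributions():
        meta = dist.metadata
        if meta is None:  # broken installation without readable metadata
            continue
        # Trove classifiers such as "License :: OSI Approved :: MIT License"
        classifiers = [c for c in (meta.get_all("Classifier") or [])
                       if c.startswith("License ::")]
        report.append({
            "name": meta.get("Name") or "unknown",
            "version": meta.get("Version") or "unknown",
            "license": meta.get("License") or "; ".join(classifiers) or "UNKNOWN",
        })
    return sorted(report, key=lambda row: row["name"].lower())


if __name__ == "__main__":
    for row in installed_license_report():
        print(f'{row["name"]}=={row["version"]}: {row["license"][:60]}')
```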
+**Cautious use of AI-assisted coding tools**
+
+Intelligent programming assistants such as ChatGPT and GitHub Copilot provide programming advice and code snippets by analyzing a large number of codebases and documentation. While these tools are extremely valuable in improving programming efficiency, there are several key points to consider when using the code they generate to avoid potential open-source license infringement issues:
+
+- License Issues with Source Code: AI-assisted programming tools may generate suggestions based on code in their training datasets. These datasets may contain code from different open-source projects with various license requirements. The suggestions usually do not indicate the corresponding licenses, and copyright issues may arise if the generated code snippets are too close to the original code and are copied directly by the user.
+
+- Attribution of Responsibility: When using code generated by an intelligent programming assistant, it needs to be clear that the ultimate responsibility lies with the user. This means that the developer is responsible for the legality and suitability of the generated code. Developers should therefore conduct regular code reviews, especially of sections generated with assisted programming, to ensure that they do not violate the terms of any open-source license.
+
+**Adequate code audits during mergers and acquisitions**
+
+An adequate code audit during the M&A process is essential, especially to avoid infringement issues involving open-source licenses. M&A activities usually involve a thorough evaluation of the target company's assets, of which technology assets, especially software assets, occupy an important place. The following issues need to be highlighted in M&A audits:
+
+- Identifying Open-source Components: An important task of a code audit is to identify all open-source components used in the target company's products. This includes open-source libraries and frameworks that are used directly, as well as open-source software that is indirectly relied upon. Understanding these components and their versions is critical to assessing the associated license requirements.
+- Reviewing License Compliance: After confirming an open-source component, its corresponding license needs to be reviewed. This includes determining the types, limitations and obligations of these licenses. In particular, note that some licenses may have specific restrictions on commercial use or require disclosure of modified source code.
+- Assessing Risks and Responsibilities: During the audit, the legal and financial risks that may arise from non-compliance with open-source licenses should be assessed. This includes potential infringement lawsuits, fines, or the need to refactor parts of the product that rely on specific open-source components.
+- Post-Integration Compliance Strategies: After an M&A is completed, there needs to be a clear plan for integrating the target company's codebase and ensuring continued compliance with all relevant open-source license requirements. This may involve implementing new code management and compliance monitoring processes throughout the organization.
+- Professional Legal Advice: Because open-source licenses can be very complex, obtaining professional legal advice is critical. A professional attorney can help correctly interpret the terms of the license and provide advice on how to handle potential license conflicts.
+
+### 3.3 Open Source AI Security
+
+With the popularity of LLMs, AI safety and controllability issues beyond the license questions discussed above have gradually come into view. Because the technology is relatively new and lacks clear definitions and specifications, this section lists the topics of greatest current concern to practitioners, based on desk research, in the hope of prompting readers' thinking. Discussion and feedback are welcome.
+
+#### 3.3.1 Open Source AI Poses New Requirements for Data Security
+
+Unlike traditional data security, a large part of an LLM's output depends on its training dataset, so the quality of the dataset and whether it contains malicious data are particularly important. This is especially true for open-source LLMs, because much of their data is supplied internally by the enterprise, whose cleansing, monitoring, and compliance work may not be as professional as that of specialized closed-source LLM vendors.
+
+**Improper handling of the training dataset triggers a range of biases**
+
+Data bias occurs when certain elements in a data set are overemphasized or underrepresented. When training AI or machine learning models based on such biased data, it can lead to biased, unfair and inaccurate results.
+
+- **Selective Bias**: Some facial recognition systems, trained primarily on images of white people, have relatively low accuracy in recognizing faces of other races;
+- **Exclusionary Bias**: This bias usually occurs at the data preprocessing stage. If the data is based on stereotypes or false assumptions, the results will be biased regardless of which algorithm is used;
+- **Observer Bias**: Researchers may consciously or unconsciously bring their personal views into a research project, which can influence the results;
+- **Racial Bias**: Racial bias occurs when a dataset is biased toward a particular group;
+- **Measurement Bias**: This bias occurs when the data used for training does not match the data in the real world, or when incorrect measurements distort the data.
+
+These biases, when exploited maliciously, can lead to outputs that are significantly politically or racially biased, or to data errors that significantly affect the performance and credibility of the LLM.
+
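One simple check for the over- and under-representation described above is to measure each group's share of the dataset before training. The sketch below is a minimal illustration: the group labels, the `min_share` threshold of 20%, and the function names are assumptions made for the example, not a recommended standard.

```python
from collections import Counter


def group_shares(labels: list[str]) -> dict[str, float]:
    """Fraction of samples that belongs to each group label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def underrepresented(labels: list[str], min_share: float = 0.2) -> list[str]:
    """Groups whose share falls below the (assumed) minimum share."""
    return sorted(g for g, share in group_shares(labels).items() if share < min_share)


if __name__ == "__main__":
    # Hypothetical annotation of a small image dataset by group.
    labels = ["group_a"] * 8 + ["group_b"] + ["group_c"]
    print(group_shares(labels))
    print(underrepresented(labels))
```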
+**Training data sources should be taken into account when choosing an open-source base LLM**
+
+Much LLM training data is obtained directly from the Internet via crawler tools, where discriminatory, hateful and offensive speech and information is prevalent. In practice, people read, comment on, like and spread negative messages far more than positive ones. As a result, human-generated information sources have long been in a chaotic and unhealthy state. LLMs trained in this environment may be influenced by such data and contribute to the spread of racial discrimination and disinformation.
+
+Once the data source underlying the base LLM is contaminated, the final output can be significantly biased even if the enterprise fine-tunes the model on a perfect data source. Therefore, when choosing a base LLM, users should consider not only the performance of the LLM but also the source of its training data. The focus should be on LLMs whose developers select annotated datasets from multiple sources in a responsible manner and treat bias minimization as a priority throughout the model-building process and even after deployment.
+
+#### 3.3.2 The extensive use of open-source AI LLMs raises ethical considerations for society
+
+**The problem of LLM hallucinations can lead to serious consequences**
+
+Current LLMs have an unresolved problem: hallucinations. According to the Sail Lab at HIT (Harbin Institute of Technology), hallucination refers to "text generation tasks in which unfaithful or meaningless text is sometimes produced." Although hallucinated text is unfaithful or meaningless, the LLM's powerful context generation makes it so readable that the reader is led to believe it is grounded in the provided context, even though it is actually very difficult to find or verify that such a context exists. This phenomenon resembles psychological hallucinations, which are hard to distinguish from other "real" perceptions; hallucinated text is likewise hard to spot at a glance.
+
+There are many types of hallucinations, and new ones keep emerging as the use of LLMs expands. The main common types are the following:
+
+- **Logic Errors**: The LLM makes logical errors in its reasoning, which results in outputs that seem reasonable but don't stand up to scrutiny;
+- **Fabricated Facts**: The LLM's own data does not support its answer to the question, but since the LLM cannot recognize the boundaries of its knowledge, it confidently asserts facts that simply do not exist;
+- **Data-Driven Bias**: As mentioned in the previous section, due to the prevalence of certain data, the output of the model may be biased in certain directions, leading to erroneous results.
+
+False outputs due to LLM hallucinations may cause harm to some users who are convinced by them. On May 16, 2023, the World Health Organization issued a statement of caution on the use of AI LLM tools. They noted that while these tools facilitate access to health information and may enhance the efficiency of diagnosis, particularly in resource-poor areas, their use requires a rigorous assessment of potential risks. The World Health Organization further emphasized that rushing into the use of inadequately tested systems could lead to mistakes by healthcare professionals, harm to patients and reduced trust in AI technologies, which could undermine or delay the potential long-term benefits and applications of such technologies globally.
+
+
+
Figure 3.9 Classification of hallucinations by Harbin Institute of Technology
+
+
+Since there is not yet a clearly accountable entity for LLMs, and even less so for open-source LLMs, users who suffer losses in the event of serious consequences will find it very difficult to defend their rights and recover those losses. There are currently two pressing issues to be addressed in this regard:
+
+- How LLM hallucinations can be better addressed - technical aspects
+- How to define more clearly who is responsible for LLMs - legal aspects
+
+**LLM outputs may violate laws and ethical norms**
+
+At present, some LLMs lack content filtering mechanisms and can therefore output content that violates domestic laws and regulations or public order and morals, mainly in the following situations:
+
+- Copyright Issues: LLMs may generate content that contains or resembles copyrighted material. For example, the model may create text that is similar to pre-existing literary works, song lyrics, movie scripts, and so on. Such output may violate the rights of the original author or copyright holder, leading to legal disputes;
+
+- Territorial Legislation: Different countries and regions have their own unique legal systems. For example, certain countries have stricter censorship of Internet content, such as explicit bans on politically sensitive content, religious messages or specific expressions on gender issues. When the LLM runs in these regions, the generated content must comply with local laws. For example, when someone asked an LLM "how to cook wild giant salamander", the model answered "braise it" and even provided detailed steps. Such answers may mislead the questioner. As a matter of fact, wild giant salamanders are Class II protected animals and should not be captured, killed or eaten.
+
+- Defamation and Misinformation: If model-generated content contains false accusations or defamatory statements about individuals or organizations, legal action may result. This places high demands on ensuring the accuracy and legitimacy of the content.
+
+In order to ensure compliance with various legal requirements, organizations using LLMs may need to put in place regulatory mechanisms, such as auditing generated content to ensure that it does not violate any legal requirements. Open-source models used by enterprises in particular tend to have their output scrutinized less strictly, so enterprises need to pay extra attention to these issues to avoid legal disputes and losses. Here again, the situation can be summarized in two questions:
+
+- How to enhance information filtering mechanisms for LLMs - technical aspects
+- How to define whether LLM output content is infringing and illegal - legal aspects
+
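As a very rough illustration of the output-auditing idea above, the sketch below holds back generated text that matches a blocklist before it is shown to the user. The single rule, which reuses the salamander example from the text, and the names `BLOCKED_PATTERNS`, `audit_output`, and `release_or_hold` are hypothetical. Keyword matching is only a naive stand-in for the reviewed, jurisdiction-specific rules or classifiers a real deployment would need.

```python
import re

# Hypothetical blocklist: rule label -> pattern that triggers a manual review.
BLOCKED_PATTERNS = {
    "protected-wildlife": re.compile(r"\bwild giant salamander\b", re.IGNORECASE),
}


def audit_output(text: str) -> list[str]:
    """Return the labels of every rule that the generated text matches."""
    return [label for label, pattern in BLOCKED_PATTERNS.items()
            if pattern.search(text)]


def release_or_hold(text: str) -> tuple[str, list[str]]:
    """Hold flagged text for human review; release everything else."""
    flags = audit_output(text)
    return ("hold", flags) if flags else ("release", [])


if __name__ == "__main__":
    print(release_or_hold("Braise the wild giant salamander for an hour."))
    print(release_or_hold("Braise the tofu for ten minutes."))
```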
+**LLMs may exacerbate social divide**
+
+The Secretary General of the Digital Economy Committee of the Beijing Computer Society has said that the potential security issues of LLMs are of particular concern for those who lack critical thinking and analytical skills, and who are not well informed about paid knowledge and healthcare services. With the dramatic increase in the number of Internet users and the widespread use of mobile devices such as cell phones, low-education and low-income populations increasingly rely on these channels for medical, educational, and daily life advice. However, large-scale generative language models may exacerbate discriminatory portrayals of and social biases against these marginalized groups, deepen social divisions, increase the harm of misleading and malicious information, and raise the risk of disclosure and misuse of individuals' real information.
+
+The use of LLMs is a double-edged sword: on the one hand, it can consolidate online resources and improve the efficiency of information gathering; on the other hand, problems such as hallucinations may deepen information barriers and misinform populations whose information sources are scarce. Two issues need to be addressed here:
+
+- Enhancing public education that LLMs are not a panacea and need to be viewed with caution - Social communication aspect
+- How to ensure the quality of LLM training datasets and reduce their bias - technical aspect
+
+## 4 Capital market situation for open source projects
+
+### 4.1 The status of global markets
+
+#### 4.1.1 Global VC Investment Declines in 2023, but AIGC is in the Spotlight
+
+Since 2023, volatility in global financial markets has increased due to rising interest rates, challenging economic conditions, geopolitical conflicts, and concerns about the stability of the international financial system, which has led to a bleak picture for global venture capital markets. According to KPMG, global venture capital activity has declined for seven consecutive quarters through Q3 2023 (see Figure 4.1).
+
+
Figure 4.1 Global Venture Capital Activity (Source:KPMG)
+
+
+Against the backdrop of a declining equity market, fund managers have generally reduced their allocations to private equity assets to maintain portfolio proportions. At the same time, given the high volatility of venture capital and uncertainty about the global economy, venture capital fundraising in 2023 is on track to drop significantly compared with previous years. Compared to an average of more than $250 billion annually over the past five years (2018-2022), venture capital commitments as of 2023Q3 amounted to just $116 billion (according to KPMG). Combined with seven consecutive quarters of declining venture capital activity, this suggests that fundraising for 2023Q4 and the full year will shrink significantly.
+
+
Figure 4.2 Global Venture Capital Fundraising Scale (Source:KPMG)
+
+
+At the valuation level, investor caution is also growing. Compared to 2021 and 2022, the proportion of premium financing has decreased by about 10%, and the proportion of par and discount financing has risen by about 5%, which creates an obstacle to the exit of early-stage capital.
+
+
Figure 4.3 Global VC Premium, Parity, and Decline Investment Ratios (Source:KPMG)
+
+
+However, against this bleak overall backdrop, AIGC-related financing has been in the global spotlight, with a significant increase in the size of related rounds. In North America, AI-related companies made up the largest group of new unicorns in 2023, including AI agent startup Imbue, AI + biotech company TrueBinding, generative AI company Runway, and natural language processing company Cohere. In Europe, despite the overall slowdown in funding, AI companies have been particularly strong, with a large number of startups receiving funding, such as French AI platform company Poolside. In Asia, investor interest in AI is also rising, but national regulators are also increasing their oversight of generative AI.
+
+As AI technology iterates rapidly and the concepts of LLM and AI Agent remain popular, investment and financing in the AI field are expected to be less affected by the contraction in global venture capital.
+
+#### 4.1.2 Global Open Source Financing
+
+The growth of commercial open-source companies has been remarkable in recent years, with the combined market capitalization of these companies growing rapidly from $10 billion to surpass the $500 billion mark. This significant growth not only demonstrates the huge potential of open-source technology in the commercial sector, but also reflects the high level of investor recognition and trust in the open-source model. According to OSS Capital, the market capitalization of commercial open-source companies is expected to reach a staggering $3 trillion in the future.
+
+The open-source business sector has shown solid growth over the past four years. Over 400 startups raised approximately 700 rounds totaling $29 billion during this period. Specifically, annual financing increased from $270 million in 2020 to $12.5 billion in 2023, a compound annual growth rate of 255%.
+
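For readers who want to reproduce the growth figure, the compound annual growth rate follows from the start value, the end value, and the number of yearly steps (three, from 2020 to 2023). The sketch below uses the rounded amounts quoted above. With those inputs the result comes out at roughly 259%, in the same range as the 255% reported; the gap presumably reflects rounding in the quoted amounts.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / periods) - 1."""
    return (end_value / start_value) ** (1 / periods) - 1


if __name__ == "__main__":
    # Amounts in millions of USD, taken from the rounded figures in the text.
    rate = cagr(270, 12_500, 3)  # 2020 -> 2023 is three yearly steps
    print(f"{rate:.0%}")
```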
+Although financing declined in 2022, the decline eased in 2023. Beginning in February 2023, financing gradually picked up, and in the first 11 months of 2023 total funding had already surpassed the amount raised in all of 2022. However, volatility in the scale of financing increased throughout the year, influenced by geopolitical conflicts and the post-epidemic economic recovery. Financing peaked at around $2 billion in March, May and September, and was below average in June and August.
+
+Even in the lowest funding month of 2023, the $386 million raised exceeded the highest monthly funding of 2021 and even surpassed the total funding for all of 2020 ($272 million). This growth reflects the capital markets' growing interest and confidence in open-source business. Investors value not only the innovative potential and technological advantages of open-source models, but also their sustainability in the marketplace and long-term growth potential.
+
+
+
Figure 4.4 Amount of Global VC Funds Invested in Commercialized Open-source Software Companies (Source: OSS Capital)
+
+
+In terms of financing scale by round, capital favors mid-stage rounds such as Series B, C, and D. This reflects the characteristics of commercial open-source companies: in the early stage, the technical details and the business model are still unclear; once they move past the start-up stage, they show stronger growth momentum and attract more capital; in the later stage, when the business model has matured and the open-source product is well known and generates stable cash flow, the need for financing decreases.
+
+
Figure 4.5 Distribution of Financing Rounds for Commercialized Open-source Software Companies ($M) (Source: OSS Capital)
+
+
+Over the past four years, commercial open-source companies raised a total of 328 rounds of more than $10 million each. These were concentrated in the $10-50 million range (the $10-20 million and $20-50 million brackets), with 210 rounds, or 64% of the total. There were 49 rounds of $50-100 million and 46 rounds of $100-200 million, together accounting for 29%. A further 23 rounds exceeded $200 million, two of which exceeded $500 million.
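
The shares quoted above follow directly from the bucket counts, which sum to the stated total of 328. The short sketch below recomputes them; the bucket labels and variable names are chosen here for illustration and are not from the source data.

```python
# Round counts by size bucket, as stated in the paragraph above.
rounds_by_bucket = {
    "$10-50M": 210,
    "$50-100M": 49,
    "$100-200M": 46,
    ">$200M": 23,
}

total = sum(rounds_by_bucket.values())
shares = {bucket: count / total for bucket, count in rounds_by_bucket.items()}

for bucket, share in shares.items():
    print(f"{bucket:>10}: {share:.0%}")

# Combined share of the two middle buckets ($50-200M).
mid_share = shares["$50-100M"] + shares["$100-200M"]
print(f"$50-200M combined: {mid_share:.0%}")
```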
+
+
Figure 4.6 Distribution of Financing Rounds for Commercialized Open-source Software Companies ($M) (Source: OSS Capital)
+
+
+### 4.2 The status of China market
+
+#### 4.2.1 Overview of the development of China's equity capital market
+
+**The number and size of newly established funds declined, but the overall trend is gradually improving**
+
+In the first half of 2023, 3,930 new funds were launched in the PE/VC market, down 12% from the 4,456 launched in the same period last year. New fund launches during this period totaled $364.2 billion, a decrease of 3% year-on-year. Despite the decline in size and volume compared to last year, the second quarter performed better than the first, and the overall trend is improving: new fund launches amounted to $161.4 billion in the first quarter, a decline of nearly 20% year-on-year, while the second quarter recorded $202.8 billion, an increase of 16% year-on-year.
+
+
Figure 4.7 Domestic Private Equity Fund Contributions and Volume (Source: investment.com, KPMG)
+
+
+**Increase in the size of RMB funds and a significant decrease in the size of foreign currency funds**
+
+In the first half of 2023, the number of new RMB funds launched was 3,840, a decrease of 13% compared to the same period last year. The total size of RMB funds reached US$339.5 billion, a 13% increase compared to the same period last year. The size of foreign currency funds was $24.7 billion, a significant decline of 67% from the previous year. Despite the increase in the number of foreign currency funds in 2023, their impact on the total size is small as most are small funds.
+
+This trend indicates that the domestic equity investment market prefers the more conservative investment style of RMB funds and requires a higher degree of stability in portfolio companies. For open-source business startups in China, simply following the market buzz is no longer enough to attract investment. Technological strength and long-term growth potential have become key factors in deciding whether to invest further.
+
+
Figure 4.9 Domestic Private Equity Foreign Currency Fund Size and Volume (Source: KPMG)
+
+
+**Economic recovery falls short of expectations and decline in overall investment volume and size**
+
+Against a macro backdrop of a still-fragile economic recovery, slowing overall demand, and instability in external markets, the equity market recorded 3,750 investments in the first half of 2023, a year-on-year decline of 31%; total investment was USD 56.9 billion, a decline of 6% compared to the same period last year. Compared with the fundraising side, where the size of newly established funds declined by 3%, the investment side contracted more sharply. This further illustrates investors' cautious sentiment and is consistent with the trend in international markets.
+
+
Figure 4.10 Amount and number of investments in the domestic equity market (Source: KPMG)
+
+
+#### 4.2.2 Steady development of domestic open source ecology
+
+**Open-source industry is gradually improving in all aspects of the ecosystem and is steadily growing**
+
+At present, China's open-source industry is advancing on several fronts at once: top-level design alongside industrial progress, and talent development alongside technological innovation. Progress is visible in laws and regulations, policy support, competitions, and every link of the industry chain.
+
+In terms of laws and regulations, Zhang Guofeng, deputy director of the Institute of Artificial Intelligence and Change Management at the University of International Business and Economics in Shanghai and secretary general of the Shanghai Open-source Information Technology Association, said on November 2, 2023, at the media briefing for the 2023 Open-source Industry Ecological Conference that Shanghai's open-source industry planning and policies are being drafted and advanced, and that Shanghai must seize the historic opportunity to take an active part in digital governance and international cooperation on digital public goods (news from The Paper).
+
+In terms of policy support, the launch of the Shanghai open-source industry service platform was officially announced at the 2023 Global Open-source Technology Conference (GOTC): Shanghai Pudong Software Park signed an agreement with the Linux Foundation Asia-Pacific to establish the Linux Foundation Asia-Pacific Open-source Community Service Center, and signed a strategic cooperation agreement with OSChina to build the Shanghai open-source ecosystem (news from Wen Hui Bao).
+
+In terms of competitions, China already has a series of open-source competitions, such as the "China Software Open-source Innovation Competition" and the "OpenHarmony Competition Training Camp", which have attracted students from Shanghai Jiao Tong University, Fudan University, and other domestic universities. Many innovative highlights have emerged from these competitions, reflecting the strong momentum and great potential of the jointly built open-source ecosystem.
+
+All segments of the open-source chain are thriving. In artificial intelligence, numerous companies have open-sourced base LLMs, including Alibaba with Tongyi Qianwen and High-Flyer Quant with DeepSeek. Startups such as Baichuan Intelligence, Zhipu AI, and 01.AI have each released LLMs built on their own trained base models. Notably, these companies are favored by the capital market, and each closed one or more large financing rounds this year. In the developer tools layer, established startups have been joined by new players, and some products are already expanding globally. In the foreseeable future, open-source AI is also likely to see more opportunities at the application layer.
+
+In the area of underlying operating systems, large companies are promoting the localization of operating systems, including the Anolis OS open-source community developed by Alibaba and the openEuler community supported by the OpenAtom Open-Source Foundation. These large enterprises also maintain notable open-source projects in a number of key areas, including cloud native, big data, artificial intelligence, and front-end technologies. For example, ant-design (Ant Group's enterprise UI design tool), PaddlePaddle (Baidu's deep learning platform), and Apache ECharts (a data visualization charting library) all have a wide reach and a large user base in the GitHub community.
+
+In the big data and database industry, a number of startups are actively positioning themselves in response to the large and diverse data generated by domestic and international markets and the growing demand for data processing. Examples include TiDB, a distributed relational database, and TiKV, a distributed key-value database, both launched by PingCAP; TDengine, a time-series database; and ShardingSphere, a distributed database middleware from SphereEx. With the development of AI technology, innovative products have also emerged in the AI field, such as Zilliz's vector database developed for AI applications and Jina AI's neural search engine, which enables search across all types of content.
+
+
Figure 4.11 Map of domestic AI-related tech companies' open source projects and open source companies (partial)
+
+
+**ModelScope has become the first portal for domestic open source LLMs, marking the gradual growth of China's open-source AI community construction**
+
+ModelScope is an AI model community launched by Alibaba DAMO Academy in collaboration with the Open-source Development Committee of the China Computer Federation (CCF). It aims to build a next-generation open-source Model-as-a-Service sharing platform and to lower the threshold for AI applications. Since its launch, it has expanded rapidly: the community now has over 2,300 models, over 2.8 million developers, and over 100 million model downloads. Baichuan Intelligence, Zhipu AI, Shanghai Artificial Intelligence Laboratory, IDEA Research Institute, and other leading LLM organizations use ModelScope as the debut platform for their open-source models.
+
+The ModelScope community upholds the concept of "Model as a Service" and treats AI models as an important factor of production, providing services across the model lifecycle, from pre-training to fine-tuning and finally to deployment. Compared with the overseas community Hugging Face, ModelScope pays more attention to domestic needs, provides a large number of Chinese-language models, and promotes the application of related AI scenarios in China.
+
+
Figure 4.12 The ModelScope community currently has 11 model classes, including LLM and zero-shot learning
+
+
+The establishment and rapid development of the ModelScope community has set a benchmark for China's open-source community culture. It helps spread open-source culture in China, attracts more creative, open-source-minded technology creators and users, and promotes the further prosperity of China's open-source cause.
+
+#### 4.2.3 Domestic Open Source Company Financing Remains Hot
+
+The market remained hot in 2023, with several large investments and some startups raising multiple rounds within a year, reflecting strong investor interest. OSChina (Open Source China) is an open-source community platform company that hosts nearly 100,000 well-known open-source projects, operates the open-source community Landscape and the long-established Japanese open-source community OSDN, and owns Gitee, the leading code hosting platform in China; it raised RMB 775 million in a B+ round of strategic financing. SelectDB (Flywheel Technology), which develops and promotes the open-source real-time data warehouse Apache Doris and provides technical support and commercial services for Apache Doris users, raised a new round of several hundred million yuan, bringing its total financing to nearly RMB 1 billion. Langboat Technology, which provides a new-generation cognitive intelligence platform based on NLP technology, completed a Pre-A+ round, with total financing reaching hundreds of millions of yuan in less than a year.
+
+At present, China's open-source ecosystem is still at an early stage of development. Financing events in 2023 were mainly Series B or earlier and involved artificial intelligence, open-source communities, data warehouses, LLM platforms, and other fields with vast market opportunities.
+
+
+Table 4.1 Investment and Financing of Domestic Open Source Software Startups (slide to right to view full content)
+
+
+(GitHub statistics as of December 7, 2023)
+
+
+| **Company**                                   | **Open source project**              | **Corporate operations**                           | **Latest financing round**               | **Amount of latest round of financing**               | **Time of latest round of financing** | **GitHub Star** | **GitHub Fork** |
+| --------------------------------------------- | ------------------------------------ | -------------------------------------------------- | ---------------------------------------- | ----------------------------------------------------- | ------------------------------------- | --------------- | --------------- |
+| **Tributary Technologies** | Apache APISIX | Microservices API Gateway | A + round | Millions of dollars. | June 2021 | 10.8k | 2k |
+| **Moby Dick Open Source** | Apache DolphinScheduler | Cloud-Native DataOps Platform | Pre-A round | tens of millions of dollars | July 2022 | 9.4k | 3.5k |
+| **Flywheel Technologies** | Apache Doris | Cloud Native Real-Time Warehouse | Pre-A round | several hundred million dollars | June 2023 | 6.5k | 1.9k |
+| **Even Tech** | Apache HAWQ | Hadoop SQL Analysis Engine | B + round | Nearly $200 million | August 2021 | 672 | 324 |
+| **Tianmou Technology** | Apache IoTDB | Time Series Database System | angel round (finance) | nearly a billion dollars | June 2022 | 2.8k | 750 |
+| **Short step information technology** | Apache Kylin | Big Data online analytical processing engine | D round | $70 million. | April 2021 | 3.4k | 1.5k |
+| **StreamNative** | Apache Pulsar | distributed message queue | A + round | - | 2023 | 12k | 3.2k |
+| **SphereEx** | Apache ShardingSphere | Distributed Database Pluggable Ecology | Pre-A round | Nearly $10 million | January 2022 | 17.7k | 6.1k |
+| **Antoine Mound (AutoMQ)** | automq-for-rocketmq automq-for-kafka | Streaming storage software and message queues | Angel rounds + | Tens of millions of RMB | November 2023 | 195 | 34 |
+| **Zhipu AI**                                  | ChatGLM                              | Large Language Model                               | B++++                                    | RMB 1.2 billion                                       | September 2023                        | 36.3k           | 4.9k            |
+| **Luchen Technology** | Colossal-AI | High-Performance Enterprise AI Solutions | angel round (finance) | $6 million | September 2022 | 6.8k | 637 |
+| **Chatopera** | cskefu | Multi-Channel Intelligent Customer Service System | angel round (finance) | millions of dollars | August 2018 | 2.2k | 742 |
+| **Digital Change Technology** | Databend | cloud warehouse (computing) | angel round (finance) | Millions of dollars. | August 2021 | 4.8k | 500 |
+| **Dify.AI**                                   | Dify                                 | LLMOps platform                                    | fund                                     | undisclosed                                           | March 2023                            | 11.8k           | 1596            |
+| **Image Cloud Technology** | EMQX | MQTT Message Middleware | B round | 150 million | December 2020 | 10.8k | 1.9k |
+| **TensorChord** | Envd | MLOps | seed round | Millions of dollars. | November 2022 | 1.3k | 102 |
+| **Stoneware Technology** | FydeOS | Chromium-based operating systems | Pre-A round | tens of millions of dollars | February 2022 | 1.5k | 192 |
+| **Generalized intelligence** | GAAS | Autonomous UAV flight program | * | undisclosed | October 2018 | 1.7k | 411 |
+| **GeekCode** | Geekcode.cloud | cloud development environment | seed round | Millions of RMB | April 2022 | 42 | 2 |
+| **Gitee** | git | Git Code Hosting | B + round | 775 million | July 2023 | - | * |
+| **Polar Fox** | GitLab | DevOps Tooling Platform | A++ round | tens of millions of dollars | September 2022 | - | * |
+| **White Sea Technology** | IDP | AI Data Development Platform | seed round | tens of millions of dollars | December 2021 | 17 | 3 |
+| **Ella Yunko** | illa-builder | Low-code development platform | angel round (finance) | Millions of dollars. | September 2022 | 2.3k | 126 |
+| **Gina Technology** | Jina | A multimodal neural network search framework | Series A | $30 million | November 2021 | 16.8k | 2k |
+| **Juicedata** | JuiceFS | distributed file system (DFS) | angel round (finance) | millions of dollars | October 2018 | 7.1k | 605 |
+| **Harmonic Cloud Technology** | Kingdling | Container Cloud Products and Solutions | B + round | over one hundred million dollars | January 2022 | 270 | 56 |
+| **Fly to Cloud**                              | JumpServer                           | Cloud & DevOps                                     | D+ round                                 | 100 million                                           | April 2022                            | 19.5k           | 4.8k            |
+| **Talent Cloud Technology**                   | Kubernetes                           | Container Cloud Platform                           | Acquired by ByteDance                    | undisclosed                                           | July 2020                             | 94.1k           | 34.5k           |
+| **Zeto Technology** | Kunlun | distributed database | angel round (finance) | tens of millions of dollars | August 2021 | 112 | 15 |
+| **Deepness Technology** | LinuxDeepin | Linux operating system | B round | tens of millions of dollars | April 2015 | 413 | 70 |
+| **Matrix origin** | Matrixone | data intelligence | angel + round | Tens of millions of dollars | October 2021 | 1.3k | 212 |
+| **Mission Technologies**                      | Mengzi                               | Large language model                               | Pre-A+ round                             | several hundred million yuan (RMB)                    | March 2023                            | 530             | 61              |
+| **Zilliz** | milvus | vector search engine | B + round | $60 million. | August 2022 | 14.4k | 1.9k |
+| **Euronet** | Nebula | distributed graph database | Pre-A + round | Nearly $10 million | November 2020 | 8.3k | 926 |
+| **PLEASURE NUMBER TECHNOLOGY** | NebulaGraph | distributed graph database | Series A | Tens of millions of dollars | September 2022 | 9.7k | 1.1k |
+| **First class technology** | oneflow | Deep Learning Framework | Mergers and Acquisitions - Meituan | - | 2023 | 4.1k | 478 |
+| **Facial Intelligence** | OpenBMB | Large model applications | seed round | undisclosed | August 2021 | 359 | 49 |
+| **EasyJet Travel Cloud** | OpenStack | IaaS | Round E | undisclosed | July 2021 | 4.6k | 1.6k |
+| **Original Language Technology** | PrimiHub | privacy calculations | Angel rounds + | multi-million dollar | October 2022 | 263 | 60 |
+| **Good Rain Technology** | Rainbond | Cloud Operating System for Enterprise Applications | Pre-A round | millions of dollars | August 2016 | 3.6k | 664 |
+| **Quick use of cloud computing** | QuickTable | Code-free data modeling tools | * | undisclosed | August 2021 | 7 | 3 |
+| **Rayside Technology** | RT-Thread | Internet of Things Operating System | - | undisclosed | January 2020 | 7.6k | 4.2k |
+| **Giant Sequoia Database** | SequoiaDB | Distributed relational database | D round | several hundred million dollars | October 2020 | 305 | 115 |
+| **Borderless Technology** | Shifu | IoT Software Development Framework | Series A | undisclosed | June 2022 | 205 | 21 |
+| **Dingshi Vertical** | StarRocks | MPP Analytical Database | B round | undisclosed | January 2022 | 3.6k | 793 |
+| **Stone Atomic Technology** | StoneDB | Real-time HTAP database | angel round (finance) | tens of millions of dollars | February 2022 | 639 | 100 |
+| **TabbyML**                                   | TabbyML                              | Open Source AI Programming Assistant               | seed round                               | undisclosed                                           | July 2023                             | 13.9k           | 515             |
+| **Taiji graphic** | Taichi | Digital content creation infrastructure | Series A | $50 million | February 2022 | 21.7k | 2.1k |
+| **Titanium-platinum data** | Tapdata | Real-time data service platform | Pre-A + round | Tens of millions of dollars | July 2021 | 223 | 52 |
+| **Throughout data** | TDengine | Time-Series Spatial Big Data Engine | B round | $47 million | May 2021 | 20.1k | 4.6k |
+| **PingCAP** | TiDB | distributed database | Round E | undisclosed | July 2021 | 32.9k | 5.3k |
+| **Digital Paradise** | uni-app | A Unified Front-End Framework with Vue Syntax | B + round | undisclosed | September 2018 | 37.4k | 3.4k |
+| **LINGO TECHNOLOGY**                          | Vanus                                | Large Model Middleware                             | seed round                               | Millions of dollars.                                  | July 2023                             | 2.2k            | 110             |
+| **Future speed**                              | Xorbits                              | Distributed Data Science Computing Framework       | angel round (finance)                    | Millions of dollars.                                  | February 2023                         | 933             | 58              |
+| **Levi Software** | Zabbix | IT operations management | Series A | undisclosed | November 2022 | 2.6k | 766 |
+| **KodeRover** | Zadig | Cloud Native Software Delivery Cloud | Pre-A round | tens of millions of dollars | August 2021 | 1.8k | 636 |
+| **EasySoft Tianchuang** | zentaopms | Agile Project Management | Series A | tens of millions of dollars | October 2021 | 946 | 275 |
+| **Cloud Axis Information** | ZStack | IaaS | * | undisclosed | March 2021 | 1.2k | 380 |
+
+
+
+
+Table 4.2 Investment and Financing of Domestic Open-source LLM Startups (slide to right to view full content)
+
+
+(Hugging Face statistics as of December 7, 2023)
+
+
+
+
+
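+| **Company** | **Latest financing round** | **Date of latest financing** | **Amount of latest financing** | **Model introduction** | **Model name** | **Likes** | **Downloads** |
+| --- | --- | --- | --- | --- | --- | --- | --- |
+| **Baichuan Intelligence (百川智能)** | A round | 2023-10-17 | US$300 million | Strong performance in knowledge Q&A and text creation | Baichuan-7B | 795 | 102k |
+| | | | | | Baichuan-13B-Chat | 612 | 8.29k |
+| | | | | | Baichuan2-13B-Chat | 321 | 133k |
+| **Zhipu AI (智谱 AI)** | B+++++ round | 2023-09-19 | RMB 1.2 billion | Multimodal understanding, tool calling, code interpretation, logical reasoning | ChatGLM-6B | 2.67k | 56.8k |
+| | | | | | ChatGLM2-6B | 1.91k | 97.7k |
+| | | | | | ChatGLM3-6B | 501 | 104k |
+| **Yuanyu Intelligence (元语智能)** | Founding investment | 2022-11-24 | - | Functional dialogue LLM | ChatYuan-large-v2 | 171 | 669 |
+| | | | | | ChatYuan-large-v1 | 108 | 120 |
+| | | | | | ChatYuan-7B | 9 | 3 |
+| **ModelBest (面壁智能)** | Angel round | 2023-04-14 | Tens of millions of RMB | LLM covering text infilling, text generation, and question answering | cpm-bee-10b | 158 | 19 |
+| | | | | | cpm-ant-10b | 22 | 12.6k |
+| | | | | | cpm-bee-1b | 12 | 7 |
+| **Langboat Technology (澜舟科技)** | Pre-A+ round | 2023-03-14 | Hundreds of millions of RMB | Handles multilingual and multimodal data; text understanding and text generation | mengzi-t5-base | 41 | 1.42k |
+| | | | | | mengzi-bert-base | 32 | 1.46k |
+| | | | | | mengzi-t5-base-mt | 17 | 44 |
+| **Tigerobo (虎博科技)** | A round | 2019-03-01 | US$33 million | Multilingual, multitask LLM covering 15 capability categories, including generation, open-domain Q&A, programming, drawing, translation, and brainstorming | tigerbot-70b-chat-v2 | 40 | 1.68k |
+| | | | | | tigerbot-180b-research | 33 | 12 |
+| | | | | | tigerbot-70b-base-v1 | 15 | 3.25k |
+| **DP Technology (深势科技)** | C round | 2023-08-18 | Over RMB 700 million | High-accuracy protein structure prediction model | Uni-Fold-Data | - | 6 |
+| | | | | 3D molecular pre-training model | Uni-Mol-Data | - | 3 |
+| **XVERSE (元象 XVERSE)** | A+ round | 2022-03-11 | US$120 million | LLM with cognition, planning, reasoning, and memory capabilities | XVERSE-13B | 117 | 42 |
+| | | | | | XVERSE-13B-Chat | 42 | 412 |
+| | | | | | XVERSE-65B | 35 | 6.18k |
+| **01.AI (零一万物)** | Angel round | 2023-11-06 | - | General-purpose LLM, followed by multimodal capabilities such as image, speech, and video | Yi-34B | 1.07k | 109k |
+| | | | | | Yi-6B | 303 | 26.7k |
+| | | | | | Yi-34B-200K | 107 | 4.55k |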
+
diff --git a/en/data.md b/en/data.md
new file mode 100644
index 0000000..53f5ac0
--- /dev/null
+++ b/en/data.md
@@ -0,0 +1,1494 @@
+---
+outline: deep
+---
+# OSS Data Analytics
+
+## Overview
+
+The data chapter of the China Open Source Annual Report is built on in-depth, comprehensive data insights and is divided into eight parts. Part 1, **Overall Macro Insights**, gives an overview of the open-source ecosystem globally and in China through analysis of basic events, active repositories, active users, open-source licenses, and programming languages. Part 2, **OpenRank Rank List**, ranks open source projects, enterprises, foundations, developers, and bots, both globally and in China, providing industry with comprehensive and systematic OpenRank indicator information. Parts 3 and 4, **Enterprise Insights** and **Foundation Insights**, use evolution charts and trend analyses to illustrate how global and Chinese enterprises and foundations have evolved in open source. Part 5, **Technology Sector Insights**, studies the Top 10 lists in each field and how their projects have changed, showing the direction and trends of frontier technology. Part 6, **Open Source Project Insights**, examines the diversity and innovative directions of different project types, fields, and topics. Part 7, **Open Source Developer Insights**, analyzes developer types, working hours, geographical distribution, and bot usage to show the diversity and characteristics of the developer community. Part 8, **Case Studies**, offers a series of interesting case analyses that give readers a glimpse of China's booming open-source ecosystem. Overall, the data chapter offers a panorama of China's open-source ecosystem in 2023 through rich data insights and analyses.
+
+### Introduction to indicators
+
+**OpenRank**
+
+OpenRank is a collaboration-network indicator developed by X-lab and built on the network of collaborative relationships between open source developers and projects. It not only characterizes community participation in a project's development but also incorporates elements of the wider open source ecosystem, so that entities such as projects, people, and organizations can be identified and ranked within it. OpenRank is now widely accepted by industry and academia, and has been adopted in the China Institute for Standardization (ISI) series of open source governance standards, the ICT white paper on open source governance, the OpenAtom Open-Source Foundation's global open source dashboard, and enterprise open source program office (OSPO) governance toolkits.
+
+For a definition of this indicator, refer to:
+
+[1] [Shengyu Zhao et al.: OpenRank Leaderboard: Motivating Open Source Collaborations Through Social Network Evaluation in Alibaba. ICSE, 2024](https://www.researchgate.net/publication/3766686121_OpenRank_Leaderboard_Motivating_Open_Source_Collections_Through_Social_Network_Evaluation_in_Alibaba)
+
+[2] [Frank Zhao: How to evaluate an open source project (iii) - value stream, 2021](https://blog.frankzhao.cn/how_to_measure_open_source_3)
+
+[3] Institute for Standardization of the Ministry of Industry and Information: Information Technology Open Source Governance Part 3: Community Governance and Operationalisation [T/CESA 1270.3-2023]; Information Technology Open Source Governance Part 5: Evaluation Model for Open Source Contributors [T/CESA 1270.5-2023], 2023
+
+**Activity**
+
+Activity is a statistical indicator, proposed by X-lab, of how active a developer or project is. A developer's activity is a weighted sum of the developer's behaviors, such as issues, PRs, and code reviews. A project's activity is the sum of the activity of all developers in the project.
+
+For a definition of this indicator, refer to:
+
+[1] [Xiaoya Xia et al.: Exploring Activity and Contributors on GitHub: Who, What, When, and Where. APSEC, 2023](https://ieeexplore.ieee.org/abstract/document/10043221)
+
+[2] [Frank Zhao: How to evaluate an open source project (i) - activity, 2021](https://blog.frankzhao.cn/how_to_measure_open_source_1)
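
A minimal sketch of how such a weighted activity score might be computed. The weights and behavior names below are placeholders chosen for illustration, not the values used by X-lab; the references above give the actual definition. Following the description in this section, a project's activity is taken here as the plain sum of its developers' scores.

```python
# Hypothetical weights per behavior type (assumed for illustration only).
ILLUSTRATIVE_WEIGHTS = {
    "issue_comment": 1.0,
    "open_issue": 2.0,
    "open_pr": 3.0,
    "review_comment": 4.0,
}


def developer_activity(behavior_counts: dict[str, int]) -> float:
    """Weighted sum of one developer's behavior counts."""
    return sum(
        ILLUSTRATIVE_WEIGHTS.get(behavior, 0.0) * count
        for behavior, count in behavior_counts.items()
    )


def project_activity(per_developer: dict[str, dict[str, int]]) -> float:
    """Sum of the activity of every developer in the project."""
    return sum(developer_activity(counts) for counts in per_developer.values())


example = {
    "alice": {"open_pr": 2, "review_comment": 1},  # 2*3 + 1*4 = 10
    "bob": {"open_issue": 1, "issue_comment": 3},  # 1*2 + 3*1 = 5
}
print(project_activity(example))  # 15.0
```

Behaviors without an assigned weight contribute zero in this sketch, so the weight table also acts as the list of behaviors that count.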
+
+
+## 1. Overall Macro Insight
+
+### 1.1 Basic Events
+
+**Basic events** are the data foundation for the analysis in this chapter. They are the event logs generated by developer activity on global open-source collaboration platforms such as GitHub and Gitee. Statistical analysis of these events provides macro-level insight into how the global open-source ecosystem is developing. This year's report covers the collaboration platforms GitHub, Gitee, and GitLink.
+
+#### 1.1.1 Trends in events across GitHub
+
+First, the total number of event logs across GitHub is shown in the graph below.
+
+![1-1](/image/data/chapter_1/1-1.png)
+
+
Figure 1.1 Trends in GitHub annual events
+
+
+Overall open-source activity and the number of active repositories worldwide have increased significantly in recent years, reflecting the rapid growth of global open-source development. In 2023, GitHub event logs reached 1.4 billion, an increase of about 10.32% over 2022. After the high growth of 2018-2020, annual event growth on GitHub gradually slowed, to about 10% in 2023. Given the overall volume, however, even a 10% growth rate continues to highlight the dynamic and critical role of open-source technology in the global digital transformation.
+
+#### 1.1.2 Comparison of overall events trends in GitHub and Gitee
+
+Because of the sheer volume of events on GitHub, the subsequent analysis is based on the top 30,000 active repositories on each platform. For ease of comparison, we selected the eight event types that are most relevant to open source participation and available on both GitHub and Gitee: CommitCommentEvent, ForkEvent, IssueCommentEvent, IssuesEvent, PullRequestEvent, PullRequestReviewCommentEvent, PushEvent, and WatchEvent.
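
The event filter described above can be sketched as follows. This is an illustration, not the report's actual pipeline: it assumes each event is a dictionary with `type` and `created_at` fields, as in GitHub's public event payloads, it uses GitHub's spellings of the event type names, and the helper name `count_events_by_year` is hypothetical.

```python
from collections import Counter

# The eight selected event types, using GitHub's event type names.
SELECTED_EVENT_TYPES = frozenset({
    "CommitCommentEvent",
    "ForkEvent",
    "IssueCommentEvent",
    "IssuesEvent",
    "PullRequestEvent",
    "PullRequestReviewCommentEvent",
    "PushEvent",
    "WatchEvent",
})


def count_events_by_year(events):
    """Count selected events per (year, type).

    Assumes each event is a dict with a `type` string and an ISO 8601
    `created_at` timestamp such as "2023-05-01T12:00:00Z".
    """
    counts = Counter()
    for event in events:
        if event["type"] in SELECTED_EVENT_TYPES:
            year = int(event["created_at"][:4])
            counts[(year, event["type"])] += 1
    return counts


sample = [
    {"type": "PushEvent", "created_at": "2023-05-01T12:00:00Z"},
    {"type": "PushEvent", "created_at": "2023-06-11T08:30:00Z"},
    {"type": "WatchEvent", "created_at": "2022-01-02T00:00:00Z"},
    {"type": "GollumEvent", "created_at": "2023-01-01T00:00:00Z"},  # not selected
]
print(count_events_by_year(sample))
```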
+
+![1-2](/image/data/chapter_1/1-2.png)
+
+
Figure 1.2 GitHub and Gitee Active Repository Events
+
+
+The Gitee platform showed a more pronounced growth trend. Since 2021, the number of events in its top 30,000 active repositories has even surpassed GitHub's, highlighting the surge of active open-source projects in China. Domestic developers' active participation in and contribution to open-source communities have injected new dynamism into technological innovation and knowledge sharing.
+
+However, it must be emphasized that data on the top 30,000 active projects alone does not fully reveal the reality of the GitHub platform as a whole, as the long-tail effect is still evident globally. Subsequent analyses will reflect this more clearly, especially the broad and diverse nature of GitHub as the world's leading open-source community. In the future, as technology evolves and open-source culture spreads, the Chinese open-source community can be expected to continue to flourish globally.
+
+A further breakdown of the basic events by type is shown in the figure below.
+
+![1-3](/image/data/chapter_1/1-3.png)
+
+
Figure 1.3 GitHub vs. Gitee Active Repository Event Types
+
+
+The analysis shows the following:
+
+On GitHub, the most frequent event type is the Push event, while Pull Request events and Issue Comment events rank second and third, respectively. The share of each event type has remained relatively stable, reflecting an open-source community ecosystem on GitHub that is settling into a steady state.
+
+On the Gitee platform, event data grew significantly in 2020, initially concentrated in Watch events. After 2020, however, Pull Request and Review events grew rapidly, becoming the largest event type in 2022 and continuing to grow steadily in 2023. This structural change reflects a significant shift in the role of domestic developers from observers to contributors, which is consistent with observations worldwide.
+
+#### 1.1.3 GitLink Events Analysis
+
+For the GitLink platform, we have also selected the top 30,000 active repositories as the benchmark. Given the limitations of the data, only six event types were selected for analysis: CommitCommentEvent, ForkEvent, IssueCommentEvent, IssuesEvent, PullRequestEvent, and WatchEvent.
+
+![1-17](/image/data/chapter_1/1-17.png)
+
+
Figure 1.4 Data analysis of events on the GitLink platform
+
+
+While the number of active repository events on GitLink still lags behind platforms like GitHub and Gitee, it exhibits a notable upward trend. On the GitLink platform, Issues events and CommitComment events constitute the vast majority of active repository events.
+
+### 1.2 Active Repository
+
+#### 1.2.1 Trends in GitHub total number of active repositories
+
+The following figure shows the trend in the total number of active repositories on GitHub.
+
+![1-4](/image/data/chapter_1/1-4.png)
+
+
Figure 1.5 Trends in the number of GitHub annual active repositories
+
+
+
+According to overall data for 2023, the total number of active repositories worldwide reached 87.92 million, a 4.06% increase over the previous year. This is consistent with the overall trend in events: after a period of high growth from 2018 to 2020, the growth rate has declined year by year. The slowdown may be related to the COVID-19 pandemic and global economic developments.
+
+Because of the large gap between GitHub and Gitee in the number of repositories, the following analysis is again based on the top 30,000 active repositories on each platform.
+
+#### 1.2.2 Comparison of the overall activity of GitHub and Gitee
+
+The graph below compares the overall activity of active repositories on GitHub and Gitee.
+
+![1-5](/image/data/chapter_1/1-5.png)
+
+
Figure 1.6 GitHub vs. Gitee active repository activity
+
+
+Looking at the activity data of the top 30,000 active repositories from each platform, the overall activity on the Gitee platform grew rapidly from 2019 onwards. By 2022, it surpassed GitHub and maintained this high-growth trend, revealing the enormous vitality of open-source development in China during this period.
+
+![1-6](/image/data/chapter_1/1-6.png)
+
+
Figure 1.7 GitHub compared to Gitee active repository activity
+
+
+Furthermore, the detailed analysis of the composition of the activity reveals the following:
+
+
+On the GitHub platform, "Create PR" events account for nearly half of the total activity, and "Merge PR" events for approximately one-fourth. PR reviews contribute around 10%, while issue creation and issue comments contribute nearly equal shares, at around 7%.
+
+On the Gitee platform, PR reviews contribute the most activity, making up two-thirds of the total. As on GitHub, "Merge PR" events come next, with a share comparable to that on GitHub. Surprisingly, "Create PR" events, which contribute the largest share of activity on GitHub, contribute the least on Gitee, at only 2% of total activity.
+
+#### 1.2.3 Comparison of OpenRank trends of GitHub and Gitee active repositories
+
+The graph below compares the OpenRank trends of active repositories on GitHub and Gitee.
+
+![1-7](/image/data/chapter_1/1-7.png)
+
+
Figure 1.8 GitHub vs. Gitee Active Repository OpenRank
+
+
+Although the activity of the top 30,000 repositories on Gitee briefly surpassed that of GitHub in 2022, the influence gap measured by OpenRank remains significant (approximately 5:2). Not only is the gap considerable but there also seems to be no indication of it narrowing in terms of trends. This is particularly noteworthy and underscores a key area of focus for future open-source development in China.
+
+
+### 1.3 Active users
+
+#### 1.3.1 Trends in the total number of active users on GitHub
+
+The following figure presents a statistical analysis of the overall active user count on GitHub.
+
+![1-8](/image/data/chapter_1/1-8.png)
+
+
Figure 1.9 Trends in GitHub annual active users
+
+
+In 2023, the total number of active developers on GitHub reached 21.93 million, an increase of 8.88% over the previous year. As with active repositories, after nearly five years of high growth, the growth rate began to decline in 2020. Although GitHub officially announced at the beginning of 2023 that its total number of users had surpassed 100 million, the growth of active users on the platform has slowed, which is to some extent related to changes in the global situation and the rise of platforms such as Gitee.
+
+#### 1.3.2 Active user geographical distribution and ranking
+
+This annual report is able to include a detailed geolocation analysis of GitHub developers thanks to the contribution of an award-winning entry in the OpenDigger open-source software ecosystem data analysis and mining platform challenge ([OpenSODA](https://github.com/ECNU/OpenSODA)).
+
+The following analysis is based on approximately 2 million developers who have correctly filled in their geographical location information out of the 10 million active developers on GitHub in 2023. Considering the total registered users on GitHub to be 100 million, the sampling ratio is approximately 2%.
+
+**1. Geographical distribution of global developers**
+
+First, we analyze the geographical distribution of developers worldwide, as shown in the following chart.
+
+![1-9](/image/data/chapter_1/1-9.png)
+
+
Figure 1.10 Global geographical distribution of developers
+
+
+
Table 1.1 Global Developer Distribution by Country/Region (Top 15)
+
+
+| Ranking | Country/Region | Total Number | Percentage | Annual Active | Active Rate |
+| :-----: | :------------: | :----------: | :--------: | :-------------: | :---------: |
+| 1 | United States | 408983 | 21.09% | 236899 | 57.92% |
+| 2 | India | 177669 | 9.16% | 107066 | 60.26% |
+| 3 | China | 171039 | 8.82% | 126238 | 73.81% |
+| 4 | Brazil | 114855 | 5.92% | 83932 | 73.08% |
+| 5 | Germany | 88767 | 4.58% | 64836 | 73.04% |
+| 6 | United Kingdom | 83245 | 4.29% | 55175 | 66.28% |
+| 7 | Canada | 65241 | 3.36% | 42238 | 64.74% |
+| 8 | France | 57480 | 2.96% | 40341 | 70.18% |
+| 9 | Russia | 47213 | 2.43% | 31534 | 66.79% |
+| 10 | Australia | 31638 | 1.63% | 20512 | 64.83% |
+| 11 | Poland | 31469 | 1.62% | 21792 | 69.25% |
+| 12 | Japan | 30873 | 1.59% | 21942 | 71.07% |
+| 13 | Netherlands | 30617 | 1.58% | 21685 | 70.83% |
+| 14 | Spain | 28928 | 1.49% | 19509 | 67.44% |
+| 15 | South Korea | 28325 | 1.46% | 21811 | 77.00% |
+
+Overall, developers from various countries are continuously increasing:
+
+- The United States ranks first due to its early involvement in the open-source domain and its advantage in technology talent.
+- Scaling the number of United States developers in the table (about 409,000) up to GitHub's full user base gives an estimate of around 21.01 million developers from the United States, a deviation of approximately 4% from the official figure released by GitHub (22 million).
+- India, China, and Brazil, with their large population bases, rank second, third, and fourth in terms of the number of developers. However, based on the activity rate (annual active users/total users), China has the highest rate among the top four.
+- Developers from European countries also constitute a significant force in the open-source community, collectively ranking second in volume.
+- According to the official data released by GitHub and Gitee (both around 12 million), the total number of global open-source developers from China is likely to exceed 20 million, roughly equivalent to the number from the United States in quantity alone.
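
The activity-rate comparison in the list above can be reproduced directly from Table 1.1. The following is a minimal Python sketch, assuming only the definition given in the text (annual active users / total users). The names `TABLE_1_1`, `active_rate`, and `rank_by_active_rate` are illustrative and are not part of OpenDigger or any other tool used for the report.

```python
# Rows copied from Table 1.1 (top four by developer count):
# (country/region, located developers, annual active developers)
TABLE_1_1 = [
    ("United States", 408983, 236899),
    ("India", 177669, 107066),
    ("China", 171039, 126238),
    ("Brazil", 114855, 83932),
]


def active_rate(total: int, annual_active: int) -> float:
    """Active rate as a percentage: annual active users / total users."""
    return 100.0 * annual_active / total


def rank_by_active_rate(rows):
    """Return (country, rate) pairs sorted from highest to lowest active rate."""
    rates = [(name, round(active_rate(total, active), 2)) for name, total, active in rows]
    return sorted(rates, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, rate in rank_by_active_rate(TABLE_1_1):
        print(f"{name}: {rate:.2f}%")
```

With these four rows, China comes out highest at 73.81%, followed by Brazil, India, and the United States, which matches the Active Rate column of the table.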
+
+**2. Geographical distribution of Chinese developers**
+
+Further analysis yields the geographical distribution of Chinese developers, shown in the graph below. The data source is the nearly 150,000 developers located in China who correctly filled in their provincial information.
+
+![1-10](/image/data/chapter_1/1-10.png)
+
+
Figure 1.11 Geographical distribution of Chinese developers
+
+
+According to GitHub data for Q3 2023, the total number of Chinese developers is approximately 18.8 million; the actual number of developers in each province can be estimated in proportion to this total.
+
+
Table 1.2 Distribution of Chinese Developers (Top 15)
+
+
+
+| Ranking | Province/Region | Total Number | National Percentage | Estimated Actual Total (10,000s) |
+| :-----: | :-------------: | :----------: | :-----------------: | :------------------------------: |
+| 1 | Beijing | 32982 | 22.04% | 262.25 |
+| 2 | Shanghai | 24581 | 16.43% | 195.45 |
+| 3 | Guangdong | 21684 | 14.49% | 172.41 |
+| 4 | Zhejiang | 14256 | 9.53% | 113.35 |
+| 5 | Taiwan | 12173 | 8.13% | 96.79 |
+| 6 | Jiangsu | 7335 | 4.90% | 58.32 |
+| 7 | Sichuan | 7012 | 4.69% | 55.75 |
+| 8 | Hong Kong | 4678 | 3.13% | 37.19 |
+| 9 | Hubei | 4415 | 2.95% | 35.10 |
+| 10 | Shaanxi | 2815 | 1.88% | 22.38 |
+| 11 | Fujian | 2405 | 1.61% | 19.12 |
+| 12 | Shandong | 2035 | 1.36% | 16.18 |
+| 13 | Hunan | 1858 | 1.24% | 14.77 |
+| 14 | Chongqing | 1833 | 1.22% | 14.57 |
+| 15 | Anhui | 1487 | 0.99% | 11.82 |
+
+The rankings and data in the table above reveal a correlation between the number of open-source developers in China and the level of regional economic development:
+
+- The number of open-source developers in each of Beijing, Shanghai, Guangdong, and Zhejiang has surpassed the one-million level, with Beijing standing out in particular;
+- Taiwan and Hong Kong rank fifth and eighth, respectively, highlighting the importance of the Hong Kong and Taiwan regions;
+- The number of open-source developers in the Yangtze River Delta (Jiangsu-Zhejiang-Shanghai) region has reached almost 38.8 million;
+- Central and western regions such as Sichuan, Hubei, and Shaanxi have also performed well, particularly Sichuan, which has attracted a large number of developers through its favorable environment and fast-growing software industry.
+
+### 1.4 Open source licenses
+
+#### 1.4.1 Number of repositories using open-source licenses
+
+The graph below shows the number of active GitHub repositories using each open-source license.
+
+
+
+
+
+
Figure 1.12 Number of repositories using open source licenses
+
+
+The analysis shows that the most widely used open-source licenses currently include the MIT License, the Apache License 2.0, the GNU General Public License v3.0, and the BSD 3-Clause License. Of these, the MIT License ranks first, reaching 60%. The MIT License is named after the Massachusetts Institute of Technology. Its simplicity and flexibility have made it the choice of many developers, and it imposes minimal legal restrictions, encouraging developers to use and distribute software freely.
+
+#### 1.4.2 Trends in Open-Source Licensing Types
+
+Statistical analysis has been conducted on the trends of open-source license types, as shown in the following figures.
+
+![1-12](/image/data/chapter_1/1-12.png)
+
+
Figure 1.13 Trends in the Number of Open Source License Types
+
+
+Overall, the number of open-source license types has continuously increased since 2017. Introducing licenses such as the Eclipse Public License 2.0, the European Union Public License 1.2, and others contributed to the growth observed between 2017 and 2018. Subsequently, the growth rate of open-source license types slowed down. Between 2021 and 2022, a new batch of open-source licenses, such as the Mulan Series Licenses and the CERN License v2, began to emerge. Following this, the development trend stabilized, and currently, the mainstream license types on GitHub have remained steady at 46 types for two years.
+
+#### 1.4.3 Trends in the Number of Repositories Using Open Source Licenses
+
+According to GitHub's log data, in 2023, nearly 7.7 million active repositories used various open-source licenses, accounting for 8.76% of all active repositories. We present the MIT License's data separately due to its significant influence.
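
The 8.76% share quoted above follows from the two totals given in this report. The snippet below is a small Python sketch of that arithmetic. It assumes the rounded figures from the text (7.7 million licensed repositories and 87.92 million active repositories), and the names `share_percent`, `LICENSED_ACTIVE_REPOS`, and `TOTAL_ACTIVE_REPOS` are illustrative.

```python
def share_percent(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Figures quoted in this report for 2023 (both are rounded in the text).
LICENSED_ACTIVE_REPOS = 7.7e6   # active repositories with an open-source license
TOTAL_ACTIVE_REPOS = 87.92e6    # all active repositories (Section 1.2.1)

if __name__ == "__main__":
    share = share_percent(LICENSED_ACTIVE_REPOS, TOTAL_ACTIVE_REPOS)
    print(f"Licensed share of active repositories: {share:.2f}%")
```

Because both inputs are rounded, the result should be read as a consistency check on the reported percentage rather than as an exact figure.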
+
+**1. Trends in the Number of Repositories Using the MIT License**
+
+Statistical analysis of the trends in the number of repositories using the MIT License is shown in the following figure.
+
+![1-13](/image/data/chapter_1/1-13.png)
+
+
Figure 1.14: Trends in the Number of Repositories Using the MIT License
+
+
+Observations:
+
+- The MIT License is currently the most popular open-source license, with 1.58 million active repositories in 2023.
+- The trends in the number of repositories using the MIT License are similar to those of the total repository count, with significant growth observed. However, the growth rate slowed down in 2022 and 2023, which correlates with the overall slowdown in project growth.
+
+**2. Trends in the Number of Repositories Using Other Top Five Open Source Licenses**
+
+The following figure shows a statistical analysis of the trends in the number of repositories using other top-five open-source licenses.
+
+![1-14](/image/data/chapter_1/1-14.png)
+
+
Figure 1.15: Trends in the Number of Repositories Using Other Licenses
+
+
+Observations:
+
+- The number of open-source licenses is growing, with MIT, Apache, and GNU licenses remaining the top choices.
+- Differences between niche and popular open-source licenses still exist.
+- Since 2022, the usage of GNU General Public License (GPL) versions 2 and 3 has been declining overall, while GNU Affero General Public License version 3 has been increasing yearly.
+
+#### 1.4.4 Trends in the Number of Repositories Using the Mulan Series Licenses
+
+The following figure shows a statistical analysis of the trends in the number of repositories using the Mulan Series Licenses.
+
+![1-15](/image/data/chapter_1/1-15.png)
+
+
Figure 1.16 Accumulative Trends in the Number of Repositories Using the Mulan Series Licenses
+
+
+The Mulan Series Licenses (including the Mulan Permissive Software License and the Mulan Public License, among others) are drafted, revised, and released by Peking University, with the support of the National Standardization Technical Committee on Cloud Computing and the China Open Source Cloud Alliance. As the first open-source license from China to be approved by the Open Source Initiative (OSI), the Mulan Permissive Software License (Mulan PSL) holds significant influence.
+
+Observations indicate a growth in repositories utilizing the Mulan licenses starting September 2022. By December 2023, there were 220 such active repositories, showcasing the increasing influence of Mulan open-source licenses.
+
+### 1.5 Programming Languages
+
+#### 1.5.1 Top Programming Languages Used by Developers in 2023
+
+The popularity of programming languages is of great interest to developers. The analysis below presents the most popular programming languages among developers in 2023, as shown in the following table.
+
+
Table 1.3: Top 20 Programming Languages Used by Developers
+
+
+| Rank | Programming Language | Number of Developers Using | Number of Repositories Using |
+|:-------:|:-----------------------:|:-------------------------------:|:--------------------------------:|
+| 1 | JavaScript | 765,589 | 1,806,477 |
+| 2 | Python | 629,423 | 653,025 |
+| 3 | HTML | 564,121 | 676,364 |
+| 4 | TypeScript | 462,729 | 886,453 |
+| 5 | Java | 368,795 | 463,660 |
+| 6 | CSS | 190,480 | 239,187 |
+| 7 | C++ | 177,905 | 135,330 |
+| 8 | C# | 158,159 | 180,537 |
+| 9 | Go | 143,433 | 165,367 |
+| 10 | PHP | 128,186 | 272,980 |
+| 11 | Jupyter Notebook | 122,475 | 102,708 |
+| 12 | Shell | 122,456 | 108,209 |
+| 13 | C | 107,918 | 80,159 |
+| 14 | Rust | 69,370 | 72,778 |
+| 15 | Ruby | 66,857 | 374,835 |
+| 16 | Kotlin | 64,307 | 62,709 |
+| 17 | Vue | 56,099 | 170,639 |
+| 18 | SCSS | 50,526 | 44,672 |
+| 19 | Dart | 46,143 | 43,006 |
+| 20 | Swift | 33,839 | 35,978 |
+
+From the table above:
+
+- The five programming languages most used by developers are JavaScript, Python, HTML, TypeScript, and Java. Starting with sixth-ranked CSS, the number of developers drops by nearly half compared with fifth-ranked Java.
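
The drop between Java and CSS noted above can be checked against the developer counts in Table 1.3. The following Python sketch computes the relative decrease between adjacent ranks for the first six rows. The names `DEVELOPERS_BY_LANGUAGE`, `relative_drop`, and `rank_to_rank_drops` are illustrative and do not come from the report's tooling.

```python
# Developer counts copied from the first six rows of Table 1.3.
DEVELOPERS_BY_LANGUAGE = {
    "JavaScript": 765_589,
    "Python": 629_423,
    "HTML": 564_121,
    "TypeScript": 462_729,
    "Java": 368_795,
    "CSS": 190_480,
}


def relative_drop(higher: int, lower: int) -> float:
    """Percentage decrease when going from `higher` to `lower`."""
    return 100.0 * (higher - lower) / higher


def rank_to_rank_drops(counts: dict) -> list:
    """Drop between each pair of adjacent ranks, in table order."""
    names = list(counts)  # dicts preserve insertion order, i.e. rank order
    return [
        (names[i], names[i + 1], relative_drop(counts[names[i]], counts[names[i + 1]]))
        for i in range(len(names) - 1)
    ]


if __name__ == "__main__":
    for upper, lower, drop in rank_to_rank_drops(DEVELOPERS_BY_LANGUAGE):
        print(f"{upper} -> {lower}: -{drop:.2f}%")
```

The step from Java to CSS is a decrease of about 48%, by far the largest gap between adjacent ranks among these six languages, which is why the top five stand apart as a group.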
+
+#### 1.5.2 Trends in Programming Language Usage from 2019 to 2023
+
+Statistical analysis of developers' programming language usage trends from 2019 to 2023 is depicted in the following figure.
+
+![1-16](/image/data/chapter_1/1-16.png)
+
+
Figure 1.17: Trends in Programming Language Usage from 2019 to 2023
+
+
+Observations from the figure:
+
+- JavaScript, Python, HTML, TypeScript, and Java are the leading programming languages developers use.
+- Python and TypeScript have shown rapid growth compared to the other three primary languages and have maintained a consistently rapid growth trend over the past five years.
+- TypeScript, in particular, has experienced rapid growth in the number of users over the past five years. In 2021, it significantly surpassed other programming languages, becoming one of the main programming languages developers use. Perhaps by 2024, the number of developers using it will be comparable to the number of developers using HTML, which is ranked third.
+
+
+
+## 2. OpenRank Rankings
+
+**Rankings** are a popular form of presenting analysis results.
+
+The 2023 China Open Source Annual Report presents the rankings in a dedicated section. This is partly to better showcase the development trends of the various entities (repositories/projects, countries/regions, enterprises, foundations, developers, etc.) in the open source ecosystem; another important reason is the maturation of the OpenRank metric and the completeness of the global data.
+
+With the addition of global data from both GitHub and Gitee this year, we are able to take a global perspective with China's open source as the starting point, allowing the world to see the joint efforts and contributions of Chinese enterprises, foundations, developers, and other entities in developing the global open-source ecosystem, which is not available in other reports on the market.
+
+### 2.1 Global Open Source Repository OpenRank Rankings
+
+![2-1](/image/data/chapter_2/2-1.png)
+
+
Figure 2.1 Global Open Source Project OpenRank Rankings (Top 20)
+
+### 2.2 China Open Source Project OpenRank Rankings
+
+![2-2](/image/data/chapter_2/2-2.png)
+
+
Figure 2.2 China Open Source Project OpenRank Rankings (Top 20)
+
+
+> Chinese open-source projects are based on data from the OpenDigger project tags, and a single project may include multiple organizations or repositories on GitHub or Gitee platforms.
+
+### 2.3 Global Enterprise OpenRank Rankings
+
+![2-3](/image/data/chapter_2/2-3.png)
+
+
Figure 2.3 Global Enterprise OpenRank Rankings (Top 20)
+
+
+> Enterprise rankings are based on OpenDigger project tag data: an enterprise's OpenRank is the sum of the OpenRank of all open source projects it initiated, including projects donated to foundations.
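
The aggregation rule in the note above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the OpenDigger implementation: `INITIATED_BY` is a hypothetical stand-in for the OpenDigger project tags, and the three OpenRank values are copied from the 2023 tables in Section 5.3 of this report. Real enterprise rankings cover many more projects, so the totals here are not the enterprise scores shown in the figure.

```python
from collections import defaultdict

# Project OpenRank values copied from the 2023 tables in Section 5.3.
PROJECT_OPENRANK = {
    "PaddlePaddle/Paddle": 5408.62,
    "apache/doris": 4307.26,        # donated to the Apache Foundation, still counted
    "ClickHouse/ClickHouse": 4941.99,
}

# Hypothetical label data: which enterprise initiated each project.
INITIATED_BY = {
    "PaddlePaddle/Paddle": "Baidu",
    "apache/doris": "Baidu",
    "ClickHouse/ClickHouse": "Yandex",
}


def enterprise_openrank(project_openrank: dict, initiated_by: dict) -> list:
    """Sum project OpenRank per initiating enterprise, highest total first."""
    totals = defaultdict(float)
    for project, score in project_openrank.items():
        enterprise = initiated_by.get(project)
        if enterprise is not None:  # unlabeled projects are skipped
            totals[enterprise] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for enterprise, total in enterprise_openrank(PROJECT_OPENRANK, INITIATED_BY):
        print(f"{enterprise}: {total:.2f}")
```

Keying the sum on the initiating enterprise rather than on the current repository owner is what lets a donated project such as Doris keep contributing to its original enterprise's total.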
+
+### 2.4 China Enterprise OpenRank Rankings
+
+![2-4](/image/data/chapter_2/2-4.png)
+
+
Figure 2.4 China Enterprise OpenRank Rankings (Top 20)
+
+### 2.5 Global Foundation OpenRank Rankings
+
+![2-5](/image/data/chapter_2/2-5.png)
+
+
Figure 2.5 Global Foundation OpenRank Rankings (Top 10)
+
+### 2.6 Country and Region OpenRank Rankings
+
+![2-6](/image/data/chapter_2/2-6.png)
+
+
Figure 2.6 Country and Region OpenRank Rankings (Top 20)
+
+
+> Country and region data is based on location information filled in by GitHub developers, with a sample size of the top 10 million OpenRank users globally.
+
+### 2.7 Global Developer OpenRank Rankings
+
+![2-7](/image/data/chapter_2/2-7.png)
+
+
Figure 2.7 Global Developer OpenRank Rankings (Top 30)
Figure 2.8 China Developer OpenRank Rankings (Top 30)
+
+
+> Chinese developer accounts are based on OpenDigger tag data.
+
+## 3. Enterprise Insights
+
+Enterprises are the core force driving the development of the global open-source ecosystem. They are initiators, as well as developers and maintainers, at the forefront of the development and commercial exploration of open-source projects.
+
+
+### 3.1 Evolution of Global Enterprise OpenRank Over the Past 10 Years
+
+![3-1](/image/data/chapter_3/3-1.png)
+
+
Figure 3.1 Changes in Global Enterprise OpenRank Rankings
+
+
+Observations on the global impact of enterprise open source are as follows:
+
+- Microsoft began investing in open source over a decade ago (in 2008) and reached the pinnacle of global open source influence in 2016, a position it has held unchallenged to this day.
+- Since being officially sanctioned by the United States in 2019, Huawei has made open source a strategic priority. Its influence has soared ever since, and it surpassed Google and Amazon this year.
+- Alibaba was the leader in domestic open source until 2021 and has maintained its sixth position globally.
+- Ant Group's performance over the past three years has been remarkable, and it entered the global top ten in 2023.
+- Baidu, the fourth largest player in domestic open source, has fallen to 12th globally due to rapid changes in the domestic open source landscape.
+- According to the [OpenLeaderboard](https://open-leaderboard.x-lab.info/), Chinese enterprises in the global top 30 also include ByteDance (18), PingCAP (19), Fit2Cloud (24), Deepin (25), Tencent (26), and Espressif (27).
+
+### 3.2 Evolution of China Enterprise OpenRank Over the Past 10 Years
+
+![3-2](/image/data/chapter_3/3-2.png)
+
+
Figure 3.2 Changes in China Enterprise OpenRank Rankings
+
+
+This chart clearly shows the open-source strategies of domestic companies and how they have changed:
+
+- Huawei began to make a concerted effort in 2019 and, in just two years, reached first place in China and second place globally.
+- As long-standing domestic leaders in open source, Alibaba and Ant Group have shown stable performance.
+- Baidu has slipped to fourth place due to competition from the first three.
+- ByteDance has made visible and rapid progress in recent years.
+- Espressif (Espressif Systems) is a relatively low-profile semiconductor open-source leader in China.
+- Fit2Cloud is another low-key but pragmatic open-source enterprise, with several of its open-source software products highly favored by developers.
+- Tencent, PingCAP, JD, and TAOS have shown a slight downward trend in the past two years, indicating that competition in the post-pandemic era will intensify.
+
+
+### 3.3 Proportion of China Enterprises' OpenRank on GitHub/Gitee Platforms
+
+
+
+
+
+
+
Figure 3.3 Proportion of China Enterprises' OpenRank among Global Enterprises (Left) and Comparison of OpenRank between Chinese and American Enterprises at the Project Level (Right)
+
+
+The left chart shows the growing influence of Chinese enterprises in the global open source ecosystem, while the right chart reflects the shifting balance between China and the United States in the post-trade-war era, especially after the pandemic. The influence of Chinese open source has risen significantly, as has the influence of companies like Huawei. However, the gap between Chinese and American enterprises in overall open source influence is still significant (about a threefold difference). Even so, this momentum is very promising for the future.
+
+## 4. Foundations Insights
+
+This section examines the development of the open-source ecosystem from the perspective of foundations. Foundations are non-profit organizations that play a crucial role in organizing, developing, and innovating open-source projects and communities. They provide comprehensive support in technology, operations, and law to incubate open-source software and guide the building and operation of open-source communities. Foundations act as incubators and accelerators and are essential organizers of the open-source ecosystem. This year, we have included a separate section on insights from open-source foundations, where we can see the global impact of China's open-source foundations.
+
+### 4.1 Global Foundation OpenRank trend analysis
+
+
+
+
+
+
+
Figure 4.1 Global Foundation OpenRank Overall Trend
+
+
+The following trends can be seen:
+
+- The Apache Foundation, ranked first, has developed at a mature and steady pace and remains the first choice for many companies when globalizing their projects;
+- The OpenAtom Open Source Foundation was founded just over three years ago, yet its projects have developed rapidly; their total influence exceeds that of the Linux Foundation's sub-foundations, ranking second only to the Apache Foundation;
+- LF AI & Data ranks third, overtaking the cloud-native CNCF thanks to advances in AI;
+- The other (sub-)foundations have generally developed relatively steadily.
+
+### 4.2 Global Foundation project OpenRank trend analysis
+
+
+
+
+
Figure 4.2 Global Foundation Project OpenRank Trends
+
+
+Among open source projects under global foundations:
+
+- Kubernetes continues to rank first, but its influence is declining every year, giving way to projects in emerging areas;
+- Doris, an open source real-time data warehouse initiated by Baidu under the Apache Foundation, has grown rapidly in recent years and ranks second;
+- OpenHarmony, a project of the OpenAtom Open Source Foundation, follows close behind with its various sub-repositories. If combined, they would rank first.
+
+### 4.3 OpenRank Trend Analysis of Chinese Projects under Foundations
+
+
+
+
+
Figure 4.3 OpenRank Trends of Chinese Projects under Foundations
+
+
+Chinese projects under various foundations are examined separately:
+
+- Doris and OpenHarmony are developing most noticeably;
+- The Milvus Vector Database has experienced rapid growth due to demand in the AIGC domain;
+- Projects like Flink and ShardingSphere are relatively stable.
+
+### 4.4 OpenRank Trend Analysis of Projects under the OpenAtom Foundation
+
+
+
+
+
Figure 4.4 OpenRank Trends of Projects under the OpenAtom Foundation
+
+
+This year marks the first time we can observe the development of projects under the OpenAtom Foundation:
+
+- The top three are OpenHarmony, openEuler, and Anolis, reflecting the dominant position of operating systems; OpenHarmony in particular is developing the fastest;
+- Other listed projects are developing steadily, and we look forward to their progress in the new year.
+
+
+## 5. Technological insights
+
+The technology field is rapidly evolving, especially in various subfields. **Operating systems** are being developed for new architectures, **cloud native** is driving digital transformation, **databases** are becoming the infrastructure for data innovation, **big data** is facilitating intelligent decision-making, **artificial intelligence** is accelerating automation in various industries, and **front-end** technologies are focusing on interaction and aesthetics. These areas are at the forefront of technology, attracting innovators and investors and creating a booming trend. In this section, we provide insights into these six areas in terms of two metrics: influence and activity.
+
+### 5.1 Overall development trend of six major technology areas in the past five years
+
+![5-1](/image/data/chapter_5/5-1.png)
+
+
Figure 5.1 Trends in OpenRank by subfield over the last 5 years
+
+![5-2](/image/data/chapter_5/5-2.png)
+
+
Figure 5.2 Trends in activity by subfield over the past five years
+
+
+Cloud-native computing and artificial intelligence (AI) have gained popularity in the past five years, reflected in their increased number of repositories. Databases remain critical, while the influence of front-end development is shrinking. Operating systems have a smaller number of repositories but hold great value.
+
+### 5.2 5-Year Trends in OpenRank and Activity for the Top 10 Projects in Each Technology Area
+
+#### 5.2.1 Cloud Native
+
+![5-3](/image/data/chapter_5/5-3.png)
+
+
Figure 5.3 Trends in the Cloud-Native Top 10 OpenRank Projects over the Last Five Years
+
+![5-4](/image/data/chapter_5/5-4.png)
+
+
Figure 5.4 Cloud-Native Top 10 Active Project Trends in the Last Five Years
+
+
+
+Both indicators for Kubernetes have decreased significantly, while Grafana has become the most influential project. The llvm-project has shown remarkable growth and has become the most active project over the past three years. LLVM is a compiler framework comprising a collection of modular and reusable compiler and toolchain technologies. Its rapid growth in popularity among developers is a testament to its effectiveness.
+
+#### 5.2.2 Artificial intelligence
+
+![5-5](/image/data/chapter_5/5-5.png)
+
+
Figure 5.5 Trends in the AI Top 10 OpenRank Projects over the Last Five Years
+
+![5-6](/image/data/chapter_5/5-6.png)
+
+
Figure 5.6 Artificial Intelligence Top 10 Active Project Trends in the Last Five Years
+
+
+
+TensorFlow has been declining and is out of the top 5, while PyTorch is growing and widening the gap. LangChain, an open-source project created by Harrison Chase, has ranked second in both indicators since its launch in October 2022 and is now one of the most popular frameworks for LLM development.
+
+#### 5.2.3 Big Data
+
+![5-7](/image/data/chapter_5/5-7.png)
+
+
Figure 5.7 Trends in the Big Data Top 10 OpenRank Projects in the Last Five Years
+
+![5-8](/image/data/chapter_5/5-8.png)
+
+
Figure 5.8 Big Data Top 10 Active Projects Trends in the Last 5 Years
+
+
+Kibana and Grafana are the top two big data solutions, with a consistent upward trend. Grafana is predicted to surpass Kibana and become the top-ranked solution in the future.
+
+Kibana is an open-source tool for data visualization and exploration, tightly integrated with ElasticSearch.
+
+Grafana is an open-source tool for monitoring and reporting. It can visualize data from various sources, including Prometheus, InfluxDB, and Graphite, among others. Grafana's data processing and visualization features enable the creation of different charts and dashboards.
+
+#### 5.2.4 Database
+
+![5-9](/image/data/chapter_5/5-9.png)
+
+
Figure 5.9 Trends in the Database Top 10 OpenRank Projects over the Last Five Years
+
+
+![5-10](/image/data/chapter_5/5-10.png)
+
+
Figure 5.10 Database Top 10 Active Project Trends in the Last Five Years
+
+
+Doris is the fastest-growing database, with activity metrics nearing the top spot, while ElasticSearch is dropping back in popularity. It is predicted that Doris will surpass ClickHouse in the future.
+
+ClickHouse is an open-source analytical database with an MPP architecture, developed by Yandex. It is designed to analyze large amounts of data and is claimed to be 100-1000x faster than traditional databases. Its key feature is a high-performance vectorized execution engine, and it is also known for its rich functionality and reliability.
+
+Apache Doris is an open-source MPP analytical database contributed by Baidu. Its distributed architecture is simple and easy to operate and maintain.
+
+#### 5.2.5 Frontend
+
+![5-11](/image/data/chapter_5/5-11.png)
+
+
Figure 5.11 Trends in the Frontend Top 10 OpenRank Projects over the Last Five Years
+
+![5-12](/image/data/chapter_5/5-12.png)
+
+
Figure 5.12 Frontend Top 10 Active Project Trends in the Last Five Years
+
+
+While declining in both indicators year over year, Flutter still has a clear advantage over Next.js, which started to gain momentum in 2023 and is rising significantly. The projects ranked 3rd to 10th are highly competitive, with little gap between them.
+
+Flutter is a framework developed and supported by Google. Front-end and full-stack developers use Flutter to build the user interface of applications for multiple platforms with a single code base.
+
+Next.js is an open-source framework created by Vercel, built on Node.js and the Babel transpiler and designed for use with React, a framework for single-page applications. In addition, Next.js provides many useful features, such as preview mode, fast compilation during development, and static export.
+
+#### 5.2.6 Operating system
+
+![5-13](/image/data/chapter_5/5-13.png)
+
+
Figure 5.13 Trends in the Operating System Top 10 OpenRank Projects over the Last Five Years
+
+
+![5-14](/image/data/chapter_5/5-14.png)
+
+
Figure 5.14 Operating System Top 10 Active Project Trends in the Last Five Years
+
+
+As shown, several repositories under the OpenHarmony project are in the top 10. This insight incorporates data from the Gitee platform, which makes the strengths of domestic operating systems easier to see (the OpenHarmony project consists of several repositories, and this analysis is performed at the repository level). SerenityOS has fallen back somewhat since 2021 and now ranks behind only OpenHarmony and openEuler, both of which also perform well.
+
+### 5.3 OpenRank Top 10 list for each field in 2023
+
+Below are the OpenRank rankings for projects in each field for 2023.
+
+#### 5.3.1 Cloud Native
+
+Table 5.1 Top Projects in Cloud Native
+
+| Number | Project Name | OpenRank |
+| :----: | :--------------------: | :------: |
+| 1 | grafana/grafana | 7134.37 |
+| 2 | llvm/llvm-project | 7049.62 |
+| 3 | kubernetes/kubernetes | 5374.14 |
+| 4 | ClickHouse/ClickHouse | 4941.99 |
+| 5 | cilium/cilium | 3215.42 |
+| 6 | ceph/ceph | 3172.49 |
+| 7 | keycloak/keycloak | 3095.56 |
+| 8 | gravitational/teleport | 3082.18 |
+| 9 | envoyproxy/envoy | 2929.08 |
+| 10 | backstage/backstage | 2903.39 |
+
+#### 5.3.2 Artificial Intelligence
+
+Table 5.2 Top Projects in Artificial Intelligence
+
+| Number | Project Name | OpenRank |
+| :----: | :----------------------------------: | :------: |
+| 1 | pytorch/pytorch | 10182.45 |
+| 2 | langchain-ai/langchain | 6080.25 |
+| 3 | PaddlePaddle/Paddle | 5408.62 |
+| 4 | huggingface/transformers | 4422.84 |
+| 5 | AUTOMATIC1111/stable-diffusion-webui | 3881.6 |
+| 6 | openvinotoolkit/openvino | 3857.31 |
+| 7 | microsoft/onnxruntime | 3006.75 |
+| 8 | tensorflow/tensorflow | 2723.26 |
+| 9 | Significant-Gravitas/AutoGPT | 2664.85 |
+| 10 | ggerganov/llama.cpp | 2339.8 |
+
+#### 5.3.3 Big Data
+
+Table 5.3 Top Projects in Big Data
+
+| Number | Project Name | OpenRank |
+| :----: | :-------------------: | :------: |
+| 1 | elastic/kibana | 7601.04 |
+| 2 | grafana/grafana | 7134.37 |
+| 3 | ClickHouse/ClickHouse | 4941.99 |
+| 4 | airbytehq/airbyte | 4658.86 |
+| 5 | apache/doris | 4307.26 |
+| 6 | elastic/elasticsearch | 3729.39 |
+| 7 | apache/airflow | 3642.9 |
+| 8 | StarRocks/starrocks | 3194.56 |
+| 9 | trinodb/trino | 2703.4 |
+| 10 | apache/spark | 2654.02 |
+
+#### 5.3.4 Database
+
+Table 5.4 Top Projects in Database
+
+| Number | Project Name | OpenRank |
+| :----: | :-------------------: | :------: |
+| 1 | ClickHouse/ClickHouse | 4941.99 |
+| 2 | apache/doris | 4307.26 |
+| 3 | elastic/elasticsearch | 3729.39 |
+| 4 | cockroachdb/cockroach | 3443.7 |
+| 5 | StarRocks/starrocks | 3194.56 |
+| 6 | trinodb/trino | 2703.4 |
+| 7 | apache/spark | 2654.02 |
+| 8 | pingcap/tidb | 2200.38 |
+| 9 | milvus-io/milvus | 2001.11 |
+| 10 | yugabyte/yugabyte-db | 1940.75 |
+
+#### 5.3.5 Frontend
+
+Table 5.5 Top Projects in Frontend
+
+| Number | Project Name | OpenRank |
+| :----: | :-------------------: | :------: |
+| 1 | flutter/flutter | 9361.81 |
+| 2 | vercel/next.js | 6638.65 |
+| 3 | appsmithorg/appsmith | 3474.07 |
+| 4 | nuxt/nuxt | 3387.23 |
+| 5 | facebook/react-native | 3260.55 |
+| 6 | ant-design/ant-design | 3053.25 |
+| 7 | nodejs/node | 2736.37 |
+| 8 | angular/angular | 2273.82 |
+| 9 | electron/electron | 1773.31 |
+| 10 | denoland/deno | 1654.01 |
+
+#### 5.3.6 Operating system
+
+Table 5.6 Top Projects in Operating System
+
+| Number | Project Name | OpenRank |
+| :----: | :---------------------------------------------------------------------------: | :------: |
+| 1 | openharmony/docs | 3277.69 |
+| 2 | openharmony/arkui_ace_engine | 2818.09 |
+| 3 | SerenityOS/serenity | 2257.68 |
+| 4 | openharmony/graphic_graphic_2d | 1239.6 |
+| 5 | openeuler/docs | 1206.9 |
+| 6 | openharmony/xts_acts | 1186.06 |
+| 7 | openharmony/arkcompiler_ets_runtime | 961.99 |
+| 8 | openharmony/interface_sdk-js | 910.91 |
+| 9 | reactos/reactos | 745.23 |
+| 10 | armbian/build | 679.1 |
+
+## 6. Insights on open source projects
+
+In 2023, large AI models like GPT-4 and CLIP emerged, leading to competition among global enterprises to invest in research and development for cutting-edge technologies like language understanding and image generation. The industry saw rapid evolution, marking the beginning of a new era in the broad application of AI. The database field experienced a trend of innovation with various technologies like distributed databases, time-series databases, and graph databases emerging to cater to different application scenarios. Cloud-native databases became popular, offering flexible scaling and high availability. This section provides data insights on project types by statistically analyzing project topics. In-depth insights are also provided into the two core areas of database and AI.
+
+### 6.1 Type of project
+
+This subsection selects the top 10,000 active GitHub repositories for statistical analysis.
+
+#### 6.1.1 Ratios for different project types
+
+
+
+
Figure 6.1 Ratios for different project types
+
+
+- Libraries and Frameworks (components and frameworks) make up the largest share at 31.36%. These building blocks of software development are the project type that developers most often use and contribute to;
+- Application Software is second at 24.34%. Its practical utility lets all users, not just developers, use open-source software across a variety of industries and domains;
+- Non-Software content holds a significant share of 23.17%. This shows that open source as a collaborative development model is extending to the entire content domain, including documentation, education, art, hardware, and other non-programming areas;
+- Software Tools account for 18.9%. Developers value this category because it lets them focus on building software applications and products;
+- System Software, which comprises fundamental software, accounts for only 2.3% of the total despite its immense value and complexity.
+
+#### 6.1.2 Percentage of OpenRank by Project Type
+
+
+
+
+
Figure 6.2 Percentage of OpenRank by Project Type
+
+
+Let's take this a step further and look at these categories through the lens of OpenRank influence:
+
+- The most significant change is that content resource type (Non-Software) projects have relatively low impact, although they have high activity;
+- System Software, on the other hand, has a small percentage of activity but a relatively large percentage of influence, and a similar phenomenon can be observed with Software Tools projects;
+- The component framework type and the application software type have not changed much, and both are among the more prevalent types.
+
+#### 6.1.3 OpenRank Trends by Project Type in the Last 5 Years
+
+
+
+
+
Figure 6.3 OpenRank Trends by Project Type in the Last 5 Years
+
+
+As the five-year OpenRank chart above shows, the influence of the System Software category is increasing year by year, while the influence of the Non-Software category is decreasing.
+
+### 6.2 Project Topic Analysis
+
+This section also analyzes the top 10,000 active GitHub repositories and obtains insights from the Topic tags under the repositories.
+
+#### 6.2.1 Top Topic
+
+
+
+Figure 6.4 Top 10 appearances of Topic
+
+The top 10 topics cover a diverse range of areas, demonstrating the broad interest of the open-source community. JavaScript, Hacktoberfest, and Python are some of the most popular topics, representing hotspots for cutting-edge technologies, active community activities, and versatile programming languages. These topics highlight the interest in front-end development, open-source contributions, and interdisciplinary programming.
+
+#### 6.2.2 Overall OpenRank Trends for Repositories of Popular Topics
+
+
+
+Figure 6.5 OpenRank trends for repositories with top 10 Topic occurrences (2019 - 2023)
+
+
+- Hacktoberfest is an annual event that takes place in October. It aims to promote the open-source community and is organized by DigitalOcean in collaboration with GitHub. The goal of the event is to encourage more people to participate in open-source projects and contribute to the community. OpenRank is used to measure people's enthusiasm for open-source projects, community involvement, and contributions. Developers play an active role in the campaign by submitting Pull Requests to open-source projects, thus helping to increase the reputation and influence of the repository.
+- JavaScript and Python: these technologies have maintained relatively stable trends over the past few years, with no significant growth or decline.
+
+### 6.3 Project analysis in databases
+
+This section draws on the open-source databases listed in the [Database of Databases](https://dbdb.io/) and the [DB-Engines Ranking](https://db-engines.com/en/ranking). The field is divided into 18 subcategories based on the storage structure and usage of databases: Relational, Key-value, Document, Search Engine, Wide Column, Time Series, Graph, Vector, Object Oriented, Hierarchical, RDF, Array, Event, Spatial, Native XML, Multivalue, Content, and Network. For each database, we identify the corresponding open-source project on GitHub and then collect and analyze its collaboration log data, which gives detailed insight into the field.
+
+#### 6.3.1 2023 OpenRank and Activity Lists by Subdomain in the Database Domain
+
+**1. OpenRank Rankings for Database Subdomains**
+
+Table 6.1 OpenRank Rankings for Database Subdomains
+
+| Ranking | Subfield Name | OpenRank |
+| :-----: | :-------------: | :------: |
+| 1 | Relational | 58092.36 |
+| 2 | Key-value | 21834.08 |
+| 3 | Document | 17264.93 |
+| 4 | Search Engine | 8093.77 |
+| 5 | Wide Column | 7896.43 |
+| 6 | Time Series | 7813.54 |
+| 7 | Graph | 5196.52 |
+| 8 | Vector | 4965.41 |
+| 9 | Object Oriented | 3104.07 |
+| 10 | Hierarchical | 1355.4 |
+| 11 | RDF | 592.68 |
+| 12 | Array | 383.95 |
+| 13 | Event | 256.59 |
+| 14 | Spatial | 224.05 |
+| 15 | Native XML | 209.51 |
+| 16 | Multivalue | 15.89 |
+| 17 | Content | 3.43 |
+
+**2. Activity Rankings for Database Subdomains**
+
+Table 6.2 Activity Rankings for Database Subdomains
+
+| Ranking | Subfield Name | Activity |
+| :-----: | :-------------: | :-------: |
+| 1 | Relational | 161025.44 |
+| 2 | Key-value | 62501.64 |
+| 3 | Document | 49400.11 |
+| 4 | Search Engine | 23799.87 |
+| 5 | Time Series | 22077.57 |
+| 6 | Wide Column | 21292.17 |
+| 7 | Vector | 16395.88 |
+| 8 | Graph | 14947.43 |
+| 9 | Object Oriented | 8418.14 |
+| 10 | Hierarchical | 3406.55 |
+| 11 | RDF | 1701.67 |
+| 12 | Array | 1280.14 |
+| 13 | Native XML | 737.94 |
+| 14 | Spatial | 680.79 |
+| 15 | Event | 654.42 |
+| 16 | Content | 33.94 |
+| 17 | Multivalue | 12.68 |
+
+The OpenRank and activity rankings for 2023 for each sub-domain of the database domain show that:
+
+- Relational, key-value, and document databases are the top three subdomains, accounting for over 70% of the database domain;
+- On both indicators, Relational alone exceeds the second- through fifth-place subdomains combined and accounts for more than 40 percent of the database field, making it a mega-subcategory.
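+
+The shares quoted above can be checked directly against Table 6.1. The following is a minimal sketch that only re-adds the published OpenRank values; the helper name `share` is an illustrative choice, not part of any OpenDigger API.
+
```python
# OpenRank values copied from Table 6.1 (2023, database subdomains).
OPENRANK = {
    "Relational": 58092.36,
    "Key-value": 21834.08,
    "Document": 17264.93,
    "Search Engine": 8093.77,
    "Wide Column": 7896.43,
    "Time Series": 7813.54,
    "Graph": 5196.52,
    "Vector": 4965.41,
    "Object Oriented": 3104.07,
    "Hierarchical": 1355.4,
    "RDF": 592.68,
    "Array": 383.95,
    "Event": 256.59,
    "Spatial": 224.05,
    "Native XML": 209.51,
    "Multivalue": 15.89,
    "Content": 3.43,
}


def share(names, values):
    """Return the fraction of the total contributed by the named subdomains."""
    total = sum(values.values())
    return sum(values[name] for name in names) / total


# Rank subdomains from highest to lowest OpenRank.
ranked = sorted(OPENRANK, key=OPENRANK.get, reverse=True)

top3_share = share(ranked[:3], OPENRANK)
relational_share = share(["Relational"], OPENRANK)
second_to_fifth = sum(OPENRANK[name] for name in ranked[1:5])

print(f"Top-3 share of OpenRank: {top3_share:.1%}")
print(f"Relational share of OpenRank: {relational_share:.1%}")
print("Relational > ranks 2-5 combined:", OPENRANK["Relational"] > second_to_fifth)
```
+
+The same check can be repeated for the Activity column of Table 6.2 by swapping in those values.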
+
+#### 6.3.2 Trends over the last five years in projects under the various subfields of the database area
+
+![6-6](/image/data/chapter_6/6-6.png)
+
+Figure 6.6 Trends in OpenRank by Subdomain in Database Domain (2019 - 2023)
+
+![6-7](/image/data/chapter_6/6-7.png)
+
+Figure 6.7 Trends in Activity by Subdomain in Database Domain (2019 - 2023)
+
+The OpenRank and activity trends of projects in each subdomain of the database domain over the past five years show that:
+
+- Over the past five years, Relational, Key-value, and Document have consistently ranked in the top three in both indicators;
+- Search Engine, Wide Column, Time Series, Graph, Vector, and Object Oriented ranked fourth through ninth, with both indicators trending upward;
+- The Search Engine and Vector subcategories have grown quickly. Search Engine has jumped two positions to become the fourth-largest subcategory. Vector is still competing with the Graph subcategory and has room to improve its OpenRank. Because the influence of large models has not yet subsided, Vector is predicted to overtake Graph in 2024.
+
+#### 6.3.3 Open source quadrant map of projects under each sub-domain of the database domain
+
+The Open Source Quadrant chart involves three metrics: Activity, OpenRank, and CommunityVolume. CommunityVolume uses the same formula as the Attention metric in open-digger, i.e., a weighted sum of the number of stars and the number of forks of the target project in a given period of time: `sum(1*star+2*fork)`.
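+
+As a small illustration of the CommunityVolume formula, the sketch below applies the weights from the text (1 per star, 2 per fork) to one project over one period. The function name and the sample counts are hypothetical and are not taken from open-digger.
+
```python
def community_volume(stars: int, forks: int) -> int:
    """Weighted sum from the text: 1 point per star plus 2 points per fork."""
    return 1 * stars + 2 * forks


# Hypothetical counts for one project in one period (not real data).
print(community_volume(stars=120, forks=30))  # 120 + 2 * 30 = 180
```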
+
+Quadrant plotting methods:
+
+1. Select the Top 10 projects by activity for each database subcategory;
+2. Make a log-log scatterplot with `log(openrank)` on the x-axis and `log(communityvolume)` on the y-axis. The base of the logarithm is 2, so the coordinates denote the number of half-lives required for the spatial influence (openrank) and the temporal influence (communityvolume) to decay to 1, respectively;
+3. Divide the plane into four quadrants, using the vertical line at the mean horizontal coordinate of all points and the horizontal line at the mean vertical coordinate of all points as the dividing axes.
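+
+The plotting steps above can be sketched as a quadrant assignment. This is a minimal illustration rather than the report's actual code: the function name, the sample projects, the counter-clockwise quadrant numbering, and the treatment of points lying exactly on a mean line (counted as the higher side) are all assumptions.
+
```python
import math
from statistics import mean


def quadrants(projects):
    """Assign each project to a quadrant of the log2-log2 plane.

    projects maps a name to (openrank, community_volume), both positive.
    Quadrant 1 is high/high, 2 is low OpenRank/high volume,
    3 is low/low, and 4 is high OpenRank/low volume (assumed numbering).
    """
    # Step 2: take base-2 logarithms of both metrics.
    points = {
        name: (math.log2(openrank), math.log2(volume))
        for name, (openrank, volume) in projects.items()
    }
    # Step 3: the mean of each coordinate defines the dividing axes.
    x_mean = mean(x for x, _ in points.values())
    y_mean = mean(y for _, y in points.values())

    result = {}
    for name, (x, y) in points.items():
        if x >= x_mean and y >= y_mean:
            result[name] = 1
        elif x < x_mean and y >= y_mean:
            result[name] = 2
        elif x < x_mean and y < y_mean:
            result[name] = 3
        else:
            result[name] = 4
    return result


# Hypothetical projects (not real data), one per quadrant.
sample = {
    "a": (1024, 4096),
    "b": (4, 4096),
    "c": (4, 16),
    "d": (1024, 16),
}
print(quadrants(sample))
```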
+
+There are a total of 18 subcategory labels in the database domain, and the top 9 categories that account for more than 1% of activity in 2023 were selected for statistical analysis to map the open source quadrant as follows:
+
+
+
+
+
+
Figure 6.8 Relational Database OpenRank-CommunityVolume log-log Open Source Quadrant Map
+
+
+
+
+
+
Figure 6.9 Key-Value Database OpenRank-CommunityVolume log-log Open Source Quadrant Map
+
+
+
+
+
+
Figure 6.10 Document Database OpenRank-CommunityVolume log-log Open Source Quadrant Chart
+
+
+
+
+
+
Figure 6.11 Search Engine OpenRank-CommunityVolume log-log Open Source Quadrant Chart
+
+
+
+
+
+
Figure 6.12 Time Series Database OpenRank-CommunityVolume log-log Open Source Quadrant Chart
+
Figure 6.14 Vector Database OpenRank-CommunityVolume log-log Open Source Quadrant Chart
+
+
+
+
+
Figure 6.15 Graph Database OpenRank-CommunityVolume log-log Open Source Quadrant Chart
+
+
+
+
+
+
Figure 6.16 Object-Oriented Database OpenRank-CommunityVolume log-log Open Source Quadrant Chart
+
+
+
+
+
+
Figure 6.17 Top 9 Subcategory Databases by Activity OpenRank-CommunityVolume log-log Open Source Quadrant Chart
+
+
+The search engine category is highly polarized: projects like Elasticsearch have high OpenRank and CommunityVolume, while projects like Sphinx and Xapian are very low on both.
+
+Looking at the first quadrant, relational, document, search engine, and vector are all database types with strong OpenRank influence and high CommunityVolume, while object-oriented is relatively weak on both.
+
+The Open Source Quadrant chart also shows how the Top 9 subcategories by activity are distributed vertically. Two of them stand out: search engine and vector. Their CommunityVolume is high relative to their OpenRank, which suggests more active contributors, a stronger community voice, and expectations of faster development than in the other subcategories.
+
+### 6.4 Project Analysis of Generative AI Area
+
+This section will examine the open-source projects related to generative AI, using the [Generative AI Open Source (GenOS) Index](https://www.decibel.vc/articles/launching-the-generative-ai-open-source-genos-index) as a reference point. We will classify these projects into four subcategories: tools, models, applications, and infrastructure. The detailed insights are outlined below:
+
+#### 6.4.1 Growth trends in subfields of generative AI over the past five years
+
+
+
+
Figure 6.18 OpenRank Trends in Generative AI by Subdomain, 2019 - 2023
+
+
+
+
+
Figure 6.19 Activity Trends in Generative AI by Subdomain, 2019 - 2023
+
+
+- Activity and influence show consistent trends across the four categories of models, tools, applications, and infrastructure;
+- AIGC open-source projects in the models category are more influential and active than those in the tools and applications categories;
+- The models category has grown rapidly since 2022 and surpassed infrastructure in 2023. Innovative AIGC application development also had a significant breakthrough in 2023, so applications grew at the same time.
+
+#### 6.4.2 Trends in OpenRank and Activity Top 10 for Projects in the Generative AI Domain
+
+
+
+
Figure 6.20 5-Year Trend of OpenRank Top 10 Projects in Generative AI
+
+
+
+
+
Figure 6.21 5-Year Trend of the Top 10 Active Projects in Generative AI
+
+
+- langchain ranks first in both influence and activity and is highly regarded by developers;
+- transformers was the reigning champion in the AIGC field for the past few years, and its position remained unchallenged until 2023. This project has significantly impacted both the academic and open-source communities, showcasing its groundbreaking capabilities;
+- stable-diffusion-webui is an AIGC tool that has gained a lot of attention from developers. It has surpassed transformers in activity and is likely to surpass it in influence in 2024;
+- Several AIGC projects that were open-sourced in 2023 quickly gained enough influence and activity to enter the Top 10 list, which highlights the rapid pace of change in the AIGC field.
+
+#### 6.4.3 Top 10 List of OpenRank and Activity of Projects in Generative AI in 2023
+
+**1. List of OpenRank Top 10 Projects in Generative AI**
+
+
+
+
+| Ranking | Project Name | Activity |
+| :-----: | :----------------------------------------: | :------: |
+| 1 | langchain-ai/langchain | 22563.04 |
+| 2 | AUTOMATIC1111/stable-diffusion-webui | 13933.03 |
+| 3 | huggingface/transformers | 13618.11 |
+| 4 | Significant-Gravitas/AutoGPT | 10961.81 |
+| 5 | oobabooga/text-generation-webui | 8597.33 |
+| 6 | ggerganov/llama.cpp | 8108.62 |
+| 7 | run-llama/llama_index | 7532.47 |
+| 8 | milvus-io/milvus | 6488.35 |
+| 9 | facebookincubator/velox | 4923.05 |
+| 10 | Chatchat-space/Langchain-Chatchat | 4477.63 |
+
+## 7. Developer Insights
+
+**Developers** are vital to open-source innovation. They create and supply open-source projects and contribute significantly to them. The total number of developers and their collaboration mechanism impact the amount of contribution. In this section, we will analyze data on individual developers at national and regional levels.
+
+### 7.1 Geographical distribution of developers
+
+This analysis, like the one in Section 1.3, is based on 10 million active GitHub developers. Out of the 100 million registered users on GitHub, only 2 million developers have provided accurate geolocation information, which makes up a 2% sample.
+
+**1. GitHub Active Developers Distribution Map**
+
+The number of active developers on GitHub was first visualized on a map, as shown below.
+
+![7-1.png](/image/data/chapter_7/7-1.png)
+
+Figure 7.1 2023 GitHub Active Developers Distribution Map
+
+
+GitHub developers are concentrated in areas with large populations and fast internet development, such as coastal regions of China, Europe, the United States, India, and the southeast coast of Brazil. They are sparsely distributed in other areas with small populations or less developed internet.
+
+**2. GitHub Active Developers by Country / Region**
+
+![7-2.png](/image/data/chapter_7/7-2.png)
+
+
Figure 7.2 GitHub Active Developers by Country / Region
+
+
+
+Table 7.1 2023 Ranking of Countries/Regions by Number of Active Developers
+
+
+| Ranking | Country/Region | Number of active developers |
+| :-----: | :------------: | :--------------: |
+| 1 | United States | 236899 |
+| 2 | China | 113893 |
+| 3 | India | 107066 |
+| 4 | Brazil | 83932 |
+| 5 | Germany | 64836 |
+| 6 | United Kingdom | 55175 |
+| 7 | Canada | 42238 |
+| 8 | France | 40341 |
+| 9 | Russia | 31534 |
+| 10 | Japan | 21942 |
+
+The United States has the largest number of developers, followed by China, India and Brazil, while other countries with a certain population and economic level, such as Canada and some European countries, also have a large number of developers on GitHub.
+
+**3. Distribution of Active Developers on GitHub in China**
+
+The map below shows the distribution of active GitHub developers in China.
+
+![7-4.png](/image/data/chapter_7/7-4.png)
+
+
Figure 7.3 2023 Distribution of Active Developers in China
+
+
+Table 7.2 2023 Regional Ranking of Active Developers in China
+
+
+| Ranking | Regions | Quantity |
+| :-----: | :-------: | :------: |
+| 1 | Beijing | 24151 |
+| 2 | Shanghai | 18215 |
+| 3 | Guangdong | 16153 |
+| 4 | Zhejiang | 10927 |
+| 5 | Taiwan | 8823 |
+| 6 | Jiangsu | 5437 |
+| 7 | Sichuan | 5311 |
+| 8 | Hong Kong | 3344 |
+| 9 | Hubei | 3273 |
+| 10 | Shaanxi | 1993 |
+
+Beijing has the most GitHub users in China, followed by Shanghai, Guangdong, and Zhejiang. Most of China's active GitHub users are in the eastern coastal regions, although some central provinces such as Shaanxi, Hunan, and Hubei also have many active users. Notably, Sichuan has the most active GitHub users outside the coastal regions.
+
+**4. GitHub China Developer Influence Distribution after OpenRank Weighting**
+
+Aggregating the OpenRank values of the developers in each region gives the influence distribution map and regional ranking of Chinese developers shown below.
+
+![7-3.png](/image/data/chapter_7/7-3.png)
+
+
Figure 7.4 OpenRank influence distribution of Chinese developers
+
+
+Table 7.3 OpenRank Influence Ranking in China
+
+
+| Ranking | Regions | OpenRank |
+| :-----: | :-------: | :-------: |
+| 1 | Beijing | 506624.08 |
+| 2 | Shanghai | 435804.42 |
+| 3 | Guangdong | 306014.24 |
+| 4 | Zhejiang | 274284.92 |
+| 5 | Taiwan | 216991.49 |
+| 6 | Sichuan | 96881.79 |
+| 7 | Jiangsu | 83321.13 |
+| 8 | Hong Kong | 83238.46 |
+| 9 | Hubei | 51370.74 |
+| 10 | Fujian | 33482.25 |
+
+As you can see from the rankings, the OpenRank regional rankings are highly consistent with the regional rankings for the number of active developers:
+
+- The influence of Chinese developers differs significantly by region. Beijing and Shanghai form the first tier, and Guangdong, Zhejiang, and Taiwan form the second tier. These regions are at a clearly different level of influence from those ranked lower;
+- Sichuan has fewer active developers than Jiangsu but greater overall influence, and a similar pattern appears between Fujian and Shaanxi.
+
+### 7.2 Developer Working Hours Analysis
+
+This section analyzes the working hours of GitHub and Gitee developers. Times are in UTC by default, which is 8 hours behind the East Eighth Time Zone (UTC+8, i.e., Beijing Standard Time). The data is scaled to the range [1, 10] using the min-max method, and larger dots in the time-zone charts represent higher values.
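+
+The scaling and time-zone shift described above can be sketched as follows. This is an illustrative reading of "min-max scaling to [1, 10]", not the report's actual code; both function names are hypothetical, and mapping constant data to 1 is an assumption.
+
```python
def scale_1_to_10(counts):
    """Min-max scale a list of numbers into the range [1, 10]."""
    lo, hi = min(counts), max(counts)
    if hi == lo:
        # Assumption: when all values are equal, map them to the lower bound.
        return [1.0 for _ in counts]
    return [1 + 9 * (c - lo) / (hi - lo) for c in counts]


def utc_to_beijing_hour(utc_hour):
    """Convert an hour of the day in UTC (0-23) to UTC+8."""
    return (utc_hour + 8) % 24


# Hypothetical event counts for three hourly slots (not real data).
print(scale_1_to_10([0, 50, 100]))
print(utc_to_beijing_hour(1))   # 01:00 UTC is 09:00 in UTC+8
print(utc_to_beijing_hour(20))  # 20:00 UTC is 04:00 the next day in UTC+8
```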
+
+#### 7.2.1 Distribution of working hours of global developers
+
+**Distribution of working hours of GitHub-wide developers**
+
+According to statistics on developers' working hours across GitHub, most activity falls between 6:00 and 21:00. Activity is more concentrated at 12:00, likely due to scheduled tasks. Weekends (Saturdays and Sundays) are relatively inactive.
+
+![7-5.png](/image/data/chapter_7/7-5.png)
+
+
Figure 7.5 GitHub-wide developer working hours in 2023
+
+
+**Distribution of working hours of Gitee-wide developers**
+
+![7-6.png](/image/data/chapter_7/7-6.png)
+
+
Figure 7.6 Gitee-wide developer working hours in 2023
+
+
+The Gitee data clearly aligns more with the East Eighth Time Zone's work time routine.
+
+**Global developer working hours distribution, excluding bots**
+
+![7-7.png](/image/data/chapter_7/7-7.png)
+
+
Figure 7.7 2023 Global Developers' Working Hours, Excluding Robots
+
+
+After removing the bot data, developer activity still falls mainly in the 6:00 - 21:00 interval and is more evenly distributed.
+
+#### 7.2.2 Distribution of working hours on the project
+
+Below is a comparison of the working hours distribution of the top four Chinese OpenRank repositories and the top four global OpenRank GitHub repositories in 2023.
+
+**Distribution of working hours of the top 4 global OpenRank repositories on GitHub**
+
+1. NixOS/nixpkgs
+
+![7-8.png](/image/data/chapter_7/7-8.png)
+
+
Figure 7.11 MicrosoftDocs/azure-docs Working Hours in 2023
+
+
+**Distribution of working hours of the top 4 OpenRank repositories in China**
+
+1. OpenHarmony
+
+![7-12.png](/image/data/chapter_7/7-12.png)
+
+
+
+
+### 7.3 Developer Role Analysis
+
+This section categorizes GitHub users into four roles: **Explorer**, **Participant**, **Contributor**, and **Committer**, based on events they trigger in open-source repositories. The four roles are defined in the table below.
+
+
Table 7.5 Four Roles of Developer
+
+
+
+| Roles | Definitions | Meaning |
+| -------------------------------------------- | ----------------------------------------------------------------- | ---------------------------------------------------- |
+| Explorer | Users who star a project | Indicates the user has some interest in the project |
+| Participant | Users who have opened an Issue or commented on a project | Indicates user participation in the project |
+| Contributor | Users who have submitted Pull Requests (PRs) to a project | Indicates that the user has contributed to the project's code base |
+| Committer | Users who have taken part in PR review or merging | Indicates that the user has contributed deeply to the project |
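+
+One way to read the role definitions in the table is as a mapping from GitHub event types to roles, keeping the deepest role a user reaches. The sketch below is an assumption-laden illustration: the event-to-role mapping is simplified (for example, merges are not detected), and `highest_role` is a hypothetical helper, not the method used in this report.
+
```python
# Roles ordered from shallowest to deepest involvement.
ROLE_ORDER = ["Explorer", "Participant", "Contributor", "Committer"]

# Assumed, simplified mapping from GitHub event types to roles.
EVENT_ROLE = {
    "WatchEvent": "Explorer",              # starring a repository
    "IssuesEvent": "Participant",          # opening an issue
    "IssueCommentEvent": "Participant",    # commenting
    "PullRequestEvent": "Contributor",     # opening a pull request
    "PullRequestReviewEvent": "Committer", # reviewing a pull request
}


def highest_role(event_types):
    """Return the deepest role implied by a user's events, or None."""
    roles = [EVENT_ROLE[e] for e in event_types if e in EVENT_ROLE]
    if not roles:
        return None
    return max(roles, key=ROLE_ORDER.index)


# Hypothetical event histories (not real data).
print(highest_role(["WatchEvent", "IssueCommentEvent"]))
print(highest_role(["PullRequestEvent", "PullRequestReviewEvent"]))
```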
+
+The figure below shows the four cascaded and structured roles. Using the defined role structure, we evaluate the top 10 projects in the OpenRank rankings of GitHub-wide projects from three perspectives: number of roles, time change, and developer role evolution. This is based on the project ranking list in Part II.
+
+![7-16.png](/image/data/chapter_7/7-16.png)
+
+
Figure 7.16 Developer Roles and Relationships
+
+
+#### 7.3.1 Distribution of roles
+
+
Table 7.6 Distribution of the number of developer roles for the top 10 projects in the OpenRank rankings
+
+
+
+The results showed:
+
+- Based on the number of explorers, the three most popular projects are godotengine/godot, microsoft/vscode, and home-assistant/core, suggesting they have received widespread attention and support;
+- microsoft/vscode is the project with the largest gap between the number of participants and contributors, while microsoft/winget-pkgs has the smallest gap between the two;
+- NixOS/nixpkgs has the highest number of committers at 2,638 compared to other projects. In contrast, the digitalinnovationone/dio-lab-open-source project has the lowest number of committers.
+
+#### 7.3.2 New additions to roles in 2023
+
+A user counts as a new addition to role X (e.g., contributor or committer) in 2023 if the user did not hold role X before 2023 and takes on that role during 2023.
+
+For example, if A submitted a PR to Project B in 2021 but never took part in code review, and A then reviews a PR in Project B in 2023, A counts as a new committer.
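+
+The counting rule above amounts to a set difference between the users who hold a role in 2023 and those who held it earlier. The sketch below illustrates this with the example of user A; the function name and the sample sets are hypothetical.
+
```python
def new_role_members(before_2023, during_2023):
    """Users who hold the role in 2023 but never held it before 2023."""
    return set(during_2023) - set(before_2023)


# Example from the text: A opened a PR in 2021 but first reviewed a PR in 2023.
committers_before = set()   # nobody reviewed PRs in Project B before 2023
committers_2023 = {"A"}     # A reviews a PR in 2023

contributors_before = {"A"}  # A already opened a PR in 2021
contributors_2023 = {"A"}

print(new_role_members(committers_before, committers_2023))      # A is a new committer
print(new_role_members(contributors_before, contributors_2023))  # A is not a new contributor
```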
+
+The details of the roles added are shown in the figure and table below.
+
+![7-18.png](/image/data/chapter_7/7-18.png)
+
+
Figure 7.18 Map of new roles in the open source community in 2023
+
+
+
Table 7.7 Distribution of the number of new developer roles for the top 10 projects in the OpenRank rankings
+
+
+
+| Repository name | New Committer | New Contributor | New Participant | New Explorer |
+| ---------------------------------------- | ------------- | --------------- | --------------- | ------------ |
+| NixOS/nixpkgs | 1226 | 1622 | 1591 | 3027 |
+| home-assistant/core | 538 | 808 | 4640 | 8998 |
+| microsoft/vscode | 263 | 394 | 10216 | 15746 |
+| MicrosoftDocs/azure-docs | 352 | 1420 | 3913 | 1579 |
+| pytorch/pytorch | 391 | 802 | 2083 | 13016 |
+| godotengine/godot | 386 | 708 | 2834 | 22996 |
+| flutter/flutter | 184 | 455 | 3954 | 13579 |
+| odoo/odoo | 244 | 453 | 472 | 4991 |
+| digitalinnovationone/dio-lab-open-source | 40 | 3611 | 732 | 504 |
+| microsoft/winget-pkgs | 231 | 957 | 485 | 1373 |
+
+The results showed:
+
+- The repository godotengine/godot received the most new stars, 22,996, with half added in September 2023 as game developers sought open-source alternatives in response to Unity's new charging strategy. Meanwhile, digitalinnovationone/dio-lab-open-source and microsoft/winget-pkgs received the fewest new stars, 504 and 1,373, respectively;
+- The repository with the highest number of new participants was microsoft/vscode with 10,216; odoo/odoo had the fewest with 472;
+- The repository with the highest number of new contributors was digitalinnovationone/dio-lab-open-source with 3,611, followed by NixOS/nixpkgs with 1,622;
+- The repository with the highest number of new committers was NixOS/nixpkgs with 1,226.
+
+#### 7.3.3 Perspectives on Developer Evolution
+
+Developer evolution is measured as the number of users in an open-source community who move from one role to another. This report only counts developers who moved from one role to a deeper one. For example, a user who was only a participant before 2023 changes from participant to contributor in 2023 when they submit their first PR.
+
+![7-19.png](/image/data/chapter_7/7-25.png)
+
+
Figure 7.19 Developer Role Evolution Diagram
+
+
+
Table 7.8 Distribution of the number of role conversions for the top 10 OpenRank projects
+
+
+
+| Repository name | Contributor -> Committer | Participant -> Contributor | Explorer -> Participant |
+| :--------------------------------------: | :----------------------: | :-----------------------: | :---------------------: |
+| NixOS/nixpkgs | 254 | 122 | 168 |
+| home-assistant/core | 70 | 113 | 134 |
+| microsoft/vscode | 16 | 70 | 287 |
+| MicrosoftDocs/azure-docs | 129 | 169 | 21 |
+| pytorch/pytorch | 60 | 53 | 187 |
+| godotengine/godot | 63 | 131 | 330 |
+| flutter/flutter | 31 | 91 | 419 |
+| odoo/odoo | 55 | 19 | 32 |
+| digitalinnovationone/dio-lab-open-source | 0 | 0 | 0 |
+| microsoft/winget-pkgs | 49 | 11 | 18 |
+
+The results showed:
+
+- Across communities, we can observe the typical funnel model of an evolutionary path from explorers to participants to contributors and committers. In godotengine/godot, for example, 330 explorers evolved into participants, 131 participants became contributors, and 63 contributors evolved into committers. This trend was also observed in other communities and is consistent with the general evolution of community members from initial exploration to deeper involvement.
+- In some communities, such as NixOS/nixpkgs, we observed many contributors evolving into committers. In this community, 254 contributors successfully evolved into committers, which may represent a relatively high demand for code review. This may encourage more contributors to become deeply involved in maintenance, which may help improve the quality and stability of the community's code.
+- In some communities, such as flutter/flutter and godotengine/godot, we observed a relatively high number of successful conversions of explorers into participants. In flutter/flutter, 419 explorers evolved into participants, while in godotengine/godot, 330 explorers turned into participants.
+- The digitalinnovationone/dio-lab-open-source project has no role-conversion data because it was created in 2023.
+
+### 7.4 Robot account analysis
+
+Robotic (bot) automation is a significant contributor to open-source collaboration platforms. This section analyzes nearly 600 million repository events across 7.7 million open-source repositories and over 1,200 bot accounts for 2023.
+
+#### 7.4.1 Analysis of active data of robots
+
+
+
+
+
+
+
Figure 7.20 Trend in number of robot events (left) & percentage of robot events in 2023 (right)
+
+
+Analyzing the bot activity data from 2015 to 2023 yields the following observations:
+
+Since 2019, the number of bot events has increased significantly, rising from 4,217,635 to 304,257,084. This surge in bot account activity on GitHub can be attributed to the widespread adoption and advancement of GitHub's automation, continuous integration, and continuous deployment (CI/CD) tools between 2019 and 2021.
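
To put the two event counts above in proportion, here is a minimal arithmetic sketch in Python. It uses only the two figures quoted in the paragraph; the variable names are illustrative, and the paragraph does not say which year the ending count belongs to, so the labels are simply "start" and "end".

```python
# Bot event counts quoted in the paragraph above.
events_start = 4_217_635    # count at the start of the growth period
events_end = 304_257_084    # count at the end of the growth period

# How many times larger the ending count is than the starting count.
growth_factor = events_end / events_start

# The same change expressed as a percentage increase.
percent_increase = (events_end - events_start) / events_start * 100

print(f"growth factor: {growth_factor:.1f}x")
print(f"percent increase: {percent_increase:.0f}%")
```

By this calculation the ending count is roughly 72 times the starting count.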
+
+Despite the small number of bot accounts, each bot serves multiple repositories, demonstrating efficiency and broad reach.
+
+#### 7.4.2 Analysis of event types for robots
+
+![7-22.png](/image/data/chapter_7/7-22.png)
+
+
Figure 7.21 Difference in number and annual growth rate (%) of GitHub event counts (2022 vs 2023)
+
+
+This graph shows the change in the number of GitHub events by type and their growth rate between 2022 and 2023. By comparing the data from these two years, we can gain insight into the trend of bot account usage in the development process:
+
+- Dominance of Code Push: PushEvent dominates bot account activity, with a significant rise in volume especially in 2023, suggesting that bot accounts play an important role in code maintenance and updates;
+- Changes in project creation activity: CreateEvent is very active in 2022, but declines in 2023, which may indicate a decline in bot account activity in creating new projects;
+- Importance of code review and collaboration: PullRequestEvent and IssueCommentEvent numbers were higher in both years, showing the active participation of bot accounts in code reviews and issue discussions;
+- Changes in activity types: DeleteEvent decreases in 2023 compared to 2022, while ReleaseEvent increases, reflecting a shift in the focus of bot accounts within project lifecycle management;
+- Increase in comment-related events: CommitCommentEvent and PullRequestReviewCommentEvent increased in 2023, indicating that bot accounts are becoming more active in providing discussion and feedback during code review;
+- Specific uses of bot accounts: less common event types such as GollumEvent, MemberEvent, PublicEvent, and WatchEvent are relatively low in number, suggesting that bot accounts are primarily used for specific automation tasks and are less involved in social interactions.
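
The two quantities in the figure, the difference in event counts and the annual growth rate, can be derived per event type as in the minimal Python sketch below. The function name and the sample counts are hypothetical and are not the report's data; the report does not publish the underlying numbers here.

```python
def yearly_change(counts_prev, counts_curr):
    """Return {event_type: (difference, growth_rate_percent)}.

    growth_rate_percent is None when the previous-year count is zero,
    because the rate is undefined in that case.
    """
    result = {}
    for event_type in sorted(set(counts_prev) | set(counts_curr)):
        prev = counts_prev.get(event_type, 0)
        curr = counts_curr.get(event_type, 0)
        diff = curr - prev
        rate = None if prev == 0 else diff / prev * 100
        result[event_type] = (diff, rate)
    return result


# Hypothetical counts for illustration only; not the report's data.
counts_2022 = {"PushEvent": 1000, "CreateEvent": 400, "DeleteEvent": 200}
counts_2023 = {"PushEvent": 1500, "CreateEvent": 300, "ReleaseEvent": 50}

for event_type, (diff, rate) in yearly_change(counts_2022, counts_2023).items():
    rate_text = "n/a" if rate is None else f"{rate:+.1f}%"
    print(f"{event_type:<14} diff={diff:+d} growth={rate_text}")
```

Treating a zero previous-year count as "n/a" avoids reporting an infinite growth rate for event types that first appear in the later year.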
+
+#### 7.4.3 Distribution of working hours for robot accounts
+
+Similar to the developer working hours distribution, we also analyzed the data on the working hours of bot accounts.
+
+![7-23.png](/image/data/chapter_7/7-23.png)
+
+
Figure 7.22 Distribution of robot account working hours
+
+
+- The working hours of bot accounts are concentrated mainly in two windows, 00:00 to 01:00 and 12:00 to 13:00;
+- Given the spread of developer time zones worldwide, it can be surmised that most automated processes are more active in the early morning and midday hours;
+- Bot activity shows little difference between workdays and non-workdays. Most automated collaborative tasks are scheduled, and fewer are triggered in response to a contributor's event.
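
An hour-of-day distribution like the one above can be computed by bucketing event timestamps by hour. The Python sketch below assumes ISO 8601 timestamps and counts hours in UTC; the report does not state which time zone it used, and the function name and sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_distribution(timestamps):
    """Count events per hour of day (0-23), with hours taken in UTC."""
    hours = Counter()
    for ts in timestamps:
        # fromisoformat does not accept a trailing "Z" before Python 3.11,
        # so normalize it to an explicit UTC offset first.
        moment = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        hours[moment.astimezone(timezone.utc).hour] += 1
    return [hours.get(h, 0) for h in range(24)]


# Hypothetical sample events for illustration only.
sample = [
    "2023-05-01T00:10:00Z",
    "2023-05-01T00:45:00Z",
    "2023-05-02T12:30:00Z",
    "2023-05-06T08:00:00+08:00",  # equals 00:00 UTC
]
dist = hourly_distribution(sample)
print(dist)
```

Converting every timestamp to a single time zone before bucketing keeps events from different regions comparable on one 24-hour axis.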
+
+#### 7.4.4 Top GitHub collaborative bots by event count
+
+![7-24.png](/image/data/chapter_7/7-24.png)
+
+
Figure 7.23 Top GitHub collaborative bots by event count in 2023
+
+
+## 8. Case Studies
+
+### 8.1 openEuler Community Case Study
+
+In 2023, the OpenDigger community integrated Gitee data for the first time, allowing Gitee projects to participate in OpenRank calculations. The openEuler community surpassed PaddlePaddle in the same year, achieving an OpenRank value of 16,728. This made it the second largest open source community in China, after OpenHarmony.
+
+In 2023, the openEuler community attracted 3,941 developers to collaborate on Issues or PRs, with 1,934 contributors successfully contributing and merging at least one PR to the openEuler community's repository.
+
+It's worth noting that the openEuler community started a document bug hunt in early 2023. They also integrated an interactive page contribution mechanism with Gitee on the community's official document website. This feature enables developers to correct any errors they find while reading the documents directly on the official website. With just a single click, they can launch Gitee lightweight pull requests (PRs), without having to jump to the Gitee platform or perform Git operations.
+
+The effect of this mechanism on the data is striking. In 2023, the openeuler/docs repository merged 7,764 PRs, 74% of which were submitted directly through the official web page. The launch of this mechanism also significantly increased the average number of active contributors per month (from 30 to 80) and the average number of PRs merged per month (from 116 to 722).
+
+One noteworthy project is openeuler/mugen, a highly active testing framework project within the openEuler community. In 2023, 138 developers participated in discussions and contributed to the project, 95 of whom successfully had a PR merged. The project has the third-highest OpenRank within the openEuler community, after the openeuler/docs documentation repository and the openeuler/kernel kernel repository. This testing framework enables developers to quickly write test cases to verify the correctness and validity of their contributions, significantly reducing the cost of subsequent contributions.
+
+To summarize, the openEuler community has achieved a high OpenRank value thanks to its effective contribution mechanism and testing framework. The community has designed an interactive system that allows for easy documentation contribution with minimal costs. Moreover, contributors can quickly verify the accuracy of their code through a reliable testing framework. These developer experience optimizations are excellent examples for other open-source communities to follow and implement.
+
+### 8.2 List of top repositories contributed by Chinese developers
+
+We analyzed how Chinese developers contributed to the top 30 repositories in the OpenRank ranking list for 2023 using data from almost 10 million GitHub developer accounts, including nearly 200,000 from China:
+
+![8-1.png](/image/data/chapter_8/8-1.png)
+
+
Figure 8.1 Top 30 Contributed Repositories by Chinese Developers on GitHub
+
+
+Most of the projects also appear in the main OpenRank list. The more interesting ones include:
+
+- [NixOS/Nixpkgs](https://github.com/NixOS/nixpkgs): Also a top international project, this is a package management tool for a new operating system. Although most of the updates are package information updates, that activity also shows that the ecosystem of the operating system itself is thriving.
+
+- [Intel-analytics/BigDL](https://github.com/intel-analytics/BigDL): A runtime for running LLMs on Intel XPUs. The repository was created in 2017 and was nearly dormant by the end of 2021. Surprisingly, it made a comeback with the rise of LLMs in 2022 and now has around 50 active contributors per month.
+
+
+
+
+
+
Figure 8.2 BigDL OpenRank Trend Chart
+
+
+
+> Screenshot above from [HyperCRX](https://github.com/hypertrons/hypertrons-crx)
+
+- [siyuan-note/siyuan](https://github.com/siyuan-note/siyuan): Siyuan Notes is a privacy-first domestic open source knowledge management tool that supports bidirectional, block-level references and maintains an active community of about one hundred people per month. It is commercialized through a subscription at a very affordable price.
+
+- [baidu/amis](https://github.com/baidu/amis): An open-source low-code page generation framework developed by Baidu. In recent years, low-code projects have gained immense popularity, such as Alibaba's open-source LowcodeEngine and the Harmony ecosystem's DevEco Studio. These projects make it much easier for developers to build applications rapidly.
+
+- [Cocos/cocos-engine](https://github.com/cocos/cocos-engine): A leading domestic game engine. With the rise of the metaverse concept, Godot and other game engines have become some of the world's top open source projects, and the domestic game engine cocos/cocos-engine has also performed strongly in China.
+
+- [MaaAssistantArknights/MaaAssistantArknights](https://github.com/MaaAssistantArknights/MaaAssistantArknights): A fascinating project that uses a script assistant to automate daily quests in the game Arknights. The automation can run through a mobile phone emulator. The project is community-maintained, open source, free, and supports all desktop platforms. It has received over 10,000 stars and has more than 300 active contributors every month, which is fantastic.
+
+![8-3.png](/image/data/chapter_8/8-3.png)
+
+
+
diff --git a/en/index.md b/en/index.md
new file mode 100644
index 0000000..a3a225f
--- /dev/null
+++ b/en/index.md
@@ -0,0 +1,284 @@
+---
+# https://vitepress.dev/reference/default-theme-home-page
+layout: home
+
+hero:
+ name: "2023 China Open Source Annual Report"
+ text: ""
+ tagline: Kaiyuanshe collaborates with open-source communities and organizations to publish an annual report on global and China's open-source trends. The report provides valuable insights into the latest developments in the dynamic open-source field.
+
+ actions:
+ - theme: brand
+ text: Read 2023 Annual Report Immediately
+ link: /en/preface
+ - theme: alt
+ text: Previous Reports
+ link: https://kaiyuanshe.feishu.cn/wiki/wikcnUDeVll6PNzw900yPV71Sxd
+
+features:
+ - icon:
+ src: "/image/home/KaiYuanShe-logo.png"
+ width: 40
+ height: 40
+ title: KAIYUANSHE
+ details: KAIYUANSHE is a non-profit, vendor-neutral, open-source community formed in 2014. It comprises individual volunteers who contribute towards the cause of open source. The community envisions being "rooted in China, contributing globally, and promoting open-source as a way of life in the new era." Its mission is to achieve "open-source governance, global connection, community development, and project incubation." Its community governance principles are to practice "Contribution, Consensus, and Collegiality." The community's goal is to create a healthy and sustainable open-source ecosystem.
+ link: https://kaiyuanshe.cn/
+ linkText: website
+ - icon:
+ src: "/image/home/yunqi_partnets_logo.jpg"
+ width: 40
+ height: 40
+ title: Yunqi Partners
+ details: Yunqi is a research-based venture capital firm founded in 2014 in China. Its investment focuses on technology innovation and industry empowerment, covering various areas such as advanced manufacturing, enterprise software, cutting-edge technology, and industrial supply chain technology. Yunqi has been consistently ranked among China's Top 10 Best Early Stage Investment Firms by Zero2IPO, China Venture, and 36Kr. As an early-stage lead investor, Yunqi has invested in over 170 startups, out of which 30 have emerged as industry leaders, including Qifu Technology (NASDAQ:QFIN), Intco Medical (SZ:300677), Intco Recycling (SH:688087), Kujiale, Baibu, Deeproute.ai, MiniMax, KEENON Robotics, XTransfer, Worldwide Logistics, and Takfung. Besides, Yunqi also collaborates in co-creating the open-source ecosystem and has led investments in PingCAP, Zilliz, Jina AI, RisingWave, TabbyML, and several other open-source firms. Along with KAIYUANSHE, it has produced the open-source commercialization chapter of the China Open Source Annual Report in 2021, 2022, and 2023.
+ link: https://www.yunqi.vc/
+ linkText: website
+ - icon:
+ src: "/image/home/x_lab2017_logo.jpg"
+ width: 40
+ height: 40
+ title: X-lab
+ details: X-lab Open Lab is a community dedicated to open-source research and innovation. It comprises experts, scholars, and engineers from domestic and international universities, startups, and various Internet and IT companies. They focus on open innovation in the open-source software industry and come from diverse professional backgrounds, including computer science, software engineering, data science, business administration, sociology, economics, and other interdisciplinary fields. They have been practicing open source strategy, open source measurement, open source digital ecosystem, and other related topics for a long time. The group has significantly contributed to open-source governance standard development, open-source community behavior metrics and analysis, open-source community process automation, and open-source domain-wide data governance and insights.
+ link: https://github.com/X-lab2017
+ linkText: GitHub
+---
+
+
+
+
+
+
+
+ Writing Team
+
+
+ Convenor
+
+
+
+
+
+ OSS Questionnaire
+
+
+
+
+
+
+ Data Analytics
+
+
+
+
+
+
+ OSS Commercialization
+
+
+
+
+
+
+ OSS Chronicle
+
+
+
+
+
+
+ Editorial & Compilation
+
+
+
+
+
 Design & Layout
+
+
+
+
+
+
+
+ Commentary Experts
+
+ (Sorted by Last Name)
+
+
+
+
diff --git a/en/open-source-milestones.md b/en/open-source-milestones.md
new file mode 100644
index 0000000..87558eb
--- /dev/null
+++ b/en/open-source-milestones.md
@@ -0,0 +1,712 @@
+---
+outline: deep
+---
+# OSS Chronicle
+
+## Overview
+
+Why do we include a considerable amount of international open-source news in the Open Source Chronicle section of the China Open Source Annual Report? These are the significant events that Chinese open-source enthusiasts must be aware of, and they are the crucial events that impact China's open-source community or will do so in the future.
+
+The Open Source Chronicle reflects the foremost open-source events of 2023 that captured the attention of editorial volunteers from diverse backgrounds. The overarching theme of our selection is the vast potential of open-source technologies and the benefits they bring to a wide range of stakeholders. As editorial volunteers, we are committed to making our coverage of these landmark events comprehensive, objective, and informative, so that readers can better understand the latest trends and developments in the open-source domain.
+
+* Disruptive innovations in global "**Open-Source Technologies**" such as artificial intelligence and machine learning, are the main theme throughout the Chronicle;
+* Global conflicts resulting from geopolitical dynamics indirectly impact "**Open-Source Ecology**", regardless of East vs. West;
+* This has resulted in a shift towards "**Open-Source Governance**" in all areas, including regions, law, trade, and communities;
+* Within that shift, "**Open Source Security**" is considered a top priority;
+* The growth of "**Open Source Commercialization**" is a promising trend, and though 2023 may pose some challenges, it's encouraging to know that there is an abundance of open-source startups thriving worldwide, including in China;
+* In today's world, technology, ecology, governance, and commercialization are undergoing significant changes. This has made "**Open Source Education**" a crucial foundation for exploring new possibilities. Artificial intelligence is a prime example of disruptive innovation that requires persistent research and a robust higher education system to achieve its current level of success.
+* The last part of the "**Open Source Ranklists, Papers and Reports**" is like a delightful dessert after dinner. It will be fascinating to observe if it provides valuable insights and accurately predicts the future of open-source development in China. We can only know for sure by the end of 2024.
+
+
+This year, AI is present in all categories. A holistic approach is necessary for full comprehension.
+
+In brief, we stand on the brink of a world where AI will transform the way things work. We hope to meet you at next year's Open Source Chronicle!
+
+## 1. Open Source Technology Chronicle
+
+### 1.1 Artificial Intelligence and Large Models
+
+- **ZHIPU AI - GLM**
+ZHIPU AI has open-sourced the ChatGLM-6B series. ChatGLM-6B is an open-source dialogue language model that supports bilingual Q&A. In addition, ZHIPU AI has open-sourced VisualGLM-6B (CogVLM), a multi-modal dialogue model that combines image processing and natural language processing to support both Chinese and English dialogues, aiming to provide a richer and more intuitive interactive experience.
+- **Baichuan**
+Over the past year, Baichuan has released several versions of its large models, starting with Baichuan-7B. It later launched the 13B model and the Baichuan2 series, open-sourcing both the base and chat versions. One of the latest models, Baichuan2-192K, has a context window length of 192K.
+- **Intern general large model system**
+Shanghai Artificial Intelligence Laboratory (AIL) released the newly upgraded "Intern General Large Model System", which includes three foundation models (the Intern Multimodal large model, the InternLM large language model, and the InternLandMark large-scale 3D neural radiance field model), as well as the first full-chain open-source system for the research, development, and application of large models.
+- **Alibaba - Qwen**
+Alibaba open-sourced the 7B model of Tongyi Qianwen (Qwen), and then successively open-sourced the base and chat models at 1.8B, 14B, and 72B, along with the corresponding int4 and int8 quantized versions. For multimodal scenarios, Qwen also open-sourced two multimodal models for vision and speech, Qwen-VL and Qwen-Audio.
+- **Kunlun - Skywork**
+Kunlun Inc. released the Skywork-13B series of large language models, at the ten-billion-parameter scale, and open-sourced Skypile/Chinese-Web-Text-150B, a 600GB, 150B-token high-quality Chinese dataset.
+- **RWKV**
+RWKV, a non-Transformer large language model architecture, has been continuously open-sourced since its release. In 2023, RWKV released multiple versions and entered LF AI & Data for incubation.
+- **Inspur - Yuan 2.0**
+Inspur Electronic Information Industry Co., Ltd. officially released "Yuan 2.0", a 100-billion-parameter base model. This series of models is fully open-sourced and commercially available, in three versions with 102B (102.6 billion), 51B (51.8 billion), and 2B (2.1 billion) parameters. Compared with Yuan 1.0, Yuan 2.0 has improved programming, reasoning, and logic capabilities.
+- **01.AI - Yi**
+In November 2023, 01.AI released the Yi series of models with 6 billion and 34 billion parameters, trained on 3 trillion tokens of data.
+- **High-Flyer Quant - DeepSeek**
+DeepSeek, a division of High-Flyer Quant, has released its 67B open-source large model. DeepSeek has open-sourced models at the 7B and 67B scales, each with a base model (base) and an instruction-tuned model (chat). No application is required, and the models are free for commercial use. The project team has also made nine intermediate training checkpoints available for download.
+- **Ant Group - CodeFuse**
+Ant Group has open-sourced CodeFuse-13B and CodeFuse-CodeLlama-34B, the latter based on CodeLlama. The models currently support a variety of code-related tasks such as code completion, text-to-code, and unit test generation. The open-source release includes the MFT (Multi-Task Fine-Tuning) framework, a dataset for enhancing the coding capabilities of LLMs, and a deployment framework.
+
+- **Meta Llama 2**
+In July 2023, Meta announced Llama 2 and open-sourced pre-trained models at three scales: 7B, 13B, and 70B parameters. These models were pre-trained on 2 trillion tokens. In the Supervised Fine-Tuning (SFT) phase, they were fine-tuned with over 100,000 pieces of data to improve their performance on specific tasks. Additionally, Meta open-sourced the Llama2-Chat model, which is SFT-optimized on conversation data, and has gone on to open-source CodeLlama, a large model for code.
+- **Mixtral 8x7B**
+In December 2023, Mistral AI open-sourced Mixtral 8x7B, a Mixture of Experts (MoE) model that is commercially usable under the Apache 2.0 license. Mixtral 8x7B consists of eight expert networks of 7 billion parameters each, a structure that improves the model's efficiency in processing information and reduces operating costs.
+- **Falcon 180B**
+Falcon 180B is an open source large language model released by the Technology Innovation Institute (TII). The model has 180 billion parameters and was trained using TII's RefinedWeb dataset.
+- **Arabic AI Large Models Jais Open Sourced**
+A team of UAE researchers has announced the open-sourcing of the Arabic large model Jais. Jais is a bilingual Arabic-English pre-trained large language model with 13 billion parameters.
+- **Microsoft open-sourced visual foundation model Visual ChatGPT**
+Microsoft has launched Visual ChatGPT, an open-source project that combines OpenAI's ChatGPT with a series of Visual Foundation Models (VFMs) to enable users to send and receive images during chats. The project aims to extend the functionality of ChatGPT so that it can not only process text but also understand and generate images, thus enabling a multimodal interactive experience.
+- **NVIDIA officially open sourced TensorRT-LLM**
+NVIDIA has officially released an optimized open-source library called TensorRT-LLM, which speeds up large language model inference on AI GPUs such as Hopper. To test performance, NVIDIA compared the H100 and the TensorRT-LLM-enabled H100 against an A100 baseline. In GPT-J 6B inference, the H100 was 4 times faster than the A100, while the TensorRT-LLM-enabled H100 was 8 times faster than the A100.
+- **Elon Musk drives the efforts of X (formerly Twitter) to open source its recommendation algorithm**
+X (Twitter) has released two repositories on GitHub (main repo, ml repo) that cover much of the Twitter source code, including the recommendation algorithms and the mechanisms used to control the tweets users see on the For You timeline.
+- **Hugging Face changes its Text Generation Inference (TGI) licence**
+Hugging Face has announced that in the latest release of TGI v1.0, its open-source license will change from Apache 2.0 to HFOIL 1.0. HFOIL stands for Hugging Face Optimized Inference License, a license agreement that Hugging Face designed specifically for optimized inference solutions.
+- **Hugging Face has open sourced Rust-based machine learning framework Candle**
+Hugging Face recently open-sourced Candle, a novel and small Rust ML framework that runs extremely fast and supports a wide range of powerful models. It provides support for GPUs and has an optimised CPU backend that runs in the browser. Candle also includes several pre-trained models and use cases, such as speech recognition models, generic LLMs, computer vision models, and more.
+- **Alibaba has open sourced AnyText**
+Alibaba has recently released a multi-language visual text generation and editing model called AnyText. This model allows users to create text that is comparable to that of a professional Photoshop editor. With AnyText, users can customize the location, strength, intensity, and number of text seeds that appear in a picture.
+- **Jina AI launches world's first open source 8K Text embedding model**
+Jina AI announced the release of the Jina-embeddings-v2 model, an open source product that supports 8K (8,192 tokens) context lengths and is similar in functionality and performance to OpenAI's text-embedding-ada-002.
+
+### 1.2 Operating Systems and Programming Languages
+
+- **The Long Term Support (LTS) version of the Linux kernel now has a 2-year maintenance period instead of 6 years**
+The maintenance period for Linux kernel LTS releases was extended to six years in 2017. The policy has now been cut to two years. Jonathan Corbet of Linux Weekly News said it doesn't make sense to maintain old kernels for so long because they're not used much.
+- **India's Ministry of Defence develops its own Linux distribution, Maya OS, to fully replace Windows**
+India's Ministry of Defence has announced a significant overhaul of its cybersecurity system. It plans to replace the Windows operating system with a Linux distribution called Maya in all its networked computers. The move is in response to the growing threat of malware and ransomware attacks. It aims at promoting independent innovation and reducing dependence on foreign software.
+- **Red Hat Announces CentOS 7 and RHEL 7 end of support on 30 June 2024**
+Red Hat has recently announced the discontinuation of support for CentOS 7 and RHEL 7. In addition, the complete source code for RHEL will no longer be publicly available. To maintain compatibility and support, downstream distributions of RHEL (such as CentOS, Rocky Linux, AlmaLinux, etc.) will need to recompile and release their versions within 30 days.
+ However, it is important to note that Red Hat has assured the CentOS community that it will not be going away. Community contributors and CentOS users will still be able to collaborate on open-source Linux distributions that are part of the CentOS Stream project.
+- **Google's open-source browser project Chromium announces the use of Rust**
+Google has published a blog post announcing that it will support the use of third-party Rust libraries from C++ in Chromium, with plans to include Rust code in Chrome binaries by the end of the year. It also noted that Rust, a programming language developed by Mozilla that offers security along with high performance, was originally designed for writing browsers, so it is only fitting that open-source browsers like Chromium rely on the technology.
+- **Open-source operating system openKylin 1.0 officially released, already supports Arm, RISC-V**
+Version 0.9 of openKylin already supports Arm and RISC-V. The new openKylin 1.0 ships with dual kernels (6.1 and 5.15) by default and allows more than 20 core operating system components to be selected and upgraded independently. It also adds many new features and fixes more than a thousand bugs, improving the overall stability and compatibility of the system to provide users with a better experience.
+- **Huawei officially releases HarmonyOS 4**
+Huawei officially released the HarmonyOS 4 operating system. The new HarmonyOS 4 is said to have breakthroughs in privacy and security, AI large model capability, and personalized interaction.
+- **fit2cloud open sourced 1Panel**
+1Panel is a modern, open-source Linux server operation and management panel that provides users with accessible server-building and management resource services.
+- **AWS open-sourced the domain-specific language Cedar**
+AWS has released Cedar as open source. Cedar is a domain-specific language for defining access permissions as policies. It is integrated into Amazon Verified Permissions and AWS Verified Access, and can also be integrated into applications through SDKs and language specifications.
+Cedar allows defining access policies separately from the application code, which facilitates writing, analyzing, and auditing the policies independently. Cedar supports both Role-Based Access Control (RBAC) and Attribute-Based Access Control (ABAC).
+- **Microsoft releases Guidance language**
+Microsoft has introduced a domain-specific language called Guidance, designed to enhance developers' ability to manage contemporary language models. The new framework integrates generation, prompting, and logic control into a unified development process. The programming language enables developers to 'organize generation, prompting and logic control into a continuous flow that matches how the language model processes text.'
+It integrates seamlessly with model providers such as Hugging Face, and combines an intelligent seed-based generation caching system with token healing to optimize prompt boundaries and remove bias introduced by tokenization.
+
+
+### 1.3 Hardware Technology and the Internet of Things
+
+- **China supports the building of humanoid robot open source communities**
+
+ In October 2023, China's Ministry of Industry and Information Technology (MIIT) released the Guiding Opinions on the Innovative Development of Humanoid Robots. The document proposed the establishment of an open-source community for humanoid robots, which would promote the development of open-source foundations, provide support for key enterprise open-source projects, and encourage collaboration and innovation among developers around the world.
+
+- **Stanford University unveiled Mobile ALOHA, an open source robot**
+
+ In March 2023, Stanford University unveiled Mobile ALOHA (A Low-cost Open-source Hardware System), an open-source robot that can perform fine manipulation tasks via teleoperation. By the end of 2023, it could carry out simple tasks autonomously after joint training.
+
+- **Tesla Open Sourced Roadster Design and Engineering Details**
+
+ Musk wrote on social media that the design and engineering details of Tesla's first-generation Roadster are now “fully open source”, and Tesla has published research and development documents that are accessible to all.
+
+- **openKylin officially joined the RISC-V Foundation**
+
+ The openKylin community has recently become a member of the industry consortium of the RISC-V Foundation, with the aim to contribute towards the development of the RISC-V ecosystem. They intend to build an operating system that is in harmony with the hardware and software ecosystem of the RISC-V architecture.
+
+- **Ali T-Head open-sourced the XuanTie RISC-V family of processors**
+
+ Ali T-Head has made the XuanTie RISC-V series of processors open-source along with a range of tools and system software. This marks the first time a complete open-source stack of processors and necessary software has been made available, which will aid in the advancement of the RISC-V architecture, expedite the integration of RISC-V hardware and software technologies, and facilitate the adoption of innovative solutions.
+
+- **AMD Open Sourced FSR**
+
+ AMD open-sourced FSR 3 (FidelityFX Super Resolution 3) under the MIT license. FSR is an upsampling technology that competes with NVIDIA's DLSS, but unlike DLSS, it is software-based and doesn't rely on proprietary CUDA cores.
+
+- **Baidu open sourced its messaging middleware BifroMQ**
+
+ China's Baidu has open-sourced BifroMQ, its high-performance and distributed messaging middleware. BifroMQ uses serverless architecture and has native multi-tenancy support. Developed over years by Baidu's IoT team, it facilitates IoT device connectivity and messaging systems at scale.
+
+### 1.4 Data Infra
+
+- **DragonflyDB 1.0**
+
+ DragonflyDB is a modern open-source in-memory database that is compatible with the Redis and Memcached APIs. It is a viable alternative to both, as it requires no code changes during migration. The development team recently released DragonflyDB version 1.0, stating that it is ready for production use. DragonflyDB 1.0 supports the most common data types and commands of Redis, as well as snapshots, master-slave replication, high availability, and other features.
+
+- **FerretDB 1.0 officially released**
+
+ FerretDB 1.0, an open source alternative to MongoDB, has been released. FerretDB wants to bring MongoDB database workloads back to their open source roots, enabling PostgreSQL and other database backends to run MongoDB workloads, preserving the opportunities offered by the existing MongoDB ecosystem.
+
+- **Apache Doris Version 2.0.0 Released**
+
+ Apache Doris version 2.0.0 was officially released on August 11, 2023. Over 275 contributors submitted more than 4,100 optimizations and fixes, resulting in significant improvements. In particular, blind-test query performance on standard benchmark datasets improved by over 10x in version 2.0.0.
+
+- **Apache SeaTunnel Graduated into Apache Top Level Project**
+
+ Apache SeaTunnel is the first top-level project in big data integration that was led by Chinese developers and contributed to the ASF. Formerly known as Waterdrop, it changed its name to SeaTunnel in October 2021 and applied to join the Apache Incubator. SeaTunnel is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive amounts of data.
+
+- **Aliyun's open-source graph computation engine GraphScope performance tops authoritative lists**
+
+ GraphScope, an open-source graph computing engine developed by Aliyun, has set a new record on the international authoritative graph benchmark "LDBC SNB Interactive". It achieved a throughput of more than 30,000 QPS for graph database queries on a single node, twice the performance of the previous record holder.
+
+- **Baidu open-sourced its high-performance search engine Puck**
+
+ Baidu announced that it has open-sourced its self-developed search engine, Puck, under the Apache 2.0 license. It is the first open-source vector search engine for large data sets in China.
+
+- **ByteDance Open Sourced ByConity**
+
+ ByteHouse has recently released its kernel to the community as ByConity, under the Apache 2.0 license agreement.
+
+ ByConity is an open-source cloud-native data warehouse that is based on the ClickHouse kernel but comes with a new storage-computation separation architecture. It supports several essential features such as tenant resource isolation, elastic scaling up and down, storage-computation separation, and strong consistency between data reading and writing. ByConity aims to provide a reliable and scalable solution for data storage and computation in a cloud-native environment.
+
+- **Ali open sourced multi-database client tool Chat2DB**
+
+ Chat2DB is a free, open-source multi-database client tool that supports local installation on Windows and Mac, server-side deployment, and web access. Compared with traditional database clients such as Navicat and DBeaver, Chat2DB integrates AIGC capabilities to convert natural language to SQL and SQL to natural language, and can give developers SQL optimization suggestions.
+
+- **ApeCloud open sourced KubeBlocks**
+
+ KubeBlocks is an open-source system for managing and running data infrastructure on K8s. It helps developers, platform engineers, and SREs deploy and maintain a dedicated DBPaaS in the enterprise across various public and private cloud environments. KubeBlocks is the only open-source multi-engine data/database management system project in the CNCF Cloud Native Landscape, supporting 32 databases, such as MySQL, PG, MongoDB, Redis, Kafka, Pulsar, and more.
+
+### 1.5 Cloud Computing and Infrastructure Software
+
+- **DragGAN Got 20,000 Stars One Day After Open Source**
+
+ DragGAN is an image editing tool that was developed by Google researchers in collaboration with the Max Planck Institute for Informatics and MIT CSAIL. It allows users to easily adjust the position, pose, expression, size, and angle of the subject in a photo by manipulating the pixel points and orientation in the image. This intuitive tool is designed to make image editing quick and effortless.
+
+- **LLMOps platform Dify.AI code is completely open-source**
+
+ Dify.AI, the LLMOps platform, has announced that its 46,558 lines of code are completely open-source and that it has decided, for the time being, to relax its open-source license from AGPL to Apache 2.0.
+
+- **Huawei open-sourced cross-end, cross-framework, cross-version enterprise application front-end component library OpenTiny and high-performance service mesh Kmesh**
+
+ OpenTiny is a development kit for building web application front-ends using Vue2/Vue3/Angular. It includes a theme configuration system, back-end templates, a CLI, and other tool libraries.
+
+ Kmesh is a high-performance service mesh that offers developers a new level of mesh performance through an innovative architecture. It leverages eBPF and programmable kernel technology to provide OS-native service mesh data plane capabilities. Traffic governance is integrated into the OS, which significantly improves the access performance of services in the mesh.
+
+- **Baidu Intelligent Cloud Releases Open Source QianFan SDK Version**
+
+ Baidu Intelligent Cloud officially released its Python SDK (QianFan SDK), which is fully open source and available for free download and use by enterprises and developers.
+
+- **Volcano Engine Self-developed Universal Multimedia Processing Framework BMF**
+
+ Volcano Engine has open-sourced BMF (Babit Multimedia Framework), a cross-language multimedia processing framework that offers flexibility and scalability and provides a simple, easy-to-use interface.
+
+ BMF lets users dynamically manage and reuse video processing capabilities in a modular way and build high-performance multimedia processing pipelines using Graph, enabling efficient project delivery for multimedia users.
+
+- **ByteDance Released and Open Sourced Rspack**
+
+ Rspack is a Rust-based bundler that has completed support for the Webpack Loader architecture. It is incubated by the ByteDance Web Infra team and offers high performance, customizability, and compatibility with the Webpack ecosystem.
+
+## 2. Open Source Ecology Chronicle
+
+An interesting phenomenon is that if something good happens in the open source community, it mostly belongs in the business chapter, and if something bad happens, it can mostly be filed in the ecology chapter. Of course, this chapter is not just bad things: there is also some good news, as well as policies in various countries that can have a profound impact on the open source ecosystem.
+
+### 2.1 Leading enterprises are laying off open source workforce
+
+From the beginning of January, there were rumours of layoffs at Google, GitHub and GitLab. Even companies like Red Hat were laying off staff, and then came news of layoffs vaguely disclosed by various large domestic companies. Although this chronicle focuses on the open source ecosystem and the situation of open source people, objectively speaking, the big companies are not specifically trying to lay off open source talent. It's just that once the layoffs start, the open source people within the company will look "suspicious" and will be pressed with the question: what value have you really created for the company? And that's a question that's never easy to answer in a serious, positive way!
+
+### 2.2 Famous open source gurus struggle to make ends meet
+
+The next piece of news is even more sobering. The 12,000 people Google laid off, dubbed the "Golden 12K", included some famous open source bigwigs. Among them were Chris DiBona, who founded Google's OSPO 19 years ago, and Samba co-founder Jeremy Allison, 61, who reluctantly tweeted, "Just got fired from Google. If anyone needs SMB 1/2/3 protocol or open source experience, I'd be interested."
+
+Some other famous open source people have fared even worse. Here is a brief list of news headlines:
+
+- "The author of the open source framework NanUI turned to selling steel, and the development of the project was suspended"
+- "10 months in prison, internet busts, and struggling to earn a living! Behind the 9 billion downloads of an open source project is 9 years of work"
+- "Due to lack of funds, the full-time developer confesses: there may be no future for this open-source software!"
+- "Unemployed due to mania, author of acclaimed open-source project begs for money online"
+- "'The roots of free open source software have collapsed,' complains the head of core-js, who has the entire modern web on his back, but has given up on open source for lack of money"
+- "Another popular open source project announces cessation of functionality as funding critically falls short"
+
+It's really a case of "hear no evil, see no evil". In last year's Open Source Chronicle, we were still talking about the "twilight of individual heroism". Today, the trend has become more and more obvious.
+
+:::info Expert Review
+**Wei Jianfan**: If you do open source for fun, that's great; don't worry about the money. If you have not yet secured your livelihood, do not devote yourself fully to open source; keeping it as a hobby is fine. Open source itself is not meant for making money.
+:::
+
+### 2.3 Well-known open source projects are ceasing development one after another
+
+In 2023, a number of notable open source projects, both domestic and international, announced the cessation of development for different reasons.
+
+The most outrageous case is probably AetherSX2, one of the best PlayStation 2 emulators on the Android platform. The developers had no choice but to announce the cessation of development because they suffered "endless impersonations, complaints, unreasonable demands, and even death threats".
+
+The most heartbreaking case is aardio, a programming language focused on desktop software development, whose author announced that he no longer had the energy to maintain the project due to his wife's cancer.
+
+There are also more common reasons: the developer's company ran short of money or went out of business (Touca, libjpeg-turbo); the developers lost interest or no longer had the energy to maintain the project (Peek, wangEditor, lodash); or the technology became outdated (Mokee).
+
+### 2.4 The Free Software Foundation's rugged 40-year journey
+
+On 27 September 1983, Richard Matthew Stallman (RMS) announced the "GNU Project" to develop a Unix-like free software operating system, and in doing so, launched the Free Software Movement. In 2023, the Free Software Foundation also published an article celebrating forty years of GNU and the Free Software Movement.
+
+FSF Executive Director Zoë Kooyman said that "GNU is not only the most widely used operating system based on free software, but is also at the heart of the philosophy that has guided the free software movement for forty years. We hope that the 40th anniversary will inspire more hackers to join GNU in its goal of creating, improving and sharing free software around the world."
+
+However, in April 2023, an article was published claiming that after nearly 40 years, the Free Software Foundation (FSF) was dying. The author argued that "the FSF has failed to focus on spreading the free software philosophy, developing, distributing, and promoting copyleft licences, and overseeing the health of the core concepts of the free software movement, while at the same time devoting its resources to other, unproductive tasks".
+
+In fact, we do talk more about open source software than free software these days. So has the Free Software Movement fulfilled its destiny, or is it likely to be revitalised through reform?
+
+### 2.5 Ageing of the open source community
+
+The aging of the open source community is an unavoidable phenomenon. Even the famously hot-tempered Linus Torvalds has begun to curb his temper and talk about "the aging of the kernel community". The Postgres community is also aging, with the main developer being 68 years old. There was also news of the deaths of Bram Moolenaar, the father of Vim, and Thien-Thi Nguyen, a contributor to the GNU Free Software Project. What should we think about the phenomenon of "aging"?
+
+In fact, we should see more young people joining the open source community, but they tend to join younger projects that are more interesting and newer, rather than older projects with a long history.
+
+Maybe what we should really think about is: do those old open source projects really have to be active and release new versions all the time?
+
+### 2.6 Some encouraging news on China's open-source efforts
+
+There's still a lot of good news in China's open source community, such as the official report in April: "The number of China's open source software developers exceeded 8 million".
+
+Several projects officially graduated to become Apache Software Foundation top-level projects in 2023: Apache Linkis, Apache Kyuubi, and Apache bRPC in January; Apache EventMesh in February; and Apache SeaTunnel and Apache Kvrocks in June. In February, Jina AI officially donated DocArray to the Linux Foundation and Paralus officially became a sandbox project of the CNCF Foundation, and in July the Istio project officially graduated from the CNCF.
+
+openKylin officially joined the RISC-V Foundation, Huawei became China's first PyTorch Foundation Premier member, and Jiang Ning was re-elected as a director of the board of the Apache Software Foundation for the year 2023, all of which show that we are still actively participating in the international open source ecosystem, and are continuing to play an important role in it.
+
+In February 2023, following ALC (Apache Local Community) Beijing and Shenzhen, ALC also set up a Xi'an chapter. At the same time, KAIYUANSHE launched the KCC (Kaiyuanshe City Community) programme, which by the end of the year had grown to eleven cities: Beijing, Changsha, Chengdu, Dalian, Hangzhou, Nanjing, Guangzhou, Shanghai, Shenzhen, Singapore and Silicon Valley.
+
+In March 2023, the CHANCE Foundation, China's second open source foundation after the OpenAtom Open Source Foundation, was officially established in Chongqing. Later, it also launched the "SigStore China Community", the "Open Source Innovation Education Alliance", and other initiatives, and three open source projects have now been officially donated to the CHANCE Foundation. We look forward to the establishment of more quality foundations in China, for the world, in the future.
+
+**2023 open source related conferences / activities**
+
+- February
+ - Shenzhen: First OpenHarmony Conference
+- March
+ - Beijing: The 1st OSPO Summit
+ - Beijing: DevTogether Summit
+- April
+ - Suzhou: Mobile Cloud Conference - Open Source Forum
+ - Shanghai: openEuler Developer Day
+- May
+ - Shanghai: Global Open Source Technology Summit (GOTC)
+- June
+ - Beijing: BAAI Conference - AI Open Source Forum
+ - Beijing: OpenAtom Global Open Source Summit
+ - Beijing: 18th Open Source China Open Source World Summit
+- July
+ - Beijing: China Internet Conference - Open Source Supply Chain Forum
+ - Taipei: 2023 COSCUP (Conference for Open Source Coders, Users & Promoters)
+- August
+ - Shanghai: World Artificial Intelligence Conference - Open Source Learning Forum
+ - Beijing: CommunityOverCode Asia 2023
+- September
+ - Shanghai: KubeCon + CloudNativeCon + Open Source Summit
+ - Shanghai: GOSIM (Global Open Source Innovation Conference)
+ - Shanghai: 2023 INCLUSION.Conference on the Bund - Open Source Forum
+ - Beijing: Open Source Cloud Alliance for Industry (OSCAR) Conference
+- October
+ - Wuhan: CHANCE Foundation Diverse Cooperation Summit
+ - Changsha: CCF ChinaOSC
+ - Changsha: 1024 Programmer Festival
+ - Chengdu: COSCon 8th Annual China Open Source Conference
+- December
+ - Beijing: OpenInfra Days China 2023
+ - Sanya: OpenCS (Open-source Computer Systems) 2023
+ - Beijing: Operating System Congress & openEuler Summit
+ - Wuxi: OpenAtom Developer Conference
+ - Shanghai: Open Source Industry Ecological Conference
+
+### 2.7 The impact of national policies on the open source ecosystem
+
+When it comes to open source ecology, it is necessary to mention the open source-related policies formulated by various countries and regions, all of which will have an all-round impact on the open source community, business and ecology. Simply summarised, they can be divided into the following categories:
+
+- **Government policies to support open source -** as reported in July 2023, a study found that "27% of the UK's total tech value-added comes from open source, valued at £13.59bn", and in China there are a range of policies in place, from the central government to the local level. There is dedicated support for specific open source projects (Shenzhen), targeted funding for specific foundation projects (Beijing), and promotion of the integration of open source technology with specific industries, to name but a few. We will see in the coming years how much impact this will have on the open source industry and ecosystem.
+- **The emergence of open source as a weapon in international competition -** GitHub blocking developer contributions from Russian companies, a US lawmaker proposing to restrict Chinese development in the RISC-V space, and a wide variety of "export-restrictive" policies that have been put in place or attempted all make the scenario in the Reuters report "Open-source software becoming a key part of trade war" seem imminent.
+- **Policymakers are also very active around open source security -** the United States, the European Union, and China have all introduced a series of bills and regulations related to "open source security" and "AI compliance". The open source community has mixed feelings about this: it is glad that security is getting more and more attention from governments, but worried that unreasonable policies and regulations may hamper the development of open source technology.
+
+## 3. Open Source Governance Chronicle
+Open-source governance can be divided into three categories: community governance, project governance, and risk governance. Risk governance encompasses different types of risks such as ethical and social risks, legal compliance risks (including licenses), supply chain risks, security risks, and more. Given the importance of open-source security, we have included a separate chronology of open-source security events in the fifth part of this article.
+
+In 2023, a significant breakthrough in the development of Artificial Intelligence (AI) caused widespread debate among experts worldwide. Whether or not to limit the pace of AI development was discussed. At the same time, major geopolitical powers such as the EU, the US, and China focused on creating legislation to regulate AI. Furthermore, open-source technology played a crucial role in catalyzing the development of AI, leading to efforts to define open-source AI.
+
+During 2023, major open-source foundations and organizations from around the globe held online and offline discussions. They aimed to encourage policymakers and legislators worldwide to work together to face the challenges brought by the new era of AI through open-source cooperation, and to reject techno-nationalism and geopolitical hostility. Despite these efforts, the fragmented global open-source communities, particularly those from Asia and China, have yet to gain significant influence over policymakers. More attention and collaboration are therefore necessary to address this issue.
+
+We have given more importance to the crucial events related to open-source AI governance this year. Due to limited space, several project governance events have been included in the community and risk governance categories and will not be listed separately.
+
+
+### 3.1 Community Governance
+
+#### 3.1.1 Controversies in the Rust community
+
+The Rust community underwent a series of crises and governance changes in 2023. Here are some of the major events and outcomes:
+
+- The Rust programming language team faced internal disagreements and created a new Leadership Council to decentralize authority. External experts were attacked by some core members, causing them to leave and leading to resignations. These conflicts led to the announcement of a new programming language called Crab. Crab's developers wanted more support with Rust's design, aiming to be more flexible, efficient, and faithful to Rust's original intent and philosophy.
+- The Rust Foundation's new Trademark Policy sparked community opposition over concerns that it could limit Rust's growth and innovation. The Foundation apologized, acknowledged its shortcomings, and promised to revisit and revise the policy while engaging in more dialogue with the community.
+- The management of the Rust community faced issues again recently. The organizers of RustConf removed some scheduled keynote speakers without informing them, which led to an outcry and protests within the community. As a result, some well-known Rust developers and speakers decided to withdraw not only from RustConf, but also from the Rust community as a whole.
+- Graydon Hoare, the founder of the Rust language, said in an interview that he felt helpless and frustrated by the conflict and division in the Rust community. He believed that Rust had deviated from his original vision and goals and that he was no longer able to control or save it, and he hoped that the community would solve the problem on its own and leave him alone.
+
+While the Rust language went through some community crises and governance changes in 2023, it also published a roadmap for 2024 that focuses on three directions: lowering the barrier to learning, expanding the ecosystem, and improving the development process.
+
+The design team for the Rust language has stated that their goal is to simplify the program so that developers only have to deal with the inherent complexity of their domain and no longer have to deal with the unintended complexity of Rust, and also to give library authors more power and flexibility to meet the needs and innovations of their users.
+
+In addition, some observers believe that the Rust language is evolving toward ease of use, having proved its stability, performance, and productivity in 2021. Rust will likely see explosive growth as the cost of learning and using it decreases even further. Its focus on security, concurrency, and performance, and its growing adoption as a language designed not only for today's challenges but also for those of the future, suggest that Rust is here to stay, but community governance remains a top priority that must be addressed.
+
+#### 3.1.2 Controversies in the Red Hat community
+Red Hat sparked a storm in the open source world in 2023 involving the source distribution and licensing of its two Linux distributions, RHEL (Red Hat Enterprise Linux) and CentOS (Community Enterprise Operating System). Here are some of the major events and outcomes:
+
+- Red Hat, a popular software company, recently made an announcement that it will no longer share the complete source code of RHEL (Red Hat Enterprise Linux) publicly. Instead, it will only provide patches and updates. Additionally, downstream distributions of RHEL (like CentOS, Rocky Linux, and AlmaLinux) will need to recompile and release their versions within 30 days to maintain compatibility and support for RHEL. This decision has caused controversy among the open-source community. Many believe that Red Hat's actions go against the principles of open-source software and that the company is prioritizing profits over the spirit of open-source. The decision has also created difficulties and pressure for downstream distributions of RHEL.
+- Red Hat responded by stating that it has not broken its commitment to open source; rather, the change is meant to protect the brand and quality of RHEL from bad behavior and abuse, and to encourage more users and developers to use RHEL directly and enjoy the services and support it provides.
+- CentOS, as the most extensive downstream distribution of RHEL, has been hit the hardest. Its ecosystem and community are facing a crisis of fragmentation and decline, and some users and developers have turned to other Linux distributions, such as Debian, Ubuntu, Fedora, etc., believing that CentOS has already lost its meaning and value of existence.
+- Both Oracle and SUSE took advantage of the opportunity to mock and provoke Red Hat, stating that they would continue to support and maintain RHEL's downstream distributions and even invested heavily in creating their own RHEL offshoots, such as Oracle Linux and SUSE Linux Enterprise Server, in an attempt to capture RHEL's market and users.
+- Red Hat has released a statement once again to explain why they are changing their RHEL source code release strategy. According to the statement, the company is making this change to improve the security, stability, and reliability of RHEL. The change will also promote innovation and development of RHEL. Red Hat assures that they still respect and support the open source community and welcomes more collaboration and feedback from it.
+
+### 3.2 Risk Governance
+
+#### 3.2.1 Ethics and social risks
+AI technology development and application have triggered several ethical, moral, and societal risk debates and concerns related to human safety, freedom, privacy, and responsibility. The following are some of the significant events and viewpoints:
+
+- **Over 1,000 tech leaders and researchers, including Elon Musk**, have called for artificial intelligence labs to suspend the development of advanced systems, warning in an open letter that AI tools pose significant risks to society and humanity. Conversely, Hongyi Zhou, CEO of 360, believes that not developing AI is the biggest insecurity. According to him, AI can help humans solve many problems, and its use can be regulated through laws and regulations.
+- **A 22-word statement signed by nearly 400 experts and scholars in the field of AI**, including Geoffrey Hinton, the godfather of AI, Sam Altman, CEO of OpenAI, and Ilya Sutskever, its Chief Scientist, warns that AI could extinguish the human race! It states: "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war".
+- **In July 2023, many open-source foundations and organizations from around the world held an international conference in Geneva** with the aim of exploring the relationship between AI and open-source, including the challenges and opportunities involved. The conference concluded that open source is essential in promoting AI innovation and cooperation, and is an effective means of ensuring AI ethics and social responsibility. Many experts from around the globe pointed out that open source is an inevitable trend in the development of AI. They also highlighted that open source makes AI research and application more transparent, fair, and credible. It allows more people to participate and contribute to the development of AI, preventing monopolization and abuse of AI.
+- **Three prominent AI researchers - Andrew Ng and Turing Award winners Geoffrey Hinton and Yoshua Bengio - engaged in a lively debate on social media**. Their discussion focused on the U.S. government's restrictions and bans on AI technology. Andrew Ng criticized the U.S. bans, stating that they hinder the open exchange of AI and are detrimental to AI development and innovation. However, Hinton and Bengio argued that the U.S. bans are necessary controls on AI for security and ethical reasons.
+
+The emergence and utilization of AI technology reflect the diverse ideologies and values worldwide, and their influence on the ongoing humanitarian crisis. AI is not merely a technological issue, but also a political, economic, and social one that necessitates international consensus and cooperation for the creation of sustainable and equitable AI development.
+
+#### 3.2.2 AI Laws, Regulations and Policy Documents are emerging globally
+In 2023, several laws, regulations, and policy documents related to AI were issued on a global scale. These included:
+- The Interim Measures for the Administration of Generative Artificial Intelligence Services were jointly announced by seven Chinese ministries and commissions, including China's National Internet Information Office.
+- The Global Initiative on Artificial Intelligence Governance was issued by the Office of the Central Committee of the Communist Party of China's Committee on Cybersecurity and Informatization.
+- The Executive Order on Safe, Secure, and Trustworthy AI was issued by the U.S. White House.
+- The Artificial Intelligence Act was agreed upon by the European Parliament, the EU member states, and the European Commission.
+- The Bletchley Declaration, an international declaration, was signed by representatives of the governments of 28 countries and the EU.
+
+The documents of China's Global AI Governance Initiative and the European Union's Artificial Intelligence Act reflect the importance of promoting and protecting open source AI technology. For instance, China's initiative encourages the world to work together towards the healthy development of AI, sharing the knowledge and open sourcing AI technology. The EU's AI Act specifies that it does not apply to AI components provided under free and open source licenses unless they are part of a general base model or prohibited AI practices, or subject to transparency obligations as part of an AI system.
+
+#### 3.2.3 Global open-source organizations are addressing new AI governance challenges
+In June 2023, the Open Source Initiative (OSI) initiated the Defining Open Source AI campaign, which included online and offline global discussions and events to address the challenges of open-source AI governance. During the campaign, Kaiyuanshe actively participated in the mailing list discussions and organized the translation of the webinar series. The draft document of the Definition of Open Source AI, which has been published, consists of a preamble, a definition of Open Source AI, and a list of evaluation licenses. The document focuses on authorizing the use, study, modification, and sharing of AI systems.
+
+The Apache Software Foundation published Generative AI Guidelines for Contributors in June 2023. The guidelines help contributors who use AI-generated code, documents, and images for ASF projects. They recommend disclosing the AI-generated part of contributions and labeling it as "Generated-by: ".
+
+The China Academy of Information and Communications Technology (CAICT) released a report titled "Compilation of Trusted Open Source Large Model Cases (Phase I)" in December 2023. The report provides a comprehensive overview of China's open-source large model industry, including technical aspects, application scenarios, business models, governance, and development trends. It serves as a reference guide for developing China's large model industry and analyzes the technology ecology of open-source large models and the industry chain.
+
+#### 3.2.4 Open-source AI large models call for new types of licenses
+Open source is becoming mainstream for AI large models, but traditional licenses can't meet their unique needs. New licenses are being explored.
+
+The Open Source Initiative declared Meta's LLaMa license not open source due to commercial use limitations and purpose of use restrictions. Falcon-40B was also challenged for using a custom license with special restrictions and had to change to Apache 2.0. Hugging Face changed TGI's license from Apache 2.0 to HFOIL due to restrictions on selling hosted or managed services on TGI.
+
+As of 2023, Hugging Face hosts almost half a million models under different licenses, including Apache 2.0, MIT, and OpenRAIL. The OpenRAIL license is an upgrade from RAIL and has behavioral restrictions. It includes licenses for source code, applications, models, and data: OpenRAIL-S, OpenRAIL-A, OpenRAIL-M, and OpenRAIL-D.
+
+China's domestic standards and research institutions are actively promoting innovative AI licensing practices. In May 2023, the China Academy of Information and Communications Technology (CAICT) jointly compiled and released the "Zhiyuan Open AI Model License Version 1." This license regulates the use of models (including their derivatives and supporting materials) but does not apply to the training data of the models. In August 2023, the Shanghai Jiao Tong University Intelligent Court Research Institute, along with the Artificial Intelligence Research Institute and Shanghai Magnolia Open Source and Open Research Institute, organized a workshop on designing the framework of the Mulan-Magnolia Open Data License 2.0. The license's function is to provide an open license for AI data. In December 2023, the OpenAtom Open Source Foundation and the Magnolia Open Source Community, the OpenI Qizhi Community, and other communities jointly developed the Mulan-Qizhi Model License (Beta). The license applies to models obtained through algorithmic training and supplementary materials, including model structure, parameters, weights, etc. However, it excludes the training models' algorithms and algorithmic source code.
+> **Commentary**:
+> Wei Jianfan: I believe that these disputes will soon cease to exist, and as long as the law is clear, all similar problems will be solved.
+
+#### 3.2.5 The development of China's open-source field standards is gaining momentum
+China supports open-source standards and will develop standards for open-source terminology, licenses, interoperability, project maturity, community operation, governance, and supply chain management for open-source software.
+
+A new national standard for evaluating the open source code security of software products was drafted in April 2023 by the National Information Security Standardization Technical Committee, led by the China Academy of Information and Communications Technology. It is now open for public comment.
+
+In July 2023, the China Electronics Industry Standardization Technology Association (CESA) approved three group standards related to open-source technology. These standards provide guidelines for open-source governance and project evaluation. They are T/CESA 1269-2023 Information Technology Open Source Terminology and Overview, T/CESA 1270.1-2023 Information Technology Open Source Governance Part 1: Overall Framework, and T/CESA 1270.4-2023 Information Technology Open Source Governance Part 4: Project Evaluation Model. The Chinese Research Institute of Electronic Technology (CRIET) released these standards.
+
+In September 2023, the Chinese Academy for Electronic Technology Standardization formally released four more open-source standards, which the China Electronics Industry Standardization Technology Association had examined and approved. The approved standards are T/CESA 1270.2-2023 Information Technology Open Source Governance Part 2: Enterprise Governance Assessment Model, T/CESA 1270.3-2023 Information Technology Open Source Governance Part 3: Community Governance Framework, T/CESA 1270.5-2023 Information Technology Open Source Governance Part 5: Open Source Contributor Assessment Model, and T/CESA 1291-2023 Information Technology Open Source Metadata General Requirements.
+
+In October 2023, two open-source software group standards were approved and released: "Open Source Software Governance Evaluation Methods Part 3: Maturity Models" and "Open Source Software Governance Evaluation Methods Part 5: Governance Tools and Platforms". The China Academy of Information and Communications Technology (CAICT) led the development of these standards, and they were reviewed and supported by the China Communications Standards Association (CCSA).
+
+## 4. Open Source Security Chronicle
+In today's digital age, software has become an essential element that supports the normal functioning of our society. However, as the software supply chain becomes more complex, so do its security issues. The Log4Shell vulnerability brought open-source security into the spotlight by revealing the security risks present in the open-source ecosystem. Despite 2022 being touted as the "Year of Supply Chain Security," the vulnerability is still widespread, and the rate of adoption of fixes is low. Meanwhile, the frequency of software supply chain attacks has skyrocketed, averaging 742% annual growth since 2019. The broad adoption of open-source code has turned supply chain security into an existential issue. Moreover, vulnerabilities in other projects that the ecosystem depends on heavily could have a more extensive reach and more severe consequences than Log4Shell. Therefore, we need to focus on improving the security of open-source software.
+
+### 4.1 Latest trends and challenges
+This section analyzes the latest trends and challenges in open-source security, including the following:
+- **Malware as a Service**: Hackers use open source code and tools to develop and distribute malware, creating a massive black market that threatens the security of the open source ecosystem.
+- **Human Errors**: Open-source projects are vulnerable to attacks due to human errors, such as ignoring security updates, using weak passwords, and leaking sensitive information by developers and maintainers working with open-source code.
+- **Supply Chain Attacks**: Hackers inject malicious code into open-source projects by manipulating repositories, dependency packages, or update channels, thereby compromising the reliability and trustworthiness of these projects.
+- **Legal Risks**: Open source projects may face legal risks in complying with license agreements, dealing with copyright disputes, responding to policy changes, etc., which must be identified and resolved promptly.
+- **Security Standards**: Open-source communities and organizations are developing and promoting some security standards and best practices, such as SLSA, OpenSSF, CII, etc., to improve the quality and security of open-source code.
+- **Security Tools**: Open source projects can utilize some open source or commercial security tools, such as Snyk, Dependabot, CodeQL, etc., to detect and fix security vulnerabilities and improve security protection.
+- **Security Education**: Open source projects need to strengthen security education and training, improve the security awareness and skills of developers and maintainers, establish a security culture and process, and prevent security risks.
+- **Security Cooperation**: Open source projects must strengthen security cooperation with other open source projects, organizations, enterprises, governments, etc., share security information and resources, form a security community, and jointly address security threats.
+- **Security Outlook**: The security landscape for open-source projects presents a mixed prospect. While the prevalence of increasingly intricate and severe security challenges is noteworthy, open-source projects are fortified by a sturdy and dynamic security force.
+
+### 4.2 Legal liability of open source security
+There is an ongoing debate concerning the legal liabilities of open-source software regarding security. The prevailing argument and accompanying legislation state that the authors of open-source software bear responsibility for any vulnerabilities detected in the code. Although the software is offered free of charge, its authors are expected to guarantee its quality and security. Vulnerabilities can cause significant harm, such as the compromise of user data and attacks on systems, making it imperative for authors to fix identified weaknesses and inform users promptly. As such, the current trend in global legislation is to hold open-source software authors legally accountable for cybersecurity.
+
+- In **China**, providers of network products and services are forbidden from developing malicious programs. They must take immediate corrective action, promptly notify users according to regulations, and report to the relevant authorities if security flaws, loopholes, or risks are identified in their network products and services. Furthermore, network product and service providers must maintain ongoing security for their products and services. They are not permitted to terminate the provision of security maintenance within the period specified or agreed upon by the parties. If a network product or service can collect user information, its provider must obtain explicit consent and communicate this to users. Moreover, if personal information is involved, the provider must comply with the relevant laws and administrative regulations on personal information protection.
+- **The EU Cyber Resilience Act (CRA)** aims to strengthen the cybersecurity of digital products in the EU by consolidating the existing cybersecurity regulatory framework. The Act imposes many cybersecurity requirements on digital products, including software. The Act is closely linked to the Directive on measures for a high common level of cybersecurity across the Union (NIS 2 Directive), the Cybersecurity Act, the Artificial Intelligence Act, and the General Data Protection Regulation (GDPR). It could become one of the most critical EU cybersecurity laws.
+
+### 4.3 Some important open source security incidents in 2023
+
+#### 4.3.1 Log4j vulnerability resurrection
+Log4j is a widely used Java logging library. In December 2021, a severe vulnerability (CVE-2021-44228, known as Log4Shell) was disclosed that let attackers take control of computers running Log4j. Alibaba and Amazon were affected. The Log4j team quickly released a fix in Log4j 2.15.0.
+
+However, Log4j 2.15.0 was found to contain a further vulnerability, CVE-2021-45046. Attackers could still exploit Java's JNDI lookup feature by sending specially crafted log messages. Log4j 2.16.0 addresses this by disabling JNDI by default and removing message lookups. Users should upgrade promptly, turn off unused logging features, and use firewalls and intrusion detection to block malicious traffic.
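+
+As a rough illustration of the upgrade advice above, the following Python sketch flags Log4j 2.x version strings that predate 2.16.0. The names `FIRST_SAFE_RELEASE`, `parse_version`, and `needs_upgrade` are hypothetical, and the check is deliberately simplified: it assumes 2.16.0 is the first safe release, as described in this section, and ignores any backport releases or later advisories.
+
+```python
+import re
+
+# Assumption: 2.16.0 is treated as the first safe release, per the text above.
+FIRST_SAFE_RELEASE = (2, 16, 0)
+
+
+def parse_version(version: str) -> tuple[int, int, int]:
+    """Parse 'major.minor[.patch]', ignoring suffixes such as '-beta9'."""
+    match = re.match(r"^(\d+)\.(\d+)(?:\.(\d+))?", version)
+    if match is None:
+        raise ValueError(f"unrecognized version string: {version!r}")
+    major, minor, patch = match.groups()
+    return int(major), int(minor), int(patch or 0)
+
+
+def needs_upgrade(version: str) -> bool:
+    """Return True for Log4j 2.x releases older than FIRST_SAFE_RELEASE."""
+    parsed = parse_version(version)
+    return parsed[0] == 2 and parsed < FIRST_SAFE_RELEASE
+```
+
+A dependency scanner could call `needs_upgrade("2.15.0")` for each Log4j version it finds; Log4j 1.x is out of scope for this sketch and is reported as not needing this particular upgrade.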
+
+#### 4.3.2 Linux malware growth rate soars to 50%
+Linux malware surged 50% to 1.9 million threats in 2022, with Trojans, botnets, ransomware and mining software used to steal data, control devices, and extort money. Infections spread through web services, email, web pages, and mobile devices exploiting vulnerabilities, weak passwords and social engineering. To protect against Linux malware, regularly update systems and software, use strong passwords and two-factor authentication, install reliable anti-virus software, and avoid opening suspicious links and attachments.
+
+#### 4.3.3 New threats to the npm supply chain: "manifest confusion"
+Manifest Confusion is a security problem that affects the npm registry. Attackers exploit it to hide harmful code or dependencies by publishing manifest information that does not match the contents of the package tarball. This security issue can affect millions of npm users and projects, potentially leading to the theft of sensitive information, execution of remote commands, the spread of malware, and more. Developers and maintainers can guard against it by using npm shrinkwrap or package-lock.json to lock down dependency versions, running npm audit, avoiding packages from untrusted sources or mirrors, and checking that the manifest information matches the contents of the tarball before releasing a package.
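+
+The last check above, comparing the manifest with the tarball contents, can be sketched in Python as follows. This is a minimal illustration and not npm's own tooling: the function names and the `FIELDS_TO_COMPARE` list are hypothetical, and the code assumes the tarball is gzip-compressed and stores its manifest at `package/package.json`.
+
+```python
+import io
+import json
+import tarfile
+
+# Illustrative choice of fields to compare; not an official or complete list.
+FIELDS_TO_COMPARE = ("name", "version", "dependencies", "scripts")
+
+
+def read_tarball_manifest(tarball_bytes: bytes) -> dict:
+    """Read package/package.json from a gzip-compressed package tarball."""
+    with tarfile.open(fileobj=io.BytesIO(tarball_bytes), mode="r:gz") as tar:
+        member = tar.extractfile("package/package.json")
+        if member is None:
+            raise ValueError("package/package.json is not a regular file")
+        return json.load(member)
+
+
+def find_manifest_mismatches(registry_manifest: dict, tarball_bytes: bytes) -> dict:
+    """Map each differing field to its (registry value, tarball value) pair."""
+    tarball_manifest = read_tarball_manifest(tarball_bytes)
+    mismatches = {}
+    for field in FIELDS_TO_COMPARE:
+        registry_value = registry_manifest.get(field)
+        tarball_value = tarball_manifest.get(field)
+        if registry_value != tarball_value:
+            mismatches[field] = (registry_value, tarball_value)
+    return mismatches
+```
+
+An empty result means the compared fields agree; any entry, such as an install script present only in the tarball, is a reason to inspect the package before trusting it.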
+
+#### 4.3.4 Electron's shocking Level 10 vulnerability!
+Electron is a framework for cross-platform desktop apps. It has a significant vulnerability that lets hackers use a bad link to run harmful code. Apple and Google warned about this, but many apps have not been updated. The vulnerability is caused by an old version of Chromium. To fix it, Electron needs to use a newer version of Chromium. The Electron team has already released a new version that fixes the problem. It's important for developers to keep their software up-to-date and secure. Finally, developers should always check their code for vulnerabilities.
+
+#### 4.3.5 Google awards $12 million for solving 2,900 vulnerabilities
+Google's Vulnerability Reward Program (VRP) paid $12M in bonuses to security researchers from 68 countries who found 2,900 vulnerabilities in 2022. The highest reward for a single vulnerability was $605,000. The VRP now covers Google Nest and Fitbit.
+
+#### 4.3.6 GitHub adds SBOM export feature to make it easier to comply with security requirements
+GitHub's new feature helps developers quickly create and export software bills of materials (SBOMs) to enhance security and transparency. An SBOM documents the software components and dependencies used in a codebase. The resulting SBOM can be accessed from GitHub's Security Tab and exported as SPDX or CycloneDX-format files.
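+
+To make the idea of an SBOM more concrete, the following Python sketch lists the components recorded in an SPDX JSON document. The function name `list_components` is hypothetical; the `packages`, `name`, and `versionInfo` keys follow the SPDX 2.x JSON format, and the placeholder strings for missing values are an assumption of this sketch.
+
+```python
+import json
+
+
+def list_components(spdx_json: str) -> list[tuple[str, str]]:
+    """Return sorted (name, version) pairs from an SPDX JSON document."""
+    document = json.loads(spdx_json)
+    components = []
+    for package in document.get("packages", []):
+        # Fall back to placeholders when a field is absent from the entry.
+        name = package.get("name", "<unnamed>")
+        version = package.get("versionInfo", "<unknown>")
+        components.append((name, version))
+    return sorted(components)
+```
+
+The resulting list can be compared against vulnerability advisories or license policies, which is the kind of compliance check an exported SBOM is meant to support.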
+
+#### 4.3.7 OpenAI, Google, Microsoft and others create $10 million AI security fund
+Tech companies and research organizations, including OpenAI, Google, and Microsoft, have created a $10 million fund for AI security and ethics research. The goal is to promote responsible and trustworthy AI development, prevent risks, and encourage more participation. The Fund will be managed by an independent committee, which will select the most outstanding projects for funding.
+
+In brief, open-source software necessitates enhanced security risk governance mechanisms, including quality standards, security audits, vulnerability rewards, and shared responsibility. Similarly, open-source software necessitates more significant investment and support, including financial resources, workforce, and community engagement. The future of open-source software development relies on our response to the current situation and our ability to establish a more sustainable and secure open-source ecosystem.
+
+## 5. Open Source Commercialization Chronicle
+
+### 5.1 Early stage financing activities
+
+- **DBeaver, an Open Source Database Management Tool, Secures $6 Million in Angel Round Funding**
+
+ Open-sourced in 2013, DBeaver is a free and open-source general-purpose database management and development tool based on Java and running on a variety of operating systems. Its founders formed a commercialization company in 2017 to provide enterprise-level support and develop an enterprise version. DBeaver currently has 8 million users and more than 5,000 paying customers, including IBM, Samsung, and Moody's.
+
+- **Open Source Large Model Company Together Raises $20 Million in Funding**
+
+ Together, an open-source large model startup that hopes to "lead the Linux moment in AI by providing an open ecosystem across computational and best-in-class fundamental models," has secured a $20 million seed round of funding. Together is building a cloud-based platform for running, training, and fine-tuning open-source models. One of Together's first projects, RedPajama, aims to foster open-source generative models. Together has already open-sourced a 1.2 trillion token training dataset that permits commercial use.
+
+- **Union AI, an open source AI and data stream orchestration platform, secures $19.1 million in Series A funding**
+
+ Union AI provides Flyte hosting services for orchestrating ETL and machine learning workflows. It has also built Pandera (a data testing framework) and Union ML (a framework that sits on top of Flyte to help teams build and deploy models using their existing toolsets). This year it launched Union Cloud and received $19.1 million in Series A funding from NEA.
+
+- **MindsDB, an Open Source DB for AI Company, Secures $25 Million Seed Funding Round**
+
+ MindsDB is a platform that operates in the "DB for AI" scenario, connecting data and models using an AI-Table approach. This approach turns machine learning models into virtual tables in the database, enabling users to model directly in the database. It eliminates tedious steps such as data processing and building machine learning models and accelerates AI applications. In 2023, MindsDB received consecutive funding rounds totaling nearly $50 million.
+
+- **Star Open Source LLM Company Mistral AI Raises Multiple Rounds of Funding, Ranks Among Unicorns**
+
+ Mistral AI, founded by scientists from Meta and Google, recently released the open-source MoE large model Mixtral 8x7B, which has attracted enormous attention. Mistral AI has also completed multiple rounds of funding in the last year, securing $415 million in its most recent Series A round, and is currently valued at over $2 billion.
+
+- **Model Continuous Testing Validation Tool Deepchecks Raises $14M Angel Round**
+
+ Israeli company Deepchecks is positioned in the ML continuous test validation space, allowing customers to reuse and customize components to test ML models and datasets comprehensively. Deepchecks launched an open-source version of its ML testing tool in 2020 and, earlier this year, launched a commercial version, the Deepchecks Hub.
+
+ To date, the open source product Deepchecks has been downloaded more than 500,000 times, and its users include AWS, Booking.com, and Wix, among others. Deepchecks recently announced a $14 million angel round of funding.
+
+- **Endor Labs, an Open Source Component Supply Chain Security Platform, Raises $70 Million in Series A Funding**
+
+ Endor is positioned to help organizations monitor the security posture of their development pipeline, including both reachable and exploitable risks, manage developer access to code, and keep a close eye on the secrets hard-coded into their code base. They recently secured $70 million in Series A funding led by Lightspeed Venture Partners.
+
+- **AutoGPT Closed $12 Million Financing Round**
+
+ AutoGPT uses language models such as GPT-4 and GPT-3.5 to build multifunctional intelligence that can perform tasks independently and continuously improve performance. The project has been online for over 50 days and has 131k stars and 26.7k forks, making it one of the fastest-growing projects in GitHub's history.
+
+### 5.2 Mid to late stage financing activities
+
+
+- **UK-based MLOps company Seldon secures $20 million in Series B funding**
+
+ Seldon was established in 2014 to address the deployment, monitoring, management, and interpretation of AI models in production. Since its Series A round in 2020, installations of Seldon's open-source products have grown 400% year over year.
+
+- **Temporal Secured $75 Million in Funding**
+
+ Temporal, a startup based on Cadence, Uber's open-source distributed task orchestration and scheduling engine, has secured $75 million in a new round of funding, giving it a pre-money valuation of $1.4 billion.
+
+- **SAST/SCA Open Source Security Vendor Semgrep Raises Series C Funding**
+
+ Semgrep entered the SAST (Static Application Security Testing) space with its own engine, which users can integrate with their CI/CD processes and code hosting platforms such as GitHub and GitLab to inspect code using Semgrep's built-in and customized rules. Semgrep open-sourced its product in 2020 and already has over 2 million users, with revenue in 2022 growing 7.5x compared to 2021.
+
+- **French AI Research Lab Kyutai Receives $330 Million Investment, Aims to Open Source All Results**
+
+ French billionaire and CEO of Iliad, Xavier Niel, has started an AI research lab in Paris called Kyutai. It's a privately funded non-profit organization focusing on research in artificial general intelligence. The lab has raised nearly €300 million in funding so far. Kyutai focuses on basic AI modeling, supported by top-tier computing resources in the form of Nvidia H100 GPUs from Scaleway.
+
+- **Open Source Platform Replicate Secures $40 Million in Series B Funding**
+
+ Replicate, an open-source machine learning modeling platform, has announced the successful completion of a $40 million Series B funding round led by Andreessen Horowitz (a16z) to continue enhancing the platform.
+
+### 5.3 Mergers and acquisitions (M&A)
+
+- **AMD Acquires Open Source AI Software Nod.ai**
+
+ AMD announced on its website the signing of a definitive agreement to acquire Nod.ai, which will accelerate the deployment of optimized artificial intelligence solutions on AMD's high-performance platforms and enhance AMD's open-source software strategy.
+
+- **Snowflake Intends to Acquire Ponder to Enhance Its Data Cloud Python Capabilities**
+
+ Ponder is a leading company that connects popular data science libraries to where the data is and maintains the widely used open-source library Modin for scalable Pandas operations. Snowflake has announced its intent to acquire Ponder to serve Python data practitioners better.
+
+- **Cisco announces plans to acquire cloud-native cybersecurity startup Isovalent**
+
+ Isovalent is committed to developing two critical open-source technologies, eBPF and Cilium, that provide deep insight into operating systems and cloud-native applications. Isovalent is essential to the Cloud Native Computing Foundation (CNCF) and the eBPF Foundation. Continued community support is vital to keep these open-source projects active.
+
+## 6. Open Source Education Chronicle
+
+In the China Open Source Annual Report, a new element, "Open Source Education," has been added to its list of milestones for the year. The definition of open-source education may vary among distinct organizations. This chapter aims to provide unambiguous clarity by defining open-source education as the utilization of open-source software and open educational resources to support educational goals. **This encompasses the utilization of open-source software tools, teaching materials, and instructional resources, while promoting knowledge sharing and collaboration. One of the primary objectives of open-source education is to offer more inclusive and equitable educational opportunities, thereby enabling more individuals to access high-quality educational resources.**
+
+In the open-source education model, educational resources like teaching plans, course content, and software tools are openly available to everyone. This means that anyone can use, modify, and share them. This model is highly beneficial in fostering students' innovative thinking, collaborative skills, and problem-solving abilities. By participating in open-source projects, students can gain exposure to the latest technologies and tools in the industry. They can also get a better understanding of the actual process of software development and contribute their own strengths to the open-source community.
+
+The Open Source Society, as the drafter of this report, is no stranger to open-source education. Since its establishment in 2014, the Open Source Society has actively explored the integration of open source and education. Before formally introducing the milestones of open-source education in 2023, let's review the work done by the Open Source Society in this field:
+
+- In 2014, the Open Source Society initiated the first series of open source on-campus activities in China—"Open Sourcers on Tour";
+- In 2017, the Open Source Society's executive committee established working groups dedicated to open-source education, such as the Open Education Group and University Collaboration Group;
+- In 2018, the Open Source Society held the third China Open Source Conference (COSCon'18), which produced China's first "Open Source Education Track";
+- In 2019, the Open Source Society, in partnership with East China Normal University, established China's first "Open Source Education Fund";
+- In 2020, the Open Source Society produced the "Open Source Bootcamp" series, aiming to provide introductory training for open source education;
+- In 2021, the Open Source Society invited six guests to share their insights on open-source education at the sixth China Open Source Conference (COSCon'21), and for the first time, invited open-source students from universities to discuss open-source education topics;
+- In 2022, the Open Source Society actively began to explore directions related to open source education and training, such as specialized corporate open source training;
+- In 2023, the Open Source Society, for the first time, set up a "Youth Open Source Education" track at the eighth China Open Source Conference (COSCon'23), inviting young students from primary and secondary schools to share their views on open source.
+
+
+The integration of open-source principles with education has significantly deepened over the years. The Open Source Society's work on "Open Source Education" has expanded the audience for open-source education from open-source communities to higher education institutions, secondary and primary schools, and even a broader demographic of employed individuals.
+
+Despite this progress, there is still a global need for more skilled open-source professionals. The Linux Foundation's "10th Annual Open Source Jobs Report" reveals that 93% of employers struggle to find professionals with adequate open-source skills, and the situation is not improving. Nearly half of these employers (46%) plan to increase their recruitment of open-source talent in the next six months. Additionally, 73% of open-source professionals report ease in finding new employment opportunities to continue their open-source endeavors.
+
+This talent scarcity has elevated the importance of open-source education worldwide. China is actively fostering its open-source education landscape by encouraging participation in open-source community activities, soliciting contributions to open-source projects, establishing robust open-source education systems, and setting standards for evaluating open-source competencies. These efforts aim to stimulate a thriving open-source ecosystem and nurture talent. By doing so, students and professionals can gain a more profound understanding of the ethos behind open-source software, facilitating the integration of theory with practice, enhancing educational quality, and meeting the societal demand for innovative individuals.
+
+Lastly, let's review the significant milestones in China's open-source education journey during 2023.
+
+### 6.1 Open-source education has been thriving with interactive practices, project-based learning, and innovation competitions
+
+In 2023, China saw a significant increase in open-source educational activities. Some of the significant practical activities that took place include:
+
+- **Open Source Promotion Plan (OSPP)**: Guided by the Institute of Software, Chinese Academy of Sciences, this summer program aimed to encourage students to participate in open-source software development. In 2023, 3,475 students from 592 universities registered, and 504 students were successfully selected.
+
+- **GitLink Open Source Programming Summer Camp (GLCC)**: Hosted by the China Computer Federation, this event saw 341 students from 139 universities participate in 2023.
+
+- **The Sixth China Software Open Source Innovation Competition**: Guided by the Department of Information Science, National Natural Science Foundation of China, and hosted by CCF, this competition focuses on "bottleneck" software fields and cutting-edge technologies, with multiple tracks.
+
+- **The Twelfth "Kylin Cup" National Open Source Application Software Development Competition**: Guided by the China Software Industry Association, OpenAtom Foundation, the Open Source Development Committee of the China Computer Federation, and the China Open Source Software Promotion Alliance, this competition attracted 345 teams from over 60 universities.
+
+- **2023 OpenAtom Open Source Competition**: Hosted by the Ministry of Industry and Information Technology, the People's Government of Jiangsu Province, and the People's Government of Hunan Province, this competition aimed to unite open-source organizations, enterprises, institutions, universities, research institutes, industry organizations, and investment and financing institutions.
+
+- **The First China Postgraduate Operating System Open Source Innovation Competition**: Hosted by the China Postgraduate Innovation Practice Series Competitions, this event focused on open-source innovation in operating systems.
+
+Additionally, the 2023 Open Source and Information Consumption Competition – The Fourth Industrial APP and Information Consumption Competition, hosted by the Ministry of Industry and Information Technology and other organizations, helps promote open-source education to the professional workforce.
+
+These activities have enhanced students’ technical abilities and promoted the spread of open-source culture and the vitality of open-source communities, making significant contributions to the development of China’s open-source ecosystem.
+
+### 6.2 Domestic open-source software & hardware education theory foundation is forming
+
+In 2023, China's open-source education sector made significant progress in practice and saw a growing theoretical foundation. Teachers from higher education institutions and open-source experts began to pay more attention to the research on open-source education theory, publishing representative articles at different teaching levels and in various directions. These studies provided cases and theoretical analyses for open-source education, demonstrating its application potential in higher education and K12 education.
+
+**In higher education**: Open-source education is regarded as an innovative teaching model that helps students acquire software and hardware development skills. For example, teachers from Peking University, East China Normal University, and Shanghai University of International Business and Economics have researched the application and value of open-source education in their respective disciplinary teachings.
+
+**In K12 education**: Open-source education is often integrated with STEM, STEAM, robotics/uncrewed aerial vehicle (UAV) education, and maker education, mainly by including open-source hardware in teaching. For instance, teachers from Meihua Middle School in Zhuhai and Langya Road Primary School in Nanjing have explored the application of open-source hardware in project-based teaching.
+
+Additionally, the Shanghai Education Commission's Educational Technology and Equipment Center hosted a symposium on the development of educational UAV and open-source hardware course resources, showcasing the application of open-source hardware in primary and secondary education. The open-source robotics sports event at the 11th Primary and Secondary STEAM Education Conference also demonstrated cases and new trends of open-source education in science and technology education in schools.
+
+These activities and studies indicate that the future promotion of open-source education will differ between higher education and K12 education but will tend toward developing open-source general education and open-source software and hardware development education. Open-source education not only helps to enhance students' technical abilities but also promotes innovative thinking and teamwork spirit, contributing to the diversified development of China's education system.
+
+Articles exploring the integration of open-source education and higher education include:
+- "Exploration of a Dual-Track Open-Source Teaching Model under Industry-Education Integration: A Case Study of Peking University's 'Open-Source Software Development Fundamentals and Practice' Course" by Jing Qi and Hui Feng.
+- "Insights into the Future of Open-Source Education from the Digitalization of Open-Source Technology" by Wei Wang and Shengyu Zhao.
+- "The Value and Significance of Introducing Open-Source Education into Higher Education Institutions" by Guofeng Zhang.
+- "Research on Software Engineering Education Integrating Open-Source Software Ideas and Examples" by Huang Haowei.
+- "Construction of a New Medical Mathematics Curriculum Group Based on Blockchain from the Perspective of Open-Source Education in Colleges" by Xiaona Wang, Dan Ding, and Ge Ban.
+- "Cultivation of Innovative Software Talent in the Open-Source Ecosystem" by Tao Zhuo, Kai Wang, and Wei Ge.
+
+Articles exploring the integration of open-source education and K12 education include:
+- "Exploration of Open-Source Hardware Project-Based Teaching Practice under the STEM Education Concept: A Case Study of 'Creative Illuminated Clothing'" by Suo Fang.
+- "Promoting the Education of Young Masters Based on an Open-Source Architecture Project Research Community" by Yi Dong Qi.
+- "Research on Open-Source Hardware Chips for Information Technology Education" by Lizhi Xin, Zhang Xiangling, Yao Ziming.
+- "Injecting New Vitality into Education with Maker and Open-Source Hardware" by Jun Xi.
+
+
+### 6.3 Open-source education forums are gaining momentum, with the open-source and education community continuing to grow
+
+In 2023, the development of open-source education in China showed a clear upward trend, as evidenced by the increased number, frequency, and quality of conferences dedicated to discussing open-source education. These conferences highlighted the influence of open-source education and fostered deep communication and collaboration between the educational and open-source communities.
+
+Some notable conferences and forums include:
+- **2023 GAIDC Global Developer Vanguard Conference**: This international developer conference featured an open-source technology forum showcasing the global application and development of open-source initiatives.
+- **The 2nd China Open Source Education Symposium (SOSEC-2) and The 3rd China Open Source Education Symposium (SOSEC-3)**: Held in Guangzhou and Shanghai, respectively, these symposia focused on the current state and future trends of open-source education in China.
+- **National College New Business Open Source Innovation Education Symposium**: Held in Shanghai, this event explored open-source applications in education, particularly its integration with business education.
+- **The 4th China Computer Education Conference**: The first Computer Open Source Education Forum was part of this conference, emphasizing the importance of open-source in computer education.
+- **2023 Zhongguancun Forum – World Open Source Innovation Development Forum**: Themed around "Open Science and Open Source Education," this forum discussed the role of open-source education in scientific research.
+- **GOTC 2023**: Hosted the Linux Foundation's Open Source Education and Talent Development Summit, highlighting the crucial role of open-source technology in talent cultivation.
+- **2023 OpenAtom Global Open Source Summit**: The successful convening of the Open Source Education and Talent Track further promoted the discussion and practice of open-source education globally.
+- **COSCon'23 The 8th China Open Source Conference**: Featured a "Youth Open Source Education" track, inviting young OpenTeen primary and secondary school students to share their experiences with open-source practices.
+
+Hosting these events and forums increased the influence of open-source education in academia and industry. It provided a platform for educators, students, and open-source community members to exchange ideas, promoting the sharing of open-source educational resources and the dissemination of best practices. As open-source education forums rise in prominence, integrating open-source and education is becoming a new trend in educational innovation and talent development.
+
+
+### 6.4 The cultivation and certification of open-source talent is gradually becoming a standardized system.
+
+In 2023, China's open-source education sector reached a significant milestone by initiating the "Open Source Talent Competency Requirements and Evaluation Standards." This standard is being developed under the leadership of the Ministry of Industry and Information Technology's Talent Exchange Center in partnership with the OpenAtom Foundation. The development meeting was attended by 36 experts from various universities and companies, including Beijing University of Aeronautics and Astronautics, Beijing Institute of Technology, East China Normal University, Huawei, Baidu, Tencent, and Xiaomi, signaling the formal inclusion of open-source talent education into the national talent cultivation strategy. Establishing this standard is crucial for constructing China's open-source talent development ecosystem, as it will help promote the high-quality development of open-source software and technology by establishing a set of scientific, industry-recognized talent competency standards through research, analysis, and refinement.
+
+Moreover, the training of open-source educators has become an essential area of exploration. For instance, the Changsha Software and Information Technology Service Industry Promotion Association hosted the 2023 Hunan Province Higher Education OpenHarmony Faculty Training, aimed at deepening college teachers' application and understanding of OpenHarmony, enhancing their ability to teach and develop based on OpenHarmony and building a robust educational information and creation ecosystem.
+
+These initiatives and developments indicate that China is actively establishing a standardized system for cultivating and certifying open-source talent. This will help enhance the professional capabilities of open-source talent and promote the widespread application and innovative development of open-source technology in the education sector. As the open-source education system continues to improve, more high-quality open-source talent is expected to emerge, starting from China, contributing to the global open-source community.
+
+
+### 6.5 Enterprises are increasingly involved in open-source education, giving rise to a new model of industry-university-research cooperation.
+
+In 2023, Chinese enterprises significantly increased their involvement in open-source education, forging more open and in-depth partnerships with universities. These collaborations typically involve integrating real-world open-source projects into the academic setting, enabling students to engage in meaningful, high-caliber open-source initiatives rather than mere operational tasks. Here are examples of such corporate-academic collaborations:
+
+- **Answer Project**: This project, chosen as the capstone project for Peking University's Guanghua MBA program, allows students to participate in live open-source projects.
+- **CloudWeGo Project**: Integrated into Peking University's graduate curriculum, it allows students to work on enterprise-backed open-source projects; it also collaborates with Nanjing University and Zhejiang University to foster campus partnerships and open-source talent development.
+- **openKylin**: Established an academic station at Tianjin University of Science and Technology, focusing on cultivating open-source talent.
+- **PingCAP**: Has partnered for three years with the China Computer Federation (CCF) China Database Summer School, providing complete engineering practice experiments; signed a joint doctoral training agreement with East China Normal University to foster high-level talent in critical software.
+- **OceanBase**: Collaborates with East China Normal University to tackle technical challenges and lead in distributed database research innovation and open-source talent cultivation.
+- **StoneDB**: Completed the first intern training, attracting students from multiple renowned universities to focus on cultivating open-source database talent.
+- **Tencent**: Supports open-source talent development through the "OpenAtom Campus Source Tour" project and launched the RhinoBird Open Source Talent Plan for 2023, assisting in cultivating open-source talent at universities.
+- **Shenkaihong**: Co-hosted an open-source Hongmeng talent training workshop with the Beijing Institute of Technology and established the "Open-source Hongmeng Talent Class" with multiple schools.
+- **Tuowei Information**: Its subsidiary KAIHONGZhiGu was involved in the Yali Lu Gu Middle School project, which was selected as a "2023 Smart Education Excellence Case."
+- **CSDC**: Collaborated with Beijing Institute of Technology and Shenkaihong to establish the first "Open-source Hongmeng Talent Class" in the Information Technology Innovation College.
+- **Shenkaihong**: Collaborated with Southeast University to cultivate university open-source talent and promote the development of the OpenHarmony talent ecosystem.
+- **Honghu Wanlian**: Established a national industry-education integration community for OpenHarmony (Open-source Hongmeng) intelligent terminals and IoT, in collaboration with multiple schools and companies.
+
+These partnerships offer students hands-on experience with real-world open-source projects and facilitate the exchange of knowledge and technology between enterprises and academia. Through these collaborations, enterprises gain insights into students' abilities and needs, while university students can collaborate directly with industry experts, which is invaluable for honing their technical skills and professional acumen. Furthermore, these partnerships help advance and popularize open-source technology and generate innovative contributions to the open-source community.
+
+
+### 6.6 University open-source education programs are becoming more robust, and universities are enthusiastic about participating in open-source projects.
+
+In 2023, Chinese universities made significant strides in open-source education, with many institutions advancing the cause by introducing specialized courses, establishing alliances, and collaborating with enterprises. Tsinghua University, Beijing University of Aeronautics and Astronautics, Zhejiang University, Shanghai Jiao Tong University, East China Normal University, and nearly a hundred other universities nationwide have announced plans to roll out open-source software courses over the next three years. These courses will cover foundational subjects such as open-source professional technologies and digital public goods, helping students understand the architecture of open-source knowledge from the ground up and accelerating the cultivation of talent in crucial software domains. Here are some specific examples:
+
+- **Peking University**:
+ - Collaborated with DouGe and GitLink to create an online practical course, "OSS Development: Open-Source Software Technology," combining theory and practice to develop students' open-source software development skills.
+- **Tsinghua University**:
+ - Hosted a 2023 autumn and winter open-source operating system boot camp, where students honed their programming skills by writing code in Rust.
+- **East China Normal University**:
+ - Introduced the course "OSS101: Open-Source Software Literacy," which aims to cultivate students' open-source awareness and skills.
+ - Led the establishment of the CCF Information System Professional Committee's Open-Source Education Working Group and created an "Institution-Course-Competition-Certification" integrated open-source talent development system to drive the growth of open-source education.
+- **Southern University of Science and Technology**:
+ - Participated in establishing an open-source university alliance at the Qizhi Developer Conference, which is dedicated to fostering the Greater Bay Area's open-source ecosystem and talent development and has a national impact.
+- **Beijing Institute of Technology**:
+ - Collaborated with Shenkaihong to hold an open-source Hongmeng talent training and scientific research cooperation workshop, enhancing industry-academia collaboration and improving the quality of talent development.
+
+These initiatives and courses not only enrich the open-source education curriculum in universities but also boost student engagement in open-source projects. Through these practices, students gain a deeper understanding of the development process of open-source software, acquire relevant skills, and participate in the open-source community. These efforts are instrumental in nurturing high-quality open-source talent that meets the needs of modern digital economic development and promoting the popularization and application of open-source technology in China.
+
+
+### 6.7 Diverse parties are driving the "Open Source into Campus" initiative to garner student interest.
+
+In 2023, one of the most noticeable activities in open-source education was the "Open Source into Campus" campaign, organized by various organizations such as the OpenAtom Open Source Foundation, the Open Source Development Committee of the China Computer Federation (CCF), the Open Source Promotion Plan (OSPP) organizing committee, and Hongshan Open Source.
+- **OpenAtom Open Source Foundation**
+ - The OpenAtom Open Source Foundation and Tencent initiated the "OpenAtom Campus Source Tour" public welfare project. Together, they explore new paths of industry-education integration by establishing university open-source communities, popularizing open-source culture, and developing open-source curriculum systems.
+- **CCF Open Source Development Committee**
+ - The "Open Source University Tour" series initiated by the Open Source Development Committee of the China Computer Federation was successfully held at prestigious universities like Tsinghua, Peking, Beihang, and Fudan, leaving a significant impact and achieving successful practices.
+- **OSPP Organizing Committee**
+ - To enable more students to understand and participate in open-source projects deeply, the OSPP organizing committee partnered with many excellent open-source communities to launch the "OSPP Campus Tour." The OSPP Campus Tour series aims to ignite the energy and vitality of the new generation of developers, allowing more students to gain an in-depth understanding of well-known open-source technologies, projects, and communities both domestically and internationally and to popularize open-source culture in more universities.
+- **Hongshan Open Source**
+ - The Hongshan Open Source community launched the "Hongshan Open Source University Tour" for key universities and directions, enhancing the community's influence and popularity and attracting more outstanding innovative resources to construct the open-source creation ecosystem.
+
+Such activities are expected to become one of the main channels through which college students can access open-source education in the future.
+
+
+### 6.8 China's policies related to open-source education
+
+In 2023, while notable advancements were made in the practical application of open-source education in China, supportive policies at the national level were relatively scarce.
+
+Certain local governments, however, have started to recognize and foster the growth of open-source education. For instance, on December 29, 2022, the Changfeng Alliance Think Tank Base submitted "Suggestions for Strengthening Open-Source Talent Education in Beijing," which systematically addresses open-source talent education's current state and challenges. The proposal advocates for enhancing open-source talent training by the Beijing municipal government. As a leading hub for China's open-source ecosystem, Beijing's role in advancing open-source talent education is pivotal, contributing to the cultivation of software talent aligned with industrial demands, establishing a sustainable open-source ecosystem, and enhancing software technological innovation and supply capabilities.
+
+Furthermore, the "Guide for the Construction of Demonstration Software Colleges with Characteristics (Trial)" jointly issued by the Ministry of Education and the Ministry of Industry and Information Technology in 2020 has prompted universities to engage more deeply in open-source education. The guide underscores the importance of cultivating software talent with distinctiveness, exploring professional development patterns, and focusing on the specific needs of open-source talent in critical areas such as foundational software, industrial software, and emerging platforms. It also encourages cultivating vital open-source projects and gathering outstanding talent, providing robust support for industrial innovation.
+
+Although few policy-related announcements were made in 2023 (new policies may still be under formulation), existing policy documents have already had a positive impact on open-source education. Looking ahead to 2024, more national-level policies are anticipated, further guiding and promoting the practice of open-source education in China.
+
+## 7. Open source ranklists and reports summary
+
+Besides KAIYUANSHE, multiple media outlets, organizations, and institutions have published numerous open-source-related rank lists, reports, blue papers, and more. To provide readers with a comprehensive understanding of this topic, we have compiled a summary in this section.
+
+### 7.1 A few valuable reports
+
+- In February 2023, KAIYUANSHE released the **China Open Source Annual Report 2022**, which has four parts: questionnaire, data, commercialization, and chronicles. The questionnaire includes data analysis measures and reports on open-source community metrics and commercialization. X-lab Open Lab, Apache Devlake Community, and Gitee produce the data chapter. Yunqi Partners wrote the commercialization chapter and focused on promoting open-source software globally. The open source chronicles chapter comprises five parts: commercialization, security, technology, law, community, and ecology.
+- In April 2023, the InfoQ Research Center released the **China Open Source Ecology Atlas 2023**. It's a user-friendly directory and map of China's open-source projects. The map includes 931 Chinese open-source projects, covering seven segments: operating systems, databases, artificial intelligence, cloud-native, big data, front-end, and middleware. Additionally, the map includes ecological organizations such as labs/institutes, open-source foundations, open-source industry alliances, developer communities, and code-hosting platforms.
+- In June 2023, the China Open Source Promotion Union (COPU) collaborated with 106 organizations, including CSDN, the Institute of Software Research of the Chinese Academy of Sciences, the Open Atom Open Source Foundation, the Beijing Open Source Innovation Committee, the Open Source Society, OSChina, Peking University, East China Normal University, National University of Defense Technology, and more than 120 open source experts and volunteers. Together, they released the **2023 Blue Book of China's Open Source Development**, which provides a comprehensive overview of China's open source industry ecosystem in 2023. The book also showcases China's current open-source technology innovation and industrial development.
+- In December 2023, the **2023 China Open Source Developer Report**, co-authored by OSChina and Gitee, was officially released. The report is divided into three parts: Open Source Developer Event Review, 2023 LLM Technical Report, and Insight: New Open Source Trends for Chinese Developers.
+- In December 2023, the iResearch Consulting Group published the **2023 China Open Source Infrastructure Software Industry Research White Paper**. This report examines the growth trajectory of China's open-source software by comparing and analyzing the development experiences of the domestic and international open-source software industries. The whitepaper summarizes the open source software industry chain and main drivers, analyzes the business model and value of open source software, examines the main characteristics of open source projects and all the parties involved in the industry, and presents readers with an ecological landscape of the open source industry rooted in China.
+- The China Academy of Information and Communications Technology (CAICT) Trusted Open Source Team has been researching open source for an extended period. In 2023, they released a series of trusted open-source reports, which include "Panoramic Observations on China's Enterprise Open Source Governance", "Open Source Intellectual Property Rights Casebook (Copyright Chapter)", "Digital Public Goods Insight Report", "OSPO Case Compilation (Issue 2)", research reports or casebooks on open source technologies for front-ends, databases, and communications, and other research reports or case collections on open source technology for niche industries.
+
+### 7.2 A reference-worthy ranklist
+
+- **The 2023 Open Source Innovation List** is a selection activity co-sponsored by the Science and Technology Communication Center of the China Association for Science and Technology, the China Computer Federation, the China Institute of Communications, and the Institute of Software, Chinese Academy of Sciences. It is being undertaken by CSDN and evaluated by over 20 open-source experts from national associations, universities, research institutes, enterprises, open-source foundations, and industry alliances. The selection process is serious and rigorous.
+- A new initiative called **China Open Source Coding Hero** has been launched by SegmentFault, KAIYUANSHE, and X-lab Lab. Each year, 99 developers from China are ranked according to their contribution to open source development using the OpenRank algorithm. These developers are recognized for their valuable contributions to the open source community.
+- **OSS Compass**: Released in February 2023, OSS Compass is a platform for open-source ecological health assessment (https://oss-compass.org), open to all open-source projects on GitHub, Gitee, and other platforms. The platform is jointly initiated and collaboratively developed by the National Industrial Information Security Development and Research Center, Open Source China, Nanjing University, Huawei, Peking University, the New Generation of Artificial Intelligence Open Source Open Platform (OpenI), Baidu, and Tencent Open Source. At the same time, the platform itself is an open-source project around which an open-source and open-minded community has been formed. The platform has built an open-source ecological assessment system that includes three dimensions of productivity, robustness, and innovation, covering 14 indicator models.
+- **Alibaba Open Source Developer Contribution List**: a list that ranks open-source developers based on their contributions. This list uses the OpenRank algorithm. Two PhD students conducted research on the impact of this list. They analyzed statistical indicators of community projects and interviewed developers. The research provides valuable insights to the open-source community and has been included in ICSE 2024.
+
+### 7.3 Ranklist to watch
+
+- **China's Open Source Pioneers** is a list of 33 individuals from previous years whom Open Source Pioneers recommended. This list is co-organized by SegmentFault and KAIYUANSHE and is based entirely on preference. The selection process starts with a simple idea: "I want to introduce this friend, an open-source person, to you." The nominees are then voted on based on the principle: "I'd love to meet this friend, an open-source person, and I hope more people will meet this friend." This list is an excellent resource for anyone interested in open-source pioneers and is worth checking out to learn more about these individuals.
+- **OSC China Open Source Project Selection List**: OSChina conducted a series of selection activities in 2021 and 2022, which included evaluating the health of Chinese open source project communities, identifying the most popular Chinese open source projects, recognizing excellent international open source projects in the Chinese community, and more. However, for some reason, these selection activities were not continued in 2023.
+
+### 7.4 Worthless rank list
+
+- There is an organization called the "International Testing Committee BenchCouncil" that claims to have created a fair and scientific process for ranking open source contributors. They have published a list claiming to be "the world's first open source contribution list." However, the list ranks Linus, the creator of the Linux operating system kernel, only at 12th place, which seems absurd.
diff --git a/en/preface.md b/en/preface.md
new file mode 100644
index 0000000..7c16558
--- /dev/null
+++ b/en/preface.md
@@ -0,0 +1,37 @@
+---
+outline: deep
+---
+# Preface
+
+It took a lot of "perseverance" to resist the urge to have ChatGPT help me write this year's preface, but I wrote it entirely by hand. This is in fact one of the trends of 2023: in more and more jobs, people are turning to AI for help.
+
+
+### AI & AIGC
+
+In 2023, we witnessed the birth of countless open-source large models, along with numerous popular GPT-based applications. Additionally, new terms and projects like AutoGPT, LangChain, CoT, and RAG emerged in various fields such as image generation, speech generation, code generation, and more. These developments have led to significant advancements in AI technology and the open-source ecosystem.
+
+In the annual open source report, drastic changes were observed from 2020 to 2022. However, in 2023, we saw a massive wave in IT technology and open-source ecology that will shape the future of the industry.
+
+### About Omni-Data
+
+This year's annual report on open source in China has a significant story to tell. For the first time, we were able to combine GitHub and Gitee data to create a comprehensive comparison and gain new insights. Some of the findings may challenge the "prejudices" that many people hold about China's open source activity and contribution. We will continue to expand our data sources to make them truly "Global."
+
+### How does the open source community tackle those toughest challenges?
+
+In July 2023, the Linux Foundation organized the Open Source Congress in Geneva, Switzerland. The congress aimed to address pressing issues confronting the open source community, such as cybersecurity, the rise of techno-nationalism, the complexity of artificial intelligence, and the growing challenge of regulatory scrutiny. The congress invited 73 open source organizations, including KAIYUANSHE and the Open Atom Open Source Foundation, to send representatives to the meeting in Geneva.
+
+The first open source "congress" was an ambitious endeavor, and it is just the beginning. The future will require open-source practitioners worldwide to work together better to address the challenges.
+
+### How has the year gone for open source in China?
+
+Don't let statistics overshadow the bigger picture. My intuition tells me that while there is external heat, there is internal cooling. National policies, local policies, technical conferences, and community exchanges are vibrant and lively. However, the open-source community's development activity has slowed down in China and globally.
+
+We cannot afford to be complacent or discouraged. Instead, we must evaluate objectively and avoid being too late. Let's strive for progress, not perfection.
+
+As 2023 gives way to 2024, we must ask ourselves: what should we expect? Which directions should we pursue? How can we remain resilient in a rapidly changing world? Join me as we explore these questions together.
+
+
diff --git a/en/questionnaire.md b/en/questionnaire.md
new file mode 100644
index 0000000..3e43484
--- /dev/null
+++ b/en/questionnaire.md
@@ -0,0 +1,258 @@
+---
+outline: deep
+---
+# OSS Questionnaire
+
+## 1. Background
+
+Continuing a tradition that began with the release of the China Open Source Community Survey 2015 in early 2016, we launched another annual participatory survey of Chinese open source communities at the end of 2023, dedicated to presenting the overall state of open source development in China in a multi-dimensional manner through continued developer survey reports. Using tools such as data analysis and survey reports, we have produced a map of China's open-source world in 2023.
+
+The questionnaire addresses the multiple roles of the interviewees and aims to gain insight into community development trends at various levels. Based on their level of participation in the open source community, the respondents are divided into several roles: users, participants, contributors, maintainers, and ecosystem operators. Together these roles form an onion model with layered progression. The four role levels are defined as:
+
+- User: Users who have used one or more open-source products
+- Participant: Users who interact with the open source community (e.g. communication with open source communities, participation in activities of open source community organizations, etc.)
+- Contributor: Users who contribute substantially to the open source community (including code and non-code contributions)
+- Maintainer: Users primarily responsible for the daily operations of the open source community (including project maintainers, PCC members, etc.)
+
+In addition, ecosystem operators are the users who are primarily responsible for day-to-day operations in the open source communities, at a level above the participants; they are collectively referred to as operators. Beyond the basic questions posed to all interviewees, the questionnaire includes questions targeted at the different roles of users, contributors, and operators.
+
+The **basic information** for this questionnaire is as follows:
+
+- **Audiences**: Covers developers, community members, contributors, students, and government and enterprise management personnel
+- **Topics**: Mainly covers personal information, work status, open source communities, developer technologies, etc.
+- **Method**: Collects samples and data using online questionnaires for comparative analysis
+- **Channels**: KAIYUANSHE, KubeCon + CloudNativeCon + Open Source Summit China, COSCon'23 (the 8th China Open Source Conference), 2023 OpenAtom Developers Conference, 2023 Open Source Industry Ecology Conference
+- **Question Type**: Single-choice, multiple-choice, open-ended
+- **Number of Questions**: 43
+- **Sample Quantity**: 875
+
+## 2. Preview of questionnaire results
+
+**Characteristics of Respondents**
+
+- The interviewees' ages are evenly distributed, and most hold at least an undergraduate degree. Gender and regional distributions align with the overall distribution of developers in China, and the respondents cover various roles in the computer industry.
+
+**Open Source Participation**
+
+- **The activity of the open source community** is an area of particular concern for the interviewees; **Artificial Intelligence** has become a technical area of concern for the majority of the interviewees.
+
+**Open Source Contributions**
+
+- Interviewees contribute more to repositories of the **Technical Base Type**; respondents are more motivated by **Communities / Honorary Motivations** and require fewer material incentives.
+
+**Community Operations Survey**
+
+- Most of the operators interviewed are in the open source community. Nearly half of the respondents' respective companies prioritize **the standardization and management of open source software usage**.
+
+**Domestic Open Source Development Survey**
+
+- The respondents are **optimistic about the future development of open source in the country**. With regard to the evolution of artificial intelligence in the open source ecosystem, developers generally appreciate the prospects for its application in **increased efficiency, automated testing and data analysis** and consider **data security, transparency, ethics** to be the main challenges.
+
+## 3. Analysis of the Questionnaire
+
+### 3.1 Features of the Interviewee
+
+First, we conduct surveys from the point of view of age, gender, academic qualifications, resident city, industry and professional identity, through which basic information about participants can be obtained, thus analysing the identity of the audience groups in open source communities.
+
+#### 3.1.1 Age, Gender, Education, City
+
+| Age | Gender |
+|:----------------------------------------------------------------:|:----------------------------------------------------------------:|
+| | |
+
+
+The age distribution of respondents to this questionnaire is similar to that of previous years, falling mainly in the 21-50 age group and remaining fairly balanced. It is worth noting that the proportion of respondents under 21 years of age is 25.71%, a significant increase from 8.42% last year. The participation of young respondents in the survey has increased considerably.
+
+In terms of gender, male respondents account for a higher proportion, reaching 73.37%, while females account for 25.83%. Compared with last year's questionnaire, the proportion of women and men interviewed has increased significantly, and the result is consistent with the current gender imbalance among developers.
+
+| Educational background | Region |
+|:----------------------------------------------------------------:|:----------------------------------------------------------------:|
+| | |
+
+Respondents generally hold at least a bachelor's degree. By region, the majority of respondents are from Jiangsu, Sichuan and Shanghai, partly because our online channels for collecting the questionnaire are based in those regions. Beijing and Guangdong also have relatively many respondents, and the overall distribution is largely consistent with the distribution of developers in the data sets.
+
+#### 3.1.2 Industry, Occupation
+
+| Industry | Career status |
+| :----------------------------------------: | :----------------------------------------------------: |
+| | |
+
+The majority of the respondents work in the Internet / IT / electronics / communications industry, accounting for 72.23%, which indicates that the survey primarily covers the technology sector.
+
+In terms of professional status, 43.20% of respondents are students currently in school, followed by back-end developers, architects and academic researchers. Overall, the respondents are predominantly technical practitioners and students, and they cover a range of occupations in the computing industry.
+
+### 3.2 Open Source Participation
+
+#### 3.2.1 Level of Participation in Open Source Communities
+
+| Role in Open Source Communities | Time Since First Contact with Open Source |
+| :----------------------------------------: | :----------------------------------------------------: |
+| | |
+
+The survey shows that the vast majority of respondents are users in open source communities (73.37%), close to half are participants (49.03%), and some are contributors (26.51%).
+
+Regarding the duration of involvement in open source, one-third of respondents have been involved in open source communities for less than a year, while nearly half have more than 3 years of experience.
+
+We cross-analyzed the question "To what extent do you think you are a member of an open source community" with respondents' roles in open source communities.
+
+| Extent of Considering Oneself a Part of the Open Source Community |
+| :------------------------------------------------------: |
+| |
+
+The results show a greater sense of belonging among maintainers, contributors and ecosystem operators than among participants and users in the open source community.
+
+The following questions were addressed to respondents who had a role in the open source community at the “user” level and above.
+
+#### 3.2.2 Use of Open Source Products
+
+| Reason for Selecting Open Source Products | Factors Influencing Choice |
+| :----------------------------------------------------: | :---------------------------------------------------: |
+| | |
+
+The main reason users choose open source software is that it is free of charge, followed by the ability to do further development and a favorable community environment.
+
+When selecting open source products, participants focus most on how well the code follows standards and on how active the developers are. This indicates that users are concerned not only about the functionality and quality of open source products, but also about the activity of communities and developers and the sustainability of projects.
+
+| Issues Encountered When Using Open Source Products | Factors Prompting Open Source Contributions |
+| :----------------------------------------------------: | :----------------------------------------------------: |
+| | |
+
+Among the problems encountered, the most common is a lack of project documentation, followed by unstable updates.
+
+Factors such as personal interest, community atmosphere and technical improvement play an important role in promoting open source contributions.
+
+#### 3.2.3 Technical Direction
+
+| Technical Directions of Interest | Known Open Source Licenses |
+| :----------------------------------------------------: | :----------------------------------------------------: |
+| | |
+
+Respondents show strong interest in artificial intelligence, which was selected by 67.43% of them, followed by development tools, containerization and cloud computing.
+
+Among open source licenses, Apache is the most commonly selected, followed by MIT and GPL.
+
+#### 3.2.4 Information Exchange
+
+| Ways to Retrieve Open Source Products | Communication Methods with the Community |
+| :----------------------------------------------------: | :----------------------------------------------------: |
+| | |
+
+When searching for open source products, most people search through code-hosting platforms, technical communities or media recommendations, and search engines.
+
+Communication with open source communities mainly takes place through domestic communication tools (e.g. DingTalk, WeChat, QQ, Feishu, etc.) and asynchronous communication tools (e.g. GitHub Issues, Discussions, mailing lists, etc.), while international communication tools (e.g. Slack, Skype, Telegram, Lark and others) are also widely used. The international open source community relies predominantly on asynchronous communication tools, which differs markedly from domestic practice.
+
+| Frequently Used Products / Technology Community | Media to Get Open Source Information |
+| :----------------------------------------------------: | :-----------------------------------------------------: |
+| | |
+
+Respondents mainly engage through code hosting platforms and open source communities. In addition, a large number of respondents participate in open source communities through domestic technical forums.
+
+For open source information, video platforms and question-and-answer websites are the main channels, reflecting developers' preference for acquiring open source knowledge through audiovisual and interactive Q&A formats.
+
+### 3.3 Open Source Contribution
+
+This section's questions are aimed at respondents whose roles in the open source community are "contributors" and above.
+
+#### 3.3.1 Level of Open Source Contribution Participation
+
+| Participation in Open Source Project Activity | Time of Weekly Open Source Participation |
+| :----------------------------------------------------: | :----------------------------------------------------: |
+| | |
+
+One third of student developers have taken part in open source programs such as Google Summer of Code (GSoC) and the Open Source Promotion Plan (OSPP). More than half of contributors spend more than 5 hours a week on open source activities, and more than 10% spend 35 hours a week, close to the workload of a full-time developer.
+
+#### 3.3.2 Ways of Contributing to Open Source
+
+| Main Open Source Contribution Platforms | Commonly Used Development Languages for Open Source Contributions |
+| :----------------------------------------------------: | :-------------------------------------------------------: |
+| | |
+
+GitHub remains the preferred platform for most respondents and holds a dominant position, followed by Gitee and GitLab. This indicates that GitHub still has significant influence among Chinese developers, although domestic platforms are gradually emerging. The main development languages used include Python, Java, C, JavaScript and Go. In addition, HTML/CSS, TypeScript and others were also selected frequently.
+
+#### 3.3.3 Open Source Contribution Content
+
+| Main Types of Contributions | Types of Contributed Projects |
+| :----------------------------------------------------: | :----------------------------------------------------: |
+| | |
+
+Respondents contribute to open source projects mainly by writing code and documentation. In addition, open source advocacy, community operations and facilitating community activities are also common forms of contribution.
+
+The projects that respondents contribute to are mainly libraries/middleware and common frameworks/infrastructure, reflecting developers' deep interest in foundational technologies.
+
+#### 3.3.4 Incentives
+
+| Incentives | Sources of Financial Return |
+| :----------------------------------------------------: | :----------------------------------------------------: |
+| | |
+
+The various incentives were all rated positively, indicating that a diversity of incentives has a positive impact on developers' participation in open source. In particular, respondents believe that honor-based and social incentives have a more significant positive impact on contributions.
+
+More than half of the developers participating in open source projects receive no financial reward. The remaining developers receive direct financial returns through compensation/salary or rewards/incentives, while very few receive financial support through advertising revenue, donations, or patent/intellectual property income.
+
+### 3.4 Community Operations Survey
+
+The questions in this section are addressed to respondents who are “operators” in the open source community.
+
+#### 3.4.1 Overview of Open Source Communities
+
+| Number of Community Users | Active Developers |
+| :----------------------------------------------------: | :----------------------------------------------------: |
+| | |
+
+Nearly 60% of operators belong to open source communities with fewer than 200 users, while almost 30% belong to communities with over 500 users. More than half of the operators belong to communities with fewer than 20 active developers.
+
+#### 3.4.2 Open Source Community Management
+
+| Community Management | Community Commercial Support |
+| :-------------------------------------------------------------: | :-------------------------------------------------------------: |
+| | |
+
+About half of the communities have clear governance structures and professionals responsible for day-to-day operations. At the same time, communities have generally developed clear norms and provide up-to-date documentation to help members integrate.
+
+Most open source communities have commercial support, mainly in the form of declarations and collaborative development.
+
+#### 3.4.3 Research on the Commercialization of Open Source Software
+
+| Usage of Open Source Software in Enterprise | Agreement with Commercialization of Open Source Projects |
+| :----------------------------------------------------: | :----------------------------------------------------: |
+| | |
+
+The vast majority of businesses use open source software, with a ratio of 5:6 between companies that have clear usage requirements and regulatory norms and those that lack corresponding management standards. This indicates that while some companies emphasize standards and management when using open source software, a large proportion of enterprises still regulate it loosely, which may be influenced by factors such as company size, industry differences, and understanding of open source software.
+
+The acceptance of commercializing open source projects averages 3.65, with 31.66% of respondents giving the highest rating, indicating that most respondents hold a moderate to high acceptance of commercialization.
+
+### 3.5 Open Source Development Research
+
+#### 3.5.1 Open Source Development
+
+| Development of Open Source Communities |
+| :----------------------------------------------------: |
+| |
+
+Overall, the respondents view the future development of open source in the country positively in all aspects.
+
+| Characteristics of the Continuous Development of Open Source Projects | Evaluation Indicators of Open Source Projects |
+| :-------------------------------------------------------------------: | :----------------------------------------------------: |
+| | |
+
+The respondents believe that the most important characteristic affecting the health and sustainability of an open source community is rapid community response, followed by a continuing influx of new contributors who can be converted into long-term contributors. Demonstrating long-term sustainability is critical to successful community development.
+
+When evaluating open source projects, respondents mainly focus on project influence, authority, community activity, and continued updates and maintenance. This reflects developers' concern for the overall health of a project at both the technical and community levels.
+
+#### 3.5.2 Impact and Challenges of Artificial Intelligence on Developers and Open Source Ecosystem
+
+| AI Impact on Developers | AI Future Role in Open Source Communities |
+| :----------------------------------------------------: | :----------------------------------------------------: |
+| | |
+
+The survey results show that developers are generally optimistic about the impact of artificial intelligence technologies on open source projects, seeing the greatest application prospects in improved efficiency, automated testing, data analysis and project security.
+
+| Challenges for Artificial Intelligence in Open Source Ecosystem |
+| :-----------------------------------------------------------: |
+| |
+
+In addition, privacy and data security, transparency, and ethics are seen as the major challenges facing artificial intelligence technologies in the open source ecosystem, indicating the need to balance technological challenges and social considerations when applying AI.
+
+:::Expert Commentary
+**Jie YU**: Faced with the wave of AI, we should remain calm and confident, embrace it with a positive attitude, learn from it, and make full use of AI technology to promote the continuous development of individuals and projects.
+:::
diff --git a/index.md b/index.md
new file mode 100644
index 0000000..1d67326
--- /dev/null
+++ b/index.md
@@ -0,0 +1,284 @@
+---
+# https://vitepress.dev/reference/default-theme-home-page
+layout: home
+
+hero:
+ name: "2023 China Open Source Annual Report"
+ text: ""
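+ tagline: A comprehensive report on China's open source industry spanning nearly a decade, produced by KAIYUANSHE together with multiple partner organizations and published once a year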
+ actions:
+ - theme: brand
+ text: Read the 2023 Annual Report
+ link: /preface
+ - theme: alt
+ text: Previous Reports
+ link: https://kaiyuanshe.feishu.cn/wiki/wikcnUDeVll6PNzw900yPV71Sxd
+
+features:
+ - icon:
+ src: "/image/home/KaiYuanShe-logo.png"
+ width: 40
+ height: 40
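+ title: KAIYUANSHE
+ details: KAIYUANSHE (开源社), founded in 2014, is an open source community formed by individual volunteers dedicated to the open source cause, following the principles of "contribution, consensus, and co-governance". KAIYUANSHE has always upheld the ideals of being "vendor-neutral, public-interest, and non-profit". Its vision is "rooted in China, contributing globally, and making open source a way of life in the new era", and its mission is "open source governance, international alignment, community development, and project incubation", with the aim of jointly building a healthy and sustainable open source ecosystem.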
+ link: https://kaiyuanshe.cn/
+ linkText: Official Website
+ - icon:
+ src: "/image/home/yunqi_partnets_logo.jpg"
+ width: 40
+ height: 40
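+ title: Yunqi Partners
+ details: Founded in 2014, Yunqi is the earliest research-driven venture capital firm in China to focus on "technological innovation + industry empowerment". Its investments cover frontier technology, advanced manufacturing, enterprise software, and industrial supply chain technology, and it has repeatedly been named among the "Top 10 Best Early-Stage Investment Institutions in China" by Zero2IPO, ChinaVenture, 36Kr and others. As an early-stage lead investor, Yunqi has invested in more than 170 outstanding startups, over 30 of which have grown into industry leaders, including 360 DigiTech (NASDAQ:QFIN), Intco Medical (SZ:300677), Intco Recycling (SH:688087), Kujiale, Baibu, DeepRoute.ai, MiniMax, Keenon Robotics, XTransfer, 环世物流, 德风科技 and other technology companies. Yunqi also continues to help build the open source ecosystem, having led investments in PingCAP, Zilliz, Jina AI, RisingWave, TabbyML and other open source companies, and it co-produced the Commercialization chapter of the China Open Source Annual Report with KAIYUANSHE in 2021, 2022 and 2023.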
+ link: https://www.yunqi.vc/
+ linkText: Official Website
+ - icon:
+ src: "/image/home/x_lab2017_logo.jpg"
+ width: 40
+ height: 40
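+ title: X-lab Open Lab
+ details: X-lab Open Lab positions itself as an open group for open source research and innovation. It is a community of experts, scholars and engineers from well-known universities in China and abroad, startups, and some Internet and IT companies, focused on open innovation in the open source software industry. Its members' backgrounds span interdisciplinary fields including computer science, software engineering, data science, business administration, sociology and economics, and the group has long studied and practiced topics such as open source strategy, open source metrics, and open source digital ecosystems. It has produced influential work in areas including open source governance standard setting, measurement and analysis of open source community behavior, open source community process automation, and open source full-domain data governance and insight.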
+ link: https://github.com/X-lab2017
+ linkText: GitHub Homepage
+---
+
+
+
+
+
+
+
+ Writing Team
+
+
+ Convener
+
+
+
+
+
+ Questionnaire
+
+
+
+
+
+
+ Data
+
+
+
+
+
+
+ Commercialization
+
+
+
+
+
+
+ Open Source Chronicle
+
+
+
+
+
+
+ Overall Report Compilation / Editing
+
+
+
+
+
+ Design / Layout
+
+
+
+
+
+
+
+ Expert Commentators
+
+ (listed alphabetically by surname)
+
+
+
+